Mountain View, California, United States
• Implemented the Field Accounting on Sawmill importer, a petabyte-scale pipeline that produces metrics on what data is stored within Sawmill, giving our customers much more accurate data insights. The new pipeline produces unsampled Field Stats (~100x more data covered, going from a 1% sample to 100%) at no additional resource cost. Time to stats was reduced from 24-72+ hours to a P50 latency of 5 minutes and a P99 of 2 hours. The project won Silver in the 2024 Q3 Perfys.
• Worked on extracting metrics for PTokens across Sawmill logs to power the Sawmill DMA central data insight dashboard, which helped deliver DMA5(2) compliance for all user activity logs data across all PAs, amounting to XX exabytes. The project won Silver in the 2023 Q3 Healthys and the Core Tech Impact Award for H1’24.
• Worked on several projects that improved privacy infra for Sawmill logs, including migrating from parsing Colossus logs to using Ganpati and IoLAR to detect last usage by direct log reader clients and revoke unused access, which led to access revocation for 1000+ clients across Google, and migrating to the Grackle expanded permission column.