Experience
2025 — Now
New York City Metropolitan Area
Work on sensor performance for Nuro's AV compute platform
• Architect and design the camera sensor pipeline, including software architecture, concurrency, performance measurement, communication protocols, and image tuning with the HW team.
• Migrated code from network-socket communication to shared memory, improving latency and CPU utilization
• Wrote the SW PRD used in vendor contract negotiations
• Create AI workflows and skills for debugging sensor performance
2021 — 2025
Worked with cross-functional partner teams to build custom accelerator compute systems for Google Cloud
• Led development and testing of GPU kernel drivers
• Drove strategy and designed tooling to meet reliability requirements (MTTR, MTBF)
• Improved data center operations by implementing out-of-band (OOB) power throttling
• Set technical direction for, and mentored, a rapidly growing team
2019 — 2021
Worked on the CPU team to verify correct functionality and performance of ARM IP.
• Built a SQLite database and led the team's data pipeline and analysis
• Analyzed CPU performance data to generate new insights into L3 cache and CPU performance and scaling for future generations
• Wrote low-level ARM assembly for multiprocessor synchronization in a bare-metal environment
• Created new C library functions for interrupt handling and GIC programming
• Explored and presented instruction fetch pipeline stage improvements for ARM processors
2018 — 2019
Vancouver, WA
Worked on the development of HP printer ASICs with a focus on FPGA-based ASIC emulation. Ensured successful integration of existing ASIC blocks, designed and verified new blocks, and partitioned and synthesized modules onto FPGAs.
• Created a new FPGA emulation platform with an ARM CPU system connected to an external FPGA system, verifying ASIC HW functionality while enabling ARM firmware development in parallel.
• Enhanced FW productivity by developing test code in C.
• Designed and modified ASIC blocks and verified them using UVM and Verilog testbenches.
2017 — 2017
Team of three tasked with designing a 5-stage pipelined microarchitecture implementing the LC-3b ISA. The design was written in SystemVerilog, and performance was measured by targeting it to an FPGA. Verified the design by running the provided assembly programs and checking the results against an assembly simulator.
• Improved performance by 25% over the baseline implementation by adding a GAp dynamic branch predictor with an 8-way fully associative BTB and early branch resolution.
• Increased maximum frequency by analyzing critical-path timing, and reduced cycles per instruction (CPI) by evaluating different branch prediction schemes.
• Designed a fully functional CPU datapath with forwarding and stall logic to handle all hazards.
Education
University of Illinois Urbana-Champaign
Bachelor’s Degree
2014 — 2018
Montville Township High School
High School
2010 — 2014