Running GPUs
Experience
2023 — 2026
New York, NY
Owned and authored highly parallel, performance-optimized kernels that implement number-theoretic algorithms on a tile-based accelerator.
Kernels included both compute-bound and memory-bound algorithms, such as FFTs, Merkle trees, bit-reversals, and field arithmetic hash functions, in high-throughput and low-latency settings.
Optimized kernels at the workload, algorithm, and instruction levels by profiling data layouts, occupancy, instruction stalls, and sync barriers, and by maximizing resource usage at the hardware level.
Optimizations included zero-copy data handling, reduced DRAM accesses, memory coalescing, and hypercube swaps.
Worked with the infra team to build the acceleration stack from the bottom up, including profilers, debuggers, and kernel features such as dynamic input sizes, input/output metadata, declarative specifications of memory access patterns, sub-kernel calls, etc.
Worked with the hardware team to support chip bringup with RTL test suites and debugging on silicon.
2022 — 2023
San Francisco, California, United States
Contributed to the Motoko smart contract compiler that lowers asynchronous, actor-based, functional programs down to WebAssembly modules that execute on the Internet Computer blockchain network.
Integrated compiler optimizations at the WebAssembly level that improved virtual execution speed by ~10% and reduced binary size by ~15%.
Modified the compiler backend to allow direct conversions between fixed-width numerical types.
Wrote a WebAssembly-level transformation pass to redirect remote function calls between smart contracts.
Introduced language syntax for binding static patterns to dynamic values with an optional fail handler.
2021 — 2021
Philadelphia, Pennsylvania, United States
Topics included asymptotic algorithm efficiency, proof of correctness, deterministic and randomized runtime analysis, implementation, programming style, and testing.
2020 — 2020
Worked under Dr. Steve Zdancewic on the Interaction Trees library as part of the Vellvm LLVM compiler project. Concentrated on researching monad transformers that could model the undefined and non-deterministic behavior of LLVM programs.
Proved monadic laws for a variety of monad transformers using the Coq proof assistant.
Focused on experimenting with variations of a non-determinism (Prop) monad that allows the modeling of non-deterministic program execution.
Variations included a traditional monadic definition, a definition parametrized by an equivalence relation, and a category-theoretic definition based on functors.
https://github.com/DeepSpec/InteractionTrees/
2019 — 2019
Irvine, CA
Interned on the Global Data Platform team to work on their telemetry data pipeline.
Architected and implemented an ingest point for data pipelines, using Scala and Akka, that receives events, compresses them into Protobuf, and publishes the payload to a downstream Kafka broker.
Refactored and integrated a library that calculates and reports data velocity through the pipelines to allow rate-limiting on specific messages/data-producers.
Built and deployed a backend server that lets components in the pipeline query a MySQL database storing information on data producers.
Used Docker, Jenkins, and Terraform to continuously integrate and deploy highly available, scalable, containerized services across data centers in multiple regions.
Education
University of Pennsylvania
Master of Science in Engineering
2020 — 2022
University of Pennsylvania
Bachelor of Science in Engineering
2017 — 2021