• Developed NLU and generative models in TensorFlow to improve multilingual Siri parser accuracy for on-client domains, with a focus on scalability to low-resource languages. Designed, implemented, and deployed an NLP software package for few-shot synthetic generation of PII-free conversational language data, used by Siri teams for multilingual model training.
• Implemented language models end-to-end, including data pipelines, training pipelines, and on-device testing, using Docker, AWS S3, and protobuf.
Natural Language Processing Lab: Language, Information and Learning (LILY) under Dr. Dragomir Radev
• Collaborated with Facebook AI to research and develop NLP models for hybrid extractive–abstractive text summarization and argument graph modeling.
• Created a conversation summarization benchmark of 40,000+ dialogues by implementing a clustering and summarization pipeline with PyTorch, Hugging Face Transformers, and scikit-learn, using pandas and NumPy for data processing (Linux, git).