Cambridge, Massachusetts, United States
Developed an end-to-end multimodal data preprocessing and ML pipeline for image captioning classification. Published a thesis paper, supervised by Professor Gabriel Kreiman.
Created a custom image-text dataset, generated contextual embeddings from BERT and ResNet-18, and applied PCA to reduce dimensionality and improve efficiency. Implemented training and inference with ML classifiers (SVM, Naive Bayes, DNN) on the compressed contextual embeddings, achieving 70% accuracy with a linear SVM (competitive with SOTA), compared with 40.4% using static embeddings.