Overview:
• Architected and implemented an end-to-end data pipeline and visualization platform for analyzing off-target gene-editing effects of proprietary Cas proteins.
Key Contributions:
• Built ETL workflows on AWS to turn CRISPResso2 mutation output into clean, analysis-ready data, and provisioned the supporting data infrastructure with Terraform.
• Developed a web-based analytics platform in Streamlit that integrates interactive Plotly visualizations and a customized Gosling genome browser. Scientists use it to analyze off-target effects and editing-efficiency metrics for amplicons, from genome scale down to base-pair resolution.
• Supported scientists directly with Jupyter workflows and D3 visualizations for tracking experimental sample metadata.
• Automated deployment workflows with GitHub Actions to deliver reliable, scalable updates to the data-processing infrastructure and to maintain code quality.
Technologies Used:
• React, Python, SQL, AWS (S3, Athena, Glue, CloudFormation, Step Functions), Docker, Terraform, GitHub Actions, Streamlit, Material UI, Plotly, D3.js, Gosling, Boto3, Pydantic, CRISPResso2, batch processing
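Illustrative sketch:
The following is a minimal, hypothetical example of the kind of cleaning step the ETL contribution describes. It is not the production code. It assumes a tab-separated allele table with columns named `Aligned_Sequence`, `Reference_Sequence`, `n_deleted`, `n_inserted`, `n_mutated`, `#Reads`, and `%Reads`. The `AlleleRecord` schema, the function names, and the simplified "editing efficiency" definition (share of reads carrying any insertion, deletion, or substitution) are assumptions made for illustration. The project lists Pydantic for validation; this sketch uses only the standard library so it stays self-contained.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleRecord:
    """One cleaned allele row (hypothetical schema, for illustration only)."""

    aligned_sequence: str
    reference_sequence: str
    n_deleted: int
    n_inserted: int
    n_mutated: int
    reads: int
    pct_reads: float

    @property
    def is_edited(self) -> bool:
        # Simplification: any indel or substitution counts as an edit.
        return (self.n_deleted + self.n_inserted + self.n_mutated) > 0


def parse_alleles(tsv_text: str) -> list[AlleleRecord]:
    """Parse tab-separated allele rows and reject out-of-range values."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        record = AlleleRecord(
            aligned_sequence=row["Aligned_Sequence"],
            reference_sequence=row["Reference_Sequence"],
            n_deleted=int(row["n_deleted"]),
            n_inserted=int(row["n_inserted"]),
            n_mutated=int(row["n_mutated"]),
            reads=int(row["#Reads"]),
            pct_reads=float(row["%Reads"]),
        )
        if record.reads < 0 or not 0.0 <= record.pct_reads <= 100.0:
            raise ValueError(f"Out-of-range read counts in row: {row}")
        records.append(record)
    return records


def editing_efficiency(records: list[AlleleRecord]) -> float:
    """Percentage of reads that carry at least one edit (assumed metric)."""
    total = sum(r.reads for r in records)
    edited = sum(r.reads for r in records if r.is_edited)
    return 100.0 * edited / total if total else 0.0


# Tiny made-up sample: 90 unedited reads and 10 reads with a 1-bp deletion.
SAMPLE_TSV = (
    "Aligned_Sequence\tReference_Sequence\tn_deleted\tn_inserted\tn_mutated\t#Reads\t%Reads\n"
    "ACGTACGT\tACGTACGT\t0\t0\t0\t90\t90.0\n"
    "ACG-ACGT\tACGTACGT\t1\t0\t0\t10\t10.0\n"
)

if __name__ == "__main__":
    rows = parse_alleles(SAMPLE_TSV)
    print(f"{len(rows)} alleles, editing efficiency = {editing_efficiency(rows):.1f}%")
```

In a Pydantic-based version, the range checks would typically move into field constraints on the model so that invalid rows fail at construction time.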