Greater Toronto Area, Canada
• Manage a team of 4 data scientists and engineers supporting all data science needs across Cybersecurity, Fraud, Physical, Investigations, Supply Chain & Enterprise Resiliency.
• Set up the Analytics Data Lake (tools used: S3, Spark, PostgreSQL, Airflow, containers, Kubernetes) to incorporate data from multiple internal & cloud-based sources within the Security organization using APIs. Also set up a Spark cluster with 60 cores and 300 GB RAM for daily processing of security and email data.
• Built a fraud risk scoring model for a classic imbalanced-class problem, refreshed hourly, using PySpark for data ingestion and a tree-based model deployed on Docker/Kubernetes (Prec 41%, Rec 95%, $1M+ saved).
• Identified characteristics of employees susceptible to phishing from phishing simulation data using a logistic regression model (Prec 83%, Rec 89%); shared results on Power BI.
• Applied anomaly detection techniques to identify anomalous network traffic based on Producer-Consumer Ratio, session, and traffic volume.
• Created an infection risk KPI across countries and estimated time to return using public COVID-19 datasets (RMSE 10 days). The model was shared with the Security Executive Council and 500 companies across the globe.