Greater New York City Area
Individual Contributor and Tech Lead for the Analytics Team. Design and implementation of big data workflows, analytical models to assess cybersecurity risk, and low-latency distributed systems.
• Manage a team of local and remote engineers
• Work with Data Scientists to implement models such as predictive risk scoring, breach risk probability, portfolio-based insights, and anomaly detection. Responsible for data engineering, model tuning, and productionizing models in Spark.
• Contributed knowledge of Spark tuning, HDFS, and optimized formats such as ORC and Parquet to the team. Wrote custom UDFs, UDAFs, and ML pipelines for Spark. Profiled, tuned, and refactored legacy Spark code to reduce runtimes and decrease shuffles. Implemented analytic model workflows from custom repositories to containerized deployments.
• Responsible for architecture and implementation of Fast/Slow data flows over the data lake, including development and maintenance of tools to support workflow management, allowing for ingestion of terabytes of data daily. Designed and implemented a Blue/Green data flow to ensure platform consistency during batch updates. Maintained and tuned Presto to reduce long-tail query times by 93%, allowing for low-latency queries over very large datasets with average latency under 200 ms.
• Worked closely with Sales and Customer Relations teams to understand both customer-specific and population-wide trends, providing root cause analysis and insights based on the data collected.