Hi!
    I'm Sofia

    A Software Engineer with Cisco Systems Inc., building models to identify applications for the Cisco Secure Firewall.
    I am interested in working on challenging problems around data!

    Download Resume

About

Who Am I?

My name is Sofia Dutta, and I am a Software Engineer at Cisco Systems Inc. At Cisco, I am part of the Cisco Secure Firewall team, working on ways to identify applications for the firewall.

I previously worked as a technical leader at a startup, building a data quality solution called the Imersis project. Before that, I worked on many Data Science and Deep Learning projects as a graduate student at UMBC.

I also worked as a Software Engineer for Tata Consultancy Services from 2010 to 2018.

Python Programming

Deep Learning

Machine Learning

Master's in Data Science with a 4.0 GPA!

Hire me
Experience

Work Experience

Technical Leader 2023 - 2021

Technical Leader at NewWave Telecom and Technologies, Inc., Woodlawn, MD, USA

  • Designed the complete system architecture, database schema, and data workflow for the Imersis data quality analytics platform.
  • Built the project's cloud infrastructure setup from scratch.
  • Built scripts to process hundreds of millions of healthcare records from the Centers for Medicare & Medicaid Services (CMS).
  • Created pre-processing scripts for consuming large batches of unstructured customer data.
  • Developed Apache PySpark code to compute data quality metrics for customer data.
  • Created Looker dashboard visualizations with drill-down options that "explain" why the data quality score came out to a certain value.
  • Because large quantities of training data were unavailable, redesigned the project's machine learning goal into a data quality "Explainable AI" system.
  • Built a data quality system for state governments that shows them where to improve their upstream data ingestion processes and lets them observe how those improvements raise their data quality over time.
  • Reduced costs at four levels of the project:
    • Pre-processing: Built scripts that brought data pre-processing time down from ten days to a few hours.
    • Cluster uptime: Analyzed the causes of high cloud expenditure and deployed Apache Airflow workflows so that cloud resources ran only during hours of actual usage, reducing monthly costs from thousands of dollars to a few hundred.
    • Storage vs. data transfer: Performed a cost analysis of using a Google Compute Engine instance with large storage versus relying on more network data transfer.
    • Partial on-premises processing: Built a system that handled pre-processing in our local data farm, reducing monthly project costs from over ten thousand dollars to a few hundred.
  • Quickly learned new technologies like Apache Airflow, Databricks, and Google Cloud Platform; guided team members through their technology ramp-up over the past two years and helped them set up data workflows in the cloud.
  • Provided educational expertise and mentoring to junior team members.
  • Built product feature lists with stakeholders, conducted system design sessions with other architects on the team, and led code review meetings.
  • Investigated root causes of customer-found defects. Carried out several customer demonstrations to help sell the product, and handled rapid prototyping and solution building for ad hoc requirements and last-minute feature requests from customers.
  • Advised management, business, and technical staff on the usage of specific technologies like Apache Airflow and Google Cloud Platform.
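The data quality metrics mentioned above were computed in PySpark on the project; the idea behind one common metric, field completeness, can be illustrated with a simplified, framework-free sketch. The field names and sample records here are hypothetical, and the real implementation ran as distributed Spark jobs rather than over Python lists.

```python
def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty.

    A toy stand-in for the distributed PySpark jobs described above;
    `records` is a list of dicts rather than a Spark DataFrame.
    """
    if not records:
        return 0.0
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    return complete / len(records)


# Hypothetical healthcare-style records, for illustration only.
sample = [
    {"id": "1", "state": "MD", "dob": "1980-01-01"},
    {"id": "2", "state": "", "dob": "1975-05-30"},
    {"id": "3", "state": "VA", "dob": None},
]
score = completeness(sample, ["id", "state", "dob"])  # 1 of 3 records complete
```

In a dashboard with drill-down, a score like this would be broken out per field, so a viewer can see which field (here, `state` or `dob`) is dragging the overall quality down.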

Data Scientist Intern 2020 - 2020

Data Scientist Intern at NewWave Telecom and Technologies, Inc., Woodlawn, MD, USA

  • Carried out data visualization tasks using LookML, Matplotlib, and Seaborn to present quality measures based on chosen computation metrics.
  • Successfully improved computation speed by 10-fold by deploying data analysis workflow in Google Cloud Platform (GCP) clusters and using Apache Spark for quality metrics computations.
  • Collaborated with the Product Owner and other engineers to create mechanisms for generating synthetic training data in Python, used to test the efficacy of the machine learning algorithms in the project.
  • Carried out the DevOps tasks needed to set up the Big Data analytics environment: configured the GCP environment to execute Python programs and connected the cloud infrastructure to Looker dashboards for delivering computed results to customers.

Researcher 2020 - 2019

Researcher at Ebiquity Group, UMBC, USA

  • Authored an Ontology for Smart Home Access Control by extending earlier research in the Semantic Web.
  • Developed an Android app for handling context-sensitive access control in a Smart Home Environment.
  • Created YouTube videos for presentation to the National Institute of Standards and Technology.
  • Published a paper at the IEEE Big Data Security 2020 conference.

Software Engineer, Technical Leader 2018 - 2010

Software Developer and Data Analyst at Tata Consultancy Services (TCS) Ltd., India

  • Led the design, development, and delivery of API interfaces using PL/SQL stored procedures for several projects of TCS.
  • Carried out change-based regression impact analysis, created software functional specifications, and prepared test plans for several TCS projects.
  • Performed system integration, user-acceptance, and performance testing, and ensured client systems maintained very high uptime even during data migration activities. Saved clients millions of dollars in potential lost revenue and received client awards for this effort.
My Specialty

My Skills

Python

95%

Java

80%

PyTorch

90%

scikit-learn, MLlib, OpenCV

85%

Keras

80%

TensorFlow

80%

Apache Airflow

80%

PySpark

90%

T-SQL, Oracle SQL, PL/SQL

95%

Oracle Apps, Oracle Fusion

90%

Google Cloud

80%

AWS S3

80%

LookML, Looker

85%

Knowledge Graph, OWL, SPARQL

75%
Education

Education

2020 - 2019
Completed my Master's in Data Science from the University of Maryland, Baltimore County (UMBC), Baltimore, USA. GPA: 4.0!
See transcript here.

2010 - 2006
Completed my Bachelor of Technology in Computer Science from West Bengal University of Technology, Kolkata, India, with a 3.5 GPA!
See transcript here.

Capstone Project

Capstone Project

QABot: A Chatbot for Open Question Answering Using Neural Networks

Built “QABot”, a chatbot based on a sequence-to-sequence Deep Learning model that uses an Encoder-Decoder Neural Network architecture combined with an Attention Mechanism to answer user search queries. Trained a Deep Neural Network model using the PyTorch Deep Learning framework, with Recurrent Neural Network architectures, which are well suited to text sequences. Used both Teacher Forcing and Auto-Regressive approaches for model training, and the Auto-Regressive approach for model evaluation. Used BERT (Bidirectional Encoder Representations from Transformers) for tokenization, and combined Transformer and GPT-2 architectures for model fine-tuning.
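The attention mechanism at the heart of this architecture can be sketched in a few lines of plain Python. This is a minimal scaled dot-product attention over toy vectors, not the project's PyTorch implementation; the example query, keys, and values are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention: weight each value by how well its
    key matches the query, then return the weighted sum (the context)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return weights, context

# Toy example: the query aligns with the second key, so the second
# value dominates the context vector.
weights, context = attention(
    query=[1.0, 0.0],
    keys=[[0.0, 1.0], [1.0, 0.0]],
    values=[[0.0, 10.0], [10.0, 0.0]],
)
```

In the actual encoder-decoder model, the queries, keys, and values are learned projections of the decoder and encoder hidden states, letting the decoder focus on the most relevant input tokens at each generation step.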

Capstone: QABot standard

QABot standard

Capstone: QABot using RNN

QABot using RNN

Capstone: QABot Transformer

QABot using Transformer

Data Science Projects
Client Projects
Software Certifications
Certificates & Awards
Publications

Publications

Android demos

Demos for research

Smart home controller app with rules demo

Get in Touch

Contact

sofia DOT dutta 17 AT gmail DOT com