• Hi!
    I'm Sofia

    A Senior Data Scientist at NewWave
    interested in working on challenging problems around data...


Who Am I?

My name is Sofia Dutta, and I am a Senior Data Scientist at NewWave Telecom & Technologies, Inc. At NewWave, I am part of a team working on Machine Learning problems for the Medicaid Data Quality Assistant (MDQA) project in the healthcare domain.

I have worked on several Data Science, Machine Learning, and Deep Learning projects during my time at NewWave and during my graduate studies at UMBC.

Previously, I worked as a Software Developer and Data Analyst for Tata Consultancy Services from 2010 to 2018.

Python Programming

Deep Learning

Machine Learning

Master's in Data Science with a 4.0 GPA!

Work Experience

Senior Data Scientist Present - 2021

Senior Data Scientist at NewWave Telecom and Technologies, Inc., Woodlawn, MD, USA

  • Working on the Imersis project. Performing data quality analysis on millions of healthcare records from the Centers for Medicare & Medicaid Services (CMS). Driving system design work and making the right platform choices for NewWave’s projects.
  • Improved computation speed 10-fold by deploying the data analysis workflow on Google Cloud Platform (GCP) clusters and using Apache Spark for quality metrics computations.
  • Utilizing Google Cloud Storage and BigQuery for storage and faster processing of large quantities of healthcare data.
  • Training machine learning models using thousands of rules for quality computation metrics.
  • Leading data-processing efforts, guiding new employees to quickly ramp up on data analytics and achieve project goals.
  • Using Apache Airflow to orchestrate Big Data processing on GCP clusters and automate the data analytics workflow for the Imersis project.

Data Scientist Intern 2020 - 2020

Data Scientist Intern at NewWave Telecom and Technologies, Inc., Woodlawn, MD, USA

  • Carried out data visualization tasks using LookML, Matplotlib, and Seaborn to present data quality outcomes from various quality computation metrics.
  • Created an end-to-end architecture design and database schema design for the Imersis data quality analytics platform.
  • Collaborated with the Product Owner and other engineers to create mechanisms in Python for generating synthetic training data, used to test the efficacy of the machine learning algorithms in the project.
  • Carried out the DevOps tasks needed to set up the Big Data Analytics environment: configured the GCP environment to execute Python programs and connected the cloud infrastructure to Looker dashboards so that computed results could be presented to customers.

Semantic Web Researcher 2020 - 2019

Graduate Student Researcher in Semantic Web and Smart Home Access Control, Ebiquity Group, UMBC, USA

  • Authored an Ontology for Smart Home Access Control by extending earlier research in Semantic Web.
  • Developed an Android app for handling context-sensitive access control in a Smart Home Environment.
  • Created YouTube videos for presentation to the National Institute of Standards and Technology.
  • Published a paper at the IEEE Big Data Security 2020 conference.

Software Developer and Data Analyst 2018 - 2010

Software Developer and Data Analyst at Tata Consultancy Services (TCS) Ltd., India

  • Led the design, development, and delivery management of seven projects for clients of TCS.
  • Created APIs using PL/SQL stored procedures for daily use by TCS clients.
  • Led meetings to capture requirements from DHL UK, Staples USA, Hyatt USA, Kaiser-Permanente USA.
  • Carried out change-based regression analysis and documented software functional specifications.
  • Prepared test plans and executed system integration testing and user-acceptance testing.
  • Ensured client systems were back up within four hours of migration activities, saving millions of dollars in potential lost revenue.
  • Implemented scripts for data migration of a billion records while adhering to strict SLA time bounds.
  • Completed client data migration from legacy Oracle Apps (11i) to Oracle ERP Suite (R12).
  • From 2013 to 2018, managed continuous integration and continuous deployment in production environments.
  • Earned certification in seven Oracle Apps competencies while working on client projects.

My Specialty

My Skills

scikit-learn, MLlib, OpenCV

Apache Airflow

Oracle Apps, Oracle Fusion

Google Cloud

LookML, Looker

Knowledge Graph, OWL, SPARQL

2020 - 2019
Completed my Master's in Data Science at the University of Maryland, Baltimore County (UMBC), Baltimore, USA. GPA: 4.0!
See transcript here.

2010 - 2006
Completed my Bachelor of Technology in Computer Science from West Bengal University of Technology, Kolkata, India with a 3.5 GPA!
See transcript here.

Capstone Project

QABot: A Chatbot for Open Question Answering Using Neural Networks

Built “QABot”, a chatbot that uses a sequence-to-sequence Deep Learning model with an Encoder-Decoder Neural Network architecture and an Attention Mechanism to answer user search queries. Created a model by training a Deep Neural Network using the PyTorch Deep Learning framework. Used a Recurrent Neural Network architecture, which is better suited to text sequences. Used both Teacher Forcing and Auto-Regressive approaches for model training, and the Auto-Regressive approach for model evaluation. Used BERT (Bidirectional Encoder Representations from Transformers) for tokenization and combined Transformer and GPT-2 for model fine-tuning.

QABot standard

QABot using RNN

QABot using Transformer

Data Science Projects
Client Projects
Software Certifications
Certificates & Awards


Android demos

Demos for research

Smart home controller app with rules demo

Get in Touch


sofia DOT dutta 17 AT gmail DOT com

Usually can be found here...