Sathwick Kiran - Data Science Portfolio

About

About Me

Hey! I am a data professional with 2 years of experience at Sony and an MS in Data Science from the University at Buffalo. My work spans data engineering, analytics, and applied machine learning. I build ETL pipelines, Power BI dashboards, and reporting solutions that cross-functional teams use to make decisions.

At Sony I worked on Python and SQL data workflows across 5+ internal teams, automated reporting pipelines, and put in place data quality checks that improved consistency and reduced manual effort by 15%. I also worked with AWS-based workflows and CI/CD automation, focusing on building reliable data infrastructure that improved reporting efficiency and stakeholder decision-making.

My academic and project work blends data engineering with advanced analytics, from building a LangGraph multi-agent AI system for financial market analysis to processing 3.3M+ records using PySpark and NLP on Hadoop, and building a containerized DataOps observability platform with Kafka, Airflow, dbt, and Tableau.

Outside of data and AI, I'm a drummer, a big cricket fan, and slightly obsessed with Formula 1. You'll usually find me watching qualifying sessions on weekends.

Capabilities

Skills & Certifications

Programming

Python SQL PySpark R Scala PyTorch

Data Science & ML

Machine Learning NLP Statistical Analysis Feature Engineering Scikit-learn Pandas EDA PyTorch

BI & Visualization

Power BI Tableau Excel Apache Superset

Data Engineering & Cloud

ETL / ELT Data Pipelines dbt Airflow Kafka Hadoop Apache Spark AWS Azure Databricks Docker Snowflake

Tools & Platforms

GitLab GitHub Jira CI/CD PostgreSQL LangGraph FastAPI Generative AI Prompt Engineering

Certifications

Azure Machine Learning for Data Scientists

Microsoft Azure

2026

Azure ML: Model Training & Evaluation

Microsoft Azure

2026

Databricks Fundamentals

Databricks

2025

Snowflake Data Warehousing

Snowflake

2025

AWS Cloud Practitioner

Amazon Web Services

2023

Career

Experience

Feb 2026 to Present

Data Science Intern

Youro, LLC · Remote

Built Python/SQL/Azure data pipelines as a founding member of the data team, consolidating patient data into one central store at 99%+ data quality and cutting manual prep time by 60%.
Developed Azure ML severity scoring models returning results in under 100ms, routing patients to the right specialist before they book.
Designed a RAG system on Azure services that grounds patient guidance in trusted urology guidelines, reducing unreliable responses and improving answer accuracy.
Built Tableau dashboards tracking triage, booking, and engagement across 5+ KPIs, presenting directly to the CEO to drive early product decisions.

Technologies & Tools

Python SQL Azure Azure ML RAG Tableau

Jul 2025 to Dec 2025

Student Associate

University at Buffalo-SUNY

Buffalo, NY

Aug 2022 to Apr 2024

Software Engineer (Data Engineering and Analytics)

Sony India Software Centre · Bengaluru

Engineered batch ETL pipelines in Python and SQL on AWS across 5+ internal data feeds, cutting manual reporting effort by 15%.
Implemented schema validation, null checks, and duplicate detection, improving downstream data reliability across reporting datasets consumed by 3 business units.
Developed analytics-ready datasets feeding Power BI and Tableau dashboards for product, finance, and engineering teams, reducing data-request-to-insight turnaround by 20%.
Integrated GitLab CI/CD pipelines with Jira APIs to automate issue tracking and workflow updates, reducing cross-team coordination effort by 20%.
Built a GitLab API-integrated dashboard automating Wiki access requests and manager approval routing, cutting manual documentation coordination by 30%.

Technologies & Tools

Python SQL AWS GitLab CI/CD Power BI Tableau ETL Pipelines Jira
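
The schema validation, null checks, and duplicate detection mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code used at Sony: the schema, the column names, and the `check_rows` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical schema for illustration: column name -> expected Python type.
EXPECTED_SCHEMA = {"order_id": int, "team": str, "amount": float}


def check_rows(rows, schema=EXPECTED_SCHEMA, key="order_id"):
    """Return (row_index, problem) tuples for a batch of dict rows."""
    problems = []
    for i, row in enumerate(rows):
        # Schema validation: every expected column must be present.
        missing = sorted(set(schema) - set(row))
        if missing:
            problems.append((i, f"missing columns: {missing}"))
        for col, expected in schema.items():
            if col not in row:
                continue
            value = row[col]
            if value is None:
                # Null check.
                problems.append((i, f"null in {col}"))
            elif not isinstance(value, expected):
                # Type check against the assumed schema.
                problems.append((i, f"{col} is not {expected.__name__}"))
    # Duplicate detection on the assumed key column.
    counts = Counter(row.get(key) for row in rows)
    for i, row in enumerate(rows):
        k = row.get(key)
        if k is not None and counts[k] > 1:
            problems.append((i, f"duplicate {key}: {k}"))
    return problems
```

In a batch pipeline, a report like this would typically decide whether a load proceeds or is held for review.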

Feb 2022 to May 2022

Software Intern

LMARKS · Bengaluru

Collected, cleaned, and organized 500+ service and customer records using Excel and Python to support weekly reporting.
Prepared reports across 2,000+ customer entries to track service requests, job status, and recurring issue categories.
Analyzed 3+ years of historical installation, service, and customer data to identify follow-up gaps and support CRM process improvements for a business serving 2,000+ customers.

Technologies & Tools

Python Excel SQL Reporting CRM Git

Portfolio

Projects

DataOps Change Impact & Pipeline Failure Observability Platform

Architected and deployed an end-to-end DataOps observability platform on GCP using Docker Compose, engineering real-time pipeline monitoring from Kafka streaming through Airflow orchestration and dbt transformations across 2,185+ events and 7 data models. Built a RAG-based Failure Intelligence engine using FastAPI and Google Gemini AI to answer natural language queries on pipeline health, SLA breaches, and schema change blast radius, deployed with a live Chat UI on GCP. Designed 3 Grafana dashboards backed by dbt mart models with an Airflow DAG orchestrating automated breach detection, schema change impact scoring, and pipeline summary logging.

Python Kafka Airflow dbt PostgreSQL FastAPI Grafana GCP Gemini AI

Code

RideFlow Analytics: NYC Taxi Demand Forecasting with MLOps Pipeline

Engineered an end-to-end MLOps pipeline on AWS ingesting 24 months of NYC Yellow Taxi Parquet data via automated Lambda triggers into a structured S3 hierarchy, with Glue jobs handling transformation and Athena enabling serverless querying. Built and tuned two LightGBM demand forecasting models using 28-day lag features and temporal aggregates, tracking all experiments with MLflow so that training runs are not lost across restarts, and achieving an MAE of 2.895 and a MAPE of 9.23%. Deployed a Streamlit app on AWS Elastic Beanstalk serving 200+ concurrent users, with RDS indexing on frequently joined columns cutting query latency by 25% and improving real-time dashboard responsiveness.

Python LightGBM Hopsworks Confluent Kafka MLflow GitHub Actions Streamlit GeoPandas
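
For readers unfamiliar with the terms in the RideFlow description, here is a small sketch of a lag feature and of the MAE and MAPE metrics. It is a plain-Python illustration under assumed names (`add_lag_feature`, `mae`, `mape`), not the project's code, and the toy numbers are unrelated to the reported results.

```python
def add_lag_feature(series, lag):
    """Pair each value with the value `lag` steps earlier (None if unavailable)."""
    return [(series[i - lag] if i >= lag else None, series[i])
            for i in range(len(series))]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Points where the actual value is 0 are skipped, since the ratio is
    undefined there (one common convention, assumed here).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


lagged = add_lag_feature([1, 2, 3], lag=1)   # [(None, 1), (1, 2), (2, 3)]
error = mae([10, 20], [12, 16])              # (2 + 4) / 2 = 3.0
```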

Sentiment Wars: ML & Transformer Benchmarking for Sentiment Classification

Built an end-to-end sentiment classification pipeline on Amazon review data, benchmarking Logistic Regression, Random Forest, SVM, and Naïve Bayes against fine-tuned DistilBERT and DeBERTa v3 base transformer models, achieving up to 93.5% accuracy. Engineered text features using TF-IDF vectorization, n-gram analysis, and polarity scoring, with GridSearchCV hyperparameter tuning and all runs tracked via MLflow integrated with DAGsHub for full experiment reproducibility. Deployed a live Streamlit app serving the fine-tuned DeBERTa model from Hugging Face Hub with real-time confidence scoring, and built a Power BI dashboard for sentiment trend analysis.

Python Scikit-learn DistilBERT DeBERTa TF-IDF MLflow DAGsHub Streamlit Power BI

AI Multi-Agent System for Market Intelligence (Synapse Street)

Led a team of 4 to build a LangGraph multi-agent system on 5GB of stock market data, flagging short-selling opportunities and generating readable investment summaries end-to-end in a 24-hour UB hackathon. Developed a Scikit-learn pipeline on OHLCV features with Qdrant vector search and sentence transformer embeddings, retrieving contextually similar market signals at Precision@10 of 0.60. Designed Streamlit and Tableau dashboards to surface model scores, agent findings, and backtest results for non-technical stakeholders.

LangGraph Python Qdrant Scikit-learn Hadoop HDFS Streamlit Tableau Vultr Cloud

Code Dashboard
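
Precision@10, the retrieval metric quoted for Synapse Street, is the share of the top 10 retrieved items that are relevant. A minimal sketch, assuming relevance labels are available as a set (the function name and sample IDs are hypothetical):

```python
def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top-k retrieved items that appear in `relevant`."""
    top_k = retrieved[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Toy example: 6 of the top 10 retrieved signals are labeled relevant.
retrieved = [f"sig{i}" for i in range(10)]
relevant = {"sig0", "sig1", "sig2", "sig3", "sig4", "sig5"}
score = precision_at_k(retrieved, relevant, k=10)  # 0.6
```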

Scalable Big Data Analytics using Hadoop, Spark and NLP

Designed and implemented a scalable pipeline for processing and analyzing 3.3M+ Amazon book reviews. Used Hadoop (MapReduce, HDFS) and Apache Spark / PySpark for distributed data processing. Applied TF-IDF vectorization and NLP techniques for sentiment classification, rating prediction, and helpfulness scoring, achieving 95% accuracy after hyperparameter tuning with PySpark MLlib.

PySpark Hadoop Apache Spark NLP / TF-IDF Scikit-learn Python Docker

Code

Osteoporosis Risk Prediction using Machine Learning

Built a predictive modeling solution using demographic, lifestyle, and health data to assess osteoporosis risk and support clinical insights. Prepared 1,958 patient records, engineered features, and trained Logistic Regression, Random Forest, Decision Tree, and SVC models. Evaluated 4 models using accuracy, precision, recall, and F1 score, with Decision Tree achieving 90% accuracy and surfacing key risk factors.

Python Scikit-learn Pandas NumPy Random Forest Decision Tree

Code

Cryptocurrency Price Forecasting and Analysis using Machine Learning

Developed a machine learning pipeline to predict and analyze Bitcoin price fluctuations using large-scale financial datasets. The workflow involved extensive statistical analysis, feature engineering, and model experimentation to capture temporal price patterns. Containerized using Docker and deployed on AWS for scalability.

Python Scikit-learn Pandas Matplotlib Docker AWS

Code

Online Music Store Database System: Design, Normalization, and Query Optimization

Developed a relational database system for an online music store to manage sales, customers, and inventory efficiently. Designed and implemented an Entity Relationship (E/R) model and created 16 normalized tables (post-BCNF decomposition) with indexing strategies for performance optimization.

SQL PostgreSQL E/R Design Indexing BCNF

Code

Stock Market Analysis and Prediction using Machine Learning (NVDA Case Study)

Worked with NVIDIA (NVDA) stock price data to perform data cleaning, feature engineering, visualization, and predictive modeling. Applied KMeans clustering and developed predictive models using Linear Regression, Ridge Regression, and SVM, achieving R² scores of 1.000 and 0.997.

Python Pandas Scikit-learn Matplotlib KMeans SVM

Code

BMSCE_ONE: Centralized University Platform with ML Chatbot

Built a centralized digital platform that unifies university activities in a single hub, giving students and faculty access to announcements, academic schedules, and services. Implemented a machine-learning-based chatbot to handle FAQs and provide instant support.

Flutter Python Flask MongoDB ML Chatbot Android

Code

Indoor Navigation Using Augmented Reality

Developed an augmented-reality-based indoor navigation application to help students and visitors navigate campus buildings. Used Unity with C# and LiDAR to build precise 3D maps, implementing Dijkstra's algorithm for shortest-path routing. Nominated for the Best Final Project award.

C# Unity Dijkstra's Algorithm AWS AR LiDAR

Private Repository
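
The project itself was written in C# and its repository is private, so the following is only a generic Python sketch of Dijkstra's algorithm on a small weighted graph. The room names and distances are made up for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source` in a graph with non-negative weights.

    `graph` maps a node to a list of (neighbor, weight) pairs.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist


# Hypothetical floor plan: edge weights are walking distances.
floor = {
    "lobby": [("hall", 4), ("stairs", 1)],
    "stairs": [("hall", 2)],
    "hall": [("lab", 5)],
}
distances = dijkstra(floor, "lobby")
```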

I'm Sathwick Kiran. I build data and AI systems that drive business decisions.

Education

Master of Science in Data Science

Coursework & Tools

Bachelor of Technology, Computer Science

Technologies & Tools

Get In Touch