Data Science · Analytics · Engineering

I'm Sathwick Kiran. I build data and AI systems that drive business decisions.

Data professional exploring AI, ML, and big data solutions, building pipelines, models, and dashboards that help teams move faster with better information.

Authorized to work in the US  ·  Open to relocation
About

About Me

Sathwick Kiran

Hey! I am a data professional with 2 years of experience at Sony and an MS in Data Science from the University at Buffalo. My work spans data engineering, analytics, and applied machine learning. I build ETL pipelines, Power BI dashboards, and reporting solutions that cross-functional teams use to make decisions.

At Sony I worked on Python and SQL data workflows across 5+ internal teams, automated reporting pipelines, and put in place data quality checks that improved consistency and reduced manual effort by 15%. I also worked with AWS-based workflows and CI/CD automation, focusing on building reliable data infrastructure that improved reporting efficiency and stakeholder decision-making.

My academic and project work blends data engineering with advanced analytics, from building a LangGraph multi-agent AI system for financial market analysis to processing 3.3M+ records using PySpark and NLP on Hadoop, and building a containerized DataOps observability platform with Kafka, Airflow, dbt, and Tableau.

Outside of data and AI, I'm a drummer, a big cricket fan, and slightly obsessed with Formula 1. You'll usually find me watching qualifying sessions on weekends.

Capabilities

Skills & Certifications

Programming

Python SQL PySpark R Scala PyTorch

Data Science & ML

Machine Learning NLP Statistical Analysis Feature Engineering Scikit-learn Pandas EDA PyTorch

BI & Visualization

Power BI Tableau Excel Apache Superset

Data Engineering & Cloud

ETL / ELT Data Pipelines dbt Airflow Kafka Hadoop Apache Spark AWS Azure Databricks Docker Snowflake

Tools & Platforms

GitLab GitHub Jira CI/CD PostgreSQL LangGraph FastAPI Generative AI Prompt Engineering

Certifications

Azure Machine Learning for Data Scientists
Microsoft Azure
2026
Azure ML: Model Training & Evaluation
Microsoft Azure
2026
Databricks Fundamentals
Databricks
2025
Snowflake Data Warehousing
Snowflake
2025
AWS Cloud Practitioner
Amazon Web Services
2023
Career

Experience

Feb 2026 to Present

Data Science Intern

Youro, LLC · Remote
  • Built Python/SQL/Azure data pipelines as a founding member of the data team, consolidating patient data into one central store at 99%+ data quality and cutting manual prep time by 60%.
  • Developed Azure ML severity scoring models returning results in under 100ms, routing patients to the right specialist before they book.
  • Designed a RAG system on Azure services that grounds patient guidance in trusted urology guidelines, reducing unreliable responses and improving answer accuracy.
  • Built Tableau dashboards tracking triage, booking, and engagement across 5+ KPIs, presenting directly to the CEO to drive early product decisions.

Technologies & Tools

Python SQL Azure Azure ML RAG Tableau
Jul 2025 to Dec 2025

Student Associate

University at Buffalo-SUNY

Buffalo, NY

Aug 2022 to Apr 2024

Software Engineer (Data Engineering and Analytics)

Sony India Software Centre · Bengaluru
  • Engineered batch ETL pipelines in Python and SQL on AWS across 5+ internal data feeds, cutting manual reporting effort by 15%.
  • Implemented schema validation, null checks, and duplicate detection, improving downstream data reliability across reporting datasets consumed by 3 business units.
  • Developed analytics-ready datasets feeding Power BI and Tableau dashboards for product, finance, and engineering teams, reducing data-request-to-insight turnaround by 20%.
  • Integrated GitLab CI/CD pipelines with Jira APIs to automate issue tracking and workflow updates, reducing cross-team coordination effort by 20%.
  • Built a GitLab API-integrated dashboard automating Wiki access requests and manager approval routing, cutting manual documentation coordination by 30%.
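
The schema validation, null checks, and duplicate detection mentioned above can be illustrated with a minimal plain-Python sketch. This is an illustration only, not the code used at Sony: the schema, the field names (`order_id`, `team`, `amount`), and the sample rows are all hypothetical.

```python
from collections import Counter

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_SCHEMA = {"order_id": int, "team": str, "amount": float}


def validate_rows(rows, schema=EXPECTED_SCHEMA, key="order_id"):
    """Return issue lists for schema errors, null values, and duplicate keys."""
    issues = {"schema": [], "nulls": [], "duplicates": []}
    for i, row in enumerate(rows):
        # Schema validation: every expected field must be present.
        missing = set(schema) - set(row)
        if missing:
            issues["schema"].append((i, f"missing fields: {sorted(missing)}"))
        for field, expected_type in schema.items():
            if field not in row:
                continue
            value = row[field]
            if value is None:
                # Null check: record the row index and the empty field.
                issues["nulls"].append((i, field))
            elif not isinstance(value, expected_type):
                issues["schema"].append(
                    (i, f"{field}: expected {expected_type.__name__}")
                )
    # Duplicate detection: any key value that appears more than once.
    counts = Counter(row.get(key) for row in rows if row.get(key) is not None)
    issues["duplicates"] = sorted(k for k, n in counts.items() if n > 1)
    return issues


sample = [
    {"order_id": 1, "team": "finance", "amount": 10.0},
    {"order_id": 1, "team": "product", "amount": None},
    {"order_id": 2, "team": "engineering"},
]
report = validate_rows(sample)
print(report)
```

In a real pipeline, a report like this would typically gate the load step, so that a batch with schema errors or duplicate keys is held back rather than published to reporting datasets.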

Technologies & Tools

Python SQL AWS GitLab CI/CD Power BI Tableau ETL Pipelines Jira
Feb 2022 to May 2022

Software Intern

LMARKS · Bengaluru
  • Collected, cleaned, and organized 500+ service and customer records using Excel and Python to support weekly reporting.
  • Prepared reports across 2,000+ customer entries to track service requests, job status, and recurring issue categories.
  • Analyzed 3+ years of historical installation, service, and customer data to identify follow-up gaps and support CRM process improvements for a business serving 2,000+ customers.

Technologies & Tools

Python Excel SQL Reporting CRM Git
Portfolio

Projects

DataOps Change Impact & Pipeline Failure Observability Platform

Architected and deployed an end-to-end DataOps observability platform on GCP using Docker Compose, engineering real-time pipeline monitoring from Kafka streaming through Airflow orchestration and dbt transformations across 2,185+ events and 7 data models. Built a RAG-based Failure Intelligence engine using FastAPI and Google Gemini AI to answer natural language queries on pipeline health, SLA breaches, and schema change blast radius, deployed with a live Chat UI on GCP. Designed 3 Grafana dashboards backed by dbt mart models with an Airflow DAG orchestrating automated breach detection, schema change impact scoring, and pipeline summary logging.

Python Kafka Airflow dbt PostgreSQL FastAPI Grafana GCP Gemini AI

RideFlow Analytics: NYC Taxi Demand Forecasting with MLOps Pipeline

Engineered an end-to-end MLOps pipeline on AWS ingesting 24 months of NYC Yellow Taxi Parquet data via automated Lambda triggers into a structured S3 hierarchy, with Glue jobs handling transformation and Athena enabling serverless querying. Built and tuned two LightGBM demand forecasting models using 28-day lag features and temporal aggregates, tracking all experiments with MLflow so that no run history was lost across restarts, and achieving an MAE of 2.895 and a MAPE of 9.23%. Deployed a Streamlit app on AWS Elastic Beanstalk serving 200+ concurrent users, with RDS schema indexing on frequently joined columns cutting query latency by 25% and improving real-time dashboard responsiveness.
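
For readers unfamiliar with the terms, here is a small plain-Python sketch of a lag feature and of the two error metrics quoted above (MAE and MAPE). The helper names and the toy numbers are hypothetical and are not taken from the project, which used LightGBM on the real taxi data.

```python
def add_lag_feature(series, lag):
    """Pair each value with the value `lag` steps earlier (None if unavailable)."""
    return [
        (series[i - lag] if i >= lag else None, series[i])
        for i in range(len(series))
    ]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; zero actuals are skipped."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Toy demand values, purely illustrative.
actual = [100, 200, 400]
predicted = [110, 190, 400]
print(add_lag_feature([5, 6, 7, 8], lag=2))
print(round(mae(actual, predicted), 3), round(mape(actual, predicted), 3))
```

A 28-day lag feature follows the same idea with `lag=28` on a daily series: the model sees the demand from four weeks earlier as an input when predicting the current day.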

Python LightGBM Hopsworks Confluent Kafka MLflow GitHub Actions Streamlit GeoPandas

Sentiment Wars: ML & Transformer Benchmarking for Sentiment Classification

Built an end-to-end sentiment classification pipeline on Amazon review data, benchmarking Logistic Regression, Random Forest, SVM, and Naïve Bayes against fine-tuned DistilBERT and DeBERTa v3 base transformer models, achieving up to 93.5% accuracy. Engineered text features using TF-IDF vectorization, n-gram analysis, and polarity scoring, with GridSearchCV hyperparameter tuning and all runs tracked via MLflow integrated with DAGsHub for full experiment reproducibility. Deployed a live Streamlit app serving the fine-tuned DeBERTa model from Hugging Face Hub with real-time confidence scoring, and built a Power BI dashboard for sentiment trend analysis.

Python Scikit-learn DistilBERT DeBERTa TF-IDF MLflow DAGsHub Streamlit Power BI

AI Multi-Agent System for Market Intelligence (Synapse Street)

Led a team of 4 to build a LangGraph multi-agent system on 5GB of stock market data, flagging short-selling opportunities and generating readable investment summaries end-to-end in a 24-hour UB hackathon. Developed a Scikit-learn pipeline on OHLCV features with Qdrant vector search and sentence transformer embeddings, retrieving contextually similar market signals at Precision@10 of 0.60. Designed Streamlit and Tableau dashboards to surface model scores, agent findings, and backtest results for non-technical stakeholders.
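
Precision@10, the retrieval metric quoted above, is the share of the top 10 retrieved items that are actually relevant. A minimal sketch, assuming relevance labels are available as a set; the signal IDs below are made up for illustration and do not come from the hackathon data.

```python
def precision_at_k(retrieved, relevant, k=10):
    """Fraction of the top-k retrieved items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in retrieved[:k] if item in relevant)
    return hits / k


# Hypothetical ranked results from a vector search, best match first.
retrieved = [f"signal_{i}" for i in range(1, 11)]
# Hypothetical ground-truth set of relevant signals.
relevant = {
    "signal_1", "signal_2", "signal_4", "signal_5",
    "signal_7", "signal_9", "signal_42",
}
print(precision_at_k(retrieved, relevant, k=10))
```

Here six of the ten retrieved signals are in the relevant set, so the score is 0.6.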

LangGraph Python Qdrant Scikit-learn Hadoop HDFS Streamlit Tableau Vultr Cloud

Scalable Big Data Analytics using Hadoop, Spark and NLP

Designed and implemented a scalable pipeline for processing and analyzing 3.3M+ Amazon book reviews. Leveraged Hadoop (MapReduce, HDFS) and Apache Spark / PySpark for distributed data processing. Applied TF-IDF vectorization and NLP techniques for sentiment classification, rating prediction, and helpfulness scoring, achieving 95% accuracy after hyperparameter tuning with PySpark MLlib.

PySpark Hadoop Apache Spark NLP / TF-IDF Scikit-learn Python Docker

Osteoporosis Risk Prediction using Machine Learning

Built a predictive modeling solution using demographic, lifestyle, and health data to assess osteoporosis risk and support clinical insights. Prepared 1,958 patient records, engineered features, and trained Logistic Regression, Random Forest, Decision Tree, and SVC models. Evaluated 4 models using accuracy, precision, recall, and F1 score, with Decision Tree achieving 90% accuracy and surfacing key risk factors.
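
The four evaluation metrics named above can be computed from predicted and true labels as in the following plain-Python sketch. The labels are toy values for illustration, not the project's patient data, and the project itself used Scikit-learn rather than hand-written metrics.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 = at risk, 0 = not at risk.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In a risk-screening setting, recall is often weighted more heavily than accuracy, since a missed at-risk patient is usually costlier than a false alarm.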

Python Scikit-learn Pandas NumPy Random Forest Decision Tree

Cryptocurrency Price Forecasting and Analysis using Machine Learning

Developed a machine learning pipeline to predict and analyze Bitcoin price fluctuations using large-scale financial datasets. The workflow involved extensive statistical analysis, feature engineering, and model experimentation to capture temporal price patterns. Containerized using Docker and deployed on AWS for scalability.

Python Scikit-learn Pandas Matplotlib Docker AWS

Online Music Store Database System: Design, Normalization, and Query Optimization

Developed a relational database system for an online music store to manage sales, customers, and inventory efficiently. Designed and implemented an Entity Relationship (E/R) model and created 16 normalized tables (post-BCNF decomposition) with indexing strategies for performance optimization.

SQL PostgreSQL E/R Design Indexing BCNF

Stock Market Analysis and Prediction using Machine Learning (NVDA Case Study)

Worked with NVIDIA (NVDA) stock price data to perform data cleaning, feature engineering, visualization, and predictive modeling. Applied KMeans clustering and developed predictive models using Linear Regression, Ridge Regression, and SVM, achieving R² scores of 1.000 and 0.997.

Python Pandas Scikit-learn Matplotlib KMeans SVM

BMSCE_ONE: Centralized University Platform with ML Chatbot

Built a centralized digital platform to unify all university activities into a single hub, enabling students and faculty to seamlessly access announcements, academic schedules, and services. Implemented a machine-learning-based chatbot to handle FAQs and provide instant support.

Flutter Python Flask MongoDB ML Chatbot Android

Indoor Navigation Using Augmented Reality

Developed an augmented-reality-based indoor navigation application to help students and visitors navigate campus buildings. Leveraged Unity with C# and LiDAR technology to build precise 3D maps, implementing Dijkstra's algorithm for accurate navigation. Nominated for Best Final Project award.
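
Dijkstra's algorithm finds the shortest route between two points in a weighted graph. The sketch below shows the idea in Python for brevity (the app itself was written in C# with Unity); the waypoint names and distances are hypothetical and do not come from the actual campus maps.

```python
import heapq


def dijkstra(graph, source):
    """Return (distances, predecessors) for shortest paths from `source`."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # Stale heap entry; a shorter path was already found.
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev


def shortest_path(prev, source, target):
    """Rebuild the route from `source` to `target`; empty list if unreachable."""
    path = [target]
    while path[-1] != source:
        if path[-1] not in prev:
            return []
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical indoor waypoints with walking distances in meters.
campus = {
    "entrance": {"lobby": 5},
    "lobby": {"entrance": 5, "stairs": 4, "elevator": 10},
    "stairs": {"lobby": 4, "lab": 6},
    "elevator": {"lobby": 10, "lab": 2},
    "lab": {},
}
dist, prev = dijkstra(campus, "entrance")
route = shortest_path(prev, "entrance", "lab")
print(route, dist["lab"])
```

In this toy map the route through the stairs (15 m) beats the route through the elevator (17 m), so the stairs route is returned.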

C# Unity Dijkstra's Algorithm AWS AR LiDAR
Academics

Education

Aug 2024 to Dec 2025

Master of Science in Data Science

University at Buffalo, State University of New York

Specialized in Advanced Machine Learning, Advanced Data Analytics, Statistical Analysis, and Big Data Applications.

Coursework & Tools

Python Machine Learning R Statistical Analysis Big Data Deep Learning LLM Data Intensive Computing MLOps SQL Data Visualization
Aug 2018 to May 2022

Bachelor of Technology, Computer Science

B.M.S. College of Engineering · Bengaluru

Specialized in Full Stack Development, Operating Systems, and Advanced Data Structures and Algorithms.

Technologies & Tools

Full Stack Data Structures Algorithms Augmented Reality Unity Python C++ Java C#
Contact

Get In Touch

Open to full-time roles in Data Analytics, Data Engineering, Analytics Engineering, and Data Science. Let's connect.

sathwickkiran04@gmail.com
+1 (716) 359-1432
New York, United States