Data professional exploring AI, ML, and big data solutions, building pipelines, models, and dashboards that help teams move faster with better information.
Hey! I am a data professional with 2 years of experience at Sony and an MS in Data Science from the University at Buffalo. My work spans data engineering, analytics, and applied machine learning. I build ETL pipelines, Power BI dashboards, and reporting solutions that cross-functional teams use to make decisions.
At Sony, I worked on Python and SQL data workflows across 5+ internal teams, automated reporting pipelines, and implemented data quality checks that improved consistency and reduced manual effort by 15%. I also worked with AWS-based workflows and CI/CD automation, focusing on building reliable data infrastructure that improved reporting efficiency and stakeholder decision-making.
My academic and project work blends data engineering with advanced analytics, from building a LangGraph multi-agent AI system for financial market analysis to processing 3.3M+ records using PySpark and NLP on Hadoop, and building a containerized DataOps observability platform with Kafka, Airflow, dbt, and Tableau.
Outside of data and AI, I'm a drummer, a big cricket fan, and slightly obsessed with Formula 1. You'll usually find me watching qualifying sessions on weekends.
Architected and deployed an end-to-end DataOps observability platform on GCP using Docker Compose, engineering real-time pipeline monitoring from Kafka streaming through Airflow orchestration and dbt transformations across 2,185+ events and 7 data models. Built a RAG-based Failure Intelligence engine using FastAPI and Google Gemini AI to answer natural language queries on pipeline health, SLA breaches, and schema change blast radius, deployed with a live Chat UI on GCP. Designed 3 Grafana dashboards backed by dbt mart models with an Airflow DAG orchestrating automated breach detection, schema change impact scoring, and pipeline summary logging.
Engineered an end-to-end MLOps pipeline on AWS ingesting 24 months of NYC Yellow Taxi Parquet data via automated Lambda triggers into a structured S3 hierarchy, with Glue jobs handling transformation and Athena enabling serverless querying. Built and tuned two LightGBM demand forecasting models using 28-day lag features and temporal aggregates, tracking all experiments with MLflow so that no training runs were lost across restarts, and achieving an MAE of 2.895 and a MAPE of 9.23%. Deployed a Streamlit app on AWS Elastic Beanstalk serving 200+ concurrent users, with RDS schema indexing on frequently joined columns cutting query latency by 25% and improving real-time dashboard responsiveness.
Built an end-to-end sentiment classification pipeline on Amazon review data, benchmarking Logistic Regression, Random Forest, SVM, and Naïve Bayes against fine-tuned DistilBERT and DeBERTa v3 base transformer models, achieving up to 93.5% accuracy. Engineered text features using TF-IDF vectorization, n-gram analysis, and polarity scoring, with GridSearchCV hyperparameter tuning and all runs tracked via MLflow integrated with DAGsHub for full experiment reproducibility. Deployed a live Streamlit app serving the fine-tuned DeBERTa model from Hugging Face Hub with real-time confidence scoring, and built a Power BI dashboard for sentiment trend analysis.
Led a team of 4 to build a LangGraph multi-agent system on 5GB of stock market data, flagging short-selling opportunities and generating readable investment summaries end-to-end in a 24-hour UB hackathon. Developed a scikit-learn pipeline on OHLCV features with Qdrant vector search and sentence transformer embeddings, retrieving contextually similar market signals at a Precision@10 of 0.60. Designed Streamlit and Tableau dashboards to surface model scores, agent findings, and backtest results for non-technical stakeholders.
Designed and implemented a scalable pipeline for processing and analyzing 3.3M+ Amazon book reviews. Used Hadoop (MapReduce, HDFS) and Apache Spark (PySpark) for distributed data processing. Applied TF-IDF vectorization and NLP techniques for sentiment classification, rating prediction, and helpfulness scoring, achieving 95% accuracy after hyperparameter tuning with PySpark MLlib.
Built a predictive modeling solution using demographic, lifestyle, and health data to assess osteoporosis risk and support clinical insights. Prepared 1,958 patient records, engineered features, and trained Logistic Regression, Random Forest, Decision Tree, and SVC models. Evaluated all four on accuracy, precision, recall, and F1 score, with the Decision Tree achieving 90% accuracy and surfacing key risk factors.
Developed a machine learning pipeline to predict and analyze Bitcoin price fluctuations using large-scale financial datasets. The workflow involved extensive statistical analysis, feature engineering, and model experimentation to capture temporal price patterns. Containerized using Docker and deployed on AWS for scalability.
Developed a relational database system for an online music store to manage sales, customers, and inventory efficiently. Designed and implemented an Entity-Relationship (E/R) model and created 16 normalized tables (after BCNF decomposition) with indexing strategies for performance optimization.
Worked with NVIDIA (NVDA) stock price data to perform data cleaning, feature engineering, visualization, and predictive modeling. Applied KMeans clustering and developed predictive models using Linear Regression, Ridge Regression, and SVM, achieving R² scores of 1.000 and 0.997.
Built a centralized digital platform to unify all university activities into a single hub, enabling students and faculty to seamlessly access announcements, academic schedules, and services. Implemented a machine-learning-based chatbot to handle FAQs and provide instant support.
Developed an augmented-reality-based indoor navigation application to help students and visitors navigate campus buildings. Used Unity with C# and LiDAR technology to build precise 3D maps, implementing Dijkstra's algorithm for accurate navigation. Nominated for the Best Final Project award.
Specialized in Advanced Machine Learning, Advanced Data Analytics, Statistical Analysis, and Big Data Applications.
Specialized in Full-Stack Development, Operating Systems, and Advanced Data Structures and Algorithms.
Open to full-time roles in Data Analytics, Data Engineering, Analytics Engineering, and Data Science. Let's connect.