Shamik Basu

Data Scientist & ML Engineer

Open to Summer 2026 Internships
2+ Years Production ML · MS Data Science @ USC · GPA 3.78 / 4.0

Projects & Impact

Production systems and research built from scratch

1st Place — UCLA SAIRS Hackathon 2025 · Sustainability Track

EcoMate-AI

April 2025

  • Built an end-to-end Streamlit application using Gemini 2.5 Flash to analyze receipts and activity logs, estimating CO₂ emissions per item and generating ranked sustainability recommendations.
  • Designed the full pipeline: OCR extraction → emission-factor mapping → GenAI recommendations → interactive Plotly dashboard.
  • Presented live to industry panelists from Microsoft, IBM, NVIDIA, Google, and LinkedIn.
Python · Gemini 2.5 Flash · Streamlit · FastAPI · Plotly · OCR · RAG

Custom CUDA Library for CNN Pre-Processing

Jan – Feb 2025

  • Engineered custom CUDA kernels for matrix multiplication and image convolution, benchmarking CPU, naïve CUDA, tiled CUDA (shared memory), and cuBLAS across matrix sizes up to N=2048 on NVIDIA Tesla T4 GPUs.
  • Compiled optimized kernels into .so shared libraries with Python bindings, achieving significant throughput gains and enabling GPU acceleration inside standard data science workflows.
CUDA · C++ · cuBLAS · Python · Jupyter · GPU Computing
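
The tiling idea behind the shared-memory kernel can be illustrated on the CPU. What follows is a minimal NumPy sketch, not the project's CUDA code: the function name `tiled_matmul` and the default tile size are assumptions chosen for illustration. Each output tile is accumulated from small sub-blocks of the inputs, which is the same access pattern a tiled CUDA kernel uses to keep operands in fast shared memory.

```python
import numpy as np


def tiled_matmul(a, b, tile=32):
    """Blocked matrix multiply: computes a @ b one tile at a time.

    Hypothetical helper for illustration only; `tile` is an assumed
    block size, not a value taken from the project.
    """
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions must match")

    c = np.zeros((n, m), dtype=np.result_type(a, b))
    for i in range(0, n, tile):          # rows of the output tile
        for j in range(0, m, tile):      # columns of the output tile
            for p in range(0, k, tile):  # walk along the shared dimension
                # Slices clip at the array edge, so ragged tiles are handled.
                c[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return c
```

Comparing the result with `a @ b` through `np.allclose` is a quick correctness check before timing any variant.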

Retail Sales Intelligence

2024

  • Built an end-to-end retail analytics pipeline on a 10,000+ order US superstore dataset, covering EDA, regional trend analysis, customer segmentation, and a 90-day sales forecast.
  • Built an interactive HTML report with Plotly; the forecast model achieved an MAE of ~$1,200 on the 90-day holdout.
Python · pandas · scikit-learn · Plotly · Matplotlib · Jupyter
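
For reference, mean absolute error on a time-ordered holdout can be computed as in the sketch below. The helper names `split_holdout` and `holdout_mae` are hypothetical, and the project's actual forecasting model and evaluation code are not shown here; the sketch only illustrates the metric and the idea of holding out the final 90 observations.

```python
import numpy as np


def split_holdout(series, holdout=90):
    """Split a time-ordered sequence into (train, test), keeping the
    last `holdout` observations for evaluation. Hypothetical helper."""
    if not 0 < holdout < len(series):
        raise ValueError("holdout must be between 1 and len(series) - 1")
    return series[:-holdout], series[-holdout:]


def holdout_mae(y_true, y_pred):
    """Mean absolute error: the average of |actual - forecast|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(np.abs(y_true - y_pred)))
```

Because MAE is expressed in the same units as the target, a value in dollars reads directly as the typical size of a forecast miss.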

Smart Parking Optimizer

2022

  • Built a real-time parking slot allocation system that uses shortest-path algorithms on a live 2D map, with a tkinter GUI and SQLite backend.
  • Structured the code as a modular 4-layer architecture: GUI, allocation engine, authentication, and map renderer.
Python · CUDA · Algorithms · SQLite · tkinter
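
One way to picture shortest-path allocation on a 2D map is a breadth-first search from the entrance to the nearest free slot. The sketch below assumes a grid encoding (`.` lane, `#` wall, `S` free slot, `X` occupied slot) and a function name that are purely illustrative; the project's actual algorithm and map format may differ.

```python
from collections import deque


def nearest_free_slot(grid, start):
    """Return ((row, col), distance) of the closest free slot, or None.

    Assumed encoding (hypothetical): '.' drivable lane, '#' wall,
    'S' free slot, 'X' occupied slot. Movement is 4-directional.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if grid[r][c] == "S":
            # BFS visits cells in order of distance, so the first free
            # slot reached is a closest one.
            return (r, c), dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (
                0 <= nr < rows
                and 0 <= nc < cols
                and (nr, nc) not in seen
                and grid[nr][nc] not in "#X"  # walls and taken slots block the path
            ):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None
```

BFS is sufficient when every move costs the same; a weighted map would call for Dijkstra's algorithm instead.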

Work History

Data Science Associate Intern · Current

KCC Capital Partners · Los Angeles, CA · Jan 2026 – Present

  • Fine-tuning and integrating an open-source SLM into a production JavaScript/Docker chatbot service to automate client service request classification and routing, reducing handling overhead for the automation team.
  • Evaluating model outputs against baseline response quality benchmarks to guide iteration on prompt design and fine-tuning parameters.

Data Scientist

Bajaj Finserv Health · Pune, India · Nov 2023 – Dec 2024

  • Architected a real-time medical document analytics system using RAG, LangChain, and REST APIs — processing 5M+ records/month at 92% extraction accuracy, replacing a fully manual workflow.
  • Engineered LLM-based inference pipelines (GPT-3.5 Turbo) to automate high-complexity decision workflows, cutting operational costs by 72%.
  • Built modular monitoring pipelines with LangChain and Langfuse for model observability, reducing GPU compute usage by 15%.
  • Integrated ML model outputs into Power BI dashboards, cutting ad-hoc reporting turnaround time by 42% and enabling self-serve analytics for stakeholders.

Associate Data Scientist

Bajaj Finserv Health · Pune, India · Jul 2022 – Oct 2023

  • Developed a supervised ML model (Logistic Regression) for workforce performance prediction, improving efficiency outcomes by 22%.
  • Redesigned the NER-based name-matching algorithm in the fraud detection pipeline, increasing policyholder identification accuracy by 27% and reducing false positives.
  • Processed 10M+ records in Azure Synapse using SQL to deliver business intelligence reports for senior stakeholders.

Data Engineer Intern

Bajaj Finserv Health · Pune, India · Jan 2022 – Jun 2022

  • Ran A/B tests and cohort analyses identifying key user behavior patterns that improved web conversion by 37%.
  • Designed a distributed analytics system in C++, Trino, and Docker supporting 10M+ records across 200+ features.

Data Engineer Intern

Reomnify · Nov 2020 – Jan 2021

  • Engineered custom web scraping solutions for 500+ company templates with version control, reducing manual data collection overhead.

Academic Background

Master of Science, Data Science

University of Southern California

Los Angeles, CA · Jan 2025 – Dec 2026

GPA 3.78 / 4.0

Coursework: Machine Learning, Deep Learning, Data Management, Data Science

Bachelor of Technology, Computer Science

SRM Institute of Science and Technology

Chennai, India · May 2018 – Jun 2022

GPA 3.46 / 4.0

Coursework: Machine Learning, Artificial Intelligence, Data Structures & Algorithms, Probability & Queueing Theory

Technical Stack

Languages

Python · SQL · C/C++ · CUDA · Bash · JavaScript · R

ML / AI

PyTorch · TensorFlow · scikit-learn · LLMs · RAG · LangChain · Langfuse · NLP · NER · BERT · spaCy · Deep Learning · Computer Vision

Data Engineering

pandas · NumPy · Apache Kafka · Spark · Trino · Informatica · A/B Testing · Predictive Modeling

Databases

MySQL · PostgreSQL · MongoDB · Azure Synapse · Snowflake · SQL Server

Visualization

Power BI · Tableau · Plotly · Matplotlib · Streamlit

Cloud & DevOps

Microsoft Azure · Docker · Kubernetes · CI/CD · FastAPI · REST APIs · ELK Stack · GCP

Community & Mentorship

Vice President

GRIDS — Graduates Rising in Data Science, USC

Sep 2025 – Present

Lead analytics workshops, ideathons, and data-driven projects for the 250+ members of USC's largest data science organization.

Graduate Student Mentor

USC Viterbi School of Engineering

Jan 2025 – Present

Mentor 6 graduate students on data science career paths, technical communication, and professional development.

Let's Work Together

Currently seeking Summer 2026 Data Science, ML & AI internship opportunities.