Arpitha Thippeswamy

Hello, I'm

Arpitha Thippeswamy

Data Scientist • MSc in Data Science

Master's in Data Science graduate student at Thompson Rivers University with 2.5+ years of experience in analytics, machine learning, and applied research.

Currently working as a Graduate Research Assistant, building predictive models and data-driven solutions for real-world decision-making across complex datasets.

About Me

Data Scientist with 2.5+ years of experience in machine learning, analytics, and data engineering across healthcare, pharmaceutical, and research domains.

I'm currently pursuing an MSc in Data Science at Thompson Rivers University (GPA 3.76), conducting NRC-funded graduate research on ensemble predictive modeling for hospital length-of-stay prediction in geriatric patients using large-scale clinical data (MIMIC-IV). My research spans end-to-end data architecture design, complex SQL engineering, ML model development, and statistical validation, from raw clinical data to deployable predictive systems.

On the industry side, I have built ML models, engineered ETL workflows, and delivered Power BI dashboards for clients in healthcare and pharmaceuticals, translating complex data into decisions that business stakeholders can act on.

I'm actively building the Clinical LOS Intelligence System, a conversational AI extension that will enable natural language querying of research model findings via a RAG pipeline, LLM interface, and agentic orchestration.

Open to Data Scientist, Data Analyst, ML Engineer, and Data Engineer opportunities across Canada.

🎓

Education

Master's in Data Science at Thompson Rivers University (TRU)

💼

Current Role

Graduate Research Assistant at Thompson Rivers University (TRU)

Education

My academic journey in Data Science

Master of Science in Data Science

GPA: 3.77/4.33

Thompson Rivers University, Kamloops, British Columbia, Canada

Sep 2024 – Present

Postgraduate Diploma in Artificial Intelligence

GPA: 87.59/100

Georgian College, Barrie, Ontario, Canada

Sep 2023 – Apr 2024

Bachelor of Science (Economics, Mathematics, Statistics)

GPA: 77.37/100

Mount Carmel College, Bangalore, India

May 2018 – Jul 2021

Experience

Research, leadership, and industry experience across data science & analytics

Graduate Research Assistant

Part-time
Thompson Rivers University, Kamloops, BC, Canada
Jul 2025 – Present

Key Achievements:

  • Conducting NRC-funded research on hospital length-of-stay (LOS) prediction for older adults (65+) using MIMIC-IV; designed end-to-end PostgreSQL data architecture (raw staging + analytical datamart) with OLTP & OLAP schemas for high-volume data storage and downstream ML modeling.
  • Wrote and optimized complex SQL queries for data extraction, transformation, feature engineering, and auditing across multiple hospital departments; ensured data accuracy and integrity prior to modeling.
  • Developed and evaluated ensemble ML models (Random Forest, Gradient Boosting, Stacked Ensembles); conducted A/B-style comparative analysis of classification vs. regression formulations, evaluated using cross-validation, AUC-ROC, and F1 metrics.
  • Presented research as independent poster presenter at the 2026 ISG World Conference (Vancouver); responded to audience Q&A as co-author alongside supervisor Dr. Piper Jackson.
  • Currently designing the Clinical LOS Intelligence System, a conversational AI extension for natural language querying of model findings, live LOS prediction, and SHAP-driven explanations via a RAG pipeline, LLM interface, and agentic orchestration.

Graduate Teaching Assistant

Part-time
Department of Physical Sciences, Thompson Rivers University
Jan 2026 – Feb 2026

Key Achievements:

  • Delivered hands-on instruction in applied statistics in Excel for a lab-based undergraduate course, guiding students through uncertainty estimation, standard deviation, confidence intervals, and relative error analysis, and translating statistical concepts into practical, interpretable findings.
  • Educated a diverse cohort on rigorous data validation, precision assessment, and analytical reporting, reinforcing the importance of statistical thinking and hypothesis-driven reasoning.

🎓

Data Analyst Intern

Internship
Tutort Academy, India
Aug 2022 – Jul 2023

Key Achievements:

  • Completed an intensive ML and data analytics program covering SQL, machine learning models, deployment, and AI technologies; gained hands-on experience with core data science tools applied to practical projects.

💼

Junior Data Scientist

Full-time
Aidastech, India
Sep 2021 – Jul 2022

Key Achievements:

  • Enabled prioritization of the top 20% of accounts, improving sales targeting efficiency and revenue conversion. Identified 8% increase in incremental quarterly sales opportunities, directly supporting strategic revenue growth initiatives.

ML & Advanced Analytics | Customer Segmentation | Probabilistic Recommendation

  • Designed and implemented an end-to-end machine learning pipeline in Python to process, clean, and transform large-scale transactional sales and customer attribute data for advanced analytics use cases.
  • Engineered customer behavioral features, including a Recency–Frequency (RF) scoring model, and applied clustering techniques to segment customers into behavioral and attribute-based cohorts.
  • Developed a Growth Opportunity Matrix by integrating clustering outputs with product purchase patterns to identify high-potential customer-product combinations.
  • Applied the Apriori association rule mining algorithm within customer segments to build product affinity models and uncover cross-sell and up-sell opportunities.
  • Built an automated recommendation engine that runs on a monthly cycle to analyze evolving sales patterns and generate actionable product recommendations.
  • Designed interactive customer intelligence dashboards using Power BI to visualize purchase behavior, product mix, and personalized recommendations, enabling account managers to drive targeted sales strategies.
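
As a rough illustration of the Recency–Frequency scoring mentioned above, the sketch below ranks customers by days since last purchase and by purchase count, then cuts each ranking into equal-sized bins. The function name `rf_scores`, the three-bin default, and the sample transactions are all assumptions made for this example; the scoring rules used in the actual project may differ.

```python
from datetime import date


def rf_scores(transactions, as_of, bins=3):
    """Score each customer 1..bins on recency and frequency (higher is better).

    transactions: iterable of (customer_id, purchase_date) pairs.
    as_of: reference date used to measure days since the last purchase.
    """
    last_purchase = {}
    frequency = {}
    for customer_id, purchase_date in transactions:
        frequency[customer_id] = frequency.get(customer_id, 0) + 1
        previous = last_purchase.get(customer_id)
        if previous is None or purchase_date > previous:
            last_purchase[customer_id] = purchase_date

    recency_days = {c: (as_of - d).days for c, d in last_purchase.items()}

    def to_bins(values, higher_is_better):
        # Order customers from worst to best, then cut the ranking into
        # equal-sized bins; ties keep their original input order.
        ordered = sorted(values, key=values.get, reverse=not higher_is_better)
        n = len(ordered)
        return {c: 1 + (i * bins) // n for i, c in enumerate(ordered)}

    recency = to_bins(recency_days, higher_is_better=False)  # fewer days is better
    freq = to_bins(frequency, higher_is_better=True)         # more purchases is better
    return {c: (recency[c], freq[c]) for c in frequency}


# Hypothetical sample data, for illustration only.
sample = [
    ("A", date(2024, 1, 5)), ("A", date(2024, 3, 1)), ("A", date(2024, 3, 20)),
    ("B", date(2024, 2, 10)),
    ("C", date(2024, 3, 25)), ("C", date(2024, 3, 28)),
]
scores = rf_scores(sample, as_of=date(2024, 3, 31))
print(scores)
```

Each customer ends up with a (recency, frequency) pair that could feed a clustering step; rank-based bins are just one simple choice, and fixed business thresholds would work equally well.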

Data Engineering & BI

  • Partnered with business users and leadership stakeholders at a US pharmaceutical client to gather data and analytics requirements, translating business needs into analytical solutions, SQL-based insights, and automated reporting outputs.
  • Optimized SQL queries and stored procedures, reducing report execution time and improving ETL pipeline reliability across multiple source systems, enhancing overall reporting efficiency and data accessibility.
  • Designed and maintained scalable ETL pipelines using Microsoft SSIS, monitoring scheduled jobs and proactively resolving failures to ensure consistent and timely data delivery across enterprise systems.
  • Developed interactive Power BI dashboards enabling self-service analytics, allowing business users to generate on-demand reports independently and improving decision-making speed and data accessibility.
  • Acted as a data analytics liaison between technical and business teams, supporting end-user training, adoption, and Agile/Scrum-based delivery processes with well-documented reporting frameworks.

Received Employee of the Quarter (Q1 2022) for performance and impact.

Research Work

I work as a Graduate Research Assistant on healthcare predictive analytics for older adults, focusing on machine learning models that support efficient resource allocation and improved patient outcomes.

Graduate Research Assistant — Thompson Rivers University

NRC-Funded | Supervisor: Dr. Piper Jackson | Jul 2025 – Present

Project Context

Research aligned with the Digital Comprehensive Geriatric Assessment at Home (D-CGA@home) initiative, a federally funded program supported by the National Research Council of Canada (NRC). The project applies machine learning and data engineering to support clinical decision-making, care planning, and operational optimization in geriatric healthcare systems.

Data Engineering, Storage & Analytics (PostgreSQL)

  • Built and managed a PostgreSQL-based analytical database to store and query large-scale MIMIC-IV (v3.1) data, including admissions, transfers, diagnoses, procedures, lab measurements, and ED events
  • Designed normalized schemas and indexing strategies to support efficient joins across high-volume patient-, admission-, department-, and time-based records
  • Integrated PostgreSQL with Python (psycopg2 / SQLAlchemy) for scalable feature extraction and model-ready dataset generation
  • Implemented chunked data ingestion and processing workflows to handle very large tables efficiently while ensuring reproducibility and memory safety
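
The chunked processing idea above can be sketched with the standard DB-API `fetchmany` call, which psycopg2 cursors also provide. To keep the example self-contained it uses an in-memory SQLite database as a stand-in for PostgreSQL; the toy `labevents` table, its two columns, and the helper names are assumptions for illustration, not the project's actual schema or code.

```python
import sqlite3


def iter_chunks(cursor, chunk_size):
    """Yield lists of rows from an executed cursor, chunk_size rows at a time."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


def mean_value_by_item(conn, chunk_size=1000):
    """Compute the mean lab value per item without loading the table at once."""
    cur = conn.cursor()
    cur.execute("SELECT itemid, valuenum FROM labevents WHERE valuenum IS NOT NULL")
    totals = {}  # itemid -> (running sum, row count)
    for rows in iter_chunks(cur, chunk_size):
        for itemid, value in rows:
            running_sum, count = totals.get(itemid, (0.0, 0))
            totals[itemid] = (running_sum + value, count + 1)
    return {itemid: s / n for itemid, (s, n) in totals.items()}


def build_demo_db():
    """Create a tiny in-memory table standing in for a very large one."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE labevents (itemid INTEGER, valuenum REAL)")
    conn.executemany(
        "INSERT INTO labevents VALUES (?, ?)",
        [(1, 1.0), (1, 2.0), (2, 10.0), (2, None), (1, 3.0), (2, 20.0)],
    )
    return conn


conn = build_demo_db()
means = mean_value_by_item(conn, chunk_size=2)
print(means)
```

Only one chunk of rows is held in memory at a time, and the running sums make the result independent of the chunk size, which is what keeps this kind of aggregation reproducible.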

Data & Methodology

  • Performed advanced feature engineering, including temporal aggregation of lab measurements, severity proxies, comorbidity groupings (ICD-9), and department-specific clinical indicators
  • Applied a staged modeling strategy: classification (binary and multi-class LOS thresholds) for early risk stratification, and regression for continuous LOS estimation
  • Models explored: Logistic & Linear Regression, Random Forest, Gradient Boosting, Extra Trees, Voting and Stacking Ensembles, and exploratory Neural Networks
  • Addressed class imbalance using SMOTE and stratified sampling; conducted iterative feature selection and hyperparameter tuning with cross-validation (AUC-ROC, F1, RMSE)

Research Achievements, Funding & Outcomes

  • Presented research as independent poster presenter at the 2026 ISG World Conference (Vancouver); responded to audience Q&A as co-author alongside supervisor Dr. Piper Jackson
  • Secured $1,000 in competitive conference funding to support research dissemination and presentation
  • CIHR–Institute of Aging (CIHR-IA) Trainee Travel Award Recipient — $500, International Society for Gerontechnology 15th World Conference (2026)
  • Ongoing work toward a conference paper and journal-ready manuscript on ensemble modeling for multi-department LOS prediction

Current Extension: Clinical LOS Intelligence System

Currently designing a conversational AI extension on top of the research, enabling natural language querying of model findings, live LOS prediction, and SHAP-driven explanations via a RAG pipeline, LLM interface, and agentic orchestration.

Research Poster & Oral Presentation — ISG World Conference, SFU Vancouver · March 2026

Conference presentation photo 1
Conference presentation photo 2

3-Minute Thesis (3MT) — Award-Winning Presentation

I presented my graduate research in a 3-minute talk using a single static slide, focusing on clear storytelling, real-world impact, and communicating AI to a broad audience.

🏆 Second Prize Winner · 🗳️ People’s Choice Award

TRU 3MT Competition Highlights

  • Won Second Prize for research presentation and impact.
  • Won the People’s Choice Award based on audience voting.
  • Presented complex ML research in an accessible, patient-centered story.
  • Strengthened skills in research communication, clarity, and public speaking.

🎞️ Slide used in the competition

3MT slide

3MT Competition at OLARA

Projects

Stacked Ensemble Models for Multi-Department Length-of-Stay (LOS) Prediction in Geriatric Patients

Built an end-to-end healthcare analytics pipeline using PostgreSQL and Python to predict multi-department hospital length of stay for older adults. Applied ensemble learning techniques to improve prediction robustness and support data-driven hospital resource planning.

Python · PostgreSQL · Scikit-learn · Classification · Healthcare Analytics

Heart Disease Prediction (TRU)

Built predictive models for heart disease outcomes as part of the TRU Data Science Seminar (Fall 2024). Focused on data preprocessing, feature engineering, classification modeling, and performance evaluation using real-world healthcare data.

Python · Pandas · Scikit-learn · Classification · Healthcare Analytics

Breast Cancer Prediction

Developed supervised machine learning models to predict breast cancer diagnosis. Emphasized exploratory data analysis, feature selection, model training, and evaluation using standard classification metrics.

Python · Machine Learning · Classification · EDA · Model Evaluation

Comparing Activation Functions in Neural Networks

Compared ReLU, Sigmoid, Tanh, Leaky ReLU, and Swish on CIFAR-10 to study convergence behavior, gradient dynamics, and theoretical trade-offs. Includes visualization of loss, accuracy, gradient flow, and mathematical analysis.

PyTorch · Deep Learning · Neural Networks · Optimization · Theory
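
For reference, the five activation functions compared in this project can be written in a few lines of plain Python. This is a minimal sketch rather than the project's PyTorch code: the leaky slope of 0.01 and the Swish form x · sigmoid(x) are assumed defaults, and `numeric_grad` is a hypothetical helper that approximates the gradient by central differences.

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, slope=0.01):
    # Small non-zero slope for negative inputs (0.01 is an assumed default).
    return x if x >= 0 else slope * x


def sigmoid(x):
    # Numerically stable form that avoids overflow for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    return math.tanh(x)


def swish(x):
    # Swish with beta = 1, i.e. x * sigmoid(x).
    return x * sigmoid(x)


def numeric_grad(f, x, h=1e-5):
    """Central-difference approximation of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Compare values and gradients at a moderate and a large input.
for name, f in [("relu", relu), ("leaky_relu", leaky_relu),
                ("sigmoid", sigmoid), ("tanh", tanh), ("swish", swish)]:
    print(name, f(1.0), numeric_grad(f, 1.0), numeric_grad(f, 10.0))
```

At large inputs the sigmoid and tanh gradients shrink toward zero while the ReLU-family and Swish gradients stay close to one, which is the kind of saturation behavior a gradient-flow comparison looks at.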

Imputing Missing Reaction Times using EM Algorithm

Implemented an Expectation-Maximization (EM) approach to impute missing reaction times under a bivariate normal model. Focused on statistical modeling, iterative estimation, and missing data theory.

Statistics · EM Algorithm · Probability · Python · Missing Data

Cross-Domain Sentiment Analysis

Built and evaluated sentiment analysis models across multiple domains (IMDB, social media, airline tweets, consumer complaints). Studied cross-dataset generalization using TF-IDF and classical ML models.

NLP · TF-IDF · Sentiment Analysis · Scikit-learn · Evaluation

Skills & Technologies

Tools I use for research, analytics, and applied machine learning

✦ Currently learning

🧠

Data Science & Machine Learning

Pandas
NumPy
Matplotlib
Seaborn
Scikit-learn
XGBoost
A/B Testing
Hypothesis Testing
Statistical Validation
NLP
Deep Learning

💻

Programming

Python
R
PySpark ✦

🗄️

Data & Databases

SQL
PostgreSQL
SSMS
ETL (SSIS)

☁️

Cloud & Data Engineering

Azure
Azure Data Factory
Azure Blob Storage
Databricks
Synapse
Hugging Face Spaces ✦

📊

Visualization & Analytics

Power BI
Power BI Service
Advanced Excel
Tableau
IBM SPSS

🛠️

Dev Tools

Jupyter Notebook
VS Code
GitHub
Jira

⚙️

Agile & MLOps

Git & GitHub
Docker
FastAPI ✦
GitHub Actions CI/CD ✦
Model Versioning ✦

🤖

AI & Advanced Analytics

LangChain ✦
ChromaDB ✦
RAG Pipelines ✦
LLM Interfaces ✦
SHAP Explainability ✦
Transformer Architecture ✦
Prompt Engineering ✦
Agentic AI ✦

Leadership & Student Engagement

Actively contributing to student advocacy, governance, and community engagement at Thompson Rivers University through leadership, representation, and volunteering.

Leadership Roles at TRU

  • Member-at-Large – University Affairs Committee
    Supporting student advocacy, contributing to policy discussions, and strengthening graduate representation in academic and campus matters.
  • Student Representative – Student Voices Steering Committee
    Collaborating on proposals, consultations, and formal representations of student interests within TRUSU governance structures.
  • Student Representative – Hiring Committee (VP / Provost of Academics)
    Appointed to contribute student perspectives in senior academic leadership recruitment at Thompson Rivers University.
  • Vice President – Women in Business Club (WIB)
    Led communications, created meeting agendas, and organized initiatives to support leadership development and inclusion.
Advocacy · Governance · StudentLeadership · GraduateCommunity

Volunteering & Student Engagement

  • Volunteered at Student Caucus stalls during Back-to-School BBQs, engaging new students and supporting recruitment initiatives.
  • Helped organize and host Clubs Day & information sessions, facilitating student participation and community building.
  • Actively involved in outreach, event coordination, and student-facing initiatives to strengthen graduate engagement at TRU.
TRU Student Caucus & Back-to-School BBQ
Clubs Day and Student Engagement at TRU

TRUSU Student Caucus events and student engagement activities at TRU

Impact & Learning

These leadership and volunteering experiences strengthened my ability to collaborate across diverse stakeholders, represent student voices effectively, and contribute to institutional decision-making, skills that I bring directly into my research and data-driven work.

Awards & Recognition

Professional achievements and recognition

TRUSU Services Committee Conference Grant

March 2026

Thompson Rivers University Students' Union — CAD $1,000

Awarded by the TRUSU Services Committee to support conference attendance at ISG2026, covering registration, travel, and accommodation expenses.

CIHR-IA Trainee Travel Award

March 2026

CIHR Institute of Aging — CAD $500

Awarded by the Canadian Institutes of Health Research – Institute of Aging to support attendance and research dissemination at the 15th World Conference of the International Society for Gerontechnology (ISG2026), Vancouver.

3-Minute Thesis (3MT) - 2nd Place + People's Choice Award

Feb 2026

Thompson Rivers University — CAD $500

Recognized for communicating complex graduate research to a public audience in three minutes, winning second prize plus a hamper for the People's Choice Award.

TELUS Excellence in Science Award

Feb 2026

Thompson Rivers University — CAD $1,000

Awarded based on academic merit and excellence in graduate-level science research.

Graduate Student Leader Award

Oct 2025

Thompson Rivers University — CAD $6,000

Recognized by TRU's Office of the Vice-President Research for leadership and contributions to graduate student engagement and research culture.

Employee of the Quarter

Apr 2022

Issued by Aidastech

Awarded for outstanding performance and impact on analytics delivery, reporting automation, and client project execution.