
Hello, I'm
Arpitha Thippeswamy
Data Scientist • MSc in Data Science
MSc Data Science graduate student at Thompson Rivers University with 2.5+ years of experience in analytics, machine learning, and applied research.
Currently working as a Graduate Research Assistant, building predictive models and data-driven solutions for real-world decision-making across complex datasets.
About Me
Data Scientist with 2.5+ years of experience in machine learning, analytics, and data engineering across healthcare, pharmaceutical, and research domains.
I'm currently pursuing an MSc in Data Science at Thompson Rivers University (GPA 3.76), conducting NRC-funded graduate research on ensemble predictive modeling for hospital length-of-stay prediction in geriatric patients using large-scale clinical data (MIMIC-IV). My research spans end-to-end data architecture design, complex SQL engineering, ML model development, and statistical validation from raw clinical data to deployable predictive systems.
On the industry side, I have built ML models, engineered ETL workflows, and delivered Power BI dashboards for clients in healthcare and pharmaceuticals, translating complex data into decisions that business stakeholders can act on.
I'm actively building the Clinical LOS Intelligence System, a conversational AI extension that will enable natural language querying of research model findings via a RAG pipeline, LLM interface, and agentic orchestration.
Open to Data Scientist, Data Analyst, ML Engineer, and Data Engineer opportunities across Canada.
Education
Master's in Data Science at Thompson Rivers University (TRU)
Current Role
Graduate Research Assistant at Thompson Rivers University (TRU)
Education
My academic journey in Data Science

Master of Science in Data Science
Thompson Rivers University, Kamloops, British Columbia, Canada
Sep 2024 – Present

Postgraduate Diploma in Artificial Intelligence
Georgian College, Barrie, Ontario, Canada
Sep 2023 – Apr 2024

Bachelor of Science (Economics, Mathematics, Statistics)
Mount Carmel College, Bangalore, India
May 2018 – Jul 2021
Experience
Research, leadership, and industry experience across data science & analytics
Graduate Research Assistant
Part-time
Key Achievements:
- Conducting NRC-funded research on hospital length-of-stay (LOS) prediction for older adults (65+) using MIMIC-IV; designed end-to-end PostgreSQL data architecture (raw staging + analytical datamart) with OLTP & OLAP schemas for high-volume data storage and downstream ML modeling.
- Wrote and optimized complex SQL queries for data extraction, transformation, feature engineering, and auditing across multiple hospital departments; ensured data accuracy and integrity prior to modeling.
- Developed and evaluated ensemble ML models (Random Forest, Gradient Boosting, Stacked Ensembles); conducted A/B-style comparative analysis of classification vs. regression formulations, evaluated using cross-validation, AUC-ROC, and F1 metrics.
- Presented research as an independent poster presenter at the 2026 ISG World Conference (Vancouver); responded to audience Q&A as co-author alongside supervisor Dr. Piper Jackson.
- Currently designing the Clinical LOS Intelligence System, a conversational AI extension for natural language querying of model findings, live LOS prediction, and SHAP-driven explanations via a RAG pipeline, LLM interface, and agentic orchestration.
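To make the evaluation metrics named above concrete, here is a minimal sketch of F1 and AUC-ROC computed from scratch. It is an illustration only, not the research code: the function names, toy labels, and scores are assumptions made up for this example, and a real pipeline would more likely rely on an established ML library.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = prolonged stay, 0 = not prolonged.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an arbitrary choice

f1 = f1_score(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

F1 depends on the chosen threshold, while AUC-ROC only depends on how the scores rank the cases, which is one reason the two are often reported together.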
Graduate Teaching Assistant
Part-time
Key Achievements:
- Delivered hands-on instruction in applied statistics in Excel for a lab-based undergraduate course, guiding students through uncertainty estimation, standard deviation, confidence intervals, and relative error analysis, and translating statistical concepts into practical, interpretable findings.
- Educated a diverse cohort on rigorous data validation, precision assessment, and analytical reporting, reinforcing the importance of statistical thinking and hypothesis-driven reasoning.
Data Analyst Intern
Internship
Key Achievements:
- Completed intensive ML and data analytics program covering SQL, machine learning models, deployment, and AI technologies; gained hands-on experience with core data science tools applied to practical projects.
Junior Data Scientist
Full-time
Key Achievements:
- Enabled prioritization of the top 20% of accounts, improving sales targeting efficiency and revenue conversion. Identified an 8% increase in incremental quarterly sales opportunities, directly supporting strategic revenue growth initiatives.
ML & Advanced Analytics | Customer Segmentation | Probabilistic Recommendation
- Designed and implemented an end-to-end machine learning pipeline in Python to process, clean, and transform large-scale transactional sales and customer attribute data for advanced analytics use cases.
- Engineered customer behavioral features, including a Recency–Frequency (RF) scoring model, and applied clustering techniques to segment customers into behavioral and attribute-based cohorts.
- Developed a Growth Opportunity Matrix by integrating clustering outputs with product purchase patterns to identify high-potential customer-product combinations.
- Applied the Apriori association rule mining algorithm within customer segments to build product affinity models and uncover cross-sell and up-sell opportunities.
- Built an automated recommendation engine that runs on a monthly cycle to analyze evolving sales patterns and generate actionable product recommendations.
- Designed interactive customer intelligence dashboards using Power BI to visualize purchase behavior, product mix, and personalized recommendations, enabling account managers to drive targeted sales strategies.
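The association-rule step in the list above can be sketched as follows. This is a simplified, standard-library version of Apriori for illustration, not the production pipeline: the function names, the `min_support` value, and the product baskets are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for every frequent itemset."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1 candidates: every single item seen in the data.
    current = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-itemsets and keep only those
        # whose k-item subsets are all frequent (the Apriori pruning rule).
        k += 1
        current = set()
        for a in level:
            for b in level:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in level for sub in combinations(union, k - 1)
                ):
                    current.add(union)
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical orders within one customer segment.
orders = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B"},
]
frequent = apriori(orders, min_support=0.4)
rule_conf = confidence(frequent, {"A"}, {"B"})
```

Running the mining separately per segment, as described above, keeps rules from a large segment from drowning out patterns that only hold in a smaller one.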
Data Engineering & BI
- Partnered with business users and leadership stakeholders at a US pharmaceutical client to gather data and analytics requirements, translating business needs into analytical solutions, SQL-based insights, and automated reporting outputs.
- Optimized SQL queries and stored procedures, reducing report execution time and improving ETL pipeline reliability across multiple source systems, which enhanced overall reporting efficiency and data accessibility.
- Designed and maintained scalable ETL pipelines using Microsoft SSIS, monitoring scheduled jobs and proactively resolving failures to ensure consistent and timely data delivery across enterprise systems.
- Developed interactive Power BI dashboards enabling self-service analytics, allowing business users to generate on-demand reports independently and improving decision-making speed and data accessibility.
- Acted as a data analytics liaison between technical and business teams, supporting end-user training, adoption, and Agile/Scrum-based delivery processes with well-documented reporting frameworks.
Received Employee of the Quarter (Q1 2022) for performance and impact.
Research Work
I work as a Graduate Research Assistant on healthcare predictive analytics for older adults, focusing on machine learning models that support efficient resource allocation and improved patient outcomes.
Graduate Research Assistant — Thompson Rivers University
NRC-Funded | Supervisor: Dr. Piper Jackson | Jul 2025 – Present
Project Context
Research aligned with the Digital Comprehensive Geriatric Assessment at Home (D-CGA@home) initiative, a federally funded program supported by the National Research Council of Canada (NRC). The project applies machine learning and data engineering to support clinical decision-making, care planning, and operational optimization in geriatric healthcare systems.
Data Engineering, Storage & Analytics (PostgreSQL)
- Built and managed a PostgreSQL-based analytical database to store and query large-scale MIMIC-IV (v3.1) data, including admissions, transfers, diagnoses, procedures, lab measurements, and ED events
- Designed normalized schemas and indexing strategies to support efficient joins across high-volume patient-, admission-, department-, and time-based records
- Integrated PostgreSQL with Python (psycopg2 / SQLAlchemy) for scalable feature extraction and model-ready dataset generation
- Implemented chunked data ingestion and processing workflows to handle very large tables efficiently while ensuring reproducibility and memory safety
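The chunked-processing idea in the last bullet can be sketched with the DB-API `fetchmany` call. This is a minimal illustration, not the project code: it uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs anywhere, and the table name, columns, values, and chunk size are hypothetical.

```python
import sqlite3


def iter_chunks(cursor, chunk_size):
    """Yield lists of rows, at most chunk_size rows at a time."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


# In-memory stand-in for a very large lab-measurement table (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE labevents (hadm_id INTEGER, valuenum REAL)")
conn.executemany(
    "INSERT INTO labevents VALUES (?, ?)",
    [(i % 3, float(i)) for i in range(10)],
)

# Keep only a running (sum, count) per admission, so memory use does not
# grow with the number of rows in the table.
totals = {}
chunk_sizes = []
cur = conn.execute("SELECT hadm_id, valuenum FROM labevents ORDER BY hadm_id")
for chunk in iter_chunks(cur, chunk_size=4):
    chunk_sizes.append(len(chunk))
    for hadm_id, value in chunk:
        running_sum, count = totals.get(hadm_id, (0.0, 0))
        totals[hadm_id] = (running_sum + value, count + 1)

mean_by_admission = {k: s / c for k, (s, c) in totals.items()}
conn.close()
```

The same loop shape should carry over to psycopg2, where a named (server-side) cursor is what typically keeps the full result set from being loaded into client memory at once.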
Data & Methodology
- Performed advanced feature engineering, including temporal aggregation of lab measurements, severity proxies, comorbidity groupings (ICD-9), and department-specific clinical indicators
- Applied a staged modeling strategy: classification (binary and multi-class LOS thresholds) for early risk stratification, and regression for continuous LOS estimation
- Models explored: Logistic & Linear Regression, Random Forest, Gradient Boosting, Extra Trees, Voting and Stacking Ensembles, and exploratory Neural Networks
- Addressed class imbalance using SMOTE and stratified sampling; conducted iterative feature selection and hyperparameter tuning with cross-validation (AUC-ROC, F1, RMSE)
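For readers unfamiliar with SMOTE, the core idea from the last bullet can be sketched as below: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration with hypothetical data and parameter names, not the implementation used in the research.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create n_synthetic points by interpolating between minority samples and neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (for example, prolonged-stay cases).
minority = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_points = smote(minority, n_synthetic=6)
```

Oversampling of this kind is normally applied to the training folds only, so that validation scores are still measured on real, untouched cases.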
Research Achievements, Funding & Outcomes
- Presented research as an independent poster presenter at the 2026 ISG World Conference (Vancouver); responded to audience Q&A as co-author alongside supervisor Dr. Piper Jackson
- Secured $1,000 in competitive conference funding to support research dissemination and presentation
- CIHR–Institute of Aging (CIHR-IA) Trainee Travel Award Recipient — $500, International Society for Gerontechnology 15th World Conference (2026)
- Ongoing work toward a conference paper and journal-ready manuscript on ensemble modeling for multi-department LOS prediction
Current Extension: Clinical LOS Intelligence System
Currently designing a conversational AI extension on top of the research enabling natural language querying of model findings, live LOS prediction, and SHAP-driven explanations via a RAG pipeline, LLM interface, and agentic orchestration.
Research Poster & Oral Presentation — ISG World Conference, SFU Vancouver · March 2026


3-Minute Thesis (3MT) — Award-Winning Presentation
I presented my graduate research in a 3-minute talk using a single static slide, focusing on clear storytelling, real-world impact, and communicating AI to a broad audience.
TRU 3MT Competition Highlights
- Won Second Prize for research presentation and impact.
- Won the People’s Choice Award based on audience voting.
- Presented complex ML research in an accessible, patient-centered story.
- Strengthened skills in research communication, clarity, and public speaking.
🎞️ Slide used in the competition

Projects
Stacked Ensemble Models for Multi-Department Length-of-Stay (LOS) Prediction in Geriatric Patients
Built an end-to-end healthcare analytics pipeline using PostgreSQL and Python to predict multi-department hospital length of stay for older adults. Applied ensemble learning techniques to improve prediction robustness and support data-driven hospital resource planning.
Heart Disease Prediction (TRU)
Built predictive models for heart disease outcomes as part of the TRU Data Science Seminar (Fall 2024). Focused on data preprocessing, feature engineering, classification modeling, and performance evaluation using real-world healthcare data.
Breast Cancer Prediction
Developed supervised machine learning models to predict breast cancer diagnosis. Emphasized exploratory data analysis, feature selection, model training, and evaluation using standard classification metrics.
Comparing Activation Functions in Neural Networks
Compared ReLU, Sigmoid, Tanh, Leaky ReLU, and Swish on CIFAR-10 to study convergence behavior, gradient dynamics, and theoretical trade-offs. Includes visualization of loss, accuracy, gradient flow, and mathematical analysis.
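For reference, the five activation functions compared in this project can be written out in a few lines. This is a plain-Python sketch for illustration, not the training code; the `alpha=0.01` slope for Leaky ReLU is a common default assumed here rather than a value taken from the project.

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    # alpha is the small slope applied to negative inputs (assumed default).
    return x if x > 0 else alpha * x


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def swish(x):
    # Swish as x * sigmoid(x), i.e. with the scaling parameter fixed at 1.
    return x * sigmoid(x)


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2


# Gradients of the saturating functions shrink quickly away from zero.
grad_at_zero = sigmoid_grad(0.0)
grad_at_ten = sigmoid_grad(10.0)
```

The shrinking gradients of Sigmoid and Tanh for large inputs are the usual explanation for slower convergence in deep networks, which is the kind of behaviour the gradient-flow plots are meant to show.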
Imputing Missing Reaction Times using EM Algorithm
Implemented an Expectation-Maximization (EM) approach to impute missing reaction times under a bivariate normal model. Focused on statistical modeling, iterative estimation, and missing data theory.
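A minimal sketch of how EM imputation works under a bivariate normal model, assuming the first variable is always observed and only the second can be missing. The function name, data values, and iteration count are hypothetical and chosen for illustration; this is not the project's implementation.

```python
def em_impute(xs, ys, n_iter=100):
    """Fill missing ys (None) under a bivariate normal model for (x, y)."""
    n = len(xs)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]

    # Initialise the means, variances, and covariance from complete cases.
    mx = sum(x for x, _ in obs) / len(obs)
    my = sum(y for _, y in obs) / len(obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs) / len(obs)
    syy = sum((y - my) ** 2 for _, y in obs) / len(obs)
    sxy = sum((x - mx) * (y - my) for x, y in obs) / len(obs)

    for _ in range(n_iter):
        # E-step: expected y and y^2 for each row given x and current parameters.
        slope = sxy / sxx
        resid_var = syy - sxy ** 2 / sxx
        ey, ey2 = [], []
        for x, y in zip(xs, ys):
            if y is None:
                m = my + slope * (x - mx)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(y)
                ey2.append(y * y)

        # M-step: re-estimate the parameters from the expected statistics.
        mx = sum(xs) / n
        my = sum(ey) / n
        sxx = sum(x * x for x in xs) / n - mx ** 2
        syy = sum(ey2) / n - my ** 2
        sxy = sum(x * e for x, e in zip(xs, ey)) / n - mx * my

    slope = sxy / sxx
    return [y if y is not None else my + slope * (x - mx) for x, y in zip(xs, ys)]


# Hypothetical paired measurements; None marks a missing second value.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 2.5]
ys = [2.1, 3.9, 6.2, 7.8, None, None]
filled = em_impute(xs, ys)
```

The E-step adds the residual variance to the expected square of each missing value, which is what separates EM from simply plugging in regression predictions and keeps the variance estimate from being biased downward.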
Cross-Domain Sentiment Analysis
Built and evaluated sentiment analysis models across multiple domains (IMDB, social media, airline tweets, consumer complaints). Studied cross-dataset generalization using TF-IDF and classical ML models.
Skills & Technologies
Tools I use for research, analytics, and applied machine learning
Data Science & Machine Learning
Programming
Data & Databases
Cloud & Data Engineering
Visualization & Analytics
Dev Tools
Agile & MLOps
AI & Advanced Analytics
Leadership & Student Engagement
Actively contributing to student advocacy, governance, and community engagement at Thompson Rivers University through leadership, representation, and volunteering.
Leadership Roles at TRU
- Member-at-Large – University Affairs Committee
Supporting student advocacy, contributing to policy discussions, and strengthening graduate representation in academic and campus matters.
- Student Representative – Student Voices Steering Committee
Collaborating on proposals, consultations, and formal representations of student interests within TRUSU governance structures.
- Student Representative – Hiring Committee (VP / Provost of Academics)
Appointed to contribute student perspectives in senior academic leadership recruitment at Thompson Rivers University.
- Vice President – Women in Business Club (WIB)
Led communications, created meeting agendas, and organized initiatives to support leadership development and inclusion.
Volunteering & Student Engagement
- Volunteered at Student Caucus stalls during Back-to-School BBQs, engaging new students and supporting recruitment initiatives.
- Helped organize and host Clubs Day & information sessions, facilitating student participation and community building.
- Actively involved in outreach, event coordination, and student-facing initiatives to strengthen graduate engagement at TRU.


TRUSU Student Caucus events and student engagement activities at TRU
Impact & Learning
These leadership and volunteering experiences strengthened my ability to collaborate across diverse stakeholders, represent student voices effectively, and contribute to institutional decision-making, skills that I bring directly into my research and data-driven work.
Awards & Recognition
Professional achievements and recognition

TRUSU Services Committee Conference Grant
Thompson Rivers University Students' Union — CAD $1,000
Awarded by the TRUSU Services Committee to support conference attendance at ISG2026, covering registration, travel, and accommodation expenses.

CIHR-IA Trainee Travel Award
CIHR Institute of Aging — CAD $500
Awarded by the Canadian Institutes of Health Research – Institute of Aging to support attendance and research dissemination at the 15th World Conference of the International Society for Gerontechnology (ISG2026), Vancouver.

3-Minute Thesis (3MT) - 2nd Place + People's Choice Award
Thompson Rivers University — CAD $500
Recognized for communicating complex graduate research to a public audience in three minutes, winning second prize and a hamper for the audience vote.

TELUS Excellence in Science Award
Thompson Rivers University — CAD $1,000
Awarded based on academic merit and excellence in graduate-level science research.

Graduate Student Leader Award
Thompson Rivers University — CAD $6,000
Recognized by TRU's Office of the Vice-President Research for leadership and contributions to graduate student engagement and research culture.

Employee of the Quarter
Issued by Aidastech
Awarded for outstanding performance and impact on analytics delivery, reporting automation, and client project execution.





