Aravind Kannappan

About Me

I'm a Master's student in Applied Statistics at NYU, with a Bachelor's in Statistics from Baylor University.

I'm passionate about leveraging data and AI to solve real-world problems, and I specialize in machine learning, financial modeling, and NLP.

My work spans healthcare, finance, and quantitative research, with expertise in building production-ready ML systems and conducting independent AI research.

I am currently publishing research in peer-reviewed journals and working on cutting-edge projects in personalized medicine and financial technology.

Quick Facts

  • Applied Statistics @ NYU
  • Machine Learning Engineer
  • New York, NY
  • AI Research Engineer

Interests

  • Quantitative Finance
  • Bioinformatics
  • Deep Learning
  • MLOps & Production Systems

Education

New York University

Master of Science in Applied Statistics

September 2024 - Present

New York City, New York

Baylor University

Bachelor of Science in Statistics, Minor in Biology

August 2020 - August 2024

Waco, Texas

Work Experience

April 2025 - Present

Machine Learning Engineer Intern

Icahn School of Medicine at Mount Sinai

  • Built production-ready transformer models in PyTorch with optimized attention mechanisms for patient cost analysis, delivering a 15% improvement in recommendation-system accuracy
  • Implemented reinforcement learning algorithms (a multi-armed bandit with epsilon-greedy exploration) in Python, reducing insurance resource-allocation waste by 10% through adaptive allocation decisions
  • Architected scalable ML infrastructure with Python and Pandas, creating HIPAA-compliant data pipelines that accelerated model deployment cycles by 20%
  • Developed automated MLOps framework integrating hyperparameter tuning and cross-validation with scikit-learn, enhancing model robustness and reliability by 12%
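The epsilon-greedy bandit in the second bullet above follows a standard pattern: with probability ε pull a random arm, otherwise pull the arm with the best running-mean reward. A minimal generic sketch is below. It is not the Mount Sinai code; the class name, the ε value, and the simulated Bernoulli reward probabilities are hypothetical.

```python
import random
from typing import List, Optional


class EpsilonGreedyBandit:
    """Hypothetical epsilon-greedy agent for a k-armed bandit."""

    def __init__(self, n_arms: int, epsilon: float = 0.1, seed: Optional[int] = None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self) -> int:
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean update: Q <- Q + (r - Q) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(probs: List[float], steps: int = 5000, epsilon: float = 0.1, seed: int = 0):
    """Run the agent against simulated Bernoulli arms (hypothetical success rates)."""
    bandit = EpsilonGreedyBandit(len(probs), epsilon, seed)
    env = random.Random(seed + 1)
    for _ in range(steps):
        arm = bandit.select_arm()
        reward = 1.0 if env.random() < probs[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    agent = simulate([0.2, 0.5, 0.8])
    print(agent.counts, [round(v, 2) for v in agent.values])
```

The fixed ε keeps a small, constant exploration budget; decaying ε over time is a common variant when the reward distribution is believed to be stable.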

Jan 2021 - Present

Artificial Intelligence Research Engineer

Independent Research

  • Architected a multi-agent AI platform using Python, LangChain, and Hugging Face that processed 100+ diagnostic cases with a 21% accuracy improvement over baseline models
  • Built a production-ready NLP pipeline with spaCy for patient narrative analysis, deployed via Docker containers, achieving an 18% improvement in diagnostic relevance scoring
  • Developed an ensemble machine learning system integrating social and clinical data streams using scikit-learn, reducing diagnostic errors by 12% through advanced feature engineering
  • Engineered performance evaluation framework with automated testing and monitoring, enabling systematic model validation and continuous improvement workflows
  • Publishing technical findings in a peer-reviewed journal, demonstrating the impact of AI-driven personalized medicine approaches

Jan 2022 - Dec 2024

Bioinformatics Intern

Baylor College of Medicine

  • Implemented scalable genomic analysis pipeline in R and Python, processing large-scale datasets (CoMMpass, UAMS) with 10,000+ patient records and genetic markers
  • Built a predictive modeling system using Cox regression and deep learning techniques, achieving a hazard ratio of 2.44 for survival prediction in oncology applications
  • Developed automated data processing workflows using R packages (survminer, clusterProfiler) for pathway analysis, enabling efficient biomarker discovery and patient stratification
  • Collaborated with cross-functional medical teams to translate complex statistical findings into actionable clinical insights and treatment recommendations

Featured Projects

AI & Finance

Quant Aegis

GNN · XGBoost · AWS Lambda · Python

Engineered a real-time fraud-detection system using Graph Neural Networks (GNNs) and XGBoost in Python, processing transaction networks with a 15% accuracy improvement.

Optimized the cloud inference pipeline on AWS Lambda, achieving a 20% reduction in response time for low-latency financial surveillance applications.

Finance

Volatility Alchemist

Python · scikit-learn · GARCH · Random Forest

Built a production-grade options analytics platform integrating Random Forest volatility modeling with Black–Scholes theory, delivering an R² of 0.97 for highly liquid equity options.

Achieved 68% accuracy in predicting five-day volatility regimes and produced Sharpe ratios of 0.84–1.47 through automated signal generation.
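For reference, the Black–Scholes price of a European call is C = S·N(d1) − K·e^(−rT)·N(d2), with d1 = [ln(S/K) + (r + σ²/2)T] / (σ√T) and d2 = d1 − σ√T. A minimal sketch follows, assuming no dividends. One common way to connect a volatility forecast (such as a Random Forest prediction) to option prices is to pass it in as `sigma`, though the description above does not say exactly how the platform combines the two. The function names are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """European call price under Black-Scholes (no dividends).

    S: spot, K: strike, T: years to expiry,
    r: continuously compounded risk-free rate, sigma: annualized volatility.
    """
    if T <= 0 or sigma <= 0:
        # Degenerate case: price collapses to discounted intrinsic value.
        return max(S - K * math.exp(-r * max(T, 0.0)), 0.0)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def black_scholes_put(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """European put via put-call parity: P = C - S + K * exp(-rT)."""
    return black_scholes_call(S, K, T, r, sigma) - S + K * math.exp(-r * T)


if __name__ == "__main__":
    print(round(black_scholes_call(100, 100, 1.0, 0.05, 0.2), 4))  # → 10.4506
```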

Finance

Rare Disaster Asset Pricing Model

Python · NumPy · Pandas · SciPy

Extended the Mehra–Prescott model to include a disaster state via a three-state Markov chain, improving historical U.S. equity market fit by 18%.

Calibrated disaster probabilities using GDP and asset return data from 1929–2020, aligning equity premia with the historical 5–7% range.
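A three-state Markov chain like the one described above can be illustrated by computing its long-run (stationary) distribution π, which satisfies π = πP, and the implied average consumption growth. All numbers here (the transition matrix, the per-state growth rates) and the function names are hypothetical placeholders, not the calibrated values from the 1929–2020 data.

```python
# Hypothetical 3-state chain: expansion, recession, disaster.
STATES = ["expansion", "recession", "disaster"]
GROWTH = [1.054, 0.982, 0.75]  # hypothetical gross consumption growth per state
P = [
    [0.43, 0.55, 0.02],  # rows: current state; columns: next state
    [0.55, 0.43, 0.02],
    [0.50, 0.40, 0.10],
]


def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Stationary distribution of a row-stochastic matrix via power iteration."""
    if any(abs(sum(row) - 1.0) > 1e-9 for row in P):
        raise ValueError("each row of P must sum to 1")
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # One step of pi <- pi P
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


def expected_growth(pi, growth):
    """Unconditional mean of gross growth under the stationary distribution."""
    return sum(p * g for p, g in zip(pi, growth))


if __name__ == "__main__":
    pi = stationary_distribution(P)
    for state, prob in zip(STATES, pi):
        print(f"{state:>10}: {prob:.4f}")
    print("mean gross growth:", round(expected_growth(pi, GROWTH), 4))
```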

AI & Data

RallyScope

Python · XGBoost · CatBoost · OpenCV · SHAP

Built a tennis analytics platform integrating surface-specific Elo ratings and interpretable ML models, achieving an AUC of 0.81 and 74% match-prediction accuracy across 36,342 ATP/WTA matches.

Developed a computer-vision serve analysis whose serve-speed estimates achieve a mean absolute error (MAE) of 8.2 km/h against Hawk-Eye benchmarks.
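Surface-specific Elo ratings can be sketched with the standard Elo update: player A's expected score against B is E_A = 1 / (1 + 10^((R_B − R_A)/400)), and after a match each rating moves by K × (actual − expected). The sketch keeps a separate rating table per surface. K = 32, the 1500 starting rating, and the class name are hypothetical choices, not RallyScope's actual parameters.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo logistic model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


class SurfaceElo:
    """Hypothetical per-surface Elo ladder (e.g. 'clay', 'grass', 'hard')."""

    def __init__(self, k: float = 32.0, initial: float = 1500.0):
        self.k = k
        # ratings[surface][player] -> rating; unseen players start at `initial`
        self.ratings = defaultdict(lambda: defaultdict(lambda: initial))

    def predict(self, surface: str, a: str, b: str) -> float:
        """Win probability of player a over player b on the given surface."""
        r = self.ratings[surface]
        return expected_score(r[a], r[b])

    def update(self, surface: str, winner: str, loser: str) -> None:
        """Zero-sum rating update after `winner` beats `loser` on `surface`."""
        r = self.ratings[surface]
        delta = self.k * (1.0 - expected_score(r[winner], r[loser]))
        r[winner] += delta
        r[loser] -= delta


if __name__ == "__main__":
    elo = SurfaceElo()
    elo.update("clay", "Player A", "Player B")
    print(round(elo.predict("clay", "Player A", "Player B"), 3))
    print(elo.predict("grass", "Player A", "Player B"))  # grass ladder untouched
```

Keeping separate ladders lets a player's clay form differ from their grass form; a common refinement is to blend surface and overall ratings when a player has few matches on a surface.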

AI

TrafficFlowOpt

Python · JAX · C++ · Docker

Architected a unified traffic-optimization framework fusing Neural ODE forecasting, PDE-based flow modeling, and graph-theoretic routing, enabling 30-minute congestion predictions with 85% accuracy.

Implemented adaptive shortest-path routing, driving a 15.9% reduction in average travel time and a 23.1% cut in total vehicle delay.
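The description does not specify the routing algorithm beyond "adaptive shortest-path routing". One common way to make shortest-path routing adaptive is to rerun Dijkstra's algorithm whenever observed edge travel times change. The sketch below assumes a dict-of-dicts road graph with nonnegative travel times in minutes; the graph, node names, and function name are hypothetical.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with nonnegative weights.

    graph[u][v] is the current travel time on edge u -> v.
    Returns (total_time, [source, ..., target]), or (inf, []) if unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    # Walk predecessors back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


if __name__ == "__main__":
    # Hypothetical road network; weights are travel times in minutes.
    roads = {"A": {"B": 4, "C": 2}, "B": {"D": 3}, "C": {"B": 1, "D": 7}}
    print(shortest_path(roads, "A", "D"))  # → (6.0, ['A', 'C', 'B', 'D'])
    roads["C"]["B"] = 10  # congestion observed on C -> B
    print(shortest_path(roads, "A", "D"))  # → (7.0, ['A', 'B', 'D'])
```

Rerunning from scratch is the simplest adaptive policy; incremental algorithms such as D* Lite avoid full recomputation when only a few edge weights change.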

Publications

Characterization of driver mutations identifies gene signatures predictive of prognosis and treatment sensitivity in multiple myeloma

September 2024

The Oncologist Journal

J.-R. Li, A. K. Parthasarathy, A. S. Kannappan, S. Arsang-Jang, J. Dong, C. Cheng

This study characterized driver mutations in multiple myeloma and identified gene signatures predictive of prognosis and treatment sensitivity, yielding novel biomarkers for improved patient stratification and treatment selection.


Skills & Technologies

Programming Languages

Python: 95%
SQL (PostgreSQL) & NoSQL: 90%
R: 85%
Java: 80%
MATLAB & C++: 75%
React & TypeScript: 70%

ML/AI Frameworks

scikit-learn: 90%
PyTorch: 85%
TensorFlow: 80%
Hugging Face: 85%
NLTK & spaCy: 80%
XGBoost: 75%

Cloud & DevOps

AWS (S3, Lambda, EC2): 85%
Docker: 80%
Google Cloud: 75%
Kubernetes: 70%
Git & Linux: 85%

Data & Analytics

ETL Pipelines: 90%
Tableau/Power BI: 85%
A/B Testing: 90%
Statistical Analysis: 85%
Apache Spark/Hadoop: 80%

Statistical Analysis
Machine Learning
Deep Learning
Natural Language Processing
Computer Vision
Time Series Analysis
Bayesian Inference
Reinforcement Learning
MLOps
A/B Testing
Causal Inference
Bioinformatics
Financial Modeling
Risk Management
Data Mining
Linear Algebra
Frequentist Inference
Multilevel Modeling
Data Structures & Algorithms
Operating Systems
Partial Differential Equations
Ordinary Differential Equations
Stochastic Modeling
Graph Neural Networks
Discrete Mathematics
Game Theory
Biostatistics