Joel Markapudi

Machine Learning & Data Engineer | M.S. in Artificial Intelligence @ Northeastern University

Machine learning and LLM engineering enthusiast completing an M.S. in Artificial Intelligence at Northeastern University (GPA 3.9), with four years of experience as a Data Engineer building FP&A and operational reporting systems. My work bridges production-grade data engineering and modern AI systems, covering retrieval/RAG architectures, LLM orchestration and evaluation, applied ML and computer vision, and end-to-end MLOps. I like working on full-stack ML systems, including 3D human pose and scene data, with a strong focus on deployable, cost-aware, and reproducible pipelines.

Current reads: Andrew Gordon Wilson's keynotes and lectures on Bayesian deep learning; gated attention for LLMs (NeurIPS 2025 best paper); Google's work on nested learning and self-modifying recurrent architectures; efficient LLM training with memory and compute optimization.

Research & Academic Projects


FinSights: Production-Grade Financial Intelligence System

Hybrid RAG architecture combining DuckDB stratified sampling with semantic retrieval. Reduces a 72M-sentence corpus to 1M sentences and achieves a cost of $0.017 per query with exact citation provenance.

Documentation →

Text-to-Pose Diffusion

CLIP-conditioned diffusion model with cross-attention for 3D pose generation. Hybrid CNN-Transformer architecture with anatomical constraint enforcement and kinematic chain validation.

Design → Report →

Multi-View 3D Scene Analysis

10k+ LOC pipeline for multi-view scene reconstruction with pose-guided filtering, occlusion handling, and RANSAC validation on the ETH3D dataset.

Implementation →

Vision-Language Late-Interaction Retrieval

ColBERT-style ViT+CLIP system with parameter-efficient LoRA fine-tuning. MaxSim token-to-patch alignment achieved a 38% reduction in InfoNCE loss and 100% retrieval accuracy on Flickr8k.

Repository → Notebook →
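For readers unfamiliar with late interaction, the MaxSim scoring mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the project's code: the function names (`l2_normalize`, `maxsim_score`, `rank_images`), the toy 2-dimensional embeddings, and the use of cosine similarity are all assumptions made for the example. Each text token is matched to its most similar image patch, and the per-token maxima are summed.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(text_tokens, image_patches):
    """Late-interaction score: for each text token, take the best-matching
    image patch (max cosine similarity), then sum over tokens.
    Assumes both inputs are non-empty lists of equal-length vectors."""
    tokens = [l2_normalize(t) for t in text_tokens]
    patches = [l2_normalize(p) for p in image_patches]
    score = 0.0
    for t in tokens:
        score += max(sum(a * b for a, b in zip(t, p)) for p in patches)
    return score


def rank_images(text_tokens, gallery):
    """Return gallery keys sorted by descending MaxSim score."""
    scores = {name: maxsim_score(text_tokens, patches)
              for name, patches in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two query tokens and two candidate images.
query = [[1.0, 0.0], [0.0, 1.0]]
gallery = {
    "image_a": [[1.0, 0.0], [0.0, 1.0]],  # one patch aligned with each token
    "image_b": [[1.0, 1.0]],              # a single patch between the two tokens
}

print(rank_images(query, gallery))
```

In this toy setup `image_a` ranks first because every query token has an exactly aligned patch, whereas `image_b` offers only a partial match for each token. A real system would compute the same quantity with batched tensor operations over learned embeddings.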

Protein Structure Prediction

Implemented HMM, CRF, and BiLSTM architectures for secondary structure prediction. The CRF model achieved 67% accuracy on the CB513 benchmark using evolutionary features.

Report →

SocrAItic Circle

Multi-agent LLM debate framework with multi-phase cycles, iterative refinement, and YAML-driven orchestration for enhanced reasoning and logical consistency evaluation.

Project Repository →

Artist Classification

Comparative study of SVM-SIFT-BoVW, CAEs, VAEs, and CNNs for artistic style classification. The classical SVM approach achieved 89% accuracy on a 50-class dataset.

Professional Experience

FP&A and SG&A Reporting

Led BI delivery for finance and operations stakeholders. Built scalable ETL pipelines and Power BI models supporting 11+ programs. Shipped 25+ dashboards covering 500+ metrics, automated workflows that cut 70+ hours/month, and implemented CI/CD that reduced deployment time by 50%.

Innova Solutions • Mar 2020 – Nov 2023

Operational Reporting & Data Warehouse

Architected enterprise data warehouse integrating Bullhorn/HGC/JobDiva/Oracle EBS with 25+ PL/SQL procedures across 40+ star-schema tables. Engineered entity resolution achieving 88% accuracy. Delivered dashboards serving 300+ KPIs with forecasting and executive analytics.

Innova Solutions • May 2022 – Nov 2023

Other Works

Additional Labs & Experiments

ML-Serving with GitHub CI/CD and AWS Lambda

End-to-end ML deployment workflow using GitHub Actions, AWS Lambda, and AWS SAM infrastructure for serverless model serving. → Source Code | Study Notes

Optuna and MLflow with Synthetic Time-Series

Hyperparameter optimization and experiment tracking using Optuna and MLflow on synthetic time-series data. → Source Code