Applied GenAI & AI Research Engineer with nearly 2 years of hands-on production experience at Brainy Neurals, building systems that go from PoC to deployment, not just demos. I specialize in the full stack of modern AI: LLMs, RAG pipelines, AI agents, computer vision, and automation orchestration.
I bridge business requirements with advanced AI architectures, delivering systems that operate at scale — whether that's analyzing railway infrastructure at 90 km/h, processing 2,500+ refund disputes per week autonomously, or running 30-stage AI-driven B2B sales sequences with zero human intervention.
Research-driven. Execution-focused. Production-obsessed.
| Domain | What I Do |
|---|---|
| 🤖 LLM Systems | End-to-end RAG pipelines, MCP-based rich-context workflows, knowledge graph-driven querying |
| 🧩 AI Agents | Multi-agent orchestration with LangChain, LangGraph, and n8n for fully autonomous business process automation |
| 👁️ Computer Vision | Real-time detection, depth-based measurement, pose estimation, video analytics at production scale |
| 🎨 Generative AI | Image generation (SDXL + IP-Adapter), super-resolution (Real-ESRGAN), real-time talking avatars |
| 🎙️ Voice & Speech AI | Real-time speech-to-speech with GPT-4o Realtime, Faster-Whisper ASR, NeMo diarization |
| ⚙️ MLOps & APIs | FastAPI/Flask backends, Docker deployments, scalable GPU pipelines, cloud delivery (AWS S3) |
17 production-grade AI systems built and deployed across EdTech, FinTech, Railway, E-Commerce, B2B Sales, Sports Analytics, and more.
Real-time, bi-directional voice AI tutor with specialized Australian accent coaching, automated pronunciation scoring, and session analytics.
GPT-4o Realtime API · STT/TTS Streaming · WebSockets · FastAPI · MongoDB · Flutter
Text-to-floorplan agent using spatial reasoning to convert natural language room descriptions into AutoCAD DXF files — first draft in seconds, not hours.
Google Gemini · LangChain · Pydantic · Python · Geometry Engine · Ezdxf
Unified trading intelligence engine combining news/social scraping, YouTube transcription, RSI/MACD/Volume signals, and a custom fine-tuned LLM into a single confidence-scored dashboard.
FinBERT · VADER · Custom Quantized LLM · Faster-Whisper · FastAPI · Plotly · PostgreSQL
Real-time YOLOv11 + ZED stereo vision system measuring overhead wire height, stagger, and gradient parameters at 90 km/h with GPS location tagging and automated CSV reporting.
YOLOv11 · ZED Stereo Vision SDK · CUDA · OpenCV · PyQt6
🎯 Targeted 50% reduction in pantograph wear · Eliminated manual inspection walking · Zero track possession required
End-to-end generative ad creative engine: scrapes product pages → generates AIDA scripts → produces AI images, videos, and voiceovers → delivers to AWS S3. 90% reduction in manual creative effort.
GPT-4.1 / GPT-5 · Fal.ai · ElevenLabs · n8n · AWS S3 · FastAPI
Identity-preserving avatar engine using SDXL + IP-Adapter-FaceID. Turns user photos into fantasy-world avatars while maintaining facial identity via InsightFace embeddings and SSIM quality filtering.
Stable Diffusion XL · InsightFace · IP-Adapter-FaceID · FastAPI · PyTorch
Multi-speaker transcription, diarization, noise removal, and AI-generated structured meeting summaries — integrated with Zoom. Instant meeting minutes with zero manual note-taking.
Faster-Whisper · NVIDIA NeMo · Demucs · Google Gemini · PyTorch
Camera-based shoulder measurement engine using MediaPipe Pose + Face Mesh for pixel-to-cm conversion and size recommendation — deployable on any standard consumer device.
MediaPipe Pose · Face Mesh · OpenCV · NumPy
Visual similarity search engine for thousands of elevator SKUs using EfficientNet-B0 feature embeddings and FAISS retrieval, returning ranked matches in seconds.
EfficientNet-B0 · FAISS · PyTorch · Streamlit
30-stage automated AI follow-up engine with reply detection, business-day-aware sequencing, CRM-based personalization, and tone progression (casual → assertive → direct). 10–15 hours of weekly manual work eliminated.
GPT-4.1 · n8n · Gmail API · Google Sheets API · HighLevel CRM
Natural language → SQL → BigQuery execution → finance report → QuickBooks-ready journal entries. Processing time reduced from hours to under 1 minute. 95% manual workload reduction.
GPT-4.1 · BigQuery · FastAPI · AWS S3 · n8n
Lip-synced real-time avatar engine that animates static images, combining GPT-4o Realtime API with diffusion-based motion synthesis for hyper-realistic AI-driven customer interactions.
LIA Diffusion · GPT-4o Realtime · HuBERT · PyTorch · CUDA
PDF beam extraction engine using regex + spatial proximity logic to detect and count beams, associate them with their lengths, and output annotated audit PDFs. Hours of manual work reduced to seconds.
PyMuPDF · Python Regex · Gradio
arXiv scraping → PDF parsing → AI summarization → scheduled LinkedIn publishing. Fully automated thought leadership pipeline running 3 posts/week with zero manual drafting.
Gemini 2.0 Flash · FastAPI · APScheduler · LinkedIn API
2x/4x upscaling engine for images and video with face restoration, noise removal, and tiling support for memory efficiency — enabling 4K-ready asset generation from legacy sources.
Real-ESRGAN · GFPGAN · PyTorch · FFmpeg
Knowledge graph-powered ATS with semantic JD matching, OCR fallback parsing, and conversational database querying. 90% reduction in screening time.
Gemini 2.0 Flash · Neo4j · FastAPI · LangChain
End-to-end automated basketball video intelligence: multi-player tracking, team classification, event recognition (shots/assists/steals/blocks), zone-aware scoring, per-player CSV stats, and auto-generated highlight clips. Targeting 600k+ games/year at scale.
DeepStream · TensorRT · YOLO · RF-DETR · PyTorch · FastAPI · AWS S3 · LangChain · Gemini
Senior AI Engineer │ Brainy Neurals │ April 2026 – Present
AI Research Engineer │ Brainy Neurals │ June 2024 – April 2026
Teaching Assistant │ IIIT Vadodara │ Sep 2023 – May 2024
Django Developer │ Sarva Suvidhaen │ Sep 2023 – Oct 2023
AI Intern │ IBM (AICTE/Edunet) │ Jun 2023 – Jul 2023
Python Developer │ Arth Infosoft │ Jan 2023 – May 2023
Django Developer │ Grownited Pvt. Ltd. │ Jun 2022 – Jul 2022
🎓 M.Tech in Artificial Intelligence — Indian Institute of Information Technology Vadodara (2023 – 2025)
🎓 B.E. in Information & Communication Technology — L.J. Institute of Engineering & Technology (2019 – 2023)
- 🏆 Microsoft Learn AI Skills Challenge
- ✅ Prompt Compression and Query Optimization
- ✅ Introduction to Relational Databases and SQL
- ✅ Introduction to Blockchain Technologies
Unlock intelligence from audio, vision, and text, automate decision-making, and bridge the gap between cutting-edge AI research and real-world business execution.
📫 Open to collaboration on GenAI products, agentic automations, and deep learning deployments.
rushabhshah122000@gmail.com · LinkedIn · Portfolio

