RSNA biomedical imaging
Brain MRI and abdominal CT (Kaggle)
EfficientNet3D on mpMRI DICOM (FLAIR, T1w, T1wCE, T2w) for MGMT classification; RSNA 2023 abdominal trauma with 2.5D EfficientNet CNN, 3D R3D-18, and DICOM-to-3D preprocessing.
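A common reading of "2.5D" is that each CT slice is fed to a 2D CNN together with its neighbouring slices stacked as extra channels. The sketch below shows only that windowing step, under that assumption; the function name `slice_windows`, the `context` parameter, and the edge-clamping choice are illustrative and not taken from the project code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_windows(volume: Sequence[T], context: int = 1) -> List[List[T]]:
    """Pair every slice with `context` neighbours on each side.

    Indices that fall outside the volume are clamped to the first or last
    slice, so every window has exactly 2 * context + 1 entries.
    (Hypothetical helper; the real pipeline may pad or sample differently.)
    """
    n = len(volume)
    if n == 0:
        return []
    windows: List[List[T]] = []
    for i in range(n):
        # Neighbour indices around slice i, clamped to the valid range.
        idx = [min(max(i + d, 0), n - 1) for d in range(-context, context + 1)]
        windows.append([volume[j] for j in idx])
    return windows
```

With real data each element would be a 2D pixel array, and each window would be stacked along the channel axis before being passed to the 2D backbone.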
Kaggle →
Senior ML Engineer · AI Researcher · PhD Applicant
Dual-track profile for AI/ML opportunities: a production-focused engineering track for recruiters and a research-focused academic track for faculty and admissions committees.
Built production AI systems across OCR/document intelligence, voice AI, retrieval systems, and large-scale crawling. Engineering decisions prioritize measurable outcomes: throughput, latency, cost, and reliability.
Best for: AI/ML engineering roles, research engineer roles, startup applied AI positions, and teams needing end-to-end ML ownership from experimentation to deployment.
Core signal: practical systems at scale (10K+ OCR docs/day, multimillion-site pipelines, real-time voice interactions) with quantifiable optimization and delivery impact.
Document: Industry CV (PDF) · source (TeX)
Document AI and OCR in low-resource language settings, with emphasis on robust methods, reproducible experimentation, and practical deployment constraints.
Peer-reviewed publications (ACM, Springer), strong undergraduate academic record (CGPA 3.96/4.00), and teaching experience in core CS/AI courses.
Targeting fully funded US PhD programs (Fall 2027) with interests spanning document understanding, multimodal learning, and evaluation-oriented ML systems.
Efficient multimodal learning and edge-centric vision: VLM/OCR adaptation, quantization, and comparative computer vision evaluation (ACM SE'24 DLA).
Profile: Academic track page · Document: Academic CV (PDF) · source (TeX)
Project: Borno OCR — funded by the ICT Ministry of Bangladesh
Leading enterprise deep learning OCR for Bengali text recognition, document layout analysis, and automated PDF/image → editable DOCX.
Project: AI Interview Bot — real-time voice interview platform
Built the full interview engine and LiveKit WebRTC voice pipeline for automated AI-driven candidate interviews with session-isolated RAG and automated evaluation.
Project: Ad Intel Crawler — AI-powered ad network detection
AI-driven large-scale web crawling, automated ad network identification, and real-time media analysis for advertising intelligence.
Project: Vendidit — AI eCommerce intelligence
Large-scale scraping, automated fair market valuation, and intelligent data mapping.
Project: Bengali OCR — ICT Ministry funded
Deep learning OCR for Bengali, layout analysis, and scene text for government digitization.
Supported core undergraduate CS courses for cohorts of 30+ students per offering.
Led UCPC as President; organized 4 CSE competitive programming contests (2022–2023).
Python (expert), TypeScript, SQL, RESTful APIs, modular service architecture
PyTorch, TensorFlow, scikit-learn, Sentence Transformers, Hugging Face, faster-whisper, Piper (TTS), vLLM, Ollama, LLM integration, deployment, OpenCV, EAST/CRNN handwritten OCR, QAT, DataParallel
LangChain, RAG, ChromaDB, transformers, GPT/Qwen workflows, semantic search, prompt engineering
FastAPI, WebSockets, LiveKit/WebRTC, SQLite, Redis, SQLAlchemy, PostgreSQL, pgvector, Pydantic, rate limiting, HTTP microservices
Playwright, Crawl4AI, browser-use, BeautifulSoup4, Scrapy, async crawling pipelines
Gradio, Next.js, React, Docker, Docker Compose, GitHub Actions, CI/CD, Pytest, Vitest
Pandas, NumPy, Matplotlib, Seaborn, MLflow, Weights & Biases, statistical analysis
Git, GitHub, Jira, Notion, Slack, Agile, project management
Production-oriented systems aligned with current work: OCR, matching platforms, full-stack voice agents (REST/WebSocket + optional LiveKit), and RAG.
Bengali handwritten document OCR
Full-page Bengali handwriting: EAST + LANMS detection, CRNN recognition with QAT-ready checkpoints, reading-order assembly. FastAPI endpoints (/plugin, /v1/ocr/handwritten), optional Gradio UI, DataParallel multi-GPU.
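Reading-order assembly can be illustrated with a simple geometric heuristic: group detected word boxes into text lines by vertical centre, then sort each line left to right. This is a minimal sketch under that assumption, not the project's actual method; `Box`, `reading_order`, `assemble_text`, and the `line_tol` threshold are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height
    text: str = ""  # recognised text for this box


def reading_order(boxes: List[Box], line_tol: float = 0.5) -> List[List[Box]]:
    """Group boxes into lines (top to bottom), each sorted left to right."""
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b.y + b.h / 2):
        cy = box.y + box.h / 2
        if lines:
            last = lines[-1]
            ref_cy = sum(b.y + b.h / 2 for b in last) / len(last)
            ref_h = sum(b.h for b in last) / len(last)
            # Same line if the centre is within a fraction of the line height.
            if abs(cy - ref_cy) <= line_tol * ref_h:
                last.append(box)
                continue
        lines.append([box])
    return [sorted(line, key=lambda b: b.x) for line in lines]


def assemble_text(boxes: List[Box]) -> str:
    """Join recognised words into lines of text in reading order."""
    return "\n".join(" ".join(b.text for b in line) for line in reading_order(boxes))
```

A heuristic like this assumes roughly horizontal, single-column text; skewed pages or multi-column layouts would need deskewing or layout analysis first.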
Full-stack conversational agent (FastAPI · Next.js 14)
Open-source monorepo: FastAPI backend with a Next.js /call UI. REST and WebSocket endpoints handle STT/TTS and agent turns, and an optional LiveKit WebRTC path runs through a dedicated worker that calls the same API. Uses Ollama, faster-whisper, and Piper; SQLite backs appointment tooling and transcript persistence, kept consistent across transports; optional MuseTalk lip-sync. Backend and frontend CI with pytest and Vitest.
Supervisor discovery & matching
End-to-end platform: CV signal extraction, AI-driven professor/opportunity discovery, ranked matches with evidence and outreach drafts.
GitHub →
Real-time voice interview platform
LiveKit WebRTC voice pipeline (faster-whisper STT, Ollama LLM, multi-backend TTS) with session-isolated concurrent interviews. Interview engine handles CV/JD parsing, ChromaDB RAG, ATS scoring, evaluation worker (LLM rubrics + deterministic fallback), and transcript APIs. Rate limiting (Redis/in-memory), turn-level prompt injection defense, Next.js candidate UI with mic/cam pre-checks.
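The in-memory side of the rate limiting could look like a per-key sliding window. The sketch below is an assumption about the general approach rather than the platform's implementation; `SlidingWindowLimiter`, its parameters, and the injectable `clock` are hypothetical, and a Redis-backed variant would keep the same timestamps in a shared store instead of a local dictionary.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `max_requests` per key within the last `window_seconds`."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Keying the limiter by session or client identifier keeps one noisy interview from consuming the budget of another.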
GitHub →
YOLOv9 + tracking
Vehicle detection and tracking at 30 FPS with 92%+ detection accuracy.
GitHub →
Document Q&A
Streamlit app with RAG for document-based Q&A, built for accurate answers and fast responses in a local setup.
GitHub →
Akanda, M.B.A.N., Ahmed, M., Rabby, A.S.A., & Rahman, F. Optimum Deep Learning Method for Document Layout Analysis in Low Resource Languages. ACM Southeast Conference (ACM SE ’24), 199–204.
doi:10.1145/3603287.3651184
Akanda, M.B.A.N., Prodhan, M., Sarwar, S., Raatul, A.M., & Paul, B. Voice Controlled Home Automation with Cloud-Based Environment Monitoring System. ICTCS 2022, LNNS vol. 623, Springer, Singapore.
doi:10.1007/978-981-19-9638-2_21
B.Sc. Computer Science and Engineering (Minor: Business Administration)