Data Science Research Assistant — Data Science Institute, University of Chicago
Developed AI-powered Q&A chatbot using open-source large language models (Phi-4, Llama-3, Gemma-2B) to provide farmers streamlined access to agricultural seed laws across 78 countries
Processed and analyzed 183 legal PDF documents in 9 languages, covering legislation from 1981 to 2023
Built end-to-end Retrieval-Augmented Generation (RAG) pipeline with ChromaDB vector database, implementing semantic search and document chunking strategies for legal documents ranging from 1 to 150 pages
Collaborated with the nonprofit A Growing Culture to advance global food sovereignty by developing a multilingual document processing system
Optimized chatbot response time to 15-20 seconds and evaluated model performance using ROUGE metrics across multiple LLM architectures
Identified critical gaps in multilingual NLP preprocessing pipelines and developed custom text normalization strategies for non-English legal terminology across 9 languages