Research Papers

Test Set Quality in Multilingual LLM Evaluation

This paper examines the quality of the datasets used to evaluate multilingual LLMs in Telugu and French, and argues that test sets should not be treated as immutable: they should be revisited, checked for correctness, and potentially versioned.

AACL 2025 • 5th Workshop on "Evaluation & Comparison of NLP Systems"

MATA (మాట): Mindful Assessment of the Telugu Abilities of Large Language Models

We propose a novel approach for detecting and analyzing code-switching patterns in Telugu-English mixed text, with applications in social media analysis and language modeling.

LREC 2026 • Main Conference

MetricalARGS: A Taxonomy for Studying Metrical Poetry with LLMs

We introduce MetricalARGS, the first taxonomy of poetry-related NLP tasks designed to evaluate LLMs on metrical poetry, organized along four dimensions: Analysis, Retrieval, Generation, and Support.

Preprint Oct 2025 • Under Review