SemEval 2026 Task 8 (MTRAGEval)
Multi-Turn RAG Evaluation
Purdue University Fort Wayne, 2026
I participated in SemEval 2026 Task 8, which focuses on evaluating Retrieval-Augmented Generation (RAG) systems in multi-turn conversational settings. The task addresses the challenge of assessing RAG system performance when context accumulates across multiple interactions rather than within a single question-answer pair.
Task Overview
MTRAGEval evaluates RAG systems across several dimensions:
- Answer Quality: Accuracy, completeness, and relevance of generated responses
- Retrieval Quality: Effectiveness of document retrieval across conversation turns
- Context Management: How well systems maintain and utilize conversation history
- Grounding & Attribution: Proper citation and factual grounding in retrieved documents
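One of the dimensions above, retrieval quality across turns, can be illustrated with a minimal sketch. The `Turn` dataclass, the `per_turn_recall` function, and the document IDs below are all illustrative assumptions, not the official task format or metrics:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    question: str
    retrieved_ids: list[str]  # doc IDs returned by the retriever, best first
    gold_ids: list[str]       # doc IDs annotated as relevant for this turn

def per_turn_recall(turns: list[Turn], k: int = 5) -> list[float]:
    """Recall@k computed separately for each conversation turn."""
    scores = []
    for t in turns:
        top_k = set(t.retrieved_ids[:k])
        gold = set(t.gold_ids)
        # Fraction of annotated-relevant docs found in the top k results
        scores.append(len(top_k & gold) / len(gold) if gold else 1.0)
    return scores

conversation = [
    Turn("What is RAG?", ["d1", "d3", "d9"], ["d1"]),
    Turn("How is it evaluated?", ["d4", "d2"], ["d2", "d7"]),
]
print(per_turn_recall(conversation, k=2))  # [1.0, 0.5]
```

Scoring each turn separately, rather than pooling the whole conversation, is what surfaces the multi-turn failure pattern where retrieval degrades once the question depends on earlier context.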
Technical Focus
- Multi-turn conversation modeling for RAG systems
- Evaluation metrics for retrieval-conditioned generation
- Context window management across conversation history
- Failure mode analysis in conversational RAG settings
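Context window management, listed above, can be sketched as a simple recency-based truncation policy. This is a hypothetical helper, not the task's or any specific system's implementation; a real system would use the model's own tokenizer rather than whitespace word counts:

```python
def fit_history(turns: list[str], budget: int,
                count_tokens=lambda s: len(s.split())) -> list[str]:
    """Keep the most recent turns that fit within a token budget.

    count_tokens is a crude stand-in (whitespace words); swap in a real
    tokenizer for production use.
    """
    kept, used = [], 0
    # Walk from the newest turn backwards, stopping at the first overflow
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = [
    "user: what is RAG",
    "bot: retrieval augmented generation",
    "user: cite sources please",
]
print(fit_history(history, budget=8))  # drops the oldest turn
```

Recency-based truncation is the simplest policy; dropping early turns can discard entities the current question refers back to, which is exactly the kind of failure mode the task's evaluation is meant to expose.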
Connection to Research Interests
This work directly relates to my thesis and professional experience:
- Builds on RASS project experience with RAG evaluation (RAGAS, TruLens)
- Addresses evaluation as a first-class system requirement
- Focuses on realistic failure modes and edge cases
- Explores retrieval-conditioned failure patterns
Skills & Tools
Python, LangChain, RAG Systems, Multi-turn Dialogue, Evaluation Frameworks, RAGAS, TruLens