SemEval 2026 Task 8 (MTRAGEval)

Multi-Turn RAG Evaluation

Purdue University Fort Wayne, 2026

This project covers participation in SemEval 2026 Task 8, which focuses on evaluating Retrieval-Augmented Generation (RAG) systems in multi-turn conversational settings. The task addresses the critical challenge of assessing RAG system performance when context builds across multiple interactions.

Task Overview

MTRAGEval evaluates RAG systems across several dimensions:

  • Answer Quality: Accuracy, completeness, and relevance of generated responses
  • Retrieval Quality: Effectiveness of document retrieval across conversation turns
  • Context Management: How well systems maintain and utilize conversation history
  • Grounding & Attribution: Proper citation and factual grounding in retrieved documents
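One of the dimensions above, retrieval quality across conversation turns, can be sketched as a per-turn recall metric averaged over a conversation. The `Turn` class and scoring functions below are illustrative assumptions, not the official MTRAGEval metrics:

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One conversation turn with gold and retrieved document IDs (hypothetical schema)."""
    question: str
    gold_doc_ids: set
    retrieved_doc_ids: list


def turn_recall(turn: Turn, k: int = 5) -> float:
    """Fraction of gold documents found among the top-k retrieved documents."""
    if not turn.gold_doc_ids:
        return 1.0  # a turn with nothing to retrieve is trivially satisfied
    top_k = set(turn.retrieved_doc_ids[:k])
    return len(turn.gold_doc_ids & top_k) / len(turn.gold_doc_ids)


def conversation_recall(turns: list) -> float:
    """Average per-turn recall across the whole conversation."""
    scores = [turn_recall(t) for t in turns]
    return sum(scores) / len(scores) if scores else 0.0
```

Averaging per turn (rather than pooling all documents) surfaces conversations where retrieval degrades in later turns, which is exactly the multi-turn failure mode the task targets.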

Technical Focus

  • Multi-turn conversation modeling for RAG systems
  • Evaluation metrics for retrieval-conditioned generation
  • Context window management across conversation history
  • Failure mode analysis in conversational RAG settings
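Context window management, listed above, can be illustrated with a simple recency-based truncation sketch. The `truncate_history` function and the whitespace token counter are hypothetical stand-ins; a real system would count tokens with the model's own tokenizer:

```python
def truncate_history(turns, max_tokens, count_tokens=lambda s: len(s.split())):
    """Keep the most recent turns whose combined token count fits the budget.

    turns: list of turn strings, oldest first.
    count_tokens: stand-in tokenizer (whitespace split); swap in the model tokenizer.
    Returns the kept turns in their original (oldest-first) order.
    """
    kept = []
    total = 0
    # Walk backwards so the newest turns are kept first.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if total + cost > max_tokens:
            break
        kept.append(turn)
        total += cost
    return list(reversed(kept))
```

Dropping whole turns from the oldest end is only one policy; summarizing older turns or retrieving them on demand are common alternatives when early context still matters.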

Connection to Research Interests

This work directly relates to my thesis and professional experience:

  • Builds on RASS project experience with RAG evaluation (RAGAS, TruLens)
  • Addresses evaluation as a first-class system requirement
  • Focuses on realistic failure modes and edge cases
  • Explores retrieval-conditioned failure patterns

Skills & Tools

Python, LangChain, RAG Systems, Multi-turn Dialogue, Evaluation Frameworks, RAGAS, TruLens