Role Adherence

The Role Adherence metric evaluates how well the AI maintains its assigned role/character throughout a conversation.

How It Works

  1. Role Extraction — identifies the role description from the conversation setup
  2. Turn Evaluation — generates verdicts for each conversation turn
  3. Score Aggregation — combines per-turn verdicts using temperature-controlled softmax
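The aggregation step can be pictured with a minimal sketch. It assumes each turn's verdict has already been mapped to a score in [0, 1], and it assumes one plausible weighting scheme: each turn is weighted by softmax(-score / temperature), so low-scoring turns pull the result down more as the temperature decreases. The function name `softmax_aggregate` and the exact formula are illustrative assumptions, not the library's confirmed implementation, and the numbers it produces will not necessarily match the library's scores.

```python
import math
from typing import Sequence


def softmax_aggregate(scores: Sequence[float], temperature: float = 0.5) -> float:
    """Combine per-turn scores in [0, 1] into a single score.

    Assumed scheme (not confirmed by the library): each turn is weighted by
    softmax(-score / temperature), so weaker turns receive more weight.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    logits = [-s / temperature for s in scores]
    peak = max(logits)  # subtract the largest logit for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    # Weighted mean of the scores, using the normalized softmax weights.
    return sum(w * s for w, s in zip(weights, scores)) / total


# Two in-character turns followed by one turn that breaks character.
turn_scores = [1.0, 1.0, 0.0]
strict = softmax_aggregate(turn_scores, temperature=0.1)   # close to the worst turn
lenient = softmax_aggregate(turn_scores, temperature=5.0)  # closer to the plain mean
passed = lenient >= 0.7  # compare against the metric's threshold
```

Under this assumed scheme, a low temperature behaves almost like taking the minimum turn score, while a high temperature approaches the arithmetic mean, which is one way to read "aggregation strictness".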

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | required | LLM model ("gpt-4o", "anthropic:claude-3-5-sonnet-latest", "google:gemini-2.0-flash", "ollama:llama3", or CustomLLMClient) |
| threshold | float | 0.7 | Minimum score to pass |
| temperature | float | 0.5 | Aggregation strictness |
| chatbot_role | str | None | Override the role from test case |

Required Fields

Requires a ConversationalEvalTestCase with chatbot_role defined.
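How the role on the test case interacts with the metric's chatbot_role parameter can be sketched as a simple precedence rule. The `Conversation` dataclass and the `resolve_role` helper below are hypothetical stand-ins written for illustration; they are not part of eval_lib, and the assumption that a metric-level value takes precedence is inferred only from the parameter's "override" description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    """Hypothetical stand-in for ConversationalEvalTestCase."""
    chatbot_role: Optional[str] = None


def resolve_role(conversation: Conversation, override: Optional[str] = None) -> str:
    """Return the role description the turns are judged against.

    Assumption: a metric-level override, when given, takes precedence over
    the role stored on the test case.
    """
    role = override if override is not None else conversation.chatbot_role
    if not role:
        raise ValueError("chatbot_role must be set on the test case or the metric")
    return role


case = Conversation(chatbot_role="You are a pirate captain named Blackbeard.")
from_case = resolve_role(case)
from_override = resolve_role(case, override="You are a formal butler.")
```

Failing fast when no role is available keeps the evaluation from silently scoring turns against an empty role description.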

Usage

from eval_lib import (
    RoleAdherenceMetric,
    ConversationalEvalTestCase,
    EvalTestCase,
    evaluate_conversations,
)
import asyncio

conversation = ConversationalEvalTestCase(
    chatbot_role="You are a pirate captain named Blackbeard. Always speak in pirate dialect and reference the sea.",
    turns=[
        EvalTestCase(
            input="What's the weather like today?",
            actual_output="Arrr, the skies be clear as the Caribbean waters, matey! A fine day for sailing the seven seas!"
        ),
        EvalTestCase(
            input="Can you help me with math?",
            actual_output="Aye, even a scallywag needs to count his doubloons! What arithmetic plagues ye, landlubber?"
        ),
        EvalTestCase(
            input="What is 5 + 3?",
            actual_output="The answer is 8."  # Breaks character!
        ),
    ]
)

metric = RoleAdherenceMetric(model="gpt-4o", threshold=0.7)
results = asyncio.run(evaluate_conversations([conversation], [metric]))
# Score might be ~0.7 due to last turn breaking character

Cost

2 LLM API calls per evaluation.

When to Use

  • Evaluating character-based chatbots (personas, role-playing)
  • Customer service bots with specific tone/brand guidelines
  • Educational tutors that should maintain a teaching style
  • Any AI with a defined persona or communication style