Role Adherence

Name: Eval AI Library
Author: Aleksandr Meshkov

The Role Adherence metric evaluates how well the AI maintains its assigned role/character throughout a conversation.

How It Works

Role Extraction — identifies the role description from the conversation setup
Turn Evaluation — generates verdicts for each conversation turn
Score Aggregation — combines per-turn verdicts using temperature-controlled softmax
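
The aggregation step can be illustrated with a short sketch. This is not the library's implementation. The function name `softmax_aggregate`, the assumption that each turn's verdict has already been mapped to a number in [0, 1], and the choice to weight turns by a softmax over `score / temperature` are all illustrative assumptions; the library may weight scores differently or apply further adjustments.

```python
import math
from typing import Sequence


def softmax_aggregate(scores: Sequence[float], temperature: float = 0.5) -> float:
    """Softmax-weighted average of per-turn scores (hypothetical helper).

    Each score is assumed to lie in [0, 1]. Weights are proportional to
    exp(score / temperature), which is an assumption of this sketch.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    logits = [s / temperature for s in scores]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


# Two in-character turns followed by one that breaks character.
turn_scores = [1.0, 1.0, 0.0]
for t in (0.1, 0.5, 5.0):
    print(f"temperature={t}: {softmax_aggregate(turn_scores, t):.3f}")
```

Under this particular weighting, a large temperature flattens the weights so the result approaches the plain mean of the scores, while a small temperature concentrates the weight on the highest-scoring turns.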

Parameters

Parameter	Type	Default	Description
`model`	`str`	required	LLM to use: a model string (`"gpt-4o"`, `"anthropic:claude-3-5-sonnet-latest"`, `"google:gemini-2.0-flash"`, `"ollama:llama3"`) or a `CustomLLMClient`
`threshold`	`float`	`0.7`	Minimum score to pass
`temperature`	`float`	`0.5`	Aggregation strictness
`chatbot_role`	`str`	`None`	Overrides the role defined in the test case
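
As a minimal illustration of how `threshold` is read, the sketch below treats it as an inclusive lower bound, which is an assumption drawn from the phrase "Minimum score to pass". The helper name `passes` is hypothetical and not part of the library.

```python
def passes(score: float, threshold: float = 0.7) -> bool:
    """Return True when the score meets the minimum (hypothetical helper).

    The inclusive comparison (>=) is an assumption of this sketch.
    """
    return score >= threshold


for score in (0.65, 0.7, 0.9):
    print(score, "pass" if passes(score) else "fail")
```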

Required Fields

Requires a `ConversationalEvalTestCase` with `chatbot_role` defined.

Usage

from eval_lib import (
    RoleAdherenceMetric,
    ConversationalEvalTestCase,
    EvalTestCase,
    evaluate_conversations,
)
import asyncio

conversation = ConversationalEvalTestCase(
    chatbot_role="You are a pirate captain named Blackbeard. Always speak in pirate dialect and reference the sea.",
    turns=[
        EvalTestCase(
            input="What's the weather like today?",
            actual_output="Arrr, the skies be clear as the Caribbean waters, matey! A fine day for sailing the seven seas!"
        ),
        EvalTestCase(
            input="Can you help me with math?",
            actual_output="Aye, even a scallywag needs to count his doubloons! What arithmetic plagues ye, landlubber?"
        ),
        EvalTestCase(
            input="What is 5 + 3?",
            actual_output="The answer is 8."  # Breaks character!
        ),
    ]
)

metric = RoleAdherenceMetric(model="gpt-4o", threshold=0.7)
results = asyncio.run(evaluate_conversations([conversation], [metric]))
# Score might be ~0.7 due to last turn breaking character

Cost

2 LLM API calls per evaluation.

When to Use

Evaluating character-based chatbots (personas, role-playing)
Customer service bots with specific tone/brand guidelines
Educational tutors that should maintain a teaching style
Any AI with a defined persona or communication style