Role Adherence
The Role Adherence metric evaluates how well the AI maintains its assigned role/character throughout a conversation.
How It Works
- Role Extraction — identifies the role description from the conversation setup
- Turn Evaluation — generates verdicts for each conversation turn
- Score Aggregation — combines per-turn verdicts using temperature-controlled softmax
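The aggregation step can be illustrated with a minimal sketch. The exact formula the library uses is not given here, so everything below is an assumption: the helper name `softmax_aggregate` is hypothetical, per-turn verdicts are assumed to be scores in [0, 1], and a lower temperature is assumed to mean stricter aggregation (low-scoring turns pull the result down harder).

```python
import math


def softmax_aggregate(scores, temperature=0.5):
    """Hypothetical aggregation: a softmax-weighted mean that leans toward low scores."""
    if not scores:
        raise ValueError("scores must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Assumed weighting: exp(-score / temperature), so weaker turns get larger weights.
    logits = [-s / temperature for s in scores]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(x - max_logit) for x in logits]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


turn_scores = [1.0, 1.0, 0.0]  # two in-character turns, one break
strict = softmax_aggregate(turn_scores, temperature=0.1)    # close to the worst turn
lenient = softmax_aggregate(turn_scores, temperature=10.0)  # close to the plain mean
print(strict < lenient)
```

Under this assumed weighting, the result approaches the minimum turn score as the temperature goes to zero and approaches the arithmetic mean as the temperature grows.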
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | required | LLM model ("gpt-4o", "anthropic:claude-3-5-sonnet-latest", "google:gemini-2.0-flash", "ollama:llama3", or CustomLLMClient) |
| threshold | float | 0.7 | Minimum score to pass |
| temperature | float | 0.5 | Aggregation strictness |
| chatbot_role | str | None | Override the role from test case |
Required Fields
Requires a ConversationalEvalTestCase with chatbot_role defined.
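How the test-case role and the metric-level `chatbot_role` override might interact can be sketched as follows. This is an illustration under assumptions, not the library's code: `Turn` and `Conversation` are stand-in dataclasses for `EvalTestCase` and `ConversationalEvalTestCase`, `resolve_role` is a hypothetical helper, and the precedence rule (the metric-level role wins when set) is inferred from the parameter description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Turn:
    """Stand-in for EvalTestCase (hypothetical, for illustration only)."""
    input: str
    actual_output: str


@dataclass
class Conversation:
    """Stand-in for ConversationalEvalTestCase (hypothetical, for illustration only)."""
    turns: List[Turn] = field(default_factory=list)
    chatbot_role: Optional[str] = None


def resolve_role(conversation: Conversation, metric_role: Optional[str] = None) -> str:
    """Pick the role to judge against; the metric-level role wins if given (assumed)."""
    role = metric_role if metric_role is not None else conversation.chatbot_role
    if not role or not role.strip():
        raise ValueError("chatbot_role must be set on the test case or on the metric")
    return role


pirate = Conversation(chatbot_role="You are a pirate captain named Blackbeard.")
print(resolve_role(pirate))                              # role from the test case
print(resolve_role(pirate, metric_role="You are a tutor."))  # metric-level override
```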
Usage
import asyncio

from eval_lib import (
    RoleAdherenceMetric,
    ConversationalEvalTestCase,
    EvalTestCase,
    evaluate_conversations,
)

conversation = ConversationalEvalTestCase(
    chatbot_role="You are a pirate captain named Blackbeard. Always speak in pirate dialect and reference the sea.",
    turns=[
        EvalTestCase(
            input="What's the weather like today?",
            actual_output="Arrr, the skies be clear as the Caribbean waters, matey! A fine day for sailing the seven seas!"
        ),
        EvalTestCase(
            input="Can you help me with math?",
            actual_output="Aye, even a scallywag needs to count his doubloons! What arithmetic plagues ye, landlubber?"
        ),
        EvalTestCase(
            input="What is 5 + 3?",
            actual_output="The answer is 8."  # Breaks character!
        ),
    ]
)

metric = RoleAdherenceMetric(model="gpt-4o", threshold=0.7)
results = asyncio.run(evaluate_conversations([conversation], [metric]))
# Score might be ~0.7 due to last turn breaking character
Cost
2 LLM API calls per evaluation.
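For budgeting a batch run, the figure above can be turned into a rough call count. This is a back-of-the-envelope sketch: `estimate_llm_calls` is a hypothetical helper, and it assumes that one "evaluation" means one conversation scored by this one metric.

```python
CALLS_PER_EVALUATION = 2  # figure stated in this section


def estimate_llm_calls(num_conversations: int,
                       calls_per_evaluation: int = CALLS_PER_EVALUATION) -> int:
    """Estimate total LLM API calls, assuming one evaluation per conversation."""
    if num_conversations < 0:
        raise ValueError("num_conversations must be non-negative")
    return num_conversations * calls_per_evaluation


print(estimate_llm_calls(250))  # 500
```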
When to Use
- Evaluating character-based chatbots (personas, role-playing)
- Customer service bots with specific tone/brand guidelines
- Educational tutors that should maintain a teaching style
- Any AI with a defined persona or communication style