Semantic Similarity

The Semantic Similarity metric computes the cosine similarity between the embeddings of actual_output and expected_output. Because it relies on embeddings alone, no LLM calls are required.

How It Works

graph TD
    A[actual_output] --> B[1. Generate Embedding]
    C[expected_output] --> D[2. Generate Embedding]
    B --> E[3. Cosine Similarity]
    D --> E
    E --> F[Final Score 0.0-1.0]
1. Embed actual output — converts actual_output to a vector representation
2. Embed expected output — converts expected_output to a vector representation
3. Cosine similarity — computes the cosine of the angle between the two vectors
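The final step above is plain vector math, not a model call. A minimal sketch of cosine similarity (the library's actual scoring code may differ, e.g. in how it handles zero vectors or clamps the result):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
```

Two embeddings pointing in the same direction score 1.0; orthogonal embeddings score 0.0.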

Parameters

Parameter Type Default Description
threshold float 0.7 Minimum score to pass
embedding_provider str "openai" Embedding provider ("openai" or "local")
model_name str provider default Embedding model name

Required Fields

Field Required
actual_output Yes
expected_output Yes
input No
retrieval_context No

Usage

from eval_lib.metrics.vector_metrics import SemanticSimilarityMetric
from eval_lib import EvalTestCase, evaluate
import asyncio

test_case = EvalTestCase(
    actual_output="Paris is the capital of France.",
    expected_output="The capital of France is Paris."
)

metric = SemanticSimilarityMetric(
    threshold=0.8,
    embedding_provider="openai",
    model_name="text-embedding-3-small"
)

results = asyncio.run(evaluate([test_case], [metric]))

Cost

1 embedding API call per evaluation (both texts are batched into a single request).
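The batching works because embedding endpoints accept a list of inputs and return one vector per input, so both texts can share a single request. A sketch using a hypothetical embed_batch stand-in (the name and the toy vectors are illustrative, not the library's API):

```python
def embed_batch(texts):
    # Hypothetical stand-in for one embeddings API request.
    # Real providers accept a list of inputs and return one
    # vector per input, in the same order; here we fabricate
    # tiny toy vectors instead of calling a provider.
    return [[float(len(t)), 1.0] for t in texts]

# Both texts travel in a single request, then get unpacked:
actual_vec, expected_vec = embed_batch([
    "Paris is the capital of France.",
    "The capital of France is Paris.",
])
```

One request instead of two halves the API round-trips per evaluation.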

Example Scenarios

High Score (0.95+)

EvalTestCase(
    actual_output="The cat sat on the mat.",
    expected_output="A cat was sitting on the mat."
)
# Semantically near-identical statements

Low Score (< 0.5)

EvalTestCase(
    actual_output="Python is a programming language.",
    expected_output="The snake slithered through the grass."
)
# Superficially related topics (Python the language vs. a snake), completely different meaning
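In both scenarios, pass/fail is just the score compared against the configured threshold (default 0.7). A minimal sketch of that decision:

```python
def passes(score, threshold=0.7):
    # A test case passes when the similarity score
    # meets or exceeds the threshold.
    return score >= threshold

print(passes(0.95))  # high-score scenario -> True
print(passes(0.45))  # low-score scenario -> False
```

Raise the threshold (e.g. 0.8 in the Usage example) to demand closer paraphrases; lower it to tolerate looser rewordings.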