# Test Case Generation

Eval AI Library includes a powerful test case generator that creates evaluation datasets from your documents. It supports 15+ document formats, including PDF, DOCX, CSV, JSON, HTML, and images with OCR.
## Supported Formats
| Category | Formats |
|---|---|
| Text | .txt, .md, .rtf, .xml, .json, .yaml, .html |
| Office | .pdf, .docx, .docm, .xlsx, .pptx |
| Data | .csv, .tsv |
| Images | .png, .jpg, .jpeg (with OCR via Tesseract) |
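If you want to batch-process a folder, a small helper can filter for these formats before calling the generator. This is an illustrative sketch, not part of the library; `find_supported_files` and `SUPPORTED_EXTENSIONS` are names introduced here:

```python
from pathlib import Path

# Extensions from the table above.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".rtf", ".xml", ".json", ".yaml", ".html",
    ".pdf", ".docx", ".docm", ".xlsx", ".pptx",
    ".csv", ".tsv",
    ".png", ".jpg", ".jpeg",
}

def find_supported_files(directory: str) -> list[Path]:
    """Return all files under `directory` with a supported extension."""
    return sorted(
        p for p in Path(directory).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```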
## Quick Start

```python
from eval_lib import DatasetGenerator

generator = DatasetGenerator(
    model="gpt-4o",
    input_format="question",
    expected_output_format="answer",
    agent_description="You are a helpful assistant that answers questions about machine learning.",
    max_rows=20,
    language="en"
)

# Returns a list of EvalTestCase objects
test_cases = generator.generate(file_path="./knowledge_base.pdf")

for tc in test_cases:
    print(f"Q: {tc.input}")
    print(f"A: {tc.expected_output}")
    print(f"Context: {tc.retrieval_context[:1]}")
    print("---")
```
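Generated test cases can also be persisted for later runs. The sketch below assumes only the attributes shown above (`input`, `expected_output`, `retrieval_context`); `save_test_cases` is a hypothetical helper, not a library function:

```python
import json

def save_test_cases(test_cases, path: str) -> None:
    """Write one JSON object per line (JSONL) with the fields shown above."""
    with open(path, "w", encoding="utf-8") as f:
        for tc in test_cases:
            f.write(json.dumps({
                "input": tc.input,
                "expected_output": tc.expected_output,
                "retrieval_context": tc.retrieval_context,
            }, ensure_ascii=False) + "\n")
```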
## Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str | required | LLM for generation |
| input_format | str | required | How inputs are formatted (e.g., "question") |
| expected_output_format | str | required | Expected answer format (e.g., "answer") |
| agent_description | str | None | System role/context for generation |
| test_types | list[str] | None | Types of test cases to generate |
| question_length | str | "mixed" | "short", "medium", "long", or "mixed" |
| question_openness | str | "mixed" | "open", "closed", or "mixed" |
| chunk_size | int | 1024 | Document chunk size (characters) |
| chunk_overlap | int | 100 | Overlap between chunks (characters) |
| max_rows | int | 10 | Number of test cases to generate |
| temperature | float | 0.3 | Generation temperature |
| trap_density | float | 0.1 | Proportion of trap/adversarial questions |
| language | str | "en" | Language for generated test cases |
| embedding_model | str | "openai:text-embedding-3-small" | Model for semantic similarity |
| relevance_margin | float | 1.5 | Threshold for context relevance |
## Document Loading

You can also use the document loader directly:

```python
from eval_lib import DocumentLoader

# Load documents
docs = DocumentLoader.load_documents("./data/report.pdf")

# Chunk for RAG evaluation
chunks = DocumentLoader.chunk_documents(docs, chunk_size=1024, overlap=100)
```
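To make `chunk_size` and `overlap` concrete, here is a minimal character-window splitter. It only illustrates the arithmetic (each chunk starts `chunk_size - overlap` characters after the previous one); the library's actual chunker may split differently, e.g. on sentence boundaries:

```python
def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 100) -> list[str]:
    """Split text into windows of `chunk_size` characters, where each window
    starts `chunk_size - overlap` characters after the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    stride = chunk_size - overlap
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - overlap, 1), stride)
    ]
```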
## Advanced Examples

### Generate from Multiple Sources

```python
generator = DatasetGenerator(
    model="gpt-4o",
    input_format="technical question",
    expected_output_format="detailed technical answer",
    max_rows=50,
    language="en"
)

# Generate from different document types
test_cases_pdf = generator.generate(file_path="./docs/api_reference.pdf")
test_cases_md = generator.generate(file_path="./docs/user_guide.md")
test_cases_csv = generator.generate(file_path="./data/faq.csv")

all_test_cases = test_cases_pdf + test_cases_md + test_cases_csv
```
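Combining datasets generated from several documents can produce near-duplicate questions. A simple deduplication pass on normalized input text, sketched here as a hypothetical helper (not part of the library), keeps the first occurrence of each question:

```python
def dedupe_by_input(test_cases):
    """Drop test cases whose input duplicates an earlier one
    (case- and whitespace-insensitive exact match)."""
    seen = set()
    unique = []
    for tc in test_cases:
        key = " ".join(tc.input.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(tc)
    return unique
```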
### Customize Question Types

```python
# Short, factual questions
generator = DatasetGenerator(
    model="gpt-4o",
    input_format="question",
    expected_output_format="brief factual answer",
    question_length="short",
    question_openness="closed",
    max_rows=30
)

# Open-ended, detailed questions
generator = DatasetGenerator(
    model="gpt-4o",
    input_format="question",
    expected_output_format="comprehensive answer with examples",
    question_length="long",
    question_openness="open",
    max_rows=15
)
```
### With Trap Questions

Trap questions test the AI's ability to say "I don't know" when the answer isn't in the provided context:

```python
generator = DatasetGenerator(
    model="gpt-4o",
    input_format="question",
    expected_output_format="answer",
    trap_density=0.2,  # 20% of questions will be traps
    max_rows=50
)
```
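The trap count is roughly trap_density × max_rows, so the configuration above should yield around 10 trap questions out of 50. A one-line sketch of that arithmetic (the library's actual sampling may differ, e.g. it may randomize per question):

```python
def expected_trap_count(max_rows: int, trap_density: float) -> int:
    """Approximate number of trap questions in a generated dataset."""
    return round(max_rows * trap_density)
```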
### Multilingual Generation

```python
# Russian
generator = DatasetGenerator(
    model="gpt-4o",
    input_format="вопрос",                     # "question"
    expected_output_format="подробный ответ",  # "detailed answer"
    language="ru",
    max_rows=20
)

# Spanish
generator = DatasetGenerator(
    model="gpt-4o",
    input_format="pregunta",
    expected_output_format="respuesta detallada",
    language="es",
    max_rows=20
)
```
## Use Generated Test Cases

```python
import asyncio

from eval_lib import evaluate, AnswerRelevancyMetric, FaithfulnessMetric

# Generate test cases
test_cases = generator.generate(file_path="./knowledge_base.pdf")

# Evaluate your RAG system
metrics = [
    AnswerRelevancyMetric(model="gpt-4o", threshold=0.7),
    FaithfulnessMetric(model="gpt-4o", threshold=0.7),
]
results = asyncio.run(evaluate(test_cases, metrics))
```
## OCR for Images

For image-based documents, ensure Tesseract is installed:

```python
# Extracts text from images using OCR
test_cases = generator.generate(file_path="./scanned_document.png")
```
See Installation for Tesseract setup instructions.