API Reference
Complete API documentation for MedExplain-Evals, the benchmark for evaluating audience-adaptive medical explanation quality in LLMs.
Core Modules
Quick Reference
Model Clients
The entry point for interacting with LLM providers.
```python
from src import UnifiedModelClient

client = UnifiedModelClient()
result = client.generate(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Explain diabetes"}]
)
```
See Model Clients for full documentation.
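Under the hood, a unified client typically routes each request to a provider-specific backend chosen from the model name. The sketch below illustrates that dispatch pattern with stub backends; the class name, prefix map, and `GenerationResult` fields are illustrative assumptions, not the actual MedExplain-Evals implementation.

```python
from dataclasses import dataclass

@dataclass
class GenerationResult:
    model: str
    text: str

class UnifiedClientSketch:
    # Map model-name prefixes to provider labels (assumed convention).
    PREFIXES = {"gpt": "openai", "claude": "anthropic", "gemini": "google"}

    def _provider_for(self, model: str) -> str:
        for prefix, provider in self.PREFIXES.items():
            if model.startswith(prefix):
                return provider
        raise ValueError(f"Unknown provider for model {model!r}")

    def generate(self, model: str, messages: list) -> GenerationResult:
        provider = self._provider_for(model)
        # A real client would call the provider SDK here; we echo instead.
        return GenerationResult(model=model, text=f"[{provider}] stub reply")

client = UnifiedClientSketch()
print(client.generate("gpt-5.2", [{"role": "user", "content": "hi"}]).text)
```

The benefit of routing on the model string is that benchmark code never branches on providers itself.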
Ensemble Judge
Multi-model evaluation with weighted ensemble scoring.
```python
from src import EnsembleLLMJudge

judge = EnsembleLLMJudge()
score = judge.evaluate(
    original_content="Type 2 DM with HbA1c 8.5%...",
    explanation="You have high blood sugar...",
    audience="patient"
)
print(f"Overall: {score.overall}/5.0")
```
See Ensemble Judge for full documentation.
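Weighted ensemble scoring reduces to a weighted mean over the per-judge scores. As a rough illustration (the judge names and weights below are made up, not the benchmark's configuration):

```python
def ensemble_score(scores: dict, weights: dict) -> float:
    # Weighted mean over the judges that actually returned a score,
    # renormalizing weights so a missing judge doesn't drag the result down.
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight

judge_scores = {"judge_a": 4.0, "judge_b": 5.0, "judge_c": 3.0}
judge_weights = {"judge_a": 0.5, "judge_b": 0.3, "judge_c": 0.2}
print(round(ensemble_score(judge_scores, judge_weights), 2))  # 4.1
```

Renormalizing over present judges keeps the score on the same 1-5 scale even when one judge model fails to respond.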
Audience Personas
Sophisticated audience modeling with 11 predefined personas.
```python
from src import PersonaFactory

persona = PersonaFactory.get_predefined_persona("patient_low_literacy")
print(persona.health_literacy)       # "low"
print(persona.reading_level_target)  # (6, 10)
```
See Audience Personas for full documentation.
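A `reading_level_target` grade range like `(6, 10)` suggests checking generated text against a readability formula. Below is a hedged sketch using the Flesch-Kincaid grade-level formula with a crude vowel-group syllable counter; the benchmark's actual readability method may differ.

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of consecutive vowels (including y).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    # 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59

def within_target(text: str, target: tuple) -> bool:
    # Check a persona's reading_level_target range, e.g. (6, 10).
    low, high = target
    return low <= fk_grade(text) <= high

text = "You have high blood sugar. Eat less sugar and walk daily."
print(round(fk_grade(text), 1))
```

Note that a range target fails in both directions: text can be too complex for the persona, but also simpler than the low end of the band.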
Knowledge Grounding
Medical knowledge base integration for factuality verification.
```python
from src import MedicalKnowledgeGrounder

grounder = MedicalKnowledgeGrounder()
score = grounder.ground_explanation(
    original="Diabetes mellitus type 2...",
    explanation="You have high blood sugar..."
)
print(f"Factual accuracy: {score.factual_accuracy}")
```
See Knowledge Grounding for full documentation.
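At its core, grounding asks how many of the explanation's medical terms are supported by the source after terminology normalization. The toy sketch below uses a hand-rolled synonym map in place of real UMLS/RxNorm lookups; every name in it is illustrative.

```python
# Stand-in for UMLS/RxNorm concept normalization (illustrative only).
SYNONYMS = {
    "high blood sugar": "hyperglycemia",
    "diabetes": "diabetes mellitus",
}

def normalize(term: str) -> str:
    return SYNONYMS.get(term, term)

def grounding_score(source_terms: set, explanation_terms: set) -> float:
    # Fraction of explanation terms that map to a concept in the source.
    source = {normalize(t) for t in source_terms}
    supported = [t for t in explanation_terms if normalize(t) in source]
    return len(supported) / len(explanation_terms) if explanation_terms else 1.0

score = grounding_score(
    {"diabetes mellitus", "hyperglycemia", "metformin"},
    {"high blood sugar", "diabetes", "insulin"},
)
print(score)  # 2 of 3 explanation terms supported
```

Normalizing before matching is what lets a lay phrase like "high blood sugar" count as grounded by the clinical term "hyperglycemia".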
Safety Evaluation
Comprehensive medical safety assessment.
```python
from src import MedicalSafetyEvaluator

evaluator = MedicalSafetyEvaluator()
score = evaluator.evaluate(
    explanation="Stop your medication...",
    medical_context="Cardiovascular"
)
print(f"Passed: {score.passed}")
```
See Safety Evaluation for full documentation.
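One common building block for medical safety assessment is rule-based screening for dangerous instruction patterns, such as advising medication cessation without clinician input. The sketch below is a simplified, assumed version of that idea; the patterns and pass/fail rule are not the evaluator's actual criteria.

```python
import re

# Illustrative red-flag patterns, not the evaluator's real rule set.
RED_FLAGS = [
    r"\bstop (?:taking )?(?:your|the) medication\b",
    r"\bdouble (?:your|the) dose\b",
    r"\bno need to see a doctor\b",
]

def safety_check(explanation: str):
    # Return (passed, matched_patterns); any red-flag match fails the check.
    hits = [p for p in RED_FLAGS if re.search(p, explanation.lower())]
    return (not hits, hits)

passed, hits = safety_check("Stop your medication if you feel dizzy.")
print(passed)  # False: cessation advice matched a red-flag pattern
```

In practice such rules would sit alongside model-based judgment, since pattern lists catch only phrasings they anticipate.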
Package Structure
```
src/
├── __init__.py              # Package exports
├── model_clients.py         # LLM provider clients
├── ensemble_judge.py        # Multi-model judge ensemble
├── audience_personas.py     # Audience modeling
├── knowledge_grounding.py   # Medical KB integration
├── safety_evaluator.py      # Safety assessment
├── evaluator.py             # Legacy evaluator
├── benchmark.py             # Benchmark runner
├── data_loaders_v2.py       # Dataset loading
├── validation.py            # Validation framework
└── multimodal_evaluator.py  # Image + text evaluation
```
Environment Variables
Required API keys for full functionality. The variable names below follow each provider's conventional naming; confirm them against your deployment.

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key for GPT models |
| `ANTHROPIC_API_KEY` | Anthropic API key for Claude models |
| `GOOGLE_API_KEY` | Google AI API key for Gemini models |
| `DEEPSEEK_API_KEY` | DeepSeek API key |
| `UMLS_API_KEY` | UMLS Terminology Services key |
| `AWS_ACCESS_KEY_ID` | AWS access key for Amazon Nova |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key |
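A quick preflight check for these keys can save a failed run partway through a benchmark. The helper below assumes the conventional provider variable names; adjust the list to match your configuration.

```python
import os

# Assumed conventional key names; edit to match your setup.
ASSUMED_KEYS = [
    "OPENAI_API_KEY",
    "ANTHROPIC_API_KEY",
    "GOOGLE_API_KEY",
    "DEEPSEEK_API_KEY",
    "UMLS_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
]

def missing_keys(env=os.environ):
    # Report keys that are unset or empty.
    return [k for k in ASSUMED_KEYS if not env.get(k)]

if __name__ == "__main__":
    print("Missing:", missing_keys())
```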
Common Imports
```python
# Core functionality
from src import (
    # Model clients
    UnifiedModelClient,
    GenerationResult,
    # Ensemble judge
    EnsembleLLMJudge,
    EnsembleScore,
    JudgeConfig,
    # Personas
    PersonaFactory,
    AudiencePersona,
    AudienceType,
    # Knowledge grounding
    MedicalKnowledgeGrounder,
    MedicalEntityExtractor,
    UMLSClient,
    RxNormClient,
    # Safety
    MedicalSafetyEvaluator,
    SafetyScore,
    DrugSafetyChecker,
)
```