API Reference

Complete API documentation for MedExplain-Evals, the benchmark for evaluating audience-adaptive medical explanation quality in LLMs.

Core Modules

MedExplain-Evals is organized around five core modules: model clients, the ensemble judge, audience personas, knowledge grounding, and safety evaluation. Each is summarized below and documented in full on its own page.

Quick Reference

Model Clients

The entry point for interacting with LLM providers.

from src import UnifiedModelClient

client = UnifiedModelClient()
result = client.generate(
    model="gpt-5.2",
    messages=[{"role": "user", "content": "Explain diabetes"}]
)
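
Because the client exposes one interface across providers, the same generate() call can be pointed at different models. Below is a minimal sketch of that pattern; the model identifiers are illustrative placeholders, and only the generate() signature shown above is assumed.

from src import UnifiedModelClient

client = UnifiedModelClient()

# Model identifiers are illustrative placeholders; use whichever providers you have keys for.
models_to_try = ["gpt-5.2", "claude-sonnet-4"]

results = {}
for model in models_to_try:
    # Same generate() call as above; only the model identifier changes.
    results[model] = client.generate(
        model=model,
        messages=[{"role": "user", "content": "Explain diabetes"}]
    )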

See Model Clients for full documentation.

Ensemble Judge

Multi-model evaluation with weighted ensemble scoring.

from src import EnsembleLLMJudge

judge = EnsembleLLMJudge()
score = judge.evaluate(
    original_content="Type 2 DM with HbA1c 8.5%...",
    explanation="You have high blood sugar...",
    audience="patient"
)
print(f"Overall: {score.overall}/5.0")
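
Conceptually, the ensemble score is a weighted average of per-judge scores. The snippet below illustrates that aggregation in plain Python; the judge names, scores, and weights are made-up values, and this is not the internal EnsembleLLMJudge implementation.

# Illustrative only: per-judge scores (1-5 scale) and weights are hypothetical values.
judge_scores  = {"judge_a": 4.5, "judge_b": 4.0, "judge_c": 3.5}
judge_weights = {"judge_a": 0.4, "judge_b": 0.4, "judge_c": 0.2}

# Weighted average of the per-judge scores, normalized by the total weight.
overall = sum(judge_scores[j] * judge_weights[j] for j in judge_scores)
overall /= sum(judge_weights.values())
print(f"Overall: {overall:.2f}/5.0")  # 4.10/5.0 for the values above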

See Ensemble Judge for full documentation.

Audience Personas

Sophisticated audience modeling with 11 predefined personas.

from src import PersonaFactory

persona = PersonaFactory.get_predefined_persona("patient_low_literacy")
print(persona.health_literacy)  # "low"
print(persona.reading_level_target)  # (6, 10)
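
The persona attributes shown above can also be used to steer generation. The sketch below is an illustrative pattern, not a documented integration; it assumes only the health_literacy and reading_level_target attributes.

from src import PersonaFactory

persona = PersonaFactory.get_predefined_persona("patient_low_literacy")

# Illustrative pattern only: build a generation prompt from the persona's attributes.
low, high = persona.reading_level_target  # grade range, e.g. (6, 10)
prompt = (
    f"Explain the diagnosis for a reader with {persona.health_literacy} health literacy, "
    f"targeting a grade {low}-{high} reading level."
)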

See Audience Personas for full documentation.

Knowledge Grounding

Medical knowledge base integration for factuality verification.

from src import MedicalKnowledgeGrounder

grounder = MedicalKnowledgeGrounder()
score = grounder.ground_explanation(
    original="Diabetes mellitus type 2...",
    explanation="You have high blood sugar..."
)
print(f"Factual accuracy: {score.factual_accuracy}")
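
The factual_accuracy field can also be used to rank candidate explanations. A sketch of that pattern, assuming only the ground_explanation() call and score field shown above; the candidate texts are placeholders.

from src import MedicalKnowledgeGrounder

grounder = MedicalKnowledgeGrounder()
original = "Diabetes mellitus type 2..."

# Candidate explanations are placeholders; rank them by the factual_accuracy field shown above.
candidates = [
    "You have high blood sugar...",
    "Your body no longer responds to insulin the way it should...",
]
scored = [
    (grounder.ground_explanation(original=original, explanation=c).factual_accuracy, c)
    for c in candidates
]
best_accuracy, best_explanation = max(scored)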

See Knowledge Grounding for full documentation.

Safety Evaluation

Comprehensive medical safety assessment.

from src import MedicalSafetyEvaluator

evaluator = MedicalSafetyEvaluator()
score = evaluator.evaluate(
    explanation="Stop your medication...",
    medical_context="Cardiovascular"
)
print(f"Passed: {score.passed}")
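
In a pipeline, passed can act as a hard gate. A minimal sketch reusing the score object from the example above; how failures are handled is left to the caller.

# Treat a failed safety check as a hard stop; error handling is up to the caller.
if not score.passed:
    raise RuntimeError("Explanation failed medical safety screening")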

See Safety Evaluation for full documentation.

Package Structure

src/
├── __init__.py              # Package exports
├── model_clients.py         # LLM provider clients
├── ensemble_judge.py        # Multi-model judge ensemble
├── audience_personas.py     # Audience modeling
├── knowledge_grounding.py   # Medical KB integration
├── safety_evaluator.py      # Safety assessment
├── evaluator.py             # Legacy evaluator
├── benchmark.py             # Benchmark runner
├── data_loaders_v2.py       # Dataset loading
├── validation.py            # Validation framework
└── multimodal_evaluator.py  # Image + text evaluation
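
All examples on this page import from the package root, which src/__init__.py re-exports. If the same classes are also importable from the modules shown in the tree (an assumption about the internal layout, not a documented guarantee), direct module imports would look like this:

# Assumption: each class can also be imported from its defining module in the tree,
# in addition to the package-root exports used throughout this page.
from src.model_clients import UnifiedModelClient
from src.ensemble_judge import EnsembleLLMJudge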

Environment Variables

Required API keys for full functionality:

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key for GPT models |
| ANTHROPIC_API_KEY | Anthropic API key for Claude models |
| GOOGLE_API_KEY | Google AI API key for Gemini models |
| DEEPSEEK_API_KEY | DeepSeek API key |
| UMLS_API_KEY | UMLS Terminology Services key |
| AWS_ACCESS_KEY_ID | AWS access key ID for Amazon Nova |
| AWS_SECRET_ACCESS_KEY | AWS secret access key for Amazon Nova |
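
A quick preflight check before running a benchmark, using only the standard library. The names come from the table above; in practice only the providers and services you actually call need to be configured.

import os

# Keys from the table above; only the providers you actually call need to be set.
required = [
    "OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "DEEPSEEK_API_KEY",
    "UMLS_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
]

missing = [name for name in required if not os.environ.get(name)]
if missing:
    print("Missing API keys:", ", ".join(missing))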

Common Imports

# Core functionality
from src import (
    # Model clients
    UnifiedModelClient,
    GenerationResult,

    # Ensemble judge
    EnsembleLLMJudge,
    EnsembleScore,
    JudgeConfig,

    # Personas
    PersonaFactory,
    AudiencePersona,
    AudienceType,

    # Knowledge grounding
    MedicalKnowledgeGrounder,
    MedicalEntityExtractor,
    UMLSClient,
    RxNormClient,

    # Safety
    MedicalSafetyEvaluator,
    SafetyScore,
    DrugSafetyChecker,
)