MedExplain-Evals Documentation
Welcome to MedExplain-Evals, a resource-efficient benchmark for evaluating audience-adaptive explanation quality in medical Large Language Models.
This project is developed as part of Google Summer of Code 2025, mentored by Google DeepMind.
Overview
MedExplain-Evals addresses a critical gap in medical AI evaluation by providing the first benchmark specifically designed to assess an LLM’s ability to generate audience-adaptive medical explanations for four key stakeholders (illustrated in the sketch after this list):
Physicians - Technical, evidence-based explanations
Nurses - Practical care implications and monitoring
Patients - Simple, empathetic, jargon-free language
Caregivers - Concrete tasks and warning signs
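As a rough illustration of what audience adaptation means in practice, the hypothetical sketch below pairs each audience with the style and focus described above. The class and field names are illustrative only and do not reflect the benchmark's actual data model.
# Hypothetical sketch: audience profiles behind audience-adaptive generation.
# Names and fields are illustrative, not the benchmark's actual data model.
from dataclasses import dataclass

@dataclass(frozen=True)
class AudienceProfile:
    name: str
    style: str   # tone and register the explanation should use
    focus: str   # what the explanation should emphasize

AUDIENCES = [
    AudienceProfile("physician", "technical, evidence-based", "mechanisms, guidelines, evidence"),
    AudienceProfile("nurse", "practical, clinical", "care implications and monitoring"),
    AudienceProfile("patient", "simple, empathetic, jargon-free", "what it means and what to do next"),
    AudienceProfile("caregiver", "plain, actionable", "concrete tasks and warning signs"),
]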
Key Features
Novel evaluation framework for audience-adaptive medical explanations
Support for MedQA-USMLE, iCliniq, and Cochrane Reviews datasets
Advanced safety metrics including contradiction and hallucination detection
Automated complexity stratification using the Flesch-Kincaid Grade Level (sketched after this list)
Interactive HTML leaderboards for result visualization
Multi-dimensional scoring with LLM-as-a-judge paradigm
Optimized for open-weight models on consumer hardware
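Complexity stratification relies on the standard Flesch-Kincaid Grade Level formula, 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59. The sketch below is a minimal, self-contained illustration with a naive syllable heuristic; the benchmark itself may rely on a dedicated readability library instead.
# Minimal Flesch-Kincaid Grade Level sketch with a rough syllable heuristic;
# the benchmark may use a dedicated readability library instead.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade_level(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

# Patient-facing explanations should generally land at a lower grade level
# than physician-facing ones.
print(round(fk_grade_level("Take one tablet with food every morning."), 1))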
Quick Start
pip install -r requirements.txt
from src.benchmark import MedExplain
from src.evaluator import MedExplainEvaluator
# Initialize benchmark
bench = MedExplain()
# Generate audience-adaptive explanations for the four target audiences
# (medical_content is the source text to explain; model is the LLM under evaluation)
explanations = bench.generate_explanations(medical_content, model)
# Evaluate explanations
evaluator = MedExplainEvaluator()
scores = evaluator.evaluate_all_audiences(explanations)
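The structure of the returned scores depends on the evaluator configuration. Continuing the snippet above, a hypothetical inspection step, assuming the result maps each audience to a dict of metric scores, might look like this.
# Hypothetical continuation: assumes `scores` maps audience -> {metric: value}.
for audience, metrics in scores.items():
    print(audience, {name: round(value, 2) for name, value in metrics.items()})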
Architecture
MedExplain-Evals is built with SOLID principles:
Strategy Pattern for audience-specific scoring (see the sketch after this list)
Dependency Injection for flexible component management
Configuration-driven design with YAML configuration
Comprehensive logging for debugging and monitoring
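As a rough sketch of how the Strategy pattern applies to audience-specific scoring, interchangeable scorer objects can be selected per audience. The class and method names below are illustrative, not the project's actual hierarchy.
# Illustrative Strategy-pattern sketch; class and method names are hypothetical.
from abc import ABC, abstractmethod

class AudienceScorer(ABC):
    @abstractmethod
    def score(self, explanation: str) -> float:
        ...

class PatientScorer(AudienceScorer):
    def score(self, explanation: str) -> float:
        # Placeholder: e.g., reward low reading grade level and absence of jargon.
        return 1.0

class PhysicianScorer(AudienceScorer):
    def score(self, explanation: str) -> float:
        # Placeholder: e.g., reward evidence citations and technical precision.
        return 1.0

SCORERS = {"patient": PatientScorer(), "physician": PhysicianScorer()}

def score_explanation(audience: str, explanation: str) -> float:
    # The scorer is chosen by audience, so new audiences plug in without
    # touching the calling code.
    return SCORERS[audience].score(explanation)
In this arrangement, dependency injection amounts to passing the scorer mapping into the evaluator rather than constructing it internally, which keeps audience-specific logic swappable and testable.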
Getting Help
Documentation
Primary documentation: This comprehensive guide covers installation, usage, and advanced topics
API Reference: Detailed function and class documentation with examples
Quickstart Guide
Installation Guide
Support Channels
Bug Reports: GitHub Issues
Questions: GitHub Discussions
Troubleshooting
# Verify installation
python -c "import src; print('MedExplain-Evals is working')"
# Run basic test
python run_benchmark.py --model_name dummy --max_items 2
Contributing
We welcome contributions:
Code contributions via Pull Requests
Bug reports and feature requests via Issues
Documentation improvements
Research collaborations
See our Contributing Guidelines.
Citation
@software{medexplain-evals-2025,
  title={MedExplain-Evals: A Resource-Efficient Benchmark for Evaluating Audience-Adaptive Explanation Quality in Medical Large Language Models},
  author={Cheng Hei Lam},
  year={2025},
  url={https://github.com/heilcheng/medexplain-evals}
}