# Contributing

Thank you for your interest in contributing to OpenEvals.

## Development Setup

### Fork and Clone

```bash
git clone https://github.com/YOUR_USERNAME/openevals.git
cd openevals
```

### Create Virtual Environment

```bash
python -m venv venv
source venv/bin/activate
```

### Install Development Dependencies

```bash
pip install -r requirements.txt
pip install -e .
```

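To confirm the editable install is picking up your checkout rather than a released package, a quick sanity check (a minimal sketch; it only assumes `openevals` is importable):

```python
import openevals

# Should print a path inside your clone, not site-packages.
print(openevals.__file__)
```
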
### Install Pre-commit Hooks

```bash
pip install pre-commit
pre-commit install
```

## Code Style

### Python

- Follow PEP 8 style guide
- Use Black for formatting
- Use isort for import sorting
- Maximum line length: 88 characters

```bash
black .
isort .
```

### TypeScript (Frontend)

- Use ESLint and Prettier
- Follow Next.js conventions

```bash
cd web/frontend
npm run lint
npm run format
```

## Testing

### Run All Tests

```bash
pytest tests/
```

### Run with Coverage

```bash
pytest --cov=openevals tests/
```

### Test Specific Module

```bash
pytest tests/test_benchmark.py -v
```

## Project Structure

```
openevals/
+-- openevals/
|   +-- core/           # Benchmark orchestration
|   +-- tasks/          # Benchmark implementations
|   +-- utils/          # Utilities and metrics
|   +-- visualization/  # Charts and reporting
|   +-- scripts/        # CLI entry points
+-- web/
|   +-- backend/        # FastAPI server
|   +-- frontend/       # Next.js app
+-- docs/               # Documentation
+-- tests/              # Test suite
+-- examples/           # Usage examples
```

## Contribution Guidelines

### Issues

- Check existing issues before creating new ones
- Use issue templates when available
- Provide clear reproduction steps for bugs
- Include system information (OS, Python version, GPU)

### Pull Requests

1. Create a branch from `main`:

   ```bash
   git checkout -b feature/your-feature-name
   ```

2. Make changes and commit:

   ```bash
   git commit -m "Add: description of change"
   ```

3. Push and create a PR:

   ```bash
   git push origin feature/your-feature-name
   ```

PR requirements:

- Clear description of changes
- Tests for new functionality
- Documentation updates if needed
- All tests passing

### Commit Message Format

```
<type>: <description>

[optional body]
```

Types:

- `Add` - New feature
- `Fix` - Bug fix
- `Docs` - Documentation
- `Refactor` - Code refactoring
- `Test` - Test additions/changes
- `Chore` - Build, CI, dependencies

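An illustrative example (the change described here is hypothetical):

```
Fix: handle empty dataset in accuracy computation

Return an accuracy of 0.0 instead of raising ZeroDivisionError
when a task has no examples.
```
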
## Adding New Benchmarks

1. Create a task file in `openevals/tasks/`:

   ```python
   from typing import Dict, Any

   from openevals.core.model_loader import ModelWrapper


   class NewTaskBenchmark:
       def __init__(self, config: Dict[str, Any]):
           self.config = config

       def load_data(self):
           pass

       def evaluate(self, model: ModelWrapper) -> Dict[str, Any]:
           return {"overall": {"accuracy": 0.0}}
   ```

2. Register in `openevals/tasks/__init__.py`
3. Add tests in `tests/test_new_task.py` (see the sketch below)
4. Document in `docs/`

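A minimal sketch of the test from step 3, assuming the skeleton above; the module path `openevals.tasks.new_task` and the assertions are illustrative, not a required layout:

```python
# tests/test_new_task.py -- illustrative sketch; adjust the import to the real module path.
from openevals.tasks.new_task import NewTaskBenchmark


def test_evaluate_reports_overall_accuracy():
    benchmark = NewTaskBenchmark(config={})
    benchmark.load_data()
    results = benchmark.evaluate(model=None)  # a real test would pass a loaded ModelWrapper
    assert "overall" in results
    assert 0.0 <= results["overall"]["accuracy"] <= 1.0
```
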
## Adding New Models

1. Add a loader in `openevals/core/model_loader.py`:

   ```python
   class NewModelLoader:
       def load_model(self, size: str, variant: str, **kwargs):
           pass
   ```

2. Register the model type in the configuration schema
3. Add tests and documentation (see the sketch below)

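As a rough sketch of a smoke test for a new loader (the arguments and expected return value are assumptions; mirror the existing loaders in `model_loader.py`):

```python
# tests/test_model_loader.py -- illustrative sketch only.
from openevals.core.model_loader import NewModelLoader


def test_new_model_loader_accepts_size_and_variant():
    loader = NewModelLoader()
    # The skeleton above returns None; a real loader should return a usable model wrapper.
    loader.load_model(size="small", variant="base")
```
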
## Questions

- Open a GitHub Discussion: https://github.com/heilcheng/openevals/discussions
- Check existing issues and documentation

## License

By contributing, you agree that your contributions will be licensed under the MIT License.