Contributing

Thank you for your interest in contributing to OpenEvals.

Development Setup

Fork and Clone

git clone https://github.com/YOUR_USERNAME/openevals.git
cd openevals

Create Virtual Environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Development Dependencies

pip install -r requirements.txt
pip install -e .
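
If the editable install worked, importing the package from a Python session should point back at your clone (a quick sanity check, nothing more):

import openevals

print(openevals.__file__)  # path should be inside your local clone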

Install Pre-commit Hooks

pip install pre-commit
pre-commit install

Code Style

Python

  • Follow PEP 8 style guide

  • Use Black for formatting

  • Use isort for import sorting

  • Maximum line length: 88 characters

black .
isort .
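
isort groups imports into standard library, third party, and local sections. A quick illustration of the expected ordering (the third-party module is just an example):

# Standard library
from pathlib import Path

# Third party (example)
import numpy as np

# Local
from openevals.core.model_loader import ModelWrapper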

TypeScript (Frontend)

  • Use ESLint and Prettier

  • Follow Next.js conventions

cd web/frontend
npm run lint
npm run format

Testing

Run All Tests

pytest tests/

Run with Coverage

pytest --cov=openevals tests/

Run a Specific Test Module

pytest tests/test_benchmark.py -v
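
New tests go under tests/ and follow standard pytest conventions. A minimal sketch is below; the class is a stand-in that mimics the nested result dict used by the benchmark skeleton under "Adding New Benchmarks", and real tests should import the actual task or utility under test:

# tests/test_example.py -- illustrative only


class FakeBenchmark:
    """Stand-in task returning the nested metric dict benchmarks produce."""

    def evaluate(self, model):
        return {"overall": {"accuracy": 0.5}}


def test_evaluate_reports_bounded_accuracy():
    results = FakeBenchmark().evaluate(model=None)
    assert 0.0 <= results["overall"]["accuracy"] <= 1.0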

Project Structure

openevals/
+-- openevals/
|   +-- core/           # Benchmark orchestration
|   +-- tasks/          # Benchmark implementations
|   +-- utils/          # Utilities and metrics
|   +-- visualization/  # Charts and reporting
|   +-- scripts/        # CLI entry points
+-- web/
|   +-- backend/        # FastAPI server
|   +-- frontend/       # Next.js app
+-- docs/               # Documentation
+-- tests/              # Test suite
+-- examples/           # Usage examples

Contribution Guidelines

Issues

  • Check existing issues before creating new ones

  • Use issue templates when available

  • Provide clear reproduction steps for bugs

  • Include system information (OS, Python version, GPU)

Pull Requests

  1. Create a branch from main:

    git checkout -b feature/your-feature-name
    
  2. Make changes and commit:

    git commit -m "Add: description of change"
    
  3. Push and create PR:

    git push origin feature/your-feature-name
    
  4. PR requirements:

    • Clear description of changes

    • Tests for new functionality

    • Documentation updates if needed

    • All tests passing

Commit Message Format

<type>: <description>

[optional body]

Types:

  • Add - New feature

  • Fix - Bug fix

  • Docs - Documentation

  • Refactor - Code refactoring

  • Test - Test additions/changes

  • Chore - Build, CI, dependencies
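
For example, subject lines might look like:

Add: summarization benchmark task
Fix: off-by-one error in accuracy aggregation
Docs: clarify GPU requirements in the install guide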

Adding New Benchmarks

  1. Create a task file in openevals/tasks/:

    from typing import Any, Dict

    from openevals.core.model_loader import ModelWrapper

    class NewTaskBenchmark:
        """Template for a new benchmark task."""

        def __init__(self, config: Dict[str, Any]):
            self.config = config

        def load_data(self):
            """Load or download the dataset this task evaluates on."""
            pass

        def evaluate(self, model: ModelWrapper) -> Dict[str, Any]:
            """Run the task and return nested metric results."""
            return {"overall": {"accuracy": 0.0}}
    
  2. Register the task in openevals/tasks/__init__.py (see the sketch after this list)

  3. Add tests in tests/test_new_task.py

  4. Document in docs/
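
For step 2, the registration mechanism lives in openevals/tasks/__init__.py. The snippet below is only a sketch that assumes a simple name-to-class registry; TASK_REGISTRY, get_task, and the new_task module name are illustrative, so adapt it to whatever the module actually exposes:

# openevals/tasks/__init__.py (sketch)
from openevals.tasks.new_task import NewTaskBenchmark

TASK_REGISTRY = {
    "new_task": NewTaskBenchmark,
}


def get_task(name: str):
    """Look up a benchmark class by the name used in configs."""
    return TASK_REGISTRY[name]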

Adding New Models

  1. Add a loader in openevals/core/model_loader.py (a fuller sketch follows this list):

    class NewModelLoader:
        """Loads checkpoints for a new model family."""

        def load_model(self, size: str, variant: str, **kwargs):
            pass
    
  2. Register the model type in the configuration schema

  3. Add tests and documentation
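
As a rough sketch of what a Hugging Face-backed loader might look like: the checkpoint mapping, repo IDs, and return value are assumptions, so match them to the actual ModelWrapper interface in model_loader.py.

from transformers import AutoModelForCausalLM, AutoTokenizer


class NewModelLoader:
    """Sketch of a loader for a Hugging Face-hosted model family."""

    # Hypothetical mapping from (size, variant) to hub repo IDs.
    CHECKPOINTS = {
        ("7b", "base"): "org/new-model-7b",
        ("7b", "instruct"): "org/new-model-7b-instruct",
    }

    def load_model(self, size: str, variant: str, **kwargs):
        repo_id = self.CHECKPOINTS[(size, variant)]
        tokenizer = AutoTokenizer.from_pretrained(repo_id)
        model = AutoModelForCausalLM.from_pretrained(repo_id, **kwargs)
        # Wrap or return according to what ModelWrapper actually expects.
        return model, tokenizer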

Questions

Open a GitHub issue if anything in this guide is unclear or you need help getting started.

License

By contributing, you agree that your contributions will be licensed under the MIT License.