Public Leaderboard
==================

OpenEvals includes a leaderboard system for tracking and comparing model performance.

Overview
--------

The leaderboard provides:

- Ranked model comparisons across benchmarks
- Historical performance tracking
- Public and private leaderboard modes
- Export capabilities for publications

Using the CLI
-------------

View Current Leaderboard
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   python -m openevals.scripts.leaderboard --task mmlu

Submit Results
^^^^^^^^^^^^^^

.. code-block:: bash

   python -m openevals.scripts.leaderboard --submit results.yaml

Filter by Model Family
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   python -m openevals.scripts.leaderboard --family gemma --task gsm8k

Input Formats
-------------

YAML Format
^^^^^^^^^^^

.. code-block:: yaml

   submission:
     model_name: "gemma-2b-it"
     model_family: "gemma"
     model_size: "2b"
   results:
     mmlu:
       overall: 0.65
       mathematics: 0.58
       computer_science: 0.72
     gsm8k:
       overall: 0.45

JSON Format
^^^^^^^^^^^

.. code-block:: json

   {
     "submission": {
       "model_name": "gemma-2b-it",
       "model_family": "gemma",
       "model_size": "2b"
     },
     "results": {
       "mmlu": {"overall": 0.65},
       "gsm8k": {"overall": 0.45}
     }
   }
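Validating a Submission
^^^^^^^^^^^^^^^^^^^^^^^

The submission schema is shown here only by example. As a rough pre-flight check before submitting, a sketch along these lines can catch obviously malformed files; the key layout it assumes is inferred purely from the examples above, and the ``check_submission`` helper is hypothetical rather than part of OpenEvals.

.. code-block:: python

   # Hypothetical pre-flight check for a submission file. The key layout is
   # inferred from the YAML/JSON examples above, not from an official schema.
   import yaml  # PyYAML; reads both formats, since YAML is a superset of JSON

   REQUIRED_META = {"model_name", "model_family", "model_size"}

   def check_submission(path: str) -> list[str]:
       """Return a list of problems found in a submission file (empty if none)."""
       with open(path) as f:
           doc = yaml.safe_load(f)
       problems = []
       meta = doc.get("submission", {})
       for key in sorted(REQUIRED_META - meta.keys()):
           problems.append(f"submission is missing '{key}'")
       results = doc.get("results", {})
       if not results:
           problems.append("no results section")
       for task, scores in results.items():
           overall = scores.get("overall")
           if not isinstance(overall, (int, float)) or not 0.0 <= overall <= 1.0:
               problems.append(f"{task}: 'overall' must be a number in [0, 1]")
       return problems

   if __name__ == "__main__":
       for problem in check_submission("results.yaml"):
           print("WARNING:", problem)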
Customization
-------------

Custom Ranking
^^^^^^^^^^^^^^

Configure ranking criteria:

.. code-block:: python

   from openevals.leaderboard import Leaderboard

   lb = Leaderboard()
   lb.set_ranking_weights({
       "mmlu": 0.3,
       "gsm8k": 0.2,
       "humaneval": 0.3,
       "arc": 0.2
   })
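The snippet above sets the weights but does not show the ranking math. A plausible reading is a weighted average of per-benchmark ``overall`` scores; the standalone sketch below illustrates that computation using the scores from the LaTeX example at the end of this page. It is a guess at the behavior, not the actual ``Leaderboard`` internals.

.. code-block:: python

   # Standalone sketch of a weighted-average ranking. It assumes the weights
   # given to set_ranking_weights() act as coefficients over per-benchmark
   # "overall" scores -- a plausible reading, not the actual OpenEvals code.
   weights = {"mmlu": 0.3, "gsm8k": 0.2, "humaneval": 0.3, "arc": 0.2}

   # Scores taken from the LaTeX example at the end of this page.
   scores = {
       "llama-3-70b": {"mmlu": 0.82, "gsm8k": 0.74, "humaneval": 0.68, "arc": 0.85},
       "gemma-27b":   {"mmlu": 0.75, "gsm8k": 0.68, "humaneval": 0.62, "arc": 0.78},
   }

   def composite(model_scores: dict) -> float:
       # Weighted sum; a benchmark the model has not run contributes zero.
       return sum(w * model_scores.get(task, 0.0) for task, w in weights.items())

   ranking = sorted(scores, key=lambda m: composite(scores[m]), reverse=True)
   for rank, model in enumerate(ranking, start=1):
       print(f"{rank}. {model}: {composite(scores[model]):.3f}")
   # 1. llama-3-70b: 0.768
   # 2. gemma-27b: 0.703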
Filtering Options
^^^^^^^^^^^^^^^^^

.. code-block:: python

   # Filter by model size
   lb.filter(min_size="7b", max_size="70b")

   # Filter by date
   lb.filter(after="2025-01-01")

   # Filter by benchmark score
   lb.filter(min_mmlu=0.5)

Deployment
----------

Web Interface
^^^^^^^^^^^^^

The web platform includes an interactive leaderboard:

.. code-block:: bash

   cd web/backend && uvicorn app.main:app --port 8000

Access at http://localhost:8000/api/v1/benchmarks/leaderboard.

API Endpoints
^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Endpoint
     - Description
   * - ``GET /leaderboard``
     - Retrieve current rankings
   * - ``POST /leaderboard/submit``
     - Submit new results
   * - ``GET /leaderboard/history``
     - View historical rankings
   * - ``GET /leaderboard/export``
     - Export as CSV/JSON
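As a quick smoke test against a local deployment, something along these lines should work. The paths come from the table above and the base URL from the ``uvicorn`` command; the POST payload is assumed to mirror the JSON submission format and is not a documented request body.

.. code-block:: python

   # Smoke test against a locally running backend. Endpoint paths come from
   # the table above; the POST body mirrors the JSON submission format and
   # is an assumption, not a documented request schema.
   import requests

   BASE = "http://localhost:8000"

   # Retrieve current rankings
   resp = requests.get(f"{BASE}/leaderboard")
   resp.raise_for_status()
   print(resp.json())

   # Submit new results
   payload = {
       "submission": {
           "model_name": "gemma-2b-it",
           "model_family": "gemma",
           "model_size": "2b",
       },
       "results": {"mmlu": {"overall": 0.65}, "gsm8k": {"overall": 0.45}},
   }
   resp = requests.post(f"{BASE}/leaderboard/submit", json=payload)
   resp.raise_for_status()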
Export Options
--------------

Export for Publications
^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

   python -m openevals.scripts.leaderboard --export latex --output table.tex

Available formats:

- LaTeX table
- Markdown table
- CSV
- JSON
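For a sense of what the non-LaTeX formats involve, the sketch below converts a hypothetical JSON export into a Markdown table. The record layout it assumes is a guess; the real export schema may differ.

.. code-block:: python

   # Sketch: convert a JSON leaderboard export into a Markdown table. The
   # record layout ({"model": ..., "<task>": <score>, ...}) is a guess at
   # the export schema, not a documented format.
   import json

   def to_markdown(path: str, tasks: list[str]) -> str:
       with open(path) as f:
           rows = json.load(f)  # assumed: a list of per-model records
       header = "| Model | " + " | ".join(t.upper() for t in tasks) + " |"
       rule = "|" + " --- |" * (len(tasks) + 1)
       lines = [header, rule]
       for row in rows:
           # Missing benchmarks are rendered as 0.00 for simplicity.
           cells = " | ".join(f"{row.get(t, 0.0):.2f}" for t in tasks)
           lines.append(f"| {row['model']} | {cells} |")
       return "\n".join(lines)

   print(to_markdown("leaderboard.json", ["mmlu", "gsm8k", "humaneval", "arc"]))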
Example LaTeX Output
^^^^^^^^^^^^^^^^^^^^

.. code-block:: latex

   \begin{table}[h]
   \centering
   \begin{tabular}{lcccc}
   \toprule
   Model & MMLU & GSM8K & HumanEval & ARC \\
   \midrule
   Llama 3 70B & 0.82 & 0.74 & 0.68 & 0.85 \\
   Gemma 27B & 0.75 & 0.68 & 0.62 & 0.78 \\
   \bottomrule
   \end{tabular}
   \caption{Model performance comparison}
   \end{table}