This repository is a working example for the SISAP 2026 Indexing Challenge (https://sisap-challenges.github.io/), built with Python and GitHub Actions.
git clone https://github.com/sisap-challenges/sisap26-python-baseline
cd sisap26-python-baseline

This repository requires Python 3.9+ and several dependencies. We provide a helper script for easy setup, or you can install manually.
Use the provided install script to set up a virtual environment and install dependencies (including CPU-optimized PyTorch):
chmod +x install.sh
./install.sh
source venv/bin/activate

- Install base requirements:
pip install -r requirements.txt
- Install CPU-only PyTorch (to avoid large CUDA downloads):
pip install torch~=2.4.0 --index-url https://download.pytorch.org/whl/cpu
Build and run using Docker:
docker build -t sisap-baseline .

The suggested approach is to run the Docker container as detailed in run_search.sh.
Running python eval.py results.csv produces a summary file of the results, with recall computed against the ground-truth data.
This CSV file can be further processed to create plots (using python plot.py --task {task1, task2, task3} res.csv) and to show the fastest solutions above a certain recall threshold (using python show_operating_points.py).
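As a rough illustration of what recall against the ground truth means here, the sketch below computes the mean recall@k over a set of queries. It is not the code in eval.py; the function name recall_at_k and the toy ID lists are assumptions made for this example, and it assumes each ground-truth list holds at least k entries.

```python
from typing import Sequence


def recall_at_k(
    found: Sequence[Sequence[int]],
    truth: Sequence[Sequence[int]],
    k: int,
) -> float:
    """Mean fraction of the true k nearest neighbors present in the results."""
    if len(found) != len(truth):
        raise ValueError("found and truth must cover the same queries")
    if not found:
        return 0.0
    total = 0.0
    for found_ids, true_ids in zip(found, truth):
        expected = set(true_ids[:k])
        # Order inside the top-k does not matter, only membership.
        hits = len(expected.intersection(found_ids[:k]))
        total += hits / k
    return total / len(found)


# Toy data (hypothetical): two queries, k = 3.
example_truth = [[1, 2, 3], [4, 5, 6]]
example_found = [[1, 3, 9], [6, 5, 4]]
example_recall = recall_at_k(example_found, example_truth, k=3)
print(f"recall@3 = {example_recall:.4f}")
```

The first query recovers two of its three true neighbors and the second recovers all three, so the mean is 5/6.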
Each dataset directory under data/ contains a config.json file that describes the task. The fields are:
| Field | Type | Description |
|---|---|---|
| task | string | Task identifier: "task1", "task2", or "task3" |
| data | string | HDF5 group containing the database vectors (e.g. "train") |
| queries | string | HDF5 path to the query vectors (task2/task3 only) |
| gt_I | string or array | HDF5 path(s) to the ground-truth nearest-neighbor indices |
| k | int | Number of nearest neighbors to retrieve |
| dataset_name | string | Human-readable dataset identifier |
| filename | string | Name of the HDF5 data file |
| sparse | bool | If true, vectors are sparse (task3 only); absent means false |
Example (task1): all-kNN — no separate query set; gt_I is a list of two HDF5 paths ["allknn", "knns"] pointing to the full neighbor graph.
Example (task2/3): query-search — queries and gt_I are single HDF5 paths for the query vectors and their ground-truth neighbors, respectively.
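A minimal sketch of reading such a config.json with the standard library follows. The helper name load_task_config is hypothetical, and the sample values for k, dataset_name, and filename are made up for illustration; only the field names and the "allknn"/"knns" paths come from the description above. The sketch normalizes gt_I to a list and treats a missing sparse field as false.

```python
import json


def load_task_config(text: str) -> dict:
    """Parse a task description and normalize its optional fields."""
    cfg = json.loads(text)
    gt = cfg["gt_I"]
    return {
        "task": cfg["task"],
        "data": cfg["data"],
        # task1 (all-kNN) has no separate query set.
        "queries": cfg.get("queries"),
        # gt_I may be one HDF5 path or a list of paths; always return a list.
        "gt_I": [gt] if isinstance(gt, str) else list(gt),
        "k": int(cfg["k"]),
        "dataset_name": cfg["dataset_name"],
        "filename": cfg["filename"],
        # An absent "sparse" field means dense vectors.
        "sparse": bool(cfg.get("sparse", False)),
    }


# Illustrative task1 config; the values are placeholders, not a real dataset.
sample_text = """
{
  "task": "task1",
  "data": "train",
  "gt_I": ["allknn", "knns"],
  "k": 15,
  "dataset_name": "example-dataset",
  "filename": "example.h5"
}
"""

sample_cfg = load_task_config(sample_text)
print(sample_cfg["task"], sample_cfg["gt_I"], sample_cfg["sparse"])
```

Normalizing gt_I up front lets the rest of a search script handle task1 and task2/3 configs through the same code path.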
You can fork this repository and adapt it to create your solution. Please also take care of the CI workflow (see below).
You can monitor your runs in the "Actions" tab of your GitHub repository. For instance, you can see some runs of this repository at https://github.com/sisap-challenges/sisap26-python-baseline/actions
Install the TIRA CLI:
pip3 install --upgrade tira

Then do a dry run to verify that your approach works locally (use task-2-spot-check-20260528-training or task-3-spot-check-20260528-training if your approach only targets task 2 or task 3, respectively):
tira-cli code-submission \
--path . \
--command '/app/search.py --input $inputDataset/*.h5 --task-description $inputDataset/config.json --output $outputDir' \
--task sisap-2026 \
--dataset task-1-spot-check-20260528-training \
--dry-run

- Go to https://www.tira.io/ and sign up (or log in via GitHub).
- Navigate to the SISAP 2026 Indexing Challenge at https://www.tira.io/task-overview/sisap-2026 and click Register.
- Optional: add team members via https://www.tira.io/g?type=my.
Obtain your authentication token:
- Navigate to https://www.tira.io/task-overview/sisap-2026
- Click submit => Code Submissions => New Submission => I want to submit from my local machine
Then authenticate:
tira-cli login --token AUTH-TOKEN

Verify the setup:

tira-cli verify-installation --task sisap-2026 --team YOUR-TEAM

Submit by re-running the command from Step 1 without --dry-run:
tira-cli code-submission \
--path . \
--command '/app/search.py --input $inputDataset/*.h5 --task-description $inputDataset/config.json --output $outputDir' \
--task sisap-2026 \
--dataset task-1-spot-check-20260528-training

- Navigate to https://www.tira.io/task-overview/sisap-2026
- Click submit => Code Submissions, select your submission
- Choose a dataset and the resources on which it should run
It is sufficient to run your submission on a few datasets; the organizers will script execution on all datasets once everything looks correct.
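The --command string used in the submission steps passes three flags to /app/search.py: --input, --task-description, and --output. The sketch below shows one way a script could accept that interface with argparse. It is not the repository's actual search.py; the helper name build_parser, the help texts, and the choice to accept several input files (in case the *.h5 glob matches more than one) are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface matching the flags in the TIRA command."""
    parser = argparse.ArgumentParser(
        description="Run the search for one dataset (illustrative sketch)."
    )
    parser.add_argument(
        "--input", type=Path, nargs="+", required=True,
        help="HDF5 file(s) produced by expanding $inputDataset/*.h5",
    )
    parser.add_argument(
        "--task-description", type=Path, required=True,
        help="path to the dataset's config.json",
    )
    parser.add_argument(
        "--output", type=Path, required=True,
        help="directory where results are written",
    )
    return parser


# Parse an explicit argument list so the example runs without a real dataset;
# the paths below are placeholders.
demo_args = build_parser().parse_args(
    [
        "--input", "data.h5",
        "--task-description", "config.json",
        "--output", "results",
    ]
)
print(demo_args.input, demo_args.task_description, demo_args.output)
```

argparse maps --task-description to the attribute task_description, so the config path is available as demo_args.task_description.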