RDL-Bench

RDL-Bench is a benchmark for task-aware question answering over relational databases. It contains natural-language questions, task metadata, schema information, temporal context, executable evidence or target-construction logic, and structured ground-truth answers.

Repository Contents

.
├── data/
│   ├── instances/        # Released benchmark instances and metadata
│   └── json/             # Cleaned JSON files and generation/profiling utilities
├── scripts/              # Dataset construction and validation scripts
├── DB_MANIFEST.txt       # File listing for DuckDB database files
├── mysql_csv_MANIFEST.txt # File listing for source CSV files
├── rdlbench/             # Evaluation pipeline used in the RDL-Bench paper
├── requirements.txt
├── DATASET_CARD.md
└── CITATION.cff

Data Download

The lightweight repository contains benchmark metadata, instance files, and dataset construction scripts.

Large files are provided through the GitHub Release:

Release page: RDL-Bench

Please download the following assets from the release page:

dataset.zip: benchmark instance files
DB.zip: database files
mysql_csv.zip: CSV exports
json_cleaned.zip: cleaned JSON construction files

After downloading, place the files under the corresponding directories described in DATASET_CARD.md.

Data Format

Each benchmark instance is represented as JSON with the following main components:

instance_id: unique instance identifier.
metadata: task type, answer nature, complexity, involved tables, and temporal fields when available.
question: primary natural-language question and optional variants.
evidence: SQL evidence for reasoning tasks or target-construction logic for predictive tasks.
ground_truth: structured answer stored in raw_value.

Primary evaluation should use the structured raw_value field rather than natural-language answer surface forms.
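
As a minimal sketch of what scoring against raw_value could look like, the snippet below compares a structured prediction with the ground truth instead of matching answer strings. The example instance, the inner keys task_type and primary, the string form of evidence, and the matches_raw_value helper with its float tolerance and order-insensitive list comparison are all illustrative assumptions, not the scoring procedure used in the paper.

```python
import json
import math

# Hypothetical instance: only the top-level keys follow the format described above.
EXAMPLE = json.loads("""
{
  "instance_id": "demo-0001",
  "metadata": {"task_type": "reasoning"},
  "question": {"primary": "How many orders were placed in 2020?"},
  "evidence": "SELECT COUNT(*) FROM orders WHERE year = 2020",
  "ground_truth": {"raw_value": 42}
}
""")


def matches_raw_value(prediction, raw_value, tol=1e-6):
    """Compare a structured prediction with ground_truth.raw_value."""
    # bool is a subclass of int, so handle it before the numeric branch
    if isinstance(raw_value, bool) or isinstance(prediction, bool):
        return prediction is raw_value
    if isinstance(raw_value, (int, float)) and isinstance(prediction, (int, float)):
        return math.isclose(prediction, raw_value, rel_tol=tol, abs_tol=tol)
    if isinstance(raw_value, list) and isinstance(prediction, list):
        # Order-insensitive comparison via canonical JSON encodings
        def canon(items):
            return sorted(json.dumps(item, sort_keys=True) for item in items)
        return canon(prediction) == canon(raw_value)
    return prediction == raw_value
```

A call such as matches_raw_value(42, EXAMPLE["ground_truth"]["raw_value"]) then depends only on the structured value, not on how the answer is phrased.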

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Typical Workflow

The scripts are organized by construction stage:

1_data_install.py              # Export source relational tables to CSV
2_data_2db.py                  # Convert CSV tables to DuckDB databases
3_generate_metajson.py         # Generate metadata JSON files
5_sample_Context.py            # Sample context information
6_repeat_check.py              # Check duplicate or repeated instances
7_generate_raw_value1.py       # Execute SQL and fill structured ground truth
8_generate_nl_answer.py        # Generate formatted natural-language answers
9_check.py                     # Run final consistency checks

Some scripts require setting the target domain name and paths before execution. Avoid committing machine-specific absolute paths.
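
One way to keep machine-specific paths out of committed code is to derive every path from a repository root supplied at run time. The sketch below assumes a hypothetical RDLBENCH_ROOT environment variable and a per-domain subdirectory layout. Neither is defined by the released scripts, so adapt the names to the actual layout.

```python
import os
from pathlib import Path


def resolve_paths(domain, repo_root=None):
    """Build per-domain paths relative to the repository root (assumed layout)."""
    # RDLBENCH_ROOT is a hypothetical variable; fall back to the current directory
    root = Path(repo_root or os.environ.get("RDLBENCH_ROOT", ".")).resolve()
    return {
        "domain": domain,
        "instances": root / "data" / "instances" / domain,
        "json": root / "data" / "json" / domain,
    }
```

A script can then call resolve_paths("my_domain") once and read every location from the returned dictionary, so no absolute path needs to be written into the file.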

Notes

Use only records before the reference time for predictive feature construction.
Evaluate reasoning tasks against structured ground truth.
For instances whose answer is a set, the value stored in the released JSON files may be truncated for readability and file-size control, so it should not be treated as the complete gold answer. To obtain the complete set-valued answer, reconstruct it from the evidence field. For reasoning instances, this usually means executing the provided SQL query against the corresponding database.
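
The reconstruction step for set-valued answers can be sketched as follows. To stay runnable without extra dependencies, the example uses an in-memory sqlite3 database as a stand-in. With the released files, one would presumably open the matching DuckDB database instead, whose connection object also offers execute(...).fetchall(). The table, the query, and the reconstruct_set_answer helper are hypothetical.

```python
import sqlite3


def reconstruct_set_answer(conn, sql):
    """Execute the evidence SQL and return the complete answer as a set."""
    rows = conn.execute(sql).fetchall()
    # Unwrap single-column rows so the result is a set of scalars
    return {row[0] if len(row) == 1 else tuple(row) for row in rows}


# Stand-in database (assumption): replace with a connection to the real database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "DE"), (2, "FR"), (3, "DE")],
)

evidence_sql = "SELECT id FROM users WHERE country = 'DE'"  # hypothetical evidence
full_answer = reconstruct_set_answer(conn, evidence_sql)
print(sorted(full_answer))  # prints [1, 3]
```

Comparing a prediction against full_answer, rather than against the truncated value stored in the JSON file, avoids penalizing correct elements that were cut from the stored answer.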

Configuration

config.yaml controls all paths and hyperparameters:

data_dir:   ../json          # directory with per-dataset JSON files
db_dir:     ../DB            # directory with *.duckdb databases
output_dir: results          # where result JSON files are written
model_name: claude-sonnet-4-6
api_key:    ""               # leave blank if using ANTHROPIC_API_KEY env var
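
The blank api_key convention above suggests a fallback from the config value to the ANTHROPIC_API_KEY environment variable. A minimal sketch of that lookup follows, assuming the YAML file has already been parsed into a dictionary. The resolve_api_key helper is hypothetical and may differ from what the rdlbench pipeline actually does.

```python
import os


def resolve_api_key(config, env_var="ANTHROPIC_API_KEY"):
    """Return the key from the config, falling back to the environment."""
    key = (config.get("api_key") or "").strip()
    if key:
        return key
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"No API key in config and {env_var} is not set")
    return key


# Parsed form of the config shown above, as a YAML loader would return it
config = {
    "data_dir": "../json",
    "db_dir": "../DB",
    "output_dir": "results",
    "model_name": "claude-sonnet-4-6",
    "api_key": "",
}
```

Leaving api_key blank in the committed file and supplying the key through the environment keeps credentials out of version control.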

Citation

If you use RDL-Bench, please cite the paper associated with this repository.
