7Bench: a Comprehensive Benchmark for Layout-guided Text-to-image Models

Elena Izzo*, Luca Parolari*, Davide Vezzaro*, Lamberto Ballan
Department of Mathematics, University of Padova, Padova, Italy

🌟 Accepted to 23rd International Conference on Image Analysis and Processing (ICIAP) 2025, 15-19 September 2025, Rome, Italy

*These authors contributed equally to this work and are listed in alphabetical order.

Abstract

Layout-guided text-to-image models offer greater control over the generation process by explicitly conditioning image synthesis on the spatial arrangement of elements. As a result, their adoption has increased in many computer vision applications, ranging from content creation to synthetic data generation. A critical challenge is achieving precise alignment between the image, textual prompt, and layout, ensuring semantic fidelity and spatial accuracy. Although recent benchmarks assess text alignment, layout alignment remains overlooked, and no existing benchmark jointly evaluates both. This gap limits the ability to evaluate a model's spatial fidelity, which is crucial when using layout-guided generation for synthetic data, as errors can introduce noise and degrade data quality. In this work, we introduce 7Bench, the first benchmark to assess both semantic and spatial alignment in layout-guided text-to-image generation. It features text-and-layout pairs spanning seven challenging scenarios, investigating object generation, color fidelity, attribute recognition, inter-object relationships, and spatial control. We propose an evaluation protocol that builds on existing frameworks by incorporating the layout alignment score to assess spatial accuracy. Using 7Bench, we evaluate several state-of-the-art diffusion models, uncovering their respective strengths and limitations across diverse alignment tasks. The benchmark is available at https://github.com/Elizzo/7Bench.

Benchmark

Benchmark	N. samples	N. scenarios	Layout
7bench.csv	224	7	✔️

Format

The benchmark is distributed as a CSV file with one row per sample and the following columns:

Column	Optional	Description
`id`	-	Unique identifier for each sample.
`category`	-	One of the seven 7Bench scenarios: object_binding, object_relationship, overlapping_bboxes, small_bboxes, color_binding, attribute_binding and complex_composition.
`prompt`	-	Full natural-language prompt describing the scene.
layout
`obj1`	-	First object mentioned in the prompt (always present).
`bbox1`	-	Bounding box for `obj1`, formatted as `(x_min, y_min, x_max, y_max)`. Coordinates are in 512x512 space.
`obj2`	yes	Second object referenced in the prompt.
`bbox2`	yes	Bounding box for `obj2`.
`obj3`	yes	Third object referenced in the prompt.
`bbox3`	yes	Bounding box for `obj3`.
`obj4`	yes	Fourth object, used in complex scenarios.
`bbox4`	yes	Bounding box for `obj4`.
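As an illustration of the format above, here is a minimal sketch of how one might read the file and turn each row into a prompt plus a list of (object, box) pairs. It rests on several assumptions not confirmed by this README: that the CSV is comma-delimited with a header row, that each bounding box cell is a literal string such as `(128, 64, 384, 320)`, and that unused optional columns are left empty. The helper names (`parse_row`, `normalize`, `load_benchmark`) and the two sample rows are hypothetical and are not part of the benchmark.

```python
import ast
import csv
import io

CANVAS = 512  # bounding boxes are given in 512x512 space


def parse_row(row):
    """Return (prompt, [(object_name, (x_min, y_min, x_max, y_max)), ...])."""
    layout = []
    for i in range(1, 5):  # obj1..obj4; obj2..obj4 are optional
        name = (row.get(f"obj{i}") or "").strip()
        bbox = (row.get(f"bbox{i}") or "").strip()
        if not name or not bbox:
            continue  # assumption: an unused optional slot is an empty cell
        box = tuple(float(v) for v in ast.literal_eval(bbox))
        if len(box) != 4:
            raise ValueError(f"bbox{i} does not have four coordinates: {bbox!r}")
        layout.append((name, box))
    return row["prompt"], layout


def normalize(box, size=CANVAS):
    """Scale pixel coordinates to [0, 1], for models that expect relative boxes."""
    return tuple(v / size for v in box)


def load_benchmark(fileobj):
    """Parse every row of an open CSV file object."""
    return [parse_row(row) for row in csv.DictReader(fileobj)]


# Made-up rows for illustration only; they are NOT taken from 7bench.csv.
SAMPLE = '''id,category,prompt,obj1,bbox1,obj2,bbox2,obj3,bbox3,obj4,bbox4
0,object_binding,"a dog on a bench",dog,"(128, 64, 384, 320)",bench,"(0, 256, 512, 512)",,,,
1,small_bboxes,"a red apple",apple,"(200, 200, 264, 264)",,,,,,
'''

if __name__ == "__main__":
    for prompt, layout in load_benchmark(io.StringIO(SAMPLE)):
        print(prompt, [(name, normalize(box)) for name, box in layout])
```

To try this on the real file, replace `io.StringIO(SAMPLE)` with `open("7bench.csv", newline="")`, and adjust the delimiter or the box parsing if the actual file differs from the assumptions above.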

Reproducibility

We provide the code for the models we use to generate the images, the images we generated by running these models on the 7Bench prompts, and the code used to evaluate the generated images.

Models Under Evaluation

Model
Stable Diffusion 1.4
Cross Attention Guidance
GLIGEN
BoxDiff
Attention Refocusing

Generated Images

Images
7bench-SD14
7bench-SD_CAG
7bench-G
7bench-G_BD
7bench-G_AR

Evaluation Pipeline

The generated images can be evaluated using our pipeline. The source code of the evaluation pipeline for layout-guided models, adapted from TIFA, can be found at:

Evaluation Pipeline

Further Resources

Other resources (e.g. a small prompt collection for testing) can be found here.

Citation

If you use 7Bench or any part of this repository in your research, please consider citing our work:

Proceedings coming soon...
