Implementation of the paper: GOAL: Global-local Object Alignment Learning
Visit our project page for additional information and interactive examples:
Our implementation is also available as a Docker image:

```shell
# Pull the image
docker pull qkenr0804/goal:goal
```

Download our fine-tuned weights from the links below:
- 🔍 ViT-Base16 Model: GOAL method fine-tuned with DOCCI
- 🔍 ViT-Base16 Model: GOAL method fine-tuned with DCI
- 🔍 ViT-Large14 Model: GOAL method fine-tuned with DOCCI
- 🔍 ViT-Large14 Model: GOAL method fine-tuned with DCI
Please download the datasets from the links below:
- DOCCI Dataset
- DCI Dataset
For our newly proposed evaluation protocols on the DCI and ShareGPT4V test sets, please refer to the JSON files in the datasets folder of this repository.
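The exact schema of those protocol files is defined by the JSON files in the datasets folder; as a rough sketch only, assuming each entry pairs an image with a global caption and several local (object-level) captions, they could be loaded like this:

```python
import json

def load_protocol(path):
    """Load an evaluation protocol JSON file and return its entries."""
    with open(path) as f:
        return json.load(f)

# Hypothetical entry layout -- the real field names come from the files in
# the repository's datasets folder, not from this sketch.
sample = [
    {"image": "images/0001.jpg",
     "global_caption": "A dog playing in a park.",
     "local_captions": ["a brown dog", "green grass", "a red ball"]},
]

with open("protocol_sample.json", "w") as f:
    json.dump(sample, f)

entries = load_protocol("protocol_sample.json")
print(len(entries), entries[0]["image"])
```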
You can fine-tune CLIP with the GOAL method using goal_loss_finetuning.py.
You can adjust the datasets, output path, and other options in get_args_parser().
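The authoritative option list lives in get_args_parser() inside goal_loss_finetuning.py; the sketch below only illustrates the pattern, and every flag name and default here is an assumption rather than the repository's actual interface:

```python
import argparse

def get_args_parser():
    # Illustrative only: the real get_args_parser() in goal_loss_finetuning.py
    # defines the actual arguments and defaults.
    parser = argparse.ArgumentParser("GOAL fine-tuning", add_help=False)
    parser.add_argument("--dataset", default="docci", choices=["docci", "dci"],
                        help="which dense-caption dataset to fine-tune on")
    parser.add_argument("--output_dir", default="./output",
                        help="where to store checkpoints and logs")
    parser.add_argument("--batch_size", default=64, type=int)
    parser.add_argument("--lr", default=1e-5, type=float)
    return parser

args = get_args_parser().parse_args([])  # parse with defaults only
print(args.dataset, args.output_dir)
```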
```shell
python goal_loss_finetuning.py
```

Use your fine-tuned weights:
You can evaluate retrieval scores using retrieval_goal.py.
You can evaluate mAP scores on the global+local joint test set using mAP_goal_jointtest.py.
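The scripts above implement the paper's evaluation protocols; as a minimal, dependency-free sketch of the underlying metrics (Recall@K for retrieval, average precision for the mAP protocol), assuming a precomputed image-text similarity matrix where query i's ground-truth match is item i:

```python
def recall_at_k(sim, k):
    """Fraction of queries whose ground-truth item (index i for row i)
    appears among the top-k retrieved indices."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)

def average_precision(ranked_relevance):
    """AP over a ranked list of 0/1 relevance labels."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / max(hits, 1)

# Toy 3x3 similarity matrix (rows: queries, columns: gallery items).
sim = [[0.9, 0.1, 0.2],
       [0.3, 0.2, 0.8],
       [0.1, 0.7, 0.6]]
print(recall_at_k(sim, 1))           # only query 0 ranks its match first
print(average_precision([1, 0, 1]))  # (1/1 + 2/3) / 2
```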
```shell
python retrieval_goal.py --ckpt <path/to/your/weight>
python mAP_goal_jointtest.py --ckpt <path/to/your/weight>
```

You can extract attention maps with your custom weights using visualization_attentionmap.py.
```shell
python visualization_attentionmap.py --image_path <path/to/your/image> --output_path <path/to/your/output> --model L --ckpt <path/to/your/weight>
```

If you find this work useful in your research, please consider citing:
```bibtex
@inproceedings{Hyungyu_2025_CVPR,
    author={Choi, Hyungyu and Jang, Young Kyun and Eom, Chanho},
    title={GOAL: Global-local Object Alignment Learning},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2025}
}
```