image captions using blip. #204
ad-claw000
left a comment
This workflow looks like a great addition. LGTM!
ad-claw000
left a comment
Great addition! However, I noticed a few issues that should be addressed before this is merged:
- Hardcoded `batch_size`: In `apps/caption-image/app/images.py`, `self.batch_size = 32` is hardcoded inside `FindImageQueryGenerator.__init__`. It ignores the `batch_size` parameter passed from the CLI in `caption_images.py`. You should pass `batch_size` into the generator constructor and use it there.
- Pagination logic during updates: The query generator fetches batches using `batch_id: idx` while filtering on `wf_caption_image == None`. Because the response handler updates these entities and removes the `None` condition, the total number of matching images changes dynamically. Depending on how ApertureDB evaluates `batch_id`, this could lead to skipping images (e.g. batch 1 shifts into batch 0's place after batch 0 is updated). A safer approach might be to not rely on `batch_id` and instead repeatedly request the first `N` items (e.g., `limit`), or use a stable identifier for pagination.
- Module-level Model Loading: The `AutoProcessor` and `BlipForConditionalGeneration` are loaded at the module level in `images.py`. This means they are loaded into memory as soon as the module is imported, even if just running `--help`. Consider lazy-loading them inside the class or function, or at least only when the command is actually executed.
Please let me know when these are updated!
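The pagination concern raised in this review can be illustrated with a small in-memory simulation. This is only a sketch: the dictionary stands in for the set of images, `uncaptioned` stands in for the `wf_caption_image == None` filter, and the slicing in `caption_with_offsets` is an assumption about how a batch index might be evaluated against a shrinking result set. It does not reproduce ApertureDB's actual `batch_id` semantics.

```python
"""Simulate paging over a result set that shrinks as items are updated."""


def uncaptioned(images):
    """Return ids of images whose caption is still None, in insertion order."""
    return [img_id for img_id, caption in images.items() if caption is None]


def caption_with_offsets(images, batch_size):
    """Page by batch index while the filtered result shrinks underneath."""
    batch_idx = 0
    while True:
        matches = uncaptioned(images)  # re-evaluated on every request
        batch = matches[batch_idx * batch_size:(batch_idx + 1) * batch_size]
        if not batch:
            break
        for img_id in batch:
            images[img_id] = f"caption for {img_id}"
        batch_idx += 1
    return images


def caption_first_n(images, batch_size):
    """Always request the first batch_size matches until none remain."""
    while True:
        batch = uncaptioned(images)[:batch_size]
        if not batch:
            break
        for img_id in batch:
            images[img_id] = f"caption for {img_id}"
    return images


def make_images(n):
    """Build n hypothetical images, all without a caption."""
    return {i: None for i in range(n)}


skipped = uncaptioned(caption_with_offsets(make_images(10), batch_size=3))
remaining = uncaptioned(caption_first_n(make_images(10), batch_size=3))
print("offset paging left uncaptioned:", skipped)
print("first-N paging left uncaptioned:", remaining)
```

In this simulation the offset-based loop leaves images 3, 4, 5 and 9 uncaptioned, while the first-N loop captions everything. Note that the first-N loop only terminates if every processed image stops matching the filter, so an image whose update fails would need to be marked in some way to avoid being requested again.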
- Pass batch_size from CLI down to QueryGenerator
- Replace batch_id pagination with limit to handle dynamic properties
- Lazy-load AutoProcessor and Blip model to improve startup time
I've pushed a commit to address the review comments:
Let me know if this looks good to go!
Pull request overview
This PR introduces a new caption-image workflow that auto-generates image captions using the HuggingFace BLIP model, adds container/build plumbing for it, and adds devcontainer configurations to run workflows against a local ApertureDB stack.
Changes:
- Added a new `apps/caption-image` workflow that finds uncaptioned images and writes captions back to ApertureDB.
- Integrated the new app into CI builds and added a Docker build-time "warmup/validation" step.
- Added devcontainer configs (compose + scripts) to spin up a local ApertureDB + Lenz + WebUI stack for multiple workflows.
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| workflows-devcontainer.code-workspace | Workspace config to open workflows repo alongside ../app. |
| postinstall.sh | Devcontainer post-create script to configure adb and install shell completion. |
| initcommand.sh | Devcontainer init script (build base image + generate .env). |
| configuration_params.py | Emits ADB_PORT based on platform (mac vs others). |
| base/docker/scripts/sitecustomize.py | Improves global exception hook (docstring + avoids shadowing type). |
| apps/caption-image/requirements.txt | Adds Python dependency on transformers. |
| apps/caption-image/README.md | Documents the new caption-image workflow and usage. |
| apps/caption-image/Dockerfile | Builds caption-image image; installs torch/torchvision + transformers; runs validation. |
| apps/caption-image/app/warmup_validate.py | Build-time BLIP warmup + caption assertion. |
| apps/caption-image/app/images.py | QueryGenerator implementation: fetches images, runs BLIP, updates captions. |
| apps/caption-image/app/caption_images.py | Typer-based entrypoint for running the caption workflow. |
| apps/caption-image/app/app.sh | Container entrypoint script that runs caption_images.py. |
| .vscode/launch.json | Debug configuration for Python. |
| .gitignore | Ignores aperturedb/ directories (local dev data). |
| .github/workflows/main.yml | Adds caption-image to the CI build matrix. |
| .devcontainer/dataset-ingestion/docker-compose.yml | Local stack compose file for dataset-ingestion devcontainer. |
| .devcontainer/dataset-ingestion/devcontainer.json | VS Code devcontainer definition for dataset-ingestion. |
| .devcontainer/crawl-website/docker-compose.yml | Local stack compose file for crawl-website devcontainer. |
| .devcontainer/crawl-website/devcontainer.json | VS Code devcontainer definition for crawl-website. |
| .devcontainer/caption-image/docker-compose.yml | Local stack compose file for caption-image devcontainer. |
| .devcontainer/caption-image/devcontainer.json | VS Code devcontainer definition for caption-image. |
- Moved configuration_params.py to .devcontainer
- Updated initcommand.sh to loop over all devcontainers
- Provided ADB_PORT default in docker-compose.yml files
- Updated images.py to correctly use batching, add PyTorch inference context, fix missing DONE state, handle execution query errors
- Fixes to warmup_validate.py to avoid external network request
- Replaced PIP commands with requirements.txt
- Addressed logging and env var issues in caption_images.py
- Add threading.Lock to get_model_and_processor lazy init
- Remove unused desc_blobs variable
- Change PRELOAD_MODEL default to false in Dockerfile
- Replace torchvision with pillow in requirements.txt
- Fix WORKFLOW_VERSION quoting in initcommand.sh
- Add caption-image service to docker-compose.yml
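A lock-protected lazy initializer of the kind this commit describes might look like the sketch below. Only the name `get_model_and_processor` comes from the commit message. The body, the module-level variables, and the `_load` placeholder (which stands in for the expensive model and processor loading) are assumptions made for illustration, not the code in `images.py`.

```python
import threading

_lock = threading.Lock()
_model = None
_processor = None
load_count = 0  # exists only to show that loading happens once


def _load():
    """Hypothetical placeholder for the expensive model/processor load."""
    global load_count
    load_count += 1
    return object(), object()


def get_model_and_processor():
    """Return the shared (model, processor) pair, loading it on first use."""
    global _model, _processor
    if _model is None:  # fast path: skip the lock once initialized
        with _lock:
            if _model is None:  # re-check while holding the lock
                model, processor = _load()
                _processor = processor
                _model = model  # assigned last so the fast path never sees a half-set pair
    return _model, _processor


results = []


def worker():
    results.append(get_model_and_processor())


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("loads performed:", load_count)
```

Nothing is loaded at import time, so a command such as `--help` that never calls `get_model_and_processor` pays no loading cost, and concurrent first callers still trigger a single load.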
Replaced the
- Use count query + server side batch in FindImageQueryGenerator
- Add validation for batch_size > 0
- Switch to RGB and handle decode exceptions
- Simplify uniqueids-captions zipping to skip failed
- Update log levels to check WF_LOG_LEVEL
- Add PRELOAD_MODEL flag to docker-compose.yml
- Add proper evaluation and no_grad to warmup_validate
- Clean up docs to reflect python None
- Add set -euo pipefail to bash scripts
I've restored the stable identifier for pagination to prevent skipping images (using
- Use batching in FindImage to avoid memory explosion (resolves #3322118023)
- Fix double-checked locking thread-safety issue (resolves #3322118041)
- Differentiate and abort on transient/system errors (resolves #3322118049)
…or handling
- Implement standard RUN_ONCE and SLEEPING_TIME loop in app.sh, and add status_tools.py progress reporting.
- Add RUN_ONCE to caption-image service in docker-compose.yml for consistency.
- Handle len(uniqueids) != len(r_blobs) mismatch by marking images as failed instead of silently looping.
- Handle per-image generation failures by appending to failed list instead of raising RuntimeError and aborting the batch.
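The two failure-handling changes in this commit (length mismatch and per-image errors) can be sketched as follows. The names `uniqueids` and `r_blobs` appear in the commit message. The function `pair_captions`, its return shape, and the `fake_caption` stand-in for the captioning call are hypothetical and only illustrate the idea of collecting failures instead of aborting.

```python
def pair_captions(uniqueids, r_blobs, caption_fn):
    """Split a batch into (captioned, failed) without aborting on one bad image."""
    if len(uniqueids) != len(r_blobs):
        # Ids and blobs cannot be aligned reliably, so mark the whole batch failed.
        return {}, list(uniqueids)
    captioned, failed = {}, []
    for uid, blob in zip(uniqueids, r_blobs):
        try:
            captioned[uid] = caption_fn(blob)
        except Exception:
            # One undecodable image should not stop the rest of the batch.
            failed.append(uid)
    return captioned, failed


def fake_caption(blob):
    """Hypothetical stand-in for the captioning call; rejects empty blobs."""
    if not blob:
        raise ValueError("empty image")
    return f"{len(blob)} bytes"


ok, bad = pair_captions(["a", "b", "c"], [b"xx", b"", b"yyyy"], fake_caption)
print(ok, bad)

mismatch_ok, mismatch_bad = pair_captions(["a", "b"], [b"xx"], fake_caption)
print(mismatch_ok, mismatch_bad)
```

Returning the failed ids lets the caller record a failure marker on those images, so they no longer match the "uncaptioned" filter and are not fetched again on the next iteration.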
…kipping images during updates
I have addressed the review feedback by restoring the stable identifier (
Addresses the review comment to capture return values and handle non-zero status for update failures.
I have restored the stable identifier for pagination to prevent skipping images (using
volumes:
  - ./aperturedb/certificate:/cert

volumes:
  - ./aperturedb/certificate:/etc/lenz/certificate

volumes:
  - ./aperturedb/db:/aperturedb/db
  - ./aperturedb/logs:/aperturedb/logs

volumes:
  - ./aperturedb/certificate:/etc/nginx/certificate

aperturedb:
  image: aperturedata/aperturedb-community:latest

aperturedb:
  condition: service_started
Adds auto generation of image captions using BLIP.
https://huggingface.co/docs/transformers/main/en/model_doc/blip#transformers.BlipForConditionalGeneration
TODO:
Add tests: Adding a validation at build time with a basic script.