This project provides an agentic workflow for the yt volumetric data analysis library. It allows users to perform complex data analysis tasks using natural language queries, guided by a curated knowledge base of yt best practices.
## Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd yt-agent-project
   ```

2. Install dependencies: This project uses `pyproject.toml`. You can install it in editable mode:

   ```bash
   pip install -e .
   ```

   Note: You will need the `google-genai` and `yt` packages.

3. Set up your API key: The agent uses Google's Gemini models. You need to set your API key as an environment variable:

   ```bash
   export GOOGLE_API_KEY="your_api_key_here"
   ```
## Usage

Start a chat session with the agent:

```bash
python main.py --interactive
```

Run a specific query and immediately execute the generated code:

```bash
python main.py "Load snapshot_001.h5 and print the field list" --execute
```

## Extending the Knowledge Base

The agent's "brain" is a collection of Markdown files located in `yt_agent/knowledge_base/`. You can expand its capabilities by adding new topics, examples, and documentation.
### Interactive Training

The easiest way to teach the agent a new concept is to use the interactive training mode. The agent will interview you about a topic and generate the documentation file for you.

```bash
python main.py --train
```

You will be prompted for:

- A filename (e.g., `phase_plots`)
- A title for the topic
- A description of the concept
- A code example
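From these four answers the agent can assemble a standard knowledge base entry. The sketch below shows one way such a file could be written; `write_entry` is a hypothetical helper for illustration, not the agent's actual implementation:

```python
import tempfile
from pathlib import Path

def write_entry(directory, filename, title, description, code_example):
    # Hypothetical helper -- the real --train implementation may differ.
    # Mirrors the "# Title / prose / fenced code" entry layout.
    body = f"# {title}\n\n{description}\n\n```python\n{code_example}\n```\n"
    path = Path(directory) / f"{filename}.md"
    path.write_text(body)
    return path

entry = write_entry(tempfile.mkdtemp(), "phase_plots", "Phase Plots",
                    "How to build a phase plot of density vs. temperature.",
                    "import yt\nds = yt.load('snapshot_001.h5')")
```

In practice the agent writes the file into `yt_agent/knowledge_base/` so it is picked up on the next run.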
### Ingesting Notebooks

If you have existing yt analysis notebooks (`.ipynb`), you can automatically convert them into knowledge base entries. The tool extracts Markdown cells and code cells to preserve the context and logic.
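The cell extraction described above can be sketched in a few lines, since `.ipynb` files are plain JSON. This is a hypothetical illustration of the conversion, not the actual `--ingest` implementation:

```python
import json
from pathlib import Path

def notebook_to_markdown(ipynb_path):
    # Hypothetical sketch: Markdown cells pass through as prose,
    # code cells become fenced blocks, keeping context and logic
    # side by side. The real ingestion tool may differ.
    nb = json.loads(Path(ipynb_path).read_text())
    parts = []
    for cell in nb["cells"]:
        source = "".join(cell["source"])
        if cell["cell_type"] == "markdown":
            parts.append(source)
        elif cell["cell_type"] == "code":
            parts.append(f"```python\n{source}\n```")
    return "\n\n".join(parts)
```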
```bash
python main.py --ingest path/to/notebook.ipynb
```

You can also ingest multiple notebooks at once:

```bash
python main.py --ingest notebook1.ipynb notebook2.ipynb
```

### Manual Entries

You can also simply create new Markdown (`.md`) files in `yt_agent/knowledge_base/`. Ensure they follow this structure for best results:
````markdown
# Topic Title

Explanation of the concept...

```python
import yt
# Best practice code example
```
````

## Context Caching

As your knowledge base grows, sending all documentation with every query can become expensive. This agent automatically uses Gemini context caching to minimize costs.
- On startup, the agent checks if the current knowledge base matches an existing cache in your Google Cloud project.
- **If a match is found:** the cache is reused. You do not pay for ingestion again; you only pay for the query tokens.
- **If no match is found:** the knowledge base is uploaded and a new cache is created (valid for 2 hours).
This means repeated runs or interactive sessions share the same "brain" without re-uploading data, making it efficient for analyzing multiple datasets in sequence.
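One plausible way to implement the startup check described above is to fingerprint the knowledge base and compare that fingerprint against existing caches. The sketch below shows only the hashing step; `kb_fingerprint` is a hypothetical helper, not the agent's actual code:

```python
import hashlib
from pathlib import Path

def kb_fingerprint(kb_dir):
    # Deterministic hash over all knowledge-base files, in sorted order
    # so the result is stable across runs. Hypothetical sketch; the
    # agent's actual cache-matching logic may differ.
    digest = hashlib.sha256()
    for path in sorted(Path(kb_dir).glob("*.md")):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()

# The agent could store this fingerprint in the cache's display name and,
# on startup, reuse any existing cache whose name matches; otherwise it
# would upload the files and create a fresh cache with a 2-hour TTL.
```

Any edit to a knowledge base file changes the fingerprint, which is what forces a new cache to be created after you add or modify entries.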