
feat: make LLM and embedding model configurable via YAML config#112

Open
AaryanCode69 wants to merge 2 commits into reactome:main from AaryanCode69:feat/configurable-llm-embedding

Conversation

@AaryanCode69

Summary

Make LLM and embedding model/provider configurable via the existing YAML config system instead of being hard-coded in AgentGraph.__init__().

Problem

The model provider and model name were hard-coded in AgentGraph.__init__():

llm: BaseChatModel = get_llm("openai", "gpt-4o-mini")
embedding: Embeddings = get_embedding("openai", "text-embedding-3-large")

This made it impossible to switch models, providers, or base URLs without directly modifying source code — limiting experimentation, self-hosting with local models (e.g., Ollama), and deployment flexibility.

Solution

Introduce a ModelsConfig Pydantic model and wire it through the existing YAML config pipeline so users can control models declaratively:

# config.yml / config_default.yml
models:
  llm:
    provider: openai        # or "ollama"
    model: gpt-4o-mini      # any model supported by the provider
    # base_url: http://localhost:11434  # optional, for self-hosted endpoints
  embedding:
    provider: openai         # or "huggingfacehub", "huggingfacelocal"
    model: text-embedding-3-large
    # device: cpu            # optional, for local HuggingFace models
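
A minimal sketch of what the new schema could look like (field names and defaults are taken from this PR's description and backward-compatibility notes; the exact implementation lives in src/util/config_yml/models.py):

```python
from typing import Optional
from pydantic import BaseModel, ConfigDict

class LLMConfig(BaseModel):
    model_config = ConfigDict(extra="ignore")  # skip unknown keys (see follow-up commit)
    provider: str = "openai"                   # or "ollama"
    model: str = "gpt-4o-mini"
    base_url: Optional[str] = None             # e.g. http://localhost:11434 for Ollama

class EmbeddingConfig(BaseModel):
    model_config = ConfigDict(extra="ignore")
    provider: str = "openai"                   # or "huggingfacehub", "huggingfacelocal"
    model: str = "text-embedding-3-large"
    device: Optional[str] = None               # e.g. "cpu" or "cuda" for local HF models

class ModelsConfig(BaseModel):
    model_config = ConfigDict(extra="ignore")
    llm: LLMConfig = LLMConfig()
    embedding: EmbeddingConfig = EmbeddingConfig()
```

Instantiating ModelsConfig() with no arguments reproduces the previously hard-coded defaults, which is what makes the models key optional.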

AgentGraph now reads from config instead of hard-coded values:

llm: BaseChatModel = get_llm(
    llm_cfg.provider, llm_cfg.model, base_url=llm_cfg.base_url
)
embedding: Embeddings = get_embedding(
    emb_cfg.provider, emb_cfg.model, device=emb_cfg.device
)

Changes

  • src/util/config_yml/models.py: new LLMConfig, EmbeddingConfig, and ModelsConfig Pydantic models with sensible defaults
  • src/util/config_yml/__init__.py: added a models: ModelsConfig field to Config with a default, so existing configs without the key still work
  • src/agent/graph.py: AgentGraph.__init__() accepts an optional ModelsConfig and reads provider/model from config instead of hard-coded strings
  • bin/chat-chainlit.py: passes config.models through to AgentGraph at startup
  • config_default.yml: added a models section documenting the current default values

Backward Compatibility

  • The models key in YAML is optional: ModelsConfig defaults to openai/gpt-4o-mini and openai/text-embedding-3-large, matching the previously hard-coded values.
  • AgentGraph accepts models_config=None and falls back to the same defaults.
  • Existing config.yml files without a models section continue to work without modification.
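
The fallback behavior above can be sketched as follows (hypothetical, trimmed shape of AgentGraph.__init__(); the real get_llm/get_embedding wiring is in src/agent/graph.py):

```python
from typing import Optional
from pydantic import BaseModel

class LLMConfig(BaseModel):
    provider: str = "openai"
    model: str = "gpt-4o-mini"
    base_url: Optional[str] = None

class ModelsConfig(BaseModel):
    llm: LLMConfig = LLMConfig()

class AgentGraph:
    # Sketch: only the config-resolution part of __init__ is shown
    def __init__(self, models_config: Optional[ModelsConfig] = None):
        cfg = models_config or ModelsConfig()  # None -> previous hard-coded defaults
        self.llm_provider = cfg.llm.provider
        self.llm_model = cfg.llm.model

graph = AgentGraph()  # no config passed: behaves exactly like before
print(graph.llm_provider, graph.llm_model)  # openai gpt-4o-mini
```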

Example: Switching to Ollama

models:
  llm:
    provider: ollama
    model: llama3
    base_url: http://localhost:11434
  embedding:
    provider: huggingfacelocal
    model: BAAI/bge-small-en-v1.5
    device: cuda

No code changes required — just update the YAML config and restart.
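
As a sanity check, an override like the one above can be loaded and validated in a few lines (a sketch assuming PyYAML and a Pydantic schema shaped like the one this PR adds):

```python
from typing import Optional
import yaml
from pydantic import BaseModel

class LLMConfig(BaseModel):
    provider: str = "openai"
    model: str = "gpt-4o-mini"
    base_url: Optional[str] = None

doc = yaml.safe_load("""
models:
  llm:
    provider: ollama
    model: llama3
    base_url: http://localhost:11434
""")
llm = LLMConfig(**doc["models"]["llm"])  # validates types, fills missing fields
print(llm.provider, llm.base_url)        # ollama http://localhost:11434
```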

How This Enables Future MCP Integration

  • Decouples model selection from orchestration logic — the AgentGraph no longer owns provider/model decisions, making it easier for an MCP server to initialize its own LLM instances from the same shared config.
  • Establishes a config-driven pattern — the new ModelsConfig Pydantic model provides a validated, extensible schema. Future MCP settings (server URL, transport, tool registrations) can follow the same pattern and live alongside it in config.yml.
  • Reduces hard-coded coupling identified as a blocker — the architecture analysis explicitly listed hard-coded model selection as a limitation for MCP integration. This PR removes that limitation.
  • Supports diverse deployment topologies — MCP servers may run with different model backends (e.g., a local Ollama instance for development, OpenAI for production). Config-driven model selection makes this seamless without code forks.

Related Issue

Resolves #108

Previously, the model provider and model name were hard-coded in
AgentGraph.__init__() as get_llm("openai", "gpt-4o-mini") and
get_embedding("openai", "text-embedding-3-large"). This made it
impossible to switch models without modifying source code.

- Add llm and embedding configuration fields to the YAML config schema
- Update Pydantic config models to validate new model settings
- Update AgentGraph to read provider/model from YAML config instead
  of hard-coded values
- Retain existing defaults for backward compatibility

Resolves reactome#108

The existing Config Pydantic model rejected unknown fields, so adding a
`models` key to config.yml caused the entire Config to fail validation,
returning None and silently disabling messages, rate limits, and feature
flags. Setting extra="ignore" lets the parser skip unrecognized keys
while still loading all known configuration correctly.
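
The failure mode the second commit describes can be reproduced in isolation (hypothetical minimal Config; extra="forbid" stands in for whatever made the original model reject unknown keys):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

raw = {"app_name": "chat", "models": {"llm": {"provider": "ollama"}}}

class ForbidConfig(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown keys fail validation
    app_name: str = "app"

class IgnoreConfig(BaseModel):
    model_config = ConfigDict(extra="ignore")  # unknown keys are skipped
    app_name: str = "app"

try:
    ForbidConfig(**raw)
    forbid_failed = False
except ValidationError:
    forbid_failed = True  # the whole config is rejected because of "models"

cfg = IgnoreConfig(**raw)  # loads fine; the unrecognized "models" key is dropped
print(forbid_failed, cfg.app_name)
```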
