Turn natural language questions into executable SQL against your local SQLite database. This project uses LangChain with local Ollama models and runs on the ultra-fast uv Python package manager.
- Automatic schema extraction from `amazon.db`
- Natural language to SQL using a local LLM (Ollama)
- Deterministic SQL generation (`temperature=0`)
- Easy model switching (e.g. DeepSeek-R1 vs. QwenCoder)
- Simple Python API (`get_data_from_database(prompt)`) for integration
- Fast environment setup and dependency management via `uv`
- Python 3.11+
- SQLite (`amazon.db`)
- SQLAlchemy for schema introspection
- LangChain Core + Community + Ollama integration
- Ollama local models (recommended: `qwen2.5-coder:7b` for speed)
- `uv` for dependency resolution, syncing, and running
- Extract the schema using the SQLAlchemy inspector
- Feed the schema + user question to an LLM prompt template
- Model returns a SQL query (reasoning models may include `<think>` blocks; we strip or avoid them)
- Execute the SQL safely against `amazon.db`
- Return the results as a list
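The schema-extraction step can be sketched with the standard-library `sqlite3` module (the project itself uses SQLAlchemy's inspector, but the result is equivalent; `fetch_schema` is an illustrative name, not the function in `main.py`):

```python
import sqlite3

def fetch_schema(db_path: str) -> str:
    """Return a text description of every table and its columns."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
        lines = []
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
            cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
            col_desc = ", ".join(f"{c[1]} {c[2]}" for c in cols)
            lines.append(f"{table}({col_desc})")
        return "\n".join(lines)
    finally:
        conn.close()
```

The resulting text (one `table(col type, ...)` line per table) is what gets interpolated into the prompt template.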
```
main.py
└─ extract_schema() -> text_to_sql() -> get_data_from_database()
```
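The pipeline above can be sketched end to end. In this sketch the model is any callable that returns SQL, so a stub can stand in for Ollama; the function name mirrors `main.py`, but the body and signature are illustrative:

```python
import sqlite3
from typing import Callable

def get_data_from_database(prompt: str, db_path: str,
                           llm: Callable[[str], str]) -> list:
    """Extract schema -> ask the model for SQL -> execute -> return rows."""
    conn = sqlite3.connect(db_path)
    try:
        # CREATE TABLE statements double as a compact schema description.
        schema = "\n".join(row[0] for row in conn.execute(
            "SELECT sql FROM sqlite_master WHERE type='table'"))
        sql_query = llm(f"Schema:\n{schema}\n\nQuestion: {prompt}\nSQL:").strip()
        return conn.execute(sql_query).fetchall()
    finally:
        conn.close()
```

Swapping the stub for a real `OllamaLLM(...).invoke` call gives the production path.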
- Install Ollama: https://ollama.com/download
- Pull a model (choose one):
  - Fast coder model: `ollama pull qwen2.5-coder:7b`
  - Reasoning model (slower): `ollama pull deepseek-r1:8b`
- Ensure `amazon.db` exists in the project root.
No need to manually create a virtual environment—uv handles it.
```bash
# Install uv if not present
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies from pyproject.toml
uv sync

# Run a module or script
uv run python main.py
```

To add a new dependency:

```bash
uv add package_name
```

To upgrade dependencies:

```bash
uv lock --upgrade
uv sync
```

Python API example:
```python
from main import get_data_from_database

results = get_data_from_database("Show the first 5 products")
print(results)
```

Direct script run (if you later extend main.py with a CLI):
```bash
uv run python main.py
```

To switch models, edit main.py:

```python
model = OllamaLLM(model="qwen2.5-coder:7b", temperature=0)
```

If you use DeepSeek-R1 models and see long delays, they are generating hidden reasoning. Mitigations:
- Add "Do not use `<think>` tags." to the system prompt
- Strip them with a regex: `re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)`
- Prefer a non-reasoning coder model for performance
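The regex mitigation can live in a small helper (the function name is illustrative):

```python
import re

def strip_think_blocks(raw: str) -> str:
    """Remove <think>...</think> reasoning blocks emitted by models
    like DeepSeek-R1, leaving only the SQL."""
    return re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
```

Run it on the raw model output before executing anything against the database.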
Before execution you can add a lightweight validator:

```python
if not sql_query.lower().startswith("select"):
    raise ValueError("Only SELECT queries are allowed.")
```

Add more guards (block DROP/DELETE/UPDATE) for safety.
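Those extra guards might look like the sketch below. The keyword list is illustrative and intentionally naive (a substring check would also flag a column named `last_updated`); treat it as defense in depth, not a complete sandbox:

```python
FORBIDDEN = ("drop", "delete", "update", "insert", "alter", "attach", "pragma")

def validate_sql(sql_query: str) -> str:
    """Allow only single SELECT statements; raise on anything else."""
    normalized = sql_query.strip().lower()
    if not normalized.startswith("select"):
        raise ValueError("Only SELECT queries are allowed.")
    if any(word in normalized for word in FORBIDDEN):
        raise ValueError("Query contains a forbidden keyword.")
    # A semicolon anywhere but the very end suggests a second statement.
    if ";" in normalized.rstrip(";"):
        raise ValueError("Multiple statements are not allowed.")
    return sql_query
```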
- Use a smaller local model (`qwen2.5-coder:7b`) for snappy responses
- Keep `temperature=0` for deterministic output
- Cache the schema: avoid recomputing `extract_schema` on each call
- (Advanced) Stream model output and stop at the first semicolon (`;`) if the model over-explains
| Issue | Cause | Fix |
|---|---|---|
| Long response time | Reasoning model generating chains | Switch to coder model |
| SQL errors (no such column) | Model hallucinated | Strengthen prompt, show schema clearly |
| Empty results | Query valid but data missing | Inspect amazon.db contents |
| Ollama connection error | Service not running | Run `ollama serve` or open the app |
Executing arbitrary LLM-generated SQL can be risky. Restrict to read-only queries and sanitize user inputs if you later interpolate values.
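One defense-in-depth option is opening SQLite in read-only mode via a URI, so even a destructive statement that slips past validation cannot write:

```python
import sqlite3

def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open the database read-only; any write raises
    sqlite3.OperationalError at the engine level."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```

This complements (rather than replaces) the SELECT-only validator above.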
- Fork & branch
- Make changes
- Run `uv sync && uv run python -m py_compile main.py`
- Submit a PR
MIT
Built with local AI + fast Python tooling.
