DanielTobi0/chat-with-structured-data

Chat With Your Data

An interactive Streamlit application that allows you to have a conversation with your structured data (CSV/Excel).

Ask questions in natural language, and get back answers, insights, and even visualizations.

Demo

lv_0_20250710153428.mp4

Features

  • Natural Language Queries: Ask questions about your data in plain language.
  • Interactive Chat Interface: A user-friendly chat UI built with Streamlit.
  • Support for CSV and Excel: Upload multiple files of different formats.
  • AI-Powered Code Generation: Uses an LLM to generate Python/pandas code that answers your questions.
  • Self-Correcting Code: Automatically attempts to fix errors in the generated code.
  • Data Visualization: Can generate plots and charts using Plotly.

How it Works

The application uses a combination of a Streamlit frontend and a Python backend. When a user asks a question:

  1. The Streamlit app captures the input.
  2. The DataChatSystem formats the question along with the schema of the uploaded dataframes into a prompt for an OpenAI model.
  3. The LLM generates Python code to answer the question.
  4. The application executes this code in a sandboxed environment.
  5. If the code runs successfully, the result (text or a plot) is displayed.
  6. If the code fails, the system uses the error message to ask the LLM to correct the code, and this process repeats for a few attempts.
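
The generate, execute, and correct loop in steps 3 to 6 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `answer_with_retries`, `run_generated_code`, and `fake_llm` are hypothetical names, the LLM is replaced by a stub so the example runs offline, and plain `exec` stands in for the sandbox (it provides no real isolation). The convention that generated code stores its answer in a variable called `result` is also an assumption.

```python
from typing import Any, Callable, Optional, Tuple


def run_generated_code(code: str, namespace: dict) -> Any:
    """Run generated code in a copy of `namespace` and return its `result` variable."""
    scope = dict(namespace)  # copy so generated code cannot mutate the caller's dict
    exec(code, scope)  # NOTE: plain exec is NOT a sandbox; shown only for illustration
    return scope.get("result")


def answer_with_retries(
    question: str,
    generate: Callable[[str, Optional[str], Optional[str]], str],
    namespace: dict,
    max_attempts: int = 3,
) -> Tuple[Any, int]:
    """Ask `generate` for code, run it, and feed any error back until it succeeds."""
    previous_code: Optional[str] = None
    error: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(question, previous_code, error)
        try:
            return run_generated_code(code, namespace), attempt
        except Exception as exc:
            # Keep the failing code and its error so the next prompt can include them.
            previous_code, error = code, f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"No working code after {max_attempts} attempts: {error}")


def fake_llm(question: str, previous_code: Optional[str], error: Optional[str]) -> str:
    """Stub standing in for the model: fails once, then returns corrected code."""
    if error is None:
        return "result = sum(values) / count"  # `count` is undefined, so this fails
    return "result = sum(values) / len(values)"


value, attempts = answer_with_retries(
    "What is the mean?", fake_llm, {"values": [2, 4, 6]}
)
print(value, attempts)
```

In a real setup, `generate` would call the OpenAI API with the question, the previous code, and the error message, and `namespace` would hold the uploaded dataframes.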

How to Use

  1. Clone the repository:

    git clone https://github.com/DanielTobi0/chat-with-structured-data.git
    cd chat-with-structured-data
  2. Install dependencies: It's recommended to use a virtual environment with uv for faster dependency installation.

    # Install uv if you don't have it yet
    pip install uv
    
    # Create virtual environment and install dependencies
    uv venv
    source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
    uv pip install -r requirements.txt
  3. Set up OpenAI API Key: The application requires an OpenAI API key. You can enter it directly in the application's sidebar.

  4. Run the application:

    streamlit run app.py
  5. Use the App:

    • Open your browser to the local URL provided by Streamlit.
    • Enter your OpenAI API key in the sidebar.
    • Upload your CSV or Excel files.
    • Click "Confirm".
    • Start asking questions!

Core Architecture

The application follows a modular architecture centered around the DataChatSystem class that connects all components:

Component Overview

  1. Frontend Layer (app.py)

    • Streamlit-based user interface
    • Handles file uploads, API key configuration, and chat interactions
    • Renders text responses and visualizations
  2. Chat Engine (src/chat.py)

    • DataChatSystem class coordinates the entire question-answering workflow
    • Manages conversation state and history
    • Handles the code generation, execution, and correction pipeline
  3. Data Processing (src/data_loading.py)

    • DataProcessor class handles loading CSV and Excel files
    • Normalizes column names and structures data for analysis
    • Supports multiple sheets and file formats
  4. LLM Integration

    • Sends structured prompts to OpenAI models
    • Processes responses and extracts executable Python code
    • Implements error handling and code correction strategies
  5. Code Execution Environment

    • Sandboxed execution of generated Python code
    • Error capture and reporting for correction attempts
    • Safe result conversion for display in the UI
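
As an illustration of the "extracts executable Python code" step in the LLM Integration layer, the sketch below pulls fenced code out of a model reply. It is an assumption about how such a step could work, not the implementation in `src/chat.py`; `extract_code` is a hypothetical name, and the fence marker is built from single backticks only to keep the example readable inside a code block.

```python
import re

FENCE = "`" * 3  # the triple-backtick marker that wraps code in many LLM replies

# Match a fenced block, optionally tagged as Python, and capture its body.
FENCE_RE = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_code(reply: str) -> str:
    """Return the code inside fenced blocks, or the whole reply if none are found."""
    blocks = FENCE_RE.findall(reply)
    if blocks:
        return "\n".join(block.strip() for block in blocks)
    return reply.strip()


reply = f"Here is the code:\n{FENCE}python\nresult = 1 + 1\n{FENCE}\nHope it helps."
print(extract_code(reply))
```

Falling back to the whole reply covers models that answer with bare code. Blocks tagged with another language are not handled by this simple pattern.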

Data Flow

  1. User uploads data files → Files processed by DataProcessor → DataFrame context created
  2. User asks question → Question combined with data schema as prompt → Sent to OpenAI API
  3. API returns Python code → Code executed in sandbox → Results or errors captured
  4. If successful, results displayed to user → If errors occur, code correction attempted
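
The first two steps of this flow (normalizing column names, then combining the question with the data schema) might look like the sketch below. The function names, the normalization rules, and the prompt wording are all assumptions made for illustration; the real logic lives in `src/data_loading.py` and `src/prompts.py` and may differ. The schema is passed as a plain dictionary so the example runs without pandas.

```python
import re
from typing import Dict, Iterable, List


def normalize_column(name: str) -> str:
    """Lowercase a column name and collapse non-alphanumeric runs to underscores."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip().lower())
    return cleaned.strip("_") or "column"


def normalize_columns(names: Iterable[str]) -> List[str]:
    """Normalize every name, adding a numeric suffix to repeated names."""
    seen: Dict[str, int] = {}
    result: List[str] = []
    for name in names:
        base = normalize_column(name)
        seen[base] = seen.get(base, 0) + 1
        result.append(base if seen[base] == 1 else f"{base}_{seen[base]}")
    return result


def build_prompt(question: str, schemas: Dict[str, Dict[str, str]]) -> str:
    """Combine the question with a one-line schema summary per dataframe."""
    lines = ["You can use these pandas DataFrames:"]
    for frame_name, columns in schemas.items():
        described = ", ".join(f"{col} ({dtype})" for col, dtype in columns.items())
        lines.append(f"- {frame_name}: {described}")
    lines.append(f"Question: {question}")
    lines.append("Reply with Python code that stores the answer in `result`.")
    return "\n".join(lines)


columns = normalize_columns(["Order ID", " Unit Price ($)", "Order ID"])
schema = {"sales": dict(zip(columns, ["int64", "float64", "int64"]))}
prompt = build_prompt("What is the average unit price?", schema)
print(prompt)
```

With pandas, the per-frame mapping could be built from `df.dtypes.astype(str).to_dict()` after assigning the normalized names to `df.columns`.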

The system employs a self-correcting approach where code errors are analyzed and used to generate improved code in subsequent attempts, creating a robust question-answering experience.

Project Structure

.
├── app.py                  # Main Streamlit application file
├── src/
│   ├── chat.py             # Core logic for the data chat system (code generation, execution, correction)
│   ├── config.py           # Application configuration
│   ├── data_loading.py     # Handles loading and processing of data files
│   ├── prompts.py          # Prompts for the LLM
│   └── utils.py            # Utility functions
├── requirements.txt        # Python dependencies
└── README.md               # This file

About

A question-answering agent for tabular datasets, built without an external agentic framework. It uses context engineering to manage memory and improve response quality.
