An interactive Streamlit application that allows you to have a conversation with your structured data (CSV/Excel).
Ask questions in natural language, and get back answers, insights, and even visualizations.
- Natural Language Queries: Ask questions about your data in plain language.
- Interactive Chat Interface: A user-friendly chat UI built with Streamlit.
- Support for CSV and Excel: Upload multiple files of different formats.
- AI-Powered Code Generation: Uses an LLM to generate Python/pandas code to answer your questions.
- Self-Correcting Code: Automatically attempts to fix errors in the generated code.
- Data Visualization: Can generate plots and charts using Plotly.
The application uses a combination of a Streamlit frontend and a Python backend. When a user asks a question:
- The Streamlit app captures the input.
- The DataChatSystem formats the question, along with the schema of the uploaded dataframes, into a prompt for an OpenAI model.
- The LLM generates Python code to answer the question.
- The application executes this code in a sandboxed environment.
- If the code runs successfully, the result (text or a plot) is displayed.
- If the code fails, the system uses the error message to ask the LLM to correct the code, and this process repeats for a few attempts.
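The generate, execute, and correct loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual DataChatSystem code: `run_with_self_correction`, `fake_llm`, the `result` variable convention, and the limit of three attempts are all assumptions made for the example, and the model call is replaced by a stub so the snippet runs offline.

```python
import traceback
from typing import Callable, Optional


def run_with_self_correction(
    question: str,
    generate_code: Callable[[str, Optional[str], Optional[str]], str],
    max_attempts: int = 3,  # assumed limit; the README only says "a few attempts"
) -> dict:
    """Generate code, execute it, and feed errors back until it succeeds."""
    previous_code = None
    last_error = None
    for attempt in range(1, max_attempts + 1):
        code = generate_code(question, previous_code, last_error)
        namespace: dict = {}
        try:
            # NOTE: plain exec() is not a real sandbox; it only stands in
            # for whatever isolation the application actually uses.
            exec(code, namespace)
            return {"ok": True, "result": namespace.get("result"), "attempts": attempt}
        except Exception:
            # Keep the failing code and its error for the next prompt.
            previous_code = code
            last_error = traceback.format_exc(limit=1)
    return {"ok": False, "error": last_error, "attempts": max_attempts}


def fake_llm(question, previous_code, error):
    """Hypothetical stand-in for the model: buggy first, fixed on retry."""
    if error is None:
        return "result = total / count"  # fails: names are undefined
    return "total, count = 10, 4\nresult = total / count"


outcome = run_with_self_correction("What is the average?", fake_llm)
print(outcome)  # {'ok': True, 'result': 2.5, 'attempts': 2}
```

Passing the previous code and the traceback back into the generator is what lets the second attempt differ from the first; without that feedback, retries would simply repeat the same mistake.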
- Clone the repository:
    git clone https://github.com/DanielTobi0/chat-with-structured-data.git
    cd chat-with-structured-data
- Install dependencies: It's recommended to use a virtual environment with uv for faster dependency installation.
    # Install uv if you don't have it yet
    pip install uv
    # Create virtual environment and install dependencies
    uv venv
    source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
    uv pip install -r requirements.txt
- Set up OpenAI API Key: The application requires an OpenAI API key. You can enter it directly in the application's sidebar.
- Run the application:
    streamlit run app.py
- Use the App:
  - Open your browser to the local URL provided by Streamlit.
  - Enter your OpenAI API key in the sidebar.
  - Upload your CSV or Excel files.
  - Click "Confirm".
  - Start asking questions!
The application follows a modular architecture centered around the DataChatSystem class that connects all components:
- Frontend Layer (app.py)
  - Streamlit-based user interface
  - Handles file uploads, API key configuration, and chat interactions
  - Renders text responses and visualizations
- Chat Engine (src/chat.py)
  - DataChatSystem class coordinates the entire question-answering workflow
  - Manages conversation state and history
  - Handles the code generation, execution, and correction pipeline
- Data Processing (src/data_loading.py)
  - DataProcessor class handles loading CSV and Excel files
  - Normalizes column names and structures data for analysis
  - Supports multiple sheets and file formats
- LLM Integration
  - Sends structured prompts to OpenAI models
  - Processes responses and extracts executable Python code
  - Implements error handling and code correction strategies
- Code Execution Environment
  - Sandboxed execution of generated Python code
  - Error capture and reporting for correction attempts
  - Safe result conversion for display in the UI
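As an illustration of the column-name normalization mentioned under Data Processing, here is one plausible approach. The README does not say which rules DataProcessor applies, so the functions `normalize_column_name` and `normalize_columns`, the snake_case convention, and the numeric suffix for duplicates are all assumptions for this sketch (it needs Python 3.9+ for the built-in generic type hints).

```python
import re


def normalize_column_name(name: str) -> str:
    """Lower-case a header and collapse non-alphanumeric runs to '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip())
    return cleaned.strip("_").lower()


def normalize_columns(names: list[str]) -> list[str]:
    """Normalize headers, adding numeric suffixes to repeated names."""
    seen: dict[str, int] = {}
    result = []
    for raw in names:
        # Fall back to a placeholder when nothing usable remains.
        base = normalize_column_name(raw) or "column"
        count = seen.get(base, 0)
        seen[base] = count + 1
        result.append(base if count == 0 else f"{base}_{count}")
    return result


print(normalize_columns(["Order ID", " Unit Price ($)", "order id", ""]))
# ['order_id', 'unit_price', 'order_id_1', 'column']
```

With pandas, the same helper could be applied as `df.columns = normalize_columns(list(df.columns))`. Predictable, identifier-like column names make it easier for generated code to reference columns without quoting or spacing mistakes.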
- User uploads data files → Files processed by DataProcessor → DataFrame context created
- User asks question → Question combined with data schema as prompt → Sent to OpenAI API
- API returns Python code → Code executed in sandbox → Results or errors captured
- If successful, results displayed to user → If errors occur, code correction attempted
The system is self-correcting: when generated code fails, the error is analyzed and used to generate improved code on the next attempt, which makes question answering more robust to faulty first drafts.
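Between "API returns Python code" and "Code executed in sandbox" in the flow above, the code has to be pulled out of the model's reply. A minimal sketch of that step is shown below, assuming the model wraps its code in a Markdown fence; the helper name `extract_python_code` and the fallback behaviour are hypothetical and not taken from the project's source.

```python
import re

# Build the fence marker programmatically so this example stays readable.
FENCE = "`" * 3
CODE_BLOCK = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_python_code(response: str) -> str:
    """Return the first fenced code block, or the whole reply if none exists."""
    match = CODE_BLOCK.search(response)
    if match:
        return match.group(1).strip()
    # Assumed fallback: treat an unfenced reply as raw code.
    return response.strip()


reply = (
    "Here is the code:\n"
    + FENCE + "python\nresult = df['sales'].sum()\n" + FENCE
    + "\nLet me know if you need a chart."
)
print(extract_python_code(reply))  # result = df['sales'].sum()
```

Falling back to the raw reply keeps the pipeline moving when the model omits the fence; if that text is not valid Python, the resulting execution error can be handled by the correction step.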
.
├── app.py # Main Streamlit application file
├── src/
│ ├── chat.py # Core logic for the data chat system (code generation, execution, correction)
│ ├── config.py # Application configuration
│ ├── data_loading.py # Handles loading and processing of data files
│ ├── prompts.py # Prompts for the LLM
│ └── utils.py # Utility functions
├── requirements.txt # Python dependencies
└── README.md # This file