Merged
11 changes: 6 additions & 5 deletions .pre-commit-config.yaml
@@ -20,16 +20,17 @@ repos:
      - id: pyupgrade
        args: [--py38-plus]

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.9
    hooks:
      - id: ruff

  - repo: https://github.com/psf/black
    rev: 24.4.2
    hooks:
      - id: black

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.9
    hooks:
      - id: ruff
        args: ["--fix"]

  - repo: https://github.com/nbQA-dev/nbQA
    rev: 1.8.5
    hooks:
46 changes: 44 additions & 2 deletions README.md
@@ -17,12 +17,14 @@

`dataframes-haystack` is an extension for [Haystack 2](https://docs.haystack.deepset.ai/docs/intro) that enables integration with dataframe libraries.

The library offers custom [Converters](https://docs.haystack.deepset.ai/docs/converters) components that convert data stored in dataframes into Haystack [`Document`](https://docs.haystack.deepset.ai/docs/data-classes#document) objects.

The dataframe libraries currently supported are:
- [pandas](https://pandas.pydata.org/)
- [Polars](https://pola.rs)

The library offers custom [Converters](https://docs.haystack.deepset.ai/docs/converters) components that load files into dataframes and turn dataframes into Haystack [`Document`](https://docs.haystack.deepset.ai/docs/data-classes#document) objects:
- `FileToPandasDataFrame` and `FileToPolarsDataFrame` read files and convert them into dataframes.
- `PandasDataFrameConverter` and `PolarsDataFrameConverter` convert data stored in dataframes into Haystack `Document` objects.
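
To make the dataframe-to-`Document` conversion more concrete, here is a minimal, library-free sketch of the general idea: one column supplies the document text and other columns become metadata. `SimpleDocument`, `rows_to_documents`, `content_column`, and `meta_columns` are hypothetical names used only for illustration; they are not the API of `dataframes-haystack` or Haystack, and plain dictionaries stand in for dataframe rows.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Sequence


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Haystack Document (illustration only)."""

    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def rows_to_documents(
    rows: Iterable[Dict[str, Any]],
    content_column: str,
    meta_columns: Sequence[str] = (),
) -> List[SimpleDocument]:
    """Turn each row (a dict of column name -> value) into one SimpleDocument."""
    return [
        SimpleDocument(
            content=str(row[content_column]),
            meta={col: row[col] for col in meta_columns},
        )
        for row in rows
    ]


# Plain dicts stand in for dataframe rows
# (in pandas, df.to_dict("records") yields this shape).
rows = [
    {"text": "Hello world", "author": "Ada", "year": 2024},
    {"text": "Second entry", "author": "Linus", "year": 2023},
]

documents = rows_to_documents(
    rows, content_column="text", meta_columns=["author", "year"]
)
print(documents[0])
```

The real converters may handle indexes, missing values, and column selection differently; refer to the usage examples in this README for the actual component calls.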

## 🛠️ Installation

```sh
@@ -40,6 +42,26 @@ pip install "dataframes-haystack[polars]"

### Pandas

#### FileToPandasDataFrame

```python
from dataframes_haystack.components.converters.pandas import FileToPandasDataFrame

converter = FileToPandasDataFrame(file_format="csv")

output_dataframe = converter.run(
    file_paths=["data/doc1.csv", "data/doc2.csv"]
)
```

Result:
```python
>>> output_dataframe
{'dataframe': <pandas.DataFrame>}
```

#### PandasDataFrameConverter

```python
import pandas as pd

@@ -65,6 +87,26 @@ Result:

### Polars

#### FileToPolarsDataFrame

```python
from dataframes_haystack.components.converters.polars import FileToPolarsDataFrame

converter = FileToPolarsDataFrame(file_format="csv")

output_dataframe = converter.run(
    file_paths=["data/doc1.csv", "data/doc2.csv"]
)
```

Result:
```python
>>> output_dataframe
{'dataframe': <polars.DataFrame>}
```

#### PolarsDataFrameConverter

```python
import polars as pl
