Allow quotation marks in string fields by mschunte2 · Pull Request #4 · Fjanks/pydatev

mschunte2 · 2026-05-22T07:36:55Z

WARNING: This code is AI Generated - please ignore/reject/review if you are uncomfortable with this.

Adds DATEV Belege support — the documentation files (invoices,
receipts, …) shipped alongside a Buchungsstapel as a belege.zip
Document Package with a document.xml manifest. Standard library only;
Python 3.6+. Plain Buchungsstapel CSV use is unchanged (byte-identical
output, no zip unless a Beleg is attached).

Public API (top-level `pydatev`)

- `Beleg(filepath, belegtyp=None, guid=None, archive_name=None)`: a file-backed document (guid, filename, blob, belegtyp).
- `Belegarchiv(...)`: manages a set of Belege; writes/reads `belege.zip` + `document.xml` (schemas v04.0 and v06.0); dedups by GUID; container protocol (len/in/iter/indexing) plus add/remove/clear/get_by_guid.
- `Buchungsstapel.add_beleg(entry, filepath, belegtyp=None, *, guid=None, archive_name=None)`: attaches a file to a booking row, sets the CSV `Beleglink` (`BEDI "<UUID>"`), and auto-writes `belege.zip` next to the CSV on save() (reading it back on load()).
- `uuid8_from_sha256(namespace, *parts)`: the deterministic UUIDv8 (RFC 9562, SHA-256) primitive used for the default Beleg GUID over (namespace, archive_name, blob); stable across re-exports. Pass `guid=` to supply your own identity.

Also fixes CSV Text round-trip with embedded quotes, so a Beleglink
written as BEDI "<UUID>" survives save() → load().

Docs: File-handling.md. Tests: tests/test_belegarchiv.py,
tests/test_text_roundtrip.py (22 tests, all green).
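
For readers unfamiliar with name-based UUIDv8, the following is a minimal sketch of how a deterministic GUID can be derived from SHA-256 in the spirit of RFC 9562. It is not the PR's implementation: the function name `uuid8_from_sha256_sketch`, the length-prefixing of the parts, and the namespace value are all assumptions made for illustration.

```python
import hashlib
import uuid


def uuid8_from_sha256_sketch(namespace: uuid.UUID, *parts: bytes) -> uuid.UUID:
    """Derive a deterministic UUIDv8 from SHA-256 (illustrative sketch only)."""
    digest = hashlib.sha256()
    digest.update(namespace.bytes)
    for part in parts:
        # Assumption: length-prefix each part so that (b"ab", b"c") and
        # (b"a", b"bc") hash differently. The PR may combine parts differently.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    raw = bytearray(digest.digest()[:16])  # keep the first 128 bits
    raw[6] = (raw[6] & 0x0F) | 0x80        # set the version nibble to 8
    raw[8] = (raw[8] & 0x3F) | 0x80        # set the variant bits to binary 10
    return uuid.UUID(bytes=bytes(raw))


# Hypothetical namespace and inputs, chosen only for the demonstration.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")
guid_a = uuid8_from_sha256_sketch(NAMESPACE, b"invoice.pdf", b"file contents")
guid_b = uuid8_from_sha256_sketch(NAMESPACE, b"invoice.pdf", b"file contents")
guid_c = uuid8_from_sha256_sketch(NAMESPACE, b"invoice.pdf", b"other contents")
```

Because the GUID depends only on the namespace, the archive name, and the file bytes, exporting the same file twice yields the same identifier, while any change to the blob yields a different one.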

DATEV's Buchungsstapel format requires the "Beleglink" column to carry `BEDI "<UUID>"` (with embedded quotes) when using the BEDI provider for Beleg-Bilder (document images). The format is the CSV-standard doubled-quote escape: the cell content `BEDI "<UUID>"` is serialised in the file as `"BEDI ""<UUID>"""` (outer quotes = field delimiters, inner doubled quotes = escaped literal quote). This is what BuchhaltungsButler emits in its own DATEV round-trip exports.

Two changes in src/pydatev/pydatev.py:

- DatevEntry.__setitem__: drop the rejection of `"` in Text values. Quotes are legal in Text fields; the file-level CSV escape happens in python2datev(), not at value-storage time.
- DatevEntry.python2datev: when emitting a Text value, double any embedded quotation marks (`value.replace('"', '""')`) before wrapping in the field-level `"..."`. This produces standards-conformant DATEV CSV output for values containing quotes.

Verified by round-trip against a BuchhaltungsButler-generated DATEV export (2026-05-22): BB's automatic Beleg linking now matches the BEDI-quoted UUID against the document.xml GUID.
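
The write-side escape described above can be sketched in a few lines. The helper names below are hypothetical and are not part of pydatev; the second helper only cross-checks the result against Python's standard `csv` module, assuming a semicolon delimiter and quoting of every field.

```python
import csv
import io


def encode_text_field(value: str) -> str:
    """Double embedded quotes, then wrap the value in field-level quotes."""
    return '"' + value.replace('"', '""') + '"'


def encode_with_csv_module(value: str) -> str:
    """Serialise a single field with the csv module for comparison."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer, delimiter=";", quoting=csv.QUOTE_ALL, lineterminator=""
    )
    writer.writerow([value])
    return buffer.getvalue()


beleglink = 'BEDI "<UUID>"'
manual = encode_text_field(beleglink)
via_csv = encode_with_csv_module(beleglink)
print(manual)   # "BEDI ""<UUID>"""
```

Both helpers produce the same serialised field, which is consistent with the claim that the Beleglink format is the ordinary CSV doubled-quote escape.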

Fjanks · 2026-05-26T11:11:11Z

Thank you for the pull request! Allowing quotation marks does make sense. It is consistent with the specifications of the DATEV format.

And the new Belegarchiv is a very nice feature!!

Before I merge, I have two small comments:

I think there is a little bug. When I add an entry with the Beleglink 'BEDI "<uuid>"', save it to a file and load that file, the result is 'BEDI <uuid>'. The quotation marks are saved correctly, but they are lost when loading the file. I think replacing the line value = string.replace('"','') with something like value = string[1:-1].replace('""','"') in the function datev2python() should fix it.
In the code and also in File-handling.md, there is the reference to https://apps.datev.de/help-center for the supported file types. However, I didn't find the list there. Can you give a more specific link?
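
The round-trip bug and the suggested fix can be illustrated with two small stand-alone functions. This is a sketch, not the actual body of datev2python(): the function names are hypothetical, and only the two expressions quoted in the comment above are taken from the discussion.

```python
def decode_text_field_lossy(string: str) -> str:
    """Old behaviour: strips every quotation mark, including escaped ones."""
    return string.replace('"', '')


def decode_text_field(string: str) -> str:
    """Suggested behaviour: drop the outer quotes, then undo the doubling."""
    return string[1:-1].replace('""', '"')


serialised = '"BEDI ""<UUID>"""'            # the field as stored in the CSV file
lossy = decode_text_field_lossy(serialised)  # 'BEDI <UUID>'
restored = decode_text_field(serialised)     # 'BEDI "<UUID>"'
```

The slice assumes the field is always wrapped in quotation marks, which holds for Text values as they are written by the escape described in the PR.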

mschunte2 · 2026-05-26T14:14:04Z

Thanks a lot for the feedback! I asked Claude to fix the bug based on your feedback.
Unfortunately, I could not find a publicly accessible document describing the supported file formats. As a consequence, I asked Claude to cite only the document number (even though the document itself is not accessible).

FYI: I now use the library and the Belegarchiv function to "pull" transactions from Flatex and import them into Buchhaltungsbutler using the Buchhaltungsbutler function for importing DATEV compatible transactions and the corresponding Belege.zip. I can confirm that this has worked so far for 20+ purchases of ETFs at Flatex.

mschunte2 · 2026-05-26T20:58:12Z

I now moved all AI-generated code into a sub-module "Belegarchiv" to ensure that the original pydatev.py stays untouched (except for the changes to introduce quotations for Belege) and that none of the original users of pydatev is affected by any of the AI-generated code.

import pydatev now exposes the upstream surface — no Beleg names leaked.
from pydatev.belegarchiv import Buchungsstapel, Beleg, Belegarchiv is the opt-in path for everything Belege.

Fjanks · 2026-06-02T19:51:20Z

We don't need to put AI-generated code in extra files. I think that will not be feasible in the long term. But in this case it's still a good idea to move the Belegarchiv to an extra file, because splitting long files into parts can make the codebase easier to understand.

However, AI-generated code should be revised before we publish it. AI often generates code that is unnecessarily complicated or contains bugs that are hard to find. If we include it without a critical review, the project may become more and more messy and hard to maintain. To avoid that, I have some comments and suggestions on how to improve it.

1
I think having the Buchungsstapel and Buchungsentry from pydatev.belegarchiv as replacements for two classes in pydatev is confusing. It requires users to understand the differences and decide which one to use. For the two Buchungsstapel classes, I would suggest just merging them. As for the class Buchungsentry, I think it is not necessary at all. It only exists because right now an invoice is added this way:

invoice = pydatev.belegarchiv.Beleg(...)
entry = bs.add_buchung(...)
entry['Beleg'] = invoice

This approach requires the entry to handle the pseudo-fields 'Beleg' and 'Belegtyp', which adds a lot of code. What do you think about adding invoices in the following way:

entry = bs.add_buchung(...)
bs.add_beleg(entry, filename, belegtyp)

Then the new function Buchungsstapel.add_beleg(...) just adds the invoice to the Belegarchiv and sets entry['Beleglink']=.... This way, we don't need the Buchungsentry class anymore, because the original DatevEntry can be used without any changes.
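
To make the proposal concrete, here is a minimal sketch of the suggested shape. `BuchungsstapelSketch` is a hypothetical stand-in and not the pydatev class: a plain dict replaces DatevEntry, a dict keyed by GUID replaces the Belegarchiv, the field name and Belegtyp value in the demo are placeholders, and the random default GUID is an assumption.

```python
import uuid


class BuchungsstapelSketch:
    """Hypothetical stand-in used only to illustrate the proposed API shape."""

    def __init__(self):
        self.entries = []
        self.belege = {}  # guid -> (filepath, belegtyp); stands in for Belegarchiv

    def add_buchung(self, **fields):
        entry = dict(fields)  # a plain dict stands in for DatevEntry
        self.entries.append(entry)
        return entry

    def add_beleg(self, entry, filepath, belegtyp=None, *, guid=None):
        # Assumption: fall back to a random GUID when none is supplied.
        guid = str(guid or uuid.uuid4())
        self.belege[guid] = (filepath, belegtyp)
        # The entry only needs an ordinary Text field; no pseudo-fields.
        entry["Beleglink"] = 'BEDI "{}"'.format(guid)
        return guid


def demo():
    bs = BuchungsstapelSketch()
    entry = bs.add_buchung(Umsatz=100.0)  # placeholder field
    guid = bs.add_beleg(entry, "invoice.pdf", belegtyp="Rechnungseingang")
    return bs, entry, guid
```

With this shape, all Beleg handling lives in the Buchungsstapel and the archive, so the entry class needs no special cases.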

2
I didn't find document 1000312 and I don't want to spread false information. Is it maybe just made up by AI? In that case, remove the reference and also remove the check for supported file types.

3
In line 66 in pydatev.py there is a comment which explains why a line was removed. Such explanations make sense in a commit message, but as a comment in the code a future reader might just be confused because he or she doesn't know what was there before it was removed.

4
You don't need to change the version number and add something to the changelog with every commit. All commits of this pull request are part of one new feature, and most users probably don't want to know about all the intermediate steps during development. Let's increase the version just once with this PR and make only one entry in the changelog. And if the changes don't break older code that depends on pydatev, the changelog can be quite short. Just describe the new feature in one or a few sentences.

5
The tests in test_module_isolation.py just check whether the code is in the file where it is right now. If someone decides to move or rename things, these tests can fail although the code may still be valid and functional. If you remove these tests, we don't lose anything and we save the time needed to keep the file up to date.

6
When you move or rename things, don't forget to update the documentation. At the moment, README.md and File-handling.md are not up to date.


mschunte2 · 2026-06-03T21:57:43Z

Thanks a lot for the comments. I tried to address them in my latest commit. It is now a mix of small extensions to pydatev, e.g. bs.add_beleg(entry, filename, belegtyp), and a separate file _belege.py that implements the handling of the Belegarchiv. I hope this helps. Further feedback is appreciated.

BTW: Buchhaltungsbutler is unable to import two Buchungen that share one Beleg. I hope they debug this soon.

mschunte2 force-pushed the master branch from c722864 to 2d3b570 Compare May 22, 2026 12:08

mschunte2 force-pushed the master branch from 1a07c2c to 2900084 Compare June 3, 2026 21:45

mschunte2 changed the title ~~Allow quotation marks in Text values (CSV-escape via doubling)~~ Adds DATEV Belege support — the documentation files (invoices, receipts, …) shipped alongside a Buchungsstapel as a belege.zip Document Package with a document.xml manifest. Jun 3, 2026

mschunte2 changed the title ~~Adds DATEV Belege support — the documentation files (invoices, receipts, …) shipped alongside a Buchungsstapel as a belege.zip Document Package with a document.xml manifest.~~ Allow quotation marks in string fields Jun 3, 2026

mschunte2 wants to merge 2 commits into Fjanks:master from mschunte2:master
