
Allow quotation marks in string fields #4

Open
mschunte2 wants to merge 2 commits into Fjanks:master from mschunte2:master


mschunte2 commented May 22, 2026

WARNING: This code is AI Generated - please ignore/reject/review if you are uncomfortable with this.

Adds DATEV Belege support — the documentation files (invoices,
receipts, …) shipped alongside a Buchungsstapel as a belege.zip
Document Package with a document.xml manifest. Standard library only;
Python 3.6+. Plain Buchungsstapel CSV use is unchanged (byte-identical
output, no zip unless a Beleg is attached).

Public API (top-level pydatev)

  • Beleg(filepath, belegtyp=None, guid=None, archive_name=None) — a
    file-backed document (guid, filename, blob, belegtyp).
  • Belegarchiv(...) — manages a set of Belege; writes/reads
    belege.zip + document.xml (schemas v04.0 and v06.0); dedups by
    GUID; container protocol (len/in/iter/indexing) plus
    add/remove/clear/get_by_guid.
  • Buchungsstapel.add_beleg(entry, filepath, belegtyp=None, *, guid=None, archive_name=None) — attaches a file to a booking row,
    sets the CSV Beleglink (BEDI "<UUID>"), and auto-writes
    belege.zip next to the CSV on save() (reading it back on load()).
  • uuid8_from_sha256(namespace, *parts) — the deterministic UUIDv8
    (RFC 9562, SHA-256) primitive used for the default Beleg GUID over
    (namespace, archive_name, blob); stable across re-exports. Pass
    guid= to supply your own identity.
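
The deterministic GUID idea in the last bullet can be illustrated with a minimal sketch. This is not the PR's `uuid8_from_sha256`; the function name `uuid8_sketch`, the length-prefixed hashing layout, and the example namespace are assumptions made here for illustration. It only shows the general RFC 9562 recipe: hash the inputs with SHA-256, keep 128 bits, then overwrite the version and variant bits.

```python
import hashlib
import uuid


def uuid8_sketch(namespace, *parts):
    """Hypothetical illustration: derive a UUIDv8 from SHA-256 over the inputs.

    The byte layout fed to the hash (namespace bytes, then each part with an
    8-byte length prefix) is an assumption, not the layout used by pydatev.
    """
    h = hashlib.sha256()
    h.update(namespace.bytes)
    for part in parts:
        # Length prefix keeps (b"ab", b"c") distinct from (b"a", b"bc").
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    raw = bytearray(h.digest()[:16])      # keep the first 128 bits
    raw[6] = (raw[6] & 0x0F) | 0x80       # version nibble = 8
    raw[8] = (raw[8] & 0x3F) | 0x80       # variant bits = 10xx
    return uuid.UUID(bytes=bytes(raw))


# Example namespace chosen arbitrarily for this sketch.
ns = uuid.UUID("12345678-1234-5678-1234-567812345678")
guid = uuid8_sketch(ns, b"invoice.pdf", b"%PDF-1.7 ...")
print(guid, guid.version)
```

Because the result depends only on the inputs, exporting the same file under the same archive name again yields the same GUID, which is the property the bullet describes as "stable across re-exports".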

Also fixes CSV Text round-trip with embedded quotes, so a Beleglink
written as BEDI "<UUID>" survives save() → load().

Docs: File-handling.md. Tests: tests/test_belegarchiv.py,
tests/test_text_roundtrip.py (22 tests, all green).

DATEV's Buchungsstapel format requires the "Beleglink" column to carry
`BEDI "<UUID>"` (with embedded quotes) when using the BEDI provider for
Beleg-Bilder. The format is the CSV-standard doubled-quote escape:
the cell content `BEDI "<UUID>"` is serialised in the file as
`"BEDI ""<UUID>"""` (outer quotes = field delimiters, inner doubled
quotes = escaped literal quote). This is what BuchhaltungsButler emits
in its own DATEV round-trip exports.

Two changes in src/pydatev/pydatev.py:

- DatevEntry.__setitem__: drop the rejection of `"` in Text values.
  Quotes are legal in Text fields; the file-level CSV escape happens
  in python2datev(), not at value-storage time.

- DatevEntry.python2datev: when emitting a Text value, double any
  embedded quotation marks (`value.replace('"', '""')`) before
  wrapping in the field-level `"..."`. This produces standards-
  conformant DATEV CSV output for values containing quotes.
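
The escaping step described above can be sketched in isolation. The helper names `escape_text` and `csv_reference` are hypothetical and not part of pydatev; the second one only cross-checks the result against Python's standard `csv` writer, which uses the same doubled-quote convention by default.

```python
import csv
import io


def escape_text(value):
    """Wrap a Text value in field quotes, doubling any embedded quotes."""
    return '"' + value.replace('"', '""') + '"'


def csv_reference(value):
    """Serialise the same single cell with the stdlib csv writer."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="")
    writer.writerow([value])
    return buf.getvalue()


cell = 'BEDI "<UUID>"'
print(escape_text(cell))                          # "BEDI ""<UUID>"""
print(escape_text(cell) == csv_reference(cell))   # True
```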

Verified by round-trip against a BuchhaltungsButler-generated DATEV
export (2026-05-22): BB's automatic Beleg linking (auto-Beleg-Verknüpfung)
now takes effect and matches the BEDI-quoted UUID against the document.xml GUID.
Owner

Fjanks commented May 26, 2026

Thank you for the pull request! Allowing quotation marks does make sense. It is consistent with the specifications of the DATEV format.

And the new Belegarchiv is a very nice feature!!

Before I merge, I have two small comments:

  1. I think there is a little bug. When I add an entry with the Beleglink 'BEDI "<uuid>"', save it to a file and load that file, the result is 'BEDI <uuid>'. The quotation marks are saved correctly, but they are lost when loading the file. I think replacing the line value = string.replace('"','') with something like value = string[1:-1].replace('""','"') in the function datev2python() should fix it.

  2. In the code and also in File-handling.md, there is the reference to https://apps.datev.de/help-center for the supported file types. However, I didn't find the list there. Can you give a more specific link?
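
The loading bug from comment 1 can be reproduced outside the library. The sketch below is an assumption-level illustration: `lossy_unescape` mirrors the quoted line `string.replace('"','')`, and `unescape_text` mirrors the suggested `string[1:-1].replace('""','"')`; neither name exists in pydatev.

```python
def lossy_unescape(field):
    """Behaviour reported in the review: every quotation mark is removed."""
    return field.replace('"', '')


def unescape_text(field):
    """Suggested fix: strip the outer delimiters, then collapse doubled quotes."""
    if len(field) < 2 or field[0] != '"' or field[-1] != '"':
        raise ValueError("expected a quoted Text field")
    return field[1:-1].replace('""', '"')


raw = '"BEDI ""<UUID>"""'        # the field as stored in the CSV file
print(lossy_unescape(raw))       # BEDI <UUID>
print(unescape_text(raw))        # BEDI "<UUID>"
```

The order matters: the outer delimiters have to be removed before the doubled quotes are collapsed, otherwise a value ending in a quote loses it.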

Author

mschunte2 commented May 26, 2026

Thanks a lot for the feedback! I asked Claude to fix the bug based on your feedback.
Unfortunately, I did not find a publicly accessible document describing the supported file formats. As a consequence, I asked Claude to reference only the document number (even though the document itself is not publicly accessible).

FYI: I now use the library and the Belegarchiv function to "pull" transactions from Flatex and import them into Buchhaltungsbutler using the Buchhaltungsbutler function for importing DATEV compatible transactions and the corresponding Belege.zip. I can confirm that this has worked so far for 20+ purchases of ETFs at Flatex.

Author

mschunte2 commented May 26, 2026

I now moved all AI-generated code into a sub-module "Belegarchiv" to ensure that the original pydatev.py stays untouched (except for the changes to introduce quotations for Belege) and that none of the original users of pydatev is affected by any of the AI-generated code.

import pydatev now exposes the upstream surface — no Beleg names leaked.
from pydatev.belegarchiv import Buchungsstapel, Beleg, Belegarchiv is the opt-in path for everything Belege.

Owner

Fjanks commented Jun 2, 2026

We don't need to put AI-generated code in extra files. I think that will not be feasible in the long term. But in this case it's still a good idea to move the Belegarchiv to an extra file, because splitting long files into parts can make it easier to understand the codebase.

However, AI-generated code should be revised before we publish it. AI often generates code that is unnecessarily complicated or contains bugs that are hard to find. If we include it without a critical review, the project may become more and more messy and hard to maintain. To avoid that, I have some comments and suggestions on how to improve it.

1
I think having the Buchungsstapel and Buchungsentry classes from pydatev.belegarchiv as replacements for two classes in pydatev is confusing. It requires users to understand the differences and decide which one to use. For the two Buchungsstapel classes, I would suggest simply merging them. As for the Buchungsentry class, I think it is not necessary at all. It only exists because right now an invoice is added this way:

invoice = pydatev.belegarchiv.Beleg(...)
entry = bs.add_buchung(...)
entry['Beleg'] = invoice

This approach requires the entry to handle the pseudo-fields 'Beleg' and 'Belegtyp', which adds a lot of code. What do you think about adding invoices in the following way:

entry = bs.add_buchung(...)
bs.add_beleg(entry, filename, belegtyp)

Then the new function Buchungsstapel.add_beleg(...) just adds the invoice to the Belegarchiv and sets entry['Beleglink']=.... This way, we don't need the Buchungsentry class anymore, because the original DatevEntry can be used without any changes.
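
The proposed flow can be sketched with plain Python objects. Everything here is hypothetical: `SketchStapel` is a stand-in and not the pydatev `Buchungsstapel`, entries are plain dicts rather than `DatevEntry` objects, and a random `uuid4` replaces whatever GUID scheme the library uses. The point is only that `add_beleg` can register the file and set the `Beleglink` field itself, so the entry class needs no Beleg-specific logic.

```python
import uuid


class SketchStapel:
    """Hypothetical stand-in for Buchungsstapel, for illustration only."""

    def __init__(self):
        self.entries = []
        self.belege = {}  # guid -> (filename, belegtyp); stand-in for the archive

    def add_buchung(self, **fields):
        entry = dict(fields)  # stand-in for an unchanged DatevEntry
        self.entries.append(entry)
        return entry

    def add_beleg(self, entry, filename, belegtyp=None, guid=None):
        # Register the document and link it from the booking row.
        guid = guid or str(uuid.uuid4())
        self.belege[guid] = (filename, belegtyp)
        entry["Beleglink"] = 'BEDI "{}"'.format(guid)
        return guid


bs = SketchStapel()
entry = bs.add_buchung(Umsatz=119.0)
bs.add_beleg(entry, "invoice.pdf", "Rechnungseingang", guid="0000-demo")
print(entry["Beleglink"])   # BEDI "0000-demo"
```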

2
I didn't find document 1000312 and I don't want to spread false information. Is it maybe just made up by AI? In that case, remove the reference and also remove the check for supported file types.

3
In line 66 in pydatev.py there is a comment which explains why a line was removed. Such explanations make sense in a commit message, but as a comment in the code a future reader might just be confused because he or she doesn't know what was there before it was removed.

4
You don't need to change the version number and add something to the changelog with every commit. All commits of this pull request are part of one new feature and most users probably don't want to know about all the intermediate steps during development. Let's increase the version just once with this PR and make only one entry in the changelog. And if the changes don't break older code that depends on pydatev, the changelog can be quite short. Just describe the new feature in one or a few sentences.

5
The tests in test_module_isolation.py just check whether the code is in the file where it is right now. If someone decides to move or rename things, these tests can fail although the code may still be valid and functional. If you remove these tests, we don't lose anything and we save the time needed to keep the file up to date.

6
When you move or rename things, don't forget to update the documentation. At the moment, README.md and File-handling.md are not up to date.

Adds DATEV Belege support — the documentation files (invoices,
receipts, …) shipped alongside a Buchungsstapel as a `belege.zip`
Document Package with a `document.xml` manifest. Standard library only;
Python 3.6+. Plain Buchungsstapel CSV use is unchanged (byte-identical
output, no zip unless a Beleg is attached).

Public API (top-level `pydatev`):
- `Beleg(filepath, belegtyp=None, guid=None, archive_name=None)` — a
  file-backed document (guid, filename, blob, belegtyp).
- `Belegarchiv(...)` — manages a set of Belege; writes/reads
  `belege.zip` + `document.xml` (schemas v04.0 and v06.0); dedups by
  GUID; container protocol (len/in/iter/indexing) plus
  add/remove/clear/get_by_guid.
- `Buchungsstapel.add_beleg(entry, filepath, belegtyp=None, *,
  guid=None, archive_name=None)` — attaches a file to a booking row,
  sets the CSV `Beleglink` (`BEDI "<UUID>"`), and auto-writes
  `belege.zip` next to the CSV on save() (reading it back on load()).
- `uuid8_from_sha256(namespace, *parts)` — the deterministic UUIDv8
  (RFC 9562, SHA-256) primitive used for the default Beleg GUID over
  (namespace, archive_name, blob); stable across re-exports. Pass
  `guid=` to supply your own identity.

Also fixes CSV `Text` round-trip with embedded quotes, so a `Beleglink`
written as `BEDI "<UUID>"` survives save() → load().

Docs: File-handling.md. Tests: tests/test_belegarchiv.py,
tests/test_text_roundtrip.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mschunte2
Author

Thanks a lot for the comments. I tried to address them in my latest commit. It now combines a small extension of pydatev, e.g. bs.add_beleg(entry, filename, belegtyp), with a separate file _belege.py that implements the handling of the Belegarchiv. I hope this helps. Further feedback is appreciated.

BTW: Buchhaltungsbutler is currently unable to import 2 Buchungen with 1 Beleg. I hope they debug this soon.

mschunte2 changed the title from "Allow quotation marks in Text values (CSV-escape via doubling)" to "Adds DATEV Belege support — the documentation files (invoices, receipts, …) shipped alongside a Buchungsstapel as a belege.zip Document Package with a document.xml manifest." Jun 3, 2026
mschunte2 changed the title from "Adds DATEV Belege support — the documentation files (invoices, receipts, …) shipped alongside a Buchungsstapel as a belege.zip Document Package with a document.xml manifest." to "Allow quotation marks in string fields" Jun 3, 2026