Allow quotation marks in string fields #4
Conversation
DATEV's Buchungsstapel format requires the "Beleglink" column to carry
`BEDI "<UUID>"` (with embedded quotes) when using the BEDI provider for
Beleg-Bilder. The format is the CSV-standard doubled-quote escape:
the cell content `BEDI "<UUID>"` is serialised in the file as
`"BEDI ""<UUID>"""` (outer quotes = field delimiters, inner doubled
quotes = escaped literal quote). This is what BuchhaltungsButler emits
in its own DATEV round-trip exports.
Two changes in src/pydatev/pydatev.py:
- DatevEntry.__setitem__: drop the rejection of `"` in Text values.
Quotes are legal in Text fields; the file-level CSV escape happens
in python2datev(), not at value-storage time.
- DatevEntry.python2datev: when emitting a Text value, double any
embedded quotation marks (`value.replace('"', '""')`) before
wrapping in the field-level `"..."`. This produces standards-
conformant DATEV CSV output for values containing quotes.
Verified by round-trip against a BuchhaltungsButler-generated DATEV
export (2026-05-22) — BB's auto-Beleg-Verknüpfung greift now matches
the BEDI-quoted UUID against the document.xml GUID.
|
Thank you for the pull request! Allowing quotation marks does make sense. It is consistent with the specifications of the DATEV format. And the new Belegarchiv is a very nice feature!! Before I merge, I have two small comments:
|
|
Thanks a lot for the feedback! I asked Claude to fix the bug based on your feedback... FYI: I now use the library and the Belegarchiv function to "pull" transactions from Flatex and import them into Buchhaltungsbutler using the Buchhaltungsbutler function for importing DATEV compatible transactions and the corresponding Belege.zip. I can confirm that this has worked so far for 20+ purchases of ETFs at Flatex. |
|
I now moved all AI-generated code into a sub-module "Belegarchiv" to ensure that the original pydatev.py stays untouched (except for the changes to introduce quotations for Belege) and that none of the original users of pydatev is affected by any of the AI-generated code. import pydatev now exposes the upstream surface — no Beleg names leaked. |
|
We don't need to put AI-generated code in extra files. I think that will not be feasible in the long term. But in this case its still a good idea to move the Belegarchiv to an extra file, because splitting long files in parts can make it easier to understand the codebase. However, AI-generated code should to be revised before we publish it. AI often generates code that is unnecessarily complicated or contains bugs that are hard to find. If we include it without a critical review, the project may become more and more messy and hard to maintain. To avoid that, I have some comments and suggestions how to improve it. 1 invoice = pydatev.belegarchiv.Beleg(...)
entry = bs.add_buchung(...)
entry['Beleg'] = invoiceThis approach requires the entry to handle these pseudo-fields 'Belge' and 'Belegtyp', which adds a lot of code. What do you think about adding invoices in the following way: entry = bs.add_buchung(...)
bs.add_beleg(entry, filename, belegtyp)Then the new function 2 3 4 5 6 |
Adds DATEV Belege support — the documentation files (invoices, receipts, …) shipped alongside a Buchungsstapel as a `belege.zip` Document Package with a `document.xml` manifest. Standard library only; Python 3.6+. Plain Buchungsstapel CSV use is unchanged (byte-identical output, no zip unless a Beleg is attached). Public API (top-level `pydatev`): - `Beleg(filepath, belegtyp=None, guid=None, archive_name=None)` — a file-backed document (guid, filename, blob, belegtyp). - `Belegarchiv(...)` — manages a set of Belege; writes/reads `belege.zip` + `document.xml` (schemas v04.0 and v06.0); dedups by GUID; container protocol (len/in/iter/indexing) plus add/remove/clear/get_by_guid. - `Buchungsstapel.add_beleg(entry, filepath, belegtyp=None, *, guid=None, archive_name=None)` — attaches a file to a booking row, sets the CSV `Beleglink` (`BEDI "<UUID>"`), and auto-writes `belege.zip` next to the CSV on save() (reading it back on load()). - `uuid8_from_sha256(namespace, *parts)` — the deterministic UUIDv8 (RFC 9562, SHA-256) primitive used for the default Beleg GUID over (namespace, archive_name, blob); stable across re-exports. Pass `guid=` to supply your own identity. Also fixes CSV `Text` round-trip with embedded quotes, so a `Beleglink` written as `BEDI "<UUID>"` survives save() → load(). Docs: File-handling.md. Tests: tests/test_belegarchiv.py, tests/test_text_roundtrip.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks a lot for the comments. I tried to address those in my latest commit. I now did a mix of smaller extension of pydatev, e.g. bs.add_beleg(entry, filename, belegtyp) and another file _belege.py that implements the handling of the Belegarchiv. I hope this helps. Further feedback is appreciated. BTW: Buchhaltungsbutler is unable to import 2 buchungen with 1 beleg. I hope they debug this soon. |
belege.zip Document Package with a document.xml manifest.
belege.zip Document Package with a document.xml manifest.
WARNING: This code is AI Generated - please ignore/reject/review if you are uncomfortable with this.
Adds DATEV Belege support — the documentation files (invoices,
receipts, …) shipped alongside a Buchungsstapel as a
belege.zipDocument Package with a
document.xmlmanifest. Standard library only;Python 3.6+. Plain Buchungsstapel CSV use is unchanged (byte-identical
output, no zip unless a Beleg is attached).
Public API (top-level
pydatev)Beleg(filepath, belegtyp=None, guid=None, archive_name=None)— afile-backed document (guid, filename, blob, belegtyp).
Belegarchiv(...)— manages a set of Belege; writes/readsbelege.zip+document.xml(schemas v04.0 and v06.0); dedups byGUID; container protocol (len/in/iter/indexing) plus
add/remove/clear/get_by_guid.
Buchungsstapel.add_beleg(entry, filepath, belegtyp=None, *, guid=None, archive_name=None)— attaches a file to a booking row,sets the CSV
Beleglink(BEDI "<UUID>"), and auto-writesbelege.zipnext to the CSV onsave()(reading it back onload()).uuid8_from_sha256(namespace, *parts)— the deterministic UUIDv8(RFC 9562, SHA-256) primitive used for the default Beleg GUID over
(namespace, archive_name, blob); stable across re-exports. Pass
guid=to supply your own identity.Also fixes CSV
Textround-trip with embedded quotes, so aBeleglinkwritten as
BEDI "<UUID>"survivessave()→load().Docs:
File-handling.md. Tests:tests/test_belegarchiv.py,tests/test_text_roundtrip.py(22 tests, all green).