gh-139353: Add Objects/unicode_codecs_utf.c file#142190

Open: vstinner wants to merge 2 commits into python:main from vstinner:unicode_codecs_utf

@vstinner
Member

@vstinner vstinner commented Dec 2, 2025

Rename functions:

  • ascii_decode() => _PyUnicode_DecodeASCII()
  • backslashreplace() => _PyUnicode_backslashreplace()
  • raise_encode_exception() => _PyUnicode_RaiseEncodeException()
  • unicode_decode_call_errorhandler_writer() => _PyUnicode_DecodeCallErrorHandler()
  • unicode_decode_utf8() => _PyUnicode_DecodeUTF8()
  • unicode_encode_call_errorhandler() => _PyUnicode_EncodeCallErrorHandler()
  • unicode_encode_utf8() => _PyUnicode_EncodeUTF8()
  • xmlcharrefreplace() => _PyUnicode_xmlcharrefreplace()

Move static inline functions and macros to pycore_unicodeobject.h:

  • _PyUnicode_CHECK()
  • _PyUnicode_UTF8()
  • PyUnicode_UTF8()
  • PyUnicode_SET_UTF8()
  • PyUnicode_UTF8_LENGTH()
  • PyUnicode_SET_UTF8_LENGTH()

@vstinner
Member Author

vstinner commented Dec 3, 2025

@serhiy-storchaka: What do you think of this split?

@serhiy-storchaka
Member

I do not feel easy about this. The UTF codecs code is tightly coupled with other code. This PR makes some static functions non-static and exposes local functions in a header. This means that the compiler cannot completely inline them: it also needs to keep a non-inlined copy, and this can affect its decision to inline them. It also means that low-level C API which was previously not intended for use outside of the unicodeobject.c file can now be used in other CPython code, and it will be used, for sure. This will also affect optimization and maintainability.

If the goal of this change is to improve maintainability, I am not sure that its effect on maintainability is net positive.
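
To illustrate the linkage concern raised above, here is a minimal sketch. It is not code from this PR or from CPython: `ascii_prefix_static()`, `ascii_prefix_extern()` and `decode_fast_path()` are hypothetical names invented for the example. The two helpers have identical bodies and differ only in linkage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper with internal linkage. It is visible only inside
   this translation unit, so if the compiler inlines every call it is
   free to emit no standalone copy at all. */
static size_t
ascii_prefix_static(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    while (i < n && s[i] < 0x80) {
        i++;
    }
    return i;
}

/* Same body, external linkage (hypothetical name). Other files may call
   it, so without link-time optimization the compiler generally has to
   emit an out-of-line copy even when it also inlines local calls. */
size_t
ascii_prefix_extern(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    size_t i = 0;
    while (i < n && s[i] < 0x80) {
        i++;
    }
    return i;
}

/* A caller in the same file: the static helper is a natural inlining
   candidate here because all of its callers are known. */
size_t
decode_fast_path(const char *s, size_t n)
{
    return ascii_prefix_static((const unsigned char *)s, n);
}
```

Both helpers return the length of the leading ASCII run. The difference only shows up in the generated object file (for example with `nm`), and the exact outcome depends on the compiler, optimization level and whether LTO is enabled.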

@vstinner
Member Author

vstinner commented Dec 3, 2025

An alternative is to put all codecs in a single file: #141469 (6,671 lines of C code).

@github-actions

github-actions Bot commented May 3, 2026

This PR is stale because it has been open for 30 days with no activity.

github-actions Bot added the stale label May 3, 2026

Labels: awaiting core review, skip news, stale
