1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -93,6 +93,7 @@ repos:
rev: v0.45.0
hooks:
- id: markdownlint
exclude: docs/source/include-toplevel-*


- repo: local
62 changes: 11 additions & 51 deletions README-Unicode.md
@@ -95,7 +95,7 @@ for i in 2.7 3.{6,7};do echo "$i:";
LC_ALL=C python$i -c 'open("/usr/share/hwdata/pci.ids").read()';done
```

```
```text
2.7:
3.6:
Traceback (most recent call last):
@@ -106,65 +106,31 @@ UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 97850: ordi
3.7:
```

This error means that the `'ascii' codec` cannot handle input ord() >= 128, and as some Video cards use `²` to reference their power, the `ascii` codec chokes on them.

It means `xcp.pci.PCIIds()` cannot use `open("/usr/share/hwdata/pci.ids").read()`.
The `'ascii'` codec fails on all bytes >= 128.
For example, it cannot decode the bytes representing `²` (superscript two, UTF-8 bytes `0xc2 0xb2`) in the PCI IDs database.
To read `/usr/share/hwdata/pci.ids`, we must use `encoding="utf-8"`.

While Python 3.7 and newer switch to UTF-8 mode when the locale is `C` or `POSIX`, `open()` still uses the strict error handler, so invalid bytes raise `UnicodeDecodeError`.

As it happens, some older tools output ISO-8859-1 characters hard-coded and these aren't valid UTF-8 sequences, and even newer Python versions need error handlers to not fail:
Also, some older tools output ISO-8859-1 characters.
These aren't valid UTF-8 sequences.
For all Python versions, we need error handlers to handle them:

```sh
echo -e "\0262" >.text # ISO-8859-1 for: "²"
python3 -c 'open(".text").read()'
```

```
```text
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 0: invalid start byte
```

Of course, `xcp/net/ifrename` won't be affected but it would be good to fix the
warning for them as well in an intelligent way. See the proposal for that below.

There are a couple of possibilities. Because Python 2.7 does not support the
arguments we need to pass to ensure that all users of open() will work, we need
to make passing the arguments conditional on Python >= 3.

1. Overriding `open()`, while technically working would not only affect xcp.python but the entire program:

```py
if sys.version_info >= (3, 0):
    original_open = __builtins__["open"]
    def uopen(*args, **kwargs):
        if "b" not in (args[1] \
            if len(args) >= 2 else kwargs.get("mode", "")):
            kwargs.setdefault("encoding", "UTF-8")
            kwargs.setdefault("errors", "replace")
        return original_open(*args, **kwargs)
    __builtins__["open"] = uopen
```

2. This is sufficient but is not very nice:

```py
# xcp/utf8mode.py
if sys.version_info >= (3, 0):
    open_utf8args = {"encoding": "utf-8", "errors": "replace"}
else:
    open_utf8args = {}
# xcp/{cmd,pci,environ?,logger?}.py tests/test_{pci,biodevname?,...?}.py
+ from .utf8mode import open_utf8args
...
- open(filename)
+ open(filename, **open_utf8args)
```

But, `pylint` will still warn about these lines, so I propose:

3. Instead, use a wrapper function, which will also silence the `pylint` warnings at the locations which have been changed to use it:
To fix these issues, `xcp.compat` provides a wrapper for `open()`.
It adds `encoding="utf-8", errors="replace"`
to enable UTF-8 conversion and handle encoding errors:

```py
# xcp/utf8mode.py
@@ -182,9 +148,3 @@ to make passing the arguments conditional on Python >= 3.
+ utf8open(filename)
```

Using the 3rd option, the `pylint` warnings for the changed locations
`unspecified-encoding` and `consider-using-with` don't appear without
explicitly disabling them.

PS: Since utf8open() still returns a context-manager, `with open(...) as f:`
would still work.
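
The body of the wrapper is collapsed in the hunk above, so the following is only a minimal sketch of the idea the README describes, not the code in `xcp.compat`. The name `utf8open_sketch` and its signature are assumptions made for illustration. It adds `encoding="utf-8"` and `errors="replace"` for text modes on Python 3 and leaves binary modes and Python 2.7 untouched:

```python
import os
import sys
import tempfile


def utf8open_sketch(filename, mode="r", **kwargs):
    """Hypothetical stand-in for the open() wrapper described in the README.

    On Python 3, text-mode opens default to UTF-8 with errors="replace",
    so invalid bytes decode to U+FFFD instead of raising UnicodeDecodeError.
    Binary modes and Python 2.7 are passed through unchanged.
    """
    if sys.version_info >= (3, 0) and "b" not in mode:
        kwargs.setdefault("encoding", "utf-8")
        kwargs.setdefault("errors", "replace")
    return open(filename, mode, **kwargs)


if __name__ == "__main__":
    # 0xb2 alone is ISO-8859-1 for "²" and is not valid UTF-8;
    # 0xc2 0xb2 is the valid UTF-8 encoding of the same character.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"x\xb2y\xc2\xb2")
    os.close(fd)
    with utf8open_sketch(path) as f:
        text = f.read()
    os.remove(path)
    # The invalid byte is replaced, the valid sequence is decoded.
    print(text == u"x\ufffdy\u00b2")
```

Because the sketch returns whatever `open()` returns, it can still be used as a context manager, matching the note in the removed README text.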
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -48,7 +48,7 @@

myst_heading_anchors = 2
templates_path = ["_templates"]
exclude_patterns = []
exclude_patterns: list[str] = []


# -- Options for HTML output -------------------------------------------------
134 changes: 0 additions & 134 deletions stubs/pytest_httpserver.pyi

This file was deleted.

4 changes: 2 additions & 2 deletions tox.ini
@@ -119,9 +119,9 @@ commands =
coverage xml -o {envlogdir}/coverage.xml --fail-under {env:XCP_COVERAGE_MIN:78}
coverage lcov -o {envlogdir}/coverage.lcov
coverage html -d {envlogdir}/htmlcov
coverage html -d {envlogdir}/htmlcov-tests --fail-under {env:TESTS_COVERAGE_MIN:96} \
coverage html -d {envlogdir}/htmlcov-tests --fail-under {env:TESTS_COVERAGE_MIN:95} \
--include="tests/*"
diff-cover --compare-branch=origin/master --exclude xcp/dmv.py \
diff-cover --compare-branch=origin/master \
{env:PY3_DIFFCOVER_OPTIONS} --fail-under {env:DIFF_COVERAGE_MIN:92} \
--html-report {envlogdir}/coverage-diff.html \
{envlogdir}/coverage.xml