Update dependency lxml to v4 [SECURITY] #21

Open

renovate[bot] wants to merge 1 commit into master from renovate/pypi-lxml-vulnerability

renovate bot commented Mar 7, 2022

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| lxml (source, changelog) | `==3.8.0` -> `==4.6.5` | | | | |

GitHub Vulnerability Alerts

CVE-2020-27783

An XSS vulnerability was discovered in python-lxml's clean module. The module's parser didn't properly imitate browsers, which caused different behaviors between the sanitizer and the user's page. A remote attacker could exploit this flaw to run arbitrary HTML/JS code.

CVE-2021-28957

An XSS vulnerability was discovered in the python lxml clean module versions before 4.6.3. When disabling the safe_attrs_only and forms arguments, the Cleaner class does not remove the formaction attribute allowing for JS to bypass the sanitizer. A remote attacker could exploit this flaw to run arbitrary JS code on users who interact with incorrectly sanitized HTML. This issue is patched in lxml 4.6.3.
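
To audit markup that was sanitized by an affected version, one option is to scan it for the `formaction` attribute named above. The following is a minimal standard-library sketch, not lxml's fix; `FormactionFinder` and `find_formaction` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class FormactionFinder(HTMLParser):
    """Collect (tag, value) pairs for every formaction attribute seen."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        for name, value in attrs:
            if name == "formaction":
                self.hits.append((tag, value))


def find_formaction(html_text):
    """Return all formaction attributes found in the given HTML string."""
    finder = FormactionFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits


sample = '<form><button formaction="javascript:alert(1)">Go</button></form>'
hits = find_formaction(sample)
print(hits)
```

Any hit in output that was supposed to be sanitized suggests the content should be cleaned again with lxml 4.6.3 or later.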

CVE-2021-43818

Impact

The HTML Cleaner in lxml.html lets certain crafted script content pass through, as well as script content in SVG files embedded using data URIs.

Users that employ the HTML cleaner in a security relevant context should upgrade to lxml 4.6.5.

Patches

The issue has been resolved in lxml 4.6.5.

Workarounds

None.

References

The issues are tracked under the report IDs GHSL-2021-1037 and GHSL-2021-1038.
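
To check whether an exact pin such as the one this PR replaces is still below the patched release (4.6.5), a small comparison is enough. This is a sketch under assumptions: `parse_pin` and `needs_upgrade` are hypothetical helpers, and only plain numeric `==` pins are handled (no pre-releases, ranges, or extras).

```python
import re

# Patched release named in the advisory above.
PATCHED = (4, 6, 5)


def parse_pin(requirement):
    """Split an exact pin such as 'lxml==3.8.0' into (name, version tuple)."""
    match = re.fullmatch(
        r"\s*([A-Za-z0-9_.-]+)\s*==\s*(\d+(?:\.\d+)*)\s*", requirement
    )
    if match is None:
        raise ValueError(f"not an exact numeric pin: {requirement!r}")
    name, version = match.groups()
    return name, tuple(int(part) for part in version.split("."))


def needs_upgrade(requirement, patched=PATCHED):
    """Return True when the pinned version sorts below the patched release."""
    _, version = parse_pin(requirement)
    return version < patched


print(needs_upgrade("lxml==3.8.0"))
print(needs_upgrade("lxml==4.6.5"))
```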

Release Notes

lxml/lxml

`v4.6.5`

==================

Bugs fixed

A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script
content through SVG images (CVE-2021-43818).
A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script
content through CSS imports and other crafted constructs (CVE-2021-43818).

`v4.6.4`

==================

Features added

GH#317: A new property system_url was added to DTD entities.
Patch by Thirdegree.
GH#314: The STATIC_* variables in setup.py can now be passed via env vars.
Patch by Isaac Jurado.

`v4.6.3`

==================

Bugs fixed

A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by Kevin Chung,
which allowed JavaScript to pass through. The cleaner now removes the HTML5
formaction attribute.

`v4.6.2`

==================

Bugs fixed

A vulnerability (CVE-2020-27783) was discovered in the HTML Cleaner by Yaniv Nizry,
which allowed JavaScript to pass through. The cleaner now removes more sneaky
"style" content.

`v4.6.1`

==================

Bugs fixed

A vulnerability was discovered in the HTML Cleaner by Yaniv Nizry, which allowed
JavaScript to pass through. The cleaner now removes more sneaky "style" content.

`v4.6.0`

==================

Features added

GH#310: lxml.html.InputGetter supports __len__() to count the number of input fields.
Patch by Aidan Woolley.
lxml.html.InputGetter has a new .items() method to ease processing all input fields.
lxml.html.InputGetter.keys() now returns the field names in document order.
GH-309: The API documentation is now generated using sphinx-apidoc.
Patch by Chris Mayo.

Bugs fixed

LP#1869455: C14N 2.0 serialisation failed for unprefixed attributes
when a default namespace was defined.
TreeBuilder.close() raised AssertionError in some error cases where it
should have raised XMLSyntaxError. It now raises a combined exception to
keep up backwards compatibility, while switching to XMLSyntaxError as an
interface.

`v4.5.2`

==================

Bugs fixed

Cleaner() now validates that only known configuration options can be set.
LP#1882606: Cleaner.clean_html() discarded comments and PIs regardless of the
corresponding configuration option, if remove_unknown_tags was set.
LP#1880251: Instead of globally overwriting the document loader in libxml2, lxml now
sets it per parser run, which improves the interoperability with other users of libxml2
such as libxmlsec.
LP#1881960: Fix build in CPython 3.10 by using Cython 0.29.21.
The setup options "--with-xml2-config" and "--with-xslt-config" were accidentally renamed
to "--xml2-config" and "--xslt-config" in 4.5.1 and are now available again.

`v4.5.1`

==================

Bugs fixed

LP#1570388: Fix failures when serialising documents larger than 2GB in some cases.
LP#1865141, GH#298: QName values were not accepted by the el.iter() method.
Patch by xmo-odoo.
LP#1863413, GH#297: The build failed to detect libraries on Linux that are only
configured via pkg-config.
Patch by Hugh McMaster.

`v4.5.0`

==================

Features added

A new function indent() was added to insert tail whitespace for pretty-printing
an XML tree.
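
A short illustration of the new `indent()` function follows. The sample document is made up, and the fallback import is an assumption of this sketch: when lxml is not installed it uses the standard library's ElementTree, which has offered an `indent()` of the same name since Python 3.9.

```python
try:
    from lxml import etree  # indent() needs lxml 4.5.0 or later
except ImportError:
    # Assumption for this sketch: fall back to the standard library.
    import xml.etree.ElementTree as etree

root = etree.fromstring("<root><a><b/></a></root>")

# indent() works in place: it writes whitespace into .text and .tail.
etree.indent(root, space="  ")

pretty = etree.tostring(root, encoding="unicode")
print(pretty)
```

Because the whitespace is stored in the tree itself, later serialisations are pretty-printed as well.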

Bugs fixed

LP#1857794: Tail text of nodes that get removed from a document using item
deletion disappeared silently instead of sticking with the node that was removed.

Other changes

MacOS builds are 64-bit-only by default.
Set CFLAGS and LDFLAGS explicitly to override it.
Linux/MacOS Binary wheels now use libxml2 2.9.10 and libxslt 1.1.34.
LP#1840234: The package version number is now available as lxml.__version__.

`v4.4.3`

==================

Bugs fixed

LP#1844674: itertext() was missing tail text of comments and PIs since 4.4.0.

`v4.4.2`

==================

Bugs fixed

LP#1835708: ElementInclude incorrectly rejected repeated non-recursive
includes as recursive.
Patch by Rainer Hausdorf.

`v4.4.1`

==================

Bugs fixed

LP#1838252: The order of an OrderedDict was lost in 4.4.0 when passing it as
attrib mapping during element creation.
LP#1838521: The package metadata now lists the supported Python versions.

`v4.4.0`

==================

Features added

Element.clear() accepts a new keyword argument keep_tail=True to clear
everything but the tail text. This is helpful in some document-style use cases
and for clearing the current element in iterparse() and pull parsing.
When creating attributes or namespaces from a dict in Python 3.6+, lxml now
preserves the original insertion order of that dict, instead of always sorting
the items by name. A similar change was made for ElementTree in CPython 3.8.
See https://bugs.python.org/issue34160
Integer elements in lxml.objectify implement the __index__() special method.
GH#269: Read-only elements in XSLT were missing the nsmap property.
Original patch by Jan Pazdziora.
ElementInclude can now restrict the maximum inclusion depth via a max_depth
argument to prevent content explosion. It is limited to 6 by default.
The target object of the XMLParser can have start_ns() and end_ns()
callback methods to listen to namespace declarations.
The TreeBuilder has new arguments comment_factory and pi_factory to
pass factories for creating comments and processing instructions, as well as
flag arguments insert_comments and insert_pis to discard them from the
tree when set to false.
A C14N 2.0 (https://www.w3.org/TR/xml-c14n2/) implementation was added as
etree.canonicalize(), a corresponding C14NWriterTarget class, and
a c14n2 serialisation method.
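
Two of the 4.4.0 features above, attribute insertion order and `etree.canonicalize()`, can be tried with a few lines. The sample markup is made up, and the fallback import is an assumption of this sketch: the standard library's ElementTree has had a `canonicalize()` function of the same name since Python 3.8.

```python
try:
    from lxml import etree  # canonicalize() needs lxml 4.4.0 or later
except ImportError:
    # Assumption for this sketch: fall back to the standard library.
    import xml.etree.ElementTree as etree

# Attributes created from a dict keep its insertion order
# instead of being sorted by name.
element = etree.Element("root", {"b": "2", "a": "1"})
attribute_order = list(element.keys())

# C14N 2.0 sorts attributes and writes empty elements as start/end tag pairs.
canonical = etree.canonicalize("<root b='2' a='1'><child/></root>")

print(attribute_order)
print(canonical)
```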

Bugs fixed

When writing to file paths that contain the URL escape character '%', the file
path could wrongly be mangled by URL unescaping and thus write to a different
file or directory. Code that writes to file paths that are provided by untrusted
sources, but that must work with previous versions of lxml, should best either
reject paths that contain '%' characters, or otherwise make sure that the path
does not contain maliciously injected '%XX' URL hex escapes for paths like '../'.
Assigning to Element child slices with negative step could insert the slice at
the wrong position, starting too far on the left.
Assigning to Element child slices with overly large step size could take very
long, regardless of the length of the actual slice.
Assigning to Element child slices of the wrong size could sometimes fail to
raise a ValueError (like a list assignment would) and instead assign outside
of the original slice bounds or leave parts of it unreplaced.
The comment and pi events in iterwalk() were never triggered, and
instead, comments and processing instructions in the tree were reported as
start elements. Also, when walking an ElementTree (as opposed to its root
element), comments and PIs outside of the root element are now reported.
LP#1827833: The RelaxNG compact syntax support was broken with recent versions
of rnc2rng.
LP#1758553: The HTML elements source and track were added to the list
of empty tags in lxml.html.defs.
Registering a prefix other than "xml" for the XML namespace is now rejected.
Failing to write XSLT output to a file could raise a misleading exception.
It now raises IOError.
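
For code that must keep working with lxml versions older than 4.4.0, the advice above about `%` in output paths can be sketched with the standard library. Both helpers are hypothetical names, and the second performs a single unescaping pass only, so double-encoded input is not covered.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote


def reject_percent(path):
    """Strict option: refuse any output path that contains '%'."""
    if "%" in path:
        raise ValueError(f"'%' is not allowed in output paths: {path!r}")
    return path


def has_traversal_after_unescape(path):
    """Looser option: True if one round of URL unescaping yields a '..' part."""
    decoded = unquote(path).replace("\\", "/")
    return ".." in PurePosixPath(decoded).parts


print(has_traversal_after_unescape("out/%2E%2E%2Fsecret.xml"))
print(has_traversal_after_unescape("report%20final.xml"))
```

The strict variant is simpler to reason about. The looser one keeps harmless escapes such as `%20` usable but relies on the decoding matching what the old lxml versions did, which is an assumption here.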

Other changes

Support for Python 3.4 was removed.
When using Element.find*() with prefix-namespace mappings, the empty string
is now accepted to define a default namespace, in addition to the previously
supported None prefix. Empty strings are more convenient since they keep
all prefix keys in a namespace dict as strings, which simplifies sorting etc.
The ElementTree.write_c14n() method has been deprecated in favour of the
long preferred ElementTree.write(f, method="c14n"). It will be removed
in a future release.
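
The empty-string prefix described above can be used like this. The namespace URI and document are made up for illustration, and the fallback import is an assumption of this sketch: the standard library's ElementTree accepts the same empty-string prefix since Python 3.8.

```python
try:
    from lxml import etree  # empty-string prefix needs lxml 4.4.0 or later
except ImportError:
    # Assumption for this sketch: fall back to the standard library.
    import xml.etree.ElementTree as etree

NS = "http://example.com/ns"  # hypothetical namespace URI

root = etree.fromstring(
    '<doc xmlns="http://example.com/ns"><item>x</item></doc>'
)

# Without a mapping, the unqualified name does not match the namespaced tag.
unqualified = root.find("item")

# With "" mapped to the default namespace, the same path matches.
item = root.find("item", namespaces={"": NS})

print(unqualified, item.text)
```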

`v4.3.5`

==================

Rebuilt with Cython 0.29.13 to support Python 3.8.

`v4.3.4`

==================

Rebuilt with Cython 0.29.10 to support Python 3.8.

`v4.3.3`

==================

Bugs fixed

Fix leak of output buffer and unclosed files in _XSLTResultTree.write_output().

`v4.3.2`

==================

Bugs fixed

Crash in 4.3.1 when appending a child subtree with certain text nodes.

Other changes

Built with Cython 0.29.6.

`v4.3.1`

==================

Bugs fixed

LP#1814522: Crash when appending a child subtree that contains unsubstituted
entity references.

Other changes

Built with Cython 0.29.5.

`v4.3.0`

==================

Features added

The module lxml.sax is compiled using Cython in order to speed it up.
GH#267: lxml.sax.ElementTreeProducer now preserves the namespace prefixes.
If two prefixes point to the same URI, the first prefix in alphabetical order
is used. Patch by Lennart Regebro.
Updated ISO-Schematron implementation to 2013 version (now MIT licensed)
and the corresponding schema to the 2016 version (with optional "properties").

Other changes

GH#270, GH#271: Support for Python 2.6 and 3.3 was removed.
Patch by hugovk.
The minimum dependency versions were raised to libxml2 2.9.2 and libxslt 1.1.27,
which were released in 2014 and 2012 respectively.
Built with Cython 0.29.2.

`v4.2.6`

==================

Bugs fixed

LP#1799755: Fix a DeprecationWarning in Py3.7+.
Import warnings in Python 3.6+ were resolved.

`v4.2.5`

==================

Bugs fixed

Javascript URLs that used URL escaping were not removed by the HTML cleaner.
Security problem found by Omar Eissa. (CVE-2018-19787)
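
The class of input behind this fix can be illustrated with a standard-library check that unescapes a URL before looking at its scheme. This is not lxml's implementation; `looks_like_javascript_url` is a hypothetical helper, and stripping whitespace and control characters is a deliberately conservative choice of this sketch.

```python
import re
from urllib.parse import unquote

# Whitespace and control characters, removed before the scheme comparison.
_IGNORED = re.compile(r"[\x00-\x20]+")


def looks_like_javascript_url(url):
    """Return True if the URL has a javascript: scheme after URL unescaping."""
    decoded = unquote(url)
    compact = _IGNORED.sub("", decoded).lower()
    return compact.startswith("javascript:")


print(looks_like_javascript_url("j%61vascript:alert(1)"))
print(looks_like_javascript_url("https://example.com/?q=javascript:"))
```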

`v4.2.4`

==================

Features added

GH#259: Allow using pkg-config for build configuration.
Patch by Patrick Griffis.

Bugs fixed

LP#1773749, GH#268: Crash when moving an element to another document with
Element.insert().
Patch by Alexander Weggerle.

`v4.2.3`

==================

Bugs fixed

Reverted GH#265: lxml links against zlib as a shared library again.

`v4.2.2`

==================

Bugs fixed

GH#266: Fix sporadic crash during GC when parse-time schema validation is used
and the parser participates in a reference cycle.
Original patch by Julien Greard.
GH#265: lxml no longer links against zlib as a shared library, only on static builds.
Patch by Nehal J Wani.

`v4.2.1`

==================

Bugs fixed

LP#1755825: iterwalk() failed to return the 'start' event for the initial
element if a tag selector is used.
LP#1756314: Failure to import 4.2.0 into PyPy due to a missing library symbol.
LP#1727864, GH#258: Add "-isysroot" linker option on MacOS as needed by XCode 9.

`v4.2.0`

==================

Features added

GH#255: SelectElement.value returns more standard-compliant and
browser-like defaults for non-multi-selects. If no option is selected, the
value of the first option is returned (instead of None). If multiple options
are selected, the value of the last one is returned (instead of that of the
first one). If no options are present (not standard-compliant)
SelectElement.value still returns None.
GH#261: The HTMLParser() now supports the huge_tree option.
Patch by stranac.
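
The new `SelectElement.value` rules for non-multi-selects from GH#255 above can be restated as a small pure-Python model. This is not lxml's code; `single_select_value` is a hypothetical function that takes `(value, selected)` pairs in document order.

```python
def single_select_value(options):
    """Model the 4.2.0 defaults for a non-multi <select>.

    options: list of (value, selected) tuples in document order.
    """
    if not options:
        # No options at all (not standard-compliant): still None.
        return None
    selected = [value for value, is_selected in options if is_selected]
    if selected:
        # Several selected options: the last one wins.
        return selected[-1]
    # Nothing selected: the first option's value, as browsers do.
    return options[0][0]


print(single_select_value([("a", False), ("b", False)]))
print(single_select_value([("a", True), ("b", True)]))
print(single_select_value([]))
```

Code that relied on `None` for "nothing selected" or on the first selected value is the part most likely to change behavior when moving from 3.8.0 to 4.x.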

Bugs fixed

LP#1551797: Some XSLT messages were not captured by the transform error log.
LP#1737825: Crash at shutdown after an interrupted iterparse run with XMLSchema
validation.

Other changes

`v4.1.1`

==================

Rebuild with Cython 0.27.3 to improve support for Py3.7.

`v4.1.0`

==================

Features added

ElementPath supports text predicates for current node, like "[.='text']".
ElementPath allows spaces in predicates.
Custom Element classes and XPath functions can now be registered with a
decorator rather than explicit dict assignments.
Static Linux wheels are now built with link time optimisation (LTO) enabled.
This should have a beneficial impact on the overall performance by providing
a tighter compiler integration between lxml and libxml2/libxslt.
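
The ElementPath text predicate for the current node, listed above, looks like this in practice. The sample document is made up, and the fallback import is an assumption of this sketch: the standard library's ElementTree supports the same predicate syntax since Python 3.7.

```python
try:
    from lxml import etree  # text predicates on "." need lxml 4.1.0 or later
except ImportError:
    # Assumption for this sketch: fall back to the standard library.
    import xml.etree.ElementTree as etree

root = etree.fromstring(
    "<list><item>a</item><item>b</item><item>b</item></list>"
)

# Select <item> children whose own text equals 'b'.
matches = root.findall("item[.='b']")

# The same predicate written with spaces for readability.
spaced = root.findall("item[. = 'b']")

print(len(matches), len(spaced))
```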

Bugs fixed

LP#1722776: Requesting non-Element objects like comments from a document with
PythonElementClassLookup could fail with a TypeError.

`v4.0.0`

==================

Features added

The ElementPath implementation is now compiled using Cython,
which speeds up the .find*() methods quite significantly.
The modules lxml.builder, lxml.html.diff and lxml.html.clean
are also compiled using Cython in order to speed them up.
xmlfile() supports async coroutines using async with and await.
iterwalk() has a new method skip_subtree() that prevents walking into
the descendants of the current element.
RelaxNG.from_rnc_string() accepts a base_url argument to
allow relative resource lookups.
The XSLT result object has a new method .write_output(file) that serialises
output data into a file according to the <xsl:output> configuration.

Bugs fixed

GH#251: HTML comments were handled incorrectly by the soupparser.
Patch by mozbugbox.
LP#1654544: The html5parser no longer passes the useChardet option
if the input is a Unicode string, unless explicitly requested. When parsing
files, the default is to enable it when a URL or file path is passed (because
the file is then opened in binary mode), and to disable it when reading from
a file(-like) object.

Note: This is a backwards incompatible change of the default configuration.
If your code parses byte strings/streams and depends on character detection,
please pass the option guess_charset=True explicitly, which already worked
in older lxml versions.
LP#1703810: etree.fromstring() failed to parse UTF-32 data with BOM.
LP#1526522: Some RelaxNG errors were not reported in the error log.
LP#1567526: Empty and plain text input raised a TypeError in soupparser.
LP#1710429: Uninitialised variable usage in HTML diff.
LP#1415643: The closing tags context manager in xmlfile() could continue
to output end tags even after writing failed with an exception.
LP#1465357: xmlfile.write() now accepts and ignores None as input argument.
Compilation under Py3.7-pre failed due to a modified function signature.

Other changes

The main module source files were renamed from lxml.*.pyx to plain
*.pyx (e.g. etree.pyx) to simplify their handling in the build
process. Care was taken to keep the old header files as fallbacks for
code that compiles against the public C-API of lxml, but it might still
be worth validating that third-party code does not notice this change.

Configuration

📅 Schedule: "" (UTC).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

If you want to rebase/retry this PR, click this checkbox.

This PR has been generated by WhiteSource Renovate. View repository job log here.


Update dependency lxml to v4 [SECURITY]

7f44feb

viezly bot commented Mar 7, 2022

Pull request by bot. No need to analyze.

