Pull request by bot. No need to analyze.
This PR contains the following updates:
==3.8.0 -> ==4.6.5
GitHub Vulnerability Alerts
CVE-2020-27783
An XSS vulnerability was discovered in python-lxml's clean module. The module's parser didn't properly imitate browsers, which caused different behaviors between the sanitizer and the user's page. A remote attacker could exploit this flaw to run arbitrary HTML/JS code.
CVE-2021-28957
An XSS vulnerability was discovered in the python lxml clean module versions before 4.6.3. When disabling the safe_attrs_only and forms arguments, the Cleaner class does not remove the formaction attribute, allowing for JS to bypass the sanitizer. A remote attacker could exploit this flaw to run arbitrary JS code on users who interact with incorrectly sanitized HTML. This issue is patched in lxml 4.6.3.
CVE-2021-43818
Impact
The HTML Cleaner in lxml.html lets certain crafted script content pass through, as well as script content in SVG files embedded using data URIs.
Users that employ the HTML cleaner in a security relevant context should upgrade to lxml 4.6.5.
Patches
The issue has been resolved in lxml 4.6.5.
Workarounds
None.
References
The issues are tracked under the report IDs GHSL-2021-1037 and GHSL-2021-1038.
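The advisories above name 4.6.5 as the first release containing all three fixes. A minimal sketch of a version gate follows; the helper names parse_version and is_patched are hypothetical, and the parser only looks at the leading integers of each dotted component, so pre-release suffixes are not distinguished.

```python
import re

# First release that contains the fixes for all three CVEs listed above.
PATCHED = (4, 6, 5)


def parse_version(text):
    """Turn '4.6.5' or '4.6.5.post1' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(match.group()))
    return tuple(parts)


def is_patched(version_text, minimum=PATCHED):
    """Return True if version_text is at least the patched release."""
    version = parse_version(version_text)
    # Pad short versions such as '5.0' so the tuple comparison is fair.
    version = version + (0,) * (len(minimum) - len(version))
    return version >= minimum


if __name__ == "__main__":
    for candidate in ("3.8.0", "4.6.3", "4.6.5"):
        print(candidate, is_patched(candidate))
```

The installed version string could be obtained with importlib.metadata.version("lxml") on Python 3.8+, since the lxml.__version__ attribute is only available from lxml 4.5.0 onwards.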
Release Notes
lxml/lxml
v4.6.5
==================
Bugs fixed
A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script
content through SVG images (CVE-2021-43818).
A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script
content through CSS imports and other crafted constructs (CVE-2021-43818).
v4.6.4
==================
Features added
GH#317: A new property system_url was added to DTD entities.
Patch by Thirdegree.
GH#314: The STATIC_* variables in setup.py can now be passed via env vars.
Patch by Isaac Jurado.
v4.6.3
==================
Bugs fixed
which allowed JavaScript to pass through. The cleaner now removes the HTML5
formaction attribute.
v4.6.2
==================
Bugs fixed
which allowed JavaScript to pass through. The cleaner now removes more sneaky
"style" content.
v4.6.1
==================
Bugs fixed
JavaScript to pass through. The cleaner now removes more sneaky "style" content.
v4.6.0
==================
Features added
GH#310: lxml.html.InputGetter supports __len__() to count the number of input fields.
Patch by Aidan Woolley.
lxml.html.InputGetter has a new .items() method to ease processing all input fields.
lxml.html.InputGetter.keys() now returns the field names in document order.
GH-309: The API documentation is now generated using sphinx-apidoc.
Patch by Chris Mayo.
Bugs fixed
LP#1869455: C14N 2.0 serialisation failed for unprefixed attributes
when a default namespace was defined.
TreeBuilder.close() raised AssertionError in some error cases where it
should have raised XMLSyntaxError. It now raises a combined exception to
keep up backwards compatibility, while switching to XMLSyntaxError as an
interface.
v4.5.2
==================
Bugs fixed
Cleaner() now validates that only known configuration options can be set.
LP#1882606: Cleaner.clean_html() discarded comments and PIs regardless of the
corresponding configuration option, if remove_unknown_tags was set.
LP#1880251: Instead of globally overwriting the document loader in libxml2, lxml now
sets it per parser run, which improves the interoperability with other users of libxml2
such as libxmlsec.
LP#1881960: Fix build in CPython 3.10 by using Cython 0.29.21.
The setup options "--with-xml2-config" and "--with-xslt-config" were accidentally renamed
to "--xml2-config" and "--xslt-config" in 4.5.1 and are now available again.
v4.5.1
==================
Bugs fixed
LP#1570388: Fix failures when serialising documents larger than 2GB in some cases.
LP#1865141, GH#298: QName values were not accepted by the el.iter() method.
Patch by xmo-odoo.
LP#1863413, GH#297: The build failed to detect libraries on Linux that are only
configured via pkg-config.
Patch by Hugh McMaster.
v4.5.0
==================
Features added
indent() was added to insert tail whitespace for pretty-printing an XML tree.
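As a rough illustration of the indent() feature noted above, the sketch below pretty-prints a small tree in place. It assumes lxml 4.5.0 or later; so that the snippet also runs without lxml installed, it falls back to the standard library's xml.etree.ElementTree, which offers a similarly named indent() since Python 3.9. The element names are made up for the example.

```python
try:
    from lxml import etree  # indent() is available from lxml 4.5.0
except ImportError:
    # Fallback assumption: the stdlib module has a compatible indent() in Python 3.9+.
    import xml.etree.ElementTree as etree

# Build a small tree without any whitespace between the elements.
root = etree.Element("root")
etree.SubElement(root, "a").text = "1"
etree.SubElement(root, "b").text = "2"

# indent() modifies the tree in place by setting .text and .tail whitespace.
etree.indent(root)

pretty = etree.tostring(root, encoding="unicode")
print(pretty)
```

Because the whitespace is stored in the tree itself, any later serialisation of root keeps the indentation.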
Bugs fixed
deletion disappeared silently instead of sticking with the node that was removed.
Other changes
MacOS builds are 64-bit-only by default.
Set CFLAGS and LDFLAGS explicitly to override it.
Linux/MacOS Binary wheels now use libxml2 2.9.10 and libxslt 1.1.34.
LP#1840234: The package version number is now available as lxml.__version__.
v4.4.3
==================
Bugs fixed
itertext() was missing tail text of comments and PIs since 4.4.0.
v4.4.2
==================
Bugs fixed
ElementInclude incorrectly rejected repeated non-recursive includes as recursive.
Patch by Rainer Hausdorf.
v4.4.1
==================
Bugs fixed
LP#1838252: The order of an OrderedDict was lost in 4.4.0 when passing it as
attrib mapping during element creation.
LP#1838521: The package metadata now lists the supported Python versions.
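LP#1838252 above concerns the order of attributes passed as a mapping during element creation. A small sketch of the fixed behaviour, assuming lxml 4.4.1 or later (with a fallback to the standard library's ElementTree, which also keeps the mapping order, so the snippet runs without lxml); the tag and attribute names are illustrative only.

```python
from collections import OrderedDict

try:
    from lxml import etree
except ImportError:
    # Fallback assumption: stdlib ElementTree keeps the attrib mapping order as well.
    import xml.etree.ElementTree as etree

# Deliberately not in alphabetical order.
attrib = OrderedDict([("zeta", "1"), ("alpha", "2"), ("mid", "3")])

# Pass the mapping as the attrib argument during element creation.
item = etree.Element("item", attrib)

# The attribute names come back in insertion order rather than sorted by name.
names = list(item.keys())
print(names)
```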
v4.4.0
==================
Features added
Element.clear() accepts a new keyword argument keep_tail=True to clear
everything but the tail text. This is helpful in some document-style use cases
and for clearing the current element in iterparse() and pull parsing.
When creating attributes or namespaces from a dict in Python 3.6+, lxml now
preserves the original insertion order of that dict, instead of always sorting
the items by name. A similar change was made for ElementTree in CPython 3.8.
See https://bugs.python.org/issue34160
Integer elements in lxml.objectify implement the __index__() special method.
GH#269: Read-only elements in XSLT were missing the nsmap property.
Original patch by Jan Pazdziora.
ElementInclude can now restrict the maximum inclusion depth via a max_depth
argument to prevent content explosion. It is limited to 6 by default.
The target object of the XMLParser can have start_ns() and end_ns() callback
methods to listen to namespace declarations.
The TreeBuilder has new arguments comment_factory and pi_factory to pass
factories for creating comments and processing instructions, as well as flag
arguments insert_comments and insert_pis to discard them from the tree when
set to false.
A C14N 2.0 (https://www.w3.org/TR/xml-c14n2/) implementation was added as
etree.canonicalize(), a corresponding C14NWriterTarget class, and a c14n2
serialisation method.
Bugs fixed
When writing to file paths that contain the URL escape character '%', the file
path could wrongly be mangled by URL unescaping and thus write to a different
file or directory. Code that writes to file paths that are provided by untrusted
sources, but that must work with previous versions of lxml, should best either
reject paths that contain '%' characters, or otherwise make sure that the path
does not contain maliciously injected '%XX' URL hex escapes for paths like '../'.
Assigning to Element child slices with negative step could insert the slice at
the wrong position, starting too far on the left.
Assigning to Element child slices with overly large step size could take very
long, regardless of the length of the actual slice.
Assigning to Element child slices of the wrong size could sometimes fail to
raise a ValueError (like a list assignment would) and instead assign outside
of the original slice bounds or leave parts of it unreplaced.
The comment and pi events in iterwalk() were never triggered, and
instead, comments and processing instructions in the tree were reported as
start elements. Also, when walking an ElementTree (as opposed to its root
element), comments and PIs outside of the root element are now reported.
LP#1827833: The RelaxNG compact syntax support was broken with recent versions
of rnc2rng.
LP#1758553: The HTML elements source and track were added to the list
of empty tags in lxml.html.defs.
Registering a prefix other than "xml" for the XML namespace is now rejected.
Failing to write XSLT output to a file could raise a misleading exception.
It now raises IOError.
Other changes
Support for Python 3.4 was removed.
When using Element.find*() with prefix-namespace mappings, the empty string
is now accepted to define a default namespace, in addition to the previously
supported None prefix. Empty strings are more convenient since they keep
all prefix keys in a namespace dict strings, which simplifies sorting etc.
The ElementTree.write_c14n() method has been deprecated in favour of the
long preferred ElementTree.write(f, method="c14n"). It will be removed
in a future release.
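The 4.4.0 note above about Element.find*() accepting the empty string as a default-namespace prefix can be sketched as follows. This assumes lxml 4.4.0 or later; the fallback to the standard library's ElementTree (which accepts the same mapping since Python 3.8) is only there so the snippet runs without lxml. The Atom document is a made-up example.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback assumption: stdlib ElementTree accepts the '' prefix in Python 3.8+.
    import xml.etree.ElementTree as etree

ATOM = "http://www.w3.org/2005/Atom"
feed = etree.fromstring('<feed xmlns="%s"><title>Example</title></feed>' % ATOM)

# The empty-string key declares the default namespace for unprefixed names
# in the path, so "title" is looked up as {http://www.w3.org/2005/Atom}title.
namespaces = {"": ATOM}
title = feed.find("title", namespaces)

print(title.text)
```

Keeping every key in the mapping a string is what makes operations such as sorted(namespaces) straightforward, which is the convenience the changelog entry mentions.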
v4.3.5
==================
v4.3.4
==================
v4.3.3
==================
Bugs fixed
_XSLTResultTree.write_output().
v4.3.2
==================
Bugs fixed
Other changes
v4.3.1
==================
Bugs fixed
entity references.
Other changes
v4.3.0
==================
Features added
The module lxml.sax is compiled using Cython in order to speed it up.
GH#267: lxml.sax.ElementTreeProducer now preserves the namespace prefixes.
If two prefixes point to the same URI, the first prefix in alphabetical order
is used. Patch by Lennart Regebro.
Updated ISO-Schematron implementation to 2013 version (now MIT licensed)
and the corresponding schema to the 2016 version (with optional "properties").
Other changes
GH#270, GH#271: Support for Python 2.6 and 3.3 was removed.
Patch by hugovk.
The minimum dependency versions were raised to libxml2 2.9.2 and libxslt 1.1.27,
which were released in 2014 and 2012 respectively.
Built with Cython 0.29.2.
v4.2.6
==================
Bugs fixed
LP#1799755: Fix a DeprecationWarning in Py3.7+.
Import warnings in Python 3.6+ were resolved.
v4.2.5
==================
Bugs fixed
Security problem found by Omar Eissa. (CVE-2018-19787)
v4.2.4
==================
Features added
pkg-config for build configuration.
Patch by Patrick Griffis.
Bugs fixed
Element.insert().
Patch by Alexander Weggerle.
v4.2.3
==================
Bugs fixed
v4.2.2
==================
Bugs fixed
GH#266: Fix sporadic crash during GC when parse-time schema validation is used
and the parser participates in a reference cycle.
Original patch by Julien Greard.
GH#265: lxml no longer links against zlib as a shared library, only on static builds.
Patch by Nehal J Wani.
v4.2.1
==================
Bugs fixed
LP#1755825: iterwalk() failed to return the 'start' event for the initial
element if a tag selector is used.
LP#1756314: Failure to import 4.2.0 into PyPy due to a missing library symbol.
LP#1727864, GH#258: Add "-isysroot" linker option on MacOS as needed by XCode 9.
v4.2.0
==================
Features added
GH#255: SelectElement.value returns more standard-compliant and
browser-like defaults for non-multi-selects. If no option is selected, the
value of the first option is returned (instead of None). If multiple options
are selected, the value of the last one is returned (instead of that of the
first one). If no options are present (not standard-compliant)
SelectElement.value still returns None.
GH#261: The HTMLParser() now supports the huge_tree option.
Patch by stranac.
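The default-value rule described for SelectElement.value in GH#255 can be modelled in a few lines. This is only a plain-Python sketch of the rule as worded above, not lxml's implementation; the function name select_value and the (value, selected) tuple representation are assumptions made for the example.

```python
def select_value(options):
    """Model the GH#255 rule for a non-multi select.

    options is a list of (value, selected) tuples in document order.
    """
    if not options:
        # No options at all (not standard-compliant): still None.
        return None
    selected = [value for value, is_selected in options if is_selected]
    if selected:
        # Several options marked as selected: the last one wins.
        return selected[-1]
    # Nothing selected: fall back to the first option, as browsers do.
    return options[0][0]


if __name__ == "__main__":
    print(select_value([("a", False), ("b", False)]))
    print(select_value([("a", True), ("b", False), ("c", True)]))
    print(select_value([]))
```

Code that relied on the older behaviour (None when nothing is selected, or the first of several selected options) may see different values after this upgrade.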
Bugs fixed
LP#1551797: Some XSLT messages were not captured by the transform error log.
LP#1737825: Crash at shutdown after an interrupted iterparse run with XMLSchema
validation.
Other changes
v4.1.1
==================
v4.1.0
==================
Features added
ElementPath supports text predicates for current node, like "[.='text']".
ElementPath allows spaces in predicates.
Custom Element classes and XPath functions can now be registered with a
decorator rather than explicit dict assignments.
Static Linux wheels are now built with link time optimisation (LTO) enabled.
This should have a beneficial impact on the overall performance by providing
a tighter compiler integration between lxml and libxml2/libxslt.
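The ElementPath text predicate for the current node mentioned in the 4.1.0 features, "[.='text']", can be tried out as below. The sketch assumes lxml 4.1.0 or later and falls back to the standard library's ElementTree (which supports the same predicate since Python 3.7) so that it runs without lxml; the sample document is invented.

```python
try:
    from lxml import etree
except ImportError:
    # Fallback assumption: stdlib ElementTree supports [.='text'] in Python 3.7+.
    import xml.etree.ElementTree as etree

doc = etree.fromstring(
    "<list><item>apple</item><item>pear</item><item>apple</item></list>"
)

# Select only those <item> children whose own text content equals 'apple'.
apples = doc.findall("item[.='apple']")
pears = doc.findall("item[.='pear']")

print(len(apples), len(pears))
```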
Bugs fixed
PythonElementClassLookup could fail with a TypeError.
v4.0.0
==================
Features added
The ElementPath implementation is now compiled using Cython,
which speeds up the .find*() methods quite significantly.
The modules lxml.builder, lxml.html.diff and lxml.html.clean are also
compiled using Cython in order to speed them up.
xmlfile() supports async coroutines using async with and await.
iterwalk() has a new method skip_subtree() that prevents walking into
the descendants of the current element.
RelaxNG.from_rnc_string() accepts a base_url argument to
allow relative resource lookups.
The XSLT result object has a new method .write_output(file) that serialises
output data into a file according to the <xsl:output> configuration.
Bugs fixed
GH#251: HTML comments were handled incorrectly by the soupparser.
Patch by mozbugbox.
LP#1654544: The html5parser no longer passes the useChardet option
if the input is a Unicode string, unless explicitly requested. When parsing
files, the default is to enable it when a URL or file path is passed (because
the file is then opened in binary mode), and to disable it when reading from
a file(-like) object.
Note: This is a backwards incompatible change of the default configuration.
If your code parses byte strings/streams and depends on character detection,
please pass the option guess_charset=True explicitly, which already worked
in older lxml versions.
LP#1703810: etree.fromstring() failed to parse UTF-32 data with BOM.
LP#1526522: Some RelaxNG errors were not reported in the error log.
LP#1567526: Empty and plain text input raised a TypeError in soupparser.
LP#1710429: Uninitialised variable usage in HTML diff.
LP#1415643: The closing tags context manager in xmlfile() could continue
to output end tags even after writing failed with an exception.
LP#1465357: xmlfile.write() now accepts and ignores None as input argument.
Compilation under Py3.7-pre failed due to a modified function signature.
Other changes
lxml.*.pyx to plain *.pyx (e.g. etree.pyx) to simplify their handling in the build
process. Care was taken to keep the old header files as fallbacks for
code that compiles against the public C-API of lxml, but it might still
be worth validating that third-party code does not notice this change.
Configuration
📅 Schedule: "" (UTC).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by WhiteSource Renovate. View repository job log here.