Skip to content

Latest commit

 

History

History
26 lines (16 loc) · 2.13 KB

lxml.md

File metadata and controls

26 lines (16 loc) · 2.13 KB

lxml

跟 ElementTree 的關係 {: #elementtree }

  • lxml - Processing XML and HTML with Python lxml.etree 除了 ElementTree API,也提供 lxml.etree 專有的API,以揭露底層 libxml2libxslt 專屬的功能,例如 XPath、XML Schema、XSLT 等。
  • The lxml.etree Tutorial 這份 Tutorial 主要是在講 ElementTree API,相較於另一份文件 lxml.etree specific API 則是從 "相較於 ElementTree API",lxml.etree 多了什麼" 的角度在寫。
  • ElementTree compatibility of lxml.etree 注重 lxml.etreeElementTree 的相容性,但還是有不相容的部份。
  • Benchmarks and Speed lxml.etree 在許多地方速度比 cElementTree 還快。
  • lxml FAQ - Frequently Asked Questions Who uses lxml? 提到 "Also note that the compatibility to the ElementTree library does not require projects to set a hard dependency on lxml - as long as they do not take advantage of lxml's enhanced feature set." 意思是可以因 lxml 提昇效能,但又不綁定 lxml,不過前題是不能用到 lxml 特有的 API。

HTML Parsing ??

  • Parsing HTML - Parsing XML and HTML with lxml #ril
    • HTML parsing is similarly simple. The parsers have a recover keyword argument that the HTMLParser sets by default. It lets libxml2 try its best to return a valid HTML tree with all content it can manage to parse. It will not raise an exception on parser errors. You should use libxml2 version 2.6.21 or newer to take advantage of this feature.

參考資料 {: #reference }

社群: