Comparison of HTML parsers
Parser | License | Implementation language(s) | Latest release date |
---|---|---|---|
Beautiful Soup | Python Software Foundation License | Python | 2012-08-20[1] |
html5lib | MIT License | Python and PHP | 2012-02-11[2] |
HTML::Parser | Perl license | Perl | 2011-10-11 |
HTML Tidy | W3C license | ANSI C | 2009-03-25[3] |
HtmlCleaner | BSD License[4] | Java | 2010-12-22[5] |
Jericho HTML Parser | Eclipse Public License | Java | 2012-10-30[6] |
jsdom | MIT License | JavaScript | 2012-07-12 |
jsoup | MIT License | Java | 2012-09-23[7] |
JTidy | JTidy License | Java | 2009-12-01[8] |
libxml2 HTMLparser | MIT License | C | 2012-09-11[9] |
NekoHTML | Apache License 2.0 | Java | 2012-11-05[10] |
TagSoup | Apache License 2.0 | Java | 2011-07-07 |
Validator.nu HTML Parser | MIT License | Java | 2012-06-05 |
Usage examples
Searching
Parser | Tag search | Class search | ID search |
---|---|---|---|
Beautiful Soup | .tag_name | .find("tag_name", {"class": "class_name"}) | .find(id=id_value) |
jQuery | $("tag_name") | $(".class_name") | $("#id_value") |
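To make the three lookups in the table concrete, here is a minimal sketch that uses only Python's standard-library html.parser module, which is not one of the parsers compared above. The class ElementCollector and the helpers find_by_tag, find_by_class and find_by_id are hypothetical names introduced for illustration. They are not part of Beautiful Soup or jQuery, and they return simple (tag, attributes) tuples rather than full element objects.

```python
from html.parser import HTMLParser


class ElementCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        # List of (tag_name, attributes_dict) tuples in document order.
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))


def find_by_tag(elements, tag_name):
    """Tag search: every element whose name equals tag_name."""
    return [e for e in elements if e[0] == tag_name]


def find_by_class(elements, class_name):
    """Class search: the class attribute is a space-separated list."""
    return [e for e in elements
            if class_name in (e[1].get("class") or "").split()]


def find_by_id(elements, id_value):
    """ID search: the first element with a matching id, or None."""
    for e in elements:
        if e[1].get("id") == id_value:
            return e
    return None


html = '<div id="main"><p class="intro lead">Hi</p><p>Bye</p></div>'
collector = ElementCollector()
collector.feed(html)
collector.close()
elements = collector.elements

paragraphs = find_by_tag(elements, "p")
intro = find_by_class(elements, "intro")
main = find_by_id(elements, "main")
```

The libraries in the table perform the same kinds of lookup on a full document tree, so their results also give access to child nodes and text, which this flat list of start tags does not.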
References
- [1]
- Downloads - html5lib - Library for working with HTML documents - Google Project Hosting
- HTML Tidy for Windows
- HtmlCleaner is distributed under BSD License
- Dec. 22, 2010: HtmlCleaner release 2.2
- Jericho HTML Parser - Browse /jericho-html/3.3 at SourceForge.net
- jsoup/CHANGES at master · jhy/jsoup · GitHub
- JTidy - Browse /JTidy at SourceForge.net
- libxml2 Releases
- NekoHTML | Change History