Skip to content

Latest commit

 

History

History
44 lines (32 loc) · 1.22 KB

File metadata and controls

44 lines (32 loc) · 1.22 KB

HTMLParser

HTML parser written in Python

It implements most of needed functions for convenient working with HTML DOM.

Features

  • Parsing from string or from URL(with or without connection);
  • All DOM readonly functions;
  • CSS query selectors;

Warnings

When using querySelect, please keep in mind some differences from native CSS selectors:

  • when using selectors like querySelectorAll("input[type='text']"), attribute value should always be quoted;
  • when using complex selectors like querySelectorAll("div > li > div"), there should be at least one space between each selector and operator.

Usage

From URL:

  from parser import *
  dom = HTMLDomParser(PARSER_MODE["URL"], "http://my_favourite_web_site.zzz")
  doc = dom.getDocument()
  divs = doc.getElementsByTagName("div")
  firstDiv = divs[0]
  firstDivFirstChild = firstDiv.firstElementChild()
  secondDiv = divs[1]
  secondDiv2 = firstDiv.nextElementSibling()

  navs = doc.getElementsByClassName("nav")

  classyDivs = doc.querySelectorAll("div[class]")
  divLiDiv = doc.querySelectorAll("div > li > div")

  ...

Or from string:

  from parser import *
  dom = HTMLDomParser(PARSER_MODE["RAW"], "<html><head>...</head><body>...</body></html>")