forked from edsu/microdata

python library for extracting html5 microdata


kmartino/microdata

microdata.py is a small utility library for extracting HTML5 Microdata from HTML. It depends on html5lib to do the heavy lifting of parsing the HTML and building the DOM. For more about HTML5 Microdata, see Mark Pilgrim's chapter on it in Dive Into HTML5.
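To make concrete what "extracting microdata" involves, here is a minimal, stdlib-only sketch in Python 3. It is not microdata.py's implementation (which builds a DOM with html5lib); it swaps in `html.parser.HTMLParser`, assumes reasonably well-formed markup with explicit end tags, and covers only a simplified subset of the value rules: `itemscope` starts an item, `itemtype` and `itemid` describe it, `itemprop` adds a property to the nearest enclosing item, and some elements (`a`, `img`, `meta`, `time`, ...) take their value from an attribute rather than their text. The names `MicrodataSketch` and `extract_items` are hypothetical.

```python
from html.parser import HTMLParser

# Elements whose property value comes from an attribute instead of their text
# (a simplified subset of the microdata value rules).
VALUE_ATTR = {
    "meta": "content", "a": "href", "area": "href", "link": "href",
    "img": "src", "audio": "src", "video": "src", "source": "src",
    "embed": "src", "iframe": "src", "track": "src",
    "object": "data", "time": "datetime", "data": "value", "meter": "value",
}
# Elements that never have an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataSketch(HTMLParser):
    """Hypothetical sketch: collect items as {"type", "id", "properties"} dicts."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items
        self.scopes = []  # stack of items whose element is still open
        self.open = []    # open elements: [tag, opened_item, text_buf, names, owner]

    @staticmethod
    def _add(item, names, value):
        # Every property holds a list, since a name may appear more than once.
        for name in names:
            item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        names = (a.get("itemprop") or "").split()
        owner = self.scopes[-1] if self.scopes else None
        item = None
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "id": a.get("itemid"),
                    "properties": {}}
            if names and owner is not None:
                self._add(owner, names, item)  # nested item is the value
            else:
                self.items.append(item)        # no enclosing item: top-level
        text = None
        if names and owner is not None and item is None:
            attr = VALUE_ATTR.get(tag)
            if attr and a.get(attr) is not None:
                self._add(owner, names, a[attr])
            elif tag not in VOID:
                text = []                      # value is the element's text
        if tag in VOID:
            return                             # no end tag will follow
        if item is not None:
            self.scopes.append(item)
        self.open.append([tag, item, text, names, owner])

    def handle_data(self, data):
        # Text counts toward every open text-valued property, however deep.
        for entry in self.open:
            if entry[2] is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        if not any(entry[0] == tag for entry in self.open):
            return                             # stray end tag: ignore it
        while self.open:
            t, item, text, names, owner = self.open.pop()
            if item is not None:
                self.scopes.pop()
            if text is not None:
                self._add(owner, names, "".join(text))
            if t == tag:
                break


def extract_items(page):
    parser = MicrodataSketch()
    parser.feed(page)
    parser.close()
    return parser.items


page = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <a itemprop="colleagues" href="http://www.xyz.edu/students/alicejones.html">Alice</a>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="addressLocality">Seattle</span>
  </div>
</div>
"""
person = extract_items(page)[0]
print(person["properties"]["name"])  # ['Jane Doe']
```

The nested `address` item ends up as a dict inside the parent's property list, which mirrors the shape of the JSON that microdata.py prints in the example below.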

Here's the basic usage, with https://raw.github.com/edsu/microdata/master/test-data/example.html as the input document:

>>> import microdata
>>> items = microdata.get_items(open("test-data/example.html"))
>>> item = items[0]
>>> item.itemtype
u"http://schema.org/Person"
>>> item.name
u"Jane Doe"
>>> item.colleagues
u"http://www.xyz.edu/students/alicejones.html"
>>> item.get_all('colleagues')
[u"http://www.xyz.edu/students/alicejones.html", u"http://www.xyz.edu/students/bobsmith.html"]
>>> print item.json()
{ 
  "$itemtype": "http://schema.org/Person",
  "$itemid": "http://www.xyz.edu/~jane",
  "colleagues": [
    "http://www.xyz.edu/students/alicejones.html",
    "http://www.xyz.edu/students/bobsmith.html"
  ],
  "name": [
    "Jane Doe"
  ],
  "url": [
    "www.janedoe.com"
  ],
  "image": [
    "janedoe.jpg"
  ],
  "address": [
    { 
      "$itemtype": "http://schema.org/PostalAddress",
      "addressLocality": [
        "Seattle"
      ],
      "streetAddress": [
        "\n          20341 Whitworth Institute\n          405 N. Whitworth\n" 
      ],
      "postalCode": [
        "98052"
      ],
      "addressRegion": [
        "WA"
      ]
    }
  ],
  "telephone": [
    "(425) 123-4567"
  ],
  "jobTitle": [
    "Professor"
  ],
  "email": [
    "mailto:jane-doe@xyz.edu"
  ]
}
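The JSON above stores every property as a list, even single-valued ones such as `name`, and nests items such as `address` as objects with their own `$itemtype`. Code that consumes this output therefore tends to want both a "first value" accessor (like `item.name`, which returned only the first colleague) and an "all values" accessor (like `item.get_all('colleagues')`). Below is a Python 3 sketch that works on a trimmed copy of that printed output; the helpers `first`, `all_values` and `squash` are hypothetical and not part of microdata.py. It also shows one way to collapse the source-HTML whitespace kept in `streetAddress`.

```python
import json

# A trimmed copy of the JSON printed by item.json() above.
doc = """
{
  "$itemtype": "http://schema.org/Person",
  "$itemid": "http://www.xyz.edu/~jane",
  "colleagues": [
    "http://www.xyz.edu/students/alicejones.html",
    "http://www.xyz.edu/students/bobsmith.html"
  ],
  "name": ["Jane Doe"],
  "address": [
    {
      "$itemtype": "http://schema.org/PostalAddress",
      "addressLocality": ["Seattle"],
      "streetAddress": ["\\n          20341 Whitworth Institute\\n          405 N. Whitworth\\n"],
      "postalCode": ["98052"],
      "addressRegion": ["WA"]
    }
  ]
}
"""


def first(item, name, default=None):
    """Hypothetical helper: first value of a property, like item.name above."""
    values = item.get(name, [])
    return values[0] if values else default


def all_values(item, name):
    """Hypothetical helper: every value of a property, like item.get_all(name)."""
    return list(item.get(name, []))


def squash(text):
    """Collapse runs of whitespace left over from the HTML source."""
    return " ".join(text.split())


person = json.loads(doc)
address = first(person, "address")             # nested item is a plain dict
print(person["$itemtype"])                     # http://schema.org/Person
print(first(person, "name"))                   # Jane Doe
print(len(all_values(person, "colleagues")))   # 2
print(squash(first(address, "streetAddress"))) # 20341 Whitworth Institute 405 N. Whitworth
```

Passing a `default` to `first` keeps lookups of absent properties from raising, which matters because microdata gives no guarantee that any given property is present.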
