We are happy to announce a new release of the WebDataCommons RDFa, Microdata, and Microformat data sets.
The data sets were extracted from the November 2013 version of the Common Crawl, which covers 2.24 billion HTML pages originating from 12.8 million websites (pay-level-domains).
Altogether we discovered structured data within 585 million HTML pages out of the 2.24 billion pages contained in the crawl (26%). These pages originate from 1.7 million different pay-level-domains out of the 12.8 million pay-level-domains covered by the crawl (13%).
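The two percentages above follow directly from the rounded counts given in this announcement. A minimal sketch of the arithmetic is shown here; the variable names and the `share` helper are only illustrative and are not part of any WebDataCommons tooling.

```python
# Counts as stated in the announcement (rounded figures, not exact crawl totals).
pages_total = 2.24e9        # HTML pages in the November 2013 crawl
pages_with_data = 585e6     # pages containing structured data
plds_total = 12.8e6         # pay-level-domains covered by the crawl
plds_with_data = 1.7e6      # pay-level-domains with structured data


def share(part, whole):
    """Return part/whole expressed as a percentage (illustrative helper)."""
    return 100.0 * part / whole


page_share = share(pages_with_data, pages_total)
pld_share = share(plds_with_data, plds_total)

print(f"pages with structured data: {page_share:.1f}%")
print(f"pay-level-domains with structured data: {pld_share:.1f}%")
```

Rounded to whole numbers, these come out at the 26% and 13% quoted above.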
Approximately 471 thousand of these websites use RDFa, while 463 thousand websites use Microdata. Microformats are used on 1 million websites within the crawl.