HAI Seminar with Common Crawl
Schedule
Wed Oct 22 2025 at 12:00 pm to 01:15 pm UTC-07:00
Location
Gates Computer Science Building Room 119 | Stanford, CA

About this Event
Preserving Humanity's Knowledge and Making it Accessible | Addressing Challenges of Public Web Data
Visit our website to learn more about the event agenda, speakers, and other details
The Common Crawl Foundation is dedicated to preserving humanity's knowledge and making it accessible through its free public web dataset, a vital resource since 2008. As AI development accelerates, concerns have emerged about the accessibility and transparency of public web data. Three developments in particular affect open datasets: robots.txt exclusions, legal demands, and "bot defenses." Two of these are not publicly visible and are poorly understood. We will present insights from a new data product that uses Common Crawl's crawl metadata to visually explore these three problems, and we will advocate for greater transparency and informed solutions for the future of public web data.
Details:
Time: 12:00 pm - 1:15 pm PT
Location: Gates Computer Science Building, Room 119, 353 Jane Stanford Way, Stanford, CA 94305.
Price: USD 0.00
