  1. Common Crawl Is Doing the AI Industry’s Dirty Work - The Atlantic

    Nov 4, 2025 · Editor’s note: This work is part of AI Watchdog, The Atlantic’s ongoing investigation into the generative-AI industry. The Common Crawl Foundation is little known outside of …

  2. Common Crawl Criticized for 'Quietly Funneling Paywalled ...

    4 days ago · For more than a decade, the nonprofit Common Crawl "has been scraping billions of webpages to build a massive archive of the internet," notes the Atlantic, making it freely …

  3. Common Crawl accused of giving paywalled content to AI ...

    6 days ago · Common Crawl’s massive internet archive may be giving AI companies access to paywalled journalism, according to a new report.

  4. The Company Quietly Funneling Paywalled Articles to AI

    Nov 4, 2025 · In the process, my reporting has found, Common Crawl has opened a back door for AI companies to train their models with paywalled articles from major news websites.

  5. The company quietly funneling paywalled articles to AI ...

    6 days ago · The company quietly funneling paywalled articles to AI developers. The Atlantic / Alex Reisner / Nov 5, 2025. “A search for nytimes.com in any crawl from 2013 through 2022 shows …

  6. Common Crawl deletes 2M Dutch news articles | Cybernews

    6 days ago · At the request of BREIN, Common Crawl has removed over two million news articles belonging to popular Dutch news outlets from its AI training dataset. According to BREIN, a …

  7. Common Crawl's Controversial Role in AI Training Raises ...

    Nov 4, 2025 · The Common Crawl Foundation has been scraping the internet for over a decade, creating a vast archive used by AI companies to train models, including paywalled content. …