AWS Probes AI Startup Perplexity Over Alleged Data Scraping Violations

Amazon investigates Perplexity AI over potential data-scraping violations: Amazon Web Services is looking into whether AI startup Perplexity is violating its terms of service by scraping web content without permission, following reports from multiple news outlets.

Accusations of improper data scraping: Several publications, including Forbes and Wired, have accused Perplexity of swiping their web archives to train its AI models without consent or compensation:

  • Forbes alleged that Perplexity is creating “knockoff stories” using similar wording and lifted fragments from its articles without adequate citation.
  • Wired identified an IP address it believes Perplexity is using to crawl its sites and those of its parent company, Condé Nast, in violation of the robots.txt standard.
  • The Guardian, Forbes, and The New York Times also told Wired they have seen the same IP address on their servers.

AWS investigating Perplexity’s practices: An AWS representative confirmed that Amazon is investigating whether Perplexity is breaking its rules, which prohibit using AWS services for any illegal activity:

  • All AWS clients must follow the instructions in websites’ robots.txt files, which tell automated crawlers which parts of a site they may and may not access.
  • However, Perplexity claims it is following the rules and that AWS is not looking into the startup beyond the initial Wired report.
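
For readers unfamiliar with how robots.txt works, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The sample file, the bot name `ExampleBot`, and the `example.com` URLs are hypothetical and chosen only for illustration; they are not taken from any publisher's actual robots.txt or from Perplexity's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler from the whole site
# and blocks every other crawler from /private/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/private/draft"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

The file only states the site owner's wishes. Nothing in the protocol technically prevents a crawler from ignoring it, which is why compliance is handled through terms of service such as the AWS rules described above.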

Broader tensions over AI firms scraping web content: The Perplexity investigation highlights growing backlash against tech companies training AI models on web data without explicit permission:

  • Microsoft’s AI chief recently claimed any “open web” content is “fair use” for AI firms to scrape and monetize, sparking debate over the ethics of this practice.
  • The New York Times is suing OpenAI and Microsoft for alleged copyright infringement, claiming they used its articles to train their AI models without consent.
  • Some outlets like Semafor and TIME have proactively licensed their content to AI companies, while others are fighting back against nonconsensual scraping.

Perplexity’s ambitions and industry connections: Despite the controversy, Perplexity has positioned itself as a potential Google competitor with backing from major tech players:

  • The startup aims to offer an AI-powered “answer engine” and is supported by Jeff Bezos’ investment fund and Nvidia.
  • However, the similarities between some of Perplexity’s and Google’s search results have raised questions about the true extent of its innovation.

Analyzing deeper: The Perplexity investigation underscores the complex dynamics around AI and web scraping, with tech giants, startups, and content creators jockeying for control over valuable training data. As lawmakers and the public scrutinize these practices more closely, clearer regulations may be needed to balance AI innovation with intellectual property rights and content ownership. In the meantime, the outcome of Amazon’s investigation could set an important precedent for Perplexity and other AI firms relying on web data to fuel their models.
