Cloudflare Says Perplexity AI is Secretly Crawling Websites to Steal User Data and Ignore Privacy Rules

August 7, 2025
in Global News

The war over data privacy just got messier. Cloudflare has accused Perplexity AI, a fast-growing AI-powered search engine, of secretly crawling websites to collect data, even when those websites have explicitly instructed bots to stay out. According to Cloudflare, Perplexity has been disguising its identity, rotating IP addresses, and ignoring robots.txt files, which are standard tools websites use to say “do not scrape my content.”

If Cloudflare's account is accurate, Perplexity may be accessing and using website content without permission, raising significant concerns about how AI companies collect data and whether they follow the established rules of the web.

The Core Allegation

According to Cloudflare, Perplexity initially identifies itself correctly when crawling sites. However, when faced with network blocks or restrictions via robots.txt files, it allegedly switches tactics by:

  • Modifying its user agent to disguise crawling activity
  • Rotating IP addresses and ASNs to bypass restrictions
  • Using undeclared crawlers in addition to its public bots (PerplexityBot and Perplexity-User)
  • Ignoring robots.txt directives or, in some cases, not requesting the robots.txt file at all
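
For readers unfamiliar with how robots.txt works in practice, here is a minimal sketch of the check a well-behaved crawler is expected to run before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt content, the example.com URL, the ExampleBot name, and the is_allowed helper are assumptions made for illustration; only the PerplexityBot name comes from the reporting above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish to opt out of one crawler
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named crawler is disallowed; an unrelated crawler falls through to "*".
    print(is_allowed("PerplexityBot", "https://example.com/article"))
    print(is_allowed("ExampleBot", "https://example.com/article"))
```

The check is voluntary: nothing in the protocol stops a client from skipping it or from presenting a different user agent, which is the behavior Cloudflare alleges.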

Evidence from Cloudflare’s Investigation

Cloudflare claims it launched an investigation after receiving multiple complaints from customers who had explicitly prohibited Perplexity’s crawlers through robots.txt and Web Application Firewall (WAF) rules. Despite these measures, customers reported that Perplexity continued accessing their content.

To verify, Cloudflare:

  • Created controlled test environments using brand-new domains, implementing strict robots.txt rules to block all bots
  • Observed that Perplexity’s bots still retrieved restricted content
  • Detected attempts by Perplexity to impersonate a generic browser agent, mimicking Google Chrome on macOS when its declared crawler was blocked
  • Traced undeclared crawlers using machine learning and network signal analysis across tens of thousands of domains and millions of requests per day
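
As a rough illustration of why the test setup is meaningful, the sketch below evaluates a "block all bots" robots.txt against several user-agent strings. Under a wildcard rule, no automated client is permitted, so changing the user agent does not change what the file allows. The user-agent strings here are illustrative placeholders rather than the exact values Cloudflare observed, and permitted_agents is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that disallows every crawler, similar in spirit to the
# strict rules described for the test domains.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Illustrative user-agent strings (assumptions, not taken from Cloudflare's report).
CANDIDATE_AGENTS = {
    "declared_bot": "PerplexityBot",
    "declared_user_fetcher": "Perplexity-User",
    "generic_browser": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
}


def permitted_agents(robots_txt: str, url: str) -> dict:
    """Map each candidate agent name to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {name: parser.can_fetch(ua, url) for name, ua in CANDIDATE_AGENTS.items()}


if __name__ == "__main__":
    for name, allowed in permitted_agents(BLOCK_ALL, "https://example.com/page").items():
        print(f"{name}: allowed={allowed}")
```

Because the rule applies to every agent, any automated retrieval of content from such a domain runs counter to the site's stated wishes, whichever identity the client presents.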

The Technical Breakdown

Cloudflare’s findings show that Perplexity’s undeclared crawlers were:

  • Using IP addresses outside Perplexity’s official IP range
  • Rotating through these IPs and switching ASNs to avoid detection
  • Conducting large-scale scraping that Cloudflare says spanned tens of thousands of domains and millions of requests per day
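
One of the simpler signals mentioned above, whether a request originates from a crawler operator's published IP range, can be sketched with Python's standard ipaddress module. This is a simplified illustration and not Cloudflare's method, which the company says relied on machine learning and network signals. The IP ranges below are reserved documentation addresses standing in for a real published list, and the classify function and its labels are hypothetical.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), standing in for an
# operator's published crawler IP list. These are NOT real crawler ranges.
DECLARED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Declared crawler names from the article, lowercased for matching.
DECLARED_AGENTS = ("perplexitybot", "perplexity-user")


def classify(client_ip: str, user_agent: str) -> str:
    """Label a request by comparing its declared identity with its source IP."""
    ip = ipaddress.ip_address(client_ip)
    in_range = any(ip in network for network in DECLARED_RANGES)
    declares_bot = any(token in user_agent.lower() for token in DECLARED_AGENTS)

    if declares_bot and in_range:
        return "declared-crawler"
    if declares_bot:
        # Claims to be the crawler but comes from outside the published range.
        return "declared-agent-unlisted-ip"
    if in_range:
        # Comes from the published range without identifying itself.
        return "listed-ip-undeclared-agent"
    return "unidentified"


if __name__ == "__main__":
    print(classify("192.0.2.10", "PerplexityBot/1.0"))
    print(classify("203.0.113.5", "PerplexityBot/1.0"))
    print(classify("203.0.113.5", "Mozilla/5.0 (Macintosh) Chrome/124.0.0.0"))
```

A range check like this can only confirm or refute a declared identity. It cannot attribute "unidentified" traffic to any particular operator, which is why additional signals are needed to trace undeclared crawlers.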

In addition, Cloudflare reports that Perplexity continued to provide detailed answers about the content of the restricted test domains, even though those domains explicitly blocked all crawlers.

Why This Matters

For decades, the internet has operated on an implicit foundation of trust between site owners and automated crawlers. Protocols like robots.txt exist to balance functionality and fairness, ensuring that sites can manage automated access without resorting to aggressive measures.

Cloudflare’s statement underscores this principle:
“The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust.”

Violations of this trust, the company warns, undermine the principles that allow the web, and by extension the AI systems built on top of it, to function transparently.

The Broader Implications

This controversy isn’t just about one company. It raises broader questions about:

  • Ethical AI Development: Should AI companies honor standard web protocols, or is aggressive data acquisition a necessary evil in the race for better models?
  • Data Ownership and Consent: Who controls the content that AI scrapes, and how should consent be enforced?
  • Industry Regulation: Will this prompt calls for stronger governance and legal frameworks around AI-driven crawling?

Perplexity’s Response?

As of this writing, Perplexity has not issued an official statement addressing Cloudflare’s claims. The company, known for its rapid rise as a conversational AI competitor, now faces scrutiny not only from the tech community but potentially from regulators concerned with compliance and data ethics.

Bottom Line

The Cloudflare-Perplexity standoff signals the beginning of a larger battle over how AI companies acquire data, and whether transparency will remain a cornerstone of the internet or become collateral damage in the AI arms race.

© 2025 — Techsoma Middle East. All Rights Reserved