Cloudflare Says Perplexity AI is Secretly Crawling Websites to Steal User Data and Ignore Privacy Rules

by Ifeanyi Abraham
August 7, 2025
in Global News

The war over data privacy just got messier. Cloudflare has accused Perplexity AI, a fast-growing AI-powered search engine, of secretly crawling websites to collect data, even when those websites have explicitly instructed bots to stay out. According to Cloudflare, Perplexity has been disguising its identity, rotating IP addresses, and ignoring robots.txt files, which are standard tools websites use to say “do not scrape my content.”

If the allegations hold, Perplexity may be accessing and using website content without permission, raising significant concerns about how AI companies collect data and whether they are adhering to the established rules of the internet.

The Core Allegation

According to Cloudflare, Perplexity initially identifies itself correctly when crawling sites. However, when faced with network blocks or restrictions via robots.txt files, it allegedly switches tactics by:

  • Modifying its user agent to disguise crawling activity
  • Rotating IP addresses and ASNs to bypass restrictions
  • Using undeclared crawlers in addition to its public bots (PerplexityBot and Perplexity-User)
  • Ignoring robots.txt directives or, in some cases, not even requesting the robots.txt file
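
For context, here is a minimal sketch of the check a compliant crawler is expected to make before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the example.com URL, and the browser string are illustrative assumptions; only the bot names PerplexityBot and Perplexity-User come from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. The two named user agents
# are the public Perplexity bots mentioned in the article.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    # Illustrative generic browser string, not taken from Cloudflare's data.
    browser_like = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0.0.0"
    for agent in ("PerplexityBot", "Perplexity-User", browser_like):
        print(agent, "->", is_allowed(agent, url))
```

In this sketch both declared bots are refused, while a client presenting a generic browser string falls under the wildcard rule and is allowed. That is why robots.txt only works when crawlers identify themselves honestly: the file is a request, not an enforcement mechanism.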

Evidence from Cloudflare’s Investigation

Cloudflare claims it launched an investigation after receiving multiple complaints from customers who had explicitly prohibited Perplexity’s crawlers through robots.txt and Web Application Firewall (WAF) rules. Despite these measures, customers reported that Perplexity continued accessing their content.

To verify, Cloudflare:

  • Created controlled test environments using brand-new domains, implementing strict robots.txt rules to block all bots
  • Observed that Perplexity’s bots still retrieved restricted content
  • Detected attempts by Perplexity to impersonate a generic browser agent, mimicking Google Chrome on macOS when its declared crawler was blocked
  • Traced undeclared crawlers using machine learning and network signal analysis across tens of thousands of domains and millions of requests per day
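
The logic of such a test can be sketched in a few lines. This is not Cloudflare's tooling; it is a simplified illustration that assumes a brand-new, unpublicized domain whose robots.txt disallows everything, so any content request in the access log is treated as automated and unwanted. The Request record, the client_id field, and the find_violations function are hypothetical names introduced for this example.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Request(NamedTuple):
    client_id: str   # hypothetical stable identifier for a client
    user_agent: str  # User-Agent header the client presented
    path: str        # requested path


def find_violations(requests: Iterable[Request]) -> dict:
    """Report every client that fetched content from a disallow-all site.

    For each such client, record the paths it fetched and whether it ever
    requested /robots.txt at all.
    """
    fetched_robots = set()
    content_hits = defaultdict(list)
    for req in requests:
        if req.path == "/robots.txt":
            fetched_robots.add(req.client_id)
        else:
            content_hits[req.client_id].append(req.path)
    return {
        client: {
            "paths": paths,
            "requested_robots_txt": client in fetched_robots,
        }
        for client, paths in content_hits.items()
    }


if __name__ == "__main__":
    # Made-up log entries for illustration only.
    log = [
        Request("client-a", "PerplexityBot", "/robots.txt"),
        Request("client-a", "PerplexityBot", "/private/page"),
        Request("client-b", "Mozilla/5.0 (Macintosh) Chrome", "/private/page"),
    ]
    for client, report in find_violations(log).items():
        print(client, report)
```

The sketch separates the two behaviors Cloudflare describes: a client that read robots.txt and fetched content anyway, and a client that never asked for robots.txt in the first place.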

The Technical Breakdown

Cloudflare’s findings show that Perplexity’s undeclared crawlers were:

  • Using IP addresses outside Perplexity’s official IP range
  • Rotating through these IPs and switching ASNs to avoid detection
  • Scraping at large scale, which Cloudflare says it observed across tens of thousands of domains and millions of requests per day
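
Two of these signals, requests from outside a declared IP range and rotation across IPs and ASNs, are simple to express in code. The sketch below uses Python's standard ipaddress module. It is an illustration under stated assumptions, not Cloudflare's detection method, which the company says relies on machine learning and network signals. The CIDR block, IP addresses, and ASNs are reserved documentation values, not Perplexity's real ranges, and the "fingerprint" grouping key is a hypothetical stand-in for whatever signal ties requests to one crawler.

```python
import ipaddress
from collections import defaultdict

# Placeholder range taken from the reserved documentation blocks.
# This is NOT Perplexity's actual published IP range.
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def in_declared_range(ip: str) -> bool:
    """Return True if ip falls inside any declared network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DECLARED_RANGES)


def summarize_rotation(hits):
    """Summarize IP and ASN spread per crawler fingerprint.

    hits is an iterable of (fingerprint, ip, asn) tuples.
    """
    ips = defaultdict(set)
    asns = defaultdict(set)
    for fingerprint, ip, asn in hits:
        ips[fingerprint].add(ip)
        asns[fingerprint].add(asn)
    return {
        fp: {
            "distinct_ips": len(ips[fp]),
            "distinct_asns": len(asns[fp]),
            "outside_declared": sum(not in_declared_range(ip) for ip in ips[fp]),
        }
        for fp in ips
    }


if __name__ == "__main__":
    # Made-up observations using documentation-only IPs and ASNs.
    hits = [
        ("fp-1", "192.0.2.10", 64500),
        ("fp-1", "198.51.100.7", 64501),
        ("fp-1", "203.0.113.9", 64502),
    ]
    print(summarize_rotation(hits))
```

A single fingerprint that appears from many addresses and networks, most of them outside the operator's declared range, is the kind of pattern the bullets above describe.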

In addition, Cloudflare reports that Perplexity continued to give detailed answers about the content of the restricted test domains, even though those domains explicitly disallowed all crawling.

Why This Matters

For decades, the internet has operated on an implicit foundation of trust between site owners and automated crawlers. Protocols like robots.txt exist to balance functionality and fairness, ensuring that sites can manage automated access without resorting to aggressive measures.

Cloudflare’s statement underscores this principle:
“The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust.”

Violations of this trust, the company warns, undermine the principles that allow the web, and by extension the AI systems built on top of it, to function transparently.

The Broader Implications

This controversy isn’t just about one company. It raises broader questions about:

  • Ethical AI Development: Should AI companies honor standard web protocols, or is aggressive data acquisition a necessary evil in the race for better models?
  • Data Ownership and Consent: Who controls the content that AI scrapes, and how should consent be enforced?
  • Industry Regulation: Will this prompt calls for stronger governance and legal frameworks around AI-driven crawling?

Perplexity’s Response?

As of this writing, Perplexity has not issued an official statement addressing Cloudflare’s claims. The company, known for its rapid rise as a conversational AI competitor, now faces scrutiny not only from the tech community but potentially from regulators concerned with compliance and data ethics.

Bottom Line

The Cloudflare-Perplexity standoff signals the beginning of a larger battle over how AI companies acquire data, and whether transparency will remain a cornerstone of the internet or become collateral damage in the AI arms race.
