Live Research · January 2026

AI Coding Agents Hallucinate

Real-time research tracking hallucinations, failures, and security vulnerabilities across 10 vibe-coding platforms. Peer-reviewed data from arXiv, USENIX, and 10,000+ reviews. Continuously updated via automated signal collection from 16 data sources.

5–68%

Hallucination Rate

arXiv benchmarks

19.6%

Fake Packages

USENIX Security

24+

CVEs in 2025

NVD Database

33%

Trust AI

Stack Overflow '25

Documented Failures

12,473

from 16 data sources

Platforms Tracked

Cursor · Windsurf · Replit · Claude Code · GitHub Copilot · OpenAI Codex · Antigravity · Lovable · Bolt.new · Devin · v0.dev · Amazon Q

arXiv Papers · USENIX Security · Trustpilot · GitHub · Hacker News · Reddit · Stack Overflow · CVE Database

6/10

Below 2.0★ on Trustpilot

1.7×

More Bugs (AI vs Human)

10K+

Reviews Analyzed

$300-$4K

Lost Per Project

Trustpilot Ratings by Platform

Real user reviews from the past 12 months

Prompt-to-App Builders

Lovable

Best Rated

4.1★

794 reviews · 65% 5-star

Devin

Cognition AI

2.3★

47 reviews · 51% 1-star

Bolt.new

Lowest Rated

1.5★

119 reviews · 83% 1-star

Core Vibe IDEs

Replit

Most Reviews

3.6★

1,235 reviews · 52% 5-star

Claude Code

Anthropic

2.8★

89 reviews · 45% 1-star

GitHub Copilot

Microsoft

2.5★

342 reviews · 68% 1-star

OpenAI Codex

OpenAI

2.2★

156 reviews · 58% 1-star

Windsurf

Codeium

2.1★

23 reviews · 61% 1-star

Antigravity

Google

2.0★

28 reviews · 64% 1-star

Amazon Q

AWS

1.9★

31 reviews · 71% 1-star

Cursor

1.8★

112 reviews · 74% 1-star

v0.dev

Vercel

1.8★

25 reviews · 72% 1-star

Dec 2025 Research

AI Code Has 1.7× More Defects Than Human Code

CodeRabbit analyzed 470 GitHub PRs. Critical issues, security vulnerabilities, and performance bugs were all significantly higher in AI-generated code.

1.7×

More Bugs

2×

More Revisions

8×

Performance Issues

The Production Funnel

Survival rates from vibe to production

Initial Prototype: 90%+ succeed

Feature Iteration: 60% survive

Auth & Payments: 25% survive

Production Deploy: 15% survive

Real Users: ~10% survive

⚰️

Based on synthesis of 10,000+ reviews and documented case studies
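The funnel figures can be turned into step-by-step conversion rates to see where the largest share of projects drops out. The sketch below is illustrative only. It assumes the percentages are cumulative against the same starting population, and it treats "90%+" and "~10%" as exactly 0.90 and 0.10. The function names are hypothetical.

```python
# Cumulative survival at each stage, taken from the funnel above.
# Assumption: all rates share the same base population, and the
# approximate values "90%+" and "~10%" are treated as 0.90 and 0.10.
FUNNEL = [
    ("Initial Prototype", 0.90),
    ("Feature Iteration", 0.60),
    ("Auth & Payments", 0.25),
    ("Production Deploy", 0.15),
    ("Real Users", 0.10),
]


def stage_conversion(funnel):
    """Return (from_stage, to_stage, fraction that continues) per step."""
    steps = []
    for (prev_name, prev_rate), (name, rate) in zip(funnel, funnel[1:]):
        steps.append((prev_name, name, rate / prev_rate))
    return steps


def steepest_drop(funnel):
    """Return the step with the lowest stage-to-stage conversion."""
    return min(stage_conversion(funnel), key=lambda step: step[2])


if __name__ == "__main__":
    for src, dst, frac in stage_conversion(FUNNEL):
        print(f"{src} -> {dst}: {frac:.0%} continue")
    src, dst, frac = steepest_drop(FUNNEL)
    print(f"Steepest drop: {src} -> {dst} ({frac:.0%} continue)")
```

Under these assumptions, the steepest single step is Feature Iteration to Auth & Payments, where roughly 42% of the remaining projects continue (0.25 / 0.60).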

The 5 Failure Patterns

Where vibe-coded apps consistently break

Debugging Death Spiral

↓ 40% of apps die here

The AI fixes one bug by breaking something else, and credits drain with every attempt. One user burned 140 million tokens in a single month with nothing to show for it.

Authentication & Payments

↓ 70-80% failure rate

Beautiful UI, zero backend logic. The moment you need Stripe or Supabase auth, the AI runs in circles.

The 1,000-Line Cliff

↓ Context window exceeded

Works great at 500 lines. At 1,000+, AI starts hallucinating, claiming it made changes it didn't.
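One way to catch an agent claiming edits it never made is to compare file contents before and after each run. The sketch below is a minimal illustration, not a feature of any platform listed here. The helper names `snapshot` and `unchanged_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each path to a SHA-256 digest of its contents (None if missing)."""
    digests = {}
    for path in map(Path, paths):
        digests[str(path)] = (
            hashlib.sha256(path.read_bytes()).hexdigest()
            if path.is_file()
            else None
        )
    return digests


def unchanged_files(before, after):
    """Return the files whose contents are identical in both snapshots."""
    return sorted(p for p in before if before[p] == after.get(p))
```

Take one snapshot before the agent runs and another after. Any file the agent reports as edited that still appears in `unchanged_files(before, after)` was not actually modified.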

Preview ≠ Production

↓ 60-70% deployment failures

Works in preview. Hit deploy. Nothing happens. Or old version deploys. Support doesn't respond.

First Real Traffic

↓ 80-90% crash rate

20 concurrent users. Memory leaks. Crashes. One AI agent deleted an entire production database, then generated 4,000 fake records to cover it up.

Documented Failures

Real quotes from AI-native builders and founders

🚨 Stuck on a Vibe-Coded Project?

Paste your GitHub repo URL. Nexlayer's agents will diagnose what's broken and deploy it. Free until it works.
