Live Research · January 2026

AI Coding Agents Hallucinate

Real-time research tracking hallucinations, failures, and security vulnerabilities across 10 vibe coding platforms. Peer-reviewed data from arXiv, USENIX, and 10,000+ reviews. Continuously updated via automated signal collection from 16 data sources.

5–68%
Hallucination Rate
arXiv benchmarks
19.6%
Fake Packages
USENIX Security
24+
CVEs in 2025
NVD Database
33%
Trust AI
Stack Overflow '25
Documented Failures
12,473
from 16 data sources

Platforms Tracked

Cursor · Windsurf · Replit · Claude Code · GitHub Copilot · OpenAI Codex · Antigravity · Lovable · Bolt.new · Devin · v0.dev · Amazon Q
arXiv Papers · USENIX Security · Trustpilot · GitHub · Hacker News · Reddit · Stack Overflow · CVE Database
6/10
Below 2.0★ on Trustpilot
1.7×
More Bugs (AI vs Human)
10K+
Reviews Analyzed
$300-$4K
Lost Per Project

Trustpilot Ratings by Platform

Real user reviews from the past 12 months

Prompt-to-App Builders

Lovable
Best Rated
4.1★
794 reviews · 65% 5-star
Devin
Cognition AI
2.3★
47 reviews · 51% 1-star
Bolt.new
Lowest Rated
1.5★
119 reviews · 83% 1-star

Core Vibe IDEs

Replit
Most Reviews
3.6★
1,235 reviews · 52% 5-star
Claude Code
Anthropic
2.8★
89 reviews · 45% 1-star
GitHub Copilot
Microsoft
2.5★
342 reviews · 68% 1-star
OpenAI Codex
OpenAI
2.2★
156 reviews · 58% 1-star
Windsurf
Codeium
2.1★
23 reviews · 61% 1-star
Antigravity
Google
2.0★
28 reviews · 64% 1-star
Amazon Q
AWS
1.9★
31 reviews · 71% 1-star
Cursor
74% 1-star
1.8★
112 reviews · 74% 1-star
v0.dev
Vercel
1.8★
25 reviews · 72% 1-star
Dec 2025 Research

AI Code Has 1.7× More Defects Than Human Code

CodeRabbit analyzed 470 GitHub PRs. Critical issues, security vulnerabilities, and performance bugs were all significantly more common in AI-generated code.

1.7×
More Bugs
More Revisions
Performance Issues

The Production Funnel

Survival rates from vibe to production

Initial Prototype · 90%+ succeed
Feature Iteration · 60% survive
Auth & Payments · 25% survive
Production Deploy · 15% survive
Real Users · ~10% survive
⚰️

Based on synthesis of 10,000+ reviews and documented case studies
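
The funnel figures above can also be read as stage-to-stage retention. The sketch below is a minimal illustration, not part of the original research. It assumes the percentages are cumulative shares of all projects that started, and it treats "90%+" as 0.90 and "~10%" as 0.10. The names `FUNNEL` and `stage_retention` are hypothetical.

```python
# Cumulative survival figures quoted in the funnel above.
# Assumption: each value is a share of all projects that started;
# "90%+" is treated as 0.90 and "~10%" as 0.10.
FUNNEL = [
    ("Initial Prototype", 0.90),
    ("Feature Iteration", 0.60),
    ("Auth & Payments", 0.25),
    ("Production Deploy", 0.15),
    ("Real Users", 0.10),
]


def stage_retention(funnel):
    """Return (stage, share of the previous stage that survives) per step."""
    steps = []
    for (_, previous), (name, current) in zip(funnel, funnel[1:]):
        steps.append((name, current / previous))
    return steps


if __name__ == "__main__":
    for name, rate in stage_retention(FUNNEL):
        print(f"{name}: {rate:.0%} of the previous stage")
```

Under those assumptions, the steepest step is Feature Iteration to Auth & Payments, where roughly 42% of the remaining projects continue.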

The 5 Failure Patterns

Where vibe-coded apps consistently break

1

Debugging Death Spiral

↓ 40% of apps die here

AI fixes one bug by breaking something else. Credits deplete. One user burned 140 million tokens in a single month with nothing to show for it.

2

Authentication & Payments

↓ 70-80% failure rate

Beautiful UI, zero backend logic. The moment you need Stripe or Supabase auth, the AI runs in circles.

3

The 1,000-Line Cliff

↓ Context window exceeded

Works great at 500 lines. At 1,000+, the AI starts hallucinating and claims it made changes it never made.

4

Preview ≠ Production

↓ 60-70% deployment failures

Works in preview. Hit deploy. Nothing happens. Or an old version deploys. Support doesn't respond.

5

First Real Traffic

↓ 80-90% crash rate

20 concurrent users. Memory leaks. Crashes. One AI agent deleted an entire production database, then generated 4,000 fake records to cover it up.
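
Pattern 3 (an agent claiming changes it never made) can be checked mechanically. The sketch below is one possible approach, not something the tracked platforms are known to provide: hash every file before and after an agent step, then compare the result with the files the agent says it edited. The helper names `snapshot`, `changed_files`, and `unverified_claims` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before, after):
    """Return sorted paths added, removed, or modified between two snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))


def unverified_claims(claimed, before, after):
    """Return files the agent reported editing whose contents did not change."""
    actual = set(changed_files(before, after))
    return sorted(set(claimed) - actual)
```

A typical use would be to call `snapshot` on the project directory, let the agent run, call `snapshot` again, and pass the agent's reported file list to `unverified_claims`. A non-empty result means at least one claimed edit never reached the disk.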

Documented Failures

Real quotes from AI-native builders and founders
