Crawl Errors

Understanding and fixing website crawl errors that prevent search engines from indexing your content.

What This Means

Crawl errors occur when search engine bots (like Googlebot) cannot access pages on your website. This prevents those pages from being indexed and appearing in search results.

Impact:

  • Affected pages drop out of the search index or never get indexed
  • Organic traffic to those pages is lost
  • Crawl budget is wasted on URLs that return errors

Types of Crawl Errors

Server Errors (5xx)

  • 500 Internal Server Error
  • 502 Bad Gateway
  • 503 Service Unavailable

Not Found Errors (4xx)

  • 404 Not Found
  • 410 Gone
  • 403 Forbidden

Redirect Errors

  • Redirect chains (too many redirects)
  • Redirect loops
  • Broken redirects

Robots.txt Issues

  • Blocking important pages
  • Malformed robots.txt
  • Overly restrictive rules

How to Diagnose

1. Google Search Console

  1. Go to Search Console > Indexing > Pages
  2. Check "Not indexed" section
  3. Review crawl errors under each category
  4. Click into specific issues for affected URLs

2. Crawl Your Site

Crawl the site yourself with a dedicated crawling tool such as Screaming Frog SEO Spider to surface broken links, redirect chains, and error responses.
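
If you would rather script a quick check, the sketch below (Python with the requests library; the sitemap URL is a placeholder for your own) fetches your XML sitemap and flags any URL that does not return a 200:

import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://yoursite.com/sitemap.xml"   # placeholder: your sitemap

def sitemap_urls(sitemap_url):
    # Return every <loc> entry from a standard XML sitemap
    xml = requests.get(sitemap_url, timeout=10).text
    root = ET.fromstring(xml)
    ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text.strip() for loc in root.findall(".//sm:loc", ns)]

for url in sitemap_urls(SITEMAP_URL):
    # HEAD keeps the check light; some servers only answer GET, so switch if needed
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    if status != 200:
        print(status, url)   # 3xx/4xx/5xx responses need a closer look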

3. Server Log Analysis

Check server logs for:

  • 4xx and 5xx response codes
  • Googlebot requests
  • Error patterns
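
If your server writes a standard combined-format access log, a short filter like the sketch below (Python; the log path is a placeholder) pulls out Googlebot requests that received 4xx or 5xx responses:

import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"   # placeholder: adjust for your server
# Matches the request, status code, and user agent in a combined-format log line
LINE = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .*"(?P<ua>[^"]*)"$')

errors = Counter()
with open(LOG_PATH) as log:
    for line in log:
        m = LINE.search(line)
        if not m or "Googlebot" not in m.group("ua"):
            continue
        status = m.group("status")
        if status.startswith(("4", "5")):      # only crawl errors
            errors[(status, m.group("path"))] += 1

for (status, path), count in errors.most_common(20):
    print(f"{count:5d}  {status}  {path}")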

General Fixes

Fix 1: 404 Not Found Errors

Identify source:

  1. Find pages linking to 404 URLs
  2. Check if URL changed or page deleted
  3. Review external links pointing to missing pages

Solutions:

  • Redirect to relevant existing page (301 redirect)
  • Restore the missing content
  • Update internal links
  • Request external sites update their links
  • If intentionally removed, return 410 (Gone)
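
Once redirects or removals are in place, a quick status check helps confirm they behave as intended. The sketch below (Python with the requests library; the URLs and expected codes are placeholders) compares each URL's response code against what you expect:

import requests

# Placeholders: URLs you expect to redirect (301) or to be gone for good (410)
MOVED = {"https://yoursite.com/old-page": 301}
REMOVED = {"https://yoursite.com/discontinued-product": 410}

for url, expected in {**MOVED, **REMOVED}.items():
    status = requests.head(url, allow_redirects=False, timeout=10).status_code
    flag = "OK " if status == expected else "FIX"
    print(f"{flag} expected {expected}, got {status}: {url}")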

Fix 2: Server Errors (5xx)

Common causes:

  • Server overload or exhausted resources (CPU, memory)
  • Application or plugin bugs
  • Slow or failing database queries
  • Hosting limits or misconfiguration

Solutions:

  • Review server error logs
  • Increase server resources
  • Fix application bugs
  • Optimize database queries
  • Consider better hosting
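
Because 5xx errors are often intermittent, it helps to poll a few key URLs and log any server errors over time. A minimal sketch (Python with the requests library; the URL list and polling interval are assumptions) follows:

import time
import requests

KEY_URLS = [                                   # placeholders: your most important pages
    "https://yoursite.com/",
    "https://yoursite.com/products/",
]

def check_once():
    for url in KEY_URLS:
        try:
            status = requests.get(url, timeout=10).status_code
        except requests.RequestException as exc:
            print(time.strftime("%H:%M:%S"), "ERROR", url, exc)
            continue
        if status >= 500:
            print(time.strftime("%H:%M:%S"), status, url)

while True:                                    # poll every 5 minutes; stop with Ctrl+C
    check_once()
    time.sleep(300)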

Fix 3: Redirect Issues

Redirect chains:

  A → B → C → D (bad)
  A → D (good)
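
To measure how long a chain really is, you can follow the redirects and print each hop. A small sketch (Python with the requests library; the start URL is a placeholder):

import requests

START = "https://yoursite.com/a"               # placeholder: the URL you link to

resp = requests.get(START, allow_redirects=True, timeout=10)
for hop in resp.history:                       # each intermediate redirect response
    print(hop.status_code, hop.url, "->", hop.headers.get("Location"))
print("final:", resp.status_code, resp.url)
print("hops in chain:", len(resp.history))     # more than 1-2 hops is worth flattening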

Solutions:

  • Update redirects to point directly to final URL
  • Maximum 1-2 redirects per chain
  • Remove redirect loops
  • Audit redirects regularly

Fix 4: Robots.txt Problems

Check robots.txt:

https://yoursite.com/robots.txt

Common issues:

# Too restrictive - blocks everything
User-agent: *
Disallow: /

# Blocking important directories
Disallow: /products/
Disallow: /blog/

Solution:

  • Allow important pages to be crawled
  • Only block admin/private areas
  • Test the file with the robots.txt report in Search Console
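
You can also verify the live rules locally with Python's built-in urllib.robotparser. In this sketch the domain and paths are placeholders for your own important pages:

from urllib.robotparser import RobotFileParser

SITE = "https://yoursite.com"                  # placeholder: your domain
IMPORTANT = ["/", "/products/widget", "/blog/latest-post"]   # placeholder paths

parser = RobotFileParser(SITE + "/robots.txt")
parser.read()                                  # fetches and parses the live file

for path in IMPORTANT:
    allowed = parser.can_fetch("Googlebot", SITE + path)
    print("allowed" if allowed else "BLOCKED", path)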

Fix 5: Soft 404s

Pages returning 200 but with "not found" content.

Detection:

  • Search Console flags these URLs as "Soft 404" under Indexing > Pages
  • Crawl the site and look for 200 responses whose content says the page is missing

Solution:

  • Return proper 404 status code
  • Or provide valuable content on the page
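
A rough way to catch soft 404s in bulk is to look for 200 responses whose body still reads like an error page. In the sketch below (Python with the requests library), the URL list and "not found" phrases are assumptions you would adapt to your site:

import requests

URLS = [                                        # placeholder: URLs to spot-check
    "https://yoursite.com/some-page",
]
NOT_FOUND_PHRASES = ("not found", "no longer available", "nothing here")

for url in URLS:
    resp = requests.get(url, timeout=10)
    body = resp.text.lower()
    if resp.status_code == 200 and any(p in body for p in NOT_FOUND_PHRASES):
        print("possible soft 404:", url)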

Platform-Specific Guides

Platform       Guide
Shopify        Shopify Redirects
WordPress      WordPress 404 Handling
Squarespace    Squarespace URL Mapping

Prevention

  • Monitor Search Console weekly
  • Set up alerts for crawl errors
  • Test redirects after site changes
  • Audit links during site migrations
  • Use proper status codes

Verification

After fixes:

  1. Request re-crawl in Search Console
  2. Use "URL Inspection" tool
  3. Monitor error count trending down
  4. Check pages returning to index
