Bot Traffic Issues

Diagnose and fix bot and crawler traffic that inflates analytics data and skews business metrics

What This Means

Bot traffic occurs when automated programs (bots, crawlers, scrapers, spam bots) visit your website and trigger analytics tracking. This inflates metrics, skews data, and makes it difficult to understand actual human user behavior.

Impact on Your Business

Inflated Metrics:

  • Session counts artificially high
  • Page view numbers misleading
  • Engagement metrics distorted
  • Cannot trust reported traffic
  • Bounce rate artificially low or high

Poor Decision Making:

  • Optimizing for bot behavior, not users
  • A/B tests invalidated by bot traffic
  • Conversion rates appear lower than reality
  • User behavior insights corrupted

Wasted Resources:

  • Server resources consumed by bots
  • Analytics quotas used by non-human traffic
  • Marketing attribution diluted
  • Ad spend optimization based on bot data
  • Customer support time investigating fake conversions

How to Diagnose

Method 1: Check GA4 Bot Filtering Status

  1. Confirm known-bot exclusion:

    • GA4 automatically excludes traffic from known bots and spiders
    • Unlike Universal Analytics, there is no toggle to check; the exclusion cannot be disabled
  2. Review data filters:

    • GA4 → Admin → Data Settings → Data Filters
    • Check whether the "Internal Traffic" filter is active
    • Review filter configuration
  3. Check data stream settings:

    • Admin → Data Streams → Select stream
    • Configure tag settings → Define internal traffic
    • Verify the internal IP rules are correct

What to Look For:

  • Data filter states (Testing, Active, or Inactive)
  • Internal traffic IP definitions
  • Known bot traffic patterns in reports

Method 2: Review Traffic Patterns

  1. Check for suspicious patterns:

    • GA4 → Reports → Realtime → Overview
    • Look for rapid-fire page views
    • Identical user paths
    • Unrealistic navigation speed
  2. Analyze session duration:

    • Reports → Engagement → Pages and screens
    • Look for 0-second sessions
    • Extremely short engagement times
    • No interaction events
  3. Review bounce rate anomalies:

    • Extremely high (100%) or extremely low (0%) bounce rate
    • Single-page sessions
    • No scroll depth tracking

What to Look For:

  • Page views from same IP in rapid succession
  • Sessions lasting 0 seconds
  • Perfect 100% bounce rate
  • Visits at unusual hours (3-5 AM spikes)
  • Geographic patterns (single country surge)
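
These patterns can also be audited programmatically. Below is a minimal Node.js sketch using the GA4 Data API; it assumes npm install @google-analytics/data, a service account with read access to the property, and a placeholder property ID.

    // Minimal GA4 Data API sketch (assumes GOOGLE_APPLICATION_CREDENTIALS
    // points to a service account with access to the property)
    const { BetaAnalyticsDataClient } = require('@google-analytics/data');

    const client = new BetaAnalyticsDataClient();

    async function auditSessionQuality() {
      const [response] = await client.runReport({
        property: 'properties/123456789', // placeholder property ID
        dateRanges: [{ startDate: '7daysAgo', endDate: 'today' }],
        dimensions: [{ name: 'browser' }, { name: 'country' }],
        metrics: [
          { name: 'sessions' },
          { name: 'averageSessionDuration' },
          { name: 'bounceRate' }
        ]
      });

      // Flag rows matching the bot signatures described above
      for (const row of response.rows || []) {
        const [browser, country] = row.dimensionValues.map(v => v.value);
        const [sessions, avgDuration, bounce] = row.metricValues.map(v => Number(v.value));
        if (browser === '(not set)' || avgDuration === 0 || bounce === 1) {
          console.log('Suspicious: ' + browser + ' / ' + country + ': ' + sessions + ' sessions');
        }
      }
    }

    auditSessionQuality().catch(console.error);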

Method 3: Check User-Agent Strings

  1. Review User-Agent data:

    • GA4 → Reports → Tech → Tech details
    • Check the Browser dimension for "(not set)" or unusual values
    • GA4 does not expose raw User-Agent strings; use server logs for the full values
  2. Common bot User-Agents:

    Googlebot
    Bingbot
    AhrefsBot
    SemrushBot
    MJ12bot
    DotBot
    BLEXBot
    (not set)
    
  3. Server log analysis:

    # Check server logs for bot traffic
    grep -i "bot\|crawl\|spider" access.log | wc -l
    

What to Look For:

  • "(not set)" browser names
  • Known bot/crawler names
  • Suspicious user-agent patterns
  • Missing user-agent strings
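
Beyond the one-line count above, a short Node.js sketch can break bot hits down by User-Agent. It assumes a combined-format access.log in the working directory, where the User-Agent is the final quoted field.

    // Count bot hits per User-Agent from a combined-format access log
    const fs = require('fs');

    const lines = fs.readFileSync('access.log', 'utf8').split('\n');
    const counts = {};

    for (const line of lines) {
      const fields = line.split('"');
      const ua = fields[fields.length - 2] || ''; // last quoted field
      if (/bot|crawl|spider/i.test(ua)) {
        counts[ua] = (counts[ua] || 0) + 1;
      }
    }

    // Print the top 10 bot User-Agents by hit count
    Object.entries(counts)
      .sort((a, b) => b[1] - a[1])
      .slice(0, 10)
      .forEach(([ua, n]) => console.log(n + '\t' + ua));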

Method 4: Analyze Traffic Sources

  1. Check referral sources:

    • GA4 → Reports → Acquisition → Traffic acquisition
    • Look for suspicious referrers
    • Check for spam domains
  2. Common spam referrers:

    semalt.com
    buttons-for-website.com
    free-social-buttons.com
    get-free-traffic-now.com
    
  3. Review campaign sources:

    • Look for "direct" traffic spikes
    • Unusual UTM parameters
    • Malformed source data

What to Look For:

  • Referral traffic from known spam domains
  • Sudden spikes from single sources
  • Referrers with suspicious names
  • Traffic from countries you don't target
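
As a stopgap while cleaning up reports, a client-side check can skip analytics initialization for visits arriving from known spam referrers. The domain list below is illustrative, not exhaustive, and the sketch assumes gtag.js is already loaded on the page.

    // Skip GA4 initialization for known spam referrers
    // (illustrative list; maintain your own from audits)
    const spamReferrers = [
      'semalt.com',
      'buttons-for-website.com',
      'free-social-buttons.com',
      'get-free-traffic-now.com'
    ];

    function isSpamReferrer() {
      if (!document.referrer) return false;
      const host = new URL(document.referrer).hostname;
      return spamReferrers.some(domain =>
        host === domain || host.endsWith('.' + domain));
    }

    if (!isSpamReferrer()) {
      gtag('config', 'G-XXXXXXXXXX');
    }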

Method 5: Check Conversion Patterns

  1. Review unusual conversion behavior:

    • Conversions with no prior engagement
    • Purchase events with 0-second sessions
    • Form submissions without form view
    • Multiple conversions from same session
  2. Test conversion tracking:

    • Perform a test conversion yourself
    • Confirm it is recorded correctly in reports
    • Look for fake conversions in the same timeframe

What to Look For:

  • Conversion events without page_view
  • Transaction IDs in sequential order
  • Same value repeated conversions
  • Conversions from bot-like sessions
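
One way to keep "conversion without page_view" sessions out of your data is to gate conversion events on a flag set when a page is actually viewed. A minimal sketch, assuming gtag.js is loaded and using an arbitrary sessionStorage key:

    // When GA4 initializes for a page, record a session flag
    gtag('config', 'G-XXXXXXXXXX'); // sends the automatic page_view
    sessionStorage.setItem('pv_seen', '1'); // arbitrary key name

    // Later: only fire the conversion if a page view preceded it
    function trackPurchase(transactionId, value) {
      if (!sessionStorage.getItem('pv_seen')) {
        // No page view flag this session; likely a bot, skip tracking
        return;
      }
      gtag('event', 'purchase', {
        transaction_id: transactionId,
        value: value,
        currency: 'USD' // placeholder currency
      });
    }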

General Fixes

Fix 1: Verify GA4 Bot Filtering

Confirm built-in bot filtering and configure data filters:

  1. Known-bot exclusion in GA4:

    • GA4 excludes traffic from known bots and spiders automatically
    • Unlike Universal Analytics, there is no toggle; the exclusion cannot be disabled or tuned
  2. Configure data filters:

    • Admin → Data Settings → Data Filters
    • Create an Internal Traffic (or Developer Traffic) filter
    • Set the filter state to Active once tested
  3. What gets filtered:

    Googlebot
    Bingbot
    Yahoo! Slurp
    DuckDuckBot
    Baiduspider
    YandexBot
    Other bots on the IAB/ABC International Spiders & Bots List
    
  4. Note limitations:

    • Only filters known bots
    • Doesn't catch sophisticated bots
    • Some bad bots not in IAB list
    • Need additional measures

Fix 2: Implement Server-Side Bot Detection

Block bots before they reach analytics:

  1. Detect bots via User-Agent:

    // Node.js/Express example (assumes: npm install isbot)
    // isbot v4+ uses a named export
    const { isbot } = require('isbot');

    app.use((req, res, next) => {
      // Treat a missing User-Agent header as empty
      if (isbot(req.get('user-agent') || '')) {
        // Block the bot or serve different content
        res.status(403).send('Bot detected');
        return;
      }
      next();
    });
    
  2. Use bot detection library:

    // Install: npm install isbot
    import { isbot } from 'isbot';
    
    if (isbot(navigator.userAgent)) {
      // Don't initialize analytics
      console.log('Bot detected, analytics disabled');
    } else {
      // Initialize GA4
      gtag('config', 'G-XXXXXXXXXX');
    }
    
  3. robots.txt configuration (advisory only; only well-behaved bots honor it):

    User-agent: *
    Disallow: /admin/
    Disallow: /checkout/
    Disallow: /cart/
    
    User-agent: AhrefsBot
    Disallow: /
    
    User-agent: SemrushBot
    Disallow: /
    

Fix 3: Implement JavaScript Challenge

Require JavaScript execution to load tracking:

  1. Delay analytics initialization:

    // Only load analytics after user interaction
    let analyticsLoaded = false;
    
    function loadAnalytics() {
      if (analyticsLoaded) return;
    
      // Load GA4
      const script = document.createElement('script');
      script.src = 'https://www.googletagmanager.com/gtag/js?id=G-XXXXXXXXXX';
      document.head.appendChild(script);
    
      script.onload = function() {
        window.dataLayer = window.dataLayer || [];
        function gtag(){dataLayer.push(arguments);}
        gtag('js', new Date());
        gtag('config', 'G-XXXXXXXXXX');
      };
    
      analyticsLoaded = true;
    }
    
    // Load on first user interaction
    ['mousedown', 'mousemove', 'keydown', 'scroll', 'touchstart'].forEach(event => {
      document.addEventListener(event, loadAnalytics, { once: true, passive: true });
    });
    
  2. Require scroll interaction:

    // Assumes gtag.js is already loaded on the page
    let scrollTracked = false;

    window.addEventListener('scroll', function() {
      // Require a real scroll before configuring analytics
      if (!scrollTracked && window.scrollY > 100) {
        gtag('config', 'G-XXXXXXXXXX');
        scrollTracked = true;
      }
    }, { passive: true });
    
  3. Human verification:

    // Simple mouse movement detection
    let humanVerified = false;
    
    document.addEventListener('mousemove', function verifyHuman() {
      if (!humanVerified) {
        humanVerified = true;
        gtag('config', 'G-XXXXXXXXXX');
        document.removeEventListener('mousemove', verifyHuman);
      }
    });
    

Fix 4: Filter Bot Traffic with GTM

Use Google Tag Manager to filter bots:

  1. Create bot detection variable:

    • GTM → Variables → New User-Defined Variable
    • Variable Type: Custom JavaScript
    function() {
      var ua = navigator.userAgent.toLowerCase();
      var bots = ['googlebot', 'bingbot', 'slurp', 'duckduckbot',
                  'baiduspider', 'yandexbot', 'facebookexternalhit',
                  'twitterbot', 'rogerbot', 'linkedinbot', 'embedly',
                  'quora link preview', 'showyoubot', 'outbrain',
                  'pinterest', 'slackbot', 'vkshare', 'w3c_validator',
                  'redditbot', 'applebot', 'whatsapp', 'flipboard',
                  'tumblr', 'bitlybot', 'skypeuripreview', 'nuzzel',
                  'discordbot', 'qwantify', 'pinterestbot', 'bitrix',
                  'headlesschrome', 'phantomjs'];
    
      for (var i = 0; i < bots.length; i++) {
        if (ua.indexOf(bots[i]) > -1) {
          return true;
        }
      }
      return false;
    }
    
  2. Create blocking trigger:

    • GTM → Triggers → New Trigger
    • Trigger Type: Page View
    • Add exception: Bot Detection Variable = true
  3. Apply to all tags:

    • Edit each tag
    • Add trigger exception
    • Prevents tags from firing for bots

Fix 5: Implement IP-Based Filtering

Block known bot IP ranges:

  1. Create IP exclusion list in GA4:

    • Admin → Data Settings → Data Filters
    • Create new filter
    • Filter Type: Internal Traffic
    • Define IP addresses
  2. Server-side IP blocking:

    // Node.js/Express example (assumes: npm install ip-range-check)
    const ipRangeCheck = require('ip-range-check');

    // Example CIDR ranges only; note that blocking search engine
    // crawlers at the server level can hurt SEO and indexing
    const botIPs = [
      '66.249.64.0/19', // Google
      '157.55.32.0/19', // Bing
      // Add known bot IP ranges
    ];

    app.use((req, res, next) => {
      const clientIP = req.ip;

      // ip-range-check accepts a single range or an array of ranges
      if (ipRangeCheck(clientIP, botIPs)) {
        res.status(403).send('Access denied');
        return;
      }

      next();
    });
    
  3. Cloudflare bot management:

    • Enable Bot Fight Mode (free) or Bot Management (paid plans)
    • Configure challenge behavior
    • Set challenge level
    • Review bot score analytics

Fix 6: Use CAPTCHA for High-Value Actions

Verify human users for conversions:

  1. Google reCAPTCHA v3:

    <script src="https://www.google.com/recaptcha/api.js?render=YOUR_SITE_KEY"></script>
    
    <script>
    function submitForm() {
      grecaptcha.ready(function() {
        grecaptcha.execute('YOUR_SITE_KEY', {action: 'submit'}).then(function(token) {
          // Add token to form
          document.getElementById('g-recaptcha-response').value = token;
    
          // Track conversion (human verified)
          gtag('event', 'purchase', {
            transaction_id: 'T123',
            value: 99.99,
            verified_human: true
          });
    
          // Submit form
          document.getElementById('form').submit();
        });
      });
    }
    </script>
    
  2. Verify on backend (a route sketch using this helper follows the list):

    // Server-side verification
    const axios = require('axios');
    
    async function verifyCaptcha(token) {
      const response = await axios.post(
        'https://www.google.com/recaptcha/api/siteverify',
        null,
        {
          params: {
            secret: 'YOUR_SECRET_KEY',
            response: token
          }
        }
      );
    
      return response.data.success && response.data.score > 0.5;
    }
    
  3. Track verified humans separately:

    gtag('event', 'verified_conversion', {
      recaptcha_score: 0.9,
      transaction_id: 'T123'
    });
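
A minimal Express route sketch showing how the verifyCaptcha helper from step 2 could gate a conversion endpoint; the route path and body field names are placeholders:

    // Express route gating a conversion on reCAPTCHA verification
    const express = require('express');
    const app = express();
    app.use(express.json());

    app.post('/api/checkout', async (req, res) => {
      const token = req.body['g-recaptcha-response'];

      if (!token || !(await verifyCaptcha(token))) {
        // Missing or failed CAPTCHA: do not record the conversion
        return res.status(403).json({ error: 'CAPTCHA verification failed' });
      }

      // Human verified: process the order and record the conversion
      // (e.g. via server-side tagging or the Measurement Protocol)
      res.json({ ok: true });
    });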
    

Fix 7: Create Custom Bot Filter Report

Identify and analyze bot traffic:

  1. Create GA4 exploration:

    • GA4 → Explore → Free form
    • Dimensions: Session source, Browser, Device category
    • Metrics: Sessions, Bounce rate, Avg session duration
  2. Add segments for bot patterns:

    Segment 1: 0-second sessions
    Segment 2: 100% bounce rate
    Segment 3: (not set) browser
    Segment 4: Single page sessions
    
  3. Create audience for bot traffic:

    • Admin → Audiences → New audience
    • Conditions: for example, browser exactly matches "(not set)" or session duration = 0 seconds
    • Use for exclusion in reports
  4. Exclude bot audience from reporting:

    • Apply audience exclusion to reports
    • Compare metrics with/without bots
    • Document bot traffic percentage

Platform-Specific Guides

Detailed implementation instructions for your specific platform:

  • Shopify: Shopify Bot Traffic Guide
  • WordPress: WordPress Bot Traffic Guide
  • Wix: Wix Bot Traffic Guide
  • Squarespace: Squarespace Bot Traffic Guide
  • Webflow: Webflow Bot Traffic Guide

Verification

After implementing fixes:

  1. Monitor bot traffic percentage:

    • Create bot audience in GA4
    • Check percentage of total traffic
    • Should decrease after fixes
    • Document baseline vs improved
  2. Review session quality metrics:

    • Average session duration should increase
    • Pages per session should increase
    • Bounce rate should normalize
    • Engagement rate should improve
  3. Test bot detection (see the sketch after this list):

    • Use online bot detection tools
    • Simulate bot traffic with a known bot User-Agent
    • Verify analytics not triggered
    • Check server logs for blocks
  4. Compare conversion data:

    • Review conversion rates before/after
    • Should see more realistic rates
    • Check average order value
    • Verify data quality improved
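
For step 3, a quick way to simulate bot traffic is to request a page with a known bot User-Agent and confirm the server-side block from Fix 2 responds as expected. A Node.js sketch, assuming Node 18+ for the built-in fetch and a placeholder URL:

    // Simulate a bot request and check the server-side block
    const botUA = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

    async function testBotBlock() {
      const res = await fetch('https://example.com/', {
        headers: { 'User-Agent': botUA }
      });
      // Expect 403 if the Fix 2 middleware is in place
      console.log('Status: ' + res.status + ' (expected 403 for bot UA)');
    }

    testBotBlock().catch(console.error);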

Common Mistakes

  1. Relying only on GA4's built-in bot filtering - It excludes known bots only and cannot be tuned
  2. Blocking good bots - Google/Bing crawlers need access
  3. Only client-side detection - Bots bypass JavaScript
  4. Not updating bot lists - New bots emerge regularly
  5. Overly aggressive filtering - Blocking legitimate users
  6. Ignoring referral spam - Different from bot traffic
  7. Not monitoring bot traffic percentage - Can't measure improvement
  8. Blocking in robots.txt but not analytics - Still tracked
  9. No server-side validation - Relies only on client
  10. Not creating baseline metrics - Can't prove improvement

Troubleshooting Checklist

  • GA4 built-in bot filtering confirmed
  • Data filters configured in GA4
  • Server-side bot detection implemented
  • JavaScript challenge for analytics loading
  • GTM bot detection variable created
  • Known bot IP ranges blocked
  • CAPTCHA implemented for conversions
  • robots.txt properly configured
  • Bot traffic audience created in GA4
  • Baseline metrics documented
  • Regular bot list updates scheduled
  • Monitoring bot traffic percentage

Acceptable Bot Traffic Levels

Normal: < 5% of total traffic

  • Search engine crawlers
  • Monitoring services
  • Social media link previews

Investigate: 5-15% of traffic

  • May indicate emerging bot problem
  • Check for new bot types
  • Review filtering effectiveness

Critical: > 15% of traffic

  • Significant bot infiltration
  • Immediate action needed
  • Data quality severely compromised
  • Implement stricter measures
