PII in Analytics Data | Blue Frog Docs

PII in Analytics Data

Detecting, removing, and preventing personally identifiable information in analytics platforms

Personally Identifiable Information (PII) accidentally collected in analytics platforms creates serious privacy compliance risks, potential fines, and loss of user trust. PII includes names, emails, phone numbers, addresses, social security numbers, and other data that can identify individuals.

What This Means

PII leaking into analytics happens through:

  • URL parameters - ?email=user@example.com&phone=555-1234
  • Page titles - "Thank you, John Smith" in title tag
  • Form fields - Accidentally tracking form input values
  • Custom dimensions - Passing user data to custom fields
  • Site search - Users searching for their own name/email
  • Ecommerce data - Including customer names in product fields

Why This Is Critical

Legal consequences:

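  • GDPR violations - Up to €20M or 4% of global annual turnover, whichever is higher
  • CCPA violations - Up to $7,500 per intentional violation ($2,500 per unintentional violation)
  • HIPAA violations - Up to $1.5M per violation category per year for healthcare data (statutory cap, adjusted upward for inflation)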
  • Class action lawsuits - Data privacy litigation

Platform consequences:

  • GA4/Adobe Analytics - Terms of Service violations
  • Account suspension - Platforms may suspend accounts
  • Data deletion requirements - Must delete historical PII data

Business consequences:

  • Loss of user trust
  • Regulatory audits
  • Negative publicity
  • Insurance premium increases

How to Diagnose

1. Audit URL Parameters

Check for PII in URLs:

High-risk pages:

  • Form confirmation pages (/thank-you?email=...)
  • User account pages (/account?user_id=john.smith)
  • Search results (/search?q=john+doe+phone+number)
  • Password reset pages (/reset?token=...&email=...)
  • Checkout pages (/checkout?name=...&address=...)

How to check:

In GA4:

  1. Reports → Engagement → Pages and screens
  2. Switch the primary dimension to Page path + query string
  3. Search the table for @, email, name, phone, address

In Adobe Analytics:

  1. Workspace → Page URL dimension
  2. Apply search filter for PII indicators
  3. Review query parameters

Browser console:

// Check what URLs are being sent
console.log(window.location.href);
// Check dataLayer for PII
console.log(window.dataLayer);
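
Reading raw URLs by eye does not scale, so a small helper can flag suspicious query parameters for review. The sketch below is illustrative only: findPiiInUrl and the SUSPECT_PARAM_NAMES list are hypothetical names, and matching on parameter names will produce false positives (for example, filename contains "name").

```javascript
// Hypothetical helper: flag query parameters whose name or value looks like PII.
const SUSPECT_PARAM_NAMES = ['email', 'name', 'phone', 'address', 'ssn', 'user'];
const EMAIL_PATTERN = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/;

function findPiiInUrl(urlString) {
  const url = new URL(urlString);
  const findings = [];
  for (const [key, value] of url.searchParams) {
    // A parameter is suspicious if its name hints at PII or its value looks like an email.
    const nameHit = SUSPECT_PARAM_NAMES.some(n => key.toLowerCase().includes(n));
    const valueHit = EMAIL_PATTERN.test(value);
    if (nameHit || valueHit) {
      findings.push({ param: key, reason: nameHit ? 'suspicious name' : 'email-like value' });
    }
  }
  return findings;
}

// In the browser console:
// console.table(findPiiInUrl(window.location.href));
```

Only parameter names and reasons are returned, not values, so the findings can be pasted into a ticket without copying the PII itself.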

2. Review Page Titles

Page titles are tracked automatically:

Risky patterns:

<!-- BAD: Contains PII -->
<title>Welcome back, john.smith@example.com - Dashboard</title>
<title>Order confirmation for John Smith - Store</title>
<title>Reset password for user: jane.doe</title>

<!-- GOOD: Generic titles -->
<title>Dashboard - Store</title>
<title>Order Confirmation - Store</title>
<title>Reset Password - Store</title>

Check in GA4:

  1. Reports → Engagement → Pages and screens
  2. Review "Page title" dimension
  3. Search for patterns: @, names, emails

3. Inspect Custom Dimensions and Metrics

Review what's being sent to custom dimensions:

GA4 custom dimensions:

  1. Admin → Custom Definitions
  2. Review each custom dimension
  3. Check example values for PII

Common PII leaks in custom dimensions:

  • User ID dimension contains email instead of hash
  • Customer tier includes customer name
  • User properties include phone numbers
  • Custom metrics include personal data

Test by viewing in DebugView:

// GA4 DebugView - look at custom parameters
gtag('config', 'G-XXXXXX', {
  'debug_mode': true
});

4. Check Site Search Tracking

Users may search for their own information:

Risky searches:

  • "john smith order status"
  • "my account email@example.com"
  • "track order 555-1234" (phone as order number)

In GA4:

  1. Reports → Engagement → Search terms
  2. Review for names, emails, phone patterns
  3. Check search query parameter configuration
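
If search terms are sent to analytics by your own code, one option is to redact obvious patterns first. This is a minimal sketch under that assumption; redactSearchTerm is a hypothetical helper, and regular expressions cannot recognize personal names, so it reduces rather than eliminates the risk.

```javascript
// Patterns for emails and 10-digit phone numbers in free-text search terms.
const EMAIL = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
const PHONE = /\d{3}[-.\s]?\d{3}[-.\s]?\d{4}/g;

// Hypothetical helper: replace email- and phone-like substrings with placeholders.
function redactSearchTerm(term) {
  return term.replace(EMAIL, '[email]').replace(PHONE, '[phone]');
}

// Possible use with the GA4 recommended "search" event:
// gtag('event', 'search', { search_term: redactSearchTerm(userQuery) });
```

Automatic site search tracking reads the term from the page URL, so the query parameter in the URL needs the same treatment.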

5. Review Ecommerce Tracking

Common PII in ecommerce data:

// BAD: Contains customer name
gtag('event', 'purchase', {
  transaction_id: 'T12345',
  value: 99.99,
  items: [{
    item_name: 'Gift for John Smith', // PII!
    item_id: 'SKU123'
  }]
});

// GOOD: No PII
gtag('event', 'purchase', {
  transaction_id: 'T12345',
  value: 99.99,
  items: [{
    item_name: 'Blue Widget',
    item_id: 'SKU123'
  }]
});
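
Free-text product names (gift messages, personalized items) are a common source of this leak. One way to guard against it, sketched here under the assumption that a catalog keyed by item_id exists, is to replace whatever name arrives with the canonical catalog name. CATALOG and sanitizeItems are hypothetical.

```javascript
// Hypothetical catalog lookup: item_id -> canonical product name.
const CATALOG = new Map([['SKU123', 'Blue Widget']]);

// Returns new item objects whose item_name always comes from the catalog.
function sanitizeItems(items, catalog = CATALOG) {
  return items.map(item => ({
    ...item,
    // Fall back to a neutral label rather than trusting a free-text name.
    item_name: catalog.get(item.item_id) ?? 'Unknown product',
  }));
}

// Possible use:
// gtag('event', 'purchase', { transaction_id: 'T12345', value: 99.99, items: sanitizeItems(cartItems) });
```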

6. Audit Form Tracking

Check if form field values are being captured:

Test in browser console:

// Before submitting form, check what's tracked
console.log(window.dataLayer);

// Submit form, check again
// Look for form field values in events

Look for:

  • Form field values in event parameters
  • Input values in custom dimensions
  • User data in event labels
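
Inspecting window.dataLayer by hand gets tedious once it holds dozens of nested events. The following sketch walks it recursively and reports the path of every string that looks like an email or phone number. scanDataLayer is a hypothetical helper; it checks strings only, because numeric values such as timestamps would otherwise match the phone pattern.

```javascript
const PII_PATTERNS = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
  phone: /\d{3}[-.\s]?\d{3}[-.\s]?\d{4}/,
};

// Hypothetical helper: collect { path, type } for every PII-like string value.
function scanDataLayer(value, path = 'dataLayer', findings = [], seen = new WeakSet()) {
  if (value !== null && typeof value === 'object') {
    if (seen.has(value)) return findings; // guard against circular references
    seen.add(value);
    for (const [key, child] of Object.entries(value)) {
      scanDataLayer(child, `${path}.${key}`, findings, seen);
    }
  } else if (typeof value === 'string') {
    for (const [type, pattern] of Object.entries(PII_PATTERNS)) {
      if (pattern.test(value)) findings.push({ path, type });
    }
  }
  return findings;
}

// In the browser console, before and after submitting the form:
// console.table(scanDataLayer(window.dataLayer));
```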

7. Use Automated Scanning Tools

Scan for PII patterns:

Regex patterns to search for:

  • Email: [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
  • Phone: \d{3}[-.\s]?\d{3}[-.\s]?\d{4}
  • SSN: \d{3}-\d{2}-\d{4}
  • Credit card: \d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}
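
As a rough illustration, the patterns above can be applied to an exported list of page paths, page titles, or search terms. scanRows and classifyRow are hypothetical helpers; matches are candidates for manual review, not confirmed PII.

```javascript
const PATTERNS = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
  phone: /\d{3}[-.\s]?\d{3}[-.\s]?\d{4}/,
  ssn: /\d{3}-\d{2}-\d{4}/,
  creditCard: /\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}/,
};

// Returns the names of all patterns that match the given text.
function classifyRow(text) {
  return Object.keys(PATTERNS).filter(type => PATTERNS[type].test(text));
}

// Returns one entry per row that matched at least one pattern.
function scanRows(rows) {
  const report = [];
  rows.forEach((text, index) => {
    const types = classifyRow(text);
    if (types.length > 0) report.push({ index, types });
  });
  return report;
}

// Example input: values copied from an analytics export.
// scanRows(['/thank-you?email=jane@example.com', '/products/blue-widget']);
```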

General Fixes

1. Remove PII from URLs

Strip PII query parameters before any analytics script reads the URL:

Client-side URL cleaning:

// Remove PII parameters from URL and analytics
(function() {
  const piiParams = ['email', 'name', 'phone', 'address', 'ssn', 'user'];
  const url = new URL(window.location);
  let modified = false;

  piiParams.forEach(param => {
    if (url.searchParams.has(param)) {
      url.searchParams.delete(param);
      modified = true;
    }
  });

  if (modified) {
    // Update URL without page reload
    window.history.replaceState({}, '', url);
  }
})();

Run before analytics loads:

<!-- Load PII removal BEFORE GTM/analytics -->
<script src="/js/pii-removal.js"></script>
<script async src="https://www.googletagmanager.com/gtm.js?id=GTM-XXXXX"></script>

Server-side redirect (preferred):

// Node.js/Express example
app.get('/thank-you', (req, res) => {
  // Strip PII from URL
  const cleanUrl = '/thank-you';
  if (req.query.email || req.query.name) {
    return res.redirect(302, cleanUrl);
  }
  res.render('thank-you');
});
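
The route above drops every query parameter, including harmless ones such as campaign tags. A possible refinement, sketched here as a framework-free function, removes only the listed PII parameters and keeps the rest. cleanRedirectTarget and PII_PARAMS are hypothetical names.

```javascript
const PII_PARAMS = ['email', 'name', 'phone', 'address'];

// Returns a cleaned path + query if any PII parameter was present, otherwise null.
function cleanRedirectTarget(originalUrl, piiParams = PII_PARAMS) {
  // A dummy base lets URL parse relative paths such as Express's req.originalUrl.
  const url = new URL(originalUrl, 'http://placeholder.invalid');
  let removed = false;
  for (const param of piiParams) {
    if (url.searchParams.has(param)) {
      url.searchParams.delete(param);
      removed = true;
    }
  }
  // Only the path and query are returned, never the host.
  return removed ? url.pathname + url.search : null;
}

// Possible use as Express middleware:
// app.use((req, res, next) => {
//   const target = cleanRedirectTarget(req.originalUrl);
//   return target ? res.redirect(302, target) : next();
// });
```

If the page needs the removed values (for example, to show the customer's email), they have to travel another way, such as the session or a POST body.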

2. Configure GA4 Data Redaction

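Enable data redaction in GA4:

  1. Admin → Data Streams → select the web data stream
  2. Open "Redact data"
  3. Turn on email redaction and list the URL query parameter keys (e.g., email, phone) to redact

Redaction is a best-effort text match applied before data is sent, so treat it as a safety net rather than a substitute for fixing the source.

Note: Only available in GA4, not Universal Analytics

Configure in gtag.js:

gtag('config', 'G-XXXXXX', {
  // GA4 does not log or store IP addresses, so the Universal Analytics
  // 'anonymize_ip' flag is not needed here.
  'allow_google_signals': false // Disable Google signals (remarketing, demographics)
});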

3. Implement URL Filtering in GTM

Google Tag Manager variable filtering:

Create a custom variable:

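// Variable Name: Clean Page URL (GTM Custom JavaScript variable)
// Written in ES5 style (var, function expressions), which GTM's
// Custom JavaScript editor has traditionally required.
function() {
  var url = new URL({{Page URL}});
  var piiParams = ['email', 'name', 'phone', 'address', 'user', 'token'];

  piiParams.forEach(function(param) {
    url.searchParams.delete(param);
  });

  return url.toString();
}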

Use in the GA4 tag:

  • Set the page_location parameter to {{Clean Page URL}} so GA4 records the cleaned URL instead of the raw {{Page URL}}

4. Hash User Identifiers

Instead of sending raw user IDs or emails:

// BAD: Sending email directly
gtag('config', 'G-XXXXXX', {
  'user_id': 'john.smith@example.com'
});

// GOOD: Hash the email first
async function hashEmail(email) {
  const encoder = new TextEncoder();
  const data = encoder.encode(email.toLowerCase().trim());
  const hashBuffer = await crypto.subtle.digest('SHA-256', data);
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
}

// Top-level await only works in ES modules, so chain on the promise instead
hashEmail('john.smith@example.com').then(hashedEmail => {
  gtag('config', 'G-XXXXXX', {
    'user_id': hashedEmail // 64-character hex digest
  });
});

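Benefits and limits:

  • Can still identify unique users
  • The raw email never reaches the analytics platform
  • An unsalted hash of an email can be reversed by hashing candidate addresses and comparing, so add a secret salt or key (Google asks for SHA-256 at minimum and strongly recommends a salt of at least 8 characters), or send an opaque internal ID instead
  • Hashed identifiers are pseudonymous, not anonymous, and are still generally treated as personal data under GDPR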
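
A keyed hash computed on the server is one way to avoid exposing a bare hash to guessing, because the key never reaches the browser. The sketch below uses Node's built-in crypto module. pseudonymousId is a hypothetical helper, and where the secret comes from is left as an assumption.

```javascript
const { createHmac } = require('crypto');

// Hypothetical helper: derive a stable pseudonymous ID from an email or user key.
// The secret should come from a secret store and must never be shipped to the browser.
function pseudonymousId(userKey, secret) {
  return createHmac('sha256', secret)
    .update(String(userKey).trim().toLowerCase()) // normalize so the same user maps to one ID
    .digest('hex');
}

// Possible use: render the ID into the page server-side, then
// gtag('config', 'G-XXXXXX', { user_id: idFromServer });
```

Rotating the secret changes every ID, which breaks continuity for returning users, so key rotation needs to be a deliberate decision.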

5. Use Generic Page Titles

Set consistent, PII-free page titles:

// Set title before analytics loads
document.title = 'Order Confirmation - Store Name';

// Or in React/SPA:
useEffect(() => {
  document.title = 'Order Confirmation - Store Name';
}, []);

Dynamic title strategy:

// Instead of: "Welcome, John Smith"
// Use: "Welcome Back"

const pageTitle = isLoggedIn ? 'Dashboard' : 'Login';
document.title = `${pageTitle} - Store Name`;

6. Implement Enhanced Measurement Filters

GA4 Enhanced Measurement - limit form tracking:

Admin → Data Streams → select the web stream → Enhanced measurement → settings (gear icon)

  • Form interactions: Disable if your own tags already handle form events

Mark sensitive forms so custom tracking skips them:

<!-- The class is only a marker. GA4 does not act on it by itself; your own GTM triggers or tracking code must check for it. -->
<form class="pii-form">
  <input type="email" name="email">
</form>
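
How custom tracking code might honor such a marker class is sketched below. buildFormEventParams is a hypothetical helper, and the form_id and field_names parameter names are assumptions for illustration. It reads field names only and never touches field values.

```javascript
// Marker class from the example above; the name is a convention, not a GA4 feature.
const PII_FORM_CLASS = 'pii-form';

// Hypothetical helper: build event parameters for a submitted <form> element.
function buildFormEventParams(form) {
  const params = { form_id: form.id || '(no id)' };
  if (form.classList.contains(PII_FORM_CLASS)) {
    return params; // marked forms: report the submit only, no field detail
  }
  // Field names only; values are never read.
  params.field_names = Array.from(form.elements)
    .map(el => el.name)
    .filter(Boolean)
    .join(',');
  return params;
}

// Possible use in the browser:
// document.addEventListener('submit', e => gtag('event', 'form_submit_custom', buildFormEventParams(e.target)));
```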

7. Create Data Deletion Requests

If PII already exists in analytics:

GA4 Data Deletion:

  1. Admin → Data Settings → Data Deletion Requests
  2. Create new request
  3. Specify parameter and value to delete (e.g., email = "user@example.com")
  4. Select date range
  5. Submit request (takes several days to process)

Adobe Analytics Data Deletion:

  1. Privacy Service UI → Create Request
  2. GDPR/CCPA deletion request
  3. Specify identifiers
  4. Submit through Privacy API

Google Ads:

  • Use Google Ads API for data deletion
  • Contact support for account-level issues

8. Audit and Filter Custom Dimensions

Review all custom dimensions:

// BEFORE: Sending raw user data
gtag('event', 'login', {
  'user_email': user.email, // PII!
  'user_name': user.name     // PII!
});

// AFTER: Sending hashed or generic data
gtag('event', 'login', {
  'user_id_hash': hashUserId(user.id),
  'user_type': user.subscription_tier // Not PII
});

Implement safeguards:

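function trackEvent(eventName, params) {
  // Drop any parameter whose value looks like PII before sending
  const piiPatterns = [
    /@.*\./,  // Email pattern
    /\d{3}.*\d{3}.*\d{4}/, // Phone pattern (loose; expect false positives)
  ];

  const safeParams = {};
  Object.entries(params).forEach(([key, value]) => {
    const looksLikePii = piiPatterns.some(pattern => pattern.test(String(value)));
    if (looksLikePii) {
      // Log the key only, so the PII itself does not end up in console logs
      console.error('PII detected in tracking param, dropping:', key);
    } else {
      safeParams[key] = value;
    }
  });

  // Send only the parameters that passed the check
  gtag('event', eventName, safeParams);
}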

9. Implement Content Security Policy for Analytics

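Restrict and monitor where tracking data can be sent. A CSP controls destinations, not the content of requests, so it complements the fixes above. Deliver the policy as an HTTP response header, because the report-uri directive is ignored when the policy is set in a <meta> tag:

Content-Security-Policy: default-src 'self'; script-src 'self' https://www.googletagmanager.com; connect-src 'self' https://www.google-analytics.com; report-uri /csp-report

GA4 may need additional hosts in connect-src (for example, regional google-analytics.com endpoints), so review the reports before enforcing the policy.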

Monitor CSP reports to detect:

  • Unauthorized tracking scripts
  • Data being sent to unexpected domains
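
Browsers POST violation reports as JSON with a top-level "csp-report" object. A minimal sketch of triaging one report follows. summarizeCspReport and EXPECTED_HOSTS are hypothetical, and the endpoint that receives the POST is assumed to exist already.

```javascript
// Hosts the site knowingly uses; a blocked request to one of these usually means the
// policy is missing an entry, while any other host deserves investigation.
const EXPECTED_HOSTS = ['www.googletagmanager.com', 'www.google-analytics.com'];

// Hypothetical helper: reduce a parsed CSP report body to the fields needed for triage.
function summarizeCspReport(body, expectedHosts = EXPECTED_HOSTS) {
  const report = body['csp-report'] || {};
  const blocked = report['blocked-uri'] || '';
  let host = blocked; // keywords such as "inline" or "eval" are not URLs
  try {
    host = new URL(blocked).hostname;
  } catch (_) {
    // keep the raw value when it is not a parseable URL
  }
  return {
    page: report['document-uri'] || '',
    directive: report['effective-directive'] || report['violated-directive'] || '',
    blockedHost: host,
    unexpected: !expectedHosts.includes(host),
  };
}
```

Reports can themselves contain full page URLs, so the same PII rules apply to wherever they are logged.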

10. Train Team on PII Prevention

Create internal guidelines:

DO:

  • Hash user identifiers
  • Use generic page titles
  • Strip URL parameters
  • Validate before sending to analytics

DON'T:

  • Send emails, names, phone numbers
  • Track form field values
  • Include PII in product names
  • Use PII in custom dimensions

Review checklist for new features:

  • URLs checked for PII parameters
  • Page titles are generic
  • Custom dimensions validated
  • Form tracking excludes PII fields
  • Ecommerce data sanitized

Platform-Specific Guides

  • Shopify - Shopify privacy and PII handling
  • WordPress - WordPress GDPR compliance
  • GA4 - GA4 data redaction settings
  • Adobe Analytics - Adobe privacy controls
