Data Privacy & LLM Usage
Transparency about how PasteProof detects, processes, and protects your sensitive data.
1. How Detection Works
Pattern Matching (Regex-Based)
PasteProof uses a library of regular expression patterns to detect common PII formats in real time: credit card numbers (Luhn-validated), Social Security Numbers, API keys (OpenAI, AWS, Stripe, and others), email addresses, phone numbers, and more. These patterns run locally in the browser extension and server-side in our API endpoints.
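As an illustration of Luhn-validated pattern matching, the sketch below pairs a candidate regex with a checksum pass. The pattern `CARD_CANDIDATE` and the function names are assumptions made for this example, not PasteProof's actual pattern library.

```python
import re

# Hypothetical candidate pattern: 13-19 digits, optionally separated
# by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that look like card numbers and pass Luhn."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

The checksum step is what filters out arbitrary digit runs, such as order numbers, that happen to match the regex.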
Team-Shared Custom Patterns
Teams can define custom detection patterns specific to their organization — internal ID formats, proprietary key structures, or domain-specific sensitive data. These patterns are shared across all team members and applied consistently in the browser extension, Slack integration, and API scans.
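A minimal sketch of how team-defined patterns might sit alongside built-in ones follows. The `DetectionPattern` type, the `EMP-` ID format, and the simplified email regex are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionPattern:
    name: str          # label reported on a match, e.g. "employee_id"
    regex: re.Pattern  # compiled regular expression


# Hypothetical built-in pattern (a deliberately simplified email matcher).
BUILT_IN = [
    DetectionPattern("email", re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")),
]

# Hypothetical team-defined pattern: internal IDs such as "EMP-004217".
TEAM_PATTERNS = [
    DetectionPattern("employee_id", re.compile(r"\bEMP-\d{6}\b")),
]


def scan(text: str, patterns: list[DetectionPattern]) -> list[str]:
    """Return the names of all patterns that match `text`, in list order."""
    return [p.name for p in patterns if p.regex.search(text)]
```

Passing `BUILT_IN + TEAM_PATTERNS` to `scan` applies both sets in one pass, which is one way the same rules could be reused across different entry points.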
AI/LLM-Powered Context Analysis
Beyond pattern matching, PasteProof uses large language models to understand context. The AI can distinguish test data from production secrets and harmless code snippets from credential leaks. This context-aware layer reduces false positives and catches sensitive data that patterns alone might miss.
2. Data Retention Policy
We follow a minimal retention approach. Here is what we store and for how long:
| Data Type | Stored? | Retention |
|---|---|---|
| Detection event metadata | Yes | 90 days, then purged |
| Raw text content | No | Never stored |
| Detected PII values | No | Redacted before logging |
| Audit logs (teams) | Yes | 7 years (compliance) |
| Session tokens | Yes | 30 days, auto-expire |
| Clipboard / form content | No | 0 seconds (ephemeral, in-memory only) |
| Account data | Yes | Until deletion (30-day grace period, then hard delete) |
We do not sell, share, or monetize user data. Detection metadata is used solely for analytics dashboards visible to your team. Older detection data is anonymized automatically — user IDs are hashed, emails and IP addresses are removed, and only aggregate statistics are retained.
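The anonymization step described above (hash the user ID, drop email and IP address) can be sketched as follows. The field names and the use of a salted SHA-256 digest are assumptions for this example; the page does not specify the hash function or the event schema.

```python
import hashlib


def anonymize_event(event: dict, salt: bytes) -> dict:
    """Return a copy of `event` with the user ID hashed and email/IP removed.

    Field names ("user_id", "email", "ip_address") are hypothetical.
    """
    cleaned = {
        key: value
        for key, value in event.items()
        if key not in ("email", "ip_address")
    }
    # Replace the user ID with a salted SHA-256 digest so events from the
    # same user can still be grouped without exposing the original ID.
    raw_id = str(event["user_id"]).encode("utf-8")
    cleaned["user_id"] = hashlib.sha256(salt + raw_id).hexdigest()
    return cleaned
```

The original event is left unmodified, so the caller decides when to discard it.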
3. Anonymization
All PII is redacted before any data is stored or logged. When PasteProof detects sensitive data:
- The type of detection is recorded (e.g., "credit card", "API key") but not the actual value.
- Values shown in dashboards are truncated or masked (e.g., 4532-****-****-1234).
- No raw PII is persisted in our databases, logs, or analytics systems.
- Browser extension detections are processed locally; detected values never leave your device unless you use the API.
- Sensitive credentials are protected at rest: passwords are hashed with bcrypt, API keys are hashed with SHA-256, and Slack tokens are encrypted with AES-256-GCM.
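The masking and type-only recording described in the list above could look roughly like this. Both function names and the record layout are hypothetical, and the card number in the usage note is a made-up value.

```python
def mask_card(number: str) -> str:
    """Mask a 16-digit card number, keeping only the first and last four digits."""
    digits = "".join(ch for ch in number if ch.isdigit())
    if len(digits) != 16:
        raise ValueError("expected a 16-digit card number")
    return f"{digits[:4]}-****-****-{digits[-4:]}"


def detection_record(kind: str, raw_value: str) -> dict:
    """Build a record that keeps the detection type and a masked display value.

    The raw value is never copied into the record.
    """
    display = mask_card(raw_value) if kind == "credit card" else "****"
    return {"type": kind, "display": display}
```

For example, `mask_card("4532 0000 0000 1234")` yields `4532-****-****-1234`, matching the dashboard format shown above.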
4. LLM Usage
PasteProof uses large language models for context-aware AI scans. Here is how that works:
What Data Is Sent
When AI scanning is enabled, the text content of a field or document is sent to the LLM for analysis. We strip known PII patterns before sending when possible, and the LLM is instructed to evaluate context, not store data.
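One way to strip known PII patterns before text reaches an LLM is to substitute typed placeholders, as in the sketch below. The two patterns and the placeholder format are assumptions for illustration; a production pattern set would be larger.

```python
import re

# Hypothetical, simplified pattern set used only for this example.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_for_llm(text: str) -> str:
    """Replace every match of a known pattern with a placeholder like [EMAIL]."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Typed placeholders keep enough context for the model to reason about the surrounding text (for instance, that an email address was present) without seeing the value itself.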
LLM Providers
We use a layered approach depending on the request type. None of our providers train on your data:
- Groq (Ollama Instruct 3.8): Used for fast inference on short-context PII detection requests. Groq does not retain customer data for inference requests by default and does not use it for model training. See Groq's data policy.
- Self-hosted on Modal: For large-context document scanning, we run our own inference infrastructure on Modal's SOC 2 Type 2 certified platform. Function inputs and outputs are deleted after processing, all data is encrypted in transit and at rest, and workloads run in gVisor-sandboxed containers. Modal does not access or use your source code, function inputs, or outputs. See Modal's security guide.
We are actively working toward self-hosting all of our LLM infrastructure, which will ensure no user data is retained or used for training by any third party.
Stateless Processing
All AI detection is processed statelessly — content is analyzed in memory and immediately discarded after the response is sent. No clipboard content, form field content, or message content is ever written to disk during AI scanning. Only request metadata (timestamp, user ID, content length) is logged.
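The metadata-only logging described above can be sketched as a wrapper that analyzes content in memory and records just the timestamp, user ID, and content length. The function name, the `analyze` callback, and the list-based log are stand-ins for this example rather than PasteProof's real interfaces.

```python
from datetime import datetime, timezone
from typing import Callable


def scan_statelessly(
    user_id: str,
    content: str,
    analyze: Callable[[str], dict],
    log: list,
) -> dict:
    """Analyze `content` in memory and log request metadata only."""
    result = analyze(content)
    # Only metadata is appended; the content itself is never logged.
    log.append(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user_id,
            "content_length": len(content),
        }
    )
    return result
```

Because the log entry is built from `len(content)` rather than the content, the scanned text goes out of scope when the function returns.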
Opt-Out
AI scanning is a Premium feature and is opt-in. If you prefer pattern-matching-only detection, you can disable AI scans in your extension settings or team policies. In the browser extension, pattern matching runs locally, so pattern-only detection works without any data leaving your device.
5. On-Premise Deployment
Need to Meet Data Residency Requirements?
We offer on-premise deployment options for organizations that need full control over where their data is processed and stored. Run PasteProof entirely within your infrastructure — no external API calls, complete data sovereignty.
Contact Us to Discuss Your Needs