Document Scanning API
Frontend documentation for PII detection in documents (PDF, Word, Images).
Base URL
Development: http://localhost:8787
Production: https://api.pasteproof.com
Authentication
All endpoints require a Bearer token in the Authorization header:
Authorization: Bearer YOUR_JWT_TOKEN
Supported File Types
| Type | Extensions | MIME Types |
|---|---|---|
| PDF | .pdf | application/pdf |
| Word | .docx .doc | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| Images | .jpg .png | image/jpeg image/png |
Max file size: 50MB
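The limits above can be checked in the browser before any bytes are sent. The following is a minimal sketch, not part of the API: `validateDocument` is a hypothetical helper, and it assumes "50MB" means 50 * 1024 * 1024 bytes, which the server may count differently. The server remains the authority on what it accepts.

```javascript
// Extensions copied from the Supported File Types table.
const SUPPORTED_EXTENSIONS = ['.pdf', '.docx', '.doc', '.jpg', '.png'];

// Assumption: 50MB is interpreted as 50 * 1024 * 1024 bytes.
const MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024;

// Hypothetical helper: accepts a File (or any object with name and size)
// and returns { ok: true } or { ok: false, reason }.
function validateDocument(file) {
  const name = String(file.name || '').toLowerCase();
  const dot = name.lastIndexOf('.');
  const extension = dot === -1 ? '' : name.slice(dot);

  if (!SUPPORTED_EXTENSIONS.includes(extension)) {
    return { ok: false, reason: `Unsupported file type: ${extension || '(none)'}` };
  }
  if (file.size > MAX_FILE_SIZE_BYTES) {
    return { ok: false, reason: 'File exceeds the 50MB limit' };
  }
  return { ok: true };
}
```

Running this check before building the upload request avoids a round trip that would otherwise end in a 400 response.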
Endpoints
1. Upload Document
Upload a document for PII scanning and redaction.
Endpoint:
POST /v1/documents/upload
Request:
Content-Type: multipart/form-data
file: File (required)
mode: "detection" | "redaction" (optional, default: "redaction")
- "detection": Return detections list only (no redacted document)
- "redaction": Return redacted document with detections
redactionStyle: "label" | "blackout" (optional, default: "label")
- "label": Replace PII with [PII_TYPE] labels
- "blackout": Replace PII with black rectangles
piiTypes: JSON array or comma-separated string (optional)
- Specific PII types to check for
- Format: ["EMAIL_ADDRESS","SSN"] or "EMAIL_ADDRESS,SSN"
- If omitted, checks for all PII types
ignorePiiTypes: JSON array or comma-separated string (optional)
- PII types to ignore/skip
- These types will not be detected or redacted
customMatches: JSON array or comma-separated string (optional)
- Exact text strings to detect and redact
- Format: ["internal_id_123","ACME-CORP"] or "internal_id_123,ACME-CORP"
- Useful for organization-specific sensitive data
customIgnores: JSON array or comma-separated string (optional)
- Exact text strings to ignore even if they match PII patterns
- Format: ["example@test.com","000-00-0000"] or "example@test.com,000-00-0000"
- Useful for known safe test data or placeholders
Example:
const formData = new FormData();
formData.append('file', fileBlob);
formData.append('mode', 'redaction');
formData.append('redactionStyle', 'blackout');
formData.append('piiTypes', JSON.stringify(['EMAIL_ADDRESS', 'SSN']));
formData.append('ignorePiiTypes', 'CREDIT_CARD,API_KEY');
formData.append('customMatches', 'internal_id_123,ACME-CORP,project-apollo');
formData.append('customIgnores', 'example@test.com,000-00-0000');
Response (Success - 200):
{
  "success": true,
  "documentId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "message": "Your document is being processed.",
  "estimatedTime": "1-5 minutes",
  "filename": "employee_data.pdf",
  "fileSize": 2547896
}
2. Check Processing Status
Poll this endpoint to check if processing is complete.
Endpoint:
GET /v1/documents/:documentId/status
Response (Complete - 200):
{
  "success": true,
  "documentId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "complete",
  "progress": 100,
  "detections": [
    {
      "type": "SSN",
      "value": "[REDACTED_SSN]",
      "page": 1,
      "confidence": 95
    }
  ],
  "summary": {
    "total": 15,
    "by_type": {
      "SSN": 3,
      "EMAIL_ADDRESS": 5,
      "PHONE_NUMBER": 4
    },
    "pages_processed": 5
  }
}
Status Values: queued, processing, complete, failed
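The upload and status endpoints combine into a submit-then-poll flow. The sketch below shows one way to wire it up. `uploadDocument` and `waitForDocument` are hypothetical helper names, and the 3-second interval and 5-minute timeout are assumed defaults rather than values prescribed by the API. The `fetchImpl` parameter exists only so the functions can be exercised without a network.

```javascript
// Statuses after which polling can stop (from the Status Values list).
const TERMINAL_STATUSES = ['complete', 'failed'];

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Send a prepared FormData body to the upload endpoint.
async function uploadDocument(baseUrl, token, formData, fetchImpl = fetch) {
  const response = await fetchImpl(`${baseUrl}/v1/documents/upload`, {
    method: 'POST',
    // No Content-Type header here: fetch adds the multipart boundary itself.
    headers: { Authorization: `Bearer ${token}` },
    body: formData,
  });
  if (!response.ok) throw new Error(`Upload failed with HTTP ${response.status}`);
  return response.json(); // includes documentId and the initial status
}

// Poll the status endpoint until the document is complete or failed.
async function waitForDocument(baseUrl, token, documentId, options = {}) {
  const {
    intervalMs = 3000,          // assumed polling interval
    timeoutMs = 5 * 60 * 1000,  // assumed overall timeout
    fetchImpl = fetch,
  } = options;
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const response = await fetchImpl(`${baseUrl}/v1/documents/${documentId}/status`, {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!response.ok) throw new Error(`Status check failed with HTTP ${response.status}`);

    const body = await response.json();
    if (TERMINAL_STATUSES.includes(body.status)) return body;

    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out waiting for document ${documentId}`);
    }
    await sleep(intervalMs);
  }
}
```

A caller would typically chain the two: take `documentId` from the `uploadDocument` result, pass it to `waitForDocument`, then branch on whether the returned `status` is `complete` or `failed`.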
3. List Documents
List all document scans for the authenticated user with pagination and filtering.
Endpoint:
GET /v1/documents
Query Parameters:
limit (optional): Number of results per page (default: 50, max: 100)
offset (optional): Pagination offset (default: 0)
status (optional): Filter by status (queued, processing, complete, failed)
orderBy (optional): Field to sort by (default: created_at)
order (optional): Sort order asc or desc (default: desc)
Response (Success - 200):
{
  "success": true,
  "documents": [
    {
      "documentId": "550e8400-e29b-41d4-a716-446655440000",
      "filename": "employee_data.pdf",
      "fileType": "pdf",
      "fileSize": 2547896,
      "status": "failed",
      "progress": 0,
      "detectionsCount": 0,
      "hasRedactedDocument": false,
      "error": "Processing failed",
      "retryCount": 1,
      "createdAt": "2026-01-08T10:30:00Z",
      "startedAt": null,
      "completedAt": null
    }
  ],
  "pagination": {
    "total": 10,
    "limit": 50,
    "offset": 0,
    "hasMore": false
  }
}
4. Retry Failed Scan
Resubmit a failed or queued document to the processing queue.
Endpoint:
POST /v1/documents/:documentId/retry
Features:
- Validates the document exists and belongs to the user
- Prevents retrying documents currently being processed
- Verifies the original file still exists in R2 storage
- Increments retry count
- Resets status, clears errors, and requeues the document
Response (Success - 200):
{
  "success": true,
  "message": "Document queued for retry",
  "documentId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "retryCount": 2
}
Error Responses:
- 404 if document not found
- 400 if document is currently processing
- 404 if original file no longer exists in storage
- 503 if R2/Queue not configured
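The list and retry endpoints can be combined to requeue everything that failed. The following is a sketch under assumptions: `buildListUrl` and `retryFailedDocuments` are hypothetical helpers, only the first page of results is handled, and each retry outcome is recorded rather than thrown so that one 404 or 400 does not stop the rest.

```javascript
// Build the list URL from the documented query parameters.
function buildListUrl(baseUrl, { limit = 50, offset = 0, status, orderBy, order } = {}) {
  const params = new URLSearchParams({ limit: String(limit), offset: String(offset) });
  if (status) params.set('status', status);
  if (orderBy) params.set('orderBy', orderBy);
  if (order) params.set('order', order);
  return `${baseUrl}/v1/documents?${params.toString()}`;
}

// Retry every failed document on the first page of results.
async function retryFailedDocuments(baseUrl, token, fetchImpl = fetch) {
  const headers = { Authorization: `Bearer ${token}` };

  const listResponse = await fetchImpl(buildListUrl(baseUrl, { status: 'failed' }), { headers });
  if (!listResponse.ok) throw new Error(`List failed with HTTP ${listResponse.status}`);
  const { documents } = await listResponse.json();

  const results = [];
  for (const doc of documents) {
    const retryResponse = await fetchImpl(
      `${baseUrl}/v1/documents/${doc.documentId}/retry`,
      { method: 'POST', headers },
    );
    // httpStatus is 200 on success, or one of 400, 404, 503 as listed above.
    results.push({
      documentId: doc.documentId,
      requeued: retryResponse.ok,
      httpStatus: retryResponse.status,
    });
  }
  return results;
}
```

To cover more than one page, a caller could repeat the list request with an increasing `offset` while `pagination.hasMore` is true.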
5. Download Redacted Document
Download the redacted version of the document.
Endpoint:
GET /v1/documents/:documentId/download
Response (Success - 200):
Content-Type: application/pdf
Content-Disposition: attachment; filename="redacted_employee_data.pdf"
[Binary file data]
6. Delete Document
Delete a document and all associated files.
Endpoint:
DELETE /v1/documents/:documentId
Response (Success - 200):
{
  "success": true,
  "message": "Document deleted successfully"
}
Error Codes
| Code | Meaning | Common Causes |
|---|---|---|
| 400 | Bad Request | Invalid file type, file too large, missing file |
| 401 | Unauthorized | Missing or invalid auth token |
| 403 | Forbidden | Free tier user (Premium required) |
| 404 | Not Found | Document doesn't exist or doesn't belong to user |
| 429 | Rate Limited | Too many requests, wait and retry |
| 500 | Server Error | Processing error, retry or contact support |
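The table marks 429 and 500 as worth retrying, while the other codes point to a problem with the request itself. One common way to act on that is exponential backoff, sketched below. The helper names, the 1-second base delay, the 30-second cap, and the limit of four attempts are all assumptions for illustration and are not specified by the API.

```javascript
// Status codes the error table describes as retryable.
const RETRYABLE_STATUSES = new Set([429, 500]);

function isRetryable(httpStatus) {
  return RETRYABLE_STATUSES.has(httpStatus);
}

// Delay before the next attempt: baseMs, 2x, 4x, ... capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Run doRequest (a function returning a fetch Response promise) and
// repeat it on retryable statuses, up to maxAttempts calls in total.
async function requestWithRetry(doRequest, maxAttempts = 4) {
  for (let attempt = 0; ; attempt++) {
    const response = await doRequest();
    const isLastAttempt = attempt === maxAttempts - 1;
    if (response.ok || !isRetryable(response.status) || isLastAttempt) {
      return response;
    }
    await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
  }
}
```

The caller still inspects the returned response: a 401 or 403 comes back immediately so the UI can prompt for sign-in or an upgrade instead of waiting.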
PII Types Detected
The system detects 25+ types of PII, including:
Personal
- EMAIL_ADDRESS
- PHONE_NUMBER
- SSN
- DATE_OF_BIRTH
- DRIVERS_LICENSE
- PASSPORT
- FULL_NAME
Financial
- CREDIT_CARD
- CVV
- ROUTING_NUMBER
- ACCOUNT_NUMBER
- IBAN
Security
- PASSWORD
- API_KEY
- AWS_KEY
- GITHUB_TOKEN
- JWT
- PRIVATE_KEY
Technical & Medical
- IP_ADDRESS
- MAC_ADDRESS
- CRYPTO_WALLET
- MEDICAL_RECORD
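These type names are what the upload endpoint's piiTypes and ignorePiiTypes fields expect. The sketch below turns arrays of type names into the JSON-array string form. `buildPiiFields` is a hypothetical helper, and because the names listed in this section may not be exhaustive, names it does not recognize are reported back rather than rejected.

```javascript
// Type names copied from the lists in this section (possibly incomplete).
const KNOWN_PII_TYPES = [
  'EMAIL_ADDRESS', 'PHONE_NUMBER', 'SSN', 'DATE_OF_BIRTH',
  'DRIVERS_LICENSE', 'PASSPORT', 'FULL_NAME',
  'CREDIT_CARD', 'CVV', 'ROUTING_NUMBER', 'ACCOUNT_NUMBER', 'IBAN',
  'PASSWORD', 'API_KEY', 'AWS_KEY', 'GITHUB_TOKEN', 'JWT', 'PRIVATE_KEY',
  'IP_ADDRESS', 'MAC_ADDRESS', 'CRYPTO_WALLET', 'MEDICAL_RECORD',
];

// Hypothetical helper: returns the form fields to append plus any
// type names that do not appear in KNOWN_PII_TYPES.
function buildPiiFields({ piiTypes = [], ignorePiiTypes = [] } = {}) {
  const unknown = [...piiTypes, ...ignorePiiTypes]
    .filter((type) => !KNOWN_PII_TYPES.includes(type));

  const fields = {};
  // Empty lists are omitted so the server falls back to its defaults.
  if (piiTypes.length > 0) fields.piiTypes = JSON.stringify(piiTypes);
  if (ignorePiiTypes.length > 0) fields.ignorePiiTypes = JSON.stringify(ignorePiiTypes);

  return { fields, unknown };
}
```

Each entry of `fields` can then be passed to `formData.append(name, value)`, and a non-empty `unknown` list is a useful hint that a type name was misspelled.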
Performance
| Document Type | Typical Processing Time |
|---|---|
| Small PDF (1-10 pages) | 10-30 seconds |
| Medium PDF (10-50 pages) | 30 seconds - 2 minutes |
| Large PDF (50-200 pages) | 2-5 minutes |
| Word Document | 5-20 seconds |
| Image (with OCR) | 5-15 seconds per page |
Note: Cold starts may take an additional 10-20 seconds on the first request.
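These figures can inform how long a client waits before giving up on a scan. The sketch below takes the upper bound of each row and adds the 20-second cold-start allowance. `estimateMaxSeconds` is a hypothetical helper, the `word` and `image` type labels are assumptions, and PDFs beyond 200 pages fall outside the table, so the largest listed figure is reused only as a placeholder.

```javascript
// Upper end of the cold-start range from the note above.
const COLD_START_SECONDS = 20;

// Rough upper bound on processing time, in seconds, from the table.
function estimateMaxSeconds(fileType, pages = 1) {
  let seconds;
  if (fileType === 'pdf') {
    if (pages <= 10) seconds = 30;        // small PDF
    else if (pages <= 50) seconds = 120;  // medium PDF
    else seconds = 300;                   // large PDF (table stops at 200 pages)
  } else if (fileType === 'word') {
    seconds = 20;
  } else if (fileType === 'image') {
    seconds = 15 * pages;                 // OCR time is quoted per page
  } else {
    throw new Error(`Unknown file type: ${fileType}`);
  }
  return seconds + COLD_START_SECONDS;
}
```

These are typical times, not guarantees, so a polling timeout is best set with some margin above the estimate.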