Document Scanning API

Frontend documentation for PII detection in documents (PDF, Word, Images).

Base URL

Development: http://localhost:8787
Production: https://api.pasteproof.com

Authentication

All endpoints require a Bearer token in the Authorization header:

Authorization: Bearer YOUR_JWT_TOKEN
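
As a minimal sketch, the header can be built in one place so every request uses the same format. The name authHeaders is a hypothetical helper for illustration, not part of the API.

```javascript
// Hypothetical helper: builds the Authorization header described above.
function authHeaders(token) {
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('A JWT token is required');
  }
  return { Authorization: `Bearer ${token.trim()}` };
}

// Usage (inside an async function):
// const res = await fetch('https://api.pasteproof.com/v1/documents', {
//   headers: authHeaders(token),
// });
```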

Supported File Types

Type    Extensions     MIME Types
PDF     .pdf           application/pdf
Word    .docx, .doc    application/vnd.openxmlformats-officedocument.wordprocessingml.document
Images  .jpg, .png     image/jpeg, image/png

Max file size: 50MB
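
Checking a file before upload avoids a round trip that would end in a 400. The sketch below is one way to do it and rests on assumptions. It reads "50MB" as 50 * 1024 * 1024 bytes and checks only the extension, not the MIME type. The name validateFile is hypothetical, and the server remains the authority on what it accepts.

```javascript
// Limits taken from the list above. Reading "50MB" as 50 * 1024 * 1024
// bytes is an assumption; the server has the final say.
const MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024;
const ALLOWED_EXTENSIONS = ['.pdf', '.docx', '.doc', '.jpg', '.png'];

// Returns null when the file looks acceptable, otherwise a reason string.
// `file` needs `name` and `size`, as a browser File object provides.
function validateFile(file) {
  const name = String(file.name || '').toLowerCase();
  const dot = name.lastIndexOf('.');
  const ext = dot === -1 ? '' : name.slice(dot);
  if (!ALLOWED_EXTENSIONS.includes(ext)) {
    return `Unsupported file type: ${ext || '(none)'}`;
  }
  if (file.size > MAX_FILE_SIZE_BYTES) {
    return 'File exceeds the 50MB limit';
  }
  return null;
}
```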

Endpoints

1. Upload Document

Upload a document for PII scanning and redaction.

Endpoint:

POST /v1/documents/upload

Request:

Content-Type: multipart/form-data

file: File (required)
mode: "detection" | "redaction" (optional, default: "redaction")
  - "detection": Return detections list only (no redacted document)
  - "redaction": Return redacted document with detections
redactionStyle: "label" | "blackout" (optional, default: "label")
  - "label": Replace PII with [PII_TYPE] labels
  - "blackout": Replace PII with black rectangles
piiTypes: JSON array or comma-separated string (optional)
  - Specific PII types to check for
  - Format: ["EMAIL_ADDRESS","SSN"] or "EMAIL_ADDRESS,SSN"
  - If omitted, checks for all PII types
ignorePiiTypes: JSON array or comma-separated string (optional)
  - PII types to ignore/skip
  - These types will not be detected or redacted
customMatches: JSON array or comma-separated string (optional)
  - Exact text strings to detect and redact
  - Format: ["internal_id_123","ACME-CORP"] or "internal_id_123,ACME-CORP"
  - Useful for organization-specific sensitive data
customIgnores: JSON array or comma-separated string (optional)
  - Exact text strings to ignore even if they match PII patterns
  - Format: ["example@test.com","000-00-0000"] or "example@test.com,000-00-0000"
  - Useful for known safe test data or placeholders
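
The four list fields accept either format, but the comma-separated form cannot represent a value that itself contains a comma. The sketch below uses a hypothetical helper named encodeListField that picks the JSON form in that case. How the server treats such values is not documented here, so this is a conservative choice rather than a requirement.

```javascript
// Hypothetical helper: encodes a list for piiTypes, ignorePiiTypes,
// customMatches, or customIgnores. Uses the comma-separated form when safe
// and falls back to a JSON array when any value contains a comma.
function encodeListField(values) {
  const cleaned = values.map((v) => String(v).trim()).filter((v) => v !== '');
  const needsJson = cleaned.some((v) => v.includes(','));
  return needsJson ? JSON.stringify(cleaned) : cleaned.join(',');
}

// Usage:
// formData.append('customMatches', encodeListField(['Doe, Jane', 'ACME-CORP']));
```

An empty list encodes to an empty string, in which case the field is probably best left off the form entirely.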

Example:

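const formData = new FormData();
formData.append('file', fileBlob); // a File or Blob, e.g. from an <input type="file">
formData.append('mode', 'redaction');
formData.append('redactionStyle', 'blackout');
formData.append('piiTypes', JSON.stringify(['EMAIL_ADDRESS', 'SSN']));
formData.append('ignorePiiTypes', 'CREDIT_CARD,API_KEY');
formData.append('customMatches', 'internal_id_123,ACME-CORP,project-apollo');
formData.append('customIgnores', 'example@test.com,000-00-0000');

// Leave Content-Type unset so the browser adds the multipart boundary.
const response = await fetch('https://api.pasteproof.com/v1/documents/upload', {
  method: 'POST',
  headers: { Authorization: 'Bearer YOUR_JWT_TOKEN' },
  body: formData,
});
const result = await response.json(); // run inside an async function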

Response (Success - 200):

{
  "success": true,
  "documentId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "message": "Your document is being processed.",
  "estimatedTime": "1-5 minutes",
  "filename": "employee_data.pdf",
  "fileSize": 2547896
}

2. Check Processing Status

Poll this endpoint to check if processing is complete.

Endpoint:

GET /v1/documents/:documentId/status

Response (Complete - 200):

{
  "success": true,
  "documentId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "complete",
  "progress": 100,
  "detections": [
    {
      "type": "SSN",
      "value": "[REDACTED_SSN]",
      "page": 1,
      "confidence": 95
    }
  ],
  "summary": {
    "total": 12,
    "by_type": {
      "SSN": 3,
      "EMAIL_ADDRESS": 5,
      "PHONE_NUMBER": 4
    },
    "pages_processed": 5
  }
}

Status Values: queued, processing, complete, failed
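
A polling loop over these status values might look like the sketch below. It stops on complete or failed, the two terminal states, and keeps waiting on queued or processing. The function name waitForDocument, the 3-second interval, and the attempt cap are illustrative assumptions, not API requirements. The fetch function is injectable so the loop can be tested without a network.

```javascript
// Polls the status endpoint until the document reaches a terminal state.
// intervalMs and maxAttempts are illustrative defaults, not API requirements.
async function waitForDocument(baseUrl, documentId, token, options = {}) {
  const { intervalMs = 3000, maxAttempts = 100, fetchImpl = fetch } = options;
  const url = `${baseUrl}/v1/documents/${encodeURIComponent(documentId)}/status`;

  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const res = await fetchImpl(url, {
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) {
      throw new Error(`Status check failed with HTTP ${res.status}`);
    }
    const body = await res.json();
    if (body.status === 'complete' || body.status === 'failed') {
      return body;
    }
    // Still queued or processing: wait before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Document ${documentId} did not finish after ${maxAttempts} polls`);
}

// Usage (inside an async function):
// const result = await waitForDocument('https://api.pasteproof.com', documentId, token);
// if (result.status === 'failed') { /* offer a retry */ }
```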

3. List Documents

List all document scans for the authenticated user with pagination and filtering.

Endpoint:

GET /v1/documents

Query Parameters:

limit (optional): Number of results per page (default: 50, max: 100)
offset (optional): Pagination offset (default: 0)
status (optional): Filter by status (queued, processing, complete, failed)
orderBy (optional): Field to sort by (default: created_at)
order (optional): Sort order asc or desc (default: desc)
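
The query string can be assembled with URLSearchParams, as in the sketch below. The function name buildListUrl is hypothetical. The client-side checks mirror the documented values, and the lower bound of 1 for limit is an assumption. Parameters that are omitted fall back to the server defaults listed above.

```javascript
const STATUSES = ['queued', 'processing', 'complete', 'failed'];

// Builds the list URL from the documented query parameters. Checks are a
// client-side convenience; the server remains authoritative.
function buildListUrl(baseUrl, params = {}) {
  const { limit, offset, status, orderBy, order } = params;
  const query = new URLSearchParams();
  if (limit !== undefined) {
    // Documented max is 100; the lower bound of 1 is an assumption.
    query.set('limit', String(Math.min(Math.max(1, limit), 100)));
  }
  if (offset !== undefined) query.set('offset', String(Math.max(0, offset)));
  if (status !== undefined) {
    if (!STATUSES.includes(status)) throw new Error(`Unknown status: ${status}`);
    query.set('status', status);
  }
  if (orderBy !== undefined) query.set('orderBy', orderBy);
  if (order !== undefined) {
    if (order !== 'asc' && order !== 'desc') throw new Error(`Unknown order: ${order}`);
    query.set('order', order);
  }
  const qs = query.toString();
  return qs ? `${baseUrl}/v1/documents?${qs}` : `${baseUrl}/v1/documents`;
}
```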

Response (Success - 200):

{
  "success": true,
  "documents": [
    {
      "documentId": "550e8400-e29b-41d4-a716-446655440000",
      "filename": "employee_data.pdf",
      "fileType": "pdf",
      "fileSize": 2547896,
      "status": "failed",
      "progress": 0,
      "detectionsCount": 0,
      "hasRedactedDocument": false,
      "error": "Processing failed",
      "retryCount": 1,
      "createdAt": "2026-01-08T10:30:00Z",
      "startedAt": null,
      "completedAt": null
    }
  ],
  "pagination": {
    "total": 10,
    "limit": 50,
    "offset": 0,
    "hasMore": false
  }
}
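
To walk every page, a client can advance offset until pagination.hasMore is false. In the sketch below, fetchPage is a hypothetical caller-supplied function that takes { limit, offset } and resolves to a parsed response body of the shape shown above.

```javascript
// Collects every document by following pagination.hasMore.
// fetchPage({ limit, offset }) is supplied by the caller (hypothetical).
async function listAllDocuments(fetchPage, limit = 50) {
  const all = [];
  let offset = 0;
  while (true) {
    const page = await fetchPage({ limit, offset });
    all.push(...page.documents);
    // Stop on the last page, or on an empty page to avoid looping forever.
    if (!page.pagination.hasMore || page.documents.length === 0) {
      return all;
    }
    offset += page.documents.length;
  }
}
```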

4. Retry Failed Scan

Resubmit a failed or queued document to the processing queue.

Endpoint:

POST /v1/documents/:documentId/retry

Features:

  • Validates the document exists and belongs to the user
  • Prevents retrying documents currently being processed
  • Verifies the original file still exists in R2 storage
  • Increments retry count
  • Resets status, clears errors, and requeues the document

Response (Success - 200):

{
  "success": true,
  "message": "Document queued for retry",
  "documentId": "550e8400-e29b-41d4-a716-446655440000",
  "status": "queued",
  "retryCount": 2
}

Error Responses:

  • 404 if document not found
  • 400 if document is currently processing
  • 404 if original file no longer exists in storage
  • 503 if R2/Queue not configured
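
One way to surface these errors in a client is sketched below. The function name retryDocument and the message wording are illustrative. Both 404 cases share a status code, so the sketch cannot tell them apart by status alone.

```javascript
// Requests a retry and turns the documented error statuses into messages.
async function retryDocument(baseUrl, documentId, token, fetchImpl = fetch) {
  const res = await fetchImpl(
    `${baseUrl}/v1/documents/${encodeURIComponent(documentId)}/retry`,
    { method: 'POST', headers: { Authorization: `Bearer ${token}` } }
  );
  if (res.ok) return res.json();

  const messages = {
    400: 'Document is currently processing; wait for it to finish.',
    404: 'Document or its original file was not found.',
    503: 'Retry is unavailable: storage or queue is not configured.',
  };
  throw new Error(messages[res.status] || `Retry failed with HTTP ${res.status}`);
}
```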

5. Download Redacted Document

Download the redacted version of the document.

Endpoint:

GET /v1/documents/:documentId/download

Response (Success - 200):

Content-Type: application/pdf
Content-Disposition: attachment; filename="redacted_employee_data.pdf"

[Binary file data]
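
When saving the file in a browser, the suggested name can be read from the Content-Disposition header. The sketch below handles only the plain filename= form shown above, not the extended filename*= form. The function name and the fallback value are hypothetical.

```javascript
// Extracts the filename from a Content-Disposition header such as
//   attachment; filename="redacted_employee_data.pdf"
// Returns the fallback when no plain filename= parameter is present.
function filenameFromDisposition(header, fallback = 'redacted_document') {
  if (!header) return fallback;
  const match = /filename=\s*"?([^";]+)"?/i.exec(header);
  return match ? match[1].trim() : fallback;
}

// Usage (inside an async function, after a successful download request):
// const name = filenameFromDisposition(res.headers.get('Content-Disposition'));
// const blob = await res.blob();
```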

6. Delete Document

Delete a document and all associated files.

Endpoint:

DELETE /v1/documents/:documentId

Response (Success - 200):

{
  "success": true,
  "message": "Document deleted successfully"
}

Error Codes

Code  Meaning       Common Causes
400   Bad Request   Invalid file type, file too large, missing file
401   Unauthorized  Missing or invalid auth token
403   Forbidden     Free tier user (Premium required)
404   Not Found     Document doesn't exist or doesn't belong to user
429   Rate Limited  Too many requests, wait and retry
500   Server Error  Processing error, retry or contact support
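
The "wait and retry" guidance for 429 and 500 can be expressed as a small retry policy. The sketch below uses exponential backoff, which is a common choice here, not something the API mandates. The function names, base delay, and cap are illustrative assumptions.

```javascript
// Treats 429 and 500 as retryable, following the hints in the table above.
function isRetryable(status) {
  return status === 429 || status === 500;
}

// Exponential backoff in milliseconds for a zero-based attempt number:
// baseMs, 2 * baseMs, 4 * baseMs, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Usage (inside an async retry loop):
// if (isRetryable(res.status)) {
//   await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
// }
```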

PII Types Detected

The system detects 25+ types of PII, including:

Personal

  • EMAIL_ADDRESS
  • PHONE_NUMBER
  • SSN
  • DATE_OF_BIRTH
  • DRIVERS_LICENSE
  • PASSPORT
  • FULL_NAME

Financial

  • CREDIT_CARD
  • CVV
  • ROUTING_NUMBER
  • ACCOUNT_NUMBER
  • IBAN

Security

  • PASSWORD
  • API_KEY
  • AWS_KEY
  • GITHUB_TOKEN
  • JWT
  • PRIVATE_KEY

Technical & Medical

  • IP_ADDRESS
  • MAC_ADDRESS
  • CRYPTO_WALLET
  • MEDICAL_RECORD

Performance

Document Type             Typical Processing Time
Small PDF (1-10 pages)    10-30 seconds
Medium PDF (10-50 pages)  30 seconds - 2 minutes
Large PDF (50-200 pages)  2-5 minutes
Word Document             5-20 seconds
Image (with OCR)          5-15 seconds per page

Note: Cold starts may take an additional 10-20 seconds on the first request.