This guide explains how to collect and submit documents in a way that preserves forensic value and improves detection reliability.

The main rule: submit the original file bytes

Submit the raw original file as received from the end user (or downstream customer). Avoid intake paths that re-save, re-encode, or reconstruct the document before analysis.
Do not submit files that were pre-processed by third-party tools unless you have no alternative.

Why this matters

Many systems silently modify documents (resizing, JPEG re-compression, stripping metadata, PDF reconstruction). These changes can reduce or eliminate forensic signals. Common culprits:
  • email gateways
  • document management systems
  • file sharing tools
  • messaging apps (WhatsApp/Slack)
  • “optimize” or “compress” functions
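
One way to notice a silent modification is to fingerprint the bytes at intake and compare again just before submission. The following is a minimal sketch using SHA-256 from the Python standard library. The function names and the sample bytes are illustrative assumptions, not part of any product API.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the exact bytes given, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def bytes_unchanged(intake_digest: str, data_before_submit: bytes) -> bool:
    """True if the bytes about to be submitted match the digest taken at intake."""
    return sha256_hex(data_before_submit) == intake_digest


# Illustrative bytes only; in practice these come from the user's upload.
original = b"%PDF-1.7\nexample document body\n"
digest_at_intake = sha256_hex(original)

# Simulate an intermediate system that rewrote part of the file.
rewritten = original.replace(b"1.7", b"1.4")

print(bytes_unchanged(digest_at_intake, original))   # True
print(bytes_unchanged(digest_at_intake, rewritten))  # False
```

A mismatch does not say which system changed the file, only that the bytes being submitted are no longer the bytes that were received.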

Practical guidance

Do | Don’t
--- | ---
Collect the file via direct upload in your user flow | Route via messaging apps (often re-encodes images)
Store and forward the original bytes unchanged | Re-save documents in editors “to standardize” them
Upload binary bytes to the presigned URL | Base64-encode and re-wrap the file payload
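
The last row can be illustrated with a short sketch that reads the file in binary mode and sends those bytes as the request body, with no Base64 step and no wrapper. It uses only the Python standard library. The PUT method, the `Content-Type` value, and the function names are assumptions; check what your presigned URL actually expects.

```python
import urllib.request
from pathlib import Path


def build_upload_request(file_path: str, presigned_url: str) -> urllib.request.Request:
    """Build a request whose body is the file's bytes, exactly as stored on disk."""
    raw = Path(file_path).read_bytes()  # binary read: no decoding, no newline translation
    return urllib.request.Request(
        presigned_url,
        data=raw,  # raw bytes, not Base64 and not wrapped in JSON or a form
        method="PUT",  # assumption: the presigned URL accepts PUT
        headers={"Content-Type": "application/octet-stream"},  # assumption
    )


def upload(file_path: str, presigned_url: str) -> int:
    """Send the request and return the HTTP status code."""
    request = build_upload_request(file_path, presigned_url)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Splitting request construction from sending makes it easy to assert in a unit test that the body equals the stored bytes before anything goes over the network.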

Format considerations

PDFs (best for automation and accuracy)

Digitally issued PDFs (e.g., bank statements, utility bills, forms) generally enable stronger detection and automation and are less sensitive to quality loss from re-encoding.

Images (more sensitive to quality)

Images require more care:
  • authenticity is harder to determine
  • pre-digital tampering (before scanning) may not be detectable without content analysis
  • lower image quality increases false positives
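
Because PDFs and images call for different handling, an intake service may want to tell them apart without opening or re-saving the file. The sketch below inspects only the leading bytes (the standard file signatures for PDF, JPEG, and PNG). The function names are hypothetical, and the three formats are illustrative rather than a list of supported formats.

```python
def sniff_format(data: bytes) -> str:
    """Classify a file by its leading bytes: 'pdf', 'jpeg', 'png', or 'unknown'."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


def needs_image_quality_check(data: bytes) -> bool:
    """Images are more sensitive to quality, so flag them for extra handling."""
    return sniff_format(data) in {"jpeg", "png"}


print(sniff_format(b"%PDF-1.7\n..."))                   # pdf
print(needs_image_quality_check(b"\xff\xd8\xff\xe0"))   # True
```

The check reads the bytes but never rewrites them, so the file that reaches analysis is still the original.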

Redaction guidance

Redaction and annotation create a new, modified document, which can reduce forensic value. Key impacts:
  • may change document structure/metadata and obscure evidence
  • invalidates digital signatures/hashes used for authenticity verification
  • some tools reconstruct PDFs in ways that increase false positives
If possible, analyze originals first and apply redaction only after analysis. If you must redact before analysis, keep the original stored separately for audit/debug (if allowed by your policy).

Supported file formats

See Supported formats.

Go-live checklist

Before you go live, validate:
  • You can capture and store original bytes end-to-end.
  • Your pipeline does not re-encode, resize, reconstruct, or “optimize” documents.
  • You have a strategy for low-quality image handling (re-request vs manual review).
  • Your logs contain only non-PII identifiers (e.g., submission_id, safe query_id, timestamps, stage/cell).
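
For the logging item, one simple approach is an allowlist: only fields known to be non-PII are ever written to the log. The sketch below assumes field names based on the examples in the checklist; the event contents and the `safe_log_record` helper are hypothetical.

```python
import json
import logging

# Assumption: field names modelled on the checklist examples; adjust to your schema.
ALLOWED_LOG_FIELDS = {"submission_id", "query_id", "timestamp", "stage"}


def safe_log_record(event: dict) -> dict:
    """Keep only allowlisted, non-PII keys and drop everything else."""
    return {key: value for key, value in event.items() if key in ALLOWED_LOG_FIELDS}


logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("intake")

event = {
    "submission_id": "sub_123",              # hypothetical identifier
    "stage": "uploaded",
    "timestamp": "2024-01-01T00:00:00Z",
    "filename": "jane_doe_statement.pdf",    # may contain PII: not logged
    "email": "jane@example.com",             # PII: not logged
}
logger.info(json.dumps(safe_log_record(event), sort_keys=True))
```

An allowlist fails safe: a new field added to the event later stays out of the logs until someone deliberately approves it.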