Treatments and Transforms

In a CDS, a "treatment" is any processing step applied to data as it crosses a security boundary. A single transfer might pass through a dozen different treatments -- each one checking for a different category of risk, each one preparing the data for the next stage.

The NCSC compares this to airport security: a series of sequential checks, each preparing the data for the next. Assurance comes from the pipeline as a whole, not from any single checkpoint.

The Treatment Pipeline

Data entering or leaving a security domain passes through a chain of treatments. The chain is directional -- different treatments apply to import flows (low-to-high) versus export flows (high-to-low) -- and it is ordered so that each stage can trust the output of the previous one.
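A minimal sketch of a directional, ordered treatment chain follows. The treatment functions, the codeword, and the `PIPELINES` table are hypothetical, chosen only to show the shape of the idea: each treatment either returns (possibly modified) data for the next stage or rejects the transfer.

```python
from enum import Enum
from typing import Callable, Dict, List


class Direction(Enum):
    IMPORT = "low-to-high"
    EXPORT = "high-to-low"


class Rejected(Exception):
    """Raised by a treatment to stop the transfer."""


Treatment = Callable[[bytes], bytes]


# Hypothetical treatments, for illustration only.
def strip_trailing_whitespace(data: bytes) -> bytes:
    return data.rstrip()


def require_ascii(data: bytes) -> bytes:
    if not data.isascii():
        raise Rejected("non-ASCII byte in payload")
    return data


def block_codeword(data: bytes) -> bytes:
    if b"BLUEBIRD" in data.upper():
        raise Rejected("codeword found in outbound content")
    return data


# Order matters: each stage relies on the output of the one before it.
PIPELINES: Dict[Direction, List[Treatment]] = {
    Direction.IMPORT: [strip_trailing_whitespace, require_ascii],
    Direction.EXPORT: [block_codeword, strip_trailing_whitespace],
}


def transfer(data: bytes, direction: Direction) -> bytes:
    """Run the data through every treatment configured for this direction."""
    for treatment in PIPELINES[direction]:
        data = treatment(data)
    return data
```

Calling `transfer(payload, Direction.EXPORT)` either returns the treated payload or raises `Rejected`, so a failure at any stage stops the data from reaching later stages.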

Import Pipeline (Low-to-High)

The primary concern is integrity: preventing malware, exploits, and corrupted content from entering the trusted domain.

Transformation (untrusted side) -- convert complex formats to simpler, verifiable ones
Protocol break and flow control -- terminate the connection, pass payload via simplified protocol
Verification (trusted side) -- syntactic and semantic checks on the now-simple content

The NCSC places the transformation engine on the untrusted side because "we assume that the transformation engine could be compromised." Complex, vulnerable parsing happens where compromise is least damaging.

Export Pipeline (High-to-Low)

The primary concern is confidentiality: ensuring classified content is not disclosed.

Release authorisation -- determining whether the data may be released
Release control -- enforcing that only authorised data passes
Content validation -- keyword scanning, DLP, security label checks
Human review (where required) -- manual inspection and approval

The design principle is that security-critical controls should be "single purpose and suitably hardened," and that the architecture should avoid a "monolithic technology stack providing multiple security functions."

Treatment Types

1. Content Transformation

Converts complex, untrusted data formats into simpler, verifiable formats before they cross the boundary. The NCSC describes this as designed "with the aim of neutering any malicious code present in the content."

Nested content requires recursive handling: "Nested content should be un-packed, transformed if required, and verified."
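The unpack, transform, verify recursion can be sketched with ZIP archives as the container format. This is an illustration rather than a hardened implementation: `transform` and `verify` are caller-supplied placeholders, and `MAX_DEPTH` is an assumed limit added here to bound the recursion.

```python
import io
import zipfile
from typing import Callable, Dict

MAX_DEPTH = 3  # assumed limit; bounds recursion on deeply nested archives


def process_nested(
    data: bytes,
    transform: Callable[[bytes], bytes],
    verify: Callable[[bytes], bool],
    depth: int = 0,
) -> Dict[str, bytes]:
    """Unpack a ZIP archive recursively; transform and verify every leaf member."""
    if depth > MAX_DEPTH:
        raise ValueError("archive nesting exceeds MAX_DEPTH")
    results: Dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member = archive.read(info)
            if zipfile.is_zipfile(io.BytesIO(member)):
                # Nested container: recurse, then record members under a joined path.
                inner = process_nested(member, transform, verify, depth + 1)
                for name, content in inner.items():
                    results[f"{info.filename}/{name}"] = content
            else:
                simple = transform(member)
                if not verify(simple):
                    raise ValueError(f"verification failed for {info.filename}")
                results[info.filename] = simple
    return results
```

A single failing member rejects the whole archive, which matches the fail-closed behaviour expected of a guard.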

2. Content Inspection and Verification

After transformation, the verification engine performs two kinds of checks:

Syntactic verification ensures "the structure and syntax of the object are correct"
Semantic verification ensures "the meaning is valid in the context of the operation or business process"

For structured data, the Isode M-Guard implements this through multiple rule mechanisms: XML Schema (protocol structure), XPath (element-level checks), Schematron (flexible rules), and RELAX NG (an alternative XML schema language).
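Independently of any particular product, the split between syntactic and semantic verification can be sketched for a small JSON message. The field names, the quantity limit, and the depot list are hypothetical; the point is that a message can be perfectly well-formed and still fail the semantic check.

```python
import json

# Hypothetical message definition for illustration.
REQUIRED_FIELDS = {"order_id": int, "quantity": int, "depot": str}
KNOWN_DEPOTS = {"NORTH", "SOUTH"}


def syntactic_check(raw: bytes) -> dict:
    """Structure and types are correct. Raises ValueError otherwise."""
    message = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(message, dict) or set(message) != set(REQUIRED_FIELDS):
        raise ValueError("unexpected or missing fields")
    for name, expected in REQUIRED_FIELDS.items():
        # Exact type match, so that True/False are not accepted as integers.
        if type(message[name]) is not expected:
            raise ValueError(f"field {name!r} has the wrong type")
    return message


def semantic_check(message: dict) -> dict:
    """Values make sense for the business process. Raises ValueError otherwise."""
    if not 1 <= message["quantity"] <= 500:
        raise ValueError("quantity outside the plausible range")
    if message["depot"] not in KNOWN_DEPOTS:
        raise ValueError("unknown depot")
    return message
```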

3. Format Verification

The CDS must reliably determine what a file actually is and validate its structure. The NCSC requires that "content format should be verified robustly and consistently at each step."
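As a first step, a guard can compare what a file claims to be with what its leading bytes say it is. The sketch below assumes a small allow-list of three formats and uses their well-known signatures; it only identifies the type, and a real verifier would go on to validate the full structure.

```python
from typing import Optional

# Leading-byte signatures for a small, assumed allow-list of formats.
MAGIC = {
    "png": b"\x89PNG\r\n\x1a\n",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
}


def detect_format(data: bytes) -> Optional[str]:
    """Return the format whose signature the data starts with, if any."""
    for name, signature in MAGIC.items():
        if data.startswith(signature):
            return name
    return None


def verify_declared_format(filename: str, data: bytes) -> str:
    """Reject unknown formats and files whose extension disagrees with the content."""
    declared = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    actual = detect_format(data)
    if actual is None:
        raise ValueError("format not on the allow-list")
    if actual != declared:
        raise ValueError(f"declared {declared!r} but content is {actual!r}")
    return actual
```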

Hardware verification preferred

The NCSC states that software-only format verification is "usually unachievable for complex parsing software." Hardware specifically designed to verify simple data types is preferred. The hybrid approach uses hardware for structure validation and software to check that "each data field is valid for the end application."

This is a remarkably strong statement from a national authority, and it has direct implications for choosing between hardware-based and software-based CDS products.

4. Dirty Word Searching

Keyword scanning applies pre-defined word lists against outbound content to detect classified terms, codewords, or sensitive identifiers that should not cross the boundary. The NCSC describes this as one of several "heuristic measures, such as keyword scanning, data loss prevention tooling, or manual second-person review."

Dirty word searching is typically one treatment in a broader export chain -- necessary but not sufficient on its own.
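A minimal keyword scanner might look like the following. The word list is invented for illustration, and the matching rule (case-insensitive, whole terms only) is an assumption; real deployments tune both to balance missed terms against false positives.

```python
import re
from typing import Iterable, List, Pattern, Tuple

# Hypothetical dirty word list.
DIRTY_WORDS = ["BLUEBIRD", "Project Nightjar", "TS//SI"]


def compile_dirty_words(words: Iterable[str]) -> Pattern[str]:
    """Build one case-insensitive pattern that matches any listed term as a whole term."""
    # Longest terms first, so a longer phrase wins over a shorter prefix.
    ordered = sorted(words, key=len, reverse=True)
    alternatives = "|".join(re.escape(word) for word in ordered)
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


def find_dirty_words(text: str, pattern: Pattern[str]) -> List[Tuple[int, str]]:
    """Return (offset, matched text) for every hit."""
    return [(match.start(), match.group(0)) for match in pattern.finditer(text)]
```

Whole-term matching means "bluebirds" is not flagged by the entry "BLUEBIRD". That trade-off is one reason keyword scanning is a heuristic rather than a guarantee.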

5. Metadata Stripping and Enrichment

Export flows should not contain superfluous information: "Data exports should not contain any superfluous user data or machine generated information" except information "specifically required for the intelligibility of the exported data."

Stripping removes hidden metadata that could leak sensitive information: author names, revision history, tracked changes, GPS coordinates in images, internal network paths.

Enrichment adds required metadata: security labels, provenance markings, or handling instructions.
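Stripping is safest when expressed as an allow-list: keep only the fields needed for intelligibility and drop everything else, rather than trying to enumerate every field that might leak. The sketch below models a document as a plain dictionary; the field names and label values are hypothetical.

```python
# Fields assumed to be required for the export to be intelligible.
ALLOWED_FIELDS = {"title", "body"}


def strip_metadata(document: dict) -> dict:
    """Keep only allow-listed fields; everything else is dropped."""
    return {key: value for key, value in document.items() if key in ALLOWED_FIELDS}


def enrich(document: dict, label: str, handling: str) -> dict:
    """Return a copy with the required release markings added."""
    return {**document, "security_label": label, "handling": handling}
```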

6. Security Label Checking

The CDS checks data labels against catalogues of allowed labels to enforce information flow policy. The Isode M-Guard supports STANAG 4774/4778 NATO Confidentiality Metadata Labels, checking labels to "prevent leak of sensitive data."

Label checking is the mechanism that connects the data's classification to the policy about what can cross the boundary.
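In outline, a label check is a catalogue lookup followed by a comparison against what the flow permits. The sketch below uses a deliberately simplified label (policy name plus classification) and a hypothetical classification ordering; it does not reproduce the STANAG 4774 label syntax.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Label:
    policy: str
    classification: str


# Hypothetical ordering, lowest to highest.
ORDER = ["UNCLASSIFIED", "RESTRICTED", "CONFIDENTIAL", "SECRET"]


def may_cross(label: Label, catalogue: Set[Label], ceiling: str) -> bool:
    """Allow only catalogued labels at or below the flow's classification ceiling."""
    if label not in catalogue:
        return False  # unknown label: fail closed
    if label.classification not in ORDER or ceiling not in ORDER:
        return False  # cannot be ranked: fail closed
    return ORDER.index(label.classification) <= ORDER.index(ceiling)
```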

7. Content Disarm and Reconstruction (CDR)

CDR is the content-level complement to protocol break. Where protocol break strips and reconstructs the transport layer, CDR strips and reconstructs the content itself.

CDR "does not determine or detect malware's functionality" -- instead, it takes a preventive approach by removing all unapproved components regardless of whether they are known to be malicious. It deconstructs files, removes elements that do not match the file type's standards or set policies, then rebuilds clean versions.
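The deconstruct, filter, rebuild cycle can be illustrated on a ZIP-based container. In this sketch the allow-list of member suffixes is a hypothetical policy, and only the container is rebuilt; a real CDR engine would also parse and regenerate each retained part rather than copying it across.

```python
import io
import zipfile
from typing import List, Tuple

# Hypothetical policy: member types that are approved to survive.
ALLOWED_SUFFIXES = (".xml", ".rels", ".txt", ".png")


def disarm_and_reconstruct(data: bytes) -> Tuple[bytes, List[str]]:
    """Rebuild a ZIP container from scratch, keeping only approved members."""
    removed: List[str] = []
    rebuilt = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as source, \
            zipfile.ZipFile(rebuilt, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            if info.is_dir():
                continue
            if info.filename.lower().endswith(ALLOWED_SUFFIXES):
                # Written as a fresh entry: the original headers are not copied.
                target.writestr(info.filename, source.read(info))
            else:
                removed.append(info.filename)
    return rebuilt.getvalue(), removed
```

Nothing here inspects a removed member to decide whether it is malicious. It is dropped simply because it is not on the approved list.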

Three levels of CDR:

Level	Approach	Security	Fidelity
Level 1	Flatten and convert to PDF	Maximum	Lowest -- original format lost
Level 2	Strip active content, preserve file type	High	Good -- format preserved, macros removed
Level 3	Eliminate all risk, preserve type, integrity, and active content	Good	Highest -- content and structure preserved

CDR is effective against zero-day vulnerabilities because it removes all potentially malicious code rather than relying on known threat signatures. Glasswall's implementation "does not depend on malware signatures or prior knowledge of threats."

CDR is being absorbed into the guard pipeline

CDR was briefly a standalone product category. It is now being integrated directly into guard products as a treatment stage. Forcepoint's 2021 acquisition of Deep Secure, whose technology is now part of Everfox, exemplifies this trend. When evaluating a CDS, look for CDR as part of the guard rather than as a separate product.

8. Schema Validation

For structured data (XML, JSON, database records), schema validation ensures conformance to an expected structure. The M-Guard implements this through configurable Application Profiles with sets of rule catalogues. Each guard instance can enable rules from loaded catalogues, providing flexible, per-deployment validation.
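The idea of enabling a subset of rules from loaded catalogues can be sketched generically. The catalogue names, rule names, and the dotted "catalogue.rule" reference syntax below are all invented for this example and do not reflect M-Guard's actual configuration format.

```python
from typing import Callable, Dict, List, Tuple

Rule = Callable[[dict], bool]

# Hypothetical rule catalogues loaded into the guard.
CATALOGUES: Dict[str, Dict[str, Rule]] = {
    "structure": {
        "has_id": lambda record: isinstance(record.get("id"), int),
        "has_name": lambda record: isinstance(record.get("name"), str),
    },
    "limits": {
        "short_name": lambda record: len(str(record.get("name", ""))) <= 32,
    },
}


def build_profile(enabled: List[str]) -> List[Tuple[str, Rule]]:
    """Resolve 'catalogue.rule' references into the rules one guard instance enforces."""
    profile = []
    for reference in enabled:
        catalogue, rule_name = reference.split(".", 1)
        profile.append((reference, CATALOGUES[catalogue][rule_name]))
    return profile


def failed_rules(record: dict, profile: List[Tuple[str, Rule]]) -> List[str]:
    """Return the names of every enabled rule the record does not satisfy."""
    return [name for name, rule in profile if not rule(record)]
```

Two guard instances can share the same catalogues while enforcing different profiles, which is what makes validation configurable per deployment.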

9. Image and Document Sanitisation

Image sanitisation re-encodes images to strip steganographic content and embedded metadata. Document sanitisation removes macros, scripts, embedded objects, and active content. The NCSC requires that "verification components should ensure all potentially active content has been removed."

10. Anti-Covert-Channel Treatment

Covert channels exploit encoding variants, timing differences, or other subtle signals to smuggle information across a boundary. The M-Guard applies content normalisation to guard "against covert channels and attacks using encoding variants." Rate limiting also reduces covert channel bandwidth by providing a "mechanism to limit the rate of messages."
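Both ideas can be sketched in a few lines. Normalisation here means re-serialising JSON into one canonical form, so that choices of whitespace, key order, or escape style carry no information across the boundary. The rate limiter is a simple sliding window. Neither is M-Guard's implementation, and the parameters are illustrative.

```python
import json
from collections import deque


def normalise_json(raw: bytes) -> bytes:
    """Re-encode JSON canonically: sorted keys, no optional whitespace, ASCII escapes."""
    parsed = json.loads(raw)
    return json.dumps(parsed, sort_keys=True, separators=(",", ":")).encode("ascii")


class RateLimiter:
    """Allow at most max_messages in any window of per_seconds."""

    def __init__(self, max_messages: int, per_seconds: float) -> None:
        self.max_messages = max_messages
        self.per_seconds = per_seconds
        self.sent: deque = deque()  # timestamps of recently allowed messages

    def allow(self, now: float) -> bool:
        # Forget messages that have fallen out of the window.
        while self.sent and now - self.sent[0] >= self.per_seconds:
            self.sent.popleft()
        if len(self.sent) >= self.max_messages:
            return False
        self.sent.append(now)
        return True
```

The caller supplies the timestamp (for example from `time.monotonic()`), which keeps the limiter deterministic and easy to test.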

11. Redaction

Redaction removes or masks specific portions of content before release. It is closely related to human review for high-to-low transfers, where a trained reviewer examines the content, redacts classified portions, and approves release. Automated redaction can handle simpler cases using pattern matching or label-driven rules.
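Pattern-based redaction of the simpler kind can be sketched as follows. The two patterns (a grid-reference-like token and a loosely matched IPv4 address) are illustrative assumptions, not a recommended rule set.

```python
import re
from typing import Dict, Tuple

# Hypothetical patterns for content that must not be released.
PATTERNS = {
    "grid_ref": re.compile(r"\b[A-Z]{2} ?\d{3} ?\d{3}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def redact(text: str) -> Tuple[str, Dict[str, int]]:
    """Mask every match and report how many substitutions each pattern made."""
    counts: Dict[str, int] = {}
    for name, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{name}]", text)
        counts[name] = count
    return text, counts
```

Returning the counts gives a reviewer or audit log a record of what was changed without repeating the redacted values.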

12. Business Rule Checks

Beyond structural and security checks, guards can enforce business-level rules that validate whether content makes sense in the operational context. For example: checking that a message's subject matter matches the originator's authorised topics, or that database query results fall within expected value ranges.
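Both examples reduce to small predicates over the message and its context. In the sketch below, the originators, topics, and field names are hypothetical.

```python
from typing import Dict, Iterable, Set

# Hypothetical mapping of originators to the topics they may send.
AUTHORISED_TOPICS: Dict[str, Set[str]] = {
    "logistics-cell": {"fuel", "transport"},
    "met-office": {"weather"},
}


def topic_authorised(originator: str, topic: str) -> bool:
    """Unknown originators have no authorised topics, so they are refused."""
    return topic in AUTHORISED_TOPICS.get(originator, set())


def values_in_range(rows: Iterable[dict], field: str, low: float, high: float) -> bool:
    """True only if every row's value for the field lies within [low, high]."""
    return all(low <= row[field] <= high for row in rows)
```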

Treatment Summary Table

Treatment	Primary Direction	Purpose
Content transformation	Import	Convert complex formats to simple, verifiable ones
Content inspection / verification	Both	Syntactic and semantic correctness checks
Format verification	Import	Confirm file type and validate structure
Dirty word searching	Export	Detect classified terms and codewords
Metadata stripping	Export	Remove hidden information that could leak
Metadata enrichment	Both	Add required labels and handling markings
Security label checking	Both	Enforce information flow policy based on classification
CDR	Import	Deconstruct and rebuild files to known-good standard
Schema validation	Both	Validate structured data against expected schemas
Image / document sanitisation	Import	Remove steganographic content, macros, active content
Anti-covert-channel	Both	Normalise content and limit rates to prevent signalling
Redaction	Export	Remove or mask classified portions
Business rule checks	Both	Validate operational context and business logic

Policy-Driven Treatment Selection

Treatment selection is driven by security policy, which itself reflects the threat model and risk appetite. The NCSC guidance is to "transfer only what is necessary to achieve the required business outcomes" and to "choose simple protocols and strip unneeded information where possible."

In practice, the treatment pipeline is configured per data type and per flow direction. Different file types need different treatments. An incoming Office document needs CDR and active content removal. An outgoing intelligence report needs dirty word search, label checking, metadata stripping, and possibly human review. A streaming sensor feed needs lightweight format verification and rate limiting.
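One way to express this is a policy table keyed by data type and direction, with anything unlisted denied by default. The keys and treatment names below are hypothetical labels for the three examples just given, and the direction assigned to the sensor feed is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical policy table: (data type, direction) -> ordered treatment names.
POLICY: Dict[Tuple[str, str], List[str]] = {
    ("office-document", "import"): ["format_verification", "cdr", "active_content_removal"],
    ("intelligence-report", "export"): [
        "dirty_word_search", "label_check", "metadata_strip", "human_review",
    ],
    ("sensor-feed", "import"): ["format_verification", "rate_limit"],
}


def select_treatments(data_type: str, direction: str) -> List[str]:
    """Return the configured chain, refusing any flow the policy does not name."""
    try:
        return POLICY[(data_type, direction)]
    except KeyError:
        raise PermissionError(f"no policy for {data_type!r} {direction}") from None
```

Denying unlisted combinations keeps the configuration aligned with the guidance to transfer only what is necessary.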

The security enforcement mechanisms -- labels, access control, release authority -- govern what gets through. The treatments govern how it gets processed on the way.

For the protocol-level foundation that these treatments build on, see Protocol Break. For the formal models governing data flow direction, see Data Flow Models.