Architecture
Fixpoint loop internals, convergence detection, lane interaction, failure modes, and the attestation pipeline.
High-Level Overview
FORGE is organized around a single entry point, normalize_content(), that orchestrates a multi-stage pipeline.
The architecture has four major subsystems:
- Router: Maps content type → lane chain
- Fixpoint Loop Engine: Runs lane chains to convergence
- Attestation Engine: Creates sealed ForgeStamps
- Telemetry Collector: Records structural failure metadata
┌─────────────────────────────────────────────────────────┐
│                   normalize_content()                   │
│                                                         │
│  ┌──────────┐   ┌────────────────┐   ┌──────────────┐   │
│  │  Router  │──▶│ Fixpoint Loop  │──▶│ Attestation  │   │
│  │          │   │     Engine     │   │    Engine    │   │
│  └──────────┘   └───────┬────────┘   └──────────────┘   │
│                         │                               │
│                 ┌───────▼────────┐                      │
│                 │   Telemetry    │                      │
│                 │   Collector    │                      │
│                 └────────────────┘                      │
│                                                         │
│  Returns: ForgeResult { content, audit, stamp, ... }    │
└─────────────────────────────────────────────────────────┘
The Router
The router is the first component to execute. It receives the content_type string and resolves it to a lane chain definition. The chain definition includes:
- Pre-loop lanes: Lanes that run exactly once before the fixpoint loop begins.
- Loop lanes: Lanes that run repeatedly inside the fixpoint loop until convergence.
- Post-loop lanes: Lanes that run exactly once after the loop (typically attestation).
| Content Type | Pre-loop | Loop | Post-loop |
|---|---|---|---|
| DIFF | L0, L0.5, L0.7 | L1, L2, L3, L4 | - |
| TEXT | T0 | T1, T2, T3, T4 | - |
| JSON | T1 | T2, T3, T4 | - |
| PROMPT | - | - | T2 |
| COMPLIANCE | T3, T0 | - | T4 |
| SPEECH | S0, T0 | T1, T2, T3, T4 | - |
| VIDEO_META | T1, V0, V1 | - | T4 |
The router also applies configuration overrides: disabled lanes are skipped, per-type lane exclusions (skip_lanes) are applied, and disabled content types raise ForgeConfigError.
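The resolution and override steps described above can be sketched as follows. This is an illustrative sketch, not the code in forge/router.py: the LaneChain class, the resolve_chain function, and its parameter names are hypothetical, and only two rows of the table are reproduced.

```python
from dataclasses import dataclass


class ForgeConfigError(Exception):
    """Stand-in for the ForgeConfigError named in the text."""


@dataclass(frozen=True)
class LaneChain:
    """Hypothetical container for the three phases of a lane chain."""
    pre_loop: tuple = ()
    loop: tuple = ()
    post_loop: tuple = ()


# Two rows copied from the table above; other content types have the same shape.
CHAINS = {
    "DIFF": LaneChain(pre_loop=("L0", "L0.5", "L0.7"), loop=("L1", "L2", "L3", "L4")),
    "TEXT": LaneChain(pre_loop=("T0",), loop=("T1", "T2", "T3", "T4")),
}


def resolve_chain(content_type, disabled_types=(), disabled_lanes=(), skip_lanes=None):
    """Resolve a content type to its lane chain, then apply config overrides."""
    if content_type in disabled_types:
        raise ForgeConfigError(f"content type {content_type!r} is disabled")
    chain = CHAINS[content_type]
    # Globally disabled lanes plus per-type exclusions are both skipped.
    excluded = set(disabled_lanes) | set((skip_lanes or {}).get(content_type, ()))

    def keep(lanes):
        return tuple(lane for lane in lanes if lane not in excluded)

    return LaneChain(keep(chain.pre_loop), keep(chain.loop), keep(chain.post_loop))
```

For example, resolve_chain("DIFF", skip_lanes={"DIFF": ["L2"]}) would yield a loop phase of L1, L3, L4 while leaving the pre-loop lanes untouched.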
The Fixpoint Loop Engine
The fixpoint loop is the heart of FORGE. It is based on the mathematical concept of a fixpoint: a value that is unchanged by a function. In FORGE's case, content has reached a fixpoint when running all loop lanes produces no changes, that is, the output is identical to the input.
Algorithm
# Simplified fixpoint loop (actual implementation in forge/loop.py).
# `audit`, `diff_ratio`, and the status constants are assumed to be in scope.
def fixpoint_loop(content, loop_lanes, config):
    for iteration in range(config.max_iterations):
        previous = content
        for lane in loop_lanes:
            if not config.lane_enabled(lane.id):
                continue
            content = lane.process(content)
            audit.record(lane, content)

        # Check convergence: a full pass over the loop lanes changed nothing
        if content == previous:
            return content, CONVERGED

        # Check threshold-based convergence
        if diff_ratio(content, previous) <= config.convergence_threshold:
            return content, CONVERGED

    # Did not converge within max_iterations
    if config.fail_closed:
        return content, REJECTED
    else:
        return content, QUARANTINED
Convergence Detection
FORGE uses two convergence strategies:
- Exact match (convergence_threshold = 0.0): The default. Content must be byte-for-byte identical between iterations. This is the strictest and most reliable mode.
- Threshold match (convergence_threshold > 0.0): Content is considered converged if the diff ratio between consecutive iterations is at or below the threshold. Useful for content where minor formatting variations are acceptable.
Oscillation detection: FORGE also detects oscillation, where content alternates between two or more states without converging. If the same content hash appears twice in the iteration history, FORGE declares oscillation, exits the loop, and emits the content from the first occurrence of the repeated hash.
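The two convergence strategies and the hash-based oscillation check can be combined in one small loop. The sketch below makes several assumptions that are not from FORGE itself: diff_ratio is approximated with difflib.SequenceMatcher, the status strings are placeholder labels, the iteration history is seeded with the input content, and run_until_stable is a hypothetical name.

```python
import difflib
import hashlib


def diff_ratio(current: str, previous: str) -> float:
    """Fraction of content that changed: 0.0 means identical.

    Assumption: FORGE's real diff_ratio is not specified in this document;
    difflib's similarity ratio is used here as a stand-in.
    """
    return 1.0 - difflib.SequenceMatcher(None, previous, current).ratio()


def _digest(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def run_until_stable(content, step, max_iterations=10, threshold=0.0):
    """Apply `step` until content converges, oscillates, or the budget runs out."""
    # History of content hashes seen so far (seeded with the input, an assumption).
    seen = {_digest(content): content}
    for _ in range(max_iterations):
        previous = content
        content = step(previous)
        # Exact match, or a change small enough to fall within the threshold.
        if content == previous or diff_ratio(content, previous) <= threshold:
            return content, "CONVERGED"
        digest = _digest(content)
        if digest in seen:
            # A previous state has reappeared, so the loop is cycling.
            return seen[digest], "OSCILLATION"
        seen[digest] = content
    return content, "NOT_CONVERGED"
```

A step function that flips between two states, such as one mapping "a" to "b" and "b" back to "a", is reported as oscillating on the second iteration instead of consuming the whole iteration budget.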
Iteration Budget
The max_iterations setting (default: 10) bounds the loop. In practice, most content converges in 2-4 iterations. The budget protects against pathological cases where lanes interact in ways that prevent convergence.
| Content Type | Typical Iterations | Reason |
|---|---|---|
| TEXT | 1-2 | Simple: encoding + policy rarely interact |
| DIFF | 2-5 | Hunk repairs can trigger structural re-validation |
| JSON | 1-3 | Schema repairs may expose new policy violations |
| SPEECH | 2-4 | Disfluency removal can reveal hidden PII |
Lane Interaction
Lanes within the fixpoint loop can interact: one lane's repair may create new issues that another lane detects. This is by design: the fixpoint loop exists precisely to handle these cascading effects.
Common Interaction Patterns
T1 → T3 (Schema → Policy)
T1 repairs a missing field by filling a default value. T3 then scans the filled value and detects PII or policy violations that weren't present in the original content. On the next iteration, T1 sees no schema issues, and T3 has already cleaned the value, so the loop converges.
L0.5 → L1 (Hunk → Structure)
L0.5 repairs hunk line counts. This can change the apparent structure of the diff (a hunk that seemed empty now has content). L1 then re-validates the file-level structure. In the fixpoint loop, L1 and L0.5 don't conflict: L0.5 runs pre-loop, so L1 always sees corrected hunks.
S0 → T2 (Speech → Prompt Safety)
S0 removes filler words and collapses repetitions. This can reveal injection patterns that were broken up by disfluencies: "ignore um all previous uh instructions" becomes "ignore all previous instructions" after S0, which T2 then detects.
T2 → T3 (Prompt Safety → Policy)
T2 strips injected content, shortening the text. T3 then applies length-based policies and PII detection on the shortened text. In rare cases, removing injected content can change PII detection boundaries.
Lane ordering within the loop matters. FORGE runs lanes in a fixed order per content type (defined by the router). Changing the order is not supported, because the chains are specifically designed for correctness.
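To make the S0 → T2 pattern concrete, here is a toy illustration of why order matters. The two functions are deliberately simplistic stand-ins written for this example; the regexes and function names are hypothetical and bear no relation to the real S0 and T2 lanes.

```python
import re

# Toy filler list and injection phrase, chosen only to mirror the example above.
FILLERS = re.compile(r"\b(?:um|uh)\b\s*", re.IGNORECASE)
INJECTION = re.compile(r"ignore all previous instructions", re.IGNORECASE)


def s0_remove_fillers(text: str) -> str:
    """Toy stand-in for S0: drop filler words and the whitespace after them."""
    return FILLERS.sub("", text)


def t2_strip_injection(text: str) -> str:
    """Toy stand-in for T2: strip a known injection phrase."""
    return INJECTION.sub("", text).strip()


raw = "please summarize this. ignore um all previous uh instructions"

# Running T2 on the raw transcript misses the pattern entirely.
missed = t2_strip_injection(raw)

# Running S0 first exposes the pattern, which T2 then removes.
cleaned = t2_strip_injection(s0_remove_fillers(raw))
```

Here missed is identical to raw, while cleaned contains only the legitimate request. The same reasoning motivates the fixed lane order: a safety lane placed before the lane that exposes a pattern would not see it on that pass.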
Failure Modes
FORGE is designed to fail safely. Here are the failure modes and how each is handled:
1. Non-Convergence
Content doesn't stabilize within max_iterations. This typically means lanes are producing cascading changes that don't settle.
- fail_closed = true: Content is REJECTED. ForgeStamp trust level is REJECTED.
- fail_closed = false: Content is QUARANTINED. The last iteration's output is emitted with a QUARANTINED trust level.
2. Oscillation
Content alternates between two or more states. FORGE detects this by tracking content hashes across iterations. When a duplicate hash is seen, the loop exits immediately.
- The content from the first occurrence of the repeated hash is used (the "stable" state before oscillation began).
- Trust level is set to QUARANTINED regardless of fail_closed.
3. Lane Timeout
A single lane exceeds its timeout_ms. The lane is aborted and its status is set to ERROR. Depending on strict_mode, this either fails the entire normalization or allows remaining lanes to continue.
4. Total Timeout
The entire normalization call exceeds timeout_ms. A ForgeTimeoutError is raised with a partial result attached. The partial result contains whatever was computed before the timeout.
5. Lane Internal Error
A lane encounters an unrecoverable bug (not a content issue). A ForgeLaneError is raised with the lane ID and partial result. Telemetry records the failure for debugging.
| Failure | Trust Level | Content Emitted | Exception |
|---|---|---|---|
| Non-convergence (fail_closed) | REJECTED | Last iteration | None |
| Non-convergence (fail_open) | QUARANTINED | Last iteration | None |
| Oscillation | QUARANTINED | Pre-oscillation state | None |
| Lane timeout | Depends on strict_mode | Pre-timeout state | None (unless strict) |
| Total timeout | N/A | Partial (on exception) | ForgeTimeoutError |
| Lane internal error | N/A | Partial (on exception) | ForgeLaneError |
The Attestation Pipeline
After the fixpoint loop completes (or on failure), the attestation engine creates a ForgeStamp. The stamp is the final arbiter of trust.
Stamp Creation Process
- Aggregate lane results: Collect status from every lane that ran. The worst status determines the base trust level.
- Apply convergence result: If the loop converged and all lanes passed, trust is TRUSTED. If repairs were made, trust is REPAIRED. Non-convergence or oscillation produces QUARANTINED or REJECTED.
- Build stamp payload: Actor info, lane list, trust level, timestamp, content hash (optional), FORGE version.
- Seal with HMAC-SHA256: The payload is serialized to a canonical JSON form and signed with the HMAC secret.
import hmac
import hashlib
import json

def seal_stamp(payload: dict, secret: str) -> str:
    """Create HMAC-SHA256 signature for a ForgeStamp."""
    # Canonical JSON: sorted keys, no whitespace
    canonical = json.dumps(payload, sort_keys=True, separators=(',', ':'))
    signature = hmac.new(
        secret.encode('utf-8'),
        canonical.encode('utf-8'),
        hashlib.sha256
    ).hexdigest()
    return signature
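A downstream system can check a stamp by recomputing the signature over the same canonical JSON and comparing in constant time. The sketch below assumes verification mirrors the sealing step shown above; sign_payload and verify_stamp are hypothetical names, and the payload keys used in the usage note are illustrative rather than the real ForgeStamp fields.

```python
import hashlib
import hmac
import json


def sign_payload(payload: dict, secret: str) -> str:
    """Recompute the HMAC-SHA256 hex signature over canonical JSON."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(
        secret.encode("utf-8"),
        canonical.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def verify_stamp(payload: dict, signature: str, secret: str) -> bool:
    """Return True only if `signature` matches the payload under `secret`."""
    expected = sign_payload(payload, secret)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))
```

Because the JSON is canonicalized with sorted keys, a payload such as {"trust_level": "TRUSTED", "lanes": ["T0", "T1"]} verifies regardless of the key order the verifier receives, while any change to a value or to the secret makes verification fail.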
Trust Level Determination
Lane Results      Convergence      →  Trust Level
───────────────   ──────────────      ───────────────────────
All PASSED        Converged        →  TRUSTED
Any REPAIRED      Converged        →  REPAIRED
Any WARNING       Converged        →  REPAIRED
Any ERROR         N/A              →  REJECTED
N/A               Not converged    →  REJECTED (fail_closed)
N/A               Not converged    →  QUARANTINED (fail_open)
N/A               Oscillation      →  QUARANTINED
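The table above can be read as a small decision function. The sketch below is one possible encoding, not FORGE's attestation code: determine_trust is a hypothetical name, the convergence labels are placeholder strings, and checking lane errors before the loop outcome is an assumption, since the table does not state a precedence between rows.

```python
def determine_trust(lane_statuses, convergence, fail_closed=True):
    """Map lane statuses and the loop outcome to a trust level.

    `convergence` is one of "CONVERGED", "NOT_CONVERGED", "OSCILLATION"
    (placeholder labels for this sketch).
    """
    # Assumption: a lane error outranks every other row in the table.
    if "ERROR" in lane_statuses:
        return "REJECTED"
    if convergence == "OSCILLATION":
        return "QUARANTINED"
    if convergence == "NOT_CONVERGED":
        return "REJECTED" if fail_closed else "QUARANTINED"
    # Converged: any repair or warning downgrades TRUSTED to REPAIRED.
    if any(status in ("REPAIRED", "WARNING") for status in lane_statuses):
        return "REPAIRED"
    return "TRUSTED"
```

Under these assumptions, determine_trust(["PASSED", "REPAIRED"], "CONVERGED") yields REPAIRED, and an oscillating run is QUARANTINED whatever the value of fail_closed.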
Telemetry Integration
The telemetry subsystem hooks into the fixpoint loop and attestation engine to collect structural failure data. It operates as a passive observer: it never modifies content or affects normalization outcomes.
Data Flow
Fixpoint Loop ──▶ FailureRecord ──▶ Anonymizer ──▶ Local Buffer
                                                        │
                                                        ▼ (if upload=true)
                                         telemetry.fixpointforge.report
                                                        │
                                                        ▼
                                               Self-improving loop
                                              (normalizer updates)
Telemetry records are created for every normalization run, not just failures. The record includes lane timing, iteration count, convergence status, content type, and structural metadata. Raw content is never included.
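As a rough illustration of "structural metadata, never raw content", a record might be built as below. The RunRecord class, its field names, and build_record are all hypothetical and chosen only to match the categories listed above; the real FailureRecord schema is defined in the Telemetry documentation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record shape, not the real FailureRecord schema."""
    content_type: str
    iterations: int
    convergence: str
    lane_timings_ms: dict
    char_count: int
    line_count: int


def build_record(content, content_type, iterations, convergence, lane_timings_ms):
    """Keep only structural facts about the content, never the content itself."""
    return RunRecord(
        content_type=content_type,
        iterations=iterations,
        convergence=convergence,
        lane_timings_ms=dict(lane_timings_ms),
        char_count=len(content),
        line_count=(content.count("\n") + 1) if content else 0,
    )


record = build_record(
    "secret line one\nsecret line two", "TEXT", 2, "CONVERGED", {"T1": 1.5}
)
serialized = asdict(record)  # what would be buffered locally in this sketch
```

The serialized form carries counts and timings only, so nothing from the original text can be recovered from it.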
See Telemetry for the complete FailureRecord schema, anonymization details, and opt-out instructions.
The Self-Improving Loop
When telemetry upload is enabled, failure records flow to telemetry.fixpointforge.report. This data powers a feedback loop that improves FORGE's normalizers over time:
- Collection: Structural failure metadata arrives from FORGE instances worldwide.
- Aggregation: Failures are grouped by lane, content type, and failure pattern.
- Analysis: Patterns are identified: which lanes fail most often, which content types cause oscillation, which lane interactions produce non-convergence.
- Improvement: Normalizer logic is updated based on aggregate patterns. New regex patterns, adjusted thresholds, new repair strategies.
- Release: Updated normalizers ship in the next FORGE release.
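The aggregation step amounts to counting records by a compound key. A minimal sketch follows, assuming each record is a dict with lane, content_type, and pattern keys; those key names and the aggregate_failures function are hypothetical, as is the sample data.

```python
from collections import Counter


def aggregate_failures(records):
    """Count failure records by (lane, content type, failure pattern)."""
    return Counter(
        (record["lane"], record["content_type"], record["pattern"])
        for record in records
    )


# Made-up sample records, for illustration only.
sample = [
    {"lane": "T3", "content_type": "JSON", "pattern": "oscillation"},
    {"lane": "T3", "content_type": "JSON", "pattern": "oscillation"},
    {"lane": "L1", "content_type": "DIFF", "pattern": "non_convergence"},
]
counts = aggregate_failures(sample)
top = counts.most_common(1)
```

Sorting the counter with most_common gives the analysis step a ranked list of the lane and content type combinations that fail most often.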
The self-improving loop is entirely optional: telemetry never affects normalization outcomes, so FORGE behaves the same with or without upload. Enabling it contributes to improving normalization quality for the entire community.
Source Code Layout
forge/
├── __init__.py              # normalize_content(): the public API
├── config.py                # ForgeConfig: env vars + YAML loading
├── result.py                # ForgeResult, ForgeStamp, AuditEntry, LaneResult
├── router.py                # Content type → lane chain mapping
├── loop.py                  # Fixpoint loop engine + convergence detection
├── attestation.py           # HMAC-SHA256 stamp creation + verification
├── exceptions.py            # All exception classes
├── types.py                 # Content type + trust level constants
│
├── normalizers/             # 20 normalizer files
│   ├── base.py              # Abstract base class for all lanes
│   ├── registry.py          # Lane registration + discovery
│   ├── chain.py             # Chain builder (pre-loop, loop, post-loop)
│   ├── utils.py             # Shared utilities (diff parsing, PII regex, etc.)
│   ├── sp_outbound.py       # Service Provider outbound normalizer
│   ├── l0_syntax.py         # L0: Diff syntax validation
│   ├── l05_hunk.py          # L0.5: Hunk header repair
│   ├── l07_context.py       # L0.7: Context line validation
│   ├── l1_structure.py      # L1: Structural normalization
│   ├── l2_semantic.py       # L2: Semantic checking
│   ├── l3_compliance.py     # L3: Diff compliance
│   ├── l4_attestation.py    # L4: Diff attestation
│   ├── t0_encoding.py       # T0: Encoding normalization
│   ├── t1_schema.py         # T1: Schema validation
│   ├── t2_prompt_safety.py  # T2: Prompt safety
│   ├── t3_policy.py         # T3: Policy enforcement
│   ├── t4_final.py          # T4: Final attestation
│   ├── s0_speech.py         # S0: Speech normalization
│   ├── v0_video_struct.py   # V0: Video structure
│   └── v1_video_meta.py     # V1: Video temporal validation
│
└── telemetry/
    ├── __init__.py
    ├── collector.py         # FailureRecord creation + buffering
    ├── uploader.py          # HTTP upload to telemetry endpoint
    └── anonymizer.py        # Content stripping + structural extraction
Design Principles
- Fail closed by default. Uncertain content is rejected, not passed through. This is a security boundary, so the default must be safe.
- Single entry point. One function (normalize_content()) handles all content types. Consumers don't need to know about lanes or the fixpoint loop.
- Cryptographic attestation. Every result is sealed with HMAC-SHA256. Downstream systems can verify provenance without re-running normalization.
- Convergent by design. The fixpoint loop runs until the output stabilizes. If it doesn't stabilize within the iteration budget, that's a signal, not a bug.
- Observable but private. Telemetry collects structural metadata, never raw content. Operators have full visibility into what happened without exposing sensitive data.
- Composable lanes. Each lane owns one concern. Lanes compose through the fixpoint loop rather than direct coupling. This makes the system extensible without cascade risk.