Telemetry

What FORGE collects, how anonymization works, the self-improving feedback loop, and how to opt out.

Overview

FORGE collects structural telemetry about normalization runs through the forge/telemetry/ package. This data serves two purposes:

Local observability — Operators can inspect failure patterns, lane performance, and convergence behavior in their own systems.
Community improvement — When upload is enabled, anonymized structural metadata is sent to telemetry.fixpointforge.report to improve normalizers for everyone.

💡

Privacy guarantee: Telemetry never includes raw content. Not in local records, not in uploaded data. Only structural metadata (content type, lane IDs, timing, iteration count, failure patterns) is collected.

What's Collected

Every call to normalize_content() produces a FailureRecord, regardless of whether the normalization succeeded or failed. The record captures:

Category	Fields	Example
Content metadata	content_type, content_length, content_hash	`"TEXT"`, `1250`, `"a3f8..."`
Lane results	lane_id, status, changes_made, duration_ms	`"T2"`, `"REPAIRED"`, `3`, `4.2`
Loop metadata	iterations, converged, oscillation_detected	`3`, `true`, `false`
Timing	total_duration_ms, per_lane_timing	`12.3`, `{"T0": 1.2, "T1": 3.4, ...}`
Trust result	trust_level, passed	`"REPAIRED"`, `true`
Failure details	failure_type, failure_lane, failure_message	`"non_convergence"`, `"T2"`, `"..."`
Environment	forge_version, python_version, os_platform	`"2.0.0"`, `"3.11"`, `"linux"`

🚫

Never collected: Raw content, normalized content, PII, API keys, secrets, user identities, IP addresses, or any data that could identify individuals or organizations.

FailureRecord Schema

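The complete FailureRecord structure, shown as an example record. The flattened field names in the table above (for example content_type and content_length) appear nested here (content.type, content.length):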

JSON Schema

{
  "record_id": "uuid-v4",
  "timestamp": "2026-03-20T10:30:00Z",
  "forge_version": "2.0.0",
  
  "content": {
    "type": "TEXT",
    "length": 1250,
    "hash": "sha256:a3f8c1..."
  },
  
  "loop": {
    "iterations": 3,
    "converged": true,
    "oscillation": false,
    "convergence_threshold": 0.0
  },
  
  "lanes": [
    {
      "id": "T0",
      "status": "PASSED",
      "changes": 2,
      "duration_ms": 1.2,
      "issues": []
    },
    {
      "id": "T1",
      "status": "PASSED",
      "changes": 0,
      "duration_ms": 0.8,
      "issues": []
    },
    {
      "id": "T2",
      "status": "REPAIRED",
      "changes": 3,
      "duration_ms": 4.2,
      "issues": ["injection_detected", "hidden_instruction"]
    },
    {
      "id": "T3",
      "status": "REPAIRED",
      "changes": 1,
      "duration_ms": 2.1,
      "issues": ["pii_email"]
    },
    {
      "id": "T4",
      "status": "PASSED",
      "changes": 0,
      "duration_ms": 0.5,
      "issues": []
    }
  ],
  
  "result": {
    "trust_level": "REPAIRED",
    "passed": true,
    "total_duration_ms": 12.3
  },
  
  "environment": {
    "python_version": "3.11.8",
    "os_platform": "linux",
    "forge_config_hash": "sha256:b2c4..."
  }
}
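To make the nesting concrete, here is a minimal sketch that parses a trimmed copy of the example record with the standard library and summarizes its lanes. The `summarize` helper and its output keys are hypothetical and are not part of FORGE; only the field names come from the record above.

```python
import json

# Trimmed copy of the example record above (only the fields used here).
RECORD_JSON = """
{
  "content": {"type": "TEXT", "length": 1250},
  "loop": {"iterations": 3, "converged": true, "oscillation": false},
  "lanes": [
    {"id": "T0", "status": "PASSED", "changes": 2, "duration_ms": 1.2, "issues": []},
    {"id": "T1", "status": "PASSED", "changes": 0, "duration_ms": 0.8, "issues": []},
    {"id": "T2", "status": "REPAIRED", "changes": 3, "duration_ms": 4.2,
     "issues": ["injection_detected", "hidden_instruction"]},
    {"id": "T3", "status": "REPAIRED", "changes": 1, "duration_ms": 2.1,
     "issues": ["pii_email"]},
    {"id": "T4", "status": "PASSED", "changes": 0, "duration_ms": 0.5, "issues": []}
  ],
  "result": {"trust_level": "REPAIRED", "passed": true, "total_duration_ms": 12.3}
}
"""


def summarize(record: dict) -> dict:
    """Hypothetical helper: condense one record into a few lane-level numbers."""
    lanes = record["lanes"]
    return {
        # Lanes that did something other than pass cleanly.
        "non_passed_lanes": [lane["id"] for lane in lanes if lane["status"] != "PASSED"],
        "total_changes": sum(lane["changes"] for lane in lanes),
        # Rounded to avoid floating-point noise in the sum.
        "lane_ms": round(sum(lane["duration_ms"] for lane in lanes), 3),
    }


summary = summarize(json.loads(RECORD_JSON))
print(summary)
```

The same function works on the dictionaries produced by `to_dict()` in the local export, provided they follow the structure shown above.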

Anonymization

FORGE applies two layers of anonymization to telemetry data:

Layer 1 — Structural Extraction (Always On)

By design, telemetry records only contain structural metadata — not content. The collector in forge/telemetry/collector.py extracts lane results, timing, and status from the ForgeResult without touching the actual content.

Content is hashed (SHA-256), not stored.
Issue descriptions are categorical labels ("pii_email"), not content excerpts.
Lane status is an enum, not a description of what was found.
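As an illustration of "hashed, not stored", the sketch below derives a length and a SHA-256 digest from a piece of content and then discards the content. The `content_fingerprint` function is hypothetical; whether FORGE measures length in characters or bytes, and how it encodes text before hashing, are assumptions here.

```python
import hashlib


def content_fingerprint(content: str) -> dict:
    """Hypothetical helper: keep only structural facts about the content."""
    data = content.encode("utf-8")  # assumption: UTF-8 before hashing
    return {
        "length": len(content),  # assumption: length counted in characters
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
    }


fingerprint = content_fingerprint("hello")
print(fingerprint["length"], fingerprint["hash"][:15] + "...")
```

The digest is one-way, so the record carries no recoverable text, although identical inputs still produce identical hashes. That correlation is what Layer 2 removes.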

Layer 2 — Additional Anonymization (Default On)

When forge.telemetry.anonymize = true (the default), the anonymizer in forge/telemetry/anonymizer.py applies additional stripping:

Content hash is removed (no way to correlate records to specific inputs).
Config hash is removed (no way to infer deployment-specific settings).
Timestamps are bucketed to the hour (in the example below, 10:30:00Z becomes 10:00:00Z), which reduces temporal correlation risk.
Record IDs are regenerated (no way to track sequences of requests).
Issue counts are reported but issue labels are generalized (e.g., "pii_email" → "pii").

Python — accessing anonymized record

from forge.telemetry.collector import TelemetryCollector
from forge.telemetry.anonymizer import anonymize_record

collector = TelemetryCollector()

# Get raw record (structural metadata only)
raw_record = collector.last_record
print(raw_record.content.hash)  # "sha256:a3f8c1..."

# Anonymize for upload
anon_record = anonymize_record(raw_record)
print(anon_record.content.hash)  # None (removed)
print(anon_record.timestamp)     # "2026-03-20T10:00:00Z" (bucketed)
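The Layer 2 rules can also be sketched on plain dictionaries. This is not the code in forge/telemetry/anonymizer.py; the `anonymize` and `generalize_label` functions are hypothetical, and the rule "keep the part before the first underscore" is an assumption inferred from the single "pii_email" → "pii" example.

```python
import copy
import uuid
from datetime import datetime, timezone


def generalize_label(label: str) -> str:
    # Assumption: the general category is the text before the first underscore.
    return label.split("_", 1)[0]


def anonymize(record: dict) -> dict:
    """Hypothetical sketch of the five Layer 2 rules on a dict-shaped record."""
    out = copy.deepcopy(record)  # never mutate the caller's record
    out["content"].pop("hash", None)                   # drop content hash
    out["environment"].pop("forge_config_hash", None)  # drop config hash
    # Bucket the timestamp by truncating to the start of the hour (UTC).
    ts = datetime.fromisoformat(out["timestamp"].replace("Z", "+00:00"))
    ts = ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    out["timestamp"] = ts.strftime("%Y-%m-%dT%H:%M:%SZ")
    out["record_id"] = str(uuid.uuid4())               # fresh, unlinkable ID
    for lane in out["lanes"]:
        # Keep one entry per issue so counts survive; only the labels change.
        lane["issues"] = [generalize_label(issue) for issue in lane["issues"]]
    return out


raw = {
    "record_id": "00000000-0000-4000-8000-000000000000",
    "timestamp": "2026-03-20T10:30:00Z",
    "content": {"type": "TEXT", "length": 1250, "hash": "sha256:a3f8c1..."},
    "lanes": [{"id": "T3", "status": "REPAIRED", "issues": ["pii_email"]}],
    "environment": {"os_platform": "linux", "forge_config_hash": "sha256:b2c4..."},
}
anon = anonymize(raw)
print(anon["timestamp"], anon["lanes"][0]["issues"])
```

Working on a deep copy keeps the raw record available for local inspection while only the stripped copy is handed to the uploader.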

Collection Pipeline

The telemetry collection pipeline is implemented in forge/telemetry/ with three components:

1. Collector (`collector.py`)

Hooks into the fixpoint loop to create FailureRecords. Runs synchronously within the normalization call. Records are buffered in memory until flushed.

2. Anonymizer (`anonymizer.py`)

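Applied to each record before it leaves the process. Strips or generalizes potentially identifying fields (content hash, config hash, record ID, exact timestamp, specific issue labels) per the anonymization rules, while keeping the structural metadata. Always runs when anonymize = true (default).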

3. Uploader (`uploader.py`)

Batches anonymized records and uploads them to telemetry.fixpointforge.report via HTTPS POST. Only active when upload = true (default: false).

normalize_content()
       │
       ▼
  Collector ──▶ FailureRecord (raw, structural only)
       │
       ▼
  Anonymizer ──▶ FailureRecord (anonymized)
       │
       ▼
  Local Buffer ──▶ [batch_size records or flush_interval]
       │
       ▼ (if upload=true)
  Uploader ──▶ HTTPS POST to telemetry.fixpointforge.report
       │
       ▼
  Response: 200 OK (accepted) or 429 (rate limited)

ℹ️

The uploader runs in a background thread to avoid adding latency to normalization calls. If the upload endpoint is unreachable, records are buffered locally and retried on the next flush.
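The buffering behavior described above (flush on batch size or interval, keep records when delivery fails) can be sketched as follows. `RecordBuffer` is a hypothetical class, not FORGE's implementation: it is single-threaded, checks the interval only when a record is added, and takes an injectable `clock` purely to make the timing testable. The defaults of 100 records and 60 seconds mirror the documented upload defaults.

```python
import time


class RecordBuffer:
    """Hold records until batch_size is reached or flush_interval_s has elapsed."""

    def __init__(self, sink, batch_size=100, flush_interval_s=60.0, clock=time.monotonic):
        self._sink = sink                      # callable that delivers one batch
        self._batch_size = batch_size
        self._flush_interval_s = flush_interval_s
        self._clock = clock
        self._records = []
        self._last_flush = clock()

    @property
    def pending(self) -> int:
        return len(self._records)

    def add(self, record) -> None:
        self._records.append(record)
        full = len(self._records) >= self._batch_size
        due = self._clock() - self._last_flush >= self._flush_interval_s
        if full or due:
            self.flush()

    def flush(self) -> None:
        batch, self._records = self._records, []
        self._last_flush = self._clock()
        if not batch:
            return
        try:
            self._sink(batch)
        except Exception:
            # Delivery failed: put the batch back so the next flush retries it.
            self._records = batch + self._records


delivered = []
buffer = RecordBuffer(sink=delivered.append, batch_size=2)
buffer.add({"n": 1})
buffer.add({"n": 2})  # second record fills the batch and triggers a flush
print(len(delivered), buffer.pending)
```

A production uploader would run the flush on a background thread, as the note above says, so that `add` never waits on the network.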

The Self-Improving Loop

Uploaded telemetry powers a feedback loop that improves FORGE's normalizers across releases:

Aggregate — Records from thousands of FORGE instances are aggregated by content type, lane, and failure pattern.
Pattern Detection — Statistical analysis identifies:
- Lanes with high failure rates for specific content types
- Common oscillation patterns between lane pairs
- Content types that consistently require many iterations
- New injection patterns detected by T2
Normalizer Updates — Based on patterns, the FORGE team:
- Adds new regex patterns to T2 for emerging injection techniques
- Adjusts T3 PII detection for new data formats
- Tunes convergence behavior for lane pairs that oscillate
- Adds repair strategies for common structural issues
Release — Updated normalizers ship in the next FORGE release. Users update via pip install --upgrade fixpointforge.
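The aggregation and pattern-detection steps could look roughly like the sketch below, which groups dict-shaped records by content type and lane. The functions `non_pass_rates` and `mean_iterations` are hypothetical; the actual server-side analysis is not described in this document, and treating every status other than "PASSED" as noteworthy is an assumption.

```python
from collections import defaultdict


def non_pass_rates(records):
    """Share of runs, per (content type, lane), where the lane did not simply pass."""
    totals = defaultdict(int)
    non_passed = defaultdict(int)
    for record in records:
        content_type = record["content"]["type"]
        for lane in record["lanes"]:
            key = (content_type, lane["id"])
            totals[key] += 1
            if lane["status"] != "PASSED":
                non_passed[key] += 1
    return {key: non_passed[key] / totals[key] for key in totals}


def mean_iterations(records):
    """Average loop iterations per content type."""
    by_type = defaultdict(list)
    for record in records:
        by_type[record["content"]["type"]].append(record["loop"]["iterations"])
    return {ctype: sum(values) / len(values) for ctype, values in by_type.items()}


sample = [
    {"content": {"type": "TEXT"}, "loop": {"iterations": 3},
     "lanes": [{"id": "T2", "status": "REPAIRED"}, {"id": "T4", "status": "PASSED"}]},
    {"content": {"type": "TEXT"}, "loop": {"iterations": 1},
     "lanes": [{"id": "T2", "status": "PASSED"}, {"id": "T4", "status": "PASSED"}]},
]
rates = non_pass_rates(sample)
iterations = mean_iterations(sample)
print(rates, iterations)
```

Both functions need only fields that survive anonymization, which is why the aggregate can be built without content hashes or record identifiers.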

💡

The self-improving loop is a community benefit. No individual user's data is identifiable in the aggregate. Think of it like browser telemetry that helps improve web standards — structural patterns, not personal content.

Opting Out

FORGE provides granular control over telemetry at every level:

Disable All Telemetry

No records are created. No data is buffered or stored locally.

Environment Variable

export FORGE_TELEMETRY_ENABLED=false

YAML

forge:
  telemetry:
    enabled: false

Python

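from forge.config import ForgeConfig
from forge import normalize_content  # import path assumed; adjust to your installation

content = "example input"

# Disable telemetry for this call only
config = ForgeConfig(telemetry_enabled=False)
result = normalize_content(content, "TEXT", config=config)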

Disable Upload Only

Records are collected locally but never sent to the FORGE cloud. Useful for local debugging and observability.

Environment Variable

export FORGE_TELEMETRY_ENABLED=true
export FORGE_TELEMETRY_UPLOAD=false

YAML

forge:
  telemetry:
    enabled: true
    upload: false

Disable Lane Timing

Collect records but without per-lane timing data (reduces granularity of uploaded data).

export FORGE_TELEMETRY_LANE_TIMING=false
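To show how the three environment variables interact, here is a minimal sketch that resolves them into a settings object. `TelemetrySettings`, `settings_from_env`, and `will_upload` are hypothetical names, and FORGE's real parsing may differ. The upload default of false comes from this page; the defaults for `enabled` and `include_lane_timing`, and the set of accepted truthy strings, are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TelemetrySettings:
    enabled: bool = True              # assumption: default not stated on this page
    upload: bool = False              # documented default
    include_lane_timing: bool = True  # assumption: default not stated on this page


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    # Assumption: these spellings count as true; anything else is false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def settings_from_env(env: Mapping[str, str] = os.environ) -> TelemetrySettings:
    return TelemetrySettings(
        enabled=_flag(env, "FORGE_TELEMETRY_ENABLED", True),
        upload=_flag(env, "FORGE_TELEMETRY_UPLOAD", False),
        include_lane_timing=_flag(env, "FORGE_TELEMETRY_LANE_TIMING", True),
    )


def will_upload(settings: TelemetrySettings) -> bool:
    # Upload only makes sense when collection itself is enabled.
    return settings.enabled and settings.upload


local_only = settings_from_env(
    {"FORGE_TELEMETRY_ENABLED": "true", "FORGE_TELEMETRY_UPLOAD": "false"}
)
print(local_only, will_upload(local_only))
```

Passing the environment as a mapping keeps the resolution logic easy to check without touching the real process environment.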

Maximum Anonymization

Enable all anonymization features for the most privacy-conscious configuration that still uploads records:

YAML — maximum privacy

forge:
  telemetry:
    enabled: true
    upload: true
    anonymize: true            # Strip content hashes, bucket timestamps
    include_lane_timing: false # No per-lane timing
    batch_size: 500            # Larger batches = less temporal correlation
    flush_interval_s: 300      # Less frequent uploads

Accessing Local Telemetry

When telemetry is enabled, you can access records programmatically:

Python

import json

from forge.telemetry.collector import TelemetryCollector

collector = TelemetryCollector()

# Get the most recent record
last = collector.last_record
print(f"Content type: {last.content.type}")
print(f"Trust level: {last.result.trust_level}")
print(f"Iterations: {last.loop.iterations}")
print(f"Duration: {last.result.total_duration_ms}ms")

# Get all buffered records and report the ones that did not pass
for record in collector.buffer:
    if not record.result.passed:
        print(f"FAILURE: {record.content.type} - {record.result.trust_level}")
        for lane in record.lanes:
            if lane.status != "PASSED":
                print(f"  {lane.id}: {lane.status} ({lane.issues})")

# Export to JSON first, while the buffer still holds the records
with open("telemetry_export.json", "w") as f:
    json.dump([r.to_dict() for r in collector.buffer], f, indent=2)

# Flush buffer manually (done last, since flushing presumably empties the buffer)
collector.flush()

Upload Protocol

When upload is enabled, records are sent to the FORGE telemetry endpoint:

Setting	Value
Endpoint	`https://telemetry.fixpointforge.report`
Method	`POST`
Content-Type	`application/json`
Authentication	None required (anonymous submission)
Batch size	Configurable (default: 100 records)
Flush interval	Configurable (default: 60 seconds)
Retry policy	Exponential backoff, max 3 retries
Rate limit	1000 records/minute per source IP

HTTP request (conceptual)

POST https://telemetry.fixpointforge.report/v1/records
Content-Type: application/json

{
  "forge_version": "2.0.0",
  "batch_id": "uuid-v4",
  "records": [
    { ... FailureRecord ... },
    { ... FailureRecord ... }
  ]
}
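The batch envelope above can be assembled with the standard library alone. The sketch below builds the request object without sending it. `build_request` is a hypothetical helper rather than FORGE's uploader, and the `/v1/records` path is taken from the conceptual request shown above.

```python
import json
import urllib.request
import uuid

ENDPOINT = "https://telemetry.fixpointforge.report/v1/records"


def build_request(records, forge_version="2.0.0", endpoint=ENDPOINT):
    """Hypothetical helper: wrap anonymized records in the batch envelope."""
    payload = {
        "forge_version": forge_version,
        "batch_id": str(uuid.uuid4()),  # one fresh ID per batch
        "records": records,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request([{"content": {"type": "TEXT"}}])
print(request.get_method(), request.full_url, len(request.data), "bytes")
```

Nothing is transmitted here; passing the object to `urllib.request.urlopen` would perform the POST.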

The endpoint returns:

200 OK — Records accepted.
429 Too Many Requests — Rate limited. Retry after the Retry-After header value.
500 Internal Server Error — Server issue. Records are re-buffered locally for retry.
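Combining these responses with the retry policy from the table (exponential backoff, at most 3 retries) gives roughly the loop sketched below. `upload_with_retry` is hypothetical; the 1-second base delay, the choice to retry any 5xx status, and the choice to give up on other 4xx statuses are assumptions. The `send` callable stands in for the real HTTP call and returns a `(status, retry_after_seconds)` pair.

```python
import time
from typing import Callable, List, Optional, Tuple

Response = Tuple[int, Optional[float]]  # (HTTP status, Retry-After in seconds or None)


def upload_with_retry(
    send: Callable[[list], Response],
    batch: list,
    max_retries: int = 3,
    base_delay_s: float = 1.0,  # assumption: base delay is not documented
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the batch is accepted, False if the caller should re-buffer it."""
    for attempt in range(max_retries + 1):  # first try plus max_retries retries
        status, retry_after = send(batch)
        if status == 200:
            return True
        retryable = status == 429 or status >= 500
        if not retryable or attempt == max_retries:
            return False
        if status == 429 and retry_after is not None:
            delay = retry_after                     # honor the server's hint
        else:
            delay = base_delay_s * (2 ** attempt)   # 1s, 2s, 4s, ...
        sleep(delay)
    return False


# Scripted responses instead of a real network call.
script: List[Response] = [(500, None), (429, 2.0), (200, None)]
waits: List[float] = []
accepted = upload_with_retry(lambda batch: script.pop(0), [{"n": 1}], sleep=waits.append)
print(accepted, waits)
```

When the function returns False, the batch would go back into the local buffer so that the next flush can try again.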

Frequently Asked Questions

Can you see my content?

No. Telemetry never includes raw content, normalized content, or any text from your data. Only structural metadata (content type, lane IDs, timing, failure categories) is collected.

Can you identify my organization?

No. With anonymization enabled (the default), there are no identifiers that link records to specific users, organizations, or deployments. Content hashes, config hashes, and timestamps are stripped or bucketed.

What if I'm in a regulated industry?

Disable telemetry upload (FORGE_TELEMETRY_UPLOAD=false). You can still use local telemetry for observability. Or disable telemetry entirely (FORGE_TELEMETRY_ENABLED=false).

Does telemetry add latency?

The added latency is negligible. Record creation is synchronous but lightweight (no content processing), and upload runs in a background thread that never blocks normalization calls.

Can I run my own telemetry endpoint?

Yes. Set FORGE_TELEMETRY_ENDPOINT to your own URL. The uploader sends the same JSON payload regardless of the endpoint. Use this for internal observability dashboards.
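For a sense of what a self-hosted endpoint has to do, here is a minimal receiver built on Python's standard `http.server`. It is a sketch for local experiments, not a production service: `TelemetryHandler`, `RECEIVED`, and `start_server` are hypothetical names, records are kept in memory only, and the handler assumes the batch envelope shown under Upload Protocol.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # in-memory store; a real deployment would persist records


class TelemetryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            RECEIVED.extend(payload.get("records", []))
            self.send_response(200)  # records accepted
        except (ValueError, AttributeError):
            self.send_response(400)  # body was not the expected JSON object
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_server(host="127.0.0.1", port=0):
    """Start the receiver on a background thread; port 0 picks a free port."""
    server = HTTPServer((host, port), TelemetryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_server(port=8080)
    print(f"Listening on http://127.0.0.1:{server.server_address[1]}")
    threading.Event().wait()  # block until interrupted
```

With the receiver running, pointing FORGE_TELEMETRY_ENDPOINT at its URL would route batches to it instead of the public endpoint.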
