v2.0.0 Documentation fixpointforge.dev

Telemetry

What FORGE collects, how anonymization works, the self-improving feedback loop, and how to opt out.


Overview

FORGE collects structural telemetry about normalization runs through the forge/telemetry/ package. This data serves two purposes:

  1. Local observability — Operators can inspect failure patterns, lane performance, and convergence behavior in their own systems.
  2. Community improvement — When upload is enabled, anonymized structural metadata is sent to telemetry.fixpointforge.report to improve normalizers for everyone.
💡

Privacy guarantee: Telemetry never includes raw content. Not in local records, not in uploaded data. Only structural metadata (content type, lane IDs, timing, iteration count, failure patterns) is collected.


What's Collected

When telemetry is enabled, every call to normalize_content() produces a FailureRecord, whether the normalization succeeded or failed. The record captures:

Category         | Fields                                      | Example
Content metadata | content_type, content_length, content_hash  | "TEXT", 1250, "a3f8..."
Lane results     | lane_id, status, changes_made, duration_ms  | "T2", "REPAIRED", 3, 4.2
Loop metadata    | iterations, converged, oscillation_detected | 3, true, false
Timing           | total_duration_ms, per_lane_timing          | 12.3, {"T0": 1.2, "T1": 3.4, ...}
Trust result     | trust_level, passed                         | "REPAIRED", true
Failure details  | failure_type, failure_lane, failure_message | "non_convergence", "T2", "..."
Environment      | forge_version, python_version, os_platform  | "2.0.0", "3.11", "linux"
🚫

Never collected: Raw content, normalized content, PII, API keys, secrets, user identities, IP addresses, or any data that could identify individuals or organizations.


FailureRecord Schema

The complete FailureRecord structure, shown as an example record:

JSON Schema
{
  "record_id": "uuid-v4",
  "timestamp": "2026-03-20T10:30:00Z",
  "forge_version": "2.0.0",
  
  "content": {
    "type": "TEXT",
    "length": 1250,
    "hash": "sha256:a3f8c1..."
  },
  
  "loop": {
    "iterations": 3,
    "converged": true,
    "oscillation": false,
    "convergence_threshold": 0.0
  },
  
  "lanes": [
    {
      "id": "T0",
      "status": "PASSED",
      "changes": 2,
      "duration_ms": 1.2,
      "issues": []
    },
    {
      "id": "T1",
      "status": "PASSED",
      "changes": 0,
      "duration_ms": 0.8,
      "issues": []
    },
    {
      "id": "T2",
      "status": "REPAIRED",
      "changes": 3,
      "duration_ms": 4.2,
      "issues": ["injection_detected", "hidden_instruction"]
    },
    {
      "id": "T3",
      "status": "REPAIRED",
      "changes": 1,
      "duration_ms": 2.1,
      "issues": ["pii_email"]
    },
    {
      "id": "T4",
      "status": "PASSED",
      "changes": 0,
      "duration_ms": 0.5,
      "issues": []
    }
  ],
  
  "result": {
    "trust_level": "REPAIRED",
    "passed": true,
    "total_duration_ms": 12.3
  },
  
  "environment": {
    "python_version": "3.11.8",
    "os_platform": "linux",
    "forge_config_hash": "sha256:b2c4..."
  }
}
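One way to audit exported records against this structure is an allowlist check. The sketch below is not part of FORGE: `ALLOWED_KEYS` and `unexpected_keys` are hypothetical names, and the allowlist is copied by hand from the example record above, so it would need updating if the real schema differs.

```python
# Hypothetical allowlist derived from the example record; not a FORGE API.
ALLOWED_KEYS = {
    "": {"record_id", "timestamp", "forge_version",
         "content", "loop", "lanes", "result", "environment"},
    "content": {"type", "length", "hash"},
    "loop": {"iterations", "converged", "oscillation", "convergence_threshold"},
    "lanes[]": {"id", "status", "changes", "duration_ms", "issues"},
    "result": {"trust_level", "passed", "total_duration_ms"},
    "environment": {"python_version", "os_platform", "forge_config_hash"},
}


def unexpected_keys(record: dict) -> list:
    """Return the paths of keys that are not in the documented structure."""
    problems = []
    # Top-level keys.
    for key in record:
        if key not in ALLOWED_KEYS[""]:
            problems.append(key)
    # Keys inside the nested object sections.
    for section in ("content", "loop", "result", "environment"):
        for key in record.get(section, {}):
            if key not in ALLOWED_KEYS[section]:
                problems.append(f"{section}.{key}")
    # Keys inside each lane entry.
    for index, lane in enumerate(record.get("lanes", [])):
        for key in lane:
            if key not in ALLOWED_KEYS["lanes[]"]:
                problems.append(f"lanes[{index}].{key}")
    return problems
```

An empty result means the record carries only the documented structural fields; any returned path (for example `content.text`) points at a field that should not be there.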

Anonymization

FORGE applies two layers of anonymization to telemetry data:

Layer 1 — Structural Extraction (Always On)

By design, telemetry records contain only structural metadata, never content. The collector in forge/telemetry/collector.py extracts lane results, timing, and status from the ForgeResult without touching the actual content.

  • Content is hashed (SHA-256), not stored.
  • Issue descriptions are categorical labels ("pii_email"), not content excerpts.
  • Lane status is an enum, not a description of what was found.
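The first bullet can be illustrated with the standard library. This is a minimal sketch, not the collector's actual code: `describe_content` is a hypothetical helper, and measuring length in characters and hashing the UTF-8 encoding are both assumptions.

```python
import hashlib


def describe_content(content: str, content_type: str) -> dict:
    """Build the content section of a record without keeping the text itself."""
    encoded = content.encode("utf-8")  # assumption: hash the UTF-8 bytes
    return {
        "type": content_type,
        "length": len(content),  # assumption: length in characters
        "hash": "sha256:" + hashlib.sha256(encoded).hexdigest(),
    }


summary = describe_content("Contact me at alice@example.com", "TEXT")
print(summary["type"], summary["length"])
```

The returned dictionary holds a type, a length, and a digest; the input string itself is never copied into it.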

Layer 2 — Additional Anonymization (Default On)

When forge.telemetry.anonymize = true (the default), the anonymizer in forge/telemetry/anonymizer.py applies additional stripping:

  • Content hash is removed (no way to correlate records to specific inputs).
  • Config hash is removed (no way to infer deployment-specific settings).
  • Timestamps are bucketed to the hour, for example 10:30:00Z becomes 10:00:00Z (reduces temporal correlation risk).
  • Record IDs are regenerated (no way to track sequences of requests).
  • Issue counts are reported but issue labels are generalized (e.g., "pii_email" → "pii").
Python — accessing anonymized record
from forge.telemetry.collector import TelemetryCollector
from forge.telemetry.anonymizer import anonymize_record

collector = TelemetryCollector()

# Get raw record (structural metadata only).
# Assumes at least one normalize_content() call has already been recorded.
raw_record = collector.last_record
print(raw_record.content.hash)  # "sha256:a3f8c1..."

# Anonymize for upload
anon_record = anonymize_record(raw_record)
print(anon_record.content.hash)  # None (removed)
print(anon_record.timestamp)     # "2026-03-20T10:00:00Z" (bucketed)
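To make the five Layer 2 rules concrete, here is a standalone sketch that applies them to a plain dictionary shaped like the example record. It is not the code in forge/telemetry/anonymizer.py: the function name `anonymize` is hypothetical, truncating to the start of the hour is an assumption based on the 10:30 to 10:00 example, and generalizing a label to the text before its first underscore is an assumption based on the single "pii_email" to "pii" example.

```python
import copy
import uuid
from datetime import datetime, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def anonymize(record: dict) -> dict:
    """Apply the Layer 2 rules to a copy of a record dictionary."""
    anon = copy.deepcopy(record)  # leave the caller's record untouched

    # Rules 1 and 2: drop the content hash and the config hash.
    anon["content"]["hash"] = None
    anon["environment"]["forge_config_hash"] = None

    # Rule 3: bucket the timestamp to the hour (assumption: truncate).
    parsed = datetime.strptime(record["timestamp"], TIMESTAMP_FORMAT)
    bucketed = parsed.replace(minute=0, second=0, tzinfo=timezone.utc)
    anon["timestamp"] = bucketed.strftime(TIMESTAMP_FORMAT)

    # Rule 4: regenerate the record ID.
    anon["record_id"] = str(uuid.uuid4())

    # Rule 5: keep the number of issues, generalize each label
    # (assumption: the general label is the prefix before the first "_").
    for lane in anon["lanes"]:
        lane["issues"] = [label.split("_", 1)[0] for label in lane["issues"]]
    return anon
```

Working on a deep copy keeps the raw record available for local inspection while the anonymized copy is the one handed to an uploader.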

Collection Pipeline

The telemetry collection pipeline is implemented in forge/telemetry/ with three components:

1. Collector (collector.py)

Hooks into the fixpoint loop to create FailureRecords. Runs synchronously within the normalization call. Records are buffered in memory until flushed.

2. Anonymizer (anonymizer.py)

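Applied to each record before it leaves the process. Strips or generalizes the identifying fields (content hash, config hash, exact timestamps, record IDs, specific issue labels) per the anonymization rules. Runs whenever anonymize = true (the default).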

3. Uploader (uploader.py)

Batches anonymized records and uploads them to telemetry.fixpointforge.report via HTTPS POST. Only active when upload = true (default: false).

normalize_content()
       │
       ▼
  Collector ──▶ FailureRecord (raw, structural only)
       │
       ▼
  Anonymizer ──▶ FailureRecord (anonymized)
       │
       ▼
  Local Buffer ──▶ [batch_size records or flush_interval]
       │
       ▼ (if upload=true)
  Uploader ──▶ HTTPS POST to telemetry.fixpointforge.report
       │
       ▼
  Response: 200 OK (accepted) or 429 (rate limited)
ℹ️

The uploader runs in a background thread to avoid adding latency to normalization calls. If the upload endpoint is unreachable, records are buffered locally and retried on the next flush.
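The background-thread behavior can be sketched with a queue and a worker thread. This is an illustration under stated assumptions, not the code in uploader.py: `BackgroundUploader` and the `send` callable are hypothetical, an unreachable endpoint is modeled as `send` raising `OSError`, and the time-based flush_interval trigger is left out to keep the example short.

```python
import queue
import threading


class BackgroundUploader:
    """Batch records on a worker thread so submit() never blocks on the network."""

    def __init__(self, send, batch_size=100):
        self._send = send            # hypothetical callable that uploads one batch
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._pending = []           # records waiting for the next flush
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, record):
        """Hand a record to the worker and return immediately."""
        self._queue.put(record)

    def close(self):
        """Flush what is left and stop the worker."""
        self._queue.put(None)        # sentinel: no more records
        self._thread.join()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:
                self._flush()
                return
            self._pending.append(item)
            if len(self._pending) >= self._batch_size:
                self._flush()

    def _flush(self):
        if not self._pending:
            return
        try:
            self._send(list(self._pending))
        except OSError:
            return                   # endpoint unreachable: keep records for the next flush
        self._pending.clear()
```

Because only the worker thread touches `_pending`, no lock is needed; the queue is the single hand-off point between the normalization call and the upload path.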


The Self-Improving Loop

Uploaded telemetry powers a feedback loop that improves FORGE's normalizers across releases:

  1. Aggregate — Records from thousands of FORGE instances are aggregated by content type, lane, and failure pattern.
  2. Pattern Detection — Statistical analysis identifies:
    • Lanes with high failure rates for specific content types
    • Common oscillation patterns between lane pairs
    • Content types that consistently require many iterations
    • New injection patterns detected by T2
  3. Normalizer Updates — Based on patterns, the FORGE team:
    • Adds new regex patterns to T2 for emerging injection techniques
    • Adjusts T3 PII detection for new data formats
    • Tunes convergence behavior for lane pairs that oscillate
    • Adds repair strategies for common structural issues
  4. Release — Updated normalizers ship in the next FORGE release. Users update via pip install --upgrade fixpointforge.
💡

The self-improving loop is a community benefit. No individual user's data is identifiable in the aggregate. Think of it like browser telemetry that helps improve web standards — structural patterns, not personal content.
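The aggregation step can be pictured as a simple group-and-count over record dictionaries. The sketch below is an assumption about the general shape of such analysis, not the FORGE team's pipeline: `non_passed_rates` is a hypothetical name, and any lane status other than "PASSED" is treated as noteworthy because the full status enum is not listed here.

```python
from collections import Counter


def non_passed_rates(records):
    """Return {(content_type, lane_id): share of runs where the lane did not pass}."""
    totals = Counter()
    non_passed = Counter()
    for record in records:
        content_type = record["content"]["type"]
        for lane in record["lanes"]:
            key = (content_type, lane["id"])
            totals[key] += 1
            if lane["status"] != "PASSED":
                non_passed[key] += 1
    return {key: non_passed[key] / totals[key] for key in totals}
```

The same function works on a local export of buffered records, which makes it usable for local observability as well.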


Opting Out

FORGE lets you disable telemetry entirely, keep it local by disabling upload, or reduce the detail of what is collected:

Disable All Telemetry

No records are created. No data is buffered or stored locally.

Environment Variable
export FORGE_TELEMETRY_ENABLED=false
YAML
forge:
  telemetry:
    enabled: false
Python
from forge.config import ForgeConfig

# normalize_content and content come from your own code; their import is omitted here
config = ForgeConfig(telemetry_enabled=False)
result = normalize_content(content, "TEXT", config=config)

Disable Upload Only

Records are collected locally but never sent to the FORGE cloud. Useful for local debugging and observability.

Environment Variable
export FORGE_TELEMETRY_ENABLED=true
export FORGE_TELEMETRY_UPLOAD=false
YAML
forge:
  telemetry:
    enabled: true
    upload: false

Disable Lane Timing

Collect records but without per-lane timing data (reduces granularity of uploaded data).

Environment Variable
export FORGE_TELEMETRY_LANE_TIMING=false
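The three environment variables above interact: upload only makes sense when collection is enabled. The sketch below shows one way such flags could be resolved. It is not FORGE's configuration loader; `TelemetrySettings` and `settings_from_env` are hypothetical, the accepted "false" spellings are an assumption, and the defaults for enabled and lane timing are assumed to be true (only the upload default of false is documented).

```python
import os
from dataclasses import dataclass

_FALSE_VALUES = {"0", "false", "no", "off"}  # assumption: accepted spellings


def _flag(env, name, default):
    """Read a boolean flag from a mapping, falling back to a default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() not in _FALSE_VALUES


@dataclass(frozen=True)
class TelemetrySettings:
    enabled: bool
    upload: bool
    lane_timing: bool


def settings_from_env(env=None):
    """Resolve telemetry flags; upload is forced off when telemetry is disabled."""
    env = os.environ if env is None else env
    enabled = _flag(env, "FORGE_TELEMETRY_ENABLED", True)  # default assumed
    return TelemetrySettings(
        enabled=enabled,
        upload=enabled and _flag(env, "FORGE_TELEMETRY_UPLOAD", False),
        lane_timing=_flag(env, "FORGE_TELEMETRY_LANE_TIMING", True),  # default assumed
    )
```

Passing a plain dictionary instead of `os.environ` makes the resolution easy to test without touching the real environment.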

Maximum Anonymization

Combine the anonymization options for the most privacy-conscious configuration that still uploads data:

YAML — maximum privacy
forge:
  telemetry:
    enabled: true
    upload: true
    anonymize: true            # Strip content hashes, bucket timestamps
    include_lane_timing: false # No per-lane timing
    batch_size: 500            # Larger batches = less temporal correlation
    flush_interval_s: 300      # Less frequent uploads

Accessing Local Telemetry

When telemetry is enabled, you can access records programmatically:

Python
from forge.telemetry.collector import TelemetryCollector

collector = TelemetryCollector()

# Get the most recent record
last = collector.last_record
print(f"Content type: {last.content.type}")
print(f"Trust level: {last.result.trust_level}")
print(f"Iterations: {last.loop.iterations}")
print(f"Duration: {last.result.total_duration_ms}ms")

# Get all buffered records
for record in collector.buffer:
    if not record.result.passed:
        print(f"FAILURE: {record.content.type} — {record.result.trust_level}")
        for lane in record.lanes:
            if lane.status != "PASSED":
                print(f"  {lane.id}: {lane.status} ({lane.issues})")

# Export to JSON first: flushing is expected to empty the buffer
import json
with open("telemetry_export.json", "w") as f:
    json.dump([r.to_dict() for r in collector.buffer], f, indent=2)

# Flush buffer manually
collector.flush()

Upload Protocol

When upload is enabled, records are sent to the FORGE telemetry endpoint:

Setting        | Value
Endpoint       | https://telemetry.fixpointforge.report
Method         | POST
Content-Type   | application/json
Authentication | None required (anonymous submission)
Batch size     | Configurable (default: 100 records)
Flush interval | Configurable (default: 60 seconds)
Retry policy   | Exponential backoff, max 3 retries
Rate limit     | 1000 records/minute per source IP
HTTP request (conceptual)
POST https://telemetry.fixpointforge.report/v1/records
Content-Type: application/json

{
  "forge_version": "2.0.0",
  "batch_id": "uuid-v4",
  "records": [
    { ... FailureRecord ... },
    { ... FailureRecord ... }
  ]
}

The endpoint returns:

  • 200 OK — Records accepted.
  • 429 Too Many Requests — Rate limited. Retry after the Retry-After header value.
  • 500 Internal Server Error — Server issue. Records are re-buffered locally for retry.
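The retry policy and the three response codes can be combined into one loop. The following is a sketch under assumptions, not the uploader's implementation: `upload_with_retry` and the `send` callable (returning a status code and a headers dictionary) are hypothetical, the one-second base delay is invented for illustration, and Retry-After is assumed to be a number of seconds rather than an HTTP date.

```python
import time


def upload_with_retry(send, batch, max_retries=3, base_delay_s=1.0, sleep=time.sleep):
    """Try to upload a batch; return True if accepted, False if retries ran out."""
    for attempt in range(max_retries + 1):
        status, headers = send(batch)
        if status == 200:
            return True              # records accepted
        if attempt == max_retries:
            break                    # no retries left
        if status == 429 and "Retry-After" in headers:
            # Rate limited: wait as long as the server asks (assumed seconds).
            delay = float(headers["Retry-After"])
        else:
            # Other failures: exponential backoff (1s, 2s, 4s with the default base).
            delay = base_delay_s * (2 ** attempt)
        sleep(delay)
    return False                     # caller should re-buffer the batch locally
```

Injecting `sleep` keeps the function testable without real waiting; a `False` return corresponds to the case where records are re-buffered locally for a later flush.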

Frequently Asked Questions

Can you see my content?

No. Telemetry never includes raw content, normalized content, or any text from your data. Only structural metadata (content type, lane IDs, timing, failure categories) is collected.

Can you identify my organization?

No. With anonymization enabled (the default), there are no identifiers that link records to specific users, organizations, or deployments. Content hashes, config hashes, and timestamps are stripped or bucketed.

What if I'm in a regulated industry?

Disable telemetry upload (FORGE_TELEMETRY_UPLOAD=false). You can still use local telemetry for observability. Or disable telemetry entirely (FORGE_TELEMETRY_ENABLED=false).

Does telemetry add latency?

Negligible. Record creation is synchronous but lightweight (no content processing). Upload runs in a background thread and never blocks normalization calls.

Can I run my own telemetry endpoint?

Yes. Set FORGE_TELEMETRY_ENDPOINT to your own URL. The uploader sends the same JSON payload regardless of the endpoint. Use this for internal observability dashboards.
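For experimenting with a self-hosted endpoint, a receiver can be built from the standard library alone. This is a minimal sketch, not a production service: `TelemetryHandler`, `RECEIVED`, and `make_server` are hypothetical names, and it assumes the uploader posts the batch payload shown under Upload Protocol to a /v1/records path on the configured URL.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # in-memory store, for illustration only


class TelemetryHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/records":  # assumption: same path as the hosted endpoint
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            records = payload["records"]
        except (ValueError, KeyError, TypeError):
            self.send_error(400)        # malformed JSON or missing "records"
            return
        RECEIVED.extend(records)
        self.send_response(200)         # 200 OK: records accepted
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass                            # keep the example quiet


def make_server(port=0):
    """Create a server on localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), TelemetryHandler)


if __name__ == "__main__":
    make_server(8080).serve_forever()
```

A real deployment would add TLS, persistence, and rate limiting; the point here is only the request and response shape.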