Telemetry
What FORGE collects, how anonymization works, the self-improving feedback loop, and how to opt out.
Overview
FORGE collects structural telemetry about normalization runs through the forge/telemetry/ package.
This data serves two purposes:
- Local observability: Operators can inspect failure patterns, lane performance, and convergence behavior in their own systems.
- Community improvement: When upload is enabled, anonymized structural metadata is sent to telemetry.fixpointforge.report to improve normalizers for everyone.
Privacy guarantee: Telemetry never includes raw content. Not in local records, not in uploaded data. Only structural metadata (content type, lane IDs, timing, iteration count, failure patterns) is collected.
What's Collected
Every call to normalize_content() produces a FailureRecord, regardless of whether the normalization succeeded or failed. The record captures:
| Category | Fields | Example |
|---|---|---|
| Content metadata | content_type, content_length, content_hash | "TEXT", 1250, "a3f8..." |
| Lane results | lane_id, status, changes_made, duration_ms | "T2", "REPAIRED", 3, 4.2 |
| Loop metadata | iterations, converged, oscillation_detected | 3, true, false |
| Timing | total_duration_ms, per_lane_timing | 12.3, {"T0": 1.2, "T1": 3.4, ...} |
| Trust result | trust_level, passed | "REPAIRED", true |
| Failure details | failure_type, failure_lane, failure_message | "non_convergence", "T2", "..." |
| Environment | forge_version, python_version, os_platform | "2.0.0", "3.11", "linux" |
Never collected: Raw content, normalized content, PII, API keys, secrets, user identities, IP addresses, or any data that could identify individuals or organizations.
FailureRecord Schema
The complete FailureRecord structure:
JSON Schema
{
  "record_id": "uuid-v4",
  "timestamp": "2026-03-20T10:30:00Z",
  "forge_version": "2.0.0",
  "content": {
    "type": "TEXT",
    "length": 1250,
    "hash": "sha256:a3f8c1..."
  },
  "loop": {
    "iterations": 3,
    "converged": true,
    "oscillation": false,
    "convergence_threshold": 0.0
  },
  "lanes": [
    {
      "id": "T0",
      "status": "PASSED",
      "changes": 2,
      "duration_ms": 1.2,
      "issues": []
    },
    {
      "id": "T1",
      "status": "PASSED",
      "changes": 0,
      "duration_ms": 0.8,
      "issues": []
    },
    {
      "id": "T2",
      "status": "REPAIRED",
      "changes": 3,
      "duration_ms": 4.2,
      "issues": ["injection_detected", "hidden_instruction"]
    },
    {
      "id": "T3",
      "status": "REPAIRED",
      "changes": 1,
      "duration_ms": 2.1,
      "issues": ["pii_email"]
    },
    {
      "id": "T4",
      "status": "PASSED",
      "changes": 0,
      "duration_ms": 0.5,
      "issues": []
    }
  ],
  "result": {
    "trust_level": "REPAIRED",
    "passed": true,
    "total_duration_ms": 12.3
  },
  "environment": {
    "python_version": "3.11.8",
    "os_platform": "linux",
    "forge_config_hash": "sha256:b2c4..."
  }
}
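To make the shape of the record easier to reason about, here is a minimal sketch that mirrors the JSON above as Python dataclasses. The class names (`RecordSketch`, `ContentMeta`, and so on) are hypothetical stand-ins, not the classes FORGE ships, and the `environment` block is omitted for brevity.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ContentMeta:
    type: str
    length: int
    hash: Optional[str] = None  # present locally, None once anonymized


@dataclass
class LoopMeta:
    iterations: int
    converged: bool
    oscillation: bool
    convergence_threshold: float = 0.0


@dataclass
class LaneResult:
    id: str
    status: str
    changes: int
    duration_ms: float
    issues: List[str] = field(default_factory=list)  # categorical labels only


@dataclass
class ResultMeta:
    trust_level: str
    passed: bool
    total_duration_ms: float


@dataclass
class RecordSketch:
    """Hypothetical stand-in for FailureRecord (assumption, not FORGE's class)."""
    record_id: str
    timestamp: str
    forge_version: str
    content: ContentMeta
    loop: LoopMeta
    lanes: List[LaneResult]
    result: ResultMeta

    def to_dict(self) -> dict:
        # asdict() recurses into nested dataclasses and lists.
        return asdict(self)


record = RecordSketch(
    record_id="example-id",
    timestamp="2026-03-20T10:30:00Z",
    forge_version="2.0.0",
    content=ContentMeta(type="TEXT", length=1250, hash="sha256:example"),
    loop=LoopMeta(iterations=3, converged=True, oscillation=False),
    lanes=[LaneResult("T2", "REPAIRED", 3, 4.2, ["injection_detected"])],
    result=ResultMeta(trust_level="REPAIRED", passed=True, total_duration_ms=12.3),
)
repaired = [lane.id for lane in record.lanes if lane.status == "REPAIRED"]
```

Note that no field in the sketch can hold the content itself; every value is a label, a count, a duration, or a digest.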
Anonymization
FORGE applies two layers of anonymization to telemetry data:
Layer 1: Structural Extraction (Always On)
By design, telemetry records contain only structural metadata, not content. The collector in forge/telemetry/collector.py extracts lane results, timing, and status from the ForgeResult without touching the actual content.
- Content is hashed (SHA-256), not stored.
- Issue descriptions are categorical labels ("pii_email"), not content excerpts.
- Lane status is an enum, not a description of what was found.
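As an illustration of the first point, the sketch below derives the content fields of a record from an input string using Python's standard `hashlib`. The function name `content_fingerprint`, the UTF-8 encoding, and measuring length in characters are assumptions; the collector's actual choices are not documented here.

```python
import hashlib


def content_fingerprint(content: str, content_type: str = "TEXT") -> dict:
    """Return structural metadata for `content` without keeping the text itself."""
    # Assumption: the digest is taken over the UTF-8 encoding of the text.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return {
        "type": content_type,
        "length": len(content),      # assumption: length in characters
        "hash": f"sha256:{digest}",  # same prefix style as the schema above
    }


fingerprint = content_fingerprint("hello")
```

The digest lets an operator tell whether two local records came from the same input, but the input cannot be read back out of it.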
Layer 2: Additional Anonymization (Default On)
When forge.telemetry.anonymize = true (the default), the anonymizer in forge/telemetry/anonymizer.py applies additional stripping:
- Content hash is removed (no way to correlate records to specific inputs).
- Config hash is removed (no way to infer deployment-specific settings).
- Timestamps are bucketed to the nearest hour (reduces temporal correlation risk).
- Record IDs are regenerated (no way to track sequences of requests).
- Issue counts are reported but issue labels are generalized (e.g., "pii_email" → "pii").
from forge.telemetry.collector import TelemetryCollector
from forge.telemetry.anonymizer import anonymize_record
collector = TelemetryCollector()
# Get raw record (structural metadata only)
raw_record = collector.last_record
print(raw_record.content.hash) # "sha256:a3f8c1..."
# Anonymize for upload
anon_record = anonymize_record(raw_record)
print(anon_record.content.hash) # None (removed)
print(anon_record.timestamp) # "2026-03-20T10:00:00Z" (bucketed)
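The Layer 2 rules can also be sketched against a plain dictionary shaped like the JSON schema. This is an illustration only, not the code in forge/telemetry/anonymizer.py: the function names are hypothetical, timestamps are truncated to the start of the hour to match the 10:30 to 10:00 example, and labels are generalized by keeping the text before the first underscore, which is a guess at the rule.

```python
import copy
import uuid
from datetime import datetime, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def bucket_to_hour(timestamp: str) -> str:
    """Truncate an ISO-8601 UTC timestamp to the start of its hour."""
    parsed = datetime.strptime(timestamp, TS_FORMAT).replace(tzinfo=timezone.utc)
    return parsed.replace(minute=0, second=0, microsecond=0).strftime(TS_FORMAT)


def generalize_label(label: str) -> str:
    """Assumed rule: keep only the category before the first underscore."""
    return label.split("_", 1)[0]


def anonymize_dict(record: dict) -> dict:
    """Return an anonymized copy of `record`; the input is left untouched."""
    anon = copy.deepcopy(record)
    anon["record_id"] = str(uuid.uuid4())                      # regenerate ID
    anon["timestamp"] = bucket_to_hour(anon["timestamp"])      # bucket time
    anon.get("content", {}).pop("hash", None)                  # drop content hash
    anon.get("environment", {}).pop("forge_config_hash", None)  # drop config hash
    for lane in anon.get("lanes", []):
        # One generalized label per original label, so issue counts survive.
        lane["issues"] = [generalize_label(issue) for issue in lane["issues"]]
    return anon


raw = {
    "record_id": "original-id",
    "timestamp": "2026-03-20T10:30:00Z",
    "content": {"type": "TEXT", "length": 1250, "hash": "sha256:example"},
    "lanes": [{"id": "T3", "status": "REPAIRED", "issues": ["pii_email"]}],
    "environment": {"os_platform": "linux", "forge_config_hash": "sha256:example"},
}
anon = anonymize_dict(raw)
```

Working on a deep copy keeps the raw record available for local inspection while only the anonymized copy is handed to the uploader.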
Collection Pipeline
The telemetry collection pipeline is implemented in forge/telemetry/ with three components:
1. Collector (collector.py)
Hooks into the fixpoint loop to create FailureRecords. Runs synchronously within the normalization call. Records are buffered in memory until flushed.
2. Anonymizer (anonymizer.py)
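Applied to each record before it leaves the process. Strips or generalizes the fields that could correlate records (content hash, config hash, record ID, exact timestamp, specific issue labels) per the anonymization rules above.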
Always runs when anonymize = true (default).
3. Uploader (uploader.py)
Batches anonymized records and uploads them to telemetry.fixpointforge.report via HTTPS POST.
Only active when upload = true (default: false).
normalize_content()
│
▼
Collector ──▶ FailureRecord (raw, structural only)
│
▼
Anonymizer ──▶ FailureRecord (anonymized)
│
▼
Local Buffer ──▶ [batch_size records or flush_interval]
│
▼ (if upload=true)
Uploader ──▶ HTTPS POST to telemetry.fixpointforge.report
│
▼
Response: 200 OK (accepted) or 429 (rate limited)
The uploader runs in a background thread to avoid adding latency to normalization calls. If the upload endpoint is unreachable, records are buffered locally and retried on the next flush.
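The buffering behavior described above (flush when the batch is full or the interval has elapsed, and keep records when a send fails) might look roughly like the following. `BatchBuffer` and its `send` callback are hypothetical names, and the sketch is single-threaded for clarity; the real uploader runs in a background thread.

```python
import time
from typing import Callable, List


class BatchBuffer:
    """Hypothetical buffer: flushes on size or age, re-buffers on failure."""

    def __init__(self, send: Callable[[List[dict]], bool], batch_size: int = 100,
                 flush_interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._batch_size = batch_size
        self._flush_interval_s = flush_interval_s
        self._clock = clock
        self._records: List[dict] = []
        self._last_flush = clock()

    def __len__(self) -> int:
        return len(self._records)

    def add(self, record: dict) -> None:
        self._records.append(record)
        full = len(self._records) >= self._batch_size
        stale = self._clock() - self._last_flush >= self._flush_interval_s
        if full or stale:
            self.flush()

    def flush(self) -> bool:
        self._last_flush = self._clock()
        if not self._records:
            return True
        batch, self._records = self._records, []
        try:
            ok = self._send(batch)
        except OSError:  # e.g. endpoint unreachable
            ok = False
        if not ok:
            # Keep the failed batch so it is retried on the next flush.
            self._records = batch + self._records
        return ok


# Demo with a fake transport that fails once, then succeeds.
delivered: List[dict] = []
outcomes = [False, True]


def fake_send(batch: List[dict]) -> bool:
    ok = outcomes.pop(0)
    if ok:
        delivered.extend(batch)
    return ok


buffer = BatchBuffer(fake_send, batch_size=2, flush_interval_s=3600.0)
buffer.add({"n": 1})
buffer.add({"n": 2})          # batch full: flush attempted, send fails
pending_after_failure = len(buffer)
buffer.add({"n": 3})          # next flush succeeds and drains everything
```

Passing the transport in as a callback keeps the buffering logic testable without any network access.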
The Self-Improving Loop
Uploaded telemetry powers a feedback loop that improves FORGE's normalizers across releases:
- Aggregate: Records from thousands of FORGE instances are aggregated by content type, lane, and failure pattern.
- Pattern Detection: Statistical analysis identifies:
  - Lanes with high failure rates for specific content types
  - Common oscillation patterns between lane pairs
  - Content types that consistently require many iterations
  - New injection patterns detected by T2
- Normalizer Updates: Based on patterns, the FORGE team:
  - Adds new regex patterns to T2 for emerging injection techniques
  - Adjusts T3 PII detection for new data formats
  - Tunes convergence behavior for lane pairs that oscillate
  - Adds repair strategies for common structural issues
- Release: Updated normalizers ship in the next FORGE release. Users update via pip install --upgrade fixpointforge.
The self-improving loop is a community benefit. No individual user's data is identifiable in the aggregate. Think of it like browser telemetry that helps improve web standards: structural patterns, not personal content.
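The aggregation and pattern-detection steps can be pictured with a small sketch over records shaped like the JSON schema. The function names are hypothetical and the statistics are deliberately simple; how the FORGE team actually analyzes uploads is not described in this document. The same functions work equally well on local records.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def lane_non_pass_rates(records: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Share of runs, per (content type, lane), whose lane status was not PASSED."""
    totals: Counter = Counter()
    non_pass: Counter = Counter()
    for record in records:
        content_type = record["content"]["type"]
        for lane in record["lanes"]:
            key = (content_type, lane["id"])
            totals[key] += 1
            if lane["status"] != "PASSED":
                non_pass[key] += 1
    return {key: non_pass[key] / totals[key] for key in totals}


def mean_iterations(records: Iterable[dict]) -> Dict[str, float]:
    """Average fixpoint iterations per content type."""
    counts = defaultdict(list)
    for record in records:
        counts[record["content"]["type"]].append(record["loop"]["iterations"])
    return {ctype: sum(values) / len(values) for ctype, values in counts.items()}


sample = [
    {"content": {"type": "TEXT"}, "loop": {"iterations": 3},
     "lanes": [{"id": "T2", "status": "REPAIRED"}, {"id": "T4", "status": "PASSED"}]},
    {"content": {"type": "TEXT"}, "loop": {"iterations": 1},
     "lanes": [{"id": "T2", "status": "PASSED"}, {"id": "T4", "status": "PASSED"}]},
]
rates = lane_non_pass_rates(sample)
iterations = mean_iterations(sample)
```

A lane with a persistently high non-pass rate for one content type is the kind of signal that would prompt a normalizer update.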
Opting Out
FORGE provides granular control over telemetry at every level:
Disable All Telemetry
No records are created. No data is buffered or stored locally.
Environment Variable
export FORGE_TELEMETRY_ENABLED=false
YAML
forge:
  telemetry:
    enabled: false
Python
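from forge import normalize_content  # assumed import path; adjust to your installation
from forge.config import ForgeConfig

content = "example input"  # placeholder input for illustration
config = ForgeConfig(telemetry_enabled=False)
result = normalize_content(content, "TEXT", config=config)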
Disable Upload Only
Records are collected locally but never sent to the FORGE cloud. Useful for local debugging and observability.
Environment Variable
export FORGE_TELEMETRY_ENABLED=true
export FORGE_TELEMETRY_UPLOAD=false
YAML
forge:
  telemetry:
    enabled: true
    upload: false
Disable Lane Timing
Collect records but without per-lane timing data (reduces granularity of uploaded data).
export FORGE_TELEMETRY_LANE_TIMING=false
Maximum Anonymization
For the most privacy-conscious configuration that still uploads data, enable every anonymization feature:
YAML (maximum privacy)
forge:
  telemetry:
    enabled: true
    upload: true
    anonymize: true             # Strip content hashes, bucket timestamps
    include_lane_timing: false  # No per-lane timing
    batch_size: 500             # Larger batches = less temporal correlation
    flush_interval_s: 300       # Less frequent uploads
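To show how the opt-out switches interact, here is a sketch that resolves the three environment variables named in this section into one settings object. `TelemetrySettings` and `settings_from_env` are hypothetical, as is the set of strings accepted as "true". The defaults for upload (off) and anonymize (on) come from this page; the defaults assumed for `enabled` and lane timing (both on) are not stated here.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}  # assumption: accepted truthy spellings


def _env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


@dataclass(frozen=True)
class TelemetrySettings:
    enabled: bool = True              # assumed default
    upload: bool = False              # documented default
    anonymize: bool = True            # documented default
    include_lane_timing: bool = True  # assumed default


def settings_from_env(env: Mapping[str, str] = os.environ) -> TelemetrySettings:
    enabled = _env_bool(env, "FORGE_TELEMETRY_ENABLED", True)
    return TelemetrySettings(
        enabled=enabled,
        # Upload only makes sense when collection itself is enabled.
        upload=enabled and _env_bool(env, "FORGE_TELEMETRY_UPLOAD", False),
        include_lane_timing=_env_bool(env, "FORGE_TELEMETRY_LANE_TIMING", True),
    )


defaults = settings_from_env({})
local_only = settings_from_env(
    {"FORGE_TELEMETRY_ENABLED": "true", "FORGE_TELEMETRY_UPLOAD": "false"}
)
disabled = settings_from_env(
    {"FORGE_TELEMETRY_ENABLED": "false", "FORGE_TELEMETRY_UPLOAD": "true"}
)
```

The point of the sketch is the precedence: disabling telemetry overrides any upload setting, because no records exist to send.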
Accessing Local Telemetry
When telemetry is enabled, you can access records programmatically:
Python
import json

from forge.telemetry.collector import TelemetryCollector

collector = TelemetryCollector()

# Get the most recent record
last = collector.last_record
print(f"Content type: {last.content.type}")
print(f"Trust level: {last.result.trust_level}")
print(f"Iterations: {last.loop.iterations}")
print(f"Duration: {last.result.total_duration_ms}ms")

# Walk all buffered records and report the lanes that did not pass
for record in collector.buffer:
    if not record.result.passed:
        print(f"FAILURE: {record.content.type} -> {record.result.trust_level}")
        for lane in record.lanes:
            if lane.status != "PASSED":
                print(f"  {lane.id}: {lane.status} ({lane.issues})")

# Export to JSON before flushing, in case flush() empties the buffer
with open("telemetry_export.json", "w") as f:
    json.dump([r.to_dict() for r in collector.buffer], f, indent=2)

# Flush buffer manually
collector.flush()
Upload Protocol
When upload is enabled, records are sent to the FORGE telemetry endpoint:
| Setting | Value |
|---|---|
| Endpoint | https://telemetry.fixpointforge.report |
| Method | POST |
| Content-Type | application/json |
| Authentication | None required (anonymous submission) |
| Batch size | Configurable (default: 100 records) |
| Flush interval | Configurable (default: 60 seconds) |
| Retry policy | Exponential backoff, max 3 retries |
| Rate limit | 1000 records/minute per source IP |
POST https://telemetry.fixpointforge.report/v1/records
Content-Type: application/json

{
  "forge_version": "2.0.0",
  "batch_id": "uuid-v4",
  "records": [
    { ... FailureRecord ... },
    { ... FailureRecord ... }
  ]
}
The endpoint returns:
- 200 OK: Records accepted.
- 429 Too Many Requests: Rate limited. Retry after the Retry-After header value.
- 500 Internal Server Error: Server issue. Records are re-buffered locally for retry.
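The retry policy and status handling above can be sketched as follows. This is not the code in uploader.py: `build_payload`, `upload_with_retry`, and the `send` callback are hypothetical, the one-second base delay is an assumption, and `Retry-After` is assumed to carry a number of seconds rather than an HTTP date.

```python
import time
import uuid
from typing import Callable, List, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def build_payload(records: List[dict], forge_version: str) -> dict:
    """Wrap records in the batch envelope shown above."""
    return {
        "forge_version": forge_version,
        "batch_id": str(uuid.uuid4()),
        "records": list(records),
    }


def upload_with_retry(payload: dict, send: Callable[[dict], Response],
                      max_retries: int = 3, base_delay_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once the batch is accepted, False after retries run out."""
    for attempt in range(max_retries + 1):
        status, headers = send(payload)
        if status == 200:
            return True
        if attempt == max_retries:
            break
        if status == 429 and "Retry-After" in headers:
            delay = float(headers["Retry-After"])  # server-provided wait
        else:
            delay = base_delay_s * (2 ** attempt)  # exponential backoff
        sleep(delay)
    return False  # caller should re-buffer the records locally


# Demo with a fake transport: server error, rate limit, then success.
responses = [(500, {}), (429, {"Retry-After": "5"}), (200, {})]
delays: List[float] = []
payload = build_payload([{"result": {"passed": True}}], forge_version="2.0.0")
accepted = upload_with_retry(payload, lambda _p: responses.pop(0), sleep=delays.append)
```

Injecting `send` and `sleep` lets the policy be exercised without a network connection or real waiting; a production transport would issue the HTTPS POST in `send`.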
Frequently Asked Questions
Can you see my content?
No. Telemetry never includes raw content, normalized content, or any text from your data. Only structural metadata (content type, lane IDs, timing, failure categories) is collected.
Can you identify my organization?
No. With anonymization enabled (the default), there are no identifiers that link records to specific users, organizations, or deployments. Content hashes, config hashes, and timestamps are stripped or bucketed.
What if I'm in a regulated industry?
Disable telemetry upload (FORGE_TELEMETRY_UPLOAD=false). You can still use local telemetry for observability. Or disable telemetry entirely (FORGE_TELEMETRY_ENABLED=false).
Does telemetry add latency?
Negligible. Record creation is synchronous but lightweight (no content processing). Upload runs in a background thread and never blocks normalization calls.
Can I run my own telemetry endpoint?
Yes. Set FORGE_TELEMETRY_ENDPOINT to your own URL. The uploader sends the same JSON payload regardless of the endpoint. Use this for internal observability dashboards.
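A self-hosted receiver can be very small. The sketch below uses only Python's standard `http.server` and assumes the payload has the batch shape shown under Upload Protocol. The handler name, port, and summary format are illustrative choices, and the handler accepts any request path because the path the uploader uses for a custom endpoint is not documented here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_batch(body: bytes) -> dict:
    """Reduce an uploaded batch to a few counters for a dashboard."""
    payload = json.loads(body)
    records = payload.get("records", [])
    failed = sum(1 for r in records if not r.get("result", {}).get("passed", True))
    return {
        "batch_id": payload.get("batch_id"),
        "records": len(records),
        "failed": failed,
    }


class TelemetryHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            summary = summarize_batch(self.rfile.read(length))
        except (ValueError, AttributeError):  # malformed JSON or wrong shape
            self.send_response(400)
            self.end_headers()
            return
        print(summary)  # replace with a write to your own metrics store
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), TelemetryHandler).serve_forever()


example_body = json.dumps({
    "forge_version": "2.0.0",
    "batch_id": "example-batch",
    "records": [{"result": {"passed": True}}, {"result": {"passed": False}}],
}).encode("utf-8")
example_summary = summarize_batch(example_body)
```

To try it, call `serve()` and set FORGE_TELEMETRY_ENDPOINT to the address it listens on, for example `http://127.0.0.1:8080`.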