How clean is the public post corpus?
Counts come from the posts table. A post counts as flagged if the audit detected at least one PII span in it; redacted spans do not appear in the public text.
PII types detected
| Type | Detections | Why we redact it |
|---|---|---|
| home_address | 2282 | Street addresses: redaction triggered when a residential identifier (apartment, trailer, home) is paired with a personal label. |
| dob | 57 | Dates of birth labeled as personal (DOB, birthday, born on). Birthdays that appear as event context are kept. |
| other | 2 | Other PII surfaced by Claude during tone/context review. Re-reviewed manually before publication. |
| phone | 2 | Phone numbers: redaction triggered when the number is paired with a personal label (caller, victim, witness). |
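The label-pairing rule for phone numbers can be illustrated with a minimal sketch. This is not the audit's actual detector, which is not shown here. The phone pattern, the 40-character look-back window, the `[REDACTED:phone]` placeholder, and the function name `redact_labeled_phones` are all assumptions made for illustration; only the labels (caller, victim, witness) come from the table above.

```python
import re

# Hypothetical pattern for US-style numbers such as 406-555-0123.
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

# Personal labels named in the table above.
PERSONAL_LABELS = ("caller", "victim", "witness")

# Hypothetical placeholder; the real redaction marker is not specified.
PLACEHOLDER = "[REDACTED:phone]"


def redact_labeled_phones(text, window=40):
    """Redact a phone number only if a personal label precedes it.

    `window` is an assumed look-back distance in characters.
    """

    def replace(match):
        start = max(0, match.start() - window)
        context = text[start:match.start()].lower()
        # Redact only when a whole-word personal label appears nearby.
        if any(re.search(rf"\b{label}\b", context) for label in PERSONAL_LABELS):
            return PLACEHOLDER
        return match.group(0)

    return PHONE_RE.sub(replace, text)
```

Under these assumptions, a number that follows "the caller" is replaced, while a number for a county office with no personal label nearby is left in the text.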
Severity breakdown
High = direct identifier (SSN, MT DL). Stops publication automatically.
Medium = contextual identifier (residential address, personal phone). Redacted and editor-reviewed.
Low = contextual date or label (DOB, age). Reviewed before publication.
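One way to read the severity tiers is as a lookup from PII type to severity, with the most severe detection deciding what happens to the post. The sketch below is illustrative only. The type keys `ssn`, `drivers_license`, and `age`, the action names, and the choice to treat unlisted types (such as `other`) as low severity are assumptions, not part of the documented pipeline.

```python
# Severity per PII type. home_address, phone, and dob are type names from the
# table; ssn, drivers_license, and age are hypothetical keys.
SEVERITY_BY_TYPE = {
    "ssn": "high",
    "drivers_license": "high",
    "home_address": "medium",
    "phone": "medium",
    "dob": "low",
    "age": "low",
}

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3}

# Hypothetical action names paraphrasing the tier descriptions.
ACTIONS = {
    "high": "block_publication",
    "medium": "redact_and_editor_review",
    "low": "review_before_publication",
}


def publication_action(detected_types):
    """Return the action for a post given a list of detected PII types."""
    if not detected_types:
        return "publish"
    # Assumption: types without a listed severity fall back to "low".
    severities = [SEVERITY_BY_TYPE.get(t, "low") for t in detected_types]
    worst = max(severities, key=SEVERITY_RANK.__getitem__)
    return ACTIONS[worst]
```

For example, a post with both a `dob` and an `ssn` detection would be blocked, because the high-severity detection takes precedence.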