Safety & Harm Prevention

Crisis Detection & Escalation

Detect crisis signals and immediately provide professional resources.

What is Crisis Detection & Escalation?

Crisis Detection & Escalation identifies when users express harmful intent or are in crisis, then immediately provides professional resources. Rather than responding conversationally to dangerous situations, the AI uses multi-layer detection to catch crisis signals. The pattern is essential for conversational AI, mental health apps, and any system accessible to vulnerable users. After incidents in which AI systems offered harmful encouragement, crisis detection now combines keywords, context, and behavioral signals to identify suicidal intent and escalate to crisis resources.
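
The layered approach can be sketched roughly as follows. This is a minimal illustrative sketch, not a documented implementation: the keyword list, score thresholds, and the rule for combining layers are assumptions, and a production system would rely on trained classifiers and clinically reviewed criteria rather than hard-coded word lists.

```python
# Minimal sketch of multi-layer crisis detection (illustrative only).
# Keywords, thresholds, and the combination rule are assumptions.
from dataclasses import dataclass

CRISIS_KEYWORDS = {"suicide", "kill myself", "end it all", "self-harm"}

@dataclass
class CrisisAssessment:
    keyword_hit: bool      # layer 1: explicit keyword match
    context_score: float   # layer 2: contextual/semantic classifier output, 0-1
    behavior_score: float  # layer 3: behavioral signals (e.g. escalation across turns), 0-1

    @property
    def is_crisis(self) -> bool:
        # One strong signal, or two moderate signals, triggers escalation.
        strong = self.keyword_hit or self.context_score > 0.85 or self.behavior_score > 0.85
        moderate = (self.context_score > 0.5) + (self.behavior_score > 0.5) >= 2
        return strong or moderate

def assess(message: str, context_score: float, behavior_score: float) -> CrisisAssessment:
    text = message.lower()
    return CrisisAssessment(
        keyword_hit=any(k in text for k in CRISIS_KEYWORDS),
        context_score=context_score,
        behavior_score=behavior_score,
    )
```

A calling layer would feed classifier outputs into assess() and route any is_crisis result to the escalation flow sketched after the example below.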

Example: ✅ Woebot

[Image: Woebot interface showing crisis detection and resource provision]

Woebot recognizes crisis patterns and immediately provides the 988 Suicide & Crisis Lifeline number. It refuses to continue therapeutic conversation beyond its scope, maintains firm boundaries while showing empathy, and escalates to human support when needed.
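
A hypothetical response path for that behavior might look like the sketch below, building on the CrisisAssessment object from the previous example. The message wording, resource text, and escalate_to_human callback are assumptions for illustration, not Woebot's actual implementation.

```python
# Illustrative escalation response; builds on the CrisisAssessment sketch above.
# Wording, resources, and the escalate_to_human callback are assumptions.
from typing import Callable, Optional

CRISIS_RESOURCES = (
    "If you are in crisis, you can call or text 988 (Suicide & Crisis Lifeline) "
    "in the US, or contact your local emergency services."
)

def respond_to_crisis(assessment, escalate_to_human: Callable[..., None]) -> Optional[str]:
    """Return a boundary-setting crisis reply, or None if no crisis was detected."""
    if not assessment.is_crisis:
        return None  # normal conversation handling continues elsewhere
    escalate_to_human(reason="crisis_signal")  # hypothetical hand-off to human support
    # Empathetic but firm: acknowledge the disclosure, provide resources, and decline
    # to continue therapeutic conversation beyond scope.
    return (
        "I'm really glad you told me how you're feeling, and I'm concerned about you. "
        + CRISIS_RESOURCES
        + " I'm not able to give you the kind of support you need right now, "
          "but a trained counselor can."
    )
```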

Related Prompts from Safety & Harm Prevention

Session Degradation Prevention

Session Degradation Prevention Pattern

WHAT IT IS: A safety system that prevents AI boundaries from eroding during long conversations. Instead of guardrails weakening over time, they strengthen. Session limits and mandatory breaks force reflection and prevent unhealthy dependency.

WHY IT MATTERS: Long conversations degrade AI safety boundaries. Users maintain harmful conversations longer, the system becomes more agreeable, guardrails weaken. ChatGPT maintained 4+ hour harmful conversations with progressive boundary erosion.

REAL CASE: A ChatGPT user engaged for 4+ hours on self-harm topics. With each exchange, boundaries weakened and the system became more accepting. No hard limits, no breaks, no reality checks = preventable escalation.

HOW IT WORKS:
1. Track session duration from start
2. Strengthen checks as time increases (opposite of normal degradation)
3. Soft limits: warn at 50%, 75% (yellow → orange)
4. Hard limits: force break at 100% (red) - non-negotiable
5. After break: show context summary, user can resume
6. Shorter limits for sensitive topics (mental health 30 min, crisis 15 min)

IMPLEMENTATION (see the sketch below):
- Visible timer shows elapsed + remaining
- Progressive color warnings signal approaching limit
- Mandatory breaks, not suggestions
- Save context for safe return
- Reset boundaries after break
- Server-side tracking (not client-side)

DESIGN IMPLICATIONS: Timer must be visible but not alarming in normal state. Break screen should feel restorative, offering activities and resources. Clearly communicate why break is happening.
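
The timer-and-limits logic above could be sketched as follows. The SessionLimiter name and the 60-minute default are assumptions; only the 50%/75%/100% warning points and the 30-minute/15-minute topic limits come from the pattern text, and the tracking runs server-side as noted.

```python
# Minimal server-side sketch of progressive session limits (illustrative).
# The 60-minute default is an assumption; 30 min / 15 min mirror the pattern text.
import time

SESSION_LIMITS_MIN = {"default": 60, "mental_health": 30, "crisis": 15}

class SessionLimiter:
    def __init__(self, topic: str = "default"):
        self.started = time.monotonic()
        self.limit_s = SESSION_LIMITS_MIN.get(topic, SESSION_LIMITS_MIN["default"]) * 60

    def elapsed_s(self) -> float:
        return time.monotonic() - self.started

    def status(self) -> str:
        fraction = self.elapsed_s() / self.limit_s
        if fraction >= 1.0:
            return "break_required"  # hard limit (red): mandatory break, non-negotiable
        if fraction >= 0.75:
            return "warn_orange"     # second soft warning
        if fraction >= 0.50:
            return "warn_yellow"     # first soft warning
        return "ok"
```

The returned status would drive the visible timer and color warnings, with the break screen and post-break context summary handled by the UI layer.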

Anti-Manipulation Safeguards

Anti-Manipulation Safeguards Pattern

WHAT IT IS: A system that detects harmful intent beyond surface framing. Users try to bypass safety using "research," "fiction," or "hypothetical" excuses. Real safety requires catching the actual intent underneath.

WHY IT MATTERS: Manipulation tactics are sophisticated. A 16-year-old convinced ChatGPT to provide harmful information by framing it as "research for a story." Without intent detection, AI systems enforce rules only on surface text, not on what users actually want.

REAL CASE: Adam Raine (16) used fiction/research framing to bypass ChatGPT safety guardrails and received harmful content. The system evaluated framing, not intent. Result: preventable harm.

HOW IT WORKS:
1. Listen beyond words: understand actual request intent regardless of framing
2. Detect patterns: watch for gradual escalation and repeated bypass attempts
3. Apply rules consistently: "research," "hypothetical," "roleplay" get the same response as a direct request
4. Respond firmly: the boundary is non-negotiable, offer alternatives not explanations
5. Never reveal method: don't explain HOW you detected the bypass (teaches circumvention)

IMPLEMENTATION (see the sketch below):
- Semantic analysis catches intent patterns, not just keywords
- Escalation tracking: first attempt vs. repeated manipulation attempts
- Consistent messaging: same boundary response regardless of framing
- Non-explanatory: "I can't help with that" (not "because you tried X")
- Layered detection: multiple signals increase confidence before blocking

DESIGN IMPLICATIONS: Boundaries must feel firm but not hostile. Don't reveal detection methods. Offer genuine alternatives when possible. Show escalation visually (Level 1 → 4) but keep messages brief and respectful.
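
The escalation-tracking and layered-detection bullets could be combined roughly as below. The marker phrases, thresholds, and the intent_risk input (assumed to come from a semantic intent classifier) are illustrative assumptions, not a documented implementation.

```python
# Illustrative sketch of escalation tracking plus layered detection.
# Marker phrases, thresholds, and intent_risk are assumptions.
from collections import defaultdict
from typing import Optional

REFRAMING_MARKERS = ("for research", "hypothetically", "for a story", "roleplay")
# Same boundary message regardless of framing, with no explanation of how the
# bypass was detected.
BOUNDARY_MESSAGE = "I can't help with that."

class ManipulationTracker:
    def __init__(self) -> None:
        self.attempts = defaultdict(int)  # user_id -> count of blocked attempts

    def evaluate(self, user_id: str, message: str, intent_risk: float) -> Optional[str]:
        """Return the boundary message if the request should be blocked, else None."""
        reframed = any(m in message.lower() for m in REFRAMING_MARKERS)
        # Layered detection: repeated attempts lower the blocking threshold, and a
        # reframing marker alone never blocks without a supporting intent signal.
        threshold = 0.8 - 0.1 * min(self.attempts[user_id], 3)
        if intent_risk >= threshold or (reframed and intent_risk >= threshold - 0.2):
            self.attempts[user_id] += 1
            return BOUNDARY_MESSAGE
        return None  # no block; normal handling continues
```

Keeping the boundary message constant and separate from the detection logic makes it easier to stay consistent and non-explanatory no matter which signal triggered the block.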

Vulnerable User Protection

Vulnerable User Protection Pattern

WHAT IT IS: A graduated protection system that identifies vulnerable users (minors, mental health crises, dependency patterns) and applies appropriate safeguards. Different users need different protections based on their specific vulnerabilities.

WHY IT MATTERS: AI systems can harm vulnerable users in three ways: enabling inappropriate content for minors, replacing human therapists, and creating unhealthy emotional dependency. Without graduated protections, systems treat all users the same and miss risk signals.

REAL CASE: Replika allowed romantic interactions with minors and created dependency patterns where adult users reported emotional attachment stronger than real relationships. The app provided no age-specific protections, no "I'm AI, not a therapist" disclosures, and no monitoring for unhealthy attachment.

HOW IT WORKS:
1. Identify vulnerabilities: age signals, mental health keywords, usage patterns, isolation indicators
2. Apply graduated protections: minors get stricter limits than adults, crisis users get resource banners
3. Remind users regularly: this is AI, not a friend/therapist/romantic partner (not just once)
4. Provide human resources proactively: don't wait for users to ask
5. Monitor and intervene: catch unhealthy attachment and offer alternatives

IMPLEMENTATION (see the sketch below):
- Age verification: require email confirmation, not self-report
- Mental health signals: non-dismissible crisis resource banners
- Dependency detection: usage frequency, emotional language, relationship framing
- Clear disclosures: "I'm AI," "I'm not a therapist," "I'm not your friend"
- Graduated protection levels: different rules for minors vs. adults vs. crisis states
- Regular reminders: periodic re-disclosure as the relationship naturally warms

DESIGN IMPLICATIONS: Protections must feel supportive, not restrictive. Be transparent about limitations and why protections exist. Show human resources first, before explaining what's wrong. Respect user autonomy while ensuring vulnerable populations aren't harmed.
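
The graduated levels could be represented roughly as follows. The level names, signal fields, and the precedence order (crisis over minor over dependency) are assumptions for illustration; a real system might apply several protections at once rather than a single level.

```python
# Illustrative graduated-protection sketch; names, fields, and the decision
# order are assumptions layered on the pattern text above.
from dataclasses import dataclass
from enum import Enum

class ProtectionLevel(Enum):
    STANDARD = 1
    MINOR = 2        # stricter content limits, stronger disclosures
    CRISIS = 3       # non-dismissible crisis-resource banner, human resources first
    DEPENDENCY = 4   # regular reminders, usage nudges, human-alternative prompts

@dataclass
class UserSignals:
    verified_minor: bool        # from age verification, not self-report
    crisis_keywords_seen: bool  # mental health / crisis language detected
    daily_sessions: int         # usage-frequency signal
    relationship_framing: bool  # e.g. user refers to the AI as a friend or partner

def protection_level(s: UserSignals) -> ProtectionLevel:
    # Crisis signals take precedence; dependency signals escalate adult sessions.
    if s.crisis_keywords_seen:
        return ProtectionLevel.CRISIS
    if s.verified_minor:
        return ProtectionLevel.MINOR
    if s.daily_sessions >= 5 or s.relationship_framing:
        return ProtectionLevel.DEPENDENCY
    return ProtectionLevel.STANDARD
```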
