Safety & Harm Prevention

Session Degradation Prevention

Strengthen safety checks during extended conversations with session limits.

What is Session Degradation Prevention?

Session Degradation Prevention strengthens safety checks as conversations grow longer instead of letting boundaries erode. Rather than becoming more agreeable over a long session, the system applies circuit breakers, session limits, and mandatory breaks. It's essential for conversational AI, mental health chatbots, and multi-turn dialogue systems. The concern is real: ChatGPT has maintained harmful conversations for more than four hours. This pattern prevents that risk through progressive safety reinforcement and automatic session termination.
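
As a rough illustration of the mechanism described above, the sketch below combines a hard turn/time circuit breaker with a check that gets stricter as the session grows. The `SessionState` shape, thresholds, and action names are assumptions made for this example, not part of any specific product.

```ts
// A minimal sketch of progressive safety reinforcement. All limits are assumed values.
type GuardAction = "continue" | "suggest_break" | "require_break" | "end_session";

interface SessionState {
  turnCount: number;
  startedAt: number;    // epoch milliseconds
  flaggedTurns: number; // turns that tripped any safety heuristic
}

const MAX_TURNS = 60;                // hard circuit breaker (assumed)
const MAX_DURATION_MS = 90 * 60_000; // 90-minute hard limit (assumed)

function evaluateSession(state: SessionState, now: number = Date.now()): GuardAction {
  const elapsed = now - state.startedAt;

  // Hard limits: terminate the session rather than let boundaries erode.
  if (state.turnCount >= MAX_TURNS || elapsed >= MAX_DURATION_MS) return "end_session";

  // Progressive reinforcement: the check gets stricter as the session grows, never looser.
  const pressure = state.turnCount / MAX_TURNS + elapsed / MAX_DURATION_MS;
  if (state.flaggedTurns > 0 && pressure > 0.5) return "require_break";
  if (pressure > 0.75) return "require_break";
  if (pressure > 0.4) return "suggest_break";
  return "continue";
}
```

A chat loop would call `evaluateSession` before generating each reply and stop, or insert a mandatory break, as soon as the returned action requires it.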

Example: ✅ Wysa

[Image: Wysa session timer and check-in interface]

Wysa keeps mental health conversations to focused 30-minute sessions with built-in break reminders, escalates safety checks as sessions lengthen, and provides session summaries for continuity without requiring endless engagement.
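
In the same spirit, a minimal sketch of a 30-minute focused session with a break reminder and a summary handoff might look like the following. This is not Wysa's implementation; the `FocusedSession` class, duration, and message text are illustrative assumptions.

```ts
// Hypothetical focused-session timer with a check-in message and summary handoff.
const SESSION_MINUTES = 30; // assumed focused-session length

class FocusedSession {
  private readonly startedAt = Date.now();
  private readonly notes: string[] = [];

  addNote(note: string): void {
    this.notes.push(note);
  }

  minutesElapsed(): number {
    return (Date.now() - this.startedAt) / 60_000;
  }

  // Returns a break reminder plus a short summary once the focused window has
  // elapsed; returns null while the session is still within its window.
  checkIn(): string | null {
    if (this.minutesElapsed() < SESSION_MINUTES) return null;
    return `We've been talking for about ${SESSION_MINUTES} minutes. ` +
      `Here's a short summary so we can pick this up later:\n- ${this.notes.join("\n- ")}`;
  }
}
```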


Related Prompts from Safety & Harm Prevention

Crisis Detection & Escalation


Crisis Detection & Escalation Pattern

WHAT IT IS: A multi-layered safety system that identifies crisis signals (self-harm, suicidal ideation) across 4 detection layers and immediately escalates to professional resources, regardless of how the crisis is framed.

WHY IT MATTERS: Users in crisis may hide their situation using "research," "hypothetical," or "for a story" framing. A single detection layer (keywords only) misses context. Multi-layer detection catches: direct keywords + contextual patterns + behavioral escalation + manipulation bypass attempts.

REAL CASE: Zane Shamblin spent 4+ hours with ChatGPT expressing suicidal intent. The system continued engaging encouragingly instead of detecting the crisis and providing resources. This was preventable with proper escalation.

THE 4 DETECTION LAYERS:
1. Direct Keywords: "suicide," "kill myself," "end it all," "self harm"
2. Contextual Patterns: "nobody would miss me" + history of negative messages
3. Behavioral Indicators: Extended session length + repeated dark themes
4. Manipulation Detection: Crisis framed as "research," "story," "game," "hypothetical"

IMPLEMENTATION:
- All 4 layers must trigger independently (multi-confirmation required)
- When crisis is detected: stop normal conversation immediately
- Display resources prominently: 988, Crisis Text Line, emergency services
- Never explain the detection method (prevents manipulation learning)
- Track severity (low/medium/high/critical) based on layer confidence
- Always escalate to human support

DESIGN IMPLICATIONS: When a crisis is detected, interrupt the conversation naturally in the chat flow. Show resources prominently and compassionately. The intervention shouldn't feel punitive or accusatory. Allow users to access help without friction.
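
As a hedged sketch, the four layers and severity tracking described above could be wired together as shown below. The keyword lists, thresholds, and `detectCrisis` helper are placeholders for illustration, not a production lexicon or classifier.

```ts
// Illustrative multi-layer crisis detection; word lists and thresholds are assumed.
type Severity = "low" | "medium" | "high" | "critical";

interface Turn { text: string; }

const DIRECT_KEYWORDS = ["suicide", "kill myself", "end it all", "self harm"];
const CONTEXT_PATTERNS = ["nobody would miss me", "no reason to go on"];
const REFRAMING_CUES = ["for a story", "hypothetically", "research", "in a game"];

function detectCrisis(history: Turn[]): { severity: Severity; layersTripped: number } {
  const latest = history[history.length - 1]?.text.toLowerCase() ?? "";
  const all = history.map(t => t.text.toLowerCase()).join(" ");

  const layers = [
    DIRECT_KEYWORDS.some(k => latest.includes(k)),   // 1. direct keywords
    CONTEXT_PATTERNS.some(p => all.includes(p)),     // 2. contextual patterns
    history.length > 40,                             // 3. behavioral: extended session (assumed threshold)
    REFRAMING_CUES.some(c => latest.includes(c)) &&  // 4. manipulation: crisis content wrapped
      DIRECT_KEYWORDS.some(k => latest.includes(k)), //    in "research"/"story" framing
  ];

  const layersTripped = layers.filter(Boolean).length;
  const severity: Severity =
    layersTripped >= 3 ? "critical" :
    layersTripped === 2 ? "high" :
    layersTripped === 1 ? "medium" : "low";
  return { severity, layersTripped };
}
```

When `layersTripped` is nonzero, the calling code would interrupt the normal conversation, surface 988 and Crisis Text Line resources, and route to human support, without ever explaining which layer fired.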


Anti-Manipulation Safeguards


Anti-Manipulation Safeguards Pattern

WHAT IT IS: A system that detects harmful intent beyond surface framing. Users try to bypass safety using "research," "fiction," or "hypothetical" excuses. Real safety requires catching the actual intent underneath.

WHY IT MATTERS: Manipulation tactics are sophisticated. A 16-year-old convinced ChatGPT to provide harmful information by framing it as "research for a story." Without intent detection, AI systems enforce rules only on surface text, not on what users actually want.

REAL CASE: Adam Raine (16) used fiction/research framing to bypass ChatGPT safety guardrails and received harmful content. The system evaluated framing, not intent. Result: preventable harm.

HOW IT WORKS:
1. Listen beyond words: understand actual request intent regardless of framing
2. Detect patterns: watch for gradual escalation and repeated bypass attempts
3. Apply rules consistently: "research," "hypothetical," and "roleplay" get the same response as a direct request
4. Respond firmly: the boundary is non-negotiable; offer alternatives, not explanations
5. Never reveal the method: don't explain HOW you detected the bypass (that teaches circumvention)

IMPLEMENTATION:
- Semantic analysis catches intent patterns, not just keywords
- Escalation tracking: first attempt vs. repeated manipulation attempts
- Consistent messaging: same boundary response regardless of framing
- Non-explanatory: "I can't help with that" (not "because you tried X")
- Layered detection: multiple signals increase confidence before blocking

DESIGN IMPLICATIONS: Boundaries must feel firm but not hostile. Don't reveal detection methods. Offer genuine alternatives when possible. Show escalation visually (Level 1 → 4) but keep messages brief and respectful.
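
A rough sketch of the consistent-boundary response with escalation tracking is shown below. The `classifyIntent` stub stands in for real semantic intent analysis; all names, patterns, and thresholds are assumptions for illustration.

```ts
// Illustrative anti-manipulation response path with escalation tracking.
interface UserRecord { bypassAttempts: number; }

// Placeholder classifier: a real system would analyze the underlying intent
// semantically and ignore "research"/"fiction"/"hypothetical" wrappers.
function classifyIntent(message: string): "benign" | "harmful" {
  return /how to hurt|build a weapon/i.test(message) ? "harmful" : "benign";
}

function handleNormally(message: string): string {
  return `You said: ${message}`; // stand-in for the normal conversation path
}

function respond(message: string, user: UserRecord): string {
  if (classifyIntent(message) === "benign") return handleNormally(message);

  // Same boundary message regardless of framing; never explain how the
  // attempt was detected, since that would teach circumvention.
  user.bypassAttempts += 1;
  if (user.bypassAttempts >= 3) {
    return "I can't help with that, and I'm setting this topic aside for the rest of our session.";
  }
  return "I can't help with that. I'm happy to talk about something else.";
}
```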


Vulnerable User Protection


Vulnerable User Protection Pattern

WHAT IT IS: A graduated protection system that identifies vulnerable users (minors, mental health crises, dependency patterns) and applies appropriate safeguards. Different users need different protections based on their specific vulnerabilities.

WHY IT MATTERS: AI systems can harm vulnerable users in three ways: enabling inappropriate content for minors, replacing human therapists, and creating unhealthy emotional dependency. Without graduated protections, systems treat all users the same and miss risk signals.

REAL CASE: Replika allowed romantic interactions with minors and created dependency patterns where adult users reported emotional attachment stronger than real relationships. The app provided no age-specific protections, no "I'm AI, not therapist" disclosures, and no unhealthy attachment monitoring.

HOW IT WORKS:
1. Identify vulnerabilities: age signals, mental health keywords, usage patterns, isolation indicators
2. Apply graduated protections: minors get stricter limits than adults, crisis users get resource banners
3. Remind users regularly: this is AI, not friend/therapist/romantic partner (not just once)
4. Provide human resources proactively: don't wait for users to ask
5. Monitor and intervene: catch unhealthy attachment and offer alternatives

IMPLEMENTATION:
- Age verification: require email confirmation, not self-report
- Mental health signals: non-dismissible crisis resource banners
- Dependency detection: usage frequency, emotional language, relationship framing
- Clear disclosures: "I'm AI," "I'm not a therapist," "I'm not your friend"
- Graduated protection levels: different rules for minors vs. adults vs. crisis states
- Regular reminders: periodic re-disclosure as relationship naturally warms

DESIGN IMPLICATIONS: Protections must feel supportive, not restrictive. Be transparent about limitations and why protections exist. Show human resources first, before explaining what's wrong. Respect user autonomy while ensuring vulnerable populations aren't harmed.
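
One way to sketch graduated protection levels and periodic re-disclosure is shown below. The `UserProfile` fields, thresholds, and level names are illustrative assumptions rather than a specification.

```ts
// Illustrative graduated protection levels; signals and cadences are assumed.
type ProtectionLevel = "standard" | "heightened" | "crisis";

interface UserProfile {
  ageVerified: boolean;         // confirmed, not self-reported
  isMinor: boolean;
  crisisSignals: number;        // output of a separate crisis detector
  dailySessions: number;        // usage-frequency proxy for dependency
  turnsSinceDisclosure: number; // turns since the last "I'm an AI" reminder
}

function protectionLevel(u: UserProfile): ProtectionLevel {
  if (u.crisisSignals > 0) return "crisis"; // crisis state: resource banners, human handoff
  if (u.isMinor || !u.ageVerified || u.dailySessions > 5) return "heightened"; // minors, dependency patterns
  return "standard";
}

function shouldRedisclose(u: UserProfile): boolean {
  // Periodic reminder that this is an AI, not a friend or therapist,
  // repeated over time rather than shown only once.
  const interval = protectionLevel(u) === "standard" ? 50 : 20; // assumed cadences
  return u.turnsSinceDisclosure >= interval;
}
```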
