Managing Conversation Context
Context management is how a chat remembers what was said five messages ago. Every AI model has a finite context window, but the real design work is deciding what to keep verbatim, what to summarize, and what to drop without users noticing.
How AI Context Windows Work
Every AI model has a context window: the maximum amount of text it can "see" at once. GPT-4 handles ~128K tokens; Claude handles ~200K. But a larger window doesn't mean you should dump everything in.
System Prompt
Keep the personality and instructions; this layer is included verbatim in every request.
Recent Messages
Include the last N messages verbatim for immediate context.
Compressed History
Summarize older messages to preserve context without using all the tokens.
Retrieved Context (RAG)
Inject relevant search results or document snippets when the user references specific information.
This sliding window approach keeps responses relevant without hitting token limits or increasing latency.
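A minimal sketch of that assembly, assuming a crude `count_tokens` stand-in (a real product would use the model's own tokenizer) and a summary string produced elsewhere:

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (~4 characters per token)."""
    return max(1, len(text) // 4)

def build_context(system_prompt, history, summary, budget=8000, recent_n=6):
    """Assemble system prompt + compressed history + recent messages
    under a token budget. The system prompt and newest turn always survive."""
    messages = [{"role": "system", "content": system_prompt}]
    if summary:
        messages.append({"role": "system",
                         "content": f"Summary of earlier conversation: {summary}"})
    messages.extend(history[-recent_n:])  # last N messages verbatim

    def total(msgs):
        return sum(count_tokens(m["content"]) for m in msgs)

    # If over budget, drop the summary first, then the oldest recent messages.
    while total(messages) > budget and len(messages) > 2:
        messages.pop(1)
    return messages
```

The eviction order encodes the priority from the list above: instructions are untouchable, the compressed history is cheapest to lose, and recent turns go oldest-first.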
Designing for Multi-Turn Conversations
Users expect conversational AI to handle:
Pronoun resolution: "Write a function to sort a list" → "Now make it handle duplicates" - the AI must know "it" refers to the sorting function.
Topic branching: A user discusses topic A, switches to topic B, then says "going back to what we were discussing before" - the AI should recall topic A.
Correction: "Actually, I meant Python not JavaScript" - the AI should reinterpret the previous request without re-asking for all the other details.
Design your context passing to preserve these conversation dynamics. The simplest approach: always send the full conversation history (up to the context limit) as the messages array to the API.
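A sketch of that simplest version, with a hypothetical `send_to_model` callable standing in for a real API client:

```python
class Conversation:
    """Accumulates the full message history and replays it on every turn,
    so pronouns, corrections, and topic switches resolve naturally."""

    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text: str, send_to_model) -> str:
        self.messages.append({"role": "user", "content": user_text})
        # The model sees every prior turn, so "now make it handle duplicates"
        # can refer back to the sorting function from an earlier message.
        reply = send_to_model(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Because the entire history rides along, pronoun resolution, topic branching, and corrections all work for free; the cost is tokens, which is where the sliding-window compression above comes in.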
Conversation History UX
For products with persistent conversations, design the history experience:
Sidebar list: Show past conversations with auto-generated titles (ChatGPT pattern). The title should be derived from the first user message or the conversation topic, not "Conversation 1, Conversation 2."
Search: Let users search across all conversations. This is table stakes for any product where users will accumulate 50+ conversations.
Branching: Some products (like ChatGPT) allow editing a previous message and regenerating from that point, creating a conversation branch. This is powerful but adds UI complexity - start without it.
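A simple way to derive the sidebar title, sketched as truncation of the first user message (many products instead ask the model itself to generate a title, which handles vague openers better):

```python
def conversation_title(messages, max_len=40):
    """Derive a sidebar title from the first user message,
    falling back to a placeholder for empty conversations."""
    first_user = next((m["content"] for m in messages
                       if m["role"] == "user"), None)
    if not first_user:
        return "New conversation"
    title = " ".join(first_user.split())  # collapse newlines and whitespace
    if len(title) > max_len:
        # Cut at a word boundary so the sidebar never shows a half-word.
        title = title[:max_len].rsplit(" ", 1)[0] + "…"
    return title
```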
Selective Memory: Make What You Retain Visible
Context is a system-design concern. Memory is a UX concern. Users must always be able to answer three questions about the AI in front of them: what can it see right now, what is it retaining for later, and how do I see or change that?
When products get this wrong, the failure mode isn't usually "the AI remembered something it shouldn't" — it's "the AI retained state and never surfaced it." Users feel watched rather than helped, and the product gets pulled from the default surface. A well-designed conversational UI makes memory legible, not silent.
- Notion AI Meeting Notes: Sees audio + transcript during the session. Retains session-scoped context only. Shows a real-time waveform while listening, and a post-session panel with full transcript + conclusions + sources. Disclosure is continuous, not upfront.
- Microsoft Copilot (screen AI): Sees the current screen and voice input. Retains context across sessions. Only reveals context during active conversation; once the chat closes, the retention becomes invisible. This is the gap that led to the March 2026 feature rollback.
- Meta internal agent: Saw engineer data beyond authorized scope. Retained indefinitely. Provided no visible permission boundary. An extreme case, but a useful contrast: when the disclosure surface is missing entirely, "capability" becomes surveillance.
If the AI remembers something between turns or between sessions, say so — in the UI, not in the terms of service. Every retention you don't surface becomes a trust debt, and users cannot tell the difference between "remembers helpfully" and "watches quietly" without a visible signal.
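One way to make that signal structural rather than aspirational: store retained facts only through a layer whose contents are, by construction, exactly what the settings panel renders. This is a hypothetical sketch, not a production memory system; the class and method names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    fact: str
    source: str  # e.g. a conversation title or date, shown to the user
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

class LegibleMemory:
    """Every retained fact is inspectable and deletable by the user;
    nothing can be stored that the 'what I remember' panel cannot show."""

    def __init__(self):
        self._entries: list[MemoryEntry] = []

    def retain(self, fact: str, source: str) -> MemoryEntry:
        entry = MemoryEntry(fact, source)
        self._entries.append(entry)
        return entry

    def visible_to_user(self) -> list[dict]:
        """Exactly what the memory panel renders; no hidden state."""
        return [{"fact": e.fact, "source": e.source, "since": e.created_at}
                for e in self._entries]

    def forget(self, fact: str) -> None:
        self._entries = [e for e in self._entries if e.fact != fact]
```

The design choice is that `visible_to_user` is not a filtered view: the store has no other read path, so "what the AI remembers" and "what the user can see" cannot drift apart.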
Related AIUX patterns: The [Context Switching](/patterns/context-switching) pattern covers how to help users move between multiple AI conversations without losing state. [Selective Memory](/patterns/selective-memory) addresses giving users control over what the AI remembers across sessions. For preventing quality degradation in long conversations, see [Session Degradation Prevention](/patterns/session-degradation-prevention).