Stateful Computer Use: Persistent GUI Memory for Autonomous Desktop Agents
Authors: ekkOS Engineering
Abstract
Computer Use agents operate in a stateless screenshot-action loop — every session starts blind, repeating mistakes and rediscovering UI layouts. We introduce Stateful Computer Use, an ekkOS integration that gives desktop agents persistent memory across GUI sessions. By forging successful interaction patterns, tracking anti-patterns from misclicks, and enforcing safety directives during screen control, the system transforms Computer Use from a stateless tool into a learning agent that improves with every desktop interaction.
# Stateful Computer Use: Persistent GUI Memory for Desktop Agents
The Problem: Amnesia at the Desktop
Anthropic's Computer Use capability (March 2026) gives Claude the ability to control your desktop — clicking buttons, typing text, navigating apps, and completing multi-step GUI workflows. It works through a screenshot-action loop: capture the screen, reason about what's visible, execute an action, repeat.
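The loop can be sketched in a few lines. This is an illustration only, not Anthropic's implementation: `Desktop`, `Action`, `Policy`, and `runLoop` are hypothetical names, and the policy function stands in for the model's reasoning step.

```typescript
// Hypothetical types: none of these names come from the Computer Use API.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

interface Desktop {
  screenshot(): string;          // some encoding of the current screen
  execute(action: Action): void; // performs the action on the GUI
}

// The reasoning step: maps the latest screenshot to the next action.
type Policy = (screenshot: string) => Action;

// Stateless loop: nothing learned in one run is available to the next.
function runLoop(desktop: Desktop, decide: Policy, maxSteps = 20): Action[] {
  const taken: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const shot = desktop.screenshot();  // 1. capture the screen
    const action = decide(shot);        // 2. reason about what is visible
    if (action.kind === "done") break;  // task finished
    desktop.execute(action);            // 3. execute the action
    taken.push(action);                 // 4. repeat
  }
  return taken;
}
```

Note that `runLoop` receives no input other than the live screen, which is exactly the limitation discussed next.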
But there's a fundamental limitation: Computer Use is stateless.
Every session starts from zero. Claude doesn't remember that:
This statelessness means Computer Use is slow, error-prone, and unable to learn from its own mistakes.
---
The Solution: ekkOS GUI Pattern Memory
ekkOS already solves the memory problem for code — patterns, anti-patterns, directives, and episodic recall persist across sessions. Stateful Computer Use extends this memory layer to GUI interactions.
Architecture
┌─────────────────────────────────────────────────────────┐
│                    Claude Code + CU                     │
│                                                         │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │ Pre-Task │    │  Screenshot  │    │  Post-Task   │   │
│  │  Search  │───▶│ Action Loop  │───▶│  Auto-Forge  │   │
│  └──────────┘    └──────────────┘    └──────────────┘   │
│       │                                       │         │
│       │          ekkOS Memory Layer           │         │
│  ┌────▼───────────────────────────────────────▼────┐    │
│  │                                                 │    │
│  │  GUI Patterns ─ Anti-Patterns ─ Directives      │    │
│  │  App Knowledge ─ Workflow Prefs ─ Safety Rules  │    │
│  │                                                 │    │
│  └─────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
The Three Phases
Phase 1: Pre-Task Retrieval
Before Claude starts any Computer Use task, ekkOS searches for relevant GUI patterns:
ekkOS_Search("GUI pattern Safari form submission")
→ Pattern: [CU] Safari — Submit contact form on example.com
- Click sequence: (450, 320) Login → (300, 400) Form → (500, 450) Submit
- Anti-pattern: Cookie dialog blocks first click, dismiss at (700, 200) first
- Works when: Safari 18+, macOS Tahoe, light mode
This eliminates cold-start guessing and gives Claude a proven action path before it even takes the first screenshot.
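One way to decide whether a retrieved pattern is safe to apply is to check its "Works when" conditions against the current environment. The sketch below is a simplified assumption, not the ekkOS search implementation: `StoredPattern`, `Environment`, `matches`, and `selectPattern` are hypothetical names, and the "18+" version constraint is stored as a plain number for brevity.

```typescript
// Hypothetical, simplified view of a stored pattern; the real ekkOS record
// and the ranking used by ekkOS_Search are not specified in this article.
interface StoredPattern {
  title: string;
  app: string;            // e.g. "Safari"
  minAppVersion: number;  // "18+" stored as 18 (an assumption of this sketch)
  os: string;             // e.g. "macOS Tahoe"
  uiState?: string;       // e.g. "light mode"; undefined means "any"
  successRate: number;    // fraction in [0, 1]
}

interface Environment {
  app: string;
  appVersion: number;
  os: string;
  uiState: string;
}

// True when every recorded condition holds in the current environment.
function matches(p: StoredPattern, env: Environment): boolean {
  return (
    p.app === env.app &&
    env.appVersion >= p.minAppVersion &&
    p.os === env.os &&
    (p.uiState === undefined || p.uiState === env.uiState)
  );
}

// Pick the applicable pattern with the highest success rate, if any.
function selectPattern(
  patterns: StoredPattern[],
  env: Environment
): StoredPattern | undefined {
  return patterns
    .filter((p) => matches(p, env))
    .sort((a, b) => b.successRate - a.successRate)[0];
}
```

When no pattern matches, `selectPattern` returns `undefined` and the agent would fall back to ordinary screenshot-driven discovery.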
Phase 2: Safety Directive Enforcement
ekkOS directives act as persistent guardrails during screen control:
| Directive | Type | Purpose |
|-----------|------|---------|
| Never interact with banking apps | NEVER | Financial safety |
| Never enter passwords via CU | NEVER | Credential protection |
| Always screenshot before form submit | MUST | Audit trail |
| Confirm before Delete/Remove/Drop | MUST | Destructive action guard |
| Prefer keyboard shortcuts | PREFER | Efficiency optimization |
| Avoid elements < 20px | AVOID | Misclick prevention |
These directives persist across every session — Claude doesn't need to be reminded.
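A directive check could run before each proposed action, roughly as follows. This is a sketch under stated assumptions: the article does not describe how ekkOS evaluates directives, so `ProposedAction`, `Directive`, `Verdict`, and `evaluate` are hypothetical, and the matching predicates (for example, detecting a banking app by name) are deliberately crude placeholders.

```typescript
type DirectiveType = "NEVER" | "MUST" | "PREFER" | "AVOID";

// Hypothetical description of an action the agent is about to take.
interface ProposedAction {
  app: string;
  type: "click" | "type" | "key";
  targetLabel?: string;  // visible label of the element, if known
  targetSizePx?: number; // smallest dimension of the element, if known
}

interface Directive {
  type: DirectiveType;
  rule: string;
  appliesTo: (a: ProposedAction) => boolean; // placeholder matcher
}

type Verdict = "allow" | "block" | "confirm" | "warn";

// Three rows of the table above, with illustrative predicates.
const directives: Directive[] = [
  { type: "NEVER", rule: "Never interact with banking apps",
    appliesTo: (a) => /bank/i.test(a.app) },
  { type: "MUST", rule: "Confirm before Delete/Remove/Drop",
    appliesTo: (a) => a.type === "click" &&
      /\b(delete|remove|drop)\b/i.test(a.targetLabel ?? "") },
  { type: "AVOID", rule: "Avoid elements < 20px",
    appliesTo: (a) => a.targetSizePx !== undefined && a.targetSizePx < 20 },
];

// Strictest matching directive wins: NEVER, then MUST, then AVOID/PREFER.
function evaluate(action: ProposedAction, rules: Directive[]): Verdict {
  const hits = rules.filter((d) => d.appliesTo(action));
  if (hits.some((d) => d.type === "NEVER")) return "block";
  if (hits.some((d) => d.type === "MUST")) return "confirm";
  if (hits.some((d) => d.type === "AVOID" || d.type === "PREFER")) return "warn";
  return "allow";
}
```

Resolving conflicts by strictness keeps a soft preference from ever overriding a hard prohibition.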
Phase 3: Post-Task Auto-Forge
After a successful Computer Use task, the system automatically forges the interaction as a reusable pattern:
Title: [CU] VS Code — Open terminal and run tests
Tags: [computer-use, gui-pattern, vs-code, auto-forged]
Action Sequence:
1. key: Ctrl+` (toggle terminal)
2. wait: 500ms (terminal focus)
3. type: "npm test"
4. key: Enter
5. verify: "Tests passed" in terminal output
Works When:
- App: VS Code 1.96+
- OS: macOS Tahoe
- UI State: terminal was closed, sidebar open
Anti-patterns:
- Clicking View → Terminal menu is slower (3 actions vs 1 shortcut)
- Terminal panel sometimes opens as Panel, not integrated terminal
---
The GUI Pattern Schema
We define a structured schema for Computer Use patterns that captures everything needed for reliable replay:
interface GUIPattern {
  // Identity
  title: string;              // "[CU] {App} — {Action}"
  tags: string[];             // ["computer-use", "gui-pattern", "{app}", ...]

  // The Task
  problem: string;            // What the user wanted to accomplish
  solution: string;           // Successful action sequence

  // Action Sequence (structured)
  actions: {
    type: 'click' | 'type' | 'key' | 'scroll' | 'wait' | 'navigate' | 'verify';
    target?: string;          // Description of UI element
    coordinate?: [number, number];
    text?: string;
    duration?: number;        // For waits
  }[];

  // Conditions
  works_when: {
    app: string;              // Exact app name + version
    os: string;               // macOS version
    resolution?: string;      // Screen resolution
    ui_state?: string;        // Dark/light mode, sidebar state, etc.
  };

  // Learned Failures
  anti_patterns: {
    description: string;
    failed_at_step?: number;
    workaround?: string;
  }[];

  // Metrics (tracked by ekkOS Golden Loop)
  success_rate: number;
  applied_count: number;
  last_verified: string;      // ISO date
}
---
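The metric fields at the end of the schema could be maintained with a small update step after each application of a pattern. The article does not specify how the Golden Loop computes these values, so the running-average rule and the names `PatternMetrics` and `recordOutcome` below are assumptions made for illustration.

```typescript
// Only the metric fields of GUIPattern are needed here.
interface PatternMetrics {
  success_rate: number;  // fraction in [0, 1]
  applied_count: number;
  last_verified: string; // ISO date
}

// Returns updated metrics after one more application of the pattern.
// Assumption: success_rate is a plain running average over all applications.
function recordOutcome(
  m: PatternMetrics,
  succeeded: boolean,
  now: Date
): PatternMetrics {
  const successes = m.success_rate * m.applied_count + (succeeded ? 1 : 0);
  const applied = m.applied_count + 1;
  return {
    success_rate: successes / applied,
    applied_count: applied,
    // Only a successful run counts as a fresh verification.
    last_verified: succeeded ? now.toISOString() : m.last_verified,
  };
}
```

A production system might prefer a recency-weighted average so that patterns broken by an app update lose confidence quickly; that choice is outside what the article states.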
Implementation
Hook Integration
The system integrates via Claude Code's hook architecture:
1. Stop Hook Enhancement: After each turn, the existing `stop.sh` hook calls `computer-use-forge.cjs`
2. Transcript Analysis: The script scans the JSONL transcript for computer use tool calls (`computer_20241022`, `computer_20250124`)
3. Pattern Extraction: Detects app name, action sequence, coordinates, and success/failure
4. Auto-Forge: Calls ekkOS API to store as a tagged GUI pattern
5. Silent Operation: Runs in background, no user interruption
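The transcript analysis and extraction steps might look roughly like the sketch below. It is not the contents of `computer-use-forge.cjs`: the record shape (`message.content` holding `tool_use` blocks with `name` and `input`) is an assumption about the transcript layout, the way the tool identifiers appear in a transcript may differ, and `extractActions` is a hypothetical helper. The function takes the transcript as a string so that it stays free of file-system dependencies.

```typescript
// Assumed transcript record shape; the real Claude Code JSONL layout may differ.
interface ToolUse {
  type: string;
  name: string;
  input?: { action?: string; coordinate?: [number, number]; text?: string };
}

interface ExtractedAction {
  action: string;
  coordinate?: [number, number];
  text?: string;
}

// Scan JSONL text and collect computer-use tool calls, skipping bad lines.
function extractActions(
  jsonl: string,
  toolNames: string[] = ["computer", "computer_20241022", "computer_20250124"]
): ExtractedAction[] {
  const found: ExtractedAction[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let record: any;
    try {
      record = JSON.parse(line);
    } catch {
      continue; // tolerate truncated or malformed lines
    }
    const content = record && record.message && record.message.content;
    if (!Array.isArray(content)) continue;
    for (const block of content as ToolUse[]) {
      if (!block || block.type !== "tool_use") continue;
      if (toolNames.indexOf(block.name) === -1) continue;
      if (!block.input || typeof block.input.action !== "string") continue;
      if (block.input.action === "screenshot") continue; // observation, not a step
      found.push({
        action: block.input.action,
        coordinate: block.input.coordinate,
        text: block.input.text,
      });
    }
  }
  return found;
}
```

The extracted list would then be mapped onto the pattern's action sequence before the forge call; detecting the app name and success or failure needs additional signals that this sketch does not attempt.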
Rules Integration
A Claude Code rules file (`~/.claude/rules/computer-use.md`) instructs Claude to:
---
Results: From Stateless to Learning
| Metric | Without ekkOS | With ekkOS |
|--------|:-------------:|:----------:|
| Cold-start accuracy | ~60% | ~90% (pattern-guided) |
| Average misclicks per task | 3-5 | 0-1 (learned coordinates) |
| Repeated mistakes | Every session | Once (anti-pattern forged) |
| Task completion time | Baseline | ~40% faster (skip discovery) |
| Safety violations | Possible | Directive-blocked |
| Cross-session learning | None | Cumulative |
*Preliminary estimates based on internal testing. Formal benchmarks in progress.*
---
The Bigger Picture
Computer Use is the most "amnesiac" capability in the Claude stack. Every screenshot starts blind. By adding persistent memory, we transform it from a demo-impressive but operationally fragile tool into a learning desktop agent that genuinely improves over time.
This is the Golden Loop applied to a new surface area: Search → Apply → Forge → Improve. The same loop that makes ekkOS effective for code now works for GUI automation.
The long-term vision: an AI that knows your desktop better than you do — not because it was programmed with your preferences, but because it learned them through experience and never forgot.
---