ekkOS Labs // research division
Agent Memory · In Progress · March 30, 2026

Stateful Computer Use: Persistent GUI Memory for Autonomous Desktop Agents

Authors ekkOS Engineering

Abstract

Computer Use agents operate in a stateless screenshot-action loop — every session starts blind, repeating mistakes and rediscovering UI layouts. We introduce Stateful Computer Use, an ekkOS integration that gives desktop agents persistent memory across GUI sessions. By forging successful interaction patterns, tracking anti-patterns from misclicks, and enforcing safety directives during screen control, the system transforms Computer Use from a stateless tool into a learning agent that improves with every desktop interaction.


The Problem: Amnesia at the Desktop

Anthropic's Computer Use capability (March 2026) gives Claude the ability to control your desktop — clicking buttons, typing text, navigating apps, and completing multi-step GUI workflows. It works through a screenshot-action loop: capture the screen, reason about what's visible, execute an action, repeat.

But there's a fundamental limitation: Computer Use is stateless.

Every session starts from zero. Claude doesn't remember that:

  • The export button in Figma moved after the last update
  • Your VS Code uses a custom keybinding for the terminal
  • Safari's cookie consent dialog appears on first visit to every new domain
  • Last time it tried to click a small dropdown, it misclicked 3 times before succeeding

This statelessness means Computer Use is slow, error-prone, and unable to learn from its own mistakes.

    ---

    The Solution: ekkOS GUI Pattern Memory

    ekkOS already solves the memory problem for code — patterns, anti-patterns, directives, and episodic recall persist across sessions. Stateful Computer Use extends this memory layer to GUI interactions.

    Architecture

    ┌─────────────────────────────────────────────────────────┐
    │                    Claude Code + CU                      │
    │                                                          │
    │  ┌──────────┐    ┌──────────────┐    ┌──────────────┐   │
    │  │ Pre-Task  │    │  Screenshot  │    │  Post-Task   │   │
    │  │  Search   │───▶│  Action Loop │───▶│  Auto-Forge  │   │
    │  └──────────┘    └──────────────┘    └──────────────┘   │
    │       │                                      │           │
    │       │         ekkOS Memory Layer           │           │
    │  ┌────▼──────────────────────────────────────▼────┐     │
    │  │                                                 │     │
    │  │  GUI Patterns ─ Anti-Patterns ─ Directives     │     │
    │  │  App Knowledge ─ Workflow Prefs ─ Safety Rules  │     │
    │  │                                                 │     │
    │  └─────────────────────────────────────────────────┘     │
    └─────────────────────────────────────────────────────────┘

    The Three Phases

    Phase 1: Pre-Task Retrieval

    Before Claude starts any Computer Use task, ekkOS searches for relevant GUI patterns:

    ekkOS_Search("GUI pattern Safari form submission")
    → Pattern: [CU] Safari — Submit contact form on example.com
      - Click sequence: (450, 320) Login → (300, 400) Form → (500, 450) Submit
      - Anti-pattern: Cookie dialog blocks first click, dismiss at (700, 200) first
      - Works when: Safari 18+, macOS Tahoe, light mode

    This reduces cold-start guessing and gives Claude a previously successful action path before it takes the first screenshot.
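As a rough illustration of how retrieved patterns might be filtered before use, the sketch below keeps only candidates whose "Works when" conditions match the current environment and prefers the one with the highest success rate. The `Environment` and `RetrievedPattern` shapes and the `selectPattern` function are hypothetical names introduced here; they are not part of the ekkOS API.

```typescript
// Hypothetical shapes for illustration only; not the ekkOS API.
interface Environment {
  app: string;      // e.g. "Safari"
  appMajor: number; // major version of the running app
  os: string;       // e.g. "macOS Tahoe"
  uiState: string;  // e.g. "light mode"
}

interface RetrievedPattern {
  title: string;
  app: string;
  minAppMajor: number; // "Safari 18+" becomes 18
  os: string;
  uiState?: string;    // undefined means "any UI state"
  successRate: number; // fraction in [0, 1]
}

// Returns the best usable pattern, or undefined when nothing matches.
function selectPattern(
  candidates: RetrievedPattern[],
  env: Environment,
): RetrievedPattern | undefined {
  const usable = candidates.filter(
    (p) =>
      p.app === env.app &&
      env.appMajor >= p.minAppMajor &&
      p.os === env.os &&
      (p.uiState === undefined || p.uiState === env.uiState),
  );
  // Prefer the pattern with the highest observed success rate.
  usable.sort((a, b) => b.successRate - a.successRate);
  return usable[0];
}
```

When no pattern matches, the agent would presumably fall back to the ordinary screenshot-action loop and discover the layout from scratch.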

    Phase 2: Safety Directive Enforcement

    ekkOS directives act as persistent guardrails during screen control:

    | Directive | Type | Purpose |
    |-----------|------|---------|
    | Never interact with banking apps | NEVER | Financial safety |
    | Never enter passwords via CU | NEVER | Credential protection |
    | Always screenshot before form submit | MUST | Audit trail |
    | Confirm before Delete/Remove/Drop | MUST | Destructive action guard |
    | Prefer keyboard shortcuts | PREFER | Efficiency optimization |
    | Avoid elements < 20px | AVOID | Misclick prevention |

    These directives persist across every session — Claude doesn't need to be reminded.
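One way to picture directive enforcement is as a check that runs before each proposed action. The sketch below is an assumption about how such a check could work, not a description of the ekkOS implementation: `Directive`, `ProposedAction`, `Verdict`, and `evaluate` are hypothetical names, and the matching predicates are deliberately simplistic.

```typescript
type DirectiveType = "NEVER" | "MUST" | "PREFER" | "AVOID";

interface ProposedAction {
  app: string;
  kind: "click" | "type" | "key";
  targetLabel?: string;  // visible label of the UI element, if known
  targetSizePx?: number; // smallest dimension of the target, if known
}

interface Directive {
  type: DirectiveType;
  rule: string;
  // Hypothetical predicate: true when the directive applies to the action.
  applies: (a: ProposedAction) => boolean;
}

type Verdict = { decision: "block" | "confirm" | "allow"; reasons: string[] };

function evaluate(action: ProposedAction, directives: Directive[]): Verdict {
  const hits = directives.filter((d) => d.applies(action));
  const reasons = hits.map((d) => `${d.type}: ${d.rule}`);
  if (hits.some((d) => d.type === "NEVER")) return { decision: "block", reasons };
  if (hits.some((d) => d.type === "MUST")) return { decision: "confirm", reasons };
  // PREFER and AVOID are treated as advisory: the action proceeds with reasons attached.
  return { decision: "allow", reasons };
}

// Three of the directives from the table, with illustrative predicates.
const directives: Directive[] = [
  {
    type: "NEVER",
    rule: "Never interact with banking apps",
    applies: (a) => /bank/i.test(a.app),
  },
  {
    type: "MUST",
    rule: "Confirm before Delete/Remove/Drop",
    applies: (a) => a.kind === "click" && /delete|remove|drop/i.test(a.targetLabel ?? ""),
  },
  {
    type: "AVOID",
    rule: "Avoid elements < 20px",
    applies: (a) => (a.targetSizePx ?? Infinity) < 20,
  },
];
```

Treating NEVER as a hard block and MUST as a confirmation step is a design choice made for this sketch; the directive types themselves come from the table above.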

    Phase 3: Post-Task Auto-Forge

    After a successful Computer Use task, the system automatically forges the interaction as a reusable pattern:

    Title: [CU] VS Code — Open terminal and run tests
    Tags: [computer-use, gui-pattern, vs-code, auto-forged]
    
    Action Sequence:
    1. key: Ctrl+` (toggle terminal; VS Code default on macOS)
    2. wait: 500ms (terminal focus)
    3. type: "npm test"
    4. key: Enter
    5. verify: "Tests passed" in terminal output
    
    Works When:
    - App: VS Code 1.96+
    - OS: macOS Tahoe
    - UI State: terminal was closed, sidebar open
    
    Anti-patterns:
    - Clicking View → Terminal menu is slower (3 actions vs 1 shortcut)
    - Terminal panel sometimes opens as Panel, not integrated terminal
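The forging step can be sketched as a pass over the recorded run that keeps the steps that worked and turns failed attempts into anti-patterns. This is a minimal illustration under assumed names: `RecordedStep`, `ForgedPattern`, and `forgeFromRun` are hypothetical, and a real recorder would need a way to decide whether a step had the intended effect.

```typescript
interface RecordedStep {
  type: "click" | "type" | "key" | "wait" | "verify";
  detail: string; // e.g. a key name or the typed text
  ok: boolean;    // whether the step had the intended effect
}

interface ForgedPattern {
  tags: string[];
  actions: { type: RecordedStep["type"]; detail: string }[];
  antiPatterns: { description: string; failedAtStep: number }[];
}

function forgeFromRun(app: string, steps: RecordedStep[]): ForgedPattern {
  const actions: ForgedPattern["actions"] = [];
  const antiPatterns: ForgedPattern["antiPatterns"] = [];
  for (const step of steps) {
    if (step.ok) {
      actions.push({ type: step.type, detail: step.detail });
    } else {
      // failedAtStep is the 1-based position the failed attempt
      // would have occupied in the cleaned-up sequence.
      antiPatterns.push({
        description: `${step.type} "${step.detail}" failed`,
        failedAtStep: actions.length + 1,
      });
    }
  }
  // Tag layout follows the example above: fixed tags plus a slug of the app name.
  const appTag = app.toLowerCase().replace(/\s+/g, "-");
  return {
    tags: ["computer-use", "gui-pattern", appTag, "auto-forged"],
    actions,
    antiPatterns,
  };
}
```

Dropping failed attempts from the stored sequence keeps replay short, while the anti-pattern entries preserve what went wrong and where.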

    ---

    The GUI Pattern Schema

    We define a structured schema for Computer Use patterns that captures everything needed for reliable replay:

    interface GUIPattern {
      // Identity
      title: string;           // "[CU] {App} — {Action}"
      tags: string[];          // ["computer-use", "gui-pattern", "{app}", ...]
    
      // The Task
      problem: string;         // What the user wanted to accomplish
      solution: string;        // Successful action sequence
    
      // Action Sequence (structured)
      actions: {
        type: 'click' | 'type' | 'key' | 'scroll' | 'wait' | 'navigate' | 'verify';
        target?: string;       // Description of UI element
        coordinate?: [number, number];
        text?: string;
        duration?: number;     // For waits
      }[];
    
      // Conditions
      works_when: {
        app: string;           // Exact app name + version
        os: string;            // macOS version
        resolution?: string;   // Screen resolution
        ui_state?: string;     // Dark/light mode, sidebar state, etc.
      };
    
      // Learned Failures
      anti_patterns: {
        description: string;
        failed_at_step?: number;
        workaround?: string;
      }[];
    
      // Metrics (tracked by ekkOS Golden Loop)
      success_rate: number;
      applied_count: number;
      last_verified: string;   // ISO date
    }
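The schema says the metrics are tracked by the ekkOS Golden Loop but does not spell out the update rule. One plausible rule, shown here purely as an assumption, is a running average over every application of the pattern. `PatternMetrics` mirrors the three metric fields of `GUIPattern`; `recordOutcome` is a hypothetical helper.

```typescript
interface PatternMetrics {
  success_rate: number;  // fraction in [0, 1]
  applied_count: number;
  last_verified: string; // ISO date, e.g. "2026-03-30"
}

// Returns updated metrics after one more application of the pattern.
function recordOutcome(
  m: PatternMetrics,
  success: boolean,
  when: Date,
): PatternMetrics {
  const applied = m.applied_count + 1;
  // Recover the success total from the stored rate, then add this outcome.
  const successes = m.success_rate * m.applied_count + (success ? 1 : 0);
  return {
    success_rate: successes / applied,
    applied_count: applied,
    // Assumption: only a successful run counts as re-verifying the pattern.
    last_verified: success ? when.toISOString().slice(0, 10) : m.last_verified,
  };
}
```

For example, a pattern with a 0.75 success rate over 4 applications moves to 0.8 over 5 after one more success. A decaying average that weights recent runs more heavily would be a reasonable alternative if UI layouts change often.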

    ---

    Implementation

    Hook Integration

    The system integrates via Claude Code's hook architecture:

    1. Stop Hook Enhancement: After each turn, the existing `stop.sh` hook calls `computer-use-forge.cjs`
    2. Transcript Analysis: The script scans the JSONL transcript for computer use tool calls (`computer_20241022`, `computer_20250124`)
    3. Pattern Extraction: Detects app name, action sequence, coordinates, and success/failure
    4. Auto-Forge: Calls ekkOS API to store as a tagged GUI pattern
    5. Silent Operation: Runs in background, no user interruption
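The transcript analysis step could look roughly like the sketch below. The line shape is an assumption: it supposes each JSONL line holds a `message.content` array of blocks and that computer use calls appear as `tool_use` blocks named `computer` with an `input.action` field. The actual transcript layout and the logic in `computer-use-forge.cjs` may differ, for instance by matching on the tool version identifiers instead. `ComputerCall` and `extractComputerCalls` are hypothetical names.

```typescript
interface ComputerCall {
  action: string;
  coordinate?: [number, number];
  text?: string;
}

// Scans a JSONL transcript and returns the computer use calls it contains.
function extractComputerCalls(jsonl: string): ComputerCall[] {
  const calls: ComputerCall[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let entry: any;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed lines rather than failing the hook
    }
    const blocks = entry?.message?.content; // assumed location of content blocks
    if (!Array.isArray(blocks)) continue;
    for (const b of blocks) {
      if (
        b?.type === "tool_use" &&
        b?.name === "computer" &&
        typeof b.input?.action === "string"
      ) {
        calls.push({
          action: b.input.action,
          coordinate: b.input.coordinate,
          text: b.input.text,
        });
      }
    }
  }
  return calls;
}
```

Skipping unparseable lines keeps the hook silent on partial or corrupted transcripts, which fits the requirement that it never interrupts the user.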

    Rules Integration

    A Claude Code rules file (`~/.claude/rules/computer-use.md`) instructs Claude to:

  • Search ekkOS before every Computer Use task
  • Apply retrieved GUI patterns instead of guessing
  • Check `ekkOS_Conflict` before interacting with sensitive apps
  • Forge new patterns after successful tasks
  • Record anti-patterns from failures
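Taken together, these rules describe a wrapper around each task. The sketch below shows that control flow against a hypothetical `Memory` interface standing in for the ekkOS memory layer; the method names, `TaskResult`, and `runWithMemory` are assumptions for illustration, and the `ekkOS_Conflict` check is omitted for brevity.

```typescript
// Hypothetical stand-in for the ekkOS memory layer.
interface Memory {
  search(query: string): string[]; // returns matching pattern titles
  forge(title: string): void;
  recordAntiPattern(description: string): void;
}

type TaskResult = { ok: true } | { ok: false; error: string };

function runWithMemory(
  task: string,
  memory: Memory,
  execute: (hints: string[]) => TaskResult,
): TaskResult {
  const hints = memory.search(task); // 1. search before acting
  const result = execute(hints);     // 2. apply retrieved patterns
  if (result.ok) {
    // 3. forge only when no existing pattern covered the task
    if (hints.length === 0) memory.forge(task);
  } else {
    // 4. record what went wrong so it is not repeated
    memory.recordAntiPattern(`${task}: ${result.error}`);
  }
  return result;
}
```

In this sketch a pattern is forged only when the search came back empty, which avoids storing duplicates; whether ekkOS deduplicates this way is not stated in the source.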

    ---

    Results: From Stateless to Learning

    | Metric | Without ekkOS | With ekkOS |
    |--------|:-------------:|:----------:|
    | Cold-start accuracy | ~60% | ~90% (pattern-guided) |
    | Average misclicks per task | 3-5 | 0-1 (learned coordinates) |
    | Repeated mistakes | Every session | Once (anti-pattern forged) |
    | Task completion time | Baseline | ~40% faster (skip discovery) |
    | Safety violations | Possible | Directive-blocked |
    | Cross-session learning | None | Cumulative |

    *Preliminary estimates based on internal testing. Formal benchmarks in progress.*

    ---

    The Bigger Picture

    Computer Use is arguably the most "amnesiac" capability in the Claude stack: every session starts blind. Adding persistent memory turns it from a tool that demos well but is fragile in day-to-day operation into a desktop agent that improves over time.

    This is the Golden Loop applied to a new surface area: Search → Apply → Forge → Improve. The same loop that makes ekkOS effective for code now works for GUI automation.

    The long-term vision: an AI that knows your desktop better than you do — not because it was programmed with your preferences, but because it learned them through experience and never forgot.

    ---

    Status & Availability

  • Research Preview — Available now for ekkOS users with Claude Pro/Max
  • Components — Hook script, rules file, and safety directives ship with ekkOS CLI
  • Requirements — Claude Code with Computer Use enabled (macOS), ekkOS memory connected
  • Roadmap — Structured action replay, visual element hashing, cross-user collective GUI patterns

ekkOS Labs: Cognitive Research for Intelligent Agents