In the previous guide, we built a three-layer compaction strategy that lets the agent run indefinitely. That’s a major capability — but it comes with a cost. Compaction is lossy. When auto-compact fires, the agent’s entire conversation history collapses into a two-message summary. The gist survives — what was accomplished, which files were touched, key decisions — but the specifics vanish. If the agent was halfway through a twelve-step refactoring plan, the summary might preserve “refactoring in progress” while losing exactly which steps are done, which are blocked, and what comes next. The agent’s plan evaporates along with the context that held it.
This is a different class of problem from what we’ve tackled before. The agent doesn’t need a better compression algorithm — it needs state that lives outside the context window entirely. State on disk. If a task is written to a JSON file in a .tasks/ directory, no amount of compaction can erase it. The filesystem becomes the agent’s durable memory — a place to store plans that survive compression, restarts, and arbitrarily long sessions.
In this guide, let’s build a TaskManager that persists tasks as individual JSON files, wires dependency edges between them, and cascades status changes through the graph. Four new tools give the model CRUD access to a task DAG that outlasts the conversation itself.
The complete source code for this stage is available at the 07-task-system tag on GitHub. Code blocks below show key excerpts.
File-per-entity persistence
The core idea is simple: each task is a standalone JSON file. The .tasks/ directory is the database, and FileManager is the query engine. Here’s what the directory looks like after the agent plans a multi-step feature:
.tasks/
task_1.json {"id": 1, "status": "completed", "subject": "Parse config"}
task_2.json {"id": 2, "status": "pending", "blockedBy": [1]}
task_3.json {"id": 3, "status": "pending", "blockedBy": [1]}
task_4.json {"id": 4, "status": "pending", "blockedBy": [2, 3]}
Tasks 2 and 3 depend on task 1. Task 4 depends on both 2 and 3. When the agent completes task 1, its ID is automatically removed from every other task’s blockedBy list — tasks 2 and 3 become unblocked and ready to execute. When both 2 and 3 are eventually completed, task 4 unblocks. This is a directed acyclic graph encoded as bidirectional edges: blockedBy points upstream (what blocks me), blocks points downstream (what I unblock when done).
The file-per-entity approach has a key advantage over a single tasks.json file: operations on one task never risk corrupting another. A failed write to task_3.json leaves tasks 1, 2, and 4 untouched. And because each file is a complete Codable struct, there’s no parsing ambiguity — JSONDecoder either succeeds or it doesn’t.
The AgentTask model
Let’s start with the data model. TaskStatus is a raw-value enum that mirrors the lifecycle the model sees in tool descriptions — pending, in_progress, completed — with a display marker for the list view:
// Sources/Core/TaskManager.swift
public enum TaskStatus: String, Sendable, Equatable, Codable {
case pending
case inProgress = "in_progress"
case completed
public var marker: String {
switch self {
case .pending: "[ ]"
case .inProgress: "[>]"
case .completed: "[x]"
}
}
}
The AgentTask struct captures everything the model needs to reason about a task — its identity, what it does, where it stands, and how it relates to other tasks:
public struct AgentTask: Sendable, Equatable, Codable {
public let id: Int
public let subject: String
public let description: String
public fileprivate(set) var status: TaskStatus
public fileprivate(set) var blockedBy: [Int]
public fileprivate(set) var blocks: [Int]
public let owner: String
}
The fileprivate(set) on status, blockedBy, and blocks means only code within TaskManager.swift can mutate these fields. External code sees them as read-only — the struct’s API surface is narrow by design. The owner field anticipates multi-agent work in later stages, where tasks might be assigned to specific subagents.
TaskManager: CRUD and auto-incrementing IDs
TaskManager owns the .tasks/ directory and provides the CRUD operations that tool handlers call. The initializer creates the directory if needed and recovers the next available ID by scanning existing files:
public final class TaskManager {
private let directory: String
private var nextId: Int
public init(directory: String) {
self.directory = directory
let fm = FileManager.default
if !fm.fileExists(atPath: directory) {
try? fm.createDirectory(
atPath: directory,
withIntermediateDirectories: true
)
}
self.nextId = Self.maxId(in: directory) + 1
}
}
That maxId scan is what makes IDs survive restarts. It parses task_N.json filenames, extracts the integer from each, and takes the maximum. If the .tasks/ directory contains task_1.json through task_5.json, nextId starts at 6 — regardless of whether the agent process was restarted, the context was compacted, or even the machine rebooted between sessions. Files that don’t match the naming convention are silently skipped.
Creating a task follows the expected pattern — build the struct, write it, bump the counter:
public func create(subject: String, description: String = "") throws -> String {
let task = AgentTask(id: nextId, subject: subject, description: description)
let json = try saveAndSerialize(task)
nextId += 1
return json
}
The method returns pretty-printed JSON so the model sees exactly what was persisted. Every mutation method follows this pattern — perform the operation, return the serialized result as a tool response.
Dependency resolution: cascading unblock
The interesting mechanism is what happens when a task completes. If task 1’s blocks array contains [2, 3], completing task 1 needs to remove 1 from both task 2’s and task 3’s blockedBy arrays. This is removeCompletedDependency — the cascading unblock:
private func removeCompletedDependency(for completedId: Int) {
let fm = FileManager.default
guard let files = try? fm.contentsOfDirectory(atPath: directory) else {
return
}
for file in files where file.hasPrefix("task_") && file.hasSuffix(".json") {
let path = (directory as NSString).appendingPathComponent(file)
guard
let data = fm.contents(atPath: path),
var task = try? JSONDecoder().decode(AgentTask.self, from: data)
else {
continue
}
if task.blockedBy.contains(completedId) {
task.blockedBy.removeAll { $0 == completedId }
try? save(task)
}
}
}
The method scans every task file in the directory, checks whether it references the completed ID, and removes that reference if present. The cascade is triggered inside update when the status changes to .completed:
public func update(
taskId: Int,
status: String? = nil,
addBlockedBy: [Int] = [],
addBlocks: [Int] = []
) throws -> String {
var task = try load(taskId)
if let status {
guard let newStatus = TaskStatus(rawValue: status) else {
throw TaskError.invalidStatus(status)
}
task.status = newStatus
}
try applyBlockedBy(addBlockedBy, to: &task)
try applyBlocks(addBlocks, to: &task)
let json = try saveAndSerialize(task)
if task.status == .completed {
removeCompletedDependency(for: task.id)
}
return json
}
The ordering matters: save the updated task first, then cascade. If the cascade fails partway through, the completing task is still correctly marked as completed — only some dependents might retain a stale blockedBy entry, which is recoverable.
Wiring into the agent
With TaskManager ready, let’s connect it. The agent creates the manager alongside its other dependencies, and the system prompt gains a line telling the model that task tools exist and survive compaction:
// Sources/Core/Agent.swift
self.taskManager = TaskManager(directory: "\(workingDirectory)/.tasks")
// In buildSystemPrompt:
- Use task tools for persistent multi-step work with dependencies. \
Tasks survive context compaction and process restarts.
Four tool definitions go into the toolDefinitions array. Here’s task_create — the most representative:
ToolDefinition(
name: "task_create",
description: "Create a persistent task. Tasks survive context compaction and process restarts.",
inputSchema: .object([
"type": "object",
"properties": .object([
"subject": .object([
"type": "string",
"description": "Short title for the task"
]),
"description": .object([
"type": "string",
"description": "Detailed description of the task"
])
]),
"required": .array(["subject"])
])
),
The handlers follow the same guard-extract, do/catch, return-Result pattern as every other tool:
private func executeTaskCreate(_ input: JSONValue) async -> Result<String, ToolError> {
guard let subject = input["subject"]?.stringValue else {
return .failure(.missingParameter("subject"))
}
let description = input["description"]?.stringValue ?? ""
do {
let result = try taskManager.create(subject: subject, description: description)
return .success(result)
} catch {
return .failure(.executionFailed("\(error)"))
}
}
And the dispatch map grows by four entries:
let handlers = [
"bash": executeBash,
"read_file": executeReadFile,
"write_file": executeWriteFile,
"edit_file": executeEditFile,
"todo": executeTodo,
"agent": executeAgent,
"load_skill": executeLoadSkill,
"compact": executeCompact,
"task_create": executeTaskCreate,
"task_update": executeTaskUpdate,
"task_list": executeTaskList,
"task_get": executeTaskGet
]
Subagents get read-only access — task_list and task_get — but can’t create or modify tasks. The LoopConfig.subagent denylist excludes task_create and task_update:
static let subagent = LoopConfig(
tools: Agent.toolDefinitions.filter {
!Set(["agent", "todo", "compact", "task_create", "task_update"]).contains($0.name)
},
...
)
A subagent can check the task board to understand what work is planned, but only the main agent can change it. This keeps task ownership unambiguous — the same principle that keeps subagents from firing nag reminders or compressing the parent’s history.
With that in place, we now have twelve tools and a persistent planning layer. The TodoManager still serves its original purpose — quick in-session checklists for simple tasks — while TaskManager handles structured multi-step work with explicit dependencies.
Taking it for a spin
Let’s build and run:
swift build && swift run claude
Try: Plan a refactoring with 4 tasks: "Parse AST", "Transform nodes", "Emit output", "Run tests". Transform and Emit can run in parallel after Parse. Tests wait for both. Watch the tool calls — the agent should create four tasks, then wire dependencies: tasks 2 and 3 blocked by task 1, task 4 blocked by tasks 2 and 3.
Then: List all tasks. The output should show the dependency graph with markers:
[ ] 1: Parse AST
[ ] 2: Transform nodes (blocked by: 1)
[ ] 3: Emit output (blocked by: 1)
[ ] 4: Run tests (blocked by: 2, 3)
Now complete task 1: Mark task 1 as completed and list tasks again. The cascade fires — tasks 2 and 3 lose their blockedBy reference and become ready. Task 4 still waits for both.
For a longer session, try asking the agent to plan a real feature — something with eight or ten steps and genuine ordering constraints. Then trigger context compaction by reading a bunch of large files. After compaction fires, ask the agent to list all tasks — the full plan is still there, intact on disk, even though the conversation history was summarized down to two messages.
Durable state, same loop
We now have an agent with durable planning. Tasks persist as JSON files that survive compaction, restarts, and arbitrarily long sessions. The dependency graph — blockedBy upstream, blocks downstream — gives the model a way to express ordering and parallelism. Cascading unblock on completion means the model doesn’t need to manually track which tasks become ready; it just marks work as done and the graph updates itself.
The TodoManager and TaskManager coexist deliberately. Todos are fast and ephemeral — a scratchpad for single-session work. Tasks are structured and persistent — a plan that outlasts the conversation. The model learns when to use which from the system prompt, and in practice it reaches for task_create when the work has dependencies and todo when it doesn’t.
The loop is still the invariant. Four new entries in the dispatch dictionary, four new handler methods, one new type — and the while true kernel that drives everything hasn’t changed since the first guide. In the next guide, we’ll tackle a natural follow-up: some of these tasks take a long time to execute. Background tasks let the agent kick off slow work and keep going while it runs. Thanks for reading!