Building a Coding Agent in Swift, Part 3: Self-Managed Task Tracking

Our agent can run commands, read files, write files, and edit code — all chained together automatically within a single prompt. Ask it to scaffold a module with three source files and a config, and it’ll happily bash and write_file its way through the whole thing. But ask it to refactor a codebase in ten steps, and something interesting happens: it nails steps one through three, starts to drift around step five, and by step seven it’s improvising. The plan it had at the beginning has faded into the growing sea of tool calls and results filling the context window.

This is a well-known property of language models called instruction-following decay. As a conversation grows longer, the system prompt and the original intent carry less weight relative to the mass of recent content. The model doesn’t forget in the human sense — it just pays less attention. For a coding agent doing multi-step work, that’s a serious problem. The plan has no durable representation — it lives only in the model’s reasoning, and reasoning fades as context grows.

The fix is surprisingly simple: give the agent a structured notepad that it writes for itself. Instead of holding the plan in the system prompt or hoping the model remembers, we give it a todo tool that maintains a visible, updatable task list right in the conversation. Every time the agent calls the tool, the current plan comes back as a tool result — fresh content near the end of the context, exactly where the model pays the most attention. In this guide, let’s build that notepad and a nag system that reminds the agent to use it.

The complete source code for this stage is available at the 03-todo-write tag on GitHub. Code blocks below show key excerpts.

A todo tool the agent writes for itself

The core idea is a TodoManager that stores a list of items, each with a status: pending, in_progress, or completed. The agent calls the todo tool to set the full list whenever it wants to update the plan. One key constraint: only a single item can be in_progress at a time. This forces sequential focus — the model can’t mark three things as in-progress and half-finish all of them.

Let’s start with the data model. Each todo item has an ID, a text description, and a status:

// Sources/Core/TodoManager.swift
public enum TodoStatus: String, Sendable, Equatable, Codable {
  case pending
  case inProgress = "in_progress"
  case completed

  public var marker: String {
    switch self {
    case .pending: "[ ]"
    case .inProgress: "[>]"
    case .completed: "[x]"
    }
  }
}

public struct TodoItem: Sendable, Equatable, Codable {
  public let id: String
  public let text: String
  public let status: TodoStatus
}

The status markers — [ ], [>], [x] — make the rendered output instantly scannable for both the model and us watching the agent work.

The manager itself is a class that validates and stores items. The validation rules are intentionally tight: no more than 20 items, no blank text, and that single-in-progress constraint. Here’s the core:

public final class TodoManager {
  public static let maxItems = 20
  public private(set) var items: [TodoItem] = []

  public enum ValidationError: Error, Equatable, Sendable {
    case tooManyItems
    case emptyText(String)
    case multipleInProgress
  }

  public func update(items: [TodoItem]) throws {
    if items.count > Self.maxItems {
      throw ValidationError.tooManyItems
    }

    for item in items where item.text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty {
      throw ValidationError.emptyText(item.id)
    }

    let inProgressCount = items.filter { $0.status == .inProgress }.count
    if inProgressCount > 1 {
      throw ValidationError.multipleInProgress
    }

    self.items = items
  }
}

TodoManager is a class rather than a struct, which might seem surprising for a type that just holds an array. The reasoning: it’s a stateful manager with a long-lived identity, owned exclusively by the Agent instance. The agent creates one TodoManager at init and mutates it throughout the session. A struct would work with mutating methods, but a class better expresses the intent — this is a single piece of mutable state with a lifecycle tied to the agent.

Wiring the tool into the agent

Adding todo to the dispatch map follows exactly the same pattern as every other tool — one entry in the dictionary, one handler method. Here’s the handler, which bridges between JSONValue inputs and our typed TodoItem model:

// Sources/Core/Agent.swift
private func executeTodo(_ input: JSONValue) async -> Result<String, ToolError> {
    guard let itemsArray = input["items"]?.arrayValue else {
        return .failure(.missingParameter("items"))
    }

    var todoItems: [TodoItem] = []
    for element in itemsArray {
        guard let id = element["id"]?.stringValue else {
            return .failure(.missingParameter("items[].id"))
        }
        guard let text = element["text"]?.stringValue else {
            return .failure(.missingParameter("items[].text"))
        }
        guard let statusString = element["status"]?.stringValue else {
            return .failure(.missingParameter("items[].status"))
        }
        guard let status = TodoStatus(rawValue: statusString) else {
            return .failure(.executionFailed("Invalid status '\(statusString)' for item \(id)"))
        }
        todoItems.append(TodoItem(id: id, text: text, status: status))
    }

    do {
        try todoManager.update(items: todoItems)
        return .success(todoManager.render())
    } catch {
        return .failure(.executionFailed("\(error)"))
    }
}

The handler does the mechanical work of parsing JSON into typed values, then delegates to TodoManager.update() for validation. If everything passes, render() returns the formatted list that goes back to the model as a tool result. The dispatch map gains one line:

func executeTool(name: String, input: JSONValue) async -> Result<String, ToolError> {
    let handlers = [
        "bash": executeBash,
        "read_file": executeReadFile,
        "write_file": executeWriteFile,
        "edit_file": executeEditFile,
        "todo": executeTodo  // one new entry
    ]
    ...
}

And the render() method produces output the model can read at a glance:

public func render() -> String {
    if items.isEmpty {
        return "No todos."
    }

    let completedCount = items.filter { $0.status == .completed }.count
    var lines = items.map { "\($0.status.marker) \($0.text)" }
    lines.append("(\(completedCount)/\(items.count) completed)")

    return lines.joined(separator: "\n")
}

With that in place, the agent has a self-managed planning tool. A rendered todo list looks like [x] Add type hints / [>] Extract helper / [ ] Update docstring / [ ] Run linter / (1/4 completed) — that string appears as a tool result near the end of the context, exactly where we want the plan to live. And just like every tool before it, adding todo required zero changes to the loop itself.

The nag system: reminding the agent to plan

Having a todo tool is necessary but not sufficient. The model might simply not call it — especially as the conversation grows and the system prompt instruction to “use the todo tool” fades. We need a gentle mechanism that nudges the agent back toward planning when it drifts.

The approach is a turn counter. Every time the agent loop processes tool calls, we check whether any of them was todo. If not, we increment turnsWithoutTodo. If the counter hits a threshold (three turns) and there are still open items, we inject a short reminder into the tool results:

// Sources/Core/Agent.swift — inside run()
var turnsWithoutTodo = 0

while true {
    // ... API call, check stopReason ...

    var results: [ContentBlock] = []
    var didUseTodo = false

    for case .toolUse(let id, let name, let input) in response.content {
        let toolResult = await executeTool(name: name, input: input)

        if name == "todo" {
            didUseTodo = true
        }
        // ... append result to results ...
    }

    turnsWithoutTodo = didUseTodo ? 0 : turnsWithoutTodo + 1
    if turnsWithoutTodo >= Self.todoReminderThreshold && todoManager.hasOpenItems() {
        results.append(.text("Update your todos."))
    }

    messages.append(Message(role: .user, content: results))
}

A few things to note about the placement. The turnsWithoutTodo counter is a local variable inside run(), not an instance property. It only matters within a single user query — when the user types a new prompt, the counter resets naturally. The messages array, by contrast, stays on the Agent instance so conversation history persists across REPL turns.

The reminder is appended to the results array, after all tool results. During development, we initially inserted it at position zero — before the tool results. That’s risky because the Anthropic API expects tool_result blocks to come first in a user message that responds to tool use. Appending the text reminder after all results is the safer ordering.

There’s also a subtlety with didUseTodo: it’s set to true whenever the model calls the todo tool, regardless of whether the call succeeds or fails. Ideally, a failed todo call (say, with invalid data) shouldn’t reset the nag counter — the agent didn’t actually update its plan. The current implementation is a pragmatic compromise; gating on success would add complexity for a rare edge case.

Taking it for a spin

Let’s build and run:

swift build && swift run agent

Try a multi-step task: Refactor the file Package.swift: first read it, then add a comment header, then verify it still compiles. Watch for the agent to call todo early to lay out the plan, then update statuses as it works through each step. If the agent skips the todo tool for three turns, the "Update your todos." reminder should appear in the output.

The nag only fires when there are open items — if the agent never calls todo in the first place, there’s nothing to nag about. To see the reminder in action, try a task complex enough that the agent creates a todo list early, then gets absorbed in the work: Create a Swift package with three modules: a networking library, a models library that depends on it, and a CLI that ties them together. Include a Package.swift with the dependency graph. Watch for the agent to lay out the plan with todo, then start building — after three turns of file operations without updating the list, the reminder appears.

What we’ve built and where we’re going

We now have an agent that can track its own work. The todo tool gives the model a structured notepad — a place to write down the plan, mark items as in-progress, and check them off as they’re completed. The nag system ensures the plan doesn’t get abandoned as the conversation grows. Together, they’re a lightweight counter-measure to the instruction-following decay that makes long agent sessions drift.

The mechanism is simple — a class with validation rules, one tool handler, and a turn counter — but it addresses a real problem that gets worse as agents take on larger tasks. The loop itself didn’t change; we added one entry to the dispatch dictionary and a counter with an injection point after tool processing. The pattern holds: the loop is the invariant, tools are the variable. Nag reminders work well here, but an interesting question arises when we start running child agents: the TodoManager is shared state on the same Agent instance. If a subagent runs, should it nag about the parent’s todos? We’ll tackle that in the next guide when we build subagents and introduce LoopConfig to control per-loop behavior. Thanks for reading!