Our agent can plan multi-step work with a persistent task DAG, compress its own memory, delegate to subagents, and load skills on demand — all driven by the same while true kernel from the first guide. But every tool call still blocks. When the model calls bash to run a test suite that takes two minutes, the loop sits idle, waiting for the process to finish before it can do anything else. If someone asks “run the tests and while that’s going, create the config file,” the agent does them sequentially — tests first, config second. For fast commands this doesn’t matter. For builds, installs, and test suites, it’s a real bottleneck.
The fix is a background execution layer: a way to hand a slow command to a worker, get a job ID back immediately, and keep the loop moving. When the command finishes, its result goes into a notification queue. Before each API call, the loop drains that queue and injects the results as messages — so the model sees them on its next turn without ever having blocked. The loop stays synchronous; only the subprocess I/O runs in parallel.
In this guide, let’s build BackgroundManager — the first and only actor in the entire codebase — and wire it into the agent loop with a notification injection pattern that keeps background results flowing into the model’s context.
The complete source code for this stage is available at the 08-background-tasks tag on GitHub. Code blocks below show key excerpts.
Why an actor — and only one
Every other manager in our codebase — TodoManager, TaskManager, SkillLoader, ContextCompactor — is accessed exclusively from the agent loop’s sequential flow. The loop calls a tool handler, the handler calls the manager, the manager returns, the loop continues. There’s never a moment where two pieces of code touch the same state simultaneously.
BackgroundManager breaks that pattern. When the model calls background_run, the manager spawns a Task {} that runs the shell command asynchronously. That task might finish thirty seconds later — while the main loop is in the middle of an API call, processing other tool results, or even draining the notification queue. The task writes to jobs and notifications; the main loop reads from them. Two isolated execution contexts mutating the same dictionaries. That’s a textbook data race, and it’s exactly what Swift’s actor keyword exists to prevent:
// Sources/Core/BackgroundManager.swift
public actor BackgroundManager {
    private let executor: ShellExecutor
    private var jobs: [String: BackgroundJob] = [:]
    private var notifications: [BackgroundNotification] = []
    private var runningTasks: [String: Task<Void, Never>] = [:]

    public init(executor: ShellExecutor) {
        self.executor = executor
    }
}
The actor keyword means every access to jobs, notifications, and runningTasks is serialized by the compiler. No locks, no dispatch queues — the concurrency safety is structural. And because ShellExecutor is a Sendable struct with only a let stored property, it can be safely captured by the actor without any bridging.
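The executor itself isn't shown in this guide. A simplified sketch of its shape, under the assumption stated above (one immutable stored property, hence trivially Sendable) — the property name is hypothetical, and the real implementation's timeout handling is covered later in this guide:

```swift
import Foundation

public struct ShellResult: Sendable {
    public let exitCode: Int32
    public let formatted: String
}

// Sketch only: a single `let` property means no mutable state to race on,
// so the struct crosses the actor boundary without bridging.
public struct ShellExecutor: Sendable {
    public let shellPath: String  // hypothetical name

    public func execute(_ command: String, timeout: TimeInterval) async throws -> ShellResult {
        // Simplified: no timeout enforcement here (see the timer discussion below).
        let process = Process()
        process.executableURL = URL(fileURLWithPath: shellPath)
        process.arguments = ["-c", command]
        let pipe = Pipe()
        process.standardOutput = pipe
        process.standardError = pipe
        try process.run()
        process.waitUntilExit()
        let data = pipe.fileHandleForReading.readDataToEndOfFile()
        return ShellResult(
            exitCode: process.terminationStatus,
            formatted: String(decoding: data, as: UTF8.self)
        )
    }
}
```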
The supporting types are straightforward value types. BackgroundJob tracks a command’s lifecycle — its ID, preview text, status, and eventual result:
public struct BackgroundJob: Sendable, Equatable {
    public let id: String
    public let command: String
    public let commandPreview: String
    public var status: BackgroundJobStatus
    public var result: String?
}
BackgroundNotification is the message format that flows from the background into the agent loop — a snapshot of what happened:
public struct BackgroundNotification: Sendable, Equatable {
    public let jobId: String
    public let status: BackgroundJobStatus
    public let command: String
    public let result: String
}
Both are Sendable and cross the actor isolation boundary cleanly.
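The status enum isn't shown in the excerpts; a plausible definition, inferred from the cases the lifecycle code uses and from the fact that the drain step renders status.rawValue into message text (the raw values themselves are an assumption):

```swift
// Inferred sketch: case names come from the lifecycle code in this guide;
// the string raw values are an assumption based on the rawValue usage below.
public enum BackgroundJobStatus: String, Sendable, Equatable {
    case running
    case completed
    case error
    case timeout
}
```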
Job lifecycle: dispatch, execute, notify
Let’s walk through what happens when the model calls background_run. The run() method creates a job record, spawns a Task {} to execute the command, and returns immediately with a confirmation string:
public func run(
    command: String,
    timeout: TimeInterval = Limits.backgroundTimeout
) -> String {
    let jobId = String(UUID().uuidString.prefix(8)).lowercased()
    let commandPreview = String(command.prefix(Limits.backgroundCommandPreview))
    jobs[jobId] = BackgroundJob(
        id: jobId, command: command, commandPreview: commandPreview, status: .running
    )
    let task = Task {
        let status: BackgroundJobStatus
        let output: String
        do {
            let result = try await self.executor.execute(command, timeout: timeout)
            if result.exitCode != 0 {
                status = .error
            } else {
                status = .completed
            }
            output = result.formatted
        } catch ShellExecutorError.timeout {
            status = .timeout
            output = "Error: Timeout (\(Int(timeout))s)"
        } catch {
            status = .error
            output = "Error: \(error)"
        }
        self.complete(jobId: jobId, status: status, output: output)
    }
    runningTasks[jobId] = task
    return "Background job \(jobId) started: \(commandPreview)"
}
Note that the Task {} inside the actor inherits the actor’s isolation — self.executor, self.complete(), and self.runningTasks are all accessible directly. Actors don’t support [weak self] captures (and don’t need them — the actor’s lifetime is managed by its owner, not by individual tasks). The task calls self.complete() when the shell command finishes, which updates the job status and enqueues a notification:
private func complete(
    jobId: String,
    status: BackgroundJobStatus,
    output: String
) {
    jobs[jobId]?.status = status
    jobs[jobId]?.result = output
    notifications.append(
        BackgroundNotification(
            jobId: jobId,
            status: status,
            command: jobs[jobId]?.commandPreview ?? "",
            result: String(output.prefix(Limits.backgroundResultPreview))
        )
    )
    runningTasks.removeValue(forKey: jobId)
}
The notification carries a truncated preview of the output — enough for the model to understand what happened without flooding the context. The full result is stored in jobs[jobId]?.result for retrieval via background_check.
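The check side isn't excerpted in this guide. A minimal sketch of what the query behind background_check might look like on the actor — the method name, signature, and output formatting here are assumptions, not the codebase's actual API:

```swift
extension BackgroundManager {
    /// Hypothetical sketch of the query behind background_check.
    /// Returns the full (untruncated) result for one job, or a one-line
    /// status summary for every known job when no ID is given.
    public func check(jobId: String? = nil) -> String {
        if let jobId {
            guard let job = jobs[jobId] else { return "Unknown job: \(jobId)" }
            return "[\(job.id)] \(job.status.rawValue): \(job.result ?? "(still running)")"
        }
        return jobs.values
            .map { "[\($0.id)] \($0.status.rawValue): \($0.commandPreview)" }
            .joined(separator: "\n")
    }
}
```

Because check() runs inside the actor, it sees a consistent snapshot: it can never observe a job whose status has been updated but whose result hasn't been written yet.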
Draining the queue is a single atomic operation — read everything, then clear:
public func drainNotifications() -> [BackgroundNotification] {
    let result = notifications
    notifications.removeAll()
    return result
}
Because this runs inside the actor, the read-and-clear is serialized with respect to complete(). A notification can never be half-written when we drain, and it can never be lost between the read and the clear.
Notification injection: bridging background to model
The background manager accumulates results, but the model can’t see them until they’re injected into the messages array. That injection happens in drainBackgroundNotifications, which runs in the agent loop before each API call:
func drainBackgroundNotifications(_ messages: [Message]) async -> [Message] {
    let notifications = await backgroundManager.drainNotifications()
    guard !notifications.isEmpty else {
        return messages
    }
    let text = notifications
        .map { "[bg:\($0.jobId)] \($0.status.rawValue): \($0.result)" }
        .joined(separator: "\n")
    var result = messages
    let wrappedText = "<background-results>\n\(text)\n</background-results>"
    if let lastMessage = result.last, lastMessage.role == .user {
        var updatedContent = lastMessage.content
        updatedContent.append(.text(wrappedText))
        result[result.count - 1] = Message(role: .user, content: updatedContent)
    } else {
        result.append(.user(wrappedText))
    }
    result.append(.assistant("Noted background results."))
    return result
}
The <background-results> XML wrapper gives the model a clear signal that these are asynchronous completions, not user input. The if let lastMessage check handles the API’s alternation requirement — if the last message is already a user message (which it is after tool results are appended), the background results get appended to that message’s content rather than creating a consecutive user turn. A synthetic assistant acknowledgment follows so the next user message has a proper assistant turn before it.
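The merge rule is easy to verify in isolation. A stripped-down model of it, where Msg is a toy stand-in rather than the codebase's real Message type:

```swift
// Toy model of the alternation rule. `Msg` is a stand-in for illustration,
// not the real Message type from the codebase.
enum Role { case user, assistant }
struct Msg { var role: Role; var texts: [String] }

func inject(_ wrapped: String, into messages: [Msg]) -> [Msg] {
    var result = messages
    if let last = result.last, last.role == .user {
        // Fold into the existing user turn instead of creating a
        // second consecutive user message.
        result[result.count - 1].texts.append(wrapped)
    } else {
        result.append(Msg(role: .user, texts: [wrapped]))
    }
    // Synthetic acknowledgment keeps user/assistant alternation intact
    // for whatever user message comes next.
    result.append(Msg(role: .assistant, texts: ["Noted background results."]))
    return result
}
```

After tool processing the last message is always a user turn (it holds the tool results), so in practice the fold-in branch is the common path.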
The loop wires it alongside compaction, both running before the API call:
while true {
    try Task.checkCancellation()
    // ...
    messages = await applyCompaction(messages)
    if config.drainBackground {
        messages = await drainBackgroundNotifications(messages)
    }
    let request = APIRequest(
        model: model, maxTokens: Limits.defaultMaxTokens,
        system: systemPrompt, messages: messages, tools: config.tools
    )
    let response = try await apiClient.createMessage(request: request)
    // ... process tools, append results, continue
}
That config.drainBackground flag is the subagent guard. During development, an early version ran drainBackgroundNotifications in every agentLoop call — including subagent loops. A subagent running a quick research task would consume background notifications meant for the main agent. The results were gone before the main loop ever saw them. The fix: LoopConfig.default sets drainBackground: true, while LoopConfig.subagent sets it to false:
static let `default` = LoopConfig(
    tools: Agent.toolDefinitions,
    maxIterations: .max,
    enableNag: true,
    drainBackground: true,
    label: "agent"
)

static let subagent = LoopConfig(
    tools: Agent.toolDefinitions.filter {
        !subagentExcludedTools.contains($0.name)
    },
    maxIterations: 30,
    enableNag: false,
    drainBackground: false,
    label: "subagent"
)
The subagent’s excluded tools now also include background_run and background_check — a subagent shouldn’t be able to spawn background work at all, since it can’t drain the results.
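An illustrative shape for that exclusion set — only the two background entries are confirmed by this guide; whatever earlier stages contributed is not listed here:

```swift
// Illustrative sketch: only the two background entries are stated in
// this guide. Earlier stages contribute additional exclusions that are
// intentionally not reproduced here.
let subagentExcludedTools: Set<String> = [
    "background_run",    // a subagent never drains notifications,
    "background_check",  // so it must not create or inspect background jobs
]
```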
With that in place, our agent can hand off slow commands and keep working. Two new entries in the dispatch dictionary, one new actor, and a three-line drain check in the loop — the background execution layer is complete.
The Linux SIGTERM saga
The ShellExecutor gained a timeout parameter for background commands (defaulting to 300 seconds). The timeout mechanism uses DispatchSource.makeTimerSource() — a GCD timer that fires once after the deadline and terminates the process. An earlier design considered Task.sleep, but there’s a subtle problem: try? await Task.sleep(for:) swallows CancellationError. When the process finishes normally and the sleep task is cancelled, execution falls through past the sleep, sets a timeout flag, and kills a process that already exited. DispatchSource avoids this entirely — timer.cancel() is synchronous and guaranteed to prevent the handler from firing.
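A minimal sketch of that timer mechanism as described: fire once after the deadline, signal the process, and cancel synchronously if the process exits first. The function name and queue choice are assumptions, and interrupt() reflects the SIGINT fix explained below:

```swift
import Foundation

// Sketch of the timeout timer described above (names are assumptions).
// The timer fires exactly once after `timeout` seconds; if the process
// exits first, a synchronous `cancel()` on the returned source guarantees
// the handler never runs — there is no CancellationError to swallow.
func makeTimeoutTimer(for process: Process, timeout: TimeInterval) -> DispatchSourceTimer {
    let timer = DispatchSource.makeTimerSource(queue: .global())
    timer.schedule(deadline: .now() + timeout)
    timer.setEventHandler {
        if process.isRunning {
            process.interrupt()  // SIGINT, forwarded to the child on both platforms
        }
    }
    timer.resume()
    return timer
}
```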
The first version called process.terminate() in the timer handler — SIGTERM. On macOS, this worked perfectly: bash received SIGTERM, the child process died, the timeout was detected. Then the Linux build ran, and three timeout scenarios broke.
The root cause: macOS and Linux bash handle SIGTERM differently when waiting on a foreground child. macOS bash exits promptly. Linux bash defers the signal until the child process finishes on its own. A bash -c "sleep 10" with a 2-second timeout would run for the full 10 seconds on Linux because bash ignored the SIGTERM while waiting for sleep.
The fix came in two parts. First, process.interrupt() replaced process.terminate() — SIGINT instead of SIGTERM. Bash forwards SIGINT to the child process group on both platforms. Second, the timeout detection itself changed from signal-based to elapsed-time:
let startTime = DispatchTime.now()
// ... process runs, timer fires interrupt() if needed ...
process.waitUntilExit()
timer?.cancel()

if let timeout {
    let elapsedSeconds =
        Double(DispatchTime.now().uptimeNanoseconds - startTime.uptimeNanoseconds)
        / 1_000_000_000
    if elapsedSeconds >= timeout {
        throw ShellExecutorError.timeout(seconds: Int(timeout))
    }
}
The original detection checked process.terminationReason == .uncaughtSignal && process.terminationStatus == SIGTERM — a check that was fragile across platforms and depended on bash’s specific signal-handling behavior. Wall-clock comparison is platform-independent and unambiguous: if the process took longer than the timeout, it was terminated.
Taking it for a spin
Let’s build and run:
swift build && swift run claude
Try: Run "sleep 5 && echo done" in the background, then create a file called hello.txt with "world" in it. Watch the tool calls — the agent should call background_run, get a job ID back immediately, then proceed to create the file without waiting. A few seconds later, when the sleep finishes, the [background] 1 result(s) injected message should appear as the drain fires before the next API call.
For something more realistic: Run the test suite in the background with "swift test" and while it runs, read Package.swift and summarize the dependencies. The agent works on the summary while the tests execute in parallel. When the tests finish, the results appear in the model’s next context window.
To see the check tool: Start three background tasks: "sleep 2", "sleep 4", "sleep 6". Then check all background jobs. The first check should show a mix of completed and running jobs. A second check a few seconds later should show all three completed.
The capstone: 14 tools, one loop
We’ve reached the end of the series. Let’s take stock of what we’ve built.
The agent now has 14 tools across eight stages: bash, read_file, write_file, edit_file, todo, agent, load_skill, compact, task_create, task_update, task_list, task_get, background_run, background_check. It can run shell commands, manipulate files, track its own work, delegate to subagents, load specialized knowledge, compress its memory, plan with a dependency graph, and execute slow commands in the background. The Agent type grew from a placeholder caseless enum in stage 0 to 849 lines — and the while true loop at its center is structurally unchanged from the first guide.
That’s the thesis we set out to test. Claude Code’s effectiveness comes from architectural restraint: a small set of excellent tools, thin orchestration, and heavy reliance on the model. The loop is the invariant — API call, check stop reason, process tool uses, append results, repeat. Every new capability was added the same way: define the tool, write the handler, add an entry to the dispatch dictionary. The loop never needed a new branch, a new state machine, or a different control flow. New behaviors arrived as injection points around the loop — nag reminders after tool processing, compaction before the API call, background drain alongside it — but the kernel itself held steady.
The one actor in the codebase exists because one type genuinely needed concurrent access to shared state. Everything else — classes, structs, enums — uses the simplest concurrency model that works. No over-architecture, no speculative abstractions, no framework. Just a loop, a dictionary, and a model that knows what to do with the tools it’s given. Thanks for reading!