Building a Coding Agent in Swift, Part 5: Skill Loading

Our agent can run commands, read and write files, track its own work, and delegate tasks to subagents. That’s a solid toolkit — but everything the agent knows comes from either the model’s training data or the contents of files it reads during a session. Ask it to follow a specific git commit convention, a code review checklist, or a deployment workflow, and it has nothing to draw on. We could stuff all of that knowledge into the system prompt, but that’s wasteful: ten domain-specific guides at roughly 2,000 tokens each would add 20,000 tokens to every single API call, most of which would be irrelevant to the task at hand.

The fix is a two-layer injection strategy. Layer one is cheap: a one-line description of each available skill, embedded in the system prompt. The model sees what’s available at a glance — maybe 100 tokens per skill. Layer two is expensive but on-demand: when the model decides it actually needs a skill, it calls a tool that returns the full body as a tool result. The knowledge arrives exactly when it’s useful, and only the skills the model asks for consume context.

In this guide, let’s build a SkillLoader that scans the filesystem for skill files, a buildSystemPrompt function that injects their names, and a load_skill tool that delivers their full content. This is the midpoint of our series — after this stage, the agent can run commands, manipulate files, plan its work, delegate tasks, and load new knowledge on demand.

The complete source code for this stage is available at the 05-skill-loading tag on GitHub. Code blocks below show key excerpts.

What a skill looks like on disk

Each skill lives in its own subdirectory under skills/, with a single SKILL.md file. The file uses YAML frontmatter — a name, a description — followed by the full body of knowledge. Here’s the example skill that ships with the project:

skills/
  example/
    SKILL.md
  code-review/
    SKILL.md

And the contents of a SKILL.md:

---
name: example
description: An example skill demonstrating the skill file format
---

This is a sample skill file. Skills are stored in `skills/{name}/SKILL.md` and
provide specialized knowledge that the agent can load on demand via the
`load_skill` tool.

The frontmatter is the cheap part — the description feeds into the system prompt. The body below the closing --- is the expensive part — it only reaches the model when explicitly requested. A skill for code review might have a three-word description but a 2,000-token body with detailed checklists, severity rubrics, and formatting conventions. The agent pays for those tokens only when it’s actually doing a code review.

Two layers, two costs

The architecture breaks down into a clear division of labor. At init time, SkillLoader scans the skills/ directory and parses every SKILL.md it finds. The parsed descriptions flow into buildSystemPrompt, which appends a short menu to the system prompt — something like:

Skills available:
  - code-review: Review code for bugs, style issues, and best practices
  - example: An example skill demonstrating the skill file format

That’s layer one. Every API call includes it, but it’s tiny — a few lines of text that tell the model what knowledge is available.

Layer two is the load_skill tool. When the model calls load_skill(name: "code-review"), the handler returns the full body wrapped in <skill> tags. That content arrives as a tool result — fresh context near the end of the messages array, exactly where the model pays the most attention. The model asked for it, so it’s relevant. And because it’s a tool result rather than part of the system prompt, it only appears in the one turn that needed it.

Scanning and parsing

Let’s walk through SkillLoader. The type holds a dictionary of parsed skills, populated once at init time and never mutated afterward:

// Sources/Core/SkillLoader.swift
public struct SkillLoader: Sendable {
  public struct Skill: Sendable {
    public let name: String
    public let description: String
    public let body: String
  }

  private let skills: [String: Skill]
}

The Skill struct captures exactly three things: the name (used as a lookup key), the description (injected into the system prompt), and the body (returned by the tool).

The initializer scans the skills directory, silently handling the case where it doesn’t exist:

public init(directory: String) {
  let fileManager = FileManager.default
  var loadedSkills: [String: Skill] = [:]

  var isDirectory: ObjCBool = false
  guard
    fileManager.fileExists(atPath: directory, isDirectory: &isDirectory),
    isDirectory.boolValue
  else {
    self.skills = [:]
    return
  }

  let contents = (try? fileManager.contentsOfDirectory(atPath: directory)) ?? []
  for entry in contents {
    let skillFile = "\(directory)/\(entry)/SKILL.md"
    guard
      fileManager.fileExists(atPath: skillFile),
      let text = try? String(contentsOfFile: skillFile, encoding: .utf8)
    else {
      continue
    }

    let (meta, body) = Self.parseFrontmatter(text)
    let skillName = meta["name"] ?? entry
    guard let description = meta["description"] else {
      continue
    }

    loadedSkills[skillName] = Skill(
      name: skillName,
      description: description,
      body: body.trimmingCharacters(in: .whitespacesAndNewlines)
    )
  }

  self.skills = loadedSkills
}

The init walks each subdirectory looking for a SKILL.md file. If the frontmatter specifies a name, that’s the key; otherwise, the directory name is used as a fallback. Skills without a description are silently skipped — the description is what makes layer one work, so a skill without one has nothing to advertise. The try? on contentsOfDirectory and String(contentsOfFile:) means a permissions error on one skill doesn’t prevent the rest from loading.

The frontmatter parser is a straightforward line-by-line scan — no regex, no YAML library:

private static func parseFrontmatter(_ text: String) -> (meta: [String: String], body: String) {
  let lines = text.components(separatedBy: "\n")

  guard
    let firstLine = lines.first,
    firstLine.trimmingCharacters(in: .whitespaces) == "---"
  else {
    return (meta: [:], body: text)
  }

  var meta: [String: String] = [:]
  var closingIndex: Int?

  for index in 1..<lines.count {
    let line = lines[index]
    if line.trimmingCharacters(in: .whitespaces) == "---" {
      closingIndex = index
      break
    }

    if let colonRange = line.range(of: ":") {
      let key = String(line[line.startIndex..<colonRange.lowerBound])
        .trimmingCharacters(in: .whitespaces)
      let value = String(line[colonRange.upperBound...])
        .trimmingCharacters(in: .whitespaces)
      if !key.isEmpty {
        meta[key] = value
      }
    }
  }

  guard let closing = closingIndex else {
    return (meta: [:], body: text)
  }

  let bodyLines = Array(lines[(closing + 1)...])
  let body = bodyLines.joined(separator: "\n")
  return (meta: meta, body: body)
}

If the file doesn’t start with ---, the entire text is treated as the body with no metadata — a graceful fallback for plain markdown files. If the opening delimiter exists but the closing one is missing, the same fallback applies. Only when both delimiters are present does the parser extract key-value pairs from the lines between them.

The two public accessors provide what each layer needs. The descriptions property produces the compact menu for the system prompt, sorted alphabetically for deterministic output:

public var descriptions: String {
  guard !skills.isEmpty else {
    return ""
  }

  return skills.values
    .sorted { $0.name < $1.name }
    .map { "  - \($0.name): \($0.description)" }
    .joined(separator: "\n")
}

And content(for:) delivers the full body wrapped in <skill> tags, with a helpful error message listing available skills if the name doesn’t match:

public func content(for name: String) -> String {
  if let skill = skills[name] {
    return "<skill name=\"\(name)\">\n\(skill.body)\n</skill>"
  }

  if skills.isEmpty {
    return "Unknown skill '\(name)'. No skills are available."
  }

  let available = skills.keys.sorted().joined(separator: ", ")
  return "Unknown skill '\(name)'. Available skills: \(available)"
}

The <skill> tag wrapping is a small but deliberate choice — it gives the model a clear signal that the content is structured knowledge, distinct from a regular tool output. When the model sees <skill name="code-review">...</skill> in a tool result, it knows exactly what it’s looking at.

Wiring into the agent

With SkillLoader ready, let’s connect it to the agent. The buildSystemPrompt method gains a skillDescriptions parameter that conditionally appends the skill menu:

// Sources/Core/Agent.swift
public static func buildSystemPrompt(cwd: String, skillDescriptions: String = "") -> String {
  var prompt = """
    You are a coding agent at \(cwd). Use tools to solve tasks. \
    Act, don't explain.

    - Prefer read_file/write_file/edit_file over bash for file operations
    - Always check tool results before proceeding
    - Use the todo tool to plan multi-step tasks. Mark in_progress before starting, completed when done.
    """

  if !skillDescriptions.isEmpty {
    prompt += "\nUse load_skill to access specialized knowledge.\n\nSkills available:\n\(skillDescriptions)"
  }

  return prompt
}

The empty-string default means callers that don’t have skills — including the static .subagent config and any existing code — work without changes. When skills are present, the prompt gains two things: a behavioral instruction (“Use load_skill to access specialized knowledge”) and the skill menu itself.

In Agent.init, the SkillLoader is created before the system prompt, since its descriptions output feeds into the prompt:

self.skillLoader = SkillLoader(directory: skillsDirectory ?? "\(workingDirectory)/skills")
self.systemPrompt =
  systemPrompt
  ?? Self.buildSystemPrompt(
    cwd: workingDirectory,
    skillDescriptions: self.skillLoader.descriptions
  )

The load_skill tool handler is the simplest in the codebase — a single guard and a return:

private func executeLoadSkill(_ input: JSONValue) async -> Result<String, ToolError> {
  guard let name = input["name"]?.stringValue else {
    return .failure(.missingParameter("name"))
  }
  return .success(skillLoader.content(for: name))
}

And the dispatch map gains one entry:

let handlers = [
  "bash": executeBash,
  "read_file": executeReadFile,
  "write_file": executeWriteFile,
  "edit_file": executeEditFile,
  "todo": executeTodo,
  "agent": executeAgent,
  "load_skill": executeLoadSkill
]

With that in place, we now have an agent that discovers knowledge at startup and delivers it on demand. The load_skill tool is automatically available to subagents too — the denylist in LoopConfig.subagent only excludes agent and todo, so a child agent can load skills independently during a delegated task.

Taking it for a spin

Let’s build and run:

swift build && swift run agent

Create a skills/ directory in the working folder with a custom skill — say, skills/git-workflow/SKILL.md containing frontmatter with a description and a body with commit conventions. Then try: What skills do you have available? The agent should list the skills it found at startup.

For something more interesting, try: Load the example skill and tell me what format skill files use. Watch the tool calls — the model should call load_skill with name: "example", receive the full body in <skill> tags, and summarize the format. The system prompt told it the skill existed; the tool delivered the content.

To see the economics in action, try a session where the agent handles a task that doesn’t need skills — just file operations and bash commands. The skill descriptions in the system prompt add a few lines of overhead, but the full bodies never appear. That’s the payoff of the two-layer approach.

The midpoint: everything clicks together

Let’s take stock of where we are. Over five stages, we’ve built an agent that can run shell commands, read and write files, edit code, track its own work with a todo list, delegate subtasks to child agents, and now load specialized knowledge on demand. Seven tools, one loop. The agent loop itself — API call, check stop reason, process tools, append results — hasn’t changed since the first guide. Every new capability has been a new entry in the dispatch dictionary, a new handler method, and sometimes a new injection point before or after tool processing.

That’s the thesis in action: the loop is the invariant, tools are the variable. SkillLoader is a particularly clean example — the entire feature is a struct that scans a directory, a static function that generates a prompt, and a three-line tool handler. No changes to agentLoop, no changes to processToolUses, no changes to LoopConfig. Skills bloat the context by design — every load_skill call adds a full body to the messages array, and it stays there for the rest of the session. In the next guide, we’ll tackle that directly with context compaction: a three-layer compression strategy that lets the agent run indefinitely without hitting the context window ceiling. Thanks for reading!