What does the skill-creator skill do?

It guides Claude through creating, improving, and measuring Agent Skills — drafting a SKILL.md, writing and running evals, benchmarking performance with variance analysis, and optimizing the description for reliable triggering.

Can it improve a skill I already have?

Yes. It snapshots your current skill, reruns the test cases against that baseline, and applies feedback-driven edits, so you can see whether the new version actually beats the old one.

How does the eval and benchmarking step work?

It spawns with-skill and baseline runs on the same prompts, grades each against assertions, and aggregates pass_rate, time, and token usage as mean ± stddev with deltas using aggregate_benchmark.py.

What is description optimization and why does it matter?

The description in SKILL.md frontmatter is the primary mechanism that decides whether Claude invokes a skill. The skill runs a loop that tests candidate descriptions against should-trigger and should-not-trigger queries and selects the best by held-out score to avoid overfitting.

Does it work on Claude.ai without subagents?

Yes. On Claude.ai it runs test cases one at a time and gathers qualitative feedback inline, skipping the subagent-based benchmarking and the CLI-only description optimizer.

Can it package a finished skill?

Yes. The bundled package_skill.py script produces a distributable .skill file you can install, and it runs anywhere with Python and a filesystem.

Which platforms does this skill support?

It targets Claude specifically — Claude Code, Claude.ai, the Claude API, and Cowork — since it builds Anthropic Agent Skills and uses Claude-specific tooling.

Is skill-creator free and open source?

It is free to install from the public anthropics/skills repository, and it is open source under the Apache 2.0 license (see LICENSE.txt).

ClaudeIntermediateFreeEditorial

Skill Creator

Create, improve, and benchmark Claude Agent Skills — draft a SKILL.md, run evals with variance analysis, and optimize the description for better triggering.

No reviews

Anthropic

164,1474120

Overview

The skill-creator skill is the official Anthropic Agent Skill for creating new Agent Skills, improving existing ones, and measuring their performance. It walks Claude through the full authoring loop — draft a SKILL.md, write realistic test prompts, run evals, benchmark results with variance analysis, and optimize the skill description for better triggering accuracy.

What is skill-creator?

skill-creator is a meta-skill: a skill for making skills. It encodes Anthropic's recommended workflow for skill development, including the anatomy of a skill (SKILL.md plus optional scripts/, references/, and assets/ folders), the three-level progressive-disclosure loading model, and the eval-driven iteration loop. It bundles scripts such as aggregate_benchmark.py, the eval-viewer/generate_review.py review tool, run_loop.py for description optimization, and package_skill.py for producing a distributable .skill file.

What it does

Creates skills from scratch — captures intent, interviews you on edge cases, and writes the SKILL.md frontmatter (name, description) and body.
Improves existing skills — snapshots the current version, reruns evals against it, and applies feedback-driven edits.
Runs evals — spawns with-skill and baseline runs on the same test prompts, then grades each against assertions.
Benchmarks with variance analysis — aggregates pass_rate, time, and token usage as mean ± stddev with deltas, then surfaces flaky or non-discriminating evals.
Optimizes the description — runs a train/held-out loop that evaluates each candidate description three times per query and selects the best by test score to avoid overfitting.

How it works

The skill meets you wherever you are in the process. For a new skill it helps narrow the idea, drafts the skill, writes 2-3 realistic test prompts, runs Claude with and without the skill, grades the outputs, and launches an HTML review viewer (generate_review.py) so you can inspect results and leave feedback. It captures total_tokens and duration_ms from each run into timing.json, aggregates a benchmark.json, and iterates. For triggering, it generates a 20-query eval set (should-trigger and should-not-trigger near-misses) and runs run_loop.py to propose and test improved descriptions. It adapts to Claude Code, Claude.ai, and Cowork environments where subagents or a browser may be unavailable.

Who it is for

Anyone building Agent Skills for Claude — from developers packaging a repeatable workflow to teams who need their skills to trigger reliably and prove their value with data. It is written to communicate clearly with users across a wide range of technical familiarity, explaining terms like evals and assertions when context suggests they are needed.

What you can build

A new SKILL.md with well-structured frontmatter and progressive disclosure.
An eval suite (evals/evals.json) with prompts and objectively verifiable assertions.
A quantitative benchmark comparing a skill against its baseline with variance.
An optimized, "pushy" description tuned for better triggering accuracy.
A packaged .skill file ready to install.

Why it matters

Most skills underperform because they are written once and never tested. skill-creator turns skill authoring into a measurable, iterative discipline: you see whether a skill actually beats the baseline, whether your evals discriminate, and whether the description triggers when it should. That feedback loop is what separates a skill that works on one example from one that works across a million invocations.

What's Included

SKILL.md guide covering skill anatomy and three-level progressive disclosure
Intent-capture and interview workflow for drafting a new skill
Eval workflow with with-skill and baseline runs plus assertion grading
aggregate_benchmark.py for pass_rate, time, and token stats with variance
eval-viewer/generate_review.py HTML viewer for qualitative and quantitative review
run_loop.py description optimizer with train/held-out splitting
package_skill.py to produce a distributable .skill file

Installation

1. Add the skill (skills CLI)

bash

npx skills add anthropics/skills --skill skill-creator

2. Or install via the Claude Code plugin marketplace

bash

/plugin marketplace add anthropics/skills
/plugin install example-skills@anthropic-agent-skills

3. Use it

Tell your agent what you want to build, for example: "Use the skill-creator skill to help me build a skill that converts our internal RFC markdown into a formatted PDF, and set up evals to test it."

Requirements

Claude Code or Claude.ai with Agent Skills support
Python 3 for the bundled benchmarking and packaging scripts
For description optimization: the claude CLI (claude -p), available in Claude Code

Changelog

v1.0.02026-06-01

Initial listing of the official Anthropic skill-creator skill.

FAQs

Reviews

No reviews yet. Be the first to review this skill!

Turn prompts into followers

Teach your AI new tricks

Learn AI, the practical way