ClaudeIntermediateFreeEditorial

Skill Creator

Create, improve, and benchmark Claude Agent Skills — draft a SKILL.md, run evals with variance analysis, and optimize the description for better triggering.

No reviews
A
Anthropic
146,09730

Overview

The skill-creator skill is the official Anthropic Agent Skill for creating new Agent Skills, improving existing ones, and measuring their performance. It walks Claude through the full authoring loop — draft a SKILL.md, write realistic test prompts, run evals, benchmark results with variance analysis, and optimize the skill description for better triggering accuracy.

What is skill-creator?

skill-creator is a meta-skill: a skill for making skills. It encodes Anthropic's recommended workflow for skill development, including the anatomy of a skill (SKILL.md plus optional scripts/, references/, and assets/ folders), the three-level progressive-disclosure loading model, and the eval-driven iteration loop. It bundles scripts such as aggregate_benchmark.py, the eval-viewer/generate_review.py review tool, run_loop.py for description optimization, and package_skill.py for producing a distributable .skill file.

What it does

  • Creates skills from scratch — captures intent, interviews you on edge cases, and writes the SKILL.md frontmatter (name, description) and body.
  • Improves existing skills — snapshots the current version, reruns evals against it, and applies feedback-driven edits.
  • Runs evals — spawns with-skill and baseline runs on the same test prompts, then grades each against assertions.
  • Benchmarks with variance analysis — aggregates pass_rate, time, and token usage as mean ± stddev with deltas, then surfaces flaky or non-discriminating evals.
  • Optimizes the description — runs a train/held-out loop that evaluates each candidate description three times per query and selects the best by test score to avoid overfitting.

How it works

The skill meets you wherever you are in the process. For a new skill it helps narrow the idea, drafts the skill, writes 2-3 realistic test prompts, runs Claude with and without the skill, grades the outputs, and launches an HTML review viewer (generate_review.py) so you can inspect results and leave feedback. It captures total_tokens and duration_ms from each run into timing.json, aggregates a benchmark.json, and iterates. For triggering, it generates a 20-query eval set (should-trigger and should-not-trigger near-misses) and runs run_loop.py to propose and test improved descriptions. It adapts to Claude Code, Claude.ai, and Cowork environments where subagents or a browser may be unavailable.

Who it is for

Anyone building Agent Skills for Claude — from developers packaging a repeatable workflow to teams who need their skills to trigger reliably and prove their value with data. It is written to communicate clearly with users across a wide range of technical familiarity, explaining terms like evals and assertions when context suggests they are needed.

What you can build

  • A new SKILL.md with well-structured frontmatter and progressive disclosure.
  • An eval suite (evals/evals.json) with prompts and objectively verifiable assertions.
  • A quantitative benchmark comparing a skill against its baseline with variance.
  • An optimized, "pushy" description tuned for better triggering accuracy.
  • A packaged .skill file ready to install.

Why it matters

Most skills underperform because they are written once and never tested. skill-creator turns skill authoring into a measurable, iterative discipline: you see whether a skill actually beats the baseline, whether your evals discriminate, and whether the description triggers when it should. That feedback loop is what separates a skill that works on one example from one that works across a million invocations.

What's Included

  • SKILL.md guide covering skill anatomy and three-level progressive disclosure
  • Intent-capture and interview workflow for drafting a new skill
  • Eval workflow with with-skill and baseline runs plus assertion grading
  • aggregate_benchmark.py for pass_rate, time, and token stats with variance
  • eval-viewer/generate_review.py HTML viewer for qualitative and quantitative review
  • run_loop.py description optimizer with train/held-out splitting
  • package_skill.py to produce a distributable .skill file

Installation

1. Add the skill (skills CLI)

bash
npx skills add anthropics/skills --skill skill-creator

2. Or install via the Claude Code plugin marketplace

bash
/plugin marketplace add anthropics/skills
/plugin install example-skills@anthropic-agent-skills

3. Use it

Tell your agent what you want to build, for example: "Use the skill-creator skill to help me build a skill that converts our internal RFC markdown into a formatted PDF, and set up evals to test it."

Requirements

  • Claude Code or Claude.ai with Agent Skills support
  • Python 3 for the bundled benchmarking and packaging scripts
  • For description optimization: the claude CLI (claude -p), available in Claude Code

Changelog

v1.0.02026-06-01

Initial listing of the official Anthropic skill-creator skill.

FAQs

Reviews

Sign in to leave a rating and review.

No reviews yet. Be the first to review this skill!