Code Guide

5. Skills

New in March 2026: Anthropic’s Skill Creator update formalizes a taxonomy that changes how you design, test, and eventually retire skills. Sources: ainews.com, mexc.co, claudecode.jp — not yet reflected in the official llms-full.txt.

Not all skills age the same way. The type you’re building determines how you write it, how you test it, and when to retire it.

|  | Capability Uplift | Encoded Preference |
| --- | --- | --- |
| What it does | Fills a gap the base model can’t handle consistently | Sequences existing capabilities your team’s specific way |
| Examples | Precise PDF text placement, custom code patterns | NDA review checklist, weekly status update workflow |
| Durability | Fades as the model improves | Stays durable as long as the workflow is relevant |
| Retirement signal | Model passes the eval without the skill | Workflow changes or becomes irrelevant |
| Eval approach | A/B test: with vs. without the skill | Fidelity check: does it follow the sequence correctly? |

Capability Uplift teaches Claude something it genuinely can’t do well on its own — yet. High value today, but carries a maintenance debt: as Claude improves, these skills may become redundant. Evals tell you when that happens before a user does.

Encoded Preference encodes your team’s specific way of doing something Claude already knows how to do. An NDA review follows your legal team’s criteria, not a generic checklist. These skills don’t compete with model improvements — they capture workflow decisions that are yours to make, and stay relevant as long as your process does.

Practical implication: When building a Capability Uplift skill, budget time for evals. When building an Encoded Preference skill, budget time for keeping the workflow description accurate as your process evolves.

Skills are knowledge packages that agents can inherit.

| Concept | Purpose | Invocation |
| --- | --- | --- |
| Agent | Context isolation tool | Task tool delegation |
| Skill | Knowledge module | `/skill-name` or auto-loaded |
| Command | Process workflow | Slash command |
| Aspect | Commands | Skills | Agents |
| --- | --- | --- | --- |
| What it is | Prompt template | Knowledge module | Context isolation tool |
| Location | `.claude/commands/` | `.claude/skills/` | `.claude/agents/` |
| Invocation | `/command-name` | `/skill-name` or auto-loaded | Task tool delegation |
| Execution | In main conversation | Loaded into context | Separate subprocess |
| Context | Shares main context | Adds to agent context | Isolated context |
| Best for | Repeatable workflows | Reusable knowledge | Scope-limited analysis |
| Token cost | Low (template only) | Medium (knowledge loaded) | High (full agent) |
| Examples | `/commit`, `/pr`, `/ship` | TDD, security-guardian | security-audit, perf-audit |
```
Is this a repeatable workflow with steps?
├─ Yes → Use a COMMAND
│        Example: /commit, /release-notes, /ship
└─ No → Is this specialized knowledge multiple agents need?
   ├─ Yes → Use a SKILL
   │        Example: TDD methodology, security checklist
   └─ No → Does this need isolated context or parallel work?
      ├─ Yes → Use an AGENT
      │        Example: code-reviewer, performance-auditor
      └─ No → Just write it in CLAUDE.md as instructions
```

See also: §2.7 Configuration Decision Guide for a broader decision tree covering all seven mechanisms (including Hooks, MCP, and CLAUDE.md vs rules).

| Need | Solution | Example |
| --- | --- | --- |
| Run tests before commit | Command | `/commit` with test step |
| Security review knowledge | Skill + Agent | security-guardian skill → security-audit agent |
| Parallel code review | Multiple scope-focused agents | Launch 3 review agents with isolated scopes |
| Quick git workflow | Command | `/pr`, `/ship` |
| Architecture knowledge | Skill | architecture-patterns skill |
| Complex debugging | Agent | debugging-specialist agent |

Without skills:

Agent A: Has security knowledge (duplicated)
Agent B: Has security knowledge (duplicated)
Agent C: Has security knowledge (duplicated)

With skills:

security-guardian skill: Single source of security knowledge
Agent A: inherits security-guardian
Agent B: inherits security-guardian
Agent C: inherits security-guardian
| Good Skill | Bad Skill | Expected Lifespan |
| --- | --- | --- |
| Reusable across agents | Single-agent specific |  |
| Domain-focused | Too broad |  |
| Contains reference material | Just instructions |  |
| Includes checklists | Missing verification |  |
| Has evals defined | “Seems to work” validation | Capability Uplift: monitor regularly; Encoded Preference: stable |
| Clear retirement criteria | No lifecycle plan | Capability Uplift: short-medium; Encoded Preference: long |

Skills live in .claude/skills/{skill-name}/ directories.

```
skill-name/
├── SKILL.md          # Required - Main instructions
├── reference.md      # Optional - Detailed documentation
├── checklists/       # Optional - Verification lists
│   ├── security.md
│   └── performance.md
├── examples/         # Optional - Code patterns
│   ├── good-example.ts
│   └── bad-example.ts
└── scripts/          # Optional - Helper scripts
    └── audit.sh
```
```markdown
---
name: skill-name
description: Short description for activation (max 1024 chars)
allowed-tools: Read Grep Bash
---
```
| Field | Spec | Description |
| --- | --- | --- |
| `name` | agentskills.io | Lowercase, 1-64 chars, hyphens only, no `--`, must match directory name |
| `description` | agentskills.io | What the skill does and when to use it (max 1024 chars) |
| `allowed-tools` | agentskills.io | Space-delimited list of pre-approved tools. Supports wildcard scoping: `Bash(npm run *)`, `Bash(agent-browser:*)`, `Edit(/docs/**)` |
| `license` | agentskills.io | License name or reference to bundled file |
| `compatibility` | agentskills.io | Environment requirements (max 500 chars) |
| `metadata` | agentskills.io | Arbitrary key-value pairs (author, version, etc.) |
| `disable-model-invocation` | CC only | `true` to make the skill manual-only (workflows with side effects) |
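The `name` constraints can be checked mechanically before publishing. A minimal sketch (assuming digits are allowed alongside lowercase letters, and that leading or trailing hyphens are invalid; the agentskills.io spec is authoritative):

```python
import re

def valid_skill_name(name: str) -> bool:
    """Enforce: lowercase, 1-64 chars, hyphen-separated segments.

    The segment pattern also rules out '--', leading '-', and trailing '-'
    because every hyphen must sit between two alphanumeric runs.
    """
    return (1 <= len(name) <= 64
            and re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", name) is not None)

print(valid_skill_name("security-guardian"))  # True
print(valid_skill_name("Bad--Name"))          # False
```

`skills-ref validate` performs this class of check for you; the sketch only illustrates what the rule means.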

allowed-tools wildcard scoping — limit a skill to specific command namespaces rather than opening full Bash access:

```yaml
# Scope to a specific CLI tool only — no other Bash commands allowed
allowed-tools: Bash(agent-browser:*)

# Scope to npm scripts only
allowed-tools: Bash(npm run *)

# Read-only + scoped writes
allowed-tools: Read Grep Glob Edit(/docs/**)
```

This is more secure than granting broad Bash access: the skill can only run commands matching the pattern. Ideal for skills wrapping a specific CLI tool.
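To illustrate why scoped patterns are safer, here is a toy matcher. It assumes glob-style semantics for the text inside `Bash(...)`; Claude Code's actual permission engine may match differently:

```python
from fnmatch import fnmatchcase

def command_allowed(command: str, rules: list[str]) -> bool:
    """True if `command` matches the glob inside any Bash(...) rule."""
    for rule in rules:
        if rule.startswith("Bash(") and rule.endswith(")"):
            if fnmatchcase(command, rule[5:-1]):  # strip "Bash(" and ")"
                return True
    return False

rules = ["Bash(npm run *)", "Bash(agent-browser:*)"]
print(command_allowed("npm run build", rules))   # True
print(command_allowed("rm -rf /", rules))        # False
```

Anything outside the declared namespaces is simply rejected, which is the property that makes these rules preferable to blanket `Bash` access.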

Open standard: Agent Skills follow the agentskills.io specification, created by Anthropic and supported by 30+ platforms (Cursor, VS Code, GitHub, Codex, Gemini CLI, Goose, Roo Code, etc.). Skills you create for Claude Code are portable. The disable-model-invocation field is a Claude Code extension.

Use the official skills-ref CLI to validate your skill before publishing:

```sh
skills-ref validate ./my-skill    # Check frontmatter + naming conventions
skills-ref to-prompt ./my-skill   # Generate <available_skills> XML for agent prompts
```

Beyond spec validation: /audit-agents-skills extends frontmatter checks with content quality, design patterns, and production readiness scoring. Works on both skills and agents together with weighted criteria (32 points max per file).

Before publishing or committing a skill, run through this content checklist. /audit-agents-skills scores frontmatter and structure; this checklist covers the content layer that automated tools miss.

Checklist (Every.to compound-engineering criteria, adapted):

  • Frontmatter complete: name, description, allowed-tools all present and accurate
  • “When to Apply” section: explicitly states the triggers and anti-triggers (when NOT to use)
  • Methodology is structured: numbered steps or a clear decision sequence, not free-form paragraphs
  • No TODOs or placeholders: every section is complete and actionable
  • allowed-tools scoped to minimum: if the skill only reads files, don’t grant Bash; if it searches, don’t grant Edit
  • Output format documented: what does Claude produce? Example or template included
  • No AskUserQuestion for cross-platform skills: skills invoked by other agents should not block on interactive prompts
  • Single responsibility: one skill, one domain — not a catch-all that dispatches to sub-skills
  • Description is a trigger sentence: the description field should tell Claude when to activate this skill, not what it does internally

A skill that passes these 9 gates is ready for production use or sharing via the agentskills.io registry.

Skills have a lifecycle. Treating them like permanent artifacts leads to skill rot: dead code in .claude/skills/ that consumes tokens and provides no value.

Two patterns govern when to act:

```
CATCH REGRESSIONS              SPOT OUTGROWTH
─────────────────              ──────────────
Model Evolves                  Model Improves
      ↓                              ↓
Skill Drifts                   Skill Passes Alone
      ↓                        (without help)
Eval Alerts                          ↓
(early signal)                 Skill Retired
      ↓                        (no longer needed)
Fix or Retire
```

Catch Regressions: Your skill worked last month. The model updated. Now it behaves differently. Without evals, you discover this when a user reports a problem. With evals, you catch it before the failure reaches anyone.

Spot Outgrowth: You built a Capability Uplift skill to cover a gap. Six months later, Claude handles that gap natively. Run the eval without the skill — if it passes, the skill is no longer needed. Remove it to reduce context load and maintenance overhead.

  • Run eval without the skill: does Claude pass on its own?
  • Check last activation date: when did this skill last fire in practice?
  • Check workflow accuracy: for Encoded Preference skills, has the underlying process changed?
  • Archive before deleting: move to .claude/skills/archive/ with a dated note explaining why it was retired
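The checklist above can be folded into a small decision helper. The 80% pass threshold and 90-day staleness window below are illustrative defaults, not values from any official tooling:

```python
from datetime import date, timedelta

def retirement_verdict(pass_rate_without_skill: float,
                       last_activation: date,
                       threshold: float = 0.8,
                       stale_after_days: int = 90) -> str:
    """Apply the retirement checklist: check outgrowth first, then staleness."""
    if pass_rate_without_skill >= threshold:
        return "retire: model passes the eval without the skill"
    if date.today() - last_activation > timedelta(days=stale_after_days):
        return "review: skill has not fired recently"
    return "keep"

print(retirement_verdict(0.9, date.today()))
# retire: model passes the eval without the skill
```

Whatever the verdict, archive rather than delete, so the decision can be revisited.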

See also: §5.Y Skill Evals — how to run evals to inform retirement decisions.


Skill evals move quality from “seems to work” to “know it works.” They’re the testing layer that makes skills production-grade.

Available via: Skill Creator plugin (Anthropic GitHub) for Claude Code users. Live on Claude.ai and Cowork as of March 2026. Sources: ainews.com, mexc.co — not yet in official llms-full.txt.

```
Skill → Test Prompts + Files + Expected Output (what good looks like)
      → Run Evals → Pass ✓ / Fail ✗
      → Improve skill → Re-run
```

You define three things: test prompts (realistic inputs that trigger the skill), expected outputs (description of what “good” looks like — not exact string matching), and a pass rate threshold. Claude executes the skill against each test case and judges the output.

Results report: pass rate, elapsed time, token usage per test case.

Benchmark Mode — tracks pass rates, elapsed time, and token usage across model updates. Runs tests in parallel with clean, isolated contexts (no cross-contamination between cases). Use this to detect regressions automatically when Claude updates.

A/B Testing (Comparator Agents) — blind head-to-head comparison between two versions of a skill. Version A vs. Version B, judged without knowing which is which. Removes confirmation bias from skill improvement decisions.

Trigger Tuning (Description Optimizer) — analyzes your skill’s description field and suggests improvements to reduce false positives (skill fires when it shouldn’t) and false negatives (skill doesn’t fire when it should). Anthropic’s internal test: 5 of 6 document-creation skills showed improved triggering accuracy after optimization. [Source: claudecode.jp — directional, not independently verified]

| Use Case | When | Action |
| --- | --- | --- |
| Catch Regressions | After model updates | Run benchmark → alert if pass rate drops |
| Spot Outgrowth | Periodically, for Capability Uplift skills | Run eval without the skill → if it passes, retire |
```
.claude/skills/my-skill/
├── SKILL.md
└── tests/                    ← Eval directory
    ├── test-01-basic.md      # Prompt + expected output description
    ├── test-02-edge-case.md  # Edge case coverage
    └── benchmark-config.md   # Pass rate threshold, token budget
```
  • One behavior per test: don’t combine multiple assertions — failures become ambiguous
  • Include edge cases: test the inputs that made the skill necessary in the first place
  • Define “good” precisely: vague expected outputs make eval judgments unreliable
  • Set a pass rate threshold: 80% is a reasonable starting point; adjust for criticality
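The benchmark report reduces to simple aggregation. A sketch assuming a per-test result shape of our own devising (`passed`/`tokens`/`seconds` are illustrative field names, not the plugin's):

```python
def benchmark_summary(results: list[dict], threshold: float = 0.8) -> dict:
    """Aggregate per-test eval results into a pass-rate report."""
    passed = sum(1 for r in results if r["passed"])
    rate = passed / len(results)
    return {
        "pass_rate": round(rate, 2),
        "meets_threshold": rate >= threshold,
        "total_tokens": sum(r["tokens"] for r in results),
        "elapsed_s": round(sum(r["seconds"] for r in results), 1),
    }

results = [
    {"passed": True,  "tokens": 1200, "seconds": 4.1},
    {"passed": True,  "tokens": 1350, "seconds": 3.8},
    {"passed": False, "tokens": 900,  "seconds": 2.9},
]
print(benchmark_summary(results))
# pass_rate 0.67: below the 0.8 threshold, so this run fails the gate
```

Tracking this summary across model updates is exactly what Benchmark Mode automates.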

See also: §5.2 Skill Quality Gates for pre-publish checklist | §5.X Skill Lifecycle for retirement workflow


```markdown
---
name: your-skill-name
description: Expert guidance for [domain] problems
allowed-tools: Read Grep Bash
---
# Your Skill Name

## Expertise Areas
This skill provides knowledge in:
- [Area 1]
- [Area 2]
- [Area 3]

## When to Apply
Use this skill when:
- [Situation 1]
- [Situation 2]

## Methodology
When activated, follow this approach:
1. [Step 1]
2. [Step 2]
3. [Step 3]

## Key Concepts
### Concept 1: [Name]
[Explanation]
### Concept 2: [Name]
[Explanation]

## Checklists
### Pre-Implementation Checklist
- [ ] [Check 1]
- [ ] [Check 2]
- [ ] [Check 3]
### Post-Implementation Checklist
- [ ] [Verification 1]
- [ ] [Verification 2]

## Examples
### Good Pattern
// Good example
### Anti-Pattern
// Bad example - don’t do this

## Reference Material
See `reference.md` for detailed documentation.
```
## 5.4 Skill Examples
### Example 1: Security Guardian Skill
```markdown
---
name: security-guardian
description: Security expertise for OWASP Top 10, auth, and data protection
allowed-tools: Read Grep Bash
---
# Security Guardian

## Expertise Areas
- OWASP Top 10 vulnerabilities
- Authentication & Authorization
- Data protection & encryption
- API security
- Secrets management

## OWASP Top 10 Checklist
### A01: Broken Access Control
- [ ] Check authorization on every endpoint
- [ ] Verify row-level permissions
- [ ] Test IDOR vulnerabilities
- [ ] Check for privilege escalation
### A02: Cryptographic Failures
- [ ] Check for hardcoded secrets
- [ ] Verify TLS configuration
- [ ] Review password hashing (bcrypt/argon2)
- [ ] Check data encryption at rest
### A03: Injection
- [ ] Review SQL queries (parameterized?)
- [ ] Check NoSQL operations
- [ ] Review command execution
- [ ] Check XSS vectors

[... more checklists ...]
```

The skill’s `## Authentication Patterns` section pairs a good pattern with the anti-patterns to reject:

```typescript
// Good: secure password hashing
import { hash, verify } from 'argon2';
const hashedPassword = await hash(password);
const isValid = await verify(hashedPassword, inputPassword);

// DON'T DO THIS
// const hashed = md5(password);
// const hashed = sha1(password);
```

Secrets management: keep sensitive files out of version control (`.gitignore`):

```
.env
.env.local
*.pem
*credentials*
```

And load keys from the environment rather than hardcoding them:

```typescript
// Good
const apiKey = process.env.API_KEY;

// Bad
// const apiKey = "sk-1234567890abcdef";
```
### Example 2: TDD Skill

```markdown
---
name: tdd
description: Test-Driven Development methodology and patterns
allowed-tools: Read Write Bash
---
# TDD (Test-Driven Development)
```
## The TDD Cycle

```
RED → GREEN → REFACTOR

1. RED      → Write a failing test
       ↓
2. GREEN    → Write minimal code to pass
       ↓
3. REFACTOR → Improve code, keep tests green
       ↓
       └───→ Repeat
```
## Methodology
### Step 1: RED (Write Failing Test)
Write a test for the behavior you want BEFORE writing any code.
```typescript
// user.test.ts
describe('User', () => {
  it('should validate email format', () => {
    expect(isValidEmail('test@example.com')).toBe(true);
    expect(isValidEmail('invalid')).toBe(false);
  });
});
```

Run: `pnpm test` → Should FAIL (function doesn’t exist)

### Step 2: GREEN (Make the Test Pass)
Write the MINIMUM code to make the test pass.

```typescript
// user.ts
export const isValidEmail = (email: string): boolean => {
  return email.includes('@');
};
```

Run: `pnpm test` → Should PASS

### Step 3: REFACTOR (Improve While Green)
Now improve the implementation while keeping tests green.

```typescript
// user.ts (improved)
export const isValidEmail = (email: string): boolean => {
  const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return emailRegex.test(email);
};
```

Run: `pnpm test` → Should still PASS

### Test Structure: Arrange / Act / Assert

```typescript
it('should calculate order total', () => {
  // Arrange - Set up test data
  const items = [
    { price: 10, quantity: 2 },
    { price: 5, quantity: 3 },
  ];
  // Act - Execute the code
  const total = calculateTotal(items);
  // Assert - Verify the result
  expect(total).toBe(35);
});
```

### Example 3: Design Patterns Skill

Purpose: Detect, analyze, and suggest Gang of Four design patterns in TypeScript/JavaScript codebases with stack-aware recommendations.

Location: examples/skills/design-patterns/

Key Features:

  • Detects 23 GoF design patterns (Creational, Structural, Behavioral)
  • Stack-aware detection (React, Angular, NestJS, Vue, Express, RxJS, Redux, ORMs)
  • Code smell detection with pattern suggestions
  • Quality evaluation (5 criteria: Correctness, Testability, SRP, Open/Closed, Documentation)
  • Prefers stack-native alternatives (e.g., React Context over Singleton)

Structure:

```
design-patterns/
├── SKILL.md                    # Main skill instructions
├── reference/
│   ├── patterns-index.yaml     # 23 patterns metadata
│   ├── creational.md           # 5 creational patterns
│   ├── structural.md           # 7 structural patterns
│   └── behavioral.md           # 11 behavioral patterns
├── signatures/
│   ├── stack-patterns.yaml     # Stack detection + native alternatives
│   ├── detection-rules.yaml    # Grep patterns for detection
│   └── code-smells.yaml        # Smell → pattern mappings
└── checklists/
    └── pattern-evaluation.md   # Quality scoring system
```

Operating Modes:

1. Detection Mode: Find existing patterns in the codebase

   ```
   # Invoke via skill or direct analysis
   "Analyze design patterns in src/"
   ```

2. Suggestion Mode: Identify code smells and suggest patterns

   ```
   "Suggest design patterns to fix code smells in src/services/"
   ```

3. Evaluation Mode: Score pattern implementation quality

   ```
   "Evaluate the Factory pattern implementation in src/lib/errors/"
   ```

Example Output:

```json
{
  "stack_detected": {
    "primary": "react",
    "version": "19.0",
    "secondary": ["typescript", "next.js", "prisma"],
    "detection_sources": ["package.json", "tsconfig.json"]
  },
  "patterns_found": {
    "factory-method": [{
      "file": "src/lib/errors/factory.ts",
      "lines": "12-45",
      "confidence": 0.9,
      "quality_score": 8.2,
      "notes": "Well-implemented with proper abstraction"
    }],
    "singleton": [{
      "file": "src/config.ts",
      "confidence": 0.85,
      "quality_score": 4.0,
      "recommendation": "Consider React Context instead"
    }]
  },
  "code_smells": [{
    "type": "switch_on_type",
    "file": "src/components/data-handler.tsx",
    "line": 52,
    "severity": "medium",
    "suggested_pattern": "strategy",
    "rationale": "Replace conditional logic with strategy objects"
  }]
}
```

Stack-Native Recommendations:

PatternReact AlternativeAngular AlternativeNestJS Alternative
SingletonContext API + Provider@Injectable() service@Injectable() (default)
ObserveruseState + useEffectRxJS ObservablesEventEmitter
DecoratorHigher-Order Component@Decorator syntax@Injectable decorators
FactoryCustom Hook patternFactory serviceProvider pattern

Detection Methodology:

  1. Stack Detection: Analyze package.json, tsconfig.json, config files
  2. Pattern Search: Use Glob → Grep → Read pipeline
    • Glob: Find candidate files (**/*factory*.ts, **/*singleton*.ts)
    • Grep: Match detection patterns (regex for key structures)
    • Read: Verify pattern implementation
  3. Quality Evaluation: Score on 5 criteria (0-10 each)
  4. Smell Detection: Identify anti-patterns and suggest refactoring
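The Glob → Grep → Read pipeline can be sketched in a few lines. The glob and regex here are illustrative, not the skill's actual `detection-rules.yaml` rules, and the sketch collapses Grep and Read into one pass over each candidate file:

```python
from pathlib import Path
import re

def detect_candidates(root: str, name_glob: str, pattern: str) -> list[str]:
    """Shortlist files by name (Glob), then confirm by content (Grep/Read)."""
    hits = []
    for path in Path(root).rglob(name_glob):    # Glob: candidate files
        text = path.read_text(errors="ignore")  # Read: file contents
        if re.search(pattern, text):            # Grep: structural match
            hits.append(str(path))
    return sorted(hits)
```

For example, `detect_candidates("src", "*factory*.ts", r"class \w+Factory")` would shortlist factory-named files and keep only those that actually declare a `*Factory` class.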

Quality Evaluation Criteria:

| Criterion | Weight | Description |
| --- | --- | --- |
| Correctness | 30% | Follows canonical pattern structure |
| Testability | 25% | Easy to mock, no global state |
| Single Responsibility | 20% | One clear purpose |
| Open/Closed | 15% | Extensible without modification |
| Documentation | 10% | Clear intent, usage examples |
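The weighted score from this table reduces to a dot product. A sketch using the weights above (the example scores are made up):

```python
WEIGHTS = {  # weights from the criteria table
    "correctness": 0.30,
    "testability": 0.25,
    "single_responsibility": 0.20,
    "open_closed": 0.15,
    "documentation": 0.10,
}

def quality_score(scores: dict[str, float]) -> float:
    """Weighted pattern-quality score on a 0-10 scale (per-criterion 0-10)."""
    return round(sum(scores[k] * w for k, w in WEIGHTS.items()), 1)

example = {"correctness": 10, "testability": 8, "single_responsibility": 8,
           "open_closed": 8, "documentation": 8}
print(quality_score(example))  # 8.6
```

This matches the shape of `quality_score` values in the JSON report above: a single 0-10 number per detected pattern.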

Example Usage in Agent:

```markdown
---
name: architecture-reviewer
description: Review system architecture and design patterns
tools: Read, Grep, Glob
skills:
  - design-patterns # Inherits pattern knowledge
---
When reviewing architecture:
1. Use design-patterns skill to detect existing patterns
2. Evaluate pattern implementation quality
3. Suggest improvements based on stack-native alternatives
4. Check for code smells requiring pattern refactoring
```

Integration with Méthode Aristote:

This skill is now installed in the Méthode Aristote repository at:

`/Users/florianbruniaux/Sites/MethodeAristote/app/.claude/skills/design-patterns/`

Usage:

  1. Direct invocation: “Analyze design patterns in src/”
  2. Via agent: Create an agent that inherits the design-patterns skill
  3. Automated review: Use in CI/CD to detect pattern violations

Reference:

  • Full documentation: examples/skills/design-patterns/SKILL.md
  • Pattern reference: examples/skills/design-patterns/reference/*.md
  • Detection rules: examples/skills/design-patterns/signatures/*.yaml

The Claude Code community has created specialized skill collections for specific domains. One notable collection focuses on cybersecurity and penetration testing.

Repository: zebbern/claude-code-guide | Skills Directory: `/skills`

This repository contains 29 cybersecurity-focused skills covering penetration testing, vulnerability assessment, and security analysis:

Penetration Testing & Exploitation

  • SQL Injection Testing
  • XSS (Cross-Site Scripting) Testing
  • Broken Authentication Testing
  • IDOR (Insecure Direct Object Reference) Testing
  • File Path Traversal Testing
  • Active Directory Attacks
  • Privilege Escalation (Linux & Windows)

Security Tools & Frameworks

  • Metasploit Framework
  • Burp Suite Testing
  • SQLMap Database Pentesting
  • Wireshark Analysis
  • Shodan Reconnaissance
  • Scanning Tools

Infrastructure Security

  • AWS Penetration Testing
  • Cloud Penetration Testing
  • Network 101
  • SSH Penetration Testing
  • SMTP Penetration Testing

Application Security

  • API Fuzzing & Bug Bounty
  • WordPress Penetration Testing
  • HTML Injection Testing
  • Top Web Vulnerabilities

Methodologies & References

  • Ethical Hacking Methodology
  • Pentest Checklist
  • Pentest Commands
  • Red Team Tools
  • Linux Shell Scripting

To use these skills in your Claude Code setup:

  1. Clone or download specific skills from the repository
  2. Copy the skill folder to your .claude/skills/ directory
  3. Reference in your agents using the skills frontmatter field
```sh
# Example: Add SQL injection testing skill
cd ~/.claude/skills/
curl -L https://github.com/zebbern/claude-code-guide/archive/refs/heads/main.zip -o skills.zip
unzip -j skills.zip "claude-code-guide-main/skills/sql-injection-testing/*" -d sql-injection-testing/
```

Then reference in an agent:

```markdown
---
name: security-auditor
description: Security testing specialist for penetration testing
tools: Read, Grep, Bash
skills:
  - sql-injection-testing
---
```

Note: These cybersecurity skills have not been fully tested by the maintainers of this guide. While they appear well-structured and comprehensive based on their documentation, you should:

  • Test thoroughly before using in production security assessments
  • Ensure you have proper authorization before conducting any penetration testing
  • Review and validate the techniques against your organization’s security policies
  • Use only in legal contexts with written permission from system owners
  • Contribute back if you find issues or improvements

The skills appear to follow proper ethical hacking guidelines and include appropriate legal prerequisites, but as with any security tooling, verification is essential.

Repository: antonbabenko/terraform-skill | Author: Anton Babenko (creator of terraform-aws-modules, 1B+ downloads, AWS Community Hero) | Documentation: terraform-best-practices.com

A production-grade Claude Code skill for Terraform and OpenTofu infrastructure management, covering:

Testing & Validation

  • Test strategy decision frameworks (native tests vs Terratest)
  • Workflow examples for different testing scenarios

Module Development

  • Naming conventions and versioning patterns
  • Structural best practices for reusable modules

CI/CD Integration

  • GitHub Actions and GitLab CI templates
  • Cost estimation and compliance checks baked in

Security & Compliance

  • Static analysis and policy-as-code integration
  • Security scanning workflows

Patterns & Anti-patterns

  • Side-by-side examples of recommended vs problematic approaches
  • Decision frameworks over prescriptive rules

This skill demonstrates several best practices for production-grade skill development:

  1. Marketplace distribution: Uses .claude-plugin/marketplace.json for easy installation
  2. Structured references: Organized references/ directory with knowledge base
  3. Test coverage: Includes tests/ directory for skill validation
  4. Decision frameworks: Emphasizes frameworks over rigid rules, enabling contextual decisions
```sh
# Via marketplace (if available)
/install terraform-skill@antonbabenko

# Manual installation
cd ~/.claude/skills/
git clone https://github.com/antonbabenko/terraform-skill.git terraform
```

If you create specialized skills for other domains (DevOps, data science, ML/AI, etc.), consider sharing them with the community through similar repositories or pull requests to existing collections.

Repository: blader/Claudeception | Author: Siqi Chen (@blader) | Stars: 1k+ | License: MIT

Unlike traditional skill repositories, Claudeception is a meta-skill that generates new skills during Claude Code sessions. It addresses a fundamental limitation: “Every time you use an AI coding agent, it starts from zero.”

  1. Monitors your Claude Code sessions via hook activation
  2. Detects non-obvious discoveries (debugging techniques, workarounds, project-specific patterns)
  3. Writes new skill files with Problem/Context/Solution/Verification structure
  4. Retrieves matching skills in future sessions when similar contexts arise

A user reported Claudeception auto-generated a pre-merge-code-review skill from their actual workflow—transforming an ad-hoc debugging session into a reusable, automatically-triggered skill.

```sh
# User-level installation
git clone https://github.com/blader/Claudeception.git ~/.claude/skills/claudeception

# Project-level installation
git clone https://github.com/blader/Claudeception.git .claude/skills/claudeception
```

See the repository README for hook configuration.

| Aspect | Recommendation |
| --- | --- |
| Governance | Review generated skills periodically; archive or merge duplicates |
| Overhead | Hook-based activation adds evaluation per prompt |
| Scope | Start with non-critical projects to validate the workflow |
| Quality gates | Claudeception only persists tested, discovery-driven knowledge |

This skill demonstrates the skill-that-creates-skills pattern—a meta-approach where Claude Code improves itself through session learning. Inspired by academic work on reusable skill libraries (Voyager, CASCADE, SEAgent, Reflexion).

Automatic Skill Improvement: Claude Reflect System


Repository: claude-reflect-system | Author: Haddock Development | Status: Production-ready (2026) | Marketplace: Agent Skills Index

While Claudeception creates new skills from discovered patterns, Claude Reflect System automatically improves existing skills by analyzing Claude’s feedback and detected corrections during sessions.

Claude Reflect operates in two modes:

Manual Mode (/reflect [skill-name]):

```sh
/reflect design-patterns    # Analyze and propose improvements for a specific skill
```

Automatic Mode (Stop hook):

  1. Monitors Stop hook triggers (session end, error, explicit stop)
  2. Parses session transcript for skill-related feedback
  3. Classifies improvement type (correction, enhancement, new example)
  4. Proposes skill modifications with confidence level (HIGH/MED/LOW)
  5. Waits for explicit user review and approval
  6. Backs up original skill file to Git
  7. Applies changes with validation (YAML syntax, markdown structure)
  8. Commits with descriptive message
| Feature | Purpose | Implementation |
| --- | --- | --- |
| User Review Gate | Prevent automatic unwanted changes | All proposals require explicit approval before application |
| Git Backups | Enable rollback of bad improvements | Auto-commits before each modification with descriptive messages |
| Syntax Validation | Maintain skill file integrity | YAML frontmatter + markdown body validation before write |
| Confidence Levels | Prioritize high-quality improvements | HIGH (clear correction) > MED (likely improvement) > LOW (suggestion) |
| Locking Mechanism | Prevent concurrent modifications | File locks during analysis and application phases |
```sh
# Clone to skills directory
git clone https://github.com/haddock-development/claude-reflect-system.git \
  ~/.claude/skills/claude-reflect-system

# Configure Stop hook (add to ~/.claude/hooks/Stop.sh or Stop.ps1)
# Bash example:
echo '/reflect-auto' >> ~/.claude/hooks/Stop.sh
chmod +x ~/.claude/hooks/Stop.sh

# PowerShell example:
Add-Content -Path "$HOME\.claude\hooks\Stop.ps1" -Value "/reflect-auto"
```

See the repository README for detailed hook configuration.

Problem: You use a terraform-validation skill that doesn’t catch a specific security misconfiguration. During the session, Claude detects and corrects the issue manually.

Reflect System detects:

  • Claude corrected a pattern not covered by the skill
  • Correction was verified (tests passed)
  • High confidence (clear improvement)

Proposal:

```
Skill: terraform-validation
Confidence: HIGH
Change: Add S3 bucket encryption validation
Diff:
+ - Check bucket encryption: aws_s3_bucket.*.server_side_encryption_configuration
+ - Reject: Encryption not set or using AES256 instead of aws:kms
```
User reviews → approves → skill updated → future sessions automatically catch this issue.

Self-improving systems introduce specific security risks. Claude Reflect System includes mitigations, but users must remain vigilant:

| Risk | Description | Mitigation | User Responsibility |
| --- | --- | --- | --- |
| Feedback Poisoning | Adversarial inputs manipulate improvement proposals | User review gate, confidence scoring | Review all HIGH confidence proposals, reject suspicious changes |
| Memory Poisoning | Malicious edits to learned patterns accumulate | Git backups, syntax validation | Periodically audit skill history via Git log |
| Prompt Injection | Embedded instructions in session transcripts | Input sanitization, proposal isolation | Never approve proposals with executable commands |
| Skill Bloat | Unbounded growth without curation | Manual `/reflect [skill]` mode, curate regularly | Archive or merge redundant improvements quarterly |

Academic sources:

  • Anthropic Memory Cookbook (official guidance on agent memory systems)
  • Research on adversarial attacks against AI learning systems
| Command | Effect |
| --- | --- |
| `/reflect-on` | Enable automatic Stop hook analysis |
| `/reflect-off` | Disable automatic analysis (manual mode only) |
| `/reflect [skill-name]` | Manually trigger analysis for a specific skill |
| `/reflect status` | Show enabled/disabled state and recent proposals |

Default: Disabled (opt-in for safety)

Comparison: Claudeception vs Reflect System

| Aspect | Claudeception | Claude Reflect System |
| --- | --- | --- |
| Focus | Skill generation (create new) | Skill improvement (refine existing) |
| Trigger | New patterns discovered | Corrections/feedback detected |
| Input | Session discoveries, workarounds | Claude’s self-corrections, user feedback |
| Review | Implicit (skill created, user evaluates in next session) | Explicit (proposal shown, user approves/rejects) |
| Safety | Quality gates (only tested discoveries) | Git backups, syntax validation, confidence levels |
| Use Case | Bootstrap project-specific skills | Evolve skills based on real-world usage |
| Overhead | Hook evaluation per prompt | Stop hook evaluation (session end) |
  1. Bootstrap (Claudeception): Let Claude generate skills from discovered patterns during initial project work
  2. Iterate (Use skills): Apply generated skills in subsequent sessions
  3. Refine (Reflect System): Enable /reflect-on to capture improvements as skills evolve with usage
  4. Curate (Manual): Quarterly review via /reflect status and Git history to archive or merge redundant patterns

Example timeline:

  • Week 1-2: Claudeception generates api-error-handling skill from debugging sessions
  • Week 3-6: Skill used in 20+ sessions, catches 80% of error cases
  • Week 7: Reflect detects 3 missed edge cases, proposes HIGH confidence additions
  • Week 8: User approves, skill now catches 95% of cases automatically

Repository: nextlevelbuilder/ui-ux-pro-max-skill | Site: ui-ux-pro-max-skill.nextlevelbuilder.io | uupm.cc | Stars: 33.7k | Forks: 3.3k | License: MIT | Latest: v2.2.1 (Jan 2026)

UI UX Pro Max is the most popular design skill in the AI coding assistant ecosystem. It adds a design reasoning engine to Claude Code (and 14 other assistants), replacing generic AI-generated UI with professional, industry-aware design systems.

The engine works offline — it runs BM25 search over ~400 local JSON rules to recommend styles, palettes, and typography. No external LLM calls, no network dependency at runtime.
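The retrieval step is ordinary BM25 ranking. The sketch below is a minimal, self-contained illustration of that scoring; the rule entries, field names, and parameters are invented for the example, not the skill's actual data format.

```python
import math
from collections import Counter

# Toy stand-ins for the skill's local JSON rule files
# (the real engine ships ~400 of them; these entries are invented).
rules = [
    {"name": "glassmorphism", "text": "frosted glass blur translucent saas dashboard"},
    {"name": "brutalism", "text": "raw bold high contrast portfolio typography"},
    {"name": "bento-grid", "text": "saas analytics dashboard data dense cards"},
]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with classic BM25."""
    tokenized = [d["text"].split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    # document frequency of each term
    df = Counter(t for toks in tokenized for t in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

scores = bm25_scores("saas analytics dashboard", rules)
best = rules[scores.index(max(scores))]["name"]
print(best)  # "bento-grid": it matches all three query terms, so it ranks first
```

Because everything is term statistics over local files, the lookup is fast and needs no network, which is what makes the offline guarantee possible.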

| Asset | Count | Examples |
| --- | --- | --- |
| UI Styles | 67 | Glassmorphism, Brutalism, Bento Grid, AI-Native UI, Claymorphism… |
| Color Palettes | 96 | Industry-specific: SaaS, fintech, healthcare, e-commerce, luxury… |
| Font Pairings | 57 | Curated Google Fonts combinations with context rules |
| Chart Types | 25 | Dashboard, analytics, BI recommendations |
| UX Guidelines | 99 | Best practices, anti-patterns, accessibility rules |
| Industry Reasoning Rules | 100 | SaaS, fintech, healthcare, e-commerce, beauty, Web3, gaming… |

The Design System Generator (v2.0+) analyzes your product type and generates a complete, tailored design system in seconds:

```sh
# Generate a design system for a SaaS dashboard project
python3 .claude/skills/ui-ux-pro-max/scripts/search.py "saas analytics dashboard" \
  --design-system -p "MyApp"
# Output: pattern + style + palette + typography + effects + anti-patterns + checklist
```

Master + Override pattern for multi-page projects:

```sh
# Generate and persist a global design system
python3 .claude/skills/ui-ux-pro-max/scripts/search.py "saas dashboard" \
  --design-system --persist -p "MyApp"

# Create page-specific overrides
python3 .claude/skills/ui-ux-pro-max/scripts/search.py "checkout flow" \
  --design-system --persist -p "MyApp" --page "checkout"
```

This creates a design-system/ folder:

```text
design-system/
├── MASTER.md       # Global: colors, typography, spacing, components
└── pages/
    └── checkout.md # Page-specific overrides only
```

Reference in your Claude Code prompts:

```text
I am building the Checkout page.
Read design-system/MASTER.md, then check design-system/pages/checkout.md.
Prioritize page rules if present, otherwise use Master rules.
Now generate the code.
```
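The precedence rule in that prompt amounts to an ordinary dictionary merge: page values win, and everything else falls back to the master. A minimal sketch (token names and values are invented, not the generator's actual output):

```python
# MASTER.md and pages/checkout.md reduced to dicts of design tokens
# (all values invented for illustration).
master = {"primary": "#2563EB", "font": "Inter", "radius": "8px"}
page_overrides = {"checkout": {"primary": "#16A34A"}}

def resolve(page):
    """Page-specific rules take priority; master rules fill the gaps."""
    return {**master, **page_overrides.get(page, {})}

print(resolve("checkout"))  # {'primary': '#16A34A', 'font': 'Inter', 'radius': '8px'}
print(resolve("landing"))   # no overrides, so identical to master
```

Keeping only the deltas in each page file is what makes the overrides cheap to maintain: a master-level change propagates to every page that hasn't explicitly overridden that token.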

Option 1 — Claude Marketplace (two commands):

```text
/plugin marketplace add nextlevelbuilder/ui-ux-pro-max-skill
/plugin install ui-ux-pro-max@ui-ux-pro-max-skill
```

Option 2 — CLI (recommended):

```sh
npm install -g uipro-cli
cd /path/to/your/project
uipro init --ai claude   # Claude Code
```

Option 3 — Manual (no npm):

```sh
git clone --depth=1 https://github.com/nextlevelbuilder/ui-ux-pro-max-skill /tmp/uipro
cp -r /tmp/uipro/.claude/skills/ui-ux-pro-max .claude/skills/
```

Prerequisite: Python 3.x must be installed (the reasoning engine is a Python script).

Once installed, the skill activates automatically for UI/UX requests in Claude Code:

```text
Build a landing page for my SaaS product
Create a dashboard for healthcare analytics
Design a fintech app with dark theme
```
| Aspect | Notes |
| --- | --- |
| Scope | Multi-platform — supports Cursor, Windsurf, Copilot, Gemini CLI, and 10 others alongside Claude Code |
| Quality signal | 33.7k stars, 3.3k forks in 3 months — strongest community traction of any design skill |
| Maintenance | Active — v2.0→v2.2.1 in 10 days (Jan 2026), updated regularly |
| Chinese community | Strong adoption: listed on jimmysong.io, benchmark repos in Chinese dev ecosystem |

Security note: npm install -g uipro-cli installs a package from an anonymous organization (“nextlevelbuilder”) globally. Source audit (Feb 2026) confirmed:

  • No preinstall/postinstall scripts in the npm package
  • No network calls in the Python engine (search.py, core.py, design_system.py — stdlib + local CSV/JSON only)

Option 3 (manual git clone) remains the safest route if you want to inspect before installing. The package has not been formally audited by Anthropic or the maintainers of this guide.
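If you take the manual route, you can spot-check the "no network calls" claim yourself before copying the skill in. The grep patterns below are illustrative, not an exhaustive audit; `SKILL_DIR` defaults to a stand-in directory so the snippet runs anywhere, and you would point it at the Option 3 clone (`/tmp/uipro/.claude/skills/ui-ux-pro-max`) for the real check:

```shell
# Stand-in directory so the snippet is self-contained; override SKILL_DIR
# with the real clone path from Option 3 to audit the actual skill.
SKILL_DIR="${SKILL_DIR:-/tmp/skill-audit-demo}"
mkdir -p "$SKILL_DIR"
printf 'import json\nprint("local only")\n' > "$SKILL_DIR/engine.py"

# Flag anything that could phone home or shell out.
if grep -rnE 'requests|urllib|socket\.|subprocess|curl |wget ' "$SKILL_DIR"; then
  echo "review the matches above before installing"
else
  echo "no obvious network or exec calls found"
fi
```

A clean grep is necessary but not sufficient: obfuscated or dynamically constructed calls will slip past simple pattern matching, which is why the formal scanners described below exist.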

For comprehensive DevOps/SRE workflows, see the DevOps & SRE Guide:

  • The FIRE Framework: First Response → Investigate → Remediate → Evaluate
  • Kubernetes troubleshooting: Prompts by symptom (CrashLoopBackOff, OOMKilled, etc.)
  • Incident response: Solo and multi-agent patterns
  • IaC patterns: Terraform, Ansible, GitOps workflows
  • Guardrails: Security boundaries and team adoption checklist

Quick Start: Agent Template | CLAUDE.md Template

URL: skills.sh | GitHub: vercel-labs/agent-skills | Launched: January 21, 2026

Skills.sh (Vercel Labs) provides a centralized marketplace for discovering and installing agent skills with one-command installation:

```sh
npx add-skill vercel-labs/agent-skills            # React/Next.js best practices (35K+ installs)
npx add-skill supabase/agent-skills               # Postgres optimization patterns
npx add-skill anthropics/skills                   # Frontend design + skill-creator
npx add-skill anthropics/claude-plugins-official  # CLAUDE.md auditor + automation recommender
```

Installation: Skills are copied to ~/.claude/skills/ (same format as this guide)

Supported agents: 20+ including Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Goose, and others

Format: Standard SKILL.md with YAML frontmatter (100% compatible with Section 5.2-5.3)
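For reference, a minimal SKILL.md in that format looks like the sketch below: the YAML frontmatter carries a name and a description (used to decide when the skill should load), and the Markdown body holds the instructions. All names and content here are invented, not taken from any published skill:

```markdown
---
name: api-error-conventions
description: Apply our team's error-handling conventions when writing or reviewing API handlers.
---

# API Error Conventions

When writing API handlers:
1. Return structured error objects, never bare strings.
2. Log the request ID with every error.
```

Because the format is plain text plus frontmatter, the same file installs unchanged whether it arrives via skills.sh, a GitHub clone, or a local `.claude/skills/` directory.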

| Category | Top Skills | Installs | Creator |
| --- | --- | --- | --- |
| Frontend | vercel-react-best-practices | 35K+ | vercel-labs |
| Frontend | web-design-guidelines | 26.6K | vercel-labs |
| Frontend | frontend-design | 5.6K | anthropics |
| Database | supabase-postgres-best-practices | 1K+ | supabase |
| Auth | better-auth-best-practices | 2K+ | better-auth |
| Testing | test-driven-development | 721 | obra |
| Media | remotion-best-practices | New | remotion-dev |
| Meta | skill-creator | 3.2K | anthropics |
| Tooling | claude-md-improver | 472 | anthropics |
| Tooling | claude-automation-recommender | 333 | anthropics |

Full catalog: skills.sh leaderboard

Vercel launched automated security scanning on every skills.sh skill (announcement, Feb 17, 2026), partnering with three independent security firms covering 60,000+ skills:

| Partner | Method | Performance |
| --- | --- | --- |
| Socket | Cross-ecosystem static analysis + LLM-based noise reduction (curl\|sh, obfuscation, exfiltration, suspicious deps) | 95% precision, 97% F1 |
| Snyk | mcp-scan engine: LLM judges + deterministic rules, detects “toxic flows” between natural language and executable code | 90-100% recall, 0% false positives on legit skills |
| Gen (Agent Trust Hub) | Real-time monitoring of connections in/out of agents to prevent data exfiltration and prompt injection | Continuous |

Risk levels are displayed on every skill page and shown before installation via the skills CLI (skills@1.4.0+):

| Rating | Meaning |
| --- | --- |
| ✅ Safe | Verified against security best practices |
| 🟡 Low Risk | Minor risk indicators detected |
| 🔴 High Risk | Significant security concerns |
| ☠️ Critical | Severe or malicious behavior — hidden from search |

Continuous monitoring: skills are re-evaluated as detection improves. If a repository becomes malicious after install, its rating updates automatically.

Mental model: treat a skill like a Docker image — it’s an executable dependency, not a prompt. Verify the rating before installing in production.

Status: Launched Jan 21, 2026, security-audited since Feb 17, 2026 (Socket + Snyk + Gen)

Governance: Community project by Vercel Labs (not official Anthropic). Skills contributed by Vercel, Anthropic, Supabase, and community members.

Trade-offs:

  • ✅ Centralized discovery + leaderboard (200+ skills)
  • ✅ One-command install (vs manual GitHub clone)
  • ✅ Format 100% compatible with this guide
  • ✅ Automated 3-layer security audit before installation
  • ✅ Continuous monitoring post-install
  • ⚠️ Multi-agent focus (not Claude Code specific)
  • ⚠️ Auto-invocation is unreliable: agents trigger installed skills only ~56% of the time (Gao, 2026). For critical instructions, prefer the always-loaded CLAUDE.md

| Use Case | Recommendation |
| --- | --- |
| Discover popular patterns | skills.sh (leaderboard, trending) |
| Install official framework skills | skills.sh (Vercel React, Supabase, etc.) |
| Team-specific/internal skills | GitHub repos (like claude-code-templates, 17K⭐) |
| Custom enterprise skills | Local .claude/skills/ (Section 5.2-5.3) |

Standard installation (global, all Claude Code sessions):

```sh
# Install the Vercel bundle (3 skills: react + web-design + deploy)
npx add-skill vercel-labs/agent-skills

# Install Supabase Postgres patterns
npx add-skill supabase/agent-skills

# Verify installation
ls ~/.claude/skills/
# Output: react-best-practices/ web-design-guidelines/ vercel-deploy/
```

Manual installation (project-specific):

```sh
# Clone from GitHub
git clone https://github.com/vercel-labs/agent-skills.git /tmp/agent-skills

# Copy a specific skill
cp -r /tmp/agent-skills/react-best-practices .claude/skills/
# Claude Code auto-discovers skills in .claude/skills/
```

Quick jump: Slash Commands · Creating Custom Commands · Command Template · Command Examples


Note (January 2026): Skills and Commands are being unified. Both now use the same invocation mechanism (/skill-name or /command-name), share YAML frontmatter syntax, and can be triggered identically. The conceptual distinction (skills = knowledge modules, commands = workflow templates) remains useful for organization, but technically they’re converging. Create new ones based on purpose, not mechanism.


Reading time: 10 minutes | Skill level: Week 1-2 | Goal: Create custom slash commands