codex-skill-auditor
A Codex skill and CLI helper for auditing Codex skill folders before publishing.
A command-line tool that audits AI agent skill folders for structural problems and trigger-quality issues before you publish or share them, with CI and pre-commit hook support.
This repository provides a command-line tool for checking AI agent skill folders before publishing or sharing them. Skills for AI coding assistants like OpenAI Codex or Anthropic Claude are organized as folders containing a description file, configuration files, and optional scripts. The auditor scans these folders and reports problems that are easy to miss during manual review.
The checks cover two broad areas. Structural checks catch things like missing required files, invalid YAML formatting, placeholder text that was never replaced, files that are too large, broken internal links, and Python syntax errors in bundled scripts. Trigger quality checks, added in version 0.3, focus on the description field that an agent reads to decide whether to load and follow a skill. The tool is built around one key finding: if a description summarizes what the skill does rather than stating when it should be activated, agents tend to follow the description instead of reading the full skill instructions. The auditor flags descriptions written in the first person, descriptions that summarize workflow steps, and descriptions that exceed character budget limits.
The tool outputs color-coded results in the terminal and switches to plain Markdown automatically when run in a CI environment like GitHub Actions. A strict mode causes the command to exit with a failure code even for lower-severity findings, which is useful for blocking a pull request if a skill has quality issues. An autofix flag handles mechanical corrections automatically, such as renaming a skill to match its folder name or moving documentation files outside the skill folder. Judgment calls like rewriting a description are left to the person reviewing the output.
The tool can also audit an entire directory of skills at once and detect when two skills have descriptions that overlap too closely, which can cause agents to trigger the wrong one. It is installable as a Python package, a GitHub Action, or a pre-commit hook. A planned future feature would drive a small language model against a scenario file to test whether a skill actually triggers correctly.
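The trigger-quality and overlap checks described above can be sketched as follows. This is an added example, not the project's own code: the finding names, regex heuristics, 1024-character budget, 0.5 Jaccard threshold and sample descriptions are all assumptions made for illustration.

```python
import re

# Assumed values for this sketch; the page does not state the tool's limits.
MAX_DESCRIPTION_CHARS = 1024
OVERLAP_THRESHOLD = 0.5
FIRST_PERSON = re.compile(r"\b(?:I|[Mm]y|[Ww]e|[Oo]ur)\b")
SEQUENCE_WORDS = re.compile(r"\b(?:first|then|next|afterwards|finally)\b", re.I)
STOPWORDS = {"the", "use", "when", "user", "asks", "and", "for", "into", "out"}

def check_description(desc):
    """Return hypothetical finding names for one skill description."""
    findings = []
    if len(desc) > MAX_DESCRIPTION_CHARS:
        findings.append("over-budget")
    if FIRST_PERSON.search(desc):
        findings.append("first-person")
    # Two or more sequence words suggest a step-by-step workflow summary.
    if len(SEQUENCE_WORDS.findall(desc)) >= 2:
        findings.append("workflow-summary")
    return findings

def tokens(desc):
    """Lower-cased content words, ignoring short words and stopwords."""
    words = re.findall(r"[a-z0-9]+", desc.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}

def overlap(a, b):
    """Jaccard similarity of the content words in two descriptions."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def audit(skills):
    """skills maps name -> description; returns (findings, overlapping pairs)."""
    report = {name: check_description(desc) for name, desc in skills.items()}
    names = sorted(skills)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = round(overlap(skills[a], skills[b]), 2)
            if score >= OVERLAP_THRESHOLD:
                pairs.append((a, b, score))
    return report, pairs

# Constructed sample descriptions, not taken from any real skill.
SAMPLE_SKILLS = {
    "pdf-extract": "Use when the user asks to pull text or tables out of a PDF file.",
    "pdf-tables": "Use when the user asks to pull tables out of a PDF file into CSV.",
    "release-notes": "I first read the git log, then group commits by type, "
                     "and finally write the changelog.",
}

if __name__ == "__main__":
    print(audit(SAMPLE_SKILLS))
```
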
Where it fits
- Catch missing files, bad YAML, or placeholder text in an AI skill folder before publishing it.
- Block a pull request in CI if a skill's trigger description is written in a way that confuses agents.
- Scan an entire directory of skills at once to find descriptions that overlap and could cause the wrong skill to fire.
- Run as a pre-commit hook so skill quality issues are caught locally before they reach code review.