CLIWatch Docs
Agent-readiness testing for CLIs — benchmark how well AI coding agents can use your command-line tool, catch regressions in CI, and get PR comments with pass rates.
Quick Start
Option 1: Let your AI assistant set it up
Paste this into Claude Code, Cursor, or Codex:
Install
@cliwatch/cliglobally, then runcliwatch skillsto read the setup docs. Usecliwatch init --cito scaffold the benchmark config and a GitHub Actions workflow. Make sureCLIWATCH_API_KEYandAI_GATEWAY_API_KEYare set as GitHub secrets so results upload and I get PR comments.
Option 2: Set up manually
1. Create a task suite
Create a cli-bench.yaml in your project root (next to package.json). This is where cli-bench looks when you run it:
cli: mycli
version_command: "mycli --version"
tasks:
- id: show-help
intent: "Show the help information for mycli"
difficulty: easy
assert:
- exit_code: 0
- output_contains: "usage"
- id: create-project
intent: "Create a new project called my-app"
difficulty: medium
assert:
- exit_code: 0
- file_exists: "my-app/package.json"
See the full cli-bench.yaml Reference for all options.
2. Run the benchmark
npm install -g @cliwatch/cli-bench
export AI_GATEWAY_API_KEY="vck_..."
export CLIWATCH_API_KEY="cw_..."
# Run from the directory containing cli-bench.yaml
cli-bench --upload
All model calls go through the Vercel AI Gateway — one key for all providers. Create a CLIWatch API key at app.cliwatch.com.
3. Add to CI
Create .github/workflows/cliwatch.yml:
name: CLIWatch Benchmarks
on:
pull_request:
push:
branches: [main]
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 22
- run: npm install -g @cliwatch/cli-bench
- run: cli-bench
env:
AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
See the full GitHub Actions guide for caching, thresholds, and PR comments.
4. View results
Results appear at app.cliwatch.com with:
- Pass rate matrix — which tasks pass on which models
- Trend charts — track pass rates across releases
- PR comments — benchmark results posted on every pull request
What's Next?
- cli-bench.yaml Reference — Full config schema
- Assertions — All 10 assertion types
- Providers & Models — Supported LLMs and model IDs
- GitHub Actions — CI setup with thresholds and PR comments
- Writing Effective Tasks — Tips for good intents and test strategy
- Troubleshooting — Common issues and debugging