GitHub Actions
Run CLIWatch benchmarks on every PR and push to main.
Basic Workflow
Create `.github/workflows/cliwatch.yml`:
```yaml
name: CLIWatch Benchmarks

on:
  pull_request:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
```
Environment Variables
Add these as repository secrets in Settings → Secrets and variables → Actions:
| Variable | Required | Description |
|---|---|---|
| `CLIWATCH_API_KEY` | Yes | Your CLIWatch API key for uploading results |
| `AI_GATEWAY_API_KEY` | Yes | Vercel AI Gateway key, which provides access to all models |
All model calls go through the Vercel AI Gateway, so you only need one API key for all providers.
PR Comments
CLIWatch can post benchmark results as PR comments:
- Install the CLIWatch GitHub App on your repository
- The app automatically comments on PRs with benchmark results
- Comments show pass rates, regressions, and task-level details
Connect the GitHub App at app.cliwatch.com under Settings → GitHub.
Threshold-Based CI Gating
Use thresholds to fail CI when pass rates drop:
```yaml
# cli-bench.yaml
thresholds:
  default: 80
  tolerance: 5
  behavior: error
  models:
    anthropic/claude-sonnet-4-20250514: 90
    openai/gpt-4o-mini: 70
```
With this config, `cli-bench` exits with code 1 if any threshold is violated, which fails the CI job.
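If you would rather see threshold violations in the logs without blocking merges (for example, while you are still tuning the numbers), one option is GitHub Actions' own step-level `continue-on-error` key. This is a GitHub Actions feature, not a CLIWatch setting; the sketch below assumes the basic workflow's `cli-bench` step:

```yaml
      - run: cli-bench
        # GitHub Actions feature (not a cli-bench option): let the job succeed
        # even when cli-bench exits with code 1 on a threshold violation.
        continue-on-error: true
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
```

GitHub records the step's outcome as a failure but reports the job as successful, so remove the key once your thresholds are stable and you want them to gate merges.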
With Build Step
If your CLI needs to be built before benchmarking:
```yaml
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run build
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
```
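Because `cli-bench --dry-run` tests your config without running the LLM, you could add it as a separate step ahead of the full run so config mistakes fail fast, before any model calls are made. A minimal sketch, assuming a dry run needs no API keys (if it reports missing keys, give it the same `env` block):

```yaml
    steps:
      # ...checkout, setup-node, npm ci, npm run build, and the cli-bench install as in the build-step workflow...
      # Validate cli-bench.yaml first; this step makes no LLM calls.
      - run: cli-bench --dry-run
      # Full benchmark run, which needs both secrets.
      - run: cli-bench
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
```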
Tips
- Run benchmarks on PRs to catch regressions before merge
- Use `upload: auto` (the default): results upload when `CLIWATCH_API_KEY` is set
- Set `concurrency: 1` in CI if your CLI modifies shared state
- Use `--dry-run` to test your config without running the LLM
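The `upload` and `concurrency` settings from these tips might look like this in `cli-bench.yaml`. This is a sketch that assumes both are top-level keys alongside `thresholds`; the exact placement is an assumption, so check it against the cli-bench config reference:

```yaml
# cli-bench.yaml (sketch: assumes upload and concurrency are top-level keys)
upload: auto     # the default; results upload when CLIWATCH_API_KEY is set
concurrency: 1   # assumed to limit parallel benchmark tasks to one, for CLIs that modify shared state
```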