
GitHub Actions

Run CLIWatch benchmarks on every PR and push to main.

Basic Workflow

Create .github/workflows/cliwatch.yml:

```yaml
name: CLIWatch Benchmarks
on:
  pull_request:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
```

Environment Variables

Add these as repository secrets in Settings → Secrets and variables → Actions:

| Variable | Required | Description |
|---|---|---|
| CLIWATCH_API_KEY | Yes | Your CLIWatch API key, used to upload results |
| AI_GATEWAY_API_KEY | Yes | Vercel AI Gateway key, which provides access to all models |

All model calls go through the Vercel AI Gateway, so you only need one API key for all providers.

PR Comments

CLIWatch can post benchmark results as PR comments:

  1. Install the CLIWatch GitHub App on your repository
  2. The app automatically comments on PRs with benchmark results
  3. Comments show pass rates, regressions, and task-level details

Connect the GitHub App at app.cliwatch.com under Settings → GitHub.

Threshold-Based CI Gating

Use thresholds to fail CI when pass rates fall below a configured minimum:

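```yaml
# cli-bench.yaml
thresholds:
  default: 80        # minimum pass rate for models without their own entry
  tolerance: 5
  behavior: error    # fail the run when a threshold is violated
  models:            # per-model overrides of the default
    anthropic/claude-sonnet-4-20250514: 90
    openai/gpt-4o-mini: 70
```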

With this config, cli-bench exits with code 1 when any threshold is violated, which fails the CI job.
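To make the gating rule concrete, here is a small Python sketch of how per-model entries override the default and how a violation becomes exit code 1. It is an illustration, not cli-bench's implementation: the names Thresholds, violations, and exit_code are hypothetical, pass rates are assumed to be percentages, any behavior value other than error is assumed to only report, and tolerance is left out because this page does not define how it is applied.

```python
# Hypothetical sketch of threshold gating; not cli-bench's actual implementation.
# Assumes pass rates are percentages and leaves `tolerance` out entirely.
import sys
from dataclasses import dataclass, field


@dataclass
class Thresholds:
    """Mirrors the thresholds block of cli-bench.yaml, minus tolerance."""
    default: float
    behavior: str = "error"
    models: dict[str, float] = field(default_factory=dict)

    def for_model(self, model: str) -> float:
        # A per-model entry overrides the default threshold.
        return self.models.get(model, self.default)


def violations(pass_rates: dict[str, float], t: Thresholds) -> list[str]:
    """Describe every model whose pass rate is below its threshold."""
    messages = []
    for model, rate in sorted(pass_rates.items()):
        limit = t.for_model(model)
        if rate < limit:
            messages.append(f"{model}: {rate:.1f}% < {limit:.1f}%")
    return messages


def exit_code(pass_rates: dict[str, float], t: Thresholds) -> int:
    """Return 1 if any threshold is violated and behavior is 'error', else 0.

    Any other behavior value is treated as report-only (an assumption).
    """
    failed = violations(pass_rates, t)
    for message in failed:
        print("threshold violated:", message, file=sys.stderr)
    return 1 if failed and t.behavior == "error" else 0


if __name__ == "__main__":
    config = Thresholds(
        default=80,
        models={
            "anthropic/claude-sonnet-4-20250514": 90,
            "openai/gpt-4o-mini": 70,
        },
    )
    # Made-up pass rates for illustration; "example/other-model" is hypothetical.
    rates = {
        "anthropic/claude-sonnet-4-20250514": 88.0,  # below its override of 90
        "openai/gpt-4o-mini": 75.0,                  # above its override of 70
        "example/other-model": 82.0,                 # above the default of 80
    }
    # A real CLI would pass this value to sys.exit().
    print("exit code:", exit_code(rates, config))
```

With these made-up numbers, the Sonnet model's 88% clears the default of 80 but not its own override of 90, so the sketch reports one violation and prints "exit code: 1".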

With Build Step

If your CLI needs to be built before benchmarking:

```yaml
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run build
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
```

Tips

  • Run benchmarks on PRs to catch regressions before merge
  • Use upload: auto (the default): results are uploaded whenever CLIWATCH_API_KEY is set
  • Set concurrency: 1 in CI if your CLI modifies shared state
  • Use --dry-run to validate your config without making any LLM calls