# GitHub Actions

Run CLIWatch benchmarks on every PR and push to main.
## Basic Workflow

Create `.github/workflows/cliwatch.yml`:
```yaml
name: CLIWatch Benchmarks

on:
  pull_request:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
```
## Environment Variables

Add these as repository secrets in **Settings → Secrets and variables → Actions**:

| Variable | Required | Description |
|---|---|---|
| `CLIWATCH_API_KEY` | Yes | Your CLIWatch API key for uploading results |
| `AI_GATEWAY_API_KEY` | Yes | Vercel AI Gateway key (provides access to all models) |
All model calls go through the Vercel AI Gateway, so you only need one API key for all providers.
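Before wiring the secrets into CI, it can help to verify the same two variables locally. A minimal sketch (the key values below are hypothetical placeholders, not real formats):

```shell
# Hypothetical placeholder values -- substitute your real keys.
export AI_GATEWAY_API_KEY="vck-example-key"
export CLIWATCH_API_KEY="cw-example-key"

# Fail fast with a clear message if either key is unset or empty,
# before spending any time invoking cli-bench.
: "${AI_GATEWAY_API_KEY:?AI_GATEWAY_API_KEY is not set}"
: "${CLIWATCH_API_KEY:?CLIWATCH_API_KEY is not set}"
echo "keys configured"
```

The `: "${VAR:?message}"` idiom aborts the script with the message when the variable is missing, which surfaces misconfigured secrets immediately rather than partway through a run.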
## PR Comments via GitHub App

CLIWatch can automatically post eval results as comments on your pull requests.
### Setup

1. Go to app.cliwatch.com, then **Settings → General**
2. Click **Connect GitHub** and authorize the CLIWatch GitHub App
3. Select the repositories you want to enable

Once connected, the app posts a comment on every PR that triggers an eval run.
### What PR Comments Show

- **Pass rate summary**: overall pass rate for the run, with comparison to the main branch
- **Regressions**: tasks that passed on main but fail on the PR branch
- **Per-model breakdown**: pass rates for each model tested
- **Link to dashboard**: click through to the full run detail with conversation traces

PR comments update automatically when you push new commits to the same PR.
## Threshold-Based CI Gating

Use thresholds to fail CI when pass rates drop:
```yaml
# cli-bench.yaml
thresholds:
  default: 80
  tolerance: 5
  behavior: error
  models:
    anthropic/claude-sonnet-4.6: 90
    google/gemini-3-flash: 70
```
With this config, `cli-bench` exits with code 1 when any threshold is violated, which fails the CI job.
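As an illustration of how a gate like this might evaluate (a sketch only; the exact interaction of `tolerance` with per-model thresholds is assumed here, not confirmed by the config reference):

```shell
# Assumed semantics: a pass rate violates its threshold when it falls
# more than `tolerance` points below it.
threshold=80
tolerance=5
pass_rate=74

if [ "$pass_rate" -lt "$((threshold - tolerance))" ]; then
  result="violated"   # 74 < 80 - 5 = 75 -> CI fails
else
  result="ok"
fi
echo "$result"
```

Under this reading, a run at 76% would still pass against `default: 80` with `tolerance: 5`, while 74% would fail the job.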
## With Build Step

If your CLI needs to be built before benchmarking:
```yaml
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run build
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
```
## PR Comments via `--github-comment`

If you prefer a lightweight approach without installing the GitHub App, use `--github-comment` to write a markdown summary to a file, then post it with `gh`:
```yaml
jobs:
  benchmark:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench --github-comment pr-comment.md
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
      - name: Post PR comment
        if: github.event_name == 'pull_request' && hashFiles('pr-comment.md') != ''
        run: gh pr comment ${{ github.event.pull_request.number }} --body-file pr-comment.md
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
The `--github-comment` flag is skipped in `--dry-run` mode.
## Tips

- Run benchmarks on PRs to catch regressions before merge
- Use `upload: auto` (the default): results upload when `CLIWATCH_API_KEY` is set
- Set `concurrency: 1` in CI if your CLI modifies shared state
- Use `--dry-run` to test your config without running the LLM
- Use `--tags smoke` to run a fast subset of tasks in CI
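Pulling these tips together, a CI-oriented config might look like the following sketch (assuming `upload` and `concurrency` are top-level keys alongside `thresholds`, as the tips above suggest):

```yaml
# cli-bench.yaml
upload: auto      # default: upload results when CLIWATCH_API_KEY is set
concurrency: 1    # serialize tasks if the CLI mutates shared state
thresholds:
  default: 80
  behavior: error
```

With this in place, a fast PR gate could invoke `cli-bench --tags smoke`, reserving the full suite for pushes to main.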