GitHub Actions

Run CLIWatch benchmarks on every PR and push to main.

Basic Workflow

Create .github/workflows/cliwatch.yml:

name: CLIWatch Benchmarks
on:
  pull_request:
  push:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}

Environment Variables

Add these as repository secrets in Settings → Secrets and variables → Actions:

Variable             Required   Description
CLIWATCH_API_KEY     Yes        Your CLIWatch API key for uploading results
AI_GATEWAY_API_KEY   Yes        Vercel AI Gateway key (provides access to all models)

All model calls go through the Vercel AI Gateway, so you only need one API key for all providers.
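For a local run outside CI, the same two variables can be provided as environment variables. A minimal sketch (the key values are placeholders):

```shell
# Provide the same secrets locally as environment variables (placeholder values)
export AI_GATEWAY_API_KEY="your-gateway-key"
export CLIWATCH_API_KEY="your-cliwatch-key"
# then run the benchmarks:
# cli-bench
```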

PR Comments via GitHub App

CLIWatch can automatically post eval results as comments on your pull requests.

Setup

  1. Go to app.cliwatch.com, then Settings → General
  2. Click Connect GitHub and authorize the CLIWatch GitHub App
  3. Select the repositories you want to enable

Once connected, the app posts a comment on every PR that triggers an eval run.

What PR Comments Show

  • Pass rate summary: overall pass rate for the run, with comparison to main branch
  • Regressions: tasks that passed on main but fail on the PR branch
  • Per-model breakdown: pass rates for each model tested
  • Link to dashboard: click through to the full run detail with conversation traces

PR comments update automatically if you push new commits to the same PR.

Threshold-Based CI Gating

Use thresholds to fail CI when pass rates drop:

# cli-bench.yaml
thresholds:
  default: 80
  tolerance: 5
  behavior: error
  models:
    anthropic/claude-sonnet-4.6: 90
    google/gemini-3-flash: 70

With this config, behavior: error makes the CI job exit with code 1 whenever a threshold is violated.
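The gating arithmetic can be sketched in a few lines of shell. This is an illustration of the threshold/tolerance idea described above, not cli-bench internals; it assumes tolerance is subtracted from the threshold to give the effective floor:

```shell
# Illustrative gating check: a run fails when pass_rate < threshold - tolerance.
# With threshold 90 and tolerance 5, any pass rate below 85 violates the gate.
pass_rate=87
threshold=90
tolerance=5
floor=$((threshold - tolerance))
if [ "$pass_rate" -lt "$floor" ]; then
  echo "threshold violated"
  exit 1
fi
echo "within tolerance"
```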

With Build Step

If your CLI needs to be built before benchmarking:

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: npm
      - run: npm ci
      - run: npm run build
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}

PR Comment via --github-comment

If you prefer a lightweight approach without installing the GitHub App, use --github-comment to write a markdown summary to a file, then post it with gh:

jobs:
  benchmark:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm install -g @cliwatch/cli-bench
      - run: cli-bench --github-comment pr-comment.md
        env:
          AI_GATEWAY_API_KEY: ${{ secrets.AI_GATEWAY_API_KEY }}
          CLIWATCH_API_KEY: ${{ secrets.CLIWATCH_API_KEY }}
      - name: Post PR comment
        if: github.event_name == 'pull_request' && hashFiles('pr-comment.md') != ''
        run: gh pr comment ${{ github.event.pull_request.number }} --body-file pr-comment.md
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}

The --github-comment flag is skipped in --dry-run mode.
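Since --dry-run does not call any model, it also works as a cheap config check. A sketch of a standalone validation job, assuming dry-run mode needs no API keys:

```yaml
jobs:
  validate-config:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm install -g @cliwatch/cli-bench
      # Validates the benchmark config without running the LLM
      - run: cli-bench --dry-run
```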

Tips

  • Run benchmarks on PRs to catch regressions before merge
  • Keep upload: auto (the default): results are uploaded whenever CLIWATCH_API_KEY is set
  • Set concurrency: 1 in CI if your CLI modifies shared state
  • Use --dry-run to test your config without running the LLM
  • Use --tags smoke to run a fast subset of tasks in CI
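The config-side tips above can be combined in a small cli-bench.yaml fragment (illustrative; only the keys named in the tips are assumed to exist):

```yaml
# cli-bench.yaml (illustrative fragment)
concurrency: 1   # serialize task runs if the CLI under test touches shared state
upload: auto     # upload results whenever CLIWATCH_API_KEY is present (the default)
```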