Context Modes

Context modes control what information the LLM receives about your CLI before attempting a task.

Zero-Shot (Default)

By default, CLIWatch uses zero-shot prompting. The LLM receives only the CLI name and the task intent, then relies on its training knowledge to figure out the right commands.

What the LLM sees:

CLI: mycli

Task: Create a new project called my-app

This tests the most realistic scenario: an AI coding agent encountering your CLI without special preparation. If a model already knows your CLI well, it is likely to succeed. If not, expect lower pass rates and higher turn counts as the agent experiments.
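The Python sketch below makes the zero-shot setup concrete. It builds the task message shown above from only the CLI name and the intent, with no help text or documentation added. The function name build_zero_shot_prompt and the exact string layout are assumptions for illustration. CLIWatch's internal prompt format may differ.

```python
# Hypothetical sketch: assemble a zero-shot task message.
# Assumption: only the CLI name and the task intent are sent; no docs or --help output.


def build_zero_shot_prompt(cli_name: str, intent: str) -> str:
    """Return the task message containing just the CLI name and the intent."""
    return f"CLI: {cli_name}\n\nTask: {intent}"


if __name__ == "__main__":
    print(build_zero_shot_prompt("mycli", "Create a new project called my-app"))
```

Because nothing else is included, any knowledge of mycli's subcommands and flags has to come from the model itself.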

Custom System Prompts

You can give the LLM additional context via the system_prompt field in your config. This is appended to the default agent prompt.

system_prompt: |
  This CLI requires authentication. Run 'mycli auth login --token test' first.
  All commands output JSON by default.

Use this for:

Authentication instructions the agent needs to follow
Environment-specific context (e.g., "the database is already running")
Clarifications about non-obvious CLI behavior

The system prompt applies to all tasks in the suite. Keep it concise; the agent still needs to figure out the commands on its own.
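The Python sketch below illustrates two properties described above. First, the custom text is appended to a default prompt rather than replacing it. Second, the same combined prompt is reused for every task in the suite. DEFAULT_AGENT_PROMPT, the blank-line separator, and both function names are hypothetical placeholders, not CLIWatch's actual prompt or API.

```python
from typing import Optional

# Placeholder text: CLIWatch's real default agent prompt is not shown here.
DEFAULT_AGENT_PROMPT = "You are an agent that completes tasks using a CLI."


def build_system_prompt(custom: Optional[str]) -> str:
    """Append the suite's system_prompt, if set, after the default prompt."""
    if custom is None or not custom.strip():
        return DEFAULT_AGENT_PROMPT
    # YAML literal blocks (|) keep a trailing newline, so strip surrounding whitespace.
    return f"{DEFAULT_AGENT_PROMPT}\n\n{custom.strip()}"


def prompts_for_suite(custom: Optional[str], intents: list[str]) -> list[tuple[str, str]]:
    """Pair every task intent with the same system prompt, since it is suite-wide."""
    system = build_system_prompt(custom)
    return [(system, intent) for intent in intents]


if __name__ == "__main__":
    custom = (
        "This CLI requires authentication. Run 'mycli auth login --token test' first.\n"
        "All commands output JSON by default.\n"
    )
    intents = ["Create a new project called my-app", "Change the database port to 5433"]
    for system, intent in prompts_for_suite(custom, intents):
        print(intent, "->", system.endswith("All commands output JSON by default."))
```

Every task receives identical extra context in this sketch. That is why a long or task-specific system_prompt would affect the whole suite.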

Scaffolds for Pre-Loaded Context

If your tasks need files, configs, or a project structure to exist before the agent starts, use scaffolds instead of context modes. A scaffold is a directory that is copied into the task's working directory before execution, so the agent starts with existing files to work with.

scaffold: scaffolds/starter

tasks:
  - id: modify-config
    intent: "Change the database port to 5433"
    assert:
      - file_contains:
          path: "config.yaml"
          text: "port: 5433"
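Below is a minimal Python sketch of the scaffold flow. It assumes that a scaffold is a plain directory whose contents are copied into a fresh working directory. It also assumes that file_contains passes when the named file includes the given text as a substring. prepare_workdir, the config.yaml contents, and the simulated edit are illustrative only. In a real run the agent would change the file by invoking your CLI.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_workdir(scaffold_dir: Path) -> Path:
    """Copy the scaffold's contents into a fresh temporary working directory."""
    workdir = Path(tempfile.mkdtemp(prefix="task-"))
    # mkdtemp already created workdir, so allow copying into an existing directory.
    shutil.copytree(scaffold_dir, workdir, dirs_exist_ok=True)
    return workdir


def file_contains(workdir: Path, path: str, text: str) -> bool:
    """Sketch of a file_contains check: the file exists and contains `text`."""
    target = workdir / path
    return target.is_file() and text in target.read_text()


if __name__ == "__main__":
    # Throwaway stand-in for scaffolds/starter (contents are hypothetical).
    scaffold = Path(tempfile.mkdtemp(prefix="scaffold-"))
    (scaffold / "config.yaml").write_text("database:\n  port: 5432\n")

    workdir = prepare_workdir(scaffold)
    print(file_contains(workdir, "config.yaml", "port: 5433"))  # False: not changed yet

    # Simulated agent edit; a real task run would invoke the CLI instead.
    cfg = workdir / "config.yaml"
    cfg.write_text(cfg.read_text().replace("port: 5432", "port: 5433"))
    print(file_contains(workdir, "config.yaml", "port: 5433"))  # True

    # Clean up both temporary directories.
    shutil.rmtree(workdir)
    shutil.rmtree(scaffold)
```

In this sketch, each task gets its own copy of the scaffold. Edits made during one task therefore never touch the scaffold directory or another task's files.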

Tips

Start with zero-shot. It measures what matters most: can an agent use your CLI out of the box?
Use system_prompt sparingly. Only include information the agent genuinely needs (auth setup, environment assumptions). Do not paste your entire docs into it.
Compare across models. Different models have different training data, so a CLI that scores well with one model may score poorly with another. Run your suite against models from multiple providers to get a balanced picture.
