Context Modes
Context modes control what information the LLM receives about your CLI before attempting a task.
Zero-Shot (Default)
CLIWatch uses zero-shot prompting. The LLM receives only the CLI name and the task intent, then relies on its training knowledge to figure out the right commands.
What the LLM sees:
CLI: mycli
Task: Create a new project called my-app
This tests the most realistic scenario: an AI coding agent encountering your CLI without special preparation. If a model already knows your CLI well, it will succeed. If not, you will see lower pass rates and higher turn counts as the agent experiments.
Custom System Prompts
You can give the LLM additional context via the system_prompt field in your config. This is appended to the default agent prompt.
system_prompt: |
This CLI requires authentication. Run 'mycli auth login --token test' first.
All commands output JSON by default.
Use this for:
- Authentication instructions the agent needs to follow
- Environment-specific context (e.g., "the database is already running")
- Clarifications about non-obvious CLI behavior
The system prompt applies to all tasks in the suite. Keep it concise; the agent still needs to figure out the commands on its own.
Scaffolds for Pre-Loaded Context
If your tasks need files, configs, or project structure to already exist, use scaffolds instead of context modes. Scaffolds copy a directory into the task's working directory before execution, giving the agent something to work with.
scaffold: scaffolds/starter
tasks:
- id: modify-config
intent: "Change the database port to 5433"
assert:
- file_contains:
path: "config.yaml"
text: "port: 5433"
Tips
- Start with zero-shot. It measures what matters most: can an agent use your CLI out of the box?
- Use
system_promptsparingly. Only include information the agent genuinely needs (auth setup, environment assumptions). Do not paste your entire docs into it. - Compare across models. Different models have different training data. A CLI that scores well with one model may score poorly with another. Use multiple providers to get a balanced picture.