Help Modes
Help modes control how the LLM learns about your CLI's capabilities. Different modes test different aspects of agent-readiness.
The Three Modes
injected (default)
The CLI's help text is injected into the LLM's prompt before each task.
help_modes:
- injected
How it works: cli-bench runs <cli> --help (and subcommand help), captures the output, and includes the relevant sections in the prompt. Sections are selected by matching task keywords, up to about 4K characters of help text, so the LLM sees the help most relevant to each task.
Tests: Can the LLM use your CLI when given the documentation?
Best for: Initial benchmarking, CLIs with good --help output, testing if your help text is clear.
discoverable
The LLM must discover help on its own by running commands.
help_modes:
- discoverable
How it works: The LLM only gets the CLI name and the task intent. It must run <cli> --help or explore on its own.
Tests: Can the LLM figure out your CLI without pre-loaded docs?
Best for: Testing CLI discoverability, measuring the cost of discovery (extra turns and tokens).
none
No help text is provided and the LLM is not prompted to discover it.
help_modes:
- none
How it works: The LLM only gets the CLI name and the task intent. It relies on training data knowledge.
Tests: Does the LLM already know your CLI from training data?
Best for: Well-known CLIs (git, docker, npm), measuring baseline LLM knowledge.
Using Multiple Modes
List more than one mode to compare them in a single run:
help_modes:
- injected
- discoverable
Each mode runs all tasks independently. Results are tagged by help mode in the dashboard, so you can see:
- How much your help text improves pass rates (injected vs. none)
- How discoverable your CLI is (discoverable vs. injected)
- How well-known your CLI is (none vs. injected)
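Two of the comparisons above involve none, so covering all three in one run would mean listing every mode. A minimal sketch, using only the mode names defined on this page:

```yaml
help_modes:
  - injected      # help text pre-loaded into the prompt
  - discoverable  # LLM must run help commands itself
  - none          # LLM relies on training data only
```

Because each mode runs all tasks independently, listing three modes means every task runs three times.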
Impact on Turns
| Mode | Typical Extra Turns | Notes |
|---|---|---|
| injected | 0 | Help is pre-loaded |
| discoverable | 1-3+ | LLM runs help commands first |
| none | 0 | LLM uses prior knowledge |
Discovery turns count toward max_turns, so increase it for discoverable mode. A good rule of thumb: add 2-3 turns on top of what you'd set for injected.
tasks:
  - id: complex-task
    intent: "Do something complex"
    max_turns: 10  # e.g., 7 for injected + 3 for discovery
    assert:
      - exit_code: 0
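When one run includes both injected and discoverable, the turn budget likely has to fit the more demanding mode. The sketch below rests on two assumptions this page does not confirm: that help_modes and tasks sit in the same config file, and that max_turns is set once per task and shared by every mode. The task itself is hypothetical.

```yaml
help_modes:
  - injected
  - discoverable

tasks:
  - id: complex-task              # hypothetical task
    intent: "Do something complex"
    max_turns: 10                 # sized for discoverable: 7 + 3 discovery turns
    assert:
      - exit_code: 0
```

Under those assumptions, the injected run simply leaves the extra turns unused, while the discoverable run has room to read help before acting.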
Recommendations
- Start with injected: get your tasks and assertions right first
- Add discoverable: see if your CLI is self-documenting
- Try none for popular CLIs: measure baseline knowledge
- Compare modes: the delta tells you how good your docs are