Help Modes

Help modes control how the LLM learns about your CLI's capabilities. Different modes test different aspects of agent-readiness.

The Three Modes

`injected` (default)

The CLI's help text is injected into the LLM's prompt before each task.

help_modes:
  - injected

How it works: cli-bench runs `<cli> --help` (and subcommand help), captures the output, and includes the relevant sections in the prompt. For each task, it selects help text by matching task keywords, up to roughly 4K characters, so the LLM sees the help most relevant to that task.
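The exact selection algorithm is not documented here. As a rough illustration only, keyword-based selection under a character budget could look like the sketch below. The function name `select_help`, the `sections` mapping, and the scoring rule are all hypothetical and are not cli-bench's actual implementation.

```python
import re


def select_help(sections: dict[str, str], intent: str, budget: int = 4000) -> str:
    """Pick the help sections that best match the task intent, within a character budget.

    Hypothetical sketch: `sections` maps a command name to its captured help text.
    """
    # Lowercased words from the task intent; very short words are ignored.
    keywords = {w for w in re.findall(r"[a-z0-9]+", intent.lower()) if len(w) > 2}

    def score(item: tuple[str, str]) -> int:
        name, text = item
        haystack = f"{name} {text}".lower()
        # Count how many distinct keywords appear in the section.
        return sum(1 for kw in keywords if kw in haystack)

    # sorted() is stable, so sections with equal scores keep their original order.
    ranked = sorted(sections.items(), key=score, reverse=True)

    chosen, used = [], 0
    for _name, text in ranked:
        if used + len(text) > budget:
            continue  # skip any section that would overflow the budget
        chosen.append(text)
        used += len(text)
    # The budget counts section text only, not the blank lines used as separators.
    return "\n\n".join(chosen)
```

With a small budget, the best-matching sections are kept and anything that would overflow the limit is skipped, which is one plausible way to keep the prompt focused on the current task.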

Tests: Can the LLM use your CLI when given the documentation?

Best for: Initial benchmarking, CLIs with good --help output, testing if your help text is clear.

`discoverable`

The LLM must discover help on its own by running commands.

help_modes:
  - discoverable

How it works: The LLM only gets the CLI name and the task intent. It must run <cli> --help or explore on its own.

Tests: Can the LLM figure out your CLI without pre-loaded docs?

Best for: Testing CLI discoverability, measuring the cost of discovery (extra turns and tokens).

`none`

No help text is provided and the LLM is not prompted to discover it.

help_modes:
  - none

How it works: The LLM only gets the CLI name and the task intent. It relies on training data knowledge.

Tests: Does the LLM already know your CLI from training data?

Best for: Well-known CLIs (git, docker, npm), measuring baseline LLM knowledge.
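To make the difference between the three modes concrete, here is a minimal sketch of how a prompt might be assembled for each one. The function `build_prompt` and the prompt wording are assumptions made for illustration; the real prompts used by cli-bench are not shown in this page.

```python
def build_prompt(mode: str, cli: str, intent: str, help_text: str = "") -> str:
    """Assemble a task prompt for one help mode (hypothetical wording)."""
    # Every mode gets the CLI name and the task intent.
    lines = [f"CLI: {cli}", f"Task: {intent}"]
    if mode == "injected":
        # Help text is pre-loaded into the prompt.
        lines.append(f"Help text:\n{help_text}")
    elif mode == "discoverable":
        # No docs are included; the model is nudged to find them itself.
        lines.append(f"No documentation is provided. Run `{cli} --help` or explore to learn the CLI.")
    elif mode != "none":
        raise ValueError(f"unknown help mode: {mode}")
    # In `none` mode nothing is added: the model relies on prior knowledge.
    return "\n".join(lines)
```

The only difference between `discoverable` and `none` in this sketch is the nudge to look for help, which mirrors the descriptions above.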

Using Multiple Modes

List more than one mode to compare them:

help_modes:
  - injected
  - discoverable

Each mode runs all tasks independently. Results are tagged by help mode in the dashboard, so you can see:

- How much your help text improves pass rates (injected vs. none)
- How discoverable your CLI is (discoverable vs. injected)
- How well-known your CLI is (none vs. injected)
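If you export results and want to compute these comparisons yourself, a sketch along the following lines may help. The record shape (a `help_mode` string and a `passed` boolean per task run) and the names `pass_rates` and `deltas` are assumptions for illustration, not a documented cli-bench export format.

```python
from collections import defaultdict


def pass_rates(results: list[dict]) -> dict[str, float]:
    """Compute the pass rate per help mode from hypothetical result records."""
    totals = defaultdict(lambda: [0, 0])  # mode -> [passed, total]
    for r in results:
        totals[r["help_mode"]][0] += int(r["passed"])
        totals[r["help_mode"]][1] += 1
    return {mode: passed / total for mode, (passed, total) in totals.items()}


def deltas(rates: dict[str, float]) -> dict[str, float]:
    """Differences between modes; only computed for the modes that were run."""
    out = {}
    if "injected" in rates and "none" in rates:
        # How much the help text adds over the model's prior knowledge.
        out["help_text_gain"] = rates["injected"] - rates["none"]
    if "injected" in rates and "discoverable" in rates:
        # How much is lost when the model has to find the help itself.
        out["discovery_gap"] = rates["injected"] - rates["discoverable"]
    return out
```

A large `help_text_gain` suggests the help text is doing real work; a small `discovery_gap` suggests the CLI is easy to explore without pre-loaded docs.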

Impact on Turns

| Mode | Typical Extra Turns | Notes |
|---|---|---|
| `injected` | 0 | Help is pre-loaded |
| `discoverable` | 1-3+ | LLM runs help commands first |
| `none` | 0 | LLM uses prior knowledge |

Discovery turns count toward `max_turns`, so raise the limit for discoverable mode. As a rule of thumb, add 2-3 turns to the value you would set for injected.

tasks:
  - id: complex-task
    intent: "Do something complex"
    max_turns: 10       # e.g., 7 for injected + 3 for discovery
    assert:
      - exit_code: 0
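The same rule of thumb can be written as a small helper if you generate task files programmatically. This is only a sketch: `turn_budget` and its `discovery_allowance` parameter are hypothetical names, and the default of 3 simply follows the 2-3 turn guidance above.

```python
def turn_budget(mode: str, base_turns: int, discovery_allowance: int = 3) -> int:
    """Suggest a max_turns value for a help mode, given the budget used for injected."""
    if mode not in {"injected", "discoverable", "none"}:
        raise ValueError(f"unknown help mode: {mode}")
    # Only discoverable mode spends extra turns running help commands.
    extra = discovery_allowance if mode == "discoverable" else 0
    return base_turns + extra
```

For the task above, a base of 7 turns with an allowance of 3 gives the `max_turns: 10` shown in the example.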

Recommendations

1. Start with injected: get your tasks and assertions right first
2. Add discoverable: see if your CLI is self-documenting
3. Try none for popular CLIs: measure baseline knowledge
4. Compare modes: the delta tells you how good your docs are
