Help Modes

Help modes control how the LLM learns about your CLI's capabilities. Different modes test different aspects of agent-readiness.

The Three Modes

injected (default)

The CLI's help text is injected into the LLM's prompt before each task.

help_modes:
- injected

How it works: cli-bench runs <cli> --help (and subcommand help), captures the output, and includes relevant sections in the prompt. Help text is selected based on task keywords (up to ~4K characters), so the LLM sees the most relevant help for each task.
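The keyword-based selection described above could look roughly like the sketch below. This is not cli-bench's actual implementation: the function name select_help, the section dictionary, and the word-overlap scoring are all assumptions made for illustration. Only the idea of ranking help sections by task keywords under a budget of about 4K characters comes from the description.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_help(sections: dict[str, str], intent: str, budget: int = 4000) -> str:
    """Hypothetical selector: keep the help sections that share the most
    words with the task intent, without exceeding a character budget."""
    keywords = _words(intent)

    # Score each section by how many intent words appear in its name or body.
    scored = [
        (len(keywords & _words(name + " " + text)), text)
        for name, text in sections.items()
    ]
    # sorted() is stable, so equally scored sections keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)

    chosen: list[str] = []
    used = 0
    for score, text in scored:
        if score == 0:
            break  # remaining sections share no keyword with the intent
        cost = len(text) + (2 if chosen else 0)  # 2 = blank-line separator
        if used + cost > budget:
            continue  # too large for the remaining budget; try a smaller one
        chosen.append(text)
        used += cost
    return "\n\n".join(chosen)
```

With a scheme like this, a task whose intent mentions "deploy" would pull in the deploy subcommand's help and leave unrelated subcommands out of the prompt, so intents that use the CLI's own vocabulary tend to get better-matched help.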

Tests: Can the LLM use your CLI when given the documentation?

Best for: Initial benchmarking, CLIs with good --help output, testing if your help text is clear.

discoverable

The LLM must discover help on its own by running commands.

help_modes:
- discoverable

How it works: The LLM only gets the CLI name and the task intent. It must run <cli> --help or explore on its own.

Tests: Can the LLM figure out your CLI without pre-loaded docs?

Best for: Testing CLI discoverability, measuring the cost of discovery (extra turns and tokens).

none

No help text is provided and the LLM is not prompted to discover it.

help_modes:
- none

How it works: The LLM only gets the CLI name and the task intent. It relies on training data knowledge.

Tests: Does the LLM already know your CLI from training data?

Best for: Well-known CLIs (git, docker, npm), measuring baseline LLM knowledge.

Using Multiple Modes

Test with multiple modes to compare:

help_modes:
- injected
- discoverable

Each mode runs every task independently. Results are tagged by help mode in the dashboard, so with all three modes enabled you can see:

  • How much your help text improves pass rates (injected vs. none)
  • How discoverable your CLI is (discoverable vs. injected)
  • How well-known your CLI already is (the none pass rate, with injected as the reference)
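To make these comparisons concrete, here is a minimal sketch that computes per-mode pass rates and the difference between two modes. The input shape (a list of (help_mode, passed) pairs) is an assumption for illustration; the source does not describe how cli-bench exports results, so adapt the loading step to whatever your dashboard or result files provide.

```python
from collections import defaultdict


def pass_rates(results):
    """Map each help mode to its pass rate.

    `results` is a hypothetical iterable of (help_mode, passed) pairs,
    one pair per task run.
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for mode, passed in results:
        totals[mode] += 1
        passes[mode] += int(passed)
    return {mode: passes[mode] / totals[mode] for mode in totals}


def delta(rates, mode_a, mode_b):
    """Pass-rate difference mode_a - mode_b (positive means mode_a did better)."""
    return rates[mode_a] - rates[mode_b]


# Made-up example data: four task runs per mode.
example = [
    ("injected", True), ("injected", True), ("injected", False), ("injected", True),
    ("none", True), ("none", False), ("none", False), ("none", False),
]
rates = pass_rates(example)
help_text_gain = delta(rates, "injected", "none")
```

In this made-up data, injected passes 3 of 4 runs and none passes 1 of 4, so help_text_gain is 0.5: the help text accounts for a 50-point improvement in pass rate.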

Impact on Turns

Mode          Typical Extra Turns   Notes
injected      0                     Help is pre-loaded
discoverable  1-3+                  LLM runs help commands first
none          0                     LLM uses prior knowledge

Discovery turns count toward max_turns, so increase it for discoverable mode. A good rule of thumb: add 2-3 turns on top of what you'd set for injected.

tasks:
- id: complex-task
  intent: "Do something complex"
  max_turns: 10 # e.g., 7 for injected + 3 for discovery
  assert:
  - exit_code: 0
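The arithmetic behind that max_turns value can be written as a small helper. This is an illustrative sketch, not part of cli-bench: the DISCOVERY_ALLOWANCE table and the max_turns_for function are hypothetical names, and the allowance of 3 for discoverable is taken from the upper end of the 2-3 turn rule of thumb. Because max_turns is set once per task, the helper returns a single value large enough for the most demanding mode in your help_modes list.

```python
# Extra-turn allowance per help mode. The discoverable value is an assumption
# based on the 2-3 turn rule of thumb; tune it for your CLI.
DISCOVERY_ALLOWANCE = {"injected": 0, "discoverable": 3, "none": 0}


def max_turns_for(base_turns: int, help_modes: list[str]) -> int:
    """Return one max_turns value that covers every listed help mode.

    `base_turns` is what the task needs when help is already in the prompt.
    """
    return base_turns + max(DISCOVERY_ALLOWANCE[mode] for mode in help_modes)


budget = max_turns_for(7, ["injected", "discoverable"])
```

Here budget is 10, matching the 7 + 3 split in the task above; with only injected listed, the same call returns 7.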

Recommendations

  1. Start with injected — get your tasks and assertions right first
  2. Add discoverable — see if your CLI is self-documenting
  3. Try none for popular CLIs — measure baseline knowledge
  4. Compare modes — the delta tells you how good your docs are