# Dynobox Documentation Corpus

---

This file concatenates the public Dynobox documentation into one plain-text corpus for agent ingestion.

---

# Page: Dynobox Docs

Canonical URL: https://docs.dynobox.xyz/
Markdown URL: https://docs.dynobox.xyz/README.md
Source: https://github.com/dynobox/dynobox/blob/main/docs/README.md
Topics: overview, agent testing, harnesses, assertions, config formats

# Dynobox Docs

Dynobox is a local test runner for agent and skill workflows. You describe a
task, choose one or more local agent harnesses, and assert on observable
behavior such as tool calls, shell commands, files in the sandbox, transcripts,
HTTP requests, and final messages.

Dynobox is useful when you want repeatable checks for agent behavior before
shipping a prompt, skill, or workflow change.

## Start Here

- [Getting Started](./getting-started.md): install the CLI, scaffold a dyno,
  and run your first scenario.
- [Config Authoring](./config-authoring.md): write JavaScript, TypeScript, or
  YAML dynos with the `@dynobox/sdk` helpers.
- [CLI Reference](./cli.md): commands, flags, output modes, JSON reports, and
  exit behavior.
- [CI Integration](./ci.md): run Dynobox in GitHub Actions and publish JSON
  reports as build artifacts.

## Agent Resources

The docs site publishes agent-oriented entry points for retrieval and indexing:

- [`llms.txt`](https://docs.dynobox.xyz/llms.txt): concise docs map with
  canonical HTML, markdown, source, package, and command references.
- [`llms-full.txt`](https://docs.dynobox.xyz/llms-full.txt): the full public
  docs corpus as one plain-text file.
- [`docs-index.json`](https://docs.dynobox.xyz/docs-index.json):
  machine-readable page metadata, topics, headings, and canonical URLs.
- Raw markdown pages such as
  [`getting-started.md`](https://docs.dynobox.xyz/getting-started.md) for
  direct ingestion without HTML parsing.

## What Dynobox Tests

Dynobox runs each scenario in an isolated temporary work directory. Setup
commands create the fixture, the selected harness performs the task, and
assertions evaluate what happened.

You can assert:

- Tool calls, including expected and prohibited shell commands.
- Skill instruction loading with `skill.invoked(...)`.
- Ordered tool-call sequences.
- Files present inside the scenario work directory.
- Harness transcript and final-message text.
- HTTP requests made by local child-process tools that honor proxy environment
  variables.

## Supported Harnesses

Dynobox currently runs local scenarios through:

- Claude Code via the `claude` executable.
- OpenAI Codex via the `codex` executable.

Each harness must already be installed, authenticated, and available on
`PATH`.

## Supported Config Formats

Dynobox discovers `*.dyno.{mjs,js,ts,mts,yaml,yml}` files recursively when you
run a directory. Explicit file paths can use non-`*.dyno.*` names, such as
`dynobox.config.ts`, as long as they are loadable Dynobox configs.

Supported authoring formats:

- TypeScript or JavaScript with `defineDyno(...)` from `@dynobox/sdk`.
- YAML with the same `type`-discriminated assertion objects that SDK helpers
  return.

CommonJS config files (`.cjs` and `.cts`) are not supported because the SDK is
ESM-only.

## Current Limits

Dynobox is under active development and is currently focused on local
execution. These areas are not complete yet:

- HTTP capture for harness-native web tools and binaries that ignore proxy/CA
  environment variables.
- Hosted or remote runner execution.
- Rich multi-iteration controls from authored configs.

---

# Page: Getting Started

Canonical URL: https://docs.dynobox.xyz/getting-started/
Markdown URL: https://docs.dynobox.xyz/getting-started.md
Source: https://github.com/dynobox/dynobox/blob/main/docs/getting-started.md
Topics: install, init, run, harnesses, debug

# Getting Started

This guide gets you from an empty project to one passing Dynobox run.

Dynobox tests live in `*.dyno.*` files. A dyno describes a prompt, optional
setup commands, one or more harnesses, and assertions about what the harness did
while completing the task.

## Prerequisites

- Node.js 22 or newer.
- At least one supported local harness:
  - `claude` for Claude Code.
  - `codex` for OpenAI Codex.

The selected harness must be installed, authenticated, and available on `PATH`.

## Install

Install the CLI:

```bash
npm install -g dynobox
```

Check that it is available:

```bash
dynobox --help
```

## Create Your First Dyno

Use `dynobox init` to scaffold a starter scenario:

```bash
dynobox init
```

This writes `dynobox/example.dyno.mjs`. Run it with:

```bash
dynobox run
```

By default, `dynobox run` discovers every `*.dyno.{mjs,js,ts,mts,yaml,yml}`
file under the current directory.

## Choose A Harness

Each dyno can declare its own harness list. You can also override harnesses at
runtime:

```bash
dynobox run --harness claude-code
dynobox run --harness codex
dynobox run --harness claude-code,codex
```

If neither the config nor the CLI selects a harness, Dynobox defaults to
`claude-code`.
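The precedence above can be sketched as a small resolver: `--harness` values win, then the dyno's own list, then `claude-code`. This is a hypothetical illustration, not Dynobox source. `selectHarnesses` and `parseHarnessFlags` are invented names, and collapsing duplicate IDs is an assumption.

```typescript
// Hypothetical sketch of harness selection precedence; not Dynobox source.
type HarnessId = 'claude-code' | 'codex';

const KNOWN_HARNESSES: readonly string[] = ['claude-code', 'codex'];

// `--harness` can be repeated or comma-separated: ['claude-code,codex'].
function parseHarnessFlags(flags: string[]): HarnessId[] {
  const ids = flags
    .flatMap((flag) => flag.split(','))
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  for (const id of ids) {
    if (!KNOWN_HARNESSES.includes(id)) {
      throw new Error(`Unknown harness: ${id}`);
    }
  }
  // Assumption: repeated IDs collapse to a single entry.
  return Array.from(new Set(ids)) as HarnessId[];
}

function selectHarnesses(cliFlags: string[], authored?: HarnessId[]): HarnessId[] {
  const fromCli = parseHarnessFlags(cliFlags);
  if (fromCli.length > 0) return fromCli; // runtime override wins
  if (authored && authored.length > 0) return authored; // the dyno's own list
  return ['claude-code']; // documented default
}

selectHarnesses([], ['codex']); // returns ['codex']
```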

## Author A Minimal Dyno

The example below asks the harness to inspect `package.json` and checks that it
used a shell command, did not edit files, and mentioned the test script in the
final answer.

```ts
import {artifact, defineDyno, finalMessage, tool} from '@dynobox/sdk';

export default defineDyno({
  name: 'package-script-check',
  harnesses: ['claude-code'],
  scenarios: [
    {
      name: 'detects test script',
      setup: [
        `cat > package.json <<'JSON'
{
  "name": "fixture",
  "scripts": {"test": "vitest run"}
}
JSON`,
      ],
      prompt:
        'Inspect package.json and tell me whether this project has a test script.',
      assertions: [
        tool.called('shell', {includes: 'package.json'}),
        tool.notCalled('edit_file'),
        artifact.contains('package.json', 'vitest run'),
        finalMessage.contains('test'),
      ],
    },
  ],
});
```

The same dyno can be authored in YAML:

```yaml
name: package-script-check
harnesses:
  - claude-code
scenarios:
  - name: detects test script
    prompt: >-
      Inspect package.json and tell me whether this project has a test script.
    setup:
      - |
        cat > package.json <<'JSON'
        {
          "name": "fixture",
          "scripts": {"test": "vitest run"}
        }
        JSON
    assertions:
      - label: reads package.json
        type: tool.called
        tool: shell
        command:
          includes: package.json
      - type: tool.notCalled
        tool: edit_file
      - type: artifact.contains
        path: package.json
        text: vitest run
      - type: finalMessage.contains
        text: test
```

See [Config Authoring](./config-authoring.md) for the full assertion reference.

## Run A Specific Target

`dynobox run [path]` accepts:

- No argument: discover dynos recursively under the current directory.
- Directory path: discover dynos recursively under that directory.
- File path: run one loadable Dynobox config file.

Examples:

```bash
dynobox run
dynobox run examples/local-observability
dynobox run my-skill.dyno.yaml
dynobox run dynobox.config.ts
```

Directory discovery skips hidden entries, `node_modules`, `dist`, `build`,
`coverage`, `.git`, `.dynobox`, `.next`, and `.cache`. Explicit file paths do
not need to match the `*.dyno.*` naming pattern, but they still need to be
loadable JavaScript, TypeScript, or YAML Dynobox configs. `.cjs` and `.cts`
configs are not supported.

## Debug A Run

Use these flags while developing scenarios:

```bash
dynobox run --verbose
dynobox run --debug
dynobox run --reporter json
```

`--debug` prints each job's temporary work-directory path and writes debug logs
into that directory when data is available:

- `dynobox-transcript.log`
- `dynobox-chat-history.jsonl`
- `dynobox-tool-events.json`
- `dynobox-stderr.log`

Dynobox uses each harness's normal permission behavior by default. For trusted
local evals that intentionally need full access, configure
`permissionMode: 'dangerous'` in the dyno or pass:

```bash
dynobox run --permission-mode dangerous
```

## Next Steps

- Write more scenarios with [Config Authoring](./config-authoring.md).
- Add Dynobox to automation with [CI Integration](./ci.md).
- Check exact flags and output fields in the [CLI Reference](./cli.md).

---

# Page: Config Authoring

Canonical URL: https://docs.dynobox.xyz/config-authoring/
Markdown URL: https://docs.dynobox.xyz/config-authoring.md
Source: https://github.com/dynobox/dynobox/blob/main/docs/config-authoring.md
Topics: @dynobox/sdk, defineDyno, YAML, assertions, HTTP capture, skills

# Config Authoring

Dynobox configs describe what to run and what to assert. A config can be
authored as JavaScript, TypeScript, or YAML.

Directory discovery loads files named `*.dyno.{mjs,js,ts,mts,yaml,yml}`.
Explicit file paths can use other names, such as `dynobox.config.ts`, as long
as the file is a loadable Dynobox config.

CommonJS config files (`.cjs` and `.cts`) are not supported because
`@dynobox/sdk` is ESM-only.

## Minimal Config

```ts
import {defineDyno, tool} from '@dynobox/sdk';

export default defineDyno({
  name: 'local-observability',
  harnesses: ['claude-code'],
  scenarios: [
    {
      name: 'inspect package scripts',
      setup: [
        `cat > package.json <<'JSON'
{"scripts":{"test":"vitest run"}}
JSON`,
      ],
      prompt:
        'Use a shell command that reads package.json and tell me whether a test script exists.',
      assertions: [
        tool.called('shell'),
        tool.called('shell', {includes: 'package.json'}),
      ],
    },
  ],
});
```

## Config Shape

```ts
type DynoboxConfig = {
  name?: string;
  version?: string;
  harnesses?: HarnessRunConfig[];
  setup?: string[];
  endpoints?: Record<string, Endpoint>;
  scenarios: ScenarioInput[];
};
```

Top-level `setup` commands and `endpoints` are merged into each scenario.
Top-level `harnesses` apply when a scenario does not define its own harnesses.
Scenario harnesses replace the top-level harness list.
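The merge rules above can be sketched as a function over a simplified config. This is an illustration, not Dynobox's compiler: the types are trimmed down, and two details are assumptions. Top-level `setup` commands are assumed to run before scenario commands, and a scenario endpoint is assumed to override a top-level endpoint with the same key.

```typescript
// Hypothetical sketch of applying top-level fields to each scenario.
type Endpoint = {method: string; url: string};

type Scenario = {
  name: string;
  prompt: string;
  harnesses?: string[];
  setup?: string[];
  endpoints?: Record<string, Endpoint>;
};

type Config = {
  harnesses?: string[];
  setup?: string[];
  endpoints?: Record<string, Endpoint>;
  scenarios: Scenario[];
};

function resolveScenarios(config: Config): Scenario[] {
  return config.scenarios.map((scenario) => ({
    ...scenario,
    // Scenario harnesses replace the top-level list; they are not concatenated.
    harnesses: scenario.harnesses ?? config.harnesses,
    // Assumption: top-level setup runs first, then scenario setup.
    setup: [...(config.setup ?? []), ...(scenario.setup ?? [])],
    // Assumption: scenario endpoints win on key collisions.
    endpoints: {...(config.endpoints ?? {}), ...(scenario.endpoints ?? {})},
  }));
}
```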

```ts
type ScenarioInput = {
  id?: string;
  name: string;
  prompt: string;
  harnesses?: HarnessRunConfig[];
  setup?: string[];
  endpoints?: Record<string, Endpoint>;
  assertions?: Assertion[];
};
```

Each scenario runs in a fresh temporary work directory. Setup commands run in
that directory before the harness prompt, and artifact assertions read files
from that directory after the harness exits.

Scenario `id` is optional. When provided, it is used for stable compiled
scenario IDs, job IDs, and `dynobox run --scenario` filters. Without an `id`,
Dynobox derives one from the scenario name.

## Harnesses

Supported harness IDs:

- `claude-code`
- `codex`

Use strings when the default model and permission behavior are fine:

```ts
harnesses: ['claude-code', 'codex'],
```

Use objects to set a model or permission mode:

```ts
harnesses: [
  {id: 'claude-code', model: 'sonnet'},
  {id: 'codex', model: 'gpt-5.1', permissionMode: 'dangerous'},
],
```

Permission modes:

- `default`: use the harness's normal permission and sandbox behavior.
- `dangerous`: opt into harness-specific full-access or permission-bypass flags
  for trusted local evals.

Dangerous mode maps to:

- `claude-code`: `--permission-mode bypassPermissions`
- `codex`: `--sandbox danger-full-access -c approval_policy="never"`
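The mapping can be expressed as a lookup from harness and mode to extra CLI arguments. This sketch only restates the flags listed above. `permissionArgs` and `effectiveMode` are hypothetical names, and the argument order Dynobox actually uses may differ.

```typescript
// Hypothetical sketch of the permission-mode flag mapping; not Dynobox source.
type HarnessId = 'claude-code' | 'codex';
type PermissionMode = 'default' | 'dangerous';

function permissionArgs(harness: HarnessId, mode: PermissionMode): string[] {
  if (mode === 'default') {
    return []; // keep the harness's normal permission and sandbox behavior
  }
  // Flags copied from the documented dangerous-mode mapping.
  return harness === 'claude-code'
    ? ['--permission-mode', 'bypassPermissions']
    : ['--sandbox', 'danger-full-access', '-c', 'approval_policy="never"'];
}

// `--permission-mode` on the CLI overrides the authored mode.
function effectiveMode(cli?: PermissionMode, authored?: PermissionMode): PermissionMode {
  return cli ?? authored ?? 'default';
}
```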

The CLI can override authored harnesses with `--harness` and authored
permission modes with `--permission-mode`.

## Assertions

Assertions are evaluated against observed harness behavior after each scenario
runs.

### Tool Calls

Use `tool.called` and `tool.notCalled` to assert tool usage.

```ts
tool.called('shell');
tool.notCalled('web_fetch');
tool.called('shell', {includes: 'package.json'});
tool.notCalled('shell', {matches: 'rm\\s+-rf'});
```

Supported tool kinds:

- `shell`
- `read_file`
- `write_file`
- `edit_file`
- `search_files`
- `web_fetch`
- `web_search`
- `mcp`
- `task`
- `unknown`

Shell tool assertions can include exactly one command matcher:

- `{equals: 'pnpm test'}`
- `{includes: 'package.json'}`
- `{startsWith: 'pnpm'}`
- `{matches: 'pnpm\\s+test'}`

`matches` is a JavaScript regular expression string. Command matchers are only
valid on `shell` tool assertions.
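A matcher check along these lines illustrates the exactly-one rule. It is a sketch, not Dynobox's matcher: it assumes case-sensitive string comparison and a `RegExp` built from the `matches` string with no flags. `matchCommand` is a hypothetical name.

```typescript
// Hypothetical sketch of shell command matching; not Dynobox source.
type CommandMatcher =
  | {equals: string}
  | {includes: string}
  | {startsWith: string}
  | {matches: string};

function matchCommand(command: string, matcher: CommandMatcher): boolean {
  // Exactly one matcher key is allowed.
  if (Object.keys(matcher).length !== 1) {
    throw new Error('Use exactly one of equals, includes, startsWith, or matches');
  }
  if ('equals' in matcher) return command === matcher.equals;
  if ('includes' in matcher) return command.includes(matcher.includes);
  if ('startsWith' in matcher) return command.startsWith(matcher.startsWith);
  // `matches` is a JavaScript regular expression string (no flags assumed).
  return new RegExp(matcher.matches).test(command);
}
```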

### Ordered Sequences

Use `sequence.inOrder` when order matters.

```ts
sequence.inOrder([
  tool.called('shell', {includes: 'package.json'}),
  tool.called('shell', {includes: 'pnpm test'}),
]);
```

For shell commands, ordered matching can satisfy several steps with one compound
command, such as `cat package.json && pnpm test`, as long as each step's text
appears in order within it.
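One way to read the compound-command rule: scan the observed shell commands left to right, and let each step resume searching just past the previous match, staying in the same command when possible. This sketch is hypothetical. `matchesInOrder` is an invented name, it handles only `includes`-style steps, and its greedy first-match scan may not be how Dynobox resolves ambiguous sequences.

```typescript
// Hypothetical sketch of ordered matching over observed shell commands.
function matchesInOrder(commands: string[], steps: string[]): boolean {
  let commandIndex = 0;
  let offset = 0; // search position inside the current command
  for (const step of steps) {
    let found = false;
    while (commandIndex < commands.length) {
      const hit = commands[commandIndex].indexOf(step, offset);
      if (hit !== -1) {
        offset = hit + step.length; // the next step must start after this match
        found = true;
        break;
      }
      commandIndex += 1; // move on to the next observed command
      offset = 0;
    }
    if (!found) return false;
  }
  return true;
}

// One compound command satisfies both steps.
matchesInOrder(['cat package.json && pnpm test'], ['package.json', 'pnpm test']); // true
```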

### Skills

Use `skill.invoked` to assert that the harness accessed a named skill's
`SKILL.md` instruction file.

```ts
skill.invoked('commit');
```

This passes when observed tool events reference
`.agents/skills/<name>/SKILL.md` or `.claude/skills/<name>/SKILL.md`, including
reads, searches, or shell commands that access the file.
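The path check can be approximated with a regular expression over the text of each tool event. This is a sketch under assumptions, not Dynobox's detector: `referencesSkill` is a hypothetical name, it expects forward-slash paths, and it escapes the skill name so characters such as `.` match literally.

```typescript
// Hypothetical sketch of skill-path detection; not Dynobox source.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True when the event text mentions .agents/skills/<name>/SKILL.md or
// .claude/skills/<name>/SKILL.md (forward slashes only).
function referencesSkill(eventText: string, skill: string): boolean {
  const pattern = new RegExp(`\\.(?:agents|claude)/skills/${escapeRegExp(skill)}/SKILL\\.md`);
  return pattern.test(eventText);
}

referencesSkill('cat .claude/skills/commit/SKILL.md', 'commit'); // true
referencesSkill('cat .claude/skills/commit-msg/SKILL.md', 'commit'); // false
```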

### Artifacts

Artifact assertions read files inside the scenario work directory.

```ts
artifact.exists('README.md');
artifact.contains('package.json', 'vitest run');
```

Artifact paths must be relative and must stay inside the work directory.
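A containment check like the following rejects absolute paths and `..` escapes by resolving the artifact path against the work directory. It is a sketch built on Node's `node:path` module, not Dynobox source. `resolveArtifactPath` is a hypothetical name, and it does not resolve symlinks.

```typescript
import * as path from 'node:path';

// Hypothetical sketch of artifact path validation; not Dynobox source.
function resolveArtifactPath(workDir: string, artifactPath: string): string {
  if (path.isAbsolute(artifactPath)) {
    throw new Error(`Artifact path must be relative: ${artifactPath}`);
  }
  const root = path.resolve(workDir);
  const resolved = path.resolve(root, artifactPath);
  const fromRoot = path.relative(root, resolved);
  // A relative path that climbs above the root escapes the work directory.
  if (fromRoot === '..' || fromRoot.startsWith(`..${path.sep}`) || path.isAbsolute(fromRoot)) {
    throw new Error(`Artifact path escapes the work directory: ${artifactPath}`);
  }
  return resolved;
}
```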

### Transcript And Final Message

Use transcript assertions to inspect the full harness transcript. Use
final-message assertions to inspect the final assistant response extracted from
the harness output.

```ts
transcript.contains('package.json');
finalMessage.contains('test script');
```

Final-message extraction depends on the harness output format. If a harness
does not provide a final message, the assertion fails with a clear message.

## HTTP Assertions

Declare endpoints with `http.endpoint(...)` and assert whether matching
requests were observed.

```ts
endpoints: {
  npmPrettier: http.endpoint({
    method: 'GET',
    url: 'https://registry.npmjs.org/prettier',
  }),
},
assertions: [http.called('npmPrettier', {status: 200})],
```

Endpoint keys become part of stable IR ids, so they may only contain letters,
numbers, underscores, and hyphens.

Endpoint specs also accept `headers`, `body`, and `response` fields. The
current local runner preserves those fields in the compiled IR, but HTTP
assertions match observed requests by endpoint URL/method and optional response
status. It does not use those fields to mock or shape requests yet.
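Matching by method, URL, and optional status can be sketched as follows, together with the endpoint-key character rule. This is hypothetical: `httpCalled` is an invented name, and exact URL comparison plus case-insensitive methods are assumptions about how Dynobox compares requests.

```typescript
// Hypothetical sketch of HTTP endpoint matching; not Dynobox source.
type Endpoint = {method: string; url: string};
type ObservedRequest = {method: string; url: string; status?: number};

// Letters, numbers, underscores, and hyphens only.
const ENDPOINT_KEY = /^[A-Za-z0-9_-]+$/;

function httpCalled(
  endpoints: Record<string, Endpoint>,
  key: string,
  observed: ObservedRequest[],
  expectedStatus?: number,
): boolean {
  if (!ENDPOINT_KEY.test(key)) {
    throw new Error(`Invalid endpoint key: ${key}`);
  }
  const endpoint = endpoints[key];
  if (!endpoint) {
    throw new Error(`Unknown endpoint: ${key}`);
  }
  return observed.some(
    (request) =>
      request.method.toUpperCase() === endpoint.method.toUpperCase() &&
      request.url === endpoint.url &&
      (expectedStatus === undefined || request.status === expectedStatus),
  );
}
```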

When a scenario includes HTTP assertions, Dynobox starts a per-job local proxy
and sets proxy environment variables on the harness child process:

- `HTTP_PROXY`
- `HTTPS_PROXY`
- `http_proxy`
- `https_proxy`

Dynobox also sets common CA variables to a generated CA at
`~/.dynobox/ca.pem`:

- `NODE_EXTRA_CA_CERTS`
- `SSL_CERT_FILE`
- `REQUESTS_CA_BUNDLE`
- `CURL_CA_BUNDLE`

HTTP capture covers local child-process traffic that honors those proxy and CA
environment variables. Harness-native web tools and binaries with their own
trust stores may bypass capture.
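The environment handed to the harness child process can be sketched as a plain object merged over the parent environment. The variable names and the `~/.dynobox/ca.pem` location come from the lists above. `captureEnv` is a hypothetical name, and the proxy address is a placeholder, since Dynobox starts its own per-job proxy.

```typescript
import {homedir} from 'node:os';
import {join} from 'node:path';

// Hypothetical sketch of the capture environment; not Dynobox source.
const PROXY_VARS = ['HTTP_PROXY', 'HTTPS_PROXY', 'http_proxy', 'https_proxy'];
const CA_VARS = ['NODE_EXTRA_CA_CERTS', 'SSL_CERT_FILE', 'REQUESTS_CA_BUNDLE', 'CURL_CA_BUNDLE'];

function captureEnv(
  proxyUrl: string,
  caPath: string = join(homedir(), '.dynobox', 'ca.pem'),
): Record<string, string> {
  const env: Record<string, string> = {};
  for (const name of PROXY_VARS) env[name] = proxyUrl; // route traffic via the proxy
  for (const name of CA_VARS) env[name] = caPath; // trust the generated CA
  return env;
}

// Placeholder proxy address for illustration only.
const childEnv = {...process.env, ...captureEnv('http://127.0.0.1:8899')};
```

Tools that ignore these variables, or that ship their own trust store, would not see the proxy or CA, which is why capture is limited as described above.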

## Path Helpers

The `dyno` helper is useful when config files need stable paths relative to the
config module.

```ts
import {dyno} from '@dynobox/sdk';

const here = dyno.here(import.meta.url);

// Inside a scenario:
setup: [`cp ${here.q('./fixtures/input.txt')} input.txt`],
```

Available helpers:

- `dyno.fsPath(url)`
- `dyno.fromUrl(baseUrl, path)`
- `dyno.shellQuote(value)` or `dyno.q(value)`
- `dyno.here(import.meta.url).path(path)`
- `dyno.here(import.meta.url).q(path)`
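To show roughly what the quoting and path helpers produce, here is a stand-in that uses POSIX single-quote quoting and `new URL(path, moduleUrl)` resolution. These are hypothetical equivalents named `shellQuote` and `here`, not the `@dynobox/sdk` implementations, whose exact output may differ.

```typescript
import {fileURLToPath} from 'node:url';

// Hypothetical stand-ins for the path helpers; not @dynobox/sdk source.
// POSIX single-quote quoting: wrap in '...' and rewrite each ' as '\''.
function shellQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

// Resolve paths relative to the config module's file URL.
function here(moduleUrl: string) {
  const resolve = (relativePath: string): string =>
    fileURLToPath(new URL(relativePath, moduleUrl));
  return {
    path: resolve,
    q: (relativePath: string): string => shellQuote(resolve(relativePath)),
  };
}

here('file:///repo/dynobox/a.dyno.ts').q('./fixtures/input.txt');
```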

## Reusable Scenarios

Use `defineScenario` when you want to author or export a scenario
independently, then include it in a dyno.

```ts
import {defineDyno, defineScenario, tool} from '@dynobox/sdk';

const checksPackageJson = defineScenario({
  name: 'checks package json',
  prompt: 'Read package.json and summarize the scripts.',
  assertions: [tool.called('shell', {includes: 'package.json'})],
});

export default defineDyno({
  scenarios: [checksPackageJson],
});
```

## YAML Configs

YAML dynos use the same top-level shape as JavaScript and TypeScript configs.
The difference is that helper calls are written as plain objects using the same
authoring assertion shape that SDK helpers return.

```yaml
name: package-script-check
harnesses:
  - claude-code
scenarios:
  - name: detects test script
    prompt: >-
      Inspect package.json and tell me whether this project has a test script.
    setup:
      - |
        cat > package.json <<'JSON'
        {"scripts":{"test":"vitest run"}}
        JSON
    assertions:
      - label: reads package.json
        type: tool.called
        tool: shell
        command:
          includes: package.json
      - type: tool.notCalled
        tool: edit_file
      - type: artifact.contains
        path: package.json
        text: vitest run
      - type: finalMessage.contains
        text: test
```

YAML configs flow through the same schema and IR compiler as JavaScript and
TypeScript configs.

## Authoring Assertion Contract

All assertion objects accept optional `id` and `label` fields. `id` stabilizes
compiled assertion IDs and JSON report references. `label` appears in CLI and
JSON output.

| TypeScript helper                                      | Authoring object                                                   |
| ------------------------------------------------------ | ------------------------------------------------------------------ |
| `tool.called('shell')`                                 | `{type: tool.called, tool: shell}`                                 |
| `tool.called('shell', {includes: 'x'})`                | `{type: tool.called, tool: shell, command: {includes: x}}`         |
| `tool.notCalled('edit_file')`                          | `{type: tool.notCalled, tool: edit_file}`                          |
| `artifact.exists('README.md')`                         | `{type: artifact.exists, path: README.md}`                         |
| `artifact.contains('pkg.json', 'foo')`                 | `{type: artifact.contains, path: pkg.json, text: foo}`             |
| `transcript.contains('done')`                          | `{type: transcript.contains, text: done}`                          |
| `finalMessage.contains('ok')`                          | `{type: finalMessage.contains, text: ok}`                          |
| `skill.invoked('commit')`                              | `{type: skill.invoked, skill: commit}`                             |
| `sequence.inOrder([tool.called('shell', {...}), ...])` | `{type: sequence.inOrder, steps: [{type: tool.called, ...}, ...]}` |
| `http.called('npmPrettier', {status: 200})`            | `{type: http.called, endpoint: npmPrettier, status: 200}`          |
| `http.notCalled('leftPad')`                            | `{type: http.notCalled, endpoint: leftPad}`                        |

Command matcher shapes accept exactly one of `equals`, `includes`,
`startsWith`, or `matches`, and are only valid on `shell` tool assertions.

Older YAML objects that used `kind`, `toolKind`, or `matcher` are not accepted.
Use `type`, `tool`, and `command` instead.
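A loader might check authoring objects roughly as follows: reject the legacy keys, require a string `type`, and enforce the command-matcher rules. This is a hypothetical sketch; `checkAssertion` is an invented name, and it covers only the rules stated on this page.

```typescript
// Hypothetical sketch of validating one YAML authoring assertion object.
const LEGACY_KEYS = ['kind', 'toolKind', 'matcher'];
const MATCHER_KEYS = ['equals', 'includes', 'startsWith', 'matches'];

function checkAssertion(obj: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const key of LEGACY_KEYS) {
    if (key in obj) {
      problems.push(`'${key}' is not accepted; use type, tool, and command`);
    }
  }
  if (typeof obj.type !== 'string') {
    problems.push("missing string 'type'");
  }
  if (obj.command !== undefined) {
    if (obj.tool !== 'shell') {
      problems.push('command matchers are only valid on shell tool assertions');
    }
    const keys =
      typeof obj.command === 'object' && obj.command !== null ? Object.keys(obj.command) : [];
    if (keys.length !== 1 || !MATCHER_KEYS.includes(keys[0])) {
      problems.push('command needs exactly one of equals, includes, startsWith, or matches');
    }
  }
  return problems;
}
```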

When YAML parsing fails, the CLI emits a `line:column` pointer into the file so
syntax errors are easy to locate.

---

# Page: CLI Reference

Canonical URL: https://docs.dynobox.xyz/cli/
Markdown URL: https://docs.dynobox.xyz/cli.md
Source: https://github.com/dynobox/dynobox/blob/main/docs/cli.md
Topics: CLI, dynobox init, dynobox run, JSON reporter, exit codes

# CLI Reference

The public CLI package is `dynobox`:

```bash
npm install -g dynobox
```

## Commands

### `dynobox init`

Create a starter dyno under `./dynobox/`.

```bash
dynobox init
dynobox init --yaml
dynobox init --harness codex
dynobox init --force
```

`dynobox init` writes `dynobox/example.dyno.mjs` by default. With `--yaml`, it
writes `dynobox/example.dyno.yaml`. Existing starter files are not overwritten
unless `--force` is passed. `--harness` accepts the same harness IDs as
`dynobox run`; invalid harness IDs fail before writing a starter file.

### `dynobox run [path]`

Discover and run dyno files.

```bash
dynobox run
dynobox run examples
dynobox run my-skill.dyno.yaml
dynobox run dynobox.config.ts
```

Path behavior:

- No path: discover under the current working directory.
- Directory path: discover recursively under that directory.
- File path: run that one loadable Dynobox config file.

Directory discovery matches `**/*.dyno.{mjs,js,ts,mts,yaml,yml}`. It skips
hidden entries, `node_modules`, `dist`, `build`, `coverage`, `.git`,
`.dynobox`, `.next`, and `.cache`.
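The discovery rules can be sketched with `readdirSync` from `node:fs`. This is hypothetical, not Dynobox source: `discoverDynos` is an invented name, it does not follow symlinks, and sorting the result is an assumption made for stable output.

```typescript
import {readdirSync} from 'node:fs';
import {join} from 'node:path';

// Hypothetical sketch of directory discovery; not Dynobox source.
const SKIPPED = new Set(['node_modules', 'dist', 'build', 'coverage', '.git', '.dynobox', '.next', '.cache']);
const DYNO_FILE = /\.dyno\.(?:mjs|js|ts|mts|yaml|yml)$/;

function isSkipped(name: string): boolean {
  return name.startsWith('.') || SKIPPED.has(name); // hidden entries are skipped too
}

function isDynoFile(name: string): boolean {
  return DYNO_FILE.test(name);
}

function discoverDynos(dir: string): string[] {
  const found: string[] = [];
  for (const entry of readdirSync(dir, {withFileTypes: true})) {
    if (isSkipped(entry.name)) continue;
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) found.push(...discoverDynos(fullPath));
    else if (entry.isFile() && isDynoFile(entry.name)) found.push(fullPath);
  }
  return found.sort();
}
```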

Explicit file paths do not need to match the `*.dyno.*` naming pattern. YAML
files are parsed as YAML, and JavaScript or TypeScript files such as `.mjs`,
`.js`, `.ts`, and `.mts` are imported through the CLI loader. `.cjs` and `.cts`
configs are not supported because `@dynobox/sdk` is ESM-only.

A load error in one discovered file does not stop other files from running.
Each bad file prints a `config:` error block on stderr, and the process exits
non-zero if any file failed to load or any job failed.

## Run Options

```text
--harness <id>             Override config harnesses; repeat or comma-separate
                           for multiple harnesses.
--permission-mode <mode>   Override harness permission mode: default or
                           dangerous.
--scenario <pattern>       Run only scenarios whose name or id matches;
                           repeat or comma-separate for multiple patterns.
--quiet                    Print compact CI-friendly output.
--verbose                  Expand scenario details even when passing.
--debug                    Include debug paths and artifacts.
--reporter <fmt>           Output reporter format: text or json.
```

Harness IDs are `claude-code` and `codex`.

Examples:

```bash
dynobox run --harness claude-code
dynobox run --harness codex
dynobox run --harness claude-code,codex
dynobox run --harness codex --permission-mode dangerous
dynobox run --scenario "release*"
dynobox run --scenario "lint*,deploy package"
dynobox run --reporter json
```

Scenario filters match the compiled scenario name or id. Patterns support `*`
for any number of characters and `?` for one character. If no scenarios match,
the run exits with code `1`.
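The filter semantics can be sketched by translating each pattern into an anchored regular expression. This is hypothetical, not Dynobox source: `filterScenarios` is an invented name, and whole-string, case-sensitive matching plus trimming whitespace around comma-separated patterns are assumptions.

```typescript
// Hypothetical sketch of `--scenario` pattern matching; not Dynobox source.
type CompiledScenario = {id: string; name: string};

function globToRegExp(pattern: string): RegExp {
  let source = '';
  for (const ch of pattern) {
    if (ch === '*') source += '.*'; // any number of characters
    else if (ch === '?') source += '.'; // exactly one character
    else source += ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // literal character
  }
  return new RegExp(`^${source}$`);
}

function filterScenarios(scenarios: CompiledScenario[], flags: string[]): CompiledScenario[] {
  const patterns = flags
    .flatMap((flag) => flag.split(','))
    .map((pattern) => pattern.trim()) // assumption: surrounding spaces are ignored
    .filter((pattern) => pattern.length > 0)
    .map(globToRegExp);
  return scenarios.filter((s) => patterns.some((re) => re.test(s.name) || re.test(s.id)));
}
```

An empty result corresponds to the exit-code-`1` case described above.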

## Output Modes

Default output prints the run header, job status, assertion details for failed
or expanded jobs, and a final summary. Passing jobs collapse to one line.

`--quiet` prints compact CI-friendly progress and failure information.

`--verbose` expands scenario details even when jobs pass.

`--debug` includes temporary work-directory paths and writes debug logs inside
each job's work directory when data is available. Debug logs can include:

- `dynobox-transcript.log`
- `dynobox-chat-history.jsonl`
- `dynobox-tool-events.json`
- `dynobox-stderr.log`

`--reporter json` emits newline-delimited JSON on stdout instead of text.
Dynobox writes one job object per completed job, then one summary object. The
JSON reporter always uses static output so stdout remains machine-readable.

When stdout is an interactive terminal and live output is enabled, Dynobox
streams phase progress and harness tool events as they happen. In
non-interactive output, quiet mode, or incompatible terminals, it renders static
output after jobs complete.

## JSON Reporter

Every JSON reporter object includes `"schema": "dynobox.report.v1"` and a
`type` field.

Job records include:

- `jobId`
- `scenario.id` and `scenario.name`
- `harness.id`, with `model` and `permissionMode` when configured
- `iteration`, using a 1-based number
- `status` and `passed`
- `timing`
- `diagnostics`
- `warnings`
- `artifacts`
- `debugLogPaths` when `--debug` produced logs
- `setup.commands`
- `harnessOutput.exitCode` and `harnessOutput.durationMs` when the harness ran
- `observations.toolEventCount` and `observations.httpEventCount`
- `assertions`, with `assertionId`, optional `label`, `kind`, `passed`, and
  `message`

The summary record includes:

- `status`
- `totals.jobs`, `totals.passed`, `totals.failed`, `totals.configErrors`,
  `totals.warnings`, and `totals.durationMs`
- `plan.scenarios`, `plan.harnesses`, and `plan.iterations`
- `failedJobs`
- `warningJobs`

Example:

```bash
dynobox run --reporter json examples/local-observability
```

In CI, redirect stdout to an artifact file:

```bash
dynobox run --reporter json dynobox > dynobox-report.ndjson
```

## Exit Codes

Dynobox exits with `0` when all loaded jobs pass.

Dynobox exits with `1` for:

- No subcommand supplied.
- Config load, parse, validation, or flag errors.
- No dynos found for a directory target.
- No scenarios match the `--scenario` filters.
- At least one completed job failed.
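The exit-code rules reduce to a single check once a run's outcome is known. This is a hypothetical sketch: `RunOutcome` and its fields are invented for illustration, and it includes the no-match case for `--scenario` filters described under Run Options.

```typescript
// Hypothetical sketch of the exit-code rules; field names are invented.
type RunOutcome = {
  subcommand?: string;
  configErrors: number; // config load, parse, validation, or flag errors
  noDynosFound: boolean; // directory target with no matching files
  filterMatchedNothing: boolean; // --scenario patterns selected no scenarios
  failedJobs: number;
  warnings: number; // advisory permission warnings
};

function exitCode(outcome: RunOutcome): 0 | 1 {
  if (outcome.subcommand === undefined) return 1;
  if (outcome.configErrors > 0) return 1;
  if (outcome.noDynosFound || outcome.filterMatchedNothing) return 1;
  if (outcome.failedJobs > 0) return 1;
  // Warnings are advisory and never affect the exit code.
  return 0;
}
```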

## Harness Requirements

The CLI registers both real harnesses by default:

- `claude-code` invokes Claude Code with stream JSON output and hook events.
- `codex` invokes Codex with JSON output, no color, and the git-repo check
  skipped.

Make sure the selected harness executable is installed, authenticated, and
available on `PATH`.

Dynobox uses each harness's normal permission behavior by default. Use
`--permission-mode dangerous` only for trusted local evals that intentionally
need full access or non-interactive approval bypasses.

Dangerous mode maps to harness-specific flags:

- `claude-code`: adds `--permission-mode bypassPermissions`.
- `codex`: adds `--sandbox danger-full-access -c approval_policy="never"`.

Permission warnings are advisory. They explain when a harness blocked a tool
action, but they do not change job status, assertion results, or exit codes.

## Development Checkout

See [CONTRIBUTING.md](../CONTRIBUTING.md) for local checkout workflows.

---

# Page: CI Integration

Canonical URL: https://docs.dynobox.xyz/ci/
Markdown URL: https://docs.dynobox.xyz/ci.md
Source: https://github.com/dynobox/dynobox/blob/main/docs/ci.md
Topics: CI, GitHub Actions, JSON reports, artifacts

# CI Integration

Dynobox runs in CI like any other command-line test step. A successful run exits
with `0`; config, flag, discovery, load, or job failures exit with `1`.

Use text output when humans will read the log:

```bash
dynobox run dynobox --quiet --harness claude-code
```

Use JSON output when a later CI step should consume the results:

```bash
dynobox run dynobox --reporter json --harness claude-code > dynobox-report.ndjson
```

`--reporter json` writes newline-delimited JSON to stdout. Each completed job
produces one `"type": "job"` record, followed by one `"type": "summary"` record.
Every record includes `"schema": "dynobox.report.v1"`.

## Recommended Pattern

1. Install Node.js 22 or newer.
2. Install `dynobox`.
3. Install the harness executable for the job.
4. Run `dynobox run` once per harness, usually through a CI matrix.
5. Upload the JSON report as a build artifact.
6. Summarize the final JSON `summary` record in the job output.

For targeted CI jobs, combine the JSON reporter with scenario filters:

```bash
dynobox run dynobox --reporter json --scenario "release*" > dynobox-report.ndjson
```

Scenario filters match the compiled scenario name or id. Repeat the flag or use
comma-separated values to select multiple patterns:

```bash
dynobox run dynobox --scenario "release*,publish package"
```

## GitHub Actions

A reference workflow lives at
[`examples/.github/workflows/example-eval.yml`](../examples/.github/workflows/example-eval.yml).
It runs a matrix over `claude-code` and `codex`, writes one NDJSON report per
harness, uploads each report, and appends a compact summary to the GitHub
Actions step summary.

Copy the workflow into your repository's `.github/workflows/` directory and
adjust:

- `DYNOBOX_TARGET` for the directory or file containing your dynos.
- Harness install commands for your pinned versions.
- Secrets for the selected harnesses.

The example assumes:

- `ANTHROPIC_API_KEY` is available for `claude-code`.
- `OPENAI_API_KEY` is available for `codex`.

## Read JSON Reports

The JSON reporter is line-oriented. Read the file one line at a time and parse
each line as a separate JSON object.

```js
import {readFileSync} from 'node:fs';

const records = readFileSync('dynobox-report.ndjson', 'utf8')
  .trim()
  .split('\n')
  .filter(Boolean)
  .map((line) => JSON.parse(line));

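const summary = records.find((record) => record.type === 'summary');
if (!summary) {
  // Config and discovery failures can exit before a summary record is written.
  throw new Error('No summary record found in dynobox-report.ndjson');
}
console.log(summary.totals);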
```

Useful job fields include `jobId`, `scenario`, `harness`, `status`, `passed`,
`warnings`, `observations`, and `assertions`.

Useful summary fields include `status`, `totals`, `plan`, `failedJobs`, and
`warningJobs`.

Permission warnings are advisory. They explain when a harness blocked a tool
action, but they do not change job status or exit codes. Use
`--permission-mode dangerous` only for trusted evals that intentionally need
full local access.

Config and discovery failures can happen before any job runs. In those cases,
Dynobox writes the config error to stderr and exits `1`; there may be no JSON
summary record to parse.
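A CI step can turn the report into a one-line summary and still degrade gracefully when no summary record exists. This is a hypothetical sketch: `parseReport` and `summaryLine` are invented names, and the summary fields follow the record description in the CLI Reference.

```typescript
// Hypothetical sketch of summarizing an NDJSON report in CI.
type ReportRecord = {schema: string; type: string; [key: string]: unknown};

type SummaryRecord = {
  schema: string;
  type: 'summary';
  status: string;
  totals: {jobs: number; passed: number; failed: number; configErrors: number; warnings: number; durationMs: number};
};

function parseReport(text: string): ReportRecord[] {
  return text
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // tolerate blank or trailing lines
    .map((line) => JSON.parse(line) as ReportRecord);
}

function summaryLine(records: ReportRecord[]): string {
  const summary = records.find((r) => r.type === 'summary') as SummaryRecord | undefined;
  if (!summary) {
    return 'Dynobox: no summary record; check stderr for config or discovery errors.';
  }
  const {jobs, passed, failed, warnings} = summary.totals;
  return `Dynobox ${summary.status}: ${passed}/${jobs} jobs passed, ${failed} failed, ${warnings} warnings`;
}
```

In GitHub Actions, the returned line could be appended to the file named by the `GITHUB_STEP_SUMMARY` environment variable.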

## Artifact Naming

When a CI matrix runs multiple harnesses, write one report per harness:

```bash
dynobox run dynobox --reporter json --harness "$HARNESS" > "dynobox-${HARNESS}.ndjson"
```

This keeps reports easy to compare and avoids interleaving records from
different CI jobs.