Evidence

Evidence is the recorded history of one OptPilot run.

The public configs define what should happen:

method proposes candidate
environment evaluates candidate
OptPilot records evidence

The run directory shows what actually happened: which candidates were proposed, how they were materialized, which trials succeeded or failed, which metrics were returned, and where evaluator output files were written.

Run Directory

By default, runs are written to a runs/ directory next to the directory that contains the study config. The built-in job-shop studies live under examples/studies/, so their default output root is examples/runs/. You can override this with --output-root or evidence.outputDir.
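The default location rule can be sketched in a few lines. This is an illustration of the rule as described above, not OptPilot's own code: `default_output_root` is a hypothetical helper, and it ignores any --output-root or evidence.outputDir override.

```python
from pathlib import Path


def default_output_root(study_config: str) -> Path:
    """Return the runs/ directory beside the directory holding the study config."""
    study_dir = Path(study_config).parent  # e.g. examples/studies
    return study_dir.parent / "runs"       # e.g. examples/runs


root = default_output_root("examples/studies/job_shop_rule_parameters_baseline.yaml")
print(root.as_posix())
```

For the built-in job-shop study this resolves to examples/runs, matching the default described above.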

Common files:

| File | Meaning |
| --- | --- |
| `summary.json` | Final run summary, best metric, failure count, and run status. |
| `study_spec.json` | Compiled run spec generated from the study, environment, and method configs. |
| `candidates.jsonl` | Candidate records, validation details, and materialization details. |
| `observations.jsonl` | Trial observations and metric values. |
| `trials.jsonl` | Trial lifecycle records and backend metadata. |
| `method_calls.jsonl` | Method requests, responses, and errors. |
| `method_events.jsonl` | Events emitted by methods. |
| `scheduler_events.jsonl` | Scheduling and worker events. |
| `environment_snapshot.json` | Environment contract used by the run. |
| `run_policy.json` | Budget, retry, parallelism, and timeout policy. |
| `run_lineage.json` | Resume and branch lineage metadata. |

The exact set can vary by evidence level and by which parts of the runtime are used.

A typical local run looks like:

runs/my-study-2026-06-20T.../
  summary.json
  study_spec.json
  run_policy.json
  run_lineage.json
  environment_snapshot.json
  candidates.jsonl
  trials.jsonl
  observations.jsonl
  method_calls.jsonl
  scheduler_events.jsonl
  prompts/
    prompt-.../prompt.json
  candidates/
    candidate-.../files/...
  trials/
    trial-.../
      candidate/
      candidate.json
      workspace_manifest.json
      evaluator outputs...
  evidence_files/
    trial-.../
      copied outputs when evidence.outputFileStorage: copy

The most important files for debugging are usually summary.json, observations.jsonl, candidates.jsonl, and the corresponding trials/<trial-id>/ directory.
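A minimal sketch for loading those files when inspecting a run by hand. It assumes only what the file names imply: summary.json is a single JSON document and the .jsonl files hold one JSON object per line. The helper names and the demo contents are made up for illustration; real records will have OptPilot's own fields.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def load_run(run_dir: Path) -> dict:
    """Load the files usually checked first when debugging a run."""
    summary_text = (run_dir / "summary.json").read_text(encoding="utf-8")
    loaded = {"summary": json.loads(summary_text)}
    for name in ("observations", "candidates"):
        path = run_dir / f"{name}.jsonl"
        # Some files may be absent depending on the evidence level.
        loaded[name] = read_jsonl(path) if path.exists() else []
    return loaded


# Demo on a throwaway directory with placeholder contents (not real OptPilot output).
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "summary.json").write_text(json.dumps({"status": "placeholder"}))
    (run_dir / "observations.jsonl").write_text('{"trial": "t-1"}\n\n{"trial": "t-2"}\n')
    demo = load_run(run_dir)
    print(len(demo["observations"]), len(demo["candidates"]))
```

Treating a missing .jsonl file as an empty list keeps the helper usable across evidence levels, since the exact set of files can vary.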

Storage Roles

OptPilot uses four runtime storage areas, each with a distinct role.

| Runtime storage | Purpose |
| --- | --- |
| Method workspace | Scratch space for one method invocation. Command wrappers often write request files and logs here. |
| Candidate store | Durable handoff area for candidates produced by methods, especially generated files. |
| Trial workspace | Fresh evaluation directory for one trial. `trialWorkspace` entries are copied here and file candidates are materialized here. |
| Evidence directory | Run-level records, summaries, and retained evaluator outputs. |

The evaluator normally reads the trial workspace, not the candidate store. For file candidates, the runner copies files from the candidate store into the trial workspace according to the environment candidate contract.
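The copy step can be pictured roughly as follows. This is a simplified sketch, not the runner's implementation: `materialize_candidate` is a hypothetical name, and the fixed candidate/ target directory is an assumption taken from the run layout shown earlier, whereas the real destination is decided by the environment candidate contract.

```python
import shutil
import tempfile
from pathlib import Path


def materialize_candidate(candidate_files: Path, trial_workspace: Path) -> list[str]:
    """Copy a candidate's files into a trial workspace; return the copied relative paths."""
    target = trial_workspace / "candidate"  # assumed location, see lead-in
    shutil.copytree(candidate_files, target)
    return sorted(p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file())


# Demo with a throwaway candidate store and trial workspace.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "candidates" / "candidate-demo" / "files"
    store.mkdir(parents=True)
    (store / "rule.py").write_text("PRIORITY = 1\n")
    workspace = Path(tmp) / "trials" / "trial-demo"
    workspace.mkdir(parents=True)
    copied = materialize_candidate(store, workspace)
    print(copied)
```

Because the evaluator only sees the copy, the files in the candidate store stay unchanged even if an evaluation modifies its workspace.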

Output Files

Evaluators may produce logs, JSON summaries, CSV files, SQLite databases, images, or other files inside the trial workspace.

There are two ways those files become visible in evidence:

- The evaluator returns output_files descriptors.
- The environment config lists outputFiles patterns to collect after evaluation.

evidence.outputFileStorage controls whether file bytes are copied into evidence storage:

| Value | Behavior |
| --- | --- |
| `reference` | Evidence records paths to files where they were produced, usually inside trial workspaces. |
| `copy` | Matching output files are copied into evidence storage so they remain easy to inspect even if trial workspaces are later cleaned up. |

Metric values should still be returned or extracted through metrics. Output files are for supporting evidence, debugging, traces, plots, logs, and databases.
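The difference between the two storage values can be sketched as below. This is an illustration under assumptions: `collect_output_files`, the glob-style patterns, and the descriptor dictionaries are hypothetical and do not reflect OptPilot's actual record format.

```python
import shutil
import tempfile
from pathlib import Path


def collect_output_files(workspace: Path, patterns: list[str],
                         storage: str, evidence_dir: Path) -> list[dict]:
    """Return one descriptor per matching file, copying bytes only in 'copy' mode."""
    if storage not in ("reference", "copy"):
        raise ValueError(f"unknown storage mode: {storage}")
    descriptors = []
    for pattern in patterns:
        for source in sorted(workspace.glob(pattern)):
            if not source.is_file():
                continue
            path = source  # 'reference': point at the file where it was produced
            if storage == "copy":
                # 'copy': duplicate the file under the evidence directory
                path = evidence_dir / source.relative_to(workspace)
                path.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(source, path)
            descriptors.append({"path": str(path), "storage": storage})
    return descriptors


# Demo with a throwaway trial workspace containing one log file.
with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "trials" / "trial-demo"
    workspace.mkdir(parents=True)
    (workspace / "run.log").write_text("done\n")
    evidence = Path(tmp) / "evidence_files" / "trial-demo"
    referenced = collect_output_files(workspace, ["*.log"], "reference", evidence)
    copied = collect_output_files(workspace, ["*.log"], "copy", evidence)
    reference_in_workspace = Path(referenced[0]["path"]).parent == workspace
    copy_in_evidence = Path(copied[0]["path"]).parent == evidence
    copy_exists = Path(copied[0]["path"]).exists()
```

In the sketch, a `reference` descriptor stops being readable once the trial workspace is removed, which is the trade-off the `copy` value avoids.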

EvidenceView

Methods can inspect previous results through EvidenceView during iterative optimization.

Typical information available through this API includes:

- observations and metric values
- trial records
- candidate records
- method call records
- scheduler events
- method events
- extracted records
- evaluator output files and artifacts

This gives methods a stable way to learn from previous trials without parsing raw run files by hand.

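def propose(self, n_candidates, study_state, evidence_view):
    # Up to three observations, with their metric values, from earlier trials.
    recent = evidence_view.observations(limit=3)
    # Metadata for up to five JSON files produced during evaluation.
    traces = evidence_view.artifacts(kind="json", limit=5)
    # Up to twenty rows extracted from the "events" record stream.
    rows = evidence_view.records("events", limit=20)
    ...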

records(...) reads rows extracted from configured JSONL, CSV, SQLite, or custom record streams. artifacts(...) and output_files(...) return metadata for files produced during evaluation, such as logs, plots, JSON reports, CSV files, or SQLite databases. Both return paths and content references, so a method can decide which files to read.
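To make the idea concrete, here is a toy method that uses earlier observations to choose its next proposals. Everything beyond the propose(...) signature and the observations(limit=...) call is an assumption for illustration: `FakeEvidenceView` is a stand-in, and the observation shape (`candidate`, `metrics`, `cost`), the "lower is better" objective, and the returned candidate format are all hypothetical.

```python
class FakeEvidenceView:
    """Stand-in for EvidenceView; the record shape here is an assumption."""

    def __init__(self, observations):
        self._observations = list(observations)

    def observations(self, limit=None):
        # Assumed behaviour: return the most recent `limit` observations.
        if limit is None:
            return list(self._observations)
        start = max(len(self._observations) - limit, 0)
        return self._observations[start:]


class NudgeBestMethod:
    """Toy method: propose values near the best recently observed one."""

    def propose(self, n_candidates, study_state, evidence_view):
        recent = evidence_view.observations(limit=3)
        if not recent:
            # No history yet: start from a fixed default.
            return [{"x": 0.0} for _ in range(n_candidates)]
        # Assumes lower "cost" is better and each observation carries its candidate.
        best = min(recent, key=lambda obs: obs["metrics"]["cost"])
        base = best["candidate"]["x"]
        return [{"x": base + 0.1 * (i + 1)} for i in range(n_candidates)]


history = [
    {"candidate": {"x": 1.0}, "metrics": {"cost": 5.0}},
    {"candidate": {"x": 2.0}, "metrics": {"cost": 3.0}},
    {"candidate": {"x": 3.0}, "metrics": {"cost": 4.0}},
]
proposals = NudgeBestMethod().propose(
    2, study_state=None, evidence_view=FakeEvidenceView(history)
)
first_run = NudgeBestMethod().propose(
    1, study_state=None, evidence_view=FakeEvidenceView([])
)
print(proposals, first_run)
```

The method never opens a run file itself; it only asks the view for what it needs, which is the point of going through EvidenceView.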

Resume And Branch

Resume appends more trials to an existing run:

uv run optpilot run examples/studies/job_shop_rule_parameters_baseline.yaml \
  --resume-run-dir path/to/existing-run

Branch starts a new run that records a previous run as its parent:

uv run optpilot run examples/studies/job_shop_rule_parameters_baseline.yaml \
  --branch-from-run-dir path/to/existing-run