How A Run Works

This page assumes you have already completed one successful OptPilot run and explains the runtime sequence behind it. For the recommended first walkthrough, use Getting Started.

This page follows what happens after you run:

uv run optpilot run examples/studies/job_shop_rule_parameters_baseline.yaml

At a high level, OptPilot loads the study config, resolves the referenced environment and method configs, validates compatibility, compiles an internal run spec, and runs the propose-evaluate-record loop until the study budget is exhausted.

Runtime Sequence

sequenceDiagram
  participant CLI as "optpilot run"
  participant Compiler as "Config compiler"
  participant Runner as "Runner"
  participant Method as "Method"
  participant CandidateStore as "Candidate store"
  participant Trial as "Trial workspace"
  participant Evaluator as "Environment evaluator"
  participant Evidence as "Evidence store"

  CLI->>Compiler: load study config
  Compiler->>Compiler: load environmentConfig and methodConfig
  Compiler->>Compiler: validate JSON Schema and compatibility
  Compiler-->>Runner: compiled study spec
  Runner->>Evidence: create run directory and write run metadata

  loop until budget is exhausted
    Runner->>Method: request candidates with study state, context, and evidence

    opt file candidates
      Method->>CandidateStore: write generated files
    end

    Method-->>Runner: candidate manifests

    opt file candidates
      Runner->>CandidateStore: validate manifest and read contentRef files
    end

    Runner->>Trial: create fresh workspace
    Runner->>Trial: copy trialWorkspace entries
    Runner->>Trial: materialize candidate
    Runner->>Evaluator: evaluate candidate in trial workspace
    Evaluator-->>Runner: status, metrics, output files, records
    Runner->>Evidence: write candidate, trial, observation, and event records
    Runner->>Method: send observations when supported
  end

  Runner->>Evidence: write summary

The method proposes candidates. The runner does not invent them. The runner supplies the method with the study state, method context, candidate contract, and evidence that the method is allowed to rely on.
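The division of labor in the diagram can be sketched in plain Python. This is a hypothetical illustration, not OptPilot's runner: `RandomMethod`, `evaluate`, `run`, and the in-memory `evidence` list are invented for this example, and a fixed `max_trials` stands in for the study budget.

```python
import random
from typing import Any


class RandomMethod:
    """Hypothetical method: proposes one parameter candidate per request."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._count = 0

    def propose(self, study_state: dict[str, Any]) -> list[dict[str, Any]]:
        self._count += 1
        return [{
            "candidate_id": f"candidate-{self._count:03d}",
            "format": "parameters",
            "spec": {"x": self._rng.uniform(0.0, 10.0)},
            "generator": {"method_id": "random-sketch"},
        }]

    def observe(self, observations: list[dict[str, Any]]) -> None:
        """A real method could update its search state here."""


def evaluate(candidate: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical evaluator that minimizes (x - 3)^2."""
    x = candidate["spec"]["x"]
    return {"status": "ok", "metrics": {"loss": (x - 3.0) ** 2}}


def run(method: RandomMethod, max_trials: int) -> list[dict[str, Any]]:
    evidence: list[dict[str, Any]] = []  # stands in for the evidence store
    state: dict[str, Any] = {"completed_trials": 0, "best_metric": None}
    while state["completed_trials"] < max_trials:
        for candidate in method.propose(state):          # method proposes
            if state["completed_trials"] >= max_trials:
                break
            result = evaluate(candidate)                  # environment evaluates
            observation = {"candidate_id": candidate["candidate_id"], **result}
            evidence.append(observation)                  # runner records
            state["completed_trials"] += 1
            loss = result["metrics"]["loss"]
            best = state["best_metric"]
            state["best_metric"] = loss if best is None else min(best, loss)
            method.observe([observation])                 # feedback to the method
    return evidence
```

Only `method.propose` creates candidates; `run` evaluates them, records observations, and passes them back, which is the runner's role in the loop.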

Config Compilation

The public YAML files are authoring configs. The runner executes a compiled internal spec.

Compilation performs these steps:

load the study config
resolve environmentConfig and methodConfig from the study file
validate all three files against JSON Schema
resolve config-relative paths
check that the method accepts the environment candidate format
check required method context paths and capabilities
check that the study objective metric is declared by the environment when metric keys are provided
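Two of these checks can be sketched as a plain function over already-loaded config dictionaries. The keys `acceptedFormats`, `metricKeys`, and `candidate.format` are hypothetical stand-ins for wherever the real schemas declare accepted formats, metric names, and the candidate format; only `objective.metric` and `environment.candidate` come from this page.

```python
from typing import Any


def check_compatibility(study: dict[str, Any], environment: dict[str, Any],
                        method: dict[str, Any]) -> list[str]:
    """Return human-readable errors; an empty list means the configs are compatible."""
    errors: list[str] = []

    # The method must accept the environment's candidate format (hypothetical keys).
    fmt = environment["candidate"]["format"]
    if fmt not in method.get("acceptedFormats", []):
        errors.append(f"method does not accept candidate format {fmt!r}")

    # The objective metric is only checked when the environment lists metric keys.
    declared = environment.get("metricKeys")
    metric = study["objective"]["metric"]
    if declared and metric not in declared:
        errors.append(f"objective metric {metric!r} is not declared by the environment")

    return errors
```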

The compiled spec is written to:

run_dir/study_spec.json

Users normally do not edit this file. It is evidence for what OptPilot actually ran.

Some field names change during compilation because public YAML is optimized for authoring while the internal spec is optimized for execution.

Public authoring field	Internal runtime field	Why it changes
`objective.metric`	`objective.primaryMetric.name`	Runtime can also hold secondary metrics and aggregation details.
`objective.direction`	`objective.primaryMetric.direction`	The runner compares metrics using the compiled primary metric.
`budget.maxTrials`	`stopping.maxTrials`	Budget becomes a stopping policy.
`evaluator.settings`	`environment.evaluator.config.settings`	Environment-owned inputs remain attached to the evaluator.
`method.settings`	`method.config` and `method.settings`	Method implementations read runtime config; original settings remain available for audit.
`environment.candidate`	`candidate.context.candidate` plus validation/materialization specs	The runner needs both the public contract and executable validation/materialization rules.

When in doubt, treat public YAML as the source you edit and study_spec.json as the exact run record you inspect.
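The first three rows of the table can be mirrored by a small renaming function. This is a sketch of the mapping only, assuming the dotted names nest as plain dictionaries; `compile_fields` is hypothetical, and the real compiler also validates, resolves paths, and adds the remaining runtime fields.

```python
from typing import Any


def compile_fields(public: dict[str, Any]) -> dict[str, Any]:
    """Move a few public study fields to their internal runtime locations."""
    objective = public["objective"]
    return {
        "objective": {
            "primaryMetric": {
                "name": objective["metric"],            # objective.metric
                "direction": objective["direction"],    # objective.direction
            },
        },
        # budget.maxTrials becomes part of the stopping policy
        "stopping": {"maxTrials": public["budget"]["maxTrials"]},
    }
```

To confirm the result for a real run, look for the same keys under `objective.primaryMetric` and `stopping` in `run_dir/study_spec.json`.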

Environment evaluator inputs are normal configuration, carried as evaluator.settings in public YAML and passed to Python evaluators as context["settings"]. OptPilot does not interpret those settings beyond validation and path resolution. An evaluator may run one fixed case, use settings as simulator arguments, or loop over multiple benchmark cases internally and return aggregated metrics plus per-case records.
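A multi-case Python evaluator might look like the sketch below. The `evaluate(candidate, context)` signature, the `cases` settings key, the `simulate` stand-in, and the returned dictionary layout are assumptions for illustration; only `context["settings"]` comes from this page.

```python
from statistics import mean
from typing import Any


def simulate(case: dict[str, Any], params: dict[str, Any]) -> float:
    """Stand-in for a real simulator call: returns a cost for one case."""
    return case["base_cost"] * params.get("scale", 1.0)


def evaluate(candidate: dict[str, Any], context: dict[str, Any]) -> dict[str, Any]:
    settings = context["settings"]          # evaluator.settings from the YAML
    params = candidate["spec"]
    records = []
    for case in settings["cases"]:          # hypothetical settings key
        cost = simulate(case, params)
        records.append({"case": case["name"], "cost": cost})
    return {
        "status": "ok",
        "metrics": {"mean_cost": mean(r["cost"] for r in records)},  # aggregated
        "records": records,                 # per-case detail
    }
```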

Method Execution

The method owns the search algorithm. It can be a small Python class, a command-line optimizer, an LLM agent, or a wrapper around a full upstream repository.

For each proposal request, OptPilot provides:

study_state: completed trials, failure count, best metric, and related run state
candidate_context: the environment candidate contract and method-visible context
evidence: previous observations, candidates, trials, records, calls, and events
settings: the free-form settings object from the method config
runtime_context: per-call paths, including method workspace and candidate store

For Python methods, the canonical path is study_state["candidate_context"] plus the optional evidence_view argument. For command methods, the request JSON also breaks out candidate, methodContext, and candidate_context for convenience; they describe the same environment-provided contract.

Parameter candidates look like:

{
  "candidate_id": "candidate-001",
  "format": "parameters",
  "spec": {"x": 4.2, "mode": "balanced"},
  "generator": {"method_id": "my-method"}
}
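A method that emits parameter candidates in this shape could be as small as a random sampler. The `RandomSearch` class, its `propose` method, and the `space` entry inside `study_state["candidate_context"]` are hypothetical; the real Python method interface may differ.

```python
import random
from typing import Any, Optional


class RandomSearch:
    """Hypothetical method that samples parameter candidates at random."""

    method_id = "my-method"

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._next = 1

    def propose(self, study_state: dict[str, Any],
                evidence_view: Optional[Any] = None) -> list[dict[str, Any]]:
        # Assumed layout: the candidate context carries a "space" mapping.
        space = study_state["candidate_context"]["space"]
        spec: dict[str, Any] = {}
        for name, dim in space.items():
            if "choices" in dim:
                spec[name] = self._rng.choice(dim["choices"])
            else:
                spec[name] = round(self._rng.uniform(dim["low"], dim["high"]), 3)
        candidate = {
            "candidate_id": f"candidate-{self._next:03d}",
            "format": "parameters",
            "spec": spec,
            "generator": {"method_id": self.method_id},
        }
        self._next += 1
        return [candidate]
```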

File candidates reference files generated by the method:

{
  "candidate_id": "candidate-001",
  "format": "files",
  "spec": {
    "bundleRef": "/path/to/run/candidates/candidate-001/files",
    "files": [
      {
        "path": "src/policy.py",
        "contentRef": "/path/to/run/candidates/candidate-001/files/src/policy.py",
        "sha256": "..."
      }
    ]
  },
  "generator": {"method_id": "my-method"}
}

optpilot.candidate_files.CandidateFileStore creates this file-candidate shape for Python methods.
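To see what that shape involves, here is a hand-rolled sketch that writes files into a candidate store and computes each `sha256`. `write_file_candidate` is hypothetical and only illustrates the manifest layout; in a Python method, prefer `CandidateFileStore`.

```python
import hashlib
from pathlib import Path
from typing import Any


def write_file_candidate(store_root: Path, candidate_id: str, method_id: str,
                         files: dict[str, str]) -> dict[str, Any]:
    """Write generated files under the candidate store and return a manifest."""
    bundle = store_root / candidate_id / "files"
    entries = []
    for rel_path, text in files.items():
        target = bundle / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        data = text.encode("utf-8")
        target.write_bytes(data)
        entries.append({
            "path": rel_path,
            "contentRef": str(target),
            "sha256": hashlib.sha256(data).hexdigest(),  # digest of the written bytes
        })
    return {
        "candidate_id": candidate_id,
        "format": "files",
        "spec": {"bundleRef": str(bundle), "files": entries},
        "generator": {"method_id": method_id},
    }
```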

Candidate Store And Trial Workspace

The run directory contains distinct storage areas:

Location	Purpose
`run_dir/candidates/`	Durable method-produced candidate files before evaluation.
`run_dir/method_calls/`	Per-call method request, response, stdout, and stderr files.
`run_dir/trials/`	Per-trial workspaces used by evaluators.
`run_dir/evidence_files/`	Optional copies of evaluator output files when `evidence.outputFileStorage: copy` is enabled.

For file candidates, materialization has one extra handoff:

method writes generated files to candidate store
method returns candidate manifest with contentRef and sha256
runner validates the manifest
runner copies contentRef files into the trial workspace
evaluator reads the trial workspace
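The validation step might check, at minimum, that each manifest path stays inside the materialization root and that each `contentRef` still matches its recorded `sha256`. The sketch below assumes exactly those two rules; `validate_file_manifest` is hypothetical, and OptPilot's actual validation may check more or differently.

```python
import hashlib
from pathlib import Path, PurePosixPath
from typing import Any


def validate_file_manifest(candidate: dict[str, Any]) -> list[str]:
    """Return problems found in a file-candidate manifest (empty list = valid)."""
    if candidate.get("format") != "files":
        return [f"expected format 'files', got {candidate.get('format')!r}"]
    problems: list[str] = []
    for entry in candidate["spec"]["files"]:
        rel = PurePosixPath(entry["path"])
        # Reject paths that could escape the materialization root.
        if rel.is_absolute() or ".." in rel.parts:
            problems.append(f"unsafe path: {entry['path']}")
            continue
        source = Path(entry["contentRef"])
        if not source.is_file():
            problems.append(f"missing contentRef: {source}")
            continue
        # The file in the candidate store must match the digest the method reported.
        digest = hashlib.sha256(source.read_bytes()).hexdigest()
        if digest != entry["sha256"]:
            problems.append(f"sha256 mismatch for {entry['path']}")
    return problems
```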

The trial workspace is what gets evaluated. The candidate store is only a handoff area before evaluation, so the evaluator normally reads the trial workspace rather than the candidate store.

Trial Workspace Preparation

Each candidate evaluation gets a fresh workspace under run_dir/trials/.

For parameter candidates:

the candidate spec is passed directly as runtime input
no environment source tree is required unless the evaluator itself needs copied files

The job-shop parameter baseline from Getting Started follows this simpler path.

For file candidates:

OptPilot creates a fresh trial workspace.
It copies every environment.trialWorkspace entry into that workspace.
It validates the method's file manifest.
It copies method-generated files into candidate.materialize.root.
It writes workspace_manifest.json.
It calls the evaluator with the workspace and candidate root.
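The steps above can be sketched with the standard library. `prepare_trial_workspace` is a hypothetical helper, each `trialWorkspace` entry is assumed to be copied under its own name, the contents written to `workspace_manifest.json` are invented for the example, and the file manifest is assumed to have been validated already.

```python
import json
import shutil
from pathlib import Path
from typing import Any


def prepare_trial_workspace(trial_dir: Path, workspace_entries: list[Path],
                            candidate: dict[str, Any], materialize_root: str) -> Path:
    """Copy environment entries, then overlay candidate files; return the candidate root."""
    trial_dir.mkdir(parents=True, exist_ok=False)    # fresh workspace per trial
    for entry in workspace_entries:                   # assumed: copied under their own names
        dest = trial_dir / entry.name
        if entry.is_dir():
            shutil.copytree(entry, dest)
        else:
            shutil.copy2(entry, dest)
    root = trial_dir / materialize_root               # plays the role of candidate.materialize.root
    copied = []
    for f in candidate["spec"]["files"]:              # manifest assumed already validated
        dest = root / f["path"]
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f["contentRef"], dest)           # overwrites the environment's copy
        copied.append(f["path"])
    manifest = {"candidate_id": candidate["candidate_id"], "root": materialize_root,
                "files": copied}                      # hypothetical manifest contents
    (trial_dir / "workspace_manifest.json").write_text(json.dumps(manifest, indent=2))
    return root
```

Copying environment entries first and candidate files second is what lets generated files replace the environment's originals inside the trial workspace.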

The Strategic Airlift (SA) example copies a complete simulator source tree because the evaluator runs the simulator from inside the trial workspace after candidate edits are applied. If an evaluator uses an installed package, a prebuilt image, an external service, or only JSON input files, it does not need to copy the complete environment implementation.

File-candidate tracks such as Strategic Airlift add this materialization step on top of the same base loop. The method still proposes candidates, the environment still evaluates them, and OptPilot still records the evidence.

Environment Evaluation

The environment config chooses one evaluator mode:

Python evaluator fragment:

evaluator:
  python: user_catalog.environments.my_environment.evaluator:evaluate

Command evaluator fragment:

evaluator:
  command: [python, evaluate.py, "{candidate_json}", "{settings_file}", "{metrics_file}"]

Adapter evaluator fragment:

evaluator:
  adapter: user_catalog.environments.my_environment.adapter:MyAdapter
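For the command mode, `evaluate.py` could be a short script like the one below. It assumes OptPilot substitutes `{settings_file}` and `{metrics_file}` with file paths, accepts `{candidate_json}` either as a path or as inline JSON because this page does not say which, and writes a JSON object with `status` and `metrics`; the metrics-file layout and the `target` setting are assumptions.

```python
import json
import sys
from pathlib import Path
from typing import Any


def load_candidate(arg: str) -> Any:
    """Accept inline JSON or a path to a JSON file (assumption, see above)."""
    if arg.lstrip().startswith("{"):
        return json.loads(arg)
    return json.loads(Path(arg).read_text())


def main(argv: list[str]) -> int:
    candidate_arg, settings_file, metrics_file = argv[1:4]
    candidate = load_candidate(candidate_arg)
    settings = json.loads(Path(settings_file).read_text())
    x = float(candidate["spec"]["x"])
    target = float(settings.get("target", 0.0))  # hypothetical setting
    result = {"status": "ok", "metrics": {"loss": (x - target) ** 2}}
    Path(metrics_file).write_text(json.dumps(result))  # assumed metrics-file layout
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:  # run only when given arguments
    sys.exit(main(sys.argv))
```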

The evaluator receives the materialized candidate and context["settings"]. It returns or writes:

status
metric values
optional constraint results
optional output-file descriptors
optional record streams

The configured adapter normalizes those values into observations.
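Normalization can be pictured as filling defaults and coercing types so that every trial produces the same record shape. The `normalize` function, its field names, and the status vocabulary below are hypothetical, not the documented observation schema.

```python
from typing import Any

ALLOWED_STATUSES = {"ok", "failed", "timeout"}  # hypothetical vocabulary


def normalize(raw: dict[str, Any], candidate_id: str, trial_id: str) -> dict[str, Any]:
    """Turn a raw evaluator result into a uniform observation record."""
    status = raw.get("status", "failed")
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"unknown status {status!r}")
    return {
        "candidate_id": candidate_id,
        "trial_id": trial_id,
        "status": status,
        "metrics": {k: float(v) for k, v in raw.get("metrics", {}).items()},
        "constraints": raw.get("constraints", {}),    # optional in the raw result
        "output_files": raw.get("output_files", []),  # optional descriptors
        "records": raw.get("records", []),            # optional record streams
    }
```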

Parallelism And Runtimes

The study's execution block controls how environment trials run:

Study execution fragment:

execution:
  backend: local          # local | local_subprocess
  parallelism: 2
  runtime:
    sandbox: host         # host | container

Current runner support:

Setting	Status
`backend: local` with `sandbox: host`	Implemented.
`backend: local_subprocess` with `sandbox: host`	Implemented.
`backend: local` with `sandbox: container`	Implemented for Docker/Podman-compatible CLIs.

Method runtime is separate from environment execution:

Method runtime fragment:

runtime:
  sandbox: container
  container:
    image: my-method-image:latest
    executable: docker

Use a method runtime container when the optimizer or agent needs different dependencies from the evaluator.
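A method runtime container ultimately means running the method's command through the configured `executable`. The sketch below builds such an invocation with the standard `run --rm -v -w` options that Docker and Podman both accept; the `/workspace` mount point, the `container_command` helper, and the wrapped `propose.py` command are assumptions, not OptPilot's actual invocation.

```python
from pathlib import Path


def container_command(executable: str, image: str, workspace: Path,
                      command: list[str]) -> list[str]:
    """Build a docker/podman `run` invocation that mounts the method workspace."""
    return [
        executable, "run", "--rm",           # remove the container when it exits
        "-v", f"{workspace}:/workspace",     # hypothetical mount point
        "-w", "/workspace",                  # run the command from the mount
        image,
        *command,
    ]
```

Keeping the image and executable in the method config, rather than the environment config, is what lets the optimizer's dependencies differ from the evaluator's.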

Evidence

Every run directory records the compiled spec, trial results, candidate records, method calls, scheduler events, output files, and final summary.

See Evidence for the run file catalog and for resume/branch behavior.