Connectors

A connector is a Python function, generated by an LLM, that maps one tool’s output dict onto the next tool’s input dict. One connector is generated per edge.

Generation

The first time an edge executes, biocomposer builds a prompt from:

the downstream tool’s input schema (each field’s type, cardinality, format, and whether it is required);
the upstream tool’s output schema;
a snapshot of the actual runtime data (directory listings with file heads, list shapes), so the mapping is decided against what really exists, not just the declared types.
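As a rough illustration of the third input, a runtime snapshot might be assembled along the lines below. This is a minimal sketch, not biocomposer's actual code: `snapshot_value`, `HEAD_LINES`, and the layout of the returned dict are hypothetical names chosen for the example.

```python
from itertools import islice
from pathlib import Path

HEAD_LINES = 3  # hypothetical: how many leading lines of each file to include


def snapshot_value(value):
    """Summarise one runtime value for a prompt (illustrative only)."""
    if isinstance(value, (list, tuple)):
        # List shape: length and first item's type, not the full contents.
        item_type = type(value[0]).__name__ if value else None
        return {"kind": "list", "length": len(value), "item_type": item_type}
    if isinstance(value, (str, Path)) and Path(value).is_dir():
        # Directory listing with the head of each regular file.
        files = {}
        for entry in sorted(Path(value).iterdir()):
            if entry.is_file():
                with entry.open(errors="replace") as fh:
                    files[entry.name] = [
                        line.rstrip("\n") for line in islice(fh, HEAD_LINES)
                    ]
        return {"kind": "directory", "files": files}
    # Anything else is reported by type name and value.
    return {"kind": type(value).__name__, "value": value}
```

Showing file heads rather than whole files keeps the prompt small while still letting the model see, for example, that a file starts with a FASTA header line.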

The LLM returns a function with the signature edge_<upstream>_to_<downstream>(output) -> dict.

Caching

A connector is keyed by edge identity (upstream_id, downstream_id) and written to a connectors file. It is then reused for every later call on that edge: across decision-loop iterations and across every item of a gather node. A loop or an N-way fan-out therefore runs one generated function; nothing is regenerated per iteration or per item.
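The caching behaviour can be pictured with the sketch below. It makes several assumptions that are not from biocomposer: the class name `ConnectorCache`, JSON as the on-disk format, the `generate` callback standing in for the LLM call, and edge ids that are valid Python identifiers.

```python
import json
from pathlib import Path


class ConnectorCache:
    """Illustrative cache: one connector source per (upstream_id, downstream_id)."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload previously generated connectors if the file already exists.
        self.sources = json.loads(self.path.read_text()) if self.path.exists() else {}
        self.generations = 0  # how many times the generator was actually called

    def get(self, upstream_id, downstream_id, generate):
        key = f"{upstream_id}->{downstream_id}"  # JSON object keys must be strings
        if key not in self.sources:
            # First execution of this edge: generate once and persist.
            self.sources[key] = generate(upstream_id, downstream_id)
            self.generations += 1
            self.path.write_text(json.dumps(self.sources, indent=2))
        namespace = {}
        exec(self.sources[key], namespace)  # load the cached function definition
        return namespace[f"edge_{upstream_id}_to_{downstream_id}"]


def fake_generate(upstream_id, downstream_id):
    """Stand-in for the LLM call; returns connector source code as a string."""
    return (
        f"def edge_{upstream_id}_to_{downstream_id}(output):\n"
        "    return {'value': output['result']}\n"
    )
```

Calling `get` repeatedly for the same edge, as a loop or fan-out would, invokes `generate` only on the first call; a fresh `ConnectorCache` pointed at the same file does not invoke it at all.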

What a connector does

The prompt directs the model to reason in steps:

Entity match: map a field only when it holds the same semantic data as a downstream field; matching type alone is not enough.
Format conversion: when the downstream expects a different format (e.g. a FASTA file vs. an in-memory sequence), write the conversion.
Cardinality: when the downstream wants one item but the upstream is a directory or list of many, select the single primary file, excluding trajectory, intermediate, and temporary artifacts. For a map edge, the same step instead returns the list of per-item input dicts (see Gather nodes).
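The three steps above might show up in a generated connector roughly as follows. This is a sketch for a hypothetical design -> fold edge: the tool names, the field names (`sequence`, `out_dir`, `score`, `fasta_path`, `template_pdb`), the `SKIP_MARKERS` list, and the "first file in sorted order" rule are all assumptions made for the example, not output from biocomposer.

```python
import tempfile
from pathlib import Path

# Hypothetical name fragments marking files that are not the primary result.
SKIP_MARKERS = ("traj", "tmp", "intermediate")


def _primary_pdbs(out_dir):
    """Return PDB files in out_dir, excluding trajectory/intermediate/temporary ones."""
    return sorted(
        p for p in Path(out_dir).glob("*.pdb")
        if not any(marker in p.name.lower() for marker in SKIP_MARKERS)
    )


def edge_design_to_fold(output):
    """Illustrative connector for a single-item edge."""
    mapped = {}
    # 1. Entity match: `sequence` is the data the downstream needs as input.
    #    `score` is not mapped: it is a different quantity, whatever its type.
    # 2. Format conversion: downstream expects a FASTA file, upstream has a string.
    fasta = Path(tempfile.mkdtemp()) / "input.fasta"
    fasta.write_text(f">design\n{output['sequence']}\n")
    mapped["fasta_path"] = str(fasta)
    # 3. Cardinality: downstream wants one PDB, upstream produced a directory.
    #    Taking the first candidate is a placeholder heuristic for this sketch.
    candidates = _primary_pdbs(output["out_dir"])
    if candidates:
        mapped["template_pdb"] = str(candidates[0])
    return mapped


def edge_design_to_fold_map(output):
    """Map-edge variant: one input dict per primary file instead of a single pick."""
    return [{"template_pdb": str(p)} for p in _primary_pdbs(output["out_dir"])]
```

The only difference between the two variants is the cardinality step: the single-item connector narrows many files to one, while the map variant keeps every primary file and wraps each in its own input dict.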

Input nodes bypass connectors

Connectors are not generated for user-provided input node values. Those values are applied verbatim, so the model can never substitute a manifest example value for a real one. This is also why input-node values override upstream outputs on a key clash (see Input merging and key clashes).
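The override rule can be pictured as a plain dict merge in which user-provided values are applied last. The function name `merge_inputs` and its arguments are hypothetical; the sketch only illustrates the precedence described here, not how biocomposer implements it.

```python
def merge_inputs(connector_output, input_node_values):
    """Combine mapped upstream values with user-provided ones (illustrative only)."""
    merged = dict(connector_output)   # values produced by connectors
    merged.update(input_node_values)  # user values applied verbatim; they win on a clash
    return merged
```

Because the user-provided dict never passes through a connector, nothing the model wrote can alter it.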