Connectors
==========

A connector is a Python function, generated by an LLM, that maps one tool's output
dict onto the next tool's input dict. One connector is generated per edge.

Generation
----------

The first time an edge executes, biocomposer builds a prompt from:

- the downstream tool's input schema (each field's ``type``, ``cardinality``,
  ``format``, ``required``);
- the upstream tool's output schema;
- a snapshot of the **actual runtime data** (directory listings with file heads,
  list shapes), so the mapping is decided against what really exists, not just
  the declared types.

The LLM returns a single function with the signature
``edge_<upstream>_to_<downstream>(output) -> dict``.
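
As an illustration of that shape, a generated connector might look like the
sketch below. The tool names (``align``, ``tree``) and the field names
(``aligned_fasta``, ``alignment_path``) are hypothetical and chosen only for
this example; real connectors depend on the schemas and runtime data of the
edge in question.

.. code-block:: python

   def edge_align_to_tree(output: dict) -> dict:
       """Hypothetical connector from an 'align' tool to a 'tree' tool."""
       # Entity match: the upstream alignment file is the same semantic
       # object as the downstream alignment input, so it maps directly.
       return {"alignment_path": output["aligned_fasta"]}


   downstream_input = edge_align_to_tree({"aligned_fasta": "out/aln.fasta"})
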

Caching
-------

A connector is keyed by edge identity ``(upstream_id, downstream_id)`` and written
to a connectors file. It is then reused for every later call on that edge: across
decision-loop iterations and across every item of a gather node. A loop or an N-way
fan-out reuses one generated function; there is no per-iteration or per-item
regeneration.
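
The reuse behaviour can be sketched with a small in-memory cache. This is a
minimal sketch and not biocomposer's implementation: ``ConnectorCache``,
``fake_generate`` and the ``generations`` counter are hypothetical names, and
the on-disk connectors file is left out because its format is not described
here.

.. code-block:: python

   from typing import Callable, Dict, Tuple

   Connector = Callable[[dict], dict]
   EdgeKey = Tuple[str, str]  # (upstream_id, downstream_id)


   class ConnectorCache:
       """Hypothetical cache: generate once per edge, then reuse."""

       def __init__(self, generate: Callable[[str, str], Connector]) -> None:
           self._generate = generate
           self._connectors: Dict[EdgeKey, Connector] = {}
           self.generations = 0  # how many times generation actually ran

       def get(self, upstream_id: str, downstream_id: str) -> Connector:
           key = (upstream_id, downstream_id)
           if key not in self._connectors:
               # First execution of this edge: generate and store.
               self._connectors[key] = self._generate(upstream_id, downstream_id)
               self.generations += 1
           return self._connectors[key]


   def fake_generate(upstream_id: str, downstream_id: str) -> Connector:
       """Stand-in for the LLM call (assumption for this sketch)."""
       def connector(output: dict) -> dict:
           return {"value": output["result"]}
       return connector


   cache = ConnectorCache(fake_generate)
   # Three loop iterations over the same edge share one connector.
   results = [cache.get("a", "b")({"result": i}) for i in range(3)]

After the three calls, ``cache.generations`` is still 1, which mirrors the
"no per-iteration regeneration" rule above.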

What a connector does
---------------------

The prompt directs the model to reason in steps:

- **Entity match**: map a field only when it holds the same semantic data as a
  downstream field; matching ``type`` alone is not enough.
- **Format conversion**: when the downstream expects a different format (e.g. a
  FASTA file vs. an in-memory sequence), write the conversion.
- **Cardinality**: when the downstream wants one item but the upstream is a
  directory or list of many, select the single primary file, excluding
  trajectory, intermediate, and temporary artifacts. For a **map** edge the same
  step instead returns the *list* of per-item input dicts (:doc:`nodes/gather`).
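
The cardinality step can be sketched as follows. Everything here is an
assumption made for illustration: the tool names (``dock``, ``score``), the
field names (``files``, ``structure_path``), and the filename markers used to
recognise artifacts. A generated connector would base the selection on the
runtime snapshot rather than on a fixed marker list.

.. code-block:: python

   from pathlib import PurePath

   # Hypothetical filename markers for artifacts that are never "primary".
   EXCLUDED_MARKERS = ("traj", "tmp", "temp", "intermediate")


   def pick_primary(paths: list) -> str:
       """Return the single path that is not a trajectory/temp artifact."""
       candidates = [
           p for p in paths
           if not any(m in PurePath(p).name.lower() for m in EXCLUDED_MARKERS)
       ]
       if len(candidates) != 1:
           raise ValueError(f"expected one primary file, found {len(candidates)}")
       return candidates[0]


   def edge_dock_to_score(output: dict) -> dict:
       """Ordinary edge: many upstream files, one downstream input."""
       return {"structure_path": pick_primary(output["files"])}


   def edge_dock_to_score_map(output: dict) -> list:
       """Map edge: one input dict per upstream item."""
       return [{"structure_path": p} for p in output["files"]]


   files = ["run/final.pdb", "run/traj_001.pdb", "run/tmp_scratch.pdb"]
   single = edge_dock_to_score({"files": files})
   per_item = edge_dock_to_score_map({"files": ["run/a.pdb", "run/b.pdb"]})
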

Input nodes bypass connectors
-----------------------------

Connectors are not generated for user-provided :doc:`input node <nodes/input>`
values. Those values are applied verbatim, so the model can never substitute a
manifest example value for a real one. This is also why input-node values override
upstream outputs on a key clash (:ref:`merge-order`).
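
The override on a key clash can be pictured as a plain dict merge in which
input-node values are applied last. This is only a sketch of that one rule;
``build_downstream_input`` and the field names are hypothetical, and the full
ordering is the one described under :ref:`merge-order`.

.. code-block:: python

   def build_downstream_input(connector_output: dict, input_node_values: dict) -> dict:
       """Merge connector output with verbatim input-node values."""
       merged = dict(connector_output)   # values mapped from the upstream tool
       merged.update(input_node_values)  # user-provided values win on a clash
       return merged


   merged = build_downstream_input(
       {"sequence_path": "upstream/out.fasta", "threads": 4},
       {"sequence_path": "user/real_input.fasta"},
   )
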