Graph

Graph is the pipeline: a set of nodes and the directed edges between them. Building a graph only describes the pipeline; nothing runs until execute() is called (see Execution).

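from biocomposer import Graph

g = Graph()
a = g.add_input_node(sequences="/vol/inputs/family.fasta")  # input node carrying the FASTA path
b = g.add_node("clustalo")  # tool node
g.add_edge((a, b))  # wire a (upstream) into b (downstream)
g.set_output_node(b)  # execute() will return b's result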

Graphs are defined using nodes (bioinformatics tools) and edges (auto-generated mapper functions) that enable data flow between nodes. There are six types of nodes, covered in Nodes. Execution (run order, input merging, and results) is covered in Execution.

Edges

add_edge takes one or more (upstream, downstream) tuples and records the wiring; it also tracks fan-out (how many downstreams each node feeds), which the executor uses for caching. Rules:

  • A node may have several incoming edges.

  • An InputNode may only be a source.

  • Both endpoints may be nodes or subgraphs.

Edge order matters when two upstreams collide on a key; see Input merging and key clashes on the Execution page.

g.add_edge((rfdiffusion, proteinmpnn), (mpnn_in, proteinmpnn))  # two edges, one call
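
The bookkeeping described in this section can be pictured with a small stand-in. This is a minimal sketch and not biocomposer's implementation: ToyGraph, its attributes, the use of plain strings as nodes, and the "scorer" node are all hypothetical, and raising ValueError for an edge into an input node is an assumption about how the rule might be enforced.

```python
from collections import Counter


class ToyGraph:
    """Illustrative stand-in only; not biocomposer's Graph."""

    def __init__(self):
        self.edges = []            # (upstream, downstream) pairs, in insertion order
        self.fan_out = Counter()   # upstream -> number of downstreams it feeds
        self.input_nodes = set()   # nodes that may only appear as sources

    def add_input_node(self, name):
        self.input_nodes.add(name)
        return name

    def add_edge(self, *edges):
        for upstream, downstream in edges:
            # Assumed enforcement of "an InputNode may only be a source".
            if downstream in self.input_nodes:
                raise ValueError(f"{downstream!r} is an input node and may only be a source")
            self.edges.append((upstream, downstream))
            self.fan_out[upstream] += 1


g = ToyGraph()
mpnn_in = g.add_input_node("mpnn_in")
g.add_edge(("rfdiffusion", "proteinmpnn"), (mpnn_in, "proteinmpnn"))  # two edges, one call
g.add_edge(("rfdiffusion", "scorer"))  # hypothetical second consumer of rfdiffusion

print(g.fan_out["rfdiffusion"])  # 2
print(g.edges[1])                # ('mpnn_in', 'proteinmpnn')
```

Keeping the edges in an ordered list preserves the order in which they were added, which is the information a merge step would need when two upstreams supply the same key.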

Output nodes

set_output_node(node) marks which node’s result execute() returns. Execution starts from the output node(s) and walks backward. A graph may have several outputs; execute() returns one result per output node. See Output nodes.
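
One way to picture "starts from the output node(s) and walks backward" is a depth-first walk over reversed edges. The sketch below is an assumption about the general technique, not the library's executor: run_order and the single-letter node names are hypothetical, and the graph is assumed to be acyclic.

```python
def run_order(edges, outputs):
    """Return the nodes needed to produce `outputs`, upstreams first."""
    # Reverse index: downstream -> list of its upstreams.
    upstreams = {}
    for up, down in edges:
        upstreams.setdefault(down, []).append(up)

    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for up in upstreams.get(node, []):
            visit(up)
        order.append(node)  # appended only after all of its upstreams

    for out in outputs:
        visit(out)
    return order


edges = [("a", "b"), ("b", "c"), ("a", "d")]
print(run_order(edges, ["c"]))       # ['a', 'b', 'c']
print(run_order(edges, ["c", "d"]))  # ['a', 'b', 'c', 'd']
```

In this sketch a node that no output depends on (d in the first call) is never visited. Whether biocomposer skips such nodes is not stated on this page.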

Reference

class biocomposer.Graph

Bases: object

add_decision_node(score_fn: str, conditions: list, modifier_tool) → DecisionNode
add_edge(*edges)

Wire nodes together. Each edge is a 2-tuple (upstream_node, downstream_node).

add_gather_node(tool_name: str, split_key: str, gpu: str = None, args_override: str = None, entrypoint_override: str = None) → GatherNode

Scatter an upstream collection (the upstream output key split_key) over a cardinality-one inner tool, running it once per item and gathering the per-item outputs into a dict-of-lists.
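
The scatter and gather shape described above can be sketched in plain Python. This is an illustration under assumptions, not the GatherNode implementation: scatter_gather and toy_tool are hypothetical names, and the way the inner tool receives each item (as a single positional argument, returning a dict) is assumed.

```python
def scatter_gather(upstream_output, split_key, tool):
    """Run `tool` once per item under `split_key` and gather a dict-of-lists."""
    gathered = {}
    for item in upstream_output[split_key]:
        result = tool(item)  # one call per item; assumed to return a dict
        for key, value in result.items():
            gathered.setdefault(key, []).append(value)
    return gathered


def toy_tool(sequence):
    # Hypothetical cardinality-one tool: takes one sequence, returns one record.
    return {"length": len(sequence), "upper": sequence.upper()}


out = scatter_gather({"sequences": ["acgt", "ttga", "cc"]}, "sequences", toy_tool)
print(out)  # {'length': [4, 4, 2], 'upper': ['ACGT', 'TTGA', 'CC']}
```

Each output key maps to a list whose i-th entry came from the i-th input item, so per-item results stay aligned across keys.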

add_input_node(**kwargs) → InputNode
add_node(tool_name: str, gpu: str = None, args_override: str = None, entrypoint_override: str = None) → Node
execute()
set_llm(provider: str, model: str, api_key: str)
set_output_node(node: Node = None)