bv docs

how to use, publish, and maintain

Three audiences, one page. Jump to whichever applies.

For end users

running tools in your project

tl;dr

# 1. install a runtime (pick one)
curl -fsSL https://get.docker.com/rootless | sh        # docker, on a laptop
conda install -c conda-forge apptainer                 # apptainer, on HPC

# 2. install bv
curl -fsSL https://raw.githubusercontent.com/mlberkeley/bv/main/install.sh | sh

# 3. add a tool, call its binary, commit the lockfile
mkdir myproj && cd myproj
bv add blast
bv run blastn -version
git add bv.toml bv.lock && git commit -m "pin tools"

# 4. on any other machine: same images, by digest
bv sync

install

bv needs Docker or Apptainer/Singularity, plus git. No Python, no conda.

Docker is typical on a laptop. Use the rootless installer when you can; on a GPU box install nvidia-container-toolkit too.

Apptainer is typical on shared HPC nodes since it does not need a daemon or root. Install with conda: conda install -c conda-forge apptainer.

Then install bv itself:

curl -fsSL https://raw.githubusercontent.com/mlberkeley/bv/main/install.sh | sh
# or, with cargo:
cargo install biov

Verify with bv doctor, which prints the runtimes it can see and any missing pieces.

core commands

command                                          what it does
bv add <tool>[@ver]                              resolve from the registry, pull the image, write bv.toml and bv.lock, generate shims
bv run <binary> <args>                           run a binary inside its container; current directory is mounted at /workspace
bv exec <cmd>                                    run any command with the project's binaries on PATH (good for scripts, Make, Snakemake, CI)
bv shell                                         interactive subshell with the project active
bv sync                                          pull every image pinned in bv.lock by digest and regenerate shims
bv list [--binaries]                             show installed tools, or the binary routing table
bv search <query>                                search the registry by name, description, or I/O type
bv show <tool> [--format json|mcp|json-schema]   typed I/O schema and metadata
bv lock [--check]                                regenerate bv.lock; --check exits 1 if anything would change
bv doctor                                        environment check (runtimes, GPU, project state)

bv.toml & bv.lock

bv.toml is what you wrote (or what bv add wrote for you). bv.lock is the resolved, pinned state. Commit both. .bv/ (generated shim directory) is gitignored automatically.

# bv.toml
[project]
name = "myproj"

[[tools]]
id = "blast"
version = "=2.15.0"

[[tools]]
id = "hmmer"

[runtime]
backend = "auto"      # docker | apptainer | auto (default)
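
Since both files are committed, a CI job can catch lockfile drift. A hypothetical GitHub Actions step (the step name and layout are illustrative; the only documented behavior used is bv lock --check exiting 1 when anything would change):

```yaml
# Hypothetical CI guard; assumes bv is already installed on the runner.
- name: Verify bv.lock is current
  run: bv lock --check
```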

If two tools expose the same binary name, bv lock fails with a clear error. Resolve it with [binary_overrides]:

[binary_overrides]
samtools = "samtools"   # this tool wins for the `samtools` shim

caches

Apptainer runs containers with a read-only root filesystem, so any tool that downloads model weights or writes scratch data inside the image will fail (think ColabFold writing to /cache/colabfold). bv binds writable host directories at the right paths automatically. The set of paths is resolved in three layers:

  1. Tool manifest (cache_paths) : the tool author's authoritative list. ColabFold's manifest declares cache_paths = ["/cache/colabfold"].
  2. Your [[cache]] entries in bv.toml : add new paths or redirect any existing path to a different host directory.
  3. Apptainer fallbacks : for tools that haven't declared cache paths yet, bv auto-binds /cache and /root/.cache.

Default host path: ~/.cache/bv/<tool>/<slug>. Docker skips the apptainer fallbacks because its writable upper layer already covers them; manifest and user entries apply on both backends.

# bv.toml : redirect colabfold weights to a shared NFS cache
[[cache]]
match = "colabfold"
container_path = "/cache/colabfold"
host_path = "/srv/shared/colabfold-weights"

# add an extra path for every tool
[[cache]]
match = "*"
container_path = "/tmp/scratch"
host_path = "~/.cache/bv/{tool}/scratch"

The {tool} token is replaced with the tool id; ~ expands to $HOME.
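
That expansion can be sketched in plain shell (an illustration of the two substitution rules above, not bv's actual implementation):

```shell
# Expand a host-path template the way bv's cache config describes:
# replace the {tool} token with the tool id, then a leading ~ with $HOME.
tool="colabfold"
template='~/.cache/bv/{tool}/scratch'
path=${template/\{tool\}/$tool}    # -> ~/.cache/bv/colabfold/scratch
path=${path/#\~/$HOME}             # -> $HOME/.cache/bv/colabfold/scratch
echo "$path"
```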

backends

bv auto-detects what runtime is available. Pin it explicitly with --backend, the BV_BACKEND env var, or [runtime] backend = "apptainer" in bv.toml.

feature                 docker                                              apptainer
root needed             daemon needs privileges (rootless mode available)   no
GPU flag                --gpus all                                          --nv
image cache             docker image store                                  SIF files in ~/.local/share/bv/sif
writable container FS   yes (upper layer)                                   no (use cache mounts)

reference data

Some tools need large reference databases (kraken2, blast pdbaa, etc.). The manifest declares them; bv add tells you what's needed; bv data fetch downloads.

bv add kraken2
bv data fetch pdbaa --yes
bv run kraken2 ...      # bv mounts the data directory automatically

troubleshooting

For publishers

adding your tool to the registry

tl;dr

# in a directory with a Dockerfile (or pointing at a github repo)
bv publish .
# or:
bv publish github:user/repo@v1.0.0

# answer the prompts (name, version, description, typed I/O),
# bv builds the image, pushes to ghcr, and opens a PR to bv-registry.

manifest schema

A manifest lives at tools/<id>/<version>.toml in bv-registry. The full reference is in SCHEMA.md; below is the cheat sheet.

[tool]
id          = "colabfold"
version     = "1.6.0"
description = "ColabFold: fast protein structure prediction"
homepage    = "https://github.com/sokrypton/ColabFold"
license     = "MIT"
tier        = "core"        # core | community | experimental
maintainers = ["github:sokrypton"]

[tool.image]
backend   = "docker"
reference = "ghcr.io/sokrypton/colabfold:1.6.0-cuda12"
# digest is added automatically at lock time

[tool.hardware]
cpu_cores = 8
ram_gb    = 16.0
disk_gb   = 10.0

[tool.hardware.gpu]
required     = true
min_vram_gb  = 8
cuda_version = "12.0"

[[tool.inputs]]
name        = "fasta"
type        = "fasta[protein]"
cardinality = "one"
description = "Input protein sequences"

[[tool.outputs]]
name        = "output_dir"
type        = "dir"
cardinality = "one"
description = "Predicted structures and confidence scores"

[tool.entrypoint]
command       = "colabfold_batch"
args_template = "--num-recycle 3 {fasta} {output_dir}"

# Container paths the tool writes to. bv binds these to writable host
# dirs (critical on apptainer's read-only SIF root). Skip if the tool
# does not write inside the image.
cache_paths = ["/cache/colabfold"]

[tool.binaries]
exposed = ["colabfold_batch"]

Typed I/O matters. Inputs and outputs use the bv-types vocabulary (fasta, fasta[protein], blast_tab, pdb, etc.). Tools without typed I/O sit in the experimental tier and are hidden from default search results.

bv publish

bv publish handles fetching the source, generating a Dockerfile if you don't have one, building the image, pushing to GHCR, and opening the registry PR. It can run interactively or read a bv-publish.toml config.

bv publish ./my-tool                     # local directory, interactive
bv publish github:user/repo@v2.1.0       # github source, clones at the tag
bv publish . --non-interactive           # CI mode (reads bv-publish.toml)
bv publish . --no-push --no-pr           # dry run: build, print manifest, exit
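
The non-interactive path reads a bv-publish.toml, which this page never shows. A hypothetical sketch of what such a config might carry (every field name here is an illustrative guess mirroring the interactive prompts, not the authoritative schema):

```toml
# Hypothetical bv-publish.toml; field names are assumptions, not documented.
[publish]
tool_name   = "my-tool"
version     = "2.1.0"
description = "Short one-line description"
homepage    = "https://github.com/user/repo"
license     = "MIT"

[[publish.inputs]]
name = "fasta"
type = "fasta[protein]"

[[publish.outputs]]
name = "output_dir"
type = "dir"
```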

sources it accepts

A local directory (., ./my-tool) or a github:owner/repo@ref source; the ref is a tag or branch, and bv clones the repo at that point.

build systems it knows about

If your repo doesn't ship a Dockerfile, bv generates a Dockerfile.bv based on whichever of these it finds first (in order):

looks for                            base image                               generated build step
Dockerfile                           (used as-is)                             no generation
environment.yml / environment.yaml   mambaorg/micromamba:1.5                  micromamba install -y -n base -f env.yml
pyproject.toml with [build-system]   python:3.11-slim                         pip install --no-cache-dir .
requirements.txt                     python:3.11-slim                         pip install --no-cache-dir -r requirements.txt
Cargo.toml                           rust:1.75 → debian:bookworm-slim         multi-stage: cargo build --release, copy binaries to /usr/local/bin
Makefile                             debian:bookworm-slim + build-essential   make

If none match and there's no Dockerfile, bv publish fails with a clear error: add a Dockerfile or write a bv-publish.toml. The generated Dockerfile.bv is left in your working directory; commit it (or replace it with your own) before re-running.
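
As an illustration, for a repo that only ships a requirements.txt the generated Dockerfile.bv would look roughly like this (a sketch assembled from the table above, not verbatim bv output):

```dockerfile
# Roughly what Dockerfile.bv might contain for a requirements.txt project
# (illustrative; bv's actual generation may differ in details).
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```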

where it gets the manifest fields

manifest field                       source
id (tool name)                       directory name or repo name; overridable with --tool-name or the interactive prompt.
version                              --tool-version → git tag from @ref → git describe --tags → prompt.
description, homepage, license       prompts in interactive mode; from bv-publish.toml in non-interactive.
[[tool.inputs]] / [[tool.outputs]]   prompts (one I/O per add); types must be in the bv-types vocabulary.
[tool.hardware]                      prompts for cpu / ram / disk / GPU; sensible defaults filled in.
[tool.entrypoint]                    prompted; usually the binary name your Dockerfile installs.
tier                                 always starts as community. Promotion is a separate registry PR.
[tool.image].digest                  computed automatically from docker manifest inspect after the push.

what it pushes, and where

  1. image: built and pushed to ghcr.io/<your-github-username>/<tool>:<version> by default, so you don't need write access to any shared org; a normal GitHub token with the write:packages scope is enough. Override the namespace with --push-to <org> to push somewhere else, such as a lab-shared GHCR org. bv logs in to GHCR with your GHCR_TOKEN, falling back to GITHUB_TOKEN. Multi-arch builds use --platform.
  2. digest: resolved by docker manifest inspect immediately after the push and embedded in the manifest [tool.image].digest for reproducibility.
  3. registry PR: a new branch add-<tool>-<version> is opened against mlberkeley/bv-registry (override with --registry-repo owner/repo), adding tools/<tool>/<version>.toml. The PR body links back to your source URL (the local path or github.com/owner/repo).

Skip stages with flags: --no-push stops after building (manifest is printed; no GHCR write); --no-pr pushes the image but doesn't open the PR; passing both is a useful dry run.

For a release-on-tag GitHub Action, drop this into .github/workflows/bv-publish.yml:

on:
  release:
    types: [published]
jobs:
  publish:
    uses: mlberkeley/bv/.github/workflows/bv-publish.yml@main
    with:
      tool-name: my-tool
    secrets:
      GHCR_TOKEN:        ${{ secrets.GHCR_TOKEN }}
      BV_REGISTRY_TOKEN: ${{ secrets.BV_REGISTRY_TOKEN }}

conformance tests

By default, bv conformance <tool> pulls your image and smoke-tests every binary you declared in [tool.binaries]. For each one it tries --version, -version, --help, -h, -v, version in order, and considers the binary alive if any of them exits 0. Most tools need no extra config.
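
The probe loop can be sketched as follows (illustrative shell, not bv's actual source):

```shell
# A binary counts as alive if any probe arg exits 0; stderr/stdout are
# discarded so noisy help text doesn't clutter the report.
probe_binary() {
  bin=$1
  for arg in --version -version --help -h -v version; do
    if "$bin" "$arg" >/dev/null 2>&1; then
      echo "ok: $bin responds to $arg"
      return 0
    fi
  done
  echo "fail: $bin answered none of the probes"
  return 1
}

probe_binary true
```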

Run it locally before opening the PR:

bv conformance my-tool
bv conformance my-tool --backend apptainer

For unusual binaries, add a [tool.smoke] block:

[tool.smoke]
# Pin a specific probe arg for binaries that don't accept any of the defaults.
probes = { weird-tool = "--check", another = "" }   # "" runs the binary with no args

# Skip binaries with no safe non-destructive invocation (daemons, REPLs that
# wait on stdin forever). They still get shims; conformance just skips them.
skip = ["server-daemon"]

Conformance runs in CI on every registry PR. Today it's a smoke check only; running tools on canonical inputs and validating typed outputs is on the v2 roadmap.

tiers

tier           requirements
experimental   basic checks pass; may lack typed I/O. Hidden from default search.
community      typed I/O present, conformance passes, manifest valid. Default for new submissions.
core           actively maintained, recognized publisher, runs on docker and apptainer, conformance passes on both.

Promotion is a separate PR by a registry maintainer. See GOVERNANCE.md for the full criteria.

new versions

One file per version. Add tools/<id>/<newver>.toml; do not edit the old one. The website and bv search surface the latest available version per tool by default; users can request older versions explicitly with bv add <tool>@<ver>.

For maintainers

running bv-ingest, reviewing PRs, promoting tools

tl;dr

# nightly auto-ingestion (also runs from .github/workflows/)
bv-ingest run --limit 50

# review the staging PRs that need typed I/O
bv-ingest review --staging-dir ./bv-registry-staging

# promote a reviewed tool from staging to bv-registry
bv-ingest promote samtools 1.20 --staging-dir ./bv-registry-staging

the pipeline

bv-ingest turns Bioconda recipes into draft manifests so the registry stays current without manual scraping. Two repos cooperate: bv-registry-staging, which holds auto-generated drafts awaiting typed I/O review, and bv-registry, the published registry that users resolve against.

End-to-end:

  1. fetch recipes: clone bioconda-recipes, parse meta.yaml for build / test / run_exports, derive binary names.
  2. resolve images: query quay.io/biocontainers for the matching tag and digest. Skip tools without a published BioContainer.
  3. generate manifest: write tools/<id>/<version>.toml in staging with hardware defaults, exposed binaries, but no typed I/O.
  4. open PR: one PR per (tool, version) against bv-registry-staging.
  5. review: maintainer adds [[tool.inputs]] / [[tool.outputs]] using the bv-types vocabulary, and (if needed) a [tool.smoke] override.
  6. promote: bv-ingest promote opens a PR to bv-registry. CI runs conformance; merge ships it.

bv-ingest commands

command                                                   what it does
bv-ingest run [--dry-run] [--limit N] [--tool ID]         full pipeline. Default opens PRs against BV_STAGING_REPO. --dry-run prints what would happen.
bv-ingest review --staging-dir <path>                     list manifests still missing typed I/O, or with --show TOOL/VERSION, dump one for review.
bv-ingest promote <tool> <version> --staging-dir <path>   copy the reviewed manifest from staging to bv-registry and open a PR.
bv-ingest status --staging-dir <path>                     count of staged, reviewed, and promoted manifests.

Common env vars:

BV_STAGING_REPO   = "mlberkeley/bv-registry-staging"
BV_REGISTRY_REPO  = "mlberkeley/bv-registry"
BV_BIOCONDA_CACHE = "/var/tmp/bioconda-recipes"   # local clone, optional
GITHUB_TOKEN      = ...                            # falls back to `gh auth token`

reviewing PRs

Auto-generated PRs arrive labelled auto-ingest. The fast path:

  1. Check the upstream tool's docs to identify what file types its main entrypoint reads and writes.
  2. Add [[tool.inputs]] / [[tool.outputs]] blocks. If a needed type does not exist in bv-types, add it there first (separate PR).
  3. If the tool downloads model weights or writes large scratch state inside the image, add cache_paths = [...]. Look for clues in the upstream Dockerfile (WORKDIR, VOLUME) or run bv run <tool> on apptainer and watch for read-only-fs errors.
  4. Add a [tool.smoke] override only if any binary needs a non-default probe (the default loop covers --version, -version, --help, -h, -v, version).
  5. Run bv conformance <tool> locally on both backends.
  6. Approve the staging PR. Once merged, run bv-ingest promote.

Heuristic for typed I/O. Most bioconda tools take FASTA / FASTQ / BAM / VCF and emit one or two of the same. When in doubt, look at the test commands declared in the bioconda recipe; they reveal what the tool expects.
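
For example, a reviewed samtools-style staging manifest might gain blocks like these (the field layout follows the manifest schema shown earlier; the names and the bam type are illustrative assumptions about the bv-types vocabulary):

```toml
# Illustrative typed I/O added during review; adjust names/types per tool.
[[tool.inputs]]
name        = "alignment"
type        = "bam"
cardinality = "one"
description = "Input alignment to process"

[[tool.outputs]]
name        = "out_bam"
type        = "bam"
cardinality = "one"
description = "Processed alignment"
```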

promoting to core

A tool moves from community to core only when it clears the core-tier bar: actively maintained, a recognized publisher, and conformance passing on both docker and apptainer.

Open a separate PR labelled tier-promote that flips tier = "core". Two maintainer approvals merge it.