bv docs

how to use, publish, and maintain

Three audiences, one page. Jump to whichever applies.

For end users

running tools in your project

tl;dr

# 1. install a runtime (pick one)
curl -fsSL https://get.docker.com/rootless | sh        # docker, on a laptop
conda install -c conda-forge apptainer                 # apptainer, on HPC

# 2. install bv
curl -fsSL https://raw.githubusercontent.com/mlberkeley/bv/main/install.sh | sh

# 3. add a tool, call its binary, commit the lockfile
mkdir myproj && cd myproj
bv add blast
bv run blastn -version
git add bv.toml bv.lock && git commit -m "pin tools"

# 4. on any other machine: same images, by digest
bv sync

install

bv needs Docker or Apptainer/Singularity, plus git. No Python, no conda.

Docker is typical on a laptop. Use the rootless installer when you can; on a GPU box install nvidia-container-toolkit too.

Apptainer is typical on shared HPC nodes since it does not need a daemon or root. Install with conda: conda install -c conda-forge apptainer.

Then install bv itself:

curl -fsSL https://raw.githubusercontent.com/mlberkeley/bv/main/install.sh | sh
# or, with cargo:
cargo install biov

Verify with bv doctor, which prints the runtimes it can see and any missing pieces.

core commands

command                                          what it does
bv add <tool>[@ver]                              resolve from the registry, pull the image, write bv.toml and bv.lock, generate shims
bv run <binary> <args>                           run a binary inside its container; current directory is mounted at /workspace
bv exec <cmd>                                    run any command with the project's binaries on PATH (good for scripts, Make, Snakemake, CI)
bv shell                                         interactive subshell with the project active
bv sync                                          pull every image pinned in bv.lock by digest and regenerate shims
bv list [--binaries]                             show installed tools, or the binary routing table
bv search <query>                                search the registry by name, description, or I/O type
bv show <tool> [--format json|mcp|json-schema]   typed I/O schema and metadata
bv lock [--check]                                regenerate bv.lock; --check exits 1 if anything would change
bv doctor                                        environment check (runtimes, GPU, project state)

bv.toml & bv.lock

bv.toml is what you wrote (or what bv add wrote for you). bv.lock is the resolved, pinned state. Commit both. .bv/ (generated shim directory) is gitignored automatically.

# bv.toml
[project]
name = "myproj"

[[tools]]
id = "blast"
version = "=2.15.0"

[[tools]]
id = "hmmer"

[runtime]
backend = "auto"      # docker | apptainer | auto (default)
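
Since both files are committed, a CI job can catch lockfile drift. A hypothetical GitHub Actions step (the step name and layout are illustrative; the only documented behavior used is bv lock --check exiting 1 when anything would change):

```yaml
# Hypothetical CI guard; assumes bv is already installed on the runner.
- name: Verify bv.lock is current
  run: bv lock --check
```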

If two tools expose the same binary name, bv lock fails with a clear error. Resolve it with [binary_overrides]:

[binary_overrides]
samtools = "samtools"   # this tool wins for the `samtools` shim

caches

Apptainer runs containers with a read-only root filesystem, so any tool that downloads model weights or writes scratch data inside the image will fail (think ColabFold writing to /cache/colabfold). bv binds writable host directories at the right paths automatically. The set of paths is resolved in three layers:

  1. Tool manifest (cache_paths) : the tool author's authoritative list. ColabFold's manifest declares cache_paths = ["/cache/colabfold"].
  2. Your [[cache]] entries in bv.toml : add new paths or redirect any existing path to a different host directory.
  3. Apptainer fallbacks : for tools that haven't declared cache paths yet, bv auto-binds /cache and /root/.cache.

Default host path: ~/.cache/bv/<tool>/<slug>. Docker skips the apptainer fallbacks because its writable upper layer already covers them; manifest and user entries apply on both backends.

# bv.toml : redirect colabfold weights to a shared NFS cache
[[cache]]
match = "colabfold"
container_path = "/cache/colabfold"
host_path = "/srv/shared/colabfold-weights"

# add an extra path for every tool
[[cache]]
match = "*"
container_path = "/tmp/scratch"
host_path = "~/.cache/bv/{tool}/scratch"

The {tool} token is replaced with the tool id; ~ expands to $HOME.
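
That expansion can be sketched in plain shell (an illustration of the two substitution rules above, not bv's actual implementation):

```shell
# Expand a host-path template the way bv's cache config describes:
# replace the {tool} token with the tool id, then a leading ~ with $HOME.
tool="colabfold"
template='~/.cache/bv/{tool}/scratch'
path=${template/\{tool\}/$tool}    # -> ~/.cache/bv/colabfold/scratch
path=${path/#\~/$HOME}             # -> $HOME/.cache/bv/colabfold/scratch
echo "$path"
```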

backends

bv auto-detects what runtime is available. Pin it explicitly with --backend, the BV_BACKEND env var, or [runtime] backend = "apptainer" in bv.toml.

feature                 docker                                              apptainer
root needed             daemon needs privileges (rootless mode available)   no
GPU flag                --gpus all                                          --nv
image cache             docker image store                                  SIF files in ~/.local/share/bv/sif
writable container FS   yes (upper layer)                                   no (use cache mounts)

reference data

Some tools need large reference databases (kraken2, blast pdbaa, etc.). The manifest declares them; bv add tells you what's needed; bv data fetch downloads.

bv add kraken2
bv data fetch pdbaa --yes
bv run kraken2 ...      # bv mounts the data directory automatically

troubleshooting

For publishers

adding your tool to the registry

tl;dr

# in a directory with a Dockerfile (or pointing at a github repo)
bv publish .
# or:
bv publish github:user/repo@v1.0.0

# answer the prompts (name, version, description, typed I/O),
# bv builds the image, pushes to ghcr, and opens a PR to bv-registry.

manifest schema

A manifest lives at tools/<id>/<version>.toml in bv-registry. The full reference is in SCHEMA.md; below is the cheat sheet.

[tool]
id          = "colabfold"
version     = "1.6.0"
description = "ColabFold: fast protein structure prediction"
homepage    = "https://github.com/sokrypton/ColabFold"
license     = "MIT"
tier        = "core"        # core | community | experimental
maintainers = ["github:sokrypton"]

[tool.image]
backend   = "docker"
reference = "ghcr.io/sokrypton/colabfold:1.6.0-cuda12"
# digest is added automatically at lock time

[tool.hardware]
cpu_cores = 8
ram_gb    = 16.0
disk_gb   = 10.0

[tool.hardware.gpu]
required     = true
min_vram_gb  = 8
cuda_version = "12.0"

[[tool.inputs]]
name        = "fasta"
type        = "fasta[protein]"
cardinality = "one"
description = "Input protein sequences"

[[tool.outputs]]
name        = "output_dir"
type        = "dir"
cardinality = "one"
description = "Predicted structures and confidence scores"

[tool.entrypoint]
command       = "colabfold_batch"
args_template = "--num-recycle 3 {fasta} {output_dir}"

# Container paths the tool writes to. bv binds these to writable host
# dirs (critical on apptainer's read-only SIF root). Skip if the tool
# does not write inside the image.
cache_paths = ["/cache/colabfold"]

[tool.binaries]
exposed = ["colabfold_batch"]

Typed I/O matters. Inputs and outputs use the bv-types vocabulary (fasta, fasta[protein], blast_tab, pdb, etc.). Tools without typed I/O sit in the experimental tier and are hidden from default search results.

bv publish

bv publish handles fetching the source, generating a Dockerfile if you don't have one, building the image, pushing to GHCR, and opening the registry PR. It can run interactively or read a bv-publish.toml config.

bv publish ./my-tool                     # local directory, interactive
bv publish github:user/repo@v2.1.0       # github source, clones at the tag
bv publish . --non-interactive           # CI mode (reads bv-publish.toml)
bv publish . --no-push --no-pr           # dry run: build, print manifest, exit
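
The non-interactive path reads a bv-publish.toml, which this page never shows. A hypothetical sketch of what such a config might carry (every field name here is an illustrative guess mirroring the interactive prompts, not the authoritative schema):

```toml
# Hypothetical bv-publish.toml; field names are assumptions, not documented.
[publish]
tool_name   = "my-tool"
version     = "2.1.0"
description = "Short one-line description"
homepage    = "https://github.com/user/repo"
license     = "MIT"

[[publish.inputs]]
name = "fasta"
type = "fasta[protein]"

[[publish.outputs]]
name = "output_dir"
type = "dir"
```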

sources it accepts

A local directory (., ./my-tool) or a github:owner/repo@ref source; the ref is a tag or branch, and bv clones the repo at that point.

build systems it knows about

If your repo doesn't ship a Dockerfile, bv generates a Dockerfile.bv based on whichever of these it finds first (in order):

looks for                            base image                               generated build step
Dockerfile                           (used as-is)                             no generation
environment.yml / environment.yaml   mambaorg/micromamba:1.5                  micromamba install -y -n base -f env.yml
pyproject.toml with [build-system]   python:3.11-slim                         pip install --no-cache-dir .
requirements.txt                     python:3.11-slim                         pip install --no-cache-dir -r requirements.txt
Cargo.toml                           rust:1.75 → debian:bookworm-slim         multi-stage: cargo build --release, copy binaries to /usr/local/bin
Makefile                             debian:bookworm-slim + build-essential   make

If none match and there's no Dockerfile, bv publish fails with a clear error: add a Dockerfile or write a bv-publish.toml. The generated Dockerfile.bv is left in your working directory; commit it (or replace it with your own) before re-running.
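
As an illustration, for a repo that only ships a requirements.txt the generated Dockerfile.bv would look roughly like this (a sketch assembled from the table above, not verbatim bv output):

```dockerfile
# Roughly what Dockerfile.bv might contain for a requirements.txt project
# (illustrative; bv's actual generation may differ in details).
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
```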

where it gets the manifest fields

manifest field                       source
id (tool name)                       directory name or repo name; overridable with --tool-name or the interactive prompt.
version                              --tool-version → git tag from @ref → git describe --tags → prompt.
description, homepage, license       prompts in interactive mode; from bv-publish.toml in non-interactive.
[[tool.inputs]] / [[tool.outputs]]   prompts (one I/O per add); types must be in the bv-types vocabulary.
[tool.hardware]                      prompts for cpu / ram / disk / GPU; sensible defaults filled in.
[tool.entrypoint]                    prompted; usually the binary name your Dockerfile installs.
tier                                 always starts as community. Promotion is a separate registry PR.
[tool.image].digest                  computed automatically from docker manifest inspect after the push.

what it pushes, and where

  1. image: built and pushed to ghcr.io/<your-github-username>/<tool>:<version> by default, so you don't need write access to any shared org; a normal GitHub token with the write:packages scope is enough. Override the namespace with --push-to <org> to push somewhere else, such as a lab-shared GHCR org. bv logs in to GHCR with your GHCR_TOKEN, falling back to GITHUB_TOKEN. Multi-arch builds use --platform.
  2. digest: resolved by docker manifest inspect immediately after the push and embedded in the manifest [tool.image].digest for reproducibility.
  3. registry PR: a new branch add-<tool>-<version> is opened against mlberkeley/bv-registry (override with --registry-repo owner/repo), adding tools/<tool>/<version>.toml. The PR body links back to your source URL (the local path or github.com/owner/repo).

Skip stages with flags: --no-push stops after building (manifest is printed; no GHCR write); --no-pr pushes the image but doesn't open the PR; passing both is a useful dry run.

For a release-on-tag GitHub Action, drop this into .github/workflows/bv-publish.yml:

on:
  release:
    types: [published]
jobs:
  publish:
    uses: mlberkeley/bv/.github/workflows/bv-publish.yml@main
    with:
      tool-name: my-tool
    secrets:
      GHCR_TOKEN:        ${{ secrets.GHCR_TOKEN }}
      BV_REGISTRY_TOKEN: ${{ secrets.BV_REGISTRY_TOKEN }}

conformance tests

By default, bv conformance <tool> pulls your image and smoke-tests every binary you declared in [tool.binaries]. For each one it tries --version, -version, --help, -h, -v, version in order, and considers the binary alive if any of them exits 0. Most tools need no extra config.
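
The probe loop can be sketched as follows (illustrative shell, not bv's actual source):

```shell
# A binary counts as alive if any probe arg exits 0; stderr/stdout are
# discarded so noisy help text doesn't clutter the report.
probe_binary() {
  bin=$1
  for arg in --version -version --help -h -v version; do
    if "$bin" "$arg" >/dev/null 2>&1; then
      echo "ok: $bin responds to $arg"
      return 0
    fi
  done
  echo "fail: $bin answered none of the probes"
  return 1
}

probe_binary true
```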

Run it locally before opening the PR:

bv conformance my-tool
bv conformance my-tool --backend apptainer

For unusual binaries, add a [tool.smoke] block:

[tool.smoke]
# Pin a specific probe arg for binaries that don't accept any of the defaults.
probes = { weird-tool = "--check", another = "" }   # "" runs the binary with no args

# Skip binaries with no safe non-destructive invocation (daemons, REPLs that
# wait on stdin forever). They still get shims; conformance just skips them.
skip = ["server-daemon"]

Conformance runs in CI on every registry PR. Today it's a smoke check only; running tools on canonical inputs and validating typed outputs is on the v2 roadmap.

tiers

tier           requirements
experimental   basic checks pass; may lack typed I/O. Hidden from default search.
community      typed I/O present, conformance passes, manifest valid. Default for new submissions.
core           actively maintained, recognized publisher, runs on docker and apptainer, conformance passes on both.

Promotion is a separate PR by a registry maintainer. See GOVERNANCE.md for the full criteria.

new versions

One file per version. Add tools/<id>/<newver>.toml; do not edit the old one. The website and bv search surface the latest available version per tool by default; users can request older versions explicitly with bv add <tool>@<ver>.

For maintainers

running bv-ingest, reviewing PRs, promoting tools

tl;dr

# nightly auto-ingestion (also runs from .github/workflows/)
bv-ingest run --limit 50

# review the staging PRs that need typed I/O
bv-ingest review --staging-dir ./bv-registry-staging

# promote a reviewed tool from staging to bv-registry
bv-ingest promote samtools 1.20 --staging-dir ./bv-registry-staging

the pipeline

bv-ingest turns Bioconda recipes into draft manifests so the registry stays current without manual scraping. Two repos cooperate: bv-registry-staging, which holds auto-generated drafts awaiting typed I/O review, and bv-registry, the published registry that users resolve against.

End-to-end:

  1. fetch recipes: clone bioconda-recipes, parse meta.yaml for build / test / run_exports, derive binary names.
  2. resolve images: query quay.io/biocontainers for the matching tag and digest. Skip tools without a published BioContainer.
  3. generate manifest: write tools/<id>/<version>.toml in staging with hardware defaults, exposed binaries, but no typed I/O.
  4. open PR: one PR per (tool, version) against bv-registry-staging.
  5. review: maintainer adds [[tool.inputs]] / [[tool.outputs]] using the bv-types vocabulary, and (if needed) a [tool.smoke] override.
  6. promote: bv-ingest promote opens a PR to bv-registry. CI runs conformance; merge ships it.

bv-ingest commands

command                                                   what it does
bv-ingest run [--dry-run] [--limit N] [--tool ID]         full pipeline. Default opens PRs against BV_STAGING_REPO. --dry-run prints what would happen.
bv-ingest review --staging-dir <path>                     list manifests still missing typed I/O, or with --show TOOL/VERSION, dump one for review.
bv-ingest promote <tool> <version> --staging-dir <path>   copy the reviewed manifest from staging to bv-registry and open a PR.
bv-ingest status --staging-dir <path>                     count of staged, reviewed, and promoted manifests.

Common env vars:

BV_STAGING_REPO   = "mlberkeley/bv-registry-staging"
BV_REGISTRY_REPO  = "mlberkeley/bv-registry"
BV_BIOCONDA_CACHE = "/var/tmp/bioconda-recipes"   # local clone, optional
GITHUB_TOKEN      = ...                            # falls back to `gh auth token`

reviewing PRs

Auto-generated PRs arrive labelled auto-ingest. The fast path:

  1. Check the upstream tool's docs to identify what file types its main entrypoint reads and writes.
  2. Add [[tool.inputs]] / [[tool.outputs]] blocks. If a needed type does not exist in bv-types, add it there first (separate PR).
  3. If the tool downloads model weights or writes large scratch state inside the image, add cache_paths = [...]. Look for clues in the upstream Dockerfile (WORKDIR, VOLUME) or run bv run <tool> on apptainer and watch for read-only-fs errors.
  4. Add a [tool.smoke] override only if any binary needs a non-default probe (the default loop covers --version, -version, --help, -h, -v, version).
  5. Run bv conformance <tool> locally on both backends.
  6. Approve the staging PR. Once merged, run bv-ingest promote.

Heuristic for typed I/O. Most bioconda tools take FASTA / FASTQ / BAM / VCF and emit one or two of the same. When in doubt, look at the test commands declared in the bioconda recipe; they reveal what the tool expects.
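
For example, a reviewed samtools-style staging manifest might gain blocks like these (the field layout follows the manifest schema shown earlier; the names and the bam type are illustrative assumptions about the bv-types vocabulary):

```toml
# Illustrative typed I/O added during review; adjust names/types per tool.
[[tool.inputs]]
name        = "alignment"
type        = "bam"
cardinality = "one"
description = "Input alignment to process"

[[tool.outputs]]
name        = "out_bam"
type        = "bam"
cardinality = "one"
description = "Processed alignment"
```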

promoting to core

A tool moves from community to core only when it clears the core-tier bar: actively maintained, a recognized publisher, and conformance passing on both docker and apptainer.

Open a separate PR labelled tier-promote that flips tier = "core". Two maintainer approvals merge it.