bv documentation
Three audiences, one page. Jump to whichever applies.
For end users
running tools in your project
tl;dr
# 1. install a runtime (pick one)
curl -fsSL https://get.docker.com/rootless | sh # docker, on a laptop
conda install -c conda-forge apptainer # apptainer, on HPC
# 2. install bv
curl -fsSL https://raw.githubusercontent.com/mlberkeley/bv/main/install.sh | sh
# 3. add a tool, call its binary, commit the lockfile
mkdir myproj && cd myproj
bv add blast
bv run blastn -version
git add bv.toml bv.lock && git commit -m "pin tools"
# 4. on any other machine: same images, by digest
bv sync
install
bv needs Docker or Apptainer/Singularity, plus git. No Python, no conda.
Docker is typical on a laptop. Use the rootless installer when you can; on a GPU box install nvidia-container-toolkit too.
Apptainer is typical on shared HPC nodes since it does not need a daemon or root. Install with conda: conda install -c conda-forge apptainer.
Then install bv itself:
curl -fsSL https://raw.githubusercontent.com/mlberkeley/bv/main/install.sh | sh
# or, with cargo:
cargo install biov
Verify with bv doctor, which prints the runtimes it can see and any missing pieces.
core commands
| command | what it does |
|---|---|
bv add <tool>[@ver] | resolve from the registry, pull the image, write bv.toml and bv.lock, generate shims |
bv run <binary> <args> | run a binary inside its container; current directory is mounted at /workspace |
bv exec <cmd> | run any command with the project's binaries on PATH (good for scripts, Make, Snakemake, CI) |
bv shell | interactive subshell with the project active |
bv sync | pull every image pinned in bv.lock by digest and regenerate shims |
bv list [--binaries] | show installed tools, or the binary routing table |
bv search <query> | search the registry by name, description, or I/O type |
bv show <tool> [--format json\|mcp\|json-schema] | typed I/O schema and metadata
bv lock [--check] | regenerate bv.lock; --check exits 1 if anything would change |
bv doctor | environment check (runtimes, GPU, project state) |
bv.toml & bv.lock
bv.toml is what you wrote (or what bv add wrote for you). bv.lock is the resolved, pinned state. Commit both. .bv/ (generated shim directory) is gitignored automatically.
# bv.toml
[project]
name = "myproj"
[[tools]]
id = "blast"
version = "=2.15.0"
[[tools]]
id = "hmmer"
[runtime]
backend = "auto" # docker | apptainer | auto (default)
If two tools expose the same binary name, bv lock fails with a clear error. Resolve it with [binary_overrides]:
[binary_overrides]
samtools = "samtools" # this tool wins for the `samtools` shim
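The collision rule is mechanical enough to sketch: build a binary-to-providers map, then let an override break ties. This is an illustration of the behavior described above, not bv's actual code; `route_binaries` and the tool/binary names in the usage note are made up.

```python
def route_binaries(tools, overrides=None):
    """Map each exposed binary to one providing tool.

    tools: {tool_id: [binary names]}; overrides: {binary: winning tool_id},
    i.e. the [binary_overrides] table. Raises on an unresolved collision,
    mirroring the `bv lock` failure described above.
    """
    overrides = overrides or {}
    providers = {}
    for tool, binaries in tools.items():
        for name in binaries:
            providers.setdefault(name, []).append(tool)
    table = {}
    for name, owners in sorted(providers.items()):
        if len(owners) == 1:
            table[name] = owners[0]
        elif overrides.get(name) in owners:
            table[name] = overrides[name]  # the override names the winner
        else:
            raise ValueError(
                f"binary {name!r} exposed by {owners}; add a [binary_overrides] entry")
    return table
```

With `{"a": ["x"], "b": ["x", "y"]}`, plain `route_binaries` raises on `x`; passing `{"x": "a"}` as overrides resolves it to `{"x": "a", "y": "b"}`.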
caches
Apptainer runs containers with a read-only root filesystem, so any tool that downloads model weights or scratches to disk inside the image will fail (think ColabFold writing to /cache/colabfold). bv binds writable host directories at the right paths automatically. The set of paths is resolved in three layers:
- Tool manifest (`cache_paths`): the tool author's authoritative list. ColabFold's manifest declares `cache_paths = ["/cache/colabfold"]`.
- Your `[[cache]]` entries in bv.toml: add new paths or redirect any existing path to a different host directory.
- Apptainer fallbacks: for tools that haven't declared cache paths yet, bv auto-binds `/cache` and `/root/.cache`.
Default host path: ~/.cache/bv/<tool>/<slug>. Docker skips the apptainer fallbacks because its writable upper layer already covers them; manifest and user entries apply on both backends.
# bv.toml : redirect colabfold weights to a shared NFS cache
[[cache]]
match = "colabfold"
container_path = "/cache/colabfold"
host_path = "/srv/shared/colabfold-weights"
# add an extra path for every tool
[[cache]]
match = "*"
container_path = "/tmp/scratch"
host_path = "~/.cache/bv/{tool}/scratch"
The {tool} token is replaced with the tool id; ~ expands to $HOME.
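Putting the three layers and the token expansion together, the resolution might look roughly like this. A sketch under assumptions: the layer precedence, the backends, and the `{tool}`/`~` expansion follow the text above, but the `<slug>` derivation and the fallback host paths are guesses, and `resolve_cache_binds` is not part of bv.

```python
import os

APPTAINER_FALLBACKS = ("/cache", "/root/.cache")

def expand(host_path, tool):
    """Expand the {tool} token, then a leading ~ (as the docs describe)."""
    return os.path.expanduser(host_path.replace("{tool}", tool))

def default_host(container_path):
    """Guessed default: ~/.cache/bv/<tool>/<slug>, slug derived from the path."""
    slug = container_path.strip("/").replace("/", "-") or "root"
    return "~/.cache/bv/{tool}/" + slug

def resolve_cache_binds(tool, manifest_paths, user_entries, backend):
    """Return {container_path: host_path}, later layers overriding earlier ones.

    manifest_paths: the tool manifest's cache_paths list.
    user_entries: [[cache]] entries as (match, container_path, host_path).
    """
    binds = {}
    if backend == "apptainer" and not manifest_paths:
        for p in APPTAINER_FALLBACKS:        # fallback layer, apptainer only
            binds[p] = expand(default_host(p), tool)
    for p in manifest_paths:                 # manifest layer
        binds[p] = expand(default_host(p), tool)
    for match, cpath, hpath in user_entries: # user layer wins
        if match in ("*", tool):
            binds[cpath] = expand(hpath, tool)
    return binds
```

With the colabfold redirect from the example above, the manifest's `/cache/colabfold` ends up bound to `/srv/shared/colabfold-weights`; a tool with no declared paths on apptainer gets only the two fallbacks.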
backends
bv auto-detects what runtime is available. Pin it explicitly with --backend, the BV_BACKEND env var, or [runtime] backend = "apptainer" in bv.toml.
| feature | docker | apptainer |
|---|---|---|
| root needed | daemon needs privileges (rootless mode available) | no |
| GPU flag | --gpus all | --nv |
| image cache | docker image store | SIF files in ~/.local/share/bv/sif |
| writable container FS | yes (upper layer) | no (use cache mounts) |
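Auto-detection presumably boils down to probing PATH for each runtime in preference order. A minimal sketch of the idea; the probe order and the exact precedence of the flag over `BV_BACKEND` are assumptions, not bv's documented internals:

```python
import os
import shutil

RUNTIMES = ("docker", "apptainer", "singularity")  # assumed preference order

def detect_backend(explicit=None):
    """Resolve the runtime: --backend flag > BV_BACKEND > first runtime on PATH."""
    choice = explicit or os.environ.get("BV_BACKEND", "auto")
    if choice != "auto":
        return choice
    for name in RUNTIMES:
        if shutil.which(name):
            return name
    raise RuntimeError("no container runtime on PATH; install docker or apptainer")
```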
reference data
Some tools need large reference databases (kraken2, blast pdbaa, etc.). The manifest declares them; bv add tells you what's needed; bv data fetch downloads.
bv add kraken2
bv data fetch pdbaa --yes
bv run kraken2 ... # bv mounts the data directory automatically
troubleshooting
- Image not available locally after `bv add`: run `bv sync` to pull by digest.
- Read-only filesystem errors from a tool on apptainer: the tool writes to a path bv hasn't bound. Add a `[[cache]]` entry, or open an issue against the registry asking the maintainer to add it to `cache_paths`.
- GPU not detected: check `nvidia-smi` first, then `bv doctor`. On Docker, you need `nvidia-container-toolkit`; on Apptainer, the host's NVIDIA libraries.
- Conflicting binary names across tools: `bv lock` tells you exactly which binaries collide. Use `[binary_overrides]`.
For publishers
adding your tool to the registry
tl;dr
# in a directory with a Dockerfile (or pointing at a github repo)
bv publish .
# or:
bv publish github:user/repo@v1.0.0
# answer the prompts (name, version, description, typed I/O),
# bv builds the image, pushes to ghcr, and opens a PR to bv-registry.
manifest schema
A manifest lives at tools/<id>/<version>.toml in bv-registry. The full reference is in SCHEMA.md; below is the cheat sheet.
[tool]
id = "colabfold"
version = "1.6.0"
description = "ColabFold: fast protein structure prediction"
homepage = "https://github.com/sokrypton/ColabFold"
license = "MIT"
tier = "core" # core | community | experimental
maintainers = ["github:sokrypton"]
[tool.image]
backend = "docker"
reference = "ghcr.io/sokrypton/colabfold:1.6.0-cuda12"
# digest is added automatically at lock time
[tool.hardware]
cpu_cores = 8
ram_gb = 16.0
disk_gb = 10.0
[tool.hardware.gpu]
required = true
min_vram_gb = 8
cuda_version = "12.0"
[[tool.inputs]]
name = "fasta"
type = "fasta[protein]"
cardinality = "one"
description = "Input protein sequences"
[[tool.outputs]]
name = "output_dir"
type = "dir"
cardinality = "one"
description = "Predicted structures and confidence scores"
[tool.entrypoint]
command = "colabfold_batch"
args_template = "--num-recycle 3 {fasta} {output_dir}"
# Container paths the tool writes to. bv binds these to writable host
# dirs (critical on apptainer's read-only SIF root). Skip if the tool
# does not write inside the image.
cache_paths = ["/cache/colabfold"]
[tool.binaries]
exposed = ["colabfold_batch"]
I/O types must come from the bv-types vocabulary (fasta, fasta[protein], blast_tab, pdb, etc.). Tools without typed I/O sit in the experimental tier and are hidden from default search results.
bv publish
bv publish handles fetching the source, generating a Dockerfile if you don't have one, building the image, pushing to GHCR, and opening the registry PR. It can run interactively or read a bv-publish.toml config.
bv publish ./my-tool # local directory, interactive
bv publish github:user/repo@v2.1.0 # github source, clones at the tag
bv publish . --non-interactive # CI mode (reads bv-publish.toml)
bv publish . --no-push --no-pr # dry run: build, print manifest, exit
sources it accepts
- local directory: `bv publish ./my-tool` uses the directory as-is. The directory name becomes the tool-name hint; `git describe --tags` in that directory becomes the version hint.
- github: `bv publish github:owner/repo[@ref]` shallow-clones the repo into a tempdir. `repo` becomes the tool-name hint. The version hint comes from the `@ref` (with the leading `v` stripped) or, if omitted, the latest tag in the clone.
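The version-hint rule is small enough to spell out as code (a sketch; `version_hint` is a made-up helper, and it assumes the tag list is sorted oldest-first and that the latest-tag fallback also gets its leading `v` stripped):

```python
def version_hint(ref=None, tags=()):
    """@ref with a leading 'v' stripped; else the newest tag; else nothing."""
    candidate = ref or (tags[-1] if tags else None)  # tags assumed oldest-first
    if candidate is None:
        return None
    return candidate[1:] if candidate.startswith("v") else candidate
```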
build systems it knows about
If your repo doesn't ship a Dockerfile, bv generates a Dockerfile.bv based on whichever of these it finds first (in order):
| looks for | base image | generated build step |
|---|---|---|
Dockerfile | (used as-is) | no generation |
environment.yml / environment.yaml | mambaorg/micromamba:1.5 | micromamba install -y -n base -f env.yml |
pyproject.toml with [build-system] | python:3.11-slim | pip install --no-cache-dir . |
requirements.txt | python:3.11-slim | pip install --no-cache-dir -r requirements.txt |
Cargo.toml | rust:1.75 → debian:bookworm-slim | multi-stage: cargo build --release, copy binaries to /usr/local/bin |
Makefile | debian:bookworm-slim + build-essential | make |
If none match and there's no Dockerfile, bv publish fails with a clear error: add a Dockerfile or write a bv-publish.toml. The generated Dockerfile.bv is left in your working directory; commit it (or replace it with your own) before re-running.
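The first-match walk down that table can be sketched directly. Marker files and base images are copied from the table above; the `[build-system]` check inside pyproject.toml is elided for brevity, and `detect_build` is illustrative, not bv's actual code:

```python
from pathlib import Path

# (marker file, base image) in detection order; first hit wins.
DETECTORS = (
    ("Dockerfile", None),  # used as-is, nothing generated
    ("environment.yml", "mambaorg/micromamba:1.5"),
    ("environment.yaml", "mambaorg/micromamba:1.5"),
    ("pyproject.toml", "python:3.11-slim"),
    ("requirements.txt", "python:3.11-slim"),
    ("Cargo.toml", "rust:1.75"),
    ("Makefile", "debian:bookworm-slim"),
)

def detect_build(src):
    """Return (marker, base_image) for the first match, or fail like bv publish."""
    src = Path(src)
    for marker, base in DETECTORS:
        if (src / marker).is_file():
            return marker, base
    raise FileNotFoundError(
        "no Dockerfile and no recognized build system; "
        "add a Dockerfile or write a bv-publish.toml")
```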
where it gets the manifest fields
| manifest field | source |
|---|---|
id (tool name) | directory name or repo name; overridable with --tool-name or the interactive prompt. |
version | --tool-version → git tag from @ref → git describe --tags → prompt. |
description, homepage, license | prompts in interactive mode; from bv-publish.toml in non-interactive. |
[[tool.inputs]] / [[tool.outputs]] | prompts (one I/O per add); types must be in the bv-types vocabulary. |
[tool.hardware] | prompts for cpu / ram / disk / GPU; sensible defaults filled in. |
[tool.entrypoint] | prompted; usually the binary name your Dockerfile installs. |
tier | always starts as community. Promotion is a separate registry PR. |
[tool.image].digest | computed automatically from docker manifest inspect after the push. |
what it pushes, and where
- image: built and pushed to `ghcr.io/<your-github-username>/<tool>:<version>` by default, so you don't need write access to any shared org: a normal GitHub token (with `write:packages` scope) is enough. Override the namespace with `--push-to <org>` if you want to push somewhere else, like a lab-shared GHCR org. bv logs in to GHCR with your `GHCR_TOKEN`, falling back to `GITHUB_TOKEN`. Multi-arch builds use `--platform`.
- digest: resolved by `docker manifest inspect` immediately after the push and embedded in the manifest's `[tool.image].digest` field for reproducibility.
- registry PR: a new branch `add-<tool>-<version>` is opened against `mlberkeley/bv-registry` (override with `--registry-repo owner/repo`), adding `tools/<tool>/<version>.toml`. The PR body links back to your source URL (the local path or `github.com/owner/repo`).
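For context on the digest step: an OCI/Docker image digest is just `sha256:` over the raw manifest bytes the registry serves, which is why pinning by digest survives tag rewrites. A sketch of the digest format and of rewriting a tag reference into a pinned one (helper names are made up; bv itself shells out to `docker manifest inspect` rather than hashing locally):

```python
import hashlib

def manifest_digest(manifest_bytes):
    """An OCI/Docker digest is 'sha256:' + the hash of the raw manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()

def pin(reference, digest):
    """Rewrite a tag reference into a digest-pinned one for the lockfile."""
    last = reference.split("/")[-1]               # only strip a tag, not a port
    repo = reference.rsplit(":", 1)[0] if ":" in last else reference
    return f"{repo}@{digest}"
```

For example, `pin("ghcr.io/sokrypton/colabfold:1.6.0-cuda12", d)` yields `ghcr.io/sokrypton/colabfold@sha256:...`, the form `bv sync` pulls.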
Skip stages with flags: --no-push stops after building (manifest is printed; no GHCR write); --no-pr pushes the image but doesn't open the PR; passing both is a useful dry run.
For a release-on-tag GitHub Action, drop this into .github/workflows/bv-publish.yml:
on:
release:
types: [published]
jobs:
publish:
uses: mlberkeley/bv/.github/workflows/bv-publish.yml@main
with:
tool-name: my-tool
secrets:
GHCR_TOKEN: ${{ secrets.GHCR_TOKEN }}
BV_REGISTRY_TOKEN: ${{ secrets.BV_REGISTRY_TOKEN }}
conformance tests
By default, bv conformance <tool> pulls your image and smoke-tests every binary you declared in [tool.binaries]. For each one it tries --version, -version, --help, -h, -v, version in order, and considers the binary alive if any of them exits 0. Most tools need no extra config.
Run it locally before opening the PR:
bv conformance my-tool
bv conformance my-tool --backend apptainer
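When a binary fails conformance, it can be handy to replay the probe loop on the host before involving containers at all. A host-side sketch of the loop described above (bv runs the equivalent inside the container; `smoke` is not a bv command):

```python
import subprocess

DEFAULT_PROBES = ("--version", "-version", "--help", "-h", "-v", "version")

def smoke(binary, probe=None, timeout=30):
    """Return True if any probe invocation of `binary` exits 0."""
    probes = DEFAULT_PROBES if probe is None else (probe,)
    for arg in probes:
        argv = [binary] + ([arg] if arg else [])  # probe "" runs it with no args
        try:
            done = subprocess.run(argv, capture_output=True, timeout=timeout)
        except (OSError, subprocess.TimeoutExpired):
            continue  # missing binary or hang: try the next probe
        if done.returncode == 0:
            return True
    return False
```

`smoke("colabfold_batch")` mirrors the default loop; `smoke("weird-tool", probe="--check")` mirrors a `[tool.smoke]` pinned probe.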
For unusual binaries, add a [tool.smoke] block:
[tool.smoke]
# Pin a specific probe arg for binaries that don't accept any of the defaults.
probes = { weird-tool = "--check", another = "" } # "" runs the binary with no args
# Skip binaries with no safe non-destructive invocation (daemons, REPLs that
# wait on stdin forever). They still get shims; conformance just skips them.
skip = ["server-daemon"]
Conformance runs in CI on every registry PR. Today it's a smoke check only; running tools on canonical inputs and validating typed outputs is on the v2 roadmap.
tiers
| tier | requirements |
|---|---|
experimental | basic checks pass; may lack typed I/O. Hidden from default search. |
community | typed I/O present, conformance passes, manifest valid. Default for new submissions. |
core | actively maintained, recognized publisher, runs on docker and apptainer, conformance passes on both. |
Promotion is a separate PR by a registry maintainer. See GOVERNANCE.md for the full criteria.
new versions
One file per version. Add tools/<id>/<newver>.toml; do not edit the old one. The website and bv search surface the latest available version per tool by default; users can request older versions explicitly with bv add <tool>@<ver>.
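"Latest available version" over a one-file-per-version layout is just a version sort on the filenames. A sketch with a naive dotted-integer comparison; real-world version schemes (pre-releases, epochs) need more care than this:

```python
import re

def latest(versions):
    """Newest version string by dotted-integer comparison, e.g. 2.15.0 > 2.9.1."""
    return max(versions, key=lambda v: [int(n) for n in re.findall(r"\d+", v)])
```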
For maintainers
running bv-ingest, reviewing PRs, promoting tools
tl;dr
# nightly auto-ingestion (also runs from .github/workflows/)
bv-ingest run --limit 50
# review the staging PRs that need typed I/O
bv-ingest review --staging-dir ./bv-registry-staging
# promote a reviewed tool from staging to bv-registry
bv-ingest promote samtools 1.20 --staging-dir ./bv-registry-staging
the pipeline
bv-ingest turns Bioconda recipes into draft manifests so the registry stays current without manual scraping. Two repos cooperate:
- bv-registry-staging: auto-generated drafts. Manifests start in community tier without typed I/O. PRs land here automatically.
- bv-registry: the live registry users pull from. Tools graduate here only after a maintainer reviews and promotes them.
End-to-end:
- fetch recipes: clone bioconda-recipes, parse `meta.yaml` for build / test / run_exports, derive binary names.
- resolve images: query quay.io/biocontainers for the matching tag and digest. Skip tools without a published BioContainer.
- generate manifest: write `tools/<id>/<version>.toml` in staging with hardware defaults and exposed binaries, but no typed I/O.
- open PR: one PR per (tool, version) against bv-registry-staging.
- review: a maintainer adds `[[tool.inputs]]` / `[[tool.outputs]]` using the bv-types vocabulary, and (if needed) a `[tool.smoke]` override.
- promote: `bv-ingest promote` opens a PR to bv-registry. CI runs conformance; merge ships it.
bv-ingest commands
| command | what it does |
|---|---|
bv-ingest run [--dry-run] [--limit N] [--tool ID] | full pipeline. Default opens PRs against BV_STAGING_REPO. --dry-run prints what would happen. |
bv-ingest review --staging-dir <path> | list manifests still missing typed I/O, or with --show TOOL/VERSION, dump one for review. |
bv-ingest promote <tool> <version> --staging-dir <path> | copy the reviewed manifest from staging to bv-registry and open a PR. |
bv-ingest status --staging-dir <path> | count of staged, reviewed, and promoted manifests. |
Common env vars:
BV_STAGING_REPO = "mlberkeley/bv-registry-staging"
BV_REGISTRY_REPO = "mlberkeley/bv-registry"
BV_BIOCONDA_CACHE = "/var/tmp/bioconda-recipes" # local clone, optional
GITHUB_TOKEN = ... # falls back to `gh auth token`
reviewing PRs
Auto-generated PRs come in tagged auto-ingest. The fast path:
- Check the upstream tool's docs to identify what file types its main entrypoint reads and writes.
- Add `[[tool.inputs]]` / `[[tool.outputs]]` blocks. If a needed type does not exist in bv-types, add it there first (separate PR).
- If the tool downloads model weights or writes large scratch state inside the image, add `cache_paths = [...]`. Look for clues in the upstream Dockerfile (`WORKDIR`, `VOLUME`) or run `bv run <tool>` on apptainer and watch for read-only-fs errors.
- Add a `[tool.smoke]` override only if any binary needs a non-default probe (the default loop covers `--version`, `-version`, `--help`, `-h`, `-v`, `version`).
- Run `bv conformance <tool>` locally on both backends.
- Approve the staging PR. Once merged, run `bv-ingest promote`.
promoting to core
A tool moves from community to core only when:
- typed I/O is complete and uses canonical types (no ad-hoc `string` placeholders);
- the maintainer is a recognized publisher (project author, lab, or accepted volunteer maintainer);
- conformance passes on both docker and apptainer in CI;
- the tool has had at least one published version active for 30 days without an unfixed issue tagged `broken`.
Open a separate PR labelled tier-promote that flips tier = "core". Two maintainer approvals merge it.