Deploying a new branchwater instance

Deploying a new branchwater instance involves bringing up three components:

branchwater-web, the web frontend at https://branchwater.sourmash.bio
branchwater-server, the backend serving the RocksDB inverted index for sourmash signatures
a duckdb database for the SRA metadata used for branchwater-web results

A diagram of how these components are connected:

graph LR;
classDef server fill:#4A902A,stroke:#333,stroke-width:4px,color:#fff;
classDef web fill:#dc6c11,stroke:#333,stroke-width:4px,color:#fff;
classDef index fill:#3c48cc,stroke:#333,stroke-width:4px,color:#fff;
classDef duckdb fill:#6980e9,stroke:#333,stroke-width:4px,color:#fff;
classDef client fill:#8450e1,stroke:#333,stroke-width:4px,color:#fff;

A01(browser):::client --> B01(web):::web
B01 --> C01(server):::server
B01 --> D01[(duckdb)]:::duckdb
C01 --> E01[(index)]:::index

For both development and production use, the repo provides a docker-compose configuration that brings up these components in the appropriate order.

Quickstart: branchwater with a demo dataset

Clone the repo

git clone https://github.com/sourmash-bio/branchwater
cd branchwater

Set up dependencies

We use pixi to manage dependencies and run tasks for branchwater development. You can install it with

curl -fsSL https://pixi.sh/install.sh | bash

or follow the up-to-date instructions on the pixi website.

To deploy a complete development or production environment, we provide a docker-compose.yml configuration describing the containers and how they connect together. Using this configuration requires either docker compose or podman-compose. There are many ways to install them; on macOS and Windows, the “Desktop” applications offer a complete solution (GUI, service management, networking configuration) that makes it easy to get started.

Note

You only need one of docker or podman; there is no need to install both!
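Since only one is needed, a quick way to see which compose tool is already on your PATH is a small shell function like the one below. detect_compose is a hypothetical helper, not something shipped in the branchwater repo; it prefers the docker compose plugin and falls back to podman-compose.

```shell
# Hypothetical helper (not part of the branchwater repo): report which
# compose tool is available, preferring the docker compose plugin.
detect_compose() {
  if command -v docker >/dev/null 2>&1 && docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v podman-compose >/dev/null 2>&1; then
    echo "podman-compose"
  else
    echo "neither docker compose nor podman-compose found" >&2
    return 1
  fi
}
```

Running detect_compose prints either docker compose or podman-compose; if it fails, install one of the Desktop applications described below.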

Docker Desktop

We recommend Rancher Desktop for development with docker compose. Follow the instructions on its website to set it up for your operating system.

Podman Desktop

Podman Desktop is the “Desktop” equivalent for Podman. Follow the instructions on its website to set it up for your operating system.

pixi tasks run with docker compose by default. If you’re using podman, you need to update the tasks to use

"podman-compose"

instead of

"docker compose"

especially in the deploy and metadata tasks. Edit the pixi.toml file and replace entries accordingly.
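The replacement can also be done from the command line. This is a minimal sketch assuming the tasks in pixi.toml invoke the literal string docker compose; use_podman_compose is a hypothetical helper name, and it rewrites every occurrence in the file (including any in comments), so review the result afterwards.

```shell
# Hypothetical helper: replace every "docker compose" invocation in a pixi
# manifest with "podman-compose", keeping the original as <file>.bak.
# "-i.bak" (suffix attached to -i) is accepted by both GNU sed and BSD/macOS sed.
use_podman_compose() {
  manifest="${1:-pixi.toml}"
  [ -f "$manifest" ] || { echo "no such file: $manifest" >&2; return 1; }
  sed -i.bak 's/docker compose/podman-compose/g' "$manifest"
}
```

For example, run use_podman_compose pixi.toml from the repository root, then inspect the changes with diff pixi.toml.bak pixi.toml.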

The demo dataset

The demo dataset included in the repo has the following SRA accessions:

ERR272375, a salt marsh metagenome
SRR5439749, a human gut metagenome
SRR20285055, an air metagenome
SRR24480609, a gut metagenome
ERR3220185, a bovine gut metagenome
SRR6269135, a marine metagenome
SRR25653600, a phage metagenome
SRR25021205, a soil metagenome
SRR25646998, a drinking water metagenome
SRR25611550, a food production metagenome
SRR7698815, a plant metagenome
SRR2243572, a wastewater metagenome

They are listed in this file:

cat experiments/inputs/demo_sraids

These accessions were chosen only so that the web frontend shows some results; you can edit the file to use other SRA accessions.
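If you swap in your own accessions, a quick format check can catch typos before the pipeline tries to download signatures for them. This sketch assumes demo_sraids holds one run accession per line and only accepts SRR, ERR and DRR run IDs; check_sraids is a hypothetical helper.

```shell
# Hypothetical helper: check that an accession list (assumed to contain one
# SRA run accession per line) only holds SRR/ERR/DRR run IDs.
# Blank lines are ignored; offending lines are printed with their line number.
check_sraids() {
  list="${1:-experiments/inputs/demo_sraids}"
  [ -r "$list" ] || { echo "cannot read $list" >&2; return 1; }
  if grep -Evn '^([SED]RR[0-9]+)?$' "$list"; then
    echo "invalid accession(s) in $list" >&2
    return 1
  fi
}
```

Running check_sraids experiments/inputs/demo_sraids prints nothing and succeeds when every line is valid.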

Download signatures and prepare search index

The snakemake pipeline in experiments/Snakefile was written to:

download pre-calculated signatures from wort for the SRA accessions in the demo dataset,
build a search index, and
copy the data into bw_db/ so it can be used later by the containers in docker-compose.yml.

You can run the snakemake pipeline with

pixi run index -j 4

This installs all the required dependencies and runs snakemake. You can adjust how many jobs run in parallel by changing -j 4.

This will create a bw_db directory at the root of the repository with the following structure:

bw_db
├── index/      # the branchwater search index
├── sigs.zip    # signatures indexed for search
└── sraids      # a list of SRA accessions to download signatures and build the index
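Before moving on, you can confirm that the pipeline produced this layout. check_bw_db is a hypothetical helper that only checks that the entries exist and that the files are not empty; it does not validate the contents of the index.

```shell
# Hypothetical helper: confirm the snakemake pipeline produced the bw_db
# layout shown above (index/ directory, non-empty sigs.zip and sraids).
check_bw_db() {
  db="${1:-bw_db}"
  rc=0
  [ -d "$db/index" ] || { echo "missing directory: $db/index" >&2; rc=1; }
  for f in sigs.zip sraids; do
    [ -s "$db/$f" ] || { echo "missing or empty file: $db/$f" >&2; rc=1; }
  done
  return "$rc"
}
```

Run check_bw_db from the repository root; it reports every missing entry rather than stopping at the first one.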

Prepare Metadata

Using BigQuery

Prepare a BigQuery access key

Based on the SRA instructions at Setting-up BigQuery:

Create a project named sraproject
Go to the BigQuery search tool
1. In the Explorer panel, select +ADD to add data
2. Select Star a project by name
3. Search for nih-sra-datastore and select it

Create service account key

Go to Navigation menu -> IAM & Admin -> Service accounts -> + CREATE SERVICE ACCOUNT
- name: sraquery
- ID: sraquery
  - NOTE: the actual project ID is autogenerated
- Roles: BigQuery Job User; BigQuery Data Owner; BigQuery Read Session User
Once the service account is created, open the menu under Actions and choose Manage keys
1. Select Add key
2. Create new key
3. Key type: JSON
4. Download the key to the bw_db/ folder
5. Save it as bqKey.json
In the BigQuery console, under sraproject, create a dataset named mastiffdata
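If you prefer the command line, the same setup can be sketched with the gcloud and bq CLIs. This is an assumed equivalent of the console steps, not the procedure from the SRA instructions: setup_bq_access and its RUN variable are hypothetical, PROJECT_ID stands for the autogenerated ID of sraproject, and starring nih-sra-datastore in the Explorer panel is a console step that is not reproduced here.

```shell
# Hypothetical CLI equivalent of the console steps above, using gcloud and bq.
# Argument: the autogenerated project ID of "sraproject".
# Set RUN=echo to print the commands instead of running them.
setup_bq_access() {
  project="${1:-}"
  [ -n "$project" ] || { echo "usage: setup_bq_access PROJECT_ID" >&2; return 1; }
  sa="sraquery@${project}.iam.gserviceaccount.com"

  # Service account "sraquery"
  ${RUN:-} gcloud iam service-accounts create sraquery \
    --project="$project" --display-name=sraquery || return 1

  # BigQuery Job User, BigQuery Data Owner, BigQuery Read Session User
  for role in roles/bigquery.jobUser roles/bigquery.dataOwner roles/bigquery.readSessionUser; do
    ${RUN:-} gcloud projects add-iam-policy-binding "$project" \
      --member="serviceAccount:${sa}" --role="$role" || return 1
  done

  # JSON key saved as bw_db/bqKey.json (gcloud writes JSON keys by default)
  ${RUN:-} gcloud iam service-accounts keys create bw_db/bqKey.json \
    --iam-account="$sa" || return 1

  # Dataset "mastiffdata" under the project
  ${RUN:-} bq mk --dataset "${project}:mastiffdata"
}
```

Run RUN=echo setup_bq_access PROJECT_ID first to review the commands, then run setup_bq_access PROJECT_ID from the repository root so that the key lands in bw_db/.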

Checkpoint before metadata processing

This is what the bw_db directory at the root of the repository should look like:

bw_db
├── bqKey.json     # NEW: BigQuery credentials and Project ID
├── index/         # the branchwater search index
├── sigs.zip       # signatures indexed for search
└── sraids         # a list of SRA accessions to download signatures and build the index
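To double-check the key before downloading metadata, the sketch below verifies that bqKey.json looks like a service account key and prints the project ID stored in it. check_bq_key is hypothetical, and it relies on simple grep/sed matching rather than a real JSON parser.

```shell
# Hypothetical helper: sanity-check the service account key and print the
# project ID it contains (service account JSON keys carry "type" and
# "project_id" fields).
check_bq_key() {
  key="${1:-bw_db/bqKey.json}"
  [ -s "$key" ] || { echo "missing or empty key file: $key" >&2; return 1; }
  grep -q '"type": *"service_account"' "$key" \
    || { echo "$key does not look like a service account key" >&2; return 1; }
  project=$(sed -n 's/.*"project_id": *"\([^"]*\)".*/\1/p' "$key")
  [ -n "$project" ] || { echo "no project_id in $key" >&2; return 1; }
  echo "$project"
}
```

Running check_bq_key from the repository root should print the autogenerated project ID of sraproject.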

Download the SRA metadata from BigQuery

pixi run metadata_bq

From SRA parquet in AWS Open Data

Checkpoint before metadata processing

This is what the bw_db directory at the root of the repository should look like:

bw_db
├── index/         # the branchwater search index
├── sigs.zip       # signatures indexed for search
└── sraids         # a list of SRA accessions to download signatures and build the index

Download the SRA metadata via parquet file

pixi run metadata_sra

Note

To build a smaller dataset for testing, run pixi run metadata_sra --build-test-db

Load the metadata into duckdb

pixi run load_duckdb

Note

If you are reloading after switching datasets (e.g. from the test db to the full db), run: pixi run load_duckdb --force

Bring up search index and web frontend

pixi run deploy build app
pixi run deploy up -d app

The web frontend will then be available at http://localhost:8000
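The containers can take a while to start. A minimal sketch for waiting until the frontend answers, assuming curl is installed; wait_for_url is a hypothetical helper that polls once per second.

```shell
# Hypothetical helper: poll a URL until it answers with a successful HTTP
# status, giving up after a number of attempts (one attempt per second).
wait_for_url() {
  url="${1:-http://localhost:8000}"
  tries="${2:-60}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP errors as failures; -m 5: cap each attempt at 5 seconds
    if curl -fsS -m 5 -o /dev/null "$url" 2>/dev/null; then
      echo "up: $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "no response from $url after $tries attempts" >&2
  return 1
}
```

For example: pixi run deploy up -d app && wait_for_url http://localhost:8000 120. Since the deploy task accepts docker compose subcommands such as build and up, commands like pixi run deploy logs app and pixi run deploy down should also work for inspecting logs and stopping the containers, assuming the task forwards its arguments unchanged.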