Deploying a new branchwater instance

Deploying a new branchwater instance involves bringing up three components:

  • branchwater-web, the web frontend at https://branchwater.sourmash.bio

  • branchwater-server, the backend serving the RocksDB inverted index for sourmash signatures

  • a duckdb database for the SRA metadata used for branchwater-web results

A diagram of how these components are connected:

graph LR;
classDef server fill:#4A902A,stroke:#333,stroke-width:4px,color:#fff;
classDef web fill:#dc6c11,stroke:#333,stroke-width:4px,color:#fff;
classDef index fill:#3c48cc,stroke:#333,stroke-width:4px,color:#fff;
classDef duckdb fill:#6980e9,stroke:#333,stroke-width:4px,color:#fff;
classDef client fill:#8450e1,stroke:#333,stroke-width:4px,color:#fff;

A01(browser):::client --> B01(web):::web
B01 --> C01(server):::server
B01 --> D01[(duckdb)]:::duckdb
C01 --> E01[(index)]:::index
    

For both development and production use, the repo includes a docker-compose configuration that brings up these components in the appropriate order.

Quickstart: branchwater with a demo dataset

Clone the repo

git clone https://github.com/sourmash-bio/branchwater
cd branchwater

Set up dependencies

We use pixi to manage dependencies and run tasks for branchwater development. You can install it with

curl -fsSL https://pixi.sh/install.sh | bash

or check the up-to-date instructions on the pixi website.

To deploy a complete development or production environment, we have a docker-compose.yml configuration describing the containers and how they connect together. To use this configuration, you need either docker compose or podman-compose. While there are many ways to install them, on macOS and Windows there are a couple of “Desktop” versions that provide a complete solution (GUI, starting services, configuring networking) to make it easy to get started.

Note

You only need one of docker or podman, no need to install both!

We recommend setting up Rancher Desktop for development with docker compose. Follow the instructions on their website to set it up for your operating system.

Podman Desktop is the “Desktop” equivalent for Podman. Follow the instructions on their website to set it up for your operating system.

pixi tasks run with docker compose by default. If you are using podman, you need to update the tasks to use

"podman-compose"

instead of

"docker compose"

especially in the deploy and metadata tasks. Edit the pixi.toml file and replace entries accordingly.
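The replacement can also be scripted. The following is a minimal sketch, assuming the tasks in pixi.toml contain the literal text "docker compose"; the helper name use_podman_compose is hypothetical and not part of the repo. It keeps a backup copy of the file so the change is easy to review or revert.

```shell
# Hypothetical helper (not part of the repo): switch pixi tasks from
# docker compose to podman-compose.
# Assumption: the tasks in pixi.toml contain the literal text "docker compose".
use_podman_compose() {
    toml="${1:-pixi.toml}"
    # -i.bak edits in place and keeps a backup at pixi.toml.bak;
    # this form works with both GNU and BSD (macOS) sed
    sed -i.bak 's/docker compose/podman-compose/g' "$toml"
    # list the lines that now reference podman-compose
    grep -n 'podman-compose' "$toml"
}

# Usage, from the root of the repository:
#   use_podman_compose pixi.toml
```

Comparing pixi.toml against pixi.toml.bak afterwards shows exactly which tasks changed.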

The demo dataset

The SRA accessions of the demo dataset included in the repo are listed in this file:

cat experiments/inputs/demo_sraids

You can modify this file to use other SRA accessions; these were chosen just so we can see some results in the web frontend.

Download signatures and prepare search index

The snakemake pipeline in experiments/Snakefile was prepared to

  • download pre-calculated signatures for the SRA accessions in the demo dataset from wort

  • build a search index

  • copy data into bw_db/ so it can be used further ahead by the containers in docker-compose.yml

You can run the snakemake pipeline with

pixi run index -j 4

which will install all the dependencies needed and run snakemake. You can adjust how many jobs are executed by changing -j 4.

This will create a bw_db directory at the root of the repository with the following structure:

bw_db
├── index/      # the branchwater search index
├── sigs.zip    # signatures indexed for search
└── sraids      # a list of SRA accessions to download signatures and build the index
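To confirm the pipeline produced this layout before moving on, a small check can help. This is a minimal sketch that only tests for the three entries shown above; the function name check_bw_db is hypothetical and not part of the repo.

```shell
# Hypothetical helper (not part of the repo): verify the bw_db layout
# created by `pixi run index`.
check_bw_db() {
    db="${1:-bw_db}"
    status=0
    # the search index is a directory
    [ -d "$db/index" ] || { echo "missing directory: $db/index" >&2; status=1; }
    # signatures and the accession list are regular files
    for f in sigs.zip sraids; do
        [ -f "$db/$f" ] || { echo "missing file: $db/$f" >&2; status=1; }
    done
    return "$status"
}

# Usage, from the root of the repository:
#   check_bw_db bw_db && echo "bw_db looks complete"
```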

Prepare Metadata

Prepare a BigQuery access key

These steps are based on the SRA instructions at Setting-up BigQuery.

  1. Create project - sraproject

  2. Go to BigQuery search tool

    1. In the Explorer panel select +ADD to add data

    2. Select Star a project by name

    3. Search: nih-sra-datastore and select it

Create service account key

  1. Go to navigation menu -> IAM & Admin -> service accounts -> + CREATE SERVICE ACCOUNT

    • name: sraquery

    • ID: sraquery

      • NOTE - the name of the actual project id is autogenerated

    • Roles: BigQuery Job User; BigQuery Data Owner; BigQuery Read Sessions User

  2. Once the service account is created, click the menu bar under actions and choose Manage keys

    1. select add key

    2. create new key

    3. key type: JSON

    4. Download key to the bw_db/ folder

    5. save as bqKey.json

  3. In the BigQuery console, under sraproject create a dataset named mastiffdata

Checkpoint before metadata processing

This is what the bw_db directory at the root of the repository should look like:

bw_db
├── bqKey.json     # NEW: BigQuery credentials and Project ID
├── index/         # the branchwater search index
├── sigs.zip       # signatures indexed for search
└── sraids         # a list of SRA accessions to download signatures and build the index
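Since the project ID is autogenerated, it can be useful to read it back from the downloaded key. The sketch below assumes the key is a standard Google service account JSON file, which stores the ID in a "project_id" field; the function name bq_project_id is hypothetical and not part of the repo.

```shell
# Hypothetical helper (not part of the repo): print the project ID stored
# in a service account key file.
# Assumption: the key is a standard service account JSON with a
# "project_id" field.
bq_project_id() {
    key="${1:-bw_db/bqKey.json}"
    # extract the quoted value that follows "project_id":
    id=$(sed -n 's/.*"project_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$key" | head -n 1)
    # print the ID; return non-zero if the field was not found
    [ -n "$id" ] && echo "$id"
}

# Usage, from the root of the repository:
#   bq_project_id bw_db/bqKey.json
```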

Download the SRA metadata from BigQuery

pixi run metadata_bq

Checkpoint before metadata processing

This is what the bw_db directory at the root of the repository should look like:

bw_db
├── index/         # the branchwater search index
├── sigs.zip       # signatures indexed for search
└── sraids         # a list of SRA accessions to download signatures and build the index

Download the SRA metadata via parquet file

pixi run metadata_sra

Note

To build a smaller dataset for testing, run pixi run metadata_sra --build-test-db

Load the metadata into duckdb

pixi run load_duckdb

Note

If you are reloading after switching databases (e.g. from the test db to the full db), you need to run: pixi run load_duckdb --force

Bring up search index and web frontend

pixi run deploy build app
pixi run deploy up -d app

The web frontend will be available at http://localhost:8000
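The containers may take a moment to start, so the address might not respond right away. A minimal sketch for waiting until it does, assuming curl is installed; the function name wait_for_frontend, the retry count, and the 2-second pause are illustrative choices, not part of the repo.

```shell
# Hypothetical helper (not part of the repo): poll the web frontend
# until it answers or the retries run out.
wait_for_frontend() {
    url="${1:-http://localhost:8000}"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f treats HTTP errors as failures, -sS hides progress but shows errors
        if curl -fsS -o /dev/null "$url"; then
            echo "frontend is up at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    echo "frontend did not respond at $url" >&2
    return 1
}

# Usage, after `pixi run deploy up -d app`:
#   wait_for_frontend http://localhost:8000
```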