Deploying a new branchwater instance
Deploying a new branchwater instance involves bringing up three components:
branchwater-web, the web frontend at https://branchwater.sourmash.bio
branchwater-server, the backend serving the RocksDB inverted index for sourmash signatures
a duckdb database for the SRA metadata used in branchwater-web results
A diagram of how these components are connected:
graph LR;
classDef server fill:#4A902A,stroke:#333,stroke-width:4px,color:#fff;
classDef web fill:#dc6c11,stroke:#333,stroke-width:4px,color:#fff;
classDef index fill:#3c48cc,stroke:#333,stroke-width:4px,color:#fff;
classDef duckdb fill:#6980e9,stroke:#333,stroke-width:4px,color:#fff;
classDef client fill:#8450e1,stroke:#333,stroke-width:4px,color:#fff;
A01(browser):::client --> B01(web):::web
B01 --> C01(server):::server
B01 --> D01[(duckdb)]:::duckdb
C01 --> E01[(index)]:::index
For both development and production use,
the repo provides a docker-compose configuration
that brings up these components in the appropriate order.
Quickstart: branchwater with a demo dataset
Clone the repo
git clone https://github.com/sourmash-bio/branchwater
cd branchwater
Set up dependencies
We use pixi to manage dependencies and run tasks during branchwater development.
You can install it with
curl -fsSL https://pixi.sh/install.sh | bash
or check updated instructions on their website.
For deploying a complete development or production environment,
we have a docker-compose.yml configuration describing the containers and how they connect together.
To use this configuration,
either docker compose or podman-compose is needed.
While there are many ways to install them,
on macOS or Windows a couple of “Desktop” applications provide a complete solution
(GUI, starting services, configuring networking) that makes it easy to get started.
Note
You only need one of docker or podman; there is no need to install both!
We recommend setting up Rancher Desktop
for development with docker compose.
Follow the instructions on their website to set it up for your operating system.
Podman Desktop is the “Desktop” equivalent for Podman. Follow the instructions on their website to set it up for your operating system.
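Since only one of the two tools is needed, it can be handy to check which one a machine actually has before running the pixi tasks. A minimal sketch, with a hypothetical detect_compose helper, that prefers the Docker Compose v2 plugin (the docker compose subcommand) and falls back to podman-compose:

```shell
# Hypothetical helper: report which compose command is available.
# Prefers "docker compose" (Compose v2 plugin), then "podman-compose".
detect_compose() {
  if command -v docker >/dev/null 2>&1 && docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v podman-compose >/dev/null 2>&1; then
    echo "podman-compose"
  else
    # neither tool found: explain on stderr and signal failure
    echo "neither 'docker compose' nor 'podman-compose' was found" >&2
    return 1
  fi
}

# usage: detect_compose
```

If it prints podman-compose, the pixi tasks need the edit described next.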
pixi tasks run with docker compose by default.
If you’re using podman, you need to update the tasks to use
"podman-compose"
instead of
"docker compose",
in particular the deploy and metadata tasks.
Edit the pixi.toml file and replace the entries accordingly.
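The replacement can also be scripted. This sketch assumes the tasks in pixi.toml contain the literal text docker compose (use_podman is a hypothetical helper, not part of the repo). It rewrites every occurrence and keeps a backup, so review the result against pixi.toml.bak in case the text also appears somewhere it should not be changed:

```shell
# Hypothetical helper: replace "docker compose" with "podman-compose" in pixi.toml.
# The original file is kept next to it with a .bak suffix.
use_podman() {
  local toml="${1:-pixi.toml}"
  [ -f "$toml" ] || { echo "no such file: $toml" >&2; return 1; }
  # -i.bak (suffix attached) works with both GNU sed (Linux) and BSD sed (macOS)
  sed -i.bak 's/docker compose/podman-compose/g' "$toml"
}

# usage (from the repository root): use_podman pixi.toml
```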
The demo dataset
The demo dataset included in the repo has the following SRA accessions:
ERR272375, a salt marsh metagenome
SRR5439749, a human gut metagenome
SRR20285055, an air metagenome
SRR24480609, a gut metagenome
ERR3220185, a bovine gut metagenome
SRR6269135, a marine metagenome
SRR25653600, a phage metagenome
SRR25021205, a soil metagenome
SRR25646998, a drinking water metagenome
SRR25611550, a food production metagenome
SRR7698815, a plant metagenome
SRR2243572, a wastewater metagenome
They are listed in this file:
cat experiments/inputs/demo_sraids
You can modify this list to use other SRA accessions; the demo ones were chosen only so that the web frontend shows some results.
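When swapping in your own accessions, a quick format check can catch typos before the pipeline tries to download signatures for them. A minimal sketch with a hypothetical check_sraids helper, assuming one accession per line and that run accessions start with SRR, ERR or DRR (the NCBI, ENA and DDBJ run prefixes) followed by digits:

```shell
# Hypothetical helper: print lines that are not SRA run accessions.
# Returns 0 if every line looks valid, 1 if any line does not, 2 if the file is missing.
check_sraids() {
  local file="${1:-experiments/inputs/demo_sraids}"
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }
  # -v selects lines that do NOT match, -n prefixes line numbers, -E enables extended regex
  if grep -nvE '^[EDS]RR[0-9]+$' "$file"; then
    return 1
  fi
  return 0
}

# usage: check_sraids experiments/inputs/demo_sraids
```

Blank lines and Windows line endings are reported too, which is usually what you want for an input list.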
Download signatures and prepare search index
The snakemake pipeline in experiments/Snakefile is set up to:
download pre-calculated signatures for the SRA accessions in the demo dataset from wort,
build a search index, and
copy data into bw_db/ so it can be used later by the containers in docker-compose.yml.
You can run the snakemake pipeline with
pixi run index -j 4
This installs all the needed dependencies and runs snakemake.
You can adjust how many jobs run in parallel by changing the value passed to -j.
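Instead of hard-coding -j 4, the job count can follow the number of CPUs on the machine. A small sketch with a hypothetical cpu_jobs helper; getconf _NPROCESSORS_ONLN is available on both Linux and macOS, and the helper falls back to a single job if it is not:

```shell
# Hypothetical helper: print the number of online CPUs, or 1 if unknown.
cpu_jobs() {
  local n
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
  echo "${n:-1}"
}

# usage: pixi run index -j "$(cpu_jobs)"
```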
This will create a bw_db directory at the root of the repository with the following structure:
bw_db
├── index/ # the branchwater search index
├── sigs.zip # signatures indexed for search
└── sraids # a list of SRA accessions to download signatures and build the index
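To confirm the pipeline finished, you can check for these three entries. A minimal sketch with a hypothetical check_bw_db helper; it only tests that the paths exist, not that their contents are valid:

```shell
# Hypothetical helper: verify the bw_db layout produced by `pixi run index`.
# Prints each missing entry on stderr and returns 1 if anything is missing.
check_bw_db() {
  local db="${1:-bw_db}" missing=0
  [ -d "$db/index" ]    || { echo "missing directory: $db/index" >&2; missing=1; }
  [ -f "$db/sigs.zip" ] || { echo "missing file: $db/sigs.zip" >&2; missing=1; }
  [ -f "$db/sraids" ]   || { echo "missing file: $db/sraids" >&2; missing=1; }
  return "$missing"
}

# usage (from the repository root): check_bw_db
```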
Prepare Metadata
Prepare a BigQuery access key
Based on the SRA instructions at Setting-up BigQuery:
Create a project: sraproject
Go to the BigQuery search tool.
In the Explorer panel, select + ADD to add data.
Select Star a project by name, search for nih-sra-datastore and select it.
Create a service account key
Go to navigation menu -> IAM & Admin -> service accounts -> + CREATE SERVICE ACCOUNT
name: sraquery
ID: sraquery
Note: the actual project ID is autogenerated.
Roles: BigQuery Job User; BigQuery Data Owner; BigQuery Read Sessions User
Once the service account is created, click the menu bar under actions and choose Manage keys.
Select add key -> create new key, key type: JSON.
Download the key to the bw_db/ folder and save it as bqKey.json.
In the BigQuery console, under sraproject, create a dataset named mastiffdata.
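If you prefer the command line, most of the console steps above can be approximated with the gcloud and bq CLIs. This is a sketch and not part of the branchwater repo: it assumes the Google Cloud SDK is installed and authenticated, that PROJECT_ID is the autogenerated ID of your sraproject project, and that it runs from the repository root. The helpers maybe_run and setup_bigquery are hypothetical. Creating the project and starring nih-sra-datastore are still done in the console.

```shell
# Print commands instead of running them when DRY_RUN is set (handy for review).
maybe_run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: service account, roles, JSON key and dataset for the metadata step.
setup_bigquery() {
  local project="$1" role sa
  [ -n "$project" ] || { echo "usage: setup_bigquery PROJECT_ID" >&2; return 1; }
  sa="sraquery@${project}.iam.gserviceaccount.com"
  maybe_run gcloud iam service-accounts create sraquery \
    --project="$project" --display-name=sraquery || return 1
  # BigQuery Job User, BigQuery Data Owner, BigQuery Read Sessions User
  for role in roles/bigquery.jobUser roles/bigquery.dataOwner roles/bigquery.readSessionUser; do
    maybe_run gcloud projects add-iam-policy-binding "$project" \
      --member="serviceAccount:${sa}" --role="$role" || return 1
  done
  # JSON key saved where the metadata tasks expect it
  maybe_run gcloud iam service-accounts keys create bw_db/bqKey.json \
    --iam-account="$sa" || return 1
  maybe_run bq mk --dataset "${project}:mastiffdata"
}

# usage: DRY_RUN=1 setup_bigquery YOUR_PROJECT_ID   # review, then rerun without DRY_RUN
```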
Checkpoint before metadata processing (BigQuery)
This is how the bw_db directory at the root of the repository should look:
bw_db
├── bqKey.json # NEW: BigQuery credentials and Project ID
├── index/ # the branchwater search index
├── sigs.zip # signatures indexed for search
└── sraids # a list of SRA accessions to download signatures and build the index
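Before running the metadata task, it can help to confirm that bqKey.json really is a service account key and to see which project ID it carries. A minimal sketch with a hypothetical bq_project_id helper, using only grep and sed and assuming the key keeps one JSON field per line, as downloaded keys usually do (jq -r .project_id is a more robust alternative if jq is installed):

```shell
# Hypothetical helper: print the project_id stored in a service account key.
bq_project_id() {
  local key="${1:-bw_db/bqKey.json}"
  # service account keys carry "type": "service_account"
  grep -q '"type": *"service_account"' "$key" \
    || { echo "not a service account key: $key" >&2; return 1; }
  sed -n 's/.*"project_id": *"\([^"]*\)".*/\1/p' "$key"
}

# usage: bq_project_id bw_db/bqKey.json
```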
Download the SRA metadata from BigQuery:
pixi run metadata_bq
Checkpoint before metadata processing (parquet file, no BigQuery key)
This is how the bw_db directory at the root of the repository should look:
bw_db
├── index/ # the branchwater search index
├── sigs.zip # signatures indexed for search
└── sraids # a list of SRA accessions to download signatures and build the index
Download the SRA metadata via a parquet file:
pixi run metadata_sra
Note
To build a smaller dataset for testing, run pixi run metadata_sra --build-test-db
Load the metadata into duckdb
pixi run load_duckdb
Note
If you are reloading after switching databases (e.g. from the test db to the full db), run:
pixi run load_duckdb --force
Bring up the search index and web frontend
pixi run deploy build app
pixi run deploy up -d app
The web frontend will then be available at http://localhost:8000
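Because up -d returns once the containers are started, the frontend may need a moment before it answers. A small polling sketch with a hypothetical wait_for_url helper, assuming curl is installed and the default port 8000:

```shell
# Hypothetical helper: poll a URL until it responds successfully or we give up.
# The timeout is approximate: it counts 1-second sleeps, not time spent inside curl.
wait_for_url() {
  local url="${1:-http://localhost:8000}" timeout="${2:-60}" waited=0
  # -f treats HTTP error statuses as failures; --max-time bounds each attempt
  until curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "gave up waiting for $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "$url is up"
}

# usage: pixi run deploy up -d app && wait_for_url http://localhost:8000 120
```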