# Deploying a new branchwater instance

Deploying a new `branchwater` instance involves bringing up three components:

- `branchwater-web`, the web frontend at [https://branchwater.sourmash.bio](https://branchwater.sourmash.bio)
- `branchwater-server`, the backend serving the RocksDB inverted index for sourmash signatures
- a [duckdb](https://duckdb.org/) database for the SRA metadata used for `branchwater-web` results

A diagram of how these components are connected:

::: {mermaid}
graph LR;
    classDef server fill:#4A902A,stroke:#333,stroke-width:4px,color:#fff;
    classDef web fill:#dc6c11,stroke:#333,stroke-width:4px,color:#fff;
    classDef index fill:#3c48cc,stroke:#333,stroke-width:4px,color:#fff;
    classDef duckdb fill:#6980e9,stroke:#333,stroke-width:4px,color:#fff;
    classDef client fill:#8450e1,stroke:#333,stroke-width:4px,color:#fff;

    A01(browser):::client --> B01(web):::web
    B01 --> C01(server):::server
    B01 --> D01[(duckdb)]:::duckdb
    C01 --> E01[(index)]:::index
:::

For both development and production use, a [`docker-compose`](https://docs.docker.com/compose/) configuration that brings up these components in the appropriate order is [available in the repo](https://github.com/sourmash-bio/branchwater/blob/main/docker-compose.yml).

## Quickstart: branchwater with a demo dataset

### Clone the repo

```
git clone https://github.com/sourmash-bio/branchwater
cd branchwater
```

### Set up dependencies

We use [pixi](https://pixi.sh) to manage dependencies and run tasks for `branchwater` development. You can install it with

```
curl -fsSL https://pixi.sh/install.sh | bash
```

or check the updated instructions on their website.

To deploy a complete development or production environment, we have a `docker-compose.yml` configuration describing the containers and how they connect together.
To use this configuration, you need either `docker compose` or `podman-compose`.
While there are many ways to get them installed, on macOS or Windows there are a couple of "Desktop" versions with a complete solution (GUI, start services, configure networking) to make it easy to get started.

:::{note}
You only need one of `docker` or `podman`, no need to install both!
:::

::::{tab-set}

:::{tab-item} Docker Desktop

We recommend setting up [Rancher Desktop](https://rancherdesktop.io/) for development with `docker compose`.
Follow the instructions on their website to set it up for your operating system.

:::

:::{tab-item} Podman Desktop

[Podman Desktop](https://podman-desktop.io/) is the "Desktop" equivalent for Podman.
Follow the instructions on their website to set it up for your operating system.

`pixi` tasks run with `docker compose` by default. If you're using `podman`, you need to update the tasks to use

```
"podman-compose"
```

instead of

```
"docker compose"
```

especially in the `deploy` and `metadata` tasks.
Edit the `pixi.toml` file and replace entries accordingly.

:::

::::

### The demo dataset

The demo dataset included in the repo has the following SRA accessions:

- [ERR272375](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=ERR272375&display=metadata), a salt marsh metagenome
- [SRR5439749](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR5439749&display=metadata), a human gut metagenome
- [SRR20285055](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR20285055&display=metadata), an air metagenome
- [SRR24480609](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR24480609&display=metadata), a gut metagenome
- [ERR3220185](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=ERR3220185&display=metadata), a bovine gut metagenome
- [SRR6269135](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR6269135&display=metadata), a marine metagenome
- [SRR25653600](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR25653600&display=metadata), a phage metagenome
- [SRR25021205](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR25021205&display=metadata), a soil metagenome
- [SRR25646998](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR25646998&display=metadata), a drinking water metagenome
- [SRR25611550](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR25611550&display=metadata), a food production metagenome
- [SRR7698815](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR7698815&display=metadata), a plant metagenome
- [SRR2243572](https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR2243572&display=metadata), a wastewater metagenome

They are listed in this file:

```
cat experiments/inputs/demo_sraids
```

You can modify the list and use other SRA accessions; these were chosen just so we can see some results in the web frontend.

### Download signatures and prepare search index

The snakemake pipeline in `experiments/Snakefile` was prepared to

- download pre-calculated signatures for the SRA accessions in the demo dataset from [wort](https://wort.sourmash.bio),
- build a search index,
- copy data into `bw_db/` so it can be used later by the containers in `docker-compose.yml`.

You can run the snakemake pipeline with

```
pixi run index -j 4
```

which will install all the dependencies needed and run snakemake.
You can adjust how many jobs are executed by changing `-j 4`.
Running the pipeline will create a `bw_db` directory at the root of the repository with the following structure:

```
bw_db
├── index/     # the branchwater search index
├── sigs.zip   # signatures indexed for search
└── sraids     # a list of SRA accessions to download signatures and build the index
```

### Prepare Metadata

::::{tab-set}

:::{tab-item} Using BigQuery

#### Prepare a BigQuery access key

```{include} ../metadata/README.md
:start-after:
:end-before:
```

#### Checkpoint before metadata processing

This is how the `bw_db` directory at the root of the repository should look:

```
bw_db
├── bqKey.json # NEW: BigQuery credentials and Project ID
├── index/     # the branchwater search index
├── sigs.zip   # signatures indexed for search
└── sraids     # a list of SRA accessions to download signatures and build the index
```

#### Download the SRA metadata from BigQuery

```
pixi run metadata_bq
```

:::

:::{tab-item} From SRA parquet in AWS Open Data

#### Checkpoint before metadata processing

This is how the `bw_db` directory at the root of the repository should look:

```
bw_db
├── index/     # the branchwater search index
├── sigs.zip   # signatures indexed for search
└── sraids     # a list of SRA accessions to download signatures and build the index
```

#### Download the SRA metadata via parquet file

```
pixi run metadata_sra
```

```{note}
To build a smaller dataset for testing, run `pixi run metadata_sra --build-test-db`
```

### Load the metadata into duckdb

```
pixi run load_duckdb
```

```{note}
If you are reloading after switching from, e.g., the test db to the full db, you need to run `pixi run load_duckdb --force`
```

:::

::::

### Bring up search index and web frontend

```
pixi run deploy build app
pixi run deploy up -d app
```

The web frontend will be available at [http://localhost:8000](http://localhost:8000)
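
If the frontend does not come up, one thing worth ruling out is an incomplete `bw_db` directory. The following is a minimal sketch of such a check, not something shipped in the repo: `check_bw_db` is a hypothetical helper name, and it only looks for the three entries listed in the checkpoints above (`index/`, `sigs.zip`, `sraids`). It does not check any metadata or duckdb files, since their names are not given here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the branchwater repo): report which of the
# expected entries are missing from a bw_db directory.
check_bw_db() {
    local db_dir="${1:-bw_db}"   # directory to inspect, defaults to ./bw_db
    local missing=0

    # index/ is expected to be a directory
    if [ ! -d "$db_dir/index" ]; then
        echo "missing: $db_dir/index/"
        missing=1
    fi

    # sigs.zip and sraids are expected to be regular files
    local f
    for f in sigs.zip sraids; do
        if [ ! -f "$db_dir/$f" ]; then
            echo "missing: $db_dir/$f"
            missing=1
        fi
    done

    if [ "$missing" -eq 0 ]; then
        echo "ok: $db_dir looks complete"
    fi
    # exit status 0 when everything is present, 1 otherwise
    return "$missing"
}
```

After sourcing the function, run `check_bw_db bw_db` from the root of the repository. A non-zero exit status suggests `pixi run index` did not finish.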