BGE-gaplist

This Snakemake workflow orchestrates the processing of taxonomic data from multiple sources, including BOLD Systems, Fauna Europaea, and expert contributions. It integrates data, performs gap analysis, and maintains updated species lists.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Preliminaries

The workflow combines pre-processed taxonomic information from the following sources:

BOLD
Fauna Europaea
Lepiform
WoRMS
iNaturalist
input from various experts

At present, these pre-processed data are expected to be present already in the Raw_Data directory. The plan is to make their preparation part of the overall workflow in the future.

Workflow Overview

The pipeline consists of four main steps:

Updating BOLD data through their API
Combining data from multiple taxonomic sources
Analyzing coverage gaps
Generating final updated species lists

Available Targets

Default Target (`all`)

snakemake all --cores all

Runs the complete pipeline, generating:

Updated combined species lists
Gap analysis reports
Sorted taxonomic hierarchies

Individual Targets

Update BOLD Data

snakemake update_bold_data --cores 1

Queries the BOLD API for current specimen data.

Combine Lists

snakemake combine_lists --cores 1

Integrates data from all taxonomic sources.

Analyze Gaps

snakemake analyze_gaps --cores all

Generates gap analysis reports.

Update Final List

snakemake update_final_list --cores 1

Merges the latest BOLD data into the combined species lists.

Utility Targets

Clean

snakemake clean --cores 1

Removes all generated files and logs.

Generate Documentation

snakemake generate_docs --cores 1

Creates documentation for all components.

Running the Workflow

Prerequisites

Conda or Mamba
Input data in the resources/Raw_Data directory

Setup

Create the conda environment:

conda env create -f environment.yml

Activate the environment:

conda activate BGE-gaplist

Execution

Full pipeline:

snakemake --cores all

Dry run to check execution plan:

snakemake -n

Generate workflow DAG:

snakemake --dag | dot -Tsvg > workflow.svg

Resource Requirements

BOLD data update: Single thread, ~2GB memory
Data combination: Single thread, memory varies with input size
Gap analysis: Multi-thread capable, memory scales with data size

Output Structure

BGE-gaplist/
├── results/
│   ├── Curated_Data/          # Processed data
│   │   ├── updated_combined_lists.csv
│   │   ├── combined_species_lists.csv
│   │   └── {date}_updated_BOLD_data.csv
│   └── Gap_Lists/            # Analysis results
│       ├── Gap_list_all.csv
│       └── sorted/          # Hierarchical results
└── logs/                # Process logs

Error Handling

All steps log to files in logs/
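Failed steps retain partial outputs for inspection when Snakemake is run with --keep-incomplete (by default, Snakemake deletes the output files of failed jobs)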
Use --rerun-incomplete to restart failed jobs

Configuration

Edit config/config.yaml to modify:

File paths
API settings
Processing parameters

Contributing

Fork the repository
Create a feature branch
Submit a pull request

Support

File issues on the project's issue tracker.

Authors

Fabian Deister - SNSB
Rutger Vos - Naturalis

Acknowledgments

This workflow builds on work by Fabian Deister (SNSB).
