Release 1.6.0 (#400)
* added new config docs

* Remove --toggle key. Add logic during fetch samples for toggle. Remove checking for toggle off during sample submission (redundant). #263

* Remove toggle key constants. Apply formatting. #263

* Remove toggle key property. Simplified logic for fetching samples. #263

* fix pephub failing tests

* cleaned up more to help pass pytests

* replace distutils.dir_util with shutil for Python 3.12

* added divvy entry point inside looper

* black format

* another black reformat

* added docs

* clean up based on feedback

* clean up based on feedback

* removed redundancy

* black fmt

* added divvy inspect

* black fmt

* added sub cmd, docker args

* added line break for inspect output

* added divvy docs #343

* added divvy docs #343

* divvy docs fix

* mkdocs fix

* Fixed mkdocs error

* Update requirements-doc.txt

* updated reqs-doc

* merge mistake fix

* added divvy imgs

* added new looper init

* added to changelog, fix divvy imgs

* divvy readme img fix

* fixed initialization of generic piface

* fixed initialization of generic piface

* added tests

* fixed main setup

* Update how_to_define_looper_config.md

* Update __init__.py

* Update test_other.py

* Ise

* added changelog and minor naming changes

* remove old logging function

* dev version bump

* fix typo in html_report and upgraded pandas requirements for pephubclient

* fixed requirements

* fixed docs requirements

* added versioneer to doc requirements

* added Cython to doc requirements

* added readthedocs config

* added looper to requirements docs

* allow for using pipestat.summarize, align with pipestat 0.4.0

* clean up code, update usage doc

* update doc requirements pephubclient

* downgrade docs to 3.10

* adjust get_status to use proper sample_name if pipestat configured #326

* adjust conductor to retrieve pipestat manager variables with pipestat 0.4.0 refactoring.

* Allows skipping some tests if run offline. Closes #370

* work on using test_args instead of subprocesses

* Finish switching applicable tests away from subprocess

* Lint and update doc string to test_args_expansion

* Change return type.

* lint

* add test for var_templates #357, and clean up tests

* attempt simple check to see if provided pipelines are callable #195

* minor adjustments, polished docstring

* work on new peppy

* update changelog

* lint

* update version to 1.5.0

* update changelog

* update reqs and changelog to use pipestat v0.5.0

* more work on peppy without attmap

* Refactoring for looper config

* added looper config file argument

* code fix

* Added comment about deprecating for old looper specification

* fixed looper init error

* change logo for docs build tab icon

* fix favicon

* update docs and changelog for 1.5.0 release

* - fix `looper table` failing without sample.protocol , update change log for point release

* fix "--looper-config"

* update version and changelog date

* adjust pipestat requirement to be >=0.5.1

* lint

* clarify message on rerun

* dev version flag

* version bump to 1.5.2-dev

* fix indentation

* fix message logic

* improve rerun messaging

* use f strings

* default msg

* fix error message.

* clean up divvy docs. Fix #393

* add expanding paths in read_looper_config_file

* Is this the fix for #398?

* fix some item attr confusion

* lint

* Adjust dry run submission reporting

* oops

* more attr fixes

* lint

* initial poc for rewriting classes to return dict for pytests

* more test changes

* test polish

* use logmuse for all logging msgs

* typo

* reduce runp and collator tests

* Skip Checker related tests until CheckerOld is deprecated

* remove divvy dependence on attmap

* expand submission paths correctly

* Update requirements pephubclient

* refactor CLI code

* clean up imports

* cli imports cleanup

* final cli polish, linting

* have self.debug use consts and clean up structure

* add imports to cli_divvy

* Add pipestat configuration exception to sub commands and tests

* first pass at refactoring prj.pipestat_configured_project prj.pipestat_configured

* fix utils import

* fix tests, lint

* add pipeline type for looper table

* add sample_level stats for looper table

* fixed relative path

* fixed tests and updated documentation

* added docstring to test

* remove unused code

* tweak sample_level pipestat retrieval

* remove CheckerOld

* Fixed #410

* Fixed #395

* change destroy_summary to use pipestat configuration and pipestat functions

* add --project default for destroy command and rename funcs for disambiguation

* remove html_reports and update imports.

* add pipestat compatible pep for pytesting and associated test.

* adjust pytest fixture.

* add new pytests for pipestat configurations, re-implement Check tests

* change LOGGER.warn to LOGGER.warning due to deprecation

* change function name for copyfile, remove todos.

* attempt to change the way looper gets pipestat configuration, tests broken

* fix broken tests and file path issues

* fix config_file namespace and pep-config issue

* add plugin

* Pipestat polish (#412)

* fix pep-config issue

* pass output_dir to pipestat #411 and #390

* adjust building looper namespace

* revert to using pipestat config path, tests broken

* fix tests by reverting some changes

* allow sample name to be passed during config check, raise value error if no pipestat config

* pass sample name only

* resolve schema path based on pipestat config

* clean test and allow pipestat namespace to include config_file path

* remove unnecessary pipestat namespace key value pairs

* Attempt constructing a pipestat config file from looper config and piface and then writing to file. Tests broken.

* fix tests

* general clean up

* remove sample name during pipestat creation

* remove redundancy

* lint

* clean up comments

* fix runp for pipestat and add to pytest

* add information to looper's pipestat documentation.

* Update changelog

* 406 relative path (#408)

* Changed relative path to project config

* lint

* fixed failing test

* fixed failing test

* merge from dev and fix conflicts

---------

Co-authored-by: nsheff <[email protected]>
Co-authored-by: Donald Campbell <[email protected]>

* add info on looper report and point to pepatac example #41

* update looper to reflect pipestat's refactor from sample_name to record_identifier

* use pipestat to set status to 'waiting' if rerunning a failed sample. #169

* update pipestat and yacman versions to alpha and dev versions in requirements.

* update yacman to be released v0.9.2

* resolve pipestat module imports

* update eido 0.2.1 and peppy to pre-release version

* skip test failing on GitHub for now.

* modify test skip to include entire class.

* lint

* revert change per discussion: #420

* change naming of generated pipeline_interface.yaml #417

* add clarification in docs for accessing sample.sample_yaml_path and sample.sample_yaml_cwl #421

* move plugins to `plugins.py` #419

* Added writing output_schema and count_lines.sh when initializing a pipeline interface #418

* begin work on looper link #72

* continue looper link, now functional #72

* allow access for looper.pep_config #424

* fix accessing looper.pep_config #424, add building out looper namespace based on config file #423

* add better error message for #397

* fix pipestat import

* remove sample_name and project_name from pipestat_namespace in favor of record_identifier

* update pipestat req

* update pipestat req for newest alpha release

* update changelog.md

* update project definition docs

* update pipeline-interface-specification.md

* update parameterizing-pipelines.md and initialize.md

* misc documentation corrections

* update usage.md

* update hello-world example

* 2nd pass on docs

* add clarification for sample_modifiers and configuring project

* fix for #427

* implement path expansion during pipestat configuration check

* print link directory to terminal after using looper link

* print report directory to terminal after using looper report

* change schema_path to output_schema for pipestat config

* update docs schema_path to output_schema for pipestat config

* fix key error with populate_sample_paths

* allow rewriting looper config even if it exists

* only pass sample_name as record_identifier if it is given

* add default_project_record_identifier if using {record_identifier} in pipestat config

* fix #428

* fix pep_config not populating during runp

* remove redundant pep_config arg

* fix bug with path expansion during config read

* pass looper's samples to pipestat summarize

* WIP attempt at selector-flag #126

* more progress, works for selection #126

* Add tests, both selection and exclusion based on flags now works #126

* fix path issue for items with underscores #126

* ignore flags if selecting on flags, change sel and exc to have nargs="*"

* update docs

* add more tests for selecting attributes in tandem with flags #126

* fix project typo in docs

* version bump for prerelease 1.6.0a1 and add pipestat req v0.6.0a9

* 2nd attempt add pipestat req v0.6.0a9

* bump to 1.6.0a2

* fixes #430 and adds corresponding tests

* Updates for new peppy

* lint

* basic tab completion for initial commands #422

* polish docs for tab completion #422

* lint

* potential fix pepkit/peppy#459

* Revert "potential fix pepkit/peppy#459"

This reverts commit 281cef4.

* version 1.6.0a3 pre-release

* v1.6.0 release prep

---------

Co-authored-by: Khoroshevskyi <[email protected]>
Co-authored-by: Donald C <[email protected]>
Co-authored-by: ayobi <[email protected]>
4 people authored Dec 22, 2023
1 parent 5c499a2 commit b47a568
Showing 60 changed files with 3,948 additions and 5,769 deletions.
17 changes: 17 additions & 0 deletions bash_complete.sh
@@ -0,0 +1,17 @@
# Begin looper bash autocomplete
_looper_autocomplete()
{
    local cur prev opts1
    cur=${COMP_WORDS[COMP_CWORD]}
    prev=${COMP_WORDS[COMP_CWORD-1]}
    opts1=$(looper --commands)
    case ${COMP_CWORD} in
        1)
            COMPREPLY=($(compgen -W "${opts1}" -- ${cur}))
            ;;
        2)
            COMPREPLY=()
            ;;
    esac
} && complete -o bashdefault -o default -F _looper_autocomplete looper
# end looper bash autocomplete
2 changes: 1 addition & 1 deletion docs/README.md
@@ -51,7 +51,7 @@
unzip master.zip

# Run looper:
cd hello_looper-master
looper run project/project_config.yaml
looper run --looper-config .looper.yaml project/project_config.yaml
```

A detailed explanation of the results is in the [Hello world tutorial](hello-world.md).
27 changes: 27 additions & 0 deletions docs/advanced.md
@@ -56,3 +56,30 @@
Once a pipeline is submitted any remaining interface files will be ignored.
Until an appropriate pipeline is found, each interface file will be considered in succession.
If no suitable pipeline is found in any interface, the sample will be skipped.
In other words, the `pipeline_interfaces` value specifies a *prioritized* search list.

## Set up tab completion

Source `bash_complete.sh` from your `~/.bashrc` to get basic tab completion for Looper.

Then, simply type `looper <tab> <tab>` to see a list of commands and `looper comma<tab>` to get autocompletion for specific commands.

The script to source from `~/.bashrc`:
```bash
# Begin looper bash autocomplete
_looper_autocomplete()
{
    local cur prev opts1
    cur=${COMP_WORDS[COMP_CWORD]}
    prev=${COMP_WORDS[COMP_CWORD-1]}
    opts1=$(looper --commands)
    case ${COMP_CWORD} in
        1)
            COMPREPLY=($(compgen -W "${opts1}" -- ${cur}))
            ;;
        2)
            COMPREPLY=()
            ;;
    esac
} && complete -o bashdefault -o default -F _looper_autocomplete looper
# end looper bash autocomplete
```
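The heavy lifting above is done by `compgen -W`, which filters a word list down to the entries matching the prefix being completed. A minimal sketch of that call in isolation — the command list here is illustrative, not necessarily looper's actual `--commands` output:

```shell
# Stand-in for $(looper --commands); the real list may differ.
opts1="run runp rerun report table check"
cur="ru"  # what the user has typed so far

# compgen -W filters the word list by the current prefix.
COMPREPLY=($(compgen -W "${opts1}" -- "${cur}"))
echo "${COMPREPLY[@]}"  # run runp
```

This is exactly what the `1)` branch of the completion function does with the real command list.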
17 changes: 16 additions & 1 deletion docs/changelog.md
@@ -2,6 +2,21 @@

This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) and [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) format.

## [1.6.0] -- 2023-12-22

### Added
- `looper link` creates symlinks for results grouped by record_identifier. It requires pipestat to be configured. [#72](https://github.com/pepkit/looper/issues/72)
- basic tab completion.

### Changed
- looper now works with pipestat v0.6.0 and greater.
- `looper table` and `looper check` now use pipestat and therefore require a pipestat configuration. [#390](https://github.com/pepkit/looper/issues/390)
- changed how looper configures pipestat [#411](https://github.com/pepkit/looper/issues/411)
- initializing a pipeline interface also writes an example `output_schema.yaml` and a `count_lines.sh` example pipeline

### Fixed
- filtering via attributes that are integers.

## [1.5.1] -- 2023-08-14

### Fixed
@@ -68,7 +83,7 @@
## [1.3.1] -- 2021-06-18

### Changed
- If remote schemas are not accessbile, the job submission doesn't fail anymore
- If remote schemas are not accessible, the job submission doesn't fail anymore
- Fixed a bug where looper stated "No failed flag found" when a failed flag was found

### Deprecated
151 changes: 24 additions & 127 deletions docs/defining-a-project.md
@@ -4,142 +4,39 @@

To start, you need a project defined in the [standard Portable Encapsulated Project (PEP) format](http://pep.databio.org). Start by [creating a PEP](https://pep.databio.org/en/latest/simple_example/).

## 2. Connect the PEP to looper
## 2. Specify the Sample Annotation

### 2.1 Specify `output_dir`

Once you have a basic PEP, you can connect it to looper. Just provide the required looper-specific piece of information -- `output_dir`, a parent folder where you want looper to store your results. You do this by adding a `looper` section to your PEP. The `output_dir` key is expected at the top level of the `looper` section of the project configuration file. Here's an example:
This information generally lives in a `project_config.yaml` file.

Simplest example:
```yaml
looper:
  output_dir: "/path/to/output_dir"
pep_version: 2.0.0
sample_table: sample_annotation.csv
```
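For reference, the `sample_annotation.csv` named above is a plain table with one row per sample; a minimal sketch (columns beyond `sample_name` are illustrative assumptions, not requirements):

```csv
sample_name,protocol
frog_1,ATAC
frog_2,RRBS
```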
### 2.2 Configure pipestat
*We recommend reading the [pipestat documentation](https://pipestat.databio.org) to learn more about the concepts described in this section.*
Additionally, you may configure pipestat, the tool used to manage pipeline results. Pipestat provides lots of flexibility, so there are multiple configuration options that you can provide in `looper.pipestat.sample` or `looper.pipestat.project`, depending on the pipeline level you intend to run.

Please note that all the configuration options listed below *do not* specify the values passed to pipestat *per se*, but rather the `Project` or `Sample` attribute names that hold these values. This way, the pipestat configuration can change with the pipeline submitted for every `Sample` if PEP `sample_modifiers` are used.

- `results_file_attribute`: name of the `Sample` or `Project` attribute that indicates the path to the YAML results file that will be used to report results into. Default value: `pipestat_results_file`, so the path will be sourced from either `Sample.pipestat_results_file` or `Project.pipestat_results_file`. If the path provided this way is not absolute, looper will make it relative to `{looper.output_dir}`.
- `namespace_attribute`: name of the `Sample` or `Project` attribute that indicates the namespace to report into. Default values: `sample_name` for sample-level pipelines and `name` for project-level pipelines, so the namespace will be sourced from either `Sample.sample_name` or `Project.name`.
- `config_attribute`: name of the `Sample` or `Project` attribute that indicates the path to the pipestat configuration file. It's not needed in case the intended pipestat backend is the YAML results file mentioned above. It's required if the intended pipestat backend is a PostgreSQL database, since this is the only way to provide the database login credentials. Default value: `pipestat_config`, so the path will be sourced from either `Sample.pipestat_config` or `Project.pipestat_config`.

Non-configurable pipestat options:

- `schema_path`: never specified here, since it's sourced from `{pipeline.output_schema}`, which is specified in the pipeline interface file
- `record_identifier`: automatically set to `{pipeline.pipeline_name}`, which is specified in the pipeline interface file

A more complicated example taken from [PEPATAC](https://pepatac.databio.org/en/latest/):
```yaml
name: "test123"
pipestat_results_file: "project_pipestat_results.yaml"
pipestat_config: "/path/to/project_pipestat_config.yaml"
pep_version: 2.0.0
sample_table: tutorial.csv

sample_modifiers:
  append:
    pipestat_config: "/path/to/pipestat_config.yaml"
    pipestat_results_file: "RESULTS_FILE_PLACEHOLDER"
  derive:
    attributes: ["pipestat_results_file"]
    attributes: [read1, read2]
    sources:
      RESULTS_FILE_PLACEHOLDER: "{sample_name}/pipestat_results.yaml"
looper:
  output_dir: "/path/to/output_dir"
  # pipestat configuration starts here
  # the values below are defaults, so they are not needed, but configurable
  pipestat:
    sample:
      results_file_attribute: "pipestat_results_file"
      config_attribute: "pipestat_config"
      namespace_attribute: "sample_name"
    project:
      results_file_attribute: "pipestat_results_file"
      config_attribute: "pipestat_config"
      namespace_attribute: "name"
```
## 3. Link a pipeline to your project

Next, you'll need to point the PEP to the *pipeline interface* file that describes the command you want looper to run.

### Understanding pipeline interfaces

Looper links projects to pipelines through a file called the *pipeline interface*. Any looper-compatible pipeline must provide a pipeline interface. To link the pipeline, you simply point each sample to the pipeline interfaces for any pipelines you want to run.

Looper pipeline interfaces can describe two types of pipeline: sample-level pipelines or project-level pipelines. Briefly, a sample-level pipeline is executed with `looper run`, which runs individually on each sample. A project-level pipeline is executed with `looper runp`, which runs a single job *per pipeline* on an entire project. Typically, you'll first be interested in the sample-level pipelines. You can read in more detail in the [pipeline tiers documentation](pipeline-tiers.md).

### Adding a sample-level pipeline interface

Sample pipelines are linked by adding a sample attribute called `pipeline_interfaces`. There are 2 easy ways to do this: you can simply add a `pipeline_interfaces` column in the sample table, or you can use an *append* modifier, like this:

```yaml
sample_modifiers:
  append:
    pipeline_interfaces: "/path/to/pipeline_interface.yaml"
```

The value for the `pipeline_interfaces` key should be the *absolute* path to the pipeline interface file. The paths may also contain environment variables. Once your PEP is linked to the pipeline, you just need to make sure your project provides any sample metadata required by the pipeline.
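Since the paths may contain environment variables, a common pattern is to anchor interfaces to a variable you control; a sketch (the `$PIPELINES` variable here is an assumption for illustration, not something looper defines):

```yaml
sample_modifiers:
  append:
    pipeline_interfaces: "$PIPELINES/pepatac/pipeline_interface.yaml"
```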

### Adding a project-level pipeline interface

Project pipelines are linked in the `looper` section of the project configuration file:

```yaml
looper:
  pipeline_interfaces: "/path/to/project_pipeline_interface.yaml"
```

### How to link to multiple pipelines

Looper decouples projects and pipelines, so you can have many projects using one pipeline, or many pipelines running on the same project. If you want to run more than one pipeline on a sample, you can simply add more than one pipeline interface, like this:

```yaml
sample_modifiers:
  append:
    pipeline_interfaces: ["/path/to/pipeline_interface.yaml", "/path/to/pipeline_interface2.yaml"]
```

Looper will submit jobs for both of these pipelines.

If you have a project that contains samples of different types, then you can use an `imply` modifier in your PEP to select which pipelines you want to run on which samples, like this:


```yaml
sample_modifiers:
  # Obtain tutorial data from http://big.databio.org/pepatac/ then set
  # path to your local saved files
  R1: "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r1.fastq.gz"
  R2: "${TUTORIAL}/tools/pepatac/examples/data/{sample_name}_r2.fastq.gz"
  imply:
    - if:
        protocol: "RRBS"
      then:
        pipeline_interfaces: "/path/to/pipeline_interface.yaml"
    - if:
        protocol: "ATAC"
      then:
        pipeline_interfaces: "/path/to/pipeline_interface2.yaml"


## 5. Customize looper

That's all you need to get started linking your project to looper. But you can also customize things further. Under the `looper` section, you can provide a `cli` keyword to specify any command-line (CLI) options from within the project config file. The subsections within this section direct the arguments to the respective `looper` subcommands. So, to specify, e.g., a sample submission limit for a `looper run` command, use:

```yaml
looper:
  output_dir: "/path/to/output_dir"
  cli:
    run:
      limit: 2
```

or, to pass this argument to any subcommand:

```yaml
looper:
  output_dir: "/path/to/output_dir"
  all:
    limit: 2
```

Keys in the `cli.<subcommand>` section *must* match the long argument parser option strings, so `command-extra`, `limit`, `dry-run`, and so on. For more CLI options, refer to the subcommands' [usage](usage.md).
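Combining the two forms, a single config can set a global limit and per-subcommand extras; a sketch with illustrative option values (the specific values are assumptions, not defaults):

```yaml
looper:
  output_dir: "/path/to/output_dir"
  cli:
    all:
      limit: 2
    run:
      dry-run: true
      command-extra: "--verbose"
```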
  - if:
      organism: ["human", "Homo sapiens", "Human", "Homo_sapiens"]
    then:
      genome: hg38
      prealignment_names: ["rCRSd"]
      deduplicator: samblaster # Default. [options: picard]
      trimmer: skewer # Default. [options: pyadapt, trimmomatic]
      peak_type: fixed # Default. [options: variable]
      extend: "250" # Default. For fixed-width peaks, extend this distance up- and down-stream.
      frip_ref_peaks: None # Default. Use an external reference set of peaks instead of the peaks called from this run
```
10 changes: 5 additions & 5 deletions docs/README_divvy.md → docs/divvy/README.md
@@ -1,13 +1,13 @@
![Logo](img/divvy_logo.svg)
![Logo](../img/divvy_logo.svg)

## What is `divvy`?

`Divvy` allows you to populate job submission scripts by integrating job-specific settings with separately configured computing environment settings. Divvy *makes software portable*, so users may easily toggle among any computing resource (laptop, cluster, cloud).
The submission configuration tool embedded in `looper` is called `divvy`. Divvy is useful independently from looper, but it ships with looper. Divvy allows you to populate job submission scripts by integrating job-specific settings with separately configured computing environment settings. Divvy *makes software portable*, so users may easily toggle among any computing resource (laptop, cluster, cloud).

![Merge](img/divvy-merge.svg)
![Merge](../img/divvy-merge.svg)
## What makes `divvy` better?

![NoDivvy](img/nodivvy.svg)
![NoDivvy](../img/nodivvy.svg)

Tools require a particular compute resource setup. For example, one pipeline requires SLURM, another requires AWS, and yet another just runs directly on your laptop. This makes it difficult to transfer to different environments. For tools that can run in multiple environments, each one must be configured separately.

@@ -16,7 +16,7 @@

Instead, `divvy`-compatible tools can run on any computing resource. **Users configure their computing environment once, and all divvy-compatible tools will use this same configuration.**

![Connect](img/divvy-connect.svg)
![Connect](../img/divvy-connect.svg)

Divvy reads a standard configuration file describing available compute resources and then uses a simple template system to write custom job submission scripts. Computing resources are organized as *compute packages*, which users select, populate with values, and build scripts for compute jobs.

File renamed without changes.
25 changes: 25 additions & 0 deletions docs/configuration_divvy.md → docs/divvy/configuration.md
@@ -1,3 +1,28 @@
# Installing divvy

Divvy is automatically installed when you install looper. See if your install worked by calling `divvy -h` on the command line. If the `divvy` executable is not in your `$PATH`, append this to your `.bashrc` or `.profile` (or `.bash_profile` on macOS):

```{console}
export PATH=~/.local/bin:$PATH
```

# Initial configuration

On a fresh install, `divvy` comes pre-loaded with some built-in compute packages, which you can explore by typing `divvy list`. If you need to tweak these or create your own packages, you will need to configure divvy manually. Start by initializing an empty `divvy` config file:

```{console}
export DIVCFG="divvy_config.yaml"
divvy init $DIVCFG
```

This `init` command will create a default config file, along with a folder of templates.

The `divvy write` and `list` commands require knowing where this config file is. You can pass it on the command line every time (using the `-c` parameter), but this gets old. An alternative is to set up the `$DIVCFG` environment variable. Divvy will automatically use the config file in this environment variable if it exists. Add this line to your `.bashrc` or `.profile` if you want it to persist for future command-line sessions. You can always specify `-c` if you want to override the value in the `$DIVCFG` variable on an ad-hoc basis:

```{console}
export DIVCFG=/path/to/divvy_config.yaml
```

# The divvy configuration file

At the heart of `divvy` is the *divvy configuration file*, or `DIVCFG` for short. This is a `yaml` file that specifies a user's available *compute packages*. Each compute package represents a computing resource; for example, by default we have a package called `local` that populates templates to simply run jobs in the local console, and another package called `slurm` with a generic template to submit jobs to a SLURM cluster resource manager. Users can customize compute packages as much as needed.
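To make the "populates templates" idea concrete, here is a toy stand-in for divvy's substitution step in plain shell. The placeholder style (`{TIME}`, `{MEM}`, `{CODE}`) mimics divvy's template variables, but the template itself is invented for illustration and is not one of divvy's shipped templates:

```shell
# A toy submission template using divvy-style {VARIABLE} placeholders.
template='#SBATCH --time={TIME}
#SBATCH --mem={MEM}
{CODE}'

# Populate the template the way a compute package would,
# substituting each placeholder with a concrete value.
script=${template//'{TIME}'/01:00:00}
script=${script//'{MEM}'/4G}
script=${script//'{CODE}'/'echo hello'}
printf '%s\n' "$script"
```

Divvy does this with real templates and package values from the `DIVCFG` file, then writes the populated script for the job submitter.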
File renamed without changes.
File renamed without changes.
1 change: 1 addition & 0 deletions docs/features.md
@@ -46,3 +46,4 @@
Looper uses a command-line interface so you have total power at your fingertips.
![html][html] **Beautiful linked result reports**

Looper automatically creates an internally linked, portable HTML report highlighting all results for your pipeline, for every pipeline.
For an example HTML report, see the [PEPATAC Gold Summary](https://pepatac.databio.org/en/latest/files/examples/gold/gold_summary.html).
