Merge pull request #71 from rki-mf1/dev
merge dev into main for v0.5.0
Krannich479 authored Aug 26, 2024
2 parents cd23ddc + 8975d28 commit a0e8c7e
Showing 14 changed files with 665 additions and 35 deletions.
31 changes: 27 additions & 4 deletions README.md
@@ -12,6 +12,7 @@
2. [Installation](#installation)
3. [Usage](#usage)
4. [Help](#help)
5. [Citation](#citation)


## System requirements:
@@ -87,13 +88,35 @@ This generates the following data within the `<project_root>/results/` directory
- a report (CSV) with statistics across all tested individuals

### Tuning the workflow parameters
Many internal settings can be adjusted at the nextflow level.
CIEVaD exposes the vast majority of the internal software tools' parameters for fine-tuning.
The parameters to adjust the workflows are listed on their respective help pages.
To inspect the help pages type `--help` after the script name.
Parameters can be adjusted via the CLI or within the _nextflow.config_ file.
To inspect the help pages type `--help` after the script name, e.g. `nextflow run hap.nf --help` for the hap.nf workflow.
Parameters can be adjusted via the CLI or directly within the _nextflow.config_ file.
Note that parameters provided via the CLI override parameters set in the config file.
More information about tuning crucial parameters, e.g. [read quality](https://github.com/rki-mf1/cievad/wiki/Parameterization-of-the-workflow) and [genome coverage](https://github.com/rki-mf1/cievad/wiki/FAQ---Troubleshooting), can be found in the Wiki.
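
The precedence rule above can be pictured as a simple dictionary merge. The following Python sketch is only an illustration: Nextflow resolves parameters internally, and the names `example_coverage` and `example_outdir` are hypothetical placeholders, not confirmed CIEVaD parameters.

```python
def resolve_params(config_params, cli_params):
    """Merge two parameter dicts; values given on the CLI win over config values."""
    resolved = dict(config_params)   # start from the config file defaults
    resolved.update(cli_params)      # CLI values replace matching keys
    return resolved


# Hypothetical parameter names, used purely for illustration.
config_params = {"example_coverage": 30, "example_outdir": "results"}
cli_params = {"example_coverage": 100}

resolved = resolve_params(config_params, cli_params)
print(resolved)
```

A parameter set only in the config keeps its value, while one set in both places takes the CLI value.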

## Help:

Visit the project [wiki](https://github.com/rki-mf1/cievad/wiki) for more information, help and FAQs. <br>
Visit the project [wiki](https://github.com/rki-mf1/cievad/wiki) for more detailed information on parameters, help and FAQs. <br>
Please file issues, bug reports and questions to the [issues](https://github.com/rki-mf1/cievad/issues) section.

## Citation:

We have a [preprint](https://www.biorxiv.org/content/10.1101/2024.06.21.600013v1) available for CIEVaD.
For the time being, if you use CIEVaD please cite
```
@article {Krannich2024.06.21.600013,
author = {Krannich, Thomas and Ternovoj, Dimitri and Paraskevopoulou, Sofia and Fuchs, Stephan},
title = {CIEVaD: a lightweight workflow collection for rapid and on demand deployment of end-to-end testing of genomic variant detection},
elocation-id = {2024.06.21.600013},
year = {2024},
doi = {10.1101/2024.06.21.600013},
publisher = {Cold Spring Harbor Laboratory},
abstract = {The identification of genomic variants has become a routine task in the thriving age of genome sequencing. Particularly small genomic variants of single or few nucleotides are routinely investigated for their impact on an organism{\textquoteright}s phenotype. Hence, precise and robust detection of the variants{\textquoteright} exact genomic location and change in nucleotide composition is vital in many biological applications. Although a plethora of methods exist for the many key steps of variant detection, thoroughly testing the detection process and evaluating its results is still a cumbersome procedure. In this work, we present a collection of trivial to apply and highly modifiable workflows to facilitate the generation of synthetic test data as well as to evaluate the accordance of a user-provided set of variants with the test data. Availability: The workflows are implemented in Nextflow and are freely available and open-source at https://github.com/rki-mf1/cievad under the GPL-3.0 license. Competing Interest Statement: The authors have declared no competing interest.},
URL = {https://www.biorxiv.org/content/early/2024/06/21/2024.06.21.600013},
eprint = {https://www.biorxiv.org/content/early/2024/06/21/2024.06.21.600013.full.pdf},
journal = {bioRxiv}
}
```


2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
'0.4.1'
'0.5.0'
58 changes: 58 additions & 0 deletions aux/Nstretches.py
@@ -0,0 +1,58 @@
import re
import matplotlib.pyplot as plt
import argparse

def parse_fasta(fasta_file):  # map each FASTA header to its concatenated sequence
    sequences = {}
    with open(fasta_file, 'r') as file:
        sequence_id = None
        sequence = ''
        for line in file:
            line = line.strip()
            if line.startswith('>'):
                if sequence_id is not None:
                    sequences[sequence_id] = sequence
                sequence_id = line[1:]  # Remove the '>' character
                sequence = ''
            else:
                sequence += line
        if sequence_id is not None:  # store the last record
            sequences[sequence_id] = sequence
    return sequences

def find_n_stretches(sequence):  # 0-based, half-open (start, end) spans of uppercase 'N' runs
    return [(m.start(), m.end()) for m in re.finditer(r'N+', sequence)]

def generate_histogram(n_stretches, output_file):  # note: max() raises ValueError if n_stretches is empty
    lengths = [end - start for start, end in n_stretches]
    plt.hist(lengths, bins=range(1, max(lengths) + 2), edgecolor='black')
    plt.title('Histogram of N Stretches')
    plt.xlabel('Length of N Stretches')
    plt.ylabel('Frequency')
    plt.savefig(output_file)
    #plt.show()

def write_bed_file(n_stretches, sequence_id, bed_file):  # opens in write mode, replacing any existing file
    with open(bed_file, 'w') as file:
        for start, end in n_stretches:
            file.write(f'{sequence_id}\t{start}\t{end}\n')

def process_fasta(fasta_file, histogram_output, bed_output):
    sequences = parse_fasta(fasta_file)
    for sequence_id, sequence in sequences.items():  # with several records, the BED file keeps only the last one
        n_stretches = find_n_stretches(sequence)
        generate_histogram(n_stretches, histogram_output)
        write_bed_file(n_stretches, sequence_id, bed_output)

def main():
    parser = argparse.ArgumentParser(description="Process a FASTA file to find 'N' stretches, generate a histogram, and output a BED file.")
    parser.add_argument('fasta_file', type=str, help="Input FASTA file")
    parser.add_argument('histogram_output', type=str, help="Output filename for the histogram (PNG format)")
    parser.add_argument('bed_output', type=str, help="Output filename for the BED file")

    args = parser.parse_args()

    process_fasta(args.fasta_file, args.histogram_output, args.bed_output)

if __name__ == "__main__":
    main()
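
The interval logic in `aux/Nstretches.py` can be tried on toy data without any files or plotting. The sketch below is not part of the commit: it reuses `find_n_stretches` as written above, while `bed_lines` and `histogram_bin_edges` are hypothetical helpers added here to show one possible way of collecting intervals across several records and of avoiding the `max()` call on an empty list.

```python
import re


def find_n_stretches(sequence):
    """Return 0-based, half-open (start, end) spans of consecutive 'N' characters."""
    return [(m.start(), m.end()) for m in re.finditer(r'N+', sequence)]


def bed_lines(sequences):
    """Hypothetical helper: one tab-separated BED line per N stretch, over all records."""
    lines = []
    for sequence_id, sequence in sequences.items():
        for start, end in find_n_stretches(sequence):
            lines.append(f'{sequence_id}\t{start}\t{end}')
    return lines


def histogram_bin_edges(lengths):
    """Hypothetical helper: integer bin edges as used by the script, or [] if no stretches."""
    if not lengths:
        return []
    return list(range(1, max(lengths) + 2))


# Toy records; seqB contains no 'N' at all.
sequences = {"seqA": "ACNNNGTN", "seqB": "ACGT", "seqC": "NNAC"}

lines = bed_lines(sequences)
lengths = [end - start
           for seq in sequences.values()
           for start, end in find_n_stretches(seq)]

print("\n".join(lines))
print(lengths, histogram_bin_edges(lengths))
```

Because `re.finditer` reports match spans as 0-based, half-open intervals, the `(start, end)` pairs can be written to BED without any offset correction. Note that the pattern `N+` matches uppercase `N` only, so soft-masked (lowercase) `n` runs are not reported.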