BUG: [config.yaml error] #1018

bioinfolabmu · 2023-11-26T16:48:22Z

Describe the bug
My raw paired-end FASTQ files are named like this:

Sample1_Treat2_Replicate1_1.fq.gz
Sample1_Treat2_Replicate1_2.fq.gz

In my configuration file, I have the following entries:

fqsuffix: fq
fqext1: _1
fqext2: _2

Consequently, the program was looking for a file named Sample1_Treat2_Replicate1__1.fq.gz (two underscores instead of one).
Then I modified the configuration file to:

fqsuffix: fq
fqext1: 1
fqext2: 2

Next, I got this error.

Error validating config file.
ValidationError: 1 is not of type 'string'

Failed validating 'type' in schema['properties']['fqext1']:
OrderedDict([('description',
'filename suffix when handling paired-end data, '
'describing the forward read'),
('default', 'R1'),
('type', 'string')])

On instance['fqext1']:
1

bioinfolabmu · 2023-11-26T16:49:33Z

Alright, I think I know what the problem is. The values have to be quoted so that they are read as strings:

fqsuffix: fq
fqext1: '1'
fqext2: '2'

This modification has solved my configuration file problem.
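A minimal sketch of the two failures described above, for illustration only. It assumes, based on the filename the program searched for, that the read extension is joined to the sample name with an underscore. `check_fqext` and `build_fastq_name` are hypothetical helpers, not seq2science functions.

```python
def check_fqext(value):
    """Mimic the schema rule from the error above: fqext must be a string."""
    if not isinstance(value, str):
        raise TypeError(f"{value!r} is not of type 'string'")
    return value


def build_fastq_name(sample, fqext, fqsuffix):
    """Join sample and read extension with an underscore (assumed behavior)."""
    return f"{sample}_{check_fqext(fqext)}.{fqsuffix}.gz"


sample = "Sample1_Treat2_Replicate1"

# fqext1: _1 -> the underscore ends up in the name twice
print(build_fastq_name(sample, "_1", "fq"))  # Sample1_Treat2_Replicate1__1.fq.gz

# fqext1: '1' -> quoted, so it stays a string and the name matches the file
print(build_fastq_name(sample, "1", "fq"))   # Sample1_Treat2_Replicate1_1.fq.gz

# fqext1: 1 -> unquoted, so YAML reads an integer and the type check fails
try:
    build_fastq_name(sample, 1, "fq")
except TypeError as err:
    print(err)  # 1 is not of type 'string'
```

Quoting the value in config.yaml is what turns the third case into the second one.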

bioinfolabmu · 2023-11-26T17:00:45Z

A related question about the samples.tsv file.

On the second line of this file, I have the following tab-delimited column names:

sample assembly dev_stage treatment biological_replicates

My question is: are these column names fixed keywords in your tool? For example, what happens if I use "developmental_stage" instead of "dev_stage", or "genome" instead of "assembly"?

I did not find the relevant requirements on your documentation web page. I am sorry if I missed something there.

Maarten-vd-Sande · 2023-11-26T20:03:20Z

Great that you fixed it. You can have any number of columns, and name them whatever you want. There are, however, certain column names that have a specific meaning, such as sample, assembly, biological_replicates, and descriptive_name. These columns are used internally by seq2science. Moreover, you can use your own column names in the differential peak/gene calling step: https://vanheeringen-lab.github.io/seq2science/content/DESeq2.html#contrast-in-the-samples-tsv

Specifically:

Renaming "dev_stage" to "developmental_stage" does not change anything in how seq2science runs, as those columns are ignored (except when they are used to define contrasts).
Renaming "assembly" to "genome" means seq2science won't work, because the assembly column is required. This column specifies which assembly is used.

bioinfolabmu · 2023-11-26T21:55:19Z

Thank you for the quick response.

(1) So, for the DEG analysis, the column names in samples.tsv should be consistent with your requirements. In fact, "dev_stage" should be "stages". That way, we can make sure we get proper results from seq2science. Am I correct?

(2) Column names such as "sample", "assembly", "stages", "treatments", "biological_replicates", "technical_replicates" and "condition" are applicable to many researchers' data analysis needs. That should be sufficient. I noticed that "descriptive_name" has to be unique across the rows of samples.tsv. Am I right? I am curious: how does seq2science use "descriptive_name" internally?

Maarten-vd-Sande · 2023-11-27T10:46:18Z

(1) I don't understand the question. For the DEG analysis you can use any column(s) in the samples file. You can use dev_stage or stages. Just make sure that your contrast specification in the config.yaml reflects the correct column name.

If you use the column dev_stage, the contrast is dev_stage_one_two; if you use stages, it is stages_one_two. It can be any column you want. You can even combine multiple columns for batch effect correction: https://vanheeringen-lab.github.io/seq2science/content/DESeq2.html#batch-effect-correction

(2) descriptive_name is one of the special columns that seq2science uses internally, just like, for example, sample, assembly, and biological_replicates. It is used for the count table and for the final MultiQC report.
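The contrast naming from point (1) can be sketched as follows. This is an illustration only: `contrast_name` is a hypothetical helper, and the column_group_group layout is taken from the examples in this thread (dev_stage_one_two). Which of the two groups acts as the reference is not stated here, so check the DESeq2 documentation linked above.

```python
def contrast_name(column, first, second, header):
    """Build a contrast string such as 'dev_stage_one_two'.

    The layout follows the examples in this thread; the meaning of the
    group order is an assumption and is not checked here.
    """
    if column not in header:
        raise ValueError(f"column {column!r} is not in samples.tsv")
    return f"{column}_{first}_{second}"


header = ["sample", "assembly", "dev_stage", "treatment", "biological_replicates"]

# The contrast must use the column name exactly as it appears in samples.tsv.
print(contrast_name("dev_stage", "one", "two", header))  # dev_stage_one_two
```

Renaming the column to stages in samples.tsv would require the contrast in config.yaml to start with stages as well.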

bioinfolabmu · 2023-11-28T14:01:54Z

Thanks. My question was: should we use 'stage' instead of 'dev_stage' or 'developmental_stages'? You said that it does not matter, because they are not keywords used in seq2science. My guess is that 'condition' is also not a keyword used by seq2science, so we can use variations of it, such as 'conditions' or 'my_conditions'. Right?

bioinfolabmu · 2023-11-28T14:04:18Z

When I ran my own data through alignment with STAR, I encountered a bug. I am debugging it now to see what happens. I noticed that Salmon, as the quantifier, is not affected at all: it generates its own data. This suggests that Salmon uses its own alignment step to do the quantification. My next question: after I fix the bug with running STAR, how can I feed the STAR alignment results to Salmon for quantification?

Maarten-vd-Sande · 2023-11-28T14:04:39Z

Yes you are right! I guess that's not entirely clear from the docs.

sample, assembly, descriptive_name, biological_replicates, and technical_replicates are column names used by seq2science internally. Any other column name is basically ignored, unless you use it for DESeq2
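As a rough illustration of the rule above, the header of a samples.tsv can be split into special and free-form columns. The helper `classify_columns` is hypothetical and not part of seq2science; the set of special names is copied from this thread, and only assembly is treated as required because that is the only requirement stated here.

```python
import csv
import io

# Column names that, according to this thread, seq2science uses internally.
SPECIAL = {"sample", "assembly", "descriptive_name",
           "biological_replicates", "technical_replicates"}
# The thread only states explicitly that 'assembly' is required.
REQUIRED = {"assembly"}


def classify_columns(tsv_text):
    """Split the header of a samples.tsv into special and free-form columns."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    missing = sorted(REQUIRED - set(header))
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    special = [c for c in header if c in SPECIAL]
    free = [c for c in header if c not in SPECIAL]
    return special, free


tsv = "sample\tassembly\tdev_stage\ttreatment\tbiological_replicates\n"
special, free = classify_columns(tsv)
print(special)  # ['sample', 'assembly', 'biological_replicates']
print(free)     # ['dev_stage', 'treatment']
```

The free-form columns only matter if a DESeq2 contrast refers to them.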

bioinfolabmu · 2023-11-28T14:10:36Z

Here is what is in my config.yaml:

aligner:
  star:
    align: --quantMode GeneCounts --outSAMtype BAM

seq2science gives me a fatal error in the log file, saying "Duplicate parameter". I am trying to solve this problem.

Maarten-vd-Sande · 2023-11-28T14:14:39Z

Yeah, that's perhaps unclear on our side (again). Almost all rules have sensible defaults, so you don't have to tune them. You could just say: aligner: star.

We force STAR to output a BAM, as downstream steps need a BAM as its output, so we always pass --outSAMtype BAM_Unsorted. Adding --outSAMtype yourself therefore gives a duplicate parameter.

see: https://vanheeringen-lab.github.io/seq2science/content/all_rules.html#star-align
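A small sketch of how such a clash could be spotted before running the workflow. It is an illustration only: `duplicated_flags` is a hypothetical helper, and the set of forced flags contains just the one flag named in the reply above, not the full list from the rule documentation.

```python
import shlex

# Flags the pipeline is said to set itself (taken from the reply above).
FORCED_FLAGS = {"--outSAMtype"}


def duplicated_flags(user_params):
    """Return user-supplied flags that the pipeline already sets."""
    tokens = shlex.split(user_params)
    return sorted({t for t in tokens if t in FORCED_FLAGS})


# The 'align' value from the config.yaml above repeats --outSAMtype.
print(duplicated_flags("--quantMode GeneCounts --outSAMtype BAM"))  # ['--outSAMtype']

# Without it, nothing clashes with the forced flag.
print(duplicated_flags("--quantMode GeneCounts"))  # []
```

Dropping the repeated flag from the align value, or using the plain aligner: star form, avoids the duplicate.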

Maarten-vd-Sande · 2023-11-28T14:15:27Z

Also, I'm not sure whether the downstream steps work when you change --quantMode.

bioinfolabmu · 2023-11-28T14:22:12Z

I remember reading somewhere that you do not support 2-pass STAR alignment yet. What is your recommendation if we want to do the two passes and then come back to seq2science?

bioinfolabmu · 2023-11-28T14:24:21Z

I also encountered a problem running Trim Galore, but no problem with fastp. My guess is that it is a similar configuration problem with default parameters. I will debug that later.

Maarten-vd-Sande · 2023-11-28T16:05:29Z

I'm not familiar with 2-pass alignment in STAR, so I can't comment on that... What does it do? What changes? The sample fastqs, the genome assembly, or the index?

siebrenf · 2023-11-29T09:39:22Z

Hey bioinfolabmu,

I'm trying to read your questions but I get a bit confused. Please keep to one question per git Issue (I really don't mind if you open multiple 👼 )

I'll open some new issues for each question here that we haven't answered yet, and then try to answer them there!

bioinfolabmu added the bug label on Nov 26, 2023

This was referenced on Nov 29, 2023:

Combining STAR and Salmon #1019 (open)
STAR 2-pass alignment #1020 (open)
