BUG: [config.yaml error] #1018
Alright, I think I know what the problem is:
fqsuffix: fq
This modification solved my configuration file problem.
A related question about the samples.tsv file. In the second line of this file, I have the following tab-delimited column names:
sample assembly dev_stage treatment biological_replicates
My question is: are these column names fixed keywords in your tool? For example, what happens if I use "developmental_stage" instead of "dev_stage", or "genome" instead of "assembly"? I did not find the relevant requirements on your documentation web page. I am sorry if I missed something in your documentation.
Great that you fixed it. You can have any number of columns and name them whatever you want. There are, however, certain column names that have a specific meaning. Specifically:
Thank you for the quick response. (1) So, for the DEG analysis, the names in samples.tsv should be consistent with your requirements. Actually, "dev_stage" should be "stages". That way, we can make sure seq2science gives proper results. Am I correct? (2) Column names such as "sample", "assembly", "stages", "treatments", "biological_replicates", "technical_replicates" and "condition" cover many researchers' data analysis needs. That should be sufficient. I noticed that "descriptive_name" must be unique across rows in samples.tsv. Am I right? I am curious how seq2science uses "descriptive_name" internally.
(1) I don't understand the question. For the DEG analysis you can use any column(s) in the samples file; you can use dev_stage or stages. Just make sure that your contrast specification in the config.yaml refers to the correct column name. If you use the column dev_stage:
(2) descriptive_name is one of the special columns that seq2science uses internally, just like, for example, sample, assembly, and biological_replicates. It is used for the count table and for the final MultiQC report.
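As a rough sketch of both points in the answer above (free-form column names, plus a contrast that must name a real column), the code below reads a samples.tsv header and resolves which column a contrast refers to. Several things are assumptions: `SPECIAL_COLUMNS` holds only the special names mentioned in this thread, not seq2science's complete list; contrasts are assumed to look like `column_target_reference` (for example `dev_stage_stage2_stage1`), so check the seq2science docs for the exact syntax; the header is assumed to be the first row; and `read_header`, `free_form_columns` and `contrast_column` are hypothetical helpers, not seq2science functions.

```python
import csv
import io

# Assumption: only the special columns named in this thread;
# seq2science's full list may be longer.
SPECIAL_COLUMNS = {"sample", "assembly", "biological_replicates", "descriptive_name"}


def read_header(tsv_text: str) -> list[str]:
    """Return the column names from the first row of a tab-delimited table."""
    return next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))


def free_form_columns(header: list[str]) -> list[str]:
    """Columns without a special meaning; these can be named anything."""
    return [col for col in header if col not in SPECIAL_COLUMNS]


def contrast_column(contrast: str, header: list[str]) -> str:
    """Find the column used by a contrast in the ASSUMED form
    'column_target_reference'.

    The longest matching column name wins, so column names that contain
    underscores (e.g. dev_stage) are still recognised.
    """
    matches = [col for col in header if contrast.startswith(col + "_")]
    if not matches:
        raise ValueError(f"no samples.tsv column matches contrast {contrast!r}")
    return max(matches, key=len)


if __name__ == "__main__":
    header = read_header("sample\tassembly\tdev_stage\ttreatment\tbiological_replicates\n")
    print(free_form_columns(header))  # ['dev_stage', 'treatment']
    print(contrast_column("dev_stage_stage2_stage1", header))  # dev_stage
```

Matching the longest column name matters because `dev_stage` itself contains an underscore; naively splitting the contrast on its first `_` would look for a column called `dev`.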
Thanks. My question was: "should we use 'stage' instead of 'dev_stage' or 'developmental_stages'?" You said that it does not matter, because they are not keywords used by seq2science. My guess is that 'condition' is also not a keyword used by seq2science, so we can use variations of it, such as 'conditions' or 'my_conditions'. Right?
When I ran alignment on my own data with STAR, I encountered a bug. I am debugging now to see what happens. I noticed that Salmon, as the quantifier, is not affected at all; it generates its own output. This means that Salmon uses its own alignment to perform the quantification by itself. My next question is: after I fix the STAR bug, how can I feed the STAR alignment results to Salmon for quantification?
Yes, you are right! I guess that's not entirely clear from the docs.
Here is what is in my config.yaml:
aligner:
seq2science gives me a fatal error in the log file, saying "Duplicate parameter". I am trying to solve this problem.
Yeah, that's perhaps unclear on our side (again). Almost all rules have sensible defaults, so you don't have to tune them. So you could just say:
We force STAR to output a BAM, since we need a BAM as its output, so we always pass --outSAMtype BAM Unsorted. Setting it again yourself gives a duplicate parameter; see: https://vanheeringen-lab.github.io/seq2science/content/all_rules.html#star-align
Also, I'm not sure whether the downstream steps work when you change --quantMode.
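The duplicate-parameter error can be reproduced in miniature. If a rule always passes `--outSAMtype BAM Unsorted` and the user's own STAR parameters repeat that flag, the combined command line contains it twice. The sketch below is hypothetical (`find_duplicate_flags` is not seq2science code, and the user parameter string is made up); it only assumes that STAR options begin with `--`.

```python
import shlex
from collections import Counter


def find_duplicate_flags(forced: str, user: str) -> list[str]:
    """Return '--flags' that occur more than once after the forced rule
    parameters and the user's parameters are concatenated."""
    tokens = shlex.split(forced) + shlex.split(user)
    counts = Counter(tok for tok in tokens if tok.startswith("--"))
    return sorted(flag for flag, n in counts.items() if n > 1)


if __name__ == "__main__":
    forced = "--outSAMtype BAM Unsorted"  # always set by the rule, per this thread
    user = "--outSAMtype BAM SortedByCoordinate --quantMode GeneCounts"  # made-up example
    print(find_duplicate_flags(forced, user))  # ['--outSAMtype']
```

Under this reading, leaving `--outSAMtype` out of your own parameters should avoid the clash while other STAR options can still be added.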
I remember reading somewhere that you do not support 2-pass STAR alignment yet. What do you recommend if we want to do the two passes and then come back to seq2science?
I also encountered a problem running Trim Galore, but not with fastp. My guess is that it is a similar configuration problem with default parameters (or their absence). I will debug that later.
I'm not familiar with 2-pass alignment in STAR, so I can't comment on that... What does it do? What changes: the sample FASTQs, the genome assembly, or the index?
Hey bioinfolabmu, I'm trying to read your questions but I get a bit confused. Please keep to one question per GitHub issue (I really don't mind if you open multiple 👼). I'll open new issues for each question here that we haven't answered yet, and then try to answer them there!
Describe the bug
My raw paired-end FASTQ files are named like this:
Sample1_Treat2_Replicate1_1.fq.gz
Sample1_Treat2_Replicate1_2.fq.gz
In my configuration file, I have the following entries:
fqsuffix: fq
fqext1: _1
fqext2: _2
Consequently, the program was looking for a file named Sample1_Treat2_Replicate1__1.fq.gz (two underscores instead of one).
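The doubled underscore follows from how the expected filename is assembled. Judging from the path the pipeline looked for, the sample name and `fqext1` are joined with an underscore and `.{fqsuffix}.gz` is appended; the sketch below reproduces that inferred pattern. `expected_fastq` is a hypothetical helper, not seq2science's actual code.

```python
def expected_fastq(sample: str, fqext: str, fqsuffix: str) -> str:
    """Rebuild the FASTQ name the pipeline appears to expect.

    Assumption: the pattern '{sample}_{fqext}.{fqsuffix}.gz' is inferred
    from the filename reported in this issue.
    """
    return f"{sample}_{fqext}.{fqsuffix}.gz"


if __name__ == "__main__":
    sample = "Sample1_Treat2_Replicate1"
    print(expected_fastq(sample, "_1", "fq"))  # Sample1_Treat2_Replicate1__1.fq.gz
    print(expected_fastq(sample, "1", "fq"))   # Sample1_Treat2_Replicate1_1.fq.gz
```

If that pattern holds, the extension should be given without its leading underscore.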
Then, I modified the configuration file as follows:
fqsuffix: fq
fqext1: 1
fqext2: 2
Next, I got this error:
Error validating config file.
ValidationError: 1 is not of type 'string'
Failed validating 'type' in schema['properties']['fqext1']:
OrderedDict([('description',
'filename suffix when handling paired-end data, '
'describing the forward read'),
('default', 'R1'),
('type', 'string')])
On instance['fqext1']:
1
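The validation error arises because YAML reads an unquoted `1` as an integer, while the schema requires `fqext1` to be a string; quoting the value (`fqext1: "1"`) keeps it a string. The sketch below imitates only that one `'type': 'string'` check on already-parsed config values. `check_fqext` is hypothetical; the real pipeline validates against its full schema.

```python
def check_fqext(config: dict) -> list[str]:
    """Return error messages for fqext1/fqext2 values that are not strings,
    mimicking the schema's 'type': 'string' rule."""
    errors = []
    for key in ("fqext1", "fqext2"):
        if key in config and not isinstance(config[key], str):
            errors.append(f"{config[key]!r} is not of type 'string' ({key})")
    return errors


if __name__ == "__main__":
    # How a YAML loader parses `fqext1: 1` (int) versus `fqext1: "1"` (str).
    unquoted = {"fqsuffix": "fq", "fqext1": 1, "fqext2": 2}
    quoted = {"fqsuffix": "fq", "fqext1": "1", "fqext2": "2"}
    print(check_fqext(unquoted))  # two errors, one per key
    print(check_fqext(quoted))    # []
```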