samplesheet check for distributed run #42
Comments
Issue seems to have been resolved with different AWS settings.
@AnotherSimon do you recall which settings you changed to get this to work? We are encountering issues running this on AWS Batch and yours seems to be the closest.
Hi Timothy, if I remember this bug correctly, it was related to incorrect AWS account settings, i.e. the service role not having the proper permissions to set up EC2 instances, access the buckets and submit jobs to the AWS Batch queue, or something like that. When in doubt, delete the cached jobs, reconfigure your environment from scratch, or try a completely unrelated pipeline to rule out permissions issues. I hope you're not in too deep to consider throwing out the results you already have. Fair warning: letting the work dir sit around for stalled jobs will eat up a lot of S3 budget!
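One way to rule permissions in or out, as suggested above, is to ask S3 directly, from the machine that launches Nextflow, whether one of the FASTQ objects listed in the samplesheet is visible with that machine's current credentials. This is a minimal sketch, assuming boto3 is available: `head_object(Bucket=..., Key=...)` is the boto3 S3 client call, while the helper names `parse_s3_uri` and `s3_object_visible` are hypothetical. The client is passed in as an argument so the logic can be exercised without AWS access.

```python
def parse_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"s3:// URI needs both a bucket and a key: {uri!r}")
    return bucket, key


def s3_object_visible(client, uri):
    """Return (visible, error_code) for one object.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    A HEAD request on a missing key usually reports "404"; a role or
    bucket-policy problem usually reports "403".
    """
    bucket, key = parse_s3_uri(uri)
    try:
        client.head_object(Bucket=bucket, Key=key)
        return True, None
    except Exception as exc:
        # botocore's ClientError carries the error code in exc.response;
        # anything else (e.g. missing credentials) is reported by class name.
        response = getattr(exc, "response", None) or {}
        code = response.get("Error", {}).get("Code")
        return False, code or type(exc).__name__
```

Usage: `s3_object_visible(boto3.client("s3"), "s3://myStudy/raw/myStudy-0002.trim.R1.fq.gz")`. A "403" points at the role or bucket policy rather than the path; note that without `s3:ListBucket` permission S3 also answers 403 for keys that do not exist, so a 403 alone does not prove the object is there.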
@AnotherSimon thank you so much for the quick reply! It looks like our error had a different root cause, but we really appreciate the additional information. For anyone else who might stumble upon this: ours turned out to be a different issue. We narrowed it down to the way the AWS CLI was mounted; it was installed in the underlying AMI in …
@tgjohnst we thank you for your resolution note.
Description of the bug
After a successful test run of RNAvar with the local/docker config, I am trying to set up a run on AWS Batch to scale up to larger sample numbers.
I've moved all inputs and outputs to an S3 bucket, and the first few processes, such as SAMTOOLS_FAIDX, run to completion. The samplesheet check seems to fail, however. I'm not well versed enough in DSL2 to know for sure, but it seems that checking the existence of a file might not be possible, since files are only pulled when a process needs them: "SampleA.fastq.gz" will only be downloaded to the EC2 instance that maps SampleA, whereas SAMPLESHEET_CHECK seems to want to create a channel from a file that is accessible on the machine running the process.
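In the .nextflow.log excerpt included in this report, the SAMPLESHEET_CHECK task itself finishes on AWS Batch with exit 0, and the "Read 1 FastQ file does not exist!" error is raised afterwards by the main Nextflow process ([Actor Thread 70] ERROR nextflow.Nextflow). So the existence test appears to run on the machine driving the run, not inside a Batch job. Nextflow's own `file()` helper can resolve s3:// paths through the nf-amazon plugin, so when such a check fails on a path that does exist, the credentials or IAM role of that machine are a likely suspect. A plain local filesystem test, by contrast, can never see an s3:// object. The Python sketch below is a hypothetical illustration of that distinction, not the pipeline's own code: `classify_fastq_paths` is an invented helper, and it assumes a CSV samplesheet with `fastq_1`/`fastq_2` columns (the nf-core naming convention).

```python
import csv
import io
import os


def classify_fastq_paths(samplesheet_text, columns=("fastq_1", "fastq_2")):
    """Hypothetical helper: split samplesheet FASTQ paths into
    (local paths missing on this machine, s3:// URIs needing an S3-aware check).
    """
    missing_local, remote = [], []
    for row in csv.DictReader(io.StringIO(samplesheet_text)):
        for column in columns:
            path = (row.get(column) or "").strip()
            if not path:
                continue  # e.g. fastq_2 left empty for a single-end sample
            if path.startswith("s3://"):
                # os.path.exists() would treat this string as a relative
                # local path and report it missing, so defer it instead.
                remote.append(path)
            elif not os.path.exists(path):
                missing_local.append(path)
    return missing_local, remote
```

The s3:// URIs returned in `remote` still need a check that actually talks to S3 with credentials that can read the bucket; only local paths are safe to test with `os.path.exists()`.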
Command used and terminal output
Relevant files
$ cat nextflow.config
plugins {
    id 'nf-amazon'
}
process {
    executor = 'awsbatch'
    queue = 'arn:aws:batch:us***:job-queue/***-queue'
}
aws {
    batch {
        // NOTE: this setting is only required if the AWS CLI tool is installed in a custom AMI
        cliPath = '/home/ec2-user/miniconda/bin/aws'
    }
    region = 'us***'
}
$ tail .nextflow.log
Apr-29 19:11:21.317 [main] DEBUG nextflow.extension.CH - Bridging dataflow queue=DataflowQueue(queue=[])
Apr-29 19:11:21.318 [main] DEBUG nextflow.extension.CH - Bridging dataflow queue=DataflowQueue(queue=[])
Apr-29 19:11:21.318 [main] DEBUG nextflow.extension.CH - Bridging dataflow queue=DataflowQueue(queue=[])
Apr-29 19:11:21.318 [main] DEBUG nextflow.Session - Ignite dataflow network (86)
Apr-29 19:11:21.337 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:PREPARE_GENOME:GTF2BED
Apr-29 19:11:21.342 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:PREPARE_GENOME:SAMTOOLS_FAIDX
Apr-29 19:11:21.343 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY
Apr-29 19:11:21.344 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:INPUT_CHECK:SAMPLESHEET_CHECK
Apr-29 19:11:21.346 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:CAT_FASTQ
Apr-29 19:11:21.346 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:FASTQC
Apr-29 19:11:21.347 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:GATK4_BEDTOINTERVALLIST
Apr-29 19:11:21.353 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:GATK4_INTERVALLISTTOOLS
Apr-29 19:11:21.368 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:ALIGN_STAR:STAR_ALIGN
Apr-29 19:11:21.370 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:ALIGN_STAR:BAM_SORT_SAMTOOLS:SAMTOOLS_SORT
Apr-29 19:11:21.371 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:ALIGN_STAR:BAM_SORT_SAMTOOLS:SAMTOOLS_INDEX
Apr-29 19:11:21.372 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:ALIGN_STAR:BAM_SORT_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS
Apr-29 19:11:21.378 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:ALIGN_STAR:BAM_SORT_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT
Apr-29 19:11:21.378 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:ALIGN_STAR:BAM_SORT_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS
Apr-29 19:11:21.378 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:MARKDUPLICATES:PICARD_MARKDUPLICATES
Apr-29 19:11:21.379 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:MARKDUPLICATES:SAMTOOLS_INDEX
Apr-29 19:11:21.379 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:MARKDUPLICATES:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS
Apr-29 19:11:21.380 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:MARKDUPLICATES:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT
Apr-29 19:11:21.381 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:MARKDUPLICATES:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS
Apr-29 19:11:21.382 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:SPLITNCIGAR:GATK4_SPLITNCIGAR:GATK4_SPLITNCIGARREADS
Apr-29 19:11:21.385 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:SPLITNCIGAR:GATK4_SPLITNCIGAR:SAMTOOLS_INDEX
Apr-29 19:11:21.386 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:SPLITNCIGAR:SAMTOOLS_MERGE
Apr-29 19:11:21.386 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:SPLITNCIGAR:SAMTOOLS_INDEX
Apr-29 19:11:21.387 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:GATK4_BASERECALIBRATOR
Apr-29 19:11:21.389 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:RECALIBRATE:APPLYBQSR
Apr-29 19:11:21.389 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:RECALIBRATE:SAMTOOLS_INDEX
Apr-29 19:11:21.389 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:RECALIBRATE:SAMTOOLS_STATS
Apr-29 19:11:21.389 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:GATK4_HAPLOTYPECALLER
Apr-29 19:11:21.390 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:GATK4_MERGEVCFS
Apr-29 19:11:21.390 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:GATK4_INDEXFEATUREFILE
Apr-29 19:11:21.390 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:GATK4_VARIANTFILTRATION
Apr-29 19:11:21.396 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:ANNOTATE:ENSEMBLVEP_ANNOTATE:ENSEMBLVEP
Apr-29 19:11:21.396 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:ANNOTATE:ENSEMBLVEP_ANNOTATE:TABIX_BGZIPTABIX
Apr-29 19:11:21.397 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:CUSTOM_DUMPSOFTWAREVERSIONS
Apr-29 19:11:21.399 [main] DEBUG nextflow.processor.TaskProcessor - Starting process > NFCORE_RNAVAR:RNAVAR:MULTIQC
Apr-29 19:11:21.399 [main] DEBUG nextflow.script.ScriptRunner - > Await termination
Apr-29 19:11:21.399 [main] DEBUG nextflow.Session - Session await
Apr-29 19:11:21.733 [Actor Thread 59] DEBUG nextflow.sort.BigSort - Sort completed -- entries: 1; slices: 1; internal sort time: 0.001 s; external sort time: 0.016 s; total time: 0.017 s
Apr-29 19:11:21.779 [Actor Thread 59] DEBUG nextflow.file.FileCollector - Saved collect-files list to: /tmp/27dcb8f6261b2ab88f8b4a2c393d7497.collect-file
Apr-29 19:11:21.802 [Actor Thread 59] DEBUG nextflow.file.FileCollector - Deleting file collector temp dir: /tmp/nxf-11758180432553332609
Apr-29 19:11:21.809 [Actor Thread 57] DEBUG nextflow.util.CacheHelper - Hash asset file sha-256: /home/ubuntu/.nextflow/assets/nf-core/rnavar/bin/check_samplesheet.py
Apr-29 19:11:22.436 [AWSBatch-executor-1] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] [AWS BATCH] Found job definition name=nf-quay-io-biocontainers-samtools-1-15-1--h1170115_0:1; container=quay.io/biocontainers/samtools:1.15.1--h1170115_0
Apr-29 19:11:22.512 [AWSBatch-executor-3] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] [AWS BATCH] Found job definition name=nf-quay-io-biocontainers-r-base-3-5-0:1; container=quay.io/biocontainers/r-base:3.5.0
Apr-29 19:11:22.598 [AWSBatch-executor-1] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] submitted > job=efc6cdce-f02c-4309-8fdc-ed5ed0587df0; work-dir=s3://myStudy/rnaver_workDir/ec/6daa393556ec2e0b8db50f36e9c6cc
Apr-29 19:11:22.599 [AWSBatch-executor-3] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] submitted > job=a9381608-c7f2-4020-8ee1-7259e066aa50; work-dir=s3://myStudy/rnaver_workDir/4f/535004f264dfb1ded556e9a55a019c
Apr-29 19:11:22.601 [AWSBatch-executor-3] INFO nextflow.Session - [4f/535004] Submitted process > NFCORE_RNAVAR:RNAVAR:PREPARE_GENOME:GTF2BED (Homo_sapiens.GRCh38.99.gtf)
Apr-29 19:11:22.605 [AWSBatch-executor-1] INFO nextflow.Session - [ec/6daa39] Submitted process > NFCORE_RNAVAR:RNAVAR:PREPARE_GENOME:SAMTOOLS_FAIDX (Homo_sapiens.GRCh38.dna.primary_assembly.fa)
Apr-29 19:11:22.619 [AWSBatch-executor-2] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] [AWS BATCH] Found job definition name=nf-quay-io-biocontainers-gatk4-4-2-5-0--hdfd78af_0:1; container=quay.io/biocontainers/gatk4:4.2.5.0--hdfd78af_0
Apr-29 19:11:22.711 [AWSBatch-executor-4] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] [AWS BATCH] Found job definition name=nf-quay-io-biocontainers-python-3-9--1:1; container=quay.io/biocontainers/python:3.9--1
Apr-29 19:11:22.730 [AWSBatch-executor-2] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] submitted > job=dc18d624-569f-406a-8abc-52f50371667c; work-dir=s3://myStudy/rnaver_workDir/ca/5493f2e77c41f2ceee42021857d968
Apr-29 19:11:22.730 [AWSBatch-executor-2] INFO nextflow.Session - [ca/5493f2] Submitted process > NFCORE_RNAVAR:RNAVAR:PREPARE_GENOME:GATK4_CREATESEQUENCEDICTIONARY (Homo_sapiens.GRCh38.dna.primary_assembly.fa)
Apr-29 19:11:22.790 [AWSBatch-executor-4] DEBUG n.c.aws.batch.AwsBatchTaskHandler - [AWS BATCH] submitted > job=70e9e6e0-e31c-4cca-98f4-ada84358c74d; work-dir=s3://myStudy/rnaver_workDir/fb/91fc1ead89e38ff0805706d89647c8
Apr-29 19:11:22.791 [AWSBatch-executor-4] INFO nextflow.Session - [fb/91fc1e] Submitted process > NFCORE_RNAVAR:RNAVAR:INPUT_CHECK:SAMPLESHEET_CHECK (test_RNAvar_input_S3paths.csv)
Apr-29 19:15:59.942 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 4; name: NFCORE_RNAVAR:RNAVAR:PREPARE_GENOME:SAMTOOLS_FAIDX (Homo_sapiens.GRCh38.dna.primary_assembly.fa); status: COMPLETED; exit: 0; error: -; workDir: s3://myStudy/rnaver_workDir/ec/6daa393556ec2e0b8db50f36e9c6cc]
Apr-29 19:15:59.950 [Task monitor] DEBUG nextflow.file.FileHelper - Path matcher not defined by 'S3FileSystem' file system -- using default default strategy
Apr-29 19:16:00.227 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 3; name: NFCORE_RNAVAR:RNAVAR:INPUT_CHECK:SAMPLESHEET_CHECK (test_RNAvar_input_S3paths.csv); status: COMPLETED; exit: 0; error: -; workDir: s3://myStudy/rnaver_workDir/fb/91fc1ead89e38ff0805706d89647c8]
Apr-29 19:16:00.430 [Actor Thread 70] ERROR nextflow.Nextflow - ERROR: Please check input samplesheet -> Read 1 FastQ file does not exist!
s3://myStudy/raw/myStudy-0002.trim.R1.fq.gz
Apr-29 19:16:00.461 [Task monitor] DEBUG n.util.BlockingThreadExecutorFactory - Thread pool name=FileTransfer; maxThreads=6; maxQueueSize=18; keepAlive=1m
System information
Using an EC2 instance (Ubuntu 18.04.5 LTS, Bionic Beaver) to control a run on AWS Batch.
N E X T F L O W ~ version 21.10.6
nf-core/rnavar
- revision: 8e8f79a [dev]