Unable to build the software #1

ShuangQiuac · 2019-04-17T15:48:49Z

Hi, I downloaded the software and used the following commands to build it:
mkdir build
cd build
cmake ..
make
But it returns the following error:
CMake Error: File /ext/kmerind/src/config/config.hpp.in does not exist.

Can you suggest the way to build and use it? Thanks!

tcpan · 2019-04-18T04:12:18Z

Hi, Shuang,
I need to update the instructions. There is a step missing. You need to run the following from the source directory:

git submodule update --init --recursive --progress

That'll download the dependencies. Then do cmake and make.

Please let me know if you run into further issues.

Thanks!
Tony Pan

ShuangQiuac · 2019-04-18T06:39:45Z

Dear Tony Pan,
Thanks for the reply! I can build the program by executing the command "git submodule update --init --recursive" before cmake and make. It generates "clear_cache" and "sys_probe" in the bin directory. Can you please provide further examples and instructions on how to run the program and how to specify its parameters? Thanks!

Best regards.
Shuang

tcpan · 2019-04-18T15:02:32Z

Hi, Shuang,
It looks like I forgot to turn on the "Build Example Applications" option by default.

Can you try to use ccmake instead of cmake:

ccmake src_dir

which will present an interactive, text-based interface for configuring the project. The first item should be BUILD_EXAMPLE_APPLICATIONS. Change that to "ON", then press c to configure and g to "generate and exit".

Now this is going to create a somewhat large number of targets (what I needed during my evaluation). You can see a list of the build targets with

cmake --build . --target help

At this point if you run "make" (or "make -j 4"), everything will be built and it might take a while, or you can run "make {target}", where {target} is one of the targets listed. All the targets came about due to c++ templating and a desire to reduce individual binary size and to avoid excessive branching in the code.

Now some explanation of the target naming conventions. Here are a couple of example targets

"compact_debruijn_graph_fastq_A4_K21_freq_clean_recompact_incr"
"compact_debruijn_graph_fastq_A4_K31_freq_minimizer"

fastq: means it operates on fastq files. We can easily support fasta files as well - I'll explain how in a little bit.
A4: standard 2bit DNA encoding. The other alternative is A16, which supports 4 bit DNA encoding (IUPAC)
K21: kmer length. The cmake script is currently configured with 21, 31, 51, 55, and 63. We can easily support others as well.
freq: this is my "code name" for an optimized graph construction algorithm. You should by default choose binaries with this label.
clean and clean_recompact: bubbles and deadends are removed and chains are recompacted. I used some simple criteria for identifying bubbles and deadends, and they may not be what you want. The code is set up so that an application developer can define their own criteria, but this requires some c++ coding.
minimizer: attempt at using minimizers for data distribution across multiple nodes - not performing well yet. You should avoid these.
incr: for when the input files push the memory limit. This is data dependent (number of unique k-mers), but you may want to try these incremental versions if you have multiple files in your dataset and the fastq files are more than 1/16 of the total memory (a guess).
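
The naming conventions listed above can be sketched as a small shell helper that splits a target name on underscores and prints what each label means. This is only an illustration: the function name describe_target is hypothetical, and the descriptions are paraphrased from the list above.

```shell
#!/bin/sh
# Hypothetical helper: explain the labels in a target name, following
# the naming conventions described in this thread.
describe_target() {
    name=$1
    # Split the name on underscores into positional parameters.
    old_ifs=$IFS
    IFS=_
    set -- $name
    IFS=$old_ifs
    for token in "$@"; do
        case $token in
            fastq|fasta) echo "input format: $token" ;;
            A4)          echo "alphabet: 2-bit DNA encoding" ;;
            A16)         echo "alphabet: 4-bit DNA encoding (IUPAC)" ;;
            K[0-9]*)     echo "k-mer length: ${token#K}" ;;
            freq)        echo "freq: optimized graph construction" ;;
            clean)       echo "clean: bubbles and dead ends removed" ;;
            recompact)   echo "recompact: chains recompacted" ;;
            minimizer)   echo "minimizer: experimental, best avoided" ;;
            incr)        echo "incr: incremental version for large inputs" ;;
        esac
    done
}
```

For example, describe_target compact_debruijn_graph_fastq_A4_K21_freq_clean_recompact_incr prints one line per recognized label and skips the "compact_debruijn_graph" prefix.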

To support FASTA files and other k values, we just need to change the CMakeLists.txt file to generate the appropriate targets. I can show you how to do those.

To summarize quickly, use the versions with "A4" and "freq" labels. If you think you'll run out of memory, try the "incr" version. If you need fasta file or other k-values support, let me know. If you need to remove bubbles and dead ends, we should talk.
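
The recommendation above (pick "A4" and "freq", skip "minimizer") can be applied to the target list with a simple filter. This is a sketch: the function name recommended_targets is hypothetical, and it assumes each target name appears on its own line of the help output, which is what Makefile-based builds normally print.

```shell
#!/bin/sh
# Hypothetical filter: keep target names that carry the A4 and freq
# labels and drop the experimental minimizer variants.
# Reads the target list on standard input.
recommended_targets() {
    grep -E '_A4_' | grep -E '_freq(_|$)' | grep -v '_minimizer'
}
```

From the build directory it could be used as: cmake --build . --target help | recommended_targets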

Making the configure process easier has been on my to-do list for a while. I'll try to find some time to work on this.

Thanks

ShuangQiuac · 2019-05-01T05:04:40Z

Dear Tony,
Thanks for your instructions! Unfortunately I cannot use ccmake under CentOS on our lab servers. Could you please provide other instructions on how to use it with only cmake and make? For example, if I want to run bruno on the human chromosome 14 dataset, how can I build the program, and what parameters, e.g. kmer length and minimizer length, should I specify when running it?

Best regards.
Shuang

tcpan · 2019-05-01T14:11:48Z

Hi, Shuang,
Let's get you compiling first.

Try adding "-DBUILD_EXAMPLE_APPLICATIONS=ON" to your cmake command. This is the command-line way of changing cmake parameters. Next you can either do "make" to build everything or use the following to build specific binaries:

cmake --build . --target help

and pick the target binary you want, and run

make {targetname}

Once you have it running, you can invoke a binary with "--help" to see a list of its parameters and the corresponding explanations. You probably will have some questions - please feel free to contact me.

The choice of k value depends on what you're trying to do. For computational benchmarking, k <= 32 has the advantage of being long enough to have some biological relevance while short enough to fit in a machine word. For real assembly of the human genome, however, larger k works better for resolving repeat regions; for example, HipMer uses 55 for human and SPAdes goes up to 77 in its default settings.

As I mentioned previously, our minimizer is not ready for use, and I am considering deprecating it completely. Please do not use it for genome assembly or performance benchmarking.

Thanks, and let me know what other questions you may have.
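
The command-line steps from this thread (submodule update, configure with the option switched on, list targets, build one target) can be collected into a single shell helper. This is a sketch under assumptions: the function name build_example_target and the choice of a "build" directory inside the source tree are illustrative, not part of the project.

```shell
#!/bin/sh
# Hypothetical helper: configure with example applications enabled and
# build a single target, using only git, cmake and make (no ccmake).
# Usage: build_example_target <source_dir> <target_name>
build_example_target() {
    src_dir=$1
    target=$2
    # Download the dependencies that are tracked as git submodules.
    (cd "$src_dir" && git submodule update --init --recursive --progress) || return 1
    # Configure in a separate build directory inside the source tree.
    mkdir -p "$src_dir/build" || return 1
    (
        cd "$src_dir/build" || exit 1
        # -D sets the option from the command line.
        cmake -DBUILD_EXAMPLE_APPLICATIONS=ON .. || exit 1
        # Print the available targets, then build only the requested one.
        cmake --build . --target help || exit 1
        make "$target"
    )
}
```

A call would look like build_example_target /path/to/source compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact, where the source path is a placeholder.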

ShuangQiuac · 2019-05-01T14:48:08Z

Hi, Tony,
Thanks for your reply! I can build the program now. Can you please specify what I should modify in the CMakeLists.txt, so that I can build a binary with K=29 and input file format = fasta?

Best regards.
Shuang

tcpan · 2019-05-01T17:20:50Z

Hi, Shuang,
Good to hear.

If you do not need bubble and deadend removal, then edit test/test/CMakeLists.txt:
1. add "29" to line 164.
2. uncomment lines 183-186 for the "freq" version of FASTA.
3. uncomment lines 195-198 for the "incr" version of FASTA.

If you need bubble and deadend removal, then
1. add "29" to line 209.
2. duplicate lines 218-221 and change all occurrences of "fastq" and "FASTQ" to "fasta" and "FASTA" in the duplicates.
3. for the "incr" version, duplicate lines 228-231, and change all occurrences of "fastq" and "FASTQ" to "fasta" and "FASTA" in the duplicates.

Then rerun cmake in your build directory, and compile the K29 versions of the targets.

That should be it. Please let me know if you run into any issues.

Thanks!
Tony

ShuangQiuac · 2019-05-03T04:49:31Z

Hi, Tony,
Thanks for your reply! Then how can I specify the input file? I didn't see any parameter specification when I ran the compiled binary.

Best regards.
Shuang

tcpan · 2019-05-03T12:46:48Z

Hi, Shuang,

Can you verify that when you call the binary you get at least something like this?

    EXECUTING bin/compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact
    PARSE ERROR: Required argument missing: filenames
    Brief USAGE: bin/compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact [-M] [-C] [-N] [-R] [-B] [-U <uint16>] ... [-L <uint16>] ... [-T] [-O <string>] [--] [--version] [-h] <string> ...
    For complete USAGE and HELP type: bin/compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact --help

You can add the "--help" switch to see the full parameter list. Let me know if you have any questions - the choice of switches will depend on what your goals are: benchmarking, generating and writing out the contigs, with or without bubble and deadend cleaning, etc. You can list all fasta files at the end of the command.

I also want to re-emphasize that the bubble and deadend cleaning is meant as a demonstration of the library's capability and is based on my definition of bubbles and deadends. If you need graph cleaning, we should talk to make sure your desired logic is implemented.

Thanks!
Tony


ShuangQiuac · 2019-05-24T11:45:24Z

Hi, Tony,

Thanks for your reply! When I call the binary, it doesn't ask for any input parameter. It just runs and prints the following:

    READING <path to bruno>/test/data/test.debruijn.small.fastq via posix
    total size read is 939
    PARSING and INSERT
    rank 0 BEFFORE input=210 size=0 buckets=512
    rank 0 AFTER input=210 size=140 reported=140 buckets=512
    PARSING and INSERT DONE: total size after insert/rehash is 140
    HISTOGRAM
    TOTAL Edge Existence Histogram:
        0   1   2   3   4
    0   0   5   0   0   0
    1   1 132   0   1   0
    2   0   0   0   0   0
    3   0   1   0   0   0
    4   0   0   0   0   0
    rank 0 finished checking index
    PRINT BRANCHES
    PRINT BRANCH KMERS
    SIZES simple biedge size: 24 kmer size 8 node size 32
    MAKE CHAINMAP
    MARK TERMINI NEXT TO BRANCHES
    estimate available mem=124400472064 bytes, p=1, alloc 124400472064 elements
    estimate num chain terminal updates=140, value_type size=24 bytes
    LIST RANKING
    rank 0 iter 1 updated 264, unfinished 137 internal chain nodes 117
    rank 0 iter 2 updated 260, unfinished 137 internal chain nodes 97
    rank 0 iter 3 updated 248, unfinished 137 internal chain nodes 57
    rank 0 iter 4 updated 214, unfinished 114 internal chain nodes 0
    REMOVE ISOLATED
    REMOVED 0 isolated nodes
    rank 0/1 input is EMPTY.
    rank 0 BEFORE input=210 size=0 buckets=512
    rank 0 AFTER input=210 size=140 buckets=512
    rank 0 map_base get_multiplicity called
    rank 0 BEFORE input=138 size=0 buckets=512
    rank 0 AFTER input=138 size=6 buckets=512
    PRINT CHAIN String
    PRINT CHAIN Nodes
    COMPUTE CHAIN FREQ SUMMARY
    GATHER NON_REP_END EDGE FREQUENCY
    rank 0 result size 6 capacity 7
    CREATE CHAIN EDGE FREQUENCIES
    PRINT CHAIN EDGE FREQS

Best regards.
Shuang

tcpan · 2019-05-31T14:51:13Z

Hi, Shuang,

Sorry for the late reply - I was on vacation. Can you verify the exact command you used, and also the git commit you built from? What do you get when you call the following?

    bin/compact_debruijn_graph_fastq_A4_K31_freq --help

The binaries are not interactive. Parameters must be given on the command line as arguments to the binary. When you call the binary without any parameters, it starts execution immediately with the default parameters, which point at the included test data - essentially it becomes an integration test run.

Tony
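
Since a run with no arguments silently falls back to the bundled test data, a small guard can make that case explicit. The following is only a sketch: `run_bruno` is a hypothetical wrapper function, not something shipped with bruno, and it simply refuses to launch when no input files are given.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of bruno): refuse to start a binary
# when no input files are passed, so that the built-in test-data run
# described above cannot happen by accident.
run_bruno() {
    bin="$1"
    shift
    if [ "$#" -eq 0 ]; then
        echo "no input files given; '$bin' would run on its bundled test data" >&2
        return 2
    fi
    # Forward every remaining argument (switches and file names) unchanged.
    "$bin" "$@"
}
```

It would be called as `run_bruno bin/compact_debruijn_graph_fastq_A4_K31_freq <input fastq file>`, with any switches from `--help` placed before the file names.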

…

On Fri, May 24, 2019 at 7:45 AM Shuang Qiu ***@***.***> wrote: Hi, Tony, Thanks for your reply! when I call the binary, it didn’t ask for any input parameter. It just ran and output results as follows, READING <path to bruno>/test/data/test.debruijn.small.fastq via posix total size read is 939 PARSING and INSERT rank 0 BEFFORE input=210 size=0 buckets=512 rank 0 AFTER input=210 size=140 reported=140 buckets=512 PARSING and INSERT DONE: total size after insert/rehash is 140 HISTOGRAM TOTAL Edge Existence Histogram: 0 1 2 3 4 0 0 5 0 0 0 1 1 132 0 1 0 2 0 0 0 0 0 3 0 1 0 0 0 4 0 0 0 0 0 rank 0 finished checking index PRINT BRANCHES PRINT BRANCH KMERS SIZES simple biedge size: 24 kmer size 8 node size 32 MAKE CHAINMAP MARK TERMINI NEXT TO BRANCHES estimate available mem=124400472064 bytes, p=1, alloc 124400472064 elements estimate num chain terminal updates=140, value_type size=24 bytes LIST RANKING rank 0 iter 1 updated 264, unfinished 137 internal chain nodes 117 rank 0 iter 2 updated 260, unfinished 137 internal chain nodes 97 rank 0 iter 3 updated 248, unfinished 137 internal chain nodes 57 rank 0 iter 4 updated 214, unfinished 114 internal chain nodes 0 REMOVE ISOLATED REMOVED 0 isolated nodes rank 0/1 input is EMPTY. rank 0 BEFORE input=210 size=0 buckets=512 rank 0 AFTER input=210 size=140 buckets=512 rank 0 map_base get_multiplicity called rank 0 BEFORE input=138 size=0 buckets=512 rank 0 AFTER input=138 size=6 buckets=512 PRINT CHAIN String PRINT CHAIN Nodes COMPUTE CHAIN FREQ SUMMARY GATHER NON_REP_END EDGE FREQUENCY rank 0 result size 6 capacity 7 CREATE CHAIN EDGE FREQUENCIES PRINT CHAIN EDGE FREQS Best regards. Shuang 在 2019年5月3日，下午8:46，Tony Pan ***@***.***<mailto: ***@***.***>> 写道： Hi, Shuang, Can you verify that when you call the binary you get at least something like this? 
EXECUTING bin/compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact PARSE ERROR: Required argument missing: filenames Brief USAGE: bin/compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact [-M] [-C] [-N] [-R] [-B] [-U <uint16>] ... [-L <uint16>] ... [-T] [-O <string>] [--] [--version] [-h] <string> ... For complete USAGE and HELP type: bin/compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact --help You can add the "--help" switch to see the full parameter list. Let me know if you have any questions - the choice of switches will depend on what your goals are - benchmarking, generating and writing out the contigs, with or without bubble and deadend cleaning, etc. You can list all fasta files at the end of the command. I also want to re-emphasize that the bubble and deadend cleaning is meant as a demonstration of the library's capabilitiy and is based on my definition of bubbles and deadends. If you need graph cleaning, we should talk to make sure your desired logic is implemented. Thanks! Tony On Fri, May 3, 2019 at 12:49 AM Shuang Qiu ***@***.*** ***@***.***>> wrote: > Hi, Tony, > > Thanks for your reply! Then how can I specify the input file? I didn’t see > any parameter specification when I ran the compiled binary. > > Best regards. > > Shuang > > 在 2019年5月2日，上午1:20，Tony Pan ***@***.***<mailto: ***@***.***><mailto: > ***@***.******@***.***>>> 写道： > > Hi, Shuang, > Good to hear. > > If you do not need bubble and deadend removal, then edit > test/test/CMakeLists.txt, > > 1. add "29" to line 164. > 2. uncomment lines 183-186 for the "freq" version of FASTA > 3. uncomment lines 195-198 for the "incr" version of FASTA > > If you need bubble and deadend removal, then > > 1. add "29" to line 209 > 2. duplicate lines 218-221 and change all occurrences of "fastq" and > "FASTQ" to "fasta" and "FASTA" in the duplicates. > 3. 
for the "incr" verison, duplicate lines 228-231, and change all > occurrences of "fastq" and "FASTQ" to "fasta" and "FASTA" in the > duplicates. > > > Then rerun cmake in your build directory, and compile the K29 versions of > the targets. > > That should be it. Please let me know if you run into any issues. Thanks! > > Tony > > On Wed, May 1, 2019 at 10:48 AM Shuang Qiu ***@***.*** ***@***.***> > ***@***.***>> wrote: > > > Hi, Tony, > > > > Thanks for your reply! I can build the program now. Can you please > specify > > what I should modify in the CMakeLists.txt, so that I can build a binary > > with K=29 and input file format = fasta? > > > > Best regards. > > > > Shuang > > > > 在 2019年5月1日，下午10:11，Tony Pan ***@***.***<mailto: ***@***.***><mailto: > ***@***.******@***.***>><mailto: > > ***@***.******@***.***><mailto: ***@***.***>>> 写道： > > > > Hi, Shuang, > > Let's get you compiling first. > > > > Try adding "-DBUILD_EXAMPLE_APPLICATIONS=ON" to your cmake command. This > > is the commandline way of changing cmake parameters. Next you can either > > do "make" to build everything or use the following to build specific > > binaries: > > > > > > cmake --build . --target help > > > > and pick the target binary you want, and run > > > > make {targetname} > > > > Once you have it running, you can invoke a binary with "--help" to see a > > list of its parameters, and the corresponding explanations. You probably > > will have some questions - please feel free to contact me. > > > > > > The choice of k value depends on what you're trying to do. For > > computational benchmarking, k <= 32 has the advantage of being long > enough > > to have some biological relevance while short enough to fit in a machine > > word. For real assembly of human genome, however, larger k works better > > for resolving repeat regions, for example Hipmer uses 55 for human and > > SPADES goes up to 77 in their default settings. 
> > > > As I mentioned previously, our minimizer is not ready for use, and I am > > considering deprecating it completely. Please do not use it for genome > > assembly or performance benchmarking. > > > > > > Thanks, and let me know what other questions you may have. > > > > On Wed, May 1, 2019 at 1:04 AM Shuang Qiu ***@***.*** ***@***.***> > ***@***.***> > > ***@***.***>> wrote: > > > > > Dear Tony, > > > > > > Thanks for your instruction! Unfortunately I can not use ccmake under > > > CentOS in our lab servers. Could you please provide other instructions > > on > > > how to use it with only cmake and make? For example, if I want to run > > bruno > > > on dataset human chromosome 14, how can I build the program, and what > > > parameters, e.g. kmer length, minimizer length, should I specify > running > > > it? > > > > > > Best regards. > > > > > > Shuang > > > > > > 在 2019年4月18日，下午11:02，Tony Pan ***@***.***<mailto: ***@***.***><mailto: > ***@***.******@***.***>><mailto: > > ***@***.******@***.***><mailto: ***@***.***>><mailto: > > > ***@***.******@***.***><mailto: ***@***.***><mailto: > ***@***.******@***.***>>>> 写道： > > > > > > > > > Hi, Shuang, > > > It looks like I forgot to turn on the "Build Example Applications" by > > > default. > > > > > > Can you try to use ccmake instead of cmake > > > > > > ccmake src_dir > > > > > > which will present a graphical user interface for configuring the > > project. > > > The first item should be BUILD_EXAMPLE_APPLICATIONS. change that to > > "ON". > > > then press c to configure and g to "generate and exit". > > > > > > Now this is going to create a somewhat large number of targets (what I > > > needed during my evaluation). You can see a list of the build target > by > > > > > > cmake --build . --target help > > > > > > At this point if you run "make" (or "make -j 4"), everything will be > > built > > > and it might take a while, or you can run "make {target}", where > > {target} > > > is one of the targets listed. 
All the targets came about due to c++ > > > templating and a desire to reduce individual binary size and to avoid > > > excessive branching in the code. > > > > > > Now some explanation of the target naming conventions. Here are a > couple > > > of example targets > > > > > > "compact_debruijn_graph_fastq_A4_K21_freq_clean_recompact_incr" > > > "compact_debruijn_graph_fastq_A4_K31_freq_minimizer" > > > > > > fastq: means it operates on fastq files. We can easily support fasta > > files > > > as well - I'll explain how in a little bit. > > > A4: standard 2bit DNA encoding. The other alternative is A16, which > > > supports 4 bit DNA encoding (IUPAC) > > > K21: kmer length. the cmake script is currently configured with 21, > > 31,51, > > > 55, and 63. We can easily support others as well. > > > freq: this is my "code name" for an optimized graph construction > > > algorithm. You should by default choose binaries with this label. > > > clean and clean_recompact: bubbles and deadends are removed and chains > > are > > > recompacted. I used some simple criteria for identifying bubbles and > > > deadends, and they may not be what you want. The code is set up so > that > > an > > > application developer can define their own criteria, but this requires > > some > > > c++ coding. > > > minimizer: attempt at using minimizers for data distribution across > > > multiple nodes - not performing well yet. You should avoid these. > > > incr: for when the input files pushes memory limit. This is data > > dependent > > > (number of unique k-mers), but you may want to try using these > > incremental > > > version if you have multiple files in your dataset and the fastq files > > are > > > more than 1/16 of the total memory (a guess). > > > > > > To support FASTA files and other k values, we just need to change the > > > CMakeLists.txt file to generate the appropriate targets. I can show > you > > how > > > to do those. 
> > > > > > To summarize quickly, use the versions with "A4" and "freq" labels. If > > you > > > think you'll run out of memory, try the "incr" version. If you need > > fasta > > > file or other k-values support, let me know. If you need to remove > > bubbles > > > and dead ends, we should talk. > > > > > > Making the configure process easier has been on my things to do for a > > > while. I'll try to find some time to work on this. > > > > > > Thanks > > > > > > — > > > You are receiving this because you authored the thread. > > > Reply to this email directly, view it on GitHub< > > > #1 (comment)>, > or > > > mute the thread< > > > > > > https://github.com/notifications/unsubscribe-auth/AE6DTTTRVED4TRUQ463R453PRCEQRANCNFSM4HGVRLGA>. > > > > > > > > > > > > — > > > You are receiving this because you commented. > > > Reply to this email directly, view it on GitHub > > > <#1 (comment)>, > or > > mute > > > the thread > > > < > > > https://github.com/notifications/unsubscribe-auth/AAHFAKNWSIIP6SR6W7BYBJDPTEQGRANCNFSM4HGVRLGA> > > > > > > . > > > > > > > — > > You are receiving this because you authored the thread. > > Reply to this email directly, view it on GitHub< > > #1 (comment)>, or > > mute the thread< > > > https://github.com/notifications/unsubscribe-auth/AE6DTTSHQV7CDBKIPHBNBGDPTGQKJANCNFSM4HGVRLGA>. > > > > > > > — > > You are receiving this because you commented. > > Reply to this email directly, view it on GitHub > > <#1 (comment)>, or > mute > > the thread > > < > https://github.com/notifications/unsubscribe-auth/AAHFAKI3J73DZEIFMWQQFTTPTGUSRANCNFSM4HGVRLGA> > > > . > > > > — > You are receiving this because you authored the thread. > Reply to this email directly, view it on GitHub< > #1 (comment)>, or > mute the thread< > https://github.com/notifications/unsubscribe-auth/AE6DTTWL2UPLGDPFYIDEHHLPTHGPFANCNFSM4HGVRLGA>. > > > — > You are receiving this because you commented. 

ShuangQiuac · 2019-06-06T06:47:16Z

Hi, Tony, Thanks a lot for your reply. Sorry, I previously missed the last parameter for specifying an input file, as listed with --help (but the program does not ask for the input parameter if I directly execute ./compact_debruijn_graph_fastq_A4_K31; it directly uses the test data as input). I have another question: how do I specify the number of CPU threads/processes to run it with? It seems that only one CPU core is used by default when I execute ./compact_debruijn_graph_fastq_A4_K31 <input fastq file>. The git commit is db82d6f. Best regards, Shuang

tcpan · 2019-06-06T16:08:50Z

Hi, Shuang,

Yeah, I should have mentioned this part about multi-core. The binary is an MPI program. What you need to do is use one of the MPI flavors: OpenMPI, MPICH, MVAPICH, or, on Cray systems, Cray MPI. Typically, there is an mpirun or mpiexec command that you prefix to the binary. You also specify the number of cores/processes as a parameter to the mpirun/mpiexec command. For example, for OpenMPI, the process count is specified by "-np", so the command line might look like

mpirun -np 16 ./compact_debruijn_graph_fastq_A4_K31 <input fastq file>

Without mpirun, the bruno binaries essentially run single threaded.

Unfortunately, each MPI implementation may name its command differently and use a different set of flags. Furthermore, to run well, the MPI processes should be pinned to the cores, and each MPI implementation again has its own way of doing this. Finally, if you are using a job scheduler, that will also affect how the processes are assigned to cores.

Since you were able to compile, I assume you have an MPI installation on your system. If you can let me know which MPI you are using, and which job scheduler if any, I'll be able to better tell you what switches are needed.

Tony
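For anyone scripting these runs, the launch line described above can be assembled programmatically. This is a minimal Python sketch, assuming an OpenMPI-style launcher that accepts "-np"; the helper name `build_mpi_command` and the input file name "reads.fastq" are hypothetical, and other MPI implementations or job schedulers may need a different launcher name or flags.

```python
import shlex


def build_mpi_command(binary, fastq, nprocs, launcher="mpirun"):
    """Return the argument list for launching an MPI binary with nprocs processes.

    Assumes the launcher takes the process count via "-np" (as OpenMPI's mpirun does).
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return [launcher, "-np", str(nprocs), binary, fastq]


if __name__ == "__main__":
    cmd = build_mpi_command(
        "./compact_debruijn_graph_fastq_A4_K31",
        "reads.fastq",   # hypothetical input file
        16,
    )
    # Print the command so it can be inspected before running it.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` once the launcher and flags have been confirmed for the local MPI installation.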

