Unable to build the software #1
Hi, Shuang,
I need to update the instructions; a step is missing.
Run the following from the source directory to download the dependencies (git submodules):
git submodule update --init --recursive --progress
Then run cmake and make.
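The CMake error in the original report names a file inside the kmerind dependency, which is only present once the submodules have been downloaded. The following is a minimal C++17 sketch of a pre-flight check you could run before configuring. `kmerind_submodule_present` is a hypothetical helper, not part of the project, and using the file named in the error message as a marker for a complete checkout is an assumption.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical pre-flight check (not part of the project): returns true if the
// kmerind submodule looks checked out under `source_dir`. The probe file is the
// one named in the CMake error; treating it as a marker is an assumption.
bool kmerind_submodule_present(const fs::path& source_dir) {
    const fs::path probe =
        source_dir / "ext" / "kmerind" / "src" / "config" / "config.hpp.in";
    std::error_code ec;  // non-throwing overload: any I/O error counts as "missing"
    return fs::is_regular_file(probe, ec);
}
```

If the check returns false for your source directory, run the git submodule command above before rerunning cmake.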
Please let me know if you run into further issues. Thanks!
Tony Pan
On Wed, Apr 17, 2019 at 11:48 AM Shuang Qiu wrote:
Hi, I downloaded the software and used the following commands to build it:
mkdir build
cd build
cmake ..
make
But it returned the following error:
CMake Error: File /ext/kmerind/src/config/config.hpp.in does not exist.
Can you suggest how to build and use it? Thanks!
Dear Tony Pan,
Thanks for the reply! I can build the program by running "git submodule update --init --recursive" before cmake and make.
It generates "clear_cache" and "sys_probe" in the bin directory. Could you please provide further examples and instructions on how to run the program and how to specify its parameters? Thanks!
Best regards.
Shuang
Dear Tony,
Thanks for your instructions! Unfortunately, I cannot use ccmake on our lab's CentOS servers. Could you please explain how to do this with only cmake and make? For example, if I want to run bruno on the human chromosome 14 dataset, how should I build the program, and which parameters (e.g., k-mer length, minimizer length) should I specify when running it?
Best regards.
Shuang
On Apr 18, 2019, at 11:02 PM, Tony Pan wrote:
Hi, Shuang,
It looks like I forgot to turn on "Build Example Applications" by default.
Can you try using ccmake instead of cmake?
ccmake src_dir
This opens an interactive, terminal-based interface for configuring the project. The first item should be BUILD_EXAMPLE_APPLICATIONS; change it to "ON", then press c to configure and g to "generate and exit".
This creates a fairly large number of targets (which I needed during my evaluation). You can list the build targets with
cmake --build . --target help
At this point, running "make" (or "make -j 4") builds everything, which may take a while; alternatively, run "make {target}", where {target} is one of the listed targets. There are so many targets because of C++ templating: I wanted to keep each binary small and to avoid excessive branching in the code.
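The templating trade-off can be illustrated with a toy example: when k is a template parameter, every value of K becomes its own specialized function, and a single binary that accepts any k at run time needs a switch that instantiates all of them. This is a hypothetical illustration (`kmer_count` and `kmer_count_runtime` are invented names), not code from the project; the K values in the switch are the ones mentioned in this thread (21, 31, 51, 55, 63).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// K is fixed at compile time, so the compiler emits a separate, specialized
// function for every K that is actually used.
template <std::size_t K>
std::size_t kmer_count(const std::string& seq) {
    static_assert(K > 0, "k must be positive");
    return seq.size() >= K ? seq.size() - K + 1 : 0;
}

// What a single "does everything" binary would need instead: a runtime switch
// that instantiates every supported K. Each extra case adds code to the binary
// and a branch before the work starts.
std::size_t kmer_count_runtime(std::size_t k, const std::string& seq) {
    switch (k) {
        case 21: return kmer_count<21>(seq);
        case 31: return kmer_count<31>(seq);
        case 51: return kmer_count<51>(seq);
        case 55: return kmer_count<55>(seq);
        case 63: return kmer_count<63>(seq);
        default: throw std::invalid_argument("unsupported k: " + std::to_string(k));
    }
}
```

Building one binary per K, as the project does, sidesteps the switch at the cost of many build targets.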
Here is an explanation of the target naming conventions, using a couple of example targets:
"compact_debruijn_graph_fastq_A4_K21_freq_clean_recompact_incr"
"compact_debruijn_graph_fastq_A4_K31_freq_minimizer"
fastq: operates on FASTQ files. We can easily support FASTA files as well; I'll explain how below.
A4: standard 2-bit DNA encoding. The alternative is A16, which uses a 4-bit DNA encoding (IUPAC codes).
K21: k-mer length. The CMake script is currently configured with 21, 31, 51, 55, and 63. We can easily support other values as well.
freq: my "code name" for an optimized graph construction algorithm. By default, you should choose binaries with this label.
clean and clean_recompact: bubbles and dead ends are removed, and chains are recompacted. I used some simple criteria for identifying bubbles and dead ends, and they may not be what you want. The code is set up so that an application developer can define their own criteria, but this requires some C++ coding.
minimizer: an attempt at using minimizers to distribute data across multiple nodes; it does not perform well yet. You should avoid these.
incr: for when the input files push the memory limit. This depends on the data (the number of unique k-mers), but you may want to try the incremental versions if your dataset has multiple files and the FASTQ files are larger than about 1/16 of total memory (a guess).
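The naming convention above is regular enough to decode mechanically, which can help when scanning the long target list. The following is a hypothetical C++17 sketch (`TargetInfo` and `parse_target` are invented for illustration); it accepts only the labels described in this message and rejects anything else.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Fields decoded from a target name, following the convention described above.
struct TargetInfo {
    std::string format;    // "fastq" or "fasta"
    std::string alphabet;  // "A4" (2-bit) or "A16" (4-bit IUPAC)
    unsigned k = 0;        // from the "K<n>" token
    bool freq = false, clean = false, recompact = false;
    bool minimizer = false, incr = false;
};

std::optional<TargetInfo> parse_target(const std::string& name) {
    const std::string prefix = "compact_debruijn_graph_";
    if (name.rfind(prefix, 0) != 0) return std::nullopt;  // must start with prefix

    // Split the remainder on '_'.
    std::vector<std::string> tokens;
    std::stringstream ss(name.substr(prefix.size()));
    for (std::string tok; std::getline(ss, tok, '_');) tokens.push_back(tok);
    if (tokens.size() < 3) return std::nullopt;

    TargetInfo info;
    info.format = tokens[0];
    info.alphabet = tokens[1];
    if (info.format != "fastq" && info.format != "fasta") return std::nullopt;
    if (info.alphabet != "A4" && info.alphabet != "A16") return std::nullopt;

    // "K" followed by digits only.
    const std::string& ktok = tokens[2];
    if (ktok.size() < 2 || ktok[0] != 'K') return std::nullopt;
    for (std::size_t i = 1; i < ktok.size(); ++i) {
        if (ktok[i] < '0' || ktok[i] > '9') return std::nullopt;
        info.k = info.k * 10 + static_cast<unsigned>(ktok[i] - '0');
    }

    // Remaining tokens are feature labels.
    for (std::size_t i = 3; i < tokens.size(); ++i) {
        const std::string& t = tokens[i];
        if (t == "freq") info.freq = true;
        else if (t == "clean") info.clean = true;
        else if (t == "recompact") info.recompact = true;
        else if (t == "minimizer") info.minimizer = true;
        else if (t == "incr") info.incr = true;
        else return std::nullopt;  // unknown label
    }
    return info;
}
```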
To support FASTA files and other k values, we just need to change the CMakeLists.txt file to generate the appropriate targets. I can show you how.
To summarize: use the versions with the "A4" and "freq" labels. If you think you'll run out of memory, try the "incr" version. If you need FASTA support or other k values, let me know. If you need to remove bubbles and dead ends, we should talk.
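The summary rules can be written down as a small decision helper. This is a hypothetical sketch: `suggest_target` is an invented name, the composed name simply follows the naming convention described above (FASTA names exist only after the CMakeLists.txt change, so confirm against `cmake --build . --target help`), and the 1/16-of-memory threshold is the rough guess from this message, not a measured limit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper encoding the rules of thumb: always A4 + freq, optionally
// the demonstration cleaning, and "_incr" for large multi-file inputs.
std::string suggest_target(const std::string& format, unsigned k,
                           unsigned num_files,
                           std::uint64_t input_bytes,
                           std::uint64_t memory_bytes,
                           bool want_cleaning) {
    std::string name = "compact_debruijn_graph_" + format + "_A4_K" +
                       std::to_string(k) + "_freq";
    if (want_cleaning) name += "_clean_recompact";
    // Rough heuristic: incremental build for multi-file inputs larger than
    // about 1/16 of memory. Actual usage depends on the number of unique k-mers.
    if (num_files > 1 && input_bytes > memory_bytes / 16) name += "_incr";
    return name;
}
```

For example, four FASTQ files totaling 20 GiB on a 128 GiB machine exceed the 8 GiB threshold, so the helper picks an "_incr" variant.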
Making the configure process easier has been on my to-do list for a while; I'll try to find some time to work on it.
Thanks
Hi, Shuang,
Let's get you compiling first.
Try adding "-DBUILD_EXAMPLE_APPLICATIONS=ON" to your cmake command; this is how you set CMake options on the command line. Then you can either run "make" to build everything, or list the available targets with
cmake --build . --target help
pick the binary you want, and run
make {targetname}
Once it builds, you can invoke a binary with "--help" to see a list of its parameters and their explanations. You will probably have some questions; please feel free to contact me.
The choice of k depends on what you're trying to do. For computational benchmarking, k <= 32 has the advantage of being long enough to have some biological relevance while short enough to fit in a 64-bit machine word (at 2 bits per base, a 32-mer takes exactly 64 bits). For real assembly of the human genome, however, a larger k works better for resolving repeat regions; for example, HipMer uses k = 55 for human, and SPAdes goes up to k = 77 in its default settings.
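The machine-word argument can be made concrete by packing a k-mer into a 64-bit integer at 2 bits per base. This is a minimal sketch, assuming the mapping A=0, C=1, G=2, T=3; the project's actual encoding may differ, and `pack_kmer_2bit` and `words_per_kmer` are hypothetical names. The second function shows how many 64-bit words one k-mer needs at 2 bits per base (a 4-letter alphabet, as with A4) or 4 bits per base (a 16-letter IUPAC alphabet, as with A16).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Packs a k-mer (k <= 32) into one 64-bit word, 2 bits per base.
// The A=0, C=1, G=2, T=3 mapping is an assumption for this sketch.
std::optional<std::uint64_t> pack_kmer_2bit(const std::string& kmer) {
    if (kmer.empty() || kmer.size() > 32) return std::nullopt;  // 2 * 32 = 64 bits max
    std::uint64_t word = 0;
    for (char c : kmer) {
        std::uint64_t code;
        switch (c) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: return std::nullopt;  // N and other IUPAC codes need a wider alphabet
        }
        word = (word << 2) | code;
    }
    return word;
}

// Number of 64-bit words needed to store one k-mer at `bits_per_base`.
constexpr unsigned words_per_kmer(unsigned k, unsigned bits_per_base) {
    return (k * bits_per_base + 63) / 64;
}
```

With 2 bits per base, K21, K29, and K31 fit in one word while K51 through K63 need two; with 4 bits per base, even K31 needs two words.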
As I mentioned previously, our minimizer variant is not ready for use, and I am considering deprecating it completely. Please do not use it for genome assembly or performance benchmarking.
Thanks, and let me know what other questions you may have.
Hi, Tony,
Thanks for your reply! I can build the program now. Could you please specify what I should modify in CMakeLists.txt so that I can build a binary with K=29 and FASTA input?
Best regards.
Shuang
Hi, Shuang,
Good to hear.
If you do not need bubble and dead-end removal, edit test/test/CMakeLists.txt as follows:
1. Add "29" to line 164.
2. Uncomment lines 183-186 for the "freq" version of FASTA.
3. Uncomment lines 195-198 for the "incr" version of FASTA.
If you need bubble and dead-end removal:
1. Add "29" to line 209.
2. Duplicate lines 218-221 and change all occurrences of "fastq" and "FASTQ" to "fasta" and "FASTA" in the duplicates.
3. For the "incr" version, duplicate lines 228-231 and change all occurrences of "fastq" and "FASTQ" to "fasta" and "FASTA" in the duplicates.
Then rerun cmake in your build directory and compile the K29 versions of the targets.
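After rerunning cmake, it is worth confirming that the new K29 FASTA targets actually appear in the output of `cmake --build . --target help` before running make. The following is a hypothetical C++17 sketch: it assumes the Makefile generator's help format, where each target is listed on a line starting with "... ", and the expected names are inferred from the naming convention in this thread, not taken from the real CMakeLists.txt.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Extracts target names from `cmake --build . --target help` output, assuming
// the Makefile generator format ("... name" per line, optional trailing note).
std::set<std::string> parse_help_targets(const std::string& help_text) {
    std::set<std::string> targets;
    std::istringstream in(help_text);
    for (std::string line; std::getline(in, line);) {
        if (line.rfind("... ", 0) != 0) continue;
        const std::string rest = line.substr(4);
        targets.insert(rest.substr(0, rest.find(' ')));  // drop trailing notes
    }
    return targets;
}

// Expected K=29 FASTA names, inferred from the naming convention (hypothetical).
std::vector<std::string> expected_k29_fasta_targets(bool with_cleaning) {
    const std::string base = "compact_debruijn_graph_fasta_A4_K29_freq";
    const std::string stem = with_cleaning ? base + "_clean_recompact" : base;
    return {stem, stem + "_incr"};
}

// Returns the expected targets that do not appear in the help output.
std::vector<std::string> missing_targets(const std::string& help_text,
                                         bool with_cleaning) {
    const auto have = parse_help_targets(help_text);
    std::vector<std::string> missing;
    for (const auto& t : expected_k29_fasta_targets(with_cleaning))
        if (have.count(t) == 0) missing.push_back(t);
    return missing;
}
```

An empty result suggests the CMakeLists.txt edits took effect; any names returned point at the step to revisit.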
That should be it. Please let me know if you run into any issues. Thanks!
Tony
Hi, Tony,
Thanks for your reply! How do I specify the input file? I didn't see any parameter descriptions when I ran the compiled binary.
Best regards.
Shuang
Hi, Shuang,
Can you verify that when you call the binary you get at least something
like this?
EXECUTING bin/compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact
PARSE ERROR:
Required argument missing: filenames
Brief USAGE:
bin/compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact [-M] [-C]
[-N] [-R] [-B] [-U <uint16>] ...
[-L <uint16>] ... [-T] [-O
<string>] [--] [--version] [-h]
<string> ...
For complete USAGE and HELP type:
bin/compact_debruijn_graph_fastq_A4_K31_freq_clean_recompact --help
You can add the "--help" switch to see the full parameter list. Let me
know if you have any questions - the choice of switches will depend on what
your goals are - benchmarking, generating and writing out the contigs, with
or without bubble and deadend cleaning, etc. You can list all fasta files
at the end of the command.
I also want to re-emphasize that the bubble and deadend cleaning is meant
as a demonstration of the library's capabilities and is based on my
definition of bubbles and deadends. If you need graph cleaning, we should
talk to make sure your desired logic is implemented.
Thanks!
Tony
…On Fri, May 3, 2019 at 12:49 AM Shuang Qiu ***@***.***> wrote:
Hi, Tony,
Thanks for your reply! Then how can I specify the input file? I didn’t see
any parameter specification when I ran the compiled binary.
Best regards.
Shuang
On May 2, 2019, at 1:20 AM, Tony Pan ***@***.***> wrote:
Hi, Shuang,
Good to hear.
If you do not need bubble and deadend removal, then edit
test/test/CMakeLists.txt,
1. add "29" to line 164.
2. uncomment lines 183-186 for the "freq" version of FASTA
3. uncomment lines 195-198 for the "incr" version of FASTA
If you need bubble and deadend removal, then
1. add "29" to line 209
2. duplicate lines 218-221 and change all occurrences of "fastq" and
"FASTQ" to "fasta" and "FASTA" in the duplicates.
3. for the "incr" version, duplicate lines 228-231, and change all
occurrences of "fastq" and "FASTQ" to "fasta" and "FASTA" in the
duplicates.
Then rerun cmake in your build directory, and compile the K29 versions of
the targets.
That should be it. Please let me know if you run into any issues. Thanks!
Tony
On Wed, May 1, 2019 at 10:48 AM Shuang Qiu ***@***.***> wrote:
> Hi, Tony,
>
> Thanks for your reply! I can build the program now. Can you please
> specify what I should modify in the CMakeLists.txt, so that I can build a
> binary with K=29 and input file format = fasta?
>
> Best regards.
>
> Shuang
>
> On May 1, 2019, at 10:11 PM, Tony Pan ***@***.***> wrote:
>
> Hi, Shuang,
> Let's get you compiling first.
>
> Try adding "-DBUILD_EXAMPLE_APPLICATIONS=ON" to your cmake command. This
> is the command-line way of changing cmake parameters. Next you can either
> do "make" to build everything or use the following to build specific
> binaries:
>
>
> cmake --build . --target help
>
> and pick the target binary you want, and run
>
> make {targetname}
>
> Once you have it running, you can invoke a binary with "--help" to see a
> list of its parameters, and the corresponding explanations. You probably
> will have some questions - please feel free to contact me.
>
>
> The choice of k value depends on what you're trying to do. For
> computational benchmarking, k <= 32 has the advantage of being long enough
> to have some biological relevance while short enough to fit in a machine
> word. For real assembly of a human genome, however, larger k works better
> for resolving repeat regions; for example, HipMer uses 55 for human and
> SPAdes goes up to 77 in its default settings.
>
> As I mentioned previously, our minimizer is not ready for use, and I am
> considering deprecating it completely. Please do not use it for genome
> assembly or performance benchmarking.
>
>
> Thanks, and let me know what other questions you may have.
>
> On Wed, May 1, 2019 at 1:04 AM Shuang Qiu ***@***.***> wrote:
>
> > Dear Tony,
> >
> > Thanks for your instructions! Unfortunately I cannot use ccmake under
> > CentOS on our lab servers. Could you please provide other instructions on
> > how to use it with only cmake and make? For example, if I want to run
> > bruno on the human chromosome 14 dataset, how can I build the program,
> > and what parameters, e.g. kmer length and minimizer length, should I
> > specify when running it?
> >
> > Best regards.
> >
> > Shuang
> >
> > On Apr 18, 2019, at 11:02 PM, Tony Pan ***@***.***> wrote:
> >
> >
> > Hi, Shuang,
> > It looks like I forgot to turn on the "Build Example Applications" option
> > by default.
> >
> > Can you try to use ccmake instead of cmake
> >
> > ccmake src_dir
> >
> > which will present an interactive, terminal-based interface for
> > configuring the project. The first item should be
> > BUILD_EXAMPLE_APPLICATIONS. Change that to "ON", then press c to
> > configure and g to "generate and exit".
> >
> > Now this is going to create a somewhat large number of targets (what I
> > needed during my evaluation). You can see a list of the build targets by
> >
> > cmake --build . --target help
> >
> > At this point if you run "make" (or "make -j 4"), everything will be built
> > and it might take a while, or you can run "make {target}", where {target}
> > is one of the targets listed. All the targets came about due to C++
> > templating and a desire to reduce individual binary size and to avoid
> > excessive branching in the code.
> >
> > Now some explanation of the target naming conventions. Here are a couple
> > of example targets:
> >
> > "compact_debruijn_graph_fastq_A4_K21_freq_clean_recompact_incr"
> > "compact_debruijn_graph_fastq_A4_K31_freq_minimizer"
> >
> > fastq: means it operates on fastq files. We can easily support fasta files
> > as well - I'll explain how in a little bit.
> > A4: standard 2-bit DNA encoding. The other alternative is A16, which
> > supports 4-bit DNA encoding (IUPAC).
|
Hi, Tony,
Thanks for your reply! When I call the binary, it doesn't ask for any input parameters. It just runs and prints the following output:
READING <path to bruno>/test/data/test.debruijn.small.fastq via posix
total size read is 939
PARSING and INSERT
rank 0 BEFFORE input=210 size=0 buckets=512
rank 0 AFTER input=210 size=140 reported=140 buckets=512
PARSING and INSERT DONE: total size after insert/rehash is 140
HISTOGRAM
TOTAL Edge Existence Histogram:
0 1 2 3 4
0 0 5 0 0 0
1 1 132 0 1 0
2 0 0 0 0 0
3 0 1 0 0 0
4 0 0 0 0 0
rank 0 finished checking index
PRINT BRANCHES
PRINT BRANCH KMERS
SIZES simple biedge size: 24 kmer size 8 node size 32
MAKE CHAINMAP
MARK TERMINI NEXT TO BRANCHES
estimate available mem=124400472064 bytes, p=1, alloc 124400472064 elements
estimate num chain terminal updates=140, value_type size=24 bytes
LIST RANKING
rank 0 iter 1 updated 264, unfinished 137 internal chain nodes 117
rank 0 iter 2 updated 260, unfinished 137 internal chain nodes 97
rank 0 iter 3 updated 248, unfinished 137 internal chain nodes 57
rank 0 iter 4 updated 214, unfinished 114 internal chain nodes 0
REMOVE ISOLATED
REMOVED 0 isolated nodes
rank 0/1 input is EMPTY.
rank 0 BEFORE input=210 size=0 buckets=512
rank 0 AFTER input=210 size=140 buckets=512
rank 0 map_base get_multiplicity called
rank 0 BEFORE input=138 size=0 buckets=512
rank 0 AFTER input=138 size=6 buckets=512
PRINT CHAIN String
PRINT CHAIN Nodes
COMPUTE CHAIN FREQ SUMMARY
GATHER NON_REP_END EDGE FREQUENCY
rank 0 result size 6 capacity 7
CREATE CHAIN EDGE FREQUENCIES
PRINT CHAIN EDGE FREQS
Best regards.
Shuang
|
Hi, Shuang,
Sorry for the late reply - was on vacation.
Can you verify the exact command you used, and the git commit hash you
built from?
What do you get when you call
"bin/compact_debruijn_graph_fastq_A4_K31_freq --help"?
The binaries are not interactive. The parameters need to be specified on
the command line as arguments to the binary. When you call the binary
without any parameters, it directly starts execution using the default
parameters, which use the included test data - essentially it becomes an
integration test run.
Tony
|
Hi, Tony,
Thanks a lot for your reply.
Sorry, I had previously missed the last parameter for the input file, as listed with --help. (The program does not ask for an input file if I execute ./compact_debruijn_graph_fastq_A4_K31 directly; it just runs on the test data.)
I have another question: how do I specify the number of CPU threads/processes to run it? It seems that only one CPU core is used by default when I execute ./compact_debruijn_graph_fastq_A4_K31 <input fastq file>.
The git commit hash is db82d6f.
Best regards.
Shuang
On May 31, 2019, at 10:51 PM, Tony Pan <[email protected]> wrote:
|
Hi, Shuang,
Yeah, I should have mentioned this part about multi-core. The binary is an
MPI program, so you need to use one of the MPI flavors: OpenMPI, MPICH,
MVAPICH, or, on Cray systems, Cray MPI. Typically, there is an mpirun or
mpiexec command that you prefix to the binary. You also specify the number
of cores/processes as a parameter to the mpirun/mpiexec command.
For example, for OpenMPI, the process count is specified by "-np", so the
command line might look like:
mpirun -np 16 ./compact_debruijn_graph_fastq_A4_K31 <input fastq file>
Without mpirun, the bruno binaries essentially run single-threaded.
Unfortunately, each MPI implementation may name its launcher command
differently and use a different set of flags. Furthermore, to run well, the
MPI processes should be pinned to cores, and each MPI implementation again
has its own way of doing this. Finally, if you are using a job scheduler,
that will also affect how the processes are assigned to cores.
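The launch-and-pinning advice above can be sketched as a shell function that assembles the command for two common MPI flavors. It assumes OpenMPI's mpirun with --bind-to core --map-by core and MPICH's Hydra mpiexec with -bind-to core; MVAPICH, Cray MPI, and job schedulers use their own launchers and binding options, so check your MPI's documentation before relying on these flags. run_mpi and the DRY_RUN variable are hypothetical names, not part of bruno.

```shell
#!/usr/bin/env bash
# Sketch only: run a bruno binary on NP MPI processes pinned to cores.
# Assumed flags: OpenMPI "mpirun -np N --bind-to core --map-by core" and
# MPICH/Hydra "mpiexec -n N -bind-to core". Other MPIs and schedulers differ.

# run_mpi FLAVOR NP BINARY [ARGS...]
# With DRY_RUN set (hypothetical), prints the command instead of executing it.
run_mpi() {
  local flavor=$1 np=$2
  shift 2
  local -a cmd
  case "$flavor" in
    openmpi) cmd=(mpirun -np "$np" --bind-to core --map-by core "$@") ;;
    mpich)   cmd=(mpiexec -n "$np" -bind-to core "$@") ;;
    *)
      echo "run_mpi: unsupported MPI flavor '$flavor'" >&2
      return 1
      ;;
  esac
  if [[ -n ${DRY_RUN:-} ]]; then
    echo "${cmd[*]}"    # show the launch line only
  else
    "${cmd[@]}"         # actually launch the MPI job
  fi
}
```

For example, DRY_RUN=1 run_mpi openmpi "$(nproc)" ./compact_debruijn_graph_fastq_A4_K31 reads.fastq (reads.fastq is a placeholder) prints the command without running it; drop DRY_RUN=1 to launch. nproc (GNU coreutils) reports the number of available cores. Building the command as a bash array keeps arguments with spaces intact.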
Since you were able to compile, I assume you have an MPI installation on
your system. If you let me know which MPI you are using, and which job
scheduler, if any, I can tell you more precisely which switches are needed.
Tony
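To answer the which-MPI and which-scheduler question, a heuristic shell sketch like the one below may help. classify_mpi matches the version banners that OpenMPI ("mpirun (Open MPI) ...") and MPICH's Hydra launcher ("HYDRA build details") commonly print, and detect_scheduler checks the job-ID variables that Slurm (SLURM_JOB_ID), PBS/Torque (PBS_JOBID) and LSF (LSB_JOBID) set inside a job. Both function names are hypothetical; Cray MPI and other setups will simply report "unknown" or "none".

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reporting which MPI and which job scheduler are in use.
# Heuristics only: they match common version banners and environment variables.

# classify_mpi VERSION_TEXT: map launcher version output to a flavor name.
classify_mpi() {
  case "$1" in
    *"Open MPI"*)    echo "openmpi" ;;
    *HYDRA*|*MPICH*) echo "mpich" ;;   # MPICH and Hydra-based derivatives
    *)               echo "unknown" ;;
  esac
}

# detect_scheduler: name the batch system of the current job, if any.
detect_scheduler() {
  if [[ -n ${SLURM_JOB_ID:-} ]]; then
    echo "slurm"
  elif [[ -n ${PBS_JOBID:-} ]]; then
    echo "pbs"
  elif [[ -n ${LSB_JOBID:-} ]]; then
    echo "lsf"
  else
    echo "none"
  fi
}
```

Typical use: classify_mpi "$(mpirun --version 2>&1)" on the machine where you compiled, and detect_scheduler from inside a batch job.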
On Thu, Jun 6, 2019 at 2:47 AM Shuang Qiu ***@***.*** wrote: