Smooth chain #272

Merged
merged 29 commits into from
Sep 20, 2024
29 commits
9740f8f
make the direction hard cutoff in chaining more correct
ekg Sep 12, 2024
41e84af
more stringent query based sort
ekg Sep 12, 2024
d7ca6aa
feat: Implement new chaining logic for mapping merging
ekg Sep 14, 2024
09242d8
feat: Apply chaining logic to second mapping and sort chain pairs in …
ekg Sep 14, 2024
869da99
feat: Modify mergeMappingsInRange function to improve merging logic
ekg Sep 14, 2024
68c9df1
simplify sorting and just keep best chain pair option around
ekg Sep 14, 2024
0e8bcc9
get rid of axis weighted distance
ekg Sep 14, 2024
fefb4ee
comment cleanup
ekg Sep 14, 2024
377f0c5
refactor: processChainWithSplits
ekg Sep 14, 2024
810ce0a
cleanup decision to chain
ekg Sep 14, 2024
cb94726
best-buddy chain merging
ekg Sep 14, 2024
d33a392
feat: Implement filtering based on maximally merged chains
ekg Sep 15, 2024
6401eba
feat: Restructure mapping and filtering logic
ekg Sep 15, 2024
29ceaa2
feat: add filterNonMergedMappings function
ekg Sep 15, 2024
90fe5b6
one codepath for mapping chain processing
ekg Sep 15, 2024
c7c171e
remove unused block coordinate info
ekg Sep 15, 2024
972222d
simplify maximally merged filter
ekg Sep 15, 2024
99b6681
fix: Modify FASTA file mapping to allow self-mapping
ekg Sep 15, 2024
0222496
feat: Implement improved mapping merging and filtering
ekg Sep 15, 2024
c4408b7
fix: Update distance calculation in range merging to account for stra…
ekg Sep 16, 2024
5fb526e
refactor: Improve distance calculation logic in computeMap.hpp
ekg Sep 16, 2024
c3e2b02
refactor: Ensure all fields in merged mappings are properly initialized
ekg Sep 16, 2024
c89f1b0
feat: Instrument the Lee plane sweep filter algorithm
ekg Sep 17, 2024
c67c73b
feat: Add detailed mapping information to debug statements
ekg Sep 17, 2024
7c14341
fix: Recalculate blockNucIdentity and handle edge cases in score calc…
ekg Sep 17, 2024
60ba144
fix: Remove debug statements from liFilterAlgorithm
ekg Sep 17, 2024
9bdef53
fix: Remove debug statements from markGood function
ekg Sep 17, 2024
cd0542e
fix: Correct calculation of query_end and target_end in tail patching
ekg Sep 18, 2024
240a0d3
add pafcheck to validation
ekg Sep 18, 2024
6 changes: 5 additions & 1 deletion .github/workflows/test_on_push.yml
@@ -53,8 +53,12 @@ jobs:
  override: true
  - name: Install wgatools
  run: cargo install --git https://github.com/wjwei-handsome/wgatools.git
+ - name: Install wgatools
+ run: cargo install --git https://github.com/ekg/pafcheck.git
  - name: Run wfmash and generate PAF
- run: build/bin/wfmash -t 8 -n 1 -k 19 -s 5000 -p 90 -c 30k -P 50k -T SGDref -Q S288C -Y '#' data/scerevisiae8.fa.gz data/scerevisiae8.fa.gz > test.paf
+ run: build/bin/wfmash -t 8 -T SGDref -Q S288C -Y '#' data/scerevisiae8.fa.gz > test.paf
+ - name: check PAF coordinates and extended CIGAR validity
+ run: pafcheck --query-fasta data/scerevisiae8.fa.gz --paf test.paf
  - name: Convert PAF to MAF using wgatools
  run: wgatools paf2maf --target data/scerevisiae8.fa.gz --query data/scerevisiae8.fa.gz test.paf > test.maf
  - name: Check if MAF file is not empty
14 changes: 7 additions & 7 deletions src/common/wflign/src/wflign_patch.cpp
@@ -1397,17 +1397,17 @@ void write_merged_alignment(
  query_pos = query_length;
  target_pos = target_length;

- // Add safety checks first
- uint64_t new_query_end = query_offset + query_length;
- uint64_t new_target_end = target_offset + target_length + actual_extension;
+ // Calculate new ends relative to the segment being aligned
+ uint64_t new_query_end = query_length;
+ uint64_t new_target_end = target_length + actual_extension;

- if (new_query_end > query_total_length || new_target_end > target_total_length) {
+ if (query_offset + new_query_end > query_total_length || target_offset + new_target_end > target_total_length) {
  std::cerr << "[wfmash::patch] Warning: Alignment extends beyond sequence bounds. Truncating." << std::endl;
  }

- // Adjust query_end and target_end, ensuring we don't exceed the total lengths
- query_end = std::min(new_query_end, query_total_length);
- target_end = std::min(new_target_end, target_total_length);
+ // Adjust query_end and target_end, ensuring we don't exceed the segment lengths
+ query_end = std::min(new_query_end, query_length);
+ target_end = std::min(new_target_end, target_length + actual_extension);
  }
  }
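
The hunk above switches the end coordinates from absolute positions to positions relative to the segment being aligned, and applies the offsets only when checking against the full sequence lengths. A minimal sketch of that distinction follows. The helper `clamp_segment_end` is hypothetical and not part of wfmash, and unlike the code in the diff (which warns and then clamps to the segment length) it actually shortens the end when the segment would overrun the sequence. Treat it as an illustration of the coordinate convention, not as the patch's behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of wfmash: returns the end coordinate of an
// alignment relative to its segment, shortened if the segment would run past
// the end of the full sequence.
//   offset       start of the segment within the full sequence
//   segment_len  length of the aligned segment (including any extension)
//   total_len    length of the full sequence
inline uint64_t clamp_segment_end(uint64_t offset,
                                  uint64_t segment_len,
                                  uint64_t total_len) {
    assert(offset <= total_len);  // the segment must start inside the sequence
    // Room left between the segment start and the end of the sequence.
    const uint64_t available = total_len - offset;
    return std::min(segment_len, available);
}
```

With this convention the returned value is always segment-relative, so a caller adds `offset` back only when an absolute coordinate is needed.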
3 changes: 1 addition & 2 deletions src/interface/parse_args.hpp
@@ -237,8 +237,7 @@ void parse_args(int argc,

  // If there are no queries, go in all-vs-all mode with the sequences specified in `target_sequence_file`
  if (target_sequence_file && map_parameters.querySequences.empty()) {
- map_parameters.skip_self = true;
- std::cerr << "[mashmap] Skipping self mappings for single file all-vs-all mapping." << std::endl;
+ std::cerr << "[mashmap] Performing all-vs-all mapping including self mappings." << std::endl;
  map_parameters.querySequences.push_back(map_parameters.refSequences.back());
  align_parameters.querySequences.push_back(align_parameters.refSequences.back());
  }
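
The change above keeps the single-file all-vs-all fallback but no longer forces `skip_self`. The fallback can be sketched in isolation as below. `MapParams` and `default_to_all_vs_all` are hypothetical, simplified stand-ins introduced for illustration. Only the field names `refSequences`, `querySequences` and `skip_self` come from the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced stand-in for the mapping parameters in the diff.
struct MapParams {
    std::vector<std::string> refSequences;
    std::vector<std::string> querySequences;
    bool skip_self = false;
};

// If no query file was given, reuse the last target file as the query
// (all-vs-all mode). skip_self is deliberately left untouched, so self
// mappings are reported unless the caller disabled them elsewhere.
// Returns true when the fallback was applied.
inline bool default_to_all_vs_all(MapParams& p) {
    if (p.refSequences.empty() || !p.querySequences.empty()) {
        return false;
    }
    p.querySequences.push_back(p.refSequences.back());
    assert(p.querySequences.size() == 1);  // exactly one query file was added
    return true;
}
```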
6 changes: 2 additions & 4 deletions src/map/include/base_types.hpp
@@ -157,10 +157,6 @@ namespace skch
  seqno_t refSeqId; //internal sequence id of the reference contig
  seqno_t querySeqId; //internal sequence id of the query sequence
  offset_t blockLength; //the block length of the mapping
- offset_t blockRefStartPos;
- offset_t blockRefEndPos;
- offset_t blockQueryStartPos;
- offset_t blockQueryEndPos;
  float blockNucIdentity;

  float nucIdentity; //calculated identity
@@ -178,6 +174,8 @@ namespace skch
  uint8_t discard; // set to 1 for deletion
  bool overlapped; // set to true if this mapping is overlapped with another mapping
  bool selfMapFilter; // set to true if a long-to-short mapping in all-vs-all mode (we report short as the query)
+ double chainPairScore; // best score for potential chain pair
+ double chainPairId is not used; int64_t chainPairId; // best partner mapping for potential chain pair

  offset_t qlen() { //length of this mapping on query axis
  return queryEndPos - queryStartPos + 1;
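
The two new fields, `chainPairScore` and `chainPairId`, let each mapping remember its best candidate chain partner, which fits the "best-buddy chain merging" commit in this PR. The sketch below shows one way such fields could be used. It assumes that lower scores are better and that two mappings are chained only when each names the other as its best partner. Neither assumption is confirmed by the hunks shown here, and `Mapping`, `consider_pair` and `are_best_buddies` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical, reduced mapping record; only the two fields added in this
// diff are modelled.
struct Mapping {
    double chainPairScore = std::numeric_limits<double>::max();  // best score so far (assumed: lower is better)
    int64_t chainPairId = -1;                                    // index of the best partner, -1 if none
};

// Record `partner` as the best chain partner of `m` if `score` improves on
// the best score seen so far.
inline void consider_pair(Mapping& m, int64_t partner, double score) {
    if (score < m.chainPairScore) {
        m.chainPairScore = score;
        m.chainPairId = partner;
    }
}

// Two mappings are "best buddies" when each names the other as its best partner.
inline bool are_best_buddies(const std::vector<Mapping>& ms, int64_t a, int64_t b) {
    assert(a >= 0 && b >= 0);
    const auto ia = static_cast<std::size_t>(a);
    const auto ib = static_cast<std::size_t>(b);
    assert(ia < ms.size() && ib < ms.size());
    return ms[ia].chainPairId == b && ms[ib].chainPairId == a;
}
```

Requiring the preference to be mutual means a mapping that is only the second choice of its neighbour is left unchained rather than pulled into a weaker chain.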