Ensure size_hint never exceeds graph_limit
If we have thousands of syntax nodes on both sides, we can end
up attempting to preallocate a very large hashmap.

In #542, a user hit an issue with two JSON files where the LHS had
33,000 syntax nodes and the RHS had 34,000 nodes, so we'd attempt to
preallocate a hashmap of capacity 1,122,000,000. This required
allocating 70,866,960,400 bytes (roughly 66 GiB).
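
The figures above can be checked with a short Rust sketch. The helper names `uncapped_size_hint` and `bytes_to_gib` are hypothetical and exist only for this illustration; the node counts and byte total are the ones quoted in this message.

```rust
// Node counts reported in #542.
const LHS_NODES: u64 = 33_000;
const RHS_NODES: u64 = 34_000;
// Total allocation size quoted in the commit message.
const REPORTED_BYTES: u64 = 70_866_960_400;

/// Hypothetical helper: the size hint as computed before this commit,
/// i.e. the plain product of the two node counts.
fn uncapped_size_hint(lhs: u64, rhs: u64) -> u64 {
    lhs * rhs
}

/// Hypothetical helper: convert a byte count to GiB (2^30 bytes).
fn bytes_to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

fn main() {
    let hint = uncapped_size_hint(LHS_NODES, RHS_NODES);
    println!("uncapped size hint: {hint}");
    println!("allocation: {:.1} GiB", bytes_to_gib(REPORTED_BYTES));
}
```

The product comes to 1,122,000,000 entries, and the reported byte count works out to about 66.0 GiB, which agrees with the numbers in the message.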

Impose a sensible limit on the hashmap.

Fixes #542
Wilfred committed Aug 5, 2023
1 parent c937f81 commit 892d4fd
Showing 2 changed files with 11 additions and 1 deletion.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,11 @@ prominent.

 Improved syntax highlighting for Java built-in types.

+### Diffing
+
+Fixed an issue with runaway memory usage when the two input files
+had a large number of differences.
+
 ## 0.49 (release 26th July 2023)

 ### Parsing
7 changes: 6 additions & 1 deletion src/diff/dijkstra.rs
@@ -205,7 +205,12 @@ pub fn mark_syntax<'a>(
 // graph whose size is roughly quadratic. Use this as a size hint,
 // so we don't spend too much time re-hashing and expanding the
 // predecessors hashmap.
-let size_hint = lhs_node_count * rhs_node_count;
+//
+// Cap this number to the graph limit, so we don't try to allocate
+// an absurdly large (i.e. greater than physical memory) hashmap
+// when there is a large number of nodes. We'll never visit more
+// than graph_limit nodes.
+let size_hint = std::cmp::min(lhs_node_count * rhs_node_count, graph_limit);

 let start = Vertex::new(lhs_syntax, rhs_syntax);
 let vertex_arena = Bump::new();
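
As a standalone illustration of the capping pattern in this hunk, here is a minimal sketch. It assumes a plain `std::collections::HashMap` with `u64` keys and values rather than the project's own predecessors map, and the helper name `capped_size_hint` and the `graph_limit` value are hypothetical. The use of `saturating_mul` is an addition of this sketch and is not part of the commit.

```rust
use std::collections::HashMap;

/// Hypothetical helper mirroring the capped size hint from the diff.
/// `saturating_mul` is this sketch's own addition: it keeps the product
/// from overflowing `usize` before the cap is applied.
fn capped_size_hint(lhs_node_count: usize, rhs_node_count: usize, graph_limit: usize) -> usize {
    std::cmp::min(lhs_node_count.saturating_mul(rhs_node_count), graph_limit)
}

fn main() {
    // Illustrative limit only; not taken from the source.
    let graph_limit: usize = 1_000_000;

    // Node counts from #542: the uncapped product would be 1,122,000,000.
    let size_hint = capped_size_hint(33_000, 34_000, graph_limit);

    // Preallocate at most `graph_limit` entries, however large the inputs.
    let predecessors: HashMap<u64, u64> = HashMap::with_capacity(size_hint);

    println!("size hint: {size_hint}");
    println!("capacity covers hint: {}", predecessors.capacity() >= size_hint);
}
```

For small inputs the product is below the limit and is used unchanged, so the cap only affects the pathological case described in the commit message.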