-
Notifications
You must be signed in to change notification settings - Fork 208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the O(n^3) performance issue in is_connected
utility function
#397
base: develop
Are you sure you want to change the base?
Conversation
Adhere to the specification in comment: // Is the undirected graph connected? // Is the directed graph strongly connected? Instead of checking if each vertex is reachable from each vertex (which is O(n^3)), consider different approaches for directed and undirected graphs: For an undirected graph check if each vertex was reachable with a single DFS walk. This runs in O(N) (modulo back edges). For a directed graph run Tarjan SCC algorithm and check if all vertices end up in a single component. This runs in O(N+E). The speed-up is considerable, e.g. for a 25x25 square connected graph it is about: time: undirected, connected - prev implementation - elapsed 16.458s time: undirected, connected - new implementation - elapsed 6.1249e-05s time: directed, connected - prev implementation - elapsed 7.45684s time: directed, connected - new implementation - elapsed 4.9894e-05s For similar 80x80 graph it's about 0.000563113s, while it just takes forever with the previous O(n^3) approach.
Adhere to the specification in function comment: // Is the undirected graph connected? // Is the directed graph strongly connected? Instead of checking if each vertex is reachable from each vertex (which is O(n^3)), consider different approaches for directed and undirected graphs: For an undirected graph check if each vertex was reachable with a single DFS walk. This runs in O(N) of a single connected component (modulo its back edges). For a directed graph run Tarjan SCC algorithm and check if all vertices end up in a single component. This runs in O(N + E). The speed-up is considerable, e.g. for a 25x25 square connected graph it is about: time: undirected, connected - prev implementation - elapsed 16.458s time: undirected, connected - new implementation - elapsed 6.1249e-05s time: directed, connected - prev implementation - elapsed 7.45684s time: directed, connected - new implementation - elapsed 4.9894e-05s For similar 80x80 graph it's about 0.000563113s, while it just takes forever with the previous O(n^3) approach.
…backward compatibility with the previous implementation
I've also opened #398 to track this. |
This resolves a build error due to boost::detail::get preferred without the using declaration: ../../../boost/pending/detail/disjoint_sets.hpp:58:24: error: no matching function for call to 'get(long unsigned int*&, int&)' 58 | assert(i == get(p, i)); ../../../boost/graph/reverse_graph.hpp:561:7: note: candidate: 'template<class E> E boost::detail::get(boost::detail::underlying_edge_desc_map_type<E>, const boost::detail::reverse_graph_edge_descriptor<EdgeDesc>&)' 561 | E get(underlying_edge_desc_map_type< E > m, | ^~~ ../../../boost/graph/reverse_graph.hpp:561:7: note: template argument deduction/substitution failed: ../../../boost/pending/detail/disjoint_sets.hpp:58:24: note: mismatched types 'boost::detail::underlying_edge_desc_map_type<E>' and 'long unsigned int*' 58 | assert(i == get(p, i)); | ~~~^~~~~~
I only have one question: how does someone with almost no activity history in GitHub know so much about Boost.Graph? :) |
A great question! Well, I read the book as soon at it was published ;) And Alexander Stepanov's recommendation meant a lot to me, I know him personally and appreciate his inventions a lot. I also ended up rolling up my own graph library before, because Boost was not welcome in my organization. As of GitHub, I hope to publish a few things soon, including the small project, where I encountered this error. One reason I actually hit it is that I go straight to the code first and resort to the documentation when the code doesn't work as I'd expect. Thanks for looking into this! |
For the record, this function was added in 86ef707 on |
Interesting. Did you also work with him? I'm also a great admirer of his work and managed to visit him once for a chat; unfortunately, Palo Alto is a long way from Australia. Anyway, my question was fairly flippant, so let me address an actual concern: maybe this function shouldn't exist at all? As you pointed out, the function is undocumented and not used anywhere in the code base that I can see, so we are free to reinvent it. |
No, we also only met for a few days.
Yes, I like this idea. Let me draft an implementation do discuss in greater details. I'd still consider it important to keep the interface backward-compatible. What do you think about it? Also is it possible to mark a function as deprecated in Boost? I only see the deprecated headers or comments for function deprecation. |
And one more question: I see |
The
is_connected()
function is implemented in terms ofis_rechable()
, effectively checking if each vertex is reachable from each other vertex in a nested loop. This quickly becomes terribly slow as the number of vertices and edges grow, becoming roughly unusable with a square connected graph of size 20x20.This request fixes the issue by
This reduces the algorithm complexity from cubic to linear and makes it possible to use it on large graphs. See also the commit message in 80ae041
This change is fully backward compatible with the previous implementation.
It also adds a new
is_connected.cpp
test. Feel free to run this test on the previous implementation. This test passes without the new algorithm, but picks the size of 20x20 square to be considerably slow. This test also accepts an integer as argv[1], the size of the square, so it is possible to observe the slow-down growing as the size increases (the default is 20).Please see an excerpt of my measurement below (I can upload this program as well). The previous implementation quickly becomes impracticably slow.
I note, that
is_connected
is not documented and is not widely used. Still I consider it to be important to fully adhere to the interface specified in comment and preserve the compatibility, as it could have been used in non-public projects. In short one important decision here is to avoid any breaking changes.I foresee a few concerns with this request, which I'll be happy to discuss and address if possible:
graph_utility.hpp
header file seems to be designed as low-dependency. Including < vector > and <strong_components.hpp> there seems to violate that choice. But I think the practical aspect of this is reasonably minimal; while fixing the issue fully preserving the backward compatibility is important in practice.strong_components
do not accept the colormap and pass it to DFS. I don't see this as a big concern. This can be addressed in 2 ways: updating SCC interface to pass the colormap to DFS, or (probably better) provide the version ofis_connected
with a single argument and no colormap.VertexIndexGraph
concept. (I think) This can be fixed by usingmap
instead ofvector
for the component map (comp_map
) in case the graph is not VertexIndexGraph. I'm not sure, how important is this, but this seems to be a deviation from the full backward compatibility.I consider a bit of follow-up work on this, which can also be done in a separate pull request:
is_connected(const Graph& g)
. This will require including the colormap header into the graph_utilityis_connected
(and other utility functions likeis_reachable
)Measurement: