Basic operations - style and syntax #3
Replies: 3 comments
-
Maybe I can add some 'Design Considerations for Graphoscope'
One important decision to make is whether to use generic types or interfaces for representing nodes and edges in our graph library. This choice will impact the flexibility and reusability of our library. By utilizing generic types, we can create a more generic and flexible solution, allowing users to define their own custom node and edge types. On the other hand, using interfaces can provide a more structured approach and enforce specific behavior and constraints.
Another consideration is the choice between utilizing the matrix representation provided by FSharp.Stats or implementing our own individual representation. It is worth noting that CPU and GPU support is expected to be introduced in FSharp.Stats. Therefore, we need to evaluate whether leveraging the FSharp.Stats matrix representation will align with our performance requirements and provide the necessary computational capabilities.
We need to decide whether our graph library should aim for a complete implementation of all algorithms on each graph implementation, regardless of their performance. Alternatively, we can opt for a selective approach, implementing algorithms based on the data structure that offers the best performance. This decision will impact the breadth of algorithms available, and the efficiency of their execution might be the responsibility of the user.
The choice between using modules or classes with static functions for organizing our codebase is another crucial consideration. It is important to weigh the pros and cons of each approach and determine which one aligns better with our library's goals and requirements.
The decision of whether to use overloaded functions, optional parameters, or avoid such features altogether also needs to be addressed. Overloaded functions can provide different method signatures for improved expressiveness; however, they can be tedious to use in pure F#. It is essential to find the right balance that promotes clarity and maintainability in our graph library. By discussing these design considerations, we can collectively make informed decisions that will shape the design and functionality of our graph library. Each choice we make will have implications for the library's flexibility, performance, and ease of use. It is important to weigh the trade-offs and consider the long-term implications of our decisions to create a robust and efficient graph library that meets the needs of our users. Your insights and suggestions on these considerations are highly appreciated. Let's have a productive discussion! |
Beta Was this translation helpful? Give feedback.
-
Additionally, it might be beneficial to discuss the organizational structure of the Graphoscope library. Normally this is quite important for usability. In a first attempt I would suggest building the library model centric. This would mean: • graph data structures on root level |
Beta Was this translation helpful? Give feedback.
-
After using the library a bit I can provide my opinion from the perspective of an F# newbie, so someone who might want to give a shot to our library coming from python, R, or Julia. I have more thoughts, but I think these are the most general and impactful ones. I also don't necessarily want to advocate strongly for my own position, but just to make sure we are thinking about these issues and we are making a motivated choice.
If functions that provide logically equivalent results accept different inputs or provide different outputs, this might lead to puzzling mistakes by the users. One example of inconsistent output is Measures.Degree.maximum which returns a float for DiGraph, but an int for FGraph. One example of inconsistent input is for shortest paths. Dijkstra requires node ids, but FloydWarshall implicitly uses a 0-index output. If I have an 1-indexed input, to get the path lengths for the same node I need to specify 4 for Dijkstra, but 3 to index FloydWarshall's output.
The user will spend most of their time with Graphoscope not writing code but reading the docs. In one example, the Dijkstra documentation says that it will show how to "find shortest path between all the vertices", but the function will return the path lengths that originate from a specific node. The user might want the actual paths, and might expect to find them all, not only from a single origin, based on what is written.
Point raised by Timo above, I'm not a huge fan of overloaded functions. As an example, when finding shortest paths, one could be tempted to have an overloaded function that could take an origin and destination node, just the origin, just the destination, or neither, with the omitted parameter being implicitly replaced by "all nodes". But I think having four separate functions with different names (AllShortestPaths, SingleOriginShortestPaths, SingleDestinationShortestPaths, OriginDestinationShortestPaths) is more user friendly. If we add the above consideration, we have 8 functions (one to return just the length, the other returning the actual path), rather than an overly complicated function with 8 possible parameter combinations.
What I said above is a bit against the current method naming philosophy, which is method-oriented (the user is responsible for choosing which algorithm they want to run). My proposal implies that there is a Algorithms.ShortestPaths.* structure, where "All" would be using FloydWarshall and the rest Dijkstra, without the user really being exposed to which specific algorithm is running under the hood. So I'm being task-oriented: I just want the shortest paths, I don't need to know how they are computed. I don't know how much we already discussed this.
If we want to have happy users, we need to put them in the best conditions of not making silly mistakes. I imported a graph incorrectly and I was puzzled at some of the results I obtained. This would have been avoided if at the time of the import, the system would have kept me aware of how many nodes and edges the imported graph had. The documentation does this, but the "Successfully imported the graph!" message did not appear to me when using VS Code. |
Beta Was this translation helpful? Give feedback.
-
There is an open question on which ‘Basic’ operations each graph representation should support and how these basic operations should be implemented.
Basic operations would include things like GetEdge, GetDegree, Get OutDegree etc.
The core data structure would have a complete set of these operations for all graphs types, and all representations will have a conversion function to this core type, but we need to define a minimum required set for other representations.
We also need to decide how these operations should be implemented. Ideally the implementation would support pipelining and idiomatic f#. This touches on other design considerations. For example, If the graphs representations are mutable should we return the updated graph from each operation? If the primary use environment of the lib is interactive(.fsx) and notebooks perhaps we should limit mutation so user's dont have to keep track of so much session scope. Maybe we could offer in place mutable functions for large graphs, while for smaller graphs users can stick to immutable style. Perhaps the plotly.net api could provide a good model to follow.
Further discussion needed!
Beta Was this translation helpful? Give feedback.
All reactions