Consider updating/extending the matrix.type heuristics #62

Open
CarterButts opened this issue Jun 14, 2021 · 0 comments

From a different thread:

Yeah, the matrix type heuristics have to make some judgment calls, and those are tricky in some cases. Currently, a square matrix is always assumed to be an adjacency matrix if not specified (since it usually is); if it has an n attribute that doesn't match the dimension, then that flags as an error. Regular edgelists with two edges, or sna edgelists with three edges, can be hard to distinguish from valued adjacency matrices. It may be worth revisiting those heuristics (especially since we weren't using extra attributes, IIRC, when they were first created); the help does specify that they are dubious, but one would like them to be as smart as they can reasonably be under the circumstances. One such heuristic might be that if a square matrix has 2 or 3 columns, an n attribute not matching the dimension, and the first two columns contain only values in 1:n, then it's probably an edgelist.

As background, coercing a matrix to a network requires knowing what type of matrix (sociomatrix, edgelist, or incidence matrix) is involved. This is normally specified by the user, but when the user does not specify it, we fall back to automagic solutions. Presently, this is controlled by the function which.matrix.type. Unfortunately, it is not always possible to determine the matrix type unambiguously from the data itself, so we rely on a series of heuristics that are based in part on common use cases (in fairness, we warn the user about this in the help pages). Over time, our most common use cases have evolved, so some heuristics may no longer be ideal (or, more broadly, we can do better). In particular, edgelists have gone from being a relatively uncommon data type to an extremely widely used one, so it is useful to be sure that we handle them well.

The proximate issue here has to do with edgelists that happen to form square matrices (two-edge networks for conventional two-column edgelists, and three-edge networks for three-column sna edgelists). The current heuristics assume that a square matrix is an adjacency matrix, which is almost always the right answer; however, it would be useful to be able to spot more of these odd cases. The quote above supplies a suggestion in that regard, which may be worth implementing. I am opening the issue mostly so that I don't forget, but also in case others have more cases that we'd like the heuristics to cover.
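
To make the suggested rule concrete, here is a minimal sketch of it in Python, used only as a language-neutral illustration; it is not the package's R code and not how which.matrix.type is implemented. The function name guess_square_matrix_type and the explicit n argument (standing in for the matrix's n attribute) are hypothetical. The sketch also simply returns "adjacency" when the rule does not fire, whereas the current behavior described above flags a mismatched n as an error.

```python
def guess_square_matrix_type(rows, n=None):
    """Guess whether a square matrix is an adjacency matrix or an edgelist.

    rows: the matrix as a list of equal-length rows.
    n:    hypothetical stand-in for the matrix's "n" attribute
          (the network size), or None if no such attribute is present.
    """
    nrow = len(rows)
    ncol = len(rows[0]) if rows else 0
    if nrow != ncol:
        raise ValueError("this sketch only covers square matrices")

    def is_vertex_id(v):
        # A vertex id must be a whole number in 1..n.
        return float(v).is_integer() and 1 <= v <= n

    looks_like_edgelist = (
        ncol in (2, 3)       # 2 columns: plain edgelist; 3 columns: sna edgelist
        and n is not None    # the rule needs the n attribute
        and n != nrow        # n disagrees with the matrix dimension
        and all(is_vertex_id(v) for row in rows for v in row[:2])
    )
    # Default to the existing assumption: a square matrix is an adjacency matrix.
    return "edgelist" if looks_like_edgelist else "adjacency"


# Two edges on three vertices: a 2x2 matrix that is really an edgelist.
print(guess_square_matrix_type([[1, 2], [2, 3]], n=3))
# Same shape, but no n attribute to contradict the adjacency reading.
print(guess_square_matrix_type([[0, 1], [1, 0]]))
```

Only the first two columns are checked against 1:n, so a third column of edge values in an sna-style edgelist does not affect the outcome. Without an n attribute the rule never fires, which preserves the current default for the common case.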
