Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] cor_sort() can deal with non-square matrices #334

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
64 changes: 60 additions & 4 deletions R/cor_sort.R
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@
)

# Make sure Parameter columns are character
# Was added to fix a test, but makes the function not work
# (See https://github.com/easystats/correlation/issues/259)
# reordered$Parameter1 <- as.character(reordered$Parameter1)
# reordered$Parameter2 <- as.character(reordered$Parameter2)

Expand Down Expand Up @@ -76,8 +78,13 @@

#' @export
cor_sort.matrix <- function(x, distance = "correlation", hclust_method = "complete", ...) {
col_order <- .cor_sort_order(x, distance = distance, hclust_method = hclust_method, ...)
reordered <- x[col_order, col_order]
if(isSquare(x) & all(colnames(x) %in% rownames(x))) {

Check warning on line 81 in R/cor_sort.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/cor_sort.R,line=81,col=5,[spaces_left_parentheses_linter] Place a space before left parenthesis, except in a function call.

Check warning on line 81 in R/cor_sort.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/cor_sort.R,line=81,col=18,[vector_logic_linter] Use `&&` in conditional expressions.

Check warning on line 81 in R/cor_sort.R

View workflow job for this annotation

GitHub Actions / lint / lint

file=R/cor_sort.R,line=81,col=5,[spaces_left_parentheses_linter] Place a space before left parenthesis, except in a function call.

Check warning on line 81 in R/cor_sort.R

View workflow job for this annotation

GitHub Actions / lint / lint

file=R/cor_sort.R,line=81,col=18,[vector_logic_linter] Use `&&` in conditional expressions.
i <- .cor_sort_square(x, distance = distance, hclust_method = hclust_method, ...)
} else {
i <- .cor_sort_nonsquare(x, distance = "euclidean", ...)
}

reordered <- x[i$row_order, i$col_order]

# Restore class and attributes
attributes(reordered) <- utils::modifyList(
Expand All @@ -91,7 +98,7 @@
# Utils -------------------------------------------------------------------


.cor_sort_order <- function(m, distance = "correlation", hclust_method = "complete", ...) {
.cor_sort_square <- function(m, distance = "correlation", hclust_method = "complete", ...) {
if (distance == "correlation") {
d <- stats::as.dist((1 - m) / 2) # r = -1 -> d = 1; r = 1 -> d = 0
} else if (distance == "raw") {
Expand All @@ -101,5 +108,54 @@
}

hc <- stats::hclust(d, method = hclust_method)
row.names(m)[hc$order]
row_order <- row.names(m)[hc$order]
list(row_order = row_order, col_order = row_order)
}


.cor_sort_nonsquare <- function(m, distance = "euclidean", ...) {
# Step 1: Perform clustering on rows and columns independently
row_dist <- dist(m, method = distance) # Distance between rows
col_dist <- dist(t(m), method = distance) # Distance between columns

row_hclust <- stats::hclust(row_dist, method = "average")
col_hclust <- stats::hclust(col_dist, method = "average")

# Obtain clustering orders
row_order <- row_hclust$order
col_order <- col_hclust$order

# Reorder matrix based on clustering
clustered_matrix <- m[row_order, col_order]

# Step 2: Refine alignment to emphasize strong correlations along the diagonal
n_rows <- nrow(clustered_matrix)
n_cols <- ncol(clustered_matrix)

used_rows <- logical(n_rows)
refined_row_order <- integer(0)

for (col in seq_len(n_cols)) {
max_value <- -Inf
best_row <- NA

for (row in seq_len(n_rows)[!used_rows]) {
if (abs(clustered_matrix[row, col]) > max_value) {
max_value <- abs(clustered_matrix[row, col])
best_row <- row
}
}

if (!is.na(best_row)) {
refined_row_order <- c(refined_row_order, best_row)
used_rows[best_row] <- TRUE
}
}

# Append any unused rows at the end
refined_row_order <- c(refined_row_order, which(!used_rows))

# Apply
m <- clustered_matrix[refined_row_order, ]
list(row_order = rownames(m), col_order = colnames(m))
}
3 changes: 2 additions & 1 deletion R/display.R
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
#' @name display.easycormatrix
#'
#' @description Export tables (i.e. data frame) into different output formats.
#' `print_md()` is a alias for `display(format = "markdown")`.
#' `print_md()` is a alias for `display(format = "markdown")`. Note that
#' you can use `format()` to get the formatted table as a dataframe.
#'
#' @param object,x An object returned by
#' [`correlation()`][correlation] or its summary.
Expand Down
17 changes: 17 additions & 0 deletions tests/testthat/test-cor_sort.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
test_that("cor_sort", {
r <- cor(mtcars)
expect_equal(as.numeric(diag(r)), rep(1, ncol(mtcars)))

Check warning on line 3 in tests/testthat/test-cor_sort.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=tests/testthat/test-cor_sort.R,line=3,col=3,[expect_identical_linter] Use expect_identical(x, y) by default; resort to expect_equal() only when needed, e.g. when setting ignore_attr= or tolerance=.
# heatmap(r, Rowv = NA, Colv = NA)

r1 <- cor_sort(r)
expect_equal(as.numeric(diag(r)), rep(1, ncol(mtcars)))

Check warning on line 7 in tests/testthat/test-cor_sort.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=tests/testthat/test-cor_sort.R,line=7,col=3,[expect_identical_linter] Use expect_identical(x, y) by default; resort to expect_equal() only when needed, e.g. when setting ignore_attr= or tolerance=.
# heatmap(r1, Rowv = NA, Colv = NA)

r2 <- cor(mtcars[names(mtcars)[1:5]], mtcars[names(mtcars)[6:11]])
expect_equal(rownames(r2), names(mtcars)[1:5])

Check warning on line 11 in tests/testthat/test-cor_sort.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=tests/testthat/test-cor_sort.R,line=11,col=3,[expect_identical_linter] Use expect_identical(x, y) by default; resort to expect_equal() only when needed, e.g. when setting ignore_attr= or tolerance=.
# heatmap(r2, Rowv = NA, Colv = NA)

r3 <- cor_sort(r2)
expect_equal(all(rownames(r3) == names(mtcars)[1:5]), FALSE)

Check warning on line 15 in tests/testthat/test-cor_sort.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=tests/testthat/test-cor_sort.R,line=15,col=3,[expect_identical_linter] Use expect_identical(x, y) by default; resort to expect_equal() only when needed, e.g. when setting ignore_attr= or tolerance=.

Check warning on line 15 in tests/testthat/test-cor_sort.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=tests/testthat/test-cor_sort.R,line=15,col=3,[expect_true_false_linter] expect_false(x) is better than expect_equal(x, FALSE)
# heatmap(r3, Rowv = NA, Colv = NA)
})

Check warning on line 17 in tests/testthat/test-cor_sort.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=tests/testthat/test-cor_sort.R,line=17,col=3,[trailing_blank_lines_linter] Add a terminal newline.
Loading