`GCXS` matmul => slice leads to incorrect results #607

matbryan52 · 2023-09-25T06:58:13Z

Describe the bug
Performing matmul @ between two GCXS arrays then slicing the output can lead to incorrect results, most commonly an array of all zeros.

To Reproduce

import numpy as np
from sparse import GCXS
# Two matmul compatible arrays with no zeros
left = GCXS.from_numpy(np.ones((4, 16)))
right = GCXS.from_numpy(np.ones((16, 2)))
dotprod = left @ right
assert dotprod.nnz == dotprod.size  # result all nonzeros
dotprod_dense = dotprod.todense()
# Slice to get the first row of the result
print(dotprod_dense[0])
# > [16. 16.]  # correct result, reduced dimension was 16
print(dotprod[0].todense())
# > [0. 0.]  # sliced result is all zeros
assert all(dotprod[i].nnz == 0 for i in range(dotprod.shape[0]))  # Every row is empty !

Expected behavior
Slicing the sparse result should be consistent with slicing the dense result.

System

OS and version: Ubuntu 20
sparse version 0.14.0
NumPy version 1.24.3
Numba version 0.57.0

Additional context
I wonder if there is a link to #602 which also refers to slicing? The result array has an unsorted indices (array([1, 0, 1, 0, 1, 0, 1, 0]) with indptr array([0, 2, 4, 6, 8])).

The text was updated successfully, but these errors were encountered:

SudhanshuJoshi09 · 2023-09-27T17:31:17Z

Hey @matbryan52, I went through the observation you shared. I believe the problem might be related to left @ right.

Updating the indices after the dot product resulted in the following responses which might be inline with our expected result:

Output - 01

import numpy as np
from sparse import GCXS
import sparse as sp

left = GCXS.from_numpy(np.ones((4, 16)))
right = GCXS.from_numpy(np.ones((16, 2)))
tmp1 = GCXS.from_numpy(16*np.ones((4, 2)), fill_value=0, compressed_axes=(0,))
tmp1.fill_value = 0

dotprod = left @ right

dotprod.indices = tmp1.indices

print('-'*90)
print(tmp1)
print(dotprod)
print('-'*90)

print(tmp1.data)
print(dotprod.data)
print('-'*90)

x = tmp1[0]
y = dotprod[0]

print(x.data)
print(y.data)
print('-'*90)

print(x.data.shape)
print(y.data.shape)
print('-'*90)

print(x)
print(y)
print('-'*90)

print(x.data)
print(y.data)
print('-'*90)

print(vars(tmp1))
print(vars(dotprod))
print('-'*90)

Output - 01

------------------------------------------------------------------------------------------
<GCXS: shape=(4, 2), dtype=float64, nnz=8, fill_value=0, compressed_axes=(0,)>
<GCXS: shape=(4, 2), dtype=float64, nnz=8, fill_value=0, compressed_axes=(0,)>
------------------------------------------------------------------------------------------
[16. 16. 16. 16. 16. 16. 16. 16.]
[16. 16. 16. 16. 16. 16. 16. 16.]
------------------------------------------------------------------------------------------
[16. 16.]
[16. 16.]
------------------------------------------------------------------------------------------
(2,)
(2,)
------------------------------------------------------------------------------------------
<GCXS: shape=(2,), dtype=float64, nnz=2, fill_value=0, compressed_axes=None>
<GCXS: shape=(2,), dtype=float64, nnz=2, fill_value=0, compressed_axes=None>
------------------------------------------------------------------------------------------
[16. 16.]
[16. 16.]
------------------------------------------------------------------------------------------
{'data': array([16., 16., 16., 16., 16., 16., 16., 16.]), 'indices': array([0, 1, 0, 1, 0, 1, 0, 1]), 'indptr': array([0, 2, 4, 6, 8]), 'shape': (4, 2), '_compressed_axes': (0,), 'fill_value': 0}
{'data': array([16., 16., 16., 16., 16., 16., 16., 16.]), 'indices': array([0, 1, 0, 1, 0, 1, 0, 1]), 'indptr': array([0, 2, 4, 6, 8]), 'shape': (4, 2), '_compressed_axes': (0,), 'fill_value': 0}
------------------------------------------------------------------------------------------

while without replacing the indices I got this response .

Output - 02

------------------------------------------------------------------------------------------
<GCXS: shape=(4, 2), dtype=float64, nnz=8, fill_value=0, compressed_axes=(0,)>
<GCXS: shape=(4, 2), dtype=float64, nnz=8, fill_value=0, compressed_axes=(0,)>
------------------------------------------------------------------------------------------
[16. 16. 16. 16. 16. 16. 16. 16.]
[16. 16. 16. 16. 16. 16. 16. 16.]
------------------------------------------------------------------------------------------
[16. 16.]
[]
------------------------------------------------------------------------------------------
(2,)
(0,)
------------------------------------------------------------------------------------------
<GCXS: shape=(2,), dtype=float64, nnz=2, fill_value=0, compressed_axes=None>
<GCXS: shape=(2,), dtype=float64, nnz=0, fill_value=0, compressed_axes=None>
------------------------------------------------------------------------------------------
[16. 16.]
[]
------------------------------------------------------------------------------------------
{'data': array([16., 16., 16., 16., 16., 16., 16., 16.]), 'indices': array([0, 1, 0, 1, 0, 1, 0, 1]), 'indptr': array([0, 2, 4, 6, 8]), 'shape': (4, 2), '_compressed_axes': (0,), 'fill_value': 0}
{'data': array([16., 16., 16., 16., 16., 16., 16., 16.]), 'indices': array([1, 0, 1, 0, 1, 0, 1, 0]), 'indptr': array([0, 2, 4, 6, 8]), 'shape': (4, 2), '_compressed_axes': (0,), 'fill_value': 0}

would like to hear your take on this, can I pick this issue up ?

hameerabbasi · 2023-09-27T17:42:12Z

Anyone is welcome to pick up the issue. 😉

EuGig · 2023-12-02T17:31:50Z

Hi everyone, as @SudhanshuJoshi09 suggested the problem is related to left @ right.

Specifically, the method _dot_csr_csr in _common.py does not return ordered arrays of, neither indices nor data (This is explicitly stated in https://github.com/scipy/scipy/blob/master/scipy/sparse/sparsetools/csr.h, where the code for _dot_csr_csr has been borrowed from).

It seems that the algorithm returns indices and data in reversed order. Inverting only indices leads to incorrect results once todense() is called. If also data are inverted, everything seems correct.

Code

import numpy as np
from sparse import GCXS

left = GCXS.from_numpy(np.array([[1,2], [1,2]]))
right = GCXS.from_numpy(np.array(([[1,2,3], [4,5,6]])))
dotprod = left @ right

print("----------- dotprod before before inverting indices--------------")
print(f"Data: {dotprod.data}")
print(f"Indices: {dotprod.indices}")
print(f"Dense Matrix:\n{dotprod.todense()}")

print("----------- sliced vector before inverting indices-----------")
print(f"Data: {dotprod[0].data}")
print(f"Indices: {dotprod[0].indices}")

# inverting indices
ord_indices = dotprod.indices
dotprod.indices = ord_indices[::-1]

print("----------- sliced vector after inverting indices-----------")
print(f"Data: {dotprod[0].data}")
print(f"Indices: {dotprod[0].indices}")

print("----------- wrong dense matrix-----------")
print(f"Dense matrix of dotprod:\n{dotprod.todense()}")

print("----------- wrong dense vector-----------")
print(f"Dense matrix of dotprod[0]:\n{dotprod[0].todense()}")

#inverting also data
ord_data = dotprod.data
dotprod.data = ord_data[::-1]


print("----------- dense matrix after inverting also data--------")
print(f"Dense matrix of dotprod:\n{dotprod.todense()}")

print("----------- dense sliced vector after inverting also data--------")
print(f"Dense matrix of dotprod[0]:\n{dotprod[0].todense()}")

Output

----------- dotprod before before inverting indices--------------
Data: [15 12  9 15 12  9]
Indices: [2 1 0 2 1 0]
Dense Matrix:
[[ 9 12 15]
 [ 9 12 15]]
----------- sliced vector before inverting indices-----------
Data: []
Indices: []
----------- sliced vector after inverting indices-----------
Data: [15 12  9]
Indices: [0 1 2]
----------- wrong dense matrix-----------
Dense matrix of dotprod:
[[15 12  9]
 [15 12  9]]
----------- wrong dense vector-----------
Dense matrix of dotprod[0]:
[15 12  9]
----------- dense matrix after inverting also data--------
Dense matrix of dotprod:
[[ 9 12 15]
 [ 9 12 15]]
----------- dense sliced vector after inverting also data--------
Dense matrix of dotprod[0]:
[ 9 12 15]

So, the easiest solution would be to reverse indices and data in _dot_csr_csr before returning them. I don't know If this approach is the best regarding performance. What are your thoughts about that?

hameerabbasi · 2023-12-02T17:47:21Z

Thanks @EuGig for the detailed analysis. I'd be willing to accept a PR if you make one.

matbryan52 added the bug Indicates an unexpected problem or unintended behavior label Sep 25, 2023

EuGig mentioned this issue Dec 4, 2023

Fixing "GCXS matmul => slice leads to incorrect results" #611

Merged

hameerabbasi closed this as completed in #611 Dec 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`GCXS` matmul => slice leads to incorrect results #607

`GCXS` matmul => slice leads to incorrect results #607

matbryan52 commented Sep 25, 2023

SudhanshuJoshi09 commented Sep 27, 2023 •

edited

Loading

hameerabbasi commented Sep 27, 2023

EuGig commented Dec 2, 2023

hameerabbasi commented Dec 2, 2023

GCXS matmul => slice leads to incorrect results #607

GCXS matmul => slice leads to incorrect results #607

Comments

matbryan52 commented Sep 25, 2023

SudhanshuJoshi09 commented Sep 27, 2023 • edited Loading

hameerabbasi commented Sep 27, 2023

EuGig commented Dec 2, 2023

Code

Output

hameerabbasi commented Dec 2, 2023

`GCXS` matmul => slice leads to incorrect results #607

`GCXS` matmul => slice leads to incorrect results #607

SudhanshuJoshi09 commented Sep 27, 2023 •

edited

Loading