Add scan. #531

dcherian · 2024-07-31T18:06:46Z

Closes #277

dcherian · 2024-07-31T18:07:22Z

cubed/core/ops.py

+    #    Here we diverge from Blelloch, who runs a balanced tree algorithm to calculate the scan.
+    #    Instead we generalize recursively apply the scan to `reduced`.
+    # 3a. First we merge to a decent intermediate chunksize since reduced.chunksize[axis] == 1
+    new_chunksize = min(reduced.shape[axis], reduced.chunksize[axis] * 5)


Need input here on choosing a new intermediate chunksize to rechunk to, based on memory info.

dcherian · 2024-07-31T18:07:30Z

cubed/core/ops.py

+        shape=scanned.shape,
+        dtype=scanned.dtype,
+        chunks=scanned.chunks,
+        extra_projected_mem=scanned.chunkmem * 2,  # arbitrary


Need input here too.

tomwhite

Are you going to add a user-facing cumulative_sum function from the Array API? This would be a good function for the unit tests to test.

tomwhite · 2024-08-01T09:40:18Z

cubed/core/ops.py

+        shape=scanned.shape,
+        dtype=scanned.dtype,
+        chunks=scanned.chunks,
+        extra_projected_mem=scanned.chunkmem * 2,  # arbitrary


This should be the memory allocated to read from the side inputs (scanned and increment here). We double the chunk memory to account for reading the compressed Zarr chunk, so the result would be

extra_projected_mem=scanned.chunkmem * 2 + increment.chunkmem * 2

(There's an open issue #288 to make this a bit more transparent.)
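As a rough illustration of the accounting described above, the sketch below sums twice the chunk memory of each side input. The helper name `side_input_projected_mem` and its argument are hypothetical and not part of cubed; only the "double each side input's chunk memory" rule comes from the comment.

```python
def side_input_projected_mem(side_input_chunkmems):
    """Estimate extra projected memory for reading side inputs.

    Hypothetical helper: each side input is counted twice, once for the
    compressed Zarr chunk that is read and once for the decompressed chunk.
    """
    return sum(2 * chunkmem for chunkmem in side_input_chunkmems)


# Example with made-up chunk sizes in bytes for `scanned` and `increment`.
scanned_chunkmem = 8_000_000
increment_chunkmem = 80_000
extra = side_input_projected_mem([scanned_chunkmem, increment_chunkmem])
```

With these made-up sizes, `extra` equals `scanned_chunkmem * 2 + increment_chunkmem * 2`.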

tomwhite · 2024-08-01T09:57:21Z

cubed/core/ops.py

+    #    Here we diverge from Blelloch, who runs a balanced tree algorithm to calculate the scan.
+    #    Instead we generalize recursively apply the scan to `reduced`.
+    # 3a. First we merge to a decent intermediate chunksize since reduced.chunksize[axis] == 1
+    new_chunksize = min(reduced.shape[axis], reduced.chunksize[axis] * 5)


There are a couple of things to consider here: the number of chunks to combine at each stage, and the memory limits.

The first is like split_every in reduction, where the default is 4, although 6 or 8 may be better for larger workloads.

For the second, we should make sure the new chunksize is no larger than (x.spec.allowed_mem - x.spec.reserved_mem) // 4, where the factor of 4 comes about because of the {compressed,uncompressed} * {input,output} copies.

There is an error case where this memory constraint means the new chunksize is no larger than the existing one, so the computation can't proceed. The user can fix this either by reducing the chunksize or by increasing the memory. This is similar to this case:

cubed/cubed/core/ops.py

Lines 985 to 991 in 88c5dc4

    # single axis: see how many result chunks fit in max_mem
    # factor of 4 is memory for {compressed, uncompressed} x {input, output}
    target_chunk_size = (max_mem - chunk_mem) // (chunk_mem * 4)
    if target_chunk_size <= 1:
        raise ValueError(
            f"Not enough memory for reduction. Increase allowed_mem ({allowed_mem}) or decrease chunk size"
        )
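One way the two considerations could be combined is sketched below. This is not code from the PR: the function `choose_scan_chunksize` and the parameter `bytes_per_index` (bytes held by one index along the scan axis within a chunk) are assumptions made for illustration. Only the `split_every`-style fan-in, the `(allowed_mem - reserved_mem) // 4` bound, and the error case come from the comment above.

```python
def choose_scan_chunksize(
    current_chunksize,
    axis_length,
    bytes_per_index,
    allowed_mem,
    reserved_mem,
    split_every=4,
):
    """Pick a merged chunksize along the scan axis (illustrative only)."""
    # Nothing to merge if the axis is already a single chunk.
    if current_chunksize >= axis_length:
        return axis_length
    # Factor of 4: {compressed, uncompressed} x {input, output} copies.
    max_chunk_bytes = (allowed_mem - reserved_mem) // 4
    # Largest extent along the axis whose chunk still fits in that budget.
    max_by_memory = max_chunk_bytes // bytes_per_index
    # Combine at most `split_every` existing chunks per stage.
    target = min(axis_length, current_chunksize * split_every, max_by_memory)
    if target <= current_chunksize:
        # The memory bound leaves no room to merge, so the scan cannot proceed.
        raise ValueError(
            f"Not enough memory for scan. Increase allowed_mem ({allowed_mem}) "
            "or decrease chunk size"
        )
    return target


# Made-up numbers: 8 MB per index along the axis, 2 GB allowed, 100 MB reserved.
new_chunksize = choose_scan_chunksize(1, 100, 8_000_000, 2_000_000_000, 100_000_000)
```

Here the fan-in is the binding limit, so four chunks are merged per stage. With a much smaller `allowed_mem` the memory bound would bind instead, and the function would raise.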

tomwhite · 2024-08-01T10:06:37Z

cubed/core/ops.py

+    """
+    # Blelloch (1990) out-of-core algorithm.
+    # 1. First, scan blockwise
+    scanned = blockwise(func, "ij", array, "ij", axis=axis)


Using map_blocks would be simpler and avoid the 2D assumption
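For orientation, the overall shape of the algorithm under review (scan each block, reduce each block to its total, scan the totals, then add the resulting offsets back) can be sketched in plain Python on a list of 1-D blocks. This is a simplified stand-in, not the PR's implementation: `blocked_scan` is a hypothetical name, blocks are assumed non-empty, and the scan over block totals is done directly rather than recursively with rechunking.

```python
from itertools import accumulate
import operator


def blocked_scan(blocks, binop=operator.add):
    """Inclusive scan over a sequence of non-empty 1-D blocks (illustrative)."""
    # 1. Scan each block independently.
    scanned = [list(accumulate(block, binop)) for block in blocks]
    # 2. Reduce each block to a single value: the last element of its scan.
    totals = [block[-1] for block in scanned]
    # 3. Scan the per-block totals (the PR applies the scan recursively here).
    offsets = list(accumulate(totals, binop))
    # 4. Combine: every block after the first is shifted by the running total
    #    of all blocks before it.
    result = [scanned[0]]
    for offset, block in zip(offsets, scanned[1:]):
        result.append([binop(offset, value) for value in block])
    return result


chunks = [[1, 2, 3], [4, 5], [6]]
cumulative = blocked_scan(chunks)
# cumulative == [[1, 3, 6], [10, 15], [21]]
```

Because step 1 only needs each block and the axis, it works the same way for any number of dimensions, which is the point of avoiding a hard-coded "ij" index string.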

tomwhite · 2024-08-01T10:08:20Z

cubed/core/ops.py

@@ -1442,3 +1443,120 @@ def smallest_blockdim(blockdims):
            m = ntd[0]
            out = ntd
    return out
+
+
+def wrapper_binop(


Maybe call something like _scan_binop to link it to the scan implementation? I've been using a naming convention like that elsewhere in the file.

Add scan.

179cbce

Closes cubed-dev#277

tomwhite mentioned this pull request Aug 1, 2024

Tree merge chunks #527

Closed
