
Creating a precomputed cloudvolume from an HDF5 file #613

Open
Magic-Ludo opened this issue May 31, 2024 · 1 comment
Labels: question

@Magic-Ludo
I have 3620 images in .tif format, totalling 1TB.

Here's their info:

Shape: (21796, 12876)
dtype: uint16

I'm looking to use Neuroglancer to visualize this dataset. To achieve this, I converted all the .tif files into a single .h5 file with the following script:

import os
import h5py
import tifffile as tiff
from tqdm import tqdm

tif_dir = "/path/to/tif/folder/"
output_file = "/path/to/file.h5"

image_files = sorted(
    [os.path.join(tif_dir, f) for f in os.listdir(tif_dir) if f.endswith(".tif")]
)

first_image = tiff.imread(image_files[0])
img_shape = first_image.shape

with h5py.File(output_file, "w") as h5f:
    # Create a dataset to hold the TIFF images
    dtype = first_image.dtype

    # Create the dataset with chunking and compression
    dataset = h5f.create_dataset(
        "images",
        shape=(len(image_files),) + img_shape,
        dtype=dtype,
        chunks=(1, img_shape[0] // 10, img_shape[1] // 10),  # Example chunking
        compression="gzip"
    )

    # Iterate through each TIFF file and write it to the HDF5 dataset
    # (image_files already holds full paths filtered to .tif)
    for i, filepath in enumerate(tqdm(image_files, desc="Processing TIFF files")):
        image_data = tiff.imread(filepath)
        dataset[i, :, :] = image_data
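As an optional sanity check, you can confirm the resulting dataset layout before converting (the expected values below are what I get from the inputs above):

import h5py

with h5py.File("/path/to/file.h5", "r") as h5f:
    ds = h5f["images"]
    # Expected: (3620, 21796, 12876), uint16, chunks (1, 2179, 1287)
    print(ds.shape, ds.dtype, ds.chunks)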

The generated .h5 file is very large (900 GB) and therefore cannot be loaded into RAM. From what I've read, you can precompute the contents of this file with CloudVolume so that it can be displayed in chunks on demand. I tried this with the following script:

import tifffile as tiff
import h5py
import os
from tqdm import tqdm
from cloudvolume import CloudVolume
from cloudvolume.lib import mkdir

tif_dir = "/path/to/tif/folder/"

image_files = sorted(
    [os.path.join(tif_dir, f) for f in os.listdir(tif_dir) if f.endswith(".tif")]
)

first_image = tiff.imread(image_files[0])
img_shape = first_image.shape


input_hdf5_file = "/path/to/file.h5"
dataset_name = "images"

output_dir = "file:///path/to/Neuroglancer/folder"
mkdir(output_dir.replace("file://", ""))

with h5py.File(input_hdf5_file, "r") as h5f:
    dataset = h5f[dataset_name]
    data_shape = dataset.shape
    dtype = dataset.dtype
    print(f"Dataset shape: {data_shape}")
    print(f"Dataset dtype: {dtype}")

    # Create a CloudVolume object for the Neuroglancer precomputed format
    vol = CloudVolume(
        output_dir,
        info={
            "type": "image",
            "data_type": str(dtype),
            "num_channels": 1,
            "scales": [
                {
                    "chunk_sizes": [1, img_shape[0] // 10, img_shape[1] // 10],
                    "encoding": "raw",
                    "key": "full_resolution",
                    "resolution": [1, 1, 1],
                    "size": list(data_shape),  # Neuroglancer expects (z, y, x) order
                }
            ],
            "voxel_offset": [0, 0, 0],
        },
    )
    vol.commit_info()
    print("CloudVolume info:")
    print(vol.info)

    # Write data to the Neuroglancer precomputed format
    for z in tqdm(range(data_shape[0]), desc="Converting to Neuroglancer format"):
        slice_data = dataset[z, :, :]
        print(f"slice_data shape: {slice_data.shape}")
        slice_data = slice_data.reshape((slice_data.shape[1], slice_data.shape[0], 1))
        vol[:, :, z] = slice_data

But when writing the data out, I run into a size incompatibility problem:

First image shape: (21796, 12876)
Dtype: uint16
Dataset shape: (3620, 21796, 12876)
Dataset dtype: uint16
CloudVolume info:
{'type': 'image', 'data_type': 'uint16', 'num_channels': 1, 'scales': [{'chunk_sizes': [1, 2179, 1287], 'encoding': 'raw', 'key': 'full_resolution', 'resolution': [1, 1, 1], 'size': [3620, 21796, 12876]}], 'voxel_offset': [0, 0, 0]}
Converting to Neuroglancer format:   0%|          | 0/3620 [00:02<?, ?it/s]
slice_data shape: (21796, 12876)

---------------------------------------------------------------------------
AlignmentError                            Traceback (most recent call last)
Cell In[18], line 65
     61         print(f"slice_data shape: {slice_data.shape}")
     62         slice_data = slice_data.reshape(
     63             (slice_data.shape[1], slice_data.shape[0], 1)
     64         )
---> 65         vol[:, :, z] = slice_data
     67 print("Conversion to Neuroglancer precomputed format completed.")

File ~/.conda/envs/SEG/lib/python3.10/site-packages/cloudvolume/frontends/precomputed.py:962, in CloudVolumePrecomputed.__setitem__(self, slices, img)
    959   imgshape = imgshape + [ self.num_channels ]
    961 if not np.array_equal(imgshape, slice_shape):
--> 962   raise exceptions.AlignmentError("""
    963     Input image shape does not match slice shape.
    964 
    965     Image Shape: {}  
    966     Slice Shape: {}
    967   """.format(imgshape, slice_shape))
    969 if self.autocrop:
    970   if not self.bounds.contains_bbox(bbox):

AlignmentError: 
        Input image shape does not match slice shape.

        Image Shape: [12876, 21796, 1, 1]  
        Slice Shape: [3620, 21796, 1, 1]

I've tried changing the dimensions or reversing the values for each slice, but I always end up with a problem like this...

Thanks for your help!

william-silversmith added the question label on May 31, 2024

@william-silversmith (Contributor)
Hi!

Thanks for writing in. Since you seem to be inserting these slices in the XY plane, the first thing I would try is setting the chunk size to [1024, 1024, 1]; currently it is a single pixel in the x direction. Second, try a transpose instead of a reshape (slice_data.T): reshape keeps the pixels in memory order and scrambles the image, whereas transpose actually swaps the axes. You can then do a np.squeeze. CloudVolume expects 4D arrays, so add two trivial dimensions to the end:

import numpy as np

slice_data = np.squeeze(slice_data.T)
while slice_data.ndim < 4:
    slice_data = slice_data[..., np.newaxis]
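One more thing worth noting: in the precomputed format, the info "size" field is ordered [x, y, z], while your HDF5 dataset shape is (z, y, x). That is why vol[:, :, z] expected an x-extent of 3620 in your traceback. Putting all of this together, a rough sketch of the conversion (untested, paths assumed from your snippets) might look like:

import h5py
import numpy as np
from tqdm import tqdm
from cloudvolume import CloudVolume

with h5py.File("/path/to/file.h5", "r") as h5f:
    dataset = h5f["images"]  # shape (z, y, x) = (3620, 21796, 12876)
    nz, ny, nx = dataset.shape

    # create_new_info builds a spec-compliant info dict;
    # volume_size and chunk_size are in (x, y, z) order.
    info = CloudVolume.create_new_info(
        num_channels=1,
        layer_type="image",
        data_type=str(dataset.dtype),
        encoding="raw",
        resolution=[1, 1, 1],
        voxel_offset=[0, 0, 0],
        volume_size=[nx, ny, nz],
        chunk_size=[1024, 1024, 1],
    )
    vol = CloudVolume("file:///path/to/Neuroglancer/folder", info=info)
    vol.commit_info()

    for z in tqdm(range(nz), desc="Converting to Neuroglancer format"):
        # transpose (y, x) -> (x, y), then pad to 4D: (x, y, 1, 1)
        slice_data = dataset[z, :, :].T[:, :, np.newaxis, np.newaxis]
        vol[:, :, z] = slice_data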

Let me know if this helps.
