Skeleton.from_swc breaks with KeyError parent value 0 #575

Open
manoaman opened this issue Feb 4, 2023 · 19 comments
manoaman commented Feb 4, 2023

Hi,

I've recently updated CloudVolume (8.18.1) and realized a couple of things broke while converting to precomputed format.

  1. Skeleton.from_swc() throws a KeyError at this line for swc files containing a parent_id value of 0.

I changed this line to avoid the error:

      if parent_id > 0:
        if vid < parent_id:
          edge = [vid, parent_id]
        else:
          edge = [parent_id, vid]

Sample data:

# id type x y z radius parent
1 1 10662.0 1035.0 4095.0 11.0 0
2 3 10650.999 1040.0 4098.258 5.0 1
3 3 10649.728 1042.636 4100.0 4.0 2
4 3 10648.539 1041.539 4102.154 3.0 3
5 3 10646.334 1041.0 4103.333 1.0 4
6 3 10645.667 1042.333 4105.445 3.0 5

In other cases, the parent value is -1 and those files convert ok. Would it be possible to allow a parent value of 0 in swc files?

  2. scales value in the info file

Has the "scales" definition changed lately? It seems I now need to supply the value as an array of dicts. Do you have a recent example of the placeholder I can use for skeletons?

        # info = {"@type": "neuroglancer_skeletons", "vertex_attributes": [{"id": "radius", "data_type": "float32","num_components": 1}], "scales":"um"}  
        info = {"@type": "neuroglancer_skeletons", "vertex_attributes": [{"id": "radius", "data_type": "float32","num_components": 1}], "scales":[{'key':'um','resolution':[1000,1000,1000]}]}

Thank you,
-m

@william-silversmith william-silversmith added the bug The code is not performing according to the design or a design flaw is seriously impacting users. label Feb 4, 2023
@william-silversmith (Contributor)

Hi m! Sorry, I was on vacation for a few weeks. I'll take a look into this soon.

@william-silversmith william-silversmith self-assigned this Feb 7, 2023

william-silversmith commented Feb 7, 2023

Hi m,

I was trying to follow this specification.
https://web.archive.org/web/20180423163403/http://research.mssm.edu/cnic/swc.html

n T x y z R P

n is an integer label that identifies the current point and increments by one from one line to the next.
...

P indicates the parent (the integer label) of the current point or -1 to indicate an origin (soma).

In the file you provided, does 0 refer to an origin? Unless this is a common thing to do, I would think these files are corrupt. What software generated them?

As for point 2, I might be outdated, but I'm not familiar with a "scales" parameter in the skeleton info file. I looked at the Neuroglancer specification and don't see it there. I think the "transform" parameter is more typically used for this.

https://github.com/google/neuroglancer/blob/master/src/neuroglancer/datasource/precomputed/skeletons.md

Can you help me understand how it is used and where it comes from?


manoaman commented Feb 7, 2023

Hi Will,

Welcome back and I hope you had a good vacation!!

> In the file you provided, does 0 refer to an origin? Unless this is a common thing to do, I would think these files are corrupt. What software generated them?

I believe 0 does refer to an origin. Unfortunately, the originating files are in a different format (.eswc) and could have come from scripts generating that format. I suppose I could rewrite these 0 values to -1 while converting the files into .swc to circumvent the errors.
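
For anyone hitting the same thing, a minimal sketch of that preprocessing step (assuming node ids start at 1, so a parent field of 0 can only mean the origin):

def remap_zero_parents(swc_text):
    # Rewrite parent id 0 as -1 so files that use 0 for the origin
    # conform to the SWC spec before Skeleton.from_swc parses them.
    out = []
    for line in swc_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            out.append(line)  # keep comments and blank lines as-is
            continue
        fields = stripped.split()
        if fields[-1] == '0':  # the parent id is the last column
            fields[-1] = '-1'
        out.append(' '.join(fields))
    return '\n'.join(out)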

> As for point 2, I might be outdated, but I'm not familiar with a "scales" parameter in the skeleton info file. I looked at the Neuroglancer specification and don't see it there. I think the "transform" parameter is more typically used for this.
> Can you help me understand how it is used and where it comes from?

Hmm... I thought "scales" was a remnant from other versions, but I get an error after taking it out:


swc_to_precomputed.py:38: in indexed_swc_to_precomputed
    cv = CloudVolume(f'file://{precomputed_path}', info=info)
/usr/local/var/pyenv/versions/3.8.5/envs/cloudvolume/lib/python3.8/site-packages/cloudvolume/cloudvolume.py:230: in __new__
    return REGISTERED_PLUGINS[path.format](**kwargs)
/usr/local/var/pyenv/versions/3.8.5/envs/cloudvolume/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/__init__.py:104: in create_precomputed
    return CloudVolumePrecomputed(
/usr/local/var/pyenv/versions/3.8.5/envs/cloudvolume/lib/python3.8/site-packages/cloudvolume/frontends/precomputed.py:55: in __init__
    self.mip = mip
/usr/local/var/pyenv/versions/3.8.5/envs/cloudvolume/lib/python3.8/site-packages/cloudvolume/frontends/precomputed.py:272: in mip
    self.config.mip = self.meta.to_mip(mip)
/usr/local/var/pyenv/versions/3.8.5/envs/cloudvolume/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/metadata.py:600: in to_mip
    scales = [ ",".join(map(str, scale)) for scale in self.available_resolutions ]
/usr/local/var/pyenv/versions/3.8.5/envs/cloudvolume/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/metadata.py:490: in available_resolutions
    return (s["resolution"] for s in self.scales)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <cloudvolume.datasource.precomputed.metadata.PrecomputedMetadata object at 0x11eaabf10>

    @property
    def scales(self):
>     return self.info['scales']
E     KeyError: 'scales'

/usr/local/var/pyenv/versions/3.8.5/envs/cloudvolume/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/metadata.py:430: KeyError

@william-silversmith (Contributor)

Hi m!

Thank you! I think if you can adjust the conversion script that would be best, but then again, I've never heard of eswc before. Can you link me to some documentation?

I'm not totally sure, but I think that error means you have the info file in the main directory. The skeleton info file should be in the skeletons directory. A normal image describing info file should be in the top level directory.
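
For illustration, that layout might look like this (paths are placeholders):

file:///path/to/dataset/
├── info          # image-style info containing "scales"
└── skeletons/
    ├── info      # {"@type": "neuroglancer_skeletons", ...}
    └── 1         # one file per skeleton id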

Will


manoaman commented Feb 8, 2023

> Thank you! I think if you can adjust the conversion script that would be best, but then again, I've never heard of eswc before. Can you link me to some documentation?

https://www.nature.com/articles/sdata2017207#MOESM122

> I'm not totally sure, but I think that error means you have the info file in the main directory. The skeleton info file should be in the skeletons directory. A normal image describing info file should be in the top level directory.

Interesting, that reminds me of how I created the skeleton's info file. I restructured attributes from the main directory's info schema, since CloudVolume requires an info to be passed in. I suspect the main directory's info template has changed since the older version I used. This works:

      print(f'Start converting {filename} ....')
      with open(os.path.join(os.getcwd(), filename), 'r') as file: # open in readonly mode
        data = file.read()
        skel = Skeleton.from_swc(data) # decode an SWC file        

        # info = {"@type": "neuroglancer_skeletons", "transform": [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0], "vertex_attributes": [{"id": "radius", "data_type": "float32","num_components": 1}]}
        info = {"@type": "neuroglancer_skeletons", "transform": [0, 1000, 0, 0, 0, -1000, 0, 0, 1000, 0, 0, 0], "vertex_attributes": [{"id": "radius", "data_type": "float32","num_components": 1}], "scales":[{'key':'um','resolution':[1000,1000,1000]}]}
        # info = {"@type": "neuroglancer_skeletons", "vertex_attributes": [{"id": "radius", "data_type": "float32","num_components": 1}], "scales":"um"}        
        # info = {"@type": "neuroglancer_skeletons", "vertex_attributes": [{"id": "radius", "data_type": "float32","num_components": 1}]}

        skel.id = os.path.splitext(os.path.basename(filename))[0]
        cv = CloudVolume(f'file://{precomputed_path}', info=info)

        # prepare for info file
        # cv.skeleton.meta.info['@type'] = 'neuroglancer_skeletons'
        # cv.skeleton.meta.info['vertex_attributes'] = [{'id': 'radius', 'data_type': 'float32','num_components': 1}]
        # del cv.skeleton.meta.info['sharding']
        # del cv.skeleton.meta.info['spatial_index']
        cv.skeleton.meta.info = info
        del cv.skeleton.meta.info['scales']
        cv.skeleton.meta.commit_info()

        # remove 
        skel.extra_attributes = [ 
            attr for attr in skel.extra_attributes 
            if attr['data_type'] in ('float32', 'float64')
        ]
        cv.skeleton.upload(skel)

@william-silversmith (Contributor)

Interesting. It looks like the 0 corresponds to the eswc format, but not to swc. The SWC format is defined in this paper: https://www.sciencedirect.com/science/article/pii/S0165027098000910

It looks like eswc is used in this field. If there's more demand for its support, I'll consider it, but I'd like to keep things to a tighter scope if possible.

As for the example above, I'm surprised that you need to add and delete the scales attribute. The simple assignment operator for the info shouldn't be triggering any additional behavior.
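
One side note on that add-then-delete dance: if cv.skeleton.meta.info is bound by plain attribute assignment (standard Python aliasing, nothing CloudVolume-specific is assumed here), the later del mutates the original dict as well:

info = {"@type": "neuroglancer_skeletons", "scales": [{"key": "um"}]}
meta_info = info            # stands in for cv.skeleton.meta.info = info
del meta_info['scales']     # removes 'scales' from `info` as well
assert 'scales' not in info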

@manoaman (Author)

Hi @william-silversmith

> I'm not totally sure, but I think that error means you have the info file in the main directory. The skeleton info file should be in the skeletons directory. A normal image describing info file should be in the top level directory.

> As for the example above, I'm surprised that you need to add and delete the scales attribute. The simple assignment operator for the info shouldn't be triggering any additional behavior.

This might be related. CloudVolume offers a feature to save skeletons. However, if the target datasource url contains only precomputed skeletons with an info file like the following, I get an error asking for the scales json attribute when initializing a CloudVolume object with vol = cloudvolume.CloudVolume(source.url, parallel=True, progress=True). There are no multi-resolution precomputed images or mesh directories (with their associated info files) in the same folder.

Is there a way to get around these errors about a valid info file format when only the data source url of standalone precomputed skeletons is provided? Perhaps a fake template for the scales json attribute? Any thoughts?

info

{"@type": "neuroglancer_skeletons", "transform": [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0], "vertex_attributes": [{"id": "radius", "data_type": "float32", "num_components": 1}],"segment_properties":"segment_properties"}

errors

Traceback (most recent call last):
  File "/usr/local/var/pyenv/versions/3.8.5/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/var/pyenv/versions/3.8.5/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/neuroglancer/tool/save_skeletons.py", line 77, in <module>
    main()
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/neuroglancer/tool/save_skeletons.py", line 71, in main
    save_skeletons(state=parsed_args.state,
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/neuroglancer/tool/save_skeletons.py", line 44, in save_skeletons
    vol = cloudvolume.CloudVolume(source.url, parallel=True, progress=True)
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/cloudvolume.py", line 230, in __new__
    return REGISTERED_PLUGINS[path.format](**kwargs)
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/__init__.py", line 104, in create_precomputed
    return CloudVolumePrecomputed(
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/frontends/precomputed.py", line 55, in __init__
    self.mip = mip
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/frontends/precomputed.py", line 272, in mip
    self.config.mip = self.meta.to_mip(mip)
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/metadata.py", line 600, in to_mip
    scales = [ ",".join(map(str, scale)) for scale in self.available_resolutions ]
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/metadata.py", line 490, in available_resolutions
    return (s["resolution"] for s in self.scales)
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/datasource/precomputed/metadata.py", line 430, in scales
    return self.info['scales']
KeyError: 'scales'

@william-silversmith (Contributor)

Oh yea, the entire system is designed around the images. You need to have a valid image info in the directory above the skeletons. We can maybe change that, but for now you just need to make a fake info.
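
For reference, a sketch of building such a fake info with CloudVolume's create_new_info helper (the values below are placeholders, not taken from this dataset):

from cloudvolume import CloudVolume

# Stub image info so CloudVolume can initialize; no image data is stored.
info = CloudVolume.create_new_info(
    num_channels=1,
    layer_type='segmentation',
    data_type='uint8',
    encoding='raw',
    resolution=[1000, 1000, 1000],  # placeholder resolution
    voxel_offset=[0, 0, 0],
    volume_size=[1, 1, 1],          # dummy size
    skeletons='skeletons',          # the skeletons subdirectory
)
cv = CloudVolume('file:///path/to/dataset', info=info)
cv.commit_info()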

@manoaman (Author)

Do you know if vertex_type gets lost during the conversion from swc to precomputed format? This part of the information seems to be empty when I try to download a skeleton.

Saving layer 'skeletons' object 1 -> skeletons_test/1.swc
Traceback (most recent call last):
  File "/usr/local/var/pyenv/versions/3.8.5/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/var/pyenv/versions/3.8.5/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/neuroglancer/tool/save_skeletons.py", line 77, in <module>
    main()
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/neuroglancer/tool/save_skeletons.py", line 71, in main
    save_skeletons(state=parsed_args.state,
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/neuroglancer/tool/save_skeletons.py", line 55, in save_skeletons
    data = skeleton.to_swc()
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/skeleton.py", line 1076, in to_swc
    skels = self.components()
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/skeleton.py", line 918, in components
    vtypes = skel.vertex_types[vert_idx]
IndexError: index 0 is out of bounds for axis 0 with size 0

@william-silversmith (Contributor)

Hi m,

Can you provide some sample data? I tried Skeleton.from_swc on the sample data you provided above (changing parent 0 to parent -1). The vertex types appear to be correctly recovered and then rendered when I call to_swc again.
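
For anyone who wants to reproduce the check, roughly (a sketch, assuming a local sample.swc):

from cloudvolume import Skeleton

with open('sample.swc') as f:  # the sample above, with parent 0 changed to -1
    skel = Skeleton.from_swc(f.read())

print(skel.vertex_types)  # populated from the T column
print(skel.to_swc())      # renders back to SWC text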

@manoaman (Author)

Hi Will,

I can provide the info files I'm using for the precomputed dataset. My doubts are about the construction of the info files, both for the skeleton and the fake one for the image. I'm not sure why I get the error on vertex_type; the converted file displayed ok in the Neuroglancer viewer.

The code I tried to run is from Neuroglancer's tool package, which I modified specifically for saving skeletons.

Usage: python -m neuroglancer.tool.save_skeletons --url 'https://.....' --output-dir skeletons_test

import argparse
import os
import sys

import neuroglancer
import neuroglancer.cli

try:
    import cloudvolume
except ImportError:
    print('cloud-volume package is required: pip install cloud-volume')
    sys.exit(1)

def save_skeletons(state, output_dir, output_format):
    for layer in state.layers:
        if not isinstance(layer.layer, neuroglancer.SegmentationLayer): continue
        if not layer.visible: return False
        for source in layer.source:
            if not source.url.startswith('precomputed://'):
                continue
            vol = cloudvolume.CloudVolume(source.url, parallel=True, progress=True)
            if len(layer.segments) == 0: continue
            get_skeleton_kwargs = {}
            for segment in layer.segments:
                output_path = os.path.join(output_dir, '%d.%s' % (segment, output_format))
                print('Saving layer %r object %s -> %s' % (layer.name, segment, output_path))
                os.makedirs(output_dir, exist_ok=True)
                skeleton = vol.skeleton.get(segment, **get_skeleton_kwargs)
                if isinstance(skeleton, dict):
                    skeleton = list(skeleton.values())[0]
                if output_format == 'swc':
                    data = skeleton.to_swc()
                elif output_format == 'precomputed':
                    data = skeleton.to_precomputed()
                with open(output_path, 'wb') as f:
                    f.write(data)
            return
    print('No segmentation layer found')
    sys.exit(1)


def main(args=None):
    ap = argparse.ArgumentParser()
    neuroglancer.cli.add_state_arguments(ap, required=True)
    ap.add_argument('--format', choices=['swc'], default='swc')
    ap.add_argument('--output-dir', default='.')
    parsed_args = ap.parse_args()
    save_skeletons(state=parsed_args.state,
                output_dir=parsed_args.output_dir,
                output_format=parsed_args.format)


if __name__ == '__main__':
    main()

[skeleton info]
{"@type": "neuroglancer_skeletons", "transform": [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0], "vertex_attributes": [{"id": "radius", "data_type": "float32", "num_components": 1}]}

[fake image info]

{
  "data_type": "uint8",
  "num_channels": 1,
  "scales": [
    {
      "chunk_sizes": [
        [
          128,
          128,
          128
        ]
      ],
      "encoding": "raw",
      "key": "1000_1000_1000",
      "resolution": [
        1000,
        1000,
        1000
      ],
      "sharding": {
        "@type": "neuroglancer_uint64_sharded_v1",
        "data_encoding": "gzip",
        "hash": "identity",
        "minishard_bits": 0,
        "minishard_index_encoding": "gzip",
        "preshift_bits": 10,
        "shard_bits": 12
      },
      "size": [
        17664,
        12032,
        8448
      ],
      "voxel_offset": [
        0,
        0,
        0
      ]
    }
  ],
  "type": "segmentation",
  "skeletons":"skeletons"
}

And the stack trace of the error:

Saving layer 'skeletons' object 1 -> skeletons_test/1.swc
Traceback (most recent call last):
  File "/usr/local/var/pyenv/versions/3.8.5/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/var/pyenv/versions/3.8.5/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/neuroglancer/tool/save_skeletons.py", line 77, in <module>
    main()
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/neuroglancer/tool/save_skeletons.py", line 71, in main
    save_skeletons(state=parsed_args.state,
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/neuroglancer/tool/save_skeletons.py", line 55, in save_skeletons
    data = skeleton.to_swc()
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/skeleton.py", line 1076, in to_swc
    skels = self.components()
  File "/usr/local/var/pyenv/versions/cloudvolume_test_385/lib/python3.8/site-packages/cloudvolume/skeleton.py", line 918, in components
    vtypes = skel.vertex_types[vert_idx]
IndexError: index 0 is out of bounds for axis 0 with size 0

@manoaman (Author)

This is the code I use to convert swc to precomputed format.

import glob
import os
import sys

from cloudvolume import CloudVolume, Skeleton

def swc_to_precomputed(base_dir, tgt_dir, transform_matrix=[1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]):
  if not os.path.isdir(base_dir):
    print(f'Not a directory {base_dir}')
    sys.exit(2)

  skels = {}
  id = 1
  for filename in glob.glob(base_dir+'/*.swc'):
    with open(os.path.join(os.getcwd(), filename), 'r') as file: # open in readonly mode
        data = file.read()
        skel = Skeleton.from_swc(data) # decode an SWC file
        skels[id] = skel
        id = id + 1

  try:
      print("Start of the SWC processing...")

      for id, skel in skels.items():
          info = {"@type": "neuroglancer_skeletons", "transform": transform_matrix, "vertex_attributes": [{"id": "radius", "data_type": "float32","num_components": 1}], "scales":[{'key':'um','resolution':[1000,1000,1000]}],"segment_properties":"segment_properties"}
          skel.id = id 
          cv = CloudVolume(f'file://{tgt_dir}/precomputed_swc', info=info)

          # prepare for info file
          cv.skeleton.meta.info['@type'] = 'neuroglancer_skeletons'
          # cv.skeleton.meta.info['transform'] = transform_matrix    # transformation matrix does not seem to position swc files in intended space
          cv.skeleton.meta.info['vertex_attributes'] = [{'id': 'radius', 'data_type': 'float32','num_components': 1}]
          del cv.skeleton.meta.info['sharding']
          del cv.skeleton.meta.info['spatial_index']
          cv.skeleton.meta.commit_info()

          # remove 
          skel.extra_attributes = [ 
              attr for attr in skel.extra_attributes 
              if attr['data_type'] in ('float32', 'float64')
          ]    

          cv.skeleton.upload(skel)


  except IOError as err:
    errno, strerror = err.args
    print ('I/O error({0}): {1}'.format(errno, strerror))
    print (err)
  except ValueError as ve:
    print ('Could not convert data to an integer.')
    print (ve) 
  except:
    print ('Unexpected error:', sys.exc_info()[0])
    raise  

@william-silversmith (Contributor)

Hi m,

At a glance, the problem seems to be that vertex_types is not included in the vertex_attributes list.

[
  {
    "id": "radius",
    "data_type": "float32",
    "num_components": 1
  },
  {
    "id": "vertex_types",
    "data_type": "uint8",
    "num_components": 1
  }
]

@manoaman (Author)

Hi Will,

Thank you for pointing out the missing fragment. I also noticed that by commenting out this section of the code during format conversion and regenerating the precomputed file (.gz), everything seems to work ok. I'm not sure why I had this fragment, but it seems to have caused the issue.

          # remove 
          # skel.extra_attributes = [ 
          #     attr for attr in skel.extra_attributes 
          #     if attr['data_type'] in ('float32', 'float64')
          # ]    
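
An alternative sketch, if the filter is still needed for other attributes, is to whitelist uint8 so vertex_types survives:

skel.extra_attributes = [
    attr for attr in skel.extra_attributes
    if attr['data_type'] in ('uint8', 'float32', 'float64')
]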

Thank you for your help!


william-silversmith commented Apr 19, 2023 via email

@manoaman (Author)

It looks like {'id': 'vertex_types', 'data_type': 'uint8', 'num_components': 1} gets removed from skel.extra_attributes. Should I allow uint8?


william-silversmith commented Apr 19, 2023 via email

@manoaman (Author)

Hi Will,

So by setting and keeping the uint8 vertex_types attribute in skel.extra_attributes, I was able to save a skeleton with CloudVolume.

[{'id': 'radius', 'data_type': 'float32', 'num_components': 1}, {'id': 'vertex_types', 'data_type': 'uint8', 'num_components': 1}]

However, on the Neuroglancer side, this attribute prevents the skeleton from being viewed. Is there another way to configure vertex_types?

(Screenshot: Neuroglancer error message, 2023-04-24)


william-silversmith commented Apr 24, 2023 via email
