Importing CSV keypoint data from SLEAP #170

Open
smasri09 opened this issue Oct 1, 2024 · 11 comments

Comments

@smasri09

smasri09 commented Oct 1, 2024

Hello, thank you for sharing these resources. I'm starting the process of using this interesting package to analyze my behavior involving two animals separated by a divider, recorded with an overhead camera. Due to issues with consistent animal tracking in sleap, I import CSV files exported by sleap and manually define tracks in matlab taking advantage of the divider. I am under the impression that keypoint-moseq works best with a single animal at a time, and will require consistent "track" data, so I made a script to export each animal individually back as CSVs in the format sleap exports them. Is it possible to import these CSVs back into keypoint-moseq currently? The manual indicates that h5 files are required.

Also, I'm curious what the model might do if both animals were included, with the two skeletons combined into a single skeleton with twice as many points. Could it perhaps find syllables involving both animals?

Thank you for your time,
Samer Masri, NYU

@calebweinreb
Contributor

Hi Samer,

I didn't realize SLEAP had a csv option. Since you have some coding knowledge, it would probably be easiest to write your own loader, as explained here. If that isn't possible for whatever reason, then you could send me sample csv files (maybe one output directly by SLEAP and another output by you from Matlab) and I can give it a try.

Regarding putting two skeletons as a single animal, you are certainly free to try. We've never tried it but I'd be very curious to see the results. In that case, I would suggest that you fix the heading as described here. I would also probably just run the "ar_only" modeling step and not the full modeling step.
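
As a rough illustration of the two-animal idea, the sketch below stacks two per-animal arrays along the keypoint axis so that they look like one skeleton with twice as many points. The function name, the `top`/`bottom` labels, and the random data are hypothetical; as noted above, this setup has not been tried with keypoint-moseq, so treat it as a starting point only.

```python
import numpy as np

def combine_animals(coords_a, coords_b, conf_a=None, conf_b=None):
    """Stack two animals' keypoints along the keypoint axis.

    coords_*: arrays of shape (num_frames, num_keypoints, 2)
    conf_*:   arrays of shape (num_frames, num_keypoints), optional
    Both animals must cover the same frames in the same order.
    """
    if coords_a.shape[0] != coords_b.shape[0]:
        raise ValueError("both animals need the same number of frames")
    coords = np.concatenate([coords_a, coords_b], axis=1)
    conf = None
    if conf_a is not None and conf_b is not None:
        conf = np.concatenate([conf_a, conf_b], axis=1)
    return coords, conf

# Hypothetical labels for the doubled bodyparts list
bodyparts = ["tailstart", "neck", "nose"]
combined_bodyparts = [f"{animal}_{bp}" for animal in ("top", "bottom") for bp in bodyparts]

rng = np.random.default_rng(0)
top = rng.random((100, 3, 2))      # stand-in for animal 1
bottom = rng.random((100, 3, 2))   # stand-in for animal 2
combined, _ = combine_animals(top, bottom)
print(combined.shape)  # (100, 6, 2)
```

The combined bodyparts list would then have to match the keypoint order of the stacked array.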

@smasri09
Author

smasri09 commented Oct 1, 2024

Thanks! I'd be happy to, however I'm not exactly sure how to interpret this paragraph:

If writing your own data loader, the output should be a coordinates dictionary that maps recording names to arrays of shape (num_frames, num_keypoints, num_dimensions), where num_dimensions is 2 or 3. The keypoint axis should correspond to the bodyparts list in the config. You can also include a confidences dictionary that maps recording names to arrays of shape (num_frames, num_keypoints)
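
To make the quoted description concrete, here is a minimal sketch of the two dictionaries using random numbers. The recording names and array sizes are made up for illustration; only the shapes follow the quoted paragraph.

```python
import numpy as np

num_frames, num_keypoints = 500, 3
rng = np.random.default_rng(0)

# One entry per recording: (num_frames, num_keypoints, num_dimensions)
coordinates = {
    "recording_1": rng.random((num_frames, num_keypoints, 2)),
    "recording_2": rng.random((num_frames, num_keypoints, 2)),
}

# Optional: one confidence value per keypoint per frame
confidences = {
    name: rng.random((num_frames, num_keypoints)) for name in coordinates
}

# Sanity checks on the shapes described in the quoted paragraph
for name, xy in coordinates.items():
    assert xy.ndim == 3 and xy.shape[2] in (2, 3)
    assert confidences[name].shape == xy.shape[:2]
```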

Here is how the data comes out of SLEAP, along with a separated track for one animal. I will also be interpolating missing datapoints before feeding the data to moseq. Sorry for the trouble on that.

labels.v001.000_Back-2024_06_24_13_19_08.analysis.csv

labels.v001.000_Back-2024_06_24_13_19_08.analysis_bottom.csv

Edit: I understand the array description. I will see about saving the dictionary from MATLAB and post the code here.

@calebweinreb
Contributor

calebweinreb commented Oct 2, 2024

Sounds good! Although I think the easiest thing would be to load the files one at a time in python. Something like

import glob
import numpy as np
import os

keypoint_files = glob.glob("path/to/files/*.csv") # or define list manually

coordinates = {}
confidences = {}
for filepath in keypoint_files:
    data = np.loadtxt(filepath, skiprows=1, delimiter=",")
    data = data[:, 3:] # remove "track,frame_idx,instance_score" columns
    data = data.reshape(data.shape[0], -1, 3) # (n_frames, n_keypoints, 3)
    name = os.path.basename(filepath) # just use filename as key, not full path
    coordinates[name] = data[:, :, :2] # (x,y)
    confidences[name] = data[:, :, 2] # score

Note: I haven't tested this so you might have to debug a little / confirm that the arrays have the expected shapes etc.
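
One way to do that debugging without real data is to run the same logic on a tiny synthetic file. The sketch below assumes the column layout discussed in this thread (`track`, `frame_idx`, an instance score, then x, y, score per keypoint); the header names, the helper name `load_sleap_csv`, and the sample values are hypothetical. It reads only the numeric keypoint columns via `usecols`, so a text value in the `track` column does not break parsing.

```python
import os
import tempfile
import numpy as np

def load_sleap_csv(filepath):
    """Return (coordinates, confidences) for one single-animal CSV."""
    with open(filepath) as f:
        n_cols = len(f.readline().split(","))  # count columns from the header
    data = np.loadtxt(
        filepath,
        skiprows=1,
        delimiter=",",
        usecols=list(range(3, n_cols)),  # skip track, frame_idx, instance score
        ndmin=2,
    )
    data = data.reshape(data.shape[0], -1, 3)  # (n_frames, n_keypoints, 3)
    return data[:, :, :2], data[:, :, 2]

# Tiny made-up file: 2 frames, 2 keypoints
header = "track,frame_idx,instance.score,a.x,a.y,a.score,b.x,b.y,b.score"
rows = [
    "track_0,0,0.9,1.0,2.0,0.8,3.0,4.0,0.7",
    "track_0,1,0.9,1.5,2.5,0.6,3.5,4.5,0.5",
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.analysis.csv")
    with open(path, "w") as f:
        f.write("\n".join([header, *rows]) + "\n")
    xy, conf = load_sleap_csv(path)

print(xy.shape, conf.shape)  # (2, 2, 2) (2, 2)
```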

@smasri09
Author

smasri09 commented Oct 2, 2024

Thanks for the help with that. I think this works with a couple small changes:

import glob
import numpy as np
import os

coordinates = {}
confidences = {}
for filepath in keypoint_files:
    data = np.loadtxt(filepath, skiprows=1, usecols=(3,4,5,6,7,8,9,10,11), delimiter=",")
    data = data.reshape(data.shape[0], -1, 3) # (n_frames, n_keypoints, 3)
    name = os.path.basename # just use filename as key, not full path
    coordinates[name] = data[:, :, :2] # (x,y)
    confidences[name] = data[:, :, 2] # score

I just have to figure out how to link videos to the CSVs. Right now I have two CSVs per video as I separated animals. I could also use one combined file with two tracks, I see moseq can use tracks with multiple animals?

@calebweinreb
Contributor

Even if you have two files per video, it shouldn't matter. This function is used to match videos. As long as the video name appears as a prefix in both keypoint files, then the function will assign that video to both files.
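
The prefix rule can be illustrated with a small standalone sketch. This is a simplified re-implementation for illustration, not the library's own function; the name `match_videos` and the file names are hypothetical.

```python
import os

def match_videos(recording_names, video_paths):
    """Map each recording name to the video whose stem is its longest prefix."""
    # Video stem = file name without directory or extension
    stems = {os.path.splitext(os.path.basename(p))[0]: p for p in video_paths}
    matched = {}
    for name in recording_names:
        candidates = [s for s in stems if os.path.basename(name).startswith(s)]
        if not candidates:
            raise ValueError(f"No matching video found for {name}")
        # Prefer the longest stem when several videos share a prefix
        matched[name] = stems[max(candidates, key=len)]
    return matched

videos = ["vids/session1.mp4", "vids/session1_long.mp4"]
recordings = ["session1.analysis_top.csv", "session1.analysis_bottom.csv"]
print(match_videos(recordings, videos))
```

Both keypoint files here resolve to `vids/session1.mp4`, which is the two-files-per-video situation described above.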

@smasri09
Author

smasri09 commented Oct 7, 2024

Hi, thanks again, I was able to follow that. Have not yet successfully run moseq, but I have the setup working correctly. I seem to have hit a couple new bugs following your project setup guide. I have two keypoint files and one video in the folder, both keypoint files point to the same video (one for each animal). Not sure how to proceed so I hope it's ok to ask here.


import keypoint_moseq as kpms

project_dir = r'C:\Users\smasr\Documents\MATLAB\Sanes\Observer workflow\moseq\first project'
video_dir = r'C:\Users\smasr\Desktop\Data\PrL GRABDA 1S R1\Observation\vids\test'
config = lambda: kpms.load_config(project_dir)

bodyparts = ['tailstart', 'neck', 'nose']

skeleton = [
    ['tailstart', 'neck'],
    ['neck', 'nose']]

kpms.setup_project(
    project_dir,
    video_dir=video_dir,
    bodyparts=bodyparts,
    skeleton=skeleton)

kpms.update_config(
    project_dir,
    anterior_bodyparts=['nose'],
    posterior_bodyparts=['tailstart'],
    use_bodyparts=['tailstart', 'neck', 'nose'])

import glob
import numpy as np
import os

keypoint_files = glob.glob(r"C:\Users\smasr\Desktop\Data\PrL GRABDA 1S R1\Observation\vids\test\*.csv")

coordinates = {}
confidences = {}
for filepath in keypoint_files:
    data = np.loadtxt(filepath, skiprows=1, usecols=(3,4,5,6,7,8,9,10,11), delimiter=",")
    data = data.reshape(data.shape[0], -1, 3) # (n_frames, n_keypoints, 3)
    name = os.path.basename # just use filename as key, not full path
    coordinates[name] = data[:, :, :2] # (x,y)
    confidences[name] = data[:, :, 2] # score

data, metadata = kpms.format_data(coordinates, confidences, **config())


TypeError Traceback (most recent call last)
Cell In[14], line 1
----> 1 data, metadata = kpms.format_data(coordinates, confidences, **config())

File ~.conda\envs\moseq\lib\site-packages\keypoint_moseq\util.py:950, in format_data(coordinates, confidences, keys, seg_length, bodyparts, use_bodyparts, conf_pseudocount, added_noise_level, **kwargs)
943 if bodyparts is not None:
944 assert len(bodyparts) == num_keypoints[0], fill(
945 f"The number of keypoints in coordinates ({num_keypoints[0]}) "
946 f"does not match the number of labels in bodyparts "
947 f"({len(bodyparts)})"
948 )
--> 950 if any(["/" in key for key in keys]):
951 warnings.warn(
952 fill(
953 'WARNING: Recording names should not contain "/", this will cause '
954 "problems with saving/loading hdf5 files."
955 )
956 )
958 if confidences is None:

File ~.conda\envs\moseq\lib\site-packages\keypoint_moseq\util.py:950, in <listcomp>(.0)
943 if bodyparts is not None:
944 assert len(bodyparts) == num_keypoints[0], fill(
945 f"The number of keypoints in coordinates ({num_keypoints[0]}) "
946 f"does not match the number of labels in bodyparts "
947 f"({len(bodyparts)})"
948 )
--> 950 if any(["/" in key for key in keys]):
951 warnings.warn(
952 fill(
953 'WARNING: Recording names should not contain "/", this will cause '
954 "problems with saving/loading hdf5 files."
955 )
956 )
958 if confidences is None:

TypeError: argument of type 'function' is not iterable

And the second error:
kpms.noise_calibration(project_dir, coordinates, confidences, **config())


TypeError Traceback (most recent call last)
Cell In[18], line 1
----> 1 kpms.noise_calibration(project_dir, coordinates, confidences, **config())

File ~.conda\envs\moseq\lib\site-packages\keypoint_moseq\calibration.py:528, in noise_calibration(project_dir, coordinates, confidences, bodyparts, use_bodyparts, video_dir, video_extension, conf_pseudocount, downsample_rate, **kwargs)
525 annotations = load_annotations(project_dir)
526 sample_keys.extend(annotations.keys())
--> 528 sample_images = load_sampled_frames(
529 sample_keys, video_dir, video_extension, downsample_rate
530 )
532 return _noise_calibration_widget(
533 project_dir,
534 coordinates,
(...)
540 **kwargs,
541 )

File ~.conda\envs\moseq\lib\site-packages\keypoint_moseq\calibration.py:100, in load_sampled_frames(sample_keys, video_dir, video_extension, downsample_rate)
75 """Load sampled frames from a directory of videos.
76
77 Parameters
(...)
97 corresponding videos frames.
98 """
99 keys = sorted(set([k[0] for k in sample_keys]))
--> 100 videos = find_matching_videos(keys, video_dir)
101 key_to_video = dict(zip(keys, videos))
102 readers = {key: OpenCVReader(video) for key, video in zip(keys, videos)}

File ~.conda\envs\moseq\lib\site-packages\keypoint_moseq\util.py:165, in find_matching_videos(keys, video_dir, as_dict, recursive, recording_name_suffix, video_extension)
163 video_paths = []
164 for key in keys:
--> 165 matches = [
166 v
167 for v in videos_to_paths
168 if os.path.basename(key).startswith(v + recording_name_suffix)
169 ]
170 assert len(matches) > 0, fill(f"No matching videos found for {key}")
172 longest_match = sorted(matches, key=lambda v: len(v))[-1]

File ~.conda\envs\moseq\lib\site-packages\keypoint_moseq\util.py:168, in <listcomp>(.0)
163 video_paths = []
164 for key in keys:
165 matches = [
166 v
167 for v in videos_to_paths
--> 168 if os.path.basename(key).startswith(v + recording_name_suffix)
169 ]
170 assert len(matches) > 0, fill(f"No matching videos found for {key}")
172 longest_match = sorted(matches, key=lambda v: len(v))[-1]

File ~.conda\envs\moseq\lib\ntpath.py:216, in basename(p)
214 def basename(p):
215 """Returns the final component of a pathname"""
--> 216 return split(p)[1]

File ~.conda\envs\moseq\lib\ntpath.py:185, in split(p)
180 def split(p):
181 """Split a pathname.
182
183 Return tuple (head, tail) where tail is everything after the final slash.
184 Either part may be empty."""
--> 185 p = os.fspath(p)
186 seps = _get_bothseps(p)
187 d, p = splitdrive(p)

TypeError: expected str, bytes or os.PathLike object, not function

The setup code at the top of this comment is what I ran before calling the file loader.

@calebweinreb
Contributor

The problem is
name = os.path.basename
it probably should be
name = os.path.basename(filepath)

Also, I noticed you only have 3 bodyparts. I would say that's really really pushing it for single animal syllables. It could be enough for the 2-animal version we were discussing earlier... although as I mentioned, we've never tried that before.
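
For anyone hitting the same tracebacks, this short sketch shows why the missing call matters. Without the parentheses, `name` is the function object itself, so every file is stored under the same non-string key. The file path here is hypothetical.

```python
import os

filepath = os.path.join("some", "folder", "recording.analysis.csv")  # hypothetical path

wrong = os.path.basename            # the function object itself
right = os.path.basename(filepath)  # the string "recording.analysis.csv"

print(callable(wrong), right)

# With the function as the key, each loop iteration overwrites the same entry
coordinates = {}
for path in ["a.csv", "b.csv"]:
    coordinates[os.path.basename] = path
print(len(coordinates))  # 1

# Checks such as `"/" in key` then fail because the key is not a string
try:
    _ = "/" in wrong
except TypeError as err:
    print("TypeError:", err)
```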

@smasri09
Author

smasri09 commented Oct 7, 2024

Thanks again, that worked, it appeared the data was loading, but not correctly.

Does moseq make use of frame_idx in any way? It appears to be matching the wrong frames with keypoint rows because of that. There are often gaps, and some of my animals are introduced a minute into the recording. Or else I can fill in all missing frame indices with nans on the matlab side.

As for the skeleton, I'll have to try and see! I found a 3-point skeleton offered much better accuracy from sleap, and captured the important data I needed for my experiments, primarily head angle and velocity. If it looks bad I can try and add another on the spine or more.

@calebweinreb
Contributor

Yes, as of 0.4.10 (which I released seconds ago and might take a few minutes to propagate through PyPI) you can now pass video_frame_indexes to all functions that read from the videos (e.g. see https://keypoint-moseq.readthedocs.io/en/latest/advanced.html#trimming-inputs).

So your loading code could become something like:

video_frame_indexes = {}
coordinates = {}
confidences = {}
for filepath in keypoint_files:
    data = np.loadtxt(filepath, skiprows=1, delimiter=",")
    name = os.path.basename(filepath) # just use filename as key, not full path

    video_frame_indexes[name] = data[:,1].astype(int) # might need to subtract 1 if not already in base-0 indexing

    data = data[:, 3:] # remove "track,frame_idx,instance_score" columns
    data = data.reshape(data.shape[0], -1, 3) # (n_frames, n_keypoints, 3)
    coordinates[name] = data[:, :, :2] # (x,y)
    confidences[name] = data[:, :, 2] # score
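
The other option raised earlier in the thread, filling missing frame indices with NaNs, could look roughly like the sketch below. The helper name and the toy data are hypothetical, and frame indices are assumed to be 0-based. Whether NaN rows are acceptable downstream should be checked against the keypoint-moseq documentation.

```python
import numpy as np

def pad_to_full_length(frame_idx, coords, conf, num_video_frames):
    """Place tracked rows at their frame index; untracked frames get NaN / 0."""
    n_keypoints, n_dims = coords.shape[1:]
    full_coords = np.full((num_video_frames, n_keypoints, n_dims), np.nan)
    full_conf = np.zeros((num_video_frames, n_keypoints))
    full_coords[frame_idx] = coords  # row i of coords goes to frame frame_idx[i]
    full_conf[frame_idx] = conf
    return full_coords, full_conf

frame_idx = np.array([2, 3, 5])   # frames 0, 1 and 4 were never tracked
coords = np.ones((3, 3, 2))
conf = np.full((3, 3), 0.9)

full_coords, full_conf = pad_to_full_length(frame_idx, coords, conf, num_video_frames=6)
print(np.isnan(full_coords[:, 0, 0]).tolist())
# [True, True, False, False, True, False]
```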

@fhj0924

fhj0924 commented Nov 1, 2024

I am so happy to see someone talking about CSV. I would like to know how to use CSV files with keypoint-moseq, because the manual indicates that h5 files are required. Could you share your code for loading CSVs? I would be very grateful.

@fhj0924

fhj0924 commented Nov 1, 2024

Also, I would like to ask how you handle missing values in the CSV files and how you fill in the missing data.
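
Interpolation of missing datapoints was mentioned earlier in the thread but no code was posted. One common approach, offered here only as a sketch and not as the method actually used, is linear interpolation over time for each keypoint coordinate. The helper name is hypothetical, and missing values are assumed to be stored as NaN.

```python
import numpy as np

def interpolate_missing(coords):
    """Linearly interpolate NaNs along time for each keypoint and dimension.

    coords: array of shape (num_frames, num_keypoints, num_dimensions).
    Gaps at the start or end are filled with the nearest valid value.
    """
    filled = coords.copy()
    frames = np.arange(coords.shape[0])
    for k in range(coords.shape[1]):
        for d in range(coords.shape[2]):
            series = filled[:, k, d]   # view into `filled`
            valid = ~np.isnan(series)
            if valid.any():            # leave all-NaN series untouched
                series[~valid] = np.interp(
                    frames[~valid], frames[valid], series[valid]
                )
    return filled

coords = np.zeros((5, 1, 2))
coords[:, 0, 0] = [0.0, np.nan, 2.0, np.nan, np.nan]
coords[:, 0, 1] = [np.nan, 1.0, 1.0, 1.0, np.nan]

filled = interpolate_missing(coords)
print(filled[:, 0, 0].tolist())  # [0.0, 1.0, 2.0, 2.0, 2.0]
```

It may be worth lowering the confidence values for interpolated frames so that they are not treated as real detections.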
