generate_grid_movies fails on large datasets #183

Open
mshallow opened this issue Dec 14, 2024 · 1 comment
@mshallow

I have been trying to work with my full dataset of ~1400 files and have found a couple of places where things fail with that much data. The first was similar to issue #177, where the kernel would crash before the model could be applied to the new data. This was solvable by applying the model in chunks, as has been suggested in a couple of different places.
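(For reference, a chunked application of this kind looks roughly like the sketch below; it assumes the standard kpms.format_data / kpms.apply_model workflow, with model, coordinates, confidences, project_dir, model_name, and config already defined, and the chunk size and variable names are just illustrative.)

import keypoint_moseq as kpms

# Sketch: apply the trained model to the recordings a chunk at a time
# instead of all ~1400 at once (illustrative, not the exact code used).
keys = sorted(coordinates.keys())
chunk_size = 100  # illustrative
for start in range(0, len(keys), chunk_size):
    chunk = keys[start:start + chunk_size]
    data, metadata = kpms.format_data(
        {k: coordinates[k] for k in chunk},
        {k: confidences[k] for k in chunk},
        **config())
    kpms.apply_model(model, data, metadata, project_dir, model_name, **config())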
The other place I have encountered errors (and seemingly meaningless or misleading error messages) is while trying to generate the grid movies. The main failure mode is that when generating grid movies from all of the data, the code often fails before any movies have been written, with either an ffmpeg codec error or an OpenCV reader error.
The ffmpeg error points to the h264 codec, and the OpenCV error says the files are unreadable. After almost a month of checking whether there was an actual problem with these video files and finding nothing unusual about them, it appears that if I run generate_grid_movies on chunks of the data at a time, I don't hit these errors, which suggests some sort of memory usage issue.
I tried that approach, and after getting through ~900 of the ~1400 videos in chunks of 75 at a time, the code still errored in a mysterious way. When it reaches the 900th video and tries to read in the coordinates etc. for that chunk, it raises a KeyError saying the coordinates key doesn't exist. That key does exist in the full coordinates dictionary, but the code fails over and over again. If I skip this chunk of videos, it fails the same way on the first video of the next chunk.
I have tried running this multiple times, and it has always failed on the 900th video.
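(For reference, chunked grid-movie calls of this kind look roughly like the sketch below; it assumes that restricting the results and coordinates dictionaries to a subset of recording keys is enough to limit which videos get processed, and the variable names are illustrative.)

import keypoint_moseq as kpms

# Sketch: generate grid movies for 75 recordings at a time. Assumes
# coordinates, project_dir, model_name, and config are already defined.
results = kpms.load_results(project_dir, model_name)
keys = sorted(results.keys())
chunk_size = 75
for start in range(0, len(keys), chunk_size):
    chunk = keys[start:start + chunk_size]
    kpms.generate_grid_movies(
        {k: results[k] for k in chunk}, project_dir, model_name,
        coordinates={k: coordinates[k] for k in chunk}, **config())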
I'm running things on an Apple M3 Pro chip with 18 GB of RAM and have watched the memory pressure while things run. It never seems to spike before the error; it just holds steady at ~10-12 GB of RAM.
[Four screenshots attached, taken 2024-12-13]

@calebweinreb (Contributor)

Hi,

Thanks for the thorough description. It's hard for me to debug the KeyError you posted above without seeing the full code for chunk_and_generate_grid_movies. Before that, however, I think we might be able to address the underlying issue with the original generate_grid_movies.

First, we should double-check that all the frames from each video are readable. You can do that with the code below, which should run without erroring on any of the videos.

import keypoint_moseq as kpms
from vidio.read import OpenCVReader

# map each recording key to its video file
video_paths = kpms.find_matching_videos(coordinates.keys(), video_dir, as_dict=True)
for key, path in video_paths.items():
    # try reading the last frame covered by the coordinates
    last_frame = coordinates[key].shape[0] - 1
    image = OpenCVReader(path)[last_frame]

If that works, then I agree there may be a memory issue from having too many videos open at once. This might be because kpms generates readers for all the videos ahead of time. We could potentially fix this by having it generate readers only for the videos it actually loads. I'm super busy at the moment, but if you're willing to edit the kpms source code, you could try implementing this change, and if it works we would love to review a corresponding pull request.

The basic idea would be to no longer run this line, which opens a reader for every video up front:

videos = {k: OpenCVReader(path) for k, path in video_paths.items()}

You'd then have to instantiate just one reader to get the framerate, which is currently taken from:

fps = list(videos.values())[0].fps

Instead of passing videos onward, pass video_paths, and instantiate a video reader only at the point where the frames are actually read.
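To make that concrete, here's a rough sketch of the lazy-reader pattern (the function and argument names are illustrative, not the actual kpms internals):

from vidio.read import OpenCVReader

def write_clips_lazily(video_paths, frames_by_key, process_frame):
    # Illustrative sketch: open each video's reader only when its frames are
    # needed, instead of building readers for every video up front.
    first_path = next(iter(video_paths.values()))
    fps = OpenCVReader(first_path).fps  # one reader just to get the framerate

    for key, frame_ixs in frames_by_key.items():
        reader = OpenCVReader(video_paths[key])  # opened on demand
        for ix in frame_ixs:
            frame = reader[ix]
            process_frame(key, ix, frame, fps)
        # the reader goes out of scope here, so its file handle can be freed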
