
Snapshot should write to a different file every time it is triggered #1099

Open
bijoua29 opened this issue Sep 25, 2022 · 26 comments · Fixed by #1842
Labels
enhancement New feature or request

Comments

@bijoua29

Description

When you trigger multiple snapshots through the service call, each snapshot's data is written to the same file. This is not very useful when taking multiple snapshots.

The typical use case for the snapshot feature is to save a snapshot when the user sees something "interesting". This might happen multiple times; triggering a snapshot is usually not a one-time event.

Related Issues

N/A

Completion Criteria

  • Save to a different file when a snapshot is triggered
  • Add ability to specify filename for each snapshot or allow option for timestamped file

Implementation Notes / Suggestions

N/A

Testing Notes / Suggestions

N/A

@bijoua29 bijoua29 added the enhancement New feature or request label Sep 25, 2022
@swiz23

swiz23 commented Feb 10, 2023

I second this feature request. This seems helpful.

@wvergouwenavular

+1

@emersonknapp
Collaborator

Note - the recorder does provide the service split_bagfile - you could call that immediately after the snapshot to split. Though, I think it's not unreasonable to provide a split option in Snapshot.srv so that it can happen atomically. PRs welcome!
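For reference, a minimal sketch of that workaround as an rclcpp client. The service names assume the recorder runs under its default node name (rosbag2_recorder); adjust them to your setup:

    #include <chrono>
    #include <memory>

    #include "rclcpp/rclcpp.hpp"
    #include "rosbag2_interfaces/srv/snapshot.hpp"
    #include "rosbag2_interfaces/srv/split_bagfile.hpp"

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);
      auto node = rclcpp::Node::make_shared("snapshot_then_split");

      // Service names below assume the default recorder node name.
      auto snapshot_client = node->create_client<rosbag2_interfaces::srv::Snapshot>(
        "/rosbag2_recorder/snapshot");
      auto split_client = node->create_client<rosbag2_interfaces::srv::SplitBagfile>(
        "/rosbag2_recorder/split_bagfile");

      snapshot_client->wait_for_service(std::chrono::seconds(5));
      split_client->wait_for_service(std::chrono::seconds(5));

      // Flush the snapshot buffer into the currently open bag file ...
      auto snapshot_result = snapshot_client->async_send_request(
        std::make_shared<rosbag2_interfaces::srv::Snapshot::Request>());
      rclcpp::spin_until_future_complete(node, snapshot_result);

      // ... then split right away so the snapshot data ends up in a closed file of its own.
      auto split_result = split_client->async_send_request(
        std::make_shared<rosbag2_interfaces::srv::SplitBagfile::Request>());
      rclcpp::spin_until_future_complete(node, split_result);

      rclcpp::shutdown();
      return 0;
    }

This is not atomic: another snapshot could in principle be triggered between the two calls, which is why an atomic split option in Snapshot.srv would be more robust.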

@Timple

Timple commented Jan 22, 2024

Shall we extend the Snapshot.srv from this:

---
bool success

To this?

string filename   # Name of bag file to save snapshot to.
---
bool success

In line with this: https://github.com/ros/rosbag_snapshot/blob/main/rosbag_snapshot_msgs/srv/TriggerSnapshot.srv

We probably need to come up with a way to append the timestamp as well. Perhaps if we enter something like my_file_{stamp}, this would result in my_file_2024_01_22-14_18_28/?
Suggestions welcome.
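For illustration only, a minimal sketch of how such a {stamp} placeholder could be expanded; the placeholder name, the helper function, and the timestamp format are just assumptions based on the suggestion above, not an existing rosbag2 API:

    #include <ctime>
    #include <iomanip>
    #include <sstream>
    #include <string>

    // Replace a "{stamp}" placeholder with a formatted local time,
    // e.g. "my_file_{stamp}" -> "my_file_2024_01_22-14_18_28".
    std::string expand_stamp_placeholder(const std::string & pattern, std::time_t now)
    {
      std::ostringstream stamp;
      stamp << std::put_time(std::localtime(&now), "%Y_%m_%d-%H_%M_%S");

      std::string result = pattern;
      const std::string placeholder = "{stamp}";
      const auto pos = result.find(placeholder);
      if (pos != std::string::npos) {
        result.replace(pos, placeholder.size(), stamp.str());
      }
      return result;
    }

Calling expand_stamp_placeholder("my_file_{stamp}", std::time(nullptr)) at the moment the snapshot is triggered would then yield something like my_file_2024_01_22-14_18_28.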

@MichaelOrlov
Contributor

@Timple Extending Snapshot.srv with a few new parameters is not a big deal or a problem.
However, it turns out that it is not trivial to implement this feature on the lower rosbag2 layers.
There is also another request to extend the snapshot feature to be able to write the buffer after the event as well.
These two extensions overlap somewhat in implementation, and even on their own they are not trivial to implement.

@bijoua29
Author

I guess I'm surprised that just adding the ability to name the file is not trivial to implement, but I have no knowledge of the structure of the code.

@Timple

Timple commented Feb 15, 2024

Would it make sense to align on the service call syntax already?

ROS 2 Jazzy is coming up, and changing a service after the release won't be possible, I guess. Implementation can then be worked on after the interface is agreed upon.

@bijoua29
Author

Yes, it would make sense to align on the service call syntax. Was there a proposal for an updated syntax?

@Timple

Timple commented Feb 15, 2024

#1099 (comment) ?

@reinzor
Contributor

reinzor commented Apr 9, 2024

Hi all,

Thanks for all the information so far. I've read this issue and studied this PR with all its comments. Based on this info and some fiddling in the rosbag2_cpp package, I came up with the following plan:

  • Extend Snapshot.srv to include a name, as suggested here
  • This name can then be passed to the take_snapshot method as an input argument.
  • The SequentialWriter will then receive the name and implement the following logic after the notify_data_ready call:
    • Store the current storage_ path
    • Call the switch_to_next_storage method
    • Rename the last written storage file and alter the metadata (rename based on the name and the current stamp)
    • Write the metadata (as is being done in the close() method)

I quickly implemented this, and it seemed to work.

This will give the following output:

.
└── rosbag2_2024_04_09-14_59_55
    ├── metadata.yaml
    ├── rosbag2_2024_04_09-14_59_55_0_snapshot_A.mcap
    ├── rosbag2_2024_04_09-14_59_55_1_snapshot_B.mcap
    ├── rosbag2_2024_04_09-14_59_55_2_snapshot_C.mcap

@bijoua29 @dkorenkovnl @Timple @MichaelOrlov could you give your feedback on this approach before I send a PR? Thanks!

@MichaelOrlov
Contributor

@reinzor That is yet another very hacky way to implement this feature.

  1. I don't like the idea of renaming a file right after writing it, because we should get the bag_events::BagEvent::WRITE_SPLIT callback right after creating the new file and closing the old one. Essentially, we need the functionality from SequentialWriter::split_bagfile().
  2. SequentialWriter::switch_to_next_storage() will call cache_consumer_->stop();, which will try to swap the circular buffers again and flush data to disk one more time. There will be a race condition with the data dump already in progress from the buffer. If we are lucky and have a lot of messages in the buffer and a slow HDD, we might get away with it. If not, we will randomly dump extra data one more time, which we were not asked for. This is undefined behavior!

@reinzor
Contributor

reinzor commented Apr 11, 2024

@reinzor That is yet another very hacky way to implement this feature.

  1. I don't like the idea of renaming a file right after writing it, because we should get the bag_events::BagEvent::WRITE_SPLIT callback right after creating the new file and closing the old one. Essentially, we need the functionality from SequentialWriter::split_bagfile().
  2. SequentialWriter::switch_to_next_storage() will call cache_consumer_->stop();, which will try to swap the circular buffers again and flush data to disk one more time. There will be a race condition with the data dump already in progress from the buffer. If we are lucky and have a lot of messages in the buffer and a slow HDD, we might get away with it. If not, we will randomly dump extra data one more time, which we were not asked for. This is undefined behavior!

Thanks for your feedback @MichaelOrlov; could you perhaps point me in the right direction for this implementation? Regarding point 1, what would be your suggestion instead of renaming? As far as I know, the name of the file is already defined when the recording starts, so if we would like the snapshot to have a specific filename, that would not be possible. Could we maybe add some additional data to the custom_data structure in the metadata? As for point 2, it is not very clear to me how the internal consumer works, so how can I implement this properly without the undefined behavior? Thanks again!

@Timple

Timple commented Apr 11, 2024

I believe #1584 was a step in that direction. However, some workarounds were proposed there, and now it's closed.
It does contain a hint on where to start it seems: #1584 (comment)

@MichaelOrlov you seem to be the most familiar with the codebase in this thread. Would it be possible to write up a small set of steps to take to tackle this issue?

@MichaelOrlov
Contributor

I am sorry folks, I am currently kinda busy with Jazzy release preparation. A lot of PRs and other things need to wrap up.
I can get back to this task after a couple of weeks.
Need to think thoroughly about how we can implement this feature in the best way.
As for the filename, I am not sure it would be possible to enforce a new file name via the snapshot service call without renaming the file. I was thinking that maybe, in this case, we could return the saved filename in the snapshot service response message instead of enforcing a new name?

@Timple

Timple commented Apr 11, 2024

I'm not quite sure why the bagfile is already open in the first place, because that is where this problem originates.

Why not open it upon receiving a snapshot request?

@fujitatomoya
Contributor

Why not open it upon receiving a snapshot request?

Sounds reasonable; the file should be named and created upon the user's snapshot capture request. But I am not sure how we can deal with metadata.yaml. Can it include the information for multiple bags? If not, should it create a rosbag2_2024_04_15-15_19_02?snapshot folder on the user's request?

@reinzor
Contributor

reinzor commented Oct 22, 2024

@fujitatomoya @MichaelOrlov How can we best continue with this implementation? There appears to be a lot of interest in this functionality, together with a fixed duration for a bag file. We can help with the implementation, but this would require some pointing in the right direction. Let us know :).

@fujitatomoya
Contributor

@reinzor @MichaelOrlov @Timple

Just a question though: do we need to support both snapshot and split at the same time? The user expects the snapshot data for a certain cache size. I think that if the data is split into multiple files, that would not be useful for the user? I do not see the actual use case for this...

@bijoua29
Author

@fujitatomoya Yes, I would expect a snapshot for a certain cache size or duration, so I wouldn't expect or need the snapshot to be split.

@MichaelOrlov
Contributor

@reinzor Regarding:

How can we best continue with this implementation?

I see a new related PR #1839. Let me review it first, and then I will try to figure out how to continue.
Glad to hear that you can allocate resources to help with this task.

@MichaelOrlov
Contributor

@bijoua29 We need to move the completion criteria

  • Add ability to specify filename for each snapshot or allow option for timestamped file

to a separate feature-request issue.

@Timple

Timple commented Oct 27, 2024

@reinzor @MichaelOrlov @Timple

Just a question though: do we need to support both snapshot and split at the same time? The user expects the snapshot data for a certain cache size. I think that if the data is split into multiple files, that would not be useful for the user? I do not see the actual use case for this...

Our use case:

In the case of an event (e.g. a robot collision), we want to have a snapshot of (almost) all available data.

Single file

This snapshot should be a single rosbag. Having 12 snapshots in a single file still makes that file hard to offload from the robot. It is also hard to link the file to an event, as it contains multiple events.

Closing the file also allows other synchronization services to kick in. If the file stays open for the duration of the robot's uptime, we would not be able to offload it until the next reboot.

File names

Ideally, we can also control the name of the bagfile so that it can be linked to events more easily, although this can also be solved with metadata inside the bagfile.
If there is a timestamp in the filename, it should correlate either to the time of the snapshot call or to the time of the first data entry.
Having the node's start time as the timestamp in the filename does not make any sense with regard to the data the file contains.

Duration

Having a fixed duration (like 30 seconds) makes more sense for our use case than, say, a fixed size (1 GB of data), because the amount of data can vary and will most likely increase as our software evolves, making the bagfiles shorter.


Not all three points may belong to this issue, but I thought I'd fully clarify our use case.


Edit: Appendix:

We also record all data on our machines, with a cleanup script to prevent the disk from getting full. Currently this is quite a CPU hog, as all data is received twice (once per bag process).

If the standard bag recorder provided a snapshot feature alongside its standard recording, that would be awesome! Although I currently have no idea what this would look like, so it might be far-fetched.

@MichaelOrlov
Contributor

Hi @Timple, thank you for posting real use cases. Your feedback is very valuable.
First of all, I want to clarify that when we were talking about

Just a question though: do we need to support both snapshot and split at the same time?

We meant that a bag split due to the limit imposed on the maximum file size shouldn't happen while we are writing a snapshot to a new file.

As regards filenames: I looked at the code yesterday, and I can say that it is possible to implement a custom name for the snapshot file. We can correct the file name in the metadata before saving and rename the file at the right moment after closing it.
However, I am not yet convinced that a timestamp in the filename makes sense or adds any value. The file timestamp is present in the filesystem attributes and inside the metadata. A sequence number makes more sense, IMO. Please provide more details on this topic if you have them.
Also, a custom filename comes with a few caveats. What if a file with the same name already exists? Or what if the filename is too long or contains a path to an "undesirable" or protected location? I would like to avoid handling all these corner cases. Therefore, having a custom filename is an undesirable feature.

I understand that users want to somehow tag the snapshot and know the recorded filename.
There are a few possible options besides explicitly enforcing a filename for the snapshot.

  1. Provide 'custom-data' as <key, value> pairs during the snapshot service request and save them in the existing custom_data field in the metadata. See:
    std::string compression_mode;
    std::unordered_map<std::string, std::string> custom_data; // {key: value, ...}
    std::string ros_distro;
  2. Subscribe to the Topic: /events/write_split | Type: rosbag2_interfaces/msg/WriteSplitEvent on the client side.
    The Rosbag2 recorder publishes information about the closed and newly opened files on this topic, and we will also publish on it when saving a snapshot to a new file (a minimal client-side sketch follows after this list). Please see the details in:
    auto message = rosbag2_interfaces::msg::WriteSplitEvent();
    message.closed_file = bag_split_info_.closed_file;
    message.opened_file = bag_split_info_.opened_file;
    message.node_name = node->get_fully_qualified_name();
    try {
      split_event_pub_->publish(message);
    } catch (const std::exception & e) {
      RCLCPP_ERROR_STREAM(
        node->get_logger(),
        "Failed to publish message on '/events/write_split' topic. \nError: " << e.what());
    } catch (...) {
      RCLCPP_ERROR_STREAM(
        node->get_logger(),
        "Failed to publish message on '/events/write_split' topic.");
    }
  3. Return the file name of the newly saved snapshot via the service response to the snapshot service request.
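For option 2, a minimal client-side sketch of such a subscriber; only the topic name, message type, and its fields are taken from above, while the node name and logging are illustrative:

    #include "rclcpp/rclcpp.hpp"
    #include "rosbag2_interfaces/msg/write_split_event.hpp"

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);
      auto node = rclcpp::Node::make_shared("snapshot_file_watcher");

      auto subscription = node->create_subscription<rosbag2_interfaces::msg::WriteSplitEvent>(
        "/events/write_split", 10,
        [logger = node->get_logger()](rosbag2_interfaces::msg::WriteSplitEvent::ConstSharedPtr event) {
          // closed_file is the bag file that was just finalized; when a snapshot is
          // written to a new file, this is the file that contains the snapshot data.
          RCLCPP_INFO(
            logger, "Recorder '%s' closed '%s' and opened '%s'",
            event->node_name.c_str(), event->closed_file.c_str(), event->opened_file.c_str());
        });

      rclcpp::spin(node);
      rclcpp::shutdown();
      return 0;
    }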

The request for a fixed snapshot duration is valid and desirable for almost everybody. However, it is not easy to implement with the current design. We are bound by the double-buffering, lock-free scheme with a fixed size on the rosbag2_cpp layer. Bounding by a time limit would imply intensive dynamic allocations during recording, which we would like to avoid. A compromise solution is likely possible; however, I have not found it yet.

As regards the proposal to save snapshots during normal recording: it is hard to implement and will likely require a similar amount of additional CPU resources as running one more instance of the rosbag recorder in parallel. BTW, it will likely be possible to reduce the CPU burden by disabling discovery in the recorder or increasing its timeout via a CLI option.

@bijoua29
Author

@bijoua29 We need to move the completion criteria

  • Add ability to specify filename for each snapshot or allow option for timestamped file

to a separate feature-request issue.

@MichaelOrlov That's fine to move this to a separate issue.

Regarding the options you propose for the filename, I'm fine with option 3, where the filename is returned in the service response. I assume the service response is returned after the file is written to disk, so I can safely rename the file if I so choose.
Currently, we do something hacky where we trigger the snapshot save, rename the file based on assumptions about when it has been written, and then restart another snapshot. This is prone to errors and possible race conditions.

@MichaelOrlov
Contributor

MichaelOrlov commented Oct 30, 2024

Hi @bijoua29, option 3, returning the filename in the service response, may not work for all use cases.
After #1842 is merged, the snapshot will become a blocking call.

However, at least one corner case exists: when we enable per-file compression and use the sequential_compression_writer. After closing a file and returning from the snapshot, we still run compression in the background in a separate thread, and as soon as the compression finishes, we rename the original file.
In theory, we could predict the new filename after compression and return it. However, that is not what you want if you are going to use it for renaming on your end as well, since when we return the filename from the snapshot, that new file will not exist yet. If you try to wait for it with a polling cycle, there is a risk that you will try to rename it while it has not yet been closed by the Rosbag2 recorder.
Also, to facilitate the other feature request, we will need to add an option to save snapshot data not only from before the snapshot service call but also for some time after it. We will need to reconsider the design of the snapshot service to allow writing incoming data to the swapped buffer while saving a snapshot, and the snapshot call will likely become non-synchronous again.

The most robust way to know when and which filename is available after triggering a snapshot is option 2:

  2. Subscribe to the Topic: /events/write_split | Type: rosbag2_interfaces/msg/WriteSplitEvent on the client side.

@MichaelOrlov
Contributor

Hi @reinzor. You were asking how to move forward with the implementation of writing snapshots to different files and the time-bounded snapshot feature, and you were also willing to help with it.

The first item, Make snapshot writing into a new file each time it is triggered, has been done and was recently merged to Rolling. It was a smooth backport to Jazzy, with some hiccups on Windows CI. However, the backport to Humble needs to be adjusted.
We moved providing a name for the saved snapshot file via the service request out of scope. It could be considered and implemented as a separate feature.

I've made good progress on the second item, the time-bounded snapshot feature, in #1844. I've made all the required changes for this feature. However, I didn't have time to test it and write unit tests.
Help there would be really appreciated. Feel free to cherry-pick commits from my #1844 PR or create a fork on top of it.
We need to review #1844, test it locally, write unit tests, and update the documentation.

@MichaelOrlov MichaelOrlov self-assigned this Nov 26, 2024