Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix/unregistration memfile event handling #1214

Closed

Conversation

rex-schilasky
Copy link
Contributor

@rex-schilasky rex-schilasky commented Oct 13, 2023

Description

The new introduced "active register/unregister logic" calls depending on the connection type specific functions like (AddLocConnection/RemLocConnection,AddExtConnection/RemExtConnection) for the SHM and UDP data writer (TCP was just not handled for now).

The RemLocConnection was not called in that context before the new unregister/register logic and is closing the 2 connection events for updating and acknowledging memory file data transfer/receive. This leads to a dead shm connection when the subscriber on the other side is destroyed (triggering RemLocConnection in the publishing process) and created again.

Solution: Events are not closed on that level but are cleaned up together with the memory file resource (CSyncMemFile) properly.

Related issues

No issue now, will be created soon ..

Cherry-pick to

  • None

…when creating/destroying subscriber dynamically for the same topic
…nts are closed on CSyncMemFile Destruction/Disconnect)

added empty AddLocConnection/AddExtConnection RemLocConnection/RemExtConnection for all layers as unimplemented base functions (ecal_writer_base.h)
Copy link
Contributor

@KerstinKeller KerstinKeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, a lot of questions (concerning) the original design, still remain.

I find it a bit questionable, to keep all signalling resources open until the memory file itself is destroyed.
Couldn't it lead to the same problems on subscriber side, e.g. if the subscriber side destroys the events (e..g after 60 second timeout) and then creates a new subscriber?
(E.g. now we made sure events are not closed on sender side, because in the testcase they stayed open on subscriber side. What if they are closed/reopened on subscriber side... in that case communication can't be established, either?)

It feels to me, like we might have just shifted the problem.
Maybe it's worth to take a look at the underlying NamedEvents.

virtual bool AddExtConnection(const std::string& /*host_name_*/, const std::string& /*process_id_*/, const std::string& /*conn_par_*/) { return false; };
virtual bool RemExtConnection(const std::string& /*host_name_*/, const std::string& /*process_id_*/) { return false; };
virtual bool AddExtConnection(const std::string& /*host_name_*/, const std::string& /*process_id_*/, const std::string& /*topic_id_*/, const std::string& /*conn_par_*/) { return false; };
virtual bool RemExtConnection(const std::string& /*host_name_*/, const std::string& /*process_id_*/, const std::string& /*topic_id_*/) { return false; };

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change the function signatures? Are the arguments actually being used somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not used yet, but IF any layer wants to intercept that information it makes sense to provide host_name, process_id, topic_id to identify exactly the connection the is added or removed. The old signature supported just SHM with the needed information process_id.

{
m_writer.shm.RemLocConnection(loc_sub.process_id);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the "goodbye" message but happens on registration timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's not the goodbye message and no more needed at all. It closed the connected events for the memory file that was connected to this shm writer. This is now done since many version in the destruction of the CSyncMemFile.


return ret_state;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit wrong that connections cannot be removed again. So the resources stay open until the publisher is closed? Is that what happens? Also they continue being signalled, etc?

And btw, I guess you have addd line 206 to prevent the testcase to fail. see the huge comment. Maybe it was just not working at all on Posix...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's no more signalling, it is just holding the resources for MemFile + Events. The publisher (writer) knows the number of connections local/extern all the time.

// currently all these functions are not implemented (there is no need to react on)
m_writer.udp_mc.RemLocConnection (local_info_.process_id, local_info_.topic_id);
m_writer.shm.RemLocConnection (local_info_.process_id, local_info_.topic_id);
m_writer.tcp.RemLocConnection (local_info_.process_id, local_info_.topic_id);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the tcp RemLocConnection / AddExtConnection should preferably be done in a different PR as it is not related to the issue that is being fixed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will mark this PR as a cleanup and we do the fix in another one.

Copy link
Contributor

@KerstinKeller KerstinKeller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it doesn't work in the case that the subscriber is calling finalize() and initialize()

@FlorianReimold FlorianReimold marked this pull request as draft October 13, 2023 14:20
@rex-schilasky
Copy link
Contributor Author

mixed up bugfixes and code cleanups + did not really fix the issue

@rex-schilasky rex-schilasky deleted the bugfix/unregistration_memfile_event_handling branch February 15, 2024 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants