Debugging erratic DDS behaviour #171
Comments
Dear Richard, may I know your OSPL configuration? Please share your previous and latest OSPL configurations with us. With best regards, |
The configuration is as follows (it has not changed between 6.7 and 6.9):
<OpenSplice>
<Domain>
<Name>rdghickman</Name>
<Id>224</Id>
<SingleProcess>true</SingleProcess>
<Description>Stand-alone 'single-process' deployment and standard DDSI networking.</Description>
<Service name="ddsi2">
<Command>ddsi2</Command>
</Service>
<Service enabled="false" name="cmsoap">
<Command>cmsoap</Command>
</Service>
<Report verbosity="INFO"/>
</Domain>
<DDSI2Service name="ddsi2">
<General>
<NetworkInterfaceAddress>eth0</NetworkInterfaceAddress>
<AllowMulticast>true</AllowMulticast>
<EnableMulticastLoopback>true</EnableMulticastLoopback>
<CoexistWithNativeNetworking>false</CoexistWithNativeNetworking>
</General>
<Compatibility>
<!-- see the release notes and/or the OpenSplice configurator on DDSI interoperability -->
<StandardsConformance>lax</StandardsConformance>
<!-- the following one is necessary only for TwinOaks CoreDX DDS compatibility -->
<!-- <ExplicitlyPublishQosSetToDefault>true</ExplicitlyPublishQosSetToDefault> -->
</Compatibility>
<Discovery>
<MaxAutoParticipantIndex>39</MaxAutoParticipantIndex>
</Discovery>
</DDSI2Service>
<TunerService name="cmsoap">
<Server>
<PortNr>Auto</PortNr>
</Server>
</TunerService>
</OpenSplice>
This runs on x64 Linux, built with GCC 10. After further analysis, our current belief is that this only affects Java applications. C++ applications seem fine, which may indicate an issue with the 6.9 JNI layer or its configuration, but it is difficult to be certain. |
It seems weird to me that it is not working, because we did not find any such problem in OSPL 6.9.210323OSS. You could try our Vortex OpenSplice DDS Commercial 6.11.x ( https://www.adlinktech.com/en/vortex-data-distribution-service-dds-software-evaluation ). With best regards, |
Have now confirmed that reverting to 6.7.180404OSS with the same code appears to fix this issue, even with GCC10, so we will have to abandon 6.9.210323OSS at this time and look for another solution. |
After extensive analysis I believe I have actually found the problem. I don't believe it is actually a 6.7 vs 6.9 issue; rather, it is a subtle misuse of DDS from the Java side which seems to create a catastrophic problem inside DDS. Unfortunately OpenSplice does not report any warnings or errors that I can find, which led to our assumption that the OpenSplice upgrade caused the failure. While it appeared to occur in 6.9 and not 6.7, what is required is a Java application using
We have a wrapper around the very old Java 5 DDS API, part of which is a reader-type class that supports a liveliness callback:
public LivelinessOpenspliceDdsReader(Subscriber subscriber, Topic<T> topic, DdsReceiver<R> receiver, MessageConverter<T, R> converter,
DdsLivelinessListener livelinessListener, QosPolicy.ForDataReader... policies)
{
super(subscriber, topic, receiver, converter, Arrays.asList(DataAvailableStatus.class, LivelinessChangedStatus.class), policies);
this.livelinessListener = Objects.requireNonNull(livelinessListener);
}
@Override
public void onLivelinessChanged(LivelinessChangedEvent<T> status)
{
super.onLivelinessChanged(status);
livelinessListener.livelinessChanged(status.getStatus().getAliveCount());
}
This code appears innocent enough and is completely valid Java; however, it causes what seems to be a massive problem. The base class does the following:
protected BaseOpenspliceDdsReader(Subscriber subscriber, Topic<T> topic, DdsReceiver<R> receiver, MessageConverter<T, R> converter,
Collection<Class<? extends Status>> status, QosPolicy.ForDataReader... policies)
{
this.subscriber = Objects.requireNonNull(subscriber);
this.converter = Objects.requireNonNull(converter);
this.receiver = Objects.requireNonNull(receiver);
this.dataReader = subscriber.createDataReader(topic,
subscriber.copyFromTopicQos(subscriber.getDefaultDataReaderQos().withPolicies(policies),
topic.getQos()), this, status);
}
It appears that
It appears this is causing some significant problem inside the DDS layer at some point; however, there are no warnings or errors. If the
Given that the Java as written is valid and there does not appear to be anything stopping JNI from calling into an object's methods before initialisation is complete, may I perhaps suggest the following:
|
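The constructor pattern described in the comment above (a base-class constructor passing `this` as a listener before the subclass has assigned its own fields) can be reproduced without OpenSplice. The following is a minimal sketch under stated assumptions, not the project's code and not the OpenSplice API: `FakeMiddleware`, `Listener`, `UnsafeReader`, and `SafeReader` are hypothetical names, and the fake middleware calls back synchronously on the same thread, whereas a real callback would presumably arrive on a middleware thread where the exception may never surface in the constructor. Whether this is exactly what happens inside OpenSplice is not confirmed by the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.IntConsumer;

public class ThisEscapeSketch {

    /** Hypothetical listener interface standing in for the DDS listener. */
    interface Listener {
        void onLivelinessChanged(int aliveCount);
    }

    /** Hypothetical middleware: it may invoke the listener as soon as it is registered. */
    static final class FakeMiddleware {
        void register(Listener listener) {
            // Imitates a callback arriving before the registering constructor returns.
            listener.onLivelinessChanged(1);
        }
    }

    /** Unsafe: the base constructor hands out `this` before subclass fields are set. */
    static class UnsafeBase implements Listener {
        UnsafeBase(FakeMiddleware middleware) {
            middleware.register(this);
        }

        @Override
        public void onLivelinessChanged(int aliveCount) {
            // Base implementation does nothing.
        }
    }

    static final class UnsafeReader extends UnsafeBase {
        private final IntConsumer livelinessListener;

        UnsafeReader(FakeMiddleware middleware, IntConsumer livelinessListener) {
            super(middleware); // the callback fires here, while the field below is still null
            this.livelinessListener = Objects.requireNonNull(livelinessListener);
        }

        @Override
        public void onLivelinessChanged(int aliveCount) {
            livelinessListener.accept(aliveCount); // NullPointerException during construction
        }
    }

    /** Safer: finish construction first, then register in a separate step. */
    static final class SafeReader implements Listener {
        private final IntConsumer livelinessListener;

        private SafeReader(IntConsumer livelinessListener) {
            this.livelinessListener = Objects.requireNonNull(livelinessListener);
        }

        static SafeReader create(FakeMiddleware middleware, IntConsumer livelinessListener) {
            SafeReader reader = new SafeReader(livelinessListener);
            middleware.register(reader); // `this` escapes only after all fields are assigned
            return reader;
        }

        @Override
        public void onLivelinessChanged(int aliveCount) {
            livelinessListener.accept(aliveCount);
        }
    }

    /** Returns true if constructing the unsafe reader throws a NullPointerException. */
    static boolean unsafeConstructionFails() {
        try {
            new UnsafeReader(new FakeMiddleware(), count -> { });
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Returns the alive counts delivered to a safely constructed reader. */
    static List<Integer> safeConstructionCounts() {
        List<Integer> seen = new ArrayList<>();
        SafeReader.create(new FakeMiddleware(), count -> seen.add(count));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("unsafe construction fails: " + unsafeConstructionFails());
        System.out.println("safe construction counts: " + safeConstructionCounts());
    }
}
```

The factory method is one possible way to keep registration out of the constructor; a null check inside the overridden callback would be another, at the cost of silently dropping early events.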
Dear Richard, Thank you for your comment. I really appreciate your suggestion. We will test the case and consider your workaround, if required. With best regards, |
Until very recently, the system I was maintaining was running OSPL 6.7.180404OSS. Part of this has been rebuilt recently against 6.9.210323OSS and I have begun to see some extremely erratic behaviour that almost certainly borders on broken; however, I am clueless as to how to debug such a scenario.
Essentially there are a few C++ apps, one of which is publishing to a topic. There are other Java apps that use the supplied JNI layer to subscribe to this topic. When problems arose, I turned to a quick tool that was written to show all publishers and subscribers for a given topic and I observed the following:
This feels like the whole messaging layer is malformed somehow, however there are no ospl error logs. The only things of note in the info logs are missed heartbeats, which seems to occur when new publishers/subscribers are started.
I am looking to see if anyone can point to a particular area to investigate, or has seen such behaviour before. At the moment, given the limited tools and knowledge at my disposal, I may have to revert to 6.7 unless I can ascertain what the actual issue is.