XSOR: Incomplete state transition in case of conflicting states #1290

Open
csc-gip opened this issue Dec 4, 2024 · 2 comments

csc-gip commented Dec 4, 2024

Node 1: CONNECTED
Node 2: DISCONNECTED but reachable (NAd died on Node 1)

Update on Node 1
Restart of Node 1
Start NAd

Node 1 is updated with all data from Node 2 (SYNC_M => SYNC_S)
Node 2 receives the (old) updated state from Node 1, which is somehow still in its incoming queue

Update on Node 1: => Conflict as the states are different
Node 1 stays in state 'M'

The state should not stay in 'M' but return to 'S', and the data should be considered invalid (for updates).
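
The expected behaviour can be sketched as a tiny state machine. This is only an illustration: `ObjState`, `ReplicatedEntry`, `beginLocalUpdate`, and `onConflict` are hypothetical names, not XSOR types, and the assumption that a pending local update puts the entry into 'M' is taken from the scenario above rather than from the XSOR code.

```java
/** Hypothetical per-object state, limited to the two state letters discussed above. */
enum ObjState { S, M }

/** Hypothetical replicated entry; not an XSOR class. */
final class ReplicatedEntry {
    private ObjState state = ObjState.S;
    private boolean validForUpdates = true;

    /** Assumption: a local update moves the entry to 'M' while it is pending. */
    void beginLocalUpdate() {
        if (!validForUpdates) {
            throw new IllegalStateException("entry is invalid for updates");
        }
        state = ObjState.M;
    }

    /** Conflict detected: fall back to 'S' and block further updates on this data. */
    void onConflict() {
        state = ObjState.S;
        validForUpdates = false;
    }

    ObjState state() { return state; }

    boolean isValidForUpdates() { return validForUpdates; }
}
```

In this sketch, a later `beginLocalUpdate()` on the same entry fails until the data has been resynchronized, instead of the entry silently staying in 'M'.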

@csc-gip csc-gip added the bug Something isn't working label Dec 4, 2024
@csc-gip csc-gip self-assigned this Dec 4, 2024

csc-gip commented Dec 4, 2024

If there is a conflict, grabDoubleCheckAndReturnRereadPayload throws an exception, which is not handled in search:

XSORPayload grabbedPayload = grabDoubleCheckAndReturnRereadPayload(transactionLockedPair.getInternalId(), transactionLockedPair.getPayload(),

As the state has already been changed to 'E', the element should be added to the grabbed elements

transactionContext.grabbed(internalId, searchRequest.getTablename());

as is done, e.g., in the case of PENDING_ACTIONS_LOCAL_CHANGES, and the payload needs to be released again:

            case CONFLICTING_REQUEST_FROM_OTHER_NODE:
              // state is 'E' but the grab failed due to inconsistency.
              transactionContext.grabbed(internalId, searchRequest.getTablename());
              releaseAndUnlockFromTransaction(internalId, payload, correspondingMemory, strictlyCoherent,
                  transactionContext);
              throw new CollisionWithRemoteRequestException();
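
The ordering in the snippet above (register as grabbed, release, then throw) can be illustrated in isolation. The following is a simplified sketch with hypothetical stand-ins (`SketchTransactionContext`, `CollisionSketchException`, `GrabSketch`); it does not reproduce the real XSOR classes or their signatures and only shows that nothing stays held once the exception propagates.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the collision exception; not the XSOR class. */
class CollisionSketchException extends RuntimeException {}

/** Hypothetical, simplified transaction context that only tracks grabbed ids. */
final class SketchTransactionContext {
    private final Set<Integer> grabbed = new HashSet<>();

    void grabbed(int internalId) { grabbed.add(internalId); }

    void released(int internalId) { grabbed.remove(internalId); }

    boolean holds(int internalId) { return grabbed.contains(internalId); }
}

final class GrabSketch {
    /** Simulates a grab whose double check fails after the state was already changed. */
    static void grabWithConflict(int internalId, SketchTransactionContext ctx) {
        // The state change has already happened, so the element counts as grabbed ...
        ctx.grabbed(internalId);
        // ... and has to be released again before the conflict is reported.
        ctx.released(internalId);
        throw new CollisionSketchException();
    }
}
```

A caller that catches `CollisionSketchException` then finds the context empty for that id, so the element is not left behind in a half-grabbed state.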


csc-gip commented Dec 4, 2024

The serverSocket is never closed when the server is paused; only the currently open connection is closed.

private void closeSocket() {
    Socket s = socket;
    if (s != null) {
        socket = null;
        socketClosedForWriting = true;
        if (logger.isDebugEnabled()) {
            logger.debug("closing open connection.");
        }
        try {
            s.close();
        } catch (IOException e) {
            logger.warn("could not close connection", e);
        }
    }
}

public void pauseWorking() {
    synchronized (notifyObject) {
        paused = true;
        notifyObject.notifyAll();
    }
    // ensure that nothing more is read from the socket input stream.
    closeSocket();
}

So, while the server is paused, incoming client connections are still established on the open listening socket, and the messages sent by the client are kept until accept is called again.

ServerSocket s = serverSocket;
while (running && !paused) {
    if (s != null) {
        Socket localSocket = s.accept();
        if (logger.isInfoEnabled()) {
            logger.info("got new socket from " + localSocket.getInetAddress() + ":" + localSocket.getPort());
        }
        socket = localSocket;
        socketClosedForWriting = false;
        InputStream is = localSocket.getInputStream();
        while (running && !paused) {
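
The queuing effect can be reproduced with plain `java.net` sockets, independent of XSOR. The sketch below is a hypothetical demo (`BacklogDemo` and `sendWhilePaused` are made-up names): a client connects and writes while nobody calls `accept()`, and a later `accept()` still delivers the old bytes. It assumes Java 9 or later for `InputStream.readAllBytes()`.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class BacklogDemo {
    /** Sends a message while accept() is not being called and returns what a later accept() reads. */
    static String sendWhilePaused(String message) {
        // Port 0 lets the OS pick a free port; a backlog of 50 is a common default.
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            // "Paused": the listening socket is open, but nobody is in accept().
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                OutputStream out = client.getOutputStream();
                out.write(message.getBytes(StandardCharsets.UTF_8));
                out.flush();
            } // the client closes; the connection and its data stay queued on the server side

            // "Resumed": accept() hands out the queued connection together with the old data.
            try (Socket accepted = server.accept()) {
                byte[] received = accepted.getInputStream().readAllBytes();
                return new String(received, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under this reading, one possible remedy would be to also close the listening socket when pausing and reopen it on resume, so that connection attempts during the pause fail instead of queuing stale messages. Whether that fits the InterconnectServer lifecycle is not confirmed here.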

Server closes its connection to the client:

04-12-2024 13:34:04 XYNA_001  INFO [XynaClusteringServicesManagementChangeHandlerThread0] (DHCPv4ServicesImpl.java:632) clusterstate changed to DISJOINED_RUNNING. oldstate = CONNECTED
04-12-2024 13:34:04 XYNA_001 TRACE [StateChangeHandlerExecutorThread-1] (XSORMemory.java:1103) changeQueueModeToMerging()
04-12-2024 13:34:04 XYNA_001  INFO [XynaClusteringServicesManagementChangeHandlerThread0] (DHCPv4ServicesImpl.java:632) clusterstate changed to DISJOINED_RUNNING. oldstate = CONNECTED
04-12-2024 13:34:04 XYNA_001 DEBUG [StateChangeHandlerExecutorThread-1] (InterconnectServer.java:202) closing open connection.
04-12-2024 13:34:04 XYNA_001 ERROR [InterconnectServer-Thread] (InterconnectServer.java:187) IOException reading data
java.net.SocketException: Socket closed
        at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
        at java.net.SocketInputStream.socketRead(SocketInputStream.java:115) ~[?:?]
        at java.net.SocketInputStream.read(SocketInputStream.java:168) ~[?:?]
        at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
        at com.gip.xyna.xsor.interconnect.InterconnectServer.readInBuffer(InterconnectServer.java:281) ~[xsor.jar:?]
        at com.gip.xyna.xsor.interconnect.InterconnectServer.run(InterconnectServer.java:87) ~[xsor.jar:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]

The client immediately opens a new connection and tries to send updates:

04-12-2024 13:34:04 XYNA_001  INFO [InterconnectSender-Thread] (InterconnectSender.java:106) opening socket to 192.168.178.133:1712
04-12-2024 13:34:04 XYNA_001  INFO [InterconnectSender-Thread] (InterconnectSender.java:111) try to set extended Options
04-12-2024 13:34:04 XYNA_001 DEBUG [InterconnectSender-Thread] (InterconnectSender.java:144) Interconnectsender connected to 192.168.178.133:1712

04-12-2024 13:34:10 XYNA_001 DEBUG [InterconnectSender-Thread] (InterconnectSender.java:156) CORRECT sending message 17121987I 45

04-12-2024 13:34:10 XYNA_001 DEBUG [InterconnectSender-Thread] (InterconnectSender.java:156) CORRECT sending message 17121988S 141

After synchronization, the server accepts the connection and handles the (old) messages:

04-12-2024 13:40:33 XYNA_001  INFO [InterconnectSender-Thread] (InterconnectSender.java:90) opening socket to 192.168.178.132:1712
04-12-2024 13:40:33 XYNA_001  INFO [InterconnectServer-Thread] (InterconnectServer.java:81) got new socket from /192.168.178.132:51118
04-12-2024 13:40:33 XYNA_001 DEBUG [InterconnectSender-Thread] (InterconnectSender.java:116) Interconnectsender connected to 192.168.178.132:1712

04-12-2024 13:40:33 XYNA_001 DEBUG [InterconnectServer-Thread] (XSORProcess.java:805) newState=I, type=43, cid=17121987, checksumRemoteBackup=995033974, checkSumNewCalculatedRemotely=169220220, checkSumNew=0, modTime=1733318823495, relTimeRemoteBackup=1733318823497, relTimeNew=1733318823497, objectID=10.22.24.124

04-12-2024 13:40:33 XYNA_001 DEBUG [InterconnectServer-Thread] (XSORProcess.java:805) newState=S, type=43, cid=17121988, checksumRemoteBackup=995033974, checkSumNewCalculatedRemotely=-943311168, checkSumNew=-943311168, modTime=1733319250919, relTimeRemoteBackup=1733318823497, relTimeNew=1733319250920, objectID=10.22.24.124
