Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flushing data from TCP after read error #3

Open
gambgia opened this issue Feb 14, 2019 · 12 comments
Open

Flushing data from TCP after read error #3

gambgia opened this issue Feb 14, 2019 · 12 comments
Labels
help wanted Extra attention is needed

Comments

@gambgia
Copy link

gambgia commented Feb 14, 2019

Hi,
I'm testing a connection with a S71500 PLC via VPN.
I'm reading different DB Areas from different DB.
Everything run fine.
Sometime, with the Internet traffic, one of the read fails with Tcp Read Data error and usually some data are still to be read (maybe data arrive later).
The Flushing included in method WaitForData is performed before the next read, but if TCPSocket.Available is less than the next read data size, the flush will not perform. Is it right ? In this case I will receive, within the next read, wrong data.
Thank you

@fbarresi fbarresi added the help wanted Extra attention is needed label Mar 8, 2019
@fbarresi
Copy link
Owner

fbarresi commented Mar 8, 2019

Hi!
Thank you for reporting this issue!

Can you please tell me if or how you can reproduce this issue?

@gambgia
Copy link
Author

gambgia commented Mar 11, 2019

Hi, I think I'm able to reproduce the issue.
I can give you some more details.
The PLC actually, controls an automatic machine. My application is a supervisor/controller of that machine. The app performs a frequent polling reads of different DB areas. I'm using S7MultiVar reader to read small db areas (due to negotiated PDU size limit, but faster than DBRead) and single read of bigger areas (using DBRead).
When I start a new instance of my application and connect it to the plant using a VPN connection sometime the reads return errors. Maybe on S7Multivar read or DBRead.
In this moments I'm reproducing the situation. I have some errors on DBRead (Error 0x00000005 TCP : Error receiving Data). If I stop execution on the error and I perform a loop to flush TCP until receive timeout is expired, I see something like:

Flushing loop start
SizeAvail: 487
Flushing...
SizeAvail: 0
Flushing loop end
End Flushing

Without this procedure, at the next polling, I will read wrong data.
For example, if I'm right, you will flush 7 bytes with the RecvIsoPacket() and then the size of the next read.
If the next read is of 100 bytes, you will flush 107 bytes total. and the other 380 bytes are received within the next read that in my example is a read on another DB.
This means wrong data.
In this moment I have implemented a workaround with a flushing loop of TCPSocket.Available bytes but only in caso of read error.
Thank you.

@fbarresi
Copy link
Owner

Hi!
Can you please share the workaround you developed? I would try to integrate it in the library.
Thank you!

@gambgia
Copy link
Author

gambgia commented Mar 18, 2019

Hi,
My workaround is a simple copy of your WaitForData (MsgSocket.cs) made public and renamed to "public int Flush(int Size)".
The Timeout parameter is replaced with _ReadTimeout.
Then I added a method in S7Client.cs:
public int Flush(int Size)
{
return Socket.Flush(Size);
}

Everytime I have a errTCPDataReceive I will call the Flush asking to flush 2048 bytes:

byte[] buffer = new byte[mds.DataStoreLength];
DbRead(mds.DataStoreNumber, mds.DataStoreOffset, mds.DataStoreLength, ref buffer);
mds.DataStoreBuffer = buffer;
if (_operationResult.IsOk)
{
OnPollingTelegramsCompleted(new PollingCompletedEventArgs(mds, _uniqueId));
}
else
{
System.Diagnostics.Debug.WriteLine("DataStoreTelegramsList - Error: " + operationResult.ErrorMessage);
_client.Flush(2048);
}

You should not be able to call another tcp read while the flushing loop is running.

When the error is in the S7Multivar read, usually the problem is that I'm requesting 3 DB blocks but in the result I have only one data block. The other datablock arrives later and with the workaround I throw all away. It's better to have less data but right, instead of more data but wrong (In a polling scenario).

Obviously the changes should be re-engineered.
Thank you

@fbarresi
Copy link
Owner

fbarresi commented Jun 2, 2020

Close due to inactivity

@fbarresi fbarresi closed this as completed Jun 2, 2020
@josbri
Copy link

josbri commented Jan 13, 2021

I am having the same problem. Apart from @gambgia solution, just closing (Close()) the socket when the connection fails ( LastError = S7Consts.errTCPDataReceive;) seems to help. The following is an example, not a solution:

#public int Receive(byte[] Buffer, int Start, int Size)
        {

            int BytesRead = 0;
            LastError = WaitForData(Size, _ReadTimeout);
            if (LastError == 0)
            {
                try
                {
                    BytesRead = TCPSocket.Receive(Buffer, Start, Size, SocketFlags.None);
                }
                catch
                {
                    LastError = S7Consts.errTCPDataReceive;
                }
                if (BytesRead == 0) // Connection Reset by the peer
                {
                    LastError = S7Consts.errTCPDataReceive;
                    Close();
                }
            }
  // CLOSING THE SOCKET ON TCP ERROR:
		    if (LastError == 5)
		    {
		        Close();
                    }
            return LastError;
        }

Any ideas on how this can be solved? Would just closing the socket work or are there any side effects?

Thanks.

@fbarresi
Copy link
Owner

fbarresi commented Jan 23, 2021

Hi!
Please apologize my late reply.
No, Closing the socket would no cause any side effects.

Do you use the sharp7 with parallel requests? It seems to me, that this errors are caused by faulty data, but the only scenario I can imagine for that is that the plc would send data to the wrong request.

@josbri
Copy link

josbri commented Jan 27, 2021

Hi,
I apologize for the late reply as well. We are not doing any parallell requests, but the problem seems to go away when only using one DB block. So far it looks exactly like what @gambgia describes. In the WaitforData method, the SizeAvail can be 0 at the time of the if (Expired && SizeAvail > 0), yet have the remaining data avaible just after.

sizeAvail

In this image you can see an example of what we are refering to. In this moment there is no connection, it didn´t go inside the if(Expired && (SizeAvail > 0)) in line 99 so the sizeAvail was 0 at the moment, but it did go into the Expired. Once inside if we inspect the TCPSocket.Available, we can see that there is actually data in there (89).

If the connection is recovered before the next read, the socket will never close and this data wil still be in the buffer.

Hope all this makes sense.

@fbarresi fbarresi reopened this Jan 27, 2021
@fbarresi
Copy link
Owner

fbarresi commented Jan 28, 2021

Hi,
Thank you for your post. I think I now see the real problem.
Please, help me to check if I understood you well:

  • For some reason (VPN or whatever) the connection get instable
  • the user want to receive some data (let say 42 bytes) from the plc, so the receive function is called (Line 124 of MsgSocket.cs)
  • In this function the method WaitforData is called
    In this method the software waits till the number of bytes available reach the size (42) or the time is out.
    In case of timeout every partial message will be trashed.

In your case there are still data after the timeout was recognized that will be trashed on the next reading. So I think a close is not needed in this case and the remaining data in the buffer will not effect further readings.

@josbri
Copy link

josbri commented Feb 2, 2021

Yes, that´s more or less what is happening. It appears that the data is not being trashed though.

@fbarresi
Copy link
Owner

fbarresi commented Feb 2, 2021

Do you mean, that the data are not trashed in your debug-example?
That's right, because they are going to be trashed at the next read operation.

@JellevanCampen
Copy link

Hi, I am experiencing this issue as well. We are running an application on a wireless device that connects to an S7-1500 PLC over WiFi. In some parts of the plant, the WiFi connection is weak and the connection slow causing data read errors. In contrast to a single error that resolves itself on the next data read operation (due to a lost connection), the data is continuously incorrect due to data arriving after the request has timed out (due to a slow connection).

Our application polls the PLC for data periodically across multiple reads. Every polling cycle looks like this:

  1. Read global data of a machine (read 1)
  2. Read data of component 1 (read 2)
  3. Read data of component 2
  4. Read data of component 3
  5. Read data of component 4
  6. Read data of component 5

When a read fails and the connection is restored afterwards, we observe that the result data of read 1 (Read global data of a machine) is returned upon calling read 2 (Read data of component 1).

For example, say the first data of read 1 and read 2 are both Strings. If read 1 is supposed to return string "Global data" and read 2 is supposed to return "Component 1 data", we instead see read 1 report a data read error, and read 2 return "Global data". The data of the data read operation is thus shifted to the next data read operation.

The data seems to shift from one data read operation to the next. It does not seem to be a shift of data within a single data operation. This could explain why @josbri doesn't observe the problem when reading from only a single DB.

What I assume is happening is the following. Currently, Sharp7 will flush any incomplete data in the buffer upon detection a timeout. It is however possible that more data is still on the way after this initial timeout detection. In case of a very slow connection, no data may have been received yet by the Sharp7 client upon triggering the timeout detection. All data will arrive after the flush of the socket. The next time a data read is executed, the data of the previous call will be available in buffer. This data will have all the correct S7Protocol headers, so it will pass as a valid message. It contains the response to the previous read request however, instead of the current read request. This again explains why the issue is not observed when only reading from a single DB in the polling cycle, as the data will be structurally correct, it is however data from one cycle ago (which may not be an issue in a polling scenario).

Would it be an option to flush the TCPSocket buffer at the start of every call to WaitForData? This would ensure it is empty before waiting for new data to arrive. Better yet may be to flush the buffer upon sending the data read request to the PLC (otherwise data may arrive inbetween the data read request and the call to WaitForData).

Our software engineers on-site have reported the issue is resolved by restarting our application. In case of a buffer issue, I would indeed expect this to solve the issue. We have currently deployed a quick fix that destroys the S7Client instance and creates a new instance upon encountering any data read error, but this is obviously a hacky solution. @josbri 's solution of closing the socket is a bit cleaner, and will also clear the buffer and make sure no more on-the-way data arrives in the buffer. @gambgia 's solution is the cleanest, but on-the-way data may still end up in the buffer as the socket is not closed, so the next data read may still be incorrect.

Could you advise on your view of this issue? A solution integrated into the Sharp7 library may be the cleanest. It would have to ensure the buffer is clean before any data read, and that no more data from a previous data read can enter the buffer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

4 participants