
Verify control bit in S02 files #7

Open
antonroman opened this issue Jun 2, 2021 · 7 comments

Comments

@antonroman
Owner

antonroman commented Jun 2, 2021

The field "Bc" refers to the Control Byte; its value must be lower than 0x80 for the data to be considered valid. We must check this to make sure the data is valid.
If we discover values equal to or higher than 0x80, they could be valuable input for the next experiment.
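A minimal sketch of this check in Python (the hex-string input format is an assumption about how Bc appears in the CSV):

```python
# Minimal sketch: a record is valid only when its control byte (Bc)
# is strictly lower than 0x80; anything >= 0x80 is flagged.

def is_valid_control_byte(bc: int) -> bool:
    """Return True if the control byte marks the record as valid."""
    return bc < 0x80

# Example with Bc values given as hex strings (assumed input format):
samples = ["00", "40", "80", "FF"]
flags = [is_valid_control_byte(int(s, 16)) for s in samples]
# flags -> [True, True, False, False]
```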

@gbarreiro
Collaborator

I've created a script to check the control byte, called detect_because_invalid.py, and I've uploaded the output (https://github.com/antonroman/smart_meter_data_analysis/blob/master/data_processing/sample_csv_files/invalid_data.csv). There are a lot of rows with a control byte greater than or equal to 0x80:

[Screenshot omitted: Screen Shot 2021-06-07 at 19 08 59]

I will show you this in more detail in our next meeting.

@antonroman
Owner Author

antonroman commented Jun 7, 2021

OK, then we need to check with Gabriel what to do in this case:

  1. either drop the sample and fill it with the value from the same time on the previous day,
  2. or assume the value is valid.
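Option 1 could be sketched as follows; the dict-of-datetimes structure is hypothetical, just to illustrate the 24-hour lookback:

```python
from datetime import datetime, timedelta

def fill_from_previous_day(readings, ts):
    """Return the reading taken 24 h earlier, or None if unavailable."""
    return readings.get(ts - timedelta(days=1))

readings = {
    datetime(2021, 6, 1, 12): 3.2,   # valid sample
    datetime(2021, 6, 2, 12): None,  # invalid sample (Bc >= 0x80)
}
readings[datetime(2021, 6, 2, 12)] = fill_from_previous_day(
    readings, datetime(2021, 6, 2, 12)
)
# readings[datetime(2021, 6, 2, 12)] -> 3.2
```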

@antonroman
Owner Author

In order to understand the magnitude of the problem, would it be possible to plot a histogram showing the number of files that have the following percentages of errors: 0.1%, 0.5%, 1%, 2%, 3%, 4%, 5%, 10%, >10%?

Thanks a lot!
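The bucketing behind such a histogram could look like this (the bin semantics and per-file error rates below are illustrative assumptions, not real data):

```python
import bisect
from collections import Counter

BIN_EDGES = [0.1, 0.5, 1, 2, 3, 4, 5, 10]  # upper bounds, in percent
BIN_LABELS = ["<=0.1%", "<=0.5%", "<=1%", "<=2%", "<=3%",
              "<=4%", "<=5%", "<=10%", ">10%"]

def bucket(error_pct: float) -> str:
    """Map a file's percentage of invalid samples to its histogram bin."""
    return BIN_LABELS[bisect.bisect_left(BIN_EDGES, error_pct)]

# Illustrative per-file error rates (percent), one entry per file:
rates = [0.05, 0.3, 2.5, 12.0]
histogram = Counter(bucket(r) for r in rates)
```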

@antonroman antonroman reopened this Jun 8, 2021
@gbarreiro
Collaborator

Done:

[Screenshots omitted: Screen Shot 2021-06-09 at 14 13 29 and Screen Shot 2021-06-09 at 14 13 57]

@antonroman
Owner Author

Great,

so we can safely discard all the series with more than 0.1% of errors.
For the remaining files we should check whether the incorrect samples have realistic values or whether we need to fix them. For S02 we could follow the approach of filling these values with the average of the previous and posterior samples. Shall I create another issue for this?
Do you know if we also have errors in the S05 samples?
Thanks!
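The neighbour-averaging fix proposed above could be sketched like this (a plain list of hourly readings with None marking an invalid sample is an assumed layout):

```python
def fill_with_neighbour_mean(series):
    """Replace each invalid (None) sample with the mean of its
    immediate previous and next valid neighbours."""
    filled = list(series)
    for i, value in enumerate(filled):
        if value is None and 0 < i < len(filled) - 1:
            prev, nxt = filled[i - 1], filled[i + 1]
            if prev is not None and nxt is not None:
                filled[i] = (prev + nxt) / 2
    return filled

# fill_with_neighbour_mean([1.0, None, 3.0]) -> [1.0, 2.0, 3.0]
```

Note that this simple version leaves runs of consecutive invalid samples untouched; they would need a second pass or a wider interpolation window.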

@antonroman
Owner Author

We should check if the S05 value provided for these records is "0". That would make sense, since this is typically caused by an error in the data transmission.

@gbarreiro
Collaborator

The quality byte (Bc) is only available in the S02 records (hourly), not in the S05 (daily) ones. I have counted the unique R1 and R4 values of the invalid_data.csv file and saved them in the invalid_data_R1.csv and invalid_data_R4.csv files. As you can see in the screenshot below, the most common value of R1 (and likewise of R4) for records with a Bc greater than or equal to 0x80 is 0, but there are many other invalid records with R1 values higher than zero:

[Screenshot omitted: Screen Shot 2021-07-08 at 16 56 17]
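The R1/R4 counting could be reproduced with something like this (the column names are assumptions about the CSV layout):

```python
import csv
import io
from collections import Counter

def count_values(csv_text: str, column: str) -> Counter:
    """Count how often each distinct value appears in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)

# Tiny inline example instead of the real invalid_data.csv:
sample = "R1,R4\n0,0\n0,5\n12,0\n"
# count_values(sample, "R1").most_common(1) -> [('0', 2)]
```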
