Create a tableau workbook to identify outliers #3

Open
carlosparadis opened this issue Sep 21, 2017 · 8 comments

Comments

@carlosparadis
Member

carlosparadis commented Sep 21, 2017

Two line plots should be created for every sensor, one for the minimum and one for the maximum. Use the dhhl database to prototype with the readings table, and then test it in frog_uhm.

Since this impacts our ability to see whether something is wrong, this issue is high priority.
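As a rough illustration of what the two line plots would draw, here is a minimal Python sketch of the per-sensor aggregation. It assumes readings can be pulled from the readings table as (sensor_id, timestamp, value) tuples and that min and max are taken per day; both the row shape and the daily granularity are assumptions, not the actual dhhl or frog_uhm schema. In Tableau itself this would normally be a MIN and MAX aggregation rather than code.

```python
from collections import defaultdict
from datetime import datetime


def daily_min_max(readings):
    """Return {sensor_id: {date: (min, max)}} from (sensor_id, timestamp, value) rows.

    Each sensor ends up with one series of daily minimums and one of daily
    maximums, i.e. the data behind the two line plots.
    """
    stats = defaultdict(dict)
    for sensor_id, ts, value in readings:
        day = ts.date()
        current = stats[sensor_id].get(day)
        if current is None:
            stats[sensor_id][day] = (value, value)
        else:
            stats[sensor_id][day] = (min(current[0], value), max(current[1], value))
    # Sort each sensor's days so the lines run left to right in time.
    return {sensor: dict(sorted(days.items())) for sensor, days in stats.items()}


if __name__ == "__main__":
    # Made-up sample rows; "temp_1" is a hypothetical sensor id.
    sample = [
        ("temp_1", datetime(2017, 9, 1, 8), 24.5),
        ("temp_1", datetime(2017, 9, 1, 14), 29.0),
        ("temp_1", datetime(2017, 9, 2, 9), 25.1),
    ]
    for sensor, days in daily_min_max(sample).items():
        for day, (low, high) in days.items():
            print(sensor, day, low, high)
```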

@carlosparadis
Member Author

Create a workbook with a boxplot for the entire dataset.

The X axis should be the house ID. The Y axis should be the readings of one sensor type (e.g., relative humidity or temperature). We should create one boxplot for every sensor type to identify outliers. Post a screenshot of the plot to the Tableau channel on Slack with a link to this issue, rather than posting it here.
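To make the layout concrete, the sketch below computes the numbers behind each box: one box per (house_id, sensor_type) pair, summarized as min, Q1, median, Q3 and max. The (house_id, sensor_type, value) row shape is an assumption for illustration, and quartiles here use Python's `statistics.quantiles` with the inclusive method, which may not match Tableau's quartile calculation exactly.

```python
from collections import defaultdict
from statistics import quantiles


def five_number_summaries(rows):
    """Group (house_id, sensor_type, value) rows into one box per pair.

    Returns {(house_id, sensor_type): (min, q1, median, q3, max)}.
    """
    groups = defaultdict(list)
    for house_id, sensor_type, value in rows:
        groups[(house_id, sensor_type)].append(value)

    summaries = {}
    for key, values in groups.items():
        if len(values) < 2:
            continue  # too few points to draw a meaningful box
        # quantiles() sorts its input; n=4 yields the three quartile cut points.
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        summaries[key] = (min(values), q1, med, q3, max(values))
    return summaries
```

One plot per sensor type then corresponds to filtering these keys by `sensor_type` and laying the boxes out by `house_id` along the X axis.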

@carlosparadis
Member Author

@kathrynparadis

I just got word from Eileen that the second requested time period (see the two clearly separated sections of the requests at the bottom) is suspected to contain outliers in July and August. Please double-check the precise time frame with her and use that range to showcase the boxplots you have been working on for this issue, ideally as a dashboard.

Please post here once you get the precise time range.
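Once the exact range is known, restricting the data to it before building the boxplots is a simple filter. The sketch below assumes readings arrive as (timestamp, value) pairs and uses a half-open window; the start and end dates are deliberately left as parameters because the precise range has not been confirmed yet.

```python
from datetime import datetime


def in_window(readings, start: datetime, end: datetime):
    """Keep (timestamp, value) readings with start <= timestamp < end.

    The half-open interval avoids counting a reading twice when two
    adjacent windows share a boundary.
    """
    if start >= end:
        raise ValueError("start must be before end")
    return [(ts, value) for ts, value in readings if start <= ts < end]
```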

@kathrynparadis
Member

We decided to create boxplots for each PurposeID, grouped by "type" (power, luminosity, humidity, temperature), and a dashboard containing the four plots for each house.

@kathrynparadis
Member

kathrynparadis commented Nov 16, 2017

This is the most current version:

[Image: outliers2]

I am also working on a time-series version that displays one month at a time, with the month easy to change. This is important because the original boxplot shows all of the data at once:

[Image: outlier_timeseries]

However, I'm having trouble getting one filter to apply independently to different dashboards in the same file. The building ID filter in the first image is also linked to the second image, so I can't look at two different buildings on separate dashboards at the same time: changing one changes both.

I will post an update once I figure that out.
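The month-at-a-time view amounts to bucketing readings by calendar month and showing one bucket at a time. Here is a minimal Python sketch of that bucketing, assuming (timestamp, value) readings; `by_month` and `select_month` are hypothetical helper names, and in the workbook the selection would be a month filter rather than code.

```python
from collections import defaultdict
from datetime import datetime


def by_month(readings):
    """Bucket (timestamp, value) readings into {(year, month): [values]}."""
    buckets = defaultdict(list)
    for ts, value in readings:
        ts: datetime
        buckets[(ts.year, ts.month)].append(value)
    # Sorted keys make it easy to step to the previous or next month.
    return dict(sorted(buckets.items()))


def select_month(buckets, year, month):
    """Return the values for one month, or an empty list if there are none."""
    return buckets.get((year, month), [])
```

Each selected month's values can then be summarized into a box exactly as for the full dataset.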

@carlosparadis
Member Author

carlosparadis commented Nov 16, 2017

@eileenpeppard @ryantanaka @jygh98

Unlike the missing-data plot, this is an outlier plot. It is meant to help us spot abnormal values. The main inspiration for this plot is the infamous eGauge e792, and in particular this comment: erdl/legacy-scrape-util#15 (comment)

We didn't realize that the PV reading on this eGauge was always 0 until two months later.

In the dashboard above, this could easily have been spotted in the bottom-right corner of the Power plot while shuffling through the houses: the box would be squeezed at 0 forever, whereas the box you see there varies between 0 and 5 because of the day and night cycle.
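The "box squeezed at 0" symptom can also be checked mechanically. The sketch below flags any channel whose readings never leave a single value, which is what a PV channel stuck at 0 would look like; the channel names and the `tolerance` parameter are illustrative assumptions, not part of the workbook.

```python
def flatlined_channels(channels, tolerance=1e-9):
    """Return names of channels whose readings stay at one value.

    `channels` maps a channel name to its list of readings. A channel whose
    max and min differ by no more than `tolerance` would draw a box collapsed
    to a single line, like a PV channel that reports 0 all the time.
    """
    flagged = []
    for name, values in channels.items():
        if values and max(values) - min(values) <= tolerance:
            flagged.append(name)
    return flagged
```

A healthy PV channel still has plenty of zeros at night, but its daytime readings spread the box upward, so it is not flagged.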

I wish I could show the boxplot for eGauge e792 so you could actually see it, but at the time it not only had an error, it also had a lock that prevented it from being accessed via its URL. That meant missing data, which is a problem for the #6 workbook to solve. In this case, though, we would also notice the absence of the PV column if all of its data were missing (but not if only some were).

Note that the plots are intended not to require another table: in Power, both the appliances and the purpose IDs are included.

Future Work

@kathrynparadis will be adding the room type for the purpose IDs in the other plots of the dashboard (temperature, light, and humidity).

@kathrynparadis p.s.: Remember to change the multiple-choice box to a mutually exclusive choice box (i.e., radio buttons), as multiple selection makes no sense here.

@kathrynparadis
Member

Here's an updated picture that includes the room types and the single-choice box:

[Image: outliers_rooms]

Still working on the time-series issue.

@jygh98
Member

jygh98 commented Nov 18, 2017

So here are my initial impressions of the plot:

  1. Which eGauge are these plots monitoring? I do see the purpose ID above each column, but to check which eGauge these IDs are associated with, I would need to go onto the server and look at the config file.

  2. What is the time frame for these plots? If we are trying to look for outliers, I think it would be important to label the time range of each plot.

  3. What do the colored bars represent compared to the error bars? For example, if I am looking for outliers, would that mean nearly all of the data in the Power plot for the dryer are outliers? Mainly, what exactly am I supposed to be looking for?

Also, would it be possible to make the x-axis labels horizontal? Otherwise, I think it looks pretty solid overall.
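On question 3, a standard box plot reads as follows: the colored box spans Q1 to Q3 with a line at the median, the whiskers reach the most extreme readings within 1.5 × IQR of the box, and anything beyond the whiskers is drawn as an outlier. The sketch below assumes the workbook uses Tableau's default whisker setting (data within 1.5 times the IQR); `box_anatomy` is a hypothetical helper, and its quartiles may differ slightly from Tableau's. For a load that is off most of the time, such as a dryer, the box can sit near zero so that genuine on-cycle readings fall beyond the whiskers: the rule flags them, but that does not make them faulty.

```python
from statistics import quantiles


def box_anatomy(values, k=1.5):
    """Describe one box plot under the k * IQR whisker rule.

    The box spans Q1..Q3, whiskers end at the most extreme values that lie
    within k * IQR of the box, and everything beyond them is an outlier.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # At least one value always lies between Q1 and Q3, so `inside` is non-empty.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "box": (q1, q3),
        "median": med,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }
```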

@ryantanaka

  • I can get the general idea of the plot at first glance so that's good.
  • Is this data aggregated over a day or a month? I'm not sure. I know we discussed it on a conference call before, but I don't recall.
  • Minor detail: since there are short descriptions of the purpose IDs for the sensors on the x-axis, why not put a short building description next to the building IDs, so I know which building to refer to if I spot an outlier? (I could look it up in the database too.)
