-
-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide Riemann Sum implementation for items #4439
Comments
This issue has been mentioned on openHAB Community. There might be relevant details there: https://community.openhab.org/t/arithmetic-mean-vs-average-linear-step-interpolation/159892/16 |
@dandjo Riemann Sum algorithms will only work properly when the interval between persisted values is constant. We cannot make that assumption. How do you see handling that? |
@mherwege I would disagree. As far as I know, a Riemann sum does not require constant intervals. Of course, the accuracy increases with the shortness of the interval. The Riemann sum Mid in particular best compensates for inconsistent intervals by distributing the weighting to the left and right of the measuring point. The intervals can be quite different and the calculation would still achieve a very accurate approximation. I already use this with a measuring device that only sends me changes when the power value changes. Here, the Riemann sum Mid calculates the same values as the energy measuring device, up to the third decimal place. The average calculation Left does not do this (by far). I find it a little cumbersome in the API when you have to multiply average values by time to get to the integral, especially since calculation and rounding errors can easily creep in here. I think an easily accessible API would make it much easier to use in rules. |
Indeed, my oversight. Most text books kind of assume a constant interval, but it does not have to be. And the way we calculate an average at the moment does not assume it either. It takes the left value.
Is the measuring device you get the values from the same as the energy measuring device? That sounds like a contradiction. Because if you store in persistence at change (depending on the persistence service), I would think the left Riemann sum should give you exactly the same value. Note this will not be true with rrd4j as it is bucketizing, but it would be with influx or any of the JDBC datastores. If there is a sensor that does not report all of its changes, I agree mid Riemann sum would give better results. And it would also on an rrd4j database. I can however imagine cases where left will always give better results, e.g. electricity prices which only change every hour for dynamic tariffs. But they stay constant within the hour. You don't have a continuous function.
There may always be some, but the internal calculation is done using BigDecimal. So there should not be much loss of precision. Do you see a use for the right Riemann sum? I see one for left and mid, not so much for right, trapezoidal and Simpson approximations. In general trapezoidal looks to be less accurate than sum. Simpson is probably not worth the extra calculation overhead. |
@dandjo djo I am coming back on this. I looked at your implementation (https://community.openhab.org/t/arithmetic-mean-vs-average-linear-step-interpolation/159892/15?u=mherwege) and it is not correct if you don't have constant intervals. What you use as a mid value is not in the middle of the period if the bucket before and after the value has not the same length. You could argue it is a good enough approximation for sure, but it definitely is not a mid value. And if there is large variation in bucket length (like when only persisting on change), it may be way off. Question then becomes if a trapezoidal Riemann sum (which does not suffer from this issue) is not better after all. You can treat each bucket independently. |
Yes it is. It's a Nous A1T with Tasmota (calibrated).
Not for my application. I measure the performance depending on the position of the 3-way valve on my heat pump. Then I use the Riemann sum to calculate the consumption for either heating, hot water or standby. In addition, there are a whole range of other measurements for which I can only obtain performance values anyway. I'm using InfluxDB, by the way.
Of course, the type of the sum depends on the data you are measuring. In cases where you have irregular measurements, the Riemann sum with midpoint simply works most accurately.
I think left, mid and right are the most common one to be implemented. I wouldn't put much value on trapezoidal or Simpsons models.
I divide the time interval between a measurement point and the previous measurement point and the next measurement point by 2. This makes it a midpoint approximation. When considering this, it doesn't really matter how long the time intervals between the measurement points are. The calculation simulates a connection between the points. If the connection between two measurement points is not a line but a complex curve, then the accuracy of the approximation decreases. But this is the natural behavior of this Riemann approximation and not an error. |
I'll try to show it graphically. In this example, the measurement points are not equally spaced. The time between the measurement points is halved for each measurement point. We don't know the value between two measurement points, so we want to approximate it. You're right, of course, this isn't a 100% Riemann midpoint sum, it's more of a generic one for non-discrete-time measurements with a midpoint approximation. But I don't see any other mathematical solution. If you have better suggestions, I'd be happy to hear your input. I could test it soon. |
I actually think in the graph you have drawn, trapezoidal would be a better approximation. I don't like the double approximation you do (estimating the interval and estimating the value in that point by approximating it as well). While mathematically it still makes sense, it looks more complex to me than trapezoidal approximation, just using the adjacent point of a bucket, the bucket length and no further assumptions on the graph in between. |
The graph was just a demonstration. Of course in case of such functions you would need much denser points or grid to approximate better. I don't understand your point. You also have the approximation with a Riemann left. With the leftpoint variant, the assumption is even greater, since it is assumed that the value remains constant until the next measurement point. In other words, two assumptions are made here too. The measurement points could also be quantized differently. For example, you could also calculate the mean value of the measurement points and use this to create an area under the curve to the left and right of the resulting fictitious measurement point. But here, too, assumptions are made for value and time. In the end, it's about compensating for the deviations of a linear interaction between two measurement points using the temporal assumption. Depending on the type of data or the shape of the curve function, one or the other works better. Hence this feature request, in order to be able to offer more as a basis than just a weighted average calculation based on a Riemann leftpoint sum divided by time. |
I am in agreement that left is not the best approximation in many cases, except when you effectively have a step function with a persist on change strategy. In that case it will give an exact result and any other method will give a wrong result. I have more or less implemented methods now for left, right, trapezoidal and midpoint (your version of it). When I have it tested a bit more, I would appreciate you testing it. |
@mherwege Thanks a lot. I would try to access the raw Java code via JS. Do you have any idea how the midpoint could be implemented differently? |
No, not without introducing extra computational complexity. While I now have implemented all of these types (left, right, midpoint, trapezoidal), I am not yet convinced all should be in the software. This increases complexity from a software maintenance and a user perspective. My feeling at the moment is we should only allow left (also for backward compatibility with the |
In my opinion, the implicit choice of Left is a little problematic. It produces relatively poor results for fairly common series of measurements. I would be happy if the functions for Left and Mid were available as integrals (and not as averages). The subsequent calculation of the period is tedious and fundamentally contrary to an intuitive API. Changing the Average is of course problematic in terms of backwards compatibility. This could possibly be solved with a default parameter. This would make the choice backwards compatible but still clearly documented and visible in use. |
That is my intention. Have left as default for backward compatibility and consistency, but as a default parameter which can be changed easily. |
And here is another design decision request: one needs to multiply the values with time for the Riemann sums. I need to choose a time unit for this. While I am doing the internal calculations with millis, I now opt for delivering the result with time unit seconds. My feeling is this is enough granularity. And if the result is a quantity value, it will carry the unit and can easily be converted. Any feedback? |
Hm, but why decreasing accuracy? I measure some Sensors every second (yes, I really do). Is there an issue with Millis or even Nanos? |
You multiply large numbers (the intervals) with most likely relatively small numbers, so there is a loss of accuracy (or you add up a lot of small numbers if your intervals are small). |
Ah, you mean the return value of the calculation. I personally would stick to SI. So seconds are a good choice. |
@dandjo If you want to test, in the description of the linked PR I have included a download link for a compiled jar. You will need to replace the persistence bundle by this one through the karaf console in a running system. |
@mherwege LGTM. Thanks. |
Requirement
I want to calculate the energy consumption of a sensor that only provides me with power data at irregular intervals. This could be, for example, the output of a heat pump, a heat output meter or other time-discrete values. For this, we need to implement an integral function.
Suggestion
Provide a Riemann sum implementation with different alignments (left, right, mid, trapezoidal). Center/mid alignment is the best compromise between accuracy and implementation complexity.
Riemann Sums
Implementation
I provided a Riemann sum mid implementation in this discussion:
Arithmetic mean vs average linear/step interpolation
Your Environment
The text was updated successfully, but these errors were encountered: