.filter_by_rank() not working (with DwdRoadRequest) #1353

Open
SB-511 opened this issue Sep 9, 2024 · 6 comments

Comments


SB-511 commented Sep 9, 2024

Describe the bug
After applying .filter_by_rank() to a DwdRoadRequest it still returns all stations.
Sidenote: .filter_by_distance() still works fine.

To Reproduce

import datetime
from zoneinfo import ZoneInfo
from wetterdienst.provider.dwd.road.api import DwdRoadRequest, DwdRoadResolution

LOCATION = (49, 8.4)
NOW = datetime.datetime.now(ZoneInfo("UTC")).replace(tzinfo=None)

# Check the available parameters -> don't miss new ones!
road_parameters_dict = DwdRoadRequest.discover(
    resolution=DwdRoadResolution.MINUTE_10
)
road_params = list(road_parameters_dict["minute_10"].keys())

request = DwdRoadRequest(
    parameter=road_params,
    start_date=NOW - datetime.timedelta(minutes=60),
    end_date=NOW,
)

stations = request.filter_by_rank(latlon=LOCATION, rank=5)

print(stations.df)

Output:

shape: (1_653, 15)
[...]

If you replace .filter_by_rank(LOCATION, 5) with .filter_by_distance(LOCATION, 20) it works fine:

shape: (4, 15)
[...]

Expected behavior
It should work as documented and return the n closest stations.

Screenshots

Desktop (please complete the following information):

  • OS: Ubuntu
  • Python version: 3.10.12
  • wetterdienst version: 0.95.1

Additional context
I'm not sure, but I thought it was working some releases ago.

Author

SB-511 commented Sep 9, 2024

(A short test with DwdObservationRequest shows the same problem.)

Member

gutzbenj commented Sep 12, 2024

This may be a bit confusing, but

stations = request.filter_by_rank(latlon=LOCATION, rank=5)

result = stations.values.all()

print(result.df_stations)

would give you what you need.

We'll probably need to make this clearer. When going through the distance-sorted stations, we don't know in advance whether any of them has the requested values. So what it does is consume K of N stations until RANK stations with values have been found, and then it stops. df_stations is then a view on the stations df, restricted to the consumed stations WITH values.
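The behaviour described above can be sketched in plain Python. This is only a toy model of the idea, not wetterdienst's actual implementation: the function name `take_ranked_with_values`, the `(station_id, distance_km)` tuple shape, and the `has_values` callback are all made up for illustration.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical station shape for this sketch: (station_id, distance_km)
Station = Tuple[str, float]


def take_ranked_with_values(
    stations: Iterable[Station],
    rank: int,
    has_values: Callable[[str], bool],
) -> Tuple[List[str], int]:
    """Walk stations by ascending distance until `rank` stations with values are found.

    Returns the ids of the kept stations and the number of stations consumed.
    """
    kept: List[str] = []
    consumed = 0
    for station_id, _distance in sorted(stations, key=lambda s: s[1]):
        if len(kept) >= rank:
            break  # enough stations with values, stop consuming
        consumed += 1
        if has_values(station_id):
            kept.append(station_id)
    return kept, consumed


# Five candidate stations, of which only A, C and E have data (made-up example).
stations = [("A", 1.0), ("B", 2.5), ("C", 4.0), ("D", 7.0), ("E", 9.0)]
with_data = {"A", "C", "E"}

kept, consumed = take_ranked_with_values(
    stations, rank=2, has_values=lambda station_id: station_id in with_data
)
print(kept, consumed)  # ['A', 'C'] 3
```

Here K is `consumed` (3), N is the total number of candidates (5), and the kept ids play the role of the stations that end up in df_stations.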

Author

SB-511 commented Sep 15, 2024

Hey @gutzbenj ,
thank you for the clarification!

Is there a way to access the actual values of these 5 closest stations?

@gutzbenj
Member

So you'd either have to set ts_skip_empty to false or lower the ts_skip_threshold to something more pessimistic like 0.75. See https://wetterdienst.readthedocs.io/en/latest/usage/settings.html#settings for reference.
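To make the effect of those two settings more concrete, here is a rough sketch of how a skip decision based on data coverage could look. It is a simplified model under stated assumptions, not the library's code: the helper `is_skipped` is hypothetical, the default of 0.95 is assumed, and the real settings may evaluate coverage per parameter rather than over one flat series.

```python
from typing import Optional, Sequence


def is_skipped(
    values: Sequence[Optional[float]],
    skip_empty: bool = True,
    skip_threshold: float = 0.95,  # assumed default for this sketch
) -> bool:
    """Return True if a station would be skipped under this toy model."""
    if not skip_empty:
        return False  # nothing is skipped, even stations without values
    if not values:
        return True  # no values at all counts as empty
    # Share of timestamps that actually carry a value
    coverage = sum(v is not None for v in values) / len(values)
    return coverage < skip_threshold


# Made-up series with 8 of 10 values present, i.e. 80 % coverage
series = [1.2, None, 0.8, 1.1, 0.9, 1.0, None, 1.3, 1.1, 1.0]

print(is_skipped(series))                       # True: 0.8 < 0.95
print(is_skipped(series, skip_threshold=0.75))  # False: 0.8 >= 0.75
print(is_skipped(series, skip_empty=False))     # False: skipping disabled
```

In this model, lowering the threshold lets stations with gaps count as "having values", while disabling the skip keeps every station regardless of coverage.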

Author

SB-511 commented Sep 15, 2024

Oh okay, so there is no way of accessing the values of these stations directly as .filter_by_rank() is searching for the closest stations, not the closest stations with data?

Member

gutzbenj commented Oct 6, 2024

It is doing exactly that - looking for stations with data in accordance with the settings. But if you set ts_skip_empty=False, it would just hand you the X closest stations' data.
