Change get_data() to always return a long form data frame with dimension IDs. #32

Draft: aaronweeden wants to merge 2 commits into main

aaronweeden commented on Oct 30, 2024

WORK IN PROGRESS

Description

This PR changes the get_data() method to always return a long form data frame and to include the dimension IDs in the data. For example, given this call to get_data():

# dw is assumed to be an existing DataWarehouse instance.
with dw:
    df = dw.get_data(
        duration=('2024-01-01', '2024-01-02'),
        realm='Jobs',
        metric='Number of Jobs Ended',
        dimension='Service Provider',
        dataset_type='timeseries',
        aggregation_unit='Day',
        filters={
            'Service Provider': ('PSC', 'SDSC', 'NCSA'),
        },
    )

The df variable will be assigned this data frame:

   Date        Metric                Service Provider ID  Service Provider Label  Value
0  2024-01-01  Number of Jobs Ended  848                  PSC                     16146
1  2024-01-01  Number of Jobs Ended  856                  SDSC                    8450
2  2024-01-01  Number of Jobs Ended  844                  NCSA                    2650
3  2024-01-02  Number of Jobs Ended  848                  PSC                     11702
4  2024-01-02  Number of Jobs Ended  856                  SDSC                    7413
5  2024-01-02  Number of Jobs Ended  844                  NCSA                    2497
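
As a rough illustration of working with this shape, the sketch below rebuilds the example data frame by hand (an assumption made so it runs without a connection to an XDMoD instance; it does not call get_data()) and shows that a wide layout can still be recovered from the long form with a single pandas pivot. The variable names df and wide are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for the data frame shown above. The column names
# and values are copied from the example output; nothing is fetched.
df = pd.DataFrame(
    {
        'Date': pd.to_datetime(['2024-01-01'] * 3 + ['2024-01-02'] * 3),
        'Metric': ['Number of Jobs Ended'] * 6,
        'Service Provider ID': [848, 856, 844, 848, 856, 844],
        'Service Provider Label': ['PSC', 'SDSC', 'NCSA'] * 2,
        'Value': [16146, 8450, 2650, 11702, 7413, 2497],
    }
)

# Long form filters with an ordinary boolean mask.
psc = df[df['Service Provider ID'] == 848]

# If a wide layout is wanted, pivot: one row per date, one column per label.
wide = df.pivot(index='Date', columns='Service Provider Label', values='Value')
```

Going from long to wide is one call, which is part of why returning long form by default costs callers little.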

Motivation and Context

Long form data is easier to process and plot.

Also, get_data() currently sometimes returns a Series instead of a DataFrame, which is awkward to work with because callers often have to convert the Series to a DataFrame first.
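
To make the conversion step concrete, here is a minimal sketch using a hypothetical Series. The exact shape of the Series that get_data() returns today is not shown in this PR, so the index name and values below are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical Series standing in for a Series result; the index name
# 'Date' and the Series name 'Value' are assumed, not taken from the library.
series = pd.Series(
    [16146, 11702],
    index=pd.Index(pd.to_datetime(['2024-01-01', '2024-01-02']), name='Date'),
    name='Value',
)

# The extra step callers need today: turn the Series into a DataFrame,
# moving the index into a regular column.
frame = series.reset_index()
```

With a data frame returned in every case, this branch in calling code would no longer be needed.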

The dimension IDs are needed because labels are not necessarily unique (e.g., two people can have the same name).
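
A small sketch of the problem, using made-up data: the 'Person ID' and 'Person Label' column names follow the "<dimension> ID" / "<dimension> Label" pattern from the example above but are hypothetical, as are the IDs and values.

```python
import pandas as pd

# Two different people who happen to share a name (hypothetical data).
df = pd.DataFrame(
    {
        'Person ID': [101, 202],
        'Person Label': ['Smith, Alex', 'Smith, Alex'],
        'Value': [5, 7],
    }
)

# Grouping by label silently merges the two people into one row.
by_label = df.groupby('Person Label')['Value'].sum()

# Grouping by ID keeps them apart.
by_id = df.groupby('Person ID')['Value'].sum()
```

Here by_label has a single entry with the combined value, while by_id keeps one entry per person.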

This PR will also serve as the basis for allowing multiple metrics to be requested and returned in the same data frame (which will be a separate PR) and for including additional data about the dimension in the data frame (#35).

Tests performed

Types of changes

  • Refactoring / documentation update (non-breaking change which does not change functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Release preparation

Checklist:

  • CHANGELOG.md has been updated
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request
  • Running the automated tests (see docs/developing.md) produces no errors
  • Updates have been made to the xdmod-notebooks repository as necessary, and the notebooks all run successfully

aaronweeden added the enhancement label on Oct 30, 2024
aaronweeden added this to the 2.0.0 milestone on Oct 30, 2024