Token refresh does not work #627

Closed
dbalabka opened this issue Jun 27, 2024 · 3 comments

@dbalabka

The problem is related to #32. Unfortunately, it is not obvious how to implement token refresh when gcsfs is used indirectly through other libraries (dask -> pyarrow -> fsspec -> gcsfs). It would be great to have a more robust token refresh.

Here is the exception that we get when the job runs longer than an hour (gcsfs version 2023.12.2.post1):
HttpError: Invalid Credentials, 401

File /opt/conda/lib/python3.10/site-packages/dask_expr/_expr.py:3727, in _execute_task()

File /opt/conda/lib/python3.10/site-packages/dask/dataframe/io/parquet/core.py:97, in __call__()

File /opt/conda/lib/python3.10/site-packages/dask/dataframe/io/parquet/core.py:645, in read_parquet_part()

File /opt/conda/lib/python3.10/site-packages/dask/dataframe/io/parquet/core.py:646, in <listcomp>()

File /opt/conda/lib/python3.10/site-packages/dask/dataframe/io/parquet/arrow.py:641, in read_partition()

File /opt/conda/lib/python3.10/site-packages/dask/dataframe/io/parquet/arrow.py:1774, in _read_table()

File /opt/conda/lib/python3.10/site-packages/dask/dataframe/io/parquet/arrow.py:264, in _read_table_from_path()

File /opt/conda/lib/python3.10/site-packages/pyarrow/parquet/core.py:341, in __init__()

File ~/.../.venv/lib/python3.10/site-packages/pyarrow/_parquet.pyx:1250, in pyarrow._parquet.ParquetReader.open()
   1248 
   1249         with nogil:
-> 1250             check_status(builder.Open(self.rd_handle, properties, c_metadata))
   1251 
   1252         # Set up metadata

File ~/.../.venv/lib/python3.10/site-packages/pyarrow/types.pxi:88, in pyarrow.lib._datatype_to_pep3118()
     86 """
     87 try:
---> 88     char = _pep3118_type_map[type.id()]
     89 except KeyError:
     90     return None

File /opt/conda/lib/python3.10/site-packages/fsspec/spec.py:1844, in read()

File /opt/conda/lib/python3.10/site-packages/fsspec/caching.py:69, in _fetch()

File /opt/conda/lib/python3.10/site-packages/gcsfs/core.py:1850, in _fetch_range()

File /opt/conda/lib/python3.10/site-packages/fsspec/asyn.py:118, in wrapper()

File /opt/conda/lib/python3.10/site-packages/fsspec/asyn.py:103, in sync()

File /opt/conda/lib/python3.10/site-packages/fsspec/asyn.py:56, in _runner()

File /opt/conda/lib/python3.10/site-packages/gcsfs/core.py:1027, in _cat_file()

File /opt/conda/lib/python3.10/site-packages/gcsfs/core.py:437, in _call()

File /opt/conda/lib/python3.10/site-packages/decorator.py:221, in fun()

File /opt/conda/lib/python3.10/site-packages/gcsfs/retry.py:158, in retry_request()

File /opt/conda/lib/python3.10/site-packages/gcsfs/retry.py:123, in retry_request()

File /opt/conda/lib/python3.10/site-packages/gcsfs/core.py:430, in _request()

File /opt/conda/lib/python3.10/site-packages/gcsfs/retry.py:112, in validate_response()

HttpError: Invalid Credentials, 401

dbalabka commented Jun 27, 2024

It seems there was an attempt to fix the problem in #486, but it was unsuccessful.

@dbalabka

To refresh the token, the following sequence of methods is called on each request:

  1. GCSFileSystem._request()
  2. GCSFileSystem._get_headers()
  3. GCSFileSystem.credentials.apply()
  4. GCSFileSystem.credentials.maybe_refresh()
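
The call sequence above can be sketched with simplified stand-ins. This is not the gcsfs source: FakeClock, FakeCredentials, CredentialsWrapper and get_headers are hypothetical names, and the sketch assumes that a bare token string carries no expiry information and therefore always looks valid locally, which would explain why the stale token keeps being sent.

```python
class FakeClock:
    """Manually advanced clock so the sketch runs without waiting an hour."""

    def __init__(self):
        self.now = 0.0


class FakeCredentials:
    """Hypothetical stand-in for a google-auth credentials object."""

    def __init__(self, clock, token, expires_at=None, refreshable=True):
        self.clock = clock
        self.token = token
        self.expires_at = expires_at  # None: expiry unknown (assumed for a bare token string)
        self.refreshable = refreshable
        self.refresh_calls = 0

    @property
    def valid(self):
        # Assumption: with no known expiry the token always looks valid locally.
        return self.expires_at is None or self.clock.now < self.expires_at

    def refresh(self):
        if not self.refreshable:
            raise RuntimeError("no refresh_token available")
        self.refresh_calls += 1
        self.token = f"token-{self.refresh_calls}"
        self.expires_at = self.clock.now + 3600

    def apply(self, headers):
        headers["Authorization"] = f"Bearer {self.token}"


class CredentialsWrapper:
    """Simplified model of the object reached via GCSFileSystem.credentials."""

    def __init__(self, credentials):
        self.credentials = credentials  # None is treated as anonymous access here

    def maybe_refresh(self):
        if self.credentials is None:
            return  # anon: nothing to refresh
        if self.credentials.valid:
            return  # still looks good locally
        self.credentials.refresh()

    def apply(self, headers):
        self.maybe_refresh()
        if self.credentials is not None:
            self.credentials.apply(headers)


def get_headers(wrapper):
    """Models steps 2 to 4: build request headers, refreshing first if needed."""
    headers = {}
    wrapper.apply(headers)
    return headers


clock = FakeClock()
refreshable = CredentialsWrapper(FakeCredentials(clock, "token-0", expires_at=3600))
bare = CredentialsWrapper(FakeCredentials(clock, "token-0", refreshable=False))
anon = CredentialsWrapper(None)

clock.now = 2 * 3600  # the job has been running for two hours
print(get_headers(refreshable))  # refreshed before the header is built
print(get_headers(bare))         # stale token is sent unchanged
print(get_headers(anon))         # no Authorization header at all
```

In this model only the credentials with a known expiry and a refresh path get a new token; the bare token never triggers a refresh, so the server would be the first to notice that it has expired.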

My suspicion is that GoogleCredentials.credentials == None does not necessarily mean that we are dealing with anon auth. Hence, we may have to keep an explicit token type identifier.

dbalabka commented Jul 2, 2024

It happens only if you provide a token string without a refresh_token, in which case the failure is expected behavior:

import google.auth
import pandas as pd
from google.auth.transport.requests import Request


def get_storage_options():
    # It works, but I don't know if it's the best way to do it
    #
    # Also, it seems the scope needs to be specified, although it worked before.
    # Source: https://stackoverflow.com/questions/60401040/getting-invalid-scope-when-attempting-to-obtain-a-refresh-token-via-the-google-a
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    if not credentials.valid:
        credentials.refresh(Request())
    # Only the access token string is passed on (no refresh_token),
    # so it cannot be refreshed once it expires.
    return {
        "token": credentials.token,
    }


pd.read_parquet(..., storage_options=get_storage_options())

Relying on any other auth method fixes the problem.
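
As a minimal sketch of one such alternative: instead of extracting a token string, the storage options can ask gcsfs to resolve credentials itself. The "google_default" value is one of the token options listed in the gcsfs documentation; the bucket path in the usage comment is hypothetical, and whether this fits a given deployment depends on how credentials are configured there.

```python
def get_storage_options():
    # Ask gcsfs to resolve Application Default Credentials itself, so that it
    # should hold a refreshable credentials object rather than a bare token string.
    return {"token": "google_default"}


storage_options = get_storage_options()
print(storage_options)

# Usage (hypothetical path; needs pandas, gcsfs and configured credentials):
# pd.read_parquet("gs://my-bucket/data.parquet", storage_options=storage_options)
```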

dbalabka closed this as completed Jul 2, 2024