refactor: inject url path parts instead of endpoints #315

Merged
merged 33 commits into from
Oct 31, 2024
Changes from 32 commits
Commits
0533f19
feat: add jobs
tdstein Oct 18, 2024
6b79912
--wip-- [skip ci]
tdstein Oct 22, 2024
279fcd6
refactor: introduce the active pattern
tdstein Oct 23, 2024
e349870
add link to parent
tdstein Oct 23, 2024
533839b
skip when Quarto unavailable
tdstein Oct 23, 2024
1066ca3
adds unit tests
tdstein Oct 23, 2024
437c515
adds docstrings
tdstein Oct 23, 2024
a1ca377
Update src/posit/connect/resources.py
tdstein Oct 24, 2024
82b9b7e
applies feedback discussed in pull requests
tdstein Oct 24, 2024
6b8126d
refactor: inject url path parts instead of endpoints
tdstein Oct 25, 2024
b64f3e7
update docstrings
tdstein Oct 28, 2024
f57340d
renames init arguments to path and pathinfo
tdstein Oct 28, 2024
72b62ac
minor cleanup
tdstein Oct 28, 2024
f1d6f42
refactors _data property into _get_or_fetch method
tdstein Oct 29, 2024
dd74d60
fix method signature
tdstein Oct 29, 2024
fb52c83
fix cache check
tdstein Oct 29, 2024
bbbd6b4
Update src/posit/connect/resources.py
tdstein Oct 29, 2024
bc2cfcb
feat: add jobs
tdstein Oct 18, 2024
9c3d6dd
--wip-- [skip ci]
tdstein Oct 22, 2024
9019386
refactor: introduce the active pattern
tdstein Oct 23, 2024
4bfe3f8
add link to parent
tdstein Oct 23, 2024
a721b61
skip when Quarto unavailable
tdstein Oct 23, 2024
107ee85
adds unit tests
tdstein Oct 23, 2024
a070f0a
adds docstrings
tdstein Oct 23, 2024
d196271
Update src/posit/connect/resources.py
tdstein Oct 24, 2024
d87cfe7
applies feedback discussed in pull requests
tdstein Oct 24, 2024
2add280
Merge remote-tracking branch 'origin/tdstein/jobs' into tdstein/jobs-…
tdstein Oct 29, 2024
97d24f6
Merge remote-tracking branch 'origin/main' into tdstein/jobs-endpoint…
tdstein Oct 30, 2024
7eeb054
refactor: wrap cache interactions (#318)
tdstein Oct 30, 2024
03accf8
removes pathinfo options when not needed
tdstein Oct 30, 2024
77f8a38
remove unecessary arg
tdstein Oct 30, 2024
ac488c7
additional path cleanup
tdstein Oct 30, 2024
b3ff1cd
remove self imposed complexity of the method
tdstein Oct 31, 2024
4 changes: 3 additions & 1 deletion src/posit/connect/content.py
@@ -37,7 +37,9 @@ class ContentItemOwner(Resource):
 class ContentItem(JobsMixin, VanityMixin, Resource):
     def __init__(self, /, params: ResourceParameters, **kwargs):
         ctx = Context(params.session, params.url)
-        super().__init__(ctx, **kwargs)
+        uid = kwargs["guid"]
+        path = f"v1/content/{uid}"
+        super().__init__(ctx, path, **kwargs)

     def __getitem__(self, key: Any) -> Any:
         v = super().__getitem__(key)
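The hunk above has `ContentItem` build its own URL path from its `guid` and pass it up to the base class, instead of each class computing an `_endpoint` property. A minimal sketch of that idea follows. `FakeContext`, `ActiveSketch`, and `ContentItemSketch` are hypothetical stand-ins written for illustration, not the SDK's real classes, and the base URL is made up.

```python
from dataclasses import dataclass


@dataclass
class FakeContext:
    """Hypothetical stand-in for the SDK's Context (the HTTP session is omitted)."""

    url: str


class ActiveSketch(dict):
    """Simplified stand-in for Active: a dict that remembers its URL path."""

    def __init__(self, ctx: FakeContext, path: str, /, **attributes):
        super().__init__(**attributes)
        self._ctx = ctx
        self._path = path


class ContentItemSketch(ActiveSketch):
    def __init__(self, ctx: FakeContext, /, **attributes):
        # The resource derives its path from its guid and injects it upward.
        path = f"v1/content/{attributes['guid']}"
        super().__init__(ctx, path, **attributes)


# Assumed base URL, for illustration only.
ctx = FakeContext(url="https://connect.example.com/__api__/")
item = ContentItemSketch(ctx, guid="abc123", name="report")

# The full endpoint is the context URL plus the injected path.
endpoint = item._ctx.url + item._path
```

Because the path is plain data rather than a property that reads `self._parent`, child collections can be built from it without holding a reference to the parent object.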
66 changes: 36 additions & 30 deletions src/posit/connect/jobs.py
@@ -1,7 +1,9 @@
-from typing import Literal, Optional, TypedDict, overload
+import posixpath
+from typing import Any, Literal, Optional, TypedDict, overload

 from typing_extensions import NotRequired, Required, Unpack

+from .context import Context
 from .resources import Active, ActiveFinderMethods, ActiveSequence, Resource

 JobTag = Literal[
@@ -99,13 +101,8 @@ class _Job(TypedDict):
         tag: Required[JobTag]
         """A tag categorizing the job type. Options are build_jupyter, build_report, build_site, configure_report, git, packrat_restore, python_restore, render_shiny, run_api, run_app, run_bokeh_app, run_dash_app, run_fastapi_app, run_pyshiny_app, run_python_api, run_streamlit, run_tensorflow, run_voila_app, testing, unknown, val_py_ext_pkg, val_r_ext_pkg, and val_r_install."""

-    def __init__(self, ctx, parent: Active, **kwargs: Unpack[_Job]):
-        super().__init__(ctx, parent, **kwargs)
-        self._parent = parent
-
-    @property
-    def _endpoint(self) -> str:
-        return self._ctx.url + f"v1/content/{self._parent['guid']}/jobs/{self['key']}"
+    def __init__(self, ctx: Context, path: str, /, **attributes: Unpack[_Job]):
+        super().__init__(ctx, path, **attributes)

     def destroy(self) -> None:
         """Destroy the job.
@@ -120,40 +117,36 @@ def destroy(self) -> None:
         ----
         This action requires administrator, owner, or collaborator privileges.
         """
-        self._ctx.session.delete(self._endpoint)
+        endpoint = self._ctx.url + self._path
+        self._ctx.session.delete(endpoint)


-class Jobs(
-    ActiveFinderMethods[Job],
-    ActiveSequence[Job],
-):
-    def __init__(self, ctx, parent: Active, uid="key"):
+class Jobs(ActiveFinderMethods[Job], ActiveSequence[Job]):
+    def __init__(self, ctx: Context, path: str):
         """A collection of jobs.

         Parameters
         ----------
         ctx : Context
-            The context containing the HTTP session used to interact with the API.
-        parent : Active
-            Parent resource for maintaining hierarchical relationships
-        uid : str, optional
-            The default field name used to uniquely identify records, by default "key"
+            The context object containing the session and URL for API interactions
+        path : str
+            The HTTP path component for the jobs endpoint (e.g., 'v1/content/544509fc-e4f0-41de-acb4-1fe3a2c1d797/jobs')
         """
-        super().__init__(ctx, parent, uid)
-        self._parent = parent
+        super().__init__(ctx, path, "key")

-    @property
-    def _endpoint(self) -> str:
-        return self._ctx.url + f"v1/content/{self._parent['guid']}/jobs"
+    def _create_instance(self, path: str, /, **attributes: Any) -> Job:
+        """Creates a Job instance.

-    def _create_instance(self, **kwargs) -> Job:
-        """Creates a `Job` instance.
+        Parameters
+        ----------
+        path : str
+            The HTTP path component for the Job resource endpoint (e.g., 'v1/content/544509fc-e4f0-41de-acb4-1fe3a2c1d797/jobs/7add0bc0-0d89-4397-ab51-90ad4bc3f5c9')
+
         Returns
         -------
         Job
         """
-        return Job(self._ctx, self._parent, **kwargs)
+        return Job(self._ctx, path, **attributes)

     class _FindByRequest(TypedDict, total=False):
         # Identifiers
@@ -287,6 +280,19 @@ def find_by(self, **conditions) -> Optional[Job]:
 class JobsMixin(Active, Resource):
     """Mixin class to add a jobs attribute to a resource."""

-    def __init__(self, ctx, **kwargs):
-        super().__init__(ctx, **kwargs)
-        self.jobs = Jobs(ctx, self)
+    def __init__(self, ctx, path, /, **attributes):
+        """Mixin class which adds a `jobs` attribute to the Active Resource.
+
+        Parameters
+        ----------
+        ctx : Context
+            The context object containing the session and URL for API interactions
+        path : str
+            The HTTP path component for the resource endpoint
+        **attributes : dict
+            Resource attributes passed
+        """
+        super().__init__(ctx, path, **attributes)
+
+        path = posixpath.join(path, "jobs")
+        self.jobs = Jobs(ctx, path)
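The mixin above builds the nested jobs path with `posixpath.join`. The short sketch below shows how that composition behaves, using the example identifiers from the docstrings in this diff. The variable names are illustrative only.

```python
import posixpath

content_path = "v1/content/544509fc-e4f0-41de-acb4-1fe3a2c1d797"

# Each level appends one path part, as JobsMixin and _to_instance do.
jobs_path = posixpath.join(content_path, "jobs")
job_path = posixpath.join(jobs_path, "7add0bc0-0d89-4397-ab51-90ad4bc3f5c9")

# posixpath.join inserts exactly one "/" between parts, even when the
# left-hand part already ends with a slash.
with_trailing_slash = posixpath.join(content_path + "/", "jobs")

# Plain string concatenation adds no separator at all.
concatenated = jobs_path + "7add0bc0-0d89-4397-ab51-90ad4bc3f5c9"
```

`posixpath` always uses forward slashes regardless of the host operating system, which is why it suits URL paths better than `os.path`. Concatenation, by contrast, only produces a valid path when the left-hand part already ends in a slash.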
174 changes: 83 additions & 91 deletions src/posit/connect/resources.py
@@ -1,3 +1,4 @@
+import posixpath
 import warnings
 from abc import ABC, abstractmethod
 from dataclasses import dataclass
@@ -50,84 +51,118 @@ def __init__(self, params: ResourceParameters) -> None:


 class Active(ABC, Resource):
-    def __init__(self, ctx: Context, parent: Optional["Active"] = None, **kwargs):
-        """A base class representing an active resource.
+    def __init__(self, ctx: Context, path: str, /, **attributes):
+        """A dict abstraction for any HTTP endpoint that returns a singular resource.

         Extends the `Resource` class and provides additional functionality via the session context and an optional parent resource.

         Parameters
         ----------
         ctx : Context
             The context object containing the session and URL for API interactions.
-        parent : Optional[Active], optional
-            An optional parent resource that establishes a hierarchical relationship, by default None.
-        **kwargs : dict
-            Additional keyword arguments passed to the parent `Resource` class.
+        path : str
+            The HTTP path component for the resource endpoint
+        **attributes : dict
+            Resource attributes passed
         """
         params = ResourceParameters(ctx.session, ctx.url)
-        super().__init__(params, **kwargs)
+        super().__init__(params, **attributes)
         self._ctx = ctx
-        self._parent = parent
+        self._path = path


 T = TypeVar("T", bound="Active")
 """A type variable that is bound to the `Active` class"""


 class ActiveSequence(ABC, Generic[T], Sequence[T]):
-    def __init__(self, ctx: Context, parent: Optional[Active] = None):
-        """A sequence abstraction for any HTTP GET endpoint that returns a collection.
+    """A sequence for any HTTP GET endpoint that returns a collection."""
+
+    _cache: Optional[List[T]]

-        It lazily fetches data on demand, caches the results, and allows for standard sequence operations like indexing and slicing.
+    def __init__(self, ctx: Context, path: str, uid: str = "guid"):
+        """A sequence abstraction for any HTTP GET endpoint that returns a collection.

         Parameters
         ----------
         ctx : Context
-            The context object that holds the HTTP session used for sending the GET request.
-        parent : Optional[Active], optional
-            An optional parent resource to establish a nested relationship, by default None.
+            The context object containing the session and URL for API interactions.
+        path : str
+            The HTTP path component for the collection endpoint
+        uid : str, optional
+            The field name that uniquely identifies an instance of T, by default "guid"
         """
         super().__init__()
         self._ctx = ctx
-        self._parent = parent
-        self._cache: Optional[List[T]] = None
+        self._path = path
+        self._uid = uid
+        self._cache = None

-    @property
     @abstractmethod
-    def _endpoint(self) -> str:
+    def _create_instance(self, path: str, /, **kwargs: Any) -> T:
+        """Create an instance of 'T'."""
+        raise NotImplementedError()
+
+    def cached(self) -> bool:
+        """Returns True if the collection is cached.
+
+        Returns
+        -------
+        bool
+
+        See Also
+        --------
+        reload
         """
-        Abstract property to define the endpoint URL for the GET request.
+        return self._cache is not None

-        Subclasses must implement this property to return the API endpoint URL that will
-        be queried to fetch the data.
+    def reload(self) -> Self:
+        """Reloads the collection from Connect.

         Returns
         -------
-        str
-            The API endpoint URL.
+        Self
         """
-        raise NotImplementedError()
+        self._cache = None
+        return self

+    def _fetch(self) -> List[T]:
+        """Fetch the collection.
+
+        Fetches the collection directly from Connect. This operation does not affect the cache state.
+
+        Returns
+        -------
+        List[T]
+        """
+        endpoint = self._ctx.url + self._path
+        response = self._ctx.session.get(endpoint)
+        results = response.json()
+        return [self._to_instance(result) for result in results]
+
+    def _to_instance(self, result: dict) -> T:
+        """Converts a result into an instance of T."""
+        uid = result[self._uid]
+        path = posixpath.join(self._path, uid)
+        return self._create_instance(path, **result)

     @property
     def _data(self) -> List[T]:
-        """
-        Fetch and cache the data from the API.
+        """Get the collection.

-        This method sends a GET request to the `_endpoint` and parses the response as a list of JSON objects.
-        Each JSON object is used to instantiate an item of type `T` using the class specified by `_cls`.
-        The results are cached after the first request and reused for subsequent access unless reloaded.
+        Fetches the collection from Connect and caches the result. Subsequent invocations return the cached results unless the cache is explicitly reset.

         Returns
         -------
         List[T]
-            A list of items of type `T` representing the fetched data.
-        """
-        if self._cache:
-            return self._cache

-        response = self._ctx.session.get(self._endpoint)
-        results = response.json()
-        self._cache = [self._create_instance(**result) for result in results]
+        See Also
+        --------
+        cached
+        reload
+        """
+        if self._cache is None:
+            self._cache = self._fetch()
         return self._cache

     @overload
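The `_data` change above swaps a truthiness check (`if self._cache:`) for an explicit `is None` check. One likely motivation is that an empty list is falsy, so a collection that legitimately has no items would never count as cached. The sketch below illustrates that difference with a hypothetical `LazyList` class that counts simulated fetches. It is not SDK code.

```python
from typing import List, Optional


class LazyList:
    """Hypothetical lazy collection that counts how often it 'fetches'."""

    def __init__(self, remote: List[str], truthy_check: bool):
        self._remote = remote
        self._truthy_check = truthy_check
        self._cache: Optional[List[str]] = None
        self.fetches = 0

    def _fetch(self) -> List[str]:
        # Stands in for the HTTP GET against the collection endpoint.
        self.fetches += 1
        return list(self._remote)

    @property
    def data(self) -> List[str]:
        if self._truthy_check:
            # Old style: an empty list is falsy, so it never counts as cached.
            if self._cache:
                return self._cache
            self._cache = self._fetch()
            return self._cache
        # New style: only None means "not fetched yet".
        if self._cache is None:
            self._cache = self._fetch()
        return self._cache


old_style = LazyList([], truthy_check=True)
new_style = LazyList([], truthy_check=False)
for _ in range(3):
    old_style.data
    new_style.data
```

With an empty remote collection, the truthiness variant fetches on every access while the `is None` variant fetches once.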
@@ -148,52 +183,18 @@ def __str__(self) -> str:
     def __repr__(self) -> str:
         return repr(self._data)

-    @abstractmethod
-    def _create_instance(self, **kwargs) -> T:
-        """Create an instance of 'T'.
-
-        Returns
-        -------
-        T
-        """
-        raise NotImplementedError()
+
+class ActiveFinderMethods(ActiveSequence[T], ABC):
+    """Finder methods.

-    def reload(self) -> Self:
-        """
-        Clear the cache and reload the data from the API on the next access.
-
-        Returns
-        -------
-        ActiveSequence
-            The current instance with cleared cache, ready to reload data on next access.
-        """
-        self._cache = None
-        return self
-
-
-class ActiveFinderMethods(ActiveSequence[T], ABC, Generic[T]):
-    def __init__(self, ctx: Context, parent: Optional[Active] = None, uid: str = "guid"):
-        """Finder methods.
-
-        Provides various finder methods for locating records in any endpoint supporting HTTP GET requests.
-
-        Parameters
-        ----------
-        ctx : Context
-            The context containing the HTTP session used to interact with the API.
-        parent : Optional[Active], optional
-            Optional parent resource for maintaining hierarchical relationships, by default None
-        uid : str, optional
-            The default field name used to uniquely identify records, by default "guid"
-        """
-        super().__init__(ctx, parent)
-        self._uid = uid
+    Provides various finder methods for locating records in any endpoint supporting HTTP GET requests.
+    """

     def find(self, uid) -> T:
         """
         Find a record by its unique identifier.

-        Fetches a record either by searching the cache or by making a GET request to the endpoint.
+        If the cache is already populated, it is checked first for a matching record. If not, a conventional GET request is made to the Connect server.

         Parameters
         ----------
@@ -203,26 +204,17 @@ def find(self, uid) -> T:
         Returns
         -------
         T
-
-        Raises
-        ------
-        ValueError
-            If no record is found.
         """
-        # todo - add some more comments about this
-        if self._cache:
+        if self.cached():
             conditions = {self._uid: uid}
             result = self.find_by(**conditions)
-        else:
-            endpoint = self._endpoint + uid
-            response = self._ctx.session.get(endpoint)
-            result = response.json()
-            result = self._create_instance(**result)
-
-        if not result:
-            raise ValueError(f"Failed to find instance where {self._uid} is '{uid}'")
+            if result:
+                return result

-        return result
+        endpoint = self._ctx.url + self._path + uid
+        response = self._ctx.session.get(endpoint)
+        result = response.json()
+        return self._to_instance(result)
Comment on lines +195 to +198
Collaborator Author


Note that I removed the call to invalidate the cache that existed a few commits before. After some additional consideration, I concluded that invalidating the cache is an unwanted side effect.

I think there is still an argument for invalidating the cache or appending the instance to the cached list. But, I don't think we have a good enough understanding of the side effects to proceed with either implementation.

Collaborator


Why not just call .find_by() all the time?

If the cache exists, ._data returns quickly. If not, it asks the server.

Then both methods have the same quirks. (Whereas currently find() will not alter the cache but find_by() will, so a follow-up call to find() uses the cached values and behaves differently.)
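The asymmetry described in this comment can be reproduced with a small in-memory model. The sketch below is hypothetical: `SequenceSketch` stands in for `ActiveSequence` plus `ActiveFinderMethods`, a dict plays the server, and it assumes `find_by` scans `_data`, as the comment implies.

```python
from typing import Dict, List, Optional


class SequenceSketch:
    """Hypothetical in-memory stand-in for the sequence and finder classes."""

    def __init__(self, server: Dict[str, dict]):
        self._server = server  # pretend remote state, keyed by uid
        self._cache: Optional[List[dict]] = None
        self.requests: List[str] = []  # log of simulated HTTP GETs

    def cached(self) -> bool:
        return self._cache is not None

    @property
    def _data(self) -> List[dict]:
        if self._cache is None:
            self.requests.append("GET collection")
            self._cache = list(self._server.values())
        return self._cache

    def find_by(self, **conditions) -> Optional[dict]:
        # Scans the lazily fetched collection, so it populates the cache.
        return next(
            (r for r in self._data if all(r.get(k) == v for k, v in conditions.items())),
            None,
        )

    def find(self, uid: str) -> dict:
        # Mirrors the diff: consult the cache only if it already exists.
        if self.cached():
            result = self.find_by(key=uid)
            if result:
                return result
        self.requests.append(f"GET item {uid}")
        return self._server[uid]


jobs = SequenceSketch({"a": {"key": "a"}, "b": {"key": "b"}})
jobs.find("a")  # cache empty: single-item GET, cache left untouched
cached_after_find = jobs.cached()
jobs.find_by(key="b")  # fetches the whole collection and caches it
jobs.find("a")  # now answered from the cache, no new request
```

The request log ends up with one item GET and one collection GET, so the same `find("a")` call takes a different route depending on whether `find_by` ran first.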

Collaborator Author


I think in either situation, we end up with conflicting ideas.

If we always depend on find_by, we must fetch the entire collection before returning a single value, which would take significantly longer than a single GET request for the value.

Today, I think the obvious solution would be to always call the HTTP GET method to get the value from the server. This will sometimes be slightly slower than an in-memory list scan. But in reality, it's going to be a negligible difference. The weird edge case with this solution is when another process creates the value fetched by GET after the _cache is set. In this situation, the value returned by find will exist on the server but not in the _cache. This would probably warrant a cache invalidation. But that would take extra time to compute and may not be consistent behavior across all endpoints.

tl;dr - the speedup via find_by probably isn't worth the trouble. Classic over-engineering.
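The edge case raised in this comment, where a record is created on the server after the collection was cached, can be sketched with plain dictionaries. Everything here is hypothetical illustration: `server` stands in for Connect, and the two helper functions are not SDK methods.

```python
from typing import Dict, List, Optional

# Pretend remote state, keyed by uid.
server: Dict[str, dict] = {"a": {"key": "a"}}

# The collection is fetched once and cached.
cache: List[dict] = list(server.values())

# Another process creates a record after the cache was populated.
server["b"] = {"key": "b"}


def find_always_get(uid: str) -> dict:
    """Always ask the 'server'; never consult the cache."""
    return server[uid]


def find_in_cache(uid: str) -> Optional[dict]:
    """Scan only the cached collection."""
    return next((r for r in cache if r["key"] == uid), None)


fresh = find_always_get("b")
stale = find_in_cache("b")
```

The direct lookup sees the new record while the cached scan does not, which is the inconsistency that would otherwise call for cache invalidation.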

Collaborator


The whole function body could be something like...

        conditions = {self._uid: uid}
        result = self.find_by(**conditions)
        if result is not None:
            return result
        raise ValueError(f'Object `{{"{self._uid}": "{uid}"}}` could not be found')

(untested)


     def find_by(self, **conditions: Any) -> Optional[T]:
         """