improve dora implementation #241
base: master
Conversation
Thank you @Finnegan1 for your contribution. Before I can start building your PR, a member of the organization must set the required label(s) {'reviewed/ok-to-test'}. Once started, you can check the build status in the PR checks section below.
dora.py
Outdated
    ]
))
):
    print("can't process")
same here
- please elaborate on what's wrong
In this case it checks whether the repository is a GitHub repo. This happens because currently we can only access GitHub repos. If it is not a GitHub repo, we just skip it :/
@Finnegan1 : skipping unsupported resources is fine. With my (quite brief) comment, I wanted to suggest to make the logged message more expressive, and self-describing. Consider you look through the log and just see "can't process". This will make it very difficult to understand (without knowing the context / emitting code) what happened / whether or not it is a problem.
A better log-message might read: skipping {component-name+version}/{artefact-name}: has unsupported {type=} (we only support GitHub).
In addition, you should preferably use the logging package (I think in this specific case, debug might be an adequate severity/loglevel).
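A minimal sketch of such a log call (the function and variable names component_name, component_version, artefact_name, and access_type are illustrative assumptions, not taken from the PR):

```python
import logging

logger = logging.getLogger(__name__)

def process_artefact(component_name, component_version, artefact_name, access_type):
    # skip artefacts whose repository is not hosted on GitHub
    if access_type != 'github':
        logger.debug(
            f'skipping {component_name}:{component_version}/{artefact_name}: '
            f'unsupported {access_type=} (only GitHub repositories are supported)'
        )
        return
    # ... regular processing ...
```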
dora.py
Outdated
))
entry_date += datetime.timedelta(days=30)
if not old_repo_url == new_repo_url:
    print("repo urls are not equal")
same here
I deleted the print :)
This is another edge case. If the repository of the component changed between two updates, it cannot compare them with the help of the hashes. So for now it just skips this update.
Should I log this?
Thank you for the explanation.
I think logging is useful here, probably even as warning because a component repository change is quite uncommon.
@Finnegan1 : it would be very helpful to document such edge-cases, ideally in a (brief) comment close to the code doing the handling.
However, I would like to remark that a change of repo-url does not necessarily mean the repository itself changed. The repository might simply have been moved (we, for example, recently moved this repository to its current location; although admittedly, in this specific case, we also changed the repository's commit-history).
OTOH, just because the repository-URL did not change does not necessarily mean all commits stored in component-descriptors are actually still present (consider a forceful history-rewrite, which might be done upon publishing an unintended commit, e.g. one containing credentials, or a large file that was added by accident).
So instead of this early-exit, I would rather suggest changing the code such that the case where not both commits are found in the remote repository is handled, as sketched below.
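A rough sketch of that direction; the helper names (commit_exists, process_update) and the lookup on repo are hypothetical assumptions, not the PR's actual code:

```python
import logging

logger = logging.getLogger(__name__)

def commit_exists(repo, commit_sha: str) -> bool:
    '''whether the given commit is (still) reachable in the remote repository'''
    try:
        repo.get_commit(commit_sha)  # hypothetical lookup against the remote repository
        return True
    except Exception:
        return False

def process_update(repo, old_commit: str, new_commit: str):
    # a changed repo-url alone is no reason to skip; only skip (with a warning)
    # if one of the commits is genuinely not found in the remote repository
    if not (commit_exists(repo, old_commit) and commit_exists(repo, new_commit)):
        logger.warning(
            f'skipping update {old_commit} -> {new_commit}: '
            'not both commits were found in the remote repository'
        )
        return None
    # ... regular lead-time calculation ...
```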
components.py
Outdated
@@ -76,6 +76,16 @@ async def check_if_component_exists(
    return False


def ensure_utc(ts: datetime.datetime) -> datetime.datetime:
    if ts.tzinfo is None:
Is there a reason why explicitly checking for None here is required? If not, I'd suggest to simply check whether ts.tzinfo is falsy or not.
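For illustration, the falsy check could look like the following; the function body is a guess based on the reviewer's later description of ensure_utc, not the PR's actual code:

```python
import datetime

def ensure_utc(ts: datetime.datetime) -> datetime.datetime:
    if not ts.tzinfo:  # falsy check instead of explicit `ts.tzinfo is None`
        return ts.replace(tzinfo=datetime.timezone.utc)
    return ts.astimezone(datetime.timezone.utc)
```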
    leadTime: datetime.timedelta
    url: str

    def to_dict(self):
Instead of writing all these to_dict functions in this module, you can also just utilise util.dict_serialisation, which will already recursively serialise your dataclasses into a dict. Note, however, that this will require an additional condition for handling datetime.timedelta objects. As an alternative, you might also use the built-in dataclasses.asdict functionality and pass in a custom dict_factory to specify serialisation of datetime.datetime and datetime.timedelta objects.
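A short sketch of the dataclasses.asdict variant with a custom dict_factory; the Deployment class here only mirrors the two fields visible in the diff (leadTime, url) and is not the PR's full dataclass:

```python
import dataclasses
import datetime

def _dict_factory(key_value_pairs):
    # serialise datetime/timedelta values while each (nested) dict is built
    def convert(value):
        if isinstance(value, datetime.datetime):
            return value.isoformat()
        if isinstance(value, datetime.timedelta):
            return value.total_seconds()
        return value
    return {key: convert(value) for key, value in key_value_pairs}

@dataclasses.dataclass(frozen=True)
class Deployment:
    leadTime: datetime.timedelta
    url: str

deployment = Deployment(
    leadTime=datetime.timedelta(days=2, hours=3),
    url='https://github.com/example/repo',
)
print(dataclasses.asdict(deployment, dict_factory=_dict_factory))
# {'leadTime': 183600.0, 'url': 'https://github.com/example/repo'}
```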
Should I just leave the ones which already exist as they are, or should I migrate them to util.dict_serialisation?
@@ -0,0 +1,140 @@
import collections
Let us rather merge this module into the existing dora.py.
I will leave it separated for now, so that you can see the changes better. I will move it to dora.py as the last step 👍
components.py
Outdated
@@ -76,6 +76,16 @@ async def check_if_component_exists(
    return False


def ensure_utc(ts: datetime.datetime) -> datetime.datetime:
If I am not mistaken, this function will:
- return the timestamp unchanged if it is tz-aware + tz equals utc
- return a cp of ts w/ tz converted to utc if tz-aware + tz != utc
- return a cp of the passed-in ts w/ tz set to utc if ts is not tz-aware
Hence, I think the name is misleading + I am uncertain about why conversion (of tz-aware TSs) to utc is needed in the first place.
-> I imagine it should be sufficient for this function to leave tz-aware TSs as they are + set tz for tz-unaware TSs. If this is not the case, could you elaborate on why?
Assuming conversion to utc for tz-aware TSs is not necessary, a better name might be as_tz_aware, or make_tz_aware, or similar. Also, it might be considered to expose target-tz as a parameter (it is okay to prefill it w/ utc to make it more convenient for callers).
It should be noted, though, that "blindly" assuming utc as timezone will yield an error of up to 23h (or is it 24? 🤔). So the preferred approach IMO might be to actually go through the effort and ensure that each location in our code where we create timestamps does so in a tz-aware manner.
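A minimal sketch of what the suggested rename with an exposed target-tz parameter could look like (an assumption, not the PR's final code):

```python
import datetime

def as_tz_aware(
    ts: datetime.datetime,
    target_tz: datetime.tzinfo = datetime.timezone.utc,
) -> datetime.datetime:
    if not ts.tzinfo:
        # assume target_tz for naive timestamps (may be off by the real offset!)
        return ts.replace(tzinfo=target_tz)
    return ts  # leave tz-aware timestamps untouched
```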
- naming change is a good idea
- good idea to make the "wanted" time zone a parameter
For the reason why this function exists in the first place:
=> This function is only used when handling external data => timestamps from GitHub commits and creation dates of component descriptors.
=> To ensure that there are no false results when calculating with the "internal" dates, which are (at least, they should be) all UTC.
Timestamps from component-descriptors should actually be tz-aware (I think this is even forced by the ocm-spec); there might be historical component-descriptors w/ non-tz-aware timestamps, though (do you happen to remember whether you actually encountered non-tz-aware TSs from component-descriptors? If not, it might be cleaner to rm this codepath for TSs from component-descriptors).
As for the timestamps contained in git-commits, I think those should also be tz-aware (again: do you have concrete examples of commits in git-repositories which are not tz-aware?) 🤔
As for conversion to UTC: If I am not mistaken, python's datetime-API will make no mistakes when calculating timeranges and the like using tz-aware timestamps.
Soooo in summary:
- please check whether we actually have non-tz-aware TSs
- conversion of tz-aware TSs is only necessary to make timestamps "human-friendly" (for displaying/logging), but it is not needed as a preliminary step for date-calculations
- -> it might be possible to drop this function - unless there are non-tz-aware TSs, of course
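As a small check of the claim about date-calculations (plain standard-library behaviour, not project code): tz-aware timestamps can be subtracted directly, regardless of their timezone:

```python
import datetime

utc = datetime.timezone.utc
cet = datetime.timezone(datetime.timedelta(hours=1))

deployed = datetime.datetime(2024, 1, 2, 12, 0, tzinfo=cet)   # i.e. 11:00 UTC
committed = datetime.datetime(2024, 1, 1, 11, 0, tzinfo=utc)

print(deployed - committed)  # 1 day, 0:00:00 -- no prior conversion to UTC needed
```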
dora.py
Outdated
    ]
))
):
    print("can't process")
- please elaborate on what's wrong
dora.py
Outdated

@middleware.auth.noauth
maybe to give some context: we recently switched all routes (except the one for doing auth, for obvious reasons :-)) to requiring authentication
@Finnegan1 : I added some additional remarks to ongoing discussions. However, those were not attached to the (changed) code - please browse through the discussions in the "Conversation" tab.
@@ -73,38 +74,65 @@ def to_dict(self):
    'deployents': [deployment.to_dict() for deployment in self.deployments],
}

@dataclasses.dataclass(frozen=True) |
considering attrs are dicts (which are mutable), the dataclass should not be declared frozen (immutable)
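To illustrate the remark (a standard-library behaviour demo, not project code): frozen only prevents rebinding the attribute, not mutating the dict it points to:

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class DeploymentFrequencies:
    deploymentsPerMonth: dict[str, int]

freq = DeploymentFrequencies(deploymentsPerMonth={'2024-01': 3})
# freq.deploymentsPerMonth = {}          # would raise dataclasses.FrozenInstanceError
freq.deploymentsPerMonth['2024-02'] = 5  # allowed: the dict itself is still mutable
print(freq.deploymentsPerMonth)          # {'2024-01': 3, '2024-02': 5}
```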
@@ -73,38 +74,65 @@ def to_dict(self):
    'deployents': [deployment.to_dict() for deployment in self.deployments],
}

@dataclasses.dataclass(frozen=True)
class DeploymentFrequencies:
    deploymentsPerMonth: dict[typing.Any, int]
shouldn't the key's typehint be str, rather than Any?
    'medianLeadTimePerDay': median_lead_time_per_day,
}
return LeadTimes(
    medianLeadTimePerMonth=dict(median_lead_time_per_month),
aren't those values already dicts?


def as_timezone(ts: datetime.datetime, target_tz: datetime.timezone = datetime.timezone.utc) -> datetime.datetime:
    if ts.tzinfo is None:
as discussed in separate thread: is tz-conversion actually needed?
Special notes for your reviewer:
The returned JSON object is not compatible with the previous version.
So only accept this together with the corresponding delivery dashboard PR.