Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve dora implementation #241

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Finnegan1
Copy link

Special notes for your reviewer:
The return json object is not compitable with the version before.
So only accept together with the according delivery dashboard PR.

@Finnegan1 Finnegan1 requested a review from a team as a code owner November 11, 2024 00:25
@ocm-ci-robot-0
Copy link
Collaborator

Thank you @Finnegan1 for your contribution. Before I can start building your PR, a member of the organization must set the required label(s) {'reviewed/ok-to-test'}. Once started, you can check the build status in the PR checks section below.

@zkdev zkdev added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 11, 2024
@ocm-ci-robot-0 ocm-ci-robot-0 added needs/ok-to-test and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Nov 11, 2024
components.py Outdated Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Outdated
]
))
):
print("can't process")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • please elaborate on what's wrong

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it checks if the repository is a github repo. This happens because currently we can only access Github repos. If it is not an Github repo, we just skip it :/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Finnegan1 : skipping unsupported resources is fine. With my (quite brief) comment, I wanted to suggest to make the logged message more expressive, and self-describing. Consider you look through the log and just see "can't process". This will make it very difficult to understand (without knowing the context / emitting code) what happened / whether or not it is a problem.

A better log-message might read: skipping {component-name+version}/{artefact-name}: has unsupported {type=} (we only support gitHub.

In addition, you should preferably use logging package (I think in this specific case, debug might be an adequate severity/loglevel)

dora.py Outdated
))
entry_date += datetime.timedelta(days=30)
if not old_repo_url == new_repo_url:
print("repo urls are not equal")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted the print :)

This is another edge case. If the repository of the component chaned between two updates, It can not compare it with the healp of the hashes. So for now it just skips this update.

Should I log this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation.
I think logging is useful here, probably even as warning because a component repository change is quite uncommon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Finnegan1 : it would be very helpful do document such edge-cases, ideally in a (brief) comment close to the code doing the handling.

However, I would like to remark that a change or repo-url does not necessarily mean the repository itself changed. The repository might simply have been moved (we, for example, recently moved this repository to its current location; although admittedly, in this specific case, we also changed the repository's commit-history)

OTOH, just because repository-URL did not change does not necessarily mean all commits stored in component-descriptors are actually still in (consider a forceful history-rewrite, that might be done upon publishing an unintended commit (e.g. containing credentials, or a large file that was added by accident)).

So instead of this early-exit, I would rather suggest to change the code such that the case not both commits are found in remote-repository should be handled

app.py Outdated Show resolved Hide resolved
components.py Outdated
@@ -76,6 +76,16 @@ async def check_if_component_exists(
return False


def ensure_utc(ts: datetime.datetime) -> datetime.datetime:
if ts.tzinfo is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why explicitly checking for None here is required? If not, I'd suggest to simply check if ts.tzinfo is falsy or not.

components.py Outdated Show resolved Hide resolved
leadTime: datetime.timedelta
url: str

def to_dict(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of writing all these to_dict functions in this module, you can also just utilise util.dict_serialisation which will already recursively serialise your dataclasses into a dict. Note however, that this will require an additional condition for handling datetime.timedelta objects. As an alternative, you might also use built-in dataclasses.asdict functionality and pass-in a a custom dict_factory to specify serialisation of datetime.datetime and datetime.timedelta objects.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just let the ones which do already exist as they are or should I migrate them to util.dict_serialisation?

dora_result_calcs.py Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
@@ -0,0 +1,140 @@
import collections
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us rather merge this module into existing dora.py.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will let it seperated for now, so that you can see the changes better. I will move it to dora.py as the last step 👍

components.py Outdated
@@ -76,6 +76,16 @@ async def check_if_component_exists(
return False


def ensure_utc(ts: datetime.datetime) -> datetime.datetime:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken, this function will:

  • return timestamp unchanged if it is tz-aware + tz equals utc
  • return cp of ts w/ tz converted to utc it tz-aware + tz != utc
  • return a cp of passed-in ts w/ tz to utc if ts is not tz-aware

Hence, I think the name is misleading + I am uncertain about why conversion (of tz-aware TSs) to utc is needed in the first place.

-> I imagine it should be sufficient for this function leave tz-aware TSs as they are + set tz for tz-unaware TSs. If this is not the case, could you elaborate on why?

Assuming conversion to utc for tz-aware TSs is not necessary, a better name might be as_tz_aware, or make_tz_aware, or similar. Also, it might be considered to expose target-tz as parameter (it is okay to prefill it w/ utc to make it more convenient for callers).

It should be noted, though, that "blindly" assuming utc as timezone will yield an error of up to 23h (or is it 24? 🤔 ). So the preferred approach IMO might be to actually go through the efforts and ensure each location in our code, where we create timestamps does so in a tz-aware manner.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. naming change is a good idea
  2. good idea to make the "wanted" time zone as a parameter.

For the reason why this function exists in the first place:
=> This function is only used when handeling external data => timestamps from Github commits and creation dates of Component descriptors.
=> To ensure, that there are no false results when calculating with the "internal" dates which are (at least should it be so) all UTC

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timestamps from component-descriptors should actually be tz-aware (I think this is even forced by the ocm-spec); there might be historical component-descriptors w/ non-tz-aware timestamps though (do you happen to remember whether you actually encountered non-tz-aware TSs from component-descriptors? If not, it might be cleaner to rm this codepath for TSs from component-descriptors

As for the timestamps contained in git-commits, I think those should also be tz-aware (again: do you have concrete examples of commits in git-repositories which are not tz-aware?) 🤔

As for conversion to UTC: If I am not mistaken, python's datetime-API will make no mistakes when calculating timeranges and the like using tz-aware timestamps.

Soooo in summary:

  • please check whether we actually have non-tz-aware TSs
  • conversion of tz-aware TSs is only necessary to make timestamps "human-friendly" (for displaying/logging), but it is not needed as preliminary step for date-calculations
  • -> it might be possible to drop this function - unless there are non-tz-aware TSs, of course

dora.py Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Outdated
]
))
):
print("can't process")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • please elaborate on what's wrong

dora.py Outdated

@middleware.auth.noauth
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe to give some context: we recently switched all routes (except the one for doing auth, for obvious reasons :-)) to requiring authentication

dora.py Outdated Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
dora.py Outdated Show resolved Hide resolved
@ccwienk
Copy link
Collaborator

ccwienk commented Nov 14, 2024

  • please address linter + test errors - ty

@ccwienk
Copy link
Collaborator

ccwienk commented Dec 2, 2024

@Finnegan1 : I added some additional remarks to ongoing discussions. However, those were not attached to the (changed) code - please browse through discussions in "Conversation" Tab

@@ -73,38 +74,65 @@ def to_dict(self):
'deployents': [deployment.to_dict() for deployment in self.deployments],
}

@dataclasses.dataclass(frozen=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering attrs are dicts (which are mutable), the dataclass should not be declared frozen (immutable)

@@ -73,38 +74,65 @@ def to_dict(self):
'deployents': [deployment.to_dict() for deployment in self.deployments],
}

@dataclasses.dataclass(frozen=True)
class DeploymentFrequencies:
deploymentsPerMonth: dict[typing.Any, int]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't key's typehint be str, rather than Any?

'medianLeadTimePerDay': median_lead_time_per_day,
}
return LeadTimes(
medianLeadTimePerMonth=dict(median_lead_time_per_month),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't those values already dicts?



def as_timezone(ts: datetime.datetime, target_tz: datetime.timezone = datetime.timezone.utc) -> datetime.datetime:
if ts.tzinfo is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed in separate thread: is tz-conversion actually needed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants