forked from spotify/luigi
Latest luigi #13
Open: thisiscab wants to merge 27 commits into real-latest-luigi from latest-luigi
I think it would be neat if luigi.cfg could interpolate some environment variables. I think this is the minimum required amount of work to make this happen but I'm looking for guidance.
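A minimal sketch of the idea, assuming the config is parsed with the standard library's `ConfigParser` and that environment variables are exposed as `DEFAULT` values so that `%(NAME)s` resolves to them. The `parse_config` helper, the `[datadog]` section, and the `DATADOG_API_KEY` variable are hypothetical and only illustrate the approach; this is not necessarily how the change in `luigi/configuration.py` is written.

```python
import os
from configparser import ConfigParser


def parse_config(text, env=None):
    """Parse INI text, exposing environment variables for %(NAME)s interpolation."""
    env = os.environ if env is None else env
    parser = ConfigParser()
    # Keep option names case-sensitive so %(DATADOG_API_KEY)s matches the
    # variable name exactly and names differing only by case do not collide.
    parser.optionxform = str
    # Escape literal percent signs so values such as "100%" are not treated
    # as interpolation syntax.
    defaults = {name: value.replace("%", "%%") for name, value in env.items()}
    parser.read_dict({"DEFAULT": defaults})
    parser.read_string(text)
    return parser


example_cfg = """
[datadog]
api_key = %(DATADOG_API_KEY)s
"""

config = parse_config(
    example_cfg,
    env={"DATADOG_API_KEY": "abc123", "DISCOUNT": "100%"},
)
print(config.get("datadog", "api_key"))
```

One side effect of this approach is that every environment variable also shows up as an option in every section, which may or may not be acceptable.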
We're currently integrating our solutions with DataDog, and we wanted the ability to send metrics to that service from our pipeline. Doing so will allow us to monitor the status of the pipeline through statistics based on the metrics it sends. At the moment only one event is supported, but as the feature progresses it's easy to see that we could support many more. My first implementation was fairly basic, but after browsing the existing PRs against the official Luigi repo I discovered an ongoing implementation of exactly what we were trying to achieve, for a similar service. Thanks to chrispalmer, I've been able to reuse his original work to implement ours. spotify#2044
Add more meaningful names and values to the event data.
This adds an event of type "error" in DataDog telling us that a task has failed.
When a task is disabled, it means that it has failed multiple times within a certain window. That type of event is interesting to know about, so sending it to DataDog will allow us to permanently log that information.
This will let us keep track of how many tasks have been started, failed, or disabled so that we can graph them and alert if things go wrong.
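The counting described above can be sketched as a small collector that the scheduler calls on each state change. The class name, method names, and metric names below are hypothetical, and an in-memory `Counter` stands in for the DataDog client.

```python
from collections import Counter


class CountingCollector:
    """Hypothetical collector: counts task state changes in memory."""

    def __init__(self, namespace="luigi"):
        self.namespace = namespace
        self.counts = Counter()

    def _increment(self, state):
        # A real collector would forward this to a metrics backend instead.
        self.counts["{}.task.{}".format(self.namespace, state)] += 1

    def handle_task_started(self, task_id):
        self._increment("started")

    def handle_task_failed(self, task_id):
        self._increment("failed")

    def handle_task_disabled(self, task_id):
        self._increment("disabled")


collector = CountingCollector()
collector.handle_task_started("A")
collector.handle_task_started("B")
collector.handle_task_failed("B")
print(dict(collector.counts))
```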
Some users may want to namespace their metrics differently. We're allowing that to be configured in the Luigi configuration file.
Task parameters will now be displayed as tags in DataDog.
Fix an issue where an already completed task would trigger another event. We hook in at a point where, if the task was completed in a previous run, our DataDog tracking event would still be called. When that happens, we don't want to log an event again because we've already logged it.
We were calling the wrong method for the disabled-task event. This caused problems because the method signature is different, which crashed the task.
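A minimal sketch of that guard, assuming the scheduler keeps a status per task. `TaskRecord`, `mark_done`, and the status strings are hypothetical names used only for illustration.

```python
DONE = "DONE"


class TaskRecord:
    """Hypothetical stand-in for the scheduler's view of a task."""

    def __init__(self, task_id, status="PENDING"):
        self.task_id = task_id
        self.status = status


def mark_done(task, emit_done_event):
    """Set the task to DONE and emit the event only on a real transition."""
    if task.status == DONE:
        # Already completed earlier: the event was logged then, so skip it.
        return False
    task.status = DONE
    emit_done_event(task.task_id)
    return True


emitted = []
task = TaskRecord("MyTask_2019_01_28")
mark_done(task, emitted.append)
mark_done(task, emitted.append)  # second call is a no-op
print(emitted)
```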
Double empty lines
The name of the task shouldn't have a dimension in it; the dimension should be put into the tags section.
We can now improve the implementation quite a bit by being aware that metrics can also have tags. [ch8202] [ch8215]
This removes code duplication and reads much better.
Before adding this feature, we would have the metric namespaced with different values. For instance, we would have `luigi_production` for production metrics and `luigi_staging` for staging metrics. This caused a whole lot of problems. This new feature will allow us to set up monitoring on multiple environments in parallel, effectively reducing the number of metrics we have to manage.
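To make the difference concrete, here is a small sketch comparing the two schemes. Both helper functions and the `environment` tag key are assumptions made for the example; only the `luigi_production` and `luigi_staging` prefixes come from the description above.

```python
def namespaced_metric(environment, name):
    """Old approach: the environment is baked into the metric name."""
    return "luigi_{}.{}".format(environment, name), []


def tagged_metric(environment, name, namespace="luigi"):
    """New approach: one metric name, with the environment carried as a tag."""
    return "{}.{}".format(namespace, name), ["environment:{}".format(environment)]


environments = ("production", "staging")
old = [namespaced_metric(env, "task.failed") for env in environments]
new = [tagged_metric(env, "task.failed") for env in environments]

# Count the distinct metric names each scheme produces.
old_names = {name for name, _ in old}
new_names = {name for name, _ in new}
print(len(old_names), len(new_names))
```

With tags, a single monitor on one metric name can be filtered or grouped by environment instead of being duplicated per environment.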
This is not necessary as we have a default value if set to nothing.
DataDog requires the use of `:` instead of `=`.
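As an illustration, task parameters could be turned into `key:value` tags like this. `params_to_tags` is a hypothetical helper, not the function used in `luigi/contrib/datadog.py`.

```python
def params_to_tags(params):
    """Format a mapping of task parameters as `key:value` tags."""
    # Sorting keeps the tag order stable between runs.
    return ["{}:{}".format(name, value) for name, value in sorted(params.items())]


tags = params_to_tags({"date": "2019-01-28", "environment": "staging"})
print(tags)
```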
Instead of using the None value, we use an empty string. The check that we do before using the value will also return False if we detect an empty string.
Unfortunately, the datadog Python library doesn't follow its own convention of being configurable from a `datadog.conf` file. We have to manually set those values as configuration in Luigi and set the relevant properties when initializing DataDog's specialized `statsd` class.
Previously, we had named that variable `default_event_tags`, but it was confusing because we were sending those tags for both events and metrics. Renaming that variable and adding a sane default value will clarify its goal. [ch9820]
In our current implementation of this pipeline, we are already sending the `environment` parameter in all of our tasks. In our DD contrib, we log all the parameters that are passed to a task as tags, so the `environment` tag is already set in our case. If we were to use `env`, we would have both tags sent to DD for every metric and event, duplicating the tag that we truly want. DD is clever enough that if a tag already exists it won't duplicate it, so by default we want to log `environment`: if it doesn't exist, it will be created; otherwise the one that we pass to the task itself will be used. [ch9704]
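A short sketch of why an empty default works with the existing check, assuming the check is a plain truthiness test. `environment_tags` is a hypothetical helper.

```python
def environment_tags(environment=""):
    """Return the environment tag list, or nothing when no environment is set."""
    # Both "" and None are falsy, so one truthiness check covers the default
    # empty string as well as a missing value.
    if not environment:
        return []
    return ["environment:{}".format(environment)]


print(environment_tags(""), environment_tags("staging"))
```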
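A sketch of how those settings might be collected before being handed to the client. `DatadogSettings` is a hypothetical container standing in for a Luigi config class, and the defaults shown (`localhost`, port `8125`) are the usual DogStatsD defaults rather than values taken from this PR.

```python
from dataclasses import asdict, dataclass


@dataclass
class DatadogSettings:
    """Hypothetical holder for values read from the Luigi configuration."""
    api_key: str = ""
    app_key: str = ""
    statsd_host: str = "localhost"
    statsd_port: int = 8125  # kept as an int rather than a string


options = asdict(DatadogSettings(api_key="abc", statsd_host="10.0.0.5"))

# With the datadog package installed, the options would be passed along as:
#     from datadog import initialize
#     initialize(**options)
print(options["statsd_host"], options["statsd_port"])
```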
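The intended outcome can be sketched as a merge in which a task's own `environment` parameter takes precedence over the default tag. The description above relies on DataDog's handling of repeated tags; the client-side merge below is only an illustration of the resulting tag set, and `build_tags` is a hypothetical helper.

```python
def build_tags(task_params, default_tags):
    """Combine default tags with task parameters; task parameters win on clashes."""
    merged = dict(default_tags)
    merged.update(task_params)
    return ["{}:{}".format(key, value) for key, value in merged.items()]


defaults = {"environment": "production"}
with_param = build_tags({"environment": "staging", "date": "2019-01-28"}, defaults)
without_param = build_tags({"date": "2019-01-28"}, defaults)
print(with_param)
print(without_param)
```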
This would yield an annoying warning telling us that we're passing an INT when it was expecting a STRING. This wasn't affecting anything, but today is the day that I'm removing this warning.
Hiya - still on this :)
On Mon, Jan 28, 2019 at 2:39 PM Charles-André Bouffard <***@***.***> wrote:
testing
Commit Summary
- Load os.environ into the config singleton
- Add the ability to add metrics based on certain scheduler events
- Add additional values to the task started event
- Add an event to DataDog when a Luigi task fails
- Add an event to DataDog when a task is flagged as disabled
- Add simple metrics to count tasks
- Add metrics and events when a task gets completed
- Add execution time of a task + bug fixes
- Add namespaced metrics
- Add missing RPC method for datadog
- Add support for params
- Fix duplicative event logging for done tasks
- Set the scheduler to be optional for metrics logging
- Fix wrongly named event method name
- Fix minor linter issues
- Change metric name for execution time
- Refactor the datadog metric class
- Refactor the namespace of the metric
- Fix invalid code path
- Add support for default environment flag
- Remove unecessary IF clause for the environment ENV variable
- Fix the environment tag for DataDog events
- Luigi throws warning if param is None
- Add the ability to configure statsd information in Luigi
- Change the DD tag from env to environment
- Change the behavior of the default tags
- Change statsd_port parameter type (#11)
File Changes
- *M* luigi/configuration.py <https://github.com/glossier/luigi/pull/13/files#diff-0> (2)
- *A* luigi/contrib/datadog.py <https://github.com/glossier/luigi/pull/13/files#diff-1> (117)
- *A* luigi/metrics.py <https://github.com/glossier/luigi/pull/13/files#diff-2> (19)
- *M* luigi/scheduler.py <https://github.com/glossier/luigi/pull/13/files#diff-3> (44)
Patch Links:
- https://github.com/glossier/luigi/pull/13.patch
- https://github.com/glossier/luigi/pull/13.diff
--
Kate Caputo
We're working on removing you! :) Sorry for the annoying pings in the meantime. You can always unsubscribe from these notifications!
Of course, no worries at all - just wanted to let you know in case there is
sensitive information shared :)
Hope you're doing well, Cab!