Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom rule config (+use gear_id) #961

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ambrussimon
Copy link
Contributor

Resolves #922
Resolves #959

Could someone with more JSONSchema experience reassure me that this looks OK?
https://github.com/scitran/core/compare/rule-config-and-validation?expand=1#diff-31c3baaff2a2d926bb13668c2bbd23bdR5

Review Checklist

  • Tests were added to cover all code changes
  • Documentation was added / updated
  • Code and tests follow standards in CONTRIBUTING.md

@ambrussimon ambrussimon requested review from kofalt and nagem October 20, 2017 13:55
@ambrussimon ambrussimon self-assigned this Oct 20, 2017
@codecov-io
Copy link

codecov-io commented Oct 20, 2017

Codecov Report

Merging #961 into master will increase coverage by 0.13%.
The diff coverage is 89.65%.

@@            Coverage Diff             @@
##           master     #961      +/-   ##
==========================================
+ Coverage   90.92%   91.06%   +0.13%     
==========================================
  Files          49       49              
  Lines        7055     7051       -4     
==========================================
+ Hits         6415     6421       +6     
+ Misses        640      630      -10

@kofalt
Copy link
Contributor

kofalt commented Oct 20, 2017

Go ahead & do a rebase; a large amount of gear-related activity just landed on master.

@ryansanford
Copy link
Contributor

@kofalt
When this lands, can we make guarantees to gear authors that config.json will always be present and contain values for all non-optional config options? I think that's the big win for gear authors with this PR.

Copy link
Contributor

@kofalt kofalt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brush-pass LGTM

@kofalt
Copy link
Contributor

kofalt commented Oct 24, 2017

@ryansanford That should generally be the case, yes.

except APINotFoundException:
self.abort(400, 'Cannot find gear for alg {}, alg not valid'.format(doc['alg']))

doc['project_id'] = cid
doc.setdefault('config', gear['gear']['config'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retcon: this isn't correct, see out-of-band conversation for more

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kofalt is referencing that the config doc on the gear stores the default value (if available) under a default key. I added this function in jobs.gears that, given a gear doc and a config map, will set missing keys if a default value exists.

@nagem
Copy link
Contributor

nagem commented Oct 25, 2017

LGTM, with change to use new function to set defaults for submitted config.

The PUT method for rules properly does a key-wise replace. We would want to make sure any config object sent in an update does a full replace of the existing config doc, so they can properly remove keys.

@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from b6e2319 to a9dfd46 Compare October 25, 2017 19:30
@ambrussimon
Copy link
Contributor Author

According to further discussions since, it seems that triggered jobs using the latest available gear for a given name is the present and desired future behavior, so I opted for not storing a gear default config on new rules. Ties in nicely with @nagem's addition, where defaults are applied as late and as unified as possible.

@kofalt Please sanity check manifest -> gear intended fix.

@nagem @kofalt Should I add a check upon gear version addition, which makes sure that any rules that have a custom config still validate against the new gear manifest?

if len(gear.get('manifest', {}).get('config', {})) > 0:
invocation = gear_tools.derive_invocation_schema(gear['manifest'])
if len(gear.get('gear', {}).get('config', {})) > 0:
invocation = gear_tools.derive_invocation_schema(gear['gear'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice catch

@@ -531,7 +539,7 @@ def post(self):
gear = get_gear(gear_id)
if gear.get('gear', {}).get('custom', {}).get('flywheel', {}).get('invalid', False):
self.abort(400, 'Gear marked as invalid, will not run!')
validate_gear_config(gear, config_)
validate_gear_config(gear, fill_gear_default_values(gear, config_))
Copy link
Contributor

@nagem nagem Oct 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config defaults will be filled in Queue.enqueue_job (called after batch/id/run). Setting it at the beginning is maybe slightly more efficient, but it will still iterate over all of the gear config keys for each batch job that is enqueued. You might have already been aware of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Deliberately not filling batch config with defaults, just using fill_gear_default_values for validation purposes. I think it's nice that the defaults only get persisted as late as possible, as you implemented it in enqueue_job. (This is a similar situation to rules as I see it.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right Megan, but I think he's doing the right thing here. We probably need fill_gear_default_values whenever validate_gear_config is called.

@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from a9dfd46 to b8028d3 Compare October 30, 2017 21:53
except APINotFoundException:
self.abort(400, 'Cannot find gear for alg {}, alg not valid'.format(doc['alg']))

doc['project_id'] = cid

if 'config' in doc:
validate_gear_config(gear, fill_gear_default_values(gear, doc['config']))
Copy link
Contributor

@kofalt kofalt Oct 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be called unconditionally, setting doc['config'] to {} and calling fill_gear_default_values on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, in what scenario can we run into a validation error if there is no custom config set for the new rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kofalt Do you think unconditional validation would be preferred here, too? https://github.com/scitran/core/pull/961/files#diff-8ae88fa922271b27b6066aae46fda7abR239

if 'alg' in updates or 'config' in updates:
gear = get_gear_by_name(updates.get('alg', doc['alg']))
config_ = fill_gear_default_values(gear, updates.get('config', doc.get('config', {})))
validate_gear_config(gear, config_)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think validate_gear_config will throw an exception; maybe catch that and throw a @nagem -friendly exception for status codes & such?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kofalt The current (already friendly) exc thrown by it is APIValidationException (422). Wondering if that's appropriate or maybe InputValidationException is more fitting?

@@ -531,7 +539,7 @@ def post(self):
gear = get_gear(gear_id)
if gear.get('gear', {}).get('custom', {}).get('flywheel', {}).get('invalid', False):
self.abort(400, 'Gear marked as invalid, will not run!')
validate_gear_config(gear, config_)
validate_gear_config(gear, fill_gear_default_values(gear, config_))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right Megan, but I think he's doing the right thing here. We probably need fill_gear_default_values whenever validate_gear_config is called.

@@ -199,6 +199,9 @@ def create_potential_jobs(db, container, container_type, file_):
gear = gears.get_gear_by_name(alg_name)
job = Job(str(gear['_id']), inputs, tags=['auto', alg_name])

if 'config' in rule:
job.config = rule['config']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the rule config have the defaults filled out, or will it need to be pushed through fill_gear_default_values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rule config only ever contains custom values, so this still relies on fill_gear_default_values happening later on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The defaults will get filled in Queue.enqueue_job(). Let's not use fill_defaults when creating a job that will get enqueued through that codepath.

"project_id": { "type": "string" },
"alg": { "type": "string" },
"name": { "type": "string" },
"config": { "type": "object" },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove these five definitions, and instead just use their { "type": "string" } / etc directly.

We probably do this a lot in other parts of the schema, but it's a bit overboard.

}
},
"additionalProperties": false
"allOf": [{"$ref": "../definitions/rule.json#/definitions/rule-input"}]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagem Confirm for me that this is correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

{
"$schema": "http://json-schema.org/draft-04/schema#",
"type": "array",
"items": { "$ref":"../definitions/rule.json#/definitions/rule-output" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagem Same here?

@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from 3d3b322 to e126ebf Compare October 31, 2017 23:56
Copy link
Contributor

@kofalt kofalt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diff is a little rough, but LGTM to the best of my ability

@@ -111,7 +111,7 @@ def validate_gear_config(gear, config_):
validator = Draft4Validator(ci)

try:
validator.validate(config_)
validator.validate(fill_gear_default_values(gear, config_))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defaults will be filled when the job is spawned, so I would think we wouldn't fill them here. What is the expected behavior if a gear is updated and the update has a new default that the rule didn't explicitly set? If we fill defaults here, it won't use the updated default. It is possible they wouldn't want to use an updated default without any user action.

If I missed a discussion on this, let me know. @kofalt @ambrussimon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it's a job rule, I would expect valid defaults to be used, rather than invalid ones from a different gear document. Really the core issue here is that fill_gear_default_values modifies the map, it's reasonable to call this in places that you won't want to change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I would either not call fill_gear_default_values in places where you don't want to modify the document (quick fix), or (better fix) change fill_gear_default_values to use copy.deepcopy, then check the various callees.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice catch, that was the original intent... I totally agree, fixing right away.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, but I was asking if we wanted defaults persisted to the database for rule config if the user didn't explicitly send them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagem I thought it would be best not to store the default config on the rule, and have new jobs simply use potentially updated/new defaults, filled at spawn time. I gathered that's the most useful/expected behavior for now ~ in line with how it works currently. Proper gear versioning is partially in conflict with the "auto-update" feature... At least that's what I thought during implementation. I guess auto-selecting the latest gear on the UI in a dropdown wouldn't be too complicated, either.

Anyhow, gear versioning or not, I think not storing the defaults is better. Assuming gears have sane defaults for every version, it just working out of the box after an update (at least if no custom config was set!) seems nice, and that's exactly what Thad wants to hold on to. If the user wants to hardcode the defaults, they are still free to, using the new rule config...

@nagem
Copy link
Contributor

nagem commented Nov 2, 2017

To be clear: we still haven't decided and acted on a solution to "what happens when someone uploads a new version of a gear whose config conflicts with an existing gear rule config", so this PR cannot be merged.

@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from 0381acf to 8bc87c5 Compare November 7, 2017 14:02
@nagem
Copy link
Contributor

nagem commented Nov 7, 2017

I think we've at least come to the conclusion that, when attempting to upload a gear, the user should be warned of project rules that would now be in conflict with with the gear's new config. I believe most clients use /gears/check before uploading a new gear, @kofalt would you say it's safe to place this warning logic as output of that endpoint? Do you have any opinions on the format of the warning? 200 but clients are expected to review the body of the response? Use a different status code?

@kofalt
Copy link
Contributor

kofalt commented Nov 8, 2017

/gears/check should definitely return 200 unless a gear upload with the submitted manifest would not succeed. Let's add a small body return:

{
    "disables_project_rules": [
        {
            "path": "scitran/testdata",
            "id": "aex"
        },
        {
            "path": "your-group/whatever",
            "id": "aex2"
        },
    ]
}

If the array is non-empty, frontends can print that as a warning before/after an upload.

@ryansanford
Copy link
Contributor

This is on hold until gear rules referencing gear id (version) is implemented

@kofalt kofalt added the Blocked label Nov 29, 2017
@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from 8bc87c5 to c163d43 Compare January 22, 2018 18:53
@ambrussimon
Copy link
Contributor Author

ambrussimon commented Jan 22, 2018

@nagem Bump. Freshly rebased on top of

  • support for disabled rules
  • rules using gear id instead of name

(Can PR separately if desired.)

I feel like I'm missing something... So is there any auto-disabling/auto-version-bump feature required on the API side, or is that now completely left for the UI/clients to sort out? (Now that rules use gear_id)

@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from c163d43 to 837e00e Compare January 22, 2018 19:20
@ambrussimon ambrussimon changed the title Add gear config to rules and improve rule input validation Custom rule config (+disabled rules, +use versioned gears via gear_id) Jan 23, 2018
@ambrussimon ambrussimon changed the title Custom rule config (+disabled rules, +use versioned gears via gear_id) Custom rule config (+disable, +use gear_id) Jan 23, 2018
@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from 837e00e to e8f6d18 Compare January 26, 2018 16:42
@ambrussimon ambrussimon changed the title Custom rule config (+disable, +use gear_id) Custom rule config (+use gear_id) Jan 26, 2018
@ambrussimon ambrussimon changed the base branch from master to disabled-rules January 26, 2018 16:43
@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from e8f6d18 to 71a8138 Compare January 29, 2018 15:37
@ambrussimon ambrussimon changed the base branch from disabled-rules to master January 29, 2018 15:37
@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from 71a8138 to da9e2b3 Compare February 28, 2018 17:02
@ambrussimon ambrussimon force-pushed the rule-config-and-validation branch from da9e2b3 to 56fa85e Compare March 20, 2018 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants