Audit policy changes #240

Closed
DilipSequeira opened this issue Apr 19, 2022 · 7 comments


@DilipSequeira
Contributor

Opening this for discussion of audit policy before we start crafting a PR.

My impression is that the proposal is to add to the rules the following:

  • if an accelerator has been selected for audit in the previous round, it will not be chosen for random audit in this round
  • a "No Audit" option will be added to the committee-selected audit, for cases where the committee does not see significant benefit to MLCommons in auditing any of the systems.

@psyhtest anything else we want to include here?

@psyhtest
Contributor

Thank you for starting the thread @DilipSequeira.

Things to discuss regarding the first point:

  • "Selected in the previous round" - should apply to both randomly selected and committee-selected?
  • A new submission can be made by a submitter different from the one audited in the previous round. OK to exclude from random audit?
  • A new submission can be made under a different category e.g. Edge in the previous round, Datacenter in the current round. Still OK to exclude?
  • A new submission can use a different software stack. Probably not OK to exclude?
  • Do we want to exclude for exactly one round or for more than one round?

@DilipSequeira
Contributor Author

Probably the important point here is that a submission not get hit with a random audit if it was audited in the previous round. I think we can be somewhat loose here, because abuse of this privilege will result in being hit with a committee-selected audit.

So I would suggest:

  • applies to both randomly selected and committee-selected
  • is independent of category
  • we use the "substantially determine" clause - if the CPU, accelerator and NIC are the same, and the submitter is the same, then the clause applies. Software stacks (or at least versions) will almost always change between rounds, I think; if the perf increase is surprising, you might get a committee-selected audit.
  • exactly one round, although I'd be interested in hearing arguments for longer.

@psyhtest
Contributor

  • I agree that the CPU and NIC can "substantially determine" performance, but in many cases it is the accelerator that matters.
  • I don't think that the submitter's name particularly matters, all else being (nearly) equal.
  • Software versions change for sure. I was thinking about, say, "native" SDKs and Triton.
  • No strong arguments for longer than one round.

@DilipSequeira
Contributor Author

We can consider something like "A submission is not a candidate for randomly chosen audit if the system is equivalent to a system audited in the previous round. For the purposes of this rule, the systems which are equivalent to a previously audited system are those with the same CPU, NIC, accelerator, and accelerator count. The review committee may determine additional systems to be equivalent, for example in cases where some of these components do not substantially determine performance."

If that doesn't work, can you suggest an amended version, or other new rules text?
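To make the proposed wording concrete, here is a minimal Python sketch of how the strict equivalence check could filter the pool for random audit. The `System` record, its field names, and the helper functions are hypothetical and only illustrate the draft text above; they are not part of any MLCommons tooling. Submitter name and category are deliberately left out of the key, following the discussion so far.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    # Hypothetical record; field names are illustrative, not from the rules.
    name: str
    cpu: str
    nic: str
    accelerator: str
    accelerator_count: int

    def equivalence_key(self):
        # The four components named in the draft rule text.
        return (self.cpu, self.nic, self.accelerator, self.accelerator_count)


def random_audit_candidates(current, audited_previous_round):
    """Return the current-round systems still eligible for random audit."""
    exempt_keys = {s.equivalence_key() for s in audited_previous_round}
    return [s for s in current if s.equivalence_key() not in exempt_keys]


def pick_random_audit(current, audited_previous_round, rng=random):
    """Pick one eligible system at random, or None if all are exempt."""
    candidates = random_audit_candidates(current, audited_previous_round)
    return rng.choice(candidates) if candidates else None
```

Under this sketch, a system that keeps the same CPU, NIC, and accelerator but changes its accelerator count is not automatically exempt; that case would go to the review committee.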

@psyhtest
Contributor

Should we put the burden onto submitters to propose to the review committee which systems are to be considered equivalent to the ones audited in the previous round and therefore to be excluded from randomly chosen audit?

@DilipSequeira
Contributor Author

That's the intent of the current PR - the equivalence criteria are quite strict, and if submitters whose systems don't meet those criteria want their systems to be considered equivalent to those in the previous round, they'll need to make that argument to the review committee - e.g. "we changed the accelerator count from 5 to 6 and got a 17% perf boost, so we think this should be considered equivalent."
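As a rough illustration of that two-step flow (strict match first, committee ruling second), a sketch in Python might look like the following. The function name, the tuple layout, and the `committee_approved` set are assumptions made for this example only.

```python
def is_exempt_from_random_audit(key, audited_keys, committee_approved=frozenset()):
    """Decide whether a current-round system skips the random audit draw.

    key: hypothetical (cpu, nic, accelerator, accelerator_count) tuple.
    audited_keys: keys of systems audited in the previous round.
    committee_approved: keys the review committee ruled equivalent
        after a submitter made the case for them.
    """
    # The strict criteria apply automatically; anything else needs a ruling.
    return key in audited_keys or key in committee_approved
```

With a previously audited 5-accelerator system, the 6-accelerator variant from the example above would be exempt only once the committee adds its key to `committee_approved`.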

@mrmhodak
Contributor

Closing, we have a new effort on the issue.


3 participants