
Make operand data type and rank validation table-driven #657

Merged (73 commits) on Nov 15, 2024

Conversation

@inexorabletash (Member) commented Apr 26, 2024

Improve readability of input operand data type and rank validation by introducing a table within each method definition that outlines the restrictions for positional arguments and options. Steps are updated to reference the table columns.
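The table-driven idea can be sketched as data plus one generic check. This is a hypothetical illustration of the approach, not spec text; the type names and constraint values below are made up:

```typescript
// Sketch (not spec text): each operand's allowed data types and rank live
// in a table, and the validation steps reference that table instead of
// repeating inline constraints. All names/values here are illustrative.

type DataType = "float32" | "float16" | "int32" | "uint8";

interface OperandConstraint {
  allowedDataTypes: DataType[] | "any";
  allowedRank: number | "any";
}

// Hypothetical table for a batchNormalization-like method.
const table: Record<string, OperandConstraint> = {
  input: { allowedDataTypes: ["float32", "float16"], allowedRank: "any" },
  mean: { allowedDataTypes: ["float32", "float16"], allowedRank: 1 },
};

// "If mean's dataType is not one of its allowed data types, or mean's
// rank is not its allowed rank, then throw a TypeError."
function checkOperand(name: string, dataType: DataType, rank: number): boolean {
  const c = table[name];
  if (c.allowedDataTypes !== "any" && !c.allowedDataTypes.includes(dataType)) {
    return false; // the spec step would throw a TypeError here
  }
  if (c.allowedRank !== "any" && rank !== c.allowedRank) {
    return false;
  }
  return true;
}
```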



@inexorabletash changed the title from "Data type table" to "Make operand data type and rank validation table-driven" on Apr 27, 2024
@zolkis (Collaborator) commented Apr 29, 2024

I'd prefer this style to the one in #646

@inexorabletash (Member, Author) commented Apr 30, 2024

A few questions to consider:

dataType

  • For most "secondary" operands, the type is constrained to be the same as the primary operand's. The relevant steps could be written as either:
    • "If mean’s dataType is not one of its allowed data types, then throw a TypeError." - more consistent.
    • "If mean’s dataType is not the same as input's dataType, then throw a TypeError." - clearer.

rank

  • In most cases there's only one allowed rank. So do we prefer:
    • "If mean’s rank is not its allowed rank, then throw a TypeError." - more consistent.
    • "If mean’s rank is not 1, then throw a TypeError." - clearer.
  • If we prefer the former (reference the table), what about Simplify, correct, and add validation for GRU/LSTM and friends #659 which proposes simplifying some validation e.g. "If mean’s shape is not equal to « input’s shape[options.axis] »..." — this implicitly validates the rank. If we had that, would we still want a step reading "If mean’s rank is not ..." ?
  • If any rank is allowed, do we still want to include the step, for consistency?

General

  • Should we add a "validate the dataType and rank of an operand" helper algorithm? The steps for each operand could then be collapsed to just: "If validating the dataType and rank of mean returns false, then throw a TypeError." - less text, but more is hidden behind the algorithm.
  • When there are multiple input operands, do we prefer separate steps or combined steps? At an extreme we could even write: "If validating the dataType and rank of any of mean, variance, options.scale (if it exists) or options.bias (if it exists) returns false, then throw a TypeError."
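The proposed helper and the combined-step variant could look roughly like this (a sketch only; the function names and the error message are hypothetical, not spec algorithms):

```typescript
// Sketch of the proposed "validate the dataType and rank of an operand"
// helper, plus a combined step that walks several operands, skipping
// optional ones. Names are hypothetical.

type Operand = { dataType: string; rank: number };

function validateDataTypeAndRank(
  operand: Operand,
  allowedDataTypes: string[],
  allowedRank: number | "any"
): boolean {
  if (!allowedDataTypes.includes(operand.dataType)) return false;
  if (allowedRank !== "any" && operand.rank !== allowedRank) return false;
  return true;
}

// "If validating the dataType and rank of any of mean, variance,
// options.scale (if it exists) or options.bias (if it exists) returns
// false, then throw a TypeError."
function validateAll(
  operands: (Operand | undefined)[],
  allowedDataTypes: string[],
  allowedRank: number | "any"
): void {
  for (const op of operands) {
    if (op === undefined) continue; // "(if it exists)"
    if (!validateDataTypeAndRank(op, allowedDataTypes, allowedRank)) {
      throw new TypeError("operand failed dataType/rank validation");
    }
  }
}
```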

@zolkis (Collaborator) commented May 2, 2024

IMHO the "consistent" option is clear enough in this case. :)

- gemm(): Fix ranks in table, align phrasing.
- gru(): Align phrasing.
- lstm(): Don't inline rank of 3, reference table.
- matmul(): Align phrasing.
- prelu(): Fix punctuation.
- triangular(): Add table, use for rank validation.
- where(): Align phrasing.
@inexorabletash (Member, Author):

Other notes:

  • For the op groups (element-wise binary/unary/logical, pooling, reduction) a table is not given and the inline steps are retained. Any better suggestions?
  • Do we keep a rank validation step if there's a subsequent shape validation step? i.e. keep both of these lines:
    1. If mean’s rank is not its allowed rank, then throw a TypeError.
    2. If mean’s shape is not equal to « input’s shape[options.axis] », then throw a TypeError.
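The reason step 2 can subsume step 1 is that comparing shapes as lists already distinguishes lists of different lengths, i.e. different ranks. A small sketch of that argument (illustrative names only):

```typescript
// Why a shape-equality step subsumes a rank step: comparing mean's shape
// to « input's shape[options.axis] » (a one-element list) already fails
// when the ranks differ, because shape lists of different lengths are
// never equal.

function shapesEqual(a: number[], b: number[]): boolean {
  if (a.length !== b.length) return false; // a rank mismatch fails here
  return a.every((dim, i) => dim === b[i]);
}

const inputShape = [2, 3, 4];
const axis = 1;
const expected = [inputShape[axis]]; // « input's shape[options.axis] »
```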

@zolkis (Collaborator) commented May 3, 2024

For the op groups (element-wise binary/unary/logical, pooling, reduction) a table is not given and the inline steps are retained.

Looking at the preview, I think that makes sense.

Do we keep a rank validation step if there's a subsequent shape validation step?

Shape should include rank... I'd leave rank out and let impl optionally handle that as a quick check, if it makes sense somewhere.

@inexorabletash (Member, Author):

Marking this as "ready for review" - it looks like it didn't get very stale, and opSupportLimits() doesn't seem to require integration as currently written (the actual behavior of that method isn't specified in detail). @fdwr & @huningxin - please take a look?

@huningxin (Contributor) left a comment

Thanks @inexorabletash !

It seems some operators are not covered, such as argMin/Max. Do you plan to do them in a separate CL?

@inexorabletash (Member, Author):

It seems some operators are not covered, such as argMin/Max. Do you plan to do them in a separate CL?

My approach was to very mechanically migrate explicit rank/shape validation steps from the prose steps that contain the constraints to the table format where the prose steps reference the table. That has these implications:

  • If an op didn't have either explicit data type or rank validation steps, the table wasn't added since nothing would reference it. So e.g. argMin/Max didn't get a table.
  • If an op's data type and/or rank was provided as parameters to the algorithm steps, the steps were not touched, and the table wasn't added for that op.
  • If there wasn't an explicit step validating the data type (e.g. gather), the data type in the table is given as: any data type/any
  • If there wasn't an explicit step validating the rank (e.g. gemm), or the step validated the rank against something other than an explicit number (e.g. layerNormalization), the rank in the table is given as: any rank/N
    • Note that steps that validate the shape (e.g. where it must be broadcastable to something) are considered to not validate the rank.

In other words, only introduce the table where it will be normatively referenced.

Obviously we can change this!

  • We can include the table for all ops, which would have "any" / "N" for the allowed data types/ranks
    • argMin/argMax, cast, clamp, concat, element-wise binary, element-wise logical, expand, pad, reshape, slice, split, transpose
    • 🤔 Do we also add steps to the algorithm that reference the table, even though it's a no-op?
  • We can add in tables for the cases where data type/rank are passed in to the algorithm
    • element-wise unary, averagePool2d, l2Pool2d, maxPool2d, reduceL1, reduceL2, reduceLogSum, reduceLogSumExp, reduceMax, reduceMean, reduceMin, reduceProduct, reduceSum, reduceSumSquare
    • 🤔 This implies we'd have a table per op (e.g. one for abs, one for ceil, one for cos, etc), and modify the algorithms to pass/take both data types and ranks. That's... a lot of tables.
  • We can expand the definition of "allowed ranks" beyond "any" or an explicit number to include:
    • a reference to other parameters, e.g. "same as input"
    • a range
    • 🤔 Do we introduce the implied duplicate steps? e.g. for layerNormalization, do we have steps that validate rank against the table (i.e. scale is in [0, input rank]) and a specific value (i.e. scale is equal to options.axes's size) ?
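The expanded "allowed ranks" definition from the last bullet could be modeled as a small union type, shown below as an illustration of the idea rather than proposed spec text (the `AllowedRank` shape and function are made up):

```typescript
// Hypothetical extension of "allowed ranks" beyond "any" or a number:
// a closed range, or a reference to another operand's rank.

type AllowedRank =
  | "any"
  | number
  | { min: number; max: number } // a range
  | { sameAs: string };          // e.g. "same as input"

function rankAllowed(
  rank: number,
  allowed: AllowedRank,
  otherRanks: Record<string, number> = {} // e.g. { input: 4 }
): boolean {
  if (allowed === "any") return true;
  if (typeof allowed === "number") return rank === allowed;
  if ("sameAs" in allowed) return rank === otherRanks[allowed.sameAs];
  return rank >= allowed.min && rank <= allowed.max;
}
```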

I am not sure whether it should point to the allowed data types table of batchNorm. It currently points to the allowed data types definition.

Yeah, I wasn't sure how much effort to put into explicitly linking things. Currently nothing actually links to the tables; it's implied that if a step mentions "allowed data types" or "allowed ranks" then the reader should find the nearby table and look up the appropriate value for the named input operand. It might be better to link those phrases in steps to the table cells.

@huningxin (Contributor):

@inexorabletash

In other words, only introduce the table where it will be normatively referenced.

Makes sense to me! Thanks for the explanation!

We can include the table for all ops, which would have "any" / "N" for the allowed data types/ranks

  • argMin/argMax, cast, clamp, concat, element-wise binary, element-wise logical, expand, pad, reshape, slice, split, transpose

logicalNot may need steps to validate input data type is "uint8"? (and logicalAnd, logicalOr, logicalXor in @fdwr 's wave3 proposal)

And wave3 will introduce "int4"/"uint4" data types; I assume "any" data types may not apply to some ops, so we may need "anyDataTypesAtLeast8Bits" (kAllDataTypesAtLeast8bits in the Chromium prototype)

Of course we can handle the wave3 ops / data types after its PR landed.
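A named set like the one mentioned above could be derived from bit widths rather than enumerated by hand. The widths and the set name follow the comment; everything else here is a hypothetical sketch:

```typescript
// Sketch: deriving "anyDataTypesAtLeast8Bits" from per-type bit widths
// once "int4"/"uint4" exist. Illustrative only, not spec or Chromium code.

const bitWidth: Record<string, number> = {
  int4: 4,
  uint4: 4,
  int8: 8,
  uint8: 8,
  float16: 16,
  int32: 32,
  float32: 32,
};

const anyDataTypesAtLeast8Bits = Object.keys(bitWidth).filter(
  (t) => bitWidth[t] >= 8
);
```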

  • 🤔 Do we also add steps to the algorithm that reference the table, even though it's a no-op?

I think it's unnecessary to add no-op steps.

We can add in tables for the cases where data type/rank are passed in to the algorithm

  • element-wise unary, averagePool2d, l2Pool2d, maxPool2d, reduceL1, reduceL2, reduceLogSum, reduceLogSumExp, reduceMax, reduceMean, reduceMin, reduceProduct, reduceSum, reduceSumSquare
  • 🤔 This implies we'd have a table per op (e.g. one for abs, one for ceil, one for cos, etc), and modify the algorithms to pass/take both data types and ranks. That's... a lot of tables.

Yes, it's a lot. Can we group some tables? Like a table for floating-point element-wise unary ops, including ceil, floor, cos, sin, erf, exp, log, reciprocal, tan, etc. However, it may require linking the "allowed data types" to a particular table, which is another discussion.

We can expand the definition of "allowed ranks" beyond "any" or an explicit number to include:

  • a reference to other parameters, e.g. "same as input"
  • a range
  • 🤔 Do we introduce the implied duplicate steps? e.g. for layerNormalization, do we have steps that validate rank against the table (i.e. scale is in [0, input rank]) and a specific value (i.e. scale is equal to options.axes's size) ?

Duplicating steps is not the intention. I just feel the user should not expect validation failure if the supplied tensor is allowed according to "allowed ranks". We have "same as input" for "allowed data types". Could we have something like "maximum of input rank" for "allowed ranks"?

@inexorabletash (Member, Author):

It seems some operators are not covered, such as argMin/Max.

f61292d adds tables for all ops (or op categories)

@inexorabletash (Member, Author):

  • 🤔 This implies we'd have a table per op (e.g. one for abs, one for ceil, one for cos, etc), and modify the algorithms to pass/take both data types and ranks. That's... a lot of tables.

Yes, it's a lot. Can we group some tables? Like a table for floating-point element-wise unary ops, including ceil, floor, cos, sin, erf, exp, log, reciprocal, tan, etc. However, it may require linking the "allowed data types" to a particular table, which is another discussion.

For now, in f61292d I left it at one table per "op category" and introduced the phrase "specified as part of operation steps" in the table.

@inexorabletash (Member, Author):

I just feel user should not expect validation failure if the supplied tensor is allowed according to "allowed ranks".

Makes sense. Added in 4bf14f6 for the cases you called out - I didn't audit the ops for more, though. How does that look?

@inexorabletash (Member, Author):

As always, having the table contain "specified as part of operation steps" is an intentionally minimal change. There are a few more things we could do:

  • Have separate tables; e.g. for element-wise logical we could have a generic table with "any" and a table specific to logicalNot; element-wise unary would need 3 (signed types, float types, and any for identity); pooling would need 2; reduction ops would need 3
  • Keep a single table, but expand the text with casual developer-focused commentary, e.g. for element-wise logical it could read: "specified as part of operation steps; most element-wise logical ops support any data type, but logicalNot() only supports uint8"

@huningxin (Contributor) left a comment

LGTM!

@huningxin (Contributor) left a comment

LGTM!

@fdwr (Collaborator) left a comment

Thanks Josh! Minor comments, else LGTM.

Fix copy/pasta of {{input}} for other params

Co-authored-by: Dwayne Robinson <[email protected]>
@fdwr (Collaborator) left a comment

👍

@fdwr fdwr merged commit c237cf1 into webmachinelearning:main Nov 15, 2024
2 checks passed
github-actions bot added a commit that referenced this pull request on Nov 15, 2024 (SHA: c237cf1; reason: push, by fdwr)
@inexorabletash deleted the data-type-table branch on November 15, 2024 22:36