categorical does not validate probabilities #687

ricardoV94 · 2021-12-02T15:15:50Z

The categorical seems to do a simple draw from the inverse CDF and is oblivious to invalid probabilities:

import aesara.tensor.random as atr

# .eval() compiles and runs the graph so the sampled values can be inspected
atr.categorical([1, 1, 1], size=100).eval()  # Only 0s
atr.categorical([0, 0, 0], size=100).eval()  # Only 3s
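To see why these outputs arise, here is a minimal NumPy sketch of inverse-CDF sampling. It assumes (as the description above suggests) that the Op draws uniforms in [0, 1) and locates them in the cumulative sum of `p` with a left-sided `searchsorted`. The function name `inverse_cdf_categorical` is hypothetical; this is not Aesara's source.

```python
import numpy as np


def inverse_cdf_categorical(p, size, rng):
    """Hypothetical sketch: sample category indices by inverting the CDF of `p`.

    No validation is performed, mirroring the behavior reported above.
    """
    cdf = np.cumsum(np.asarray(p, dtype=float))
    u = rng.uniform(size=size)  # draws in [0, 1)
    # A left-sided search returns the first index i with cdf[i] >= u,
    # or len(p) when every CDF value is below u (i.e. outside the support).
    return np.searchsorted(cdf, u, side="left")


rng = np.random.default_rng(0)
print(np.unique(inverse_cdf_categorical([1, 1, 1], 100, rng)))  # cdf = [1, 2, 3]
print(np.unique(inverse_cdf_categorical([0, 0, 0], 100, rng)))  # cdf = [0, 0, 0]
```

With `cdf = [1, 2, 3]` every draw satisfies `cdf[0] >= u`, so index 0 is always returned; with `cdf = [0, 0, 0]` every positive draw exceeds the whole CDF, so the search returns `len(p) == 3`.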

ricardoV94 (Author) · 2021-12-02T15:22:44Z

The multinomial does some checks on the probability values, but, like the categorical, it does not check that they sum to one, and does this:

atr.multinomial(100, [0, 0, 0, 0]).eval()  # array([  0,   0,   0, 100])

brandonwillard (Maintainer) · 2021-12-02T17:23:54Z

> The multinomial does some checks for unit probabilities, but also does not check that they sum up to one, and does this:
> atr.multinomial(100, [0, 0, 0, 0]).eval()  # array([  0,   0,   0, 100])

This is the same behavior as NumPy:

import numpy as np


np.random.multinomial(100, [0, 0, 0, 0])
# array([  0,   0,   0, 100])
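One way to see where the 100 comes from: a common multinomial sampler draws each category's count from a binomial conditioned on the trials and probability mass still remaining, then assigns whatever trials are left to the last category. The sketch below assumes NumPy's legacy sampler works along these lines; `sequential_multinomial` is a hypothetical name, not NumPy's code.

```python
import numpy as np


def sequential_multinomial(n, pvals, rng):
    """Hypothetical sketch of conditional-binomial multinomial sampling."""
    pvals = np.asarray(pvals, dtype=float)
    counts = np.zeros(len(pvals), dtype=np.int64)
    remaining_n = n
    remaining_p = 1.0  # assumes the probabilities sum to one; never checked
    for j in range(len(pvals) - 1):
        if remaining_n == 0 or remaining_p <= 0.0:
            break
        # Probability of category j, given that categories < j were not chosen.
        q = min(max(pvals[j] / remaining_p, 0.0), 1.0)
        counts[j] = rng.binomial(remaining_n, q)
        remaining_n -= counts[j]
        remaining_p -= pvals[j]
    # The last category absorbs every trial not yet assigned,
    # regardless of the value of pvals[-1].
    counts[-1] = remaining_n
    return counts


print(sequential_multinomial(100, [0, 0, 0, 0], np.random.default_rng(0)))
```

With all-zero `pvals`, each conditional binomial has success probability 0, so every trial falls through to the last category, which matches the NumPy output above.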

More importantly, don't forget that, when building a graph, we cannot perform validation that requires concrete (i.e. non-symbolic) input values. At best, such validation could only be done under very narrow circumstances (e.g. when all the inputs are Constants), and even then its utility would be extremely limited relative to its cost.

Beyond that, it's not clear what this issue is requesting, so I'm going to close it for now; we can reopen it once that's clarified.

ricardoV94 (Author) · 2021-12-03T07:05:47Z

That still does not justify the categorical examples.

I understand there is a trade-off between input validation and speed, but I am especially surprised that the categorical can return values outside its support, as in my second example.

That was an extreme case, but it can happen whenever the probabilities sum to less than 1 (i.e. the last value of the CDF is below 1).
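Under the inverse-CDF scheme described in this issue, a uniform draw `u` lands past the end of the CDF exactly when `u > sum(p)`, so the out-of-support index `len(p)` would be returned with probability `1 - sum(p)` (for `sum(p) <= 1`). The deterministic check below sweeps an evenly spaced grid of `u` values in place of random draws; the grid size and the example `p` are illustrative assumptions.

```python
import numpy as np

p = np.array([0.2, 0.2, 0.2])  # sums to 0.6 instead of 1
cdf = np.cumsum(p)

# Evenly spaced stand-ins for uniform draws in [0, 1).
u = np.linspace(0.0, 1.0, 100_000, endpoint=False)
samples = np.searchsorted(cdf, u, side="left")

# Fraction of "draws" mapped to index 3, which is outside the support {0, 1, 2}.
frac_out_of_support = np.mean(samples == len(p))
print(frac_out_of_support, 1.0 - p.sum())
```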

Obviously, any input validation (if justified) would have to be done at run time, in perform.
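If such a check were justified, it could only operate on the concrete arrays available at run time. A minimal sketch of what it might look like, written as a standalone NumPy function; `check_categorical_p` and its `atol` default are hypothetical choices, not an Aesara API.

```python
import numpy as np


def check_categorical_p(p, atol=1e-8):
    """Hypothetical run-time check for a concrete categorical probability array.

    Probabilities are validated along the last axis, over which the categories vary.
    """
    p = np.asarray(p, dtype=float)
    if np.any(np.isnan(p)) or np.any(p < 0) or np.any(p > 1):
        raise ValueError("probabilities must lie in [0, 1]")
    if not np.allclose(p.sum(axis=-1), 1.0, rtol=0.0, atol=atol):
        raise ValueError("probabilities must sum to 1 along the last axis")
    return p
```

Both examples from the original report (`[1, 1, 1]` and `[0, 0, 0]`) would be rejected by the sum check, at the cost of an extra reduction on every call.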

Reopening just for discussion of the categorical case.

brandonwillard (Maintainer) · 2021-12-03T07:43:19Z

Yeah, I don't particularly like that output either, but we need to be clear about the scope of this project, and it doesn't include making voluntary improvements to NumPy/SciPy—especially not ones that should be brought to NumPy/SciPy first.

These Ops are explicitly wrappers for NumPy/SciPy operations. If there is an outright bug in a NumPy function that prevents its intended use, then we might consider addressing it in some way, but elective input-validation niceties are not something we can afford to spend core development time on.

Don't forget that users can create their own Op subclasses with custom behavior at any level. If a user finds that something doesn't work for their custom Op subclass (e.g. an optimization), fixing that would have priority.

Another thing to remember is that all of these validation ideas involving changes to Aesara (or any other core library) have a few inherent downsides: e.g. they add steps to frequently called code, and they can be much more difficult to disable than they would be under a different approach.

For instance, if a user knows that input validation isn't needed and we add it into a basic Op, how are they going to circumvent it when they want/need better performance? If we naively add a "perform validation" flag, then we've just introduced a new interface that needs to be maintained and—hopefully—it isn't specific to just one Op. Now, that flag will need to be propagated when the Ops are cloned, but it might not always be clear when that should be done. Etc., etc.

Instead of making strong choices for the users and bringing that complexity into Aesara, we can focus on making our Ops simple and effective, and make it easier for users to customize and/or plug into the Aesara compilation process so that they can add their own validations—if need be.
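As a rough, framework-agnostic illustration of keeping validation out of the core operation, the sketch below composes an optional check around a plain sampling function: users who don't need the check never pay for it, and no flag has to be stored or propagated. All names here (`sample_categorical`, `validated`, `sums_to_one`) are hypothetical; in Aesara the equivalent would presumably be a custom Op subclass or a user-supplied graph rewrite, which this sketch does not attempt to show.

```python
from typing import Callable

import numpy as np


def sample_categorical(p, size, rng):
    """Plain inverse-CDF sampler for a 1-D `p`, with no validation (hypothetical)."""
    cdf = np.cumsum(np.asarray(p, dtype=float))
    return np.searchsorted(cdf, rng.uniform(size=size), side="left")


def validated(sampler: Callable, check: Callable) -> Callable:
    """Return a new sampler that runs `check` on `p` before delegating."""

    def wrapper(p, size, rng):
        check(p)
        return sampler(p, size, rng)

    return wrapper


def sums_to_one(p):
    if not np.isclose(np.sum(p), 1.0):
        raise ValueError("probabilities must sum to 1")


# Users opt in explicitly; the unvalidated fast path is left untouched.
safe_categorical = validated(sample_categorical, sums_to_one)
rng = np.random.default_rng(0)
draws = safe_categorical([0.2, 0.3, 0.5], 10, rng)
```

The design choice is that validation lives in a separate, user-chosen layer rather than as an option on the sampler itself, which mirrors the trade-off described above.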
