-
Here's some more background and supplementary information:
Moving construction-time shape inference to compile-time is necessary to guarantee that we'll always have the most shape information possible. It also serves to reduce duplication, because shape inference logic that applies at construction-time necessarily applies at compile-time, but the latter is capable of handling much more than the former. See #1275 for one such change in this direction.

Now, consider the following example:

```python
import aesara
import aesara.tensor as at

s = at.lscalar("s")
x = at.ones((s,))

x.type
# TensorType(float64, (None,))
```

In this case, the shape information for `x` is incomplete: the graph knows the shape is exactly `(s,)`, but the type only records `(None,)`. How could this be handled? To start, we know that we ultimately need to add a constraint on the value of `s`. The point we're talking about is at the […]. Anyway, all of this work needs to happen at compile-time, and that's another reason to focus on moving all shape inference to a single, coherent set of compile-time operations. For instance, at compile-time we're able to "parse" expressions like […].
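To make that concrete (a quick check, continuing the snippet above), the shape graph still carries `s` even though the type dropped it:

```python
# Compiling the shape graph recovers the exact symbolic shape `s`
f = aesara.function([s], x.shape)
f(3)  # array([3])
```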
-
Another option, instead of […]:

```python
import aesara
import aesara.tensor as at
from aesara.ifelse import ifelse

x = at.vector("x")
y = at.vector("y")
y = at.specify_shape(y, x.shape)

r = ifelse(
    at.eq(x.shape[0], y.shape[0]),
    at.ones_like(y),
    at.ones_like(y).sum(keepdims=True),
)
f = aesara.function([x, y], r)
aesara.dprint(f)
```

This currently does not get rid of the `IfElse`, and actually removes the `SpecifyShape`:

```python
f([1], [1])  # array([1.])
f([1, 1, 1, 1], [1, 1, 1, 1])  # array([1., 1., 1., 1.])
f([1], [1, 1, 1, 1])  # array([4.]), but should have raised!
```
-
It seems it might be?

```python
import aesara
import aesara.tensor as at
from aesara.ifelse import ifelse

x = at.vector("x")
y = at.vector("y")

r = ifelse(
    at.eq(x.shape[0], y.shape[0]),
    at.ones_like(y),
    at.ones_like(y).sum(keepdims=True),
)
f = aesara.function([x, y], r, mode="JAX")

f([1], [1, 1, 1, 1])
# TypeError: true_fun and false_fun output must have identical types got
# ('DIFFERENT ShapedArray(float64[4]) vs. ShapedArray(float64[1])',).
```

It also fails without JIT.
-
In the following I will try to summarize the situation with shapes & broadcasting in Aesara and lay it out in simple terms. None of this is news; most of the content of this post has been mentioned here and there in comments. It just hasn't been spelled out completely in one place.
Shapes in Aesara today
Users can currently initialize tensors without providing any information about the shape. Aesara's shape system currently represents the lack of information about the shape with `None`:
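A minimal sketch of the current behavior:

```python
import aesara.tensor as at

# No shape information is provided, so every dimension is None
x = at.vector("x")
x.type
# TensorType(float64, (None,))
```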
If the user has more information about the shape of a tensor, they can specify it in different ways: either using the built-in type, initializing the tensor with a `shape`, or using `specify_shape`:
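For instance (a sketch; constructor details may differ between Aesara versions):

```python
import aesara.tensor as at

# Via the type, with a fully specified shape
x = at.TensorType("float64", shape=(3,))("x")

# Or by adding the information to an existing variable
y = at.specify_shape(at.vector("y"), (3,))
```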
If they only have partial information, they can also specify it:
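Again as a sketch:

```python
import aesara.tensor as at

# Only the second dimension is known
x = at.TensorType("float64", shape=(None, 3))("x")
x.type
# TensorType(float64, (None, 3))
```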
This shape information is currently propagated through the graph using `TensorType.shape`.

Adding extra information: shape different from 1
There are situations in which this type-level information is not sufficient, for instance in `Elemwise.grad`, where the `Op` needs to know whether broadcasting occurred between inputs to know whether to sum or not. We thus need to encode more information in our type system, namely the knowledge that the shape is different from 1.

Let's assume here that we represent this information with `-1`: if the shape of a dimension is specified as `-1`, this means that we know it is different from `1`. It is not as specific as a full shape specification, but carries more information than `None`.

Users can specify this information when creating a tensor:
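Hypothetically, this could look as follows (`-1` is a proposal and not accepted by `TensorType` today, so this is illustration only):

```python
import aesara.tensor as at

# Hypothetical: the single dimension is known to be different from 1
x = at.TensorType("float64", shape=(-1,))("x")
```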
Again, this information can be propagated through the graph the same way `None` is. There are still places where we would need to add the correct logic to propagate this constraint, but these are `Elemwise` and the other `Op`s we are trying to fix by doing this.

To summarize, the user will be able to specify the shape at the type level using, from the greatest to the least amount of information:

- an integer, which gives the exact shape of the dimension;
- `-1`, which means that we only know the shape is not 1;
- `None`, which means that we have no information about the shape.

That is all the information Aesara needs about shape at compile-time.
What do we do when this information is not available?
As stated above, there are situations in which the behavior of an `Op` differs depending on whether `shape == 1` or `shape != 1`. What to do when this information is not available in the graph, i.e. when there is ambiguity, is where potential tradeoffs appear. Let us go through these tradeoffs methodically.

There are three moments at which we can decide to handle the lack of information: construction-time, compile-time, and runtime. In the following we will only consider compile-time and runtime; handling shape inference at construction-time is not necessary, and it implies a lot of duplicated logic with compile-time inference for little added value, if any.
Handle the issue at compile-time
It is impossible to resolve the ambiguity at compile-time; if the information is not there, it is simply not there. We thus only have two ways forward: assuming or failing.
First, assuming. Remember that Aesara currently represents unspecified dimensions with `None`, e.g. when `at.vector` is used. This is just Aesara (correctly) stating that we have no shape information about the tensors that are created this way. We could change this behavior and assume on behalf of the user that unspecified dimensions default to `-1`; this will obviously lead to surprising behavior when the user inputs e.g. `np.ones((1,))`, and it is error-prone. We can always add a runtime assertion, but that does not make the behavior less surprising, and it adds a (small) runtime cost.

Then, failing. Here Aesara does not make any assumption about the user's intents, but instead asks the user to resolve any ambiguity. That means failing at compile-time, explaining the ambiguity, and asking the user to provide more information. This can however quickly become frustrating for users, make existing code fail, and completely forbid cases where they want to pass tensors where one dimension is sometimes equal to 1 and sometimes different from 1.
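To make the "assuming" tradeoff concrete, compare with NumPy's broadcasting semantics (plain NumPy, for illustration):

```python
import numpy as np

x = np.ones((1,))
y = np.ones((4,))
(x + y).shape  # (4,): NumPy broadcasts the length-1 dimension
# If Aesara had assumed shape != 1 for x's dimension, the compiled graph
# would not perform this broadcast, silently diverging from NumPy here.
```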
In either case, trying to handle the lack of information at compile-time involves important tradeoffs with the user interface.
Handle the issue at runtime
The ambiguity is necessarily resolved at runtime: users call compiled functions with arrays that have a concrete shape. We could thus defer handling insufficient in-graph information to runtime. There are two ways to do this.
The first possibility to consider is a custom `Op`, let's call it `SumDims`, which is called with booleans that indicate whether a dimension should be summed or not. This `Op` basically encapsulates a multi-branch conditional. However, none of this structure is explicit in the graph, and we would need to implement rewrites specific to this `Op`. This is a fundamental limitation of every custom `Op` approach.

Or we can make this branching structure explicit and use the existing `IfElse` directly: whenever there is not enough information to determine `shape != 1` at compile-time, we enumerate the distinct conditions and their graphs so that each branch represents a distinct `x.sum`. The advantage of this approach over the custom `Op` is that we can leverage existing rewrites; if new rewrites need to be implemented, they will be more widely applicable in other scenarios.
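As a rough sketch (hypothetical variable names; the actual graphs would be generated internally, e.g. by `Elemwise.grad`), the branching could look like:

```python
import aesara.tensor as at
from aesara.ifelse import ifelse

x = at.vector("x")   # dimension unknown: could be 1 or not
gz = at.vector("gz") # incoming gradient, shaped like the broadcast output

# If x's dimension is 1 it was broadcast, so the gradient contributions
# must be summed back to shape (1,); otherwise the gradient passes through.
gx = ifelse(
    at.eq(x.shape[0], 1),
    gz.sum(keepdims=True),
    gz,
)
```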
There are still legitimate concerns with the `IfElse` approach[^1]: performance and graph complexity. The first thing to note is that the `IfElse` can be removed at compile-time, when shape inference is performed, if enough information was provided by the user. Whenever we cannot resolve the shapes at compile-time, however, branching logic is the price to pay for the information that was not provided by the user: as far as the Aesara compiler knows, either scenario can happen. The runtime cost would be small, and a user interested in small performance gains can always avoid the cost of this branching logic by giving more information about the shapes of their tensors.

What is not in this write-up
I left out some of the concerns given in the other issues because they are not related to the representation in the IR and should be addressed downstream:

- […] the `IfElse` solution.

I also purposefully did not use the expression "dynamic broadcasting", which is not specific enough to unambiguously identify a set of solutions.
Footnotes
[^1]: These concerns also apply to the custom `Op` approach.