Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For instance for Generators #169

Open
cscherrer opened this issue Aug 17, 2020 · 9 comments
Open

For instance for Generators #169

cscherrer opened this issue Aug 17, 2020 · 9 comments

Comments

@cscherrer
Copy link
Owner

cscherrer commented Aug 17, 2020

@zenon noticed in #166 that rand gives incorrect results for Generators. This is specific to v0.12, since it's now fixed in master.

So first, yes we need a new release.

But also...

In v0.12, we have the incorrect method definition

@inline function rand(d :: For{F,T,D,X}) where {F,T <: Base.Generator, D, X}
    rand.(Base.Generator(d.θ.f, d.θ.iter))
end

In 0345f04 this was updated to

@inline function rand(d::For{F,T,D,X}) where {F,T<:Base.Generator,D,X}
    return rand.(Base.Generator(d.f, d.θ))
end

But thinking more about this, I wonder if it would be better to have

@inline function rand(d::For{F,T,D,X}) where {F,T<:Base.Generator,D,X}
    return rand.(Base.Generator(d.f  d.θ.f, d.θ.iter))
end

Or even

@inline function rand(d::For{F,T,D,X}) where {F,T<:Base.Generator,D,X}
    return Base.Generator(rand  d.f  d.θ.f, d.θ.iter)
end

This would avoid allocation. We need to test...

  • Would this lead to better composability?
  • Would this improve performance?
@DilumAluthge
Copy link
Collaborator

Could you explain the difference between each of these?

@cscherrer
Copy link
Owner Author

Sure :)

This one

@inline function rand(d::For{F,T,D,X}) where {F,T<:Base.Generator,D,X}
    return rand.(Base.Generator(d.f, d.θ))
end

Builds a Generator-within-a-Generator, while this

@inline function rand(d::For{F,T,D,X}) where {F,T<:Base.Generator,D,X}
    return rand.(Base.Generator(d.f  d.θ.f, d.θ.iter))
end

unrolls it so you just have a single Generator. These may lead to the same compiled code, I'm not sure. But in both cases, the outer rand. forces the whole thing into an Array.

The last one is different, because the end result is a Generator over random numbers. When I mentioned composability, I'm thinking of things like streaming results of on simulation into the input for another.

@DilumAluthge
Copy link
Collaborator

In the very last example (the one that returns a Generator), if you really want an Array, you can just call collect() on the Generator, right?

@cscherrer
Copy link
Owner Author

Yes, that's right. Also, I just realized the last example has the interesting property that it acts as a suspended computation:

julia> θ = (sqrt(p) for p in 1:10)
Base.Generator{UnitRange{Int64},typeof(sqrt)}(sqrt, 1:10)

julia> d = For(θ) do θj
           Normal(θj, 1)
           end
For{var"#5#6",Base.Generator{UnitRange{Int64},typeof(sqrt)},Normal{Float64},Float64}(var"#5#6"(), Base.Generator{UnitRange{Int64},typeof(sqrt)}(sqrt, 1:10))

julia> r = rand(d)
Base.Generator{UnitRange{Int64},Base.var"#62#63"{Base.var"#62#63"{typeof(rand),var"#5#6"},typeof(sqrt)}}(Base.var"#62#63"{Base.var"#62#63"{typeof(rand),var"#5#6"},typeof(sqrt)}(Base.var"#62#63"{typeof(rand),var"#5#6"}(rand, var"#5#6"()), sqrt), 1:10)

julia> collect(r)
10-element Array{Float64,1}:
 0.06844850095205879
 4.345937348027204
 0.4260161672012266
 0.9027785360988985
 2.404324563944117
 2.2100818176085286
 2.2332541535358383
 1.6701557218576157
 1.7094957979035412
 3.336007963531211

julia> collect(r)
10-element Array{Float64,1}:
 1.65992410815293
 1.8507072361488428
 2.714179646583497
 1.8856240294357847
 3.247269903712069
 0.38141318206481634
 3.097531577391594
 3.818148250021317
 3.1237990423455737
 3.06097069796954

Though I'm not sure if this is guaranteed behavior, I can imagine some optimizations might want to cache it.

@DilumAluthge
Copy link
Collaborator

Is the idea that every time you call collect, you get a new sample?

Anyway, I like the last option. Because you can get the Array if you really want, by calling collect. But you don't need to allocate until you are ready.

@cscherrer
Copy link
Owner Author

Is the idea that every time you call collect, you get a new sample?

Right, the Generator doesn't allocate (or at least not much, I haven't checked that) so each traversal requires re-computing the function.

Anyway, I like the last option. Because you can get the Array if you really want, by calling collect. But you don't need to allocate until you are ready.

Right, I like that too. I'm wondering if a sensible approach is that the type of rand(::For) is related to the type of the indices you give it.

Worth noting here that arrays can sometimes be faster because despite the allocation, you get cache friendliness. But then again, since Soss generates code, I think down the line we can pull out any compiler trick in the book. So speed should get there, and maybe we should focus first on the semantics we want.

@DilumAluthge
Copy link
Collaborator

I'm wondering if a sensible approach is that the type of rand(::For) is related to the type of the indices you give it.

Hmmm. That might make sense. Although I wonder if it would make the API more confusing for new users to learn?

So speed should get there, and maybe we should focus first on the semantics we want.

For sure

@DilumAluthge
Copy link
Collaborator

I think the immediate next step is to make a new release off of master so that the bugfix can be deployed.

That buys us some time, I think, to make a decision on the semantics.

@cscherrer
Copy link
Owner Author

Good call

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants