NEP-18: CuPy backend requires unimplemented algorithms #73

Open
pentschev opened this issue Feb 28, 2019 · 14 comments · May be fixed by #75
Comments

@pentschev
Member

pentschev commented Feb 28, 2019

For the past few weeks, I've been working on issues related to NEP-18 support for Dask (and ultimately, Dask-GLM as well) to allow CuPy to be used as a backend. I've made some progress, which I'll soon share in more detail.

Among the issues I'm facing now is the use of the following Dask-GLM algorithms with CuPy:

  1. newton: requires numpy.linalg.lstsq
  2. admm and lbfgs: require scipy.optimize.fmin_l_bfgs_b

Neither of these functions is implemented in CuPy. So I'm wondering: does anybody know whether either of them has an open and reliable CUDA implementation that could be integrated into CuPy, or whether we could implement them reasonably easily on top of already supported CuPy functionality?
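
For context, here's a minimal toy sketch of the two calls in question (the quadratic loss and the stand-in arrays are illustrative only, not Dask-GLM code):

import numpy as np
from scipy.optimize import fmin_l_bfgs_b

hess = np.eye(3)    # stand-in Hessian
grad = np.ones(3)   # stand-in gradient

# newton: the Newton step is obtained through a least-squares solve
step, *_ = np.linalg.lstsq(hess, grad, rcond=None)

# admm / lbfgs: the loss is minimized through SciPy's L-BFGS-B driver, which
# repeatedly calls a function returning (loss, gradient) as NumPy objects
def loss_and_grad(beta):
    return 0.5 * beta @ beta, beta   # toy quadratic loss and its gradient

beta0 = np.zeros(3)
beta_opt, loss_opt, info = fmin_l_bfgs_b(loss_and_grad, beta0)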

@mrocklin
Member

cc @cicdw

@mrocklin
Member

My guess is that in both cases the relevant array (the input to lstsq or the gradient returned by compute_loss_grad) is very small, so one approach would be to just always convert it to a NumPy array. From a performance perspective this would likely have very little impact, and it would be easy to accomplish.

The function dask.array.utils.normalize_to_array was intended for this purpose (though it currently supports only the cupy case). We might consider renaming it to normalize_to_numpy.
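
A sketch of what that could look like for the newton case; solve_newton_step is a hypothetical helper (not Dask-GLM code), and it assumes normalize_to_array behaves as described above (cupy arrays are copied to the host, numpy arrays pass through unchanged):

import numpy as np
from dask.array.utils import normalize_to_array

def solve_newton_step(hess, grad):
    # hess and grad may live on the GPU, but they are only of size
    # (n_params, n_params) and (n_params,), so copying them to the host
    # before the lstsq call should be cheap
    hess_np = normalize_to_array(hess)
    grad_np = normalize_to_array(grad)
    step, *_ = np.linalg.lstsq(hess_np, grad_np, rcond=None)
    return step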

@cicdw
Collaborator

cicdw commented Feb 28, 2019

I'm not familiar with the CuPy / CUDA ecosystem, but in terms of implementing the two methods you mention: I'd suspect that some form of lstsq would be doable, but fmin_l_bfgs_b would probably be incredibly difficult.

@mrocklin
Member

So my proposal above isn't that we implement lstsq and fmin_l_bfgs_b in CUDA; it's that we just always coerce the necessary arrays to NumPy. My understanding is that the inputs to lstsq and the outputs of compute_loss_grad are on the order of the number of parameters, which is typically under 100 or so, and thus probably better suited to the CPU anyway. Is my understanding correct here, @cicdw?

@cicdw
Collaborator

cicdw commented Feb 28, 2019

Ah, I see that now; yeah, that's all correct.

@pentschev
Member Author

The idea to use normalize_to_array() sounds good to me. I'm now wondering about the case where we need to convert these outputs back to arrays of their original type. We can use the *_like() functions, but only with predefined fill values; do we have any ideas on doing the reverse of the normalize_to_array() operation? I thought maybe introducing a copy_like() function in NumPy could be useful for that, but perhaps there's already a better alternative that I don't know about.
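
To make the semantics concrete, here is a rough sketch of the copy_like() I have in mind (not an existing function; it hard-codes the CuPy case for now, much like normalize_to_array does in the other direction):

import numpy as np
import cupy

def copy_like(values, template):
    # hypothetical helper: copy `values` into a new array belonging to the
    # same library as `template`
    if isinstance(template, cupy.ndarray):
        return cupy.asarray(values)
    return np.asarray(values)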

@mrocklin
Member

mrocklin commented Mar 1, 2019

I think that it's probably OK if we always return numpy arrays. These outputs are likely to be small.

@pentschev
Member Author

It's not OK here: https://github.com/dask/dask-glm/blob/master/dask_glm/algorithms.py#L347

We pass beta0, X, and y to scipy.optimize.fmin_l_bfgs_b. beta0 has to be copied back to the CPU because it gets modified here: https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb.py#L326

Later on, x (formerly beta0) is passed here: https://github.com/scipy/scipy/blob/master/scipy/optimize/lbfgsb.py#L335, which is actually wrapping https://github.com/dask/dask-glm/blob/master/dask_glm/algorithms.py#L340. So at that point, X and y remain on the GPU, but beta is on the CPU, and this loop happens several times. If we copy X and y back to the CPU, we defeat the purpose of using CuPy in the first place.
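
To make that concrete with a rough sketch (toy least-squares loss, not the actual Dask-GLM code), this is the shape of the function the driver calls on every iteration, and the only workaround I see right now is converting the small vectors at the boundary:

import numpy as np
import cupy
from scipy.optimize import fmin_l_bfgs_b

X = cupy.random.rand(1000, 10)   # stays on the GPU
y = cupy.random.rand(1000)       # stays on the GPU

def loss_and_grad(beta, X, y):
    # beta arrives from fmin_l_bfgs_b as a NumPy array on every iteration;
    # move only this tiny vector to the GPU rather than moving X and y to the CPU
    beta = cupy.asarray(beta)
    resid = X @ beta - y
    loss = float(0.5 * (resid @ resid))
    grad = cupy.asnumpy(X.T @ resid)   # the SciPy driver expects NumPy back
    return loss, grad

beta0 = np.zeros(10)
beta_opt, loss_opt, info = fmin_l_bfgs_b(loss_and_grad, beta0, args=(X, y))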

@mrocklin
Member

mrocklin commented Mar 1, 2019

Ah, my hope was that when X, y, and beta interacted, the GPU arrays would pull the CPU array to the GPU, rather than the other way around. One sec, I'm going to play with this briefly.

@mrocklin
Member

mrocklin commented Mar 1, 2019

So I was hoping that things like the following would work:

In [1]: import numpy as np

In [2]: import cupy

In [3]: x = np.arange(100)

In [4]: y = cupy.arange(100)

In [5]: z = x + y
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-8c8f78e0676d> in <module>
----> 1 z = x + y

~/cupy/cupy/core/core.pyx in cupy.core.core.ndarray.__array_ufunc__()
   1734                     for x in inputs
   1735                 ])
-> 1736             return cp_ufunc(*inputs, **kwargs)
   1737         # Don't use for now, interface uncertain
   1738         # elif method =='at' and name == 'add':

~/cupy/cupy/core/_kernel.pyx in cupy.core._kernel.ufunc.__call__()
    790             raise TypeError('Wrong number of arguments for %s' % self.name)
    791
--> 792         args = _preprocess_args(args)
    793         if out is None:
    794             in_args = args[:self.nin]

~/cupy/cupy/core/_kernel.pyx in cupy.core._kernel._preprocess_args()
     84             arg = _scalar.convert_scalar(arg, use_c_scalar)
     85             if arg is None:
---> 86                 raise TypeError('Unsupported type %s' % typ)
     87         ret.append(arg)
     88     return ret

TypeError: Unsupported type <class 'numpy.ndarray'>
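
For what it's worth, the addition does work once the conversion is made explicit, e.g. z = cupy.asarray(x) + y (which moves x to the GPU first); the question is whether mixing the two should work without that step.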

@mrocklin
Member

mrocklin commented Mar 1, 2019

So, if we had empty_like(..., shape=), maybe that would allow us to do copy_like without needing to actually add that to the NumPy API. Something like:

if type(X) != type(beta):
    # allocate an array with X's type/backend but beta's shape,
    # then copy beta's values into it
    beta2 = np.empty_like(X, shape=beta.shape)
    beta2[:] = beta

I wanted to verify that assignment worked between numpy and cupy, but found that it didn't:

In [6]: y[:] = x
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-2b3b50afef58> in <module>
----> 1 y[:] = x

~/cupy/cupy/core/core.pyx in cupy.core.core.ndarray.__setitem__()
   1682
   1683         """
-> 1684         _scatter_op(self, slices, value, 'update')
   1685
   1686     def scatter_add(self, slices, value):

~/cupy/cupy/core/core.pyx in cupy.core.core._scatter_op()
   3608     if op == 'update':
   3609         if not isinstance(value, ndarray):
-> 3610             y.fill(value)
   3611             return
   3612         x = value

~/cupy/cupy/core/core.pyx in cupy.core.core.ndarray.fill()
    520         if isinstance(value, numpy.ndarray):
    521             if value.shape != ():
--> 522                 raise ValueError(
    523                     'non-scalar numpy.ndarray cannot be used for fill')
    524             value = value.item()

ValueError: non-scalar numpy.ndarray cannot be used for fill

This seems like the kind of thing that cupy might accept as a PR. I was surprised to learn that this didn't work.

@pentschev
Member Author

I actually had the exact same idea with the y[:] = x assignment before, but I assumed that the lack of automatic conversion was intentional. If we can make that work, I think it would be a good solution, without even needing to touch NumPy, which seems better.

I will see if I can write a PR for that; hopefully it won't be too complicated.

@mrocklin
Member

mrocklin commented Mar 1, 2019 via email

@pentschev
Member Author

I found two open issues related to what we just discussed here:

cupy/cupy#593
cupy/cupy#589

@pentschev pentschev linked a pull request Mar 4, 2019 that will close this issue