Including derivative in parallel L-BFGS-B method #309

Open
AdrianPerezSalinas opened this issue Jan 12, 2021 · 16 comments
Labels
enhancement New feature or request

Comments

@AdrianPerezSalinas
Contributor

Hi everyone, have a Happy New Year.

Most of the time we work with variational circuits that depend on tunable parameters. We have been using scipy methods to find the optimal parameters, and recently the parallel L-BFGS-B method was added to the repository. My proposal is to extend this method to the case where the gradient of the function is supplied for the optimization.
I think this is useful for two main reasons:

  1. We can implement exact gradients for quantum circuits in the same spirit as the finite-differences method. This will be helpful when the optimization is done on quantum hardware instead of a simulation, because this gradient is much more resilient to noise.
  2. In particular, for re-uploading-like circuits the number of evaluations is smaller. The finite-differences method works by computing f(\theta ± \pi/2), where \theta is just one rotation angle. In re-uploading circuits \theta = w x + b, so a single pair of evaluations gives the gradient for two parameters (and for more if more weights are used); see the sketch below the list.
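A minimal NumPy sketch of the idea (not Qibo code; the numbers are placeholders): a single RY(\theta) rotation on |0> measured in Z gives f(\theta) = cos(\theta), the shift rule recovers its exact derivative from two evaluations, and for a re-uploading angle \theta = w x + b the chain rule reuses those same two evaluations for both w and b.

```python
import numpy as np

def expectation(theta):
    # <0| RY(theta)^dagger Z RY(theta) |0> = cos(theta)
    return np.cos(theta)

def shift_rule(f, theta, shift=np.pi / 2):
    # exact derivative for gates generated by a Pauli operator
    return 0.5 * (f(theta + shift) - f(theta - shift))

# re-uploading angle: theta = w * x + b (toy numbers)
w, b, x = 0.7, 0.1, 1.3
theta = w * x + b

df_dtheta = shift_rule(expectation, theta)
grad_w = x * df_dtheta   # chain rule: d(theta)/dw = x
grad_b = df_dtheta       # chain rule: d(theta)/db = 1

print(df_dtheta, -np.sin(theta))  # both print the same value
```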

I have been looking at the code and saw that the core of the computation is delegated to the standard scipy recipe:
```python
with self.mp.Pool(processes=self.processes) as self.pool:
    from scipy.optimize import minimize
    out = minimize(fun=self.fun, x0=x0, jac=self.jac, method='L-BFGS-B',
                   bounds=self.bounds, callback=self.callback,
                   options=self.options)
```

Thus, it should be easy to implement what I am asking for. The first step would be to allow the use of the gradient keyword (fprime/jac), which is already defined for this function; this should be trivial. The second step would be to include the gradient function itself, although this could be left to the user to pass as an argument.
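For reference, a minimal sketch of how a user-supplied gradient would be forwarded to scipy (the quadratic loss and its gradient here are placeholders, not circuit code; the keyword is called jac in scipy.optimize.minimize):

```python
import numpy as np
from scipy.optimize import minimize

def loss(x):
    # placeholder objective: a simple quadratic with minimum at x = 1
    return np.sum((x - 1.0) ** 2)

def loss_gradient(x):
    # user-supplied analytic gradient of the objective
    return 2.0 * (x - 1.0)

out = minimize(fun=loss, x0=np.zeros(3), jac=loss_gradient, method='L-BFGS-B')
print(out.x)  # approximately [1., 1., 1.]
```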

What do you think?

@AdrianPerezSalinas added the enhancement (New feature or request) label on Jan 12, 2021
@scarrazza
Member

@AdrianPerezSalinas thanks for opening this issue. I think this is feasible by computing the analytic gradients manually or automatically (via TensorFlow or other backends).

@AdrianPerezSalinas
Contributor Author

Do you think automatic gradients are compatible with finite-differences techniques?

@scarrazza
Member

Usually, automatic analytic gradients are more efficient than finite differences, which scipy's minimize already computes.

@AdrianPerezSalinas
Contributor Author

I think I did not explain myself properly. When I say finite differences for quantum circuits, I mean a method that computes the exact analytical gradient by shifting the parameter values by a large amount (for most operators this shift is \pi/2). This is robust against the inherent statistical noise, so it is useful in experiments. See Eqs. 13 and 14 here: https://arxiv.org/pdf/1811.11184.pdf
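In its usual form, for a gate generated by a Pauli operator and an observable E, the rule reads (writing it from memory here rather than copying the paper's exact notation):

$$\frac{\partial \langle E \rangle}{\partial \theta} = \frac{1}{2}\left[\langle E \rangle\left(\theta + \frac{\pi}{2}\right) - \langle E \rangle\left(\theta - \frac{\pi}{2}\right)\right]$$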

@scarrazza
Member

Thanks for the clarification, and sorry for the misunderstanding.
Yes, this is something that may help, and it would certainly be interesting to have it built in.

@AdrianPerezSalinas
Contributor Author

Nice! We can discuss it tomorrow

@AdrianPerezSalinas
Contributor Author

Hi @scarrazza, I leave here a document explaining my point in more detail. Hope you find it useful!
main.pdf

@AdrianPerezSalinas
Contributor Author

Hi @scarrazza ,
I have been testing the method I proposed to you. I have not done much yet, but the results are pretty promising. I ran a test for the standard VQE with small circuits, as stated in Qibo's examples. If I compute the expected value of the Hamiltonian exactly (0 shots), both methods return the same optimization path:
VQE_0shots.pdf

However, if I do the same with measurements (the approximation is rather rough), the optimization is really different. It gets stuck quickly, since the derivatives do not give any information.
VQE_10000000.0shots.pdf

In addition, it looks like it does not matter how many shots you perform to estimate the Hamiltonian; the results are not good.

Today I will check the next steps

@scarrazza
Member

OK, thanks for these tests, let's see.

@AdrianPerezSalinas
Contributor Author

I have tested a fitting problem like qPDF; the results are comparable:
fit_0shots.pdf
fit_10000shots.pdf

@AdrianPerezSalinas
Contributor Author

The same test for a simple classifier:
classifier_0shots.pdf
classifier_10000shots.pdf

I think it is clear that we need to implement this functionality when applying the optimization to measurements. In addition, at least for measurement simulations, this new approach is much faster.

Do you want me to show you the code?

@alhajri

alhajri commented Jan 20, 2021

Hi @AdrianPerezSalinas , just wondering if you are aware of this paper by Simon Benjamin on "Quantum Analytic Descent":
https://arxiv.org/abs/2008.13774

@AdrianPerezSalinas
Contributor Author

I was not aware of it, but thank you very much for pointing it out!

I have taken a look at it and it sounds really interesting. However, after today's discussion I think that the exact-derivatives method, as well as this one, can reduce the error due to sampling, but neither can deal with imperfect circuits. We will have to investigate this further.

@AdrianPerezSalinas
Contributor Author

I leave here the results of some tests made with the exact derivative and a VQE model. As you can see, with errors of the order of 0.1% we can still achieve some minimization. Noisier circuits behave chaotically.

VQE_100000.0shots.pdf

I think that implementing this kind of exact derivative is interesting only if we know that the circuit noise is below a certain threshold.

@scarrazza
Member

OK, and how does this compare to the numerical derivative for similar configurations?

@AdrianPerezSalinas
Contributor Author

Nothing returns extremely good results, but exact derivatives are more resilient to noise and errors than numerical ones. Not by much, though.

VQE_0shots_0.01.pdf
VQE_0shots_0.001.pdf
VQE_10000shots_0.01.pdf
VQE_10000shots_0.001.pdf
