Including derivative in parallel L-BFGS-B method #309

Open
AdrianPerezSalinas opened this issue Jan 12, 2021 · 16 comments
Labels
enhancement New feature or request

Comments

@AdrianPerezSalinas
Contributor

Hi everyone, have a Happy New Year.

Most of the time we work with variational circuits that depend on tunable parameters. We have been using scipy methods to find the optimal parameters, and recently the parallel L-BFGS-B method was added to the repository. My proposal is to extend this method to the case where the gradient of the function is supplied for the optimization.
I think this is useful for two main reasons:

  1. We can implement exact gradients for quantum circuits in the same spirit as the finite-differences method. This will be helpful when the optimization is done on quantum hardware instead of a simulation, because this gradient is much more resilient to noise.
  2. In particular, for re-uploading-like circuits the number of evaluations is smaller. The finite-differences method works by computing f(\theta ± \pi/2), where \theta is just one rotation angle. In re-uploading circuits \theta = w x + b, so a single pair of evaluations gives the gradient for two parameters (and for more if more weights are used); see the sketch below the list.
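A minimal NumPy sketch of the idea (not Qibo code; the numbers are placeholders): a single RY(\theta) rotation on |0> measured in Z gives f(\theta) = cos(\theta), the shift rule recovers its exact derivative from two evaluations, and for a re-uploading angle \theta = w x + b the chain rule reuses those same two evaluations for both w and b.

```python
import numpy as np

def expectation(theta):
    # <0| RY(theta)^dagger Z RY(theta) |0> = cos(theta)
    return np.cos(theta)

def shift_rule(f, theta, shift=np.pi / 2):
    # exact derivative for gates generated by a Pauli operator
    return 0.5 * (f(theta + shift) - f(theta - shift))

# re-uploading angle: theta = w * x + b (toy numbers)
w, b, x = 0.7, 0.1, 1.3
theta = w * x + b

df_dtheta = shift_rule(expectation, theta)
grad_w = x * df_dtheta   # chain rule: d(theta)/dw = x
grad_b = df_dtheta       # chain rule: d(theta)/db = 1

print(df_dtheta, -np.sin(theta))  # both print the same value
```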

I have been looking at the code and saw that the core of the computation is delegated to the standard scipy recipe:
```python
with self.mp.Pool(processes=self.processes) as self.pool:
    from scipy.optimize import minimize
    out = minimize(fun=self.fun, x0=x0, jac=self.jac, method='L-BFGS-B',
                   bounds=self.bounds, callback=self.callback,
                   options=self.options)
```

Thus, it should be easy to implement what I am asking for. The first step would be to allow the use of the gradient keyword (fprime/jac), which is already defined for this function; this should be trivial. The second step would be to include the gradient function itself, although this could be left to the user to pass as an argument.
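For reference, a minimal sketch of how a user-supplied gradient would be forwarded to scipy (the quadratic loss and its gradient here are placeholders, not circuit code; the keyword is called jac in scipy.optimize.minimize):

```python
import numpy as np
from scipy.optimize import minimize

def loss(x):
    # placeholder objective: a simple quadratic with minimum at x = 1
    return np.sum((x - 1.0) ** 2)

def loss_gradient(x):
    # user-supplied analytic gradient of the objective
    return 2.0 * (x - 1.0)

out = minimize(fun=loss, x0=np.zeros(3), jac=loss_gradient, method='L-BFGS-B')
print(out.x)  # approximately [1., 1., 1.]
```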

What do you think?

@AdrianPerezSalinas added the enhancement (New feature or request) label on Jan 12, 2021
@scarrazza
Member

@AdrianPerezSalinas thanks for opening this issue. I think this is feasible by computing the analytic gradients manually or automatically (via TensorFlow or other backends).

@AdrianPerezSalinas
Contributor Author

Do you think automatic gradients are compatible with finite-differences techniques?

@scarrazza
Member

Usually, automatic analytic gradients are more efficient than finite differences, which scipy's minimize already computes.

@AdrianPerezSalinas
Contributor Author

I think I did not explain myself properly. When I say finite differences for quantum circuits, I mean a method that computes the exact analytical gradient by shifting the parameter values by a large amount (for most operators this shift is \pi/2). This is robust against the inherent statistical noise, so it is useful in experiments. See Eqs. 13 and 14 here: https://arxiv.org/pdf/1811.11184.pdf
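In its usual form, for a gate generated by a Pauli operator and an observable E, the rule reads (writing it from memory here rather than copying the paper's exact notation):

$$\frac{\partial \langle E \rangle}{\partial \theta} = \frac{1}{2}\left[\langle E \rangle\left(\theta + \frac{\pi}{2}\right) - \langle E \rangle\left(\theta - \frac{\pi}{2}\right)\right]$$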

@scarrazza
Member

Thanks for the clarification, and sorry for the misunderstanding.
Yes, this is something that may help, and it would certainly be interesting to have it built in.

@AdrianPerezSalinas
Contributor Author

Nice! We can discuss it tomorrow

@AdrianPerezSalinas
Contributor Author

Hi @scarrazza, I leave here a document explaining my point in more detail. Hope you find it useful!
main.pdf

@AdrianPerezSalinas
Contributor Author

Hi @scarrazza ,
I have been testing the method I proposed to you. I have not done much yet, but the results are pretty promising. I ran a test for the standard VQE with small circuits, as stated in Qibo's examples. If I compute the expected value of the Hamiltonian exactly (0 shots), both methods return the same optimization path:
VQE_0shots.pdf

However, if I do the same with measurements (the approximation is rather rough), the optimization is really different. It gets stuck quickly, since the derivatives do not give any information.
VQE_10000000.0shots.pdf

In addition, it looks like it does not matter how many shots you perform to estimate the Hamiltonian; the results are not good.

Today I will check the next steps

@scarrazza
Member

OK, thanks for these tests, let's see.

@AdrianPerezSalinas
Contributor Author

I have tested a fitting problem like qPDF; the results are comparable:
fit_0shots.pdf
fit_10000shots.pdf

@AdrianPerezSalinas
Contributor Author

The same test for a simple classifier:
classifier_0shots.pdf
classifier_10000shots.pdf

I think it is clear that we need to implement this functionality when applying the optimization to measurements. In addition, at least for measurement simulations, this new approach is much faster.

Do you want me to show you the code?

@alhajri

alhajri commented Jan 20, 2021

Hi @AdrianPerezSalinas , just wondering if you are aware of this paper by Simon Benjamin on "Quantum Analytic Descent":
https://arxiv.org/abs/2008.13774

@AdrianPerezSalinas
Contributor Author

I was not aware of it, but thank you very much for pointing it out!

I have taken a look at it and it sounds really interesting. However, after today's discussion I think that the exact-derivatives method, as well as this one, can reduce the error due to sampling, but neither can deal with imperfect circuits. We will have to investigate this further.

@AdrianPerezSalinas
Contributor Author

I leave here the results of some tests made with the exact derivative and a VQE model. As you can see, with errors of the order of 0.1% we can still achieve some minimization. Noisier circuits behave chaotically.

VQE_100000.0shots.pdf

I think that implementing this kind of exact derivative is interesting only if we know that the circuit noise is below a certain threshold.

@scarrazza
Member

OK, and how does this compare to the numerical derivative for similar configurations?

@AdrianPerezSalinas
Contributor Author

Nothing returns extremely good results, but exact derivatives are more resilient to noise and errors than numerical ones. Not by much, though.

VQE_0shots_0.01.pdf
VQE_0shots_0.001.pdf
VQE_10000shots_0.01.pdf
VQE_10000shots_0.001.pdf
