Optimizer option (readability) #83

Open
Sam-XiaoyueLi opened this issue Oct 14, 2024 · 3 comments
@Sam-XiaoyueLi
Contributor

In train_vqe in main.py, the optimizer options are passed through the optimizer_options argument. However, the description in the help documentation is unclear (without example code, a general user wouldn't know what to put there), the default value for nepochs is unrealistic (100000), and tol does not actually terminate the run.

For example, with optimizer='sgd' and tol=1e-2, we run the following code

from copy import deepcopy
from functools import partial

import numpy as np

# train_vqe, vqe_loss, ansatz_circ, ham_boost and initial_params come from the repository code

niter = 3
# define the qibo loss function
objective_boost = partial(vqe_loss)
# logging history
params_history, loss_history, grads_history, fluctuations = [], [], [], []
# set optimizer
optimizer = 'sgd'
tol = 1e-2

# train vqe
(
    partial_results,
    partial_params_history,
    partial_loss_history,
    partial_grads_history,
    partial_fluctuations,
    vqe,
) = train_vqe(
    deepcopy(ansatz_circ),
    ham_boost,  # fixed hamiltonian
    optimizer,
    initial_params,
    tol=tol,
    niterations=1,
    nmessage=1,
    loss=objective_boost,
)
params_history = np.array(partial_params_history)
loss_history = np.array(partial_loss_history)
grads_history = np.array(partial_grads_history)

Since optimizer_options is not specified, the code runs almost indefinitely:
[screenshot of the training log]

In the scenario where optimizer='cma' (backend='tensorflow'), the loss function fluctuates strongly and even changes sign:
[screenshot of the loss history]

In summary, the default value for nepochs in the optimizers.optimize function in ansatze.py may need to be more realistic for the general user, and the help documentation could describe optimizer_options in more detail. We should also check whether 'cma' is running correctly.

@MatteoRobbiati
Collaborator

Thanks for pointing this out! This is definitely just a documentation problem. We are not re-writing the optimizers, but relying on external providers: the CMA-ES and TensorFlow optimizers, for example, are provided by Qibo.
I wouldn't consider changing Qibo's defaults, but we can certainly document some reasonable parameters for our setup.

For example, I used to run SGD with the following config:

OPTIMIZER="sgd"
BACKEND="tensorflow"
OPTIMIZER_OPTIONS={
    "optimizer": "Adam",
    "learning_rate": "0.01",
    "nmessage": 1,
    "epochs": 1000,
}

Note also that the tensorflow backend should be used only when running the TensorFlow SGD; for all other optimizers we should use numpy or qibojit.

In summary: we should write these instructions into the README.md files and add docstrings so that users know some reasonable parameter values.
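Such a README snippet could look like the following. This is a sketch only, not tested against the repository: it assumes that the optimizer_options argument of train_vqe mentioned above forwards these values to the Qibo optimizer, and that train_vqe, vqe_loss, ansatz_circ, ham and initial_params are defined as in the other snippets in this thread.

from copy import deepcopy
from functools import partial

optimizer_options = {
    "optimizer": "Adam",      # TensorFlow optimizer used by the SGD routine
    "learning_rate": 0.01,
    "epochs": 1000,           # realistic cap instead of the current 100000 default
    "nmessage": 1,
}

(
    results,
    params_history,
    loss_history,
    grads_history,
    fluctuations,
    vqe,
) = train_vqe(
    deepcopy(ansatz_circ),
    ham,
    "sgd",                    # with backend="tensorflow"
    initial_params,
    tol=1e-2,
    niterations=1,
    nmessage=1,
    loss=partial(vqe_loss),
    optimizer_options=optimizer_options,
)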

@marekgluza
Contributor

I'd say it's quite confusing that train_vqe in our repo doesn't properly restrict maxiter.

For example, I tried to follow the documentation from optimizers.py in qibo,

import cma
cma.CMAOptions()

which points to maxiter:

{'AdaptSigma': 'True  # or False or any CMAAdaptSigmaBase class e.g. CMAAdaptSigmaTPA, CMAAdaptSigmaCSA',
 'CMA_active': 'True  # negative update, conducted after the original update',
 'CMA_active_injected': '0  #v weight multiplier for negative weights of injected solutions',
 'CMA_cmean': '1  # learning rate for the mean value',
 'CMA_const_trace': 'False  # normalize trace, 1, True, "arithm", "geom", "aeig", "geig" are valid',
 'CMA_diagonal': '0*100*N/popsize**0.5  # nb of iterations with diagonal covariance matrix, True for always',
 'CMA_diagonal_decoding': '0  # multiplier for additional diagonal update',
 'CMA_eigenmethod': 'np.linalg.eigh  # or cma.utilities.math.eig or pygsl.eigen.eigenvectors',
 'CMA_elitist': 'False  #v or "initial" or True, elitism likely impairs global search performance',
 'CMA_injections_threshold_keep_len': '1  #v keep length if Mahalanobis length is below the given relative threshold',
 'CMA_mirrors': 'popsize < 6  # values <0.5 are interpreted as fraction, values >1 as numbers (rounded), for `True` about 0.16 is used',
 'CMA_mirrormethod': '2  # 0=unconditional, 1=selective, 2=selective with delay',
 'CMA_mu': 'None  # parents selection parameter, default is popsize // 2',
 'CMA_on': '1  # multiplier for all covariance matrix updates',
 'CMA_sampler': 'None  # a class or instance that implements the interface of `cma.interfaces.StatisticalModelSamplerWithZeroMeanBaseClass`',
 'CMA_sampler_options': '{}  # options passed to `CMA_sampler` class init as keyword arguments',
 'CMA_rankmu': '1.0  # multiplier for rank-mu update learning rate of covariance matrix',
 'CMA_rankone': '1.0  # multiplier for rank-one update learning rate of covariance matrix',
 'CMA_recombination_weights': 'None  # a list, see class RecombinationWeights, overwrites CMA_mu and popsize options',
 'CMA_dampsvec_fac': 'np.Inf  # tentative and subject to changes, 0.5 would be a "default" damping for sigma vector update',
 'CMA_dampsvec_fade': '0.1  # tentative fading out parameter for sigma vector update',
 'CMA_teststds': 'None  # factors for non-isotropic initial distr. of C, mainly for test purpose, see CMA_stds for production',
 'CMA_stds': 'None  # multipliers for sigma0 in each coordinate (not represented in C), or use `cma.ScaleCoordinates` instead',
 'CSA_dampfac': '1  #v positive multiplier for step-size damping, 0.3 is close to optimal on the sphere',
 'CSA_damp_mueff_exponent': '0.5  # zero would mean no dependency of damping on mueff, useful with CSA_disregard_length option',
 'CSA_disregard_length': 'False  #v True is untested, also changes respective parameters',
 'CSA_clip_length_value': 'None  #v poorly tested, [0, 0] means const length N**0.5, [-1, 1] allows a variation of +- N/(N+2), etc.',
 'CSA_squared': 'False  #v use squared length for sigma-adaptation ',
 'BoundaryHandler': 'BoundTransform  # or BoundPenalty, unused when ``bounds in (None, [None, None])``',
 'bounds': '[None, None]  # lower (=bounds[0]) and upper domain boundaries, each a scalar or a list/vector',
 'conditioncov_alleviate': '[1e8, 1e12]  # when to alleviate the condition in the coordinates and in main axes',
 'eval_final_mean': 'True  # evaluate the final mean, which is a favorite return candidate',
 'fixed_variables': 'None  # dictionary with index-value pairs like {0:1.1, 2:0.1} that are not optimized',
 'ftarget': '-inf  #v target function value, minimization',
 'integer_variables': '[]  # index list, invokes basic integer handling: prevent std dev to become too small in the given variables',
 'is_feasible': 'is_feasible  #v a function that computes feasibility, by default lambda x, f: f not in (None, np.NaN)',
 'maxfevals': 'inf  #v maximum number of function evaluations',
 'maxiter': '100 + 150 * (N+3)**2 // popsize**0.5  #v maximum number of iterations',
 'mean_shift_line_samples': 'False #v sample two new solutions colinear to previous mean shift',
 'mindx': '0  #v minimal std in any arbitrary direction, cave interference with tol*',
 'minstd': '0  #v minimal std (scalar or vector) in any coordinate direction, cave interference with tol*',
 'maxstd': 'None  #v maximal std (scalar or vector) in any coordinate direction',
 'maxstd_boundrange': '1/3  # maximal std relative to bound_range per coordinate, overruled by maxstd',
 'pc_line_samples': 'False #v one line sample along the evolution path pc',
 'popsize': '4 + 3 * np.log(N)  # population size, AKA lambda, int(popsize) is the number of new solution per iteration',
 'popsize_factor': '1  # multiplier for popsize, convenience option to increase default popsize',
 'randn': 'np.random.randn  #v randn(lam, N) must return an np.array of shape (lam, N), see also cma.utilities.math.randhss',
 'scaling_of_variables': 'None  # deprecated, rather use fitness_transformations.ScaleCoordinates instead (or CMA_stds). Scale for each variable in that effective_sigma0 = sigma0*scaling. Internally the variables are divided by scaling_of_variables and sigma is unchanged, default is `np.ones(N)`',
 'seed': 'time  # random number seed for `numpy.random`; `None` and `0` equate to `time`, `np.nan` means "do nothing", see also option "randn"',
 'signals_filename': 'cma_signals.in  # read versatile options from this file (use `None` or `""` for no file) which contains a single options dict, e.g. ``{"timeout": 0}`` to stop, string-values are evaluated, e.g. "np.inf" is valid',
 'termination_callback': '[]  #v a function or list of functions returning True for termination, called in `stop` with `self` as argument, could be abused for side effects',
 'timeout': 'inf  #v stop if timeout seconds are exceeded, the string "2.5 * 60**2" evaluates to 2 hours and 30 minutes',
 'tolconditioncov': '1e14  #v stop if the condition of the covariance matrix is above `tolconditioncov`',
 'tolfacupx': '1e3  #v termination when step-size increases by tolfacupx (diverges). That is, the initial step-size was chosen far too small and better solutions were found far away from the initial solution x0',
 'tolupsigma': '1e20  #v sigma/sigma0 > tolupsigma * max(eivenvals(C)**0.5) indicates "creeping behavior" with usually minor improvements',
 'tolflatfitness': '1  #v iterations tolerated with flat fitness before termination',
 'tolfun': '1e-11  #v termination criterion: tolerance in function value, quite useful',
 'tolfunhist': '1e-12  #v termination criterion: tolerance in function value history',
 'tolfunrel': '0  #v termination criterion: relative tolerance in function value: Delta f current < tolfunrel * (median0 - median_min)',
 'tolstagnation': 'int(100 + 100 * N**1.5 / popsize)  #v termination if no improvement over tolstagnation iterations',
 'tolx': '1e-11  #v termination criterion: tolerance in x-changes',
 'transformation': 'None  # depreciated, use cma.fitness_transformations.FitnessTransformation instead.\n            [t0, t1] are two mappings, t0 transforms solutions from CMA-representation to f-representation (tf_pheno),\n            t1 is the (optional) back transformation, see class GenoPheno',
 'typical_x': 'None  # used with scaling_of_variables',
 'updatecovwait': 'None  #v number of iterations without distribution update, name is subject to future changes',
 'verbose': '3  #v verbosity e.g. of initial/final message, -1 is very quiet, -9 maximally quiet, may not be fully implemented',
 'verb_append': '0  # initial evaluation counter, if append, do not overwrite output files',
 'verb_disp': '100  #v verbosity: display console output every verb_disp iteration',
 'verb_disp_overwrite': 'inf  #v start overwriting after given iteration',
 'verb_filenameprefix': 'outcmaes/  # output path (folder) and filenames prefix',
 'verb_log': '1  #v verbosity: write data to files every verb_log iteration, writing can be time critical on fast to evaluate functions',
 'verb_log_expensive': 'N * (N <= 50)  # allow to execute eigendecomposition for logging every verb_log_expensive iteration, 0 or False for never',
 'verb_plot': '0  #v in fmin2(): plot() is called every verb_plot iteration',
 'verb_time': 'True  #v output timings on console',
 'vv': '{}  #? versatile set or dictionary for hacking purposes, value found in self.opts["vv"]'}

but this

# maxiter, nmessage, tol, optimizer and objective_boost are defined as in the snippets above
param = params_history[-1]
(
    partial_results,
    partial_params_history,
    partial_loss_history,
    partial_grads_history,
    partial_fluctuations,
    vqe,
) = train_vqe(
    deepcopy(ansatz_circ),
    ham,  # Fixed hamiltonian
    optimizer,
    param,
    tol=tol,
    niterations=maxiter, # Show log info
    nmessage=nmessage,
    loss=objective_boost,
    training_options={'maxiter': maxiter}
)
params_history.extend(np.array(partial_params_history))
loss_history.extend(np.array(partial_loss_history))
grads_history.extend(np.array(partial_grads_history))
fluctuations.extend(np.array(partial_fluctuations))

does not terminate:

...
INFO:root:Optimization iteration 30500/3500
INFO:root:Loss -10.61

At minimum I'd say there is a logging bug. It's not an issue for the paper submission, because you know how to run it, but for outside users this is difficult.

#101 might be fixing it?

@MatteoRobbiati
Collaborator

MatteoRobbiati commented Nov 26, 2024

Yes, this is kind of expected (but, as you say, it has to be clarified)!
At the moment, the INFO:root:Optimization iteration message is printed inside a callback of our cost function, i.e. it is shown every time the loss function is computed.
While the rule "1 loss computation = 1 iteration" holds for gradient descent, it does not hold for other optimizers, many of which require many loss evaluations per optimization iteration. I would probably remove this message whenever we are not using SGD, also because the other optimizers already have their own logging policy.
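A minimal, standalone sketch (not repo code) of why a per-evaluation log overcounts iterations with CMA-ES, using the cma package directly:

# Each CMA-ES iteration evaluates a whole population of candidates,
# so counting loss evaluations overestimates the number of iterations.
import cma
import numpy as np

n_evals = 0

def loss(x):
    # stand-in for the VQE loss; the counter mimics a per-evaluation log message
    global n_evals
    n_evals += 1
    return float(np.sum(np.asarray(x) ** 2))

es = cma.CMAEvolutionStrategy(8 * [0.5], 0.3, {"maxiter": 10, "verbose": -9})
es.optimize(loss)
print(f"iterations: {es.countiter}, loss evaluations: {n_evals}")
# loss evaluations ~ iterations * popsize, so a message printed per evaluation
# reports far more "iterations" than the optimizer actually performed.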
