-
Notifications
You must be signed in to change notification settings - Fork 525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CPLEXDirect performance improvements #1416
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1416 +/- ##
==========================================
+ Coverage 71.57% 72.62% +1.05%
==========================================
Files 547 631 +84
Lines 83583 88074 +4491
==========================================
+ Hits 59821 63961 +4140
- Misses 23762 24113 +351
Continue to review full report at Codecov.
|
We're going to mark this as a WIP until tests are added. |
d70f719
to
bb84c73
Compare
- This is more efficient than always calling `.append()` on an empty list
Asking the `cplex` solution for *specific* variables' values is extremely slow due to the conversion on `cplex`'s end to lookup the variables you've asked for. It is orders of magnitude faster to get the full solution vector. Even worse, using the "specific variable interface" to get the full solution vector. If we must get specific variables (ie. `_load_vars()`) then their index should be used instead of their name.
The `cplex` package's Linear Constraints and Variable interfaces allow for batched transactions. I think an appropriate design is to generate all the necessary data and add these objects as one call to the `solver_model`. I've also removed unnecessary transactions such as resetting variable bounds immediately after adding that variable with an obsolete bound.
bb84c73
to
099cafd
Compare
099cafd
to
e363afe
Compare
This is ready for review though I'm not sure whom would be best placed. Perhaps @michaelbynum? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think this is really good. My biggest concern is around the licensing of nullcontext
, but I think that is easy to get around.
One final question: your new tests all rely on mock. Are there at least a few tests of this functionality that actually engage the cplex solver (just so that our tests will catch if the underlying cplex argument API drifts)?
# `nullcontext()` is part of the standard library as of Py3.7 | ||
# This is verbatim from `cpython/Lib/contextlib.py` | ||
class nullcontext(object): | ||
"""Context manager that does no additional processing. | ||
Used as a stand-in for a normal context manager, when a particular | ||
block of code is only sometimes used with a normal context manager: | ||
cm = optional_cm if condition else nullcontext() | ||
with cm: | ||
# Perform operation, using optional_cm if condition is True | ||
""" | ||
|
||
def __init__(self, enter_result=None): | ||
self.enter_result = enter_result | ||
|
||
def __enter__(self): | ||
return self.enter_result | ||
|
||
def __exit__(self, *excinfo): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is problematic, as that snippet is covered under the PSF License. While there is no reason we couldn't include it in Pyomo, that would force us to update all the licensing statements. If we really need nullcontext, I think a better option would be a dependency on contextlib2
.
That said, given that the context manager is only used in two places, I think I would prefer switching __exit__
to an explicit store_in_cplex
(or equivalent) method, especially because that is a fairly fundamental part of the solver interface, and it is a bit obscure to have that happen as a side effect of a context manager.
The place where you rely on a nullcontext could just as easily (and probably more performant) be written as:
_cplex_var_data = cplex_var_data if cplex_var_data is not None \
else _VariableData(self._solver_model)
_cplex_var_data.add(lb=lb, ub=ub, type_=vtype, name=varname)
if cplex_var_data is None:
_cplex_var_data.store_in_cplex()
def _add_block(self, block): | ||
with _VariableData(self._solver_model) as cplex_var_data: | ||
for var in block.component_data_objects( | ||
ctype=pyomo.core.base.var.Var, descend_into=True, active=True, sort=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here (and elsewhere), I believe that you can make this more performant (avoid unnecessary sorting) by using sort=SortComponents.deterministic
instead of True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to change this, just note it would be a behaviour change whereas the rest of this MR maintains identical behaviour to the existing implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting! I never thought about there being an issue with copying CPython code given all of its standard libraries are used anyway. Happy to drop the contextmanager design, your alternative is much more explicit.
All of the tests engage the CPLEX solver. self.assertEqual(opt._solver_model.linear_constraints.get_num(), 1) which would fail if |
- Calling a method to "finalise" the data objects is more explicit than `__exit__()` and doesn't rely on `nullcontext()` from CPython
I've updated accordingly.
Just to add a caveat to this... The user-facing behaviour is identical however we are now adding linear constraints to the CPLEX object after quadratic and SOS constraints. I guess it's fine for this to change though since this is an internal implementation detail and I don't think CPLEX is actually impacted if this order is changed. The order of the constraints within each interface (Linear, Quadratic, SOS) is still the same. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. The GitHub Actions test failures are unrelated (NEOS issues / GHA instability).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruaridhw Overall, this is great. We really appreciate these changes. I do have one comment below regarding quadratic objectives that needs addressed.
@@ -426,7 +539,6 @@ def _set_objective(self, obj): | |||
self._objective = None | |||
|
|||
self._solver_model.objective.set_linear([(i, 0.0) for i in range(len(self._pyomo_var_to_solver_var_map.values()))]) | |||
self._solver_model.objective.set_quadratic([[[0], [0]] for i in self._pyomo_var_to_solver_var_map.keys()]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruaridhw I don't think this line should be removed completely. However, it can be simplified to:
self._solver_model.objective.set_quadratic([0]*len(self._pyomo_var_to_solver_var_map))
If this line is not included, then the quadratic part of the objective will not be updated correctly between solves (for the persistent interface).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelbynum, could you provide a small example of how this would break? I'd like to add a test case to that effect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ruaridhw This should demonstrate the issue:
import pyomo.environ as pe
m = pe.ConcreteModel()
m.x = pe.Var(bounds=(-2, 2))
m.y = pe.Var(bounds=(-2, 2))
m.obj = pe.Objective(expr=m.x**2 + m.y**2)
m.c1 = pe.Constraint(expr=m.y >= 2*m.x - 1)
m.c2 = pe.Constraint(expr=m.y >= -m.x + 2)
opt = pe.SolverFactory('cplex_persistent')
opt.set_instance(m)
opt.solve()
print(m.x.value, m.y.value) # should be 1, 1
del m.obj
m.obj = pe.Objective(expr=m.x**2)
opt.set_objective(m.obj)
opt.solve()
print(m.x.value, m.y.value) # should be 0, 2 but result is 1, 1
opt.set_instance(m)
opt.solve()
print(m.x.value, m.y.value) # to demonstrate that the result should be 0, 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michaelbynum, I've added a fix and this test case. The main issue I had was that the quadratic objective interface shouldn't be triggered if the model is a MILP. Let me know if 1b555dc is acceptable.
1b555dc
to
fc0b8cc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great.
I think the test failures are due to older cplex versions. It looks like |
Per the CPLEX documentation:
I looked in Changing |
@jsiirola Oh, good catch! |
Great spot, thanks both. I've added explicit conversion of the coeffs to |
Fantastic! Once this round of tests pass we will merge this. |
The one failing test was due to a codecov upload problem. Merging. |
This PR contains a number of performance improvements to the
CPLEXDirect
class, and by extension, theCPLEXPersistent
class.Notes
There are just a couple of functions (if not lines) that are resulting in very large bottlenecks. On the whole, things are quite performant as-is.
Batching "transactions" with CPLEX
The
cplex
package's Linear Constraints and Variable interfaces allow for batched transactions. I think an appropriate design is to generate all the necessary data and add these objects as one call to thesolver_model
. I've also removed unnecessary transactions such as resetting variable bounds immediately after adding that variable with an obsolete bound.Querying results from CPLEX
Asking the
cplex
solution for specific variables' values is extremely slow due to the conversion oncplex
's end to lookup the variables you've asked for. It is orders of magnitude faster to get the full solution vector. Even worse, using the "specific variable interface" to get the full solution vector. If we must get specific variables (ie._load_vars()
) then their index should be used instead of their name.Expected performance vs LP solvers
It would be interesting to benchmark specifically writing the LP file vs the total time spent interfacing with the CPLEX library in isolation. I think it's a fallacy to suggest that the Direct interfaces should be faster because "there's no file IO" as suggested in GitHub issues and on SO. Once the repn has been generated, writing a string to disk takes no time at all. If anything I would imagine interfacing with a third-party library should be expected to take more time. To me, the gains are realised when using the Persistent interfaces to apply incremental resolves of a model. The LP solver has to regen the entire model whereas the Persistent interface only has to generate the changes.
Not to mention that a slight decline in performance is a small price to pay for access to the full advanced functionality of the
cplex
package for those who need it.Benchmark
master
n_steps = 10
so it's not as though we're optimising for very large models onlygenerate_standard_repn()
which both of these interfaces have to do (and is therefore out-of-scope here).There are many places where generators are used where the iterable is sufficiently small to consume into memory (eg.
ComponentSet.update()
)TODO
Related
Legal Acknowledgement
By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution: