CPLEXDirect performance improvements #1416

ruaridhw · 2020-04-28T20:12:06Z

This PR contains a number of performance improvements to the CPLEXDirect class, and by extension, the CPLEXPersistent class.

Notes

There are just a couple of functions (if not lines) that are resulting in very large bottlenecks. On the whole, things are quite performant as-is.

Batching "transactions" with CPLEX

The cplex package's Linear Constraints and Variable interfaces allow for batched transactions. I think an appropriate design is to generate all the necessary data and add these objects as one call to the solver_model. I've also removed unnecessary transactions such as resetting variable bounds immediately after adding that variable with an obsolete bound.
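A minimal sketch of the batching idea described above. `FakeVariables` is a hypothetical stand-in written only so the example runs without a CPLEX installation; its `add()` keywords (`lb`, `ub`, `types`, `names`) are modelled on the `cplex` variables interface, but nothing here calls the real library and this is not the code from the PR.

```python
class FakeVariables:
    """Hypothetical stand-in for a solver's variable interface.

    It only records how many times add() was called, so the two
    strategies below can be compared by number of "transactions".
    """

    def __init__(self):
        self.calls = 0
        self.names = []

    def add(self, lb, ub, types, names):
        self.calls += 1
        self.names.extend(names)


def add_one_by_one(variables, specs):
    # One solver transaction per variable.
    for name, lb, ub, vtype in specs:
        variables.add(lb=[lb], ub=[ub], types=[vtype], names=[name])


def add_batched(variables, specs):
    # Build all the column data first, then hand it over in a single call.
    names = [s[0] for s in specs]
    lbs = [s[1] for s in specs]
    ubs = [s[2] for s in specs]
    types = [s[3] for s in specs]
    variables.add(lb=lbs, ub=ubs, types=types, names=names)


specs = [("x%d" % i, 0.0, 10.0, "C") for i in range(1000)]

slow, fast = FakeVariables(), FakeVariables()
add_one_by_one(slow, specs)
add_batched(fast, specs)
print(slow.calls, fast.calls)  # 1000 transactions versus 1
```

Both strategies register the same variables in the same order; only the number of round trips to the solver object differs.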

Querying results from CPLEX

Asking the cplex solution for specific variables' values is extremely slow because of the conversion cplex performs to look up the variables you've asked for. It is orders of magnitude faster to get the full solution vector. Even worse is using the "specific variable interface" to get the full solution vector. If we must get specific variables (i.e. _load_vars()) then their indices should be used instead of their names.
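The access pattern argued for here can be sketched as follows. `FakeSolution` and the `var_index` mapping are hypothetical stand-ins (no CPLEX installation is needed to run this); the point is only to show "fetch the whole vector once, then index by integer" rather than asking for values name by name.

```python
class FakeSolution:
    """Hypothetical stand-in for a solver solution object."""

    def __init__(self, values):
        self._values = list(values)
        self.calls = 0

    def get_values(self):
        # Return the whole solution vector in one call.
        self.calls += 1
        return list(self._values)


# Column index of each (illustrative) variable, recorded once when the
# variables were added to the solver.
var_index = {"x": 0, "y": 1, "z": 2}
solution = FakeSolution([1.0, 2.5, 0.0])

# Fetch the full vector once ...
all_values = solution.get_values()

# ... then pick out the variables of interest by integer index.
wanted = ["x", "z"]
loaded = {name: all_values[var_index[name]] for name in wanted}
print(loaded)
```

Keeping a name-to-index map on the Pyomo side is what makes index-based access possible when only a subset of variables is needed.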

Expected performance vs LP solvers

It would be interesting to benchmark writing the LP file specifically against the total time spent interfacing with the CPLEX library in isolation. I think it's a fallacy to suggest that the Direct interfaces should be faster because "there's no file IO", as suggested in GitHub issues and on SO. Once the repn has been generated, writing a string to disk takes next to no time. If anything, I would expect interfacing with a third-party library to take more time. To me, the gains are realised when using the Persistent interfaces to apply incremental re-solves of a model. The LP solver has to regenerate the entire model whereas the Persistent interface only has to generate the changes.
Not to mention that a slight decline in performance is a small price to pay for access to the full advanced functionality of the cplex package for those who need it.

Benchmark

Ran the "benchmark-ish" case in #51 ("With cplex, solver_io='python' is much slower than solver_io='nl' or default") ten times for each solver:

| Solver | `master` | This branch |
|---|---|---|
| LP | 10.74s to 11.06s. Mean 10.84s | 10.94s to 11.27s. Mean 11.08s |
| Direct | 32.48s to 34.00s. Mean 33.14s | 11.67s to 12.23s. Mean 12.00s |
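To put the table in relative terms, the mean times can be compared directly. The numbers below are copied from the table; the variable names are only illustrative.

```python
# Mean wall-clock times in seconds, copied from the benchmark table.
master = {"LP": 10.84, "Direct": 33.14}
branch = {"LP": 11.08, "Direct": 12.00}

for solver in master:
    # Ratio > 1 means this branch is faster than master.
    speedup = master[solver] / branch[solver]
    # Signed percentage change in mean time relative to master.
    change = (branch[solver] - master[solver]) / master[solver] * 100
    print("%s: %.2fx speedup, %+.1f%% change in mean time" % (solver, speedup, change))
```

On these means the Direct interface runs about 2.76 times faster than on `master`, while the LP run is roughly 2% slower.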

- There was still a speedup when I set n_steps = 10, so it's not as though we're optimising for very large models only.
- The total time is now almost wholly taken up by (a) model build and (b) generate_standard_repn(), both of which either interface has to do (and which are therefore out of scope here).
- There are still a few inefficiencies but I didn't bother digging / fixing to shave part-seconds on a large test case. For example, generators are slower than list comprehensions. There are many places where generators are used where the iterable is sufficiently small to consume into memory (e.g. ComponentSet.update()).
- The LP solver also appears to have gotten slightly slower, for whatever reason. I reran it as a control and I don't really have an explanation for that. I might need the reviewer's help in case I've somehow inadvertently impacted that solver.
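The generator-versus-list-comprehension remark can be checked with a small timing harness. This is only a sketch using the standard-library `timeit` module on an arbitrary workload; the function names are illustrative, and the actual gap depends on the Python version and the size of the iterable, so it is worth measuring rather than assuming.

```python
import timeit

data = list(range(1000))


def build_with_generator():
    # list() pulls items one at a time out of a generator expression.
    return list(x * 2 for x in data)


def build_with_comprehension():
    # The list comprehension builds the list directly.
    return [x * 2 for x in data]


# Both approaches must produce the same result for the comparison to be fair.
assert build_with_generator() == build_with_comprehension()

t_gen = timeit.timeit(build_with_generator, number=2000)
t_list = timeit.timeit(build_with_comprehension, number=2000)
print("generator: %.4fs  list comprehension: %.4fs" % (t_gen, t_list))
```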

TODO

Unit tests

Legal Acknowledgement

I agree my contributions are submitted under the BSD license.
I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

codecov · 2020-04-28T20:18:04Z

Codecov Report

Merging #1416 into master will increase coverage by 1.05%.
The diff coverage is 95.04%.

@@            Coverage Diff             @@
##           master    #1416      +/-   ##
==========================================
+ Coverage   71.57%   72.62%   +1.05%     
==========================================
  Files         547      631      +84     
  Lines       83583    88074    +4491     
==========================================
+ Hits        59821    63961    +4140     
- Misses      23762    24113     +351

Impacted Files	Coverage Δ
pyomo/solvers/plugins/solvers/cplex_direct.py	`79.36% <95.04%> (+6.26%)`	⬆️
pyomo/contrib/pynumero/linalg/mumps_solver.py	`7.40% <0.00%> (-80.25%)`	⬇️
pyomo/dataportal/plugins/sheet.py	`76.08% <0.00%> (-8.70%)`	⬇️
pyomo/neos/kestrel.py	`76.98% <0.00%> (-4.02%)`	⬇️
pyomo/common/config.py	`97.00% <0.00%> (-2.00%)`	⬇️
...les/pysp/scripting/apps/generate_distributed_NL.py	`71.28% <0.00%> (-1.99%)`	⬇️
pyomo/solvers/plugins/converter/glpsol.py	`92.98% <0.00%> (-1.76%)`	⬇️
...mples/pysp/scripting/apps/compile_scenario_tree.py	`87.39% <0.00%> (-1.69%)`	⬇️
pyomo/solvers/plugins/solvers/GLPK.py	`82.84% <0.00%> (-1.46%)`	⬇️
pyomo/mpec/complementarity.py	`98.42% <0.00%> (-1.03%)`	⬇️
... and 133 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c3f888c...41428a0.

blnicho · 2020-05-05T19:02:17Z

We're going to mark this as a WIP until tests are added.


ruaridhw · 2020-05-07T07:20:34Z

This is ready for review though I'm not sure who would be best placed. Perhaps @michaelbynum?

jsiirola

Overall I think this is really good. My biggest concern is around the licensing of nullcontext, but I think that is easy to get around.

One final question: your new tests all rely on mock. Are there at least a few tests of this functionality that actually engage the cplex solver (just so that our tests will catch if the underlying cplex argument API drifts)?

jsiirola · 2020-05-08T20:38:28Z

pyomo/solvers/plugins/solvers/cplex_direct.py

+# `nullcontext()` is part of the standard library as of Py3.7
+# This is verbatim from `cpython/Lib/contextlib.py`
+class nullcontext(object):
+    """Context manager that does no additional processing.
+    Used as a stand-in for a normal context manager, when a particular
+    block of code is only sometimes used with a normal context manager:
+    cm = optional_cm if condition else nullcontext()
+    with cm:
+        # Perform operation, using optional_cm if condition is True
+    """
+
+    def __init__(self, enter_result=None):
+        self.enter_result = enter_result
+
+    def __enter__(self):
+        return self.enter_result
+
+    def __exit__(self, *excinfo):
+        pass


This is problematic, as that snippet is covered under the PSF License. While there is no reason we couldn't include it in Pyomo, that would force us to update all the licensing statements. If we really need nullcontext, I think a better option would be a dependency on contextlib2.

That said, given that the context manager is only used in two places, I think I would prefer switching __exit__ to an explicit store_in_cplex (or equivalent) method, especially because that is a fairly fundamental part of the solver interface, and it is a bit obscure to have that happen as a side effect of a context manager.

The place where you rely on a nullcontext could just as easily (and probably more performant) be written as:

_cplex_var_data = cplex_var_data if cplex_var_data is not None \
    else _VariableData(self._solver_model)
_cplex_var_data.add(lb=lb, ub=ub, type_=vtype, name=varname)
if cplex_var_data is None:
    _cplex_var_data.store_in_cplex()
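The explicit-flush pattern suggested in this comment can be sketched in a self-contained way. `FakeSolverModel`, `VariableBuffer`, `add_variables()` and `store_in_solver()` are hypothetical names chosen for illustration (the PR's own classes are `_VariableData` and `store_in_cplex()`); the sketch only shows how one function can serve both the standalone case and the block-level batched case without a context manager.

```python
class FakeSolverModel:
    """Hypothetical stand-in for the solver model; counts batched calls."""

    def __init__(self):
        self.calls = 0
        self.names = []

    def add_variables(self, lb, ub, types, names):
        self.calls += 1
        self.names.extend(names)


class VariableBuffer:
    """Illustrative buffer in the spirit of _VariableData (not the PR's code)."""

    def __init__(self, solver_model):
        self._solver_model = solver_model
        self.lb, self.ub, self.types, self.names = [], [], [], []

    def add(self, lb, ub, type_, name):
        self.lb.append(lb)
        self.ub.append(ub)
        self.types.append(type_)
        self.names.append(name)

    def store_in_solver(self):
        # Explicit flush: one call hands all buffered data to the solver.
        self._solver_model.add_variables(self.lb, self.ub, self.types, self.names)


def add_var(solver_model, name, lb, ub, vtype, buffer=None):
    # Use the caller's buffer if one was given; otherwise make a local one.
    local = buffer if buffer is not None else VariableBuffer(solver_model)
    local.add(lb=lb, ub=ub, type_=vtype, name=name)
    if buffer is None:
        # Nobody else will flush a local buffer, so do it now.
        local.store_in_solver()


model = FakeSolverModel()
add_var(model, "x", 0.0, 1.0, "C")  # standalone: flushed immediately

shared = VariableBuffer(model)
for i in range(3):
    add_var(model, "y%d" % i, 0.0, 1.0, "C", buffer=shared)
shared.store_in_solver()  # block-level: flushed once for all three

print(model.calls, model.names)
```

Making the flush a named method keeps the moment at which data reaches the solver visible at the call site, which is the concern raised about hiding it in `__exit__`.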

jsiirola · 2020-05-08T21:48:11Z

pyomo/solvers/plugins/solvers/cplex_direct.py

+    def _add_block(self, block):
+        with _VariableData(self._solver_model) as cplex_var_data:
+            for var in block.component_data_objects(
+                ctype=pyomo.core.base.var.Var, descend_into=True, active=True, sort=True


Here (and elsewhere), I believe that you can make this more performant (avoid unnecessary sorting) by using sort=SortComponents.deterministic instead of True.

ruaridhw · 2020-05-10

Happy to change this, just note it would be a behaviour change whereas the rest of this MR maintains identical behaviour to the existing implementation.

jsiirola · 2020-05-11

Good point. Preserving behavior is a good thing. We can revisit sorting later (probably as part of #677 / #1030).

ruaridhw · 2020-05-10T15:38:39Z

> Overall I think this is really good. My biggest concern is around the licensing of nullcontext, but I think that is easy to get around.

Interesting! I never thought about there being an issue with copying CPython code given all of its standard libraries are used anyway. Happy to drop the contextmanager design, your alternative is much more explicit.

> One final question: your new tests all rely on mock. Are there at least a few tests of this functionality that actually engage the cplex solver (just so that our tests will catch if the underlying cplex argument API drifts)?

All of the tests engage the CPLEX solver. mock is only used to "spy" on the API calls to the CPLEX solver rather than mocking the solver object. The tests can therefore assert on the underlying arguments that Pyomo provided CPLEX in addition to ensuring the API call actually worked. This is illustrated by assertions such as:

self.assertEqual(opt._solver_model.linear_constraints.get_num(), 1)

which would fail if _solver_model was not an actual cplex.Cplex() object.
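One common way to "spy" on a real object with the standard library is `unittest.mock.patch.object(..., wraps=...)`, sketched below. `Counter` is a hypothetical stand-in for the solver object, and whether the PR's tests use exactly this form is an assumption; the sketch only shows that the real method still runs while its arguments are recorded.

```python
from unittest import mock


class Counter:
    """Tiny real object standing in for the solver model in this sketch."""

    def __init__(self):
        self.total = 0

    def add(self, amount):
        self.total += amount
        return self.total


counter = Counter()

# wraps= forwards every call to the real method while recording the arguments.
with mock.patch.object(counter, "add", wraps=counter.add) as spy:
    counter.add(3)
    counter.add(amount=4)

# Assertions on the arguments that were passed ...
spy.assert_any_call(3)
spy.assert_called_with(amount=4)

# ... and on the state of the real object, which was genuinely updated.
print(counter.total, spy.call_count)  # 7 2
```

Because the wrapped method really executes, a test written this way fails both when the arguments change and when the underlying call itself stops working.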


ruaridhw · 2020-05-11T09:10:27Z

> Happy to drop the contextmanager design, your alternative is much more explicit.

I've updated accordingly.

> the rest of this MR maintains identical behaviour to the existing implementation

Just to add a caveat to this: the user-facing behaviour is identical; however, we are now adding linear constraints to the CPLEX object after quadratic and SOS constraints. I guess it's fine for this to change since it is an internal implementation detail, and I don't think CPLEX is actually affected by the order. The order of the constraints within each interface (Linear, Quadratic, SOS) is still the same.

jsiirola

This looks good. The GitHub Actions test failures are unrelated (NEOS issues / GHA instability).

michaelbynum

@ruaridhw Overall, this is great. We really appreciate these changes. I do have one comment below regarding quadratic objectives that needs to be addressed.

michaelbynum · 2020-05-14T15:20:08Z

pyomo/solvers/plugins/solvers/cplex_direct.py

@@ -426,7 +539,6 @@ def _set_objective(self, obj):
            self._objective = None

        self._solver_model.objective.set_linear([(i, 0.0) for i in range(len(self._pyomo_var_to_solver_var_map.values()))])
-        self._solver_model.objective.set_quadratic([[[0], [0]] for i in self._pyomo_var_to_solver_var_map.keys()])


@ruaridhw I don't think this line should be removed completely. However, it can be simplified to:

self._solver_model.objective.set_quadratic([0]*len(self._pyomo_var_to_solver_var_map))

If this line is not included, then the quadratic part of the objective will not be updated correctly between solves (for the persistent interface).

ruaridhw · 2020-05-14

@michaelbynum, could you provide a small example of how this would break? I'd like to add a test case to that effect.

michaelbynum · 2020-05-16

@ruaridhw This should demonstrate the issue:

import pyomo.environ as pe

m = pe.ConcreteModel()
m.x = pe.Var(bounds=(-2, 2))
m.y = pe.Var(bounds=(-2, 2))
m.obj = pe.Objective(expr=m.x**2 + m.y**2)
m.c1 = pe.Constraint(expr=m.y >= 2*m.x - 1)
m.c2 = pe.Constraint(expr=m.y >= -m.x + 2)
opt = pe.SolverFactory('cplex_persistent')
opt.set_instance(m)
opt.solve()
print(m.x.value, m.y.value)  # should be 1, 1

del m.obj
m.obj = pe.Objective(expr=m.x**2)
opt.set_objective(m.obj)
opt.solve()
print(m.x.value, m.y.value)  # should be 0, 2 but result is 1, 1

opt.set_instance(m)
opt.solve()
print(m.x.value, m.y.value)  # to demonstrate that the result should be 0, 2

ruaridhw · 2020-05-21

@michaelbynum, I've added a fix and this test case. The main issue I had was that the quadratic objective interface shouldn't be triggered if the model is a MILP. Let me know if 1b555dc is acceptable.

michaelbynum

Looks great.

michaelbynum · 2020-05-21T14:05:26Z

I think the test failures are due to older cplex versions. It looks like set_quadratic([0]*num_cols) only works in newer versions of cplex...

jsiirola · 2020-05-21T16:08:30Z

> I think the test failures are due to older cplex versions. It looks like set_quadratic([0]*num_cols) only works in newer versions of cplex...

Per the CPLEX documentation:

If the quadratic objective function is separable, the entries of the list must all be of type float

I looked in lib/python3.5/site-packages/cplex/_internal/_subinterfaces.py, and there is an explicit test against the float type.

Changing [0] * num_cols to [0.] * num_cols should fix things.
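The difference between the two spellings is easy to see in plain Python. The helper name `as_float_coefficients` below is hypothetical; the sketch only illustrates why a strict `isinstance(..., float)` check rejects integer zeros and how coercing the coefficients avoids that.

```python
def as_float_coefficients(coeffs):
    """Coerce every coefficient to float so a strict type check accepts it."""
    return [float(c) for c in coeffs]


num_cols = 3
int_zeros = [0] * num_cols
float_zeros = [0.] * num_cols

# A Python int is not an instance of float, even though 0 == 0.0 is True.
print(all(isinstance(c, float) for c in int_zeros))    # False
print(all(isinstance(c, float) for c in float_zeros))  # True

# Explicit coercion makes integer input acceptable as well.
coerced = as_float_coefficients(int_zeros)
print(all(isinstance(c, float) for c in coerced))      # True
```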

michaelbynum · 2020-05-21T16:11:25Z

@jsiirola Oh, good catch!

ruaridhw · 2020-05-21T16:45:38Z

Great spot, thanks both. I've added explicit conversion of the coeffs to float inside _CplexExpr as well.

jsiirola · 2020-05-21T16:54:07Z

Fantastic! Once this round of tests pass we will merge this.

jsiirola · 2020-05-21T18:30:45Z

The one failing test was due to a codecov upload problem. Merging.

jsiirola added the AT: PRE-TEST INSPECTED label Apr 29, 2020

pyomo-autotest removed the AT: PRE-TEST INSPECTED label Apr 29, 2020

blnicho changed the title ~~CPEXDirect performance improvements~~ CPLEXDirect performance improvements May 5, 2020

blnicho changed the title ~~CPLEXDirect performance improvements~~ [WIP] CPLEXDirect performance improvements May 5, 2020

ruaridhw force-pushed the perf/cplex_direct branch from d70f719 to bb84c73 Compare May 6, 2020 12:31

ruaridhw added 9 commits May 6, 2020 13:31

⚡ Initialise CplexExpr with full list

49039a7

- This is more efficient than always calling `.append()` on an empty list

🔨 No need to set an empty quadratic coef

b225180

⚡ Avoid calling set_bounds() when not necessary

f61fe42

🚨 Test nullcontext()

7ff9742

🚨 Test CPLEX Data Containers

4cd2c5a

🚨 Test _add_var()

274f9e5

🚨 Test _add_constraint()

6414dda

ruaridhw force-pushed the perf/cplex_direct branch from bb84c73 to 099cafd Compare May 6, 2020 12:35

ruaridhw changed the title ~~[WIP] CPLEXDirect performance improvements~~ CPLEXDirect performance improvements May 6, 2020

🚨 Test load_vars()

e363afe

ruaridhw force-pushed the perf/cplex_direct branch from 099cafd to e363afe Compare May 6, 2020 13:41

blnicho requested review from jsiirola and michaelbynum May 7, 2020 21:32

blnicho added the AT: PRE-TEST INSPECTED label May 7, 2020

blnicho self-requested a review May 7, 2020 21:34

pyomo-autotest removed the AT: PRE-TEST INSPECTED label May 7, 2020

jsiirola added the AT: PRE-TEST INSPECTED label May 8, 2020

jsiirola reviewed May 8, 2020

pyomo-autotest removed the AT: PRE-TEST INSPECTED label May 8, 2020

ruaridhw added 2 commits May 11, 2020 09:32

🔨 Use store_in_cplex() instead of ctxmanager

e0ab4f6

- Calling a method to "finalise" the data objects is more explicit than `__exit__()` and doesn't rely on `nullcontext()` from CPython

📚 Formatting

742d8bc

jsiirola added the AT: PRE-TEST INSPECTED label May 11, 2020

pyomo-autotest removed the AT: PRE-TEST INSPECTED label May 11, 2020

jsiirola approved these changes May 11, 2020

michaelbynum requested changes May 14, 2020

🔨 Refactor quadratic objective handling

fc0b8cc

ruaridhw force-pushed the perf/cplex_direct branch from 1b555dc to fc0b8cc Compare May 21, 2020 10:52

michaelbynum approved these changes May 21, 2020

jsiirola added the AT: PRE-TEST INSPECTED label May 21, 2020

pyomo-autotest removed the AT: PRE-TEST INSPECTED label May 21, 2020

🐛 Cast quadratic coefficients to floats explicitly

41428a0

jsiirola added the AT: PRE-TEST INSPECTED label May 21, 2020

pyomo-autotest removed the AT: PRE-TEST INSPECTED label May 21, 2020

jsiirola merged commit c89e2ce into Pyomo:master May 21, 2020

ruaridhw mentioned this pull request May 27, 2020

Pyomo#1416 flexciton/pyomo#9

Merged

ruaridhw referenced this pull request in flexciton/pyomo May 27, 2020

Merge pull request #9 from flexciton/perf/cplex_direct

6135420

Pyomo#1416

ruaridhw mentioned this pull request Jun 8, 2020

Unused variables are added to Direct solvers #1490

Open

michaelbynum mentioned this pull request Jun 19, 2020

Efficiency of MOSEK interface #1513

Closed
