-
Thanks for these questions. I'll do some experiments with the test problems you mention and formulate replies for you to read before we chat, as the replies may prompt a lot more questions!
-
If you're wanting to solve the relaxation of a MIP, use the option …. You're also using ….
-
One more pillar of benchmarking is to prevent self-timing. A system is properly measured from outside the system, so the SPEC CPU harness is responsible for timing the run. We recognize that the total binary execution time may include setup time that you don't usually measure and report; hopefully that setup time is minimal. Beyond ignoring the output of self-timing, we should actually disable the calls to the timer, since they take up valuable cycles. I looked into adding a runtime option to HiGHS to disable timing, but feeding that option all the way down to the modules that need it started getting cumbersome. Instead, something like this gets us what we need with minimal fuss.
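A minimal sketch of the idea, assuming HiGHS's wall-clock helper lives in `util/HighsTimer.h`; the guard macro here is ours for illustration, not an existing HiGHS or SPEC flag:

```cpp
// Sketch only: stub out the wall-clock read at compile time so the hot
// path never pays for a clock call. The SPEC harness times the run from
// outside, so the value returned here is never reported.
#include <chrono>

double getWallTime() {
#ifdef SPEC_NO_TIMING
  return 0.0;  // no syscall, no serializing clock read
#else
  using namespace std::chrono;
  return duration_cast<duration<double>>(
             steady_clock::now().time_since_epoch())
      .count();
#endif
}
```

With no time limit set, a constant return value is harmless, since downstream time-limit checks simply never trigger.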
Incidentally, when we apply this simple patch, we see a 4% performance improvement, proving that timing takes time. So there may be motivation to implement a command-line option for your users.
-
Yes (for reasons, see below), to the latest version, 1.7.2, unless we need to make some modifications to suit your purposes.
-
As discussed, we will identify a set of LP problems that each run in around 4 minutes on a modern system, use under 1.8 GB of memory, and yield the same simplex iteration count on all architectures we can test.
-
As discussed, the speed-up of the parallel simplex solver for LP is largely limited by the number of memory channels, because the typical simplex computation is: take an (integer) index and a (double) value from memory, multiply the component of a vector given by the index by the value, then either overwrite the vector entry with the result or add the result to an accumulating sum. Hopefully the whole of the vector is in cache; otherwise it may have to be read from memory, too! Hence speed-up tails off at 4-8 threads. Thread parallelism is the same in all versions of HiGHS.
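For the computer architects in the room, the kernel being described looks roughly like this (an illustrative loop, not HiGHS source):

```cpp
// Illustrative sparse update (not HiGHS source). Each iteration does an
// indexed load, a value load, a multiply-add, and an indexed store, so
// throughput is bounded by memory bandwidth, not arithmetic units.
void sparse_update(int nnz, const int* index, const double* value,
                   double multiplier, double* vec) {
  for (int k = 0; k < nnz; k++)
    vec[index[k]] += multiplier * value[k];  // gather, multiply-add, scatter
}
```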
-
LP problems typically have multiple solutions, and different settings of the random seed may lead to different ones being found. Since the algorithm is deterministic, there's no issue relating to your "equal work guarantee is only possible when there are zero solutions, since that means the entire search space was visited" situation. For LP, the entire search space is too large for all of it to be visited.
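A toy example of multiple optima (ours, for illustration): minimize $x_1 + x_2$ subject to $x_1 + x_2 \ge 1$ with $x_1, x_2 \ge 0$. Every point on the segment between $(1, 0)$ and $(0, 1)$ is optimal, so which vertex gets reported depends on the path the simplex method takes, even though any single run is perfectly deterministic.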
-
As discussed, HiGHS has its own random number generator (in `util/HighsRandom.h`) that has not changed since v1.5.3. As I understand it, it is (deliberately) compiler/architecture independent. Different settings of the `random_seed` option change the sequence it produces.
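From memory, the declaration in `lp_data/HighsOptions.h` looks something like the following; treat the identifiers as a sketch rather than exact source:

```cpp
// Sketch of an integer option record (identifiers from memory):
// name, description, advanced flag, pointer to the option's storage,
// then lower bound, default value, upper bound.
record_int = new OptionRecordInt(
    "random_seed", "Random seed used in HiGHS", advanced,
    &random_seed, 0, 0, kHighsIInf);
```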
where the last three parameters are the lower bound, default value and upper bound of an option.
-
For question 9, our testers are requesting a way to debug without having to rebuild the binary. Are there knobs we can enable to dump verbose logs?
-
Hello friends,
I’m a CPU architect at Ampere Computing, where I do performance analysis and workload characterization. I also serve on the SPEC CPU committee, working on benchmarks for the next version of SPEC CPU, CPUv8. We try to find computationally intensive workloads in diverse fields, to help measure performance across a wide variety of behaviors and application domains. Based on the longevity of HiGHS, its large active community in linear optimization fields, and its use in education, we have proposed that the HiGHS solver be included in the next set of marquee benchmarks in SPEC CPU.
As part of the effort, we have ported and integrated the HiGHS mainline code into the SPEC CPU harness so that it can be tested on a wide variety of systems in a controlled environment to produce reproducible results. It is a cross-company effort, and this work has been done mostly by my counterpart at Intel. We have both single-threaded and lightly-multi-threaded HiGHS command lines which run and produce the same output across many compilers (llvm, gcc, icc, aocc, nvhpc, cray), ISAs (aarch64, x86, power) and operating systems (linux, windows, android, macosx).
Each benchmark candidate undergoes intense scrutiny, so we are seeking some help and guidance on issues that have come up. I hope we can use this GitHub discussion to share and ask questions. Please be aware that we are computer architects and low-level systems engineers, not mathematicians! Here is our current set of questions.
- We commented out the lines in `lp_data/Highs.cpp` that print out the `simplex_iteration_count`, because the counts were varying between runs and that led to miscompares in the output files between systems/compiler options. Can you share what the simplex iteration count represents and how it can vary between runs? Is this how we can measure equal work, and verify it through some tolerance? E.g. could we say the run completed 20000 simplex iterations plus or minus 100, and that signifies equal work between systems? We do have some benchmarks that take a non-linear path to their result, and we check for almost-equal work by printing out how many steps they took, or some other intermediate metric that lets us view the work accomplished. (Just like a math professor asks a student to show their work!)
- What is the effect of `--random-seed`? How much is randomness used in the solver algorithms? For SPEC CPU we replace `std::rand` with our own “deterministic rand” to ensure the same sequence of random numbers is produced regardless of the system under test; however, we are still seeing some big variance (10% or more). If you can clarify the intention of the randomness, that can help us mitigate this.
- A miscompare shows up with `supportcast10.mps`, and is reproducible on one system. What is the best way to debug this? The expectation is that all systems would produce the same output, but when something like this happens we need to provide a way for the benchmark user to debug what happened. (Did their new compiler produce incorrect assembly, or apply unsafe optimizations?) Changing the `random_seed` fixed the issue on that system, but the error moved and showed up on another system, so we need to fix it the correct way.

We appreciate your input and willingness to help us! Please share your thoughts at your convenience.
Thank you!