Fast serialization of solutions to/from disk #106

Closed
whart222 opened this issue Jan 20, 2017 · 3 comments

Comments

@whart222
Member

We haven't considered performance when serializing solutions to/from disk, which could be a problem for large applications and/or scripts where this is done frequently. I spoke with Bill Evans about this, and he shared the following ideas:

SQLite
Pros: portable; random-access querying; binary storage; fast-ish read/write
Cons: schema may need to be rather rich/complex to support variably-nested blocks
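
As a rough illustration of the SQLite option, here is a minimal sketch using Python's standard-library sqlite3 module; the table and column names are hypothetical, and a real schema would need to represent variably nested blocks:

```python
import sqlite3

# Minimal sketch: store solution variable values in a single flat table.
# Table/column names are hypothetical; a real schema would also need to
# capture nested blocks (e.g., via a parent_block column or a blocks table).
conn = sqlite3.connect("solution.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS var_values ("
    "  block TEXT, name TEXT, value REAL, PRIMARY KEY (block, name))"
)
solution = {("model", "x[1]"): 1.0, ("model.sub", "y"): 2.5}
conn.executemany(
    "INSERT OR REPLACE INTO var_values VALUES (?, ?, ?)",
    [(blk, name, val) for (blk, name), val in solution.items()],
)
conn.commit()

# Random-access read of a single value without loading the whole file.
row = conn.execute(
    "SELECT value FROM var_values WHERE block = ? AND name = ?",
    ("model", "x[1]"),
).fetchone()
print(row[0])
conn.close()
```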

Raw JSON sidesteps the variable-structure problem and may be more readable/consumable. I think it is much more suitable than YAML for this purpose, since YAML tables do not appear to be implemented (or at least not easily; I may be wrong on this). You can improve the readability of the JSON by prettifying it, but that grows the file size significantly, and some portions of the JSON tree may not really need to be prettified. I imagine the largest problem with JSON would be read/write speed.
Pros: flexible; can be easily “prettified” for human readability
Cons: relatively inefficient storage and deserialization; no random access reading/writing
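
For comparison, a minimal JSON sketch using the standard-library json module; the nested-dict layout below is hypothetical and just shows the compact vs. prettified trade-off:

```python
import json

# Hypothetical nested-dict solution layout.
solution = {
    "model": {
        "objective": 42.0,
        "variables": {"x[1]": 1.0, "x[2]": 0.0},
        "sub_blocks": {"b1": {"variables": {"y": 2.5}}},
    }
}

# Compact form: smallest file, least readable.
with open("solution.json", "w") as f:
    json.dump(solution, f, separators=(",", ":"))

# Prettified form: human-readable, but noticeably larger on disk.
with open("solution_pretty.json", "w") as f:
    json.dump(solution, f, indent=2)

# Reading back requires deserializing the entire file (no random access).
with open("solution.json") as f:
    reloaded = json.load(f)
```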

There is a BSON (binary JSON) format that claims better storage and read/write performance, but I have not seen a lot of activity around it, nor can I find R or Python implementations.

ProtoBuf, Google's storage format, is advertised as "a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler" (ref: https://developers.google.com/protocol-buffers/docs/overview). There are implementations for R (RProtoBuf) and Python (protobuf).
Pros: fast, compact, very flexible
Cons: may be more complex to just dump nested dictionaries/tables; the Python implementation is currently reported as less mature and slow (protobuf GitHub)
Unknown: random-access?

Apache Avro is similar to ProtoBuf but comes from Apache (I have not worked with it yet): http://avro.apache.org/docs/current/
Unknown: random-access?
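
A rough sketch of the Avro option, assuming the third-party fastavro package; the flat record schema below is hypothetical, and nesting would need to be modeled explicitly:

```python
from fastavro import parse_schema, reader, writer

# Hypothetical flat record schema; a real one would need to model nested blocks.
schema = parse_schema({
    "name": "VarValue",
    "type": "record",
    "fields": [
        {"name": "block", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "value", "type": "double"},
    ],
})

records = [
    {"block": "model", "name": "x[1]", "value": 1.0},
    {"block": "model.sub", "name": "y", "value": 2.5},
]

with open("solution.avro", "wb") as f:
    writer(f, schema, records)

# Avro files are read back record-by-record (streaming, not random access).
with open("solution.avro", "rb") as f:
    for rec in reader(f):
        print(rec["block"], rec["name"], rec["value"])
```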

Feather (fast on-disk data frame storage)
Pros: fast, portable (at least between R and Python)
Cons: I believe it stores one data frame per file, so a model would require multiple files
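
A minimal Feather sketch with pandas (Feather support requires pyarrow); the one-data-frame-per-file layout is just for illustration:

```python
import pandas as pd

# One data frame per file: here, the variable values for a single model.
df = pd.DataFrame(
    {
        "block": ["model", "model.sub"],
        "name": ["x[1]", "y"],
        "value": [1.0, 2.5],
    }
)
df.to_feather("solution_vars.feather")  # requires pyarrow

# Fast round trip; the same file is readable from R via the feather/arrow packages.
reloaded = pd.read_feather("solution_vars.feather")
print(reloaded)
```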

@den-run-ai

den-run-ai commented Jan 20, 2017 via email

@rsmith54
Contributor

Hi, I was wondering if there is any news on this? Thanks!

@jsiirola
Member

jsiirola commented May 8, 2020

Archived on the master Performance Proposals Issue (#1430). Closing this performance proposal until active development has begun.

@jsiirola jsiirola closed this as completed May 8, 2020