As discussed in #828, most file-based database packages (including MontyDB in the already-implemented MontyStore) do not have any built-in protection against multiple Python processes (or threads) reading from or writing to the same database at the same time. This limits them to serial calculations and makes them poorly suited to high-throughput settings, where the odds of a collision are very high.
Rather than relying on each external package to implement a file-locking system, we should introduce a file-locking mechanism within maggma that can be applied to all file-based data stores. py-filelock and portalocker are both good platform-agnostic options, with the former perhaps being slightly more actively maintained. The MP monty package has built-in locking features, but in my opinion we are better off using a battle-tested solution, since such packages are usually light on dependencies anyway (and the lock mechanism used in fireworks often caused headaches...).
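To make the idea concrete, here is a minimal sketch of what a store-agnostic lock could look like with py-filelock. The helper name `locked_update`, the sidecar `.lock` file naming, and the JSON file layout are all hypothetical choices for illustration, not anything that exists in maggma; only `FileLock` and its `timeout` argument come from the filelock package.

```python
import json
import os
import tempfile

from filelock import FileLock  # third-party: pip install filelock


def locked_update(db_path, key, value, timeout=10.0):
    """Hypothetical helper: set one key in a JSON file under an exclusive lock."""
    # Assumption: the lock lives in a sidecar file next to the data file.
    lock = FileLock(db_path + ".lock", timeout=timeout)
    # Raises filelock.Timeout if the lock is not acquired within `timeout` seconds.
    with lock:
        data = {}
        if os.path.exists(db_path):
            with open(db_path) as f:
                data = json.load(f)
        data[key] = value
        # Write to a temporary file and rename it into place, so a reader
        # never sees a half-written file.
        tmp_path = db_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(data, f)
        os.replace(tmp_path, db_path)
    return data


# Usage: two sequential updates to the same file, each taken under the lock.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "store.json")
    locked_update(path, "a", 1)
    result = locked_update(path, "b", 2)
```

In a real store the same pattern would presumably wrap the connect/update/close calls rather than raw JSON I/O, and the whole read-modify-write cycle has to sit inside the lock for it to prevent lost updates.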
I'm jotting this down so that I don't forget. I don't have plans to work on this right now, but I will likely need to implement it one day in the future.
FYI: Here is what happens when two processes try to write to a MontyStore at the same time. It looks like MontyDB has a locking mechanism, but it doesn't support concurrent processes.
I had started some work to replace mongomock with an actual MongoDB instance in MemoryStore (see #846). Since JSONStore is backed by MemoryStore, I wonder whether doing this could also address the locking issue?
We have had success using JSONStore to run atomate2 workflows in low throughput, but I'm sure we would encounter a similar problem in high throughput.