Discuss the usage of an object store in favor of an RDBMS to persist entities #1753

Open
phavekes opened this issue Nov 29, 2024 · 6 comments

Comments

@phavekes
Member

This issue is imported from Pivotal. Originally created on Feb 11, 2020 by bstrooband.

To persist entities, I think we would be better off using an object store than a relational database. Now that we have dropped eduGAIN support, we only query for entities by EntityId, so this is feasible.

Some benefits:

  • schemaless (no migrations)
  • we could easily update entities one-by-one instead of all-at-once
  • we could get rid of Doctrine, which will require an extra release before cleanup (fields in selects based on annotations)
  • the removal of Doctrine prevents extensive hydration
  • better horizontal scaling
  • multi-language support would not require additional fields

Maybe it's too early to do this now because of the procedural style of Corto, which will make refactoring harder. So maybe Corto needs to be refactored out before doing this. But I would like to open up this discussion beforehand.
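
To make the listed benefits concrete, here is a minimal sketch of entities stored as documents and looked up only by EntityId. It is an illustration under assumptions, not a proposed implementation: `EntityStore`, its `put`/`get` methods, the field names, and the example entity ID are all hypothetical, and a plain in-memory dict stands in for a real object store.

```python
import copy
from typing import Any, Dict, Optional


class EntityStore:
    """Hypothetical in-memory stand-in for an object store keyed by EntityId."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, entity: Dict[str, Any]) -> None:
        # Each entity is written on its own, so updates can happen one-by-one.
        self._docs[entity["entityId"]] = copy.deepcopy(entity)

    def get(self, entity_id: str) -> Optional[Dict[str, Any]]:
        # The only access path is a lookup by EntityId.
        doc = self._docs.get(entity_id)
        return copy.deepcopy(doc) if doc is not None else None


store = EntityStore()
entity_id = "https://sp.example.org/metadata"  # made-up example ID

store.put({"entityId": entity_id, "name": {"en": "Example SP"}})

# Adding a language changes one document; no schema migration is involved.
store.put({"entityId": entity_id, "name": {"en": "Example SP", "nl": "Voorbeeld SP"}})
```

In this shape, a new translation is just another key inside the `name` map, which is what "multi-language support would not require additional fields" amounts to.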

@phavekes
Member Author

@thijskh, @michielkodde what are your thoughts on this? (bstrooband - Feb 11, 2020)

@phavekes
Member Author

My thought is that Manage stores everything in MongoDB (Thijs Kinkhorst - Feb 11, 2020)

@phavekes
Member Author

@thijskh would you be so kind as to elaborate on what you're implying with this? (bstrooband - Feb 12, 2020)

@phavekes
Member Author

So we have a schemaless non-RDBMS store of entities already available.

Using it directly would imply that we no longer need any metadata push and any saves are effective immediately.
Using it directly would also imply that this MongoDB becomes part of the critical path and must be very highly available and responsive. (Thijs Kinkhorst - Feb 12, 2020)

@phavekes
Member Author

Is the metadata push currently a functional requirement (maybe in order to validate the changes before publishing them)? Or would you want to move to a situation where changes are indeed immediate?

I can imagine that an RDBMS solution was chosen for historical reasons. The reason I ask is to know where we are coming from and where we want to go in the long run.
(bstrooband - Feb 12, 2020)

@phavekes
Member Author

It's not a functional requirement. It has been done for performance and availability reasons. EB has all the data it needs at runtime readily available in its own database, so any other component can fail or be slow and logging in will still work and be fast.

In the past EB queried Janus' API for each login. That turned out to be way too slow, problematically so. Janus needs to do much more than EB, e.g. version control, whereas EB can just use the "current" state of available entities. After an intermediate caching workaround, the database table with a push to update it was used as a solution. EB already needs an HA, replicated DB, so adding just one table to it means the platform does not need an extra complex distributed database type next to it. Processing of changes is relatively slow but each login is fast.

Since then the environment has changed. Manage has replaced Janus, but for scope reasons it was developed, from an EB point of view, as a drop-in replacement: we didn't want to intertwine the two projects and create a big scope. Manage uses MongoDB as a backend. However, until now the assumption is that this is for Manage, so it is acceptable if it's not super highly available.

Should we choose to put another DB in the critical login path, it needs to be redundant, robust and fast. And ideally not add even more maintenance overhead. If we want to use another backend, it might make sense to look at Mongo since Manage already stores everything in that db so it might even "just work". But obviously it needs to be evaluated if we are confident with MongoDB being critical. And e.g. that there's the knowledge to recover it when there's a crash or a data restore is needed. (Thijs Kinkhorst - Feb 12, 2020)
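
The push model described in the comment above can be sketched roughly as follows. This is only an illustration under assumptions: `RuntimeEntityTable`, `push_metadata`, and `handle_login` are hypothetical names, a dict stands in for the table in EB's own database, and none of this reflects the actual EB or Manage code.

```python
import copy
from typing import Any, Dict, List, Optional

Entity = Dict[str, Any]


class RuntimeEntityTable:
    """Hypothetical stand-in for the entity table inside EB's own database."""

    def __init__(self) -> None:
        self._current: Dict[str, Entity] = {}

    def replace_all(self, entities: List[Entity]) -> None:
        # A push swaps in the complete "current" state in one step.
        self._current = {e["entityId"]: copy.deepcopy(e) for e in entities}

    def find(self, entity_id: str) -> Optional[Entity]:
        return self._current.get(entity_id)


def push_metadata(source_entities: List[Entity], table: RuntimeEntityTable) -> int:
    """Slow path: runs when metadata changes, not on each login."""
    table.replace_all(source_entities)
    return len(source_entities)


def handle_login(table: RuntimeEntityTable, entity_id: str) -> bool:
    """Fast path: reads only the local table, never the metadata source."""
    return table.find(entity_id) is not None


table = RuntimeEntityTable()
pushed = push_metadata(
    [{"entityId": "https://sp.example.org/metadata"}],  # made-up example entity
    table,
)
login_ok = handle_login(table, "https://sp.example.org/metadata")
```

Once the push has run, `handle_login` keeps working even if the metadata source is unavailable. Reading entities directly from the source store would remove the push step but would put that store on the login path, which is the trade-off discussed in this thread.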

@phavekes phavekes removed their assignment Nov 29, 2024