-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PG Adapter Presently Only Fakes Streaming #251
Comments
So excited to see this happening! |
Does the two-year-old |
I believe the only streaming tests we have are here: https://github.com/geddy/model/blob/master/test/integration/adapters/streaming.js |
Since that's the base sql adapter using the One thing that seems a little tricky to me right now is how to swap out the individual database's streaming mechanisms. I've also always wondered where exactly the SQL responses are converted into |
Converting to real streams with the current architecture is a little strange, posting my thoughts here so others could chime in: There is currently an innerProcessor that initially makes a minimal query to retrieve an array of IDs before querying for full models and associations. My current considerations center around whether this is necessary for a streaming adapter, and if so, how should it properly be implemented. While a simple solution would be to make the initial query non-streaming, and then stream back the full results, I feel like if we're going to do streaming, we should do it really properly, and this means making no assumptions like "The array of IDs is a reasonable size to return without streaming and hold in memory". One way would be for the initial query to stream back the IDs, in which case we have to decide how frequently to make the larger association requests. My hope is that the initial query isn't necessary if using true streams. Curious for someone who is familiar with this pre-query to chime in on its purpose. I swear you could do most of these sql queries without a pre-query, but I may be missing something. |
Oh I see now: So basically we'd have to implement both ways: Non-limiting/offsetting streaming requests can have a direct stream, but requests with limit or offset need IDs first, and so the question of "how often to query for subsequent results" remains. It almost makes me think it could be a config variable or request option, how many records to fetch at a time when streaming with a limit or offset. That's a pretty obscure variable, but I wouldn't want to bury it as an assumption... |
What if instead of making a pre-query, we made a subquery? I'm pretty sure both Postgres and MySQL support this: Instead of requesting IDs, then making a second request with an |
I was processing a pretty big dataset the other day, and naturally was feeling pretty grateful that Model supports streaming results, until der_On and I discovered:
The PG Adapter Only Fakes Streaming!
It inherits from
EventedQueryProcessor
, which inherits from anEventEmitter
, and only fakes a Stream object, but without the critical back-pressure handling and memory-efficiency of actual streams!You can see an example of how this is presently implemented here.
One good reason this may have been done originally, is that the core PG module does not support streams.
Fortunately, there is a module, node-pg-query-stream that does exactly what we need here.
I'm adding this issue to help guide efforts at adding proper streaming to the PG adapter, and am going to be taking a shot myself, although this will definitely be the deepest I've ever delved into
Model
, so no promises! (Only callbacks)The text was updated successfully, but these errors were encountered: