top_fields.sql times out #54
Comments
(Keeping notes on what I tried for when someone has a chance to dive into this) Even just the first two joins of
One solution would be restricting scores earlier, e.g. keeping only the top k scores per level during inference, for some value of k that's higher than we think we'd ever really need, say 10. This is what MAG used to do, and it would solve the efficiency problem for paper-level scoring without measurable impact on analysis. But cluster-level averages should probably be calculated over papers before that truncation, which would introduce some complexity into the inference pipeline.
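For reference, a minimal sketch of the top-k restriction, assuming the inference step is Python and that each paper's scores are available as (field_id, level, score) tuples. Both the data layout and the function names are hypothetical and not taken from the pipeline. The cluster helper averages over the full, untruncated scores and treats a field missing from a paper as a score of 0, which is also an assumption.

```python
import heapq
from collections import defaultdict


def top_k_by_level(scores, k=10):
    """Keep the k highest-scoring fields at each level for one paper.

    `scores` is an iterable of (field_id, level, score) tuples
    (hypothetical layout, not the pipeline's actual schema).
    """
    by_level = defaultdict(list)
    for field_id, level, score in scores:
        by_level[level].append((field_id, score))
    # nlargest returns the items sorted from highest to lowest score.
    return {
        level: heapq.nlargest(k, items, key=lambda item: item[1])
        for level, items in by_level.items()
    }


def cluster_averages(papers):
    """Average each field's score over every paper in a cluster.

    Run this on the full scores, before top_k_by_level is applied.
    Assumption: a field absent from a paper contributes a score of 0.
    """
    totals = defaultdict(float)
    for scores in papers:
        for field_id, level, score in scores:
            totals[(field_id, level)] += score
    n_papers = len(papers)
    return {key: total / n_papers for key, total in totals.items()}


paper_a = [("ml", 1, 0.9), ("nlp", 1, 0.7), ("cv", 1, 0.2),
           ("cs", 0, 0.95), ("bio", 0, 0.1)]
paper_b = [("ml", 1, 0.5), ("cs", 0, 0.85)]

kept = top_k_by_level(paper_a, k=2)
averages = cluster_averages([paper_a, paper_b])
```

Computing the cluster averages first and truncating afterwards means the averages are not biased by which fields happened to survive the per-paper cut.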
In the short term I like the idea of restricting scores earlier, especially if you all are going to rerun this pipeline soon.
Per ~ in person ~ discussion today, we're re-running this pipeline soonish assuming we can find efficiency gains.
Or at least it did when I tried to run it ~2 weeks ago. The l2_candidates subquery seemed to be the culprit. I started looking into how it could be sped up but didn't solve it before I went OOO. Going ahead and opening an issue to give you guys a heads-up since it sounds like you'll be rerunning this soon (possibly before I have a chance to take a look again).