replace outdated BQ lang table with GH Archive PR lang extract
The table [bigquery-public-data:github_repos.languages] was last updated in Nov 2022. This is a significant issue: without further updates, we can only count events for the repositories in that outdated list. Hence, we need a new method to obtain a large enough sample of primary-language metadata for repositories. Fortunately, we can extract the language directly from PullRequest events, because they carry a language field. So, whenever there is a PullRequest for any of the repos we want to include in our ranking, we can determine its language. These amount to many millions. The drawback is that we cannot include repositories that had no pull request in the current quarter. I think this is a fair trade-off for now, until a better solution comes along.
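The extraction described above can be sketched roughly as follows. This is a minimal illustration, not the query used in this commit: it assumes GH Archive events arrive as one JSON object per line, and that the language sits at payload.pull_request.base.repo.language, mirroring the GitHub pull request object. The function name extract_repo_languages is hypothetical.

```python
import json
from typing import Dict, Iterable


def extract_repo_languages(event_lines: Iterable[str]) -> Dict[str, str]:
    """Map repository name -> primary language, using PullRequestEvent rows only.

    Assumption: the language is found at
    payload.pull_request.base.repo.language (not confirmed by the commit).
    """
    languages: Dict[str, str] = {}
    for line in event_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        if event.get("type") != "PullRequestEvent":
            continue  # other event types are ignored in this sketch
        repo_name = (event.get("repo") or {}).get("name")
        payload = event.get("payload") or {}
        pull_request = payload.get("pull_request") or {}
        base_repo = (pull_request.get("base") or {}).get("repo") or {}
        language = base_repo.get("language")
        # Repositories with no detected language (null) are skipped.
        if repo_name and language:
            languages[repo_name] = language  # the latest event seen wins
    return languages
```

Repositories without any PullRequestEvent in the input never appear in the result, which is the trade-off noted above.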