Generalize player table scraping in fb_league_stats()
#359
worldfootballr_html_player_table <- function(session) {
  stopifnot(identical(class(session), c("WorldfootballRDynamicPage", "R6")))

  ## find element "above" commented out table
note that i couldn't figure out how to identify the element with the commented out table directly, so i've opted to do it "indirectly", by identifying the node above it (which always has the same CSS class).
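A minimal sketch of this kind of indirect lookup, using xml2 on a static string rather than a chromote session. The HTML, the class name `placeholder`, and the variable names are made up for illustration; the real page's class and the PR's actual implementation may differ.

```r
library(xml2)

# Toy HTML: a table hidden inside a comment, preceded by a sibling element
# with a stable class. The class name "placeholder" is hypothetical.
html <- '<html><body><div id="wrapper">
  <div class="placeholder"></div>
  <!--<table id="stats"><thead><tr><th>Rk</th><th>Player</th></tr></thead>
  <tbody><tr><td>1</td><td>A</td></tr></tbody></table>-->
</div></body></html>'

doc <- read_html(html)

# Find the element "above" the comment, then step to the first comment
# node among its following siblings.
anchor <- xml_find_first(doc, "//div[@class='placeholder']")
comment_node <- xml_find_first(anchor, "following-sibling::comment()[1]")

# The comment's text is raw table markup, which can be parsed again as HTML.
table_doc <- read_html(xml_text(comment_node))
table_id <- xml_attr(xml_find_first(table_doc, "//table"), "id")
```

Anchoring on a sibling with a known class avoids having to pick the right comment by position when a page contains several.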
session <- worldfootballr_chromote_session(url)
page <- worldfootballr_html_page(session)
player_table <- worldfootballr_html_player_table(session)
the new solution is a little more specific. this function only returns the HTML for 1 table, while the prior solution returned the HTML for all (3) tables on the page, from which the player table was later plucked. arguably this new solution is better
  player_table_elements <- xml2::xml_children(xml2::xml_children(player_table))
  parsed_player_table <- rvest::html_table(player_table_elements)
  renamed_player_table <- .rename_fb_cols(parsed_player_table[[1]])
  renamed_player_table <- renamed_player_table[renamed_player_table$Rk != "Rk", ]
  renamed_player_table <- .add_player_href(
    renamed_player_table,
    parent_element = player_table_elements,
    player_xpath = ".//tbody/tr/td[@data-stat='player']/a"
  )
}

suppressMessages(
  readr::type_convert(
    clean_table,
    guess_integer = TRUE,
    na = "",
    trim_ws = TRUE
suppressMessages(
  readr::type_convert(
    renamed_player_table,
    guess_integer = TRUE,
    na = "",
    trim_ws = TRUE
  )
there's basically no logic change here. i've just added _player to the variable names
@@ -406,16 +406,16 @@ test_that("fb_league_stats() for players works", {
testthat::skip_on_cran()
testthat::skip_on_ci()
expected_player_shooting_cols <- c("Rk", "Player", "Player_Href", "Nation", "Pos", "Squad", "Age", "Born", "Mins_Per_90", "Gls_Standard", "Sh_Standard", "SoT_Standard", "SoT_percent_Standard", "Sh_per_90_Standard", "SoT_per_90_Standard", "G_per_Sh_Standard", "G_per_SoT_Standard", "Dist_Standard", "FK_Standard", "PK_Standard", "PKatt_Standard", "xG_Expected", "npxG_Expected", "npxG_per_Sh_Expected", "G_minus_xG_Expected", "np:G_minus_xG_Expected", "Matches", "url")
epl_player_shooting_22 <- fb_league_stats(
single_player_shooting_22 <- fb_league_stats(
realized this variable has epl_ in it but we're scraping for Brazil. i've renamed it to single_ to implicitly reflect its usage for testing just 1 league
Changes you've made look really good.
Thanks so much
It turns out that, in at least one known case, the player table for league shooting stats is hidden by default on the page. One solution would be to try to identify when this occurs, "click" on the show button, and then parse the table as usual. But this adds a lot of overhead.
A generalizable solution implemented in this PR is to parse out the player table from an HTML comment always loaded with the page. The resulting code is a little more "specific", but I wouldn't deem it "hard-coded" by any means. Further, I think it's ok to make the code very specific in this case since we don't use chromote for any other functions in the package.
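As a rough illustration of the comment-parsing approach, the sketch below pulls a table out of an HTML comment, drops a repeated header row, and converts column types. It runs on a made-up static string with made-up column names and player names; it is not the PR's code, which reads the page through a chromote session and uses the package's internal helpers.

```r
library(xml2)
library(rvest)
library(readr)

# Toy page: the table only exists inside an HTML comment, and the header
# row is repeated partway through the body.
html <- '<div><!--<table>
<thead><tr><th>Rk</th><th>Player</th><th>Gls</th></tr></thead>
<tbody>
<tr><td>1</td><td>Player A</td><td>3</td></tr>
<tr><td>Rk</td><td>Player</td><td>Gls</td></tr>
<tr><td>2</td><td>Player B</td><td>5</td></tr>
</tbody></table>--></div>'

doc <- read_html(html)

# Read the comment's text and parse it as HTML in its own right.
comment_text <- xml_text(xml_find_first(doc, "//comment()"))
stats <- html_table(html_element(read_html(comment_text), "table"))

# Drop repeated header rows, then re-guess column types from the
# remaining character columns.
stats <- stats[stats$Rk != "Rk", ]
stats <- suppressMessages(
  type_convert(stats, guess_integer = TRUE, na = "", trim_ws = TRUE)
)
```

Because the comment ships with the initial page load, nothing has to be clicked or waited on before the table can be parsed.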
Appendix