Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve reliability of fb_league_stats(..., team_or_player = "player") #396

Merged
merged 1 commit into from
Sep 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
use read_html_live, point to correct html table element for fb_league…
…_stats(..., team_or_player = "player")
  • Loading branch information
tonyelhabr committed Sep 2, 2024
commit 24f4c41829f7d46c212535d6cbfaae0841670ae6
12 changes: 6 additions & 6 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Type: Package
Package: worldfootballR
Title: Extract and Clean World Football (Soccer) Data
Version: 0.6.5.0008
Version: 0.6.6.0000
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since fb_league_stats(..., team_or_player = "player") seems to be a popular function that was buggy before, I think this deserves a little more than a dev version increment!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely, good choice.

Authors@R: c(
person("Jason", "Zivkovic", , "jaseziv83@gmail.com", role = c("aut", "cre", "cph")),
person("Tony", "ElHabr", , "anthonyelhabr@gmail.com", role = "ctb"),
Expand All @@ -13,14 +13,14 @@ Description: Allow users to obtain clean and tidy
number of popular sites, including 'FBref',
transfer and valuations data from
'Transfermarkt'<https://www.transfermarkt.com/> and shooting location
and other match stats data from 'Understat'<https://understat.com/>.
and other match stats data from 'Understat'<https://understat.com/>.
It gives users the ability to access data more efficiently, rather than
having to export data tables to files before being able to complete their
having to export data tables to files before being able to complete their
analysis.
License: GPL-3
URL: https://github.com/JaseZiv/worldfootballR
BugReports: https://github.com/JaseZiv/worldfootballR/issues
Depends:
Depends:
R (>= 4.0.0)
Imports:
dplyr,
Expand All @@ -36,7 +36,7 @@ Imports:
readr,
rlang,
rstudioapi,
rvest (>= 1.0.0),
rvest (>= 1.0.4),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notable requirement

stats,
stringi,
stringr,
Expand All @@ -47,7 +47,7 @@ Imports:
xml2,
tibble,
cli
Suggests:
Suggests:
chromote,
R6,
knitr,
Expand Down
4 changes: 1 addition & 3 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,6 @@ importFrom(magrittr,"%>%")
importFrom(progress,progress_bar)
importFrom(purrr,insistently)
importFrom(purrr,map)
importFrom(purrr,map_chr)
importFrom(purrr,map_dfr)
importFrom(purrr,pluck)
importFrom(purrr,possibly)
Expand All @@ -104,7 +103,6 @@ importFrom(readr,type_convert)
importFrom(rlang,.data)
importFrom(rlang,.env)
importFrom(rlang,arg_match0)
importFrom(rlang,check_installed)
importFrom(rlang,inform)
importFrom(rstudioapi,isAvailable)
importFrom(rstudioapi,versionInfo)
Expand All @@ -116,6 +114,7 @@ importFrom(rvest,html_nodes)
importFrom(rvest,html_table)
importFrom(rvest,html_text)
importFrom(rvest,read_html)
importFrom(rvest,read_html_live)
importFrom(stats,runif)
importFrom(stats,setNames)
importFrom(stringi,stri_unescape_unicode)
Expand All @@ -130,6 +129,5 @@ importFrom(utils,read.csv)
importFrom(utils,sessionInfo)
importFrom(xml2,read_html)
importFrom(xml2,xml_attr)
importFrom(xml2,xml_children)
importFrom(xml2,xml_find_all)
importFrom(xml2,xml_text)
17 changes: 9 additions & 8 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,21 +2,22 @@

### Bugs

### Breaking Changes

### Improvements

***

# worldfootballR 0.6.6
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"reset" the NEWS


* `fb_league_stats()` not returning `opponent` table. (0.6.5.0001) [#355](https://github.com/JaseZiv/worldfootballR/issues/355)
* `tm_player_bio()` not returning values in the `player_valuation`, `max_player_valuation` and `max_player_valuation_date` fields. Unfortunately, `max_player_valuation` and `max_player_valuation_date` fields are no able to be scraped at this release (0.6.5.0002) [#357](https://github.com/JaseZiv/worldfootballR/issues/357)
* `fb_league_stats()` not returning `player` table when hidden on page load. (0.6.5.0003) [#351](https://github.com/JaseZiv/worldfootballR/issues/351)
* Fix parameter mis-sepcification in fbref vignette. (0.6.5.0005) [#385](https://github.com/JaseZiv/worldfootballR/issues/385)
* `fb_season_team_stats()` failing due to change in FBRef table name. (0.6.5.0007, 0.6.5.0008) [#395](https://github.com/JaseZiv/worldfootballR/issues/389)

### Breaking Changes

* In addressing the issue with `tm_player_injury_history()` in [#375](https://github.com/JaseZiv/worldfootballR/issues/375), the previously names column `club` has been renamed `club_missed_games_for` to better represent that this column will contain the games the player missed games for, as previously this column could have been misunderstood to be who they were playing for when they were injured (0.6.5.0004)

### Improvements

* `understat_match_players` and `understat_match_stats` added. (0.6.5.0006) [#386](https://github.com/JaseZiv/worldfootballR/issues/386)

***
* `fb_league_stats()` unreliable for `team_or_player = "player"`. (0.6.6) [#395](https://github.com/JaseZiv/worldfootballR/issues/395)

# worldfootballR 0.6.5

Expand Down
61 changes: 0 additions & 61 deletions R/chromote-fbref.R

This file was deleted.

24 changes: 13 additions & 11 deletions R/fb_league_stats.R
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
#' @importFrom rvest html_table
#' @importFrom rvest html_table read_html_live
#' @importFrom purrr map_dfr
#' @importFrom dplyr mutate
#' @importFrom rlang inform
#' @importFrom tibble tibble
#' @importFrom readr type_convert
.fb_single_league_stats <- function(url, team_or_player) {
.fb_single_league_stats <- function(url, team_or_player, stat_type) {

clean_table <- if (team_or_player == "team") {
page <- .load_page(url)
Expand Down Expand Up @@ -32,26 +32,27 @@
} else {

rlang::inform(
'Please be aware that `fb_league_stats(..., team_or_player = "player")` depends on promises, which may not always work.',
'Please be aware that `fb_league_stats(..., team_or_player = "player")` depends on `rvest::read_html_live` (and chromote), which may not always work.',
.frequency = "once",
.frequency_id = "fb_league_stats-player"
)

session <- worldfootballr_chromote_session(url)
player_table <- worldfootballr_html_player_table(session)
session$session$close(wait_ = FALSE)
page <- rvest::read_html_live(url)
## for keepers: although URLs have plural term, div elements have singular term
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, so easy now! (i can completely delete chromote-fbref.R 😄)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have been using rvest::read_html_live() for work quite a bit, just didn't get to addressing it here though - awesome stuff!

stat_type <- gsub("keepers", "keeper", stat_type)
player_table_element <- page$html_elements(paste0("#div_stats_", stat_type))
page$session$close(wait_ = FALSE)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks for this ID element in the HTML comment that is loaded client side

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before, i hardcoded a search for the #stats_shooting_sh element. in hindsight, i think this meant that fb_league_stats(..., team_or_player = "player") only worked for stat_type = "shooting"?! maybe my thought that our prior code was unreliable was actually more so due to a bug than bad code!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hahaha a very nice unintended fix!


if (is.null(player_table)) {
if (is.null(player_table_element)) {
return(tibble::tibble())
}

player_table_elements <- xml2::xml_children(xml2::xml_children(player_table))
parsed_player_table <- rvest::html_table(player_table_elements)
parsed_player_table <- rvest::html_table(player_table_element)
renamed_player_table <- .rename_fb_cols(parsed_player_table[[1]])
renamed_player_table <- renamed_player_table[renamed_player_table$Rk != "Rk", ]
renamed_player_table <- .add_player_href(
renamed_player_table,
parent_element = player_table_elements,
parent_element = player_table_element,
player_xpath = ".//tbody/tr/td[@data-stat='player']/a"
)
suppressMessages(
Expand Down Expand Up @@ -213,6 +214,7 @@ fb_league_stats <- function(

fi <- purrr::insistently(
.fb_single_league_stats,
rate = purrr::rate_backoff(max_times = 2),
quiet = TRUE
)

Expand All @@ -232,7 +234,7 @@ fb_league_stats <- function(
}
pb$tick()
url <- urls[.x]
res <- fp(url, team_or_player = team_or_player)
res <- fp(url, team_or_player = team_or_player, stat_type = stat_type)
res[["url"]] <- url
res
}
Expand Down
Loading