diff --git a/_freeze/posts/2024-11-13_motorway-service-stations-info/index/execute-results/html.json b/_freeze/posts/2024-11-13_motorway-service-stations-info/index/execute-results/html.json index d7846de..a7a05e4 100644 --- a/_freeze/posts/2024-11-13_motorway-service-stations-info/index/execute-results/html.json +++ b/_freeze/posts/2024-11-13_motorway-service-stations-info/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "88d8091e0dc2f1e0eba68f07ffa2690b", + "hash": "177969eb42373d60c008a7052a3287c2", "result": { "engine": "knitr", - "markdown": "---\ntitle: \"Data Quest: Motorway Services UK\"\ndate: '2024-11-08'\nexecute:\n freeze: true\n message: false\n warning: false\ncode-fold: true\nengine: knitr\nfilters:\n - line-highlight\n---\n\n\n\n\n\n:::{layout-ncol=2}\n\n:::{.left}\nI'm working on a few ideas about Motorway Service Stations in the UK, or more specifically the mainland of Great Britain (England, Scotland and Wales). However, I was surprised to discover there weren't any well structured datasets. 
There is an **incredible** *website* available at [www.motorwayservices.info](www.motorwayservices.info) - I thoroughly recommend a visit.\n:::\n\n:::{.right}\n![](gg_services_roadless_simple.png)\n:::\n\n:::\n\n\nFor very sensible reasons they don't allow data to be scraped (we can't use `{rvest}`), so I've manually downloaded all 107 web pages for the motorway service stations and have them in this folder:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"cjhRutils\")\nlibrary(\"tidyverse\")\nlist.files(quarto_here(\"service-stations/\"), \".html\") %>% head()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Abington Services M74 - Motorway Services Information.html\" \n[2] \"Annandale Water Services A74(M) - Motorway Services Information.html\"\n[3] \"Baldock Services A1(M) - Motorway Services Information.html\" \n[4] \"Beaconsfield Services M40 - Motorway Services Information.html\" \n[5] \"Birch Services M62 - Motorway Services Information.html\" \n[6] \"Birchanger Green Services M11 - Motorway Services Information.html\" \n```\n\n\n:::\n:::\n\n\n\nNow we can use `{rvest}` to read these HTML files - let's target the info I want\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"tidyverse\")\nlibrary(\"rvest\")\nexample_abingdon <- read_html(quarto_here(\"service-stations/Abington Services M74 - Motorway Services Information.html\")) %>% \n html_nodes(\".infotext\") %>% \n html_text() %>% \n tibble(\n info = .\n ) %>% \n separate_wider_delim(info, \":\", names = c(\"property\", \"value\")) %>% \n mutate(value = str_trim(value))\n\nexample_abingdon\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 18 × 2\n property value \n \n 1 Motorway \"M74\" \n 2 Where \"at J13\" \n 3 County \"South Lanarkshire\" \n 4 Postcode \"ML12 6RG\" \n 5 Type \"Single site, used by traffic in both directi…\n 6 Operator \"Welcome Break\" \n 7 Contact Phone \"01864 502637\" \n 8 Eat-In Food \"Starbucks, Papa John's, Burger King, Dunkin'…\n 9 Takeaway Food 
/ General \"Retail Shop\" \n10 Other Non-Food Shops \"WH Smith\" \n11 Picnic Area \"yes\" \n12 Cash Machines in main building \"Yes (transaction charge applies)\" \n13 Parking Charges \"Cars free for the first 2 hours then £5 for …\n14 Other Facilities/Information \"GameZone, Tourist Information, BT Openzone\" \n15 Motel \"Days Inn Hotel Abington (Glasgow)\" \n16 Fuel Brand \"Shell\" \n17 LPG available \"Yes\" \n18 Cash Machines at fuel station \"Yes (transaction charge applies)\" \n```\n\n\n:::\n:::\n\n\n\nOkay! That's enough processing to a function I can use to read in all of the data:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nread_motorway_services_info <- function(file_path){\n name_service_station <- str_remove(basename(file_path), \" - Motorway Services Information.html\")\n \n read_html(file_path) %>% \n html_nodes(\".infotext\") %>% \n html_text() %>% \n tibble(\n info = .\n ) %>%\n separate_wider_delim(info, \":\", names = c(\"property\", \"value\"), too_many = \"merge\") %>%\n mutate(value = str_trim(value)) %>%\n mutate(service_station = name_service_station) %>% \n identity()\n}\n\nquarto_here(\"service-stations/Baldock Services A1(M) - Motorway Services Information.html\") %>% \n read_motorway_services_info()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 19 × 3\n property value service_station\n \n 1 Motorway A1(M) Baldock Servic…\n 2 Where at J10 and from A507 Baldock Servic…\n 3 County Hertfordshire Baldock Servic…\n 4 Postcode SG7 5TR Baldock Servic…\n 5 Type Single site, used by traffic … Baldock Servic…\n 6 Operator Extra MSA Baldock Servic…\n 7 Contact Phone 01494 678876 Baldock Servic…\n 8 Eat-In Food KFC, Le Petit Four, McDonalds… Baldock Servic…\n 9 Takeaway Food / General M&S Simply Food, WH Smith (wi… Baldock Servic…\n10 Picnic Area yes Baldock Servic…\n11 Children's Playground Yes Baldock Servic…\n12 Cash Machines in main building Yes (transaction charge appli… Baldock Servic…\n13 Parking Charges First two hours free 
for all … Baldock Servic…\n14 Other Facilities/Information Fast Food & Bakeries and Conv… Baldock Servic…\n15 Motel Days Inn Stevenage North Baldock Servic…\n16 Fuel Brand Shell Baldock Servic…\n17 LPG available Yes Baldock Servic…\n18 Cash Machines at fuel station Yes (free) Baldock Servic…\n19 Other Facilities/Information Costa Express & Deli2Go avail… Baldock Servic…\n```\n\n\n:::\n\n```{.r .cell-code}\ndata_raw_services <- list.files(quarto_here(\"service-stations/\"), \"[.]html\", full.names = TRUE) %>% \n map_dfr(~read_motorway_services_info(.x))\n```\n:::\n\n\n\n## Northbound / Southbound and Eastbound / Westbound\n\nSome service stations come in pairs (*dual-site service areas or twin sites*) that are split by the motorway and yet **still have the same name**. For instance, Rownhams Services has a McDonalds when accessed westbound but not eastbound. If you looked at a map of the services it appears that they're not connected (that's an overhead sign not a footbridge!).\n\n![](rownsham-services.png)\nBut they are! There's a subway connecting them, which is [apparently difficult to discover](https://www.sabre-roads.org.uk/forum/viewtopic.php?t=44620). Thankfully, our data source [www.motorwayservices.info](www.motorwayservices.info) knows they're connected but does suggest it's a footbridge.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_raw_services %>% \n filter(service_station == \"Rownhams Services M27\") %>% \n filter(property == \"Type\") %>% \n pull(value)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Separate facilities for each carriageway, but linked by a pedestrian footbridge\"\n```\n\n\n:::\n:::\n\n\n\nWe need a way to identify these stations. 
It turns out the \"Eat-In Food\" property is our friend and identifies the 6 twin-site stations:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_eat_in_pairs <- data_raw_services %>% \n filter(property == \"Eat-In Food\",\n str_detect(value, \"Northbound|Eastbound\")) %>% \n pull(service_station)\nvec_eat_in_pairs\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Northampton Services M1\" \"Rownhams Services M27\" \n[3] \"Sandbach Services M6\" \"Strensham Services M5\" \n[5] \"Tibshelf Services M1\" \"Watford Gap Services M1\"\n```\n\n\n:::\n:::\n\n\n\n## Where can we eat\n\nThe Eat-In variable is the most complicated, interesting and ripe for visualisation. So let's treat it separately. First we'll identify our twin-site restaurants:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_raw_eat_in <- data_raw_services %>% \n filter(property == \"Eat-In Food\") %>% \n mutate(directional = str_detect(value,\n \"Northbound|Eastbound\"))\n\ndata_raw_directional_eat <- data_raw_eat_in %>% \n filter(directional == TRUE) %>% \n mutate(direction = case_when(\n str_detect(value, \"Northbound\") ~ \"Northbound|Southbound\",\n str_detect(value, \"Eastbound\") ~ \"Eastbound|Westbound\"\n )) %>% \n separate_longer_delim(direction,\n delim = \"|\") %>% \n mutate(value = case_when(\n direction == \"Northbound\" ~ str_extract(value,\n \"(?<=Northbound: ).*(?=Southbound)\"),\n direction == \"Southbound\" ~ str_extract(value, \"(?<=Southbound).*\"),\n direction == \"Eastbound\" ~ str_extract(value,\n \"(?<=Eastbound: ).*(?=Westbound)\"),\n direction == \"Westbound\" ~ str_extract(value,\n \"(?<=Westbound: ).*\")\n )) \n\ndata_raw_directional_eat\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 12 × 5\n property value service_station directional direction\n \n 1 Eat-In Food \"Costa, Restbite, The Burg… Northampton Se… TRUE Northbou…\n 2 Eat-In Food \" Costa, Hot Food Co., McD… Northampton Se… TRUE Southbou…\n 3 Eat-In Food \"Costa, Restbite. 
\" Rownhams Servi… TRUE Eastbound\n 4 Eat-In Food \"Costa, Restbite, McDonald… Rownhams Servi… TRUE Westbound\n 5 Eat-In Food \"Costa and Restbite, \" Sandbach Servi… TRUE Northbou…\n 6 Eat-In Food \": Costa, McDonald's, Hot … Sandbach Servi… TRUE Southbou…\n 7 Eat-In Food \"Soho Coffee Company, Hot … Strensham Serv… TRUE Northbou…\n 8 Eat-In Food \": Costa, Hot Food Co., Mc… Strensham Serv… TRUE Southbou…\n 9 Eat-In Food \"Costa, Restbite, McDonald… Tibshelf Servi… TRUE Northbou…\n10 Eat-In Food \": Costa, Restbite, McDona… Tibshelf Servi… TRUE Southbou…\n11 Eat-In Food \"Costa, Fresh Food Cafe, M… Watford Gap Se… TRUE Northbou…\n12 Eat-In Food \": Costa, Restbite, The Bu… Watford Gap Se… TRUE Southbou…\n```\n\n\n:::\n:::\n\n\n\nFrustratingly, Strensham Services has an extra little bit of data about Subway being in the Northbound Forecourt. That'll need manual removal. But other than that I think we end up with fairly well structured data for the eat-in component that we can begin to clean up.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_raw_directional_eat <- data_raw_directional_eat %>% \n mutate(value = str_remove(value, \":\"),\n value = str_remove(value, \" Northbound.*\"),\n value = str_trim(value))\n\ndata_raw_directionless_eat <- data_raw_eat_in %>% \n filter(directional == FALSE) %>% \n mutate(value = str_remove(value, \":|;\"),\n value = str_remove(value, \"(Westbound)\"),\n value = str_trim(value),\n direction = \"Directionless\")\n\ndata_clean_eat_in <- data_raw_directionless_eat %>% \n bind_rows(data_raw_directional_eat) %>% \n select(-directional)\n```\n:::\n\n\n\nThere are lots of alternative spellings in the data, here's a case_when to grab them all. 
At some point in the future it would be interesting to see if edit distances could help, but for now let's concentrate on getting a useful dataset.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfn_fix_value_columns <- function(data){\n data %>% \n mutate(value = case_when(\n str_detect(tolower(value), \"arlo\") ~ \"Arlo's\", \n str_detect(tolower(value), \"^bk$\") ~ \"Burger King\",\n str_detect(tolower(value), \"cotton\") ~ \"Cotton Traders\", \n str_detect(tolower(value), \"chozen\") ~ \"Chozen Noodles\", \n str_detect(tolower(value), \"cornwall\") ~ \"West Cornwall Pasty Company\", \n str_detect(tolower(value), \"costa\") ~ \"Costa\", \n str_detect(tolower(value), \"eat & drink co\") ~ \"Eat & Drink Co\", \n str_detect(tolower(value), \"edc\") ~ \"Eat & Drink Co\", \n str_detect(tolower(value), \"fone\") ~ \"FoneBiz\", \n str_detect(tolower(value), \"full house\") ~ \"Full House\", \n str_detect(tolower(value), \"greg\") ~ \"Greggs\", \n str_detect(tolower(value), \"harry\") ~ \"Harry Ramsden's\", \n str_detect(tolower(value), \"hot food co\") ~ \"Hot Food Co\",\n str_detect(tolower(value), \"krispy\") ~ \"Krispy Kreme\", \n str_detect(tolower(value), \"le petit\") ~ \"Le Petit Four\", \n str_detect(tolower(value), \"lucky coin\") ~ \"Lucky Coin\", \n str_detect(tolower(value), \"m&s\") ~ \"M&S\", \n str_detect(tolower(value), \"marks\") ~ \"M&S\", \n str_detect(tolower(value), \"mcdona\") ~ \"McDonald's\", \n str_detect(tolower(value), \"papa john\") ~ \"Papa John's\", \n str_detect(tolower(value), \"pizza hut\") ~ \"Pizza Hut\", \n str_detect(tolower(value), \"quicksilver\") ~ \"Quicksilver\", \n str_detect(tolower(value), \"regus\") ~ \"Regus Business Lounge\", \n str_detect(tolower(value), \"restbite\") ~ \"Restbite\", \n str_detect(tolower(value), \"soho\") ~ \"SOHO Coffee Co\", \n str_detect(tolower(value), \"spar\") ~ \"SPAR\", \n str_detect(tolower(value), \"starbucks\") ~ \"Starbucks\", \n str_detect(tolower(value), \"the burger\") ~ \"The Burger Company\", 
\n str_detect(tolower(value), \"top gift\") ~ \"Top Gift\", \n str_detect(tolower(value), \"tourist information\") ~ \"Tourist Information\", \n str_detect(tolower(value), \"upper\") ~ \"Upper Crust\", \n str_detect(tolower(value), \"whs\") ~ \"WHSmiths\", \n str_detect(tolower(value), \"wild\") ~ \"Wild Bean Cafe\", \n tolower(value) %in% tolower(c(\"WH Smith\", \"WHSMiths\", \"Whsmith\",\"W H Smiths\", \"W.H.Smiths\", \"W H Smith\", \"WH Smiths\", \"Wh Smith\", \"WH smith\")) ~ \"WHSmiths\", \n value == \"Buger King\" ~ \"Burger King\", \n value == \"M & S Simply food\" ~ \"M&S\",\n TRUE ~ value\n ))\n}\n\ndata_long_eat_in <- data_clean_eat_in %>% \n separate_longer_delim(value,\n delim = \",\") %>% \n mutate(value = str_trim(value)) %>% \n filter(value != \"\") %>% \n fn_fix_value_columns() %>% \n select(retailer = value,\n service_station,\n direction)\n```\n:::\n\n\n\nNow... I'm a little unsure about what to do with the \"Takeaway Food / General\" property as it also contains information about where we can get food but for the 6 twin stations the direction isn't provided. 
Let's deal with the directionless other retailers now:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_long_other_shops_directionless <- data_raw_services %>% \n filter(!service_station %in% vec_eat_in_pairs) %>% \n filter(property %in% c(\"Takeaway Food / General\", \"Other Non-Food Shops\")) %>% \n select(value, service_station) %>% \n filter(value != \"01823680370\") %>% \n separate_longer_delim(value,\n delim = \",\") %>% \n mutate(value = str_trim(value, side = \"both\")) %>% \n fn_fix_value_columns() %>% \n mutate(value = str_remove(value, \"[(].*y[)]\"),\n value = str_trim(value)) %>% \n reframe(retailer = value,\n service_station = service_station,\n direction = \"Directionless\")\n\n## There's one bad record\ndata_long_other_shops_directionless <- tibble(\n retailer = c(\"Gamezone\", \"WHSmiths\", \"Waitrose\"),\n service_station = \"Newport Pagnell Services M1\",\n direction = \"Directionless\"\n) %>%\n bind_rows(filter(\n data_long_other_shops_directionless,!str_detect(retailer, \"24hr Gamezone WHSmith & Waitrose\")\n ))\n```\n:::\n\n\n\nAnd now I'll expand out the twin stations:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Expand out the twins\ndata_long_other_shops_w_direction <- data_raw_services %>% \n filter(service_station %in% vec_eat_in_pairs) %>% \n filter(property %in% c(\"Other Non-Food Shops\", \"Takeaway Food / General\")) %>% \n select(value, service_station) %>% \n mutate(direction = case_when(\n service_station == \"Rownhams Services M27\" ~ \"Eastbound;Westbound\",\n TRUE ~ \"Northbound;Southbound\"\n )) %>% \n separate_longer_delim(direction,\n delim = \";\") %>% \n fn_fix_value_columns() %>% \n rename(retailer = value)\n```\n:::\n\n\n\nIt's time to combine everything together into a list of retailers which I'll export into Excel and quickly categorise.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_long_retailers <- bind_rows(data_long_eat_in, data_long_other_shops_w_direction, data_long_other_shops_directionless)\n\ndata_long_retailers 
%>% \n distinct(retailer) %>% \n arrange(retailer) %>% \n write_csv(quarto_here(\"retailer_types.csv\"))\n```\n:::\n\n\n\nLet's impose these categorisations:\n\n- is_food_retailer:\n - Do we KNOW we it sells some food items?\n \n- is_retaurant:\n - Do we KNOW we can order food to eat in?\n \n- is_takeaway:\n - Do we KNOW we can order food to takeaway\n \n- is_prepared_food_only\n - Do we KNOW that there is no hot/fresh food, Tesco\n\n- is_coffee_shop\n - Do we KNOW you'd nip there for a coffee and it'll be good? Controversially, McDonald's isn't included.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"readxl\")\ndata_type_of_retailer <- read_excel(quarto_here(\"retailer_types.xlsx\"))\n\ndata_services_retailers <- data_long_retailers %>% \n left_join(data_type_of_retailer) %>% \n mutate(across(starts_with(\"is\"), ~ case_when(\n .x == \"Y\" ~ TRUE,\n .x == \"N\" ~ FALSE,\n TRUE ~ NA\n )))\n\ndata_services_retailers\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 650 × 8\n retailer service_station direction is_food_retailer is_restaurant is_takeaway\n \n 1 Starbuc… Abington Servi… Directio… TRUE NA TRUE \n 2 Papa Jo… Abington Servi… Directio… TRUE TRUE TRUE \n 3 Burger … Abington Servi… Directio… TRUE TRUE TRUE \n 4 Dunkin'… Abington Servi… Directio… TRUE FALSE TRUE \n 5 Harry R… Abington Servi… Directio… TRUE TRUE TRUE \n 6 Costa Annandale Wate… Directio… TRUE FALSE TRUE \n 7 Restbite Annandale Wate… Directio… TRUE NA NA \n 8 The Bur… Annandale Wate… Directio… TRUE TRUE TRUE \n 9 KFC Baldock Servic… Directio… TRUE TRUE TRUE \n10 Le Peti… Baldock Servic… Directio… TRUE FALSE TRUE \n# ℹ 640 more rows\n# ℹ 2 more variables: is_prepared_food_only , is_coffee_shop \n```\n\n\n:::\n:::\n\n\n\n## Non-food information\n\nThe non-food information is so much easier to deal with. 
Because I want to create an `{sf}` object and potentially support exporting as ESRI shapefiles let's make sure our colnanes have a maximum of 10 characters.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_wide_services <- data_raw_services %>% \n filter(property %in% c(\"Motorway\",\n \"Where\",\n \"Postcode\",\n \"Type\",\n \"Operator\",\n \"Parking Charges\",\n \"LPG available\",\n \"Electric Charge Point\")) %>% \n mutate(property = case_when(\n property == \"LPG available\" ~ \"has_lpg\",\n property == \"Electric Charge Point\" ~ \"has_electric_charge\",\n TRUE ~ property\n )) %>% \n mutate(value = str_replace_all(value, \"ï��[0-9]{1,}\", \"£\"),\n value = str_remove_all(value, \"Â\")) %>% \n pivot_wider(names_from = property,\n values_from = value) %>% \n janitor::clean_names() %>% \n mutate(is_single_site = str_detect(type, \"Single site\"),\n is_twin_station = str_detect(type, \"Separate facilities\"),\n has_walkway_between_twins = case_when(\n is_twin_station == TRUE & str_detect(type, \"linked\") ~ TRUE,\n is_twin_station == TRUE & str_detect(type, \"no link\") ~ FALSE,\n TRUE ~ NA),\n is_ireland = str_detect(service_station, \"Ireland\")) %>% \n reframe(\n name = service_station,\n motorway,\n where,\n postcode,\n type,\n operator,\n is_ireland,\n p_charges = parking_charges,\n has_charge = has_electric_charge,\n is_single = is_single_site,\n is_twin = is_twin_station,\n has_walk = has_walkway_between_twins\n )\n```\n:::\n\n\n\nThere are some services like Gloucester Services M5 that appear as two distinct rows but they still pass `is_single == FALSE`. 
Let's identify these services and mark them in the dataset as `pair_name`.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_paired_services <- data_wide_services %>% \n filter(is_single == FALSE) %>% \n filter(str_detect(name, \"North|South|East|West\")) %>% \n select(name) %>% \n separate_wider_delim(name, delim = \"Services\",\n names = c(\"name\", \"direction\")) %>% \n mutate(across(everything(), ~str_trim(.))) %>% \n separate_wider_delim(direction,\n delim = \" \",\n names = c(\"direction\",\n \"motorway\"),\n too_few = \"align_end\") %>% \n add_count(name, motorway) %>% \n filter(n > 1) %>% \n reframe(name = paste(name, \"Services\", direction, motorway),\n pair_name = paste(name, motorway))\n\n\ndata_services_info <- data_wide_services %>% \n left_join(data_paired_services) %>% \n mutate(is_pair = ifelse(is.na(pair_name), FALSE, TRUE))\n\ndata_services_info\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 106 × 14\n name motorway where postcode type operator is_ireland p_charges has_charge\n \n 1 Abing… M74 at J… ML12 6RG Sing… Welcome… FALSE \"Cars fr… \n 2 Annan… A74(M) at J… DG11 1HD Sing… RoadChef FALSE \"Parking… Yes (More…\n 3 Baldo… A1(M) at J… SG7 5TR Sing… Extra M… FALSE \"First t… \n 4 Beaco… M40 at J… HP9 2SE Sing… Extra M… FALSE \n 5 Birch… M62 betw… OL10 2HQ Sepa… Moto FALSE \"Car - £… \n 6 Birch… M11 at J… CM23 5QZ Sing… Welcome… FALSE \"Parking… \n 7 Black… M65 at J… BB3 0AT Sing… Extra M… FALSE \"Parking… \n 8 Blyth… A1(M) at J… S81 8HG Sing… Moto FALSE \"Free fo… Yes (More…\n 9 Bothw… M74 betw… G71 8BG Faci… RoadChef FALSE \"Parking… Yes (More…\n10 Bridg… M5 at J… TA6 6TS Sing… Moto FALSE \"Cars - … \n# ℹ 96 more rows\n# ℹ 5 more variables: is_single , is_twin , has_walk ,\n# pair_name , is_pair \n```\n\n\n:::\n:::\n\n\n\n## Getting the locations of the services...\n\nI've gone and got the coords from Google Maps and stored them in an Excel file (because I'm not perfect). 
Here's a very quick interactive `{leaflet}` map showing where they are:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"sf\")\nlibrary(\"leaflet\")\ndata_raw_service_locations <- read_excel(quarto_here(\"services-locations.xlsx\"))\n\ndata_clean_long_lat <- data_raw_service_locations %>% \n separate_wider_delim(google_pin,\n delim = \",\", \n names = c(\"lat\", \"long\")) %>% \n mutate(long = as.numeric(long),\n lat = as.numeric(lat)) %>% \n select(name, long, lat)\n\n\ndata_sf_service_locs <- data_clean_long_lat %>% \n full_join(data_services_info) %>% \n st_as_sf(coords = c(\"long\", \"lat\"),\n crs = 4326)\n \ndata_sf_service_locs %>% \n filter(is_ireland == FALSE) %>% \n leaflet() %>% \n addProviderTiles(providers$OpenStreetMap) %>% \n addCircleMarkers()\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n\n```\n\n:::\n:::\n\n\n\n## Operators\n\nI want to make sure we're not over counting operators due to pair sites! So let's make some explicit counts.\n\n- n_named_sites_all: How many uniquely named sites are there across Great Britian? `Gloucester Northbound Services M5` and `Gloucester Southbound Services M5` are distinctly named, but the `Birch Services M62` is listed once despite being a twinned site.\n\n- n_named_sites_mainland: same as above but discounting services in Ireland\n\n- n_named_sites_ireland: only counts uniquely named services in Ireland\n\n- n_single_sites: How many sites are accessible by traffic in both directions\n\n- n_twins: How many sites are twinned, two locations on each side of the motorway with or without a walkway between them\n\n- n_pairs: How many sites have paired names, eg `Gloucester Northbound Services M5` and `Gloucester Southbound Services M5`\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_process_ops_single <- data_services_info %>% \n select(name, operator, is_single) %>% \n count(operator, is_single) %>% \n filter(is_single == TRUE) %>% \n reframe(operator,\n n_single_sites = n)\n\ndata_process_ops_twins <- data_services_info %>% \n count(operator, is_twin) %>% \n filter(is_twin == TRUE) %>% \n reframe(operator,\n n_twins = n)\n\ndata_process_ops_pair <- data_services_info %>% \n select(name, operator, is_pair) %>% \n count(operator, is_pair) %>% \n filter(is_pair == TRUE) %>% \n reframe(operator,\n n_pairs = n)\n\ndata_process_ops_simple_all <- data_services_info %>% \n count(operator, name = \"n_named_sites_all\") \n\ndata_process_ops_simple_ireland <- data_services_info %>% \n filter(str_detect(name, \"Ireland\")) %>% \n count(operator, name = \"n_named_sites_ireland\") \n\n\ndata_operators <- data_process_ops_simple_all %>% \n left_join(data_process_ops_simple_ireland) %>% \n left_join(data_process_ops_single) %>% \n left_join(data_process_ops_twins) %>% \n left_join(data_process_ops_pair) %>% \n 
mutate(across(everything(), ~replace_na(.x, 0))) %>% \n mutate(n_named_sites_mainland = n_named_sites_all - n_named_sites_ireland) %>% \n select(\n operator,\n n_named_sites_all,\n n_named_sites_mainland,\n n_named_sites_ireland,\n everything())\n\ndata_operators\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 9 × 7\n operator n_named_sites_all n_named_sites_mainland n_named_sites_ireland\n \n1 Applegreen 3 0 3\n2 BP Connect 1 1 0\n3 Euro Garages 2 2 0\n4 Extra MSA 7 7 0\n5 Moto 39 39 0\n6 RoadChef 20 20 0\n7 Stop 24 1 1 0\n8 Welcome Break 27 27 0\n9 Westmorland 6 6 0\n# ℹ 3 more variables: n_single_sites , n_twins , n_pairs \n```\n\n\n:::\n:::\n\n\n\n## Exporting all that good data\n\nI'd really love this dataset to become a Tidy Tuesday dataset! So while writing this post [I've created a fork of the repo](https://github.com/charliejhadley/tidytuesday/tree/Motorway-Services-UK/data/curated/motorway-services-uk). If my eventual pull request gets accepted we'd be able to pull the data from the official TidyTuesday repo, but until then it's available as follows\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(read_csv(\"https://raw.githubusercontent.com/charliejhadley/tidytuesday/refs/heads/Motorway-Services-UK/data/curated/motorway-services-uk/data_service_locations.csv\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6 × 15\n name long lat motorway where postcode type operator p_charges has_charge\n \n1 Abin… -3.70 55.5 M74 at J… ML12 6RG Sing… Welcome… \"Cars fr… \n2 Anna… -3.41 55.2 A74(M) at J… DG11 1HD Sing… RoadChef \"Parking… Yes (More…\n3 Bald… -0.205 52.0 A1(M) at J… SG7 5TR Sing… Extra M… \"First t… \n4 Beac… -0.630 51.6 M40 at J… HP9 2SE Sing… Extra M… \n5 Birc… -2.23 53.6 M62 betw… OL10 2HQ Sepa… Moto \"Car - £… \n6 Birc… 0.192 51.9 M11 at J… CM23 5QZ Sing… Welcome… \"Parking… \n# ℹ 5 more variables: is_single , is_twin , has_walk ,\n# pair_name , is_pair \n```\n\n\n:::\n:::\n\n\n\n\n\n## Let's make a map\n\nI've 
obtained some [high quality shapefiles for the UK from the ONS](https://geoportal.statistics.gov.uk/datasets/ons::countries-december-2023-boundaries-uk-bfc-2/about) which I'm going to immediately start throwing information away from.\n\n- There aren't any true service stations in Northern Ireland, so we'll include only England, Scotland and Wales\n\n- There are only service stations on the mainland! So let's discount any polygon with an area smaller than 1E10m^2\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_sf_uk <- read_sf(quarto_here(\"Countries_December_2023_Boundaries_UK_BFC_-5189344684762562119/\"))\n\ndata_sf_gb_mainland <- data_sf_uk %>% \n filter(CTRY23NM != \"Northern Ireland\") %>% \n st_cast(\"POLYGON\") %>% \n mutate(area = as.numeric(st_area(geometry))) %>% \n filter(area >= 1E10) %>%\n # st_union() %>%\n st_as_sf()\n\ndata_sf_gb_mainland\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 3 features and 9 fields\nGeometry type: POLYGON\nDimension: XY\nBounding box: xmin: 134112.4 ymin: 11429.67 xmax: 655653.8 ymax: 976859.9\nProjected CRS: OSGB36 / British National Grid\n# A tibble: 3 × 10\n CTRY23CD CTRY23NM CTRY23NMW BNG_E BNG_N LONG LAT GlobalID \n* \n1 E92000001 England Lloegr 394883 370883 -2.08 53.2 ea73ad5d-1f4e-4f07-8e0…\n2 S92000003 Scotland Yr Alban 277744 700060 -3.97 56.2 f2267107-2e4a-442e-bc8…\n3 W92000004 Wales Cymru 263405 242881 -3.99 52.1 d818bd0d-8e08-446f-889…\n# ℹ 2 more variables: geometry , area \n```\n\n\n:::\n:::\n\n\n\nIt takes a fair amount of time to plot, so I can use `{rmapshaper} to simplify the borders, which look okay:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"rmapshaper\")\n\ndata_sf_simpler_mainland <- ms_simplify(data_sf_gb_mainland, keep = 0.0005)\n\nggplot() +\n geom_sf(data = data_sf_simpler_mainland) +\n geom_sf(data = filter(data_sf_service_locs, is_ireland == FALSE )) +\n coord_sf(crs = 4326,\n ylim = c(50, 59))\n```\n\n::: 
{.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-20-1.png){width=672}\n:::\n:::\n\n\n\nLet's build towards an okay looking chart:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"patchwork\")\nlibrary(\"ggtext\")\n\ndata_plot_services <- data_sf_service_locs %>% \n filter(is_ireland == FALSE) %>% \n left_join(select(data_operators, operator, n_named_sites_all)) %>% \n mutate(operator = fct_reorder(operator, n_named_sites_all))\n\ncolour_motorway_blue <- \"#3070B5\"\n\ngg_services_roadless <- ggplot() +\n geom_sf(data = st_transform(data_sf_simpler_mainland, crs = 4326),\n fill = colour_motorway_blue,\n colour = \"white\",\n linewidth = 0.8) +\n geom_sf(data = st_transform(data_plot_services, crs = 4326),\n aes(fill = operator),\n pch = 21,\n size = 3.5,\n colour = \"white\") +\n geom_richtext(aes(x = -9,\n y = 54,\n label = \"Tiredness can kill
Take a break\"),\n family = \"Transport\",,\n fill = \"transparent\",\n label.color = NA,\n colour = \"white\"\n ) +\n scale_fill_brewer(palette = \"Dark2\") +\n guides(fill = guide_legend(\n # override.aes = list(size = 8), \n title = \"\", reverse = TRUE)\n ) +\n scale_x_continuous(labels = scales::label_number(accuracy = 0.01)) +\n scale_y_continuous(labels = scales::label_number(accuracy = 0.01)) +\n coord_sf(crs = 4326,\n ylim = c(50, 59),\n xlim = c(-12, 1.76)) + \n # theme_classic(base_family = \"Transport\") +\n theme_void(base_family = \"Transport\") +\n theme(legend.text = element_text(colour = \"white\"),\n # legend.spacing.y = unit(2.0, \"cm\"),\n legend.background = element_rect(fill = colour_motorway_blue, colour = \"transparent\"),\n plot.background = element_rect(fill = colour_motorway_blue),\n panel.background = element_blank()\n )\n\ngg_services_roadless\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-21-1.png){width=672}\n:::\n:::\n\n\n\n### Making it look like a motorway sign\n\nThere is a simply beautiful [design guide for UK traffic signs](https://assets.publishing.service.gov.uk/media/5c78f8c7e5274a0ebfec719c/traffic-signs-manual-chapter-07.pdf) that goes into **all** of the detail, for instance:\n\n![](motorway-sign-design.png)\n\nAt some point it could be fun to take all of this and convert it into a `{ggplot2}` theme - but that's a lot of work. I want to focus on getting that nice round white border on my chart. That's more difficult than I originally thought, there are two pathways:\n\n- Fiddle around with grobs thanks to [Claus Wilke's great StackOverflow Answer](https://stackoverflow.com/a/48220347/1659890) on adding round corners to the panel border.\n\n- Shove a rounded rectangle onto the chart through the `geom_rrect()` function from `{ggchicklet}`... 
which is much easier:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"ggchicklet\") # remotes::install_github(\"hrbrmstr/ggchicklet\")\ngg_services_roadless +\n geom_rrect(aes(xmin = -12, xmax = 1.76, ymin = 50, ymax = 59),\n fill = \"transparent\",\n colour = \"white\",\n r = unit(0.1, 'npc'))\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-22-1.png){width=672}\n:::\n:::\n\n\n\nNow let's rebuild the chart and set the sizing to work well on export :)\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlims_x <- list(min = -12.9, max = 1.86)\nlims_y <- list(min = 50.2, max = 58.5)\n\ngg_services_roadless <- ggplot() +\n geom_rrect(aes(xmin = lims_x$min - 0.8, \n xmax = lims_x$max + 0.8, \n ymin = lims_y$min - 0.65, \n ymax = lims_y$max + 0.65),\n fill = colour_motorway_blue,\n colour = \"white\",\n size = 15,\n r = unit(0.1, 'npc')) +\n geom_sf(data = st_transform(data_sf_simpler_mainland, crs = 4326),\n fill = colour_motorway_blue,\n colour = \"white\",\n linewidth = 0.8) +\n geom_sf(data = st_transform(data_plot_services, crs = 4326),\n aes(fill = operator),\n pch = 21,\n size = 3.5,\n colour = \"white\") +\n geom_richtext(aes(x = -8.5,\n y = 53.7,\n label = \"Tiredness can kill
Take a break\"), \n size = 20,\n family = \"Transport\",,\n fill = \"transparent\",\n label.color = NA,\n colour = \"white\"\n ) +\n scale_fill_brewer(palette = \"Dark2\") +\n guides(fill = guide_legend(override.aes = list(size = 8), title = \"\", reverse = TRUE)) +\n scale_x_continuous(labels = scales::label_number(accuracy = 0.01), expand = expansion(add = 1)) +\n scale_y_continuous(labels = scales::label_number(accuracy = 0.01), expand = expansion(add = c(1, 1))) +\n coord_sf(crs = 4326,\n ylim = as.numeric(lims_y),\n xlim = as.numeric(lims_x)) + \n # theme_classic(base_family = \"Transport\") +\n theme_void(base_family = \"Transport\") +\n theme(legend.position=c(.85,.75),\n legend.text = element_text(colour = \"white\", size = 20),\n legend.spacing.y = unit(2.0, \"cm\"),\n legend.key.size = unit(1.7, \"cm\"),\n legend.background = element_rect(fill = \"transparent\", colour = \"transparent\"),\n plot.background = element_rect(fill = \"grey90\", colour = \"transparent\"),\n panel.background = element_blank(),\n plot.margin = margin(t = 1, r = 0, b = 1, l = 0)\n )\n\ngg_services_roadless\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-23-1.png){width=672}\n:::\n\n```{.r .cell-code}\nggsave(quarto_here(\"gg_services_roadless_simple.png\"),\n gg_services_roadless,\n width = 2 * 7.2,\n height = 2 * 7.5,\n bg = \"grey90\")\n```\n:::\n\n\n\n![](gg_services_roadless_simple.png)", + "markdown": "---\ntitle: \"Data Quest: Motorway Services UK\"\ndate: '2024-11-22'\nexecute:\n freeze: true\n message: false\n warning: false\ncode-fold: true\nengine: knitr\nfilters:\n - line-highlight\n---\n\n\n\n\n\n:::{layout-ncol=2}\n\n:::{.left}\nI'm working on a few ideas about Motorway Service Stations in the UK, or more specifically the mainland of Great Britain (England, Scotland and Wales). However, I was surprised to discover there weren't any well structured datasets. 
There is an **incredible** *website* available at [www.motorwayservices.info](www.motorwayservices.info) - I thoroughly recommend a visit.\n:::\n\n:::{.right}\n![](gg_services_roadless_simple.png)\n:::\n\n:::\n\n\nFor very sensible reasons they don't allow data to be scraped (we can't use `{rvest}`), so I've manually downloaded all 107 web pages for the motorway service stations and have them in this folder:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"cjhRutils\")\nlibrary(\"tidyverse\")\nlist.files(quarto_here(\"service-stations/\"), \".html\") %>% head()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Abington Services M74 - Motorway Services Information.html\" \n[2] \"Annandale Water Services A74(M) - Motorway Services Information.html\"\n[3] \"Baldock Services A1(M) - Motorway Services Information.html\" \n[4] \"Beaconsfield Services M40 - Motorway Services Information.html\" \n[5] \"Birch Services M62 - Motorway Services Information.html\" \n[6] \"Birchanger Green Services M11 - Motorway Services Information.html\" \n```\n\n\n:::\n:::\n\n\n\nNow we can use `{rvest}` to read these HTML files - let's target the info I want\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"tidyverse\")\nlibrary(\"rvest\")\nexample_abingdon <- read_html(quarto_here(\"service-stations/Abington Services M74 - Motorway Services Information.html\")) %>% \n html_nodes(\".infotext\") %>% \n html_text() %>% \n tibble(\n info = .\n ) %>% \n separate_wider_delim(info, \":\", names = c(\"property\", \"value\")) %>% \n mutate(value = str_trim(value))\n\nexample_abingdon\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 18 × 2\n property value \n \n 1 Motorway \"M74\" \n 2 Where \"at J13\" \n 3 County \"South Lanarkshire\" \n 4 Postcode \"ML12 6RG\" \n 5 Type \"Single site, used by traffic in both directi…\n 6 Operator \"Welcome Break\" \n 7 Contact Phone \"01864 502637\" \n 8 Eat-In Food \"Starbucks, Papa John's, Burger King, Dunkin'…\n 9 Takeaway Food 
/ General \"Retail Shop\" \n10 Other Non-Food Shops \"WH Smith\" \n11 Picnic Area \"yes\" \n12 Cash Machines in main building \"Yes (transaction charge applies)\" \n13 Parking Charges \"Cars free for the first 2 hours then £5 for …\n14 Other Facilities/Information \"GameZone, Tourist Information, BT Openzone\" \n15 Motel \"Days Inn Hotel Abington (Glasgow)\" \n16 Fuel Brand \"Shell\" \n17 LPG available \"Yes\" \n18 Cash Machines at fuel station \"Yes (transaction charge applies)\" \n```\n\n\n:::\n:::\n\n\n\nOkay! That's enough processing to a function I can use to read in all of the data:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nread_motorway_services_info <- function(file_path){\n name_service_station <- str_remove(basename(file_path), \" - Motorway Services Information.html\")\n \n read_html(file_path) %>% \n html_nodes(\".infotext\") %>% \n html_text() %>% \n tibble(\n info = .\n ) %>%\n separate_wider_delim(info, \":\", names = c(\"property\", \"value\"), too_many = \"merge\") %>%\n mutate(value = str_trim(value)) %>%\n mutate(service_station = name_service_station) %>% \n identity()\n}\n\nquarto_here(\"service-stations/Baldock Services A1(M) - Motorway Services Information.html\") %>% \n read_motorway_services_info()\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 19 × 3\n property value service_station\n \n 1 Motorway A1(M) Baldock Servic…\n 2 Where at J10 and from A507 Baldock Servic…\n 3 County Hertfordshire Baldock Servic…\n 4 Postcode SG7 5TR Baldock Servic…\n 5 Type Single site, used by traffic … Baldock Servic…\n 6 Operator Extra MSA Baldock Servic…\n 7 Contact Phone 01494 678876 Baldock Servic…\n 8 Eat-In Food KFC, Le Petit Four, McDonalds… Baldock Servic…\n 9 Takeaway Food / General M&S Simply Food, WH Smith (wi… Baldock Servic…\n10 Picnic Area yes Baldock Servic…\n11 Children's Playground Yes Baldock Servic…\n12 Cash Machines in main building Yes (transaction charge appli… Baldock Servic…\n13 Parking Charges First two hours free 
for all … Baldock Servic…\n14 Other Facilities/Information Fast Food & Bakeries and Conv… Baldock Servic…\n15 Motel Days Inn Stevenage North Baldock Servic…\n16 Fuel Brand Shell Baldock Servic…\n17 LPG available Yes Baldock Servic…\n18 Cash Machines at fuel station Yes (free) Baldock Servic…\n19 Other Facilities/Information Costa Express & Deli2Go avail… Baldock Servic…\n```\n\n\n:::\n\n```{.r .cell-code}\ndata_raw_services <- list.files(quarto_here(\"service-stations/\"), \"[.]html\", full.names = TRUE) %>% \n map_dfr(~read_motorway_services_info(.x))\n```\n:::\n\n\n\n## Northbound / Southbound and Eastbound / Westbound\n\nSome service stations come in pairs (*dual-site service areas or twin sites*) that are split by the motorway and yet **still have the same name**. For instance, Rownhams Services has a McDonalds when accessed westbound but not eastbound. If you looked at a map of the services it appears that they're not connected (that's an overhead sign not a footbridge!).\n\n![](rownsham-services.png)\nBut they are! There's a subway connecting them, which is [apparently difficult to discover](https://www.sabre-roads.org.uk/forum/viewtopic.php?t=44620). Thankfully, our data source [www.motorwayservices.info](www.motorwayservices.info) knows they're connected but does suggest it's a footbridge.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_raw_services %>% \n filter(service_station == \"Rownhams Services M27\") %>% \n filter(property == \"Type\") %>% \n pull(value)\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Separate facilities for each carriageway, but linked by a pedestrian footbridge\"\n```\n\n\n:::\n:::\n\n\n\nWe need a way to identify these stations. 
It turns out the \"Eat-In Food\" property is our friend and identifies the 6 twin-site stations:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nvec_eat_in_pairs <- data_raw_services %>% \n filter(property == \"Eat-In Food\",\n str_detect(value, \"Northbound|Eastbound\")) %>% \n pull(service_station)\nvec_eat_in_pairs\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n[1] \"Northampton Services M1\" \"Rownhams Services M27\" \n[3] \"Sandbach Services M6\" \"Strensham Services M5\" \n[5] \"Tibshelf Services M1\" \"Watford Gap Services M1\"\n```\n\n\n:::\n:::\n\n\n\n## Where can we eat\n\nThe Eat-In variable is the most complicated, interesting and ripe for visualisation. So let's treat it separately. First we'll identify our twin-site restaurants:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_raw_eat_in <- data_raw_services %>% \n filter(property == \"Eat-In Food\") %>% \n mutate(directional = str_detect(value,\n \"Northbound|Eastbound\"))\n\ndata_raw_directional_eat <- data_raw_eat_in %>% \n filter(directional == TRUE) %>% \n mutate(direction = case_when(\n str_detect(value, \"Northbound\") ~ \"Northbound|Southbound\",\n str_detect(value, \"Eastbound\") ~ \"Eastbound|Westbound\"\n )) %>% \n separate_longer_delim(direction,\n delim = \"|\") %>% \n mutate(value = case_when(\n direction == \"Northbound\" ~ str_extract(value,\n \"(?<=Northbound: ).*(?=Southbound)\"),\n direction == \"Southbound\" ~ str_extract(value, \"(?<=Southbound).*\"),\n direction == \"Eastbound\" ~ str_extract(value,\n \"(?<=Eastbound: ).*(?=Westbound)\"),\n direction == \"Westbound\" ~ str_extract(value,\n \"(?<=Westbound: ).*\")\n )) \n\ndata_raw_directional_eat\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 12 × 5\n property value service_station directional direction\n \n 1 Eat-In Food \"Costa, Restbite, The Burg… Northampton Se… TRUE Northbou…\n 2 Eat-In Food \" Costa, Hot Food Co., McD… Northampton Se… TRUE Southbou…\n 3 Eat-In Food \"Costa, Restbite. 
\" Rownhams Servi… TRUE Eastbound\n 4 Eat-In Food \"Costa, Restbite, McDonald… Rownhams Servi… TRUE Westbound\n 5 Eat-In Food \"Costa and Restbite, \" Sandbach Servi… TRUE Northbou…\n 6 Eat-In Food \": Costa, McDonald's, Hot … Sandbach Servi… TRUE Southbou…\n 7 Eat-In Food \"Soho Coffee Company, Hot … Strensham Serv… TRUE Northbou…\n 8 Eat-In Food \": Costa, Hot Food Co., Mc… Strensham Serv… TRUE Southbou…\n 9 Eat-In Food \"Costa, Restbite, McDonald… Tibshelf Servi… TRUE Northbou…\n10 Eat-In Food \": Costa, Restbite, McDona… Tibshelf Servi… TRUE Southbou…\n11 Eat-In Food \"Costa, Fresh Food Cafe, M… Watford Gap Se… TRUE Northbou…\n12 Eat-In Food \": Costa, Restbite, The Bu… Watford Gap Se… TRUE Southbou…\n```\n\n\n:::\n:::\n\n\n\nFrustratingly, Strensham Services has an extra little bit of data about Subway being in the Northbound Forecourt. That'll need manual removal. But other than that I think we end up with fairly well structured data for the eat-in component that we can begin to clean up.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_raw_directional_eat <- data_raw_directional_eat %>% \n mutate(value = str_remove(value, \":\"),\n value = str_remove(value, \" Northbound.*\"),\n value = str_trim(value))\n\ndata_raw_directionless_eat <- data_raw_eat_in %>% \n filter(directional == FALSE) %>% \n mutate(value = str_remove(value, \":|;\"),\n value = str_remove(value, \"(Westbound)\"),\n value = str_trim(value),\n direction = \"Directionless\")\n\ndata_clean_eat_in <- data_raw_directionless_eat %>% \n bind_rows(data_raw_directional_eat) %>% \n select(-directional)\n```\n:::\n\n\n\nThere are lots of alternative spellings in the data, here's a case_when to grab them all. 
At some point in the future it would be interesting to see if edit distances could help, but for now let's concentrate on getting a useful dataset.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nfn_fix_value_columns <- function(data){\n data %>% \n mutate(value = case_when(\n str_detect(tolower(value), \"arlo\") ~ \"Arlo's\", \n str_detect(tolower(value), \"^bk$\") ~ \"Burger King\",\n str_detect(tolower(value), \"cotton\") ~ \"Cotton Traders\", \n str_detect(tolower(value), \"chozen\") ~ \"Chozen Noodles\", \n str_detect(tolower(value), \"cornwall\") ~ \"West Cornwall Pasty Company\", \n str_detect(tolower(value), \"costa\") ~ \"Costa\", \n str_detect(tolower(value), \"eat & drink co\") ~ \"Eat & Drink Co\", \n str_detect(tolower(value), \"edc\") ~ \"Eat & Drink Co\", \n str_detect(tolower(value), \"fone\") ~ \"FoneBiz\", \n str_detect(tolower(value), \"full house\") ~ \"Full House\", \n str_detect(tolower(value), \"greg\") ~ \"Greggs\", \n str_detect(tolower(value), \"harry\") ~ \"Harry Ramsden's\", \n str_detect(tolower(value), \"hot food co\") ~ \"Hot Food Co\",\n str_detect(tolower(value), \"krispy\") ~ \"Krispy Kreme\", \n str_detect(tolower(value), \"le petit\") ~ \"Le Petit Four\", \n str_detect(tolower(value), \"lucky coin\") ~ \"Lucky Coin\", \n str_detect(tolower(value), \"m&s\") ~ \"M&S\", \n str_detect(tolower(value), \"marks\") ~ \"M&S\", \n str_detect(tolower(value), \"mcdona\") ~ \"McDonald's\", \n str_detect(tolower(value), \"papa john\") ~ \"Papa John's\", \n str_detect(tolower(value), \"pizza hut\") ~ \"Pizza Hut\", \n str_detect(tolower(value), \"quicksilver\") ~ \"Quicksilver\", \n str_detect(tolower(value), \"regus\") ~ \"Regus Business Lounge\", \n str_detect(tolower(value), \"restbite\") ~ \"Restbite\", \n str_detect(tolower(value), \"soho\") ~ \"SOHO Coffee Co\", \n str_detect(tolower(value), \"spar\") ~ \"SPAR\", \n str_detect(tolower(value), \"starbucks\") ~ \"Starbucks\", \n str_detect(tolower(value), \"the burger\") ~ \"The Burger Company\", 
\n str_detect(tolower(value), \"top gift\") ~ \"Top Gift\", \n str_detect(tolower(value), \"tourist information\") ~ \"Tourist Information\", \n str_detect(tolower(value), \"upper\") ~ \"Upper Crust\", \n str_detect(tolower(value), \"whs\") ~ \"WHSmiths\", \n str_detect(tolower(value), \"wild\") ~ \"Wild Bean Cafe\", \n tolower(value) %in% tolower(c(\"WH Smith\", \"WHSMiths\", \"Whsmith\",\"W H Smiths\", \"W.H.Smiths\", \"W H Smith\", \"WH Smiths\", \"Wh Smith\", \"WH smith\")) ~ \"WHSmiths\", \n value == \"Buger King\" ~ \"Burger King\", \n value == \"M & S Simply food\" ~ \"M&S\",\n TRUE ~ value\n ))\n}\n\ndata_long_eat_in <- data_clean_eat_in %>% \n separate_longer_delim(value,\n delim = \",\") %>% \n mutate(value = str_trim(value)) %>% \n filter(value != \"\") %>% \n fn_fix_value_columns() %>% \n select(retailer = value,\n service_station,\n direction)\n```\n:::\n\n\n\nNow... I'm a little unsure about what to do with the \"Takeaway Food / General\" property as it also contains information about where we can get food but for the 6 twin stations the direction isn't provided. 
Let's deal with the directionless other retailers now:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_long_other_shops_directionless <- data_raw_services %>% \n filter(!service_station %in% vec_eat_in_pairs) %>% \n filter(property %in% c(\"Takeaway Food / General\", \"Other Non-Food Shops\")) %>% \n select(value, service_station) %>% \n filter(value != \"01823680370\") %>% \n separate_longer_delim(value,\n delim = \",\") %>% \n mutate(value = str_trim(value, side = \"both\")) %>% \n fn_fix_value_columns() %>% \n mutate(value = str_remove(value, \"[(].*y[)]\"),\n value = str_trim(value)) %>% \n reframe(retailer = value,\n service_station = service_station,\n direction = \"Directionless\")\n\n## There's one bad record\ndata_long_other_shops_directionless <- tibble(\n retailer = c(\"Gamezone\", \"WHSmiths\", \"Waitrose\"),\n service_station = \"Newport Pagnell Services M1\",\n direction = \"Directionless\"\n) %>%\n bind_rows(filter(\n data_long_other_shops_directionless,!str_detect(retailer, \"24hr Gamezone WHSmith & Waitrose\")\n ))\n```\n:::\n\n\n\nAnd now I'll expand out the twin stations:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\n## Expand out the twins\ndata_long_other_shops_w_direction <- data_raw_services %>% \n filter(service_station %in% vec_eat_in_pairs) %>% \n filter(property %in% c(\"Other Non-Food Shops\", \"Takeaway Food / General\")) %>% \n select(value, service_station) %>% \n mutate(direction = case_when(\n service_station == \"Rownhams Services M27\" ~ \"Eastbound;Westbound\",\n TRUE ~ \"Northbound;Southbound\"\n )) %>% \n separate_longer_delim(direction,\n delim = \";\") %>% \n fn_fix_value_columns() %>% \n rename(retailer = value)\n```\n:::\n\n\n\nIt's time to combine everything together into a list of retailers which I'll export into Excel and quickly categorise.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_long_retailers <- bind_rows(data_long_eat_in, data_long_other_shops_w_direction, data_long_other_shops_directionless)\n\ndata_long_retailers 
%>% \n distinct(retailer) %>% \n arrange(retailer) %>% \n write_csv(quarto_here(\"retailer_types.csv\"))\n```\n:::\n\n\n\nLet's impose these categorisations:\n\n- is_food_retailer:\n - Do we KNOW it sells some food items?\n \n- is_restaurant:\n - Do we KNOW we can order food to eat in?\n \n- is_takeaway:\n - Do we KNOW we can order food to take away?\n \n- is_prepared_food_only\n - Do we KNOW that there is no hot/fresh food, e.g. Tesco?\n\n- is_coffee_shop\n - Do we KNOW you'd nip there for a coffee and it'll be good? Controversially, McDonald's isn't included.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"readxl\")\ndata_type_of_retailer <- read_excel(quarto_here(\"retailer_types.xlsx\"))\n\ndata_services_retailers <- data_long_retailers %>% \n left_join(data_type_of_retailer) %>% \n mutate(across(starts_with(\"is\"), ~ case_when(\n .x == \"Y\" ~ TRUE,\n .x == \"N\" ~ FALSE,\n TRUE ~ NA\n )))\n\ndata_services_retailers\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 650 × 8\n retailer service_station direction is_food_retailer is_restaurant is_takeaway\n \n 1 Starbuc… Abington Servi… Directio… TRUE NA TRUE \n 2 Papa Jo… Abington Servi… Directio… TRUE TRUE TRUE \n 3 Burger … Abington Servi… Directio… TRUE TRUE TRUE \n 4 Dunkin'… Abington Servi… Directio… TRUE FALSE TRUE \n 5 Harry R… Abington Servi… Directio… TRUE TRUE TRUE \n 6 Costa Annandale Wate… Directio… TRUE FALSE TRUE \n 7 Restbite Annandale Wate… Directio… TRUE NA NA \n 8 The Bur… Annandale Wate… Directio… TRUE TRUE TRUE \n 9 KFC Baldock Servic… Directio… TRUE TRUE TRUE \n10 Le Peti… Baldock Servic… Directio… TRUE FALSE TRUE \n# ℹ 640 more rows\n# ℹ 2 more variables: is_prepared_food_only , is_coffee_shop \n```\n\n\n:::\n:::\n\n\n\n## Non-food information\n\nThe non-food information is so much easier to deal with. 
Because I want to create an `{sf}` object and potentially support exporting as ESRI shapefiles, let's make sure our column names have a maximum of 10 characters.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_wide_services <- data_raw_services %>% \n filter(property %in% c(\"Motorway\",\n \"Where\",\n \"Postcode\",\n \"Type\",\n \"Operator\",\n \"Parking Charges\",\n \"LPG available\",\n \"Electric Charge Point\")) %>% \n mutate(property = case_when(\n property == \"LPG available\" ~ \"has_lpg\",\n property == \"Electric Charge Point\" ~ \"has_electric_charge\",\n TRUE ~ property\n )) %>% \n mutate(value = str_replace_all(value, \"ï��[0-9]{1,}\", \"£\"),\n value = str_remove_all(value, \"Â\")) %>% \n pivot_wider(names_from = property,\n values_from = value) %>% \n janitor::clean_names() %>% \n mutate(is_single_site = str_detect(type, \"Single site\"),\n is_twin_station = str_detect(type, \"Separate facilities\"),\n has_walkway_between_twins = case_when(\n is_twin_station == TRUE & str_detect(type, \"linked\") ~ TRUE,\n is_twin_station == TRUE & str_detect(type, \"no link\") ~ FALSE,\n TRUE ~ NA),\n is_ireland = str_detect(service_station, \"Ireland\")) %>% \n reframe(\n name = service_station,\n motorway,\n where,\n postcode,\n type,\n operator,\n is_ireland,\n p_charges = parking_charges,\n has_charge = has_electric_charge,\n is_single = is_single_site,\n is_twin = is_twin_station,\n has_walk = has_walkway_between_twins\n )\n```\n:::\n\n\n\nThere are some services like Gloucester Services M5 that appear as two distinct rows but they still pass `is_single == FALSE`. 
Let's identify these services and mark them in the dataset as `pair_name`.\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_paired_services <- data_wide_services %>% \n filter(is_single == FALSE) %>% \n filter(str_detect(name, \"North|South|East|West\")) %>% \n select(name) %>% \n separate_wider_delim(name, delim = \"Services\",\n names = c(\"name\", \"direction\")) %>% \n mutate(across(everything(), ~str_trim(.))) %>% \n separate_wider_delim(direction,\n delim = \" \",\n names = c(\"direction\",\n \"motorway\"),\n too_few = \"align_end\") %>% \n add_count(name, motorway) %>% \n filter(n > 1) %>% \n reframe(name = paste(name, \"Services\", direction, motorway),\n pair_name = paste(name, motorway))\n\n\ndata_services_info <- data_wide_services %>% \n left_join(data_paired_services) %>% \n mutate(is_pair = ifelse(is.na(pair_name), FALSE, TRUE))\n\ndata_services_info\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 106 × 14\n name motorway where postcode type operator is_ireland p_charges has_charge\n \n 1 Abing… M74 at J… ML12 6RG Sing… Welcome… FALSE \"Cars fr… \n 2 Annan… A74(M) at J… DG11 1HD Sing… RoadChef FALSE \"Parking… Yes (More…\n 3 Baldo… A1(M) at J… SG7 5TR Sing… Extra M… FALSE \"First t… \n 4 Beaco… M40 at J… HP9 2SE Sing… Extra M… FALSE \n 5 Birch… M62 betw… OL10 2HQ Sepa… Moto FALSE \"Car - £… \n 6 Birch… M11 at J… CM23 5QZ Sing… Welcome… FALSE \"Parking… \n 7 Black… M65 at J… BB3 0AT Sing… Extra M… FALSE \"Parking… \n 8 Blyth… A1(M) at J… S81 8HG Sing… Moto FALSE \"Free fo… Yes (More…\n 9 Bothw… M74 betw… G71 8BG Faci… RoadChef FALSE \"Parking… Yes (More…\n10 Bridg… M5 at J… TA6 6TS Sing… Moto FALSE \"Cars - … \n# ℹ 96 more rows\n# ℹ 5 more variables: is_single , is_twin , has_walk ,\n# pair_name , is_pair \n```\n\n\n:::\n:::\n\n\n\n## Getting the locations of the services...\n\nI've gone and got the coords from Google Maps and stored them in an Excel file (because I'm not perfect). 
Here's a very quick interactive `{leaflet}` map showing where they are:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"sf\")\nlibrary(\"leaflet\")\ndata_raw_service_locations <- read_excel(quarto_here(\"services-locations.xlsx\"))\n\ndata_clean_long_lat <- data_raw_service_locations %>% \n separate_wider_delim(google_pin,\n delim = \",\", \n names = c(\"lat\", \"long\")) %>% \n mutate(long = as.numeric(long),\n lat = as.numeric(lat)) %>% \n select(name, long, lat)\n\n\ndata_sf_service_locs <- data_clean_long_lat %>% \n full_join(data_services_info) %>% \n st_as_sf(coords = c(\"long\", \"lat\"),\n crs = 4326)\n \ndata_sf_service_locs %>% \n filter(is_ireland == FALSE) %>% \n leaflet() %>% \n addProviderTiles(providers$OpenStreetMap) %>% \n addCircleMarkers()\n```\n\n::: {.cell-output-display}\n\n```{=html}\n
\n\n```\n\n:::\n:::\n\n\n\n## Operators\n\nI want to make sure we're not over-counting operators due to pair sites! So let's make some explicit counts.\n\n- n_named_sites_all: How many uniquely named sites are there across Great Britain? `Gloucester Northbound Services M5` and `Gloucester Southbound Services M5` are distinctly named, but the `Birch Services M62` is listed once despite being a twinned site.\n\n- n_named_sites_mainland: same as above but discounting services in Ireland\n\n- n_named_sites_ireland: only counts uniquely named services in Ireland\n\n- n_single_sites: How many sites are accessible by traffic in both directions\n\n- n_twins: How many sites are twinned: two locations, one on each side of the motorway, with or without a walkway between them\n\n- n_pairs: How many sites have paired names, eg `Gloucester Northbound Services M5` and `Gloucester Southbound Services M5`\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_process_ops_single <- data_services_info %>% \n select(name, operator, is_single) %>% \n count(operator, is_single) %>% \n filter(is_single == TRUE) %>% \n reframe(operator,\n n_single_sites = n)\n\ndata_process_ops_twins <- data_services_info %>% \n count(operator, is_twin) %>% \n filter(is_twin == TRUE) %>% \n reframe(operator,\n n_twins = n)\n\ndata_process_ops_pair <- data_services_info %>% \n select(name, operator, is_pair) %>% \n count(operator, is_pair) %>% \n filter(is_pair == TRUE) %>% \n reframe(operator,\n n_pairs = n)\n\ndata_process_ops_simple_all <- data_services_info %>% \n count(operator, name = \"n_named_sites_all\") \n\ndata_process_ops_simple_ireland <- data_services_info %>% \n filter(str_detect(name, \"Ireland\")) %>% \n count(operator, name = \"n_named_sites_ireland\") \n\n\ndata_operators <- data_process_ops_simple_all %>% \n left_join(data_process_ops_simple_ireland) %>% \n left_join(data_process_ops_single) %>% \n left_join(data_process_ops_twins) %>% \n left_join(data_process_ops_pair) %>% \n 
mutate(across(everything(), ~replace_na(.x, 0))) %>% \n mutate(n_named_sites_mainland = n_named_sites_all - n_named_sites_ireland) %>% \n select(\n operator,\n n_named_sites_all,\n n_named_sites_mainland,\n n_named_sites_ireland,\n everything())\n\ndata_operators\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 9 × 7\n operator n_named_sites_all n_named_sites_mainland n_named_sites_ireland\n \n1 Applegreen 3 0 3\n2 BP Connect 1 1 0\n3 Euro Garages 2 2 0\n4 Extra MSA 7 7 0\n5 Moto 39 39 0\n6 RoadChef 20 20 0\n7 Stop 24 1 1 0\n8 Welcome Break 27 27 0\n9 Westmorland 6 6 0\n# ℹ 3 more variables: n_single_sites , n_twins , n_pairs \n```\n\n\n:::\n:::\n\n\n\n## Exporting all that good data\n\nI'd really love this dataset to become a Tidy Tuesday dataset! So while writing this post [I've created a fork of the repo](https://github.com/charliejhadley/tidytuesday/tree/Motorway-Services-UK/data/curated/motorway-services-uk). If my eventual pull request gets accepted we'd be able to pull the data from the official TidyTuesday repo, but until then it's available as follows\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nhead(read_csv(\"https://raw.githubusercontent.com/charliejhadley/tidytuesday/refs/heads/Motorway-Services-UK/data/curated/motorway-services-uk/data_service_locations.csv\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\n# A tibble: 6 × 15\n name long lat motorway where postcode type operator p_charges has_charge\n \n1 Abin… -3.70 55.5 M74 at J… ML12 6RG Sing… Welcome… \"Cars fr… \n2 Anna… -3.41 55.2 A74(M) at J… DG11 1HD Sing… RoadChef \"Parking… Yes (More…\n3 Bald… -0.205 52.0 A1(M) at J… SG7 5TR Sing… Extra M… \"First t… \n4 Beac… -0.630 51.6 M40 at J… HP9 2SE Sing… Extra M… \n5 Birc… -2.23 53.6 M62 betw… OL10 2HQ Sepa… Moto \"Car - £… \n6 Birc… 0.192 51.9 M11 at J… CM23 5QZ Sing… Welcome… \"Parking… \n# ℹ 5 more variables: is_single , is_twin , has_walk ,\n# pair_name , is_pair \n```\n\n\n:::\n:::\n\n\n\n\n\n## Let's make a map\n\nI've 
obtained some [high quality shapefiles for the UK from the ONS](https://geoportal.statistics.gov.uk/datasets/ons::countries-december-2023-boundaries-uk-bfc-2/about) which I'm going to immediately start throwing information away from.\n\n- There aren't any true service stations in Northern Ireland, so we'll include only England, Scotland and Wales\n\n- There are only service stations on the mainland! So let's discount any polygon with an area smaller than 1E10 m^2\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\ndata_sf_uk <- read_sf(quarto_here(\"Countries_December_2023_Boundaries_UK_BFC_-5189344684762562119/\"))\n\ndata_sf_gb_mainland <- data_sf_uk %>% \n filter(CTRY23NM != \"Northern Ireland\") %>% \n st_cast(\"POLYGON\") %>% \n mutate(area = as.numeric(st_area(geometry))) %>% \n filter(area >= 1E10) %>%\n # st_union() %>%\n st_as_sf()\n\ndata_sf_gb_mainland\n```\n\n::: {.cell-output .cell-output-stdout}\n\n```\nSimple feature collection with 3 features and 9 fields\nGeometry type: POLYGON\nDimension: XY\nBounding box: xmin: 134112.4 ymin: 11429.67 xmax: 655653.8 ymax: 976859.9\nProjected CRS: OSGB36 / British National Grid\n# A tibble: 3 × 10\n CTRY23CD CTRY23NM CTRY23NMW BNG_E BNG_N LONG LAT GlobalID \n* \n1 E92000001 England Lloegr 394883 370883 -2.08 53.2 ea73ad5d-1f4e-4f07-8e0…\n2 S92000003 Scotland Yr Alban 277744 700060 -3.97 56.2 f2267107-2e4a-442e-bc8…\n3 W92000004 Wales Cymru 263405 242881 -3.99 52.1 d818bd0d-8e08-446f-889…\n# ℹ 2 more variables: geometry , area \n```\n\n\n:::\n:::\n\n\n\nIt takes a fair amount of time to plot, so I can use `{rmapshaper}` to simplify the borders, which look okay:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"rmapshaper\")\n\ndata_sf_simpler_mainland <- ms_simplify(data_sf_gb_mainland, keep = 0.0005)\n\nggplot() +\n geom_sf(data = data_sf_simpler_mainland) +\n geom_sf(data = filter(data_sf_service_locs, is_ireland == FALSE )) +\n coord_sf(crs = 4326,\n ylim = c(50, 59))\n```\n\n::: 
{.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-20-1.png){width=672}\n:::\n:::\n\n\n\nLet's build towards an okay looking chart:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"patchwork\")\nlibrary(\"ggtext\")\n\ndata_plot_services <- data_sf_service_locs %>% \n filter(is_ireland == FALSE) %>% \n left_join(select(data_operators, operator, n_named_sites_all)) %>% \n mutate(operator = fct_reorder(operator, n_named_sites_all))\n\ncolour_motorway_blue <- \"#3070B5\"\n\ngg_services_roadless <- ggplot() +\n geom_sf(data = st_transform(data_sf_simpler_mainland, crs = 4326),\n fill = colour_motorway_blue,\n colour = \"white\",\n linewidth = 0.8) +\n geom_sf(data = st_transform(data_plot_services, crs = 4326),\n aes(fill = operator),\n pch = 21,\n size = 3.5,\n colour = \"white\") +\n geom_richtext(aes(x = -9,\n y = 54,\n label = \"Tiredness can kill
Take a break\"),\n family = \"Transport\",,\n fill = \"transparent\",\n label.color = NA,\n colour = \"white\"\n ) +\n scale_fill_brewer(palette = \"Dark2\") +\n guides(fill = guide_legend(\n # override.aes = list(size = 8), \n title = \"\", reverse = TRUE)\n ) +\n scale_x_continuous(labels = scales::label_number(accuracy = 0.01)) +\n scale_y_continuous(labels = scales::label_number(accuracy = 0.01)) +\n coord_sf(crs = 4326,\n ylim = c(50, 59),\n xlim = c(-12, 1.76)) + \n # theme_classic(base_family = \"Transport\") +\n theme_void(base_family = \"Transport\") +\n theme(legend.text = element_text(colour = \"white\"),\n # legend.spacing.y = unit(2.0, \"cm\"),\n legend.background = element_rect(fill = colour_motorway_blue, colour = \"transparent\"),\n plot.background = element_rect(fill = colour_motorway_blue),\n panel.background = element_blank()\n )\n\ngg_services_roadless\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-21-1.png){width=672}\n:::\n:::\n\n\n\n### Making it look like a motorway sign\n\nThere is a simply beautiful [design guide for UK traffic signs](https://assets.publishing.service.gov.uk/media/5c78f8c7e5274a0ebfec719c/traffic-signs-manual-chapter-07.pdf) that goes into **all** of the detail, for instance:\n\n![](motorway-sign-design.png)\n\nAt some point it could be fun to take all of this and convert it into a `{ggplot2}` theme - but that's a lot of work. I want to focus on getting that nice round white border on my chart. That's more difficult than I originally thought, there are two pathways:\n\n- Fiddle around with grobs thanks to [Claus Wilke's great StackOverflow Answer](https://stackoverflow.com/a/48220347/1659890) on adding round corners to the panel border.\n\n- Shove a rounded rectangle onto the chart through the `geom_rrect()` function from `{ggchicklet}`... 
which is much easier:\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlibrary(\"ggchicklet\") # remotes::install_github(\"hrbrmstr/ggchicklet\")\ngg_services_roadless +\n geom_rrect(aes(xmin = -12, xmax = 1.76, ymin = 50, ymax = 59),\n fill = \"transparent\",\n colour = \"white\",\n r = unit(0.1, 'npc'))\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-22-1.png){width=672}\n:::\n:::\n\n\n\nNow let's rebuild the chart and set the sizing to work well on export :)\n\n\n\n::: {.cell}\n\n```{.r .cell-code}\nlims_x <- list(min = -12.9, max = 1.86)\nlims_y <- list(min = 50.2, max = 58.5)\n\ngg_services_roadless <- ggplot() +\n geom_rrect(aes(xmin = lims_x$min - 0.8, \n xmax = lims_x$max + 0.8, \n ymin = lims_y$min - 0.65, \n ymax = lims_y$max + 0.65),\n fill = colour_motorway_blue,\n colour = \"white\",\n size = 15,\n r = unit(0.1, 'npc')) +\n geom_sf(data = st_transform(data_sf_simpler_mainland, crs = 4326),\n fill = colour_motorway_blue,\n colour = \"white\",\n linewidth = 0.8) +\n geom_sf(data = st_transform(data_plot_services, crs = 4326),\n aes(fill = operator),\n pch = 21,\n size = 3.5,\n colour = \"white\") +\n geom_richtext(aes(x = -8.5,\n y = 53.7,\n label = \"Tiredness can kill
Take a break\"), \n size = 20,\n family = \"Transport\",,\n fill = \"transparent\",\n label.color = NA,\n colour = \"white\"\n ) +\n scale_fill_brewer(palette = \"Dark2\") +\n guides(fill = guide_legend(override.aes = list(size = 8), title = \"\", reverse = TRUE)) +\n scale_x_continuous(labels = scales::label_number(accuracy = 0.01), expand = expansion(add = 1)) +\n scale_y_continuous(labels = scales::label_number(accuracy = 0.01), expand = expansion(add = c(1, 1))) +\n coord_sf(crs = 4326,\n ylim = as.numeric(lims_y),\n xlim = as.numeric(lims_x)) + \n # theme_classic(base_family = \"Transport\") +\n theme_void(base_family = \"Transport\") +\n theme(legend.position=c(.85,.75),\n legend.text = element_text(colour = \"white\", size = 20),\n legend.spacing.y = unit(2.0, \"cm\"),\n legend.key.size = unit(1.7, \"cm\"),\n legend.background = element_rect(fill = \"transparent\", colour = \"transparent\"),\n plot.background = element_rect(fill = \"grey90\", colour = \"transparent\"),\n panel.background = element_blank(),\n plot.margin = margin(t = 1, r = 0, b = 1, l = 0)\n )\n\ngg_services_roadless\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-23-1.png){width=672}\n:::\n\n```{.r .cell-code}\nggsave(quarto_here(\"gg_services_roadless_simple.png\"),\n gg_services_roadless,\n width = 2 * 7.2,\n height = 2 * 7.5,\n bg = \"grey90\")\n```\n:::\n\n\n\n![](gg_services_roadless_simple.png)", "supporting": [ "index_files" ], diff --git a/_freeze/posts/2024-11-XX_motorway-service-stations_fancy/index/execute-results/html.json b/_freeze/posts/2024-11-XX_motorway-service-stations_fancy/index/execute-results/html.json index 916f174..9ffd9b7 100644 --- a/_freeze/posts/2024-11-XX_motorway-service-stations_fancy/index/execute-results/html.json +++ b/_freeze/posts/2024-11-XX_motorway-service-stations_fancy/index/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "b6c27b9a16c32237f3c1e6413094e1ae", + "hash": "33a8d7e746a46111239cec49685e4013", "result": { 
"engine": "knitr", - "markdown": "---\ntitle: \"Star Trekking in Trek\"\ndate: '2024-11-08'\ndraft: true\nexecute:\n freeze: true\n message: false\n warning: false\n echo: false\n eval: false\ncode-fold: false\nengine: knitr\nfilters:\n - line-highlight\n---\n\n\n\n\n\n\nI've been looking for an opportunity to experiment with charts that use glow to expose borders and stumbled on the idea of visualising UK motorways via the service stations. Let's get going with the datasets I'll need.\n\n## UK Road Network\n\nThe Ordnance Survey makes available a huge dataset containing UK roads. Let's download, unzip and read in the road links. Please note this generates an `{sf}` object that's >4Gb in size.\n\n\n\n\n::: {.cell}\n\n:::\n\n\n\n\nWhat kinds of road do we have?\n\n\n\n\n::: {.cell}\n\n:::\n\n\n\n\nLet's extract out the motorways and see if we can visualise them.\n\n\n\n\n::: {.cell}\n\n:::\n\n\n\n\n### Great Britain mainland\n\nI'm interested in looking at only the mainland of Great Britain. The ONS provides high quality data from here https://geoportal.statistics.gov.uk/datasets/ons::countries-december-2023-boundaries-uk-bfc-2/about. I'm going to extract the mainland by st_cast(\"POLYGON\") and discounting polygons with an area smaller than 2e10 m^2\n\n\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n\n\n### Service stations\n\nI'm finding it harder to find a dataset of the motorway service station locations! I've manually downloaded files from https://www.motorwayservices.info/list/name and will extract info from them\n\n\n\n\n::: {.cell}\n\n:::\n", + "markdown": "---\ntitle: \"Beacons of restrooms\"\ndate: '2024-11-08'\ndraft: true\nexecute:\n freeze: true\n message: false\n warning: false\n echo: false\n eval: false\ncode-fold: false\nengine: knitr\nfilters:\n - line-highlight\n---\n\n\n\n\n\nI'm really interested in \n\n## UK Road Network\n\nThe Ordnance Survey makes available a huge dataset containing UK roads. 
Let's download, unzip and read in the road links. Please note this generates an `{sf}` object that's >4Gb in size.\n\n\n\n::: {.cell}\n\n:::\n\n\n\nTo keep this folder small, let's delete the unused files\n\n\n\n::: {.cell}\n\n:::\n\n\n\n\nWhat kinds of road do we have?\n\n\n\n::: {.cell}\n\n:::\n\n\n\nLet's extract out the motorways and see if we can visualise them.\n\n\n\n::: {.cell}\n\n:::\n\n\n\n### Great Britain mainland\n\nI'm interested in looking at only the mainland of Great Britain. The ONS provides high quality data from here https://geoportal.statistics.gov.uk/datasets/ons::countries-december-2023-boundaries-uk-bfc-2/about. I'm going to extract the mainland by st_cast(\"POLYGON\") and discounting polygons with an area smaller than 2e10 m^2\n\n\n\n::: {.cell}\n\n:::\n\n::: {.cell}\n\n:::\n\n\n\n### Service stations\n\nI'm finding it harder to find a dataset of the motorway service station locations! I've manually downloaded files from https://www.motorwayservices.info/list/name and will extract info from them\n\n\n\n::: {.cell}\n\n:::\n", "supporting": [], "filters": [ "rmarkdown/pagebreak.lua" diff --git a/_site/blog.html b/_site/blog.html index 2811178..d12d389 100644 --- a/_site/blog.html +++ b/_site/blog.html @@ -198,7 +198,7 @@ +
Categories
All (23)
30DayChartChallenge 2022 (2)
Data visualisation (2)
GIS (2)
R (1)
dataviz (5)
reproducible research (1)
shiny (2)
@@ -227,7 +227,35 @@

Blog


diff --git a/_site/blog.xml b/_site/blog.xml index 74548d8..f698cd9 100644 --- a/_site/blog.xml +++ b/_site/blog.xml @@ -10,110 +10,7 @@ Data Science Consultancy & Training quarto-1.6.32 -Fri, 08 Nov 2024 00:00:00 GMT - - Pros, Cons and Neutrals lists? - Charlie Hadley - https://visibledata.co.uk/posts/2024-11-08_positives-and-negatives-lists/ - I thought it would be interesting for me to keep better track of the Pros, Cons and Neutrals that I discover/decide on and to find a way to visualise these nicely. The idea came to me when I was wanting to create a new Quarto blogpost and googled for a quick solution to find
this issue where it’s noted

-
-

But a current limitation is that all are R focused rather than general quarto CLI tooling.

-
-

That gets to the heart of something that could put off some R users migrating from RMarkdown to Quarto… but also that’s the whole point of Quarto to be cross-platform.

-

Right! So what would a Pros / Cons / Neutrals list look like? Well, {bslib} has nice cards available. So it could be something like this:

-
-
-
-
-
-
-✅ -Quarto installs with few permissions -
-
-

Makes it really easy to use in training.

-
- - - - - -
-
-
-
-
-
-
-
-
-😐 -Quarto quick tools are difficult to make as it's a CLI! -
-
-

RMarkdown has lovely features like `blogdown::new_post()` that can't really be added to Quarto as it's a CLI! Usability tools would need to be added to the CLI instead of the wrapping package.

-
- - -
-
-
-
-
-
-
-
-
-❌ -Quarto occasionally summons demons -
-
-

It doesn't, that's a lie.

-
- - -
-
-
-
-
-
-

Feature creep

-

Ever heard of feature creep? She’s a beast.

-

I decided the fastest way to record these would be in a Google Sheet that I can then read easily into a Shiny app. But then I thought - I’d love to MoSCoW this. Which then led to me building up a bunch of data validation rules:

-

… and gosh, I’d discovered I was procrastinating. I’m really aiming to up my data blogging output and feel part of the tech community again. So, let’s settle with something that’s workable… an iOS note that I can also modify on my laptop.

-

- - -
- -

Reuse

Citation

BibTeX citation:
@online{hadley2024,
-  author = {Hadley, Charlie},
-  title = {Pros, {Cons} and {Neutrals} Lists?},
-  date = {2024-11-08},
-  url = {https://visibledata.co.uk/posts/2024-11-08_positives-and-negatives-lists/},
-  langid = {en}
-}
-
For attribution, please cite this work as:
-Hadley, Charlie. 2024. “Pros, Cons and Neutrals Lists?” -November 8, 2024. https://visibledata.co.uk/posts/2024-11-08_positives-and-negatives-lists/. -
]]> - https://visibledata.co.uk/posts/2024-11-08_positives-and-negatives-lists/ - Fri, 08 Nov 2024 00:00:00 GMT - +Fri, 22 Nov 2024 00:00:00 GMT Data Quest: Motorway Services UK Charlie Hadley @@ -141,7 +38,7 @@ November 8, 2024.
Code -
library(head()
-
[1] "Abington Services M74 - Motorway Services Information.html"          
+
[1] "Abington Services M74 - Motorway Services Information.html"          
 [2] "Annandale Water Services A74(M) - Motorway Services Information.html"
 [3] "Baldock Services A1(M) - Motorway Services Information.html"         
 [4] "Beaconsfield Services M40 - Motorway Services Information.html"      
@@ -178,7 +75,7 @@ font-style: inherit;">head()
Code -
library(str_trim(value))
 example_abingdon
-
# A tibble: 18 × 2
+
# A tibble: 18 × 2
    property                       value                                         
    <chr>                          <chr>                                         
  1 Motorway                       "M74"                                         
@@ -273,7 +170,7 @@ font-style: inherit;">str_trim(value))
 
Code -
read_motorway_services_info 
read_motorway_services_info <- read_motorway_services_info()
-
# A tibble: 19 × 3
+
# A tibble: 19 × 3
    property                       value                          service_station
    <chr>                          <chr>                          <chr>          
  1 Motorway                       A1(M)                          Baldock Servic…
@@ -392,7 +289,7 @@ font-style: inherit;">read_motorway_services_info()
Code -
data_raw_services 
data_raw_services <- read_motorway_services_info(.x))
Code -
data_raw_services 
data_raw_services %>% 
   pull(value)
-
[1] "Separate facilities for each carriageway, but linked by a pedestrian footbridge"
+
[1] "Separate facilities for each carriageway, but linked by a pedestrian footbridge"

We need a way to identify these stations. It turns out the “Eat-In Food” property is our friend and identifies the 6 twin-site stations:

Code -
vec_eat_in_pairs 
vec_eat_in_pairs <- data_raw_services pull(service_station)
 vec_eat_in_pairs
-
[1] "Northampton Services M1" "Rownhams Services M27"  
+
[1] "Northampton Services M1" "Rownhams Services M27"  
 [3] "Sandbach Services M6"    "Strensham Services M5"  
 [5] "Tibshelf Services M1"    "Watford Gap Services M1"
@@ -495,7 +392,7 @@ font-style: inherit;">pull(service_station)
Code -
data_raw_eat_in 
data_raw_eat_in <- data_raw_services "(?<=Westbound: ).*")
 data_raw_directional_eat
-
# A tibble: 12 × 5
+
# A tibble: 12 × 5
    property    value                       service_station directional direction
    <chr>       <chr>                       <chr>           <lgl>       <chr>    
  1 Eat-In Food "Costa, Restbite, The Burg… Northampton Se… TRUE        Northbou…
@@ -652,7 +549,7 @@ font-style: inherit;">"(?<=Westbound: ).*")
 
Code -
data_raw_directional_eat 
data_raw_directional_eat <- data_raw_directional_eat -directional)
Code -
fn_fix_value_columns 
fn_fix_value_columns <- retailer = value,
 
Code -
data_long_other_shops_directionless 
data_long_other_shops_directionless <- data_raw_services "24hr Gamezone WHSmith & Waitrose")
 
Code -
## Expand out the twins
 data_long_other_shops_w_direction retailer = value)
Code -
data_long_retailers 
data_long_retailers <- "retailer_types.csv"))
Code -
library(NA
 data_services_retailers
-
# A tibble: 650 × 8
+
# A tibble: 650 × 8
    retailer service_station direction is_food_retailer is_restaurant is_takeaway
    <chr>    <chr>           <chr>     <lgl>            <lgl>         <lgl>      
  1 Starbuc… Abington Servi… Directio… TRUE             NA            TRUE       
@@ -1609,7 +1506,7 @@ font-style: inherit;">NA
 
Code -
data_wide_services 
data_wide_services <- data_raw_services has_walk = has_walkway_between_twins
 
Code -
data_paired_services 
data_paired_services <- data_wide_services TRUE))
 data_services_info
-
# A tibble: 106 × 14
+
# A tibble: 106 × 14
    name   motorway where postcode type  operator is_ireland p_charges has_charge
    <chr>  <chr>    <chr> <chr>    <chr> <chr>    <lgl>      <chr>     <chr>     
  1 Abing… M74      at J… ML12 6RG Sing… Welcome… FALSE      "Cars fr… <NA>      
@@ -1983,7 +1880,7 @@ font-style: inherit;">TRUE))
 
Code -
library(addCircleMarkers()
-
- +
+
@@ -2121,7 +2018,7 @@ font-style: inherit;">addCircleMarkers()
Code -
data_process_ops_single 
data_process_ops_single <- data_services_info everything())
 data_operators
-
# A tibble: 9 × 7
+
# A tibble: 9 × 7
   operator      n_named_sites_all n_named_sites_mainland n_named_sites_ireland
   <chr>                     <int>                  <int>                 <int>
 1 Applegreen                    3                      0                     3
@@ -2330,7 +2227,7 @@ font-style: inherit;">everything())
 
Code -
head("https://raw.githubusercontent.com/charliejhadley/tidytuesday/refs/heads/Motorway-Services-UK/data/curated/motorway-services-uk/data_service_locations.csv"))
-
# A tibble: 6 × 15
+
# A tibble: 6 × 15
   name    long   lat motorway where postcode type  operator p_charges has_charge
   <chr>  <dbl> <dbl> <chr>    <chr> <chr>    <chr> <chr>    <chr>     <chr>     
 1 Abin… -3.70   55.5 M74      at J… ML12 6RG Sing… Welcome… "Cars fr… <NA>      
@@ -2363,7 +2260,7 @@ font-style: inherit;">"https://raw.githubusercontent.com/charliejhadley/tidytues
 
Code -
data_sf_uk 
data_sf_uk <- st_as_sf()
 data_sf_gb_mainland
-
Simple feature collection with 3 features and 9 fields
+
Simple feature collection with 3 features and 9 fields
 Geometry type: POLYGON
 Dimension:     XY
 Bounding box:  xmin: 134112.4 ymin: 11429.67 xmax: 655653.8 ymax: 976859.9
@@ -2442,7 +2339,7 @@ Projected CRS: OSGB36 / British National Grid
 
Code -
library(59))
Code -
library(element_blank()
 
Code -
library('npc'))
Code -
lims_x 
lims_x <- 0)
 
Code -
ggsave("grey90")

Reuse

Citation

BibTeX citation:
@online{hadley2024,
   author = {Hadley, Charlie},
   title = {Data {Quest:} {Motorway} {Services} {UK}},
-  date = {2024-11-08},
+  date = {2024-11-22},
   url = {https://visibledata.co.uk/posts/2024-11-13_motorway-service-stations-info/},
   langid = {en}
 }
 
For attribution, please cite this work as:
Hadley, Charlie. 2024. “Data Quest: Motorway Services UK.” -November 8, 2024. https://visibledata.co.uk/posts/2024-11-13_motorway-service-stations-info/. +November 22, 2024. https://visibledata.co.uk/posts/2024-11-13_motorway-service-stations-info/.
]]> https://visibledata.co.uk/posts/2024-11-13_motorway-service-stations-info/ + Fri, 22 Nov 2024 00:00:00 GMT + + + Pros, Cons and Neutrals lists? + Charlie Hadley + https://visibledata.co.uk/posts/2024-11-08_positives-and-negatives-lists/ + I thought it would be interesting for me to keep better track of the Pros, Cons and Neutrals that I discover/decide on and to find a way to visualise these nicely. The idea came to me when I was wanting to create a new Quarto blogpost and googled for a quick solution to find this issue where it’s noted

+
+

But a current limitation is that all are R focused rather than general quarto CLI tooling.

+
+

That gets to the heart of something that could put off some R users migrating from RMarkdown to Quarto… but also that’s the whole point of Quarto to be cross-platform.

+

Right! So what would a Pros / Cons / Neutrals list look like? Well, {bslib} has nice cards available. So it could be something like this:

+
+
+
+
+
+
+✅ +Quarto installs with few permissions +
+
+

Makes it really easy to use in training.

+
+ + + + + +
+
+
+
+
+
+
+
+
+😐 +Quarto quick tools are difficult to make as it's a CLI! +
+
+

RMarkdown has lovely features like `blogdown::new_post()` that can't really be added to Quarto as it's a CLI! Usability tools would need to be added to the CLI instead of the wrapping package.

+
+ + +
+
+
+
+
+
+
+
+
+❌ +Quarto occasionally summons demons +
+
+

It doesn't, that's a lie.

+
+ + +
+
+
+
+
+
+

Feature creep

+

Ever heard of feature creep? She’s a beast.

+

I decided the fastest way to record these would be in a Google Sheet that I can then read easily into a Shiny app. But then I thought - I’d love to MoSCoW this. Which then led to me building up a bunch of data validation rules:

+

… and gosh, I’d discovered I was procrastinating. I’m really aiming to up my data blogging output and feel part of the tech community again. So, let’s settle with something that’s workable… an iOS note that I can also modify on my laptop.

+

+ + +
+ +

Reuse

Citation

BibTeX citation:
@online{hadley2024,
+  author = {Hadley, Charlie},
+  title = {Pros, {Cons} and {Neutrals} Lists?},
+  date = {2024-11-08},
+  url = {https://visibledata.co.uk/posts/2024-11-08_positives-and-negatives-lists/},
+  langid = {en}
+}
+
For attribution, please cite this work as:
+Hadley, Charlie. 2024. “Pros, Cons and Neutrals Lists?” +November 8, 2024. https://visibledata.co.uk/posts/2024-11-08_positives-and-negatives-lists/. +
]]>
+ https://visibledata.co.uk/posts/2024-11-08_positives-and-negatives-lists/ Fri, 08 Nov 2024 00:00:00 GMT
diff --git a/_site/listings.json b/_site/listings.json index 612f42b..cdd073e 100644 --- a/_site/listings.json +++ b/_site/listings.json @@ -2,6 +2,7 @@ { "listing": "/blog.html", "items": [ + "/posts/2024-11-13_motorway-service-stations-info/index.html", "/posts/2024-11-08_positives-and-negatives-lists/index.html", "/posts/2024-11-08_curves-and-stones/index.html", "/posts/2024-10-28_bordering-country-graph/index.html", diff --git a/_site/posts/2024-11-13_motorway-service-stations-info/index.html b/_site/posts/2024-11-13_motorway-service-stations-info/index.html index 297c421..976083e 100644 --- a/_site/posts/2024-11-13_motorway-service-stations-info/index.html +++ b/_site/posts/2024-11-13_motorway-service-stations-info/index.html @@ -7,7 +7,7 @@ - + Data Quest: Motorway Services UK – Visible Data