- Knowledge Sharing - Strava data - Retrieve and use
- Table of Content (ToC)
- Overview
- References
- Quick starter
- Use cases
- Setup
Created by gh-md-toc
This project intends to document requirements and referential material to implement data-driven applications on top of Strava data.
Even though the members of the GitHub organization may be employed by some companies, they speak on their personal behalf and do not represent these companies.
The DataBricks examples project on GitHub explains how to setup PySpark and Jupyter Lab so that Jupyter notebooks use Spark Connect:
- GitHub - DataBricks examples -
- Most of the Jupyter notebook examples in this project make use of Spark Connect. Refer to the above-mentioned project to setup Jupyter, Spark and Spark Connect properly
- Apache Spark - Doc - Spark Connect - Overview
- Apache Spark - Doc - Spark Connect - Quick start
- GitHub - Data Engineering Helpers - Example of Jupyter notebook using Spark Connect
- StackOverflow - Strava API - How to get route image
- Google dev - Polyline algorithm
- Interactive Google utility to decode polylines
- Mark Needham - Display polylines on Leaflet with Python
- Mapbox utility in JavaScript
- Strava interactive application in React
- StackOverflow - Leaflet application to decode Polylines
- StackOverflow - How to connect VueJS and FastAPI
- GitHub - TestDriven.io - Developing a Single Page App with FastAPI and Vue.js
- Medium - How to Create a web app using FastAPI, Vuejs and MongoDB for generating and showcasing images generated by Stability AI API
- Auth0 - Hello World Full-Stack Security: Vue.js v2/JavaScript + FastAPI/Python
- From a dedicated terminal window/tab, launch Spark Connect server.
Note that the
SPARK_REMOTE
environment variable should not be set at this stage, otherwise the Spark Connect server will try to connect to the corresponding Spark Connect server and will therefore not start
$ sparkconnectstart
- From the current terminal/tab, different from the window/tab having launched
the Spark Connect server, launch PySpark from the command-line, which in
turn launches Jupyter Lab
- Follow the details given by PySpark to open Jupyter in a web browser
$ export SPARK_REMOTE="sc://localhost:15002"; pyspark
...
[C 2023-06-27 21:54:04.720 ServerApp]
To access the server, open this file in a browser:
file://$HOME/Library/Jupyter/runtime/jpserver-21219-open.html
Or copy and paste one of these URLs:
http://localhost:8889/lab?token=dd69151c26a3b91fabda4b2b7e9724d13b49561f2c00908d
http://127.0.0.1:8889/lab?token=dd69151c26a3b91fabda4b2b7e9724d13b49561f2c00908d
...
- Open Jupyter in a web browser. For instance, on MacOS:
$ open ~/Library/Jupyter/runtime/jpserver-*-open.html
- Open a notebook, for instance
ipython-notebooks/strava-fetch-and-display-activities.ipynb
(source code on GitHub)- Run the cells. The last cell should display a Leaflet map with the activities/trips retrieved from Strava
- Open the
strava-fetch-and-display-activities.ipynb
notebook (strava-fetch-and-display-activities.ipynb
source code on GitHub)- Run the cells. The last cell should display a Leaflet map with the activities/trips retrieved from Strava
- Authenticate with the Strava API as explained in the
Generate an access token section
- Store the access tokem as an environment variable:
$ export STRAVA_ACCESS_TOKEN="<the-strava-api-access-token>"
- Retrieve the trips from Strava into a CSV file (namely
data/private/strava-activity-polylines.csv
, which is ignored by GitHub):
$ python python/strava-leaflet-app/retrieve-strava-activities.py
$ ls -lFh data/private/strava-activity-polylines.csv
-rw-r--r-- 1 user group 88K Jul 4 16:44 data/private/strava-activity-polylines.csv
-
Retrieve the geographical coordinates of the center of the map. For instance, open Google Maps (https://maps.google.com) and center it on the area where the Strava trips are to be displayed. Copy the geographical coordinates (latitude and longitude) from the URL, as on the following screen capture
-
Update the coordinates in the
python/strava-leaflet-app/templates/leaflet.html
HTML template file for instance with a text editor -
Launch the Python Flask application to decode and display the polylines on an Open Street Map (OSM) Leaflet:
$ python python/strava-leaflet-app/strava-leaflet-app.py
* Serving Flask app 'strava-leaflet-app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5001
Press CTRL+C to quit
- Open the browser on http://localhost:5001 It should display something like
- Authenticate with the Strava API as explained in the
Generate an access token section
- Store the access tokem as an environment variable:
$ export STRAVA_ACCESS_TOKEN="<strava-api-access-token>"
- Retrieve a few details from the Strava profile:
$ curl -s -X GET https://www.strava.com/api/v3/athlete -H "Authorization: Bearer $STRAVA_ACCESS_TOKEN" | jq
{
"id": 123456789,
"username": null,
"resource_state": 2,
"firstname": "John",
"lastname": "Doe",
"bio": null,
"city": "Lille",
"state": "Nord",
"country": "France",
"sex": "M",
"premium": false,
"summit": false,
"created_at": "2023-07-02T15:32:42Z",
"updated_at": "2023-07-03T10:03:39Z",
"badge_type_id": 0,
"weight": 80,
"profile_medium": "https://graph.facebook.com/1234567890123456/picture?height=256&width=256",
"profile": "https://graph.facebook.com/1234567890123456/picture?height=256&width=256",
"friend": null,
"follower": null
}
- Reference: https://developers.strava.com/docs/reference/#api-Activities-getLoggedInAthleteActivities
curl -s -X GET https://www.strava.com/api/v3/athlete/activities -H 'Authorization: Bearer 82305f579f99247b6652e41b9e42b11c4b5e1185' | jq 2>&1 | tee data.private/strava-activities.json
cat data.private/strava-activities.json | jq
[
{
"resource_state": 2,
"athlete": {
"id": 123456789,
"resource_state": 1
},
"name": "Evening Ride",
"distance": 1471.8,
"moving_time": 279,
"elapsed_time": 279,
"total_elevation_gain": 0,
"type": "Ride",
"sport_type": "Ride",
"workout_type": 12,
"id": 9376697877,
"start_date": "2023-07-02T18:31:09Z",
"start_date_local": "2023-07-02T20:31:09Z",
"timezone": "(GMT+01:00) Europe/Paris",
"utc_offset": 7200,
"location_city": null,
"location_state": null,
"location_country": "France",
"achievement_count": 0,
"kudos_count": 6,
"comment_count": 0,
"athlete_count": 1,
"photo_count": 0,
"map": {
"id": "a123456789",
"summary_polyline": "}adtHo`yQ_@{@kA{AcEqGiNgUqEsHyBeDgDeGeDeFoK{P_@q@JUNQHCLHDC",
"resource_state": 2
},
"trainer": false,
"commute": true,
"manual": false,
"private": false,
"visibility": "followers_only",
"flagged": false,
"gear_id": "b12769770",
"start_latlng": [
50.65,
3.08
],
"end_latlng": [
50.66,
3.09
],
"average_speed": 5.275,
"max_speed": 7.2,
...
"has_kudoed": false
},
…
]
- Extract a few details from the JSON file as a CSV file:
$ cat data.private/strava-activities.json | jq -r '.[]|[.id,.start_date,.name,.type,.sport_type,.distance,.elev_high,.elev_low,.moving_time,.elapsed_time,.average_speed,.max_speed,.average_watts,.kilojoules,.average_heartrate,.location_country,.timezone,.utc_offset,.start_latlng[],.end_latlng[],.private,.gear_id]|@csv' | sed -e 's/"//g'
123456789,2023-07-02T18:31:09Z,Evening Ride,Ride,Ride,1471.8,39.4,25.6,279,279,5.275,7.2,57.8,16.1,140.1,France,(GMT+01:00) Europe/Paris,7200,50.65,3.08,50.66,3.09,false,123456789
…
-
Reference: https://developers.strava.com/docs/reference/#api-Activities-getActivityById
-
Launch cURL:
$ curl -s -X GET "https://www.strava.com/api/v3/activities/123456789" -H 'Authorization: Bearer 82305f579f99247b6652e41b9e42b11c4b5e1185' | jq 2>&1 | tee strava-activity-detail.json
{
"resource_state": 3,
"athlete": {
"id": 123456789,
"resource_state": 1
},
"name": "Lunch Gravel Ride",
"distance": 55984.1,
...
"map": {
"id": "a9374290388",
"polyline": "oyfXxxxE@@@",
"resource_state": 3,
"summary_polyline": "_tftXxxxiDpE"
},
"trainer": false,
...
"start_latlng": [
50.66,
3.10
],
"end_latlng": [
50.66,
3.10
],
"average_speed": 5.111,
...
"segment_efforts": [
{
"id": 123456789,
"resource_state": 2,
"name": "Avenue de la République - Croisé-Laroche - Clémenceau (vers l'ouest)",
"activity": {
"id": 123456789,
"resource_state": 1
},
...
"segment": {
"id": 20239919,
"resource_state": 2,
"name": "Avenue de la République - Croisé-Laroche - Clémenceau (vers l'ouest)",
"activity_type": "Ride",
...
},
"pr_rank": 3,
"achievements": [
{
"type_id": 3,
"type": "pr",
"rank": 3
}
],
"hidden": false
},
…
],
"splits_metric": [
{
"distance": 1000.9,
"elapsed_time": 218,
...
"pace_zone": 0
},
…
],
"splits_standard": [
{
"distance": 1610.3,
"elapsed_time": 334,
...
"pace_zone": 0
},
…
],
"laps": [
{
"id": 31984370773,
"resource_state": 2,
"name": "Lap 1",
...
"split": 1
}
],
"gear": {
"id": "123456789",
"primary": false,
"name": "RockRider ST540 S",
...
"converted_distance": 1100.9
},
"photos": {
...
},
"stats_visibility": [
{
"type": "heart_rate",
"visibility": "everyone"
},
...
],
"hide_from_home": false,
"device_name": "Apple Watch SE",
...
}
The quickest to decode a polyline is to go on the dedicated Google utility and:
- Paste the encoded polyline, and/or polyline summary, in the "Encoded Polyline" form field at the bottom of the page
- Check the "Unescape special characters in the encoded polylines" box
- Click on the "Decode Polyline" button at the bottom of the page
- Confirm in the dialog box/pop up window
- If PySpark is to be used with Spark Connect, which makes the whole process more repeatable/industrial, there are some subtleties in the setup process. Follow the instructions on GitHub - DataBricks example peoject for more details.
-
It is recommended to use PyEnv and to install a fairly recent stable Python environment (for instance, at the time of writing, Python 3.10.17 or 3.11.4)
-
Update the Pip utility, if needed:
$ python -mpip install -U pip
- Install a few Python libraries:
$ python -mpip install -U plotly pyvis folium
- Install PySpark:
$ python -mpip install -U pyspark[connect,sql,pandas_on_spark] pytest-spark
- Install JupterLab:
$ python -mpip install -U jupyterlab
- Install a few JupyterLab extensions (e.g., Leaflet):
$ jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-leaflet
To start developing with the Strava API, you will need to make an application
- If you have not already, go to https://www.strava.com/register and sign up for a Strava account.
- After you are logged in, go to https://www.strava.com/settings/api and create an app.
- You should see the “My API Application” page now. Here is what everything
means:
- Category: The category you chose for your application
- Club: Will show if you have a club associated with your application
- Client ID: Your application ID
- Client Secret: Your client secret (please keep this confidential)
- Authorization token: Your authorization token which will change every six hours (please keep this confidential)
- Your Refresh token: The token you will use to get a new authorization token (please keep this confidential)
- Rate limits: Your current rate limit
- Authorization Callback Domain: When building your app, change “Authorization Callback Domain” to localhost or any domain. When taking your app live, change “Authorization Callback Domain” to a real domain.
- The authorization process has to be done just once every so often
- The user (you, me) has to authorize the application (this application we are building here) to use the Strava API. The application is known to Strava API and appears on https://www.strava.com/settings/apps . If this is not the case yet, one can easily register a new application to use the Strava API (on https://www.strava.com/settings/api)
- Be sure to have the following Strava API details at hand,
from https://www.strava.com/settings/api:
- Client ID
- Client secret
- Save those details as environment variables (e.g., in the
~/.bashrc
/~/.zshrc
file):
export STRAVA_CLIENT_ID="<the-strava-client-id>"
export STRAVA_CLIENT_SECRET="<the-strava-client-secret>"
export STRAVA_ACCESS_TOKEN="" # empty for now
export STRAVA_REFRESH_TOKEN="" # empty for now
- Get a Strava API authorization code by opening the following link
(be careful to replace
strava-api-client-id
with your own Strava API client ID, which should read something like 1234567):- Open http://www.strava.com/oauth/authorize?client_id=strava-api-client-id&response_type=code&redirect_uri=http://localhost/exchange_token&approval_prompt=force&scope=read_all,activity:read_all
- Click on the Authorize button
- In the URL of the just opened page (displaying an error), copy the
code
value, it corresponds to the Strava API authorization code
- The Strava API authorization code remains valid for a long time (several days). You can save it along with other passwords (e.g., in a password manager) or in a private MS Word/Google Doc/text document.
- The authorization code is then used to generate both:
- A refresh token, which has roughly the same validity as the authorization code
- An access token, which itself is valid only for a limited period of time (typically, a few hours). That access token may be re-generated as many times as needed thanks to the refresh token. The access token is what the Strava API needs to answer to API requests
Perform either of the following two tasks
- To be performed when the refresh token is deprecated or when there is no refresh token yet
- Use cURL on the command-line to create the access code (and the refresh code):
curl -s -X POST https://www.strava.com/oauth/token -F client_id=$STRAVA_CLIENT_ID -F client_secret=$STRAVA_CLIENT_SECRET -F code=<strava-api-authorization-code> -F grant_type=authorization_code | jq
{
"token_type": "Bearer",
"expires_at": 1688424735,
"expires_in": 19659,
"refresh_token": "1a5b02some-tokena42a15",
"access_token": "7f988some-token950db8",
"athlete": {
"id": 123456789,
"username": null,
"resource_state": 2,
"firstname": "John",
"lastname": "Doe",
"bio": null,
"city": "Lille",
"state": "Nord",
"country": "France",
"sex": "M",
"premium": false,
"summit": false,
"created_at": "2023-07-02T15:32:42Z",
"updated_at": "2023-07-03T10:03:39Z",
"badge_type_id": 0,
"weight": 80,
"profile_medium": "https://graph.facebook.com/1234567890123456/picture?height=256&width=256",
"profile": "https://graph.facebook.com/1234567890123456/picture?height=256&width=256",
"friend": null,
"follower": null
}
}
- Store the access and refresh tokens, copied from above, as environment variables:
export STRAVA_ACCESS_TOKEN="<strava-api-access-token>"
export STRAVA_REFRESH_TOKEN="<strava-api-refresh-token>"
- Use cURL on the command-line to create the access code (and the refresh code):
curl -s -X POST https://www.strava.com/oauth/token -F client_id=$STRAVA_CLIENT_ID -F client_secret=$STRAVA_CLIENT_SECRET -F refresh_token=$STRAVA_REFRESH_TOKEN -F grant_type=refresh_token | jq
{
"token_type": "Bearer",
"access_token": "7f988some-token950db8",
"expires_at": 1689274725,
"expires_in": 12082,
"refresh_token": "1a5b02some-tokena42a15"
}
- Store the access token, copied from above, as an environment variable:
export STRAVA_ACCESS_TOKEN="<strava-api-access-token>"