trajectory-datasets

A curated datasets list of raw trajectories that can be used for trajectory classification.

Datasets

Name	Description	Availability	Classification Goal
`geolife`	Records of people outdoor movements	microsoft.com	Transportation mode: walk • bike • bus • car • subway • train • airplane • boat • run • motorcycle
`animals`	Elk, deer and cattle dataset. Starkey Project	github.com	Animal species Elk • Deer • Cattle
`hurdat2`	Atlantic hurricane database	nhc.noaa.gov	Huracane intensity (Saffir-Simpson scale): 0 • 1 • 2 • 3 • 4 • 5 (Zero means was not a huracane)
`mnist_stroke`	Sequences of strokes representing handwritten digits	edwin-de-jong.github.io	Decimal digits: 1 • 2 • 3 • 4 • 5 • 6 • 7 • 8 • 9 • 0
`cma_bst`	Western North Pacific tropical cyclone database	tcdata.typhoon.org.cn	Huracane intensity: Weaker or unknown • Tropical Depression • Tropical Storm • Severe Tropical Storm • Typhoon • Severe Typhoon • Super Typhoon • Extratropical Cyclone
`uci_gotrack`	Cars and buses GPS trayectories	ics.uci.edu	Transportation mode: bus • car
`uci_pen_digits`	Pen-Based Recognition of Handwritten Digits	ics.uci.edu	Decimal digits: 1 • 2 • 3 • 4 • 5 • 6 • 7 • 8 • 9 • 0
`uci_characters`	Character Trajectories Data Set	ics.uci.edu	Characters: a • b • c • d • e • g • h • l • m • n • o • p • q • r • s • u • v • w • y • z
`uci_movement_libras`	LIBRAS (brazilian signal language) movement dataset	ics.uci.edu	Movement type: curved swing • horizontal swing • vertical swing • anti-clockwise arc • clockwise arc • circle • horizontal straight-line • vertical straight-line • horizontal zigzag • vertical zigzag • horizontal wavy • vertical wavy • face-up curve • face-down curve • tremble
`traffic`	Traffic dataset over a road section	zen-traffic-data	Vehicle type: normal • large
`stochastic_models`	Trajectories generated using statistical models	here	Model used: Random Walk • Langevin Ecquation • Diffusing Diffusivity

Standarized versions of the above datasets

Since each dataset offers its data in a quite different format, we included some scripts for fetching and transforming them into a standard format. This will allow testing different analysis tools independently of the dataset.

A sandarized version is a single json file for each dataset, that contains the following keys:

name: Name of the dataset
version: Integer that indicates the version
trajs: List of trajectories contained in the dataset
labels: List of the labels associated to each trajectory

The standarized versions of the datasets are available in the releases page of this repository. Moreover, you could generate the standarized versions yourself by cloning this repo, and running build.py.

Loading trajectories from standarized datasets

Since the standarized format is a plain-text json file, it can be loaded in a vast variety of programming languages and json-compatible tools.

However, we recommend you to use yupi to load the datasets if you are using Python. A sample script could be:

import json
import yupi

with open('geolife.json', "r", encoding="utf-8") as f:
    dataset = json.load(f)
    name, version = dataset['name'], dataset['version']
    trajs = [yupi.core.JSONSerializer.from_json(traj) for traj in dataset['trajs']]
    labels = dataset['labels']

This approach will populate trajs as a list of yupi.Trajectory objects, which you can use with all the resources offered by yupi library.

If you are planning to use a dataset for Trajectory Classification, you could use pactus library instead of yupi. It is a framework designed to evaluate Trajectory Classification methods and it is (and will always be) compatible with all the datasets in this repository, by simply doing for example:

import pactus

geolife_dataset = pactus.Dataset.geolife()  
trajs, labels = geolife_dataset.trajs, geolife_dataset.labels

You don't need to download the dataset in advance. The library will do it for you only the first time you use a dataset. Here, trajs is also a list of yupi.Trajectory objects.

Adding datasets to this repository

New datasets are always welcome to this repository. We only need to ensure that those can be freely accessed and are relevant for raw-trajectory classification.

If you know about a potentially interesting dataset which is not already in this repository, you can open a Github Issue providing the information and we will integrate it as soon as possible.

Otherwise, you can integrate it yourself by:

Forking this project.
Writting a 'recipe' for the dataset. A Python script that downloads the original dataset and converts it to the standarized version. You can take a look at the existing recipes into the recipies folder.
Store your recipe script in the recipies folder and run build.py.
Make sure you got no errors and build.py successfully generated your compressed json file in the builds folder.
Add the dataset metadata to the table at the begining of this README.
Create a pull-request from your fork.

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
.github/workflows		.github/workflows
recipes		recipes
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
build.py		build.py
config.py		config.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

trajectory-datasets

Datasets

Standarized versions of the above datasets

Loading trajectories from standarized datasets

Adding datasets to this repository

About

Releases 6

Packages

Contributors 3

Languages

License

yupidevs/trajectory-datasets

Folders and files

Latest commit

History

Repository files navigation

trajectory-datasets

Datasets

Standarized versions of the above datasets

Loading trajectories from standarized datasets

Adding datasets to this repository

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 6

Packages 0

Contributors 3

Languages

Packages