trajectory-datasets

A curated list of datasets of raw trajectories that can be used for trajectory classification.

Datasets

| Name | Description | Availability | Classification Goal |
|------|-------------|--------------|---------------------|
| geolife | Records of people's outdoor movements | microsoft.com | Transportation mode: walk • bike • bus • car • subway • train • airplane • boat • run • motorcycle |
| animals | Elk, deer, and cattle dataset (Starkey Project) | github.com | Animal species: Elk • Deer • Cattle |
| hurdat2 | Atlantic hurricane database | nhc.noaa.gov | Hurricane intensity (Saffir-Simpson scale): 0 • 1 • 2 • 3 • 4 • 5 (zero means it was not a hurricane) |
| mnist_stroke | Sequences of strokes representing handwritten digits | edwin-de-jong.github.io | Decimal digits: 1 • 2 • 3 • 4 • 5 • 6 • 7 • 8 • 9 • 0 |
| cma_bst | Western North Pacific tropical cyclone database | tcdata.typhoon.org.cn | Hurricane intensity: Weaker or unknown • Tropical Depression • Tropical Storm • Severe Tropical Storm • Typhoon • Severe Typhoon • Super Typhoon • Extratropical Cyclone |
| uci_gotrack | Car and bus GPS trajectories | ics.uci.edu | Transportation mode: bus • car |
| uci_pen_digits | Pen-Based Recognition of Handwritten Digits | ics.uci.edu | Decimal digits: 1 • 2 • 3 • 4 • 5 • 6 • 7 • 8 • 9 • 0 |
| uci_characters | Character Trajectories Data Set | ics.uci.edu | Characters: a • b • c • d • e • g • h • l • m • n • o • p • q • r • s • u • v • w • y • z |
| uci_movement_libras | LIBRAS (Brazilian sign language) movement dataset | ics.uci.edu | Movement type: curved swing • horizontal swing • vertical swing • anti-clockwise arc • clockwise arc • circle • horizontal straight-line • vertical straight-line • horizontal zigzag • vertical zigzag • horizontal wavy • vertical wavy • face-up curve • face-down curve • tremble |
| traffic | Traffic dataset over a road section | zen-traffic-data | Vehicle type: normal • large |
| stochastic_models | Trajectories generated using statistical models | here | Model used: Random Walk • Langevin Equation • Diffusing Diffusivity |

Standardized versions of the above datasets

Since each dataset offers its data in a different format, we include scripts for fetching each one and transforming it into a standard format. This allows testing different analysis tools independently of the dataset.

A standardized version is a single JSON file per dataset that contains the following keys:

  • name: Name of the dataset
  • version: Integer that indicates the version
  • trajs: List of trajectories contained in the dataset
  • labels: List of the labels associated with each trajectory
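
For illustration, a standardized file has roughly this shape. The values below are made up, and the entries of trajs are shown as placeholders: the actual trajectory encoding is the one produced by yupi's JSON serializer.

```json
{
  "name": "geolife",
  "version": 1,
  "trajs": ["<serialized trajectory 1>", "<serialized trajectory 2>"],
  "labels": ["walk", "bike"]
}
```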

The standardized versions of the datasets are available on the releases page of this repository. You can also generate them yourself by cloning this repo and running build.py.

Loading trajectories from standardized datasets

Since the standardized format is a plain-text JSON file, it can be loaded in a wide variety of programming languages and JSON-compatible tools.
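
For example, with no dependencies beyond Python's built-in json module you can already read a standardized file and inspect its keys. The snippet below builds a tiny sample file first so it is self-contained; the trajectory entries stay in their serialized form until a tool such as yupi decodes them.

```python
import json

# Build a tiny sample with the standardized keys
# (real files come from the releases page of this repository).
sample = {
    "name": "demo",
    "version": 1,
    "trajs": [{"id": "0"}, {"id": "1"}],  # placeholders for serialized trajectories
    "labels": ["walk", "bike"],
}
with open("demo.json", "w", encoding="utf-8") as f:
    json.dump(sample, f)

# Any JSON-capable tool can now read the dataset back.
with open("demo.json", "r", encoding="utf-8") as f:
    dataset = json.load(f)

print(dataset["name"], dataset["version"])            # demo 1
print(len(dataset["trajs"]), len(dataset["labels"]))  # 2 2
```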

However, if you are using Python, we recommend using yupi to load the datasets. A sample script could be:

```python
import json
import yupi

with open("geolife.json", "r", encoding="utf-8") as f:
    dataset = json.load(f)
    name, version = dataset["name"], dataset["version"]
    trajs = [yupi.core.JSONSerializer.from_json(traj) for traj in dataset["trajs"]]
    labels = dataset["labels"]
```

This approach populates trajs as a list of yupi.Trajectory objects, which you can use with all the resources offered by the yupi library.

If you plan to use a dataset for trajectory classification, you could use the pactus library instead of yupi. It is a framework designed to evaluate trajectory classification methods, and it is (and will always be) compatible with all the datasets in this repository. For example:

```python
import pactus

geolife_dataset = pactus.Dataset.geolife()
trajs, labels = geolife_dataset.trajs, geolife_dataset.labels
```

You don't need to download the dataset in advance; the library does it for you the first time you use a dataset. Here, trajs is also a list of yupi.Trajectory objects.

Adding datasets to this repository

New datasets are always welcome in this repository. We only need to ensure that they can be freely accessed and are relevant for raw-trajectory classification.

If you know about a potentially interesting dataset that is not already in this repository, you can open a GitHub Issue providing the information and we will integrate it as soon as possible.

Otherwise, you can integrate it yourself by:

  1. Forking this project.
  2. Writing a 'recipe' for the dataset: a Python script that downloads the original dataset and converts it to the standardized version. You can take a look at the existing recipes in the recipies folder.
  3. Store your recipe script in the recipies folder and run build.py.
  4. Make sure there are no errors and that build.py successfully generated your compressed JSON file in the builds folder.
  5. Add the dataset metadata to the table at the beginning of this README.
  6. Create a pull request from your fork.
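
The conversion a recipe performs can be sketched as follows. This is a minimal sketch, not the actual interface build.py expects (consult the existing recipes for that): the download_raw_data helper, the toy trajectories, and the file name are all hypothetical, and a real recipe would serialize yupi trajectories rather than raw point lists.

```python
import json


def download_raw_data():
    """Hypothetical helper standing in for the download/parse step.

    A real recipe would fetch and parse the original dataset;
    here we just return two toy trajectories with their labels.
    """
    trajs = [
        [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],  # trajectory as (x, y) points
        [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1)],
    ]
    labels = ["car", "walk"]
    return trajs, labels


def build_standardized(name: str, version: int, path: str) -> None:
    """Write a JSON file with the four standardized keys described above."""
    trajs, labels = download_raw_data()
    dataset = {
        "name": name,
        "version": version,
        "trajs": trajs,   # a real recipe stores serialized yupi trajectories here
        "labels": labels,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dataset, f)


build_standardized("my_dataset", 1, "my_dataset.json")
```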