This repository offers a collection of descriptions and download links for datasets related to Time Series Analysis, including tasks like forecasting, classification, and anomaly detection.

jinkwan1115/TimeSeriesDatasets

TimeSeries Datasets

Datasets for Time Series Forecasting

  • Six Base Datasets

    • Electricity Transformer Temperature(ETT) datasets

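      • ETTh1, ETTh2 (1-hour intervals) and ETTm1, ETTm2 (15-minute intervals) / 6 load variables / Target: 'OT' (oil temperature)
        • HUFL / HULL (High UseFul Load / High UseLess Load)
        • MUFL / MULL (Middle UseFul Load / Middle UseLess Load)
        • LUFL / LULL (Low UseFul Load / Low UseLess Load)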
    • Electricity

    • Weather

    • Exchange

    • Traffic

    • Influenza-like Illness(ILI)

  • Monash Time Series Forecasting Archive (NeurIPS 2021)

    • https://forecastingdata.org
    • M1
      • yearly
      • quarterly
      • monthly
    • M3
      • yearly
      • quarterly
      • monthly
      • other
    • M4
      • yearly
      • quarterly
      • monthly
      • weekly
      • daily
      • hourly
    • NN5
      • daily
        • with missing values
        • without missing values
      • weekly
    • Tourism
    • CIF 2016
    • London Smart Meters
    • Australia Electricity Demand
    • Wind Farms
    • Dominick
    • Bitcoin
    • Pedestrian Counts
    • Vehicle Trips
    • KDD Cup 2018
    • Weather
    • Web Traffic
    • Solar
    • Electricity
    • Car Parts
    • FRED-MD
    • San Francisco Traffic
    • Rideshare
    • Hospital
    • COVID Deaths
    • Temperature Rain
    • Sunspot
    • Saugeen River Flow
    • US Births
    • Solar Power
    • Wind Power
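
For the ETT files listed under Six Base Datasets, the six load variables and the 'OT' target can be separated as sketched below. This is only an illustration using the Python standard library: it assumes the CSV header is `date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT`, the helper name `split_ett_rows` is hypothetical, and the sample rows are made-up placeholder values rather than real measurements.

```python
import csv
import io

# Assumed column layout of an ETT CSV file: a "date" timestamp column,
# six load variables, and the target "OT".
LOAD_COLUMNS = ["HUFL", "HULL", "MUFL", "MULL", "LUFL", "LULL"]
TARGET_COLUMN = "OT"


def split_ett_rows(csv_text):
    """Parse ETT-style CSV text into (timestamps, load_features, target)."""
    timestamps, features, target = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        timestamps.append(row["date"])
        features.append([float(row[name]) for name in LOAD_COLUMNS])
        target.append(float(row[TARGET_COLUMN]))
    return timestamps, features, target


# Placeholder rows with made-up values, used only to exercise the parser.
SAMPLE = """date,HUFL,HULL,MUFL,MULL,LUFL,LULL,OT
2000-01-01 00:00:00,1.0,2.0,3.0,4.0,5.0,6.0,7.0
2000-01-01 01:00:00,1.5,2.5,3.5,4.5,5.5,6.5,7.5
"""

timestamps, features, target = split_ett_rows(SAMPLE)
print(len(timestamps), "rows,", len(features[0]), "load variables, target:", target)
```

To use it on a downloaded file, pass the file contents, for example `split_ett_rows(open("ETTh1.csv").read())`, where the path is whatever location the file was saved to.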

Datasets for Spatio-Temporal Analysis

Datasets for Time Series Anomaly Detection

  • Secure Water Treatment(SWaT) dataset

    • You need to request dataset access from iTrust Labs(https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/#swat)
    • This dataset is collected from a real-world water treatment plant with 7 days of normal and 4 days of abnormal operation
    • This dataset consists of sensor values (water level, flow rate, etc.) and actuator operations (valves and pumps)
  • Water Distribution(WADI) dataset

    • You also need to request dataset access from iTrust Labs(https://itrust.sutd.edu.sg/itrust-labs_datasets/dataset_info/#swat)
    • This is an extension of the SWaT system, with more than twice as many sensors and actuators as SWaT
    • The dataset also covers a longer duration: 14 days of normal operation and 2 days of attack scenarios
  • Soil Moisture Active Passive(SMAP) dataset

  • Mars Science Laboratory(MSL) dataset

    • Similar to SMAP, but corresponds to the sensor and actuator data for the Mars rover itself
    • However, this dataset is known to contain many trivial sequences, with three non-trivial ones (A4, C2 and T1)
    • [KDD 2018 paper / NASA JPL] Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding
    • [KDD 2019 paper] Robust Anomaly Detection for Multivariate Time Series through Stochastic Recurrent Neural Network
  • Server Machine Dataset(SMD)

  • Numenta Anomaly Benchmark(NAB)

    • https://github.com/numenta/NAB
    • [Neurocomputing 2017] Unsupervised real-time anomaly detection for streaming data
    • A dataset of multiple real-world data traces, including readings from temperature sensors, CPU utilization of cloud machines, service request latencies and taxi demand in New York City
    • However, this dataset is known to have sequences with incorrect anomaly labels such as the nyc-taxi trace
    • Real data
      • realAWSCloudwatch
        • AWS server metrics as collected by the Amazon CloudWatch service. Example metrics include CPU Utilization, Network Bytes In, and Disk Read Bytes
      • realAdExchange
        • Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM)
      • realKnownCause
        • This is data for which we know the anomaly causes; no hand labeling.
        • ambient_temperature_system_failure.csv
          • The ambient temperature in an office setting
        • cpu_utilization_asg_misconfiguration.csv
          • From Amazon Web Services (AWS) monitoring CPU usage – i.e. average CPU usage across a given cluster. When usage is high, AWS spins up a new machine, and uses fewer machines when usage is low
        • ec2_request_latency_system_failure.csv
          • CPU usage data from a server in Amazon's East Coast datacenter
          • The dataset ends with a complete system failure resulting from a documented failure of AWS API servers (there is an interesting story behind this data in the Numenta blog)
        • machine_temperature_system_failure.csv
          • Temperature sensor data of an internal component of a large, industrial machine
          • The first anomaly is a planned shutdown of the machine
          • The second anomaly is difficult to detect and directly led to the third anomaly, a catastrophic failure of the machine
        • nyc_taxi.csv (exclude this due to incorrect anomaly labels)
          • Number of NYC taxi passengers, where the five anomalies occur during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snowstorm
          • The raw data is from the NYC Taxi and Limousine Commission
          • The data file included here aggregates the total number of taxi passengers into 30-minute buckets
        • rogue_agent_key_hold.csv
          • Timing the key holds for several users of a computer, where the anomalies represent a change in the user
        • rogue_agent_key_updown.csv
          • Timing the key strokes for several users of a computer, where the anomalies represent a change in the user
      • realTraffic
        • Real time traffic data from the Twin Cities Metro area in Minnesota, collected by the Minnesota Department of Transportation
        • Metrics include occupancy, speed, and travel time from specific sensors
      • realTweets
        • A collection of Twitter mentions of large publicly-traded companies such as Google and IBM
        • The metric value represents the number of mentions for a given ticker symbol every 5 minutes
    • Artificial data
      • artificialNoAnomaly
        • Artificially generated data without any anomalies
      • artificialAnomaly
        • Artificially generated data with varying types of anomalies
  • HexagonML/UCR Time Series Anomaly Detection Datasets

  • PhysioNet Open Access Databases - MIT-BIH Supraventricular Arrhythmia(MBA) Database

  • Multi-Source Distributed System(MSDS) Dataset

  • Synthetic Dataset from MSCRED (MSCRED GitHub page)

  • Yahoo S5 Dataset Link

    • Access must be requested for research purposes
    • Computing Systems Data S5(A Labeled Anomaly Detection Dataset)
  • Kaggle Credit Card Fraud Detection Dataset Link

    • The dataset contains transactions made by credit cards in September 2013 by European cardholders
    • This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions
    • The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions
  • HIL-based Augmented ICS (HAI) Security Dataset Link

    • The HAI dataset was collected from a realistic industrial control system (ICS) testbed augmented with a Hardware-In-the-Loop (HIL) simulator that emulates steam-turbine power generation and pumped-storage hydropower generation
  • Drinking Water Quality Dataset(GECCO Challenge 2018) Link

    • For the monitoring of the water quality, the Thüringer Fernwasserversorgung performs measurements at significant points throughout the whole water distribution system, in particular at the outflow of the waterworks and the in- and outflow of the water towers
    • For this purpose, a part of the water is bypassed through a sensor system where the most important water quality indicators are measured
    • The data gives the amount of chlorine dioxide in the water, its pH value, the redox potential, its electric conductivity and the turbidity of the water. These values are the water quality indicators; any changes in them are considered events
    • The flow rate and the temperature of the water are considered operational data; changes in these values may indicate changes in the related quality values but are not considered events themselves
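
As a quick check of the class imbalance quoted for the Kaggle Credit Card Fraud Detection dataset in the list above (492 frauds out of 284,807 transactions), the sketch below recomputes the fraud rate from those two counts. It also shows the accuracy a hypothetical classifier would reach by always predicting "not fraud", which is one reason plain accuracy is a weak metric on data this skewed. The function names are illustrative and not part of any library.

```python
# Counts quoted in the Kaggle Credit Card Fraud Detection entry.
N_TRANSACTIONS = 284_807
N_FRAUDS = 492


def positive_rate(n_positive, n_total):
    """Fraction of samples that belong to the positive (fraud) class."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_positive / n_total


def majority_baseline_accuracy(n_positive, n_total):
    """Accuracy of a hypothetical classifier that always predicts 'not fraud'."""
    return 1.0 - positive_rate(n_positive, n_total)


fraud_rate = positive_rate(N_FRAUDS, N_TRANSACTIONS)
baseline_acc = majority_baseline_accuracy(N_FRAUDS, N_TRANSACTIONS)

print(f"fraud rate: {fraud_rate:.4%}")
print(f"always-negative accuracy: {baseline_acc:.4%}")
```

The computed rate is roughly 0.17%, in line with the figure quoted in the list, so metrics such as precision, recall, or area under the precision-recall curve are usually more informative here than accuracy.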

Datasets for Time Series Classification
