Example datasets, used for testing Wukong -- and in many cases useful beyond that

DoctorKhan/wukong-example-data

Example Data for Wukong and friends

This repo holds example datasets for testing Wukong -- and in many cases useful beyond that.

Additional datasets

To keep the git repo from bloating, some datasets are offered as separate downloads rather than versioned directly in the repository.

  • geo/wikigrounder_toponyms (5 MB/23 MB) -- grounded place names with rough categories (county, province, etc.); available as a download
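
Because these datasets live outside the repo, a script that depends on one has to fetch it before use. The following is a minimal sketch of one way to do that in Python, not something this repo ships: `fetch_dataset` is a hypothetical helper, and the download URL is left as a parameter because it is not given here.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path


def fetch_dataset(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless `dest` already exists.

    Hypothetical helper: the caller supplies the real download URL.
    """
    dest = Path(dest)
    if dest.exists():
        return dest  # already fetched; do not download again
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted transfer
    # never leaves a partial file at `dest`.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp)
        os.replace(tmp_name, dest)
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return dest
```

Skipping the download when the file is already present keeps repeated test runs from re-fetching the larger archives, which is the same concern that keeps them out of version control.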

Contents

  • wikipedia

    • wikipedia_articles -- article text
    • wikipedia_pageinfos -- article metadata
    • wikipedia_pagelinks -- links between articles
    • wikipedia_pageviews -- pageview counts by hour
    • (geolocated)
    • (geoimplied)
    • (dbpedia)
  • geo

  • scaffold

    • fakered_customer_data
    • integers
    • lorem
  • airline flights

    • airline_flights
    • airline_airports
    • airline_airlines
    • airline_airfares
  • weather

    • weather_hourly -- hourly global weather observations
    • weather_stations -- weather stations
  • access logs

    • weblogs_waxydotorg
    • weblogs_worldcup

Coming soon:

  • words

    • twl/CSW(sowpods) -- lang/corpora/scrabble
    • BNC --
    • quackle -- misc/words_quackle
    • wordnet
    • dirty_words
    • nltk
    • stopwords
    • color_names
  • text

    • short
    • gutenberg
  • UFO sightings

  • retrosheet game logs

    • parks
    • teams
    • franchises
    • players
    • games
