Features, prioritized

Beta-phase insights

revisit priority of no-friction-upload vs. enforcing good metadata (possibly move title/description editing to before upload begins, or require it before a dataset can move from preview to indexing-in-progress state)
badge/metadata to indicate verified/CQed/gardener-approved datasets
standard metadata or checkboxes: "was this from a public source or was this created by our people?"; "is this data updated periodically?", "is this safe for publication?", etc
admin-editable PANDA homepage for news, links to other resources, maybe even search boxes (e.g. http://boundaries.tribapps.com)
click on an uploader's name to go to a list of all their datasets (wrinkle: add'l data file added by someone other than original dataset creator)
PANDA scheduled email reminders to write a follow-up FOIA or otherwise obtain updates for existing datasets

Still up for grabs, priority unknown

Permissions/set-level security (like Doc Cloud or LDAP?? got another suggestion... project teams? less like a hierarchy, more like a circle or ad-hoc group)
Sharing between organizations (not sharing the whole PANDA, just parts)
Edit data in PANDA, delete rows, add new columns, etc., read-only lock on a set?
Address normalization (solvable with fuzzy search instead?)
S, M, L sizing, or something like it
Faceted search
Fancy query builder like Doc Cloud
Search data within categories (#473)
Search datasets within a set or intersection of categories (#472)
Export PANDA data to a SQL database (#468)
RSS activity feeds for integration with CMSes and other systems (#469)
Duplicate detection during data import (#467)

Must-have

Import w/ arbitrary delimiters (not just commas)
Import from fixed-width files
Comments on a dataset (#116)
Primitive column types (int, varchar, date, etc.)
Meta type columns
- Address (and address-like stuff)
In-system metrics. A dashboard for the admins of the PANDA instance, so that they can measure how well it's working inside their organization. (sneaky new feature inserted by Brian as the result of an interesting conversation with some of the folks that Knight asks that I speak with)
Profile stuff (create users, change my password, etc) (#150)
DONE A1 -- Store the original file
DONE A1 -- Data set metadata (source, provenance)
DONE A1 -- Import from CSV
DONE A1 -- Async data import (queuing)
DONE A1 -- Full-text search on a dataset
DONE A2 -- Taxonomy for datasets (categories, tags?)
DONE A2 -- Search dataset metadata (help me find a dataset)
DONE A2 -- Login/users
DONE A3 -- Cumulative data sets via write API
DONE A3 -- Cumulative data sets via write API demo
DONE A3 -- Cumulative data sets via scraperwiki (??)
DONE A3 -- Import from Excel (maybe by telling people to use CSV, maybe parsing)
DONE A4 -- Cumulative data sets via additional file uploads (maybe this is solved with versioning?)
DONE A4 -- Encrypted communications (SSL)
DONE A4 -- Export a dataset (to csv, xls? etc)
DONE A4 -- Browser compatibility w/ recent versions of modern browsers: FF/Chrome/Safari/IE 9 Beta
DONE A4 -- Documents related to the dataset
DONE B1 -- A plan for scaling (how to grow your PANDA)
DONE B1 -- Import wizard/walk through UI
DONE B1 -- Async data export
DONE B1 -- Amazon Machine Image
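
For the "Import w/ arbitrary delimiters" item above, here is a minimal sketch of one possible approach, not PANDA's actual importer. It assumes the upload is already decoded to text and uses Python's standard csv module; the function name read_delimited and the candidate delimiter set are hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter=None, candidates=",;\t|"):
    """Parse delimited text into a list of rows (lists of strings).

    Hypothetical helper: if no delimiter is given, guess one from
    the candidate characters.
    """
    if delimiter is None:
        # csv.Sniffer inspects a sample of the text and raises
        # csv.Error if it cannot settle on a delimiter.
        dialect = csv.Sniffer().sniff(text[:4096], delimiters=candidates)
        delimiter = dialect.delimiter
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Made-up sample data, pipe-delimited.
sample = "name|city|year\nAda|Chicago|2011\nBob|Tyler|2012\n"
rows = read_delimited(sample)
```

Guessing is convenient for no-friction upload, but it can fail on short or single-column files, so a real import UI would probably still let the uploader override the delimiter.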

Want

Document our advanced query language for end users (solr-style)
Date range search
Related stories on a dataset (searchable?)
I18n/L10n
Initial demo data
Export search results (to csv, etc)
Iterative updates to a dataset (quarterly updates, etc. keep the old list)
Version tracking for datasets
Export a subset of a dataset (fewer columns from a wide set, filtered rows, etc)
Google Refine reconciliation endpoint
PANDA-hosted Google Refine
Import localized number formats (1.000, 1 000, 1,000)
IE7 support
Fuzzy name search (Abbreviations, Bill/William) (#476)
Other datasets related to this one (grouping?)
Row-level comments
Meta type columns
- Birthdate
- Phone number
Notifications (email? RSS?) for new data sets, new data in sets, etc.
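
For the "Import localized number formats" item above, a rough sketch of what parsing might look like. It assumes the uploader (or a locale setting) tells us which grouping and decimal separators a column uses, since a value like "1.000" is ambiguous on its own. The function name parse_number and its parameters are hypothetical.

```python
from decimal import Decimal


def parse_number(text, group_sep=",", decimal_sep="."):
    """Convert a localized number string to a Decimal.

    Hypothetical helper: the separators must be supplied by the caller;
    nothing here tries to guess the locale.
    """
    # Treat non-breaking spaces like ordinary spaces.
    cleaned = text.strip().replace("\u00a0", " ")
    if group_sep:
        # Drop thousands separators entirely.
        cleaned = cleaned.replace(group_sep, "")
    # Normalize the decimal mark to the "." that Decimal expects.
    cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


# The three formats from the list, each read as one thousand.
german = parse_number("1.000", group_sep=".", decimal_sep=",")
french = parse_number("1 000", group_sep=" ", decimal_sep=",")
english = parse_number("1,000")
```

Strings that still are not valid numbers after cleaning raise decimal.InvalidOperation, which an importer could catch in order to fall back to treating the column as text.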

Gravy

Number range search
Meta type columns
- Location (lat/lng)
- URL
- SSN
- Money
- Organization (name, DUNS, etc)
- User-extensible (make your own, like Illinois school codes)
- Foreign address
Geographic search by shapefiles
Geographic search by any drawn shape
Geographic search by distance
Map the data
Geocode addresses
Canned/saved searches
Import from MDB/Access
Import from shapefile
Import from DBF (#466)
Import from Google Refine, carry the audit trail into PANDA
Import/export to/from Google Docs
Export to Google Fusion Tables
Column statistics (std. dev., sum, etc)
Sysadmin notifications (you're running out of disk! etc.)
Single-click deployment
Automatic upgrades (like wordpress)
Search by taxonomy
De-normalize data / dataset merge (connect a table to its lookup table on import)
Fixtures to import (from the IRE data library, etc)
P13n, store queries that I like to run, etc

Meh

Encrypt all the data
Entity relationships (John Smith in dataset A = John Smith in dataset B, for neat stuff like social network analysis)
RDF, linked data endpoint
Deploy as a hosted service (somebody else can do that once we've written the regular version)
Automated server/resource scaling
Join datasets at runtime (reinvent SQL)
Non-tabular stuff (PDFs, emails, Doc Cloud and Overview Project)
