Features, prioritized

The current TODO list for PANDA is below. Much on the 1.0 list will not happen -- we've got a deadline coming fast!

(If there are features that you're dying to see, but that look like they're not going to happen on our watch, you're welcome to contribute and we'll try to integrate your code!)

Allllllmost 1.0 (feature freeze)

We've put these features in rough order of priority, roughly estimated each one, and kept a running total of the estimates (the Countdown column). There are 6 iterations after beta 2 until we hit feature freeze, so the trailing half of this list will likely not be completed before launch.

Feature	Estimate (in iterations)	Countdown	Done
"Welcome to PANDA" upon admin setup (time zone, etc)	1	1	Yep!
First login user welcome screen	0.5	1.5	Yep!
Delete data uploads and associated data	0.5	2	Yep!
User page: my notifications	0.5	2.5	Yep!
Sysadmin notifications (you're running low on disk space, yo!)	0.1	2.6	Yep!
Related files metadata/description	0.2	2.8	Yep!
Search text within just one column	0.2	3	Yep!
Links to exports on user pages	0.1	3.1	Yep!
Search for data across all datasets within a category	1	4.1	Yep!
Fuzzy matching for names	1	5.1	Yep!
Bulk create users	0.5	5.6	Yep!
Dataset metadata -- related stories	0.2	5.8	Yep!
API logging messages	0.5	6.3
Make the login cookie last longer (#639)	0	6.3	Yep!
Translation / i18n (CHANGED!)	2	8.3
Standard metadata (customizable? public source? updated periodically? safe for pub? verified?) (or tags?)	1	9.3
Google Refine reconciliation endpoint	4	13.3
Saved searches (personalization)	0	13.3
Edit(able) column headers	0.5	13.8
Search datasets - use a stemmer (#562)	0.2	14
Admin-editable PANDA home page: news/links/searches	0.1	14.1
User favoriting for datasets	0.25	14.35
Preloaded data in PANDA	0.2	14.55
Number import i18n	1	15.55
Sync w/ Google Docs	1	16.55
Universal upgrade script	0.5	17.05
backup_volumes.py progress meter	0.1	17.15
Dataset metadata: timeframe, provenance, etc	0	17.15
Non-arbitrary ordering for datasets across search results	0	17.15
Sort by type-indexed columns (#537)	0	17.15
Column filter type as "factor" or "enumerated" (#541)	0	17.15
Web-configurable Time Zone (#619)	0	17.15	Yep!
Grouped/hierarchical categories	0	17.15
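
For anyone who wants to re-check the Countdown column, here is a minimal sketch of the arithmetic, assuming the column is simply the running total of the estimates. The row data is copied from the first 15 rows of the table above; the function names (`with_countdown`, `split_at_budget`) are made up for illustration and are not part of PANDA.

```python
from decimal import Decimal
from itertools import accumulate

# (feature, estimate in iterations): the first 15 rows of the table above.
# Estimates are kept as strings so Decimal can add them without float error.
ROWS = [
    ('"Welcome to PANDA" upon admin setup', "1"),
    ("First login user welcome screen", "0.5"),
    ("Delete data uploads and associated data", "0.5"),
    ("User page: my notifications", "0.5"),
    ("Sysadmin notifications", "0.1"),
    ("Related files metadata/description", "0.2"),
    ("Search text within just one column", "0.2"),
    ("Links to exports on user pages", "0.1"),
    ("Search for data across all datasets within a category", "1"),
    ("Fuzzy matching for names", "1"),
    ("Bulk create users", "0.5"),
    ("Dataset metadata -- related stories", "0.2"),
    ("API logging messages", "0.5"),
    ("Make the login cookie last longer (#639)", "0"),
    ("Translation / i18n", "2"),
]

BUDGET = Decimal("6")  # iterations left before feature freeze, per the text above


def with_countdown(rows):
    """Pair each feature with the running total of estimates up to and including it."""
    estimates = [Decimal(est) for _, est in rows]
    return [(name, total) for (name, _), total in zip(rows, accumulate(estimates))]


def split_at_budget(rows, budget):
    """Split features into those whose running total fits the budget and the rest."""
    totals = with_countdown(rows)
    fits = [name for name, total in totals if total <= budget]
    overflow = [name for name, total in totals if total > budget]
    return fits, overflow


if __name__ == "__main__":
    for name, total in with_countdown(ROWS):
        print(str(total).rjust(5), name)
    fits, overflow = split_at_budget(ROWS, BUDGET)
    print(len(fits), "features fit; first to slip:", overflow[0])
```

Because the list is in priority order and no estimate is negative, the features that fit always form a prefix of the list, which is why the cutoff can be read straight off the Countdown column.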

1.0

DONE 1.0 -- The PANDA cookbook

Storming to beta 2

DONE 0.2.0 -- Enforce good metadata (better title/description/etc) in the upload process
DONE 0.2.0 -- Verify available disk space before upload/import
DONE 0.2.0 -- Ability to abort a long-running task from the admin UI
DONE Squash critical bugs
DONE Tidy up the dashboard a bit
DONE 0.2.0 -- Notifications (subscribed searches via email, etc)

The features archive (look above for the currently-maintained list)

user requests

user-created tags for datasets (Tom M)
search only a certain tag (Tom M)
view related files so you don't have to download them (Tom M)
add or edit fields without downloading the data (Tom M)
edit the headers after upload (Tom M)
metadata/descriptions for related files (Tom M and Andy B)
metadata for a dataset: the timeframe this set applies to (Andy B) (can't this just be in the description/title?)
DONE B1.4 -- list of data sets uploaded by individual users (Tom M and Andy B)

Beta-phase insights

UP -- revisit priority of no-friction-upload vs. enforcing good metadata (possibly move title/description editing to before upload begins, or require to move from preview to indexing-in-progress state)
UP -- badge/metadata to indicate verified/CQed/gardener-approved datasets
standard metadata or checkboxes: "was this from a public source or was this created by our people?"; "is this data updated periodically?", "is this safe for publication?", etc
admin-editable PANDA homepage for news, links to other resources, maybe even search boxes (e.g. http://boundaries.tribapps.com)
PANDA scheduled email reminders to write a follow-up FOIA or otherwise obtain updates for existing datasets
upload-page guidance on cleaning up datasets when spreadsheets have extraneous formatting, tips for re-ordering XLS worksheets if the first tab is instructions rather than data, etc. (Tom M also asked for this)
DONE B1.4 -- click on an uploader's name to go to a list of all their datasets (wrinkle: add'l data file added by someone other than original dataset creator)

Still up for grabs, priority unknown

DOWN -- Permissions/set-level security (like Doc Cloud or LDAP?? got another suggestion... project teams? less like a hierarchy, more like a circle or ad-hoc group) (Tom M requested this too, for read-only users)
DOWN -- Sharing between organizations (not sharing the whole PANDA, just parts)
DOWN -- Edit data in PANDA, delete rows, add new columns, etc., read-only lock on a set?
Address normalization (solvable with fuzzy search instead?)
DOWN -- S, M, L sizing, or something like it
DOWN -- Faceted search
DOWN -- Fancy query builder like Doc Cloud
UP -- Search data within categories (#473)
Search datasets within a set or intersection of categories (#472)
DOWN -- Export PANDA data to a SQL database (#468)
DOWN -- RSS activity feeds for integration with CMSes and other systems (#469)
Duplicate detection during data import (#467)

Must-have

DOWN -- Import w/ arbitrary delimiters (not just commas)
DOWN -- Import from fixed-width files
DOWN -- Comments on a dataset (#116) (Tom M requested this too)
DOWN -- Meta type columns
- Address (and address like-stuffs)
DONE A1 -- Store the original file
DONE A1 -- Data set metadata (source, provenance)
DONE A1 -- Import from CSV
DONE A1 -- Async data import (queuing)
DONE A1 -- Full-text search on a dataset
DONE A2 -- Taxonomy for datasets (categories, tags?)
DONE A2 -- Search dataset metadata (help me find a dataset)
DONE A2 -- Login/users
DONE A3 -- Cumulative data sets via write API
DONE A3 -- Cumulative data sets via write API demo
DONE A3 -- Cumulative data sets via scraperwiki (??)
DONE A3 -- Import from Excel (maybe by telling people to use CSV, maybe parsing)
DONE A4 -- Cumulative data sets via additional file uploads (maybe this is solved with versioning?)
DONE A4 -- Encrypted communications (SSL)
DONE A4 -- Export a dataset (to csv, xls? etc)
DONE A4 -- Browser compatibility w/ recent versions of modern browsers: FF/Chrome/Safari/IE Beta 9
DONE A4 -- Documents related to the dataset
DONE B1 -- A plan for scaling (how to grow your PANDA)
DONE B1 -- Import wizard/walk through UI
DONE B1 -- Async data export
DONE B1 -- Amazon Machine Image
DONE B1.1 -- Primitive column types (int, varchar, date, etc.)
DONE B1.3 -- In-system metrics. A dashboard for the admins of the PANDA instance, so that they can measure how well it's working inside their organization. (sneaky new feature inserted by Brian as the result of an interesting conversation with some of the folks that Knight asks that I speak with)
DONE B1.4 -- Profile stuff (create users, change my password, etc) (#150)

Want

Related stories on a dataset (searchable?) (Tom M requested this too)
I18n/L10n
Initial demo data
Iterative updates to a dataset (quarterly updates, etc. keep the old list)
Version tracking for datasets
Export a subset of a dataset (fewer columns from a wide set, filtered rows, etc)
UP -- Google Refine reconciliation endpoint
PANDA-hosted Google Refine
Import localized number formats (1.000, 1 000, 1,000)
IE7 support
UP -- Fuzzy name search (Abbreviations, Bill/William) (#476)
Other datasets related to this one (grouping?)
Row-level comments
Meta type columns
- Birthdate
- Phone number
UP -- Notifications (email? RSS?) for new data sets, new data in sets, etc. (changeset subscriptions?)
UP -- Welcome to PANDA (optional registration, set up your admin user, links to getting started docs)
DONE B1 -- Document our advanced query language for end users (solr-style)
DONE B1.1 -- Date range search
DONE B1.4 -- Export search results (to csv, etc)

Gravy

Meta type columns
- Location (lat/lng)
- URL
- SSN
- Money
- Organization (name, DUNS, etc)
- User-extensible (make your own, like Illinois school codes)
- Foreign address
Geographic search by shapefiles
Geographic search by any drawn shape
Geographic search by distance
Map the data
Geocode addresses
Canned/saved searches
Import from MDB/Access
Import from shapefile
Import from DBF (#466)
Import from Google Refine, carry the audit trail into PANDA
Import/export to/from Google Docs
Export to Google Fusion Tables
Column statistics (std. dev., sum, etc)
Sysadmin notifications (you're running out of disk! etc.)
Single-click deployment
Automatic upgrades (like WordPress)
Search by taxonomy
De-normalize data / dataset merge (connect a table to its lookup table on import)
Fixtures to import (from the IRE data library, etc)
P13n, store queries that I like to run, etc
DONE B1.1 -- Number range search

Meh

Encrypt all the data
Entity relationships (John Smith in dataset A = John Smith in dataset B, for neat stuff like social network analysis)
RDF, linked data endpoint
Deploy as a hosted service (somebody else can do that once we've written the regular version)
Automated server/resource scaling
Join datasets at runtime (reinvent SQL)
Non-tabular stuff (PDFs, emails, Doc Cloud and Overview Project)
