Features, prioritized
JoeGermuska edited this page Mar 26, 2012 · 62 revisions
- revisit priority of no-friction upload vs. enforcing good metadata (possibly move title/description editing to before upload begins, or require it in order to move from the preview state to the indexing-in-progress state)
- badge/metadata to indicate verified/CQed/gardener-approved datasets
- standard metadata or checkboxes: "was this from a public source or was this created by our people?", "is this data updated periodically?", "is this safe for publication?", etc.
- admin-editable PANDA homepage for news, links to other resources, maybe even search boxes (e.g. http://boundaries.tribapps.com)
- click on an uploader's name to go to a list of all their datasets (wrinkle: an additional data file may be added by someone other than the original dataset creator)
- PANDA scheduled email reminders to write a follow-up FOIA or otherwise obtain updates for existing datasets
- Permissions/set-level security (like Doc Cloud or LDAP? Another suggestion: project teams, less like a hierarchy and more like a circle or ad-hoc group)
- Sharing between organizations (not sharing the whole PANDA, just parts)
- Edit data in PANDA (delete rows, add new columns, etc.); read-only lock on a set?
- Address normalization (solvable with fuzzy search instead?)
- S, M, L sizing, or something like it
- Faceted search
- Fancy query builder like Doc Cloud
- Search data within categories (#473)
- Search datasets within a set or intersection of categories (#472)
- Export PANDA data to a SQL database (#468)
- RSS activity feeds for integration with CMSes and other systems (#469)
- Duplicate detection during data import (#467)
- Import w/ arbitrary delimiters (not just commas)
- Import from fixed-width files
- Comments on a dataset (#116)
- Primitive column types (int, varchar, date, etc.)
- Meta type columns
  - Address (and address-like stuff)
- In-system metrics. A dashboard for the admins of the PANDA instance, so that they can measure how well it's working inside their organization. (sneaky new feature inserted by Brian as the result of an interesting conversation with some of the folks Knight asks me to speak with)
- Profile stuff (create users, change my password, etc) (#150)
- DONE A1 -- Store the original file
- DONE A1 -- Data set metadata (source, provenance)
- DONE A1 -- Import from CSV
- DONE A1 -- Async data import (queuing)
- DONE A1 -- Full-text search on a dataset
- DONE A2 -- Taxonomy for datasets (categories, tags?)
- DONE A2 -- Search dataset metadata (help me find a dataset)
- DONE A2 -- Login/users
- DONE A3 -- Cumulative data sets via write API
- DONE A3 -- Cumulative data sets via write API demo
- DONE A3 -- Cumulative data sets via scraperwiki (??)
- DONE A3 -- Import from Excel (maybe by telling people to use CSV, maybe by parsing)
- DONE A4 -- Cumulative data sets via additional file uploads (maybe this is solved with versioning?)
- DONE A4 -- Encrypted communications (SSL)
- DONE A4 -- Export a dataset (to csv, xls? etc)
- DONE A4 -- Browser compatibility w/ recent versions of modern browsers: FF/Chrome/Safari/IE 9 Beta
- DONE A4 -- Documents related to the dataset
- DONE B1 -- A plan for scaling (how to grow your PANDA)
- DONE B1 -- Import wizard/walk through UI
- DONE B1 -- Async data export
- DONE B1 -- Amazon Machine Image
- Document our advanced query language for end users (solr-style)
- Date range search
- Related stories on a dataset (searchable?)
- I18n/L10n
- Initial demo data
- Export search results (to csv, etc)
- Iterative updates to a dataset (quarterly updates, etc. keep the old list)
- Version tracking for datasets
- Export a subset of a dataset (fewer columns from a wide set, filtered rows, etc)
- Google Refine reconciliation endpoint
- PANDA-hosted Google Refine
- Import localized number formats (1.000, 1 000, 1,000)
- IE7 support
- Fuzzy name search (Abbreviations, Bill/William) (#476)
- Other datasets related to this one (grouping?)
- Row-level comments
- Meta type columns
  - Birthdate
  - Phone number
- Notifications (email? RSS?) for new data sets, new data in sets, etc.
- Number range search
- Meta type columns
  - Location (lat/lng)
  - URL
  - SSN
  - Money
  - Organization (name, DUNS, etc)
  - User-extensible (make your own, like Illinois school codes)
  - Foreign address
- Geographic search by shapefiles
- Geographic search by any drawn shape
- Geographic search by distance
- Map the data
- Geocode addresses
- Canned/saved searches
- Import from MDB/Access
- Import from shapefile
- Import from DBF (#466)
- Import from Google Refine, carry the audit trail into PANDA
- Import/export to/from Google Docs
- Export to Google Fusion Tables
- Column statistics (std. dev., sum, etc)
- Sysadmin notifications (you're running out of disk! etc.)
- Single-click deployment
- Automatic upgrades (like WordPress)
- Search by taxonomy
- De-normalize data / dataset merge (connect a table to its lookup table on import)
- Fixtures to import (from the IRE data library, etc)
- P13n, store queries that I like to run, etc
- Encrypt all the data
- Entity relationships (John Smith in dataset A = John Smith in dataset B, for neat stuff like social network analysis)
- RDF, linked data endpoint
- Deploy as a hosted service (somebody else can do that once we've written the regular version)
- Automated server/resource scaling
- Join datasets at runtime (reinvent SQL)
- Non-tabular stuff (PDFs, emails, Doc Cloud and Overview Project)