Skip to content

Latest commit

 

History

History
93 lines (62 loc) · 5.07 KB

README.md

File metadata and controls

93 lines (62 loc) · 5.07 KB

datasette-csv-importer

A Datasette plugin for live-uploading CSV files with a user-friendly configuration UI that doesn't require the application to be restarted to work.

This plugin was inspired by the datasette-upload-csvs plugin and uses CSVs-to-SQlite to actually perform the import. Configuration comes from a parsed version of the CLI tool's --help output.

Installation

Via Datasette:

datasette install git+https://github.com/next-LI/datasette-csv-importer.git

Or a specific version:

datasette install git+https://github.com/next-LI/[email protected]

Development Installation

This repository uses submodules:

git clone --recursive https://github.com/next-LI/datasette-csv-importer

If you've already cloned non-recursively you can run this to get the submodules:

git submodule update --init

Then just install like any normal Datasette plugin:

python setup.py install

It accepts the following configuration in your metadata.json file:

{
  ... metadata.json ...
  "plugins": {
    "datasette-csv-importer": {
      "status_table": "_csv_importer_progress_",
      "status_database": "_internal",
      "database_path": "/data",
      "csvs_path": "/data/csvs"
    },
    ... the rest of your plugins configuration ...
  }
}

Details on the configuration:

  • status_database - name of the database that we're going to use to store import status. It defaults to the internal DB, _internal.
  • status_table - name of the table that we'll write import status rows to.
  • database_path - path to the directory that we'll use to write SQlite databases to. By default, the plugin will use the current working directory of the process running it.
  • csvs_path - path to the directory where raw uploaded CSVs will be saved, along with their csvs-to-sqlite import settings. If this isn't set, then this plugin won't do CSV/config saving.
  • use_db_metadata - (boolean) Whether or not to build the __metadata table as supported by the next-LI/datasette_live_config plugin, giving default access to the current user.
  • use_live_permissions - (boolean) Whether or not to integrate with the next-LI/datasette-live-permissions and grant the uploading user access upon upload.

Usage

tl;dr: Go to /-/csv-importer, select a CSV, choose import options, click submit 💥 and your data is live on the dashboard!

The plugin adds an interface at /-/csv-importer for importing a CSV file. Once you drag and drop a CSV file, you'll be shown a set of options for importing your data into Datasette using the CSVs-to-SQlite tool. Clicking "Submit" below will start the import process and you'll be given information about success or failure.

Permissions

This plugin uses Datasette's permissioning system. Granting actors the csv-importer permission will allow access to CSV importing. The root actor is granted access by default.

Development

There's two parts to this plugin: converting the CSV importer CLI tool's arguments list to a JSON schema which renders a form (this is mostly done by hand and ends up in templates/schema.json and the webapp that does the actual import.

Form configuration can be done via the form schema in templates/form.json and documentation for supported options are here

KNOWN ISSUES

There's an async race condition that can sometimes be triggered when uploading and inserting a new database. You'll get a 500 error and a console traceback like the following:

INFO:     127.0.0.1:52790 - "POST /-/csv-importer HTTP/1.1" 200 OK
/Users/brandon/.pyenv/versions/nextli/lib/python3.8/site-packages/pandas/core/generic.py:2779: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.
  sql.to_sql(
There is no current event loop in thread 'Thread-2'.
Traceback (most recent call last):
  File "/Users/brandon/.pyenv/versions/nextli/lib/python3.8/site-packages/datasette-0.56-py3.8.egg/datasette/app.py", line 1144, in route_path
    response = await view(request, send)
  File "/Users/brandon/.pyenv/versions/nextli/lib/python3.8/site-packages/datasette-0.56-py3.8.egg/datasette/views/base.py", line 146, in view
    return await self.dispatch_request(
  File "/Users/brandon/.pyenv/versions/nextli/lib/python3.8/site-packages/datasette-0.56-py3.8.egg/datasette/views/base.py", line 119, in dispatch_request
    await self.ds.refresh_schemas()
  File "/Users/brandon/.pyenv/versions/nextli/lib/python3.8/site-packages/datasette-0.56-py3.8.egg/datasette/app.py", line 343, in refresh_schemas
    for database_name, db in self.databases.items():
RuntimeError: OrderedDict mutated during iteration

As far as I can tell, you can ignore this error. If your import got to the point where this bug can be triggered, your data was successfully imported.