Cutplace is a tool and API to validate that tabular data stored in CSV, Excel, ODS and PRN files conform to a cutplace interface definition (CID).
As an example, consider the following customers.csv file that stores data about customers:

    customer_id,surname,first_name,born,gender
    1,Beck,Tyler,1995-11-15,male
    2,Gibson,Martin,1969-08-18,male
    3,Hopkins,Chester,1982-12-19,male
    4,Lopez,Tyler,1930-10-13,male
    5,James,Ana,1943-08-10,female
    6,Martin,Jon,1932-09-27,male
    7,Knight,Carolyn,1977-05-25,female
    8,Rose,Tammy,2004-01-12,female
    9,Gutierrez,Reginald,2010-05-18,male
    10,Phillips,Pauline,1960-11-09,female
A CID can describe such a file in an easy-to-read way. It consists of three sections. First, there is the general data format:
| | Property | Value |
|---|---|---|
| D | Format | Delimited |
| D | Encoding | UTF-8 |
| D | Header | 1 |
| D | Line delimiter | LF |
| D | Item delimiter | , |
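To make these properties concrete, here is a rough sketch of how a reader could apply them by hand with Python's standard `csv` and `io` modules. It is not how cutplace works internally; the helper `read_delimited` and its parameters are hypothetical names chosen for this illustration.

```python
import csv
import io


def read_delimited(raw, encoding="utf-8", header_rows=1, item_delimiter=","):
    """Decode raw bytes and return the data rows, skipping the header rows.

    Hypothetical helper mirroring the data format properties; not part of cutplace.
    """
    text = raw.decode(encoding)
    # newline="" disables newline translation so the csv module sees the
    # original line delimiters.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=item_delimiter)
    rows = list(reader)
    return rows[header_rows:]


sample = (
    b"customer_id,surname,first_name,born,gender\n"
    b"1,Beck,Tyler,1995-11-15,male\n"
    b"2,Gibson,Martin,1969-08-18,male\n"
)
data_rows = read_delimited(sample)
print(data_rows[0])
```

Each returned row is a list of strings; the header line is dropped because `Header` is set to 1.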
Next there are the fields stored in the data file:
| | Name | Example | Empty | Length | Type | Rule |
|---|---|---|---|---|---|---|
| F | customer_id | 3798 | | | Integer | 0...99999 |
| F | surname | Miller | | ...60 | | |
| F | first_name | John | X | ...60 | | |
| F | date_of_birth | 1978-11-27 | | | DateTime | YYYY-MM-DD |
| F | gender | male | X | | Choice | female, male |
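As a reading aid, the field rules can be spelled out as plain Python checks. This is only a sketch of what the table expresses, assuming that an X in the Empty column means the item may be empty and that `...60` means at most 60 characters. The function `check_customer_row` is hypothetical and does not use or reproduce cutplace's own validation code.

```python
from datetime import datetime


def check_customer_row(row):
    """Return a list of problems found in one row of five string items.

    Hypothetical stand-in for the field rules in the table; not cutplace code.
    """
    customer_id, surname, first_name, date_of_birth, gender = row
    problems = []

    # customer_id: Integer within 0...99999.
    try:
        id_in_range = 0 <= int(customer_id) <= 99999
    except ValueError:
        id_in_range = False
    if not id_in_range:
        problems.append("customer_id must be an integer in 0...99999")

    # surname: must not be empty, at most 60 characters.
    if not 1 <= len(surname) <= 60:
        problems.append("surname must have 1 to 60 characters")

    # first_name: may be empty, at most 60 characters.
    if len(first_name) > 60:
        problems.append("first_name must have at most 60 characters")

    # date_of_birth: DateTime with rule YYYY-MM-DD, checked here as %Y-%m-%d.
    try:
        datetime.strptime(date_of_birth, "%Y-%m-%d")
    except ValueError:
        problems.append("date_of_birth must match YYYY-MM-DD")

    # gender: may be empty, otherwise one of the listed choices.
    if gender and gender not in ("female", "male"):
        problems.append("gender must be female, male or empty")

    return problems


print(check_customer_row(["3798", "Miller", "John", "1978-11-27", "male"]))
print(check_customer_row(["73921", "Harris", "Diana", "04.08.1953", "female"]))
```

The first call reports no problems, while the second reports the date because `04.08.1953` does not follow the year-month-day layout.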
Optionally you can describe conditions that must be met across the whole file:
| | Description | Type | Rule |
|---|---|---|---|
| C | customer must be unique | IsUnique | customer_id |
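Unlike a field rule, such a condition cannot be decided by looking at a single row, because a duplicate only shows up once all rows have been seen. A minimal sketch of the idea in plain Python follows; the function `find_duplicates` is a hypothetical name and not cutplace's implementation of IsUnique.

```python
def find_duplicates(rows, key_index=0):
    """Return key values that occur more than once, in order of first repeat."""
    seen = set()
    duplicates = []
    for row in rows:
        key = row[key_index]
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


rows = [
    ["1", "Beck", "Tyler"],
    ["2", "Gibson", "Martin"],
    ["1", "Lopez", "Tyler"],  # Same customer_id as the first row.
]
print(find_duplicates(rows))
```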
The CID can be stored in common spreadsheet formats, in particular Excel and ODS, for example cid_customers.ods.
Cutplace can validate that the data file conforms to the CID:
$ cutplace cid_customers.ods customers.csv
Now add a new line with a broken date_of_birth:
73921,Harris,Diana,04.08.1953,female
Cutplace rejects this file with the error message:
customers.csv (R12C4): cannot accept field 'date_of_birth': date must match format YYYY-MM-DD (%Y-%m-%d) but is: '04.08.1953'
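The location in this message can be reproduced with a few lines of standard Python, assuming that R12C4 stands for row 12, column 4, counted from 1 with the header row included. The sketch below is not cutplace code; `bad_date_locations` is a hypothetical helper that only checks the date column against `%Y-%m-%d`.

```python
from datetime import datetime


def bad_date_locations(lines, date_column=4, header_rows=1):
    """Return 'R<row>C<column>' for every date item that is not %Y-%m-%d."""
    locations = []
    for row_number, line in enumerate(lines, start=1):
        if row_number <= header_rows:
            continue  # Header rows carry names, not data.
        items = line.split(",")
        try:
            datetime.strptime(items[date_column - 1], "%Y-%m-%d")
        except ValueError:
            locations.append(f"R{row_number}C{date_column}")
    return locations


customers = """customer_id,surname,first_name,born,gender
1,Beck,Tyler,1995-11-15,male
2,Gibson,Martin,1969-08-18,male
3,Hopkins,Chester,1982-12-19,male
4,Lopez,Tyler,1930-10-13,male
5,James,Ana,1943-08-10,female
6,Martin,Jon,1932-09-27,male
7,Knight,Carolyn,1977-05-25,female
8,Rose,Tammy,2004-01-12,female
9,Gutierrez,Reginald,2010-05-18,male
10,Phillips,Pauline,1960-11-09,female
73921,Harris,Diana,04.08.1953,female""".splitlines()

print(bad_date_locations(customers))
```

With one header row and ten valid data rows, the appended line is the twelfth line of the file, which matches the row number in the error message.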
Additionally, cutplace provides an easy-to-use API to read and write tabular data files through a common interface, without having to deal with the intricacies of format-specific modules. To read and validate the above example:

    import cutplace
    import cutplace.errors

    cid_path = 'cid_customers.ods'
    data_path = 'customers.csv'
    try:
        for row in cutplace.rows(cid_path, data_path):
            pass  # We could also do something useful with the data in ``row`` here.
    except cutplace.errors.DataError as error:
        print(error)
For more information, read the documentation at http://cutplace.readthedocs.org/ or visit the project at https://github.com/roskakori/cutplace.
This project has optional support for Django. When used together with Django you can get localized error messages.
In order to use it you need to add cutplace to the INSTALLED_APPS variable of your Django settings.
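A settings fragment for this might look as follows. The app label `cutplace` is taken from the sentence above; the other entries are ordinary Django contrib apps included only to give the list a realistic shape, so adapt them to your project.

```python
# settings.py (fragment)
INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    # Illustrative placement; the entries above are examples, not requirements.
    "cutplace",
]
```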