gbv/avram-js

avram-js

Avram Schema Language implementation in JavaScript

This Node package implements Avram Schema Language to validate field-based data formats, in particular library data formats such as MARC and PICA.

Background

Several schema languages exist for JSON (JSON Schema), XML (XSD, DTD, Schematron, RELAX NG), RDF (RDFS, SHACL, ShEx), and strings (regular expressions and formal grammars). Avram is a schema language designed for field-based data formats such as MARC and PICA. Avram can also be used to define and validate flat key-value structures such as those found in tabular data (CSV, TSV).

Install

Requires Node >= 18.0.0 (consider using nvm to get a current version of Node). Installing this module provides basic functionality for validating records, including the command line client avram.

npm install -g avram

To process selected data formats in serialization forms other than JSON, install the additional parsing libraries marcjs for MARC, pica-data for PICA, and csv-parse for CSV:

npm install -g marcjs pica-data csv-parse

To also validate schema files, install additional libraries ajv and ajv-formats:

npm install -g ajv ajv-formats

To convert schema files to HTML, install additional library ejs:

npm install -g ejs

Usage

See API for usage as programming library.

avram

Validate records from input file(s) or standard input. The first argument must be an Avram schema file. The list of supported input formats depends on installed parsing libraries (see Install).

Usage: avram [options] [validation options] <schema> [<files...>]

Validate file(s) with an Avram schema

Options:
  -f, --format [name]     input format (marcxml|iso2709|mrc|pp|plain|csv)
  -s, --schema            validate schema instead of record files
  -d, --document          document schema in HTML (requires ejs)
  -t, --type [types]      specify comma-separated record types
  -x, --extension [name]  specify comma-separated extensions (e.g. marc)
  -p, --print             print all input records (in JSON)
  -v, --verbose           verbose error messages
  -l, --list              list supported validation options
  -h, --help              output usage information
  -V, --version           output the version number

Validation options can be enabled or disabled by prepending + or - respectively. The following options (each shown with its default status) control what is reported:

+invalidRecord           invalid records
+undefinedField          fields not found in the field schedule
+deprecatedField         report deprecated fields
+nonrepeatableField      repetition of non-repeatable fields
+missingField            required fields missing from a record
+invalidIndicator        field not matching expected validation definition
+invalidFieldValue       invalid flat field values
+invalidSubfield         invalid subfields (subsumes all subfield errors)
+undefinedSubfield       subfields not found in the subfield schedule
+deprecatedSubfield      report deprecated subfields
+nonrepeatableSubfield   repetition of non-repeatable subfields
+missingSubfield         required subfields missing from a field
+invalidSubfieldValue    invalid subfield values
+patternMismatch         values not matching an expected pattern
+invalidPosition         values not matching expected positions
+recordTypes             support record types
+invalidFlag             value is not a concatenation of flags
+undefinedCode           values not found in an expected codelist
-undefinedCodelist       non-resolveable codelist references
-countRecord             expected number of records not met
-countField              expected number of fields not met
-countSubfield           expected number of subfields not met
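The +/- convention above can be illustrated with a small sketch. This is not how avram parses its command line; the helper name applyToggles is hypothetical, and the example only assumes that each option is a named boolean with a default, as in the list above.

```javascript
// Hypothetical helper: apply "+name" / "-name" toggles to a map of defaults.
// This is an illustration of the convention, not avram's own option parser.
function applyToggles(defaults, args) {
  const options = { ...defaults }
  for (const arg of args) {
    const match = /^([+-])([A-Za-z]+)$/.exec(arg)
    if (!match) continue                 // not a toggle, e.g. a file name
    const [, sign, name] = match
    if (!(name in options)) continue     // unknown name, e.g. a short flag like -v
    options[name] = sign === "+"
  }
  return options
}

// A few defaults copied from the list above
const defaults = { undefinedField: true, deprecatedField: true, countRecord: false }
const options = applyToggles(defaults, ["-deprecatedField", "+countRecord", "schema.json"])
console.log(options)
```

The defaults object is copied rather than mutated, so the same defaults can be reused for several runs.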

Proper validation of schemas requires additional libraries ajv and ajv-formats to be installed.

The JSON format emitted with option -p or --print looks like this:

{
  "fields": [
    { "key": "tag1", "value": "..." },
    { "key": "tag2", "value": "..." }
  ],
  "types": []
}

It can be converted to a flat key-value structure by piping it to the jq command jq '.fields|from_entries'.
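For use inside Node rather than a shell pipeline, the same conversion can be sketched in plain JavaScript. The sample object is illustrative and not real avram output, and the helper name toFlatObject is hypothetical. As with jq's from_entries, a later field with the same key overwrites an earlier one.

```javascript
// Hypothetical helper: flatten the JSON printed by `avram -p`
// into a plain key-value object, like jq '.fields|from_entries'.
function toFlatObject(printed) {
  return Object.fromEntries(printed.fields.map(f => [f.key, f.value]))
}

// Illustrative input in the shape shown above
const printed = {
  fields: [
    { key: "title", value: "Example" },
    { key: "year", value: "2024" }
  ],
  types: []
}

console.log(toFlatObject(printed))
```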

API

Validator

Class Validator implements validation against an Avram schema.

import { Validator } from "avram"

const validator = new Validator(schema, options)

// validate a set of records
const errors = validator.validateRecords(records)
if (!errors.length) {
  console.log("valid")
} else {
  errors.forEach(e => console.error(e))
}

// validate a single record
const recordErrors = validator.validate(record)

The record structure expected by validate, based on the Avram record model, is a JSON object with optional array types and required array fields, each a JSON object with the following keys:

  • mandatory tag (string), the key of a field
  • either value (string), the flat field value, or subfields (array with alternating subfield codes and subfield values)
  • optional occurrence (string) or indicators (array of two strings)
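A record in this structure might look like the following sketch. The tags and values are made up for illustration, and the two helpers (subfieldPairs, looksLikeField) are hypothetical conveniences, not part of the avram API and not the validation avram performs.

```javascript
// Illustrative record following the structure described above
const record = {
  types: [],
  fields: [
    // flat field: tag plus value
    { tag: "001", value: "123456" },
    // field with indicators and alternating subfield codes and values
    { tag: "245", indicators: ["1", "0"], subfields: ["a", "A title", "c", "An author"] }
  ]
}

// Hypothetical helper: turn the alternating subfield array into [code, value] pairs
function subfieldPairs(field) {
  const pairs = []
  const sf = field.subfields || []
  for (let i = 0; i + 1 < sf.length; i += 2) {
    pairs.push([sf[i], sf[i + 1]])
  }
  return pairs
}

// Hypothetical shape check: a string tag and exactly one of value or subfields
function looksLikeField(field) {
  const hasValue = typeof field.value === "string"
  const hasSubfields = Array.isArray(field.subfields) && field.subfields.length % 2 === 0
  return typeof field.tag === "string" && hasValue !== hasSubfields
}

console.log(subfieldPairs(record.fields[1]))
console.log(record.fields.every(looksLikeField))
```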

Method validate always returns a (hopefully empty) array of errors. Each error is a JSON object with these keys (all optional except message):

  • human readable error message
  • error with the number of the violated rule from Avram specification (e.g. "AR1")
  • tag or tag and occurrence of an invalid field
  • identifier of an invalid field
  • code of an invalid subfield
  • value of an invalid (sub)field
  • pattern of an invalid (sub)field
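Since every key except message is optional, code that reports errors should not assume the others are present. A minimal sketch, using made-up error objects rather than real avram output and a hypothetical helper named formatError:

```javascript
// Illustrative error objects using keys from the list above; not real avram output
const errors = [
  { message: "first illustrative message", error: "AR1", tag: "245" },
  { message: "second illustrative message", tag: "020", code: "a", value: "xyz" }
]

// Hypothetical helper: render one error as a single log line,
// skipping every optional key that is absent.
function formatError(e) {
  const parts = [
    e.error ? `[${e.error}]` : "",   // rule number, if given
    e.tag || "",                     // field tag, if given
    e.code ? `$${e.code}` : "",      // subfield code, written as $a
    e.message                        // the only mandatory key
  ]
  return parts.filter(Boolean).join(" ")
}

errors.forEach(e => console.error(formatError(e)))
```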

Record

The Record object provides methods to convert common formats to the Avram record format:

import { Record } from "avram"

const objRecord = Record.fromObject(obj)     // any key-value object; non-flat values are ignored
const marcRecord = Record.fromMarcjs(marc)   // expects a marcjs record structure
const picaRecord = Record.fromPicajson(pica) // expects a PICA/JSON record structure

See marcjs records and PICA/JSON for reference.

SchemaValidator

Class SchemaValidator implements validation of an Avram schema (Avram schema meta-validator). Full validation requires additional libraries ajv and ajv-formats to be installed.

import { SchemaValidator } from "avram"

const validator = new SchemaValidator()
const errors = validator.validate(schema)
if (errors.length) {
  errors.forEach(e => console.error(e))
}

Test suites

This package contains the official test suite for Avram validators. See directory test/suite/ and its file README.md for details.

The unit tests of this library further contain a test suite of valid and invalid Avram schemas in file test/schema-suite.json.

Related projects

QA Catalogue implements validation of MARC 21, UNIMARC and K10plus PICA, partly based on Avram Schemas.

Perl modules MARC::Schema and PICA::Schema partially implement Avram as well.

Several libraries and tools exist to validate MARC data: @natlibfi/marc-record-validate, @russian-state-library/js-marc-rsl (Node), MarcEdit.

Maintainers

Contributing

Contributions are welcome! Please use the issue tracker for questions, bug reports, and feature requests.

License

MIT license