jstr

Minimalistic fully-validating Unicode-aware JSON parser in C

Pronounced Jester, a pun on JSMN/Jasmin. The parser generates a read-only JSON DOM in a caller-provided buffer.

Parsing is destructive, i.e. the parser alters the JSON string as it goes, primarily to terminate tokens with \0 characters. A DOM node stores its value (a C string) as a pointer into the JSON string. String escapes, e.g. \n and \uXXXX, are decoded in place.
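In-place decoding works because a decoded string is never longer than its escaped form, so the write cursor cannot overtake the read cursor. Here is a minimal sketch of the idea. decode_in_place is a hypothetical helper, not part of jstr and not its actual implementation, and it handles single-character escapes only (\uXXXX is left out for brevity).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of jstr: decodes the body of a JSON string
 * in place. `s` points just past the opening quote. Returns a pointer just
 * past the closing quote, or NULL on malformed input. Only single-character
 * escapes are handled; \uXXXX is rejected to keep the sketch short. */
static char *decode_in_place(char *s)
{
    char *out = s;                      /* write cursor, never ahead of s */
    assert(s != NULL);
    for (;;) {
        char c = *s++;
        if (c == '\0') return NULL;     /* unterminated string */
        if (c == '"') break;            /* closing quote */
        if (c == '\\') {
            switch (*s++) {
            case '"':  c = '"';  break;
            case '\\': c = '\\'; break;
            case '/':  c = '/';  break;
            case 'b':  c = '\b'; break;
            case 'f':  c = '\f'; break;
            case 'n':  c = '\n'; break;
            case 'r':  c = '\r'; break;
            case 't':  c = '\t'; break;
            default:   return NULL;     /* unknown or unsupported escape */
            }
        }
        *out++ = c;
    }
    *out = '\0';                        /* terminate the token in the buffer */
    return s;
}
```

After the call the buffer holds the decoded C string at its original start, so a token only needs to keep a pointer to it.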

Incremental parsing is NOT supported. This decision simplifies the library's code significantly. Who needs incremental parsing, seriously?

Unlike JSMN/Jasmin, the world's sloppiest JSON parser (which touts itself as the world's fastest), Jester is not a joke! Non-string keys in objects are prohibited, every key must have a corresponding value, random unquoted literals are not allowed, and inputs that aren't valid UTF-8 are rejected.

(Wonder how sloppy JSMN really is? Take a look at the numerous disabled tests in their test suite!)

Read-only DOM in a caller-provided buffer, you say?

Consider a JSON snippet:

{"name":"root","id":0}

The figure below depicts it as a C string (a cell is a byte).

The parser alters a JSON string as it goes. String escape sequences are decoded in-place and \0 characters are written to terminate tokens.

The next figure shows the JSON string before and after parser invocation. Cells bearing a question mark have undefined values.

The caller provides the parser with an array of jstr_token_t objects. This array ends up storing OBJECT, STRING, STRING, STRING, NUMBER tokens. Primitive tokens store a C string value (a pointer to characters), depicted with arrows in the drawing.

OBJECT and ARRAY tokens store the number of child tokens, making it possible to skip subtrees efficiently. Note that skipping a subtree yields the next sibling. The first child is the next token after the parent. These two operations are sufficient to traverse a tree, hence we are entitled to call our token array a JSON DOM.
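The two operations can be sketched with a stand-in token type. Everything below is hypothetical: the real jstr_token_t is opaque, and demo_token, its fields and the demo_* functions are made up for illustration. The sketch assumes the stored count covers every token in the subtree, which is what makes skipping a single addition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token layout; the real jstr_token_t is opaque. */
typedef struct {
    char kind;          /* 'O' object, 'A' array, 'S' string, 'N' number */
    size_t span;        /* tokens in the subtree below this one, 0 for primitives */
    const char *value;  /* C string for primitives, NULL for containers */
} demo_token;

/* First child: the token right after the parent. */
static const demo_token *demo_first_child(const demo_token *t)
{
    return t + 1;
}

/* Next sibling: skip the token itself plus its whole subtree. */
static const demo_token *demo_next(const demo_token *t)
{
    return t + 1 + t->span;
}

/* Count direct children of a container using only the two operations. */
static size_t demo_child_count(const demo_token *parent)
{
    assert(parent != NULL);
    const demo_token *end = demo_next(parent);
    size_t n = 0;
    for (const demo_token *c = demo_first_child(parent); c != end; c = demo_next(c))
        n++;
    return n;
}
```

For the snippet above, the object token would carry a span of 4, and walking its children visits "name", "root", "id" and 0 in order.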

This data structure is convenient for tree traversal, but modifications (inserting a node, for instance) are NOT supported. Due to this limitation we call it a read-only DOM.

(Technically, since tokens don't require cleanup, a caller may mutate the buffer freely. Convert a JSON array of strings in a JSON DOM into a NULL-terminated array of pointers to characters in-place? Why not!)

Usage

void jstr_init(jstr_parser_t *parser);

Initializes the parser (an opaque jstr_parser_t structure). No cleanup is necessary.

ssize_t jstr_parse(
  jstr_parser_t *parser, char *json, jstr_token_t *token, size_t token_count
);

Parse JSON data. Returns the number of bytes consumed on success. A negative return value indicates a failure. Possible failures:

JSTR_INVAL (-1) Parse error.

JSTR_NOMEM (-2) Token array is too small. Grow the array. Resume the parser by calling jstr_parse again.
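A grow-and-retry loop for the JSTR_NOMEM case might look like the sketch below. To keep it compilable without jstr.h, it uses stand-ins: demo_token replaces jstr_token_t, DEMO_NOMEM mirrors JSTR_NOMEM, and demo_stub_parse is a fake parser shaped like jstr_parse. All demo_* names are hypothetical. Passing the same json pointer on the retry is an assumption; the text above only says to call jstr_parse again.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t */

#define DEMO_NOMEM (-2)  /* mirrors JSTR_NOMEM */

typedef struct { int unused; } demo_token;   /* stand-in for jstr_token_t */

/* Shaped like jstr_parse, with the parser state passed as void *. */
typedef ssize_t (*demo_parse_fn)(void *parser, char *json,
                                 demo_token *token, size_t token_count);

/* Stub parser: pretends the input needs *(size_t *)parser tokens
 * and is 22 bytes long. */
static ssize_t demo_stub_parse(void *parser, char *json,
                               demo_token *token, size_t token_count)
{
    (void)json;
    (void)token;
    size_t needed = *(size_t *)parser;
    return token_count < needed ? DEMO_NOMEM : 22;
}

/* Call `parse` until it stops reporting DEMO_NOMEM, doubling the array
 * each time. On return *tokens holds the (possibly moved) array and
 * *count its size; the caller frees it. Size overflow is not checked. */
static ssize_t demo_parse_growing(demo_parse_fn parse, void *parser, char *json,
                                  demo_token **tokens, size_t *count)
{
    assert(parse && tokens && count && *count > 0);
    for (;;) {
        ssize_t rc = parse(parser, json, *tokens, *count);
        if (rc != DEMO_NOMEM)
            return rc;                  /* success or parse error */
        size_t bigger = *count * 2;
        demo_token *grown = realloc(*tokens, bigger * sizeof **tokens);
        if (grown == NULL)
            return DEMO_NOMEM;          /* out of memory: give up */
        *tokens = grown;
        *count = bigger;
    }
}
```

With the real library the same loop would call jstr_parse directly and compare against JSTR_NOMEM.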

Tokens

A token is an opaque jstr_token_t structure. No cleanup is necessary. A token is at least as large as a pointer.

Use the jstr_type function to get a token's type:

jstr_type_t jstr_type(const jstr_token_t *token);

typedef enum {
  JSTR_OBJECT = 0x01,
  JSTR_ARRAY  = 0x02,
  JSTR_STRING = 0x04,
  JSTR_NUMBER = 0x08,
  JSTR_TRUE   = 0x10,
  JSTR_FALSE  = 0x20,
  JSTR_NULL   = 0x40
} jstr_type_t;
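The enumerators are distinct bits, so several types can be checked with a single mask. The sketch below repeats the enum so that it compiles without jstr.h (drop the copy when including the real header). The demo_* helpers are hypothetical and not part of jstr; using the values as a mask is an inference from their layout, not documented behaviour.

```c
#include <assert.h>

/* Copied from the enum above so the sketch compiles on its own. */
typedef enum {
  JSTR_OBJECT = 0x01,
  JSTR_ARRAY  = 0x02,
  JSTR_STRING = 0x04,
  JSTR_NUMBER = 0x08,
  JSTR_TRUE   = 0x10,
  JSTR_FALSE  = 0x20,
  JSTR_NULL   = 0x40
} jstr_type_t;

/* Hypothetical helper: non-zero if the type is OBJECT or ARRAY. */
static int demo_is_container(jstr_type_t type)
{
  return (type & (JSTR_OBJECT | JSTR_ARRAY)) != 0;
}

/* Hypothetical helper: non-zero if the type is in `mask`,
 * a bitwise OR of jstr_type_t values. */
static int demo_is_one_of(jstr_type_t type, unsigned mask)
{
  assert(mask != 0);
  return ((unsigned)type & mask) != 0;
}
```

For example, demo_is_one_of(type, JSTR_TRUE | JSTR_FALSE) accepts either boolean token in one comparison.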

Use the jstr_value function to get a token's value (a C string). Don't use it with OBJECT or ARRAY tokens.

const char *jstr_value(const jstr_token_t *token);

Use jstr_next to skip all children of the token. It works with all token types.

const jstr_token_t *jstr_next(const jstr_token_t *token);
