layout | title | subtitle |
---|---|---|
layout |
Query of Norikra |
Syntax and examples |
Norikra's query is actually Esper's EPL. But some limitations and additional features exist.
- Only
SELECT
clause is supported and tested- Features for SELECT are supported: Sub queries, JOINs and Patterns
- Java method invocations
- UDFs
- Unsupported statements of EPL
- Annotations
- Additional declarations (expressions, scripts, schemas and windows)
- Contexts, variables and constants
- Merging/splitting/updating streams (
INSERT INTO
,UPDATE
) - Accessing external data
- Event delivery with
For
clause
- Additional features
- Schema-less events (called 'targets')
- Complex events support: Container fields (hash, array)
Norikra's queries are designed to do simple SELECT ... FROM ...
, without any schemas. Many of complex features of EPL can be done with schema-less simple SELECT queries (ex: Merging streams are same with to use a target for many kind of events).
Norikra's queries MUST be parsed and compiled as Esper EPL. Extended syntax is in EPL syntax.
All queries have a name and a group name.
- Name
- must be unique in all queries
- is key to fetch results of query
- Group
- may be blank or a string
- is key to sweep results of query set
Names and groups are not accessed in query, and their naming rule is not defined (any strings are valid).
Norikra's query is a kind of standard SQL, except for views. SQL keywords and functions are case insensitive, but target and field names are case sensitive.
SELECT
age,
COUNT(1, session.valid IS NOT NULL) AS login_users,
COUNT(*) AS users
FROM accesses.win:time_batch(1 hours)
WHERE name.length() > 0
GROUP BY age
This query will be evaluated for only event records with name
, age
and session.valid
fields.
Target is something like a table name of RDBMS. But target is a name of set of streams, which have different set of fields. In typical way, one target is for one web service or one customer.
Target naming rule is [a-zA-Z][_a-zA-Z0-9]*
.
Views are used to define time windows or to control timing to fire events. With views, norikra controls the range of events for each queries.
A query without any views specified runs for all events. This query puts output events as much as events with age over 15.
SELECT name, age FROM events WHERE age > 15
NOTE: Current versions of Norikra has no RPC API to fetch output events as stream. So queries of this type are slightly dangerous: huge amount of output events may exist in memory if user does not fetch these.
Time batch windows (win:time_batch( time-period )
) separate events by its arrival time, and does 'group by' by this separation. Output events are generated just once per intervals of specified time period.
This specifier may be a most major one for Norikra.
SELECT
name,
age,
COUNT(*) AS count
FROM events.win:time_batch(1 hours)
WHERE age > 15
GROUP BY name, age
This query does its process for 1 hour from the time query registered, to 1 hour after it. Output events appears on just 1 hour after.
10:31:05 query registered
10:31:06 event 1
10:35:51 event 2
.
.
11:31:04 event N
11:31:05 -> output events (for event 1 .. N)
11:31:05 event N+1
.
.
12:31:05 -> output events (for event N+1 .. )
For time-period, many units are available, like sec|second|seconds, min|minute|minutes, hour|hours, day|days
to year|years
!. (EPL document)
If you want to specify time period offset to any point (ex: from 10:00:00 to 11:00:00 for 1 hour time batch), use second (optional) argument of time_batch with long-value milliseconds from epoch (ex: events.win:time_batch(1 hour, 0L)
).
And third optional argument is flow control. Comment below is a quotation from Esper EPL reference.
By default, if there are no events arriving in the current interval (insert stream), and no events remain from the prior batch (remove stream), then the view does not post results to listeners. The view allows overriding this default behavior via flow control keywords.
If no one event arrived in a time batch, query outputs '0' records only for the first time, and no records for second (or later) time batches. For available options, see EPL document.
Externally timed batch window (win:ext_timed_batch( expression, time-period )
) is same with time batch window, except for its 'time'. This window uses any fields (or a valid expression) instead of system time of event arrival (time batch window).
SELECT
name,
age,
COUNT(*) AS count
FROM events.win:ext_timed_batch(timestamp, 1 hours)
WHERE age > 15
GROUP BY name, age
timestamp
must be a long-value field (Norikra's 'integer' fields are actually Long) which contains a milliseconds from epoch. Expressions are valid if you have a field with seconds from epoch. (ex: FROM events.win:ext_timed_batch( seconds_from_epoch * 1000, 1 hours )
)
Optional arguments for time offset and flow controls are also available.
NOTE: This data window expects that events are ordered by timestamp
strictly, and disordered events will be discarded. Use ext_timed_batch
only with sorted events.
Time window (win:time( time-period )
), from specified time ago of latest event arrival, to latest event. Output events are generated for every input events.
SELECT
name,
age,
COUNT(*) AS count
FROM events.win:time(1 hours)
WHERE age > 15
GROUP BY name, age
This query generates output events as much as input events (as same as no views specified).
Length window (win:length( size )
) or length batch window (win:length_batch( size )
) are works you expecting for event sizes.
The unique view (std:unique( expression [, expression ... ] )
) for expressions. Queries with this view works when events arrived and specified expressions returns unique value(s).
SELECT
family_name,
name,
age
FROM events.std:unique(name, family_name)
WHERE age > 15
This query generates output events for the first arrival time of each name-family_name combinations. Of course, this distinct calculations are executed on memory, so we should use this view in combination with time (or other) windows (except for special cases, ex:country names).
- Combination
Views can be combined. The query below is used to get the first events of each event_code, for each days.
SELECT
event_code,
parameter
FROM events.win:time_batch(1 day, 0L).std:unique(event_code)
- Others
Many more views are available in Norikra, as same as Esper EPL itself. Norikra have no limitations for views. If you found any unavailabilities, it's bug.
Field names are written as [_a-zA-Z0-9]+
or dot-separated these parts in queries. (Dot-separated field names are container field accesses. See following 'Container fields' section.)
Norikra's input events are object (or hash, map, ...) that have pairs of field names and values. Its field names are directly referred by field names in queries.
Input event:
{
"username": "tagomoris",
"email": "[email protected]",
"url": "https://github.com/tagomoris",
"repositories": 50
}
Query example:
SELECT
username, -- "tagomoris"
email, -- "[email protected]"
repositories -- 50
FROM events
All norikra fields have a type, one of 'string', 'boolean', 'integer'(defined as 'long' internally), 'float'(defined as 'double' internally) or 'hash' and 'array'. 'hash' and 'array' are container fields. Types of fields are specified when targets are opened, or determined automatically by value types of the first input event for lazy targets.
To Access fields of input event record with invalid characters as query's field name, escape invalid chars with underscores (_
).
Input event:
{
"user name": "tagomoris",
"email": "[email protected]",
"url": "https://github.com/tagomoris",
"repositories": 50,
"phone.number": "+81-090-xxxx-xxxx"
}
Query example:
SELECT
user_name, -- "tagomoris"
phone_number -- "+81-090-xxxx-xxxx"
FROM events
Different fields with same escaped names (ex: "user.name", "user-name" and "user name") are mixed on Norikra's query, and treated as one field. If it is not desired behavior for you, you should escape fields before sending norikra server.
For 'hash' and 'array' fields, event records have nested objects, and we can access these nested objects by dot-separated field names.
Input event:
{
"user": {
"name": "tagomoris",
"email": "[email protected]",
},
"visit": "Tokyo",
"route": ["Shibuya", "Ohsaki", "Shinjuku", "Ikebukuro"]
}
Query example:
SELECT
user.name, -- "tagomoris"
route.$0 -- "Shibuya"
FROM events
WHERE visit="Tokyo" AND user.email IS NOT NULL
Accessing hash member is just joined fields by dot, like user.name
. To specify array member, use index as $0
, $1
, ... (negative index is not supported yet). Of course, 3 or more nested field accesses are available: users.$1.data.name.first
.
Use $$0
to specify string representations of integer (like '0') for hash member field name.
In norikra (and Esper) queries, each fields' values are Java object (String, Boolean, Long, Double), and we can call these instance methods on queries.
SELECT
log.substring(8) AS message
FROM events
WHERE log.startsWith('WARNING:')
And some of Java packages are automatically imported, so we can also call functions like Math.abs()
.
Esper auto-imports the following Java library packages:
java.lang.*
java.math.*
java.text.*
java.util.*
Method chains are also available, ex:
SELECT
DateFormat.getInstance().parse(date_field).getTime() AS epoch_msec
FROM events
Norikra/Esper has many operators:
- Arithmetic operations:
+
,-
,*
,/
,%
- Logical/comparison operations:
NOT
,OR
,AND
,=
,!=
,<
,>
,<=
,>=
,is
,is not
- String concatenation operation:
||
- String comparison operations:
LIKE
,REGEXP
exp IN (items)
,exp IN rage
(open range:(0:100)
and close range:[0:100]
), andbetween
Esper has built-in functions like these:
CASE value WHEN v1 THEN r1 WHEN v2 THEN r2 [...] ELSE rr END
CASE WHEN cond1 THEN r1 [...] ELSE rr END
CAST( expression, type_name )
PREV()
,CURRENT_TIMESTAMP()
AVG()
,COUNT()
,MAX()
,MIN()
,SUM()
and DISTINCTFIRST()
,LAST()
,MAXBY()
,MINBY()
These keywords work as same as we expect. For more, see EPL reference.
- "Aggregates and grouping: the Group-by Clause and the Having Clause" (link)
- "Limiting Row Count: the Limit Clause" (link)
- "Sorting Output: the Order By Clause" (link)
For analytics usage, Rollup, Cube and Grouping sets are supported with Group-by.
- "Using Group-By with Rollup, Cube and Grouping Sets" (link)
JOINs and SubQueries are available, but Norikra requires fully-qualified field names for JOINs/SubQueries.
Example query (JOIN, default is inner-join):
SELECT
a.name AS name,
a.country AS country,
b.code AS event_code,
COUNT(*) AS times
FROM registrations.std:unique(name) AS a,
behavior.win:time_batch(1 hours) AS b
WHERE a.name = b.name
AND b.log_level IN ('INFO', 'WARN', 'ERROR')
GROUP BY a.name, a.country, b.code
Example query (SubQuery):
SELECT
name,
( SELECT a.country
FROM registrations.std:unique(name) AS a
WHERE a.name = behavior.name
) AS country,
code
FROM behavior
WHERE log_level IN ('INFO', 'WARN', 'ERROR')
"Fully-qualified field name" means specifier like TARGET.FIELD
or ALIAS.FIELD
. For JOINs, Norikra cannot determine the target fields belongs to, so all fields must be specified with target name or alias. For SubQueries, fields in inner-query must be specified with targets/aliases. Fields of outer-query does not require it.
For JOINs and SubQueries, Esper EPL have many restrictions. See documents.
In norikra query, all fields can be qualified with targets (ex: SELECT Events.code FROM Events ...
). But Norikra cannot determine whether first part is a target name or a container field name if hash/array container field have a same name with target.
-- TARGET Name: Events
-- Data:
-- {
-- "Events": {"code":404, "path":"/"},
-- "code": 0,
-- }
SELECT
Events.code -- 404 ? 0 ?
FROM Events
In these case, Norikra handle 'Events' as target name. To access the field with the same name of target, use fully-qualified field name:
SELECT
Events.Events.code, -- 404
Events.Events.path, -- "/"
code -- 0
FROM Events
Both of Filters and Patterns are available in Norikra. For more details, see EPL's reference.
In norikra query, SELECT * FROM ...
may NOT perform as user expects. Specify all fields required in explicit way. This is a restriction of Norikra's schema-less feature.
Esper EPL's syntax have escaping by backquotes. But Norikra's query does NOT support it. In norikra queries, use field names escaped by underscore instead of backquote escaping.
NG: `field name`
OK: field_name
Norikra's queries are evaluated for events, which have all fields used in that query. Field existence checks are automatically done by Norikra server, and events without required fields are ignored.
Esper EPL have a function named as exist()
, but in Norikra query, exist( someone )
refers a field 'someone', so this expression is always true. So you should just use fields directly, without existence check, in Norikra.
If you want to use any field names in query even though that fields does not exists in input records, specify that field as non-optional per target, OR, specify NULLABLE(field_name)
in queries.
Once field_a
, field_b
and NULLABLE(field_c)
specified in a query (QUERY_A), Norikra processes all events with field_a
, field_b
and with/without field_c
for QUERY_A. NULL will be used for events without field_c
.