-
@kleihan, I agree that Orchestra could benefit from richer and more extensible definitions of datatypes and of their mapping to lexical spaces in the …

A bit of background: "set" is one of several aggregate datatypes defined by ISO 11404 General Purpose Datatypes. A set is an unordered collection of unique values. The other aggregate types are record, class, sequence, bag, array, and table, and each has its own characteristics. The Orchestra equivalent of "record" is Component. I leave it to the group to decide which of the others might be useful in Orchestra.

I will add that the current schema already has some rudimentary support for expressing aggregate types. Attributes of …:

<xs:attribute name="element" type="xs:string">
  <xs:annotation>
    <xs:documentation>Element type of an aggregate type such as an array or sequence</xs:documentation>
  </xs:annotation>
</xs:attribute>
<xs:attribute name="size" type="xs:nonNegativeInteger">
  <xs:annotation>
    <xs:documentation>Size of an aggregate type such as an array. That is, the number of elements.</xs:documentation>
  </xs:annotation>
</xs:attribute>

Some aggregate types, such as array, have a fixed size, while others are variable length. We need to make that distinction clear.
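To illustrate the fixed-size versus variable-length distinction, here is a minimal Python sketch of an aggregate datatype descriptor modelled loosely on the `element` and `size` attributes above. The class name `AggregateType`, its `kind` field, and the rule that an absent `size` means "variable length" are assumptions for illustration only; they are not part of the Orchestra schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class AggregateType:
    """Hypothetical descriptor, not part of the Orchestra schema."""
    kind: str                   # ISO 11404 aggregate name, e.g. "array", "sequence", "set"
    element: str                # element datatype, like the `element` attribute
    size: Optional[int] = None  # like the `size` attribute; None = variable length (assumption)

    @property
    def is_fixed_size(self) -> bool:
        return self.size is not None

    def check_length(self, values: Sequence[object]) -> None:
        """Raise ValueError if the values violate the aggregate's size or uniqueness rules."""
        if self.is_fixed_size and len(values) != self.size:
            raise ValueError(
                f"{self.kind} of {self.element} requires exactly {self.size} elements, "
                f"got {len(values)}"
            )
        # A set is an unordered collection of unique values.
        if self.kind == "set" and len(set(values)) != len(values):
            raise ValueError("set elements must be unique")


bits = AggregateType(kind="array", element="bit", size=8)
bits.check_length([0, 1, 0, 0, 0, 0, 0, 1])  # OK: exactly 8 elements
codes = AggregateType(kind="sequence", element="char")
codes.check_length(["A", "S"])  # OK: no fixed size
```

Making `size` optional keeps one descriptor for both cases, but the schema would still need to say explicitly whether a missing size means "variable" or "unspecified".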
-
Adding a concrete example of a "bitmap" from LSEG's Group Ticker Plant spec:

Data type: Bit Field

I guess this does not fit Don's description of a Set; it is rather an array type. FWIW, I originally modelled this as a component with a codeset for each field, as follows:

Component AllowedBookTypes (1)
Synopsis: Defines the order-book types that are allowed for the instrument. Each designated bit represents a book type. ‘0’ means not allowed and ‘1’ means allowed.
Codeset FirmQuoteBookCodeSet type Boolean (1001)
-
@martinswanson, although the term "bitset" is commonly used, I agree that it is really an array of Boolean, where each Boolean is encoded as a bit (0/1). In an array, each element is addressable by an index, e.g. the third bit is off (0=false).
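A minimal Python sketch of treating a bit field as an array of Boolean, where each element is addressed by index. The 8-bit width and the convention that index 0 is the least-significant bit are assumptions; a real spec such as the one quoted above defines its own bit numbering.

```python
from typing import List


def to_bool_array(value: int, width: int = 8) -> List[bool]:
    """Unpack an unsigned integer into `width` Booleans; index 0 = least-significant bit (assumption)."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return [bool((value >> i) & 1) for i in range(width)]


def from_bool_array(flags: List[bool]) -> int:
    """Pack Booleans back into an unsigned integer using the same bit order."""
    result = 0
    for i, flag in enumerate(flags):
        if flag:
            result |= 1 << i
    return result


flags = to_bool_array(0b0000_0011)
print(flags[2])                # third element (index 2) is off: False
print(from_bool_array(flags))  # round-trips to 3
```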
-
I'd like to add an overview and a proposal we drafted a few months ago. While it still needs some refinement, it can help move the discussion forward. It addresses options 1 and 2 from @kleihan's original post.

Multi-Value Code Sets Support

Requirements

This problem concerns fields whose value is not a single value but can be multiple values. For example, instead of being …

These are our requirements:

The following sections cover cases that we may encounter.

Code sets of values

CodeSet values have variable length

In this case, a CodeSet has values of variable length, e.g. String: one code can have a single-character value while another has two or more characters. This option requires special encoding, e.g. using a separator.

Example

<codeSet name="ExecInstCodeSet" id="18" type="MultipleValueString">
<code name="NoCross" id="18011" value="A"/>
<code name="Suspend" id="18012" value="S"/>
...
<code name="NoCrossCustom" id="18111" value="Ax"/>
</codeSet>

Encoded field value when codes …
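A minimal Python sketch of separator-based encoding for variable-length codes, using the three codes listed in the example above (the elided ones are omitted). The default space separator is an assumption borrowed from FIX's space-delimited MultipleStringValue; the check that no code contains the separator reflects why the separator is needed in the first place.

```python
from typing import Iterable, List

# Code name -> code value, from the example CodeSet above.
CODES = {"NoCross": "A", "Suspend": "S", "NoCrossCustom": "Ax"}


def encode(selected: Iterable[str], sep: str = " ") -> str:
    """Join the values of the selected codes with a separator (space by default, an assumption)."""
    values = [CODES[name] for name in selected]
    for v in values:
        if sep in v:
            raise ValueError(f"code value {v!r} contains the separator {sep!r}")
    return sep.join(values)


def decode(encoded: str, sep: str = " ") -> List[str]:
    """Split the composite value and map each code value back to its code name."""
    if not encoded:
        return []
    by_value = {v: k for k, v in CODES.items()}
    return [by_value[v] for v in encoded.split(sep)]


print(encode(["NoCross", "NoCrossCustom"]))  # A Ax
print(decode("A Ax"))                        # ['NoCross', 'NoCrossCustom']
```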
CodeSet values have fixed length

In this case, a CodeSet has values of a fixed length. This case allows efficient packing and applies when all codes have the same length, e.g. a single character: the selected codes can be packed sequentially without separators.

Example

<codeSet name="ExecInstCodeSet" id="18" type="MultipleValueString">
<code name="NoCross" id="18011" value="A"/>
<code name="Suspend" id="18012" value="S"/>
</codeSet>

Encoded field value when codes …
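A minimal Python sketch of the fixed-length case, using the two codes from the example above. Because every code value has the same width (one character here), the selected values can be concatenated and later split by position, with no separator. The `WIDTH` constant is an assumption derived from the example, not an Orchestra attribute.

```python
from typing import Iterable, List

CODES = {"NoCross": "A", "Suspend": "S"}  # from the example CodeSet above
WIDTH = 1  # every code value has the same length (single character)


def encode(selected: Iterable[str]) -> str:
    """Concatenate the selected code values without separators."""
    values = [CODES[name] for name in selected]
    if any(len(v) != WIDTH for v in values):
        raise ValueError("all code values must have the same length")
    return "".join(values)


def decode(encoded: str) -> List[str]:
    """Split the composite value into WIDTH-sized chunks and map them back to code names."""
    if len(encoded) % WIDTH:
        raise ValueError("encoded length is not a multiple of the code width")
    by_value = {v: k for k, v in CODES.items()}
    return [by_value[encoded[i:i + WIDTH]] for i in range(0, len(encoded), WIDTH)]


print(encode(["NoCross", "Suspend"]))  # AS
print(decode("SA"))                    # ['Suspend', 'NoCross']
```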
Code sets of flags (bits)

CodeSet values indicate a bit in a bitmap

In this case, each code is selected (i.e., enabled or disabled) by the bit corresponding to its value. This allows the most efficient packing of the set into a binary format, which is especially useful for binary protocols. Here, a code value is the number of the bit associated with the code.

Example

<codeSet name="ExecInstCodeSet" id="18" type="BitSet8">
<code name="NoCross" id="18011" value="1"/>
<code name="Suspend" id="18012" value="2"/>
<code name="Hold" id="18013" value="3"/>
</codeSet>

Encoded field value when … Decimal

Value and lexical space of codes (encoding)

To encode a CodeSet, the datatype of its codes (i.e., their lexical space) must be defined. This information is provided through the mappedDatatype for each encoding. For example:

The datatype of codes is used to:

Value Separator

A separator is necessary when encoding CodeSets whose code values have variable length, e.g. string values. It allows each code value to be identified within the composite value. The separator must belong to the lexical space of the composite type, and it must not occur within any code value; otherwise the composite value cannot be split unambiguously.

Possible Solution Ideas

We recommend introducing a new attribute, …

Example

Define a CodeSet with two codes:

<codeSet name="ExecInstCodeSet" id="18" type="char" compositeType="string">
<code id="1" name="NoPriceCheck" value="P"/>
<code id="2" name="NoNotionalCheck" value="N"/>
</codeSet>

The type attribute refers to the datatype …

<datatype name="char">
<mappedDatatype standard="XML" base="xs:string" size="1" pattern="[A-Za-z0-9]"/>
</datatype>
<datatype name="string">
<mappedDatatype standard="XML" base="xs:string"/>
</datatype>

Using the information above, an encoder knows how to represent the selected codes within a composite type. In the above example, an encoder or decoder can infer that the characters of the individual codes are concatenated to form a string. As described earlier, a protocol may require a separator between encoded values. We recommend indicating this to the encoder with a new …

<datatype name="string">
<mappedDatatype standard="XML" base="xs:string" elementSeparator=","/>
</datatype>

For bit sets, …

<codeSet name="ExecInstCodeSet" id="18" type="BitNum8" compositeType="BitSet8">
<code id="1" name="NoPriceCheck" value="1"/>
<code id="2" name="NoNotionalCheck" value="2"/>
</codeSet>
<datatype name="BitSet8">
<mappedDatatype standard="ISO11404" base="bitstring" element="bit" size="8"/>
<!-- An octet representing up to 256 states -->
</datatype>
<datatype name="BitNum8">
<mappedDatatype standard="ISO11404" base="integer" minInclusive="1" maxInclusive="8"/>
<!-- A number of a bit in an octet -->
</datatype>

In this example, the composite type of the CodeSet is a set of eight bits, even though the CodeSet uses only two of them. Most binary protocols align such datatypes to octets, so the value can be represented internally as an unsigned byte. The …
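A minimal Python sketch of encoding the bit-set CodeSet above into a single unsigned byte. It assumes bit number 1 (the lowest BitNum8 value) maps to the least-significant bit; the proposal does not fix a bit order, so a real mapping would need to state one explicitly.

```python
from typing import Dict, Iterable, List

# Code name -> BitNum8 value, from the example CodeSet above.
BIT_NUMBERS: Dict[str, int] = {"NoPriceCheck": 1, "NoNotionalCheck": 2}
SIZE = 8  # BitSet8: size="8"


def encode(selected: Iterable[str]) -> int:
    """Set one bit per selected code and return the packed value."""
    value = 0
    for name in selected:
        bit = BIT_NUMBERS[name]
        if not 1 <= bit <= SIZE:  # mirrors minInclusive="1" maxInclusive="8"
            raise ValueError(f"bit number {bit} out of range")
        value |= 1 << (bit - 1)  # assumption: bit 1 = least-significant bit
    return value


def decode(value: int) -> List[str]:
    """Return the names of the codes whose bits are set."""
    if not 0 <= value < (1 << SIZE):
        raise ValueError(f"{value} does not fit in {SIZE} bits")
    return [name for name, bit in BIT_NUMBERS.items() if value & (1 << (bit - 1))]


packed = encode(["NoPriceCheck", "NoNotionalCheck"])
print(packed, packed.to_bytes(1, "big"))  # 3 b'\x03'
print(decode(2))                          # ['NoNotionalCheck']
```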
-
This thread is intended to discuss one of the proposals mentioned in #197 under the headline "Multiple Values". The FIX Protocol uses a repeating group with a single field to express a set where one or more values can be provided. FIX Latest still has two datatypes (MultipleCharValue and MultipleStringValue) that support space-delimited lists of single-character and multi-character values, respectively. These datatypes are no longer used for new fields; single-field repeating groups are used instead, e.g. ExecInstRules.
Semantically, these are all sets, but there is more than one way of encoding a set. It could be a string with or without delimiters separating the values. It could also be a bitmap with one bit per value, requiring one or more bytes depending on the number of possible values. A programming language may choose an array as the datatype for storing this information, making it more easily accessible. The Orchestra element 'mappedDatatype' should be enhanced to support the description of the actual encoding.
FIX Latest currently only has one mapping of these datatypes for FIXML, using a pattern for validation:
MultipleCharValue uses
pattern="[A-Za-z0-9](\s[A-Za-z0-9])*"
MultipleStringValue uses
pattern=".+(\s.+)*"
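The two FIXML patterns can be exercised with Python's `re` module. XML Schema patterns are implicitly anchored to the whole value, so this sketch uses `re.fullmatch`. Python's `\s` and `.` are close to, but not identical with, their XML Schema counterparts (for example, Python's `\s` also matches other Unicode whitespace), so treat this as an approximation rather than a schema validator.

```python
import re

MULTIPLE_CHAR_VALUE = re.compile(r"[A-Za-z0-9](\s[A-Za-z0-9])*")
MULTIPLE_STRING_VALUE = re.compile(r".+(\s.+)*")


def is_multiple_char_value(value: str) -> bool:
    """Whole-value match, as XML Schema patterns are implicitly anchored."""
    return MULTIPLE_CHAR_VALUE.fullmatch(value) is not None


def is_multiple_string_value(value: str) -> bool:
    return MULTIPLE_STRING_VALUE.fullmatch(value) is not None


print(is_multiple_char_value("A S"))      # True
print(is_multiple_char_value("AS"))       # False: values must be whitespace-separated
print(is_multiple_string_value("A  Ax"))  # True
```

Because `.` also matches a space, the MultipleStringValue pattern accepts essentially any non-empty value, including doubled spaces, so it validates little beyond non-emptiness.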
Should Orchestra be able to express such sets explicitly? If so, what are the options to do so?
Here are some options to start the discussion:
mappedDatatype details the encoding

Note that ISO 20022 has a from-to cardinality (0, 1, *) attached to every element (field, component, or group) that implicitly gives presence information (e.g. 0:1 is optional, 1:1 is required, 0:* is unbounded, and 1:* is at least one and unbounded). The FIX Protocol may not use sets, as they do not map easily to tag=value.
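A minimal sketch of how a from-to cardinality implies presence, using the examples from the note above. Representing the unbounded upper limit `*` as `None`, and the wording of the returned labels, are assumptions for illustration.

```python
from typing import Optional


def presence(min_occurs: int, max_occurs: Optional[int]) -> str:
    """Describe the presence implied by a min:max cardinality; None stands for '*' (unbounded)."""
    if max_occurs is not None and (max_occurs < 1 or max_occurs < min_occurs):
        raise ValueError("max must be at least 1 and not less than min")
    required = "required" if min_occurs >= 1 else "optional"
    if max_occurs is None:
        return f"{required}, unbounded"
    if max_occurs == 1:
        return required
    return f"{required}, up to {max_occurs}"


for lo, hi in [(0, 1), (1, 1), (0, None), (1, None)]:
    print(f"{lo}:{'*' if hi is None else hi} -> {presence(lo, hi)}")
```

A set of one or more values would then correspond to 1:*, and an optional set to 0:*.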