Add docs for state and codepoint
rnjtranjan committed Jul 8, 2023
1 parent 5fa6d57 commit c50c25a
Showing 2 changed files with 21 additions and 3 deletions.
18 changes: 18 additions & 0 deletions core/src/Streamly/Internal/Unicode/Stream.hs
@@ -190,8 +190,26 @@ encodeLatin1Lax = encodeLatin1
-- UTF-8 decoding
-------------------------------------------------------------------------------

-- CodePoint is the numerical value that the Unicode standard assigns to
-- each character. UTF-8 encodes a code point using a variable number of
-- bytes (one to four).
--
-- To compute the code point value, use the leading byte to determine how
-- many bytes the sequence has, extract the significant (payload) bits from
-- each byte of the sequence, and combine them. The exact bit manipulation
-- depends on the number of bytes in the sequence.
--
-- We use Int because a Char is constructed from an Int, which makes the
-- conversion from CodePoint to Char cheap.
type CodePoint = Int
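The bit-combining step described above can be illustrated with a small sketch. The functions `codePointOf` and `decodeChar` below are hypothetical and are not part of Streamly; they assume a complete UTF-8 sequence given as a list of bytes, and validation is deliberately minimal (overlong encodings and surrogates are not rejected).

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

-- Same alias as above: Int, so the result can be passed straight to chr.
type CodePoint = Int

-- Hypothetical helper: combine the payload bits of one complete UTF-8
-- sequence into a code point. The leading byte decides how many of its
-- low bits are payload; every continuation byte (10xxxxxx) adds 6 bits.
-- Overlong forms and surrogates are NOT rejected in this sketch.
codePointOf :: [Word8] -> Maybe CodePoint
codePointOf [] = Nothing
codePointOf (b0 : rest)
  | b0 < 0x80           = finish 0 (fromIntegral b0)             -- 0xxxxxxx
  | b0 .&. 0xE0 == 0xC0 = finish 1 (fromIntegral (b0 .&. 0x1F))  -- 110xxxxx
  | b0 .&. 0xF0 == 0xE0 = finish 2 (fromIntegral (b0 .&. 0x0F))  -- 1110xxxx
  | b0 .&. 0xF8 == 0xF0 = finish 3 (fromIntegral (b0 .&. 0x07))  -- 11110xxx
  | otherwise           = Nothing                                 -- invalid lead
  where
    -- Require exactly n continuation bytes, then fold them into acc.
    finish :: Int -> CodePoint -> Maybe CodePoint
    finish n acc
      | length rest /= n = Nothing
      | all isCont rest  = Just (foldl addCont acc rest)
      | otherwise        = Nothing

    isCont :: Word8 -> Bool
    isCont b = b .&. 0xC0 == 0x80

    -- Shift the accumulated bits left by 6 and append the low 6 bits of b.
    addCont :: CodePoint -> Word8 -> CodePoint
    addCont acc b = (acc `shiftL` 6) .|. fromIntegral (b .&. 0x3F)

-- Hypothetical helper: the Int code point goes directly into chr.
decodeChar :: [Word8] -> Maybe Char
decodeChar bs = do
  cp <- codePointOf bs
  if cp <= 0x10FFFF then Just (chr cp) else Nothing
```

For example, in GHCi `codePointOf [0xE2, 0x82, 0xAC]` evaluates to `Just 8364`, i.e. U+20AC (the euro sign).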

-- DecodeState is the number of bytes remaining to complete decoding of the
-- current UTF-8 character. ASCII characters (code points 0 to 127) are
-- represented by a single byte, so they need no decoding state; the state
-- for them can be considered 0. For a multi-byte character, once the leading
-- byte has been read, the state holds the number of continuation bytes still
-- expected, e.g. DecodeState is 1 for a 2-byte character.
type DecodeState = Word8
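To illustrate how a "bytes remaining" state lets decoding resume across chunk boundaries, here is a minimal sketch under the interpretation given above. `Step`, `step` and `decodeChunk` are hypothetical names and this is not Streamly's actual decoder; it carries the partially assembled code point alongside the state and does not reject overlong encodings or surrogates.

```haskell
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)
import Data.Word (Word8)

type CodePoint = Int
-- Continuation bytes still expected; 0 means no character is in progress.
type DecodeState = Word8

-- Hypothetical result of feeding one byte to the decoder.
data Step
  = Emit Char                       -- a character was completed
  | Continue DecodeState CodePoint  -- more bytes needed; keep state and partial value
  | Invalid                         -- malformed input
  deriving (Eq, Show)

step :: DecodeState -> CodePoint -> Word8 -> Step
-- State 0: expecting a leading byte.
step 0 _ b
  | b < 0x80           = Emit (chr (fromIntegral b))                -- ASCII
  | b .&. 0xE0 == 0xC0 = Continue 1 (fromIntegral (b .&. 0x1F))     -- 2-byte char
  | b .&. 0xF0 == 0xE0 = Continue 2 (fromIntegral (b .&. 0x0F))     -- 3-byte char
  | b .&. 0xF8 == 0xF0 = Continue 3 (fromIntegral (b .&. 0x07))     -- 4-byte char
  | otherwise          = Invalid
-- State n > 0: expecting a continuation byte (10xxxxxx).
step n acc b
  | b .&. 0xC0 /= 0x80 = Invalid
  | n == 1             = if acc' <= 0x10FFFF then Emit (chr acc') else Invalid
  | otherwise          = Continue (n - 1) acc'
  where
    acc' = (acc `shiftL` 6) .|. fromIntegral (b .&. 0x3F)

-- Decode one chunk, returning the characters and the state to resume with.
decodeChunk :: (DecodeState, CodePoint) -> [Word8]
            -> Maybe (String, (DecodeState, CodePoint))
decodeChunk st [] = Just ("", st)
decodeChunk (s, acc) (b : bs) =
  case step s acc b of
    Invalid        -> Nothing
    Continue s' a' -> decodeChunk (s', a') bs
    Emit c         -> do
      (cs, st') <- decodeChunk (0, 0) bs
      Just (c : cs, st')
```

Feeding `[0x41, 0xE2, 0x82]` from the initial state `(0, 0)` yields `Just ("A", (1, 130))`: the euro sign is still incomplete with one byte remaining, so the caller keeps that state and passes it along with the next chunk `[0xAC]`.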

-- We can divide the errors into three general categories:
6 changes: 3 additions & 3 deletions core/src/Streamly/Unicode/Stream.hs
@@ -76,10 +76,10 @@
module Streamly.Unicode.Stream
(
DecodeState
, DecodeError(..)
, CodePoint

-- * Construction (Decoding)
, DecodeError(..)

-- * Resumable UTF-8 decoding
, decodeLatin1
, decodeUtf8
, decodeUtf8'