base32768

Base32768 is a binary encoding optimised for UTF-16-encoded text. This JavaScript module, base32768, is the first implementation of this encoding.

The table below compares Base32768 with other binary encodings. Efficiency ratings are averaged over long inputs. Higher is better.

| Repertoire | Encoding | Efficiency (UTF‑8) | Efficiency (UTF‑16) | Efficiency (UTF‑32) | Bytes per Tweet * |
|---|---|---|---|---|---|
| ASCII‑constrained | Unary / Base1 | 0% | 0% | 0% | 1 |
| | Binary | 13% | 6% | 3% | 35 |
| | Hexadecimal | 50% | 25% | 13% | 140 |
| | Base64 | 75% | 38% | 19% | 210 |
| | Base85 † | 80% | 40% | 20% | 224 |
| BMP‑constrained | HexagramEncode | 25% | 38% | 19% | 105 |
| | BrailleEncode | 33% | 50% | 25% | 140 |
| | Base2048 | 56% | 69% | 34% | 385 |
| | Base32768 | 63% | 94% | 47% | 263 |
| Full Unicode | Ecoji | 31% | 31% | 31% | 175 |
| | Base65536 | 56% | 64% | 50% | 280 |
| | Base131072 ‡ | 53%+ | 53%+ | 53% | 297 |

* New-style "long" Tweets: up to 280 Unicode characters, give or take Twitter's complex "weighting" calculation.
† Base85 is listed for completeness, but all variants use characters which are considered hazardous for general use in text: escape characters, brackets, punctuation, etc.
‡ Base131072 is a work in progress, not yet ready for general use.

Base32768 uses only "safe" Unicode code points: no unassigned code points, no whitespace, no control characters, etc.

Installation

npm install base32768

Usage

import { encode, decode } from 'base32768'

const uint8Array = new Uint8Array([104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100])
const str = encode(uint8Array)
console.log(str)
// 6 code points, '媒腻㐤┖ꈳ埳'

const uint8Array2 = decode(str)
console.log(uint8Array2)
// [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]
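The byte values in the example above are the UTF-8 encoding of the text "hello world". As a minimal sketch, assuming the data to encode starts out as a JavaScript string, the standard TextEncoder and TextDecoder APIs (built into browsers and Node.js, not part of base32768) can produce and consume the Uint8Array values that encode and decode work with:

```javascript
// Convert a string to bytes and back using the standard Encoding API.
// TextEncoder and TextDecoder are platform built-ins, not base32768 exports.
const bytes = new TextEncoder().encode('hello world')
console.log(Array.from(bytes))
// [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]

// The reverse direction turns decoded bytes back into text.
const text = new TextDecoder().decode(bytes)
console.log(text)
// hello world
```

The resulting bytes can then be passed to encode exactly as in the example above.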

In the browser

Load the following script in the browser to gain access to a base32768 global.

<script src="https://unpkg.com/base32768@2/dist/iife/base32768.js" crossorigin></script>
<script>
  console.log(base32768.decode('怗膹䩈㭴䂊䫁輪黔'))
</script>

API

base32768.encode(uint8Array)

Encodes a Uint8Array and returns a Base32768 String. Note that every Node.js Buffer is a Uint8Array.

The string is suitable for passing safely through almost any "Unicode-clean" text-handling API. This string contains no special characters and is immune to Unicode normalization. Give or take some padding characters, the output string has 1 character per 15 bits of input.
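As a rough illustration of the normalization claim, the sketch below checks the sample output string from the Usage section against all four Unicode normalization forms using the built-in String.prototype.normalize, and confirms that it contains no whitespace. The helper name isNormalizationStable is made up for this example and is not part of base32768.

```javascript
// Hypothetical helper (not part of base32768): true if the string is
// unchanged by every Unicode normalization form.
const isNormalizationStable = str =>
  ['NFC', 'NFD', 'NFKC', 'NFKD'].every(form => str.normalize(form) === str)

// Sample Base32768 output taken from the Usage section.
const sample = '媒腻㐤┖ꈳ埳'

console.log(isNormalizationStable(sample)) // true
console.log(/\s/.test(sample)) // false

// By contrast, a precomposed 'é' (U+00E9) is rewritten by NFD.
console.log(isNormalizationStable('\u00e9')) // false
```

This only spot-checks one string; it does not prove the property for the whole repertoire.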

All characters are chosen from the Basic Multilingual Plane. This means that when encoded as UTF-16, all characters occupy 16 bits. Thus, there are 16 bits of output UTF-16 text per 15 bits of input, an efficiency of 93.75%.
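The arithmetic can be sketched as follows. The function name estimateEncodedLength is hypothetical, and the rounding-up rule is an assumption about how the final partial group of bits is padded; it is consistent with the 11-byte example in the Usage section, which produces 6 characters.

```javascript
// Hypothetical helper (not part of base32768): estimate how many
// characters are needed for a given number of input bytes, assuming
// each character carries 15 bits and a final partial group costs one
// extra character.
const BITS_PER_CHAR = 15
const estimateEncodedLength = byteLength =>
  Math.ceil((byteLength * 8) / BITS_PER_CHAR)

console.log(estimateEncodedLength(11)) // 6, matching the Usage example
console.log(estimateEncodedLength(15)) // 8, exactly 120 bits with nothing left over

// Each BMP character occupies one 16-bit code unit in UTF-16, so the
// long-run efficiency is 15 payload bits per 16 output bits.
const utf16Efficiency = BITS_PER_CHAR / 16
console.log(utf16Efficiency) // 0.9375
```

For short inputs the padding is noticeable: 11 bytes are 88 bits, carried in 6 characters or 96 bits of UTF-16, which is about 92% rather than 93.75%.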

base32768.decode(str)

Decodes a Base32768 String and returns a Uint8Array containing the original binary data. Note that a Uint8Array can be converted to a Node.js Buffer like so:

const buffer = Buffer.from(uint8Array.buffer, uint8Array.byteOffset, uint8Array.byteLength)
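The byteOffset and byteLength arguments matter because a Uint8Array may be a view onto only part of a larger ArrayBuffer. The following Node.js sketch uses a made-up backing array, rather than real decode output, to show the difference:

```javascript
// Node.js only: Buffer is a global there.
// A Uint8Array that views just the middle three bytes of a larger buffer.
const backing = new Uint8Array([10, 20, 30, 40, 50])
const view = backing.subarray(1, 4) // [20, 30, 40]

// Passing offset and length yields exactly the bytes in the view.
const exact = Buffer.from(view.buffer, view.byteOffset, view.byteLength)
console.log(exact.length) // 3

// Omitting them would expose the whole underlying ArrayBuffer.
const whole = Buffer.from(view.buffer)
console.log(whole.length) // 5

// The Buffer shares memory with the Uint8Array rather than copying it.
exact[0] = 99
console.log(view[0]) // 99
```

Because the memory is shared, use Buffer.from(uint8Array) instead if an independent copy is wanted.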

License

MIT
