hingstarne/lector

lector

Character normalisation service that renders Unicode confusables, reads the rendered string back via OCR, and judges whether the result is profane.

A sample Postman collection is available here.

To test it on your local machine, forward the service to localhost and try the examples.

Access to the current integration environment is required:

kubectl -n lector port-forward service/lector 8000:8000

Sample payload:

{"toCheck": "ꜰᴜᴄᴋ ᴍᴇ"}
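The sample payload can also be sent from Go instead of Postman. The following is a minimal sketch, not the service's documented client: the POST method, the root path `/`, and the helper names `checkRequest`, `buildPayload`, and `postCheck` are assumptions for illustration. Only the port and the `toCheck` field come from this README.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// checkRequest mirrors the sample payload; the type name is hypothetical.
type checkRequest struct {
	ToCheck string `json:"toCheck"`
}

// buildPayload encodes the string to check as the JSON request body.
func buildPayload(text string) ([]byte, error) {
	return json.Marshal(checkRequest{ToCheck: text})
}

// postCheck sends the payload and returns the raw response body.
// Assumption: the service accepts a JSON POST at the given URL.
func postCheck(url, text string) ([]byte, error) {
	payload, err := buildPayload(text)
	if err != nil {
		return nil, err
	}
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	// Assumption: the port-forward above is running and the handler
	// is mounted at the root path; adjust the URL to the real route.
	body, err := postCheck("http://localhost:8000/", "ꜰᴜᴄᴋ ᴍᴇ")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(string(body))
}
```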

Sample response:

{
    "ocr": {
        "string": "FUCK ME",
        "profan": true
    },
    "raw": {
        "string": "ꜰᴜᴄᴋ ᴍᴇ",
        "profan": false
    },
    "transcribed": {
        "string": "ꜰucĸ ʍᴇ",
        "profan": false
    }
}

Response struct:

type Response struct {
	Ocr struct {
		String string `json:"string"`
		Profan bool   `json:"profan"`
	} `json:"ocr"`
	Raw struct {
		String string `json:"string"`
		Profan bool   `json:"profan"`
	} `json:"raw"`
	Transcribed struct {
		String string `json:"string"`
		Profan bool   `json:"profan"`
	} `json:"transcribed"`
}
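As a rough sketch of how a caller might consume this response, the example below decodes the sample response and treats the input as profane if any of the three views is flagged. The named `Verdict` type is a compact equivalent of the anonymous structs above, and `decodeResponse` and `AnyProfane` are hypothetical helpers, not part of the service.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Verdict is one string/judgement pair (hypothetical name).
type Verdict struct {
	String string `json:"string"`
	Profan bool   `json:"profan"`
}

// Response has the same JSON shape as the struct in this README.
type Response struct {
	Ocr         Verdict `json:"ocr"`
	Raw         Verdict `json:"raw"`
	Transcribed Verdict `json:"transcribed"`
}

// AnyProfane reports whether at least one view was judged profane.
func (r Response) AnyProfane() bool {
	return r.Ocr.Profan || r.Raw.Profan || r.Transcribed.Profan
}

// decodeResponse parses a response body into a Response.
func decodeResponse(body []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(body, &r)
	return r, err
}

// sampleBody is the sample response from this README, compacted.
const sampleBody = `{"ocr":{"string":"FUCK ME","profan":true},` +
	`"raw":{"string":"ꜰᴜᴄᴋ ᴍᴇ","profan":false},` +
	`"transcribed":{"string":"ꜰucĸ ʍᴇ","profan":false}}`

func main() {
	r, err := decodeResponse([]byte(sampleBody))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("ocr string:", r.Ocr.String)
	fmt.Println("any profane:", r.AnyProfane())
}
```

In the sample, only the OCR view is flagged, which is why checking the raw string alone would miss it.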

Confusables in Unicode are characters that look like another character.

http://www.unicode.org/Public/security/latest/confusables.txt
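To make the idea concrete, here is a minimal Go sketch that parses data lines in the `source ; target ; type # comment` layout used by confusables.txt and applies the resulting table to a string. It is a table-based illustration only and does not claim to be how lector works, since lector relies on rendering and OCR. The function names `parseConfusableLine` and `mapConfusables` and the two sample lines are assumptions written in the file's format.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseConfusableLine parses one line of the form
// "0441 ; 0063 ; MA # comment" into a source rune and target string.
// Comment-only, blank, and malformed lines return ok == false.
func parseConfusableLine(line string) (source rune, target string, ok bool) {
	if i := strings.IndexByte(line, '#'); i >= 0 {
		line = line[:i] // drop the trailing comment
	}
	fields := strings.Split(line, ";")
	if len(fields) < 2 {
		return 0, "", false
	}
	src, err := strconv.ParseUint(strings.TrimSpace(fields[0]), 16, 32)
	if err != nil {
		return 0, "", false
	}
	// The target may be a sequence of several code points.
	var b strings.Builder
	for _, hex := range strings.Fields(fields[1]) {
		cp, err := strconv.ParseUint(hex, 16, 32)
		if err != nil {
			return 0, "", false
		}
		b.WriteRune(rune(cp))
	}
	if b.Len() == 0 {
		return 0, "", false
	}
	return rune(src), b.String(), true
}

// mapConfusables replaces every rune found in table with its target.
func mapConfusables(s string, table map[rune]string) string {
	var b strings.Builder
	for _, r := range s {
		if t, found := table[r]; found {
			b.WriteString(t)
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Illustrative lines: Cyrillic es and a mapped to Latin c and a.
	lines := []string{
		"# comment-only line, skipped",
		"0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES",
		"0430 ;\t0061 ;\tMA\t# CYRILLIC SMALL LETTER A",
	}
	table := map[rune]string{}
	for _, l := range lines {
		if src, tgt, ok := parseConfusableLine(l); ok {
			table[src] = tgt
		}
	}
	fmt.Println(mapConfusables("\u0441\u0430t", table))
}
```

A table lookup like this only covers characters listed in the file, which is one reason a rendering-based check can catch strings that a pure mapping misses.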

If you would like to try more sophisticated strings, you can create your own here.

One possible answer would be this lector service.

Credits to:
