This bot can join a Discord server and then will start taking secure backups .
The following environment variables need to be set:
DISCORD_TOKEN
- The authentication Token of the Bot from the Discord developer portalGPG_KEY_DIR
- Directory containing the PGP/GPG public keys (see Cryptography below)SIGN_KEY_PEM
- Path to the PEM file containing the ED25519 private key used to sign each messageEPHEMERAL_PATH
- working directory to use during on disk file operationsHEARTBEAT_URL
- A url to ping with aGET
request to indicate a successful backup run
For enabling a S3-compatible storage backend use the following environment variables to configure:
S3_ENABLED
- set to1
if you want to use S3 storageS3_ENDPOINT
- set to your S3 endpointS3_BUCKET
- set to the name of your bucketS3_ACCESS_KEY
- Username oraws_access_key_id
S3_SECRET_KEY
- Password oraws_secret_access_key
The main goal is to never write anything to disk/storage that contains message content or has structural data like channel names/id or server names/ids. To further guarantee integrity of each message, the bot will sign each message with its own ED25519 key. That key is the only sensitive key used by the bot.
Each message gets encrypted with the PGP/GPG public key passed to the bot via the environment variables (see Configuration). The message will be represented in other stored files only by the hash of the ciphertext.
The bot will sign the hash of the ciphertext with its ED25519 key. That signature will be written out-of-bounds to the manifest file (see Manifest format)
The ED25519 key is considered sensitive
Leakage of that key will allow to tamper with the messages by replacing the ciphertext with the attackers ciphertext.
Each directory file is encrypted with the PGP/GPG public key.
To not have any linkable data on disk, all server and channel id's used by the Manifest and Directory files are SHA256 hashes of those IDs. The only way to re-create the relations between servers, Channels and messages is to either know and hash the IDs, or by consulting the Directory file
The storage is seperated into:
- Message files
- Manifest files
- Directory files
- Location files
- Seal files
In S3 mode only the manifest files will be briefly on disk. Everything else will directly go to S3. The S3 structure is noted in the file format definitions.
Each message is stored in its own encrypted file. The filename is the SHA256 hash of the files contents with the extension .msg
.
The encrypted message file does not bear any markers towards the server or channel it originated from.
Messages will have the following key in S3: messages/<SHA256-MESSAGE-HASH>
The file contains the following message properties: All properties are from the time of grabbing the msg the first time.
{
"author": {
"id": "<Discord ID of the User>",
"name": "<Name of the User on that Server>",
"global_name": "<Global name of the user>"
},
"server": {
"id": "<Discord ID of the Server>",
"name": "<Name of the Server>"
},
"channel": {
"id": "<Discord ID of the Channel>",
"name": "<Name of the Channel>"
},
"category": "<Name of the Category>",
"parent": "<Name of the Channels parent (mainly for threads)>",
"content": "<Message text content>",
"created_at": "<Message creation time on Discord in ISO8601 format>",
"attachments": [
{
"type": "<Content type of the attachment (eg. image/jpg)",
"origin_name": "<Orginal filename>",
"content": "<Base64 encoded content of the file>"
}
]
}
Notes:
- Attachments are only filled when there are actual files attached to the message
- No backup taken time in the format to ensure a stable hash of the message
The manifest files contain the list a list of all messages per server and channel (threads are considered separate channels). They are simple CSV files with the following format per row:
- Discord created at time in ISO8601 format
- SHA256 hash of the message
- ED25519 signature of the message hash
The filename format is <SHA256-hash-of-Server-ID>-<SHA256-hash-of-Channel-ID>.manifest
Manifests will have the following key in S3: manifests/<SHA256-hash-of-Server-ID>-<SHA256-hash-of-Channel-ID>
Each manifest and directory will be signed by the bot after a backup run. This way any tamper with the manifest or directory can be detected by verifying the seal against the manifest/directory content. The seal file only contains the ED25519 signature against the hashed content of the manifest/directory.
The filename format is:
- for manifests:
<SHA256-hash-of-Server-ID>-<SHA256-hash-of-Channel-ID>.seal
- for directories:
<ISO8601-timestamp>.dirseal
The S3 key format is:
- Manifests seals:
manifests/seals/<SHA256-hash-of-Server-ID>-<SHA256-hash-of-Channel-ID>
- Directory seals:
directories/seals/<Run timestamp in ISO8601>
Contains the last discord message id, backed up in the channel. This is the only file that would contain a reference back to Discord.
The filename format is <SHA256-hash-of-Server-ID>-<SHA256-hash-of-Channel-ID>.loc
The S3 key format is: locations/<SHA256-hash-of-Server-ID>-<SHA256-hash-of-Channel-ID>
The Bot will create a directory file per run, which contains a JSON structure for all servers, channels and threads it had access to during that run.
The directory file is named <Run timestamp in ISO8601>.dir
and is GPG encrypted.
The S3 key format is: directories/<Run timestamp in ISO8601>
The reason to have a directory file per run, is because the bot is not able to decrypt any data it has encrypted using the PGP public keys provided. The directory has to be encrypted to not leak the server structure, channel and thread names.
Format of the directory file:
[
{
"server_id": "<Discord ID of the Server>",
"server_id_hashed": "<SHA256 hash of server_id>",
"server_name": "<Name of the Server>",
"channels": [
{
"channel_id": "<Discord ID of the Channel>",
"channel_id_hashed": "<SHA256 hash of channel_id>",
"channel_name": "<Name of the Channel>",
"threads": [
{
"thread_id": "<Discord ID of the thread>",
"thread_id_hashed": "<SHA256 hash of thread_id>",
"thread_name": "<Name of the thread>"
}
]
}
]
}
]
NOTE: This process is how to do it manually. There will be a restore tool in the future.
With the manifest it is possible to reconstruct the channel conversations based on date and time.
Reconstruction should follow these steps:
- Use a Directory File from around the time you want to restore data
- Decrypt the file with the PGP private key
- Find the server and channel in question and note down the hashed IDs. For threads note down the hashed thread id instead of the channel id
- Find the manifest for the channel/thread using the hashed ids
- Open the manifest and grab the message hashes and signatures
- Grab all message files in question
- Verify integrity of the message files
- run
sha256sum
and compare against with the filename - verify the signature by running a signature check against the ED25519 public key of the bot and the message hash
- run
- Decrypt the messages
- The seal mechanism re-opens the manifest file after a successful run against the Server/Channel combination. An external process could tamper with the manifest between finishing the backup and creating the seal. This risk is currently accepted
- Location files contain a discord reference to the last message in that channel
- atm @elasticroentgen thinks the attack vector is limited
- alternative would be to store a datetime range and a hash of the message id and first search for that