Skip to content

bcspragu/logseq-sync

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
Nov 25, 2023
Nov 30, 2023
Nov 30, 2023
Dec 12, 2023
Oct 30, 2023
Nov 30, 2023
Dec 12, 2023
Dec 12, 2023
Oct 21, 2023
Feb 3, 2024
Nov 30, 2023
Nov 30, 2023
Nov 30, 2023

Repository files navigation

Logseq Sync

An attempt at an open-source version of the Logseq Sync service, intended for individual, self-hosted use.

It's vaguely functional (see What Works? below), but decidedly pre-alpha software. Definitely don't try to point a real, populated Logseq client at it, I have no idea what will happen.

What's Done/Exists?

Right now, the repo contains (in cmd/server) a mostly implemented version of the Logseq API, including credentialed blob uploads, signed blob downloads, a SQLite database for persistence, and most of the API surface at least somewhat implemented.

Currently, running any of this requires a modified version of the Logseq codebase (here), and the @logseq/rsapi package (here)

On that note, many thanks to the Logseq Team for open-sourcing rsapi recently, it made this project significantly easier to work with.

What Works?

With a modified Logseq, you can use the local server to

  1. Create a graph
  2. Upload (passphrase-encrypted) encryption keys
  3. Get temporary AWS credentials to upload your encrypted files to your private S3 bucket
  4. Upload your encrypted files

And that's basically the full end-to-end flow! The big remaining things are:

  • Implement the WebSockets protocol
  • Figure out how/when to increment the transaction (tx) counter

API Documentation

There's some documentation for the API in docs/API.md. This is the area I could benefit the most from having more information/help on, see Contributing below

Open Questions

S3 API

The real Logseq Sync API gets temp S3 credentials and uploads files direct to S3. I haven't looked closely enough to see if we can swap this out for something S3-compatible like s3proxy or MinIO, see #2 for a bit more discussion.

Currently, amazonaws.com is hardcoded in the client, so that'll be part of a larger discussion on how to make all of this configurable in the long run.

Associated Changes to Logseq

Being able to connect to a self-hosted sync server requires some changes to Logseq as well, namely to specify where your sync server can be accessed. Those changes are in a rough, non-functional state here: https://github.com/logseq/logseq/compare/master...bcspragu:logseq:brandon/settings-hack

Adding a database migration

The self-hosted sync backend has rudimentary support for persistence in a SQLite database. We use sqlc to do Go codegen for SQL queries, and Atlas to manage generating diffs.

The process for changing the database schema looks like:

  1. Update db/sqlite/schema.sql with your desired changes
  2. Run ./scripts/add_migration.sh <name of migration> to generate the relevant migration
  3. Run ./scripts/apply_migrations.sh to apply the migrations to your SQLite database

Why do it this way?

With this workflow, the db/sqlite/migrations/ directory is more or less unused by both sqlc and the actual server program. The reason it's structured this way is to keep a more reviewable audit log of the changes to a database, which a single schema.sql doesn't give you.

Contributing

If you're interested in contributing, thanks! I sincerely appreciate it. There's a few main avenues for contributions:

Getting official buy-in from Logseq

The main blocker right now is getting buy-in from the Logseq team, as I don't want to do the work to add self-hosting settings to the Logseq codebase if they won't be accepted upstream. I've raised the question on the Logseq forums, as well as in a GitHub Discussion on the Logseq repo, but have received no official response.

Understanding/documenting the API

One area where I would love help is specifying the official API more accurately. My API docs are based on a dataset of one, my own account. So there are areas that are underspecified, unknown, or where I just don't understand the flow. Any help there would be great!

Specifically, I'd like to understand:

  1. The details of the WebSocket protocol (doc started here), and
  2. How and when to update the transaction counter, tx in the API

Debugging S3 signature issues

I believe there's a bug (filed upstream, initially here) in the s3-presign crate used by Logseq's rsapi component, which handles the actual sync protocol bits (encryption, key generation, S3 upload, etc).

The bug causes flaky uploads with self-hosted, AWS-backed (i.e. S3 + STS) servers, but I haven't had the time to investigate the exact root cause. The source code for the s3-presign crate is available here, the GitHub repo itself doesn't appear to be public.

About

An open-source Logseq Sync backend implementation

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published