OT Editor - Documentation #340

Open
wants to merge 5 commits into
base: main
163 changes: 163 additions & 0 deletions backend/editor/OT/README.md
@@ -0,0 +1,163 @@
# The Concurrent Editor
As a high level overview, the concurrent editor is an OT-based editor that's heavily influenced by Google's Wave algorithm. Wave was Google's predecessor to Google Docs and heavily inspired the architecture of Google Docs. Our editor uses the same underlying architecture as Wave; the difference lies in the type of operations we send between the server and client. Wave was designed around operations that modify unstructured data, while our operations modify structured JSON data.
Review comment (Contributor):
"As a high level overview though, the concurrent editor"


Before reading the rest of this document I highly recommend you read through the resources linked in the base [README.md](../../README.md).

## High Level Architecture
At a very high level our editor server consists of 3 distinct layers.
- **The data layer**
  - This layer is the server's copy of the current state of the document, and it is what all incoming operations modify. Our initial design for the data layer involved a single struct modelling the entire document that we modified using reflection. This proved tricky due to the intricacies of Go's reflection system, so we moved to an AST-based approach. Currently the data layer is just the AST for the JSON of the document, and operations modify this AST directly. To prevent corrupted documents we have various data integrity checks utilising reflection.
Review comment (Contributor):
is it necessary to mention reflection (i.e. is it still in the codebase)? if you really want to include it, it could be in some other document/section, but not here if the old approach is completely removed

- **The client layer**
  - The client layer is an internal representation of an active client connection. Whenever a client connects to our server, a new client object is allocated to represent their connection.
- **The document server layer**
  - The server layer is a big object that models a single document being edited. It maintains a list of all active client objects connected to it and the current state of its AST.

![Editor Architecture](./docs/editor_arch.png)

## Starting a Document
I personally find it much easier to understand a system once you understand how it achieves its key features. This section covers exactly what happens when a user clicks the "edit" button on a document and starts an edit session. What type of objects are created? Where do they live? Etc.

### The connection starts
When the user clicks the "edit" button, this initiates an HTTP request that's handled by the HTTP handler in `main.go`: `func EditEndpoint(w http.ResponseWriter, r *http.Request)`. This handler takes the incoming request, looks up the requested document and, if it exists, upgrades the connection to a WebSocket connection. This is important as a WebSocket connection allows for bidirectional communication between the client and server in real time without either side needing to poll for updates.
Review comment (Contributor):
full stop at the end


After upgrading the connection to a WebSocket connection, the handler asks the `DocumentServerFactory` to either create or fetch the object modelling an active edit session for the requested document. If the document does not already have an active edit session, the `DocumentServerFactory` reads the document from disk, parses it (converts it from a text file to a Go struct) and constructs a `DocumentServer`. This `DocumentServer` is responsible for managing the current state of the document, tracking the clients editing the document, and keeping the history of operations.
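To make the "create or fetch" behaviour concrete, here is a minimal sketch of how such a factory could work. It is not the project's actual `DocumentServerFactory`: the type names, the string document ID and the `loads` counter are all assumptions made for illustration, and the step that reads the document from disk is replaced by a comment.

```go
package main

import (
	"fmt"
	"sync"
)

// documentServer is a stand-in for the real DocumentServer; only the
// document ID is tracked here.
type documentServer struct {
	docID string
}

// serverFactory hands out one documentServer per document ID.
// All names here are hypothetical, not the project's actual API.
type serverFactory struct {
	activeServersLock sync.Mutex
	activeServers     map[string]*documentServer
	loads             int // counts how often a document was "read from disk"
}

func newServerFactory() *serverFactory {
	return &serverFactory{activeServers: make(map[string]*documentServer)}
}

// fetchDocumentServer returns the active server for docID, constructing
// it on first use.
func (f *serverFactory) fetchDocumentServer(docID string) *documentServer {
	f.activeServersLock.Lock()
	defer f.activeServersLock.Unlock()

	if server, ok := f.activeServers[docID]; ok {
		return server
	}

	// The real factory would read and parse the document from disk here.
	f.loads++
	server := &documentServer{docID: docID}
	f.activeServers[docID] = server
	return server
}

func main() {
	factory := newServerFactory()
	first := factory.fetchDocumentServer("doc-1")
	second := factory.fetchDocumentServer("doc-1")
	fmt.Println(first == second, factory.loads)
}
```

Holding the lock for the whole lookup-and-create step means two clients opening the same document at the same time end up sharing one server rather than each constructing their own.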

After the `DocumentServer` is created / fetched, a client object is allocated and registered with the document server and a new goroutine is spun up to handle incoming operations from the client. The handler code is as follows (note it is subject to change):
```go
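func EditEndpoint(w http.ResponseWriter, r *http.Request) {
	// Elided: parse the requested document ID out of the request into
	// requestedDocument and upgrade the connection to a WebSocket (ws).
	targetServer := GetDocumentServerFactoryInstance().FetchDocumentServer(requestedDocument)

	wsClient := newClient(ws)
	commPipe, signalDeparturePipe := targetServer.connectClient(wsClient)

	// handle incoming operations from this client on its own goroutine
	go wsClient.run(commPipe, signalDeparturePipe)
}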
```
During the connection process the document server returns two "pipes". A "pipe" is a function that the client (or at least the client's shadow on the server) can use to propagate messages to the server it is connected to.
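If the idea of a "pipe" being a function feels odd, the sketch below shows the general pattern: the server hands back closures that remember which client they belong to. Everything in it (the `toyServer` type, string operations, the `active` map) is a simplification invented for this example, and it deliberately leaves out the goroutines and locking the real server needs.

```go
package main

import "fmt"

// pipe forwards an operation to the server; alertLeaving tells the
// server the client has gone. Operations are plain strings in this sketch.
type (
	pipe         func(op string)
	alertLeaving func()
)

// toyServer is a hypothetical stand-in for the document server.
type toyServer struct {
	nextID   int
	received []string
	active   map[int]bool
}

// connectClient registers a client and returns two closures that
// capture the server and the client's ID.
func (s *toyServer) connectClient() (pipe, alertLeaving) {
	id := s.nextID
	s.nextID++
	s.active[id] = true

	send := func(op string) {
		s.received = append(s.received, fmt.Sprintf("client %d: %s", id, op))
	}
	leave := func() {
		delete(s.active, id)
	}
	return send, leave
}

func main() {
	server := &toyServer{active: make(map[int]bool)}
	commPipe, signalDeparturePipe := server.connectClient()

	commPipe("insert 'a'")
	signalDeparturePipe()

	fmt.Println(server.received, len(server.active))
}
```

The client never needs a reference to the server itself; it only holds the two functions, which keeps the client code unaware of the server's internals.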

### The client applies an operation
So the client has just applied an operation to their local document, and the frontend has captured this and sent it to the server via WebSockets. Now what? 😳 If you remember the previous code snippet, the last line spun up a goroutine to run the `wsClient.run` function. This function is an infinite loop that constantly reads from the WebSocket and forwards operations to the document server. At the time of writing, the `wsClient.run` function looks something like:
```go
func (c *clientView) run(serverPipe pipe, signalDeparturePipe alertLeaving) {
	for {
		select {
		case <-c.sendOp:
			// push the operation down the websocket
			// send an acknowledgement
			break

		case <-c.sendAcknowledgement:
			// push the acknowledgement down the websocket
			break

		case <-c.sendTerminateSignal:
			// looks like we've been told to terminate by the documentServer
			// propagate this to the client and close this connection
			c.socket.Close()
			return

		default:
			if _, msg, err := c.socket.ReadMessage(); err == nil {
				// push the update to the documentServer
				if request, err := operations.ParseOperation(string(msg)); err == nil {
					serverPipe(request)
				}
			} else {
				// todo: push a terminate signal to the client, also tell the server we're leaving
				signalDeparturePipe()
				c.socket.Close()
			}
		}
	}
}
```
The bit of interest is what happens in the `default` branch of the select statement (we will talk about the other branches later). Within this branch we attempt to read a message from the WebSocket and parse it (we will cover parsing a little later as it's surprisingly complicated). We then use the `serverPipe` mentioned previously to send that request to the document server.

This `serverPipe` is a closure returned by the `buildClientPipe` method during the connection setup with the `DocumentServer`. The function is relatively intense so a lot of details have been left out here.
```go
func (s *documentServer) buildClientPipe(clientID int, workerWorkHandle chan func(), workerKillHandle chan empty) func(operations.Operation) {
	return func(op operations.Operation) {
		// this could also just be captured from the outer func
		clientState := s.clients[clientID]
		thisClient := clientState.clientView

		// ... skipped implementation details

		// spin up a goroutine to push this operation to the server
		// we do this in a goroutine to prevent deadlocking
		go func() {
			defer func() {
				clientState.canSendOps = true
				thisClient.sendAcknowledgement <- empty{}
			}()

			clientState.canSendOps = false

			// apply op to clientView states
			s.stateLock.Lock()

			// transform the incoming operation and log the transformed result
			transformedOperation := s.transformOperation(op)
			s.operationHistory = append(s.operationHistory, transformedOperation)

			// apply the transformed operation locally (note that this is being applied against the server's state)
			if !transformedOperation.IsNoOp {
				newState, err := transformedOperation.ApplyTo(s.state)
				if err != nil {
					log.Fatal(err)
					clientState.sendTerminateSignal <- empty{}
				} else {
					s.state = newState
				}
			}

			s.stateLock.Unlock()

			// propagate updates to all connected clients except this one
			// if we send it to this clientView then we may deadlock the server and clientView
			s.clientsLock.Lock()
			for id, connectedClient := range s.clients {
				if id == clientID {
					continue
				}

				// push update
				connectedClient.sendOp <- transformedOperation
			}
			s.clientsLock.Unlock()
		}()
	}
}
```
So whenever we get an operation from the client we:
1. communicate it to the document_server via a pipe
2. the operation is then transformed against the entire log of operations the server has applied
3. the operation is then applied to the server's representation of the document
4. the operation is then communicated to all other clients
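The four steps above can be sketched end to end with a toy operation type. This is only an illustration under stated assumptions: real operations target a JSON AST rather than a string, `insertOp` and `toyServer` are invented for the example, and transforming only against the operations the client has not yet seen (the `seen` parameter, loosely modelled on the `AcknowledgedServerOps` field) is a guess at how the log is used rather than a description of `transformOperation`.

```go
package main

import "fmt"

// insertOp inserts text at a byte offset. It stands in for the
// project's structured JSON operations, which this sketch does not model.
type insertOp struct {
	pos  int
	text string
}

// transform shifts op right when an already applied insert landed at
// or before its position.
func transform(op, applied insertOp) insertOp {
	if applied.pos <= op.pos {
		op.pos += len(applied.text)
	}
	return op
}

type toyServer struct {
	doc     string
	history []insertOp
	outbox  map[int][]insertOp // per-client queue of operations to send
}

// receive handles an operation from clientID (step 1). seen is how many
// server operations that client had applied when it produced op.
func (s *toyServer) receive(clientID int, op insertOp, seen int) {
	// 2. transform against the operations the client has not seen
	for _, applied := range s.history[seen:] {
		op = transform(op, applied)
	}
	s.history = append(s.history, op)

	// 3. apply to the server's copy of the document
	s.doc = s.doc[:op.pos] + op.text + s.doc[op.pos:]

	// 4. queue the transformed operation for every other client
	for id := range s.outbox {
		if id != clientID {
			s.outbox[id] = append(s.outbox[id], op)
		}
	}
}

func main() {
	s := &toyServer{doc: "ac", outbox: map[int][]insertOp{1: nil, 2: nil}}
	s.receive(1, insertOp{pos: 1, text: "b"}, 0) // client 1 saw "ac"
	s.receive(2, insertOp{pos: 2, text: "d"}, 0) // client 2 also saw "ac"
	fmt.Println(s.doc)
}
```

Client 2 meant to append to the end of "ac". Because client 1's insert reached the server first, the server shifts client 2's position from 2 to 3, so the intent is preserved in the final document.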

### The document server wants to propagate operations
Recall that the document server "propagates" operations to the clients. It does this by sending each operation down a channel maintained by each client.
Review comment (Contributor):
rewrite sentence, doesn't make sense

```go
type clientView struct {
	socket *websocket.Conn

	sendOp              chan operations.Operation
	sendAcknowledgement chan empty
	sendTerminateSignal chan empty
}
```
The `wsClient.run` function is constantly listening for messages on these channels (`sendOp`, `sendAcknowledgement` and `sendTerminateSignal`) and acts on them accordingly.
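As a rough sketch of the propagation step in isolation, the snippet below pushes an operation onto every client's channel except the sender's. The `clientView` here is a cut-down, hypothetical version with string operations, and the channels are buffered purely so the example can run without a listening goroutine; whether the real channels are buffered is not something this document states.

```go
package main

import "fmt"

// clientView is a simplified stand-in holding only the operation channel.
type clientView struct {
	sendOp chan string
}

// broadcast sends op to every connected client except the one that
// produced it.
func broadcast(clients map[int]*clientView, senderID int, op string) {
	for id, client := range clients {
		if id == senderID {
			continue
		}
		client.sendOp <- op
	}
}

func main() {
	clients := map[int]*clientView{
		1: {sendOp: make(chan string, 1)},
		2: {sendOp: make(chan string, 1)},
	}

	broadcast(clients, 1, "insert 'a'")

	// Number of queued operations for the sender and for the other client.
	fmt.Println(len(clients[1].sendOp), len(clients[2].sendOp))
}
```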

## Operation Parsing & Application
For information regarding how operation parsing and application works, see [this](./docs/operation_parsing.md) document.

## Lock Acquisition Order
To prevent deadlocks we have a defined lock acquisition order for each type. Thankfully there aren't many locks, as most synchronization is achieved using channels. The orders are as follows.

`document_server`
1. document state lock
2. client lock

`server factory`
1. active servers lock
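As an illustration of what following the order looks like in code, here is a small sketch. The struct and method names are made up for the example; the only thing taken from the list above is the rule that the document state lock is acquired before the client lock whenever both are needed.

```go
package main

import (
	"fmt"
	"sync"
)

// toyServer mirrors the two document_server locks listed above; the
// fields they guard are simplified for this sketch.
type toyServer struct {
	stateLock   sync.Mutex
	state       string
	clientsLock sync.Mutex
	clients     []string
}

// snapshot needs both locks, so it takes stateLock first and
// clientsLock second, matching the documented order.
func (s *toyServer) snapshot() (string, int) {
	s.stateLock.Lock()
	defer s.stateLock.Unlock()

	s.clientsLock.Lock()
	defer s.clientsLock.Unlock()

	return s.state, len(s.clients)
}

// addClient only needs the client lock, which is fine: the order only
// constrains code that holds more than one lock at a time.
func (s *toyServer) addClient(name string) {
	s.clientsLock.Lock()
	defer s.clientsLock.Unlock()
	s.clients = append(s.clients, name)
}

func main() {
	s := &toyServer{state: "{}"}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.addClient(fmt.Sprintf("client-%d", i))
			s.snapshot()
		}(i)
	}
	wg.Wait()

	state, count := s.snapshot()
	fmt.Println(state, count)
}
```

If some other method took the client lock first and then waited for the state lock, it could deadlock against `snapshot`, which is exactly what a fixed order rules out.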
4 changes: 2 additions & 2 deletions backend/editor/OT/client_view.go
@@ -33,7 +33,7 @@ func newClient(socket *websocket.Conn) *clientView {
 // them down the websocket, it also pulls stuff up the websocket
 // the documentServer will use the appropriate channels to communicate
 // updates to the client, namely: sendOp and sendAcknowledgement
-func (c *clientView) run(serverPipe pipe, terminatePipe alertLeaving) {
+func (c *clientView) run(serverPipe pipe, signalDeparturePipe alertLeaving) {
 	for {
 		select {
 		case <-c.sendOp:
@@ -59,7 +59,7 @@ func (c *clientView) run(serverPipe pipe, terminatePipe alertLeaving) {
 				}
 			} else {
 				// todo: push a terminate signal to the client, also tell the server we're leaving
-				terminatePipe()
+				signalDeparturePipe()
 				c.socket.Close()
 			}
 		}
Binary file added backend/editor/OT/docs/editor_arch.png
106 changes: 106 additions & 0 deletions backend/editor/OT/docs/operation_parsing.md
@@ -0,0 +1,106 @@
# Operation Parsing & Application

Surprisingly most of the complexity in the OT editor lives in the operation parsing and application logic. This document aims to help demystify some of it. As a quick aside, everything regarding operations can be found in the `operation` folder.

## The operation struct
Operations (in Go) are defined as a struct. The operations we receive from the client conform to this structure and are parsed using a JSON parsing library (more on that later as there is a bit of complexity behind this).
```go
type (
	// OperationModel defines a simple interface an operation must implement
	OperationModel interface {
		TransformAgainst(op OperationModel, applicationType EditType) (OperationModel, OperationModel)
		Apply(parentNode cmsjson.AstNode, applicationIndex int, applicationType EditType) (cmsjson.AstNode, error)
	}

	// Operation is the fundamental incoming type from the frontend
	Operation struct {
		Path                  []int
		OperationType         EditType
		AcknowledgedServerOps int

		IsNoOp    bool
		Operation OperationModel
	}
)

// EditType is an enum with `int` as the base type
type EditType int

const (
	Add EditType = iota
	Remove
)
```
The above code snippet uniquely defines an operation. Operations take a path to where in the document AST they are being applied (see the paper on tree based transform functions) and the actual operation being applied. The operation being applied (`OperationModel`) is an interface, which is why parsing operations is more complex than it seems. This interface defines two functions: one for transforming an operation against another operation and one for applying an operation to an AST.
Review comment (Contributor):
would be nice to link the papers in the docs themselves, saves anyone reading having to click back and forth


`OperationModel` is an interface because there are several distinct types of operations, one for each type of field we can edit. There are different operations for editing an `integer`, `array`, `boolean`, `object` and `string` field. Each of these defines its own transformation functions and application logic, so in order to maintain a clean abstraction we use an interface. As an example, the `string` operation type implements this interface [here](../operations/string_operation.go), alongside its string based transform functions.
Review comment (Contributor):
can get rid of the "it's a rather intense implementation" phrase



## Operation application
Recall that the `document_server` maintains an abstract syntax tree for the current JSON document. This AST is implemented by `cmsjson.AstNode`, and the document server maintains the **root node**. When applying an operation to a document we invoke the `Operation.ApplyTo` function:
```go
func (op Operation) ApplyTo(document cmsjson.AstNode) (cmsjson.AstNode, error) {
	parent, _, err := Traverse(document, op.Path)
	if err != nil {
		return nil, fmt.Errorf("failed to apply operation %v at target site: %w", op, err)
	}

	applicationIndex := op.Path[len(op.Path)-1]
	return op.Operation.Apply(parent, applicationIndex, op.OperationType)
}
```
This function traverses the document AST (as defined by the path) and applies the operation to the node pointed at by the path. The final application makes use of the `Apply` function within `OperationModel`.
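To give a feel for what path-based traversal involves, here is a sketch over a toy tree. The `node` type and the `traverse` signature are assumptions modelled on how `ApplyTo` calls `Traverse` (it gets a parent back and uses the last path element as the application index); the real function works on `cmsjson.AstNode` and may well treat edge cases differently, for example when adding at an index that does not exist yet.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a toy tree node standing in for cmsjson.AstNode.
type node struct {
	value    string
	children []*node
}

// traverse follows path from root and returns the parent of the target
// together with the target itself. The signature is a guess modelled
// on how ApplyTo uses Traverse, not the project's actual function.
func traverse(root *node, path []int) (parent, target *node, err error) {
	if len(path) == 0 {
		return nil, nil, errors.New("path must contain at least one index")
	}

	target = root
	for depth, index := range path {
		if index < 0 || index >= len(target.children) {
			return nil, nil, fmt.Errorf("index %d out of range at depth %d", index, depth)
		}
		parent = target
		target = target.children[index]
	}
	return parent, target, nil
}

func main() {
	leaf := &node{value: "leaf"}
	root := &node{value: "root", children: []*node{
		{value: "a"},
		{value: "b", children: []*node{leaf}},
	}}

	parent, target, err := traverse(root, []int{1, 0})
	if err != nil {
		fmt.Println("traversal failed:", err)
		return
	}
	fmt.Println(parent.value, target.value)
}
```

With the path `[1, 0]`, the walk goes to the second child of the root and then to that node's first child, so the parent is `b` and the application index would be `0`.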

## Operation Parsing
As pointed out earlier, parsing is rather tricky as our `Operation` struct contains an interface and the native JSON parsing library in Go does not support unmarshalling into interface fields. To get around this problem we wrote our own JSON unmarshaller based on the `goson` JSON parser; we call this unmarshaller `cmsjson`.

The `cmsjson` library expects a full list of all types that implement an interface we wish to parse into. This list of types is defined [here](../operations/json_config.go):
```go
var CmsJsonConf = cmsjson.Configuration{
	RegisteredTypes: map[reflect.Type]map[string]reflect.Type{
		/// ....

		// Type registrations for the OperationModel
		reflect.TypeOf((*OperationModel)(nil)).Elem(): {
			"integerOperation": reflect.TypeOf(IntegerOperation{}),
			"booleanOperation": reflect.TypeOf(BooleanOperation{}),
			"stringOperation":  reflect.TypeOf(StringOperation{}),

			"arrayOperation":  reflect.TypeOf(ArrayOperation{}),
			"objectOperation": reflect.TypeOf(ObjectOperation{}),
		},
	},
}
```
The configuration is in essence a mapping between the `reflect.Type` representation of the interface and the `reflect.Type` representation of every struct that "implements" it. "Implements" is in quotes as there is no way to statically verify this at compile time; instead, if any of these config options are invalid, a runtime error will be thrown during parsing. Usage of the `cmsjson` library is for the most part rather simple (thanks to generics in Go :D). An example can be found in the `ParseOperation` function within `operation_model.go`.
Review comment (Contributor):
can include a link to the code with the example


Internally the `cmsjson` library determines what struct to unmarshal into based on a `$type` attribute in a JSON object. Examples can be found in the test suite for the `cmsjson` library [here](../../../pkg/cmsjson/cmsjson_test.go).
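The general idea of dispatching on a `$type` attribute can be sketched with nothing but the standard library. To be clear, this is not how `cmsjson` is implemented: it is a minimal illustration using `encoding/json` and `reflect`, and the `operationModel` interface, the two operation structs and their `newValue` field are all invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// operationModel and the two structs below are hypothetical, simplified
// versions of the types registered in the configuration.
type operationModel interface {
	describe() string
}

type integerOperation struct {
	NewValue int `json:"newValue"`
}

type stringOperation struct {
	NewValue string `json:"newValue"`
}

func (o integerOperation) describe() string { return fmt.Sprintf("integer -> %d", o.NewValue) }
func (o stringOperation) describe() string  { return fmt.Sprintf("string -> %q", o.NewValue) }

// registeredTypes maps a $type tag to the concrete struct to build.
var registeredTypes = map[string]reflect.Type{
	"integerOperation": reflect.TypeOf(integerOperation{}),
	"stringOperation":  reflect.TypeOf(stringOperation{}),
}

// parseOperationModel reads the $type tag, allocates the matching
// struct with reflection and unmarshals the payload into it.
func parseOperationModel(data []byte) (operationModel, error) {
	var tag struct {
		Type string `json:"$type"`
	}
	if err := json.Unmarshal(data, &tag); err != nil {
		return nil, err
	}

	concrete, ok := registeredTypes[tag.Type]
	if !ok {
		return nil, fmt.Errorf("unregistered $type %q", tag.Type)
	}

	target := reflect.New(concrete) // pointer to a zero value of the struct
	if err := json.Unmarshal(data, target.Interface()); err != nil {
		return nil, err
	}

	// The assertion fails at runtime if a registered type does not
	// implement the interface.
	model, ok := target.Elem().Interface().(operationModel)
	if !ok {
		return nil, fmt.Errorf("%s does not implement operationModel", concrete)
	}
	return model, nil
}

func main() {
	model, err := parseOperationModel([]byte(`{"$type": "stringOperation", "newValue": "hi"}`))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(model.describe())
}
```

Note how the type assertion at the end is the only point where a bad registration is detected, which matches the earlier remark that invalid configuration shows up as a runtime error rather than a compile error.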

### More on `cmsjson`
This should ideally be under the `cmsjson` documentation, but the library not only handles JSON marshalling/unmarshalling but also exposes methods for constructing ASTs from a specific JSON document. This is used rather extensively by the `object_operation.go` object model to convert a document component to an AST. Once again, this is further documented within the `cmsjson` package. For the most part, the package gives us the following interface for interacting with ASTs.
```go
// jsonNode is the internal implementation of AstNode, *jsonNode @implements AstNode
// AstNode is a simple interface that represents a node in our JSON AST, we have a few important constraints that should be enforced by any implementation of the AstNode, those constraints are:
//   - An ASTNode is either a: JsonPrimitive, JsonObject or a JsonArray
//   - GetKey can return nil indicating that it is JUST a value
//   - Since a node can be either a JsonPrimitive, JsonObject or a JsonArray:
//     - 2 of the three functions: JsonPrimitive(), JsonObject(), JsonArray() will return nil (indicating the node is not of that type) while one will return an actual value
//     - We are guaranteed that one of these functions will return a value
//   - All implementations of AstNode must conform to this specification (there is no way within the Go type system to enforce this unfortunately :( )
//   - Note that the reflect.Type returned by JsonArray is the type of the array, ie if it was an array of integers then the reflect.Type is an integer
//   - Note that jsonNode implements AstNode (indirectly), AstNode is of the form:
AstNode interface {
	GetKey() string

	JsonPrimitive() (interface{}, reflect.Type)
	JsonObject() ([]AstNode, reflect.Type)
	JsonArray() ([]AstNode, reflect.Type)

	// Update functions, if the underlying type does not match then an error is thrown
	// ie if you perform an "UpdatePrimitive" on a JSONObject node
	UpdateOrAddPrimitiveElement(AstNode) error
	UpdateOrAddArrayElement(int, AstNode) error
	UpdateOrAddObjectElement(int, AstNode) error

	RemoveArrayElement(int) error
}
```
If you look carefully, you can see how we attempted to emulate sum types using interfaces 😛.
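For anyone unfamiliar with the trick, here is a small sketch of emulating a sum type with an interface. It is a cut-down, hypothetical version of the idea: it only has two variants, and it signals "not this variant" with an `ok` boolean instead of the nil returns that `AstNode` uses.

```go
package main

import "fmt"

// astNode is a cut-down, hypothetical version of the AstNode interface:
// exactly one accessor reports ok == true for any given node.
type astNode interface {
	primitive() (value interface{}, ok bool)
	array() (elements []astNode, ok bool)
}

type primitiveNode struct{ value interface{} }
type arrayNode struct{ elements []astNode }

func (n primitiveNode) primitive() (interface{}, bool) { return n.value, true }
func (n primitiveNode) array() ([]astNode, bool)       { return nil, false }

func (n arrayNode) primitive() (interface{}, bool) { return nil, false }
func (n arrayNode) array() ([]astNode, bool)       { return n.elements, true }

// countPrimitives "pattern matches" on the node by probing each accessor.
func countPrimitives(n astNode) int {
	if _, ok := n.primitive(); ok {
		return 1
	}

	total := 0
	if elements, ok := n.array(); ok {
		for _, child := range elements {
			total += countPrimitives(child)
		}
	}
	return total
}

func main() {
	tree := arrayNode{elements: []astNode{
		primitiveNode{value: 1},
		arrayNode{elements: []astNode{primitiveNode{value: "two"}}},
	}}
	fmt.Println(countPrimitives(tree))
}
```

As with `AstNode`, nothing in the type system stops an implementation from answering "yes" to both accessors or to neither; the one-variant-at-a-time rule is a convention that implementations have to honour.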
4 changes: 2 additions & 2 deletions backend/editor/OT/main.go
@@ -33,7 +33,7 @@ func EditEndpoint(w http.ResponseWriter, r *http.Request) {

 	wsClient := newClient(ws)
 	targetServer := GetDocumentServerFactoryInstance().FetchDocumentServer(uuid.MustParse(requestedDocument[0]))
-	commPipe, terminatePipe := targetServer.connectClient(wsClient)
+	commPipe, signalDeparturePipe := targetServer.connectClient(wsClient)

-	go wsClient.run(commPipe, terminatePipe)
+	go wsClient.run(commPipe, signalDeparturePipe)
 }