Creating Docs #372

Open · wants to merge 1 commit into base: main
14 changes: 14 additions & 0 deletions backend/README.md
@@ -20,6 +20,20 @@ The backend folder contains all our backend code 🤯. There's a few important fo
- WIP TypeScript implementation of client server for operational transform


## Architectural Overview
At a very high level, our CMS looks as follows:
![CMS arch](./docs/CMS%20high%20level%20architecture.png)
There are a few core parts to the CMS:
- Handlers (within the `endpoints` directory)
- Repository Layer (within the `database` directory)
- Concurrent Editor (within the `editor` directory)

Each of these sections has its own documentation in its respective directory, so this section focuses on the high-level aspects of the CMS.

At a very high level, the CMS allows users to easily create, manage, and collaborate on static web content. The goal is to allow for easy extension and maintenance of the CSESoc Website, as well as any other websites CSESoc produces. The CMS identifies different websites as "frontends": each frontend gets its own unique file tree that members of that frontend can manage, and this file tree is stored within the `Postgres DB`.

Users of the CMS work with the idea of "documents": unique pages that they can create and edit. There are two different types of documents, `published` and `unpublished`. Published documents are visible to any unauthenticated user, regardless of whether they are a member of the frontend group, while `unpublished` documents are only visible to editors. Each of these document types is stored in its own content volume. We duplicate these documents because we may have a version A of a document that we wish to be public while still editing a version A' that should be private. We also store other types of content, such as images and videos, but such content does not receive a published/unpublished distinction.
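
The published/unpublished split described above can be sketched as a small routing decision: each document version is read from the volume matching its publication state. This is a hypothetical illustration, not the real repository code; `VolumeRepo` here is a stand-in for the actual volume repository interfaces.

```go
package main

import "fmt"

// VolumeRepo is a hypothetical stand-in for the real content-volume
// repository interfaces (published and unpublished volumes).
type VolumeRepo struct{ name string }

// volumeFor returns the content volume a document version should be
// read from, based on its publication state.
func volumeFor(published bool, pub, unpub VolumeRepo) VolumeRepo {
	if published {
		return pub // visible to any unauthenticated user
	}
	return unpub // visible to frontend editors only
}

func main() {
	pub := VolumeRepo{"published-volume"}
	unpub := VolumeRepo{"unpublished-volume"}
	// A still-private draft (version A') is served from the unpublished volume:
	fmt.Println(volumeFor(false, pub, unpub).name)
}
```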

## Papers worth Reading
Most of the complexity of the CMS backend is within the editor. To help with your tickets, we have accumulated a few great resources that are worth a read.
- [A survey of OT algorithms](https://www.researchgate.net/profile/Ajay-Khunteta-2/publication/45183356_A_Survey_on_Operational_Transformation_Algorithms_Challenges_Issues_and_Achievements/links/5b9b27dca6fdccd3cb533171/A-Survey-on-Operational-Transformation-Algorithms-Challenges-Issues-and-Achievements.pdf?origin=publication_detail)
Binary file added backend/docs/CMS high level architecture.png
105 changes: 105 additions & 0 deletions backend/endpoints/docs/Endpoints.md
@@ -0,0 +1,105 @@
# Endpoints

Endpoints in the CMS are structured rather differently from endpoints in regular Go applications: CMS endpoints are structured so that common details such as auth and session management are shared between endpoints. The goal was to allow future endpoints to be written in a manner that lets them focus PURELY on business logic and not on annoying details such as authentication, form parsing, and dependency acquisition.

## Creating an Endpoint
To gain a better understanding of how CMS endpoints work, we will create a quick endpoint. This endpoint will be a `GET` endpoint that takes a request body of the form:
```json
{
    "ping_count": 10,
    "pong_message": "tomato"
}
```
and will return a response of the form:
```json
{
    "response": "message"
}
```
where `"message"` is the pong message repeated `ping_count` times, i.e. the response to the request above would be:
```json
{
    "response": "tomatotomatotomatotomatotomatotomatotomatotomatotomatotomato"
}
```
Let's create a handler for this :). The first step is to define a type modelling the input to this HTTP handler:
```go
type RequestInput struct {
	PingCount   int    `schema:"ping_count"`
	PongMessage string `schema:"pong_message"`
}
```
The `schema` field tags specify which fields in the JSON object map to which fields in the Go struct definition. Alongside this input type we must also define an output type:
```go
type RequestResponse struct {
	Response string `schema:"response"`
}
```
Now that we have both our input and output types, we can define our handler. HTTP handlers take an input form alongside a "dependency factory" (more on this later) as arguments and return a specialised `handlerResponse` type.
```go
func RequestHandler(form RequestInput, df DependencyFactory) handlerResponse[RequestResponse] {

}
```
The framework automatically handles the deserialisation of the request input and the serialisation of our output, allowing us to focus entirely on business logic. All that's left for us to do now is define that logic:
```go
func RequestHandler(form RequestInput, df DependencyFactory) handlerResponse[RequestResponse] {
	response := strings.Builder{}

	for i := 0; i < form.PingCount; i++ {
		response.WriteString(form.PongMessage)
	}

	return handlerResponse[RequestResponse]{
		Status: http.StatusOK,
		Response: RequestResponse{
			Response: response.String(),
		},
	}
}
```
Note that when writing this handler we didn't need to worry about any of the usual plumbing we would deal with in Go: the inputs and outputs are "automagically" converted to/from JSON for us.

Now that the core handler logic has been created, we must register it and map it to an endpoint. Endpoints are registered within `registration.go`. There are 3 possible registration types we can apply to an endpoint:
- Regular
  - Regular handlers simply take the input form and a dependency factory as arguments. There is no authentication blocking access to these handlers.
- Authenticated
  - These handlers are like regular handlers except they require authentication to access.
- Raw
  - These are a special type of handler used when your business logic requires access to the raw `http.Request` and `http.ResponseWriter` values; one obvious example is any handler that upgrades HTTP requests to a websocket connection.

For our simple endpoint we will go with a regular handler. We can register it like so:
```go
mux.Handle("/ping", newHandler("GET", RequestHandler, /* isMultipart = */ false))
```
You may have noticed the odd `isMultipart` argument: this flag indicates whether the request accepts multipart values (i.e. images, videos, etc.). An example of such an endpoint is `/api/filesystem/upload-image`.

## Endpoint Configurations
As mentioned earlier, there are quite a few ways to configure and customise your endpoint, the main ones being regular, authenticated, and raw endpoints. There are, however, a few more options.
- Regular Handlers, Raw Handlers, Authenticated Handlers
  - Each of these handler types supports multipart requests, as discussed earlier.
- Authenticated / Unauthenticated raw handlers
  - As a leak in the abstraction provided by our endpoint framework, raw handlers require a boolean flag indicating whether they require authentication.
- Raw handlers / isWebsocket
  - This configuration indicates whether you intend to use the handler to upgrade HTTP connections to websocket connections; once again, this is a good example of a leaky abstraction that should be refactored out.
  - The reason we care is that once a connection is upgraded to a websocket, the framework must know that it can no longer write data to the corresponding request/response values.
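
To make the flag combinations concrete, here is an illustrative stub of the raw-handler configuration. The real `newRawHandler` lives in `registration.go` and also takes the handler function itself; the simplified signature and the flag order (`isMultipart`, `requiresAuth`, `isWebsocket`) shown here are assumptions for the sketch.

```go
package main

import "fmt"

// rawConfig is a hypothetical stand-in for the configuration a raw
// handler registration carries; the real type is internal to the framework.
type rawConfig struct {
	method                                 string
	isMultipart, requiresAuth, isWebsocket bool
}

// newRawHandler is a simplified stub: the real function also takes the
// handler function and returns an http.Handler.
func newRawHandler(method string, isMultipart, requiresAuth, isWebsocket bool) rawConfig {
	return rawConfig{method, isMultipart, requiresAuth, isWebsocket}
}

func main() {
	// An editor-style endpoint: no multipart, no auth, upgrades to a websocket,
	// so the framework knows it must stop writing to the response afterwards.
	editor := newRawHandler("GET", false, false, true)
	fmt.Println(editor.isWebsocket)
}
```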

## Dependencies & Inversion of Control
The way the framework deals with dependencies is by disallowing handlers from instantiating dependencies themselves; instead, handlers must fetch dependencies from the `DependencyFactory` that's passed to them as an argument. The `DependencyFactory` is merely a simple interface with the definition:
```go
type DependencyFactory interface {
	GetFilesystemRepo() repos.FilesystemRepository
	GetGroupsRepo() repos.GroupsRepository
	GetFrontendsRepo() repos.FrontendsRepository
	GetPersonsRepo() repos.PersonRepository

	GetUnpublishedVolumeRepo() repos.UnpublishedVolumeRepository
	GetPublishedVolumeRepo() repos.PublishedVolumeRepository

	GetLogger() *logger.Log
}
```
The interface exposes methods to acquire specific database repositories as well as utilities like loggers. The advantage of writing handlers this way is that it makes testing them really simple: we can employ simple mocking strategies to effectively unit test handlers. It also allows us to abstract over details such as which filesystem we may actually be looking at, so endpoint handlers can be written without concern for which filesystem/frontend we're targeting, as those details are abstracted away by the factory.
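
Because a handler only touches its dependencies through the factory interface, a unit test can stub the factory out entirely and call the handler as a plain function. Below is a minimal self-contained sketch of that idea: the types are simplified stand-ins for the real framework definitions (the real `DependencyFactory` has many more methods), and `RequestHandler` is the ping/pong example from earlier in this document.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Logger and DependencyFactory are trimmed-down stand-ins for the
// real framework types, kept small so the sketch stays self-contained.
type Logger struct{}

type DependencyFactory interface {
	GetLogger() *Logger
}

// handlerResponse mirrors the generic response wrapper used by CMS handlers.
type handlerResponse[V any] struct {
	Status   int
	Response V
}

type RequestInput struct {
	PingCount   int    `schema:"ping_count"`
	PongMessage string `schema:"pong_message"`
}

type RequestResponse struct {
	Response string `schema:"response"`
}

// RequestHandler repeats the pong message ping_count times.
func RequestHandler(form RequestInput, df DependencyFactory) handlerResponse[RequestResponse] {
	response := strings.Builder{}
	for i := 0; i < form.PingCount; i++ {
		response.WriteString(form.PongMessage)
	}
	return handlerResponse[RequestResponse]{
		Status:   http.StatusOK,
		Response: RequestResponse{Response: response.String()},
	}
}

// mockFactory satisfies DependencyFactory without touching a real database.
type mockFactory struct{}

func (mockFactory) GetLogger() *Logger { return &Logger{} }

func main() {
	// "Unit test": call the handler directly with the mocked factory.
	resp := RequestHandler(RequestInput{PingCount: 2, PongMessage: "tomato"}, mockFactory{})
	fmt.Println(resp.Status, resp.Response.Response)
}
```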
1 change: 1 addition & 0 deletions backend/endpoints/registration.go
@@ -26,6 +26,7 @@ func RegisterEditorEndpoints(mux *http.ServeMux) {
mux.Handle("/editor", newRawHandler("GET", EditHandler, false, false, true)) // auth
}

// TODO: think of a better abstraction to remove all these boolean flags
// newHandler is just a small wrapper around a handler that returns an instance of a handler struct
func newHandler[T, V any](formType string, handlerFunc func(T, DependencyFactory) handlerResponse[V], isMultipart bool) handler[T, V] {
return handler[T, V]{
24 changes: 24 additions & 0 deletions postgres/README.md
@@ -0,0 +1,24 @@
# Database

The database/postgres folder contains everything related to our schema and our in-house migration system. Before discussing the schema, let's talk about the migration mechanism.

## Migrations
In order to prevent old data from being deleted, we cannot completely destroy the DB container whenever we make changes to the DB; as such, we must introduce a migration mechanism. Migrations are small scripts containing patches we want to apply to the DB. When deploying changes, we create these migration scripts and tell the migration engine to run the new script.

### How to do a CMS Migration
If you're introducing a new DB change to the CMS, simply create your migration script in `up/version.sql`, then increase the `dbver.txt` number by 1; doing so will introduce your changes to the staging database. Internally, the migration engine just deletes and recreates the DB (this is fine for now, but in production situations we cannot do this). In the future we are looking to replace this with a proper migration engine like `sqitch`.

## Schema and Tables
The CMS is composed of a few key tables:
- The Frontend table
  - Maintains metadata regarding all frontend clients registered on the CMS (the best way to think of a frontend client is as a website created on the CMS).
- The Person table
  - User data (as in editors).
- The Groups table
  - Manages catch-all permissions that are applied to users; users can be members of certain groups and will inherit those groups' permissions.
  - There are more details regarding frontend-level groups, but that PR is still pending at the moment.
- The Filesystem table + Metadata table
  - Models the filesystem for every possible frontend; the associated metadata table contains information about each entity within the FS table.

We have produced a small diagram that will hopefully make the DB a little easier to understand. Do note that the diagram is not reflective of the current state of the DB; rather, it reflects what the DB will look like after a few key PRs are finalised.
![Diagram](./db_diagram.png)
Binary file added postgres/db_diagram.png