Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Machine-readable config description #17892

Open
wants to merge 30 commits into
base: develop
Choose a base branch
from
Open

Conversation

V02460
Copy link
Contributor

@V02460 V02460 commented Oct 31, 2024

Adds a JSON Schema description of the Synapse configuration to the repo, together with a script generating docs/usage/config_documentation.md from it.

This has many advantages, one can:

  • Validate Synapse configuration files against the schema.
  • Keep configuration and its documentation closer together, avoiding de-synchronization.
  • Keep a uniform documentation style.
  • Represent enforceable constraints between config values.
  • Choose to reduce the amount of hand-rolled config validation code.
  • Have IDE support like keyword suggestion and linting when editing a Synapse configuration.

I was surprised how aligned the expressiveness of the schema file is with the current infos from the Synapse documentation. Besides added types and defaults, some whitespace differences, and minor changes to wording and ordering, the config documentation remains mostly unchanged. But still, it is no perfect match, with the most important restrictions being:

  1. No native sections for config options (the script patches those in).
  2. No comments in examples. Might be handled as part of the description.
  3. No native JSON Schema keyword representing the “Added in Synapse <version>” annotations (added to the plain-text description).

I expect there will be some feedback to work into this MR before it can get merged. I’m especially thinking about CI integration, JSON Schema IDs, types and defaults, and discussions about workflow – as well as a good amount of fixes due to the size of the change. Looking forward to your comments :)

Example

Running the script

scripts-dev/gen_config_documentation.py schema/synapse-config.schema.json > docs/usage/configuration/config_documentation.md

on a schema containing the following fragment

"soft_file_limit": {
  "type": "integer",
  "description": "Set the soft limit on the number of file descriptors synapse can use. Zero is used to indicate synapse should set the soft limit to the hard limit.",
  "default": 0,
  "examples": [3]
}

yields the corresponding Synapse documentation as Markdown:

### `soft_file_limit`

*(integer)* Set the soft limit on the number of file descriptors synapse can
use. Zero is used to indicate synapse should set the soft limit to the hard
limit. Defaults to `0`.

Example configuration:
```yaml
soft_file_limit: 3
```

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct
    (run the linters)

@V02460 V02460 requested a review from a team as a code owner October 31, 2024 17:26
@CLAassistant
Copy link

CLAassistant commented Oct 31, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot deployed to PR Documentation Preview November 5, 2024 14:09 Active
@reivilibre reivilibre requested a review from a team November 11, 2024 14:21
@@ -0,0 +1 @@
Generate config documentation from JSON Schema file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this PR in a weekly meeting to make sure that this is something we want to do. People are onboard with the benefits 👍

We just have to make sure this implementation is something we want to use and we won't be kicking ourselves months down the line that it's difficult to express constraints, validation, or formatting.

@MadLittleMods MadLittleMods requested a review from a team November 11, 2024 16:06
@reivilibre reivilibre requested review from reivilibre and removed request for a team November 13, 2024 10:13
@github-actions github-actions bot deployed to PR Documentation Preview November 13, 2024 22:29 Active
@reivilibre
Copy link
Contributor

reivilibre commented Nov 13, 2024

I reviewed the diff of the config documentation and fixed up some accidental differences that I saw, as well as some other minor improvements (IMO).

Due to time constraints I have yet to deal with these issues:

  • suspicious defaults
    • public_baseurl
    • databases Defaults to {}.
    • session_lifetime
    • refresh_token_lifetime
    • nonrefreshable_access_token_lifetime
    • room_list_publication_rules (should be "*"?)
    • alias_creation_rules (should be "*"?)

Due to a problem with my diff viewer and long lines (???), some descriptions were cut off whilst I was reading them and I have not yet re-reviewed those fragments:

  • remote_media_download_burst_count
  • remote_media_download_per_second
  • media_retention
  • For modern Android devices the notification con...
  • Ensure the main process and all pusher workers are r...
  • and it should be listed...
  • This option defaults to off, enable it by provid...

I thought the section on 'resources' in the listeners section looked suspicious but did not thoroughly re-review yet.


Questions (aimed at the team):

  • should we convert the JSONSchema document itself to YAML?
    That would allow writing comments as well as multiline strings.
    Currently the markdown strings inside the JSON document are a little bit awkward to edit, since they have to be single-line.

Thoughts:

  • string|null might look nicer as string or null.
  • I'd like to try avoiding throwing too much JSON in the reader's face:
    • 'Defaults to []' might look nicer as 'Defaults to an empty list'
    • 'Defaults to {}' might have a nicer alternative?
    • Maybe we should follow what we used to do and use the word 'none' instead of 'null'...?
    • 'list' instead of 'array'?
    • maybe there's a word for 'object' that would suit better
  • In url_preview_url_blacklist and some others, it would probably be nice to preserve the comments in the example. We can probably add some io.element.yaml_example field to the schema with verbatim YAML for such cases, which should be fairly rare.

Things missing from this PR that I think are needed:

  • some options are missing descriptions and this shows up as MISSING DESCRIPTION. We should add descriptions for those.
  • The config documentation generation should be checked in CI — to make sure the schema is valid and that the doc doesn't fall out of sync by accidental manual edits
  • missing extra docs on enable_authenticated_media (due to a concurrent PR)

Things missing from this PR that can probably wait:

  • Synapse should enforce the JSONSchema validations on startup (both because this would be useful and because this will help ensure they don't fall out of sync)

Risks:

  • if the schema is wrong and we start validating it in Synapse, we might break some servers' ability to start up

If we accepted this PR now but for some reason didn't like it later:

  • We could regenerate the config documentation, then remove the schema and generator. We'd be back in the same place as now, before accepting the PR.

@anoadragon453
Copy link
Member

anoadragon453 commented Nov 14, 2024

Questions (aimed at the team):

should we convert the JSONSchema document itself to YAML?
That would allow writing comments as well as multiline strings.
Currently the markdown strings inside the JSON document are a little bit awkward to edit, since they have to be single-line.

(without yet looking too deeply into the implementation) those benefits definitely sound worth it, if it's not painful to do.

@V02460
Copy link
Contributor Author

V02460 commented Nov 14, 2024

Nice to see you involved, I’m really glad you’re going with this MR! :)

  • suspicious defaults
    • public_baseurl

One important lesson I learned since starting with schemas is: behavior should be the same whether an option is unset or null. This is especially true if JSON documents are combined into a single document with one or more of those slightly weird and slightly different merge algorithms (looking at you, Helm). public_baseurl has a default that depends on another setting; to get this computed value, one should equally be able to omit the key completely or alternatively set it to null.

  • databases Defaults to {}.

Right, I think null is better here as it more clearly expresses mutual exclusiveness with database.

  • session_lifetime
  • refresh_token_lifetime
  • nonrefreshable_access_token_lifetime

Oups, I got confused with float inf values. "infinity" would be semantically fitting here, though.

  • room_list_publication_rules (should be "*"?)
  • alias_creation_rules (should be "*"?)

Change it like this?

{
  "type": ["array", "string"],
  "oneOf": [
    {
      "type": ["array"]
    },
    {
      "type": ["string"],
      "const": "*"
    }
  ],
  "default": "*"
}

should we convert the JSONSchema document itself to YAML?

I don’t see a strong reason against that. Depending on the tool, one might have to convert it back to JSON first.

string|null might look nicer as string or null.

The null type might be special-cased as an adjective like optional (“optional string”) to be closer to natural language, if that is what you want. I think a good word would apply to an unset as well as a null property.

'Defaults to {}' might have a nicer alternative?
maybe there's a word for 'object' that would suit better

“Defaults to an empty map” might be something you are more comfortable with.

(without yet looking too deeply into the implementation) those benefits definitely sound worth it, if it's not painful to do.

The conversion is pretty easy to do.

Regarding the io.element.* keywords: Is this best practice for extending the schema vocabulary? Does the JSON Schema spec say something about this? I know Kubernetes’ OpenAPI uses the x- prefix for custom keywords.

@V02460
Copy link
Contributor Author

V02460 commented Nov 14, 2024

The config documentation generation should be checked in CI — to make sure the schema is valid and that the doc doesn't fall out of sync by accidental manual edits

I’m not too experienced with GitHub CI, but those two snippets should do the job.

# Assert schema is valid
boon schemas/synapse-config.schema.json

Running in a clean git workspace:

# Regenerate config
scripts-dev/gen_config_documentation.py schemas/synapse-config.schema.json > docs/usage/configuration/config_documentation.md
# Print all modifications
git diff
# Check if any files were modified.
! git status --porcelain=1 | grep "^ M"

@reivilibre
Copy link
Contributor

Regarding the io.element.* keywords: Is this best practice for extending the schema vocabulary? Does the JSON Schema spec say something about this? I know Kubernetes’ OpenAPI uses the x- prefix for custom keywords.

Admittedly I didn't look! It's what we're used to in Matrix's JSON I guess (and reverse domain name is probably a Java influence). If the JSONSchema spec says anything about this, let's follow that, otherwise I think this convention is probably most familiar to people working on Synapse and makes it easy to distinguish from some standardised field.

@V02460
Copy link
Contributor Author

V02460 commented Nov 26, 2024

I looked into how to properly handle the meta stuff of JSON Schemas.

Because of the custom io.element.* keywords, a custom vocabulary must be defined – I went with the name https://schema.element.io/v1/vocab/documentation. To use this vocabulary, a custom meta schema is required, which defines the syntax (and in this case also the semantics) of the vocabulary. This meta schema is meta.schema.json, which is a small extension to https://json-schema.org/draft/2020-12/schema. I gave this custom meta schema the $id https://schema.element.io/v1/meta.schema.json. For versioning I used v1 in the path as I expect those files to seldom change.

Because the synapse-config.schema.json is expected to change most Synapse releases, it should be versioned by those release versions. For its $id, I went with https://schema.element.io/synapse/v1.120.0/synapse-config.schema.json. For the actual Synapse schema in GIT it could make sense to use some unversioned $id like https://schema.element.io/synapse/latest/synapse-config.schema.json and only generate the versioned ones on release.

It would be great if the Element team could actually provide these files at the given URLs, for this allows tools to directly look up and work with the relevant schemas. A suggestion for organizing the files would be the following structure (arrow denotes a redirection):

https://schema.element.io
├── v1
│   ├── meta.schema.json
│   └── vocab/documentation → /v1/meta.schema.json
└── synapse
    ├── v1.120.0/synapse-config.schema.json
    ├── v1.121.0/synapse-config.schema.json
    ├── v1.121.1/synapse-config.schema.json
    └── …

Now feel free to bikeshed 😉

@github-actions github-actions bot deployed to PR Documentation Preview November 26, 2024 14:17 Active
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants