chore: format all documents
Signed-off-by: peefy <[email protected]>
Peefy committed Dec 14, 2023
1 parent 7ffc2a6 commit 98165c5
Showing 11 changed files with 129 additions and 102 deletions.
47 changes: 24 additions & 23 deletions blog/2023-12-07-biweekly-newsletter/index.md
Thank you to all contributors for their outstanding work over the past two weeks!

**📦 Module Update**

The number of KCL models has increased to **240**, mainly including models related to Crossplane Provider and libraries related to JSON merging operations.
- KCL JSON Patch library: _[https://artifacthub.io/packages/kcl/kcl-module/jsonpatch](https://artifacthub.io/packages/kcl/kcl-module/jsonpatch)_
- KCL JSON Merge Patch library: _[https://artifacthub.io/packages/kcl/kcl-module/json_merge_patch](https://artifacthub.io/packages/kcl/kcl-module/json_merge_patch)_
- KCL Kubernetes Strategy Merge Patch library: _[https://artifacthub.io/packages/kcl/kcl-module/strategic_merge_patch](https://artifacthub.io/packages/kcl/kcl-module/strategic_merge_patch)_
- KCL Crossplane and Crossplane Provider series models: _[https://artifacthub.io/packages/search?org=kcl&sort=relevance&page=1&ts_query_web=crossplane](https://artifacthub.io/packages/search?org=kcl&sort=relevance&page=1&ts_query_web=crossplane)_

**🔧 Toolchain Update**

In addition to the existing Go and Python SDKs in KCL, a new Rust SDK has been added.

**📒 Documentation Updates**

- Added index cards for KCL system library documentation for easy navigation: _[https://kcl-lang.io/docs/reference/model/overview](https://kcl-lang.io/docs/reference/model/overview)_
- Updated KCL CLI reference documentation: _[https://kcl-lang.io/docs/tools/cli/kcl/overview](https://kcl-lang.io/docs/tools/cli/kcl/overview)_
- Updated KCL API reference documentation: _[https://kcl-lang.io/docs/reference/xlang-api/overview](https://kcl-lang.io/docs/reference/xlang-api/overview)_
- KCL 2023 & 2024 Roadmap document: _[https://kcl-lang.io/docs/community/release-policy/roadmap](https://kcl-lang.io/docs/community/release-policy/roadmap)_
- Supplemented project structure introduction and FAQ for Intellij KCL repository: _[https://github.com/kcl-lang/intellij-kcl/pull/18](https://github.com/kcl-lang/intellij-kcl/pull/18)_

## Special Thanks

```yaml
original:
  template:
    spec:
      containers:
        - name: my-container-1
          image: my-image-1
        - name: my-container-2
          image: my-image-2
patch:
  apiVersion: apps/v1
  kind: Deployment
  template:
    spec:
      containers:
        - name: my-container-1
          image: my-new-image-1
        - name: my-container-3
          image: my-image-3
got:
  apiVersion: apps/v1
  kind: Deployment
  template:
    spec:
      containers:
        - name: my-container-1
          image: my-new-image-1
        - name: my-container-2
          image: my-image-2
        - name: my-container-3
          image: my-image-3
```
As seen in the output, the labels, replicas, and container fields of the Deployment template have all been updated with the correct values. For more documentation and usage examples, please refer to [the document](https://artifacthub.io/packages/kcl/kcl-module/strategic_merge_patch).
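The list behavior above (overwrite by key, append the rest) can be sketched in a few lines. This is a toy model, not the library's implementation; the `Container` struct and `merge_containers` function are illustrative names, and real strategic merge patches key on the field declared by the API type (here, `name`):

```rust
/// A simplified container: just the two fields used in the example above.
#[derive(Debug, Clone, PartialEq)]
pub struct Container {
    pub name: String,
    pub image: String,
}

/// Merge `patch` into `original` the way a strategic merge patch treats a
/// list keyed by `name`: entries with a matching name are updated in place,
/// entries with a new name are appended, unmatched originals are kept.
pub fn merge_containers(original: &[Container], patch: &[Container]) -> Vec<Container> {
    let mut merged: Vec<Container> = original.to_vec();
    for p in patch {
        match merged.iter_mut().find(|c| c.name == p.name) {
            Some(existing) => existing.image = p.image.clone(), // key matched: overwrite
            None => merged.push(p.clone()),                     // new key: append
        }
    }
    merged
}
```

Running this on the `original` and `patch` containers from the YAML above reproduces the `got` list: `my-container-1` updated, `my-container-2` kept, `my-container-3` appended.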
74 changes: 48 additions & 26 deletions blog/2023-12-09-kcl-new-semantic-model/index.md
tags: [KCL, Semantic]
---

## What is the KCL semantic model?

![image.png](/img/blog/2023-12-09-kcl-new-semantic-model/01.png)

- "Semantic model" refers to the in-memory representation of the modules, functions, and types that appear in source code. This representation is fully "resolved": all expressions have types (note that some expression types cannot be deduced in KCL; these are given the **any** type), and all references are bound to their declarations.
- The client can submit a small amount of input data (typically changes to a single file) and get a new code model to explain the changes.
- The underlying engine ensures that the model is computed **lazily (on demand) and incrementally**, and can be updated quickly for small changes.
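The lazy-and-incremental idea can be illustrated with a toy query (the `LazyQuery` type and its line-count "analysis" are invented for this sketch; the real engine's API is different): nothing is computed when the input changes, and a query recomputes only if its input actually differs from the cached snapshot.

```rust
/// Toy on-demand query: derives the line count of an input text,
/// recomputing only when the input has changed since the last query.
pub struct LazyQuery {
    input: String,
    cached: Option<(String, usize)>, // (input snapshot, derived result)
    pub computations: usize,         // how many times we actually recomputed
}

impl LazyQuery {
    pub fn new(input: &str) -> Self {
        Self { input: input.to_string(), cached: None, computations: 0 }
    }

    /// Editing the input does no analysis work: evaluation is on demand.
    pub fn set_input(&mut self, input: &str) {
        self.input = input.to_string();
    }

    /// The "query": returns the cached result if the input is unchanged.
    pub fn line_count(&mut self) -> usize {
        if let Some((snap, result)) = &self.cached {
            if *snap == self.input {
                return *result; // cache still valid: no recomputation
            }
        }
        self.computations += 1;
        let result = self.input.lines().count();
        self.cached = Some((self.input.clone(), result));
        result
    }
}
```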

## Why does KCL need a new semantic model?

First, let's take a brief look at the design of the old semantic model: ![image.png](/img/blog/2023-12-09-kcl-new-semantic-model/02.png) The old semantic model can be regarded as a collection of a large number of scopes, where each scope stores its parent-child relationships with other scopes, together with the symbol strings and corresponding types it contains. This is enough to meet the compiler's requirements for type checking and code generation, but such a simple structure is not sufficient for an advanced toolchain such as an IDE. A few examples of typical IDE queries:

- What is the type of the AST node under the corresponding position?
- Where are all the references to the current AST node? (find reference)
- Which node is referenced by the current AST node? (find definition)
- All symbols accessible at the current position?

Using only the old semantic model, the IDE has to traverse the AST many times and perform repeated calculations. Analyzing the old semantic model's problems, we find that:

- It is difficult to query information from the old semantic model, which only stores the mapping from strings to symbols.
- The weak associations between symbols, and between symbols and scopes, often mean that all scopes must be traversed when querying related information.
- A large amount of intermediate information is discarded during analysis and not cached, so repeated queries redo the same work.

In short, the old semantic model cannot meet the query needs of an advanced toolchain, and a lot of information is missing. Moreover, the old semantic model does not support incremental compilation, which further degrades the user experience of the toolchain.

## Main Idea: Map Reduce

The idea of the Map Reduce architecture is to split the analysis into a relatively simple indexing phase and a separate full analysis phase.

The core constraint of indexing is that it runs on a per-file basis: the indexer takes the text of a single file, parses it, and emits some data about that file. The indexer cannot touch other files. Full analysis can read other files and use the information in the index to save effort.

This sounds too abstract, so let's look at a concrete example: Java. In Java, each file begins with a package declaration. The indexer concatenates the package name with the class name to get the fully qualified name (FQN). It also collects the set of methods declared in the class, the list of superclasses and interfaces, and so on.

The data for each file is merged into an index that maps fully qualified names to classes. The index is inexpensive to update: when a file modification request arrives, that file's contribution is removed from the index, the text of the file is changed, and the indexer runs on the new text and adds the new contribution. The amount of work is proportional to the number of files changed and is independent of the total number of files.
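The per-file contribution bookkeeping described above can be sketched as follows (a toy model of the Java example with invented names, not KCL's actual indexer): the index remembers which FQNs each file contributed, so re-indexing one file touches only that file's entries.

```rust
use std::collections::HashMap;

/// What the indexer extracts from one file: (package, class) pairs.
pub type FileStub = Vec<(String, String)>;

#[derive(Default)]
pub struct FqnIndex {
    /// fully qualified name -> file that declares it
    by_fqn: HashMap<String, String>,
    /// file -> FQNs it contributed, so its contribution can be removed cheaply
    by_file: HashMap<String, Vec<String>>,
}

impl FqnIndex {
    /// Re-index one file: drop its old contribution, add the new one.
    /// Work is proportional to that file's contents, not the project size.
    pub fn update_file(&mut self, file: &str, stub: &FileStub) {
        if let Some(old) = self.by_file.remove(file) {
            for fqn in old {
                self.by_fqn.remove(&fqn);
            }
        }
        let mut fqns = Vec::new();
        for (pkg, class) in stub {
            let fqn = format!("{pkg}.{class}");
            self.by_fqn.insert(fqn.clone(), file.to_string());
            fqns.push(fqn);
        }
        self.by_file.insert(file.to_string(), fqns);
    }

    pub fn lookup(&self, fqn: &str) -> Option<&String> {
        self.by_fqn.get(fqn)
    }
}
```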

Let's see how to use the FQN index to quickly provide completion.

One problem with the methods described so far is that parsing types from indexes...

To summarize, the working principle of the first method is as follows:

1. Each file is independently and parallelly indexed, generating a "stub" - a set of visible top-level declarations with unresolved types.
2. Merge all stubs into one indexed data structure.
3. Name resolution and type inference are mainly based on stubs.
- If the editor has not changed the stub of the file, there is no need to change the index.
- Otherwise, the old index will be deleted and the new index will be added again
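The "only re-index if the stub changed" check above can be sketched like this (the `Stub` and `StubCache` types are invented for illustration; the real compiler's types differ): we fingerprint each file's stub and skip the index update when an edit leaves the fingerprint unchanged.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy stub: the set of top-level declaration names visible from a file.
#[derive(Hash, PartialEq, Eq, Clone, Debug)]
pub struct Stub(pub Vec<String>);

fn fingerprint(stub: &Stub) -> u64 {
    let mut h = DefaultHasher::new();
    stub.hash(&mut h);
    h.finish()
}

#[derive(Default)]
pub struct StubCache {
    fingerprints: HashMap<String, u64>, // file -> fingerprint of its last stub
}

impl StubCache {
    /// Returns true if the index must be updated for `file`,
    /// i.e. the edit actually changed the file's stub.
    pub fn needs_reindex(&mut self, file: &str, new_stub: &Stub) -> bool {
        let fp = fingerprint(new_stub);
        match self.fingerprints.insert(file.to_string(), fp) {
            Some(old) if old == fp => false, // stub unchanged: keep the index as-is
            _ => true,                       // first time seen, or stub changed
        }
    }
}
```

Edits inside a function body never change the stub, so with this check the vast majority of keystrokes leave the index untouched.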

This method is simple and performs excellently. Most of the work happens in the indexing phase, and those tasks can be executed in parallel. Two examples of this architecture are [IntelliJ](https://www.jetbrains.com/idea/) and [Sorbet](https://sorbet.org/).

The main drawback of this method is that it only works where its premise holds: not every language has a clearly defined concept of FQN. In general, though, designing a language around modules and name resolution is a good idea, and KCL happens to meet this condition.

## New Semantic Model Pipeline

The overall pipeline of the new semantic model is as follows:
![image.png](/img/blog/2023-12-09-kcl-new-semantic-model/03.png)

![image.png](/img/blog/2023-12-09-kcl-new-semantic-model/04.png)

![image.png](/img/blog/2023-12-09-kcl-new-semantic-model/05.png)
Compared with the analysis process of the original semantic model, the new semantic model adds two passes, `namer` and `advanced_resolver`, enhancing support for the advanced toolchain without affecting the original compiler process.

- The `resolver` works at the file level: it initializes the `GlobalState`, parses source code into an AST, and establishes the mapping from AST nodes to types for later stages to use. Therefore, we can cache the output of a single file's index and skip parsing the file entirely when its content has not changed.
- The early stage of the `namer` also works at the file level: it collects the global symbols defined in each file, then merges the symbols based on FQN to obtain a unique `GlobalState`.
- Because the first two stages work at the file level, incremental compilation is easy to perform in them.
- The `advanced_resolver` traverses the AST to resolve local symbols and point symbol references to their definitions, while setting the owner symbol for local scopes such as `Schema` and `Package`.

## Semantic Database: GlobalState

The core structure of the new semantic model is `core::GlobalState`, which the toolchain mainly uses to interact with and query the compiler.

```rust
/// GlobalState is used to store semantic information of KCL source code
#[derive(Default, Debug, Clone)]
pub struct GlobalState {
    // ...fields collapsed in the diff...
    pub(crate) sema_db: SemanticDB,
}
```

`GlobalState` is used as a semantic database for the new semantic model and is the final product of semantic analysis, mainly containing four aspects of information:

- `SymbolData`: stores all symbols in the AST and their corresponding semantic information, and maintains the reference relationships between them.
- `ScopeData`: stores all scopes involved in the AST, keeping symbols separate while maintaining symbol visibility and scope nesting relationships.
- `PackageDB`: stores package information, such as the set of files in a package, import information, and so on.
- `SemanticDB`: stores auxiliary information to speed up queries, such as symbol sorting and position caching.

### SymbolData

`SymbolData` is responsible for allocating symbols and for storing the allocated symbols and their related semantic information. Here we borrow the arena design common in Rust (via `generational_arena`) to access the relevant symbols.

```rust
#[derive(Default, Debug, Clone)]
pub struct KCLSymbolData {
    // ...symbol arenas collapsed in the diff...
    pub(crate) symbols_info: SymbolDB,
}
```

In the new semantic model, we use `core::SymbolRef` to represent a reference to a symbol, and also use `SymbolRef` to access `SymbolData` for the specific symbol information.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct SymbolRef {
pub(crate) id: generational_arena::Index,
pub(crate) kind: SymbolKind,
}
```

Specifically, symbols of different kinds are fetched from `SymbolData` according to `SymbolRef::kind` and returned as the abstract `Symbol` trait object.

```rust
pub type KCLSymbol =
    dyn Symbol<SymbolData = KCLSymbolData, SemanticInfo = KCLSymbolSemanticInfo>;

pub fn get_symbol(&self, id: SymbolRef) -> Option<&KCLSymbol> {
    match id.get_kind() {
        // ...one arm per SymbolKind, collapsed in the diff...
    }
}
```

```rust
pub trait Symbol {
    type SymbolData;
    // ...other items collapsed in the diff...

    fn full_dump(&self, data: &Self::SymbolData) -> Option<String>;
}
```

Through this trait, the toolchain can easily query symbol semantic information and reference relationships.
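As a heavily simplified sketch of how such queries compose (toy types, not the compiler's real API: a plain `Vec` serves as the arena and a `usize` index plays the role of `SymbolRef`), "find definition" is just following a symbol's definition link back through the arena:

```rust
pub trait Symbol {
    fn get_name(&self) -> &str;
    /// For a reference symbol, the arena index of its definition.
    fn get_definition(&self) -> Option<usize>;
}

pub struct ValueSymbol {
    pub name: String,
    pub def: Option<usize>, // None means this symbol *is* a definition
}

impl Symbol for ValueSymbol {
    fn get_name(&self) -> &str {
        &self.name
    }
    fn get_definition(&self) -> Option<usize> {
        self.def
    }
}

pub struct SymbolData {
    pub symbols: Vec<Box<dyn Symbol>>,
}

impl SymbolData {
    /// "Find definition": follow a reference symbol back to its definition's name.
    pub fn find_definition_name(&self, id: usize) -> Option<&str> {
        let def = self.symbols.get(id)?.get_definition()?;
        Some(self.symbols.get(def)?.get_name())
    }
}
```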

### ScopeData

The design of `ScopeData` is similar to that of `SymbolData`: it stores `Scope`s of different kinds and uses `ScopeRef` to access them.

```rust
#[derive(Default, Debug, Clone)]
pub struct ScopeData {
    // ...scope arenas collapsed in the diff...
    pub(crate) roots: generational_arena::Arena<RootSymbolScope>,
}
```

```rust
pub trait Scope {
    type SymbolData;
    // ...other items collapsed in the diff...

    fn get_all_defs(&self, ...) -> HashMap<String, SymbolRef>;

    fn dump(&self, scope_data: &ScopeData,
        symbol_data: &Self::SymbolData) -> Option<String>;
}
```

### SemanticDB

`SemanticDB` essentially caches and integrates partial semantic information about semantic objects. Its main function is to accelerate the maintenance and querying of internal information in `GlobalState`.

```rust
#[derive(Debug, Default, Clone)]
pub struct SemanticDB {
    // ...fields collapsed in the diff...
}

pub struct FileSemanticInfo {
    // ...fields collapsed in the diff...
}
```

## Summary

The new semantic model of KCL essentially does two things: first, it sinks the repeated calculations the toolchain used to perform in the application layer down to the semantic layer, with a corresponding mechanism to simplify information queries; second, it re-analyzes and caches the information that used to be lost during semantic analysis. The main purposes are:

- Computation is gathered in one place, preventing the application layer from intruding into the compiler's semantic core.
- The caching mechanism is improved and incremental compilation is simplified, so queries are faster.
- Maintainability is improved by simplifying the development of application-layer tools and reducing the corner cases the toolchain must handle.

In practice, the above objectives have largely been achieved. After migration, the amount of code for LSP-related features decreased by about 60%, and compilation speed increased by about 500% with incremental compilation.