Merge tolk #1351

Merged
merged 15 commits into from
Nov 2, 2024

EmelyanenkoK

No description provided.

EmelyanenkoK and others added 15 commits October 22, 2024 11:56
Fix updating neighbors in private overlays (#1314)
The Tolk Language will be positioned as "next-generation FunC".
It's literally a fork of a FunC compiler,
introducing familiar syntax similar to TypeScript,
but leaving all low-level optimizations untouched.

Note that FunC sources are partially stored
in the parser/ folder (shared with TL-B).
In Tolk, nothing is shared:
everything from parser/ is copied into the tolk/ folder.
All changes from PR "FunC v0.5.0":
#1026

Instead of developing FunC further, we decided to fork it.
BTW, the first Tolk release will be v0.6,
a nod to the FunC v0.5 release that never happened.

As it turned out, PSTRING() created a buffer of 128K.
If asm_code exceeded this buffer, it was truncated.
I've just dropped PSTRING() from there in favor of std::string.

The new lexer is noticeably faster and more memory-efficient
(although splitting a file into tokens takes a negligible share of the whole pipeline).

But the purpose of rewriting the lexer was not just to speed it up,
but also to allow writing code without spaces:
`2+2` is now 4, not a valid identifier as it was earlier.

The set of characters allowed in identifiers has been greatly reduced
and is now similar to other languages.
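
To illustrate the point about `2+2`, here is a minimal Python sketch of a tokenizer with a restricted identifier alphabet. It is not the Tolk lexer (which is C++); the token classes and the identifier rule `[A-Za-z_][A-Za-z0-9_]*` are assumptions chosen for illustration, since the exact rules are not spelled out here.

```python
import re

# Hypothetical token classes for illustration only; the identifier rule
# below is an assumption, not the actual Tolk grammar.
TOKEN_RE = re.compile(r"""
    (?P<number>\d+)
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>[-+*/()=<>!&|~^%,;{}.?:])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(source):
    """Split source text into (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With a narrow identifier alphabet, `tokenize("2+2")` yields a number, an operator, and a number, so no whitespace is needed to separate them.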

SrcLocation is now 8 bytes on the stack everywhere.

Command-line flags were also reworked:
- the input for the Tolk compiler is now a single file only; it is parsed, and parsing continues as new #include directives are resolved
- flags like -A, -P and so on are no longer needed
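
The "parse one file, then keep going while includes are resolved" flow can be sketched as a worklist. This Python model is an assumption-laden illustration: `collect_sources`, `read_file`, and the regular expression are hypothetical, and path resolution is ignored entirely.

```python
import re
from collections import deque

# Assumed include syntax for this sketch: #include "file" at line start.
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"', re.MULTILINE)


def collect_sources(entry, read_file):
    """Return file names in the order they are first discovered.

    `read_file` is any callable mapping a file name to its text.
    """
    ordered = []
    seen = set()
    queue = deque([entry])
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already parsed; this also breaks include cycles
        seen.add(name)
        ordered.append(name)
        queue.extend(INCLUDE_RE.findall(read_file(name)))
    return ordered
```

Starting from a single entry file, every file reachable through includes is visited exactly once.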
Several related changes:
- stdlib.tolk is embedded into a distribution (deb package or tolk-js),
  the user won't have to download it and store as a project file;
  it's an important step to maintain correct language versioning
- stdlib.tolk is auto-included, that's why all its functions are
  available out of the box
- strict includes: you can't use symbol `f` from another file
  unless you've #include'd this file
- drop all C++ global variables holding compilation state,
  merge them into a single struct CompilerState located at
  compiler-state.h; for instance, stdlib filename is also there
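
The strict-includes rule above can be modelled in a few lines of Python. This is a sketch, not the compiler's logic: `check_symbol_use` and its arguments are hypothetical, and it assumes that only direct includes (plus the auto-included stdlib) make a symbol visible.

```python
STDLIB = "stdlib.tolk"  # auto-included, per the description above


def check_symbol_use(symbol, used_in, defined_in, direct_includes):
    """Return the defining file, or raise if `used_in` did not include it.

    defined_in:      symbol name -> file that declares it
    direct_includes: file name -> set of files it includes directly
    """
    owner = defined_in[symbol]
    allowed = {used_in, STDLIB} | direct_includes.get(used_in, set())
    if owner not in allowed:
        raise NameError(
            f"{used_in}: `{symbol}` is defined in {owner}, which is not included"
        )
    return owner
```

Under this assumption, a file that reaches `f` only through another file's include still has to include the defining file itself.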
Now, the whole .tolk file can be loaded as an AST and
then converted to Expr/Op.
This makes it possible to implement AST transformations.
In the future, more and more code analysis will be moved out of the legacy code to the AST level.
Now that I've implemented the AST, I can drop forward declarations.
Instead, I traverse the AST of all files and register global symbols
(functions, constants, global vars) as a separate step, in advance.

That's why, while converting the AST to Expr/Op, all available symbols are
already registered.
This greatly simplifies handling the "intermediate state" of not-yet-known functions
and checking them afterward.

Redeclaration of local variables (inside the same scope)
is now also prohibited.
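
The two-step approach (register all global symbols first, then convert) can be sketched in Python as follows. The AST here is a hypothetical list of dictionaries, and `register_globals` / `resolve_calls` are illustrative names, not functions from the compiler.

```python
def register_globals(files):
    """Pass 1: collect every top-level declaration from every file."""
    symbols = {}
    for file_name, declarations in files.items():
        for decl in declarations:
            if decl["name"] in symbols:
                raise NameError(f"redeclaration of `{decl['name']}` in {file_name}")
            symbols[decl["name"]] = decl["kind"]
    return symbols


def resolve_calls(files, symbols):
    """Pass 2: every call must refer to a symbol registered in pass 1."""
    resolved = []
    for declarations in files.values():
        for decl in declarations:
            for callee in decl.get("calls", []):
                if callee not in symbols:
                    raise NameError(f"undefined symbol `{callee}` in `{decl['name']}`")
                resolved.append((decl["name"], callee))
    return resolved
```

Because pass 1 has seen every declaration, a function can call another one declared later in the file without a forward declaration.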
Lots of changes, actually. Most noticeable are:
- traditional //comments
- #include -> import
- a rule "import what you use"
- ~ found -> !found (for -1/0)
- null() -> null
- is_null?(v) -> v == null
- throw is a keyword
- catch with swapped arguments
- throw_if, throw_unless -> assert
- do until -> do while
- elseif -> else if
- drop ifnot, elseifnot
- drop rarely used operators

A testing framework also appears here. All tests existed earlier,
but due to significant syntax changes, their history is useless.
- split stdlib.tolk into multiple files (tolk-stdlib/ folder)
  (the "core" common.tolk is auto-imported; the rest
  must be imported explicitly, like "@stdlib/tvm-dicts.tolk")
- all functions were renamed to long and clear names
- new naming is camelCase
This is a very big change.
FunC has both `.methods()` and `~methods()`; Tolk has only the dot,
the one and only way to call a `.method()`.
A method may or may not mutate an object.
It's a behavioral and semantic difference from FunC.

- `cs.loadInt(32)` modifies a slice and returns an integer
- `b.storeInt(x, 32)` modifies a builder
- `b = b.storeInt()` also works, since it not only modifies, but returns
- chained methods also work, they return `self`
- everything works exactly as expected, similar to JS
- no runtime overhead, exactly same Fift instructions
- custom methods are created with ease
- tilde `~` does not exist in Tolk at all
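
The behaviour listed above (a method mutates its receiver, and may also return a value or `self`) can be modelled in Python. This is only an analogy: the classes below are hypothetical, store bits as a text string, and handle non-negative integers only, unlike the real `storeInt` / `loadInt`.

```python
class Builder:
    """Toy model of a builder: accumulates bits as a string of 0s and 1s."""

    def __init__(self):
        self.bits = ""

    def store_int(self, value, width):
        # Mutates the builder and returns `self`, so calls can be chained
        # and `b = b.store_int(...)` also works.
        if width < 1 or not 0 <= value < (1 << width):
            raise ValueError("value does not fit into the requested width")
        self.bits += format(value, f"0{width}b")
        return self


class Slice:
    """Toy model of a slice: bits are consumed from the front."""

    def __init__(self, bits):
        self.bits = bits

    def load_int(self, width):
        # Mutates the slice (consumes `width` bits) and returns the integer.
        if width < 1 or width > len(self.bits):
            raise ValueError("not enough bits left in the slice")
        value = int(self.bits[:width], 2)
        self.bits = self.bits[width:]
        return value
```

For example, `Builder().store_int(5, 8).store_int(1, 4)` chains two mutating calls, and reading the result back through `Slice.load_int` consumes the bits in the same order.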
Instead of 'ton_crypto', Tolk now depends on 'ton_crypto_core'.
The only purpose of ton_crypto (in FunC too, btw) was address parsing:
"EQCRDM9...", "0:52b3..." and so on.
Such parsing has been reimplemented manually, in exactly the same way.
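
For readers unfamiliar with the two address forms, here is a Python sketch of parsing them based on the publicly documented TON address layout (36 bytes in base64: tag, workchain, 32-byte hash, CRC16). It is not the code from this PR; the function names are hypothetical, the tag byte is ignored, and whether the compiler performs the same checks is an assumption.

```python
import base64


def crc16_xmodem(data):
    """CRC-16/XMODEM: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def parse_raw(text):
    """Parse the "workchain:hex" form, e.g. "0:52b3..." (64 hex digits)."""
    workchain, _, hex_part = text.partition(":")
    if len(hex_part) != 64:
        raise ValueError("expected 64 hex digits after the colon")
    return int(workchain), bytes.fromhex(hex_part)


def parse_friendly(text):
    """Parse the 48-character base64 or base64url form, e.g. "EQCRDM9..."."""
    if len(text) != 48:
        raise ValueError("expected 48 base64 characters")
    raw = base64.urlsafe_b64decode(text.replace("+", "-").replace("/", "_"))
    if crc16_xmodem(raw[:34]) != int.from_bytes(raw[34:], "big"):
        raise ValueError("checksum mismatch")
    # raw[0] is the tag byte (bounceable / testnet flags); ignored in this sketch.
    workchain = int.from_bytes(raw[1:2], "big", signed=True)
    return workchain, raw[2:34]
```

Both functions return a `(workchain, hash)` pair, so the two textual forms of the same address compare equal after parsing.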
Unary logical NOT was already implemented earlier.
Logical AND and OR are expressed via conditional expressions:
* a && b  ->  a ? (b != 0) : 0
* a || b  ->  a ? 1 : (b != 0)
They work as expected in any expression. For instance, given
`cond && f()`, f is called only if cond is true.
For primitive cases, like `a > 0 && b > 0`, the Fift code is not optimal:
it could potentially be generated without IFs.
These are candidates for future optimization. For now, it's more than enough.
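
The rewriting rules above can be checked with a small Python model. The right-hand operand is passed as a callable so that short-circuiting is visible; `logical_and` and `logical_or` are hypothetical helpers, and the model uses 1 and 0 for results regardless of how the compiler represents true.

```python
def logical_and(a, b_thunk):
    # a && b  ->  a ? (b != 0) : 0
    return int(b_thunk() != 0) if a else 0


def logical_or(a, b_thunk):
    # a || b  ->  a ? 1 : (b != 0)
    return 1 if a else int(b_thunk() != 0)
```

Since each rule is a conditional expression, the right operand is evaluated only in the branch that needs it, which is why `f` in `cond && f()` is not called when `cond` is false.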
Tolk Language: next-generation FunC
@EmelyanenkoK EmelyanenkoK merged commit a5f1f7d into testnet Nov 2, 2024
14 checks passed
@mohamadrezaasadi308 mohamadrezaasadi308 left a comment

1798728048
