Feat(#56) blog about caching #58
@maxonfjvipon check please
@yegor256 @maxonfjvipon @volodya-lombrozo read please
@Yanich96 Please check all the grammar mistakes in this text.
## Introduction
Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an
Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an assembly, he loses focus on a task and spends valuable working
"Empty words". We can remove them without losing any meaning.
Different build systems use many tools,
helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
Why do I need this information?
@volodya-lombrozo I wrote it to open this blog post. I will delete these sentences if they are not necessary.
Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an
assembly, he loses focus on a task and spends valuable working time. Different build systems use many tools,
helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
The subject of this article is caching, because completed tasks caching allows not to spend resources again.
"The subject of this article is caching."
The rest is obvious:
because completed tasks caching allows not to spend resources again
assembly, he loses focus on a task and spends valuable working time. Different build systems use many tools,
helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
The subject of this article is caching, because completed tasks caching allows not to spend resources again.
So in [EO](https://github.com/objectionary/eo) caching is used for speeding up programs work.
@Yanich96 Caching speeds up a "build time" or "program execution", not "programs work".
helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
The subject of this article is caching, because completed tasks caching allows not to spend resources again.
So in [EO](https://github.com/objectionary/eo) caching is used for speeding up programs work.
While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin`
@Yanich96 Do you have particular links to these issues?
@volodya-lombrozo Do you mean that I should attach a link to the issue where the error occurred?
The subject of this article is caching, because completed tasks caching allows not to spend resources again.
So in [EO](https://github.com/objectionary/eo) caching is used for speeding up programs work.
While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin`
for EO version `0.34.0`. The error occurred, because using a file name and comparing equality of
@Yanich96 It's hard to grasp without a context:
The error occurred, because using a file name and comparing equality of
compilation time and caching time is not the most reliable verification.
@volodya-lombrozo Could you give an example of such context? Should it be code or a diagram?
While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin`
for EO version `0.34.0`. The error occurred, because using a file name and comparing equality of
compilation time and caching time is not the most reliable verification. Unit tests were written showing that
cache does not work correctly. Also reading a file was necessary for getting a programme name
"Unit tests were written to demonstrate that the cache does not function correctly. Additionally, reading a file was required to obtain a program name, which slowed down the assembly process."
By the way, what is the "assembly process"? A reader might not be familiar with this term.
compilation time and caching time is not the most reliable verification. Unit tests were written showing that
cache does not work correctly. Also reading a file was necessary for getting a programme name
that slowed down an assembly.
That we came to conclusion that we need caching with a reliable verification which does not require reading a file
- "Came to conclusion" should be "came to the conclusion".
- "which does not require reading a file from disk" could be rephrased to "that does not require reading a file from a file system".
- "And using cache" should be "And using a cache"
That we came to conclusion that we need caching with a reliable verification which does not require reading a file
from disk. And using cache should save us enough time for building a project.

The goal of this article is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
@Yanich96 This sentence might be connected with the previous one: "The subject of this article is caching."
@volodya-lombrozo check please

## Introduction
In [EO](https://github.com/objectionary/eo) a caching is used to speed up program execution.
In [EO](https://github.com/objectionary/eo), caching is used to speed up program execution.
- "caching" is uncountable here, you don't need an article "a"
- You need a comma after "In EO"
@Yanich96 "program execution" or "program compilation" ?
## Introduction
In [EO](https://github.com/objectionary/eo) a caching is used to speed up program execution.
While developing [EO](https://github.com/objectionary/eo) we found a caching
@Yanich96 No need to mention EO twice:
While developing [EO](https://github.com/objectionary/eo)
You already mentioned it in the previous sentence. Just:
Recently we found an error...
By the way, why an "error"? Maybe it's just a "bug"?
In [EO](https://github.com/objectionary/eo) a caching is used to speed up program execution.
While developing [EO](https://github.com/objectionary/eo) we found a caching
[error](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
for EO version `0.34.0`. The error occurred because the cache was searched for the needed file using
@Yanich96
Please use the active voice instead of the passive:
The error occurred because the cache was searched for the needed file using
a comparison of compilation time and caching time.
In many styles of writing, active voice is preferred over passive voice for clarity and easier reading.
a comparison of compilation time and caching time.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
[Unit tests](https://github.com/objectionary/eo/pull/2749) were written to show that the
cache does not work correctly. Additionally, reading a file was necessary to obtain a program name
that slowed down the build process.
That we came to the conclusion that we need caching with a reliable verification method
that does not require reading a file system. Using a cache should save us enough time for building a project.
@Yanich96 "from a file system"?
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
[Unit tests](https://github.com/objectionary/eo/pull/2749) were written to show that the
cache does not work correctly. Additionally, reading a file was necessary to obtain a program name
@Yanich96 [Question] To be honest, I don't think we need such details in this post. They only confuse a reader. Maybe it's better to omit them? What do you think?
cache does not work correctly. Additionally, reading a file was necessary to obtain a program name
that slowed down the build process.
That we came to the conclusion that we need caching with a reliable verification method
that does not require reading a file system. Using a cache should save us enough time for building a project.
Using a cache should save us enough time for building a project.
We already use some cache. So it already saves some time.
By the way, "enough": how much is that?
@volodya-lombrozo check please
the compilation time and caching time to search for the needed file.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
That we came to the conclusion that we need caching with a reliable verification method
@Yanich96 "That... that": too many "that"s. You can omit the first one.
Recently we found a caching
[bug](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
for EO version `0.34.0`. The error occurred because the algorithm compared
the compilation time and caching time to search for the needed file.
@Yanich96
This sentence is a bit strange:
The error occurred because the algorithm compared
the compilation time and caching time to search for the needed file.
Which algorithm? Which file do you mean?
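To make this question concrete, here is a minimal, hypothetical sketch of a check that accepts a cache entry only when the source file and the cached file have exactly the same modification time. It is not the actual `eo-maven-plugin` code; the function name and the `app.eo` / `app.xmir` file names are assumptions. The sketch shows both ways such a check can fail: an unchanged source reported as a miss, and an edited source reported as a hit.

```python
import os
import tempfile


def is_cached_by_time_equality(source_path: str, cached_path: str) -> bool:
    """Hypothetical check: the cache entry is valid only when the source
    file and the cached file have exactly the same modification time."""
    return os.path.getmtime(source_path) == os.path.getmtime(cached_path)


def demo() -> tuple[bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "app.eo")    # hypothetical source file
        cached = os.path.join(tmp, "app.xmir")  # hypothetical cached result
        with open(source, "w") as f:
            f.write("# program v1\n")
        with open(cached, "w") as f:
            f.write("<program v='1'/>\n")

        # The cache entry is written one second after compilation finished:
        # nothing changed, yet the check reports a miss.
        os.utime(source, (1_000_000, 1_000_000))
        os.utime(cached, (1_000_001, 1_000_001))
        false_miss = not is_cached_by_time_equality(source, cached)

        # The source is edited and its mtime happens to match the cache:
        # the content changed, yet the check reports a hit.
        with open(source, "w") as f:
            f.write("# program v2\n")
        os.utime(source, (1_000_001, 1_000_001))
        false_hit = is_cached_by_time_equality(source, cached)
        return false_miss, false_hit


print(demo())  # (True, True)
```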
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
That we came to the conclusion that we need caching with a reliable verification method
that does not require reading a file system.
@Yanich96 Does caching read a file system? Maybe "files from a file system?"
that does not require reading a file system.

The goal of this blog is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
and to create effective caching in [EO](https://github.com/objectionary/eo).
@Yanich96 "create" -> "implement"
The goal of this blog is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
and to create effective caching in [EO](https://github.com/objectionary/eo).

<!--more-->
@Yanich96 "More"?
according to the rules.
At the end of its work, the compiler optimizes the resulting machine code and produces an object file.
To speed up compilation, different files of the same project are compiled in parallel,
that is, we receive several object files at once.
@Yanich96 This is redundant:
that is, we receive several object files at once
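A minimal sketch of why parallel compilation produces several object files at once, assuming every `.cpp` file is an independent translation unit. `compile_unit` is a hypothetical stand-in that only derives the object file name; a real build would run one compiler process per file.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_unit(source_name: str) -> str:
    """Hypothetical stand-in for one compiler invocation on one .cpp file;
    it only returns the name of the object file that would be produced."""
    return source_name.rsplit(".", 1)[0] + ".o"


def compile_all(sources: list[str], workers: int = 4) -> list[str]:
    # Translation units do not depend on each other, so they can be compiled
    # concurrently; map() still returns results in the order of the inputs.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compile_unit, sources))


print(compile_all(["main.cpp", "parser.cpp", "lexer.cpp"]))
# ['main.o', 'parser.o', 'lexer.o']
```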
To speed up compilation, different files of the same project are compiled in parallel,
that is, we receive several object files at once.

3) After all received project object files are passed to the linker.
@Yanich96 What does it mean:
After all received project object files are passed to the linker.
Is it "After all, received project object files are passed to the linker."
or "After, all received project object files are passed to the linker." ?
Why "received"? Which "project" do you mean?
Maybe it's better to just use "Then, object files are passed to the linker.", or better:
"Then linker <...do something...> with object files" (active voice)?
that is, we receive several object files at once.

3) After all received project object files are passed to the linker.
Linker is a program that combines program components, written in assembly language or a high-level programming language,
@Yanich96 I thought the linker combines object files?
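A simplified model of what a static linker does with object files: each object file defines some symbols and refers to others, and linking succeeds only if every reference is defined exactly once. `ObjectFile` and `link` are hypothetical names for this sketch; real object formats (ELF, COFF, Mach-O) also carry machine code and relocation data, which are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Simplified object file: machine code is omitted, only symbols are kept."""
    name: str
    defines: set[str] = field(default_factory=set)  # symbols this file provides
    needs: set[str] = field(default_factory=set)    # symbols it refers to


def link(objects: list[ObjectFile]) -> dict[str, str]:
    """Map every defined symbol to the object file that defines it,
    failing on duplicate definitions or unresolved references."""
    table: dict[str, str] = {}
    for obj in objects:
        for symbol in sorted(obj.defines):
            if symbol in table:
                raise ValueError(f"duplicate symbol {symbol!r}")
            table[symbol] = obj.name
    for obj in objects:
        missing = obj.needs - set(table)
        if missing:
            raise ValueError(f"undefined reference(s) in {obj.name}: {sorted(missing)}")
    return table


main_o = ObjectFile("main.o", defines={"main"}, needs={"parse"})
parser_o = ObjectFile("parser.o", defines={"parse"})
print(link([main_o, parser_o]))  # {'main': 'main.o', 'parse': 'parser.o'}
```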
`ccache` has two main caching methods:
1) `Direct mode` - hashcode is generated based on the source code.
@Yanich96 Which "hashcode" do you mean? You give the definition below, and this paragraph order is very confusing: I have to skip this part and come back to it later.
The hashcode includes information: file contents, directory, compiler information, compilation time, extensions
used by the compiler. A compressed machine code file is placed in the cache using the received key.

`Direct mode` compiles the program faster, since the preprocessor step is skipped.
@Yanich96 You explain the two modes using this template:
1) Direct mode - hashcode is generated based on the source code.
2) Preprocessor mode - hashcode is generated based on the result of preprocessor.
3) Direct mode compiles the program faster...
4) Preprocessor mode is slower...
It looks strange. Maybe it's better to explain one mode and then move to the other?
1) Direct mode - hashcode is generated based on the source code.
3) Direct mode compiles the program faster...
2) Preprocessor mode - hashcode is generated based on the result of preprocessor.
4) Preprocessor mode is slower..
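A minimal sketch of the idea described in the draft: a key derived from the inputs that affect the produced object file, and a cache that stores the object file compressed under that key. SHA-256, `zlib`, the function names, and the in-memory dictionary standing in for the cache directory are assumptions for illustration, not how `ccache` is implemented. The compilation time listed in the draft is deliberately left out of this key, since a key that differs on every run would never produce a hit.

```python
import hashlib
import zlib
from typing import Optional


def cache_key(source: bytes, cwd: str, compiler: str, options: list[str]) -> str:
    """Hash everything that can change the produced object file.
    Each part is length-prefixed so that different splits of the
    same bytes cannot produce the same key."""
    h = hashlib.sha256()
    parts = [source, cwd.encode(), compiler.encode(), *(o.encode() for o in options)]
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


class CompressedCache:
    """In-memory stand-in for a cache directory holding compressed object files."""

    def __init__(self) -> None:
        self._entries: dict[str, bytes] = {}

    def put(self, key: str, object_code: bytes) -> None:
        self._entries[key] = zlib.compress(object_code)

    def get(self, key: str) -> Optional[bytes]:
        blob = self._entries.get(key)
        return None if blob is None else zlib.decompress(blob)


cache = CompressedCache()
src = b"int main() { return 0; }"
key = cache_key(src, "/home/dev/project", "g++ 13.2", ["-O2"])
cache.put(key, b"fake object code")
print(cache.get(key))  # b'fake object code'
print(cache.get(cache_key(src, "/home/dev/project", "g++ 13.2", ["-O0"])))  # None
```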
@volodya-lombrozo check please
@Yanich96 Could you please link this PR with the issue you are trying to solve?
</p>

1) First, preprocessor gets the input files. The input files are source files (.cpp) and header files (.h).
The result is a single edited file with human-readable code that the compiler will get.
@Yanich96 Which format does the output file have?
The result is a single edited file with human-readable code that the compiler will get.

2) The compiler receives the finished code file and converts it into machine code, presented in an object file.
@Yanich96 "finished" code? What does it mean?
To speed up compilation, different files of the same project are compiled in parallel.

3) Then, the linker gets object files.
Linker is a program that combines object files into an executable file or library.
3) Then, the linker gets object files.
Linker is a program that combines object files into an executable file or library.
The result of the linker is an executable .exe file.
@Yanich96 Maybe it's better to quote `.exe`? What do you think?
This machine code is then combined into one executable file.

`ccache` uses hashcode to find cached files. The hashcode includes information:
@Yanich96 [Question] I'm not sure here, but it seems that "hashcode" isn't a frequently used term. From the Hash Function definition:
The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes.
@volodya-lombrozo Thanks, I will use "hash algorithm", as the ccache documentation does.
`ccache` has two main caching methods:
1) `Direct mode` - hashcode is generated based on the source code.
`Direct mode` compiles the program faster, since the preprocessor step is skipped.
However,the header files are not checked for changes, so the wrong project may be built.
@Yanich96 What are "wrong" and "right" projects here?
@volodya-lombrozo
"The wrong project" is a project built with unverified header files.
"The right project" is a project built with verified header files.
@Yanich96 Maybe we can clarify it in the text?
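The difference between the two modes, as the draft describes them, can be sketched with two keys: one computed from the `.cpp` text only, and one that also covers the included headers (a stand-in for hashing the preprocessor output). This is a simplified model under that assumption; the ccache manual describes direct mode as additionally keeping a manifest with the include files it saw and their hashes, which the naive key below does not model.

```python
import hashlib


def direct_key(source: str) -> str:
    # Naive key, as the draft describes direct mode: only the .cpp text is hashed.
    return hashlib.sha256(source.encode()).hexdigest()


def preprocessor_key(source: str, headers: dict[str, str]) -> str:
    # Stand-in for hashing the preprocessor output: the .cpp text plus
    # the name and text of every included header, in a fixed order.
    h = hashlib.sha256(source.encode())
    for name in sorted(headers):
        h.update(b"\0" + name.encode() + b"\0" + headers[name].encode())
    return h.hexdigest()


source = '#include "config.h"\nint main() { return LIMIT; }\n'
before = {"config.h": "#define LIMIT 1\n"}
after = {"config.h": "#define LIMIT 2\n"}

# direct_key() never sees config.h, so editing the header cannot change it,
# and a stale object file would be reused ("the wrong project").
# preprocessor_key() does see the header, so the edit produces a new key.
print(preprocessor_key(source, before) == preprocessor_key(source, after))  # False
```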
2) `Preprocessor mode` - hashcode is generated based on the result of preprocessor.
`Preprocessor mode` is slower than `direct mode`, but the right project is built always.

`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in the cloud.
@Yanich96 Do you mean some particular cloud? ("the")
`Preprocessor mode` is slower than `direct mode`, but the right project is built always.

`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in the cloud.
And it also has fixed some bugs (for example, there is a check of header files, which makes direct mode more accurate).
@Yanich96 There is a problem with the tense here (grammar).
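To illustrate the local-versus-cloud point, a hypothetical sketch of a pluggable storage backend: the same lookup logic works with a local directory or with a shared remote store, and a shared store lets a second machine reuse an object compiled by the first. `Storage`, `LocalDiskStorage`, `SharedRemoteStorage` (an in-memory stand-in for a cloud bucket) and `compile_with_cache` are assumptions, not the `sccache` API; `sccache` itself is typically configured to use one of its supported backends, such as a local disk cache or an S3 bucket.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Optional


class Storage(ABC):
    """Where compiled objects are kept, addressed by cache key."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...


class LocalDiskStorage(Storage):
    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def get(self, key: str) -> Optional[bytes]:
        path = self.root / key
        return path.read_bytes() if path.exists() else None

    def put(self, key: str, data: bytes) -> None:
        (self.root / key).write_bytes(data)


class SharedRemoteStorage(Storage):
    """In-memory stand-in for a cloud bucket shared by several machines."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.objects.get(key)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data


def compile_with_cache(key: str, storage: Storage, compile_fn: Callable[[], bytes]) -> bytes:
    cached = storage.get(key)
    if cached is not None:
        return cached        # hit: the compiler is not run
    result = compile_fn()    # miss: compile and remember the result
    storage.put(key, result)
    return result


compilations = []


def fake_compile() -> bytes:
    compilations.append("main.cpp")
    return b"object code"


bucket = SharedRemoteStorage()
compile_with_cache("k1", bucket, fake_compile)  # machine A: miss, compiles
compile_with_cache("k1", bucket, fake_compile)  # machine B: hit, reuses A's object
print(len(compilations))  # 1

with tempfile.TemporaryDirectory() as tmp:
    local = LocalDiskStorage(Path(tmp) / "cache")
    compile_with_cache("k2", local, fake_compile)  # miss, compiles
    compile_with_cache("k2", local, fake_compile)  # hit from the local directory
print(len(compilations))  # 2
```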
### Maven
[Maven](https://maven.apache.org) automates and manages Java-project builds.
Building a project in `Maven` is completed in three
@Yanich96 "in three Maven lifecycles ... Maven": "Maven" is repeated here ("Maven, maven").
<img src="/images/defaultPhaseMaven.svg">
</p>

In `Maven` all phases and goals are executed strictly in order, linearly.
@Yanich96 So, Maven doesn't use caching at all?
@volodya-lombrozo As far as I understand, Maven can use extensions from Gradle for caching. Alternatively, Maven can rebuild only the changed project modules.
@Yanich96 Maven has the `.m2` folder at least. It keeps all downloaded dependencies there, so it's some sort of caching too.
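To illustrate the `.m2` point: with the default layout, Maven's local repository (usually `~/.m2/repository`) stores each artifact under a path derived from its coordinates, and Maven looks there before downloading anything. A minimal sketch; the helper names are hypothetical, and the "download only if missing" rule is simplified (snapshot versions follow update policies).

```python
from pathlib import Path


def local_repo_path(group_id: str, artifact_id: str, version: str, repo: Path) -> Path:
    # Default layout:
    # <repo>/<groupId, dots -> directories>/<artifactId>/<version>/<artifactId>-<version>.jar
    return repo.joinpath(*group_id.split("."), artifact_id, version,
                         f"{artifact_id}-{version}.jar")


def needs_download(group_id: str, artifact_id: str, version: str, repo: Path) -> bool:
    # Simplified: a release artifact already present locally is reused, not downloaded.
    return not local_repo_path(group_id, artifact_id, version, repo).is_file()


repo = Path.home() / ".m2" / "repository"
print(local_repo_path("org.eolang", "eo-maven-plugin", "0.34.0", repo))
```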
@Yanich96 It's not about Gradle, I guess: https://maven.apache.org/extensions/maven-build-cache-extension/
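On the linear order mentioned in the draft: invoking a phase of Maven's default lifecycle (for example `mvn package`) runs every earlier phase first. A minimal sketch using only the main phases; intermediate ones such as `process-resources` and `test-compile` are omitted for brevity, and `phases_to_run` is a hypothetical helper.

```python
# Main phases of Maven's default lifecycle, in execution order
# (intermediate phases are omitted for brevity).
DEFAULT_LIFECYCLE = ["validate", "compile", "test", "package", "verify", "install", "deploy"]


def phases_to_run(target: str) -> list[str]:
    """Return the phases executed, in order, when `target` is invoked."""
    if target not in DEFAULT_LIFECYCLE:
        raise ValueError(f"unknown phase: {target}")
    return DEFAULT_LIFECYCLE[: DEFAULT_LIFECYCLE.index(target) + 1]


print(phases_to_run("package"))  # ['validate', 'compile', 'test', 'package']
```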
@volodya-lombrozo check please
To speed up the assembly of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
Let's look at the assembly scheme using C++ as an example
to imagine the build process in compiled languages:
@Yanich96 Maybe we can change "imagine" to "visualize"? What do you think?
By the way, "to imagine the build process in compiled languages" looks redundant.
<img src="/images/defaultCPhase.svg">
</p>

1) First, preprocessor gets the input files. The input files are source files `.cpp` and header files `.h`.
@Yanich96 What do you think?
"First, the preprocessor retrieves the source code files, which consist of both source files `.cpp` and header files `.h`."
</p>

1) First, preprocessor gets the input files. The input files are source files `.cpp` and header files `.h`.
The result is a single edited file `.cpp` with human-readable code that the compiler will get.
@Yanich96 "edited"? Seems redundant.
The result is a single edited file `.cpp` with human-readable code that the compiler will get.

2) The compiler receives the edited code file `.cpp` and converts it into machine code, presented in an object file.
@Yanich96 Is an "object file" machine code?
@volodya-lombrozo An "object file" contains machine code.
2) The compiler receives the edited code file `.cpp` and converts it into machine code, presented in an object file.
At the compilation stage, parsing occurs, which checks whether the code matches
This time is spent on loading of libraries, preparing, optimizing, checking the code, and so on.
To speed up the assembly of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
Let's look at the assembly scheme using C++ as an example
@Yanich96 To be honest, I have some doubts about this paragraph where you discuss "compilation steps":
- First of all, this is a blog about caching, not about compilation.
- You describe compilation incompletely. What about optimizations? Moreover, modern compilers usually convert source code to some sort of IR, like LLVM IR, for example. You can take a look at how clang works.
@volodya-lombrozo I believe that a reader will better understand this article if we briefly talk about the stages of compilation of the presented build systems. Yes, I describe compilation incompletely, but enough to indicate where caching works.
@Yanich96 Then, it's good to mention it.
I describe compilation incompletely, but enough to indicate where caching works.
@volodya-lombrozo "The goal is to implement effective caching in EO. For this, we will briefly look at how frequently used build systems (`ccache`, `Maven`, `Gradle`) work in order to better understand the ideas behind caching in them."
Is that OK?
This machine code is then combined into one executable file.

`ccache` hash algorithm, for the hashing of information to find cached files fast.
@Yanich96 This sentence is inconsistent with the previous one. Moreover, it seems we have grammar errors here.
@volodya-lombrozo The last two sentences (lines 52-55) were unnecessary. I need to delete them. Thanks, I have found the grammar error.
1) `Direct mode` - hash is generated based on the source code.
`Direct mode` compiles the program faster, since the preprocessor step is skipped.
However,the header files are not checked for changes, so the project may be built with not verified header files.
2) `Preprocessor mode` - hash is generated based on the result of preprocessor.
@Yanich96 Is it some particular result ("the")? Which preprocessor do you mean?
`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
And `sccache` includes support for caching the compilation of C/C++ code,
[Rust](https://github.com/mozilla/sccache/blob/main/docs/Rust.md), as well as NVIDIA's CUDA using
[nvcc](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html),
@Yanich96 What is `nvcc`? You didn't mention it before.
And `sccache` includes support for caching the compilation of C/C++ code,
[Rust](https://github.com/mozilla/sccache/blob/main/docs/Rust.md), as well as NVIDIA's CUDA using
[nvcc](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html),
and [clang](https://llvm.org/docs/CompileCudaWithLLVM.html), while `ccache` works with C and C++ code.
@Yanich96 What do you mean by "`ccache` works with C and C++ code"? As I understand it, `ccache` works with compilers, not with code.
@volodya-lombrozo check please
In [EO](https://github.com/objectionary/eo), caching is used to speed up program compilation.
Recently we found a caching
[bug](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
for EO version `0.34.0`. The bug occurred because the old verification method
@Yanich96 It's better to say: "The bug occurred because the old verification method used compilation time and caching time to search for a cached file"
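A minimal sketch of one possible reliable verification method, not the plugin's actual design: each cache entry is named after the SHA-256 of the source file, so a lookup only checks whether that entry exists and never opens the cached file. It assumes that reading the source file is acceptable, since it must be read for compilation anyway; `HashKeyedCache` and the file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


class HashKeyedCache:
    """Cache entries are files named after the SHA-256 of the source text."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _entry(self, source: Path) -> Path:
        return self.root / hashlib.sha256(source.read_bytes()).hexdigest()

    def lookup(self, source: Path) -> Optional[Path]:
        entry = self._entry(source)
        # Only the existence of the entry is checked; its content is never read.
        return entry if entry.exists() else None

    def store(self, source: Path, result: bytes) -> Path:
        entry = self._entry(source)
        entry.write_bytes(result)
        return entry


def demo() -> tuple[bool, bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        cache = HashKeyedCache(Path(tmp) / "cache")
        src = Path(tmp) / "app.eo"  # hypothetical source file
        src.write_text("# program v1\n")
        miss_before_store = cache.lookup(src) is None
        cache.store(src, b"<program/>")
        hit_after_store = cache.lookup(src) is not None
        src.write_text("# program v2\n")
        miss_after_edit = cache.lookup(src) is None
        return miss_before_store, hit_after_store, miss_after_edit


print(demo())  # (True, True, True)
```

Unlike a timestamp comparison, touching or copying an unchanged source still gives a hit, while any edit gives a miss.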
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
We came to the conclusion that we need caching with a reliable verification method.
And this verification method should not use the information that the cached file contains.
"Furthermore, this verification method should refrain from reading the file content."
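To make the difference with the old check concrete, here is a minimal Java sketch of a verification that relies only on file attributes. It assumes the rule "a cached file can be reused if it exists and was written no earlier than the source was last modified", which tolerates the gap between compilation time and caching time instead of requiring them to be equal. `AttributeCacheCheck` and its methods are hypothetical names, not part of `eo-maven-plugin`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not part of eo-maven-plugin): decides whether a
// cached file can be reused by looking only at file metadata.
final class AttributeCacheCheck {

    private AttributeCacheCheck() {
    }

    // Pure rule on timestamps in milliseconds: the cache is fresh if it was
    // written no earlier than the last change of the source.
    static boolean fresh(final long sourceModified, final long cacheModified) {
        return cacheModified >= sourceModified;
    }

    // File-based variant: Files.getLastModifiedTime reads attributes only,
    // the content of either file is never opened.
    static boolean fresh(final Path source, final Path cache) {
        if (!Files.exists(cache)) {
            return false;
        }
        try {
            return fresh(
                Files.getLastModifiedTime(source).toMillis(),
                Files.getLastModifiedTime(cache).toMillis()
            );
        } catch (final IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Timestamp resolution depends on the file system, so a real implementation may need a tolerance or an extra attribute such as the file size.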
## Build caching of existing build systems
### ccache/sccache
In compiled programming languages, building a project containing many source code files takes a long time.
"containing" -> "with"
<img src="/images/defaultCPhase.svg">
</p>
1) First, preprocessor retrieves the source code files,
@Yanich96 You only say that the "preprocessor" retrieves the source code files. And then... magic...: "The result is a single file `.cpp` with human-readable code that the compiler will get."
Moreover, you don't need "compiler will get".
1) First, preprocessor retrieves the source code files,
which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the edited code file `.cpp` and converts it into object file - `.obj`.
@Yanich96 "edited code file" looks strange. Please, be more concrete here. Which file? Moreover, "edited" isn't frequently used in texts.
The result of the linker is an executable `.exe` file.
`ccache` has hash algorithm, for the hashing of information to find cached files fast.
@Yanich96 This paragraph is disconnected from the previous one. You only talked about compilation and then suddenly started with `ccache`. Please read these two paragraphs and you will see.
### Maven
[Maven](https://maven.apache.org) automates and manages Java-project builds.
Building a project in `Maven` is completed in three
[Maven LifeCycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
<img src="/images/defaultPhaseMaven.svg">
</p>
In `Maven` all phases and goals are executed strictly in order, linearly.
@Yanich96 It's not about Gradle, I guess: https://maven.apache.org/extensions/maven-build-cache-extension/
`Maven` suggests rebuilding only changed project modules to speed up the build process.
### Gradle
But unlike `Maven`, [Gradle](https://gradle.org) builds projects using a task graph -
@Yanich96 Just "Unlike `Maven`, ..."
### Gradle
But unlike `Maven`, [Gradle](https://gradle.org) builds projects using a task graph -
[Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph),
@Yanich96 Maybe it's better to give a link to a "Gradle task graph" instead? Why do I need to read about DAGs?
@volodya-lombrozo check please
Furthermore, this verification method should refrain from reading the file content.
The goal is to implement effective caching in EO.
To achieve the goal, we will briefly look at how frequently used build systems (such as ccache, Maven, Gradle)
@Yanich96 "frequently used", maybe it's better to use "well-known"? wdyt?
The goal is to implement effective caching in EO.
To achieve the goal, we will briefly look at how frequently used build systems (such as ccache, Maven, Gradle)
in order to gain a deeper understanding of the caching concepts employed within them and to development caching in EO.
@Yanich96 "and to development caching in EO" is redundant.
1) First, preprocessor retrieves the source code files,
which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the file `.cpp` from the preprocessor and converts it into object file - `.obj`.
@Yanich96 "converts it" -> "compiles it"
@Yanich96 "object file" -> "an object file" ?
Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
This mode allows to build the program faster, since the preprocessor step is skipped.
@Yanich96 Do we really skip the "preprocessor step" here? Or is it just the mode that doesn't require the "preprocessor"?
@volodya-lombrozo I have fixed the text as follows: "When compiling a file, its hash is calculated.
If the file is already present in the registry of compiled files, the file will not be compiled again.
Instead, the previously compiled binary file will be utilized.
This approach can significantly accelerate the build process of certain packages, reducing build times by 5-10 times.
The `ccache` hash is based on:
- the file contents
- the current directory of the file
- the name of the compiler
- the compiler’s size and modification time
- extensions used by the compiler.
Moreover, `ccache` has two types of the hashing:
`Direct mode` - the hash is generated based on the source code only.
When using this mode, the user must ensure that the external libraries used in a project have not changed.
Otherwise, the project will fail to build, resulting in errors.
`Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
`Preprocessor mode` is slower than `direct mode`, but the project is built without compile errors.."
@volodya-lombrozo I added an explanation of what `ccache` does: "When compiling a file, its hash is calculated.
If the file is already present in the registry of compiled files, the file will not be compiled again.
Instead, the previously compiled binary file will be utilized.
This approach can significantly accelerate the build process of certain packages, reducing build times by 5-10 times." And I changed the sentences about modes.
@Yanich96 I like it. Except the last sentence "but the project is built without compile errors.." - It might be removed.
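The list of hash inputs in the quoted text can be illustrated with a short Java sketch: every value that may change the compiler's output goes into one digest, and the resulting key is what a cache would look up. SHA-256 from the JDK is used only for convenience (it is not necessarily the hash `ccache` itself uses), and `CompileKey` with its parameters is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical ccache-style key: every input that can change the compiler
// output is fed into a single SHA-256 digest.
final class CompileKey {

    private CompileKey() {
    }

    static String of(
        final String source,
        final String directory,
        final String compiler,
        final long compilerSize,
        final long compilerModified,
        final String extensions
    ) {
        try {
            final MessageDigest md = MessageDigest.getInstance("SHA-256");
            final String[] parts = {
                source, directory, compiler,
                Long.toString(compilerSize), Long.toString(compilerModified),
                extensions,
            };
            for (final String part : parts) {
                md.update(part.getBytes(StandardCharsets.UTF_8));
                // Separator, so ("ab", "c") and ("a", "bc") give different keys.
                md.update((byte) 0);
            }
            final StringBuilder hex = new StringBuilder();
            for (final byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 is not available", ex);
        }
    }
}
```

A build tool would keep a map from such keys to object files: on a hit the stored object file is reused, on a miss the file is compiled and stored under the new key. Upgrading the compiler changes its size or modification time, so old entries are simply never found again.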
Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
This mode allows to build the program faster, since the preprocessor step is skipped.
When using this mode, the user must be sure that the external libraries, using in a project, have not changed.
@Yanich96 "When using this mode, the user must ensure that the external libraries used in a project have not changed."
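A toy Java model of the two modes, following the simplified description in the text, shows where the risk with external libraries comes from: the direct-mode key ignores headers, while the preprocessor-mode key is computed after headers are inlined. The class, the tiny `#include` expansion and the use of SHA-256 are assumptions for illustration; real `ccache` direct mode additionally records the included files and their hashes in a manifest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Toy model of the two ccache modes as they are described in the text.
final class CcacheModes {

    private static final String INCLUDE = "#include \"";

    private CcacheModes() {
    }

    // Direct mode (simplified): only the source text is hashed.
    static String directKey(final String source) {
        return sha256(source);
    }

    // Preprocessor mode: headers are inlined first, then the result is hashed.
    static String preprocessorKey(final String source, final Map<String, String> headers) {
        return sha256(preprocess(source, headers));
    }

    // Hypothetical mini-preprocessor: replaces lines like #include "a.h"
    // with the header text and keeps every other line unchanged.
    static String preprocess(final String source, final Map<String, String> headers) {
        final StringBuilder out = new StringBuilder();
        for (final String line : source.split("\n", -1)) {
            final String trimmed = line.trim();
            if (trimmed.startsWith(INCLUDE) && trimmed.endsWith("\"")
                && trimmed.length() > INCLUDE.length()) {
                final String name = trimmed.substring(INCLUDE.length(), trimmed.length() - 1);
                out.append(headers.getOrDefault(name, "")).append('\n');
            } else {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    private static String sha256(final String text) {
        try {
            final byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(text.getBytes(StandardCharsets.UTF_8));
            final StringBuilder hex = new StringBuilder();
            for (final byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

If `lib.h` changes, `directKey(src)` stays the same and would hit a stale entry, while `preprocessorKey` changes, at the cost of running the preprocessor on every build.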
`Gradle` uses this information to determine if a task is up-to-date and needs to perform any work.
To understand how `Incremental build` works, consider the following steps:
1) Before executing a task for the first time, `Gradle` takes a
@Yanich96 "Before executing a task for the first time" -> "Before executing a task"
`Gradle` uses this information to determine if a task is up-to-date and needs to perform any work.
To understand how `Incremental build` works, consider the following steps:
1) Before executing a task for the first time, `Gradle` takes a
@Yanich96 takes? Maybe it's better to say "calculates"? wdyt?
@volodya-lombrozo I used "take" because this verb is used in the Gradle documentation
1) Before executing a task for the first time, `Gradle` takes a
[fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
of the path and contents of the source files and saves it.
2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
- the fingerprint
- "of the path and contents" - better to remove it, you already mentioned it above.
- Where does it save the fingerprint?
"Where does it save the fingerprint?" - I don't know. The Gradle documentation has the sentence: "Gradle persists both fingerprints for the next time the task is executed." Nothing is said about the save location.
[fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
of the path and contents of the source files and saves it.
2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
3) Before each rebuilding of the task, `Gradle` generates a new fingerprint of the source files
@Yanich96 "Then, when `Gradle` starts a project build again, it generates a new fingerprint for the same files. If the new fingerprint has not changed, `Gradle` can safely skip this task."
@volodya-lombrozo "Then, when Gradle starts a project build again, it generates a new fingerprint for the same files.
If the new fingerprint has not changed, Gradle can safely skip this task can reuse outputs.
In the opposite case, the task needs to perform an action and to rewrite outputs.
The fingerprint is considered current if the last modification time
and the size of the source files have not changed." - is it ok?
@Yanich96 You don't need this: "can reuse outputs.
In the opposite case, the task needs to perform an action and to rewrite outputs.
The fingerprint is considered current if the last modification time
and the size of the source files have not changed." It's obvious.
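The three steps can be modelled in a few lines of Java. This is a sketch of the idea under stated assumptions, not Gradle's implementation: files are kept in memory as path-to-content maps, a fingerprint maps each path to a hash of its content, and `String.hashCode()` stands in for a real content hash. `IncrementalTask` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical model of Gradle-style incremental build for one task.
final class IncrementalTask {

    private Map<String, Integer> lastInputs;
    private Map<String, Integer> lastOutputs;
    private int executions;

    // Fingerprint: file path -> hash of that file's content.
    // String.hashCode() is a toy stand-in for a real content hash.
    static Map<String, Integer> fingerprint(final Map<String, String> files) {
        final Map<String, Integer> result = new TreeMap<>();
        files.forEach((path, content) -> result.put(path, content.hashCode()));
        return result;
    }

    // Runs the action unless both input and output fingerprints are unchanged
    // since the previous execution (the task is then "up to date").
    Map<String, String> run(
        final Map<String, String> inputs,
        final Map<String, String> outputs,
        final Function<Map<String, String>, Map<String, String>> action
    ) {
        final Map<String, Integer> current = fingerprint(inputs);
        if (current.equals(this.lastInputs)
            && fingerprint(outputs).equals(this.lastOutputs)) {
            return outputs;
        }
        final Map<String, String> produced = action.apply(inputs);
        this.executions += 1;
        this.lastInputs = current;
        this.lastOutputs = fingerprint(produced);
        return produced;
    }

    int executions() {
        return this.executions;
    }
}
```

Checking the output fingerprint as well means that deleting or editing a generated file makes the task run again, even if the sources did not change.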
If none of the inputs or outputs have changed, Gradle can skip that task.
In addition to `Incremental build`, `Gradle` also stores fingerprints of previous builds, enabling quick project builds,
@Yanich96 What is the difference with the cache that you described above?
@volodya-lombrozo this cache is for different branches; the one above is for a project on a single branch.
@Yanich96 Ok, let's leave it, but I have some doubts about the description. For me, it's not clear from the first glance.
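To illustrate the difference discussed above, here is a hypothetical Java sketch of a build cache: instead of remembering only the fingerprint of the last execution, it keeps outputs for every input key it has seen, so switching back to a previously built git branch is served from the cache. The class name, the SHA-256 key and the string-based inputs are assumptions for illustration, not Gradle's API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical build cache: outputs are stored per input key, so several
// versions of the same task (e.g. from different branches) coexist.
final class BuildCache {

    private final Map<String, String> entries = new HashMap<>();
    private int executions;

    // Returns the cached output for these inputs, running the action only on a miss.
    String build(final String inputs, final Function<String, String> action) {
        return this.entries.computeIfAbsent(
            key(inputs),
            k -> {
                this.executions += 1;
                return action.apply(inputs);
            }
        );
    }

    int executions() {
        return this.executions;
    }

    // Cache key: SHA-256 of the task inputs, hex-encoded.
    private static String key(final String inputs) {
        try {
            final byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(inputs.getBytes(StandardCharsets.UTF_8));
            final StringBuilder hex = new StringBuilder();
            for (final byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

With only the incremental build, returning to the first branch would run the task again, because the last recorded fingerprint belongs to the other branch.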
Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
This mode allows to build the program faster, since the preprocessor step is skipped.
@Yanich96 I like it. Except the last sentence "but the project is built without compile errors.." - It might be removed.
When using this mode, the user must be sure that the external libraries, using in a project, have not changed.
Otherwise, the project will build with errors.
2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
`Preprocessor mode` is slower than `direct mode`, but the project is built without errors.
@Yanich96 Why do you need to mention compile errors here? Which errors do you mean? If you mean the errors related to libraries, it's better to specify it. Otherwise, it's better just to remove this sentence.
2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
`Preprocessor mode` is slower than `direct mode`, but the project is built without errors.
`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
@Yanich96 ok
`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
And `sccache` supports a wider range of languages, while `ccache` focuses on caching C and C++ compiler.
@Yanich96 I mean ccache and sccache altogether. What is the difference with other types of caching? Why did you choose these tools?
showcasing how inputs and outputs are specified to enable `Incremental build`:
```
task myTask {
inputs.dir 'src/main/java/MyTask.somebody' // Specify the input directory
```
@Yanich96 good
In this chapter, we introduce the keywords:
* `the source file`: This file serves as the input for goal operations.
@Yanich96 Why do you use "the" ? Is it some particular source file which we know, or which you introduced previously?
@volodya-lombrozo I will remove this text.
* `the cached file`: This file contains the results of goal's execution.
The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`,
@Yanich96 It's untrue. They are just interfaces. Why do they "derive" from the `SafeMojo`?
@Yanich96 Please, provide links to this files in the repository.
@volodya-lombrozo It's my mistake. `SafeMojo` belonged to another sentence.
```
this.validations = cv;
}

public Optional<XML> load(final Path source, final Path cache) {...};
```
@Yanich96 Maybe it's better to make `Path cache` a field, since you are using it in all the methods?
The `CacheValidation` interface has the only method ensuring that each validation contains a specific test condition.
```
public interface CacheValidation {
```
@Yanich96 I didn't grasp the idea why we might need this class and why it has exactly this implementation.
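One possible reading of the idea behind `CacheValidation`, sketched in Java under explicit assumptions: each validation is a small object checking one condition, and a cache entry is reused only if all configured validations pass. That would let `Footprint` and `Optimization` share one mechanism and differ only in their list of checks (for example, the EO version check). The `validate` method, the `CacheEntry` class and both implementations are hypothetical; the real interface is cut off in the excerpt above.

```java
import java.util.List;

// Hypothetical metadata of one cache entry; no file content is involved.
final class CacheEntry {
    final String compilerVersion;
    final long sourceModified;
    final long cacheModified;

    CacheEntry(final String version, final long source, final long cache) {
        this.compilerVersion = version;
        this.sourceModified = source;
        this.cacheModified = cache;
    }
}

// One condition that a cache entry must satisfy.
interface CacheValidation {
    boolean validate(CacheEntry entry);
}

// The entry was produced by the expected EO compiler version.
final class SameVersion implements CacheValidation {
    private final String expected;

    SameVersion(final String expected) {
        this.expected = expected;
    }

    @Override
    public boolean validate(final CacheEntry entry) {
        return this.expected.equals(entry.compilerVersion);
    }
}

// The entry was written no earlier than the source was last modified.
final class NotOlderThanSource implements CacheValidation {
    @Override
    public boolean validate(final CacheEntry entry) {
        return entry.cacheModified >= entry.sourceModified;
    }
}

// Reuses an entry only if every configured validation passes.
final class Validations {
    private final List<CacheValidation> all;

    Validations(final List<CacheValidation> all) {
        this.all = List.copyOf(all);
    }

    boolean valid(final CacheEntry entry) {
        return this.all.stream().allMatch(v -> v.validate(entry));
    }
}
```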
### Conclusion
In this article, we explored various build systems and their caching methods.
We were motivated to find an efficient caching approach for EO due to issues discovered during bug investigation.
@Yanich96 To be honest, I don't understand why you "explored various build systems". This analysis is completely disconnected from the solution you propose. You observe high-level concepts in other build systems but didn't do any conclusions from them. You merely mention their existence. Why? Why should we read about them? Where is the connection with the second part of the text?
Later, you just mentioned that we have some goals and some Mojos in Maven. Why? Why do we need to read about them? What is the purpose of this?
Please, pause for a moment and consider these questions:
What is the purpose of this blog post? Why are you writing it, and why should people read it?
At which level of abstraction do you want to discuss it? High-level caching mechanisms like `ccache`, or the low-level `CacheValidation` implementation?
@volodya-lombrozo check please
I would suggest adding a conclusion after each build system you mentioned:
- for ccache
- for Gradle
- for Maven
Maybe you will need to move some conclusions from "EO build cache".
"EO build cache" should be a conclusion for the entire blog post where you describe how we will implement caching (on a high-level) in EO.
P. S. I understand that you are trying to use different terms from different systems:
- fingerprint
- hash
- key
But sometimes it's hard to keep in mind that they mean the same. So it confuses a bit.
<!--more-->
## Build caching of existing build systems
@Yanich96 What about "Caching in Build Systems" or "Caching in Other Build Systems"?
@volodya-lombrozo I will choose "Caching in Other Build Systems"
At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
To speed up compilation, different files of the same project might be compiled in parallel.
3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) gets object files.
@Yanich96 You might use a different verb instead of "gets". It actually does something with these files. Combines? Resolves?
To speed up the build of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
`ccache` uses the hash algorithm for the hashing of code at certain stages of the build.
`ccache` uses the hash to save a code in the cache.
@Yanich96 You might remove this sentence.
Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
When using this mode, the user must ensure that the external libraries used in a project have not changed.
Otherwise, the project will fail to build, resulting in errors.
@Yanich96 "will" -> "might"
2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
3) Then, when Gradle starts a project build again, it generates a new fingerprint for the same files.
If the new fingerprint has not changed, Gradle can safely skip this task.
In the opposite case, the task needs to perform an action and to rewrite outputs.
@Yanich96 What about: "In the opposite case, the task performs an action again and rewrites outputs"
In addition to `Incremental build`, `Gradle` also stores fingerprints of previous builds, enabling quick project builds,
for example when switching from one branch to another. This feature is known as
@Yanich96 Please specify which branch you mean. Is it a "git" branch?
@volodya-lombrozo yes, "git" branch
Each `module` has its own `pom.xml` file, and there is an aggregator `pom.xml` that consolidates all the `modules`.
This plugin takes a key for a `module`, it encapsulates the essential aspects of the `module`,
including the source code and the configuration of the plugins used within it.
`Modules` with the same key are current or unchanged and the cache can efficiently restore them.
@Yanich96 This sentence doesn't make sense to me: "`Modules` with the same key are current or unchanged and the cache can efficiently restore them."
By the way, what is the "key"?
@volodya-lombrozo key is hash. Can you explain your point: "This sentence doesn't make sense to me"? Do you mean this is obvious?
@Yanich96 I don't understand the word "current" here
This plugin takes a key for a `module`, it encapsulates the essential aspects of the `module`,
including the source code and the configuration of the plugins used within it.
`Modules` with the same key are current or unchanged and the cache can efficiently restore them.
Conversely, the cache seamlessly delegates the build work to the standard Maven core,
@Yanich96 "Conversely"?
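The module key described above can be sketched in Java as a digest over the module's sources and the configuration of its plugins. This is an assumption-labelled illustration, not the code of the Maven build cache extension: `ModuleKey` is a hypothetical name, sources and plugin settings are passed as in-memory maps, and entries are sorted so that the key does not depend on the order in which files are listed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical cache key of a Maven module: sources plus plugin configuration.
final class ModuleKey {

    private ModuleKey() {
    }

    static String of(final Map<String, String> sources, final Map<String, String> plugins) {
        try {
            final MessageDigest md = MessageDigest.getInstance("SHA-256");
            feed(md, sources);
            md.update((byte) 1); // boundary between sources and plugin settings
            feed(md, plugins);
            final StringBuilder hex = new StringBuilder();
            for (final byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Entries are sorted by name, so the same files listed in another order
    // produce the same digest.
    private static void feed(final MessageDigest md, final Map<String, String> data) {
        for (final Map.Entry<String, String> e : new TreeMap<>(data).entrySet()) {
            md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0);
            md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0);
        }
    }
}
```

If the key of a module is already in the cache, its outputs can be restored; otherwise the module is built normally and its outputs are stored under the new key.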
These caching interfaces shared similar logic, but with minor differences.
For instance, `Footprint` verifies the EO version of the compiler, whereas the remaining checks are identical.
Additionally, the conditions for searching data in the cache had errors.
The cached file is considered valid if the end time of goal's execution
@Yanich96 You don't need to explain it again. I've already mentioned this bug above. Could you please remove this sentence?
from the file attributes without reading the file content.
### Conclusion
@Yanich96 I don't think we need such a conclusion in a blog post. It isn't a scientific article. Moreover it doesn't provide any useful information. Kinda "water".
@volodya-lombrozo I have fixed this post.
Check please.
@Yanich96, please thoroughly review the entire text at least once (preferably more) before submitting it for review. This will greatly expedite our review process.
`ccache` is a high-level tool and cannot work with individual compilation tasks,
therefore `ccache` is not suitable for solving our problems.
However, the concept of non-local data storage could potentially be incorporated during the development of the EO.
@Yanich96 "of the EO cache"
or "EO caching implementation"?
@volodya-lombrozo I meant EO in general. If "EO caching implementation" is better, I will fix it.
`ccache` is a high-level tool and cannot work with individual compilation tasks,
therefore `ccache` is not suitable for solving our problems.
@Yanich96 I wouldn't say this: "therefore `ccache` is not suitable for solving our problems."
What about this: "`ccache` cannot work with individual compilation tasks (...for example...). However, the hashing approach and the concept of non-local data storage could potentially be incorporated during the development of the EO caching mechanism."
@volodya-lombrozo "`ccache` is a high-level tool and cannot work with individual compilation tasks (e.g. `Maven goal` or `Gradle task`).
However, the hashing approach and the concept of non-local data storage could potentially
be incorporated during the development of the EO caching mechanism."
Is it ok?
`Gradle` employs
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
to speed up project builds.
For an incremental build to work, the tasks used to build the project must have specified
@Yanich96 The second sentence clearly explains the idea which you are trying to explain here. I would suggest combining these two sentences into a single one, or just removing this sentence. What do you think?
To understand how `Incremental build` works, consider the following steps:
1) Before executing a task, `Gradle` takes a hash of the path and contents of the input files and saves it.
The hash is considered current if the last modification time
@Yanich96 "current" is a strange word here. I guess you meant something different.
@volodya-lombrozo "valid" is better?
@Yanich96 I'm not sure what you mean here, but I guess, yes.
1) Before executing a task, `Gradle` takes a hash of the path and contents of the input files and saves it.
The hash is considered current if the last modification time
and the size of the source files have not changed.
2) Then `Gradle` executes the task and saves a hash of the path and contents of the output files.
@Yanich96 Is it a single hash for all the files?
@volodya-lombrozo yes. The Gradle documentation says: "Gradle takes a fingerprint of the inputs. This fingerprint contains the paths of input files and a hash of the contents of each file. Gradle then executes the task."
Should I mark this clarification in blog-post?
@Yanich96 It would be nice
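Following the Gradle quote in this thread (a fingerprint holding the input paths plus a hash of each file's contents), a minimal Java sketch of such a fingerprint could look like this. It is not Gradle's implementation: SHA-256, the in-memory `Map<String, byte[]>` of inputs and the class name `InputFingerprint` are assumptions for illustration. The `combined` method shows one way to fold everything into a single summary hash; Gradle's internal representation may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Illustrative sketch (not Gradle code): paths plus a content hash per input file. */
public final class InputFingerprint {

    /** Returns the lowercase hex SHA-256 digest of the given bytes. */
    static String sha256(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Maps every input path to the hash of its contents, ordered by path. */
    static SortedMap<String, String> perFile(Map<String, byte[]> inputs) {
        SortedMap<String, String> result = new TreeMap<>();
        inputs.forEach((path, content) -> result.put(path, sha256(content)));
        return result;
    }

    /** Folds all (path, content hash) pairs into a single summary hash. */
    static String combined(Map<String, byte[]> inputs) {
        StringBuilder sb = new StringBuilder();
        perFile(inputs).forEach((path, hash) ->
            sb.append(path).append('\0').append(hash).append('\n'));
        return sha256(sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Map<String, byte[]> before = Map.of("src/example.eo", "v1".getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> after = Map.of("src/example.eo", "v2".getBytes(StandardCharsets.UTF_8));
        System.out.println(combined(before).equals(combined(after))); // false: contents changed
    }
}
```

Sorting by path makes the summary independent of the order in which files are listed, so only a change of a path or of some file's contents changes the result.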
Maven's caching mechanisms operate at the level of `phases` and individual project modules.
Therefore, existing caching systems in Maven do not align with our requirements for resolving present issues.
@Yanich96 You haven't explained the caching in this section at all. How is the hash generated, and which data does it require: file content, file path, last modification time?
You have a rather good section for `ccache`. I believe we might use the same structure for `Maven` and `Gradle`.
These tasks happen one after the other, and each task relies on the output of the one before it.
Each task has directories for input and output data, as well as a directory for storing cached data.
Using the program name, each task can receive and store data.
@Yanich96 Why do you need two consecutive empty lines here? If you need some logical division, use headings and clear sections.
@Yanich96 Same question.
leading to redundancy and complicating the caching infrastructure.
To address caching challenges in EO, we closely examined existing caching systems. However, we cannot use them.
@Yanich96 Why?
However, we cannot use them.
We can utilize parts of existing solutions, such as the hash generation algorithm from ccache and Gradle's task caching approach for our compilation steps.
To address caching challenges in EO, we closely examined existing caching systems. However, we cannot use them.
We require a caching mechanism at the level of `goals`.
In fact, we don't need to invent a new caching mechanism for EO.
Instead, it suffices to verify the last modification time of the files involved in EO compilation.
@Yanich96 Why "last modification time" ? We need more examples here, but it looks like a totally unreliable approach.
The modification time of the preceding task must not exceed that of the subsequent one.
As each task possesses directories for input and output data, accessing the desired file
via an absolute path enables retrieval of essential information, such as the file name and last modified time,
from the file attributes without reading the file contents.
@Yanich96, could you write down the cache usage algorithm as you have done it in the Gradle section? Please outline the steps as you "see" them.
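To make the ordering rule quoted above concrete (the preceding step's file must not be newer than the subsequent step's file, with the time read from file attributes rather than file contents), here is a minimal sketch. `StepOrder` and the `example.eo`/`example.xml` pairing are assumptions for illustration, not EO compiler code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;

/** Illustrative check that relies only on file metadata, never on file contents. */
public final class StepOrder {

    /** Reads the last modification time from the file attributes; the contents are not read. */
    static FileTime modified(Path file) {
        try {
            return Files.readAttributes(file, BasicFileAttributes.class).lastModifiedTime();
        } catch (IOException e) {
            // e.g. NoSuchFileException when the preceding step's file is missing
            throw new UncheckedIOException(e);
        }
    }

    /** The rule itself: the preceding step's time must not exceed the subsequent step's time. */
    static boolean inOrder(FileTime preceding, FileTime subsequent) {
        return preceding.compareTo(subsequent) <= 0;
    }

    /** Applies the rule to two files; a missing subsequent file always means "rebuild". */
    static boolean inOrder(Path preceding, Path subsequent) {
        return Files.exists(subsequent) && inOrder(modified(preceding), modified(subsequent));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("eo-steps");
        Path eo = Files.writeString(dir.resolve("example.eo"), "[] > example");
        Path xml = Files.writeString(dir.resolve("example.xml"), "<program/>");
        Files.setLastModifiedTime(eo, FileTime.fromMillis(1_000L));
        Files.setLastModifiedTime(xml, FileTime.fromMillis(2_000L));
        System.out.println(inOrder(eo, xml)); // true: example.xml is newer than example.eo
    }
}
```

Reading `BasicFileAttributes` touches only metadata, which is what makes this check cheap compared with hashing file contents.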
@volodya-lombrozo check please
which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the file `.cpp` from the preprocessor and compiles it into an object file - `.obj`.
At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
@Yanich96 Did you mean "parser" instead of "parsing"?
@volodya-lombrozo yes, thanks
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the file `.cpp` from the preprocessor and compiles it into an object file - `.obj`.
At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
@Yanich96 You already mentioned it: "The compiler receives the file `.cpp` from the preprocessor and compiles it into an object file - `.obj`."
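The `ccache` steps quoted above can be illustrated with a small sketch of a preprocessor-mode cache: the key is a hash over the compiler, its arguments and the preprocessed `.cpp` text, and a hit returns the stored object file instead of compiling again. This is a simplification under stated assumptions: real `ccache` hashes more (for example, details of the compiler binary) and keeps results on disk, and `ObjectCache` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Illustrative object-file cache keyed by a hash of the preprocessed source. */
public final class ObjectCache {

    private final Map<String, byte[]> store = new HashMap<>();
    private int misses = 0;

    /** Hashes the compiler identity, its arguments and the preprocessor output. */
    static String key(String compiler, List<String> args, String preprocessed) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(compiler.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator keeps argument boundaries unambiguous
            for (String arg : args) {
                md.update(arg.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            md.update(preprocessed.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Returns the cached object file on a hit; otherwise compiles and stores the result. */
    byte[] compile(String compiler, List<String> args, String preprocessed, Supplier<byte[]> realCompile) {
        String k = key(compiler, args, preprocessed);
        byte[] cached = store.get(k);
        if (cached != null) {
            return cached;
        }
        misses++;
        byte[] object = realCompile.get();
        store.put(k, object);
        return object;
    }

    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        ObjectCache cache = new ObjectCache();
        List<String> flags = List.of("-O2", "-c");
        cache.compile("g++", flags, "int main() { return 0; }", () -> new byte[] {1});
        cache.compile("g++", flags, "int main() { return 0; }", () -> new byte[] {1});
        System.out.println(cache.misses()); // 1
    }
}
```

Hashing the preprocessor output rather than the raw `.cpp` file means that a change in any included `.h` file also changes the key.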
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
to speed up project builds.
For an incremental build to work, the tasks used to build the project must have specified
input and output files.
@Yanich96 "To enable an incremental build, the tasks that build the project must specify their input and output files."
``` | ||
To understand how `Incremental build` works, consider the following steps:
@Yanich96 Something strange is happening here with punctuation. Did you put this sentences in this order intentionally?
@volodya-lombrozo If I replace "To understand how `Incremental build` works, consider the following steps" with "How `Incremental build` works", will it be ok?
@Yanich96 Is it possible to remove this sentence?
To understand how `Incremental build` works, consider the following steps:
`Incremental build` uses a hash to detect changes in the inputs and the outputs.
The single hash contains the paths and the contents of all the input files or output files.
@Yanich96 "contains"? Maybe "uses"?
@volodya-lombrozo In the Gradle documentation: "This fingerprint contains the paths of input files and a hash of the contents of each file."
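The up-to-date decision described in this thread (compare the recorded input and output hashes with the current ones, and skip the task if both match) can be sketched as follows. Hash values are passed in as plain strings; they could come from a fingerprint over paths and contents. `IncrementalTask`, `History` and `run` are hypothetical names, not Gradle API, and Gradle itself tracks more than this (for example, non-file input properties).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

/** Illustrative up-to-date check in the spirit of an incremental build (Java 16+). */
public final class IncrementalTask {

    /** Hashes recorded after the last successful execution of a task. */
    record History(String inputHash, String outputHash) {}

    private final Map<String, History> history = new HashMap<>();
    private int executions = 0;

    /**
     * Runs the action unless both the inputs and the outputs still match the recorded history.
     * currentOutputHash is the hash of the outputs currently on disk, or null if they are missing;
     * the action stands in for the real work and returns the hash of what it produced.
     */
    String run(String task, String inputHash, String currentOutputHash, UnaryOperator<String> action) {
        History previous = history.get(task);
        if (previous != null
            && previous.inputHash().equals(inputHash)
            && Objects.equals(previous.outputHash(), currentOutputHash)) {
            return currentOutputHash; // up to date: skip the task
        }
        executions++;
        String produced = action.apply(inputHash);
        history.put(task, new History(inputHash, produced));
        return produced;
    }

    int executions() {
        return executions;
    }

    public static void main(String[] args) {
        IncrementalTask build = new IncrementalTask();
        String out = build.run("compile", "in-1", null, h -> "out-" + h);
        build.run("compile", "in-1", out, h -> "out-" + h); // skipped
        System.out.println(build.executions()); // 1
    }
}
```

Checking the outputs as well as the inputs is what lets the task re-run when someone deletes or edits a generated file by hand.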
`Gradle Incremental build` can manage separate compilation tasks based on inputs and outputs.
And the EO compiler consists of a unit of work in `Maven` (the last section contains a detailed description).
Steps of the EO compiler can have input and output files.
@Yanich96 Why did you write these two sentences about EO?
In Maven, the `phases` are inherently interconnected within the build lifecycle.
A `phase` represents a specific task, and the execution order of `phases` is determined by the default Maven
lifecycle bindings. Each `phase` functions as a series of individual tasks known as `goals`.
@Yanich96 You have already described `phases` and `goals` above. Could you please remove this redundancy and repetition?
@volodya-lombrozo I will remove lines 130-131: "Each lifecycle consists of `phases`, and these `phases` consist of sets of `goals`. One `phase` can consist of several `goals`."
functionality to plugins for the standard lifecycle, but with significantly fewer dependencies. This plugin leverages
[The Takari Incremental API](https://github.com/takari/io.takari.incrementalbuild),
which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
top-level classes that implement specific build actions, denoted as methods annotated `@Builder`.
@Yanich96 Do we really need to know these low-level details about `takari`?
@Yanich96 The question remains
It does not use hashing algorithms, which can slow down project build times,
and it does not have separate cache directories.
Each `builder` has its own directories for input and output data related to its work.
The operational principle of the Takari Incremental API is similar to the operation of caching in EO.
@Yanich96 We don't know about caching in EO yet.
@Yanich96 Still an issue.
`Gradle Incremental build` can manage separate compilation tasks based on inputs and outputs.
And the EO compiler consists of a unit of work in `Maven` (the last section contains a detailed description).
@Yanich96 You can add a link to the `Maven` section.
It's also possible to add a new `goal` to a desired phase by modifying the `pom.xml` file.
Additionally, Maven supports `goals` that are not bound to any build phase
and can be executed outside the build lifecycle, directly through the command line.
The sequence of achieving `goals` is as follows:
@Yanich96 I'm not sure we should explain the Maven lifecycle in such depth. I would leave only the necessary information. For the rest, you might provide a link to the documentation.
functionality to plugins for the standard lifecycle, but with significantly fewer dependencies. This plugin leverages
[The Takari Incremental API](https://github.com/takari/io.takari.incrementalbuild),
which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
top-level classes that implement specific build actions, denoted as methods annotated `@Builder`.
@Yanich96 The question remains
Special attention should be given to the Takari Incremental API.
This API can be applied to cache EO compilation stages as it operates with `goals`.
It does not use hashing algorithms, which can slow down project build times,
@Yanich96 How is it possible? Does it really cache something if it doesn't use hashing?
@volodya-lombrozo The Takari checks the last modification time of the input files. It doesn't create a hash.
Or did I not understand the question?
@Yanich96 It would be good to mention it here:
The Takari checks the last modification time of the input files. It doesn't create a hash.
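Based on the explanation in this thread (Takari checks the last modification time of the input files instead of hashing them), change detection without hashing could be sketched like this. It only illustrates the idea: the Takari Incremental API keeps its own build state, and its exact bookkeeping is not reproduced here. `TimestampDelta` and the `Map<String, Long>` snapshot (path to modification time in milliseconds) are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative change detection based only on last modification times, with no hashing. */
public final class TimestampDelta {

    /**
     * Compares the timestamps recorded by the previous build with the current ones.
     * Returns the paths that were added, removed, or whose timestamp differs.
     */
    static Set<String> changed(Map<String, Long> previous, Map<String, Long> current) {
        Set<String> result = new TreeSet<>();
        current.forEach((path, mtime) -> {
            if (!mtime.equals(previous.get(path))) {
                result.add(path); // new file or different timestamp
            }
        });
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                result.add(path); // file deleted since the last build
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> previous = Map.of("a.eo", 100L, "b.eo", 200L, "c.eo", 300L);
        Map<String, Long> current = Map.of("a.eo", 100L, "b.eo", 250L, "d.eo", 400L);
        System.out.println(changed(previous, current)); // [b.eo, c.eo, d.eo]
    }
}
```

No file contents are read at all; the trade-off is that a file rewritten with identical contents still counts as changed, while an edit that does not move the timestamp goes unnoticed.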
It does not use hashing algorithms, which can slow down project build times,
and it does not have separate cache directories.
Each `builder` has its own directories for input and output data related to its work.
The operational principle of the Takari Incremental API is similar to the operation of caching in EO.
@Yanich96 Still an issue.
</p>
<p align="center">
@Yanich96, I agree that we need to remove the redundancy in the code. For example, we should combine the `Footprint` and `Optimization` methods and fix the hash comparison mechanism. As for checking the "previous" step, I completely disagree. First of all, some steps might be skipped, and more importantly, doing this significantly increases coupling between the phases, which is a serious architectural flaw.
@volodya-lombrozo check please
It's also possible to add a new `goal` to a desired phase by modifying the `pom.xml` file.
Additionally, Maven supports `goals` that are not bound to any build phase
and can be executed outside the build lifecycle, directly through the command line.
@Yanich96 Do you need this empty line?
@volodya-lombrozo I guess so, because lines 122-128 are about Maven and lines 131-150 are about caching mechanisms in Maven.
which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
top-level classes that implement specific build actions.
They can produce various types of outputs, including generated/output files on the filesystem,
build messages, and project model mutations. For each `builder` annotated method, a maven mojo,
@Yanich96 I didn't understand this sentence:
For each `builder` annotated method, a maven mojo, which represents a maven `goal`, is generated.
which represents a maven `goal`, is generated.
When a `builder` is run for a given set of inputs, it always produces the same outputs and saves them to the specified directory.
Any changes in the inputs result in the removal of outputs.
@Yanich96 Do you need this empty line?
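The quoted description (the same inputs give the same outputs, and any change in the inputs removes the old outputs) can be sketched with an in-memory stand-in for the output directory. `StaleOutputs`, `rebuild` and the map-based directory are hypothetical; they illustrate the bookkeeping and do not reproduce Takari's `@Builder` API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Illustrative removal of stale outputs when an input changes. */
public final class StaleOutputs {

    /** Simulated output directory: file name to file content. */
    private final Map<String, String> outputDir = new HashMap<>();
    /** Which outputs each input produced during its last run. */
    private final Map<String, Set<String>> producedBy = new HashMap<>();

    /**
     * Rebuilds one changed input: first deletes everything it produced before,
     * then runs the action and records the new outputs.
     */
    void rebuild(String input, Function<String, Map<String, String>> action) {
        for (String stale : producedBy.getOrDefault(input, Set.of())) {
            outputDir.remove(stale);
        }
        Map<String, String> outputs = action.apply(input);
        outputDir.putAll(outputs);
        producedBy.put(input, new HashSet<>(outputs.keySet()));
    }

    /** Read-only snapshot of the simulated output directory. */
    Map<String, String> outputDir() {
        return Map.copyOf(outputDir);
    }

    public static void main(String[] args) {
        StaleOutputs builder = new StaleOutputs();
        builder.rebuild("example.eo", src -> Map.of("example.xml", "v1", "example.tmp", "scratch"));
        builder.rebuild("example.eo", src -> Map.of("example.xml", "v2"));
        System.out.println(builder.outputDir()); // {example.xml=v2}
    }
}
```

Recording which input produced which outputs is what allows `example.tmp` to disappear on the second run even though the new action never mentions it.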
Special attention should be given to the Takari Incremental API.
This API can be applied to cache EO compilation stages as it operates with `goals`.
It does not use hashing algorithms, which can slow down project build times,
@Yanich96 It would be good to mention it here:
The Takari checks the last modification time of the input files. It doesn't create a hash.
These tasks happen one after the other, and each task relies on the output of the one before it.
Each task has directories for input and output data, as well as a directory for storing cached data.
Using the program name, each task can receive and store data.
@Yanich96 Same question.
1) We create an EO program named "example".
Intermediate files during compilation of this program will have the same name, but not the format
(e.g. `example.eo`, `example.xml`).
2) When the EO compiler compiles this program task, it saves files of compilation steps into the cache.
@Yanich96 "compiler compiles"
@volodya-lombrozo "compiler assembles" is it ok?
</p>
<p align="center">
@volodya-lombrozo check please
* Employing multiple caching mechanisms for similar entities is a suboptimal practice,
leading to redundancy and complicating the caching infrastructure.
In tackling caching challenges within EO, we conducted a thorough evaluation of current caching systems.
@Yanich96 There is lot of "water" in this paragraph. Could you simplify this text please?
for storing and retrieving data from the cache.
The logic for checking the relevance of cached data is presented below:
1) We create an EO program named "example".
Intermediate files during compilation of this program will have the same name, but not the format
@Yanich96 Why do we need to have different format?
1) We create an EO program named "example".
Intermediate files during compilation of this program will have the same name, but not the format
(e.g. `example.eo`, `example.xml`).
When the EO compiler assembles this program task, it saves files of compilation steps into the cache.
@Yanich96 What is the "program task"?
When the EO compiler assembles this program task, it saves files of compilation steps into the cache.
Each compilation step has its own caching directory and an input file directory.
2) When the EO compiler starts a project build again, it checks whether there is an input file named "example"
in the cache of step. If such a file exists,
@Yanich96 "cache of step"?
Each compilation step has its own caching directory and an input file directory.
2) When the EO compiler starts a project build again, it checks whether there is an input file named "example"
in the cache of step. If such a file exists,
then it is enough to check that the last modification time of the cached file at the current step
@Yanich96 Which files do you compare? According to this text, you compare input with output. Their modification times will definitely be different.
4) If the EO program file [Picture 5](/images/RewritingInCacheEO1.svg)
or any input file [Picture 6](/images/RewritingInCacheEO2.svg) has changed,
then the previously cached files become invalid.
In this case, the compilation step performs the action again and rewrites outputs.
@Yanich96 "rewrite outputs"? Do you mean cache here?
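The numbered steps above can be condensed into a minimal sketch of one compilation step's cache. Assumptions: one cache directory per step, a cached file named after the program with the step's format (as in step 1), and validity decided by comparing the cached file's modification time with the step's input file, which is one reading of the step that compares modification times. `EoStepCache`, `apply` and the `parse-cache` directory are hypothetical names, not the EO compiler's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

/** Illustrative per-step cache for EO compilation, keyed by the program name. */
public final class EoStepCache {

    private final Path cacheDir;    // hypothetical cache directory of one compilation step
    private final String extension; // format produced by the step, e.g. ".xml"

    EoStepCache(Path cacheDir, String extension) {
        this.cacheDir = cacheDir;
        this.extension = extension;
    }

    /** The cached file has the program's name and the step's format: "example" becomes "example.xml". */
    Path cachedFor(String program) {
        return cacheDir.resolve(program + extension);
    }

    /** The cache is valid only if it exists and is not older than the step's input file. */
    static boolean reusable(boolean cacheExists, long inputMillis, long cacheMillis) {
        return cacheExists && cacheMillis >= inputMillis;
    }

    /** Returns the cached result when valid; otherwise runs the step and rewrites the cache. */
    String apply(String program, Path input, UnaryOperator<String> step) {
        try {
            Path cached = cachedFor(program);
            boolean exists = Files.exists(cached);
            long inputTime = Files.getLastModifiedTime(input).toMillis();
            long cacheTime = exists ? Files.getLastModifiedTime(cached).toMillis() : 0L;
            if (reusable(exists, inputTime, cacheTime)) {
                return Files.readString(cached);
            }
            String output = step.apply(Files.readString(input));
            Files.createDirectories(cacheDir);
            Files.writeString(cached, output); // creates or overwrites the stale entry
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("eo-cache");
        Path source = Files.writeString(dir.resolve("example.eo"), "[] > example");
        EoStepCache parse = new EoStepCache(dir.resolve("parse-cache"), ".xml");
        System.out.println(parse.apply("example", source, eo -> "<program>" + eo + "</program>"));
    }
}
```

Because the rule uses `>=`, an input edited within the same timestamp tick as the cache write would be treated as unchanged. This is the reliability concern raised in the review; a content hash, as in the `ccache` and Gradle sections, avoids it at the cost of reading the file.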
@yegor256 Could you take a look, please?
@Yanich96 I appreciate the study you conducted of existing caching tools! Great work! To make this blog post more impressive and informative, I would suggest slightly modifying its structure. How about this one:
WDYT? @volodya-lombrozo
@yegor256 I think we should stick to the scope of this article without adding anything new. It's already large. So, I would exclude these points:
With all the rest I completely agree. @Yanich96 @yegor256 I would also recommend moving these changes into a separate PR, since this one takes too much time to load on my laptop (because there are many comments already, I guess).
A blog post that discusses caching of compilation results in different build systems, to support the development of effective caching in EO.
Closes: #56
PR-Codex overview
This PR adds SVG diagrams illustrating different compilation steps and cached files. It includes visual representations of various Mojos in the build process.
Detailed summary