Feat(#56) blog about caching #58

Closed

@Yanich96 Yanich96 commented Mar 5, 2024

A blog post that discusses how different build systems cache compilation results, as groundwork for developing effective caching in EO.

Closes: #56


PR-Codex overview

This PR adds SVG diagrams illustrating different compilation steps and cached files. It includes visual representations of various Mojos in the build process.

Detailed summary

  • Added SVG diagrams for AssembleMojo, SavingInCacheEO, RewritingInCacheEO1, and RewritingInCacheEO2
  • Illustrated compilation steps and cached files with visual representations
  • Included different Mojos like ParseMojo, OptimizeMojo, ShakeMojo, etc.

The following files were skipped due to too many changes: images/RewritingInCacheEO2.svg, images/defaultPhaseMaven.svg, images/defaultCPhase.svg, images/EO.svg, _posts/2024/2024-02-06-about-caching-in-eo.md


@Yanich96
Author

Yanich96 commented Mar 5, 2024

@maxonfjvipon check please

@Yanich96
Author

Yanich96 commented Mar 5, 2024

@maxonfjvipon check please

@Yanich96
Author

@yegor256 @maxonfjvipon @volodya-lombrozo read please

Member

@volodya-lombrozo volodya-lombrozo left a comment

@Yanich96 Please, check all the grammar mistakes in this text.



## Introduction
Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an
Member

@Yanich96

Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an assembly, he loses focus on a task and spends valuable working

"Empty words". We can remove them without losing any meaning.

Member

@Yanich96

Different build systems use many tools,
helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.

Why do I need this information?

Author

@volodya-lombrozo I wrote it to open the blog post. I will delete these sentences if they are not necessary.

Wasting a lot of time on building a project is a programming problem. At the moment a programmer starts an
assembly, he loses focus on a task and spends valuable working time. Different build systems use many tools,
helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
The subject of this article is caching, because completed tasks caching allows not to spend resources again.
Member

@Yanich96

"The subject of this article is caching."

The other is obvious:

because completed tasks caching allows not to spend resources again

assembly, he loses focus on a task and spends valuable working time. Different build systems use many tools,
helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
The subject of this article is caching, because completed tasks caching allows not to spend resources again.
So in [EO](https://github.com/objectionary/eo) caching is used for speeding up programs work.
Member

@Yanich96 Caching speeds up a "build time" or "program execution", not "programs work".

helping to assemble a project faster, namely caching, task parallelization, distributed building and much more.
The subject of this article is caching, because completed tasks caching allows not to spend resources again.
So in [EO](https://github.com/objectionary/eo) caching is used for speeding up programs work.
While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin`
Member

@Yanich96 Do you have particular links to these issues?

Author

@volodya-lombrozo Do you mean that I should attach a link to the issue where the error occurred?

The subject of this article is caching, because completed tasks caching allows not to spend resources again.
So in [EO](https://github.com/objectionary/eo) caching is used for speeding up programs work.
While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin`
for EO version `0.34.0`. The error occurred, because using a file name and comparing equality of
Member

@volodya-lombrozo volodya-lombrozo Mar 11, 2024

@Yanich96 It's hard to grasp without a context:

The error occurred, because using a file name and comparing equality of
compilation time and caching time is not the most reliable verification.

Author

@volodya-lombrozo Do you have an example of context? Should it be code or diagram?

While developing [EO](https://github.com/objectionary/eo) we found caching errors in `eo-maven-plugin`
for EO version `0.34.0`. The error occurred, because using a file name and comparing equality of
compilation time and caching time is not the most reliable verification. Unit tests were written showing that
cache does not work correctly. Also reading a file was necessary for getting a programme name
Member

@Yanich96

"Unit tests were written to demonstrate that the cache does not function correctly. Additionally, reading a file was required to obtain a program name, which slowed down the assembly process."

By the way, what is the "assembly process"? A reader might not be familiar with this term.

compilation time and caching time is not the most reliable verification. Unit tests were written showing that
cache does not work correctly. Also reading a file was necessary for getting a programme name
that slowed down an assembly.
That we came to conclusion that we need caching with a reliable verification which does not require reading a file
Member

@Yanich96

  1. "Came to conclusion" should be "came to the conclusion".
  2. "which does not require reading a file from disk" could be rephrased to "that does not require reading a file from a file system".
  3. "And using cache" should be "And using a cache"

That we came to conclusion that we need caching with a reliable verification which does not require reading a file
from disk. And using cache should save us enough time for building a project.

The goal of this article is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
Member

@Yanich96 This sentence might be connected with the previous one: "The subject of this article is caching."

@Yanich96
Author

@volodya-lombrozo check please



## Introduction
In [EO](https://github.com/objectionary/eo) a caching is used to speed up program execution.
Member

@Yanich96

In [EO](https://github.com/objectionary/eo), caching is used to speed up program execution.
  • "caching" is uncountable here, you don't need an article "a"
  • You need a comma after "In EO"

Member

@Yanich96 "program execution" or "program compilation" ?


## Introduction
In [EO](https://github.com/objectionary/eo) a caching is used to speed up program execution.
While developing [EO](https://github.com/objectionary/eo) we found a caching
Member

@Yanich96 No need to mention EO twice:

While developing [EO](https://github.com/objectionary/eo)

In the previous sentence you already mentioned it. Just:

Recently we found an error...

Btw, an "error"? Maybe it's just a "bug"?

In [EO](https://github.com/objectionary/eo) a caching is used to speed up program execution.
While developing [EO](https://github.com/objectionary/eo) we found a caching
[error](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
for EO version `0.34.0`. The error occurred because the cache was searched for the needed file using
Member

@Yanich96
Please, use active voice, instead of passive:

The error occurred because the cache was searched for the needed file using 
 a comparison of compilation time and caching time.

In many styles of writing, active voice is preferred over passive voice for clarity and easier reading.

a comparison of compilation time and caching time.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
[Unit tests](https://github.com/objectionary/eo/pull/2749) were written to show that the
Member

@Yanich96
Please, use active voice, instead of passive:

The error occurred because the cache was searched for the needed file using 
 a comparison of compilation time and caching time.

In many styles of writing, active voice is preferred over passive voice for clarity and easier reading.

cache does not work correctly. Additionally, reading a file was necessary to obtain a program name
that slowed down the build process.
That we came to the conclusion that we need caching with a reliable verification method
that does not require reading a file system. Using a cache should save us enough time for building a project.
Member

@Yanich96 "from a file system"?

This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
[Unit tests](https://github.com/objectionary/eo/pull/2749) were written to show that the
cache does not work correctly. Additionally, reading a file was necessary to obtain a program name
Member

@Yanich96 [Question] To be honest, I don't think we need such details in this post. They only confuse a reader. Maybe it's better to omit them? What do you think?

cache does not work correctly. Additionally, reading a file was necessary to obtain a program name
that slowed down the build process.
That we came to the conclusion that we need caching with a reliable verification method
that does not require reading a file system. Using a cache should save us enough time for building a project.
Member

@Yanich96

Using a cache should save us enough time for building a project.

We already use some cache. So it already saves some time.
btw, "enough" - How much is that?

@Yanich96
Author

@volodya-lombrozo check please

the compilation time and caching time to search for the needed file.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
That we came to the conclusion that we need caching with a reliable verification method
Member

@Yanich96 "That...that" - too many "that"s. You can omit the first one.

Recently we found a caching
[bug](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
for EO version `0.34.0`. The error occurred because the algorithm compared
the compilation time and caching time to search for the needed file.
Member

@Yanich96
This sentence is a bit strange:

The error occurred because the algorithm compared
the compilation time and caching time to search for the needed file.

Which algorithm? Which file do you mean?

This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
That we came to the conclusion that we need caching with a reliable verification method
that does not require reading a file system.
Member

@Yanich96 Does caching read a file system? Maybe "files from a file system?"

that does not require reading a file system.

The goal of this blog is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
and to create effective caching in [EO](https://github.com/objectionary/eo).
Member

@Yanich96 "create" -> "implement"

The goal of this blog is to research caching in frequently used build systems (`ccache`, `Maven`, `Gradle`)
and to create effective caching in [EO](https://github.com/objectionary/eo).

<!--more-->
Member

@Yanich96 "More"?

according to the rules.
At the end of its work, the compiler optimizes the resulting machine code and produces an object file.
To speed up compilation, different files of the same project are compiled in parallel,
that is, we receive several object files at once.
Member

@Yanich96 This is redundant:

that is, we receive several object files at once

To speed up compilation, different files of the same project are compiled in parallel,
that is, we receive several object files at once.

3) After all received project object files are passed to the linker.
Member

@Yanich96 What does it mean:

After all received project object files are passed to the linker.

Is it "After all, received project object files are passed to the linker."
or "After, all received project object files are passed to the linker." ?
Why "received"? Which "project" do you mean?

Maybe it's better to just use "Then, object files are passed to the linker.", or better:
"Then linker <...do something...> with object files" (active voice)?

that is, we receive several object files at once.

3) After all received project object files are passed to the linker.
Linker is a program that combines program components, written in assembly language or a high-level programming language,
Member

@Yanich96 I thought that the linker combines object files?



`ccache` has two main caching methods:
1) `Direct mode` - hashcode is generated based on the source code.
Member

@Yanich96 Which "hashcode" do you mean? You give the definition below, and that paragraph positioning is very confusing. I have to skip this part and then return to it afterwards.

The hashcode includes information: file contents, directory, compiler information, compilation time, extensions
used by the compiler. A compressed machine code file is placed in the cache using the received key.

`Direct mode` compiles the program faster, since the preprocessor step is skipped.
Member

@Yanich96 You explain the two modes by using this template:

1) Direct mode - hashcode is generated based on the source code.
2) Preprocessor mode - hashcode is generated based on the result of preprocessor.
3) Direct mode compiles the program faster...
4) Preprocessor mode is slower...

Looks strange, maybe it's better to explain one mode and then move to the other?

1) Direct mode - hashcode is generated based on the source code.
3) Direct mode compiles the program faster...
2) Preprocessor mode - hashcode is generated based on the result of preprocessor.
4) Preprocessor mode is slower..

@Yanich96
Author

@volodya-lombrozo check please

Member

@volodya-lombrozo volodya-lombrozo left a comment

@Yanich96 Could you please link this PR with the issue you are trying to solve?

</p>

1) First, preprocessor gets the input files. The input files are source files (.cpp) and header files (.h).
The result is a single edited file with human-readable code that the compiler will get.
Member

@Yanich96 Which format does the output file have?

The result is a single edited file with human-readable code that the compiler will get.


2) The compiler receives the finished code file and converts it into machine code, presented in an object file.
Member

@Yanich96 "finished" code? What does it mean?

To speed up compilation, different files of the same project are compiled in parallel.

3) Then, the linker gets object files.
Linker is a program that combines object files into an executable file or library.
Member

@Yanich96 What do you think if we just add a link to the linker description, instead of explaining it here?


3) Then, the linker gets object files.
Linker is a program that combines object files into an executable file or library.
The result of the linker is an executable .exe file.
Member

@Yanich96 Maybe it's better to quote .exe? What do you think?

This machine code is then combined into one executable file.


`ccache` uses hashcode to find cached files. The hashcode includes information:
Member

@Yanich96 [Question] I'm not sure here, but it seems that "hashcode" isn't a frequently used term. From the Hash Function definition:

The values returned by a hash function are called hash values, hash codes, hash digests, digests, or simply hashes.

Author

@volodya-lombrozo Thanks, I will use "hash algorithm", as the ccache documentation does.

`ccache` has two main caching methods:
1) `Direct mode` - hashcode is generated based on the source code.
`Direct mode` compiles the program faster, since the preprocessor step is skipped.
However,the header files are not checked for changes, so the wrong project may be built.
Member

@Yanich96 What are "wrong" and "right" projects here?

Author

@volodya-lombrozo
"the wrong project" is a project built with unverified header files.
"the right project" is a project built with verified header files.

Member

@Yanich96 Maybe we can clarify it in the text?

2) `Preprocessor mode` - hashcode is generated based on the result of preprocessor.
`Preprocessor mode` is slower than `direct mode`, but the right project is built always.

`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in the cloud.
Member

@Yanich96 Do you mean some particular cloud? ("the")

`Preprocessor mode` is slower than `direct mode`, but the right project is built always.

`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in the cloud.
And it also has fixed some bugs (for example, there is a check of header files, which makes direct mode more accurate).
Member

@Yanich96 You have some problem with tense here (grammar)


### Maven
[Maven](https://maven.apache.org) automates and manages Java-project builds.
Building a project in `Maven` is completed in three
Member

@Yanich96 in three maven LifeCycles Maven. "Maven, maven"

<img src="/images/defaultPhaseMaven.svg">
</p>

In `Maven` all phases and goals are executed strictly in order, linearly.
Member

@Yanich96 So, Maven doesn't use caching at all?

Author

@volodya-lombrozo As far as I understand, Maven can use extensions from Gradle for caching, or it can rebuild only the changed project modules.

Member

@Yanich96 Maven has the `.m2` folder at least. It keeps all downloaded dependencies in this folder, so it's some sort of caching too.
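
The `.m2` point can be sketched in code. This is a rough illustration of the standard local repository layout (`groupId` split on dots, then `artifactId`, then `version`), not of Maven's own resolver. The coordinates `com.example:demo:1.0` and the helper names `local_repo_path` and `is_cached` are hypothetical.

```python
from pathlib import Path


def local_repo_path(group_id: str, artifact_id: str, version: str,
                    packaging: str = "jar",
                    repo: Path = Path.home() / ".m2" / "repository") -> Path:
    """Build the path where an artifact would sit in the local repository."""
    group_dir = Path(*group_id.split("."))  # com.example -> com/example
    file_name = f"{artifact_id}-{version}.{packaging}"
    return repo / group_dir / artifact_id / version / file_name


def is_cached(group_id: str, artifact_id: str, version: str,
              repo: Path = Path.home() / ".m2" / "repository") -> bool:
    """A dependency counts as cached here if its file already exists locally."""
    return local_repo_path(group_id, artifact_id, version, repo=repo).exists()
```

In this model a second build finds the file on disk and skips the download, which is the "some sort of caching" mentioned above.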

@Yanich96
Author

@volodya-lombrozo check please

To speed up the assembly of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
Let's look at the assembly scheme using C++ as an example
to imagine the build process in compiled languages:
Member

@Yanich96 Maybe we can change "Imagine" to "Visualize"? What do you think?
BTW, "to imagine the build process in compiled languages" looks redundant.

<img src="/images/defaultCPhase.svg">
</p>

1) First, preprocessor gets the input files. The input files are source files `.cpp` and header files `.h`.
Member

@Yanich96 Wdyt?
"First, the preprocessor retrieves the source code files, which consist of both source files .cpp and header files .h."

</p>

1) First, preprocessor gets the input files. The input files are source files `.cpp` and header files `.h`.
The result is a single edited file `.cpp` with human-readable code that the compiler will get.
Member

@Yanich96 "edited"? Seems redundant.

The result is a single edited file `.cpp` with human-readable code that the compiler will get.


2) The compiler receives the edited code file `.cpp` and converts it into machine code, presented in an object file.
Member

@Yanich96 Is an "object file" machine code?

Author

@volodya-lombrozo "object file" is machine code.



2) The compiler receives the edited code file `.cpp` and converts it into machine code, presented in an object file.
At the compilation stage, parsing occurs, which checks whether the code matches
Member

@Yanich96 Wdyt about "parsing checks ..."

This time is spent on loading of libraries, preparing, optimizing, checking the code, and so on.
To speed up the assembly of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
Let's look at the assembly scheme using C++ as an example
Member

@Yanich96 To be honest, I have some doubts about this paragraph where you discuss "compilation steps":

  1. First of all this is a blog about caching, not about compilation
  2. You describe compilation incompletely. What about optimizations? Moreover, modern compilers usually convert source code to some sort of IR, like LLVM IR, for example. You can take a look at how clang works.

Author

@volodya-lombrozo I believe that a reader will better understand this article if we briefly talk about the stages of compilation of the presented build systems. Yes, I describe compilation incompletely, but enough to indicate where caching works.

Member

@volodya-lombrozo volodya-lombrozo Mar 19, 2024

@Yanich96 Then, it's good to mention it.

I describe compilation incompletely, but enough to indicate where caching works.

Author

@volodya-lombrozo "The goal is to implement effective caching in EO.
For this, we will briefly look at how frequently used build systems (ccache, Maven, Gradle) work
in order to better understand the ideas behind caching in them."
Is it ok?

This machine code is then combined into one executable file.


`ccache` hash algorithm, for the hashing of information to find cached files fast.
Member

@Yanich96 This sentence is inconsistent with the previous one. Moreover, it seems we have grammar errors here.

Author

@volodya-lombrozo The last two sentences (lines 52-55) were unnecessary, so I need to delete them. Thanks, I have found the grammar error.

1) `Direct mode` - hash is generated based on the source code.
`Direct mode` compiles the program faster, since the preprocessor step is skipped.
However,the header files are not checked for changes, so the project may be built with not verified header files.
2) `Preprocessor mode` - hash is generated based on the result of preprocessor.
Member

@Yanich96 Is it some particular result ("the")? Which preprocessor do you mean?
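
The difference between the two modes, as the draft describes them, can be shown with a toy model. This is only a sketch, not ccache's actual implementation: the preprocessor below handles nothing but `#include "name"` lines, SHA-256 stands in for whatever hash ccache uses, and the names `direct_key` and `preprocessor_key` are hypothetical. The ccache manual also describes direct mode recording hashes of the included files in a manifest, which this model leaves out.

```python
import hashlib
from typing import Dict


def digest(text: str) -> str:
    """Hash a piece of text; SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def preprocess(source: str, headers: Dict[str, str]) -> str:
    """Toy preprocessor: replace each `#include "name"` line with the header text."""
    prefix = '#include "'
    out = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix) and stripped.endswith('"'):
            out.append(headers[stripped[len(prefix):-1]])
        else:
            out.append(line)
    return "\n".join(out)


def direct_key(source: str, headers: Dict[str, str]) -> str:
    """Direct mode in this model: hash the source text only, headers are ignored."""
    return digest(source)


def preprocessor_key(source: str, headers: Dict[str, str]) -> str:
    """Preprocessor mode in this model: hash the preprocessor output."""
    return digest(preprocess(source, headers))
```

With the same `.cpp` text and a changed header, `direct_key` stays the same while `preprocessor_key` changes. That is the "not verified header files" risk discussed in this thread.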

`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
And `sccache` includes support for caching the compilation of C/C++ code,
[Rust](https://github.com/mozilla/sccache/blob/main/docs/Rust.md), as well as NVIDIA's CUDA using
[nvcc](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html),
Member

@Yanich96 What is nvcc? You didn't mention it before.

And `sccache` includes support for caching the compilation of C/C++ code,
[Rust](https://github.com/mozilla/sccache/blob/main/docs/Rust.md), as well as NVIDIA's CUDA using
[nvcc](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html),
and [clang](https://llvm.org/docs/CompileCudaWithLLVM.html), while `ccache` works with C and C++ code.
Member

@Yanich96 What do you mean:

ccache works with C and C++ code.

As I understand, ccache works with compilers, not with code.

@Yanich96
Author

@volodya-lombrozo check please

In [EO](https://github.com/objectionary/eo), caching is used to speed up program compilation.
Recently we found a caching
[bug](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
for EO version `0.34.0`. The bug occurred because the old verification method
Member

@Yanich96 It's better to say: "The bug occurred because the old verification method used compilation time and caching time to search for a cached file"

This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
We came to the conclusion that we need caching with a reliable verification method.
And this verification method should not use the information that the cached file contains.
Member

@Yanich96

"Furthermore, this verification method should refrain from reading the file content."
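
The verification idea discussed here can be sketched as follows. This is a minimal illustration under stated assumptions, not the `eo-maven-plugin` implementation: the cache key is assumed to be a SHA-256 hash of the source text, and the names `cache_key`, `store`, `lookup`, and `demo` are hypothetical. A hit is decided from the key alone, so neither timestamps nor the content of the cached file take part in the check.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def cache_key(source_text: str) -> str:
    """Derive the key from the source content (an assumption of this sketch)."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def store(cache_dir: Path, source_text: str, compiled: str) -> Path:
    """Save a compilation result under the key of its source."""
    target = cache_dir / cache_key(source_text)
    target.write_text(compiled, encoding="utf-8")
    return target


def lookup(cache_dir: Path, source_text: str) -> Optional[Path]:
    """Return the cached path on a hit, None on a miss.

    Only the existence of the keyed path is checked; the cached file is not opened.
    """
    candidate = cache_dir / cache_key(source_text)
    return candidate if candidate.exists() else None


def demo() -> Tuple[bool, bool]:
    """Store one result, then look up the same source and a changed one."""
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        store(cache, "source v1", "compiled v1")
        hit = lookup(cache, "source v1") is not None
        miss = lookup(cache, "source v2") is None
        return hit, miss
```

Unlike a comparison of compilation time and caching time, this check does not depend on when either file was written.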

## Build caching of existing build systems

### ccache/sccache
In compiled programming languages, building a project containing many source code files takes a long time.
Member

@Yanich96

"containing" -> "with"

<img src="/images/defaultCPhase.svg">
</p>

1) First, preprocessor retrieves the source code files,
Member

@Yanich96 You only say that "preprocessor" only retrieves the source code files. And then... magic...:

The result is a single file .cpp with human-readable code that the compiler will get.

Moreover, you don't need "compiler will get"

1) First, preprocessor retrieves the source code files,
which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the edited code file `.cpp` and converts it into object file - `.obj`.
Member

@Yanich96 "edited code file" looks strange. Please, be more concrete here. Which file? Moreover, "edited" isn't frequently used in texts.

The result of the linker is an executable `.exe` file.


`ccache` has hash algorithm, for the hashing of information to find cached files fast.
Member

@Yanich96 This paragraph is disconnected from the previous one. You only talked about compilation and then suddenly started with ccache. Please read these two paragraphs and you will see.

### Maven
[Maven](https://maven.apache.org) automates and manages Java-project builds.
Building a project in `Maven` is completed in three
[Maven LifeCycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
Member

@Yanich96 Are you sure here? Yes, Maven definitely has three life cycles. However, to build a project you might use the only one.

Moreover, "Building a project in Maven is completed" has some particular meanings. None of them are suitable here.

<img src="/images/defaultPhaseMaven.svg">
</p>

In `Maven` all phases and goals are executed strictly in order, linearly.
Member

`Maven` suggests rebuilding only changed project modules to speed up the build process.

### Gradle
But unlike `Maven`, [Gradle](https://gradle.org) builds projects using a task graph -
Member

@Yanich96 Just "Unlike Maven..."


### Gradle
But unlike `Maven`, [Gradle](https://gradle.org) builds projects using a task graph -
[Directed Acyclic Graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph),
Member

@Yanich96 Maybe it's better to give a link to a "Gradle task graph" instead? Why do I need to read about DAGs?

@Yanich96
Author

@volodya-lombrozo check please

Furthermore, this verification method should refrain from reading the file content.

The goal is to implement effective caching in EO.
To achieve the goal, we will briefly look at how frequently used build systems (such as ccache, Maven, Gradle)
Member

@Yanich96 "frequently used", maybe it's better to use "well-known"? wdyt?


The goal is to implement effective caching in EO.
To achieve the goal, we will briefly look at how frequently used build systems (such as ccache, Maven, Gradle)
in order to gain a deeper understanding of the caching concepts employed within them and to development caching in EO.
Member

@Yanich96 "and to development caching in EO" is redundant.

1) First, preprocessor retrieves the source code files,
which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the file `.cpp` from the preprocessor and converts it into object file - `.obj`.
Member

@Yanich96 "converts it" -> "compiles it"

Member

@Yanich96 "object file" -> "an object file" ?


Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
This mode allows to build the program faster, since the preprocessor step is skipped.
Member

@Yanich96 Do we really skip the "preprocessor step" here? Or is it just a mode that doesn't require the preprocessor?

Author

@Yanich96 Yanich96 Mar 25, 2024

@volodya-lombrozo I have fixed the text so: "
When compiling a file, its hash is calculated.
If the file is already present in the registry of compiled files, the file will not be compiled again.
Instead, the previously compiled binary file will be utilized.
This approach can significantly accelerate the build process of certain packages, reducing build times by 5-10 times.
The ccache hash is
based on:

  • the file contents
  • the current directory of the file
  • the name of the compiler
  • the compiler’s size and modification time
  • extensions used by the compiler.

Moreover, ccache has two types of the hashing:

  1. Direct mode - the hash is generated based on the source code only.
    When using this mode, the user must ensure that the external libraries used in a project have not changed.
    Otherwise, the project will fail to build, resulting in errors.
  2. Preprocessor mode - hash is generated based on the .cpp file received after the preprocessor step.
    Preprocessor mode is slower than direct mode, but the project is built without compile errors.."
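
To make the list of hash inputs above more concrete, here is a minimal sketch of how such a key could be computed. It is only an illustration: the class `CacheKeySketch`, the method `key`, its parameters, and the choice of SHA-256 are all assumptions made for this example and are not taken from ccache itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of a ccache-like cache key; not ccache's real code. */
final class CacheKeySketch {
    private CacheKeySketch() {
    }

    /** Hashes the file contents together with the compiler-related inputs. */
    static String key(final byte[] contents, final String directory,
        final String compiler, final long compilerSize,
        final long compilerMtime, final String extensions) {
        final MessageDigest digest;
        try {
            // SHA-256 is an assumption of this sketch, chosen because the JDK ships it.
            digest = MessageDigest.getInstance("SHA-256");
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
        digest.update(contents);
        final String[] parts = {
            directory, compiler, Long.toString(compilerSize),
            Long.toString(compilerMtime), extensions,
        };
        for (final String part : parts) {
            // A zero byte separates the fields, so "ab" + "c" differs from "a" + "bc".
            digest.update((byte) 0);
            digest.update(part.getBytes(StandardCharsets.UTF_8));
        }
        final StringBuilder hex = new StringBuilder();
        for (final byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

With a key like this, a cache lookup reduces to checking whether an entry stored under the same key already exists; any change to the file or to the compiler produces a different key and therefore a cache miss.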

Author

@volodya-lombrozo I added an explanation of what ccache does: "When compiling a file, its hash is calculated.
If the file is already present in the registry of compiled files, the file will not be compiled again.
Instead, the previously compiled binary file will be utilized.
This approach can significantly accelerate the build process of certain packages, reducing build times by 5-10 times." And I changed the sentences about modes.

Member

@Yanich96 I like it. Except the last sentence "but the project is built without compile errors.." - It might be removed.

Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
This mode allows to build the program faster, since the preprocessor step is skipped.
When using this mode, the user must be sure that the external libraries, using in a project, have not changed.
Member

@Yanich96 "When using this mode, the user must ensure that the external libraries used in a project have not changed."

`Gradle` uses this information to determine if a task is up-to-date and needs to perform any work.

To understand how `Incremental build` works, consider the following steps:
1) Before executing a task for the first time, `Gradle` takes a
Member

@Yanich96 "Before executing a task for the first time" -> "Before executing a task"

`Gradle` uses this information to determine if a task is up-to-date and needs to perform any work.

To understand how `Incremental build` works, consider the following steps:
1) Before executing a task for the first time, `Gradle` takes a
Member

@Yanich96 takes? Maybe it's better to say "calculates"? wdyt?

Author

@volodya-lombrozo I used "take" because this verb is used in the Gradle documentation.

1) Before executing a task for the first time, `Gradle` takes a
[fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
of the path and contents of the source files and saves it.
2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
Member

@Yanich96

  1. the fingerprint
  2. "of the path and contents" - better to remove it, you already mentioned it above.
  3. Where does it save the fingerprint?

Author

"Where does it save the fingerprint?" - I don't know. The Gradle documentation has this sentence: "Gradle persists both fingerprints for the next time the task is executed." Nothing is said about the save location.

[fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
of the path and contents of the source files and saves it.
2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
3) Before each rebuilding of the task, `Gradle` generates a new fingerprint of the source files
Member

@Yanich96 "Then, when Gradle starts a project build again, it generates a new fingerprint for the same files. If the new fingerprint has not changed, Gradle can safely skip this task."

Author

@volodya-lombrozo "Then, when Gradle starts a project build again, it generates a new fingerprint for the same files.
If the new fingerprint has not changed, Gradle can safely skip this task can reuse outputs.
In the opposite case, the task needs to perform an action and to rewrite outputs.
The fingerprint is considered current if the last modification time
and the size of the source files have not changed." - is it ok?

Member

@Yanich96 You don't need this: "can reuse outputs.
In the opposite case, the task needs to perform an action and to rewrite outputs.
The fingerprint is considered current if the last modification time
and the size of the source files have not changed." It's obvious.

If none of the inputs or outputs have changed, Gradle can skip that task.
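
The fingerprint comparison discussed in this thread can be pictured with a small sketch. It assumes, following the Gradle documentation quoted in the review, that a fingerprint holds the paths of the files plus a hash of each file's contents. The names `FingerprintSketch`, `fingerprint`, and `canSkip` are hypothetical and are not part of Gradle's API, and SHA-256 is an assumption of the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an input fingerprint; not Gradle's real API. */
final class FingerprintSketch {
    private FingerprintSketch() {
    }

    /** Maps each file path to a hash of that file's contents. */
    static Map<String, String> fingerprint(final List<Path> files) {
        final Map<String, String> result = new TreeMap<>();
        for (final Path file : files) {
            try {
                // SHA-256 is an assumption of this sketch.
                final MessageDigest digest = MessageDigest.getInstance("SHA-256");
                final byte[] hash = digest.digest(Files.readAllBytes(file));
                result.put(file.toString(), Base64.getEncoder().encodeToString(hash));
            } catch (final IOException ex) {
                throw new UncheckedIOException(ex);
            } catch (final NoSuchAlgorithmException ex) {
                throw new IllegalStateException(ex);
            }
        }
        return result;
    }

    /** The task may be skipped only when the saved and fresh fingerprints match. */
    static boolean canSkip(final Map<String, String> saved,
        final Map<String, String> fresh) {
        return saved.equals(fresh);
    }
}
```

In this sketch, a build would save the result of `fingerprint` after running a task and compare it with a freshly computed one on the next run; an added, removed, or edited file makes the two maps differ, so the task runs again.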


In addition to `Incremental build`, `Gradle` also stores fingerprints of previous builds, enabling quick project builds,
Member

@Yanich96 What is the difference with the cache that you described above?

Author

@volodya-lombrozo this cache is for various branches, above - for project of one branch

Member

@Yanich96 Ok, let's leave it, but I have some doubts about the description. For me, it's not clear from the first glance.


Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
This mode allows to build the program faster, since the preprocessor step is skipped.
When using this mode, the user must be sure that the external libraries, using in a project, have not changed.
Otherwise, the project will build with errors.
2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
`Preprocessor mode` is slower than `direct mode`, but the project is built without errors.
Member

@Yanich96 Why do you need to mention compile errors here? Which errors do you mean? If you mean errors related to libraries, it's better to say so explicitly. Otherwise, it's better just to remove this sentence.

2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
`Preprocessor mode` is slower than `direct mode`, but the project is built without errors.

`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
Member


`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
And `sccache` supports a wider range of languages, while `ccache` focuses on caching C and C++ compiler.

Member

@Yanich96 I mean ccache and sccache altogether. What is the difference with other types of caching? Why did you choose these tools?

showcasing how inputs and outputs are specified to enable `Incremental build`:
```
task myTask {
inputs.dir 'src/main/java/MyTask.somebody' // Specify the input directory
Member

@Yanich96 good



In this chapter, we introduce the keywords:
* `the source file`: This file serves as the input for goal operations.
Member

@Yanich96 Why do you use "the" ? Is it some particular source file which we know, or which you introduced previously?

Author

@volodya-lombrozo I will remove this text.

* `the cached file`: This file contains the results of goal's execution.


The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`,
Member

@Yanich96 It's untrue. They are just interfaces. Why do they "derive" from the SafeMojo?

Member

@Yanich96 Please provide links to these files in the repository.

Author

@volodya-lombrozo It's my mistake. There was SafeMojo in another sentence.

this.validations = cv;
}

public Optional<XML> load(final Path source, final Path cache) {...};
Member

@Yanich96 Maybe it's better to make Path cache a field? Since you are using it in all the methods.

The `CacheValidation` interface has the only method ensuring that each validation contains a specific test condition.

```
public interface CacheValidation {
Member

@Yanich96 I didn't grasp the idea why we might need this class and why it has exactly this implementation.


### Conclusion
In this article, we explored various build systems and their caching methods.
We were motivated to find an efficient caching approach for EO due to issues discovered during bug investigation.
Member

@volodya-lombrozo volodya-lombrozo Mar 26, 2024

@Yanich96 To be honest, I don't understand why you "explored various build systems". This analysis is completely disconnected from the solution you propose. You describe high-level concepts in other build systems but don't draw any conclusions from them. You merely mention their existence. Why? Why should we read about them? Where is the connection with the second part of the text?

Later, you just mentioned that we have some goals and some Mojos in Maven. Why? Why do we need to read about them? What is the purpose of this?

Please, pause for a moment and consider these questions:

What is the purpose of this blog post? Why are you writing it, and why should people read it?
At which level of abstraction do you want to discuss? High-level caching mechanisms like ccache, or low-level CacheValidation implementation?

@Yanich96
Author

@volodya-lombrozo check please

Member

@volodya-lombrozo volodya-lombrozo left a comment

@Yanich96

I would suggest adding a conclusion after each build system you mentioned:

  1. for ccache
  2. for Gradle
  3. for maven

Maybe you will need to move some conclusions from "EO build cache".

"EO build cache" should be a conclusion for the entire blog post where you describe how we will implement caching (on a high-level) in EO.


P. S. I understand that you are trying to use different terms from different systems:

  • fingerprint
  • hash
  • key

But sometimes it's hard to keep in mind that they mean the same thing, which is a bit confusing.


<!--more-->

## Build caching of existing build systems
Member

@Yanich96 What about "Caching in Build Systems" ? or " Caching in Other Build Systems".

Author

@volodya-lombrozo I will choose " Caching in Other Build Systems"

At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
To speed up compilation, different files of the same project might be compiled in parallel.
3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) gets object files.
Member

@Yanich96 You might use a different verb instead of "gets". It actually does something with these files. Combines? Resolves?

To speed up the build of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
`ccache` uses the hash algorithm for the hashing of code at certain stages of the build.
`ccache` uses the hash to save a code in the cache.
Member

@Yanich96 You might remove this sentence.

Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
When using this mode, the user must ensure that the external libraries used in a project have not changed.
Otherwise, the project will fail to build, resulting in errors.
Member

@Yanich96 "will" -> "might"

2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
3) Then, when Gradle starts a project build again, it generates a new fingerprint for the same files.
If the new fingerprint has not changed, Gradle can safely skip this task.
In the opposite case, the task needs to perform an action and to rewrite outputs.
Member

@Yanich96 What about: "In the opposite case, the task performs an action again and rewrites outputs"



In addition to `Incremental build`, `Gradle` also stores fingerprints of previous builds, enabling quick project builds,
for example when switching from one branch to another. This feature is known as
Member

@Yanich96 Please specify which branch you mean. Is it a "git" branch?

Author

@volodya-lombrozo yes, "git" branch

Each `module` has its own `pom.xml` file, and there is an aggregator `pom.xml` that consolidates all the `modules`.
This plugin takes a key for a `module`, it encapsulates the essential aspects of the `module`,
including the source code and the configuration of the plugins used within it.
`Modules` with the same key are current or unchanged and the cache can efficiently restore them.
Member

@Yanich96 This sentence doesn't make sense to me: "Modules with the same key are current or unchanged and the cache can efficiently restore them.".
By the way, what is the "key"?

Author

@volodya-lombrozo The key is a hash. Can you explain your point: "This sentence doesn't make sense to me"? Do you mean this is obvious?

Member

@Yanich96 I don't understand the word "current" here.

This plugin takes a key for a `module`, it encapsulates the essential aspects of the `module`,
including the source code and the configuration of the plugins used within it.
`Modules` with the same key are current or unchanged and the cache can efficiently restore them.
Conversely, the cache seamlessly delegates the build work to the standard Maven core,
Member

@Yanich96 "Conversely"?

These caching interfaces shared similar logic, but with minor differences.
For instance, `Footprint` verifies the EO version of the compiler, whereas the remaining checks are identical.
Additionally, the conditions for searching data in the cache had errors.
The cached file is considered valid if the end time of goal's execution
Member

@Yanich96 You don't need to explain it again. I've already mentioned this bug above. Could you please remove this sentence?

from the file attributes without reading the file context.


### Conclusion
Member

@Yanich96 I don't think we need such a conclusion in a blog post. It isn't a scientific article. Moreover it doesn't provide any useful information. Kinda "water".

@Yanich96
Author

Yanich96 commented Apr 2, 2024

@volodya-lombrozo I have fixed this post.

  1. I added "between goals" in the sentence in introduction "Recently we found a caching bug between goals in eo-maven-plugin for EO version 0.34.0." to make it clear at what level the EO code works.
  2. I have added conclusions after each build system.

Check please.

Member

@volodya-lombrozo volodya-lombrozo left a comment

@Yanich96, please thoroughly review the entire text at least once (preferably more) before submitting it for review. This will greatly expedite our review process.


`ccache` is a high-level tool and cannot work with individual compilation tasks,
therefore `ccache` is not suitable for solving our problems.
However, the concept of non-local data storage could potentially be incorporated during the development of the EO.
Member

@Yanich96 "of the EO cache"

Member

or "EO caching implementation"?

Author

@volodya-lombrozo I meant EO in general. If "EO caching implementation" is better, I will fix it.



`ccache` is a high-level tool and cannot work with individual compilation tasks,
therefore `ccache` is not suitable for solving our problems.
Member

@Yanich96 I wouldn't say this:

therefore ccache is not suitable for solving our problems.

What about this:

ccache cannot work with individual compilation tasks (...for example...). However, the hashing approach and the concept of non-local data storage could potentially be incorporated during the development of the EO caching mechanism.

Author

@volodya-lombrozo
"ccache is a high-level tool and cannot work with individual compilation tasks (e.g. Maven goal or Gradle task).
However, the hashing approach and the concept of non-local data storage could potentially
be incorporated during the development of the EO caching mechanism."
is it ok?

Member

@Yanich96 Why do you need to mention this:

is a high-level tool and

?

`Gradle` employs
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
to speed up project builds.
For an incremental build to work, the tasks used to build the project must have specified
Member

@Yanich96 The second sentence clearly explains the idea you are trying to convey here. I would suggest combining these two sentences into a single one, or just removing this sentence. What do you think?


To understand how `Incremental build` works, consider the following steps:
1) Before executing a task, `Gradle` takes a hash of the path and contents of the inputs files and saves it.
The hash is considered current if the last modification time
Member

@Yanich96 "current" is a strange word here. I guess you meant something different.

Author

@volodya-lombrozo "valid" is better?

Member

@Yanich96 I'm not sure what you mean here, but I guess, yes.

1) Before executing a task, `Gradle` takes a hash of the path and contents of the inputs files and saves it.
The hash is considered current if the last modification time
and the size of the source files have not changed.
2) Then `Gradle` executes the task and saves a hash of the path and contents of the output files.
Member

@Yanich96 Is it a single hash for all the files?

Author

@Yanich96 Yanich96 Apr 11, 2024

@volodya-lombrozo Yes. It is written in the Gradle documentation: "Gradle takes a fingerprint of the inputs. This fingerprint contains the paths of input files and a hash of the contents of each file. Gradle then executes the task."

Should I add this clarification to the blog post?

Member

@Yanich96 It would be nice



Maven's caching mechanisms operate at the level of `phases` and individual project modules.
Therefore, existing caching systems in Maven do not align with our requirements for resolving present issues.
Member

@Yanich96 You haven't explained the caching in this section at all. How is the hash generated, and which data is required to generate it? File content, file path, last modification time?
You have a rather good section for ccache. I believe we might use the same structure for Maven and Gradle.

These tasks happen one after the other, and each task relies on the output of the one before it.
Each task has directories for input and output data, as well as a directory for storing cached data.
Using the program name, each task can receive and store data.

Member

@Yanich96 Why do you need two consecutive empty lines here? If you need some logical division, use headings and clear sections.

Member

@Yanich96 Same question.

leading to redundancy and complicating the caching infrastructure.


To address caching challenges in EO, we closely examined existing caching systems. However, we cannot use them.
Member

@Yanich96 Why?

However, we cannot use them.

We can utilize parts of existing solutions, such as the hash generation algorithm from ccache and Gradle's task caching approach for our compilation steps.

To address caching challenges in EO, we closely examined existing caching systems. However, we cannot use them.
We require a caching mechanism at the level of `goals`.
In fact, we don't need to invent a new caching mechanism for EO.
Instead, it suffices to verify the last modification time of the files involved in EO compilation.
Member

@Yanich96 Why "last modification time"? We need more examples here, but it looks like a totally unreliable approach.

The modification time of the preceding task must not exceed that of the subsequent one.
As each task possesses directories for input and output data, accessing the desired file
via an absolute path enables retrieval of essential information, as file name and last modified time,
from the file attributes without reading the file context.
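
The check described in this paragraph could look roughly like the sketch below. The helper name `is_fresh` and its parameters are hypothetical and are not taken from the EO code base; the sketch only assumes that a cached file is reused when it exists and is at least as new as the file it was produced from.

```python
from pathlib import Path


def is_fresh(source, cached):
    """Treat the cached file as valid when it is at least as new as its source.

    Hypothetical helper: only file attributes are inspected, contents are never read.
    """
    source, cached = Path(source), Path(cached)
    if not cached.exists():
        return False
    # st_mtime_ns is the last modification time taken from the file attributes.
    return source.stat().st_mtime_ns <= cached.stat().st_mtime_ns
```

Note that a check like this depends entirely on timestamps: touching a source file without changing its content invalidates the cache, and a copy that preserves an old timestamp can make a changed file look unchanged.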
Member

@Yanich96, could you write down the cache usage algorithm as you have done it in the Gradle section? Please outline the steps as you "see" them.

@Yanich96
Author

@volodya-lombrozo check please

which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the file `.cpp` from the preprocessor and compiles it into an object file - `.obj`.
At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
Member

@Yanich96 Did you mean "parser" instead of "parsing"?

Author

@volodya-lombrozo yes, thanks

The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the file `.cpp` from the preprocessor and compiles it into an object file - `.obj`.
At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
At the end, the compiler optimizes the resulting machine code and produces an object file.
Member

@Yanich96 You already mentioned it:

The compiler receives the file .cpp from the preprocessor and compiles it into an object file - .obj.

[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work),
to speed up project builds.
For an incremental build to work, the tasks used to build the project must have specified
input and output files.
Member

@Yanich96 "To enable an incremental build, the tasks that build the project must specify their input and output files."

```


To understand how `Incremental build` works, consider the following steps:
Member

@Yanich96 Something strange is happening here with punctuation. Did you put these sentences in this order intentionally?

Author

@volodya-lombrozo If I replace "To understand how Incremental build works, consider the following steps" with "How Incremental build works", will that be OK?

Member

@volodya-lombrozo volodya-lombrozo May 15, 2024

@Yanich96 Is it possible to remove this sentence?


To understand how `Incremental build` works, consider the following steps:
`Incremental build` uses a hash to detect changes in the inputs and the outputs.
The single hash contains the paths and the contents of all the input files or output files.
Member

@Yanich96 "contains"? Maybe "uses"?

Author

@volodya-lombrozo The Gradle documentation says: "This fingerprint contains the paths of input files and a hash of the contents of each file."


`Gradle Incremental build` can manage separate compilation tasks based on inputs and outputs.
And the EO compiler consists from a unit of work in `Maven` (the last section contains a detailed description).
Steps of the EO compiler can have input and output files.
Member

@Yanich96 Why did you write these two sentences about EO?


In Maven, the `phases` are inherently interconnected within the build lifecycle.
A `phase` represents a specific task, and the execution order of `phases` is determined by the default Maven
lifecycle bindings. Each `phase` functions as a series of individual tasks known as `goals`.
Member

@Yanich96 You have already described phases and goals above. Could you please remove this redundancy and repetition?

Author

@volodya-lombrozo I will remove lines 130-131:

Each lifecycle consists of phases and these phases consist of sets of goals.
One phase can consist of several goals.

functionality to plugins for the standard lifecycle, but with significantly fewer dependencies. This plugin leverages
[The Takari Incremental API](https://github.com/takari/io.takari.incrementalbuild),
which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
top-level classes that implement specific build actions, denoted as methods annotated `@Builder`.
Member

@Yanich96 Do we really need to know these low-level details about Takari?

Member

@Yanich96 The question remains

It does not use hashing algorithms, which can slow down project build times,
and it does not have separate cache directories.
Each `builder` has own directories for input and output data related to their work.
The operational principle of the Takari Incremental API is similar to the operation of caching in EO.
Member

@Yanich96 We don't know about caching in EO yet.

Member

@Yanich96 Still an issue.



`Gradle Incremental build` can manage separate compilation tasks based on inputs and outputs.
And the EO compiler consists from a unit of work in `Maven` (the last section contains a detailed description).
Member

@Yanich96 You can add a link to the Maven section.

It's also possible to add a new `goal` to a desired phase by modifying the `pom.xml` file.
Additionally, Maven also supports `goals` that are not bound to any build phase
and can be executed outside the build lifecycle, directly through the command line.
The sequence of achieving `goals` is as follows:
Member

@Yanich96 I'm not sure we should explain the Maven lifecycle in such depth. I would keep only the necessary information. For the rest, you could provide a link to the documentation.

functionality to plugins for the standard lifecycle, but with significantly fewer dependencies. This plugin leverages
[The Takari Incremental API](https://github.com/takari/io.takari.incrementalbuild),
which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
top-level classes that implement specific build actions, denoted as methods annotated `@Builder`.
Member

@Yanich96 The question remains


Special attention should be given to the Takari Incremental API.
This API can be applied to cache EO compilation stages as it operates with `goals`.
It does not use hashing algorithms, which can slow down project build times,
Member

@Yanich96 How is that possible? Does it really cache anything if it doesn't use hashing?

Author

@volodya-lombrozo The Takari checks the last modification time of the input files. It doesn't create a hash.

Or did I not understand the question?

Member

@Yanich96 It would be good to mention it here:

The Takari checks the last modification time of the input files. It doesn't create a hash.

It does not use hashing algorithms, which can slow down project build times,
and it does not have separate cache directories.
Each `builder` has own directories for input and output data related to their work.
The operational principle of the Takari Incremental API is similar to the operation of caching in EO.
Member

@Yanich96 Still an issue.

</p>


<p align="center">
Member

@Yanich96, I agree that we need to remove the redundancy in the code. For example, we should combine the Footprint and Optimization methods and fix the hash comparison mechanism. As for checking the "previous" step, I completely disagree. First of all, some steps might be skipped, and more importantly, doing this significantly increases coupling between the phases, which is a significant architectural flaw.

Member

@Yanich96 This comment is still relevant.

@Yanich96
Author

@volodya-lombrozo check please

It's also possible to add a new `goal` to a desired phase by modifying the `pom.xml` file.
Additionally, Maven also supports `goals` that are not bound to any build phase
and can be executed outside the build lifecycle, directly through the command line.

Member

@Yanich96 Do you need this empty line?

Author

@volodya-lombrozo I guess so, because lines 122-128 are about Maven and lines 131-150 are about caching mechanisms in Maven.

which introduces the concept of `builders`. These `builders` are user-provided public non-abstract
top-level classes that implement specific build actions.
They can produce various types of outputs, including generated/output files on the filesystem,
build messages, and project model mutations. For each `builder` annotated method, a maven mojo,
Member

@Yanich96 I didn't understand this sentence:

For each `builder` annotated method, a maven mojo, which represents a maven `goal`, is generated.

which represents a maven `goal`, is generated.
When a `builder` is run for a given set of inputs, it produces and saves to the specified directory the same outputs.
Any changes in the inputs result in the removal of outputs.

Member

@Yanich96 Do you need this empty line?


Special attention should be given to the Takari Incremental API.
This API can be applied to cache EO compilation stages as it operates with `goals`.
It does not use hashing algorithms, which can slow down project build times,
Member

@Yanich96 It would be good to mention it here:

The Takari checks the last modification time of the input files. It doesn't create a hash.

These tasks happen one after the other, and each task relies on the output of the one before it.
Each task has directories for input and output data, as well as a directory for storing cached data.
Using the program name, each task can receive and store data.

Member

@Yanich96 Same question.

1) We create EO program, named "example".
Intermediate files during compilation of this program will have the same name, but not the format
(e.g. `example.eo`, `example.xml`).
2) When the EO compiler compiles this program task, it saves files of compilation steps into cache.
Member

@Yanich96 "compiler compiles"

Author

@volodya-lombrozo Is "compiler assembles" OK?

</p>


<p align="center">
Member

@Yanich96 This comment is still relevant.

@Yanich96
Author

@volodya-lombrozo check please

* Employing multiple caching mechanisms for similar entities is a suboptimal practice,
leading to redundancy and complicating the caching infrastructure.

In tackling caching challenges within EO, we conducted a thorough evaluation of current caching systems.
Member

@Yanich96 There is a lot of "water" in this paragraph. Could you simplify this text, please?

for storing and retrieving data from the cache.
The logic for checking the relevance of cached data is presented below:
1) We create EO program, named "example".
Intermediate files during compilation of this program will have the same name, but not the format
Member

@Yanich96 Why do we need to have a different format?

1) We create EO program, named "example".
Intermediate files during compilation of this program will have the same name, but not the format
(e.g. `example.eo`, `example.xml`).
When the EO compiler assembles this program task, it saves files of compilation steps into cache.
Member

@Yanich96 What is the "program task"?

When the EO compiler assembles this program task, it saves files of compilation steps into cache.
Each compilation step has its own caching directory and an input file directory.
2) When the EO compiler starts a project build again, it will check if there is the input file, named "example",
in the cache of step. If such a file exists,
Member

@Yanich96 "cache of step"?

Each compilation step has its own caching directory and an input file directory.
2) When the EO compiler starts a project build again, it will check if there is the input file, named "example",
in the cache of step. If such a file exists,
then it is enough to check that the last modification time of cached file at the current step
Member

@Yanich96 Which files do you compare? According to this text, you compare input with output. Their modification times will definitely be different.

4) If the EO program file [Picture 5](/images/RewritingInCacheEO1.svg)
or any input file [Picture 6](/images/RewritingInCacheEO2.svg) have changed,
then the previously cached files become invalid.
In this case, the compilation step performs an action again and rewrites outputs.
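
One way to read the steps in this excerpt is the sketch below: a single compilation step either reuses its cached file or runs its action again and rewrites the cached file. The function `run_step`, its parameters, and the directory layout are hypothetical and only illustrate the described logic; they are not the EO compiler's actual code.

```python
from pathlib import Path


def run_step(source, cache_dir, transform, suffix):
    """Return (cached path, rebuilt flag) for one compilation step.

    Hypothetical sketch: the cached file keeps the program name and
    only changes its format, e.g. example.eo -> example.xml.
    """
    source = Path(source)
    cached = Path(cache_dir) / (source.stem + suffix)
    if cached.exists() and source.stat().st_mtime_ns <= cached.stat().st_mtime_ns:
        return cached, False  # cache hit: the action is skipped
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_text(transform(source.read_text()))
    return cached, True  # cache miss: the action ran and the cache was rewritten
```

For example, `run_step("example.eo", "cache/parse", str.upper, ".xml")` would write `cache/parse/example.xml` on the first call and skip the action on a second call, as long as `example.eo` has not been modified in between.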
Member

@Yanich96 "rewrite outputs"? Do you mean cache here?

@volodya-lombrozo
Member

@yegor256 Could you take a look, please?

@yegor256
Member

@Yanich96 I appreciate the study you conducted of existing caching tools! Great work! To make this blog post more impressive and informative, I would suggest slightly modifying its structure. How about this one:

  • What is "build caching" for?
  • What existing approaches to build caching do we know? (illustrate each approach with an example and a link to a system that uses it)
  • What approach did we choose for EO and why?
  • How to manage build caching in EO? (show examples of turning it off, checking its status, etc.)
  • What are the known limitations (possible downsides) of our solution?
  • How much performance gain does build caching give us in EO projects?

WDYT? @volodya-lombrozo

@volodya-lombrozo
Member

@Yanich96 I appreciate the study you conducted of existing caching tools! Great work! To make this blog post more impressive and informative, I would suggest slightly modifying its structure. How about this one:

  • What is "build caching" for?
  • What existing approaches to build caching do we know? (illustrate each approach with an example and a link to a system that uses it)
  • What approach did we choose for EO and why?
  • How to manage build caching in EO? (show examples of turning it off, checking its status, etc.)
  • What are the known limitations (possible downsides) of our solution?
  • How much performance gain does build caching give us in EO projects?

WDYT? @volodya-lombrozo

@yegor256 I think we need to stick to the scope of this article without adding anything new. It's already large. So, I would exclude these points:

  • What are the known limitations (possible downsides) of our solution?
  • How much performance gain does build caching give us in EO projects?

With all the rest I completely agree.

@Yanich96 @yegor256 I would also recommend moving these changes into a separate PR, since this one takes too long to load on my laptop. (Because there are already many comments, I guess.)

Development

Successfully merging this pull request may close these issues.

Blog post about caching
3 participants