Feat(#56) blog about caching #58

Closed
285 changes: 285 additions & 0 deletions _posts/2024/2024-02-06-about-caching-in-eo.md
@@ -0,0 +1,285 @@
---
layout: post
date: 2024-02-06
title: "Build cache in EO and other build systems"
author: Alekseeva Yana
---


## Introduction
In [EO](https://github.com/objectionary/eo), caching is used to speed up program compilation.
Recently we found a caching
[bug](https://github.com/objectionary/eo/issues/2790) in `eo-maven-plugin`
for EO version `0.34.0`. The bug occurred because the old verification method
Member

@Yanich96 It's better to say: "The bug occurred because the old verification method used compilation time and caching time to search for a cached file"

used compilation time and caching time to search for a cached file.
This is not the most reliable verification method,
because caching time does not have to be equal to compilation time.
We came to the conclusion that we need caching with a reliable verification method.
Furthermore, this verification method should refrain from reading the file content.

The goal is to implement effective caching in EO.
To achieve the goal, we will briefly look at how frequently used build systems (such as ccache, Maven, Gradle)
Member

@Yanich96 "frequently used", maybe it's better to use "well-known"? wdyt?

in order to gain a deeper understanding of the caching concepts employed within them and to development caching in EO.
Member

@Yanich96 "and to development caching in EO" is redundant.


<!--more-->
Member

@Yanich96 "More"?


## Build caching of existing build systems
Member

@Yanich96 What about "Caching in Build Systems" ? or " Caching in Other Build Systems".

Author

@volodya-lombrozo I will choose " Caching in Other Build Systems"


### ccache/sccache
Member

@Yanich96 Is it a build system or what? Where is the link? Short description?

In compiled programming languages, building a project with many source code files takes a long time.
This time is spent on loading of libraries, preparing, optimizing, checking the code, and so on.
Let's look at the assembly scheme using C++ as an example:

<p align="center">
<img src="/images/defaultCPhase.svg">
</p>

1) First, preprocessor retrieves the source code files,
Member

@Yanich96 You only say that "preprocessor" only retrieves the source code files. And then... magic...:

The result is a single file .cpp with human-readable code that the compiler will get.

Moreover, you don't need "compiler will get"

which consist of both source files `.cpp` and header files `.h`.
The result is a single file `.cpp` with human-readable code that the compiler will get.
2) The compiler receives the file `.cpp` from the preprocessor and converts it into object file - `.obj`.
Member

@Yanich96 "converts it" -> "compiles it"

Member

@Yanich96 "object file" -> "an object file" ?

At the compilation stage, parsing checks whether the code matches rules of a specific programming language.
Member

@Yanich96 Did you mean "parser" instead of "parsing"?

Author

@volodya-lombrozo yes, thanks

At the end, the compiler optimizes the resulting machine code and produces an object file.
Member

@Yanich96 You already mentioned it:

The compiler receives the file .cpp from the preprocessor and compiles it into an object file - .obj.

To speed up compilation, different files of the same project might be compiled in parallel.
3) Then, the [Linker](https://en.wikipedia.org/wiki/Linker_(computing)) gets object files.
Member

@Yanich96 Again. Linker gets something... magic... we have an executable file.

Member

@Yanich96 You might use a different verb instead of "gets". It actually does something with these files. Combines? Resolves?

The result of the linker is an executable `.exe` file.


To speed up the build of compiled languages, [ccache](https://ccache.dev)
and [sccache](https://github.com/mozilla/sccache) are used.
`ccache` hashes the code at certain stages of the build
and uses the resulting hash as the key under which the compilation result is stored in the cache.
The [`ccache` hash](https://ccache.dev/manual/4.8.2.html#_common_hashed_information) is
based on:
* the file contents
* the current directory of the file
* the name of the compiler
* the compiler’s size and modification time
* extensions used by the compiler.
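
As a rough illustration of the list above, the sketch below folds the same kinds of inputs into a single cache key. It is not ccache code: the class `CacheKeySketch`, the method `key`, and its parameters are hypothetical names, and SHA-256 is only a stand-in hash chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: class and method names are hypothetical, not ccache code.
final class CacheKeySketch {

    // Combines the inputs from the list above into one hex-encoded cache key.
    static String key(final byte[] contents, final String directory,
        final String compiler, final String extension,
        final long compilerSize, final long compilerMtime) {
        final MessageDigest digest;
        try {
            // SHA-256 is a stand-in picked for this sketch.
            digest = MessageDigest.getInstance("SHA-256");
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
        digest.update(contents);
        // A zero byte after each text field keeps adjacent fields from merging.
        for (final String text : new String[] {directory, compiler, extension}) {
            digest.update(text.getBytes(StandardCharsets.UTF_8));
            digest.update((byte) 0);
        }
        digest.update(
            ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(compilerSize)
                .putLong(compilerMtime)
                .array()
        );
        final StringBuilder hex = new StringBuilder();
        for (final byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Two calls with identical arguments return the same key, so the second compilation can be served from the cache; changing any argument, for example the compiler's modification time, yields a different key.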

Moreover, `ccache` has two types of the hashing:
1) `Direct mode` - the hash is generated based on the source code only.
This mode allows to build the program faster, since the preprocessor step is skipped.
Member

@Yanich96 Do we really skip the "preprocessor step" here? Or is it just a mode that doesn't require the "preprocessor"?

Author

@Yanich96 Yanich96 Mar 25, 2024

@volodya-lombrozo I have fixed the text so: "
When compiling a file, its hash is calculated.
If the file is already present in the registry of compiled files, the file will not be compiled again.
Instead, the previously compiled binary file will be utilized.
This approach can significantly accelerate the build process of certain packages, reducing build times by 5-10 times.
The ccache hash is
based on:

  • the file contents
  • the current directory of the file
  • the name of the compiler
  • the compiler’s size and modification time
  • extensions used by the compiler.

Moreover, ccache has two types of the hashing:

  1. Direct mode - the hash is generated based on the source code only.
    When using this mode, the user must ensure that the external libraries used in a project have not changed.
    Otherwise, the project will fail to build, resulting in errors.
  2. Preprocessor mode - hash is generated based on the .cpp file received after the preprocessor step.
    Preprocessor mode is slower than direct mode, but the project is built without compile errors.."

Author

@volodya-lombrozo I added an explanation of what ccache does: "When compiling a file, its hash is calculated.
If the file is already present in the registry of compiled files, the file will not be compiled again.
Instead, the previously compiled binary file will be utilized.
This approach can significantly accelerate the build process of certain packages, reducing build times by 5-10 times." And I changed the sentences about modes.

Member

@Yanich96 I like it. Except the last sentence "but the project is built without compile errors.." - It might be removed.

When using this mode, the user must be sure that the external libraries, using in a project, have not changed.
Member

@Yanich96 "When using this mode, the user must ensure that the external libraries used in a project have not changed."

Otherwise, the project will build with errors.
Member

@Yanich96 "Otherwise, the project will fail to build, resulting in errors."

2) `Preprocessor mode` - hash is generated based on the `.cpp` file received after the preprocessor step.
`Preprocessor mode` is slower than `direct mode`, but the project is built without errors.
Member

@Yanich96 This is strange statement. What if we have an error in the code?

Author

@volodya-lombrozo Is it ok: "Preprocessor mode is slower than direct mode, but the project is built without compile errors."?

Member

@Yanich96 Why do you need to mention compile errors here? Which errors do you mean? If you mean the errors related to libraries, it's better to specify that. Otherwise, it's better just to remove this sentence.

Author

@volodya-lombrozo I will delete this sentence.


`Sccache`, unlike `ccache`, allows you to store cached files not only locally, but also in a cloud data storage.
Member

@Yanich96 Maybe it is worth mentioning that sccache is the same as ccache? And only after that describe the differences.

Author

@volodya-lombrozo "Sccache is similar in purpose to ccache but provides more functionality.
Sccache allows to store cached files not only locally, but also in a cloud data storage.
And sccache supports a wider range of languages, while ccache focuses on caching C and C++ compiler." - is it ok?

Member

And `sccache` supports a wider range of languages, while `ccache` focuses on caching C and C++ compiler.

Member

@Yanich96 Maybe we need to write a short summary 1-2 sentences about this type of caching?

Author

@volodya-lombrozo The principle of caching in sccache is the same as in ccache (using Direct and Preprocessor modes), the only difference is in the places where the data is stored.

Member

@Yanich96 I mean ccache and sccache altogether. What is the difference with other types of caching? Why did you choose these tools?

Author

@volodya-lombrozo I wrote above that I looked at well-known used build systems. Isn't this enough?


### Gradle
[Gradle](https://gradle.org) builds projects using a
[task graph](https://docs.gradle.org/current/userguide/build_lifecycle.html) that allows for synchronous execution
of certain tasks.
`Gradle` employs
[Incremental build](https://docs.gradle.org/current/userguide/incremental_build.html#sec:how_does_it_work)
to speed up project builds.
For an incremental build to work, the tasks used to build the project must have specified
Member

@Yanich96 Could you please simplify this sentence and use simple active voice?

Author

@volodya-lombrozo "The tasks that build the project must have input and output files for an incremental build to work." - is it ok?

Member

@Yanich96 The second sentence clearly explains the idea you are trying to convey here. I would suggest combining these two sentences into a single one, or just removing this sentence. What do you think?

source and output files.
The provided code snippet demonstrates the implementation of a custom task in Gradle,
showcasing how inputs and outputs are specified to enable `Incremental build`:
```
task myTask {
inputs.dir 'src/main/java/MyTask.somebody' // Specify the input directory
Member

@Yanich96 MyTask.somebody looks like a file, not a directory.

Author

@volodya-lombrozo I have fixed this example:

task myTask {
    inputs.file 'src/main/java/MyTask.somebody' // Specify the input file
    outputs.file 'build/classes/java/main/MyTask.somebody' // Specify the output file
    
    doLast {
        // Task actions go here
        // This code will only be executed if the inputs or outputs have changed
    }
}

Member

@Yanich96 good

outputs.dir 'build/classes/java/main/MyTask.somebody' // Specify the output directory

doLast {
// Task actions go here
// This code will only be executed if the inputs or outputs have changed
}
}
```
`Gradle` uses this information to determine if a task is up-to-date and needs to perform any work.
Member

@Yanich96 Who "needs to perform any work" ?

Author

@volodya-lombrozo I think that I will delete this sentence


To understand how `Incremental build` works, consider the following steps:
Member

@Yanich96 Something strange is happening here with punctuation. Did you put these sentences in this order intentionally?

Author

@volodya-lombrozo If I replace "To understand how Incremental build works, consider the following steps" with "How Incremental build works", will it be ok?

Member

@volodya-lombrozo volodya-lombrozo May 15, 2024

@Yanich96 Is it possible to remove this sentence?

1) Before executing a task for the first time, `Gradle` takes a
Member

@Yanich96 "Before executing a task for the first time" -> "Before executing a task"

Member

@Yanich96 takes? Maybe it's better to say "calculates"? wdyt?

Author

@volodya-lombrozo I used "take" because this verb is used in the Gradle documentation

[fingerprint](https://en.wikipedia.org/wiki/Fingerprint_(computing))
of the path and contents of the source files and saves it.
2) Then `Gradle` executes the task and saves a fingerprint of the path and contents of the output files.
Member

@Yanich96

  1. the fingerprint
  2. "of the path and contents" - better to remove it, you already mentioned it above.
  3. Where does it save the fingerprint?

Author

"Where does it save the fingerprint?" - I don't know. the Gradle documentation has the sentence: "Gradle persists both fingerprints for the next time the task is executed." Nothing is said about the save location.

3) Before each rebuilding of the task, `Gradle` generates a new fingerprint of the source files
Member

@Yanich96 "Then, when Gradle starts a project build again, it generates a new fingerprint for the same files. If the new fingerprint has not changed, Gradle can safely skip this task."

Author

@volodya-lombrozo "Then, when Gradle starts a project build again, it generates a new fingerprint for the same files.
If the new fingerprint has not changed, Gradle can safely skip this task can reuse outputs.
In the opposite case, the task needs to perform an action and to rewrite outputs.
The fingerprint is considered current if the last modification time
and the size of the source files have not changed." - is it ok?

Member

@Yanich96 You don't need this: "can reuse outputs.
In the opposite case, the task needs to perform an action and to rewrite outputs.
The fingerprint is considered current if the last modification time
and the size of the source files have not changed." It's obvious.

and compares it with the current fingerprint.
The fingerprint is considered current if the last modification time
Member

@Yanich96 This sentence makes no sense to me. If you have an intention to describe "fingerprint". It's better to do above where you left the link to the fingerprint definition. You might say that "fingerprint is based on last modification time and size of the source files"

Author

@volodya-lombrozo I did this:

  1. Before executing a task, Gradle takes a
    fingerprint
    of the path and contents of the inputs files and saves it.
    The fingerprint is considered current if the last modification time
    and the size of the source files have not changed.
  2. Then Gradle executes the task and saves a fingerprint of the path and contents of the output files.
  3. Then, when Gradle starts a project build again, it generates a new fingerprint for the same files.
    If the new fingerprint is current, Gradle can safely skip this task.
    In the opposite case, the task performs an action again and rewrites outputs.

I believe that the sentence "fingerprint is based on last modification time and size of the source files" is not good idea because fingerprint is based on the path and the contents of a file.

and the size of the source files have not changed.
If none of the inputs or outputs have changed, Gradle can skip that task.
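
The steps above can be sketched in a few lines of Java. This is not Gradle's API: `FingerprintSketch`, `fingerprint`, and `runIfChanged` are hypothetical names, SHA-256 is an assumed hash, and for brevity the sketch tracks only input files and keeps the saved fingerprint in memory rather than persisting it between builds.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: not Gradle's API; all names here are hypothetical.
final class FingerprintSketch {

    // Fingerprint saved by the previous run; null before the first run.
    private byte[] saved;

    // Digest over the path and contents of every input file, in sorted order.
    static byte[] fingerprint(final List<Path> inputs) throws IOException {
        final MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (final NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
        final List<Path> sorted = new ArrayList<>(inputs);
        Collections.sort(sorted);
        for (final Path file : sorted) {
            digest.update(file.toString().getBytes(StandardCharsets.UTF_8));
            digest.update(Files.readAllBytes(file));
        }
        return digest.digest();
    }

    // Runs the task only when the inputs differ from the saved fingerprint.
    // Returns true if the task was executed and false if it was skipped.
    boolean runIfChanged(final List<Path> inputs, final Runnable task)
        throws IOException {
        final byte[] current = fingerprint(inputs);
        if (Arrays.equals(current, this.saved)) {
            return false;
        }
        task.run();
        this.saved = current;
        return true;
    }
}
```

Calling `runIfChanged` twice with untouched inputs executes the task once and skips it the second time; editing any input file makes the next call execute the task again.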


In addition to `Incremental build`, `Gradle` also stores fingerprints of previous builds, enabling quick project builds,
Member

@Yanich96 What is the difference with the cache that you described above?

Author

@volodya-lombrozo this cache is for various branches, above - for project of one branch

Member

@Yanich96 Ok, let's leave it, but I have some doubts about the description. For me, it's not clear from the first glance.

for example when switching from one branch to another. This feature is known as
Member

@Yanich96 Please, specify which branch do you mean. Is it a "git" branch?

Author

@volodya-lombrozo yes, "git" branch

the [Build Cache](https://docs.gradle.org/current/userguide/build_cache.html).


### Maven
[Maven](https://maven.apache.org) automates and manages Java-project builds.
`Maven` is based on the concept of
[Maven LifeCycles](https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html),
which include default, clean, and site lifecycles.
Each lifecycle consists of `phases` and these `phases` consist of sets of `goals`.

In Maven, there are default phases and goals for building any project:

<p align="center">
<img src="/images/defaultPhaseMaven.svg">
</p>

By default, the `phases` in Maven are inherently connected within the build lifecycle.
Each `phase` represents a specific task, and the execution order of `goals` within `phases` is determined
Member

@Yanich96 I thought that:

  • a goal is a specific task, not a phase
  • execution order of phases is determined by Maven
  • execution order of goals is determined by a developer

by the default Maven lifecycle bindings. This means that while each `phase` operates as a series of individual tasks,
they are part of a cohesive build lifecycle, and their execution order is predefined by Maven.
Member

@Yanich96 Please, remove it "they are part of a cohesive build lifecycle,"



`Maven` supports `Incremental build` through the `takari-lifecycle-plugin` and
the `maven-build-cache-extension`.
The [takari-lifecycle-plugin](http://takari.io/book/40-lifecycle.html) is an alternative to the default Maven lifecycle
(building JAR files). Its distinctive feature is the use of a single universal plugin with the same functionality
as five separate plugins for the standard lifecycle, but with significantly fewer dependencies. As a result,
Member

@Yanich96 Why do we need to know that there are exactly "five" separate plugins?

it provides a much faster startup, more optimal operation, and lower resource consumption.
This leads to a significant increase in performance when compiling complex projects with a large number of modules.
Member

@Yanich96 Where is a "cache" here? You mentioned that it's a build tool only.


The [maven-build-cache-extension](https://maven.apache.org/extensions/maven-build-cache-extension/)
is used for large Maven projects that have a significant number of small modules.
This extension computes a key for each project module; the key encapsulates the essential aspects of the module,
including the source code and the configuration of the plugins used within it.
Projects with the same key are considered current (unchanged) and can be efficiently restored from the cache.
Conversely, projects that generate different keys are deemed outdated (changed),
prompting the cache to initiate a complete rebuild for them. In the event of a cache miss,
where an outdated project requires a complete rebuild,
the cache seamlessly delegates the build work to the standard Maven core,
without interfering with the build execution logic.
This ensures that only the changed modules within the project are rebuilt,
minimizing unnecessary overhead and optimizing the build process.
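
The hit-or-miss behaviour described above can be pictured with a small sketch. It does not use the extension's real API: `ModuleCacheSketch` and `build` are hypothetical names, the key is assumed to be computed elsewhere, and artifacts are represented by plain strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative only: not the extension's API; names are hypothetical.
final class ModuleCacheSketch {

    // Maps a module key to the artifact produced for that key.
    private final Map<String, String> artifacts = new HashMap<>();

    // On a hit the stored artifact is returned; on a miss the regular build
    // runs and its result is stored under the key.
    String build(final String key, final Supplier<String> regularBuild) {
        final String cached = this.artifacts.get(key);
        if (cached != null) {
            return cached;
        }
        final String built = regularBuild.get();
        this.artifacts.put(key, built);
        return built;
    }
}
```

The cache never changes how the regular build works: it only decides whether that build has to run at all.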

Member

@Yanich96 What is the conclusion? Why did you mention Maven? Is this caching similar to Gradle? To ccache? What is the difference?


### EO build cache

EO uses `Maven` to build projects.
For this purpose, there is the `eo-maven-plugin` containing the essential goals for working with EO code.
As previously mentioned, the build of projects in Maven follows a specific order of phases.
Below is a diagram illustrating the main phases and their corresponding goals for the EO:

<p align="center">
<img src="/images/EO.svg">
</p>

In [Picture 3](/images/EO.svg) the goals of the `eo-maven-plugin` are highlighted in green.


However, the actual work with EO code takes place in `AssembleMojo`.
`AssembleMojo` is the goal consisting of other goals that work with the EO file, as shown in
[Picture 4](/images/AssembleMojo.svg).


<p align="center">
<img src="/images/AssembleMojo.svg">
</p>

Each goal within `AssembleMojo` is a distinct compilation step for EO code.
These tasks happen one after the other, and each task relies on the output of the one before it.
To speed up the EO program rebuild process, it is helpful to save the results of each goal.
This avoids repeating actions and makes the compilation more efficient.
Member

@Yanich96 Why do you constantly say this: "This avoids repeating actions and makes the compilation more efficient. Using caching methods significantly speeds up the build process."? We know why we use caching. It's totally redundant say it over and over again.

Using caching methods significantly speeds up the build process.

Member

@Yanich96 Why do you need two consecutive empty lines here? If you need some logical division, use headings and clear sections.

Member

@Yanich96 Same question.


In this chapter, we introduce the keywords:
Member

@Yanich96 It's not a book. We don't usually have "chapters" in a blog post.

* `the source file`: This file serves as the input for goal operations.
Member

@Yanich96 Why do you use "the" ? Is it some particular source file which we know, or which you introduced previously?

Author

@volodya-lombrozo I will remove this text.

* `the cached file`: This file contains the results of a goal's execution.


The previous caching mechanism in EO made use of distinct interfaces, specifically `Footprint` and `Optimization`,
Member

@Yanich96 It's untrue. They are just interfaces. Why do they "derive" from the SafeMojo?

Member

@Yanich96 Please provide links to these files in the repository.

Author

@volodya-lombrozo It's my mistake. There was SafeMojo in another sentence.

both of which derive from the `SafeMojo` class.
These caching interfaces shared similar logic, but with minor differences.
For instance, `Footprint` verifies the EO version of the compiler, whereas the remaining checks are identical.
Additionally, the conditions for searching data in the cache had errors.
The cached file is considered valid if the end time of goal's execution
Member

@Yanich96 You don't need to explain it again. I've already mentioned this bug above. Could you please remove this sentence?

and the time of saving goal's result to the cache are equal.
Due to this issue, the program behaved incorrectly, because saving the goal's result to the cache is not instantaneous.
After conducting an in-depth analysis of the project's incorrect operation,
several disadvantages of the previous caching mechanism in EO were brought to light:
* Incorrect search conditions for data in the cache.
* The verification method requires reading the file content, which results in inefficiencies.
* The presence of multiple caching mechanisms creates challenges in identifying and rectifying caching errors.
* Employing multiple caching mechanisms for similar entities is a suboptimal practice,
leading to redundancy and complicating the caching infrastructure.


To address these disadvantages, the following solutions are proposed:
1) Creating a unified caching mechanism for all goals involved in EO code compilation.
This mechanism, represented by the `Cache` class, will be responsible for data validation,
cache storage, and retrieval.
To support different data verification conditions,
the constructor of the `Cache` class will accept a list of validations.
Here's the corresponding code:

```
public class Cache {

    private List<CacheValidation> validations;

    public Cache(final List<CacheValidation> cv) {
        this.validations = cv;
    }

    public Optional<XML> load(final Path source, final Path cache) {...};
Member

@Yanich96 Maybe it's better to make Path cache a field? Since you are using it in all the methods.


    public void save(final Path cache, final Scalar<String> program, final Path relative) {...};
}
```

The `List<CacheValidation>` is a list of validations, each implementing the `CacheValidation` interface.
This interface defines the structure for validations within the `Cache` class.
The `CacheValidation` interface has a single method, so each validation encapsulates one specific test condition.

```
public interface CacheValidation {
Member

@Yanich96 I didn't grasp the idea why we might need this class and why it has exactly this implementation.

    boolean validate(final Path source, final Path cache) throws IOException;
}
```
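
As a rough sketch of how a list of validations could drive a cache lookup, consider the following. Everything beyond the names `Cache`, `CacheValidation`, `load`, and `validate` is an assumption for illustration: `String` stands in for `XML` to avoid external dependencies, and `EXISTS`, `demo`, and the file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

final class CacheSketch {

    // Same shape as the interface shown in the article.
    interface CacheValidation {
        boolean validate(Path source, Path cache) throws IOException;
    }

    // Simplified stand-in for the proposed Cache class.
    static final class Cache {
        private final List<CacheValidation> validations;

        Cache(final List<CacheValidation> cv) {
            this.validations = cv;
        }

        // Returns the cached content only if every validation passes.
        Optional<String> load(final Path source, final Path cache) {
            try {
                for (final CacheValidation validation : this.validations) {
                    if (!validation.validate(source, cache)) {
                        return Optional.empty();
                    }
                }
                return Optional.of(Files.readString(cache));
            } catch (final IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
    }

    // Hypothetical validation: the cached file must exist.
    static final CacheValidation EXISTS = (source, cache) -> Files.exists(cache);

    // Creates temporary files and loads the cached one through the Cache.
    static Optional<String> demo() {
        try {
            final Path dir = Files.createTempDirectory("cache-sketch");
            final Path source = dir.resolve("program.eo");
            final Path cached = dir.resolve("program.xmir");
            Files.writeString(source, "[] > app");
            Files.writeString(cached, "<program/>");
            return new Cache(List.of(EXISTS)).load(source, cached);
        } catch (final IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(final String[] args) {
        System.out.println(demo().orElse("cache miss"));
    }
}
```

Under this design, a goal that needs an extra condition (for example, a compiler version check) would pass one more `CacheValidation` to the constructor instead of introducing a separate caching class.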

2) To minimize disk access, we will work with file paths represented by the `Path` class.
Using methods of the `Path` and `Files` classes,
we can obtain the file name and the time of the last modification without reading the file content.
The file name is used to locate the cached file within the cache directory,
while the time of the last modification tells us whether
the source file is older than or the same age as the cached file.
Since the project build process in Maven is linear,
these conditions are sufficient for our caching mechanism.


3) Searching for cached data will use the following conditions:
* The source file and the cached file should have the same file name;
* Each goal involved in caching should have both a cache directory and a directory of result files.
The directory of result files corresponds to the directory of source files for the subsequent goal.
* The last modification time of the source file should be earlier than or equal to that of the cached file.
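
The file-name and modification-time conditions above could be expressed roughly as follows. This is a sketch under assumptions: the class `CacheConditions`, its method names, the directory names, and the timestamps are all hypothetical. It relies only on `Path.getFileName()` and `Files.getLastModifiedTime()`, which read metadata rather than file content.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;

final class CacheConditions {

    // The source file and the cached file share a file name.
    static boolean sameName(final Path source, final Path cache) {
        return source.getFileName().equals(cache.getFileName());
    }

    // The source was last modified no later than the cached file.
    static boolean notNewer(final Path source, final Path cache) {
        try {
            return Files.getLastModifiedTime(source)
                .compareTo(Files.getLastModifiedTime(cache)) <= 0;
        } catch (final IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // A cached file is usable only if it exists and both checks pass.
    static boolean usable(final Path source, final Path cache) {
        return Files.exists(cache) && sameName(source, cache) && notNewer(source, cache);
    }

    // Builds a source and a cached file with explicit timestamps (epoch seconds).
    static boolean demo(final long sourceSecond, final long cacheSecond) {
        try {
            final Path root = Files.createTempDirectory("eo-cache");
            final Path source = Files.createDirectories(root.resolve("results"))
                .resolve("program.xmir");
            final Path cache = Files.createDirectories(root.resolve("cache"))
                .resolve("program.xmir");
            Files.writeString(source, "<program/>");
            Files.writeString(cache, "<program/>");
            Files.setLastModifiedTime(source, FileTime.from(Instant.ofEpochSecond(sourceSecond)));
            Files.setLastModifiedTime(cache, FileTime.from(Instant.ofEpochSecond(cacheSecond)));
            return usable(source, cache);
        } catch (final IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(final String[] args) {
        // Source older than the cached file.
        System.out.println(demo(1_700_000_000L, 1_700_000_100L));
        // Source modified after the cached file was written.
        System.out.println(demo(1_700_000_200L, 1_700_000_100L));
    }
}
```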


Example: Let's consider an EO program named `program.eo`, which is executed for the first time.
Each goal will save its execution results in both the cache directory and the result directory.
When the program is run again without changes, these goals will receive data from the cache,
without executing the task or rewriting the result.
However, if we change the `program.eo` file or the source files of the goals and run the build again,
the execution results of the goals will be overwritten in the directory of result files and the cache directory.
This approach effectively protects the program from artificial changes during the build process.
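
The scenario can be simulated with a small sketch. The `Goal` class here is hypothetical: it "compiles" a file by upper-casing its text, which is only a placeholder for real work, and the directory names are assumptions. The source timestamp is pushed forward explicitly so that the demo does not depend on the clock granularity of the file system.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

final class CachedGoalDemo {

    // Hypothetical goal with a result directory and a cache directory.
    static final class Goal {
        private final Path results;
        private final Path cache;
        private int executions;

        Goal(final Path results, final Path cache) {
            this.results = results;
            this.cache = cache;
        }

        void run(final Path source) throws IOException {
            final Path cached = this.cache.resolve(source.getFileName());
            final boolean hit = Files.exists(cached)
                && Files.getLastModifiedTime(source)
                    .compareTo(Files.getLastModifiedTime(cached)) <= 0;
            if (!hit) {
                // Placeholder for the real task; both directories are updated.
                final String output = Files.readString(source).toUpperCase();
                Files.writeString(this.results.resolve(source.getFileName()), output);
                Files.writeString(cached, output);
                this.executions += 1;
            }
        }
    }

    // Returns how many times the goal actually executed over three runs.
    static int demo() {
        try {
            final Path root = Files.createTempDirectory("eo-goal");
            final Path source = root.resolve("program.eo");
            Files.writeString(source, "[] > app");
            final Goal goal = new Goal(
                Files.createDirectories(root.resolve("results")),
                Files.createDirectories(root.resolve("cache"))
            );
            goal.run(source); // first run: executes and fills the cache
            goal.run(source); // unchanged source: nothing is executed
            Files.writeString(source, "[] > main");
            final Path cached = root.resolve("cache").resolve("program.eo");
            // Make the source one second newer than the cached file.
            Files.setLastModifiedTime(
                source,
                FileTime.fromMillis(Files.getLastModifiedTime(cached).toMillis() + 1000L)
            );
            goal.run(source); // changed source: executes again
            return goal.executions;
        } catch (final IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(final String[] args) {
        System.out.println("executions: " + demo());
    }
}
```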


### Conclusion
Member

@Yanich96 I don't think we need such a conclusion in a blog post. It isn't a scientific article. Moreover it doesn't provide any useful information. Kinda "water".

In this article, we explored various build systems and their caching methods.
We were motivated to find an efficient caching approach for EO due to issues discovered during bug investigation.
Member

@volodya-lombrozo volodya-lombrozo Mar 26, 2024

@Yanich96 To be honest, I don't understand why you "explored various build systems". This analysis is completely disconnected from the solution you propose. You observe high-level concepts in other build systems but didn't draw any conclusions from them. You merely mention their existence. Why? Why should we read about them? Where is the connection with the second part of the text?

Later, you just mentioned that we have some goals and some Mojos in Maven. Why? Why do we need to read about them? What is the purpose of this?

Please, pause for a moment and consider these questions:

What is the purpose of this blog post? Why are you writing it, and why should people read it?
At which level of abstraction do you want to discuss? High-level caching mechanisms like ccache, or low-level CacheValidation implementation?

The previous caching mechanism was flawed logically and architecturally, making it ineffective.
As a result, we discussed the problems, suggested solutions, and outlined the criteria
for implementing a new caching system in EO.


49 changes: 49 additions & 0 deletions images/AssembleMojo.svg