Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LFX Term 3]: Full storage path for KCL third-party dependencies #384

Closed
zong-zhe opened this issue Jul 16, 2024 · 23 comments · Fixed by #419
Closed

[LFX Term 3]: Full storage path for KCL third-party dependencies #384

zong-zhe opened this issue Jul 16, 2024 · 23 comments · Fixed by #419
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@zong-zhe
Copy link
Contributor

zong-zhe commented Jul 16, 2024

1. Problem

  • Dependencies from different registries may have the same name, and the current way kpm uses file names can cause conflicts that rely on duplicate names.
  • The local saving path of the three-party library is case-sensitive, this has different effects on windows and unix systems.

2. Solution

In order to solve the problem, we redesigned the storage local system of the third-party dependencies with reference to rust

└── kpm
    ├── .kpm # all the configuration file for kpm client
    ├── git # all the kcl dependencies from git repo
    ├── oci # all the kcl dependencies from oci registry

2.1 Git

Under the subdir in kpm/git, the storage local system is

kpm/git
├── checkouts.   # checkout the specific version of git repository from cache bare repository 
│   ├── kcl-2a81898195a215f1
│   │   └── 33bb450. . # All the version of kcl package from git repository will be replaced with commit id
│   ├── kcl-578669463c900b87
│   │   └── 33bb450
└── db    # A bare git repository for cache git repo
    ├── kcl-2a81898195a215f1.      # <NAME>-<HASH> <NAME> is the name of git repo, 
    ├── kcl-578669463c900b87.   # <HASH> is calculated by the git full url.

NOTE:

  • <NAME>-<HASH> <NAME> is the name of git repo, <HASH> is calculated by the git url. <HASH> is mainly used to distinguish between different git repositories, such as github and gitlab, and the reason for using a hash value is mainly to mask the inconsistency of the symbols in the url and local path.

2.2 Oci

Under the subdir in kpm/oci, the storage local system is

kpm/oci
├── cache # the cache for KCL dependencies tar
│   ├── ghcr.io-2a81898195a215f1    # <HOST>-<HASH> HOST is the name of oci registry,  <HASH> is calculated by the oci full url.
│   │   └── k8s_1.29.tar    # the tar for KCL dependencies
│   ├── docker.io-578669463c900b87
│   │   └── k8s_1.28.tar
└── src                                              	
│   ├── ghcr.io-2a81898195a215f1
│   │   └── k8s_1.29    # the KCL dependencies tar will untar here
│   ├── docker.io-578669463c900b87
│   │   └── k8s_1.28

4. Implementation

@zong-zhe zong-zhe self-assigned this Jul 16, 2024
@zong-zhe zong-zhe added the enhancement New feature or request label Jul 16, 2024
@zong-zhe zong-zhe removed their assignment Jul 16, 2024
@zong-zhe zong-zhe added the help wanted Extra attention is needed label Jul 16, 2024
@zong-zhe
Copy link
Contributor Author

zong-zhe commented Jul 19, 2024

PreTest

  • Development of kpm current git module to expand its ability to download git bare repo.
  • Development of kpm current git module to expand its ability to checkout a source code repo from a git bare repo.
  • A unified dependency system is used to mask the underlying details, the file system only needs to provide the Get method externally. The unified file system automatically loads the cache and downloads.

@zong-zhe zong-zhe changed the title Enhancement: Full storage path for KCL third-party dependencies [LFX Term 3]: Full storage path for KCL third-party dependencies Jul 23, 2024
@Manoramsharma
Copy link
Contributor

HI @zong-zhe @Peefy I am excited to contribute to this issue. After going through the solution mentioned I have see following key take-aways:

  1. By generating unique hash values based on repository URLs, we can distinguish dependencies with identical names. This method not only avoids conflicts but also provides a consistent way to reference dependencies.
  2. Developing a system that abstracts the underlying details of fetching and caching dependencies. This system would expose a Get method, ensuring that the file system automatically manages caching and downloads. This would also standardize behavior across different operating systems.

In order to proceed forward with the pretest where I need to expand kpm's ability to download , chekout from a git bare repo. Which part of documentation I can refer for configuring this for kpm current git module?

@zong-zhe
Copy link
Contributor Author

Hi @Manoramsharma 😃

Welcome! Let me provide more details.

kpm currently uses a three-party library go-getter to complete git related work, go-getter will call your local git tool to complete the function, which means that you need to install git tools in your local, and refer to the git tool documentation to complete the configuration. With kpm, you don't need to do any additional configuration, and if your git clone runs successfully, kpm can add dependency with the git repo.

kpm and git

kpm and go-getter

@Manoramsharma
Copy link
Contributor

Okay, so kpm initially supported only the public git repo, now after the implementation of go-getter kpm supports cloning options for performing different git related operations. Right?

Now for our case, development of kpm current git module to expand its ability to download git bare repo do I need to modify the current git implementation by adding specific functions to the go-getter or the related file that will ultimately supports the checkout of specific version of git repository from cache bare repository? For the checkouts we will mention here:

kpm/git
├── checkouts.   # checkout the specific version of git repository from cache bare repository 
│   ├── kcl-2a81898195a215f1
│   │   └── 33bb450. . # All the version of kcl package from git repository will be replaced with commit id
│   ├── kcl-578669463c900b87
│   │   └── 33bb450
└── db    # A bare git repository for cache git repo
    ├── kcl-2a81898195a215f1.      # <NAME>-<HASH> <NAME> is the name of git repo, 
    ├── kcl-578669463c900b87.   # <HASH> is calculated by the git full url.

@zong-zhe am I thinking in a right direction?

@Manoramsharma
Copy link
Contributor

Proposal to Enhance kpm's current git module

To enhance the kpm current git module, I propose the following solutions to expand its capabilities:

1. Expand Ability to Download Git Bare Repositories

1.1 Define a Function for Cloning Bare Repositories

I will add a function to clone Git repositories as bare repositories. This function will use Git commands to perform the cloning and store the bare repository in the appropriate cache directory.

Example Implementation:

package git

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

// CloneBareRepo clones a Git repository as a bare repository.
func CloneBareRepo(repoURL string, cacheDir string) (string, error) {
	repoHash := calculateHash(repoURL) // Function to calculate a unique hash for the repo URL
	repoName := extractRepoName(repoURL) // Function to extract repository name from URL
	cachePath := filepath.Join(cacheDir, fmt.Sprintf("%s-%s", repoName, repoHash))

	if _, err := exec.LookPath("git"); err != nil {
		return "", fmt.Errorf("git not found in PATH")
	}

	if err := exec.Command("git", "clone", "--bare", repoURL, cachePath).Run(); err != nil {
		return "", fmt.Errorf("error cloning repository: %v", err)
	}

	return cachePath, nil
}

// Helper functions
func calculateHash(url string) string {
	// Implement hashing function for URL
}

func extractRepoName(url string) string {
	// Implement repo name extraction
}

1.2 Integrate with the Unified Dependency System

I will ensure that this new function integrates with the unified dependency system. This may involve updating the system to manage and utilize the cache effectively.

package unified

// UnifiedDependencySystem is an abstraction for managing dependencies.
type UnifiedDependencySystem struct {
	cacheDir string
}

// Get handles fetching the Git repository and managing the cache.
func (uds *UnifiedDependencySystem) Get(repoURL string, commitID string, targetDir string) error {
	bareRepoPath, err := git.CloneBareRepo(repoURL, uds.cacheDir)
	if err != nil {
		return err
	}
	return git.CheckoutFromBareRepo(bareRepoPath, commitID, targetDir)
}

2. Expand Ability to Checkout Source Code from Bare Repositories

2.1 Define a Function for Checking Out Code from Bare Repositories

I will add a function to check out a specific commit or branch from a bare repository.

Example Implementation:

package git

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// CheckoutFromBareRepo checks out a specific commit from a bare repository.
func CheckoutFromBareRepo(bareRepoPath string, commitID string, targetDir string) error {
	if err := os.MkdirAll(targetDir, 0755); err != nil {
		return fmt.Errorf("error creating target directory: %v", err)
	}

	cmd := exec.Command("git", "worktree", "add", targetDir, commitID)
	cmd.Dir = bareRepoPath
	if err := cmd.Run(); err != nil {
		return fmt.Errorf("error checking out commit: %v", err)
	}

	return nil
}

2.2 Update the Unified Dependency System

I will ensure that the unified dependency system correctly utilizes the new checkout functionality and integrates with the local cache.

package unified

// UnifiedDependencySystem is an abstraction for managing dependencies.
type UnifiedDependencySystem struct {
	cacheDir string
}

// Get handles fetching and checking out the Git repository.
func (uds *UnifiedDependencySystem) Get(repoURL string, commitID string, targetDir string) error {
	bareRepoPath, err := git.CloneBareRepo(repoURL, uds.cacheDir)
	if err != nil {
		return err
	}
	return git.CheckoutFromBareRepo(bareRepoPath, commitID, targetDir)
}

3. Handle Cache and Metadata Management

3.1 Implement Cache Management

I will add functionality to manage and update cached repositories, including metadata.

package cache

// CacheManager handles operations related to caching dependencies.
type CacheManager struct {
	cacheDir string
}

// UpdateCache updates the local cache with new metadata and repositories.
func (cm *CacheManager) UpdateCache(repoURL string) error {
	// Implement caching logic here
	return nil
}

3.2 Ensure Consistency

I will ensure that the local storage system (e.g., kpm/git/checkouts/, kpm/git/db/) remains consistent with the new functionalities and can handle different scenarios.

cc: @Peefy @zong-zhe

@zong-zhe
Copy link
Contributor Author

Okay, so kpm initially supported only the public git repo, now after the implementation of go-getter kpm supports cloning options for performing different git related operations. Right?

Now for our case, development of kpm current git module to expand its ability to download git bare repo do I need to modify the current git implementation by adding specific functions to the go-getter or the related file that will ultimately supports the checkout of specific version of git repository from cache bare repository? For the checkouts we will mention here:

kpm/git
├── checkouts.   # checkout the specific version of git repository from cache bare repository 
│   ├── kcl-2a81898195a215f1
│   │   └── 33bb450. . # All the version of kcl package from git repository will be replaced with commit id
│   ├── kcl-578669463c900b87
│   │   └── 33bb450
└── db    # A bare git repository for cache git repo
    ├── kcl-2a81898195a215f1.      # <NAME>-<HASH> <NAME> is the name of git repo, 
    ├── kcl-578669463c900b87.   # <HASH> is calculated by the git full url.

@zong-zhe am I thinking in a right direction?

Hi @Manoramsharma 😃

You might need to learn if go-getter offers the ability to download bare repo, but if not, you can extend this file https://github.com/kcl-lang/kpm/blob/main/pkg/git/git.go to add kpm's basic ability to download a git bare repo and checkout the bare repo to src code repo. You can try to design your implementation with CloneOptions.

@zong-zhe
Copy link
Contributor Author

zong-zhe commented Jul 24, 2024

2.2 Update the Unified Dependency System

I will ensure that the unified dependency system correctly utilizes the new checkout functionality and integrates with the local cache.

package unified

// UnifiedDependencySystem is an abstraction for managing dependencies.
type UnifiedDependencySystem struct {
	cacheDir string
}

// Get handles fetching and checking out the Git repository.
func (uds *UnifiedDependencySystem) Get(repoURL string, commitID string, targetDir string) error {
	bareRepoPath, err := git.CloneBareRepo(repoURL, uds.cacheDir)
	if err != nil {
		return err
	}
	return git.CheckoutFromBareRepo(bareRepoPath, commitID, targetDir)
}

You may need to add some detail here, because KCL's dependencies are not only those from git, you also need to consider the dependencies from oci together, abstracting them from the same behavior.

You also need to consider combining the downloader to support the download behavior of different dependencies.

https://github.com/kcl-lang/kpm/blob/main/pkg/downloader/downloader.go

@zong-zhe
Copy link
Contributor Author

3. Handle Cache and Metadata Management

3.1 Implement Cache Management

I will add functionality to manage and update cached repositories, including metadata.

package cache

// CacheManager handles operations related to caching dependencies.
type CacheManager struct {
	cacheDir string
}

// UpdateCache updates the local cache with new metadata and repositories.
func (cm *CacheManager) UpdateCache(repoURL string) error {
	// Implement caching logic here
	return nil
}

This part I think may not need to be a separate module, it can be used as part of the unified file system.

@Manoramsharma
Copy link
Contributor

Hi @zong-zhe, I have carried out some research and found that go-getter just provides the support for downloading files from the source using URL and validating the URL to a correct type if required. Not something to facilitate bare repo cloning.

But our current git implementation includes go-git library functions like PlainClone, PlainOpen, InitOpen and others that can directly identify the repositoryto be bare or not-bare, by looking at the repository configuration and checks if worktree is nil.

Should I utilize these functions to extend our git implementaion for kpm adding support to clone bare repository? I need your reviews on this.

@zong-zhe
Copy link
Contributor Author

Hi @Manoramsharma 😃

I reviewed your previous work, and I think that you may need to adjust some of your work.

  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: added design doc for sparse checkout #335 (comment). If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.

  2. This method should be private.

    func (cloneOpts *CloneOptions) CloneBare() (*git.Repository, error) {
    It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

@Gmin2
Copy link
Contributor

Gmin2 commented Jul 30, 2024

Hey @zong-zhe so do i also need to refactor my previous work(done in #401 )

@zong-zhe
Copy link
Contributor Author

Hey @zong-zhe so do i also need to refactor my previous work(done in #401 )

Yes 😃

@Manoramsharma
Copy link
Contributor

Hi @Manoramsharma 😃

I reviewed your previous work, and I think that you may need to adjust some of your work.

  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: added design doc for sparse checkout #335 (comment). If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.
  2. This method should be private.
    func (cloneOpts *CloneOptions) CloneBare() (*git.Repository, error) {

    It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

@zong-zhe are we talking of something similar to the cargo implementation that is package manager for RUST, because I can see in the discussion you mentioned about the clone options available with cargo package manager.

Or I can add an extra --flag to kcl mod add [options] .. command implementation that is taken as a CLI input from the user client and based upon the value of this flag can change out cloneOptions?

@zong-zhe
Copy link
Contributor Author

zong-zhe commented Jul 30, 2024

Hi @Manoramsharma 😃
I reviewed your previous work, and I think that you may need to adjust some of your work.

  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: added design doc for sparse checkout #335 (comment). If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.

  2. This method should be private.

    func (cloneOpts *CloneOptions) CloneBare() (*git.Repository, error) {

    It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

@zong-zhe are we talking of something similar to the cargo implementation that is package manager for RUST, because I can see in the discussion you mentioned about the clone options available with cargo package manager.

Or I can add an extra --flag to kcl mod add [options] .. command implementation that is taken as a CLI input from the user client and based upon the value of this flag can change out cloneOptions?

Hi @Manoramsharma 😃

What I mean is that the go-git may not be able to meet the current needs of kpm, because the go-git has defects in the processing of user credentials. I've discussed the defects of go-git in #335 (comment)

@Gmin2
Copy link
Contributor

Gmin2 commented Jul 30, 2024

  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: added design doc for sparse checkout #335 (comment). If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.
  2. This method should be private.
    func (cloneOpts *CloneOptions) CloneBare() (*git.Repository, error) {

    It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

implemented it in #418

cc @zong-zhe

@Manoramsharma
Copy link
Contributor

Manoramsharma commented Jul 30, 2024

Hi @Manoramsharma 😃
I reviewed your previous work, and I think that you may need to adjust some of your work.

  1. I think using go-git is not a very suitable way, because the go-git currently has problems with the authentication function of git repositories, and we cannot use go-git to complete the download of private repositories. Here are some of the previous discussions on this issue: added design doc for sparse checkout #335 (comment). If you're sure that go-getter can't download the bare repo, maybe you can use the local git command to support downloading from the bare repo repository.

  2. This method should be private.

    func (cloneOpts *CloneOptions) CloneBare() (*git.Repository, error) {

    It should not be exposed to the outside world. The outside world only needs to control Bare bool in CloneOptions to control whether the Clone() method downloads the Bare repository.

@zong-zhe are we talking of something similar to the cargo implementation that is package manager for RUST, because I can see in the discussion you mentioned about the clone options available with cargo package manager.
Or I can add an extra --flag to kcl mod add [options] .. command implementation that is taken as a CLI input from the user client and based upon the value of this flag can change out cloneOptions?

Hi @Manoramsharma 😃

What I mean is that the go-git may not be able to meet the current needs of kpm, because the go-git has defects in the processing of user credentials. I've discussed the defects of go-git in #335 (comment)

Hi @zong-zhe you can review my latest PR where I removed the go-git implementation and tried to implement the bare clone using the local git command inside the clone() itself , no extra function. I have searched a lot but was not able to see any supporting example of go-getter used to clone bare repo. see #419

Also I have one question regarding the checkout operation from the bare repo, the bare is something with no workingtree so how it is possible to checkout the particular commit or branch from the repository with no workingtree?

@zong-zhe
Copy link
Contributor Author

zong-zhe commented Jul 31, 2024

Also I have one question regarding the checkout operation from the bare repo, the bare is something with no workingtree so how it is possible to checkout the particular commit or branch from the repository with no workingtree?

Hi @Manoramsharma 😃

This part of my description is inaccurate, which has caused a misunderstanding. I apologize for it. My original idea was to use the bare repository as a cache, then clone different dependencies into the corresponding directory from the bare repo and checkout to the corresponding branch, commit or tag.

@Manoramsharma
Copy link
Contributor

Hi @zong-zhe after some research about the bare repository and its implementation as the cache support for our objective I have designed an overall proposal discussing the approach. Before moving forward with the exact changes to the codebase I have few questions in my mind:

  1. Which will be the remote bare repository we will be look up to?
  2. Bare repo supports connecting using ssh and the host IP (Discussed in the proposal), how to make sure a remote server running the code?

Looking forward to have your feedback on the same. Thanks for your support !!

@zong-zhe
Copy link
Contributor Author

zong-zhe commented Aug 1, 2024

Hi @Manoramsharma 😃

  1. Which will be the remote bare repository we will be look up to?

kpm supports adding dependencies from git repositories, which are specified by the user.

  1. Bare repo supports connecting using ssh and the host IP (Discussed in the proposal)

kpm's git capability is directly inherited from the locally installed git client. As long as git clone is successful, kpm can successfully download it, and kpm does not need to consider ssh or host IP.

how to make sure a remote server running the code?

I don't understand the question.

@Gmin2
Copy link
Contributor

Gmin2 commented Aug 12, 2024

Why this issue was closed @zong-zhe, i was working on
image

and merging the #419 PR conflict with merging my PR #418 which was merged previously

cc @Peefy @zong-zhe

@Manoramsharma
Copy link
Contributor

Manoramsharma commented Aug 12, 2024

Why this issue was closed @zong-zhe, i was working on image

and merging the #419 PR conflict with merging my PR #418 which was merged previously

cc @Peefy @zong-zhe

Hi @Gmin2 I think you are not following the issue quite well. I have already designed an overall roadmap for achieving this particular enhancement of KPM to manage third-party dependencies based on caching package repositories as bare repo to the local path.

All discussions including the overall approach to achieve a unified dependency management to low-level works in the form of pretest are already included in my proposal PR . You can have a look at the proposal here #423 . I am consistently working on the proposal refining and achieving the expected outcome by taking regular reviews from the maintainers on my work. I don't think you need to invest your time on this particular issue as I have already made a good progress on this.

Thanks

@Gmin2
Copy link
Contributor

Gmin2 commented Aug 12, 2024

Hi @Gmin2 I think you are not following the issue quite well. I have already designed an overall roadmap for achieving this particular enhancement of KPM to manage third-party dependencies based on caching package repositories as bare repo to the local path.

designing was not part of the pretest, it was not mentioned in the pretest i thought we had to give the design in the cover letter

@Manoramsharma
Copy link
Contributor

Hi @zong-zhe I have designed a paperwork for the second phase to support downloading the packages to designed local storage path. You can review the approach and provide me with some valuable feedbacks here. Thanks 😄

cc: @Peefy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
Status: Done
4 participants