
Submodule commit causes an error #260

Open
Creadeyh opened this issue Apr 7, 2023 · 4 comments

@Creadeyh

Creadeyh commented Apr 7, 2023

Describe the bug
I'm analyzing the GitHub repo avatarify, and commits that contain submodule commits, such as this one, raise an exception:
ValueError: SHA b'72a32a67dee3a67dff76f565551907a2fc7e88e6' could not be resolved, git returned: b'72a32a67dee3a67dff76f565551907a2fc7e88e6 missing'
The hash in the error is that of the submodule commit.

To Reproduce
I've run into this issue in two situations while working with avatarify:

First, when I use commits = pydriller.Repository(...).traverse_commits()
and retrieve any of dmm_unit_size/dmm_unit_complexity/dmm_unit_interfacing:

for commit in commits:
    dmm_unit_size = commit.dmm_unit_size
    dmm_unit_complexity = commit.dmm_unit_complexity
    dmm_unit_interfacing = commit.dmm_unit_interfacing

This is easy to work around on my side: I can wrap these metrics in a try/except and replace them with None when they fail on a commit. The second case, however, would require a change outside my control.
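A minimal sketch of that try/except workaround, assuming the failure surfaces as the ValueError quoted above. safe_metric and dmm_metrics are hypothetical helpers, not part of PyDriller:

```python
from typing import Any, Dict, Optional

# The three Delta Maintainability Model properties exposed on a PyDriller commit
DMM_PROPERTIES = ("dmm_unit_size", "dmm_unit_complexity", "dmm_unit_interfacing")


def safe_metric(commit: Any, name: str) -> Optional[float]:
    """Read one DMM property from a commit, returning None if it fails.

    Assumption: the unresolved-submodule error is raised as ValueError
    while the property is being computed.
    """
    try:
        return getattr(commit, name)
    except ValueError:
        return None


def dmm_metrics(commit: Any) -> Dict[str, Optional[float]]:
    # One entry per DMM property; failed lookups become None
    return {name: safe_metric(commit, name) for name in DMM_PROPERTIES}
```

Catching only ValueError keeps unrelated bugs visible instead of silently turning every error into None.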

Second, when I call the constructor of pydriller.metrics.process.code_churn.CodeChurn.

Unless I steer around the problematic commits using CodeChurn's from_commit/to_commit parameters, I simply cannot compute the repository's churn.
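One way to automate that workaround is to split the history into contiguous runs that avoid the failing commits and add up CodeChurn.count() over each run. This is an untested sketch: it assumes a linear history, a list of hashes ordered oldest to newest, and that the failing commits are already known. split_around and churn_around are hypothetical helpers. Because each run is measured separately, renames that span a skipped commit are not tracked across runs.

```python
from typing import Dict, Iterable, List, Sequence


def split_around(hashes: Sequence[str], bad: Iterable[str]) -> List[List[str]]:
    """Split an oldest-to-newest list of commit hashes into contiguous
    runs that contain none of the `bad` hashes."""
    bad_set = set(bad)
    segments: List[List[str]] = []
    current: List[str] = []
    for h in hashes:
        if h in bad_set:
            # Close the current run; the bad commit itself is dropped
            if current:
                segments.append(current)
            current = []
        else:
            current.append(h)
    if current:
        segments.append(current)
    return segments


def churn_around(path: str, hashes: Sequence[str], bad: Iterable[str]) -> Dict[str, int]:
    """Sum per-file churn over every run that avoids the bad commits."""
    # Imported lazily so split_around can be used without PyDriller installed
    from pydriller.metrics.process.code_churn import CodeChurn

    totals: Dict[str, int] = {}
    for segment in split_around(hashes, bad):
        metric = CodeChurn(path_to_repo=path,
                           from_commit=segment[0],
                           to_commit=segment[-1])
        for file_path, churn in metric.count().items():
            totals[file_path] = totals.get(file_path, 0) + churn
    return totals
```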

OS Version:
Windows

@Creadeyh
Author

I get the same error when accessing pydriller.Commit.modified_files.

@ishepard
Owner

ishepard commented Apr 15, 2023

Hi! The commit you are referring to is in a submodule.
To analyze those, you need to clone the submodules as well; otherwise Git complains that the commit doesn't exist.

As a test, try to run:

git show 72a32a67dee3a67dff76f565551907a2fc7e88e6

in your terminal. You'll see that Git returns an error. After you initialize the submodules, that error should go away.
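The same check can be scripted. A minimal sketch, assuming git is on the PATH; commit_exists is a hypothetical helper, and it uses git cat-file -e, which exits with status 0 only when the object exists, instead of parsing git show output:

```python
import subprocess


def commit_exists(repo_dir: str, sha: str) -> bool:
    """Return True if `sha` resolves to a commit in the repository at `repo_dir`."""
    result = subprocess.run(
        # "<sha>^{commit}" asks Git to resolve the name to a commit object
        ["git", "cat-file", "-e", f"{sha}^{{commit}}"],
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Run from the repository root and from inside a submodule folder, this makes it easy to see which object store actually contains a given hash.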

@Creadeyh
Author

Creadeyh commented May 3, 2023

I understand that. The problem is that the maintainers later removed the submodules, so .gitmodules is now empty and git submodule init does nothing.

I tried to work around this by retrieving the history of .gitmodules with Git.get_commits_modified_file(), checking out the commits where .gitmodules was still populated, and running git submodule init/update from there.
However, I still can't access that commit with git show from the repository root; it only works from inside the submodule folder.

As a result, CodeChurn and the DMM metrics still fail, because PyDriller runs Git from the root folder.
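Assuming, as the comments in the test script suggest, that the failing root-repository commits are the ones that move a submodule pointer (a "gitlink", stored in the tree with mode 160000), those commits could be listed up front with plain Git and then skipped. A sketch only: parse_gitlink_commits and find_gitlink_commits are hypothetical helpers, not part of PyDriller, and merge commits are ignored because git log --raw does not print their diffs by default.

```python
import subprocess
from typing import Optional, Set

GITLINK_MODE = "160000"  # tree-entry mode Git uses for submodule pointers


def parse_gitlink_commits(log_output: str) -> Set[str]:
    """Parse `git log --raw --no-abbrev --format=%H` output and return the
    hashes of commits whose diff adds, changes, or removes a gitlink."""
    found: Set[str] = set()
    current: Optional[str] = None
    for line in log_output.splitlines():
        if not line:
            continue
        if line.startswith(":"):
            # Raw diff line: ":<old mode> <new mode> <old sha> <new sha> <status>\t<path>"
            old_mode, new_mode = line[1:].split(" ", 2)[:2]
            if current and GITLINK_MODE in (old_mode, new_mode):
                found.add(current)
        else:
            current = line.strip()  # a commit hash printed by --format=%H
    return found


def find_gitlink_commits(repo_dir: str) -> Set[str]:
    out = subprocess.run(
        ["git", "log", "--raw", "--no-abbrev", "--format=%H"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return parse_gitlink_commits(out)
```

The resulting set can be used to skip those commits while iterating with Repository, or to choose CodeChurn's from_commit/to_commit bounds around them.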

@Creadeyh
Author

Creadeyh commented May 3, 2023

@ishepard Here is the test script I put together if you want to try it out yourself. I'm running Python 3.8 and PyDriller 2.4.1.

import os
import subprocess
import tempfile
from typing import List

from pydriller import Git, Repository

tmp_dir = tempfile.mkdtemp()
repo_dir = os.path.join(tmp_dir, "avatarify-python")

# Clone the repository and make sure we are on master
subprocess.run(["git", "clone", "https://github.com/alievk/avatarify-python"],
               stdout=subprocess.PIPE, cwd=tmp_dir, check=True)
subprocess.run(["git", "checkout", "master"],
               stdout=subprocess.PIPE, cwd=repo_dir, check=True)

git: Git = Git(repo_dir)

# Check out every commit that touched .gitmodules and initialize the
# submodules that existed at that point in history
gitmodules_hist: List[str] = git.get_commits_modified_file(
    os.path.join(repo_dir, ".gitmodules"), include_deleted_files=True)
for commit_hash in gitmodules_hist:  # avoid shadowing the built-in hash()
    git.checkout(commit_hash)
    if os.path.exists(os.path.join(repo_dir, ".gitmodules")):
        print("SUBMODULE UPDATE")
        subprocess.run(["git", "submodule", "init"],
                       stdout=subprocess.PIPE, cwd=repo_dir)
        subprocess.run(["git", "submodule", "update"],
                       stdout=subprocess.PIPE, cwd=repo_dir)

# Return to the branch tip so the traversal starts from master, not from
# the last commit checked out in the loop above
git.checkout("master")

git_commits = Repository(repo_dir, only_no_merge=True).traverse_commits()
for git_commit in git_commits:
    if git_commit.hash == "80226c1717402f7372a9f82b098619b3836b8bc0":
        print("FOUND BEFORE SUBMODULE 1")
        # Fails here because 80226c references 72a32a
        print(git_commit.dmm_unit_size)
    elif git_commit.hash == "72a32a67dee3a67dff76f565551907a2fc7e88e6":
        print("FOUND SUBMODULE 1")
    elif git_commit.hash == "a5aabda05cc0d0da1e21f21a138e2e5dec01afa0":
        print("FOUND BEFORE SUBMODULE 2")
        # Fails here because a5aabd references 6c1fbf
        print(git_commit.dmm_unit_size)
    elif git_commit.hash == "6c1fbf39690130e2303bcecd3c6126c71cfacf85":
        print("FOUND SUBMODULE 2")
