Error while writing results in 1GB repository size #499

Open
dimaslanjaka opened this issue Aug 22, 2023 · 3 comments
Comments

@dimaslanjaka

dimaslanjaka commented Aug 22, 2023

I just ran

git filter-repo --analyze

and got this error:

Writing reports to .git\filter-repo\analysis...Traceback (most recent call last):
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\Scripts\git-filter-repo.exe\__main__.py", line 7, in <module>
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\site-packages\git_filter_repo.py", line 3999, in main
    RepoAnalyze.run(args)
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\site-packages\git_filter_repo.py", line 2689, in run
    RepoAnalyze.write_report(reportdir, stats)
  File "C:\Users\dimas\AppData\Local\Programs\Python\Python39\lib\site-packages\git_filter_repo.py", line 2431, in write_report
    size = {'packed': stats['packed_size'][sha],
KeyError: b'2e9038bce62c4fdfcd855549ce5bd068802a36b2'
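
For readers following the traceback: the failing line indexes `stats['packed_size']` by an object SHA, so the `KeyError` means that SHA was looked up but is absent from that mapping. The sketch below reproduces only that pattern. The dictionaries, the `unpacked_size` key, and the helper names are hypothetical stand-ins for illustration, not git-filter-repo's actual data or code, and it does not explain why the SHA is missing in this repository.

```python
# Hypothetical stand-ins for the analysis stats; NOT git-filter-repo's real data.
stats = {
    "unpacked_size": {b"aaaa": 120, b"bbbb": 300},
    "packed_size": {b"aaaa": 80},  # b"bbbb" has no packed size recorded
}


def size_entry(stats, sha):
    """Build a per-object size record by direct indexing, as the failing line does."""
    return {"packed": stats["packed_size"][sha],
            "unpacked": stats["unpacked_size"][sha]}


def missing_packed(stats):
    """List SHAs that have an unpacked size but no packed size."""
    return sorted(set(stats["unpacked_size"]) - set(stats["packed_size"]))


if __name__ == "__main__":
    print(size_entry(stats, b"aaaa"))
    print("SHAs that would raise KeyError:", missing_packed(stats))
```

Listing the mismatched keys first, as `missing_packed` does, is one way to see which objects would trigger the error before the report is written.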
@newren
Owner

newren commented Sep 4, 2023

Is the repository you ran this on available for me to clone? Hard to debug without a way to reproduce...

@dimaslanjaka
Author

> Is the repository you ran this on available for me to clone? Hard to debug without a way to reproduce...

I tried it on these repos

@newren
Owner

newren commented Aug 2, 2024

I cannot duplicate.

I note that the sizes of the repos when I clone them are:

$ du -hs */.git/objects | tac
981M	static-blog-generator-hexo/.git/objects
2.1G	static-blog-generator/.git/objects
291M	dimaslanjaka.github.io/.git/objects

So one is nearly 1GB, and another is more than double that size.

I instrumented git-filter-repo with the following changes to get an idea of the memory usage:

diff --git a/git-filter-repo b/git-filter-repo
index 9cce52a..a5bd003 100755
--- a/git-filter-repo
+++ b/git-filter-repo
@@ -38,6 +38,7 @@ import io
 import os
 import platform
 import re
+import resource
 import shutil
 import subprocess
 import sys
@@ -342,7 +343,10 @@ class ProgressWriter(object):
     now = time.time()
     if now - self._last_progress_update > .1:
       self._last_progress_update = now
-      sys.stdout.write("\r{}".format(msg))
+      mem = [resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
+             resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss]
+      sys.stdout.write("Self: %d Kb, Children: %d Kb\n" % tuple(mem))
+      #sys.stdout.write("\r{}".format(msg))
       sys.stdout.flush()
 
   def finish(self):
@@ -4135,6 +4139,11 @@ def main():
   else:
     filter = RepoFilter(args)
     filter.run()
+  sys.stdout.write("Final:\n")
+  mem = [resource.getrusage(resource.RUSAGE_SELF).ru_maxrss,
+         resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss]
+  sys.stdout.write("Self: %d Kb, Children: %d Kb\n" % tuple(mem))
+  sys.stdout.flush()
 
 if __name__ == '__main__':
   main()
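
The same measurement can be tried outside git-filter-repo. Below is a minimal standalone sketch, assuming a Unix-like system: Python's `resource` module is not available on Windows, which is where the original report came from. The helper names `peak_rss_kib` and `report` are made up for this example. Note that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS, so the sketch normalizes the value.

```python
import resource
import sys


def peak_rss_kib(who):
    """Return the peak resident set size in KiB.

    `who` is resource.RUSAGE_SELF (this process) or
    resource.RUSAGE_CHILDREN (terminated, waited-for child processes).
    """
    peak = resource.getrusage(who).ru_maxrss
    # Linux reports ru_maxrss in kilobytes; macOS reports it in bytes.
    if sys.platform == "darwin":
        peak //= 1024
    return peak


def report():
    """Format peak memory in the same style as the instrumented output."""
    return "Self: %d Kb, Children: %d Kb" % (
        peak_rss_kib(resource.RUSAGE_SELF),
        peak_rss_kib(resource.RUSAGE_CHILDREN),
    )


if __name__ == "__main__":
    print(report())
```

Because these are peak values, each number can only stay flat or rise over the lifetime of the process.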

When I then ran it on each repository, I saw:

$ cd static-blog-generator-hexo/
$ /usr/bin/time -f "External monitoring: Memory: %M Kbytes, Time: %e seconds" git filter-repo --analyze
Self: 21448 Kb, Children: 21448 Kb
Self: 21448 Kb, Children: 21448 Kb
Self: 21448 Kb, Children: 21448 Kb
Self: 21576 Kb, Children: 21448 Kb
Self: 21576 Kb, Children: 21448 Kb
Self: 21576 Kb, Children: 21448 Kb
Self: 21832 Kb, Children: 21448 Kb
Self: 21832 Kb, Children: 21448 Kb
Self: 21960 Kb, Children: 21448 Kb
Self: 21960 Kb, Children: 21448 Kb
Self: 22344 Kb, Children: 21448 Kb
Self: 22344 Kb, Children: 21448 Kb
Self: 23488 Kb, Children: 21448 Kb
Self: 24000 Kb, Children: 171188 Kb

Self: 24512 Kb, Children: 171188 Kb
Self: 25792 Kb, Children: 171188 Kb
Self: 27712 Kb, Children: 171188 Kb
Self: 27968 Kb, Children: 171188 Kb
Self: 28992 Kb, Children: 171188 Kb
Self: 33472 Kb, Children: 171188 Kb
Self: 36288 Kb, Children: 171188 Kb
Self: 36288 Kb, Children: 171188 Kb
Self: 36288 Kb, Children: 171188 Kb
Self: 36416 Kb, Children: 171188 Kb
Self: 36416 Kb, Children: 171188 Kb
Self: 36416 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb
Self: 39232 Kb, Children: 171188 Kb

Writing reports to .git/filter-repo/analysis...done.
Final:
Self: 40512 Kb, Children: 171188 Kb
External monitoring: Memory: 171188 Kbytes, Time: 6.57 seconds

$ cd ../static-blog-generator
$ /usr/bin/time -f "External monitoring: Memory: %M Kbytes, Time: %e seconds" git filter-repo --analyze
Self: 22328 Kb, Children: 21432 Kb
Self: 24352 Kb, Children: 21432 Kb
Self: 26936 Kb, Children: 21432 Kb
Self: 32844 Kb, Children: 21432 Kb
Self: 33868 Kb, Children: 21432 Kb
Self: 35660 Kb, Children: 250684 Kb

Self: 35788 Kb, Children: 250684 Kb
Self: 36940 Kb, Children: 250684 Kb
Self: 37068 Kb, Children: 250684 Kb
Self: 45644 Kb, Children: 250684 Kb
Self: 47820 Kb, Children: 250684 Kb
Self: 48716 Kb, Children: 250684 Kb
Self: 48716 Kb, Children: 250684 Kb
Self: 48716 Kb, Children: 250684 Kb
Self: 48716 Kb, Children: 250684 Kb
Self: 49100 Kb, Children: 250684 Kb
Self: 49100 Kb, Children: 250684 Kb
Self: 55244 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 56396 Kb, Children: 250684 Kb
Self: 61852 Kb, Children: 250684 Kb
Self: 62108 Kb, Children: 250684 Kb
Self: 62236 Kb, Children: 250684 Kb
Self: 62236 Kb, Children: 250684 Kb
Self: 62236 Kb, Children: 250684 Kb
Self: 63516 Kb, Children: 250684 Kb
Self: 63516 Kb, Children: 250684 Kb
Self: 63516 Kb, Children: 250684 Kb
Self: 67868 Kb, Children: 250684 Kb
Self: 67868 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 70428 Kb, Children: 250684 Kb
Self: 73116 Kb, Children: 250684 Kb
Self: 73116 Kb, Children: 250684 Kb
Self: 73116 Kb, Children: 250684 Kb
Self: 73116 Kb, Children: 250684 Kb
Self: 73600 Kb, Children: 250684 Kb
Self: 73856 Kb, Children: 250684 Kb
Self: 74112 Kb, Children: 250684 Kb
Self: 75648 Kb, Children: 250684 Kb
Self: 76032 Kb, Children: 250684 Kb
Self: 76160 Kb, Children: 250684 Kb
Self: 76544 Kb, Children: 250684 Kb
Self: 76544 Kb, Children: 250684 Kb
Self: 76544 Kb, Children: 250684 Kb
Self: 76800 Kb, Children: 250684 Kb
Self: 77312 Kb, Children: 250684 Kb
Self: 77312 Kb, Children: 250684 Kb
Self: 77568 Kb, Children: 250684 Kb
Self: 77952 Kb, Children: 250684 Kb
Self: 77952 Kb, Children: 250684 Kb
Self: 77952 Kb, Children: 250684 Kb
Self: 77952 Kb, Children: 250684 Kb
Self: 78592 Kb, Children: 250684 Kb
Self: 79232 Kb, Children: 250684 Kb
Self: 79232 Kb, Children: 250684 Kb
Self: 79360 Kb, Children: 250684 Kb
Self: 80256 Kb, Children: 250684 Kb
Self: 80256 Kb, Children: 250684 Kb
Self: 80256 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81408 Kb, Children: 250684 Kb
Self: 81536 Kb, Children: 250684 Kb
Self: 94208 Kb, Children: 250684 Kb
Self: 105728 Kb, Children: 250684 Kb
Self: 105728 Kb, Children: 250684 Kb
Self: 105728 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb
Self: 111488 Kb, Children: 250684 Kb

Writing reports to .git/filter-repo/analysis...done.
Final:
Self: 119220 Kb, Children: 734388 Kb
External monitoring: Memory: 734388 Kbytes, Time: 26.30 seconds

$ cd ../dimaslanjaka.github.io/
$ /usr/bin/time -f "External monitoring: Memory: %M Kbytes, Time: %e seconds" git filter-repo --analyze
Self: 22304 Kb, Children: 21280 Kb
Self: 23880 Kb, Children: 21280 Kb
Self: 26852 Kb, Children: 21280 Kb
Self: 27888 Kb, Children: 21280 Kb
Self: 32972 Kb, Children: 21280 Kb
Self: 33100 Kb, Children: 21280 Kb
Self: 34508 Kb, Children: 21280 Kb
Self: 35660 Kb, Children: 21280 Kb
Self: 36940 Kb, Children: 21280 Kb
Self: 45064 Kb, Children: 21280 Kb
Self: 45064 Kb, Children: 21280 Kb
Self: 45704 Kb, Children: 21280 Kb
Self: 46984 Kb, Children: 21280 Kb
Self: 48264 Kb, Children: 21280 Kb
Self: 49544 Kb, Children: 21280 Kb
Self: 51208 Kb, Children: 21280 Kb
Self: 51208 Kb, Children: 244668 Kb

Self: 57352 Kb, Children: 244668 Kb
Self: 58760 Kb, Children: 244668 Kb
Self: 61064 Kb, Children: 244668 Kb
Self: 64904 Kb, Children: 244668 Kb
Self: 66952 Kb, Children: 244668 Kb
Self: 69384 Kb, Children: 244668 Kb
Self: 75528 Kb, Children: 244668 Kb
Self: 75912 Kb, Children: 244668 Kb
Self: 76808 Kb, Children: 244668 Kb
Self: 78472 Kb, Children: 244668 Kb
Self: 80648 Kb, Children: 244668 Kb
Self: 82824 Kb, Children: 244668 Kb
Self: 84232 Kb, Children: 244668 Kb
Self: 86664 Kb, Children: 244668 Kb
Self: 88456 Kb, Children: 244668 Kb
Self: 90888 Kb, Children: 244668 Kb
Self: 99080 Kb, Children: 244668 Kb
Self: 100488 Kb, Children: 244668 Kb
Self: 101768 Kb, Children: 244668 Kb
Self: 102536 Kb, Children: 244668 Kb
Self: 104072 Kb, Children: 244668 Kb
Self: 104840 Kb, Children: 244668 Kb
Self: 105608 Kb, Children: 244668 Kb
Self: 106504 Kb, Children: 244668 Kb
Self: 107400 Kb, Children: 244668 Kb
Self: 107784 Kb, Children: 244668 Kb
Self: 108040 Kb, Children: 244668 Kb
Self: 108680 Kb, Children: 244668 Kb
Self: 110344 Kb, Children: 244668 Kb
Self: 112264 Kb, Children: 244668 Kb
Self: 113544 Kb, Children: 244668 Kb
Self: 114312 Kb, Children: 244668 Kb
Self: 114952 Kb, Children: 244668 Kb
Self: 115336 Kb, Children: 244668 Kb
Self: 115592 Kb, Children: 244668 Kb
Self: 116488 Kb, Children: 244668 Kb
Self: 117768 Kb, Children: 244668 Kb
Self: 119688 Kb, Children: 244668 Kb
Self: 120456 Kb, Children: 244668 Kb
Self: 121352 Kb, Children: 244668 Kb
Self: 122376 Kb, Children: 244668 Kb
Self: 122376 Kb, Children: 244668 Kb

Writing reports to .git/filter-repo/analysis...done.
Final:
Self: 141832 Kb, Children: 244668 Kb
External monitoring: Memory: 244668 Kbytes, Time: 10.73 seconds

So, all the analyses finish in less than half a minute, and in all cases the processes use less than 1GB of RAM. Do you still get the bug today with these repositories? Can you try on a system with more memory and/or on a different operating system and see if you can still reproduce the error?
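
As a quick check of that summary, here is a small sketch that takes the three "External monitoring" figures from the runs above and compares them against 1 GiB and 30 seconds. The variable and function names are chosen for this example only.

```python
KIB_PER_GIB = 1024 * 1024

# Peak memory (Kbytes) and wall time (seconds), copied from the
# "External monitoring" lines of the three runs.
runs = {
    "static-blog-generator-hexo": (171188, 6.57),
    "static-blog-generator": (734388, 26.30),
    "dimaslanjaka.github.io": (244668, 10.73),
}


def to_mib(kib):
    """Convert KiB to MiB."""
    return kib / 1024


def within_limits(runs, max_kib=KIB_PER_GIB, max_seconds=30.0):
    """True if every run stayed under both the memory and the time limit."""
    return all(kib < max_kib and secs < max_seconds
               for kib, secs in runs.values())


if __name__ == "__main__":
    for name, (kib, secs) in runs.items():
        print("%s: %.1f MiB peak, %.2f s" % (name, to_mib(kib), secs))
    print("All under 1 GiB and 30 s:", within_limits(runs))
```

The largest run, static-blog-generator, peaks at roughly 717 MiB, which is still well under 1 GiB.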
