Commit

Merge branch 'cc/repack-sift-filtered-objects-to-separate-pack'
"git repack" machinery learns to pay attention to the "--filter="
option.

* cc/repack-sift-filtered-objects-to-separate-pack:
  gc: add `gc.repackFilterTo` config option
  repack: implement `--filter-to` for storing filtered out objects
  gc: add `gc.repackFilter` config option
  repack: add `--filter=<filter-spec>` option
  pack-bitmap-write: rebuild using new bitmap when remapping
  repack: refactor finding pack prefix
  repack: refactor finishing pack-objects command
  t/helper: add 'find-pack' test-tool
  pack-objects: allow `--filter` without `--stdout`
gitster committed Oct 10, 2023
2 parents afb0d08 + 9b96046 commit 1fdedb7
Showing 15 changed files with 544 additions and 51 deletions.
16 changes: 16 additions & 0 deletions Documentation/config/gc.txt
@@ -145,6 +145,22 @@ Multiple hooks are supported, but all must exit successfully, else the
operation (either generating a cruft pack or unpacking unreachable
objects) will be halted.

gc.repackFilter::
When repacking, use the specified filter to move certain
objects into a separate packfile. See the
`--filter=<filter-spec>` option of linkgit:git-repack[1].

gc.repackFilterTo::
When repacking and using a filter, see `gc.repackFilter`, the
specified location will be used to create the packfile
containing the filtered out objects. **WARNING:** The
specified location should be accessible, using for example the
Git alternates mechanism, otherwise the repo could be
considered corrupt by Git as it might not be able to access the
objects in that packfile. See the `--filter-to=<dir>` option
of linkgit:git-repack[1] and the `objects/info/alternates`
section of linkgit:gitrepository-layout[5].

gc.rerereResolved::
Records of conflicted merge you resolved earlier are
kept for this many days when 'git rerere gc' is run.
4 changes: 2 additions & 2 deletions Documentation/git-pack-objects.txt
@@ -296,8 +296,8 @@ So does `git bundle` (see linkgit:git-bundle[1]) when it creates a bundle.
nevertheless.

--filter=<filter-spec>::
Requires `--stdout`. Omits certain objects (usually blobs) from
the resulting packfile. See linkgit:git-rev-list[1] for valid
Omits certain objects (usually blobs) from the resulting
packfile. See linkgit:git-rev-list[1] for valid
`<filter-spec>` forms.

--no-filter::
23 changes: 23 additions & 0 deletions Documentation/git-repack.txt
@@ -143,6 +143,29 @@ depth is 4095.
a larger and slower repository; see the discussion in
`pack.packSizeLimit`.

--filter=<filter-spec>::
Remove objects matching the filter specification from the
resulting packfile and put them into a separate packfile. Note
that objects used in the working directory are not filtered
out. So for the split to fully work, it's best to perform it
in a bare repo and to use the `-a` and `-d` options along with
this option. Also `--no-write-bitmap-index` (or the
`repack.writebitmaps` config option set to `false`) should be
used otherwise writing bitmap index will fail, as it supposes
a single packfile containing all the objects. See
linkgit:git-rev-list[1] for valid `<filter-spec>` forms.

--filter-to=<dir>::
Write the pack containing filtered out objects to the
directory `<dir>`. Only useful with `--filter`. This can be
used for putting the pack on a separate object directory that
is accessed through the Git alternates mechanism. **WARNING:**
If the packfile containing the filtered out objects is not
accessible, the repo can become corrupt as it might not be
possible to access the objects in that packfile. See the
`objects` and `objects/info/alternates` sections of
linkgit:gitrepository-layout[5].

-b::
--write-bitmap-index::
Write a reachability bitmap index as part of the repack. This
1 change: 1 addition & 0 deletions Makefile
@@ -800,6 +800,7 @@ TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
TEST_BUILTINS_OBJS += test-env-helper.o
TEST_BUILTINS_OBJS += test-example-decorate.o
TEST_BUILTINS_OBJS += test-fast-rebase.o
TEST_BUILTINS_OBJS += test-find-pack.o
TEST_BUILTINS_OBJS += test-fsmonitor-client.o
TEST_BUILTINS_OBJS += test-genrandom.o
TEST_BUILTINS_OBJS += test-genzeros.o
10 changes: 10 additions & 0 deletions builtin/gc.c
@@ -61,6 +61,8 @@ static timestamp_t gc_log_expire_time;
static const char *gc_log_expire = "1.day.ago";
static const char *prune_expire = "2.weeks.ago";
static const char *prune_worktrees_expire = "3.months.ago";
static char *repack_filter;
static char *repack_filter_to;
static unsigned long big_pack_threshold;
static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;

@@ -170,6 +172,9 @@ static void gc_config(void)
git_config_get_ulong("gc.bigpackthreshold", &big_pack_threshold);
git_config_get_ulong("pack.deltacachesize", &max_delta_cache_size);

git_config_get_string("gc.repackfilter", &repack_filter);
git_config_get_string("gc.repackfilterto", &repack_filter_to);

git_config(git_default_config, NULL);
}

@@ -355,6 +360,11 @@ static void add_repack_all_option(struct string_list *keep_pack)

if (keep_pack)
for_each_string_list(keep_pack, keep_one_pack, NULL);

if (repack_filter && *repack_filter)
strvec_pushf(&repack, "--filter=%s", repack_filter);
if (repack_filter_to && *repack_filter_to)
strvec_pushf(&repack, "--filter-to=%s", repack_filter_to);
}

static void add_repack_incremental_option(void)
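
The gc.c hunks above forward the two new config values to `git repack` only when they are set and non-empty. A minimal standalone sketch of that check follows. It assumes a hypothetical fixed-size `struct arg_list` in place of Git's `strvec`, and the helper names `arg_list_push_opt()` and `add_filter_options()` are illustrative, not part of Git.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for Git's strvec: a fixed array of
 * fixed-size argument strings. Not part of Git.
 */
#define MAX_ARGS 8
#define MAX_ARG_LEN 256

struct arg_list {
	char argv[MAX_ARGS][MAX_ARG_LEN];
	size_t nr;
};

/* Append a formatted "<flag>=<value>" argument if there is room. */
int arg_list_push_opt(struct arg_list *args, const char *flag,
		      const char *value)
{
	int len;

	if (args->nr >= MAX_ARGS)
		return -1;
	len = snprintf(args->argv[args->nr], MAX_ARG_LEN, "%s=%s", flag, value);
	if (len < 0 || len >= MAX_ARG_LEN)
		return -1; /* formatting failed or the value was truncated */
	args->nr++;
	return 0;
}

/*
 * Same checks as the lines added to add_repack_all_option(): an unset
 * (NULL) or empty config value adds nothing to the command line.
 */
int add_filter_options(struct arg_list *args,
		       const char *repack_filter,
		       const char *repack_filter_to)
{
	if (repack_filter && *repack_filter &&
	    arg_list_push_opt(args, "--filter", repack_filter) < 0)
		return -1;
	if (repack_filter_to && *repack_filter_to &&
	    arg_list_push_opt(args, "--filter-to", repack_filter_to) < 0)
		return -1;
	return 0;
}
```

The two values are forwarded independently of each other. If only `gc.repackFilterTo` is set, `--filter-to` would be passed on its own, which the builtin/repack.c change in this commit rejects with "option '--filter-to' can only be used along with '--filter'".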
8 changes: 2 additions & 6 deletions builtin/pack-objects.c
@@ -4402,12 +4402,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
if (!rev_list_all || !rev_list_reflog || !rev_list_index)
unpack_unreachable_expiration = 0;

if (filter_options.choice) {
if (!pack_to_stdout)
die(_("cannot use --filter without --stdout"));
if (stdin_packs)
die(_("cannot use --filter with --stdin-packs"));
}
if (stdin_packs && filter_options.choice)
die(_("cannot use --filter with --stdin-packs"));

if (stdin_packs && use_internal_rev_list)
die(_("cannot use internal rev list with --stdin-packs"));
164 changes: 122 additions & 42 deletions builtin/repack.c
@@ -21,6 +21,7 @@
#include "pack.h"
#include "pack-bitmap.h"
#include "refs.h"
#include "list-objects-filter-options.h"

#define ALL_INTO_ONE 1
#define LOOSEN_UNREACHABLE 2
@@ -56,6 +57,7 @@ struct pack_objects_args {
int no_reuse_object;
int quiet;
int local;
struct list_objects_filter_options filter_options;
};

static int repack_config(const char *var, const char *value,
@@ -806,6 +808,86 @@ static void remove_redundant_bitmaps(struct string_list *include,
strbuf_release(&path);
}

static int finish_pack_objects_cmd(struct child_process *cmd,
struct string_list *names,
int local)
{
FILE *out;
struct strbuf line = STRBUF_INIT;

out = xfdopen(cmd->out, "r");
while (strbuf_getline_lf(&line, out) != EOF) {
struct string_list_item *item;

if (line.len != the_hash_algo->hexsz)
die(_("repack: Expecting full hex object ID lines only "
"from pack-objects."));
/*
* Avoid putting packs written outside of the repository in the
* list of names.
*/
if (local) {
item = string_list_append(names, line.buf);
item->util = populate_pack_exts(line.buf);
}
}
fclose(out);

strbuf_release(&line);

return finish_command(cmd);
}

static int write_filtered_pack(const struct pack_objects_args *args,
const char *destination,
const char *pack_prefix,
struct existing_packs *existing,
struct string_list *names)
{
struct child_process cmd = CHILD_PROCESS_INIT;
struct string_list_item *item;
FILE *in;
int ret;
const char *caret;
const char *scratch;
int local = skip_prefix(destination, packdir, &scratch);

prepare_pack_objects(&cmd, args, destination);

strvec_push(&cmd.args, "--stdin-packs");

if (!pack_kept_objects)
strvec_push(&cmd.args, "--honor-pack-keep");
for_each_string_list_item(item, &existing->kept_packs)
strvec_pushf(&cmd.args, "--keep-pack=%s", item->string);

cmd.in = -1;

ret = start_command(&cmd);
if (ret)
return ret;

/*
* Here 'names' contains only the pack(s) that were just
* written, which is exactly the packs we want to keep. Also
* 'existing_kept_packs' already contains the packs in
* 'keep_pack_list'.
*/
in = xfdopen(cmd.in, "w");
for_each_string_list_item(item, names)
fprintf(in, "^%s-%s.pack\n", pack_prefix, item->string);
for_each_string_list_item(item, &existing->non_kept_packs)
fprintf(in, "%s.pack\n", item->string);
for_each_string_list_item(item, &existing->cruft_packs)
fprintf(in, "%s.pack\n", item->string);
caret = pack_kept_objects ? "" : "^";
for_each_string_list_item(item, &existing->kept_packs)
fprintf(in, "%s%s.pack\n", caret, item->string);
fclose(in);

return finish_pack_objects_cmd(&cmd, names, local);
}

static int write_cruft_pack(const struct pack_objects_args *args,
const char *destination,
const char *pack_prefix,
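
The list that `write_filtered_pack()` writes to the standard input of `pack-objects --stdin-packs` can be sketched in isolation as below. With `--stdin-packs`, a line starting with `^` names a pack whose objects are excluded from the new pack, so excluding the pack(s) just written leaves only the filtered-out objects. The `struct name_list` type and the function name `write_filtered_pack_input()` are hypothetical simplifications of Git's string lists, introduced only for this sketch.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical, simplified replacement for Git's struct string_list. */
struct name_list {
	const char **items;
	size_t nr;
};

/*
 * Emit one pack name per line in the same order and format as
 * write_filtered_pack(): freshly written packs are excluded ('^'),
 * existing non-kept and cruft packs are included, and kept packs are
 * excluded unless pack_kept_objects is set.
 */
void write_filtered_pack_input(FILE *in, const char *pack_prefix,
			       const struct name_list *just_written,
			       const struct name_list *non_kept,
			       const struct name_list *cruft,
			       const struct name_list *kept,
			       int pack_kept_objects)
{
	const char *caret = pack_kept_objects ? "" : "^";
	size_t i;

	for (i = 0; i < just_written->nr; i++)
		fprintf(in, "^%s-%s.pack\n", pack_prefix, just_written->items[i]);
	for (i = 0; i < non_kept->nr; i++)
		fprintf(in, "%s.pack\n", non_kept->items[i]);
	for (i = 0; i < cruft->nr; i++)
		fprintf(in, "%s.pack\n", cruft->items[i]);
	for (i = 0; i < kept->nr; i++)
		fprintf(in, "%s%s.pack\n", caret, kept->items[i]);
}
```

In the real function the first list holds only the hash part of each new pack name, which is why `pack_prefix` is prepended there, while the entries for existing packs are already complete basenames.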
@@ -814,9 +896,8 @@ static int write_cruft_pack(const struct pack_objects_args *args,
struct existing_packs *existing)
{
struct child_process cmd = CHILD_PROCESS_INIT;
struct strbuf line = STRBUF_INIT;
struct string_list_item *item;
FILE *in, *out;
FILE *in;
int ret;
const char *scratch;
int local = skip_prefix(destination, packdir, &scratch);
@@ -861,27 +942,18 @@ static int write_cruft_pack(const struct pack_objects_args *args,
fprintf(in, "%s.pack\n", item->string);
fclose(in);

out = xfdopen(cmd.out, "r");
while (strbuf_getline_lf(&line, out) != EOF) {
struct string_list_item *item;

if (line.len != the_hash_algo->hexsz)
die(_("repack: Expecting full hex object ID lines only "
"from pack-objects."));
/*
* avoid putting packs written outside of the repository in the
* list of names
*/
if (local) {
item = string_list_append(names, line.buf);
item->util = populate_pack_exts(line.buf);
}
}
fclose(out);

strbuf_release(&line);
return finish_pack_objects_cmd(&cmd, names, local);
}

return finish_command(&cmd);
static const char *find_pack_prefix(const char *packdir, const char *packtmp)
{
const char *pack_prefix;
if (!skip_prefix(packtmp, packdir, &pack_prefix))
die(_("pack prefix %s does not begin with objdir %s"),
packtmp, packdir);
if (*pack_prefix == '/')
pack_prefix++;
return pack_prefix;
}

int cmd_repack(int argc, const char **argv, const char *prefix)
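
The prefix computation factored out into `find_pack_prefix()` can be tried standalone with the sketch below. It assumes a hypothetical `strip_prefix()` helper standing in for Git's `skip_prefix()`, and it returns NULL where the real function calls `die()`. The name `pack_prefix_or_null()` is also illustrative only.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for Git's skip_prefix(): if `str` starts with
 * `prefix`, store the remainder in `*out` and return 1; otherwise
 * return 0 and leave `*out` untouched.
 */
int strip_prefix(const char *str, const char *prefix, const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Same logic as find_pack_prefix(): drop the pack directory and one
 * leading '/' from the temporary pack path. Returns NULL instead of
 * dying when packtmp does not live under packdir.
 */
const char *pack_prefix_or_null(const char *packdir, const char *packtmp)
{
	const char *pack_prefix;

	if (!strip_prefix(packtmp, packdir, &pack_prefix))
		return NULL;
	if (*pack_prefix == '/')
		pack_prefix++;
	return pack_prefix;
}
```

The same prefix test on `destination` is what `write_filtered_pack()` and `write_cruft_pack()` use to compute `local`, which decides whether a newly written pack is recorded in `names`.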
@@ -891,10 +963,8 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
struct string_list names = STRING_LIST_INIT_DUP;
struct existing_packs existing = EXISTING_PACKS_INIT;
struct pack_geometry geometry = { 0 };
struct strbuf line = STRBUF_INIT;
struct tempfile *refs_snapshot = NULL;
int i, ext, ret;
FILE *out;
int show_progress;

/* variables to be filled by option parsing */
@@ -907,6 +977,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
int write_midx = 0;
const char *cruft_expiration = NULL;
const char *expire_to = NULL;
const char *filter_to = NULL;

struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, &pack_everything,
@@ -948,6 +1019,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
N_("limits the maximum number of threads")),
OPT_STRING(0, "max-pack-size", &po_args.max_pack_size, N_("bytes"),
N_("maximum size of each packfile")),
OPT_PARSE_LIST_OBJECTS_FILTER(&po_args.filter_options),
OPT_BOOL(0, "pack-kept-objects", &pack_kept_objects,
N_("repack objects in packs marked with .keep")),
OPT_STRING_LIST(0, "keep-pack", &keep_pack_list, N_("name"),
@@ -958,9 +1030,13 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
N_("write a multi-pack index of the resulting packs")),
OPT_STRING(0, "expire-to", &expire_to, N_("dir"),
N_("pack prefix to store a pack containing pruned objects")),
OPT_STRING(0, "filter-to", &filter_to, N_("dir"),
N_("pack prefix to store a pack containing filtered out objects")),
OPT_END()
};

list_objects_filter_init(&po_args.filter_options);

git_config(repack_config, &cruft_po_args);

argc = parse_options(argc, argv, prefix, builtin_repack_options,
@@ -1101,6 +1177,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
strvec_push(&cmd.args, "--incremental");
}

if (po_args.filter_options.choice)
strvec_pushf(&cmd.args, "--filter=%s",
expand_list_objects_filter_spec(&po_args.filter_options));
else if (filter_to)
die(_("option '%s' can only be used along with '%s'"), "--filter-to", "--filter");

if (geometry.split_factor)
cmd.in = -1;
else
@@ -1124,31 +1206,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
fclose(in);
}

out = xfdopen(cmd.out, "r");
while (strbuf_getline_lf(&line, out) != EOF) {
struct string_list_item *item;

if (line.len != the_hash_algo->hexsz)
die(_("repack: Expecting full hex object ID lines only from pack-objects."));
item = string_list_append(&names, line.buf);
item->util = populate_pack_exts(item->string);
}
strbuf_release(&line);
fclose(out);
ret = finish_command(&cmd);
ret = finish_pack_objects_cmd(&cmd, &names, 1);
if (ret)
goto cleanup;

if (!names.nr && !po_args.quiet)
printf_ln(_("Nothing new to pack."));

if (pack_everything & PACK_CRUFT) {
const char *pack_prefix;
if (!skip_prefix(packtmp, packdir, &pack_prefix))
die(_("pack prefix %s does not begin with objdir %s"),
packtmp, packdir);
if (*pack_prefix == '/')
pack_prefix++;
const char *pack_prefix = find_pack_prefix(packdir, packtmp);

if (!cruft_po_args.window)
cruft_po_args.window = po_args.window;
@@ -1203,6 +1269,19 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
}
}

if (po_args.filter_options.choice) {
if (!filter_to)
filter_to = packtmp;

ret = write_filtered_pack(&po_args,
filter_to,
find_pack_prefix(packdir, packtmp),
&existing,
&names);
if (ret)
goto cleanup;
}

string_list_sort(&names);

close_object_store(the_repository->objects);
@@ -1295,6 +1374,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
string_list_clear(&names, 1);
existing_packs_release(&existing);
free_pack_geometry(&geometry);
list_objects_filter_release(&po_args.filter_options);

return ret;
}