git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Johannes.Schindelin@gmx.de,
	sandals@crustytoothpaste.net, steadmon@google.com,
	jrnieder@gmail.com, peff@peff.net, congdanhqx@gmail.com,
	phillip.wood123@gmail.com, emilyshaffer@google.com,
	sluongng@gmail.com, jonathantanmy@google.com,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH v2 10/18] maintenance: add incremental-repack task
Date: Thu, 23 Jul 2020 15:00:11 -0700	[thread overview]
Message-ID: <xmqqeep1q4d0.fsf@gitster.c.googlers.com> (raw)
In-Reply-To: <b6328c210625e1ba98e2065208a2a478c2c64f94.1595527000.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Thu, 23 Jul 2020 17:56:32 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> 1. 'git multi-pack-index write' creates a multi-pack-index file if
>    one did not exist, and otherwise will update the multi-pack-index
>    with any new pack-files that appeared since the last write. This
>    is particularly relevant with the background fetch job.
>
>    When the multi-pack-index sees two copies of the same object, it
>    stores the offset data into the newer pack-file. This means that
>    some old pack-files could become "unreferenced" which I will use
>    to mean "a pack-file that is in the pack-file list of the
>    multi-pack-index but none of the objects in the multi-pack-index
>    reference a location inside that pack-file."

An obvious alternative is to favor the copy in the older pack,
right?  Is the expectation that over time, most of the objects that
are relevant would reappear in newer packs, so that eventually by
favoring the copies in the newer packs, we can retire and remove the
old pack, keeping only the newer ones?

But would that assumption hold?  The old packs hold objects that are
necessary for the older parts of the history, so unless you are
cauterizing away the old history, these objects in the older packs
are likely to stay with us longer than those used by the newer parts
of the history, some of which may not even have been pushed out yet
and can be rebased away?

> 2. 'git multi-pack-index expire' deletes any unreferenced pack-files
>    and updaes the multi-pack-index to drop those pack-files from the
>    list. This is safe to do as concurrent Git processes will see the
>    multi-pack-index and not open those packs when looking for object
>    contents. (Similar to the 'loose-objects' job, there are some Git
>    commands that open pack-files regardless of the multi-pack-index,
>    but they are rarely used. Further, a user that self-selects to
>    use background operations would likely refrain from using those
>    commands.)

OK.

> 3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
>    of pack-files that are listed in the multi-pack-index and creates
>    a new pack-file containing the objects whose offsets are listed
>    by the multi-pack-index to be in those objects. The set of pack-
>    files is selected greedily by sorting the pack-files by modified
>    time and adding a pack-file to the set if its "expected size" is
>    smaller than the batch size until the total expected size of the
>    selected pack-files is at least the batch size. The "expected
>    size" is calculated by taking the size of the pack-file divided
>    by the number of objects in the pack-file and multiplied by the
>    number of objects from the multi-pack-index with offset in that
>    pack-file. The expected size approximats how much data from that

approximates.

>    pack-file will contribute to the resulting pack-file size. The
>    intention is that the resulting pack-file will be close in size
>    to the provided batch size.

> +static int maintenance_task_incremental_repack(void)
> +{
> +	if (multi_pack_index_write()) {
> +		error(_("failed to write multi-pack-index"));
> +		return 1;
> +	}
> +
> +	if (multi_pack_index_verify()) {
> +		warning(_("multi-pack-index verify failed after initial write"));
> +		return rewrite_multi_pack_index();
> +	}
> +
> +	if (multi_pack_index_expire()) {
> +		error(_("multi-pack-index expire failed"));
> +		return 1;
> +	}
> +
> +	if (multi_pack_index_verify()) {
> +		warning(_("multi-pack-index verify failed after expire"));
> +		return rewrite_multi_pack_index();
> +	}
> +	if (multi_pack_index_repack()) {
> +		error(_("multi-pack-index repack failed"));
> +		return 1;
> +	}

Hmph, I wonder if these warning should come from each helper
functions that are static to this function anyway.

It also makes it easier to reason about this function by eliminating
the need for having a different pattern only for the verify helper.
Instead, verify could call rewrite internally when it notices a
breakage.  I.e.

	if (multi_pack_index_write())
		return 1;
	if (multi_pack_index_verify("after initial write"))
		return 1;
	if (multi_pack_index_exire())
		return 1;
	...

Also, it feels odd, compared to our internal API convention, that
positive non-zero is used as an error here.

> +	return 0;
> +}
> +
>  typedef int maintenance_task_fn(void);
>  
>  struct maintenance_task {
> @@ -1037,6 +1152,10 @@ static void initialize_tasks(void)
>  	tasks[num_tasks]->fn = maintenance_task_loose_objects;
>  	num_tasks++;
>  
> +	tasks[num_tasks]->name = "incremental-repack";
> +	tasks[num_tasks]->fn = maintenance_task_incremental_repack;
> +	num_tasks++;
> +
>  	tasks[num_tasks]->name = "gc";
>  	tasks[num_tasks]->fn = maintenance_task_gc;
>  	tasks[num_tasks]->enabled = 1;

Exactly the same comment as 08/18 about natural/inherent ordering
applies here as well.

Thanks.

  reply	other threads:[~2020-07-23 22:00 UTC|newest]

Thread overview: 164+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-07 14:21 [PATCH 00/21] Maintenance builtin, allowing 'gc --auto' customization Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 01/21] gc: use the_repository less often Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 02/21] gc: use repository in too_many_loose_objects() Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 03/21] gc: use repo config Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 04/21] gc: drop the_repository in log location Derrick Stolee via GitGitGadget
2020-07-09  2:22   ` Jonathan Tan
2020-07-09 11:13     ` Derrick Stolee
2020-07-07 14:21 ` [PATCH 05/21] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 06/21] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 07/21] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 08/21] maintenance: initialize task array and hashmap Derrick Stolee via GitGitGadget
2020-07-09  2:25   ` Jonathan Tan
2020-07-09 13:15     ` Derrick Stolee
2020-07-09 13:51       ` Junio C Hamano
2020-07-07 14:21 ` [PATCH 09/21] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-07-09  2:29   ` Jonathan Tan
2020-07-09 11:14     ` Derrick Stolee
2020-07-09 22:52       ` Jeff King
2020-07-09 23:41         ` Derrick Stolee
2020-07-07 14:21 ` [PATCH 10/21] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 11/21] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 12/21] maintenance: add fetch task Derrick Stolee via GitGitGadget
2020-07-09  2:35   ` Jonathan Tan
2020-07-07 14:21 ` [PATCH 13/21] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 14/21] maintenance: add pack-files task Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 15/21] maintenance: auto-size pack-files batch Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 16/21] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 17/21] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 18/21] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 19/21] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 20/21] maintenance: add pack-files auto condition Derrick Stolee via GitGitGadget
2020-07-07 14:21 ` [PATCH 21/21] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-07-08 23:57 ` [PATCH 00/21] Maintenance builtin, allowing 'gc --auto' customization Emily Shaffer
2020-07-09 11:21   ` Derrick Stolee
2020-07-09 12:43     ` Derrick Stolee
2020-07-09 23:16       ` Jeff King
2020-07-09 23:45         ` Derrick Stolee
2020-07-10 18:46           ` Emily Shaffer
2020-07-10 19:30             ` Son Luong Ngoc
2020-07-09 14:05     ` Junio C Hamano
2020-07-09 15:54       ` Derrick Stolee
2020-07-09 16:26         ` Junio C Hamano
2020-07-09 16:56           ` Derrick Stolee
2020-07-23 17:56 ` [PATCH v2 00/18] " Derrick Stolee via GitGitGadget
2020-07-23 17:56   ` [PATCH v2 01/18] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-07-25  1:26     ` Taylor Blau
2020-07-25  1:47     ` Đoàn Trần Công Danh
2020-07-29 22:19     ` Jonathan Nieder
2020-07-30 13:12       ` Derrick Stolee
2020-07-31  0:30         ` Jonathan Nieder
2020-08-03 17:37           ` Derrick Stolee
2020-08-03 17:46             ` Jonathan Nieder
2020-08-03 22:46               ` Taylor Blau
2020-08-03 23:01                 ` Jonathan Nieder
2020-08-03 23:08                   ` Taylor Blau
2020-08-03 23:17                     ` Jonathan Nieder
2020-08-04  0:07                       ` Junio C Hamano
2020-08-04 13:32                       ` Derrick Stolee
2020-08-04 14:42                         ` Jonathan Nieder
2020-08-04 16:32                           ` Derrick Stolee
2020-08-04 17:02                             ` Jonathan Nieder
2020-08-04 17:51                               ` Derrick Stolee
2020-08-05 15:02               ` Derrick Stolee
2020-07-31 16:40         ` Jonathan Nieder
2020-08-03 23:52         ` Jonathan Nieder
2020-07-23 17:56   ` [PATCH v2 02/18] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-07-23 17:56   ` [PATCH v2 03/18] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-07-23 20:21     ` Junio C Hamano
2020-07-25  1:33       ` Taylor Blau
2020-07-30 13:29       ` Derrick Stolee
2020-07-30 13:31         ` Derrick Stolee
2020-07-30 19:00           ` Eric Sunshine
2020-07-30 20:21             ` Derrick Stolee
2020-07-23 17:56   ` [PATCH v2 04/18] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-07-23 19:57     ` Junio C Hamano
2020-07-24 12:23       ` Derrick Stolee
2020-07-24 12:51         ` Derrick Stolee
2020-07-24 19:39           ` Junio C Hamano
2020-07-25  1:46           ` Taylor Blau
2020-07-29 22:19     ` Emily Shaffer
2020-07-23 17:56   ` [PATCH v2 05/18] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-07-23 20:22     ` Junio C Hamano
2020-07-24 13:09       ` Derrick Stolee
2020-07-24 19:47         ` Junio C Hamano
2020-07-25  1:52           ` Taylor Blau
2020-07-30 13:59             ` Derrick Stolee
2020-07-29  0:22     ` Jeff King
2020-07-23 17:56   ` [PATCH v2 06/18] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-07-23 20:21     ` Junio C Hamano
2020-07-23 22:18       ` Junio C Hamano
2020-07-24 13:36       ` Derrick Stolee
2020-07-24 19:50         ` Junio C Hamano
2020-07-23 17:56   ` [PATCH v2 07/18] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-07-23 17:56   ` [PATCH v2 08/18] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-07-23 20:53     ` Junio C Hamano
2020-07-24 14:25       ` Derrick Stolee
2020-07-24 20:47         ` Junio C Hamano
2020-07-25  1:37     ` Đoàn Trần Công Danh
2020-07-25  1:48       ` Junio C Hamano
2020-07-27 14:07         ` Derrick Stolee
2020-07-27 16:13           ` Junio C Hamano
2020-07-27 18:27             ` Derrick Stolee
2020-07-28 16:37               ` [PATCH v2] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano
2020-07-29  9:12                 ` Phillip Wood
2020-07-29  9:17                   ` Phillip Wood
2020-07-30 15:17                 ` Derrick Stolee
2020-07-23 17:56   ` [PATCH v2 09/18] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-07-23 20:59     ` Junio C Hamano
2020-07-24 14:50       ` Derrick Stolee
2020-07-24 19:57         ` Junio C Hamano
2020-07-29 22:21     ` Emily Shaffer
2020-07-30 15:38       ` Derrick Stolee
2020-07-23 17:56   ` [PATCH v2 10/18] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-07-23 22:00     ` Junio C Hamano [this message]
2020-07-24 15:03       ` Derrick Stolee
2020-07-29 22:22     ` Emily Shaffer
2020-07-23 17:56   ` [PATCH v2 11/18] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-07-23 22:15     ` Junio C Hamano
2020-07-23 23:09       ` Eric Sunshine
2020-07-23 23:24         ` Junio C Hamano
2020-07-24 16:09           ` Derrick Stolee
2020-07-24 19:51       ` Derrick Stolee
2020-07-24 20:17         ` Junio C Hamano
2020-07-29 22:23     ` Emily Shaffer
2020-07-30 16:57       ` Derrick Stolee
2020-07-30 19:02         ` Derrick Stolee
2020-07-30 19:24           ` Chris Torek
2020-08-05 12:37           ` Đoàn Trần Công Danh
2020-08-06 13:54             ` Derrick Stolee
2020-07-23 17:56   ` [PATCH v2 12/18] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-07-23 17:56   ` [PATCH v2 13/18] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-07-23 17:56   ` [PATCH v2 14/18] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-07-23 17:56   ` [PATCH v2 15/18] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-07-23 17:56   ` [PATCH v2 16/18] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-07-23 17:56   ` [PATCH v2 17/18] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-07-23 17:56   ` [PATCH v2 18/18] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-07-29 22:03   ` [PATCH v2 00/18] Maintenance builtin, allowing 'gc --auto' customization Emily Shaffer
2020-07-30 22:24   ` [PATCH v3 00/20] " Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 01/20] maintenance: create basic maintenance runner Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 02/20] maintenance: add --quiet option Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 03/20] maintenance: replace run_auto_gc() Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 04/20] maintenance: initialize task array Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 05/20] maintenance: add commit-graph task Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 06/20] maintenance: add --task option Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 07/20] maintenance: take a lock on the objects directory Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 08/20] fetch: optionally allow disabling FETCH_HEAD update Junio C Hamano via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 09/20] maintenance: add prefetch task Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 10/20] maintenance: add loose-objects task Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 11/20] midx: enable core.multiPackIndex by default Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 12/20] maintenance: add incremental-repack task Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 13/20] maintenance: auto-size incremental-repack batch Derrick Stolee via GitGitGadget
2020-07-30 23:36       ` Chris Torek
2020-08-03 17:43         ` Derrick Stolee
2020-07-30 22:24     ` [PATCH v3 14/20] maintenance: create maintenance.<task>.enabled config Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 15/20] maintenance: use pointers to check --auto Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 16/20] maintenance: add auto condition for commit-graph task Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 17/20] maintenance: create auto condition for loose-objects Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 18/20] maintenance: add incremental-repack auto condition Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 19/20] midx: use start_delayed_progress() Derrick Stolee via GitGitGadget
2020-07-30 22:24     ` [PATCH v3 20/20] maintenance: add trace2 regions for task execution Derrick Stolee via GitGitGadget
2020-07-30 23:06     ` [PATCH v3 00/20] Maintenance builtin, allowing 'gc --auto' customization Junio C Hamano
2020-07-30 23:31     ` Junio C Hamano
2020-07-31  2:58       ` Junio C Hamano
2020-08-06 17:58     ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqeep1q4d0.fsf@gitster.c.googlers.com \
    --to=gitster@pobox.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=congdanhqx@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=sandals@crustytoothpaste.net \
    --cc=sluongng@gmail.com \
    --cc=steadmon@google.com \
    --subject='Re: [PATCH v2 10/18] maintenance: add incremental-repack task' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).