git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Son Luong Ngoc <sluongng@gmail.com>, Taylor Blau <me@ttaylorr.com>
Cc: git <git@vger.kernel.org>, dstolee@microsoft.com
Subject: Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
Date: Tue, 25 Aug 2020 08:45:50 -0400	[thread overview]
Message-ID: <4caa4d87-d5f8-6f34-025c-0f23506a19fb@gmail.com> (raw)
In-Reply-To: <CB6B70D3-5FC6-43FE-8460-33F6CFC123E6@gmail.com>

On 8/25/2020 3:55 AM, Son Luong Ngoc wrote:
> Hi Taylor,
> 
> Thanks for working on this.
> 
>> On Aug 25, 2020, at 04:01, Taylor Blau <me@ttaylorr.com> wrote:
>>
>> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
>> learned to remove a multi-pack-index file if it added or removed a pack
>> from the object store.
>>
>> This mechanism is a little over-eager, since it is only necessary to
>> drop a MIDX if 'git repack' removes a pack that the MIDX references.
>> Adding a pack outside of the MIDX does not require invalidating the
>> MIDX, and likewise for removing a pack the MIDX does not know about.
> 
> I wonder if its worth to trigger write_midx_file() to update the midx instead of
> just removing MIDX?

It would be reasonable to have 'git repack' run write_midx_file() after updating
the pack-files, but it still needs to delete the existing multi-pack-index after
deleting a referenced pack, as the current multi-pack-index will be incorrect,
leading to possibly invalid data during the write. 'git multi-pack-index expire'
carefully handles deleting pack-files that have become redundant.

It may be possible to change the behavior of write_midx_file() to check for the
state of each referenced pack-file, but it would then need to re-scan the
pack-indexes to rebuild the set of objects and which pack-files contain them.

> That is already the direction we are taking in the 'maintenance' patch series
> whenever the multi-pack-index file was deemed invalid.

The 'maintenance' builtin definitely adds new MIDX-aware maintenance tasks.
By performing a repack using the multi-pack-index as the starting point, we
can get around these issues.

One thing I've noticed by talking with Taylor about this change is that
'git repack' is a bit like 'git gc' in that it does a LOT of things and it
can be hard to understand exactly when certain sub-tasks are run or not.
There are likely some interesting operations that could be separated into
maintenance tasks. For example, 'git repack -d' is very similar to
'git maintenance run --task=loose-objects'.

> Or perhaps, we can check for 'core.multiPackIndex' value (which recently is 
> 'true' by default) and determine whether we should remove the MIDX or rewrite
> it?

We currently rewrite it during 'git repack' when GIT_TEST_MULTI_PACK_INDEX=1
to maximize how often we have a multi-pack-index in the test suite. It would
be simple to update that to also check a config option. I don't think adding
that behavior to 'core.multiPackIndex' is a good direction, though.

Thanks,
-Stolee

  reply	other threads:[~2020-08-25 12:45 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-25  2:01 [PATCH] builtin/repack.c: invalidate MIDX only when necessary Taylor Blau
2020-08-25  2:26 ` Jeff King
2020-08-25  2:37   ` Taylor Blau
2020-08-25 13:14     ` Derrick Stolee
2020-08-25 14:41       ` Taylor Blau
2020-08-25 15:14         ` Derrick Stolee
2020-08-25 15:42           ` Taylor Blau
2020-08-25 16:56             ` Jeff King
2020-08-25 15:58   ` Junio C Hamano
2020-08-25 16:08     ` Taylor Blau
2020-08-25 16:18     ` Derrick Stolee
2020-08-25 17:34       ` Jeff King
2020-08-25 17:22     ` Jeff King
2020-08-25 18:05       ` Junio C Hamano
2020-08-25 18:27         ` Jeff King
2020-08-25 22:45           ` [PATCH] pack-redundant: gauge the usage before proposing its removal Junio C Hamano
2020-08-25 23:09             ` Taylor Blau
2020-08-25 23:22               ` Junio C Hamano
2020-08-26  1:17             ` [PATCH v1 0/3] War on dashed-git Junio C Hamano
2020-08-26  1:17               ` [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form Junio C Hamano
2020-08-26  1:24                 ` Eric Sunshine
2020-08-26  7:55                   ` Johannes Schindelin
2020-08-26 16:27                   ` Junio C Hamano
2020-08-26  1:17               ` [PATCH v1 2/3] cvsexportcommit: do not run git programs " Junio C Hamano
2020-08-26  1:28                 ` Eric Sunshine
2020-08-26  1:42                   ` Junio C Hamano
2020-08-26 16:08                   ` Junio C Hamano
2020-08-26 16:28                     ` Junio C Hamano
2020-08-26  8:02                 ` Johannes Schindelin
2020-08-26  1:17               ` [PATCH v1 3/3] git: catch an attempt to run "git-foo" Junio C Hamano
2020-08-26  1:19                 ` Junio C Hamano
2020-08-26  8:06                 ` Johannes Schindelin
2020-08-26 16:30                   ` Junio C Hamano
2020-08-28  2:13                 ` Johannes Schindelin
2020-08-28 22:03                   ` Junio C Hamano
2020-08-31  9:59                     ` Johannes Schindelin
2020-08-31 17:45                       ` Junio C Hamano
2020-12-20 15:25                   ` Johannes Schindelin
2020-12-21 22:24                     ` Junio C Hamano
2020-12-30  5:30                       ` Johannes Schindelin
2020-08-26  8:09               ` [PATCH v1 0/3] War on dashed-git Johannes Schindelin
2020-08-26 16:45                 ` Junio C Hamano
2020-08-26 19:46                   ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Junio C Hamano
2020-08-26 19:46                     ` [PATCH v2 1/2] transport-helper: do not run git-remote-ext etc. in " Junio C Hamano
2020-08-26 19:46                     ` [PATCH v2 2/2] cvsexportcommit: do not run git programs " Junio C Hamano
2020-08-26 21:37                       ` [PATCH v2 3/2] credential-cache: use child_process.args Junio C Hamano
2020-08-26 22:25                         ` [PATCH] run_command: teach API users to use embedded 'args' more Junio C Hamano
2020-08-27  4:21                           ` Jeff King
2020-08-27  4:30                             ` Junio C Hamano
2020-08-27  4:31                             ` Eric Sunshine
2020-08-27  4:44                               ` Jeff King
2020-08-27  5:03                                 ` Eric Sunshine
2020-08-27  5:25                                   ` [PATCH] worktree: fix leak in check_clean_worktree() Jeff King
2020-08-27  5:56                                     ` Eric Sunshine
2020-08-27 15:31                                       ` Junio C Hamano
2020-08-27  4:13                         ` [PATCH v2 3/2] credential-cache: use child_process.args Jeff King
2020-08-27  4:22                           ` Jeff King
2020-08-27  4:31                           ` Junio C Hamano
2020-08-27  4:14                         ` Jeff King
2020-08-27 15:34                           ` Junio C Hamano
2020-08-31 22:56                         ` Junio C Hamano
2020-09-01  4:49                           ` Jeff King
2020-09-01 16:11                             ` Junio C Hamano
2020-08-27  0:57                     ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Derrick Stolee
2020-08-27  1:22                       ` Junio C Hamano
2020-08-28  9:14             ` [PATCH] pack-redundant: gauge the usage before proposing its removal Jeff King
2020-08-28 22:45               ` Junio C Hamano
2020-08-25  7:55 ` [PATCH] builtin/repack.c: invalidate MIDX only when necessary Son Luong Ngoc
2020-08-25 12:45   ` Derrick Stolee [this message]
2020-08-25 14:45   ` Taylor Blau
2020-08-25 16:04     ` [PATCH v2] " Taylor Blau
2020-08-26 20:51       ` Derrick Stolee
2020-08-26 20:54         ` Junio C Hamano
2020-08-25 16:47     ` [PATCH] " Jeff King
2020-08-25 17:10       ` Derrick Stolee
2020-08-25 17:29         ` Jeff King
2020-08-25 17:34           ` Taylor Blau
2020-08-25 17:42             ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4caa4d87-d5f8-6f34-025c-0f23506a19fb@gmail.com \
    --to=stolee@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=sluongng@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).