From: Jonathan Tan <jonathantanmy@google.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: git@vger.kernel.org, dstolee@microsoft.com,
	git@jeffhostetler.com, peff@peff.net, gitster@pobox.com,
	Johannes.Shindelin@gmx.de, jrnieder@gmail.com
Subject: Re: [RFC PATCH 01/18] docs: Multi-Pack Index (MIDX) Design Notes
Date: Mon, 8 Jan 2018 11:32:26 -0800
Message-ID: <20180108113226.da265814e5c1deea1f8c404d@google.com>
In-Reply-To: <20180107181459.222909-2-dstolee@microsoft.com>

On Sun,  7 Jan 2018 13:14:42 -0500
Derrick Stolee <stolee@gmail.com> wrote:

> +Design Details
> +--------------
> +
> +- The MIDX file refers only to packfiles in the same directory
> +  as the MIDX file.
> +
> +- A special file, 'midx-head', stores the hash of the latest
> +  MIDX file so we can load the file without performing a dirstat.
> +  This file is especially important with incremental MIDX files,
> +  pointing to the newest file.

I presume that the actual MIDX files are named by hash? (You might have
written this somewhere that I somehow missed.)

Also, I notice that the "Future Work" section raises the possibility of
multiple MIDX files. Could this 'midx-head' file be allowed to list the
hashes of multiple such files? That way, we avoid a bit of file format
churn (in that we won't need to define a new "chunk" in the future).
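
To illustrate the question, here is a minimal sketch in Python (not Git's
C code) of reading a 'midx-head' that lists several hashes. The layout is
purely an assumption for discussion: one 40-character lowercase hex SHA-1
per line. The patch series does not define this, and `parse_midx_head` is
a hypothetical name.

```python
HEX_DIGITS = set("0123456789abcdef")


def parse_midx_head(text):
    """Return the hashes listed in a hypothetical multi-entry midx-head.

    Assumes one 40-character lowercase hex SHA-1 per line; blank lines
    are ignored. A file holding a single hash is the one-entry case.
    """
    hashes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if len(line) != 40 or not set(line) <= HEX_DIGITS:
            raise ValueError(f"not a SHA-1 hex hash: {line!r}")
        hashes.append(line)
    return hashes
```

Under this assumed layout, today's single-hash file would already be
valid, so readers would not need a format change later.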

> +- If a packfile exists in the pack directory but is not referenced
> +  by the MIDX file, then the packfile is loaded into the packed_git
> +  list and Git can access the objects as usual. This behavior is
> +  necessary since other tools could add packfiles to the pack
> +  directory without notifying Git.
> +
> +- The MIDX file should be only a supplemental structure. If a
> +  user downgrades or disables the `core.midx` config setting,
> +  then the existing .idx and .pack files should be sufficient
> +  to operate correctly.

Let me try to summarize: at this point, there are no
backwards-incompatible changes to the repo disk format. Code paths that
have not been updated (and old versions of Git) can just read the .idx
and .pack files, as always. Updated code paths will look at the .midx
and .idx files, and will sort them as follows:
 - .midx files go into a data structure
 - .idx files not referenced by any .midx files go into the
   existing packed_git data structure
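
The sorting above could be sketched roughly as follows. This is Python
for brevity, not Git's actual C code; `sort_pack_indexes` and the
dictionary that maps each .midx to the .idx files it references are
hypothetical stand-ins for whatever the real data structures are.

```python
def sort_pack_indexes(idx_names, midx_to_packs):
    """Split pack indexes into .midx-covered ones and the rest.

    idx_names: names of the .idx files found in the pack directory.
    midx_to_packs: hypothetical mapping from a .midx file name to the
        .idx names it references.
    Returns (midx_list, packed_git), both sorted by name.
    """
    covered = set()
    for packs in midx_to_packs.values():
        covered.update(packs)
    # .midx files go into their own data structure.
    midx_list = sorted(midx_to_packs)
    # Only .idx files that no .midx references stay in packed_git.
    packed_git = sorted(n for n in idx_names if n not in covered)
    return midx_list, packed_git
```

For example, with pack-a and pack-b referenced by one .midx and pack-c
added later by another tool, only pack-c.idx ends up in packed_git.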

A writer can either merely write a new packfile (like old versions of
Git) or write a packfile and update the .midx file, and everything above
will still work. If a writer deletes an existing packfile referenced by
a .midx (for example, old versions of Git during a repack), we lose the
advantages of the .midx file. We will detect that the .midx no longer
works when we try to read an object using the pack information it
records, but in that case we can recover by dropping the .midx file and
loading all the .idx files it references that still exist.
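
A rough sketch of that recovery path, again in Python rather than Git's
C code and with hypothetical names (`load_with_fallback`, the
`midx_to_packs` mapping). For simplicity it checks for missing packs up
front; the recovery described above would instead be triggered lazily,
when an object read fails.

```python
def load_with_fallback(existing_idx, midx_to_packs):
    """Drop any .midx that references a deleted pack.

    existing_idx: names of the .idx files still present on disk.
    midx_to_packs: hypothetical mapping from a .midx file name to the
        .idx names it references.
    Returns (usable_midx, packed_git), both sorted by name.
    """
    existing = set(existing_idx)
    usable = {}
    for midx, packs in midx_to_packs.items():
        # A .midx is only usable if every pack it references survives.
        if all(p in existing for p in packs):
            usable[midx] = packs
    covered = {p for packs in usable.values() for p in packs}
    # Surviving packs of a dropped .midx fall back to packed_git,
    # together with packs that no .midx ever referenced.
    packed_git = sorted(existing - covered)
    return sorted(usable), packed_git
```

Correctness is preserved either way: a dropped .midx only costs the
lookup speedup, because every surviving pack is still reachable through
its own .idx file.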

As a reviewer, I think this is a very good approach, and this does make
things easier to review (as opposed to, say, an approach where a lot of
the code must be aware of .midx files).
