All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] read-cache: update index format default to v4
@ 2018-09-24 21:15 Derrick Stolee via GitGitGadget
  2018-09-24 21:15 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
  0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-24 21:15 UTC (permalink / raw)
  To: git; +Cc: pclouds, peartben, git, Junio C Hamano

After discussing this with several people off-list, I thought I would open
the question up to the list:

Should we update the default index version to v4?

The .git/index file stores the list of every path tracked by Git in the
working directory, including merge information, staged paths, and
information about the file system contents (based on modified time). The
only major update in v4 is that the paths are prefix-compressed. This
compression works best in repos with a lot of paths, especially deep paths.
For this reason, we set the index to v4 in VFS for Git.

Among VFS for Git contributors, we were talking about how the v4 format is
not covered by the test suite by default. We are working to increase the
number of CI builds that set extra GIT_TEST_* variables that we need.
However, I thought it worth having a discussion of whether this is a good
thing to recommend for all users of Git.

Personally, I'm not an expert here, but I am happy to start the
conversation.

Thanks, -Stolee

Derrick Stolee (1):
  read-cache: update index format default to v4

 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: 53f9a3e157dbbc901a02ac2c73346d375e24978c
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-41%2Fderrickstolee%2Findex-v4-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-41/derrickstolee/index-v4-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/41
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/1] read-cache: update index format default to v4
  2018-09-24 21:15 [PATCH 0/1] read-cache: update index format default to v4 Derrick Stolee via GitGitGadget
@ 2018-09-24 21:15 ` Derrick Stolee via GitGitGadget
  2018-09-24 21:32   ` SZEDER Gábor
  0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-24 21:15 UTC (permalink / raw)
  To: git; +Cc: pclouds, peartben, git, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The index v4 format has been available since 2012 with 9d22778
"reach-cache.c: write prefix-compressed names in the index". Since
the format has been stable for so long, almost all versions of Git
in use today understand version 4, removing one barrier to upgrade
-- that someone may want to downgrade and needs a working repo.

Despite being stable for a long time, this index version was never
adopted as the default. This prefix-compressed version of the format
can get significant space savings on repos with large working
directories (which naturally tend to have deep nesting). This version
is set as the default for some external tools, such as VFS for Git.
Because of this external use, the format has had a lot of "testing in
production" and also is subject to continuous integration in these
environments.

Previously, to test version 4 indexes, we needed to run the test
suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).

One potential, but short-term, downside is that we lose coverage of
the version 3 indexes. The trade-off is that we may want to cover
that version using GIT_TEST_INDEX_VERSION=3.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 372588260e..af6c8f2a67 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
  * Index File I/O
  *****************************************************************/
 
-#define INDEX_FORMAT_DEFAULT 3
+#define INDEX_FORMAT_DEFAULT 4
 
 static unsigned int get_index_format_default(void)
 {
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] read-cache: update index format default to v4
  2018-09-24 21:15 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
@ 2018-09-24 21:32   ` SZEDER Gábor
  2018-09-24 22:29     ` Junio C Hamano
  2018-09-25  7:06     ` Patrick Steinhardt
  0 siblings, 2 replies; 9+ messages in thread
From: SZEDER Gábor @ 2018-09-24 21:32 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, pclouds, peartben, git, Junio C Hamano, Derrick Stolee

On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The index v4 format has been available since 2012 with 9d22778
> "reach-cache.c: write prefix-compressed names in the index". Since
> the format has been stable for so long, almost all versions of Git
> in use today understand version 4, removing one barrier to upgrade
> -- that someone may want to downgrade and needs a working repo.

What about alternative implementations, like JGit, libgit2, etc.?

> Despite being stable for a long time, this index version was never
> adopted as the default. This prefix-compressed version of the format
> can get significant space savings on repos with large working
> directories (which naturally tend to have deep nesting). This version
> is set as the default for some external tools, such as VFS for Git.
> Because of this external use, the format has had a lot of "testing in
> production" and also is subject to continuous integration in these
> environments.
> 
> Previously, to test version 4 indexes, we needed to run the test
> suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).
> 
> One potential, but short-term, downside is that we lose coverage of
> the version 3 indexes. The trade-off is that we may want to cover
> that version using GIT_TEST_INDEX_VERSION=3.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  read-cache.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index 372588260e..af6c8f2a67 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
>   * Index File I/O
>   *****************************************************************/
>  
> -#define INDEX_FORMAT_DEFAULT 3
> +#define INDEX_FORMAT_DEFAULT 4
>  
>  static unsigned int get_index_format_default(void)
>  {
> -- 
> gitgitgadget

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] read-cache: update index format default to v4
  2018-09-24 21:32   ` SZEDER Gábor
@ 2018-09-24 22:29     ` Junio C Hamano
  2018-09-25  7:06     ` Patrick Steinhardt
  1 sibling, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2018-09-24 22:29 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Derrick Stolee via GitGitGadget, git, pclouds, peartben, git,
	Derrick Stolee

SZEDER Gábor <szeder.dev@gmail.com> writes:

> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>> 
>> The index v4 format has been available since 2012 with 9d22778
>> "reach-cache.c: write prefix-compressed names in the index". Since
>> the format has been stable for so long, almost all versions of Git
>> in use today understand version 4, removing one barrier to upgrade
>> -- that someone may want to downgrade and needs a working repo.
>
> What about alternative implementations, like JGit, libgit2, etc.?

Good question.

Because the index-version of an index file is designed to be sticky,
repos that need to be accessed by other implementations can keep
whatever current version.  A new repo that need to be accessed by
them can be (forcibly) written in v2 and keep its v2ness, I would
think.

And that would serve as an incentive for the implementations to
catch up ;-)




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] read-cache: update index format default to v4
  2018-09-24 21:32   ` SZEDER Gábor
  2018-09-24 22:29     ` Junio C Hamano
@ 2018-09-25  7:06     ` Patrick Steinhardt
  2018-09-25 14:29       ` Derrick Stolee
  1 sibling, 1 reply; 9+ messages in thread
From: Patrick Steinhardt @ 2018-09-25  7:06 UTC (permalink / raw)
  To: SZEDER Gábor
  Cc: Derrick Stolee via GitGitGadget, git, pclouds, peartben, git,
	Junio C Hamano, Derrick Stolee

[-- Attachment #1: Type: text/plain, Size: 2704 bytes --]

On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> > From: Derrick Stolee <dstolee@microsoft.com>
> > 
> > The index v4 format has been available since 2012 with 9d22778
> > "reach-cache.c: write prefix-compressed names in the index". Since
> > the format has been stable for so long, almost all versions of Git
> > in use today understand version 4, removing one barrier to upgrade
> > -- that someone may want to downgrade and needs a working repo.
> 
> What about alternative implementations, like JGit, libgit2, etc.?

Speaking of libgit2, we are able to read and write index v4 since
commit c1b370e93 (Merge pull request #3837 from
novalis/dturner/indexv4, 2016-08-17), released with v0.25 in
December 2016. Due to insufficient tests, our read support was
initially broken, which got fixed with commit 3bc95cfe3 (Merge
pull request #4236 from pks-t/pks/index-v4-fixes, 2017-06-07) and
released with v0.26 in June 2017.

Right now I'm not aware of additional bugs in our index v4
support, so we'd be fine with changing the default.

Patrick

> > Despite being stable for a long time, this index version was never
> > adopted as the default. This prefix-compressed version of the format
> > can get significant space savings on repos with large working
> > directories (which naturally tend to have deep nesting). This version
> > is set as the default for some external tools, such as VFS for Git.
> > Because of this external use, the format has had a lot of "testing in
> > production" and also is subject to continuous integration in these
> > environments.
> > 
> > Previously, to test version 4 indexes, we needed to run the test
> > suite with GIT_TEST_INDEX_VERSION=4 (or TEST_GIT_INDEX_VERSION=4).
> > 
> > One potential, but short-term, downside is that we lose coverage of
> > the version 3 indexes. The trade-off is that we may want to cover
> > that version using GIT_TEST_INDEX_VERSION=3.
> > 
> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> > ---
> >  read-cache.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/read-cache.c b/read-cache.c
> > index 372588260e..af6c8f2a67 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1484,7 +1484,7 @@ struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
> >   * Index File I/O
> >   *****************************************************************/
> >  
> > -#define INDEX_FORMAT_DEFAULT 3
> > +#define INDEX_FORMAT_DEFAULT 4
> >  
> >  static unsigned int get_index_format_default(void)
> >  {
> > -- 
> > gitgitgadget

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] read-cache: update index format default to v4
  2018-09-25  7:06     ` Patrick Steinhardt
@ 2018-09-25 14:29       ` Derrick Stolee
  2018-09-25 18:01         ` Stefan Beller
  0 siblings, 1 reply; 9+ messages in thread
From: Derrick Stolee @ 2018-09-25 14:29 UTC (permalink / raw)
  To: Patrick Steinhardt, SZEDER Gábor
  Cc: Derrick Stolee via GitGitGadget, git, pclouds, peartben, git,
	Junio C Hamano, Derrick Stolee, Jonathan Nieder

On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
>> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>
>>> The index v4 format has been available since 2012 with 9d22778
>>> "reach-cache.c: write prefix-compressed names in the index". Since
>>> the format has been stable for so long, almost all versions of Git
>>> in use today understand version 4, removing one barrier to upgrade
>>> -- that someone may want to downgrade and needs a working repo.
>> What about alternative implementations, like JGit, libgit2, etc.?
> Speaking of libgit2, we are able to read and write index v4 since
> commit c1b370e93

This is a good point, Szeder.

Patrick: I'm glad LibGit2 is up-to-date with index formats.

Unfortunately, taking a look (for the first time) at the JGit code 
reveals that they don't appear to have v4 support. In 
org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the 
DirCache.readFrom() method: lines 488-494, I see the following snippet:

                 final int ver = NB.decodeInt32(hdr, 4);
                 boolean extended = false;
                 if (ver == 3)
                         extended = true;
                 else if (ver != 2)
                         throw new 
CorruptObjectException(MessageFormat.format(
JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));

It looks like this will immediately throw with versions other than 2 or 3.

I'm adding Jonathan Nieder to CC so he can check with JGit people about 
the impact of this change.

Thanks,

-Stolee



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] read-cache: update index format default to v4
  2018-09-25 14:29       ` Derrick Stolee
@ 2018-09-25 18:01         ` Stefan Beller
  2018-09-25 21:22           ` Ben Peart
  2018-09-26 21:03           ` Matthias Sohn
  0 siblings, 2 replies; 9+ messages in thread
From: Stefan Beller @ 2018-09-25 18:01 UTC (permalink / raw)
  To: Derrick Stolee, Matthias Sohn
  Cc: Patrick Steinhardt, SZEDER Gábor, gitgitgadget, git,
	Duy Nguyen, Ben Peart, Jeff Hostetler, Junio C Hamano,
	Derrick Stolee, Jonathan Nieder

On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> >>> From: Derrick Stolee <dstolee@microsoft.com>
> >>>
> >>> The index v4 format has been available since 2012 with 9d22778
> >>> "reach-cache.c: write prefix-compressed names in the index". Since
> >>> the format has been stable for so long, almost all versions of Git
> >>> in use today understand version 4, removing one barrier to upgrade
> >>> -- that someone may want to downgrade and needs a working repo.
> >> What about alternative implementations, like JGit, libgit2, etc.?
> > Speaking of libgit2, we are able to read and write index v4 since
> > commit c1b370e93
>
> This is a good point, Szeder.
>
> Patrick: I'm glad LibGit2 is up-to-date with index formats.
>
> Unfortunately, taking a look (for the first time) at the JGit code
> reveals that they don't appear to have v4 support. In
> org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
> DirCache.readFrom() method: lines 488-494, I see the following snippet:
>
>                  final int ver = NB.decodeInt32(hdr, 4);
>                  boolean extended = false;
>                  if (ver == 3)
>                          extended = true;
>                  else if (ver != 2)
>                          throw new
> CorruptObjectException(MessageFormat.format(
> JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
>
> It looks like this will immediately throw with versions other than 2 or 3.
>
> I'm adding Jonathan Nieder to CC so he can check with JGit people about
> the impact of this change.

JGit is used both on the server (which doesn't use index/staging area)
as well as client side as e.g. an Eclipse integration, which would
very much like to use the index.

Adding Matthias Sohn as well, who is active in JGit and cares
more about the client side than Googlers who only make use
of the server side part of JGit.

Stefan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] read-cache: update index format default to v4
  2018-09-25 18:01         ` Stefan Beller
@ 2018-09-25 21:22           ` Ben Peart
  2018-09-26 21:03           ` Matthias Sohn
  1 sibling, 0 replies; 9+ messages in thread
From: Ben Peart @ 2018-09-25 21:22 UTC (permalink / raw)
  To: Stefan Beller, Derrick Stolee, Matthias Sohn
  Cc: Patrick Steinhardt, SZEDER Gábor, gitgitgadget, git,
	Duy Nguyen, Jeff Hostetler, Junio C Hamano, Derrick Stolee,
	Jonathan Nieder



On 9/25/2018 2:01 PM, Stefan Beller wrote:
> On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
>>> On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
>>>> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
>>>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>>>
>>>>> The index v4 format has been available since 2012 with 9d22778
>>>>> "reach-cache.c: write prefix-compressed names in the index". Since
>>>>> the format has been stable for so long, almost all versions of Git
>>>>> in use today understand version 4, removing one barrier to upgrade
>>>>> -- that someone may want to downgrade and needs a working repo.
>>>> What about alternative implementations, like JGit, libgit2, etc.?
>>> Speaking of libgit2, we are able to read and write index v4 since
>>> commit c1b370e93
>>
>> This is a good point, Szeder.
>>
>> Patrick: I'm glad LibGit2 is up-to-date with index formats.
>>
>> Unfortunately, taking a look (for the first time) at the JGit code
>> reveals that they don't appear to have v4 support. In
>> org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
>> DirCache.readFrom() method: lines 488-494, I see the following snippet:
>>
>>                   final int ver = NB.decodeInt32(hdr, 4);
>>                   boolean extended = false;
>>                   if (ver == 3)
>>                           extended = true;
>>                   else if (ver != 2)
>>                           throw new
>> CorruptObjectException(MessageFormat.format(
>> JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
>>
>> It looks like this will immediately throw with versions other than 2 or 3.
>>
>> I'm adding Jonathan Nieder to CC so he can check with JGit people about
>> the impact of this change.
> 
> JGit is used both on the server (which doesn't use index/staging area)
> as well as client side as e.g. an Eclipse integration, which would
> very much like to use the index.
> 
> Adding Matthias Sohn as well, who is active in JGit and cares
> more about the client side than Googlers who only make use
> of the server side part of JGit.
> 
> Stefan
> 

In addition to the compatibility with other implementations of git 
question, I think we should include Duy's patch [1] to "optimize reading 
index format v4" before flipping the default to V4.

In my testing, this reduced the index load time of V4 by 10%-15% making 
the CPU cost only ~2% more expensive than V2 or V3 indexes.  This 
increase in CPU cost is more than offset by the significant reduction in 
file IO due to the reduced index file size.

[1] https://public-inbox.org/git/20180825064458.28484-1-pclouds@gmail.com/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] read-cache: update index format default to v4
  2018-09-25 18:01         ` Stefan Beller
  2018-09-25 21:22           ` Ben Peart
@ 2018-09-26 21:03           ` Matthias Sohn
  1 sibling, 0 replies; 9+ messages in thread
From: Matthias Sohn @ 2018-09-26 21:03 UTC (permalink / raw)
  To: Stefan Beller
  Cc: stolee, ps, szeder.dev, gitgitgadget, git, pclouds, peartben,
	git, Junio C Hamano, dstolee, jrnieder

On Tue, Sep 25, 2018 at 8:01 PM Stefan Beller <sbeller@google.com> wrote:
>
> On Tue, Sep 25, 2018 at 7:30 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 9/25/2018 3:06 AM, Patrick Steinhardt wrote:
> > > On Mon, Sep 24, 2018 at 11:32:23PM +0200, SZEDER Gábor wrote:
> > >> On Mon, Sep 24, 2018 at 02:15:30PM -0700, Derrick Stolee via GitGitGadget wrote:
> > >>> From: Derrick Stolee <dstolee@microsoft.com>
> > >>>
> > >>> The index v4 format has been available since 2012 with 9d22778
> > >>> "reach-cache.c: write prefix-compressed names in the index". Since
> > >>> the format has been stable for so long, almost all versions of Git
> > >>> in use today understand version 4, removing one barrier to upgrade
> > >>> -- that someone may want to downgrade and needs a working repo.
> > >> What about alternative implementations, like JGit, libgit2, etc.?
> > > Speaking of libgit2, we are able to read and write index v4 since
> > > commit c1b370e93
> >
> > This is a good point, Szeder.
> >
> > Patrick: I'm glad LibGit2 is up-to-date with index formats.
> >
> > Unfortunately, taking a look (for the first time) at the JGit code
> > reveals that they don't appear to have v4 support. In
> > org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java, the
> > DirCache.readFrom() method: lines 488-494, I see the following snippet:
> >
> >                  final int ver = NB.decodeInt32(hdr, 4);
> >                  boolean extended = false;
> >                  if (ver == 3)
> >                          extended = true;
> >                  else if (ver != 2)
> >                          throw new
> > CorruptObjectException(MessageFormat.format(
> > JGitText.get().unknownDIRCVersion, Integer.valueOf(ver)));
> >
> > It looks like this will immediately throw with versions other than 2 or 3.
> >
> > I'm adding Jonathan Nieder to CC so he can check with JGit people about
> > the impact of this change.
>
> JGit is used both on the server (which doesn't use index/staging area)
> as well as client side as e.g. an Eclipse integration, which would
> very much like to use the index.
>
> Adding Matthias Sohn as well, who is active in JGit and cares
> more about the client side than Googlers who only make use
> of the server side part of JGit.

thanks for the heads up, in fact JGit does not yet support index version 4.
I will look into this.

-Matthias

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-09-26 21:04 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-24 21:15 [PATCH 0/1] read-cache: update index format default to v4 Derrick Stolee via GitGitGadget
2018-09-24 21:15 ` [PATCH 1/1] " Derrick Stolee via GitGitGadget
2018-09-24 21:32   ` SZEDER Gábor
2018-09-24 22:29     ` Junio C Hamano
2018-09-25  7:06     ` Patrick Steinhardt
2018-09-25 14:29       ` Derrick Stolee
2018-09-25 18:01         ` Stefan Beller
2018-09-25 21:22           ` Ben Peart
2018-09-26 21:03           ` Matthias Sohn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.