* [BUG REPORT] split-index behavior during interactive rebase
@ 2021-09-16  5:50 Matt Roper
  2021-09-21  7:34 ` Lucas De Marchi
  2021-09-26 21:57 ` SZEDER Gábor
  0 siblings, 2 replies; 4+ messages in thread
From: Matt Roper @ 2021-09-16  5:50 UTC
  To: git

What did you do before the bug happened? (Steps to reproduce your issue)

  I activated split index mode on a repo ("git config core.splitIndex
  true"), performed an interactive rebase, and modified a commit earlier
  in the history.

  The steps can be reproduced via a sequence of:
      $ mkdir tmp && cd tmp && git init
      $ git config core.splitIndex true
      $ for x in `seq 20`; do echo $x >> count; git add count; git commit -m "Commit $x"; done
      $ git rebase -i HEAD~10
      
      ## Add "x git commit --amend --no-edit" as the first command of
      ## the todo list.
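
      The manual todo-list edit in the last step can also be scripted. A
      minimal sketch, assuming a POSIX shell; the helper name
      prepend_amend_exec is hypothetical and not part of git:

```shell
# Hypothetical helper (not part of git): prepend the "x ..." line from the
# reproduction steps to the rebase todo file given as $1.
prepend_amend_exec() {
    todo=$1
    tmp=$(mktemp) || return 1
    {
        # "x" is the short form of the "exec" todo command.
        printf '%s\n' 'x git commit --amend --no-edit'
        cat "$todo"
    } >"$tmp" && mv "$tmp" "$todo"
}
```

      Wrapped in a small executable script that calls the function on "$1",
      this could be handed to the rebase through the GIT_SEQUENCE_EDITOR
      environment variable, which git runs with the path of the todo file
      as its argument.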

What did you expect to happen? (Expected behavior)

  My expectation was that there would still only be a single shared index
  file in the .git directory upon completion of the rebase.

What happened instead? (Actual behavior)

  A large number of distinct sharedindex.* files were generated in the .git
  directory during the rebase.

What's different between what you expected and what actually happened?

  Rather than a single shared index file, I wound up with a huge number of
  large shared index files.  The real repository I was working with (a Linux
  kernel source tree) had a shared index file size of about 7MB, and I was
  modifying a commit several hundred back in history (in case it
  matters, these were all linear commits, no merges), so the resulting
  collection of shared index files consumed a surprising amount of disk
  space.

Anything else you want to add:

  As an experiment, I tried setting splitIndex.sharedIndexExpire=now to see
  if it would avoid the explosion of shared index files, but it appears the
  stale index files are still not being removed during the rebase, and I
  still wind up with a huge number at the end of the rebase.  If I manually
  run "git update-index --split-index" after the rebase completes it will
  properly delete all of the stale ones at that point.
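
  One way to watch the number of shared index files before and after such a
  cleanup is a small counting helper. This is only a sketch; the function
  name count_shared_indexes is hypothetical, and it assumes the files live
  directly in the git directory as in the listings in this report:

```shell
# Hypothetical helper: print how many sharedindex.* files exist in a git
# directory (defaults to ./.git).
count_shared_indexes() {
    gitdir=${1:-.git}
    count=0
    for f in "$gitdir"/sharedindex.*; do
        # If nothing matches, the glob stays literal and fails the -f test.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

  Calling it once after the rebase and once more after "git update-index
  --split-index" would show whether the stale files were actually removed.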

  Rebases that do not actually modify the history do _not_ trigger the
  explosion of shared index files (e.g., "git rebase -i HEAD~10 --exec 'echo
  foo'").

  If I do not set the core.splitIndex setting on the repository, but only
  activate split index manually via "git update-index --split-index" there
  is only one shared index file at the end of the rebase, but based on the
  file size it appears the repository is no longer operating in split index
  mode.

  Before:
  $ ll .git | grep index
  -rw-rw-r--   1 mdroper mdroper   149165 Sep 15 22:21 index
  -rw-rw-r--   1 mdroper mdroper  7296080 Sep 15 22:21 sharedindex.f916dd59ccc22ca34298f557a4659aca2767dae4

  After (just amending HEAD~1 in this case):
  $ ls -l .git | grep index
  -rw-rw-r--   1 mdroper mdroper  7445145 Sep 15 22:22 index
  -rw-rw-r--   1 mdroper mdroper  7296080 Sep 15 22:22 sharedindex.f916dd59ccc22ca34298f557a4659aca2767dae4


[System Info]
git version:
git version 2.33.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 5.8.18-100.fc31.x86_64 #1 SMP Mon Nov 2 20:32:55 UTC 2020 x86_64
compiler info: gnuc: 9.3
libc info: glibc: 2.30
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795


* Re: [BUG REPORT] split-index behavior during interactive rebase
  2021-09-16  5:50 [BUG REPORT] split-index behavior during interactive rebase Matt Roper
@ 2021-09-21  7:34 ` Lucas De Marchi
  2021-09-26 21:57 ` SZEDER Gábor
  1 sibling, 0 replies; 4+ messages in thread
From: Lucas De Marchi @ 2021-09-21  7:34 UTC
  To: Matt Roper; +Cc: git

On Wed, Sep 15, 2021 at 10:50:57PM -0700, Matt Roper wrote:
>What did you do before the bug happened? (Steps to reproduce your issue)
>
>  I activated split index mode on a repo ("git config core.splitIndex
>  true"), performed an interactive rebase, modified a commit earlier in
>  the history.
>
>  The steps can be reproduced via a sequence of:
>      $ mkdir tmp && cd tmp && git init
>      $ git config core.splitIndex true
>      $ for x in `seq 20`; do echo $x >> count; git add count; git commit -m "Commit $x"; done
>      $ git rebase -i HEAD~10
>
>      ## Add "x git commit --amend --no-edit" as the first command of
>      ## the todo list.
>
>What did you expect to happen? (Expected behavior)
>
>  My expectation was that there would still only be a single shared index
>  file in the .git directory upon completion of the rebase.
>
>What happened instead? (Actual behavior)
>
>  A large number of distinct sharedindex.* files were generated in the .git
>  directory during the rebase.

This is probably relevant for debugging, although I haven't figured out the
cause yet. The following works fine and creates only one .git/sharedindex.*
file:

	git config core.splitIndex true
	git am 000[123].patch
	git config core.splitIndex false

Prepare test:
	git config core.splitIndex false
	git update-index --no-split-index
	rm .git/sharedindex.*
	git reset --hard HEAD~3

	git -c core.splitIndex=true am 000[123].patch

This will create 4 .git/sharedindex.* files.

After that, each call to status creates one more .git/sharedindex.* file if
the current head doesn't match the previous one and the splitIndex setting
doesn't match the previous one. The count keeps increasing:

	git reset --hard ORIG_HEAD; git -c core.splitIndex=true status; ls -l .git/sharedindex.* | wc -l
	...
	4
	git reset --hard ORIG_HEAD; git -c core.splitIndex=true status; ls -l .git/sharedindex.* | wc -l
	...
	5
	...

Note that if I also pass -c core.splitIndex=true to git reset, this behavior
goes away. It seems that the splitIndex setting is somehow getting reset
during git-am with multiple patches (or during rebase)... ?

Lucas De Marchi



* Re: [BUG REPORT] split-index behavior during interactive rebase
  2021-09-16  5:50 [BUG REPORT] split-index behavior during interactive rebase Matt Roper
  2021-09-21  7:34 ` Lucas De Marchi
@ 2021-09-26 21:57 ` SZEDER Gábor
  2021-09-27  2:17   ` Matt Roper
  1 sibling, 1 reply; 4+ messages in thread
From: SZEDER Gábor @ 2021-09-26 21:57 UTC
  To: Matt Roper; +Cc: git

On Wed, Sep 15, 2021 at 10:50:57PM -0700, Matt Roper wrote:
> What did you do before the bug happened? (Steps to reproduce your issue)
> 
>   I activated split index mode on a repo ("git config core.splitIndex
>   true"), performed an interactive rebase, modified a commit earlier in
>   the history.
> 
>   The steps can be reproduced via a sequence of:
>       $ mkdir tmp && cd tmp && git init
>       $ git config core.splitIndex true
>       $ for x in `seq 20`; do echo $x >> count; git add count; git commit -m "Commit $x"; done

It's important to note that this test repository has only a single
tracked file in it.

>       $ git rebase -i HEAD~10
>       
>       ## Add "x git commit --amend --no-edit" as the first command of
>       ## the todo list.
> 
> What did you expect to happen? (Expected behavior)
> 
>   My expectation was that there would still only be a single shared index
>   file in the .git directory upon completion of the rebase.
> 
> What happened instead? (Actual behavior)
> 
>   A large number of distinct sharedindex.* files were generated in the .git
>   directory during the rebase.

I think this works as intended.

A new shared index is written when the number of index entries that
would be written to '.git/index' is higher than a given percentage of
the total number of index entries.  This percentage can be specified
with the 'splitIndex.maxPercentChange' configuration variable and it
defaults to 20%.  In your test repository above there is only a single
file and it is modified in every commit, so when switching from one
commit to the other 100% of the index entries would be written to
'.git/index', resulting in a new shared index file written for each
rebase step.
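
The rule described above can be written down as a small check. This is a
simplified sketch of the description in this message, not git's actual code;
the function name exceeds_max_percent_change and its arguments are
hypothetical:

```shell
# Hypothetical sketch of the rule as described above (not git's code):
#   $1 = entries that would be written to .git/index
#   $2 = total number of index entries
#   $3 = splitIndex.maxPercentChange (assumed default: 20)
# Prints "yes" if a new shared index would be written, "no" otherwise.
exceeds_max_percent_change() {
    changed=$1
    total=$2
    percent=${3:-20}
    # Integer form of: changed / total > percent / 100
    if [ $((changed * 100)) -gt $((total * percent)) ]; then
        echo yes
    else
        echo no
    fi
}
```

For the single-file test repository, "exceeds_max_percent_change 1 1" prints
"yes", which matches the new shared index seen at every rebase step.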

> What's different between what you expected and what actually happened?
> 
>   Rather than a single shared index file, I wound up with a huge number of
>   large shared index files.  The real repository I was working with (a Linux
>   kernel source tree) had a shared index file size of about 7MB, and I was
>   modifying a commit several hundred back in history (in case it
>   matters, these were all linear commits, no merges), so the resulting
>   collection of shared index files consumed a surprising amount of disk
>   space.

The last commit in my somewhat outdated linux repo contains ~71k
files; 20% of that is ~14k.  Does that linear string of "several
hundred" commits modify that many files?

> Anything else you want to add:
> 
>   As an experiment, I tried setting splitIndex.sharedIndexExpire=now

I would advise against that, it's potentially dangerous, because it
can remove shared index files that are still in use by other git
processes.

>   to see
>   if it would avoid the explosion of shared index files, but it appears the
>   stale index files are still not being removed during the rebase, and I
>   still wind up with a huge number at the end of the rebase.  If I manually
>   run "git update-index --split-index" after the rebase completes it will
>   properly delete all of the stale ones at that point.
> 
>   Rebases that do not actually modify the history do _not_ trigger the
>   explosion of shared index files (e.g., "git rebase -i HEAD~10 --exec 'echo
>   foo'").
> 
>   If I do not set the core.splitIndex setting on the repository, but only
>   activate split index manually via "git update-index --split-index" there
>   is only one shared index file at the end of the rebase, but based on the
>   file size it appears the repository is no longer operating in split index
>   mode.
> 
>   Before:
>   $ ll .git | grep index
>   -rw-rw-r--   1 mdroper mdroper   149165 Sep 15 22:21 index
>   -rw-rw-r--   1 mdroper mdroper  7296080 Sep 15 22:21 sharedindex.f916dd59ccc22ca34298f557a4659aca2767dae4
> 
>   After (just amending HEAD~1 in this case):
>   $ ls -l .git | grep index
>   -rw-rw-r--   1 mdroper mdroper  7445145 Sep 15 22:22 index
>   -rw-rw-r--   1 mdroper mdroper  7296080 Sep 15 22:22 sharedindex.f916dd59ccc22ca34298f557a4659aca2767dae4

> git version 2.33.0

I could reproduce all this with v2.33.0 (except that I saw the split
index being turned off even with core.splitIndex enabled), but was
unable to do so with current master.

I think that this is a bug in the interaction between the split index
feature and 'git rebase' when using the recursive merge strategy and
when a couple of other, more subtle conditions are met.  It seems that
with the right conditions rebase only writes regular index files, and
by not entering the split index code paths it doesn't look for old
shared index files to expire.

After v2.33.0 we switched the default merge strategy from recursive to
'ort', and with that these cases appear to work as intended, i.e. old
shared index files are expired and the split index feature doesn't get
turned off.  Since the 'ort' strategy is in many ways better (faster,
more correct, etc.) than the recursive, I don't think it's worth the
effort to try to fix this issue with split index, rebase and the
recursive strategy.



* Re: [BUG REPORT] split-index behavior during interactive rebase
  2021-09-26 21:57 ` SZEDER Gábor
@ 2021-09-27  2:17   ` Matt Roper
  0 siblings, 0 replies; 4+ messages in thread
From: Matt Roper @ 2021-09-27  2:17 UTC
  To: SZEDER Gábor; +Cc: git

On Sun, Sep 26, 2021 at 11:57:03PM +0200, SZEDER Gábor wrote:
> On Wed, Sep 15, 2021 at 10:50:57PM -0700, Matt Roper wrote:
> > What did you do before the bug happened? (Steps to reproduce your issue)
> > 
> >   I activated split index mode on a repo ("git config core.splitIndex
> >   true"), performed an interactive rebase, modified a commit earlier in
> >   the history.
> > 
> >   The steps can be reproduced via a sequence of:
> >       $ mkdir tmp && cd tmp && git init
> >       $ git config core.splitIndex true
> >       $ for x in `seq 20`; do echo $x >> count; git add count; git commit -m "Commit $x"; done
> 
> It's important to note that this test repository has only a single
> tracked file in it.
> 
> >       $ git rebase -i HEAD~10
> >       
> >       ## Add "x git commit --amend --no-edit" as the first command of
> >       ## the todo list.
> > 
> > What did you expect to happen? (Expected behavior)
> > 
> >   My expectation was that there would still only be a single shared index
> >   file in the .git directory upon completion of the rebase.
> > 
> > What happened instead? (Actual behavior)
> > 
> >   A large number of distinct sharedindex.* files were generated in the .git
> >   directory during the rebase.
> 
> I think this works as intended.
> 
> A new shared index is written when the number of index entries that
> would be written to '.git/index' is higher than a given percentage of
> the total number of index entries.  This percentage can be specified
> with the 'splitIndex.maxPercentChange' configuration variable and it
> defaults to 20%.  In your test repository above there is only a single
> file and it is modified in every commit, so when switching from one
> commit to the other 100% of the index entries would be written to
> '.git/index', resulting in a new shared index file written for each
> rebase step.

Good point; my attempt to create a simple reproducer may not be
sufficiently complex in this case.  But I don't think this is the source
of the problem for my real Linux kernel repo; see below.

> 
> > What's different between what you expected and what actually happened?
> > 
> >   Rather than a single shared index file, I wound up with a huge number of
> >   large shared index files.  The real repository I was working with (a Linux
> >   kernel source tree) had a shared index file size of about 7MB, and I was
> >   modifying a commit several hundred back in history (in case it
> >   matters, these were all linear commits, no merges), so the resulting
> >   collection of shared index files consumed a surprising amount of disk
> >   space.
> 
> The last commit in my somewhat outdated linux repo contains ~71k
> files; 20% of that is ~14k.  Does that linear string of "several
> hundred" commits modify that many files?

No.  In the real Linux repo I'm working with, nearly all of the commits
are in the drm/i915 driver tree.  The overall diff of the patches being
rebased is

         614 files changed, 107114 insertions(+), 8751 deletions(-)
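
To put that diffstat next to the 20% threshold, here is a rough calculation.
It is only a sketch: it uses the ~71k file count quoted above as a stand-in
for the total number of index entries in this tree, and the helper name
percent_changed is hypothetical:

```shell
# Hypothetical helper: print CHANGED ($1) as a percentage of TOTAL ($2),
# rounded to two decimals.
percent_changed() {
    LC_ALL=C awk -v c="$1" -v t="$2" 'BEGIN { printf "%.2f\n", (c * 100) / t }'
}

# 614 changed files out of roughly 71000 tracked files:
percent_changed 614 71000    # prints 0.86
```

Under those assumptions the whole series touches less than 1% of the
entries, far below the 20% default of splitIndex.maxPercentChange.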

> 
> > Anything else you want to add:
> > 
> >   As an experiment, I tried setting splitIndex.sharedIndexExpire=now
> 
> I would advise against that, it's potentially dangerous, because it
> can remove shared index files that are still in use by other git
> processes.

Yeah, I also found sharedIndexExpire=now to be incompatible with a few
other commands such as "git stash".


> 
> >   to see
> >   if it would avoid the explosion of shared index files, but it appears the
> >   stale index files are still not being removed during the rebase, and I
> >   still wind up with a huge number at the end of the rebase.  If I manually
> >   run "git update-index --split-index" after the rebase completes it will
> >   properly delete all of the stale ones at that point.
> > 
> >   Rebases that do not actually modify the history do _not_ trigger the
> >   explosion of shared index files (e.g., "git rebase -i HEAD~10 --exec 'echo
> >   foo'").
> > 
> >   If I do not set the core.splitIndex setting on the repository, but only
> >   activate split index manually via "git update-index --split-index" there
> >   is only one shared index file at the end of the rebase, but based on the
> >   file size it appears the repository is no longer operating in split index
> >   mode.
> > 
> >   Before:
> >   $ ll .git | grep index
> >   -rw-rw-r--   1 mdroper mdroper   149165 Sep 15 22:21 index
> >   -rw-rw-r--   1 mdroper mdroper  7296080 Sep 15 22:21 sharedindex.f916dd59ccc22ca34298f557a4659aca2767dae4
> > 
> >   After (just amending HEAD~1 in this case):
> >   $ ls -l .git | grep index
> >   -rw-rw-r--   1 mdroper mdroper  7445145 Sep 15 22:22 index
> >   -rw-rw-r--   1 mdroper mdroper  7296080 Sep 15 22:22 sharedindex.f916dd59ccc22ca34298f557a4659aca2767dae4
> 
> > git version 2.33.0
> 
> I could reproduce all this with v2.33.0 (except that I saw the split
> index being turned off even with core.splitIndex enabled), but was
> unable to do so with current master.
> 
> I think that this is a bug in the interaction between the split index
> feature and 'git rebase' when using the recursive merge strategy and
> when a couple of other, more subtle conditions are met.  It seems that
> with the right conditions rebase only writes regular index files, and
> by not entering the split index code paths it doesn't look for old
> shared index files to expire.
> 
> After v2.33.0 we switched the default merge strategy from recursive to
> 'ort', and with that these cases appear to work as intended, i.e. old
> shared index files are expired and the split index feature doesn't get
> turned off.  Since the 'ort' strategy is in many ways better (faster,
> more correct, etc.) than the recursive, I don't think it's worth the
> effort to try to fix this issue with split index, rebase and the
> recursive strategy.

Yeah, if this is specific to the recursive strategy then it's probably
not worth sinking too much time into tracking down.  I just tried
setting pull.twohead=ort in my config to make v2.33.0 also use the ort
strategy by default, and from some preliminary testing that does indeed
appear to solve the problem.
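
For anyone else stuck on v2.33.0, the workaround amounts to one config
setting. A minimal sketch; the wrapper name use_ort_strategy is hypothetical,
while the key and value are the ones mentioned above:

```shell
# Hypothetical wrapper: set pull.twohead=ort in the repository at $1
# (defaults to the current directory), as described in this message.
use_ort_strategy() {
    repo=${1:-.}
    git -C "$repo" config pull.twohead ort
}
```

Running "use_ort_strategy" inside the affected repository writes the setting
to that repository's own config only.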

Thanks for looking into this!


Matt

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795

