All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] gc: correct gc.autoPackLimit documentation
@ 2016-06-25  1:14 Eric Wong
  2016-06-25  2:06 ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2016-06-25  1:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

I'm not sure if this is the best approach, or if changing
too_many_packs can be done without causing problems for
hosts of big repos.

-------8<-----
Subject: [PATCH] gc: correct gc.autoPackLimit documentation

I want to ensure there is only one pack in my repo to take
advantage of pack bitmaps.  Based on my reading of the
documentation, I configured gc.autoPackLimit=1 which led to
"gc --auto" constantly trying to repack on every invocation.

Update the documentation to reflect what is probably a
long-standing off-by-one bug in builtin/gc.c::too_many_packs:

	-	return gc_auto_pack_limit <= cnt;
	+	return gc_auto_pack_limit < cnt;

However, changing gc itself at this time may cause problems
for people who are already using gc.autoPackLimit=2 and
expect bitmaps to work for them.

Signed-off-by: Eric Wong <e@80x24.org>
---
 Documentation/config.txt | 2 +-
 Documentation/git-gc.txt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2e1b2e4..b0de3f1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1345,7 +1345,7 @@ gc.auto::
 	default value is 6700.  Setting this to 0 disables it.
 
 gc.autoPackLimit::
-	When there are more than this many packs that are not
+	When there at least this many packs that are not
 	marked with `*.keep` file in the repository, `git gc
 	--auto` consolidates them into one larger pack.  The
 	default	value is 50.  Setting this to 0 disables it.
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index fa15104..658612d 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -54,7 +54,7 @@ all loose objects are combined into a single pack using
 `git repack -d -l`.  Setting the value of `gc.auto` to 0
 disables automatic packing of loose objects.
 +
-If the number of packs exceeds the value of `gc.autoPackLimit`,
+If the number of packs matches or exceeds the value of `gc.autoPackLimit`,
 then existing packs (except those marked with a `.keep` file)
 are consolidated into a single pack by using the `-A` option of
 'git repack'. Setting `gc.autoPackLimit` to 0 disables
-- 
EW

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC] gc: correct gc.autoPackLimit documentation
  2016-06-25  1:14 [RFC] gc: correct gc.autoPackLimit documentation Eric Wong
@ 2016-06-25  2:06 ` Jeff King
  2016-06-25  2:53   ` Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff King @ 2016-06-25  2:06 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git

On Sat, Jun 25, 2016 at 01:14:50AM +0000, Eric Wong wrote:

> I'm not sure if this is the best approach, or if changing
> too_many_packs can be done without causing problems for
> hosts of big repos.
> 
> -------8<-----
> Subject: [PATCH] gc: correct gc.autoPackLimit documentation
> 
> I want to ensure there is only one pack in my repo to take
> advantage of pack bitmaps.  Based on my reading of the
> documentation, I configured gc.autoPackLimit=1 which led to
> "gc --auto" constantly trying to repack on every invocation.

I'm not sure if you might be misinterpreting earlier advice on bitmaps
here. At the time of packing, bitmaps need for all of the objects to go
to a single pack (they cannot handle a case where one object in the pack
can reach another object that is not in the pack). But that is easily
done with "git repack -adb".

After that packing, you can add new packs that do not have bitmaps, and
the bitmaps will gracefully degrade. E.g., imagine master was at tip X
when you repacked with bitmaps, and now somebody has pushed to make it
tip Y.  Somebody then clones, asking for Y. The bitmap code will start
at Y and walk backwards. When it hits X, it stops walking as it can fill
in the rest of the reachability from there.

So you do have to walk X..Y the old-fashioned way, but that's generally
not a big problem for a few pushes.

IOW, I think trying to repack on every single push is probably overkill.
Yes, it will buy you a little savings on fetch requests, but whether it
is worthwhile to pack depends on:

 - how big the push was (e.g., 2 commits versus thousands; the bigger
   it is, the more you save per fetch

 - how big the repo is (the bigger it is, the more it costs to do the
   repack; packing is linear-ish effort in the number of objects in the
   repo)

 - how often you get fetches versus pushes (your cost is amortized
   across all the fetches)

There are numbers where it can be worth it to pack really aggressively,
but I doubt it's common. At GitHub we use a combination of number of
packs (and we try to keep it under 50) and size of objects not in the
"main" pack (I did a bunch of fancy logging and analysis of object
counts, bytes in packs, etc, at one point, and we basically realized
that for the common cases, all of the interesting metrics are roughly
proportional to the number of bytes that could be moved into the main
pack).

That's neither here nor there for the off-by-one in gc or its
documentation, of course, but just FYI.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] gc: correct gc.autoPackLimit documentation
  2016-06-25  2:06 ` Jeff King
@ 2016-06-25  2:53   ` Eric Wong
  2016-06-25  6:14     ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2016-06-25  2:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git

Jeff King <peff@peff.net> wrote:
> On Sat, Jun 25, 2016 at 01:14:50AM +0000, Eric Wong wrote:
> 
> > I'm not sure if this is the best approach, or if changing
> > too_many_packs can be done without causing problems for
> > hosts of big repos.
> > 
> > -------8<-----
> > Subject: [PATCH] gc: correct gc.autoPackLimit documentation
> > 
> > I want to ensure there is only one pack in my repo to take
> > advantage of pack bitmaps.  Based on my reading of the
> > documentation, I configured gc.autoPackLimit=1 which led to
> > "gc --auto" constantly trying to repack on every invocation.
> 
> I'm not sure if you might be misinterpreting earlier advice on bitmaps
> here. At the time of packing, bitmaps need for all of the objects to go
> to a single pack (they cannot handle a case where one object in the pack
> can reach another object that is not in the pack). But that is easily
> done with "git repack -adb".
> 
> After that packing, you can add new packs that do not have bitmaps, and
> the bitmaps will gracefully degrade. E.g., imagine master was at tip X
> when you repacked with bitmaps, and now somebody has pushed to make it
> tip Y.  Somebody then clones, asking for Y. The bitmap code will start
> at Y and walk backwards. When it hits X, it stops walking as it can fill
> in the rest of the reachability from there.

Ah, thanks, makes sense.  I was misinterpreting earlier advice
on bitmaps.

> That's neither here nor there for the off-by-one in gc or its
> documentation, of course, but just FYI.

I'm now inclined to fix the problem in gc and leave the
documentation as-is (unless it cause other problems...)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] gc: correct gc.autoPackLimit documentation
  2016-06-25  2:53   ` Eric Wong
@ 2016-06-25  6:14     ` Junio C Hamano
  2016-06-25  6:46       ` [PATCH] gc: fix off-by-one error with gc.autoPackLimit Eric Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2016-06-25  6:14 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jeff King, git

Eric Wong <e@80x24.org> writes:

>> That's neither here nor there for the off-by-one in gc or its
>> documentation, of course, but just FYI.
>
> I'm now inclined to fix the problem in gc and leave the
> documentation as-is (unless it cause other problems...)

I think that is the best in the longer term.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] gc: fix off-by-one error with gc.autoPackLimit
  2016-06-25  6:14     ` Junio C Hamano
@ 2016-06-25  6:46       ` Eric Wong
  2016-06-27 19:38         ` Jeff King
  0 siblings, 1 reply; 6+ messages in thread
From: Eric Wong @ 2016-06-25  6:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git

This matches the documentation and allows gc.autoPackLimit=1
to maintain a single pack without attempting a repack on every
"git gc --auto" invocation.

Signed-off-by: Eric Wong <e@80x24.org>
---
 builtin/gc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c583aad..332bcf7 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -177,7 +177,7 @@ static int too_many_packs(void)
 		 */
 		cnt++;
 	}
-	return gc_auto_pack_limit <= cnt;
+	return gc_auto_pack_limit < cnt;
 }
 
 static void add_repack_all_option(void)
-- 
EW

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] gc: fix off-by-one error with gc.autoPackLimit
  2016-06-25  6:46       ` [PATCH] gc: fix off-by-one error with gc.autoPackLimit Eric Wong
@ 2016-06-27 19:38         ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2016-06-27 19:38 UTC (permalink / raw)
  To: Eric Wong; +Cc: Junio C Hamano, git

On Sat, Jun 25, 2016 at 06:46:47AM +0000, Eric Wong wrote:

> This matches the documentation and allows gc.autoPackLimit=1
> to maintain a single pack without attempting a repack on every
> "git gc --auto" invocation.
> 
> Signed-off-by: Eric Wong <e@80x24.org>
> ---
>  builtin/gc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/builtin/gc.c b/builtin/gc.c
> index c583aad..332bcf7 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -177,7 +177,7 @@ static int too_many_packs(void)
>  		 */
>  		cnt++;
>  	}
> -	return gc_auto_pack_limit <= cnt;
> +	return gc_auto_pack_limit < cnt;
>  }

Looks good, and I cannot think of any real downside. "0" is special for
"do not use this limit", so you now have no way of asking to gc every
time. But why would you want to? Asking for 1 pack is effectively "gc if
something happened, otherwise do nothing".

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-06-27 19:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-25  1:14 [RFC] gc: correct gc.autoPackLimit documentation Eric Wong
2016-06-25  2:06 ` Jeff King
2016-06-25  2:53   ` Eric Wong
2016-06-25  6:14     ` Junio C Hamano
2016-06-25  6:46       ` [PATCH] gc: fix off-by-one error with gc.autoPackLimit Eric Wong
2016-06-27 19:38         ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.