* gc getting called on each git command ... what's wrong?
@ 2011-06-08  1:33 Geoff Russell
  2011-06-08  1:48 ` Peter Harris
  0 siblings, 1 reply; 10+ messages in thread
From: Geoff Russell @ 2011-06-08  1:33 UTC (permalink / raw)
  To: git

Hi all,

I'm running git version 1.7.0.4 on Ubuntu 10.04 LTS.

As of today, almost every time I do a git command, gc is getting
invoked. This is a multi-gigabyte repository with over
half a million objects, so this takes a while ... and I'm guessing
that it shouldn't be happening anyway!

I've run an fsck (which doesn't do a gc!) and the repository looks
clean ... no output.

I have packSizeLimit set to 30M ... not sure why I did this; I was
investigating something I didn't understand.  There are 96 pack files.

Any help greatly appreciated, many thanks,

Cheers,
Geoff

* Re: gc getting called on each git command ... what's wrong?
  2011-06-08  1:33 gc getting called on each git command ... what's wrong? Geoff Russell
@ 2011-06-08  1:48 ` Peter Harris
  2011-06-08 16:02   ` Drew Northup
  2011-06-08 17:09   ` Jakub Narebski
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Harris @ 2011-06-08  1:48 UTC (permalink / raw)
  To: geoffrey.russell; +Cc: git

On Tue, Jun 7, 2011 at 9:33 PM, Geoff Russell wrote:
>
> As of today, almost every time I do a git command, gc is getting
> invoked.

>   There are 96 pack files.

That's why. See gc.autopacklimit in "git help config" -- by default,
git will gc if there are more than 50 pack files.
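
To make Peter's explanation concrete, here is a simplified C model of the pack-count check behind `git gc --auto`. It is a sketch, not git's code: `struct pack_info` and its fields are hypothetical stand-ins. The skip rules (non-local packs and packs with a `.keep` file) and the final `gc_auto_pack_limit <= cnt` comparison follow the builtin/gc.c context shown in Brandon's patch later in this thread; the early return for a limit of 0 assumes the documented "setting this to 0 disables it" behaviour.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one packfile. */
struct pack_info {
    int is_local;   /* pack lives in this repository, not an alternate */
    int has_keep;   /* a matching .keep file exists */
};

/* Count the packs auto-gc would consider and compare the count with
 * gc.autopacklimit (default 50). */
static int too_many_packs(const struct pack_info *packs, size_t npacks,
                          int auto_pack_limit)
{
    size_t cnt = 0;

    if (auto_pack_limit <= 0)   /* a limit of 0 disables the check */
        return 0;
    for (size_t i = 0; i < npacks; i++) {
        if (!packs[i].is_local || packs[i].has_keep)
            continue;
        cnt++;
    }
    /* same comparison as "gc_auto_pack_limit <= cnt" */
    return (size_t)auto_pack_limit <= cnt;
}
```

With Geoff's 96 local, unkept packs and the default limit of 50, the function returns 1, so auto-gc fires. Note that this comparison triggers at 50 packs or more.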

Peter Harris

* Re: gc getting called on each git command ... what's wrong?
  2011-06-08  1:48 ` Peter Harris
@ 2011-06-08 16:02   ` Drew Northup
  2011-06-08 16:29     ` Brandon Casey
  2011-06-08 16:50     ` Junio C Hamano
  2011-06-08 17:09   ` Jakub Narebski
  1 sibling, 2 replies; 10+ messages in thread
From: Drew Northup @ 2011-06-08 16:02 UTC (permalink / raw)
  To: Peter Harris; +Cc: geoffrey.russell, git


On Tue, 2011-06-07 at 21:48 -0400, Peter Harris wrote:
> On Tue, Jun 7, 2011 at 9:33 PM, Geoff Russell wrote:
> >
> > As of today, almost every time I do a git command, gc is getting
> > invoked.
<re-added>
> >   I have packSizeLimit set to 30M 
</re-added>
> >   There are 96 pack files.
> 
> That's why. See gc.autopacklimit in "git help config" -- by default,
> git will gc if there are more than 50 pack files.

Do we want to consider ignoring (or automatically doubling, or something
like that) gc.autopacklimit if at least that many packs meet or exceed
pack.packSizeLimit? I have no idea what the patch for this might look
like, but it seems to make more sense than this situation.
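
Drew's "automatically doubling" variant could look roughly like the sketch below. Everything here is hypothetical (the function names, and pack sizes passed in as a plain array); it only illustrates the rule "if at least gc.autopacklimit packs already meet or exceed pack.packSizeLimit, double the threshold before comparing".

```c
#include <stddef.h>

/* Hypothetical: double the threshold when at least auto_pack_limit
 * packs are already at or above the configured size limit. */
static size_t effective_auto_pack_limit(const unsigned long long *pack_sizes,
                                        size_t npacks,
                                        unsigned long long pack_size_limit,
                                        size_t auto_pack_limit)
{
    size_t full = 0;

    if (!pack_size_limit)          /* no size limit configured */
        return auto_pack_limit;
    for (size_t i = 0; i < npacks; i++)
        if (pack_sizes[i] >= pack_size_limit)
            full++;
    return full >= auto_pack_limit ? 2 * auto_pack_limit : auto_pack_limit;
}

/* Hypothetical auto-gc decision using the adjusted threshold. */
static int should_auto_gc(const unsigned long long *pack_sizes, size_t npacks,
                          unsigned long long pack_size_limit,
                          size_t auto_pack_limit)
{
    size_t limit = effective_auto_pack_limit(pack_sizes, npacks,
                                             pack_size_limit, auto_pack_limit);

    return auto_pack_limit > 0 && npacks >= limit;
}
```

For 96 full 30 MiB packs the threshold becomes 100, so auto-gc would stay quiet; once the repository grows past 100 such packs it would fire again, so doubling only postpones the problem.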

Just a random brain fart...

-- 
-Drew Northup
________________________________________________
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59

* Re: gc getting called on each git command ... what's wrong?
  2011-06-08 16:02   ` Drew Northup
@ 2011-06-08 16:29     ` Brandon Casey
  2011-06-08 16:50     ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Brandon Casey @ 2011-06-08 16:29 UTC (permalink / raw)
  To: Drew Northup; +Cc: Peter Harris, geoffrey.russell, git

On 06/08/2011 11:02 AM, Drew Northup wrote:
> 
> On Tue, 2011-06-07 at 21:48 -0400, Peter Harris wrote:
>> On Tue, Jun 7, 2011 at 9:33 PM, Geoff Russell wrote:
>>>
>>> As of today, almost every time I do a git command, gc is getting
>>> invoked.
> <re-added>
>>>   I have packSizeLimit set to 30M 
> </re-added>
>>>   There are 96 pack files.
>>
>> That's why. See gc.autopacklimit in "git help config" -- by default,
>> git will gc if there are more than 50 pack files.
> 
> Do we want to consider ignoring (or automatically doubling, or something
> like that) gc.autopacklimit if at least that many packs meet or exceed
> pack.packSizeLimit? I have no idea what the patch for this might look
> like, but it seems to make more sense than this situation.
> 
> Just a random brain fart...
> 

Or just ignore the packs that meet or exceed pack.packSizeLimit...

diff --git a/builtin/gc.c b/builtin/gc.c
index ff5f73b..7be14ab 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -26,6 +26,7 @@ static int pack_refs = 1;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static off_t pack_size_limit;
 static const char *prune_expire = "2.weeks.ago";
 
 #define MAX_ADD 10
@@ -64,6 +65,10 @@ static int gc_config(const char *var, const char *value, void *cb)
 		}
 		return git_config_string(&prune_expire, var, value);
 	}
+	if (!strcmp(var, "pack.packsizelimit")) {
+		pack_size_limit = git_config_ulong(var, value);
+		return 0;
+	}
 	return git_default_config(var, value, cb);
 }
 
@@ -135,10 +140,8 @@ static int too_many_packs(void)
 			continue;
 		if (p->pack_keep)
 			continue;
-		/*
-		 * Perhaps check the size of the pack and count only
-		 * very small ones here?
-		 */
+		if (pack_size_limit && p->pack_size >= pack_size_limit)
+			continue;
 		cnt++;
 	}
 	return gc_auto_pack_limit <= cnt;

* Re: gc getting called on each git command ... what's wrong?
  2011-06-08 16:02   ` Drew Northup
  2011-06-08 16:29     ` Brandon Casey
@ 2011-06-08 16:50     ` Junio C Hamano
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2011-06-08 16:50 UTC (permalink / raw)
  To: Drew Northup; +Cc: Peter Harris, geoffrey.russell, git

Drew Northup <drew.northup@maine.edu> writes:

>> That's why. See gc.autopacklimit in "git help config" -- by default,
>> git will gc if there are more than 50 pack files.
>
> Do we want to consider ignoring (or automatically doubling, or something
> like that) gc.autopacklimit if at least that many packs meet or exceed
> pack.packSizeLimit? I have no idea what the patch for this might look
> like, but it seems to make more sense than this situation.

This is unrelated to the auto-gc, but it would also be fruitful to
question whether it is a sane setting to limit packfiles to 30M when the
repository needs 100 of them (total around 3G??).  Just as having too
many loose object files degrades performance (and that is one of the
reasons we pack them in the first place), having many packs will degrade
performance unnecessarily and to a worse degree, as the "check which pack
has this particular object" code has to examine all packs, unlike the
loose object case where we let the .git/objects/?? fan-out give us some
hashing and the filesystem do the heavy lifting for us.
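
Junio's contrast between loose-object lookup and pack lookup can be illustrated with a toy C model. The loose-object path layout (.git/objects/, then the first two hex digits, then the remaining 38) is git's actual on-disk scheme. The rest is a simplification: `struct toy_pack` stands in for a pack index as a sorted array of names, the helper names are hypothetical, and real pack indexes and git's lookup code (including any caching it does) are not reproduced.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Loose objects: the first two hex digits pick one of 256 fan-out
 * directories and the remaining 38 name the file, so the filesystem
 * does the lookup. */
static void loose_object_path(const char *hex, char *out, size_t outlen)
{
    snprintf(out, outlen, ".git/objects/%.2s/%s", hex, hex + 2);
}

/* Toy pack: a sorted list of object names standing in for a .idx file. */
struct toy_pack {
    const char **names;   /* sorted with strcmp */
    size_t n;
};

static int cmp_name(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Each index is quick to search, but nothing says which pack holds the
 * object, so in this model packs are tried one after another.
 * Returns the index of the pack holding hex, or -1; *probed receives
 * the number of packs examined. */
static long find_in_packs(const struct toy_pack *packs, size_t npacks,
                          const char *hex, size_t *probed)
{
    for (size_t i = 0; i < npacks; i++) {
        *probed = i + 1;
        if (bsearch(hex, packs[i].names, packs[i].n,
                    sizeof(*packs[i].names), cmp_name))
            return (long)i;
    }
    *probed = npacks;
    return -1;
}
```

In this model a missing object costs one index search per pack, so 96 packs mean 96 searches, whereas the loose-object case is a single path lookup.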

* Re: gc getting called on each git command ... what's wrong?
  2011-06-08  1:48 ` Peter Harris
  2011-06-08 16:02   ` Drew Northup
@ 2011-06-08 17:09   ` Jakub Narebski
  2011-06-15  1:28     ` Geoff Russell
       [not found]     ` <BANLkTi=w10KQ3MSd5YuYR+S=eMgywNTY-A@mail.gmail.com>
  1 sibling, 2 replies; 10+ messages in thread
From: Jakub Narebski @ 2011-06-08 17:09 UTC (permalink / raw)
  To: Peter Harris; +Cc: geoffrey.russell, git

Peter Harris <git@peter.is-a-geek.org> writes:

> On Tue, Jun 7, 2011 at 9:33 PM, Geoff Russell wrote:
> >
> > As of today, almost every time I do a git command, gc is getting
> > invoked.
> 
> >   There are 96 pack files.
> 
> That's why. See gc.autopacklimit in "git help config" -- by default,
> git will gc if there are more than 50 pack files.

Actually, it looks like it is a combination of this and packSizeLimit
set to 30M.  Git notices that it has too many packfiles and tries to
repack them, but the pack size limit forces Git to split the result into
small packfiles... so it ends up with more packfiles than the limit anyway.
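
The loop Jakub describes is simple arithmetic: a full repack under pack.packSizeLimit needs at least ceil(total / limit) packs. A minimal sketch, assuming the packed data totals about 3000 MiB (Junio's "around 3G" estimate in this thread; the exact size was never posted); `min_packs_after_repack` is a hypothetical helper name.

```c
/* Lower bound on the number of packs a full repack can produce when
 * pack.packSizeLimit caps each pack: ceil(total / limit). Objects are
 * never split across packs, so real packs are somewhat less full and
 * the true count can be higher. */
static unsigned long long min_packs_after_repack(unsigned long long total_bytes,
                                                 unsigned long long pack_size_limit)
{
    if (!pack_size_limit)   /* no limit: everything fits in one pack */
        return total_bytes ? 1 : 0;
    return (total_bytes + pack_size_limit - 1) / pack_size_limit;
}
```

Under that assumption a 30 MiB limit yields at least 100 packs, twice the default gc.autopacklimit of 50, so every auto-gc leaves the repository still over the limit; a 3000M limit yields a single pack.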

Perhaps git should notice that it has a nonsensical combination of
options...
-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: gc getting called on each git command ... what's wrong?
  2011-06-08 17:09   ` Jakub Narebski
@ 2011-06-15  1:28     ` Geoff Russell
       [not found]     ` <BANLkTi=w10KQ3MSd5YuYR+S=eMgywNTY-A@mail.gmail.com>
  1 sibling, 0 replies; 10+ messages in thread
From: Geoff Russell @ 2011-06-15  1:28 UTC (permalink / raw)
  To: git

On Thu, Jun 9, 2011 at 2:39 AM, Jakub Narebski <jnareb@gmail.com> wrote:
>
> Peter Harris <git@peter.is-a-geek.org> writes:
>
> > On Tue, Jun 7, 2011 at 9:33 PM, Geoff Russell wrote:
> > >
> > > As of today, almost every time I do a git command, gc is getting
> > > invoked.
> >
> > >   There are 96 pack files.
> >
> > That's why. See gc.autopacklimit in "git help config" -- by default,
> > git will gc if there are more than 50 pack files.

Thanks to everybody. This is exactly what was happening and the problems went
away when I set the packSizeLimit higher ... 3000M

>
> Actually, it looks like it is a combination of this and packSizeLimit
> set to 30M.  Git notices that it has too many packfiles and tries to
> repack them, but the pack size limit forces Git to split the result into
> small packfiles... so it ends up with more packfiles than the limit anyway.
>
> Perhaps git should notice that it has a nonsensical combination of
> options...

That would be nice. It should be reasonably easy to work out that the
packSizeLimit will guarantee too many pack files after the gc.
Disobeying a user's wishes shouldn't be undertaken lightly, but
sometimes we stuff up :)

Cheers,
Geoff.

--
6 Fifth Ave,
St Morris, S.A. 5068
Australia
Ph: 041 8805 184 / 08 8332 5069
http://perfidy.com.au

* Re: gc getting called on each git command ... what's wrong?
       [not found]     ` <BANLkTi=w10KQ3MSd5YuYR+S=eMgywNTY-A@mail.gmail.com>
@ 2011-06-15 15:35       ` Jakub Narebski
  2011-06-16  1:46         ` Geoff Russell
  0 siblings, 1 reply; 10+ messages in thread
From: Jakub Narebski @ 2011-06-15 15:35 UTC (permalink / raw)
  To: Geoff Russell; +Cc: Peter Harris, git

On Wed, 15 Jun 2011, Geoff Russell wrote:
> On Thu, Jun 9, 2011 at 2:39 AM, Jakub Narebski <jnareb@gmail.com> wrote:
> > Peter Harris <git@peter.is-a-geek.org> writes:
> > > On Tue, Jun 7, 2011 at 9:33 PM, Geoff Russell wrote:
> > > >
> > > > As of today, almost every time I do a git command, gc is getting
> > > > invoked.
> > >
> > > >   There are 96 pack files.
> > >
> > > That's why. See gc.autopacklimit in "git help config" -- by default,
> > > git will gc if there are more than 50 pack files.
> >
> > Actually, it looks like it is a combination of this and packSizeLimit
> > set to 30M.  Git notices that it has too many packfiles and tries to
> > repack them, but the pack size limit forces Git to split the result into
> > small packfiles... so it ends up with more packfiles than the limit anyway.
> 
> Thanks to everybody. This is exactly what was happening and the problems
> went away when I set the packSizeLimit higher ... 3000M
 
Why did you set packSizeLimit at all?

 
> >
> > Perhaps git should notice that it has a nonsensical combination of
> > options...
> 
> That would be nice. It should be reasonably easy to work out that the
> packSizeLimit will guarantee too many pack files after the gc.
> Disobeying a user's wishes shouldn't be undertaken lightly, but sometimes
> we stuff up :)

Well, git can simply notice that every pack except perhaps one has a
size greater than or equal to pack.packSizeLimit, and then ignore the
gc.autopacklimit hint, because repacking would not reduce the number of
packs, let alone lower it below gc.autopacklimit.

If `git gc` is called interactively, we can warn the user about this situation...
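
Jakub's heuristic can be sketched in a few lines of C. The function names and the array-of-sizes interface are hypothetical; the rule itself ("every pack except perhaps one is at or above the size limit, so skip the gc.autopacklimit trigger") is taken from his message. The literal "greater than or equal" test is kept from his wording; a real implementation might need some slack, since packs split by the size limit can end up slightly below it.

```c
#include <stddef.h>

/* Hypothetical: repacking cannot reduce the pack count if all packs
 * except at most one are already at or above pack.packSizeLimit. */
static int repack_would_not_help(const unsigned long long *pack_sizes,
                                 size_t npacks,
                                 unsigned long long pack_size_limit)
{
    size_t small = 0;

    if (!pack_size_limit)   /* no size limit: repacking can always merge */
        return 0;
    for (size_t i = 0; i < npacks; i++)
        if (pack_sizes[i] < pack_size_limit)
            small++;
    return small <= 1;
}

/* Hypothetical auto-gc decision for the "too many packs" trigger. */
static int auto_gc_for_packs(const unsigned long long *pack_sizes,
                             size_t npacks,
                             unsigned long long pack_size_limit,
                             int auto_pack_limit)
{
    if (auto_pack_limit <= 0 || npacks < (size_t)auto_pack_limit)
        return 0;
    if (repack_would_not_help(pack_sizes, npacks, pack_size_limit)) {
        /* this is where an interactive `git gc` could warn the user */
        return 0;
    }
    return 1;
}
```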

-- 
Jakub Narebski
Poland

* Re: gc getting called on each git command ... what's wrong?
  2011-06-15 15:35       ` Jakub Narebski
@ 2011-06-16  1:46         ` Geoff Russell
  2011-06-16 14:14           ` Jakub Narebski
  0 siblings, 1 reply; 10+ messages in thread
From: Geoff Russell @ 2011-06-16  1:46 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Peter Harris, git

2011/6/16 Jakub Narebski <jnareb@gmail.com>
>
>
> Why did you set packSizeLimit at all?
>
>

Some time ago (31/8/2010) I had a problem which seemed to be caused by
large packs (>4GB), you can find it in the git list with a subject of
"Large pack causes git clone failures ... what to do?"

Anyway, I set packSizeLimit and fiddled around for a bit ... eventually
the problem went away when I moved the central repository to another
machine with less load and more memory. At which point I gave a sigh of
relief and forgot to remove the packSizeLimit until recently bitten. But
the original problem was probably nothing to do with large packs and
hasn't recurred.

Cheers,
Geoff.

* Re: gc getting called on each git command ... what's wrong?
  2011-06-16  1:46         ` Geoff Russell
@ 2011-06-16 14:14           ` Jakub Narebski
  0 siblings, 0 replies; 10+ messages in thread
From: Jakub Narebski @ 2011-06-16 14:14 UTC (permalink / raw)
  To: geoffrey.russell; +Cc: Peter Harris, git

On Thu, 16 Jun 2011, Geoff Russell wrote:
> 2011/6/16 Jakub Narebski <jnareb@gmail.com>
> >
> > Why did you set packSizeLimit at all?
> 
> Some time ago (31/8/2010) I had a problem which seemed to be caused by
> large packs (>4GB), you can find it in the git list with a subject of
> "Large pack causes git clone failures ... what to do?"

So why did you set packSizeLimit to such a ridiculously low value, instead
of 2g (2 GB) or something?

-- 
Jakub Narebski
Poland
