All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin Fick <mfick@codeaurora.org>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Git List <git@vger.kernel.org>
Subject: Re: [PATCH] git exproll: steps to tackle gc aggression
Date: Tue, 6 Aug 2013 18:25:28 -0600	[thread overview]
Message-ID: <201308061825.28579.mfick@codeaurora.org> (raw)
In-Reply-To: <1375756727-1275-1-git-send-email-artagnon@gmail.com>

On Monday, August 05, 2013 08:38:47 pm Ramkumar Ramachandra 
wrote:
> This is the rough explanation I wrote down after reading
> it:
> 
> So, the problem is that my .git/objects/pack is polluted
> with little packs everytime I fetch (or push, if you're
> the server), and this is problematic from the
> perspective of a overtly (naively) aggressive gc that
> hammers out all fragmentation.  So, on the first run,
> the little packfiles I have are all "consolidated" into
> big packfiles; you also write .keep files to say that
> "don't gc these big packs we just generated".  In
> subsequent runs, the little packfiles from the fetch are
> absorbed into a pack that is immune to gc.  You're also
> using a size heuristic, to consolidate similarly sized
> packfiles.  You also have a --ratio to tweak the ratio
> of sizes.
> 
> From: Martin Fick<mfick@codeaurora.org>
> See: https://gerrit-review.googlesource.com/#/c/35215/
> Thread:
> http://thread.gmane.org/gmane.comp.version-control.git/2
> 31555 (Martin's emails are missing from the archive)
> ---

After analyzing today's data, I recognize that in some 
circumstances the size estimation after consolidation can be 
off by huge amounts.  The script naively just adds the 
current sizes together.  This gives a very rough estimate, 
of the new packfile size, but sometimes it can be off by 
over 2 orders of magnitude. :(  While many new packfiles are 
tiny (several K only), it seems like the larger new 
packfiles have a terrible tendency to throw the estimate way 
off (I suspect they simply have many duplicate objects).  
But despite this poor estimate, the script still offers 
drastic improvements over plain git gc.

So, it has me wondering if there isn't a more accurate way 
to estimate the new packfile without wasting a ton of time?

If not, one approach which might be worth experimenting with 
is to just assume that new packfiles have size 0!  Then just 
consolidate them with any other packfile which is ready for 
consolidation, or if none are ready, with the smallest 
packfile.  I would not be surprised to see this work on 
average better than the current summation,

-Martin


-- 
The Qualcomm Innovation Center, Inc. is a member of Code 
Aurora Forum, hosted by The Linux Foundation
 

  parent reply	other threads:[~2013-08-07  0:25 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-06  2:38 [PATCH] git exproll: steps to tackle gc aggression Ramkumar Ramachandra
2013-08-06 12:24 ` Duy Nguyen
2013-08-06 17:39   ` Junio C Hamano
2013-08-07  4:43     ` Ramkumar Ramachandra
2013-08-08  7:13       ` Junio C Hamano
2013-08-08  7:44         ` Ramkumar Ramachandra
2013-08-08 16:56           ` Junio C Hamano
2013-08-08 17:34             ` Martin Fick
2013-08-08 18:52               ` Junio C Hamano
2013-08-08 19:14                 ` Ramkumar Ramachandra
2013-08-08 17:36             ` Ramkumar Ramachandra
2013-08-08 19:37               ` Junio C Hamano
2013-08-08 20:04                 ` Ramkumar Ramachandra
2013-08-08 21:09                   ` Martin Langhoff
2013-08-09 11:00                   ` Jeff King
2013-08-09 13:34                     ` Ramkumar Ramachandra
2013-08-09 17:35                       ` Junio C Hamano
2013-08-09 22:16                       ` Jeff King
2013-08-10  1:24                         ` Duy Nguyen
2013-08-10  9:50                           ` Jeff King
2013-08-10  5:26                         ` Junio C Hamano
2013-08-10  8:42                         ` Ramkumar Ramachandra
2013-08-10  9:24                           ` Duy Nguyen
2013-08-10  9:28                             ` Duy Nguyen
2013-08-10  9:43                             ` Jeff King
2013-08-10  9:50                               ` Duy Nguyen
2013-08-10 10:05                                 ` Ramkumar Ramachandra
2013-08-10 10:16                                   ` Duy Nguyen
2013-08-10  9:38                           ` Jeff King
2013-08-07  0:10   ` Martin Fick
2013-08-08  2:18     ` Duy Nguyen
2013-08-07  0:25 ` Martin Fick [this message]
2013-08-07  4:36   ` Ramkumar Ramachandra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=201308061825.28579.mfick@codeaurora.org \
    --to=mfick@codeaurora.org \
    --cc=artagnon@gmail.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.