From: Junio C Hamano <gitster@pobox.com>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Duy Nguyen <pclouds@gmail.com>,
	Martin Fick <mfick@codeaurora.org>,
	Git List <git@vger.kernel.org>
Subject: Re: [PATCH] git exproll: steps to tackle gc aggression
Date: Thu, 08 Aug 2013 00:13:10 -0700
Message-ID: <7va9ksbqpl.fsf@alter.siamese.dyndns.org>
In-Reply-To: <CALkWK0kmx8bi1ZT1CSk+iVnmmQJV4bNu1D7h1AHnRk7_TfhJ3Q@mail.gmail.com> (Ramkumar Ramachandra's message of "Wed, 7 Aug 2013 10:13:13 +0530")

Ramkumar Ramachandra <artagnon@gmail.com> writes:

> Junio C Hamano wrote:
>> Imagine we have a cheap way to enumerate the young objects without
>> the usual history traversal.
>
> Before we discuss the advantages, can you outline how we can possibly
> get this data without actually walking downwards from the roots
> (refs)? One way to do it is to pull data out of a log of ref updates
> (aka. reflog), but we both know how unreliable that can be.

My understanding of the topic is that we want a way, much cheaper
than the current "gc --auto" with its recent history walk, to
consolidate both loose objects and small young packs into one pack,
so that "gc --auto" can use that logic instead.

The key phrase is "without the usual history traversal".  We are
talking about young objects, and they are likely to be reachable
from something (like reflog entries, if not refs).  The "let's be
quick and collect them into a single young pack" approach may sweep
some unreachable cruft into the result.  But you will need to keep
these objects while reflog entries are alive, and you will need
periodic sweeps with the usual history walking anyway to remove
older cruft that recently became unreachable due to reflog expiry
from packs.  So it is not a problem for the pack that consolidates
young objects to contain some unreachable cruft.

If you start from that assumption [*1*], the way to enumerate the
young objects without the usual history traversal should be fairly
obvious.

By definition, loose objects are all young because they were created
since the last "gc --auto".  Also, pack .idx files know their own
creation timestamp to let you decide how old they are, and you can
see how many objects there are in the corresponding .pack and how
big it is.
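
A minimal sketch of that selection step, in Python rather than git's
own C, might look like the following.  It assumes the mtime of the
.idx file is a good enough stand-in for its creation timestamp, and
it reads the object count from the 12-byte .pack header.  The
function names and the two thresholds are hypothetical, not existing
git code or configuration.

```python
import os
import struct
import time


def pack_object_count(pack_path):
    """Read the object count from the 12-byte header of a .pack file."""
    with open(pack_path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        raise ValueError("truncated pack header: %s" % pack_path)
    # Header layout: 4-byte "PACK", 4-byte version, 4-byte object count,
    # all in network byte order.
    signature, _version, count = struct.unpack(">4sII", header)
    if signature != b"PACK":
        raise ValueError("not a pack file: %s" % pack_path)
    return count


def small_young_packs(objects_dir, max_age_seconds, max_bytes, now=None):
    """Return (idx_path, object_count, pack_bytes) for young, small packs.

    max_age_seconds and max_bytes are hypothetical knobs for this sketch.
    """
    now = time.time() if now is None else now
    pack_dir = os.path.join(objects_dir, "pack")
    result = []
    if not os.path.isdir(pack_dir):
        return result
    for name in sorted(os.listdir(pack_dir)):
        if not name.endswith(".idx"):
            continue
        idx_path = os.path.join(pack_dir, name)
        pack_path = idx_path[:-len(".idx")] + ".pack"
        if not os.path.exists(pack_path):
            continue
        # Assumption: file mtime approximates the pack's creation time.
        age = now - os.stat(idx_path).st_mtime
        size = os.stat(pack_path).st_size
        if age <= max_age_seconds and size <= max_bytes:
            result.append((idx_path, pack_object_count(pack_path), size))
    return result
```

Nothing here opens an object or walks history; it costs one stat per
file plus a 12-byte read per candidate pack.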

By doing an equivalent of "find .git/objects/[0-9a-f][0-9a-f]/", you
can enumerate the loose objects, and an equivalent of "show-index"
will enumerate the objects in each pack whose .idx file you
determined to be small and young.
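
As a rough illustration, both enumerations can be sketched in Python
without touching history at all.  This is a sketch under stated
assumptions, not how git implements it: the function names are
hypothetical, object names are assumed to be 20-byte SHA-1, and only
version-2 .idx files are handled.  The second function reads the
sorted name table of the .idx directly, which is roughly the set of
names "git show-index" prints.

```python
import os
import re
import struct

TWO_HEX = re.compile(r"[0-9a-f]{2}")
REST_HEX = re.compile(r"[0-9a-f]{38}")


def loose_objects(objects_dir):
    """Yield hex names of loose objects under objects/[0-9a-f][0-9a-f]/."""
    for prefix in sorted(os.listdir(objects_dir)):
        if not TWO_HEX.fullmatch(prefix):
            continue  # skips "pack", "info" and anything else
        subdir = os.path.join(objects_dir, prefix)
        if not os.path.isdir(subdir):
            continue
        for rest in sorted(os.listdir(subdir)):
            if REST_HEX.fullmatch(rest):
                yield prefix + rest


def packed_objects(idx_path):
    """Yield hex object names from a version-2 .idx file (SHA-1 assumed)."""
    with open(idx_path, "rb") as f:
        magic, version = struct.unpack(">4sI", f.read(8))
        if magic != b"\xfftOc" or version != 2:
            raise ValueError("only version-2 .idx files are handled here")
        # 256-entry fan-out table; the last entry is the total object count.
        fanout = struct.unpack(">256I", f.read(256 * 4))
        for _ in range(fanout[255]):
            yield f.read(20).hex()
```

The union of loose_objects() and packed_objects() over the selected
packs is the candidate set for the single young pack, unreachable
cruft included.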

Note that *1* is an assumption. I do not know offhand if such a
"consolidate young objects quickly into one to keep the number of
packs small" strategy is an overall win.
