From: elton sky <eltonsky9404@gmail.com>
To: Git Mailing List <git@vger.kernel.org>
Cc: Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Thomas Rast <trast@student.ethz.ch>
Subject: Re: GSoC - Designing a faster index format
Date: Fri, 23 Mar 2012 21:27:32 +1100
Message-ID: <CAKTdtZk4FJD9qXEybpN01+S=5fOm=4AbOp8trFr5c6Uxbfykkg@mail.gmail.com>
In-Reply-To: <CACsJy8AYs5bzRnhRj_R33qTt-2gPh-rJaO0=1iTva9n14wHB4w@mail.gmail.com>

Hi Nguyen, Jakub

Thank you for your explanations.

Just one clarifying question about how updated files are tracked:

On Fri, Mar 23, 2012 at 12:30 PM, Nguyen Thai Ngoc Duy
<pclouds@gmail.com> wrote:
> On Fri, Mar 23, 2012 at 3:32 AM, elton sky <eltonsky9404@gmail.com> wrote:
>> Got a few questions:
>>
>> 1. index is used for building next commit, so it should only include
>> files created/modified/deleted. But I see it has all entries for
>> current working dir. why?
>
> Jakub has answered this question.
>
>> 2. From read_index_from() I see the whole index is read into mem, and
>> write one by one (entry/ext) back to disk. This makes sense. But why
>> we have to compute Sha1 for all entries, especially unchanged entries?
>
> To catch disk corruption. If a bit is flipped anywhere in the index
> and we do not detect it, we may end up creating broken commits.
>
>> 3. how does git track updated files? Does it compare the ts between
>> working dir and index ? Or they are recorded somewhere?
>
> Check out refresh_cache_ent. At the beginning of most commands, they
> call refresh_index() or refresh_cache(), which checks a file's mtime
> against one stored in index (different means updated). In the worst
> scenario, refresh_cache_ent may call ce_compare_data(), which computes
> SHA-1 of the specified file and compare it with one stored in index.
>

So, to find updated files, git compares the mtime stored in each index
entry against the file in the working dir. I assume the check works
like this: it loops through the entries in the index and, for each
entry, looks up the file in the working dir and compares the mtime. I
think the complexity of this operation is O(nlogn).
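
To make my assumption concrete, here is a rough Python sketch of the
check as I understand it. It is not git's actual code: find_updated()
and the index layout (path -> (mtime_ns, sha1)) are made up for
illustration, and the real refresh_cache_ent() may look at more stat
fields than just mtime. The only part taken from git is how a blob is
hashed (a "blob <size>" header, a NUL byte, then the content).

```python
import hashlib
import os


def blob_sha1(path):
    """Hash a file the way git hashes a blob: 'blob <size>' + NUL + content."""
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def find_updated(index, workdir):
    """Return the paths that look changed relative to the index.

    index is a hypothetical dict: relative path -> (mtime_ns, sha1 hex).
    """
    updated = []
    for path, (mtime_ns, sha1) in sorted(index.items()):
        full = os.path.join(workdir, path)
        try:
            st = os.lstat(full)
        except FileNotFoundError:
            updated.append(path)      # file was deleted in the working dir
            continue
        if st.st_mtime_ns == mtime_ns:
            continue                  # cheap path: same mtime, assume unchanged
        if blob_sha1(full) != sha1:
            updated.append(path)      # slow path: the content really differs
    return updated
```

A file that was only touched (new mtime, same content) goes through the
slow path once and is then reported as unchanged.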

Because the current index is a single stream in one file, a write-back
has to write everything sequentially, so we have to checksum every
entry. I suppose this step is more time consuming than the previous
one.
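
A small sketch of what I mean, in Python. The record layout here
(path, NUL, 20-byte hash) is invented and much simpler than the real
index format; write_index() and verify_index() are hypothetical names.
The point is only that one trailing SHA-1 covers the whole stream, so
every entry has to be fed through the hash on each write, changed or
not, and a flipped bit anywhere is detected on read.

```python
import hashlib


def write_index(entries):
    """Serialize all (path, sha1 hex) entries in order and append one
    SHA-1 computed over the whole stream."""
    h = hashlib.sha1()
    out = bytearray()
    for path, sha1_hex in entries:
        record = path.encode() + b"\0" + bytes.fromhex(sha1_hex)
        h.update(record)              # every entry is hashed, changed or not
        out += record
    out += h.digest()                 # 20-byte trailer
    return bytes(out)


def verify_index(blob):
    """Recompute the SHA-1 of the body and compare it with the trailer."""
    body, trailer = blob[:-20], blob[-20:]
    return hashlib.sha1(body).digest() == trailer
```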

If we use a tree format, looking for updated files is still O(nlogn),
i.e. we traverse the index entries and for each entry we refer back to
the working dir. However, when we write the index back, we only need to
recompute and write the nodes for updated files, not all entries. That
is where the total processing time would improve.
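
Roughly what I have in mind, as a Python sketch. This is only my idea
of a tree-shaped index, not an existing git structure: Node, set_file()
and checksum() are hypothetical. Each directory node caches its
checksum, an update invalidates only the nodes between the changed file
and the root, and the next checksum pass rehashes just those nodes.

```python
import hashlib


class Node:
    """One directory in the hypothetical tree-shaped index."""

    def __init__(self):
        self.children = {}   # directory name -> Node
        self.files = {}      # file name -> content hash (hex string)
        self.digest = None   # cached checksum; None means "needs recompute"


def set_file(root, path, sha1_hex):
    """Record a file's hash and invalidate only the nodes on its path."""
    node = root
    node.digest = None
    *dirs, name = path.split("/")
    for d in dirs:
        node = node.children.setdefault(d, Node())
        node.digest = None
    node.files[name] = sha1_hex


def checksum(node, stats):
    """Return the node's checksum, rehashing only invalidated nodes.

    stats["rehashed"] counts how many nodes were actually recomputed.
    """
    if node.digest is None:
        stats["rehashed"] += 1
        h = hashlib.sha1()
        for name in sorted(node.files):
            h.update(f"f {name} {node.files[name]}\n".encode())
        for name in sorted(node.children):
            child = checksum(node.children[name], stats)
            h.update(f"d {name} {child}\n".encode())
        node.digest = h.hexdigest()
    return node.digest
```

With directories a/, b/ and b/c/, changing one file under a/ rehashes
only a/ and the root; b/ and b/c/ keep their cached checksums.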

Please correct me if I am wrong.

-Elton


>> 4. When does git insert to cache tree? and when it retrieve from it?
>
> cache-tree is built from scratch in some cases, when we know HEAD (or
> some tree) matches index exactly (e.g. reset --hard). Usually it's
> only built up at commit time (update_main_cache_tree in
> builtin/commit.c).
> --
> Duy
