From: elton sky <eltonsky9404@gmail.com>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
	Thomas Rast <trast@student.ethz.ch>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: GSoC - Designing a faster index format
Date: Fri, 23 Mar 2012 07:32:08 +1100
Message-ID: <CAKTdtZmYc=xz4zCPQiuSTUvdmbLRKXNWNL3N6_4Bj0gujYmRvw@mail.gmail.com>
In-Reply-To: <CAKTdtZkGP3KbMGf88yW7zcCjemUyEy_4CVNkLD0SV=Lm7=Kveg@mail.gmail.com>

I have a few questions:

1. The index is used to build the next commit, so I would expect it to
include only files that were created, modified, or deleted. But I see
it has entries for everything in the current working directory. Why?

2. From read_index_from() I see the whole index is read into memory and
then written back to disk one item (entry or extension) at a time. That
makes sense. But why do we have to compute the SHA-1 over all entries,
especially unchanged ones?

3. How does git track updated files? Does it compare timestamps between
the working directory and the index, or are changes recorded somewhere?

4. When does git insert into the cache tree, and when does it retrieve
from it?


Some early thoughts for the tree format:

We can use a B-tree-like format. Keep the header at the beginning of
the file as is, but add the file length (4 bytes) and a pointer to the
extensions (8 bytes) to it.
The entry list follows the header. Each entry starts with the number of
child offsets (1 byte), followed by the list of offsets (4 bytes each).
We can limit the number of children to keep the tree balanced. The
other fields stay as they are.
Extensions can be located in between entries.
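
To make the layout concrete, here is a rough sketch of how the header and
an entry node could be packed. The "DIRC" signature, version, and entry
count are the existing header fields; everything else (the function names,
treating the remaining entry fields as an opaque payload, big-endian for
the new fields) is only an assumption for illustration, not a settled
format.

```python
import struct

# Existing header: 4-byte signature, 4-byte version, 4-byte entry count.
# Proposed additions: file length (4 bytes) and extension offset (8 bytes).
# Big-endian for the new fields is an assumption, matching the old ones.
HEADER = struct.Struct(">4sIIIQ")


def pack_header(version, num_entries, file_length, ext_offset):
    """Pack the extended header (hypothetical helper)."""
    return HEADER.pack(b"DIRC", version, num_entries, file_length, ext_offset)


def pack_node(child_offsets, payload):
    """Pack one entry node: 1-byte child count, then a 4-byte offset per
    child, then the unchanged entry fields (kept opaque in this sketch)."""
    if len(child_offsets) > 255:
        raise ValueError("child count must fit in one byte")
    head = struct.pack(">B", len(child_offsets))
    offsets = b"".join(struct.pack(">I", off) for off in child_offsets)
    return head + offsets + payload


def unpack_node(buf, pos):
    """Read the child offsets of the node at pos; return them together
    with the position where the remaining entry fields begin."""
    (count,) = struct.unpack_from(">B", buf, pos)
    offsets = [
        struct.unpack_from(">I", buf, pos + 1 + 4 * i)[0] for i in range(count)
    ]
    return offsets, pos + 1 + 4 * count
```

With this packing the header grows from 12 to 24 bytes, and a node with n
children spends 1 + 4n bytes on links before its ordinary fields.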

Use the SHA-1, rather than the path, as the key for each entry node.
This handles cases like 1000 files in one directory, which would break
the balance of the tree, as Thomas mentioned. If a file is updated, the
old SHA-1 can still be found in the object directory. This also gives
flexibility. We may use a splay tree in order to move updated nodes
close to the root. The downside is that the full path has to be stored
in each entry.
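
A small experiment shows why SHA-1 keys should spread out regardless of
directory layout. The paths and file contents below are made up; only the
blob hashing ("blob <size>\0" followed by the contents) is how git
actually computes the key. Each entry is keyed by SHA-1 and carries its
full path, which is the downside mentioned above.

```python
import hashlib
from collections import Counter


def blob_sha1(data: bytes) -> bytes:
    """SHA-1 of a blob, hashed the way git does it."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).digest()


# Hypothetical working tree: 1000 files in a single directory.
entries = {}
for i in range(1000):
    path = "dir/file%04d.c" % i
    key = blob_sha1(("contents of %s\n" % path).encode())
    entries[key] = path  # the full path has to be stored with the entry

# Keyed by path, every entry falls under the same leading component.
by_path_prefix = Counter(p.split("/")[0] for p in entries.values())

# Keyed by SHA-1, the first byte spreads entries over many buckets.
by_sha1_prefix = Counter(k[0] for k in entries)

print(len(by_path_prefix), "path bucket(s)")
print(len(by_sha1_prefix), "SHA-1 first-byte buckets")
```

Note that this only illustrates the key distribution. Looking an entry up
by path would still need some other mechanism.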

Regards,
Elton

On Wed, Mar 21, 2012 at 11:01 PM, elton sky <eltonsky9404@gmail.com> wrote:
> Hi Nguyen, Thomas
>
> Thanks for the points &clues. Processing them...
>
> -Elton
>
> On Wed, Mar 21, 2012 at 10:25 PM, Thomas Rast <trast@student.ethz.ch> wrote:
>> elton sky <eltonsky9404@gmail.com> writes:
>>
>>> I got questions like: how each operations affect index? how cache tree
>>> data and index is stored?
>>> Maybe you can point me how I should catch up quickly. I went through
>>> the article "git-for-computer-scientists", that quite makes sense.
>>
>> In addition to what Nguyen Thai Ngoc Duy said, check out the
>> (sub)threads
>>
>>  http://thread.gmane.org/gmane.comp.version-control.git/190016/focus=190132
>>  [origins of the GSoC project idea]
>>
>>  http://thread.gmane.org/gmane.comp.version-control.git/192014/focus=192025
>>  [perspectives of core developers in reply to the idea]
>>
>>  http://thread.gmane.org/gmane.comp.version-control.git/186244/focus=186282
>>  http://thread.gmane.org/gmane.comp.version-control.git/186357
>>  [the last few discussions about cache-tree]
>>
>> --
>> Thomas Rast
>> trast@{inf,student}.ethz.ch
