All of lore.kernel.org
 help / color / mirror / Atom feed
From: elton sky <eltonsky9404@gmail.com>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: GSoC - Designing a faster index format
Date: Tue, 27 Mar 2012 14:20:07 +1100	[thread overview]
Message-ID: <CAKTdtZnxSRffZ5xAq+SgW6fmy+b3P2Fu3AZmBB1jmGca6HmJAw@mail.gmail.com> (raw)
In-Reply-To: <CACsJy8AqQdWO4E2oYTMLbpYhxobH8iXE-jXPoj2BcEGtfh+T=Q@mail.gmail.com>

Hi Nguyen,

Thanks for the idea. just a few questions

On Tue, Mar 27, 2012 at 3:19 AM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Mon, Mar 26, 2012 at 9:28 PM, Thomas Rast <trast@student.ethz.ch> wrote:
>> Doesn't that venture into database land?
>
> How about this (a bit like memory management). Maybe it's simpler than
> a database and fits us better.
>
> The header consists of crc32 and three uint32_t, one points to the
> root tree, one the first extension block, the last one the free list
> at the end of the file. The rest of the file contains sizable blocks.
> There can be free space between them. Free spaces (offset and size)
> are recorded at the end of the file, pointed in header. The header's
> crc32 covers the header and free list.

How do you record free spaces at the end of file? Are you gonna have a
fixed size for the index and reserve space for free spaces offsets.

>
> When we need a new block, we look up in free list. If we cannot find a
> suitable space, we append to the end of the file (moving free list
> further to keep it always the end of the file). Removing a block means
> marking it in free list. We only truncate if there is free space at
> the end. Operations that we know will scratch the whole index are our
> opportunity to rewrite the index and make it compact again. No random
> garbage collection (iow disk is cheap).
>

I agree with you. Maybe we just ignore free spaces in the index and
let a background thread to compact it.

> A block starts with a signature (a tree block, or an extension...). A
> tree block consists of:
>
>  - uint32_t tree object's size
>  - sha-1 of tree object
>  - crc32 of the rest of the block except tree object
>  - maybe reference counter of a block can be refered by many blocks??
>  - tree object (i.e. something that tree-walk.c can parse)

Do you mean each block contains a tree and all its blobs? So the tree
object here, effectively a dir, also contains files in the dir ? In
this way, some blocks can be very big.

>  - other index attributes, stored separately in the same order as in
> tree object above, uint32_t block offset of subdirectories.

There can be many sub dirs some times. But maybe not a prob.

As tree object and  offset of subdirectories are variables, how do you
make a block resizable?

>
> An extension block basically consists of what we have now in an
> extension plus uint32_t offset to the next extension block, so we can
> keep track of all extensions. crc32 is used for extension blocks.
>
> This way we only need to verify checksum of the header (and free list)
> and blocks we visit. We don't need cache-tree extension because it's
> part of the format. There will be headache with unpack-trees.c because
> of entry order change. But in the end we would use the same order tree
> objects are using now, much simpler for us.
> --
> Duy

Cheers,
Elton

  reply	other threads:[~2012-03-27  3:20 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-20 23:10 GSoC - Designing a faster index format elton sky
2012-03-21  1:18 ` Nguyen Thai Ngoc Duy
2012-03-21 11:25 ` Thomas Rast
2012-03-21 12:01   ` elton sky
2012-03-22 20:32     ` elton sky
2012-03-23  0:46       ` Jakub Narebski
2012-03-23  1:30       ` Nguyen Thai Ngoc Duy
2012-03-23 10:27         ` elton sky
2012-03-23 11:24           ` Nguyen Thai Ngoc Duy
     [not found]             ` <CAKTdtZmLOzAgG0uCDcVr+O41XPX-XnoVZjsZWPN-BLjq2oG-7A@mail.gmail.com>
2012-03-24  8:58               ` Nguyen Thai Ngoc Duy
     [not found]                 ` <CAKTdtZkpjVaBSkcieojKj+V7WztT3UDzjGfXyghY=S8mq+X9zw@mail.gmail.com>
     [not found]                   ` <CACsJy8D85thmK_5jLC7MxJtsitLr=zphKiw2miwPu7Exf7ty=Q@mail.gmail.com>
2012-03-26 12:36                     ` elton sky
2012-03-26 12:41                       ` elton sky
2012-03-26 14:28                       ` Thomas Rast
2012-03-26 15:25                         ` Nguyen Thai Ngoc Duy
2012-03-26 16:08                           ` Shawn Pearce
2012-03-27  2:49                             ` elton sky
2012-03-27  3:34                               ` David Barr
2012-03-27  6:33                                 ` Nguyen Thai Ngoc Duy
2012-03-29  9:45                                   ` Jeff King
2012-03-27  6:31                             ` Nguyen Thai Ngoc Duy
2012-03-26 16:19                         ` Nguyen Thai Ngoc Duy
2012-03-27  3:20                           ` elton sky [this message]
2012-03-27  6:43                             ` Nguyen Thai Ngoc Duy
2012-04-02 11:50                               ` elton sky
2012-04-02 12:31                                 ` Nguyen Thai Ngoc Duy
2012-04-02 14:27                                   ` Shawn Pearce
2012-04-02 15:12                                     ` Nguyen Thai Ngoc Duy
2012-04-04  8:26                                   ` elton sky
2012-04-04 12:20                                     ` Nguyen Thai Ngoc Duy
2012-04-04 16:22                                       ` elton sky
2012-04-06  3:13                                         ` elton sky
2012-04-06  3:15                                           ` elton sky
2012-04-07  8:29                                             ` elton sky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKTdtZnxSRffZ5xAq+SgW6fmy+b3P2Fu3AZmBB1jmGca6HmJAw@mail.gmail.com \
    --to=eltonsky9404@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.