All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Thomas Rast <trast@student.ethz.ch>
Cc: elton sky <eltonsky9404@gmail.com>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: GSoC - Designing a faster index format
Date: Mon, 26 Mar 2012 23:19:36 +0700	[thread overview]
Message-ID: <CACsJy8AqQdWO4E2oYTMLbpYhxobH8iXE-jXPoj2BcEGtfh+T=Q@mail.gmail.com> (raw)
In-Reply-To: <87iphrjv23.fsf@thomas.inf.ethz.ch>

On Mon, Mar 26, 2012 at 9:28 PM, Thomas Rast <trast@student.ethz.ch> wrote:
> Doesn't that venture into database land?

How about this (a bit like memory management). Maybe it's simpler than
a database and fits us better.

The header consists of crc32 and three uint32_t, one points to the
root tree, one the first extension block, the last one the free list
at the end of the file. The rest of the file contains sizable blocks.
There can be free space between them. Free spaces (offset and size)
are recorded at the end of the file, pointed in header. The header's
crc32 covers the header and free list.

When we need a new block, we look up in free list. If we cannot find a
suitable space, we append to the end of the file (moving free list
further to keep it always the end of the file). Removing a block means
marking it in free list. We only truncate if there is free space at
the end. Operations that we know will scratch the whole index are our
opportunity to rewrite the index and make it compact again. No random
garbage collection (iow disk is cheap).

A block starts with a signature (a tree block, or an extension...). A
tree block consists of:

 - uint32_t tree object's size
 - sha-1 of tree object
 - crc32 of the rest of the block except tree object
 - maybe reference counter of a block can be refered by many blocks??
 - tree object (i.e. something that tree-walk.c can parse)
 - other index attributes, stored separately in the same order as in
tree object above, uint32_t block offset of subdirectories.

An extension block basically consists of what we have now in an
extension plus uint32_t offset to the next extension block, so we can
keep track of all extensions. crc32 is used for extension blocks.

This way we only need to verify checksum of the header (and free list)
and blocks we visit. We don't need cache-tree extension because it's
part of the format. There will be headache with unpack-trees.c because
of entry order change. But in the end we would use the same order tree
objects are using now, much simpler for us.
-- 
Duy

  parent reply	other threads:[~2012-03-26 16:20 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-20 23:10 GSoC - Designing a faster index format elton sky
2012-03-21  1:18 ` Nguyen Thai Ngoc Duy
2012-03-21 11:25 ` Thomas Rast
2012-03-21 12:01   ` elton sky
2012-03-22 20:32     ` elton sky
2012-03-23  0:46       ` Jakub Narebski
2012-03-23  1:30       ` Nguyen Thai Ngoc Duy
2012-03-23 10:27         ` elton sky
2012-03-23 11:24           ` Nguyen Thai Ngoc Duy
     [not found]             ` <CAKTdtZmLOzAgG0uCDcVr+O41XPX-XnoVZjsZWPN-BLjq2oG-7A@mail.gmail.com>
2012-03-24  8:58               ` Nguyen Thai Ngoc Duy
     [not found]                 ` <CAKTdtZkpjVaBSkcieojKj+V7WztT3UDzjGfXyghY=S8mq+X9zw@mail.gmail.com>
     [not found]                   ` <CACsJy8D85thmK_5jLC7MxJtsitLr=zphKiw2miwPu7Exf7ty=Q@mail.gmail.com>
2012-03-26 12:36                     ` elton sky
2012-03-26 12:41                       ` elton sky
2012-03-26 14:28                       ` Thomas Rast
2012-03-26 15:25                         ` Nguyen Thai Ngoc Duy
2012-03-26 16:08                           ` Shawn Pearce
2012-03-27  2:49                             ` elton sky
2012-03-27  3:34                               ` David Barr
2012-03-27  6:33                                 ` Nguyen Thai Ngoc Duy
2012-03-29  9:45                                   ` Jeff King
2012-03-27  6:31                             ` Nguyen Thai Ngoc Duy
2012-03-26 16:19                         ` Nguyen Thai Ngoc Duy [this message]
2012-03-27  3:20                           ` elton sky
2012-03-27  6:43                             ` Nguyen Thai Ngoc Duy
2012-04-02 11:50                               ` elton sky
2012-04-02 12:31                                 ` Nguyen Thai Ngoc Duy
2012-04-02 14:27                                   ` Shawn Pearce
2012-04-02 15:12                                     ` Nguyen Thai Ngoc Duy
2012-04-04  8:26                                   ` elton sky
2012-04-04 12:20                                     ` Nguyen Thai Ngoc Duy
2012-04-04 16:22                                       ` elton sky
2012-04-06  3:13                                         ` elton sky
2012-04-06  3:15                                           ` elton sky
2012-04-07  8:29                                             ` elton sky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACsJy8AqQdWO4E2oYTMLbpYhxobH8iXE-jXPoj2BcEGtfh+T=Q@mail.gmail.com' \
    --to=pclouds@gmail.com \
    --cc=eltonsky9404@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=trast@student.ethz.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.