From: David Barr <davidbarr@google.com>
To: elton sky <eltonsky9404@gmail.com>
Cc: Shawn Pearce <spearce@spearce.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: GSoC - Designing a faster index format
Date: Tue, 27 Mar 2012 14:34:44 +1100
Message-ID: <CAFfmPPM_GOkOp6-tE2=YxdrZq6TL3s4EgOjXdRKf8+ffMD29xg@mail.gmail.com>
In-Reply-To: <CAKTdtZngYaTCwd5cri=XjUu3-o44ECjDotrDBNxqYL-Kcsosnw@mail.gmail.com>

On Tue, Mar 27, 2012 at 1:49 PM, elton sky <eltonsky9404@gmail.com> wrote:
> Thanks Shawn,
>
>> Or use LevelDB[2]. It's BSD-licensed. Uses an immutable file format, but
>> writes updates to new smaller files and eventually collapses
>> everything back together into a bigger file. This can be a
>> dramatically simpler approach than dealing with your own free block
>> system inside of a single file. Its only real downside is needing to
>> periodically pay a penalty to rewrite the whole index. But this
>> rewrite is going to be faster than the time it takes to rewrite the
>> pack files for the same repository, which git gc or git repack
>> handles. So I don't think it's actually a problem for the index.
>>
>> You might even be able to take a two level approach to compacting the
>> LevelDB database (or something like it). In a minor compaction you
>> compact all of the files except the huge base file, leaving you with 2
>> files. A huge base file that contains the first tree the user checked
>> out, and a second smaller file containing any differences they have
>> since the initial checkout (this may just be updated stat data for a
>> handful of files that differed across two branches as they switched
>> back and forth). During a git gc or git repack, add a new stage to
>> collapse the base file and everything else into a single new base file
>> as a major compaction.
>>
>> [2] http://code.google.com/p/leveldb/
>
> I don't know LevelDB, but I'd like to have a look.
> I just realized this kind of solution is fairly popular. HDFS also uses a
> similar image-file-plus-edit-file format for its file block index.
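
To make the two-level compaction quoted above concrete, here is a rough
sketch in Python. It is not LevelDB's API or on-disk format, and it is not
a proposal for git's index layout. The class name LayeredIndex, its
methods, and the use of None as a deletion marker are all assumptions made
up for illustration; in-memory dicts stand in for immutable files.

```python
from typing import Dict, List, Optional

# Assumption for this sketch: None marks a path deleted since an older layer.
TOMBSTONE = None

# path -> entry data (e.g. a stat/SHA-1 summary), or TOMBSTONE
Layer = Dict[str, Optional[str]]


class LayeredIndex:
    """Hypothetical index: one large base layer plus small delta layers."""

    def __init__(self, base: Layer) -> None:
        self.base: Layer = dict(base)
        self.deltas: List[Layer] = []  # oldest first

    def write(self, changes: Layer) -> None:
        # Each update becomes a new small layer; older layers are never edited.
        self.deltas.append(dict(changes))

    def lookup(self, path: str) -> Optional[str]:
        # The newest layer that mentions the path wins; otherwise use the base.
        for layer in reversed(self.deltas):
            if path in layer:
                return layer[path]
        return self.base.get(path)

    def minor_compact(self) -> None:
        # Merge all deltas into one layer and leave the base untouched,
        # so at most two layers remain. Tombstones are kept because they
        # still have to hide entries in the base.
        merged: Layer = {}
        for layer in self.deltas:
            merged.update(layer)
        self.deltas = [merged] if merged else []

    def major_compact(self) -> None:
        # Fold everything into a single new base and drop tombstones.
        self.minor_compact()
        new_base = dict(self.base)
        for layer in self.deltas:
            for path, entry in layer.items():
                if entry is TOMBSTONE:
                    new_base.pop(path, None)
                else:
                    new_base[path] = entry
        self.base = new_base
        self.deltas = []
```

In this sketch minor_compact() would run often, since it only touches the
small layers, while major_compact() corresponds to the extra stage during
git gc or git repack that rewrites the whole base.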

Another implementation in this general class is TinyCDB[1].
It is <1600 lines of plain C. Too few to be complete?
It is a derivative of DJB's CDB[2].

[1] http://www.corpit.ru/mjt/tinycdb.html
[2] http://cr.yp.to/cdb.html
--
David Barr

