git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Shawn Pearce <spearce@spearce.org>
Cc: git <git@vger.kernel.org>
Subject: Re: reftable: new ref storage format
Date: Sun, 16 Jul 2017 06:01:41 -0400	[thread overview]
Message-ID: <20170716100141.h4skqqod6lq5s5cc@sigill.intra.peff.net> (raw)
In-Reply-To: <CAJo=hJtMo4OSxcYbq4oecTQYnwTR0zK8HgyqVEhOYZ-4eu4S9w@mail.gmail.com>

On Sat, Jul 15, 2017 at 11:01:47PM -0700, Shawn Pearce wrote:

> > How deep would you anticipate stacks getting? Would it be feasible for
> > the tip to contain the names of the tables in the entire chain? If we're
> > talking about 20 (or even 32) bytes per name, you could still fit over a
> > hundred names in a 4K inode.
> 
> I think we'd want to keep the stacks under 32, which is a reasonable
> amount of space used in the header of each reftable. I don't have this
> yet in my updated document + implementation, but I'll look at trying
> to add it over the next couple of days. Your idea to hold the explicit
> list of the stack in each reftable makes for a very safe atomic reader
> view.

Great.  I was thinking about this a bit and have one more possible
tweak.

If you store the names of the dependent reftables in each table, then
you end up using storage quadratic in the size of the stack. Because
the bottom-most table has 0 pointers, then the next one has 1, and then
next one has 2, and so on, until the nth one has n.

Now we're talking about n=32 here, so that's probably OK.

But one variant is that the reftables _don't_ know about their
ancestors. Instead, the list of reftables is kept in a top-level pointer
file, and it's that pointer file which is rewritten on update. I.e., a
write is something like:
 
   1. Take reftable.lock

   2. Write reftables/1234abcd to represent your update.

   3. Copy the old reftable to reftable.lock, then append "1234abcd".

   4. Atomic rename into place.

And the reader is just:

  1. Open reftable, read the list of tables.

  2. In parallel, open/fetch each of the tables and find your starting
     pointer for iteration/lookup.

  3. Do a list-merge on the open tables.

The one thing you lose is that "unreachable" reftables no longer form a
meaningful hierarchy. With the pointers inside the reftables themselves,
if your "reftable" file got corrupted, you could find the dangling table
at the apex of the graph and have a good guess at the ref state.
Without, you just have a jumble of states and you don't know which takes
precedence (though you could probably make a good guess from mtimes).

> I added log support to the reftable format. I updated [1] to reflect
> log blocks at the end of the file. I ran a year's worth of log
> records, 149,932 log entries on 43,061 refs to test:

Cool. I'll be on vacation for the next week, so apologies if I don't
keep the discussion going. But I'm very excited about the overall
direction. :)

-Peff

  reply	other threads:[~2017-07-16 10:01 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-13  0:17 reftable: new ref storage format Shawn Pearce
2017-07-13 19:32 ` Jeff King
2017-07-13 19:56   ` Stefan Beller
2017-07-13 20:35     ` Jeff King
2017-07-13 21:51       ` Eric Wong
2017-07-14  0:27       ` Shawn Pearce
2017-07-14 20:10         ` Jeff King
2017-07-14  0:11   ` Shawn Pearce
2017-07-14 14:27     ` Dave Borowitz
2017-07-14 15:31       ` Shawn Pearce
2017-07-14 20:08     ` Jeff King
2017-07-16  6:01       ` Shawn Pearce
2017-07-16 10:01         ` Jeff King [this message]
2017-07-16  8:07       ` Johannes Sixt
2017-07-16 10:03         ` Jeff King
2017-07-16 10:10           ` Johannes Sixt
2017-07-16 17:33 ` Michael Haggerty
2017-07-16 19:43   ` Shawn Pearce
2017-07-16 21:12     ` Shawn Pearce
2017-07-16 21:13     ` Dave Borowitz
2017-07-16 21:31       ` Shawn Pearce
2017-07-18  1:43     ` Michael Haggerty
2017-07-18 18:53       ` Junio C Hamano
2017-07-23 22:56       ` Shawn Pearce
2017-07-23 23:03         ` Shawn Pearce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170716100141.h4skqqod6lq5s5cc@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).