All of lore.kernel.org
 help / color / mirror / Atom feed
From: Raja R Harinath <harinath@hurrynot.org>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: git@vger.kernel.org, "Shawn O. Pearce" <spearce@spearce.org>,
	David Barr <david.barr@cordelta.com>
Subject: Re: [PATCH] fast-import: export correctly marks larger than 2^20-1
Date: Wed, 14 Jul 2010 12:18:41 +0530	[thread overview]
Message-ID: <87iq4i8som.fsf@hariville.hurrynot.org> (raw)
In-Reply-To: <20100713183126.GA2458@burratino> (Jonathan Nieder's message of "Tue, 13 Jul 2010 13:31:27 -0500")

Hi,

Jonathan Nieder <jrnieder@gmail.com> writes:

> (+cc David, who I think mentioned wishing for something like this)
>
> Raja R Harinath wrote:
>
>> Subject: fast-import: export correctly marks larger than 2^20-1
>
> Thank you!  That would be a very good thing.

I needed this so that I could maintain two 'marks' counters, one for
commits counting up from 1, and one for files counting down from some
large value, where the file marks could be re-used.

  http://gitorious.org/~harinath/svn2git/rrh-svn2git/commit/ffc5270a6fa106fecad1a6a9f1520ca8f075c6b7

However, the 1M limit on marks is a bit too tight.  For instance, the
mono SVN repository has 160K commits, with one project alone using 60K
commits.  And one of the commits touched nearly 24K files.  The number
of marks used is uncomfortably close: within one order of magnitude of
the limit.  I'd like 10 more bits of breathing space, please :-)

>> dump_marks_helper() has a bug when dumping marks larger than 2^20-1,
>> i.e., when the sparse array has more than two levels.  The bug was
>> that the 'base' counter was being shifted by 20 bits at level 3, and
>> then again by 10 bits at level 2, rather than a total shift of 20 bits
>> in this argument to the recursive call:
>
> I haven’t read or grokked that code you are changing, so I can’t
> comment on the substance of your patch.  In case no one with such
> knowledge turns up, could you give a quick summary of what the
> existing code does and why?

This is not particular quick or clear, but here goes.

The marks are stored in a sparse array data structure.  The sparse array
is represented as a 1024-tree, with object storage only at the leafs,
and every path from root to leaf being the same length.  So, each node
can, and does, have the notion of a level, which is represented by a
number of bits to shift.

When you try to lookup at a non-leaf, the lookup index can be shifted
right the given number of bits, and masked to 10 bits [1], to get the
sub-tree to continue lookup in.  IOW, all the indices in a subtree have
a common prefix.

In dump_marks_helper, the 'base' argument contains that common prefix to
the current tree.  In the recursive case, the common prefix of the
sub-tree is computed by taking this 'base', and adding to it the
contribution from this node, i.e., k << shift [2].  The bug was that
'base' was being shifted too, even though it already had the correct
value from its caller.

- Hari

[1] The code actually does something different, but equivalent.  It
    masks off the top bits immediately before recursing.  However, it's
    easier to explain dump_marks_helper if we think of the index
    surviving to the leaf.

[2] Now that I think of it, the other fix, that I called more elegant,
    is harder to explain.  So, let's not go with that.

  reply	other threads:[~2010-07-14  6:48 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-07-13 11:51 [PATCH] fast-import: export correctly marks larger than 2^20-1 Raja R Harinath
2010-07-13 18:31 ` Jonathan Nieder
2010-07-14  6:48   ` Raja R Harinath [this message]
2010-08-10 14:20 ` Shawn Pearce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87iq4i8som.fsf@hariville.hurrynot.org \
    --to=harinath@hurrynot.org \
    --cc=david.barr@cordelta.com \
    --cc=git@vger.kernel.org \
    --cc=jrnieder@gmail.com \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.