All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fast-import: export correctly marks larger than 2^20-1
@ 2010-07-13 11:51 Raja R Harinath
  2010-07-13 18:31 ` Jonathan Nieder
  2010-08-10 14:20 ` Shawn Pearce
  0 siblings, 2 replies; 4+ messages in thread
From: Raja R Harinath @ 2010-07-13 11:51 UTC (permalink / raw)
  To: git; +Cc: Raja R Harinath, Shawn O. Pearce

dump_marks_helper() has a bug when dumping marks larger than 2^20-1,
i.e., when the sparse array has more than two levels.  The bug was
that the 'base' counter was being shifted by 20 bits at level 3, and
then again by 10 bits at level 2, rather than a total shift of 20 bits
in this argument to the recursive call:

  (base + k) << m->shift

There are two ways to fix this correctly, the elegant:

  (base + k) << 10

and the one I chose due to edit distance:

  base + (k << m->shift)

Cc: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Raja R Harinath <harinath@hurrynot.org>
---
 fast-import.c          |    2 +-
 t/t9300-fast-import.sh |   57 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 1 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 309f2c5..0c79289 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1666,7 +1666,7 @@ static void dump_marks_helper(FILE *f,
 	if (m->shift) {
 		for (k = 0; k < 1024; k++) {
 			if (m->data.sets[k])
-				dump_marks_helper(f, (base + k) << m->shift,
+				dump_marks_helper(f, base + (k << m->shift),
 					m->data.sets[k]);
 		}
 	} else {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 131f032..2aeed7b 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -166,6 +166,63 @@ test_expect_success \
 	 test `git rev-parse --verify master:file2` \
 	    = `git rev-parse --verify verify--import-marks:copy-of-file2`'
 
+test_tick
+mt=$(git hash-object --stdin < /dev/null)
+: >input.blob
+: >marks.exp
+: >tree.exp
+
+cat >input.commit <<EOF
+commit refs/heads/verify--dump-marks
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+test the sparse array dumping routines with exponentially growing marks
+COMMIT
+EOF
+
+i=0
+l=4
+m=6
+n=7
+while test "$i" -lt 27; do
+    cat >>input.blob <<EOF
+blob
+mark :$l
+data 0
+blob
+mark :$m
+data 0
+blob
+mark :$n
+data 0
+EOF
+    echo "M 100644 :$l l$i" >>input.commit
+    echo "M 100644 :$m m$i" >>input.commit
+    echo "M 100644 :$n n$i" >>input.commit
+
+    echo ":$l $mt" >>marks.exp
+    echo ":$m $mt" >>marks.exp
+    echo ":$n $mt" >>marks.exp
+
+    printf "100644 blob $mt\tl$i\n" >>tree.exp
+    printf "100644 blob $mt\tm$i\n" >>tree.exp
+    printf "100644 blob $mt\tn$i\n" >>tree.exp
+
+    l=$(($l + $l))
+    m=$(($m + $m))
+    n=$(($l + $n))
+
+    i=$((1 + $i))
+done
+
+sort tree.exp > tree.exp_s
+
+test_expect_success 'A: export marks with large values' '
+	cat input.blob input.commit | git fast-import --export-marks=marks.large &&
+	git ls-tree refs/heads/verify--dump-marks >tree.out &&
+	test_cmp tree.exp_s tree.out &&
+	test_cmp marks.exp marks.large'
+
 ###
 ### series B
 ###
-- 
1.7.2.rc2.11.g03e33

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] fast-import: export correctly marks larger than 2^20-1
  2010-07-13 11:51 [PATCH] fast-import: export correctly marks larger than 2^20-1 Raja R Harinath
@ 2010-07-13 18:31 ` Jonathan Nieder
  2010-07-14  6:48   ` Raja R Harinath
  2010-08-10 14:20 ` Shawn Pearce
  1 sibling, 1 reply; 4+ messages in thread
From: Jonathan Nieder @ 2010-07-13 18:31 UTC (permalink / raw)
  To: Raja R Harinath; +Cc: git, Shawn O. Pearce, David Barr

(+cc David, who I think mentioned wishing for something like this)

Raja R Harinath wrote:

> Subject: fast-import: export correctly marks larger than 2^20-1

Thank you!  That would be a very good thing.

> dump_marks_helper() has a bug when dumping marks larger than 2^20-1,
> i.e., when the sparse array has more than two levels.  The bug was
> that the 'base' counter was being shifted by 20 bits at level 3, and
> then again by 10 bits at level 2, rather than a total shift of 20 bits
> in this argument to the recursive call:

I haven’t read or grokked that code you are changing, so I can’t
comment on the substance of your patch.  In case no one with such
knowledge turns up, could you give a quick summary of what the
existing code does and why?

Barring that, v1.5.0-rc4~14^2~61 (Added option to export the marks
table when fast-import terminates., 2006-08-25) might be a good
starting point for a person looking to understand.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] fast-import: export correctly marks larger than 2^20-1
  2010-07-13 18:31 ` Jonathan Nieder
@ 2010-07-14  6:48   ` Raja R Harinath
  0 siblings, 0 replies; 4+ messages in thread
From: Raja R Harinath @ 2010-07-14  6:48 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Shawn O. Pearce, David Barr

Hi,

Jonathan Nieder <jrnieder@gmail.com> writes:

> (+cc David, who I think mentioned wishing for something like this)
>
> Raja R Harinath wrote:
>
>> Subject: fast-import: export correctly marks larger than 2^20-1
>
> Thank you!  That would be a very good thing.

I needed this so that I could maintain two 'marks' counters, one for
commits counting up from 1, and one for files counting down from some
large value, where the file marks could be re-used.

  http://gitorious.org/~harinath/svn2git/rrh-svn2git/commit/ffc5270a6fa106fecad1a6a9f1520ca8f075c6b7

However, the 1M limit on marks is a bit too tight.  For instance, the
mono SVN repository has 160K commits, with one project alone using 60K
commits.  And one of the commits touched nearly 24K files.  The number
of marks used is uncomfortably close: within one order of magnitude of
the limit.  I'd like 10 more bits of breathing space, please :-)

>> dump_marks_helper() has a bug when dumping marks larger than 2^20-1,
>> i.e., when the sparse array has more than two levels.  The bug was
>> that the 'base' counter was being shifted by 20 bits at level 3, and
>> then again by 10 bits at level 2, rather than a total shift of 20 bits
>> in this argument to the recursive call:
>
> I haven’t read or grokked that code you are changing, so I can’t
> comment on the substance of your patch.  In case no one with such
> knowledge turns up, could you give a quick summary of what the
> existing code does and why?

This is not particular quick or clear, but here goes.

The marks are stored in a sparse array data structure.  The sparse array
is represented as a 1024-tree, with object storage only at the leafs,
and every path from root to leaf being the same length.  So, each node
can, and does, have the notion of a level, which is represented by a
number of bits to shift.

When you try to lookup at a non-leaf, the lookup index can be shifted
right the given number of bits, and masked to 10 bits [1], to get the
sub-tree to continue lookup in.  IOW, all the indices in a subtree have
a common prefix.

In dump_marks_helper, the 'base' argument contains that common prefix to
the current tree.  In the recursive case, the common prefix of the
sub-tree is computed by taking this 'base', and adding to it the
contribution from this node, i.e., k << shift [2].  The bug was that
'base' was being shifted too, even though it already had the correct
value from its caller.

- Hari

[1] The code actually does something different, but equivalent.  It
    masks off the top bits immediately before recursing.  However, it's
    easier to explain dump_marks_helper if we think of the index
    surviving to the leaf.

[2] Now that I think of it, the other fix, that I called more elegant,
    is harder to explain.  So, let's not go with that.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] fast-import: export correctly marks larger than 2^20-1
  2010-07-13 11:51 [PATCH] fast-import: export correctly marks larger than 2^20-1 Raja R Harinath
  2010-07-13 18:31 ` Jonathan Nieder
@ 2010-08-10 14:20 ` Shawn Pearce
  1 sibling, 0 replies; 4+ messages in thread
From: Shawn Pearce @ 2010-08-10 14:20 UTC (permalink / raw)
  To: Raja R Harinath, Junio C Hamano; +Cc: git

On Tue, Jul 13, 2010 at 4:51 AM, Raja R Harinath <harinath@hurrynot.org> wrote:
> dump_marks_helper() has a bug when dumping marks larger than 2^20-1,
> i.e., when the sparse array has more than two levels.  The bug was
> that the 'base' counter was being shifted by 20 bits at level 3, and
> then again by 10 bits at level 2, rather than a total shift of 20 bits
> in this argument to the recursive call:
>
>  (base + k) << m->shift
>
> There are two ways to fix this correctly, the elegant:
>
>  (base + k) << 10
>
> and the one I chose due to edit distance:
>
>  base + (k << m->shift)
>
> Cc: Shawn O. Pearce <spearce@spearce.org>
> Signed-off-by: Raja R Harinath <harinath@hurrynot.org>

Dang, that's a very old bug.  This change makes sense, thanks.

Acked-by: Shawn O. Pearce <spearce@spearce.org>

-- 
Shawn.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2010-08-10 14:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-13 11:51 [PATCH] fast-import: export correctly marks larger than 2^20-1 Raja R Harinath
2010-07-13 18:31 ` Jonathan Nieder
2010-07-14  6:48   ` Raja R Harinath
2010-08-10 14:20 ` Shawn Pearce

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.