All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] fix data corruption in fast-import
@ 2011-08-12 10:32 Dmitry Ivankov
  2011-08-12 10:32 ` [PATCH 1/3] fast-import: extract object preparation function Dmitry Ivankov
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Dmitry Ivankov @ 2011-08-12 10:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

Finally the bug reported first in [1] is solved and has a small testcase.
Preliminary attempts can be found in [2] for curious.
And the actual "3/3: fix" comes from [3].

Brief introduction. While testing huge imports produced by svn-fe I've found
a "failed to unpack delta" error in fast-import, which is actually caused by
"sha1 mismatch" error in a packfile, and this one is caused by a bug of producing
wrong deltas for tree objects in fast-import. 

Looks like that only 'M 040000 sha1_or_mark path' commands could trigger it. 
They were introduced in tags/v1.7.3-rc0~75^2
(30 Jun 2010  334fba65.. Teach fast-import to import subtrees named by tree id)
This series should resolve the bug for any copy/rename/set/delete trees scenario
anyway.

I've tested it on a gcc svn repository import - went fine, trees match the gcc
git mirror on github. One more test is ~700k commits from kde repository - fine
too.

1/3 just extracts a sha1 calculation function for 2/3
2/3 adds a die() for "corrupted" delta data and a testcase that triggers it
3/3 is the fix

[1] http://thread.gmane.org/gmane.comp.version-control.git/176753
[2] http://thread.gmane.org/gmane.comp.version-control.git/178007
[3] http://thread.gmane.org/gmane.comp.version-control.git/176753/focus=178053

Dmitry Ivankov (3):
  fast-import: extract object preparation function
  fast-import: add a check for tree delta base sha1
  fast-import: prevent producing bad delta

 fast-import.c          |   85 +++++++++++++++++++++++++++++++++++++++---------
 t/t9300-fast-import.sh |   38 +++++++++++++++++++++
 2 files changed, 107 insertions(+), 16 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] fast-import: extract object preparation function
  2011-08-12 10:32 [PATCH 0/3] fix data corruption in fast-import Dmitry Ivankov
@ 2011-08-12 10:32 ` Dmitry Ivankov
  2011-08-12 10:32 ` [PATCH 2/3] fast-import: add a check for tree delta base sha1 Dmitry Ivankov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Dmitry Ivankov @ 2011-08-12 10:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

We're constructing raw objects and compute their sha1's in fast-import
just before saving them.

Extract header and sha1 computations so that we can get sha1 without
actually saving the object.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   32 +++++++++++++++++++++++++-------
 1 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 7cc2262..d0f8580 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1006,6 +1006,30 @@ static void cycle_packfile(void)
 	start_packfile();
 }
 
+static void prepare_object_hash(
+	enum object_type type,
+	struct strbuf *dat,
+	unsigned char *hdr_out,
+	unsigned long *hdrlen_out,
+	unsigned char *sha1_out
+)
+{
+	unsigned char hdr_[96];
+	unsigned char *hdr = hdr_out ? hdr_out : hdr_;
+	unsigned long hdrlen;
+	git_SHA_CTX c;
+
+	hdrlen = sprintf((char *)hdr,"%s %lu", typename(type),
+		(unsigned long)dat->len) + 1;
+	git_SHA1_Init(&c);
+	git_SHA1_Update(&c, hdr, hdrlen);
+	git_SHA1_Update(&c, dat->buf, dat->len);
+	git_SHA1_Final(sha1_out, &c);
+
+	if (hdrlen_out)
+		*hdrlen_out = hdrlen;
+}
+
 static int store_object(
 	enum object_type type,
 	struct strbuf *dat,
@@ -1018,15 +1042,9 @@ static int store_object(
 	unsigned char hdr[96];
 	unsigned char sha1[20];
 	unsigned long hdrlen, deltalen;
-	git_SHA_CTX c;
 	git_zstream s;
 
-	hdrlen = sprintf((char *)hdr,"%s %lu", typename(type),
-		(unsigned long)dat->len) + 1;
-	git_SHA1_Init(&c);
-	git_SHA1_Update(&c, hdr, hdrlen);
-	git_SHA1_Update(&c, dat->buf, dat->len);
-	git_SHA1_Final(sha1, &c);
+	prepare_object_hash(type, dat, hdr, &hdrlen, sha1);
 	if (sha1out)
 		hashcpy(sha1out, sha1);
 
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] fast-import: add a check for tree delta base sha1
  2011-08-12 10:32 [PATCH 0/3] fix data corruption in fast-import Dmitry Ivankov
  2011-08-12 10:32 ` [PATCH 1/3] fast-import: extract object preparation function Dmitry Ivankov
@ 2011-08-12 10:32 ` Dmitry Ivankov
  2011-08-13 21:02   ` Jonathan Nieder
  2011-08-12 10:32 ` [PATCH 3/3] fast-import: prevent producing bad delta Dmitry Ivankov
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Dmitry Ivankov @ 2011-08-12 10:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

fast-import is able to write imported tree objects in delta format.
It holds a tree structure in memory where each tree entry may have
a delta base sha1 assigned. When delta base data is needed it is
reconstructed from this in-memory structure. Though sometimes the
delta base data doesn't match the delta base sha1 so wrong or even
corrupt pack is produced.

To create a small easily reproducible test, add an excessive check
for delta base sha1. It's not likely that computing sha1 for each
tree delta base costs us much.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c          |   20 +++++++++++++++-----
 t/t9300-fast-import.sh |   38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index d0f8580..8196d1b 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1455,12 +1455,22 @@ static void store_tree(struct tree_entry *root)
 			store_tree(t->entries[i]);
 	}
 
-	le = find_object(root->versions[0].sha1);
-	if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
+	if (!is_null_sha1(root->versions[0].sha1)
+					&& S_ISDIR(root->versions[0].mode)) {
+		unsigned char old_tree_sha1[20];
 		mktree(t, 0, &old_tree);
-		lo.data = old_tree;
-		lo.offset = le->idx.offset;
-		lo.depth = t->delta_depth;
+		prepare_object_hash(OBJ_TREE, &old_tree,
+						NULL, NULL, old_tree_sha1);
+
+		if (hashcmp(old_tree_sha1, root->versions[0].sha1))
+			die("internal tree delta base sha1 mismatch");
+
+		le = find_object(root->versions[0].sha1);
+		if (le && le->pack_id == pack_id) {
+			lo.data = old_tree;
+			lo.offset = le->idx.offset;
+			lo.depth = t->delta_depth;
+		}
 	}
 
 	mktree(t, 1, &new_tree);
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index f256475..c70e489 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -734,6 +734,44 @@ test_expect_success \
 	 git diff-tree --abbrev --raw L^ L >output &&
 	 test_cmp expect output'
 
+cat >input <<INPUT_END
+blob
+mark :1
+data <<EOF
+the data
+EOF
+
+commit refs/heads/L2
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+init L2
+COMMIT
+
+M 644 :1 a/b/c
+M 644 :1 a/b/d
+M 644 :1 a/e/f
+INPUT_END
+
+cat >input2 <<INPUT_END
+commit refs/heads/L2
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+data <<COMMIT
+update L2
+COMMIT
+from refs/heads/L2^0
+M 040000 @A g
+M 040000 @E g/b
+M 040000 @E g/b/h
+INPUT_END
+
+test_expect_failure \
+    'L: verify internal tree delta base' \
+	'git fast-import <input &&
+	A=$(git ls-tree L2 a | tr " " "\t" | cut -f 3) &&
+	E=$(git ls-tree L2 a/e | tr " " "\t" | cut -f 3) &&
+	cat input2 | sed -e "s/@A/$A/" -e "s/@E/$E/" >input &&
+	git fast-import <input'
+
 ###
 ### series M
 ###
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] fast-import: prevent producing bad delta
  2011-08-12 10:32 [PATCH 0/3] fix data corruption in fast-import Dmitry Ivankov
  2011-08-12 10:32 ` [PATCH 1/3] fast-import: extract object preparation function Dmitry Ivankov
  2011-08-12 10:32 ` [PATCH 2/3] fast-import: add a check for tree delta base sha1 Dmitry Ivankov
@ 2011-08-12 10:32 ` Dmitry Ivankov
  2011-08-14 18:32 ` [PATCH v2 0/2] fix data corruption in fast-import Dmitry Ivankov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Dmitry Ivankov @ 2011-08-12 10:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

To produce deltas for tree objects fast-import tracks two versions
of tree's entries - base and current one. Base version stands both
for a delta base of this tree, and for a entry inside a delta base
of a parent tree. So care should be taken to keep it in sync.

tree_content_set cuts away a whole subtree and replaces it with a
new one (or NULL for lazy load of a tree with known sha1). It
keeps a base sha1 for this subtree (needed for parent tree). And
here is the problem, 'subtree' tree root doesn't have the implied
base version entries.

Adjusting the subtree to include them would mean a deep rewrite of
subtree. Invalidating the subtree base version would mean recursive
invalidation of parents' base versions. So just mark this tree as
do-not-delta me. Abuse setuid bit for this purpose.

tree_content_replace is the same as tree_content_set except that is
is used to replace the root, so just clearing base sha1 here (instead
of setting the bit) is fine.

[di: log message]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c          |   33 +++++++++++++++++++++++++++++----
 t/t9300-fast-import.sh |    2 +-
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 8196d1b..d9049af 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -170,6 +170,11 @@ Format of STDIN stream:
 #define DEPTH_BITS 13
 #define MAX_DEPTH ((1<<DEPTH_BITS)-1)
 
+/*
+ * We abuse the setuid bit on directories to mean "do not delta".
+ */
+#define NO_DELTA S_ISUID
+
 struct object_entry {
 	struct pack_idx_entry idx;
 	struct object_entry *next;
@@ -1434,8 +1439,9 @@ static void mktree(struct tree_content *t, int v, struct strbuf *b)
 		struct tree_entry *e = t->entries[i];
 		if (!e->versions[v].mode)
 			continue;
-		strbuf_addf(b, "%o %s%c", (unsigned int)e->versions[v].mode,
-					e->name->str_dat, '\0');
+		strbuf_addf(b, "%o %s%c",
+			(unsigned int)(e->versions[v].mode & ~NO_DELTA),
+			e->name->str_dat, '\0');
 		strbuf_add(b, e->versions[v].sha1, 20);
 	}
 }
@@ -1445,7 +1451,7 @@ static void store_tree(struct tree_entry *root)
 	struct tree_content *t = root->tree;
 	unsigned int i, j, del;
 	struct last_object lo = { STRBUF_INIT, 0, 0, /* no_swap */ 1 };
-	struct object_entry *le;
+	struct object_entry *le = NULL;
 
 	if (!is_null_sha1(root->versions[1].sha1))
 		return;
@@ -1456,6 +1462,7 @@ static void store_tree(struct tree_entry *root)
 	}
 
 	if (!is_null_sha1(root->versions[0].sha1)
+					&& !(root->versions[0].mode & NO_DELTA)
 					&& S_ISDIR(root->versions[0].mode)) {
 		unsigned char old_tree_sha1[20];
 		mktree(t, 0, &old_tree);
@@ -1499,6 +1506,7 @@ static void tree_content_replace(
 {
 	if (!S_ISDIR(mode))
 		die("Root cannot be a non-directory");
+	hashclr(root->versions[0].sha1);
 	hashcpy(root->versions[1].sha1, sha1);
 	if (root->tree)
 		release_tree_content_recursive(root->tree);
@@ -1543,6 +1551,23 @@ static int tree_content_set(
 				if (e->tree)
 					release_tree_content_recursive(e->tree);
 				e->tree = subtree;
+
+				/*
+				 * We need to leave e->versions[0].sha1 alone
+				 * to avoid modifying the preimage tree used
+				 * when writing out the parent directory.
+				 * But after replacing the subdir with a
+				 * completely different one, it's not a good
+				 * delta base any more, and besides, we've
+				 * thrown away the tree entries needed to
+				 * make a delta against it.
+				 *
+				 * So let's just explicitly disable deltas
+				 * for the subtree.
+				 */
+				if (S_ISDIR(e->versions[0].mode))
+					e->versions[0].mode |= NO_DELTA;
+
 				hashclr(root->versions[1].sha1);
 				return 1;
 			}
@@ -2957,7 +2982,7 @@ static void print_ls(int mode, const unsigned char *sha1, const char *path)
 		/* mode SP type SP object_name TAB path LF */
 		strbuf_reset(&line);
 		strbuf_addf(&line, "%06o %s %s\t",
-				mode, type, sha1_to_hex(sha1));
+				mode & ~NO_DELTA, type, sha1_to_hex(sha1));
 		quote_c_style(path, &line, NULL, 0);
 		strbuf_addch(&line, '\n');
 	}
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index c70e489..50b22f0 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -764,7 +764,7 @@ M 040000 @E g/b
 M 040000 @E g/b/h
 INPUT_END
 
-test_expect_failure \
+test_expect_success \
     'L: verify internal tree delta base' \
 	'git fast-import <input &&
 	A=$(git ls-tree L2 a | tr " " "\t" | cut -f 3) &&
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] fast-import: add a check for tree delta base sha1
  2011-08-12 10:32 ` [PATCH 2/3] fast-import: add a check for tree delta base sha1 Dmitry Ivankov
@ 2011-08-13 21:02   ` Jonathan Nieder
  0 siblings, 0 replies; 14+ messages in thread
From: Jonathan Nieder @ 2011-08-13 21:02 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Hi,

Dmitry Ivankov wrote:

> fast-import is able to write imported tree objects in delta format.
> It holds a tree structure in memory where each tree entry may have
> a delta base sha1 assigned. When delta base data is needed it is
> reconstructed from this in-memory structure. Though sometimes the
> delta base data doesn't match the delta base sha1 so wrong or even
> corrupt pack is produced.

I'm having trouble parsing this; not sure why.  Some guesses:

 - dropping the word "imported" could help, since it is the
   content of trees that comes from the user, not the tree objects

 - it's not clear to me what the second sentence is saying.  Do you
   mean that git looks at the versions[0].sha1 fields of item in
   t->entries to construct an in-memory tree object to delta against,
   instead of finding the object named by versions[0].sha1, inflating
   it, and using it directly?

 - the third sentence seems to be describing a problem, but I'm not
   sure what the relationship is to this patch: will this patch fix
   that problem, or does it just add a test illustrating it?

> To create a small easily reproducible test, add an excessive check
> for delta base sha1. It's not likely that computing sha1 for each
> tree delta base costs us much.

Since the first word in fast-import is "fast", I would be much
happier with some measurements with a typical import (i.e., one that
doesn't use --cat-blob-fd) than the statement "It's not likely". :)

If this tweak will continue to be useful after the fix, perhaps it
could be made optional.  I haven't thought carefully about this,
though.

On to the patch.

[...]
> +++ b/fast-import.c
> @@ -1455,12 +1455,22 @@ static void store_tree(struct tree_entry *root)
>  			store_tree(t->entries[i]);
>  	}
>  
> +	if (!is_null_sha1(root->versions[0].sha1)
> +					&& S_ISDIR(root->versions[0].mode)) {
> +		unsigned char old_tree_sha1[20];
> +		mktree(t, 0, &old_tree);
> +		prepare_object_hash(OBJ_TREE, &old_tree,
> +						NULL, NULL, old_tree_sha1);
> +
> +		if (hashcmp(old_tree_sha1, root->versions[0].sha1))
> +			die("internal tree delta base sha1 mismatch");

This is the heart of the patch; it involves several changes.

 1. construct the base object tree whether the base object is in the
    current pack or not

 2. calculate its hash and compare to ->versions[0].sha1 as a sanity
    check.

For large trees, I fear it could be an important slowdown.

> +
> -		le = find_object(root->versions[0].sha1);
> -		if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
> -			mktree(t, 0, &old_tree);
> +		le = find_object(root->versions[0].sha1);
> +		if (le && le->pack_id == pack_id) {
>  			lo.data = old_tree;
>  			lo.offset = le->idx.offset;
>  			lo.depth = t->delta_depth;
>  		}
> +	}
[...]
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -734,6 +734,44 @@ test_expect_success \
[...]
> +cat >input2 <<INPUT_END
> +commit refs/heads/L2
> +committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
> +data <<COMMIT
> +update L2
> +COMMIT
> +from refs/heads/L2^0
> +M 040000 @A g
> +M 040000 @E g/b
> +M 040000 @E g/b/h
> +INPUT_END
> +
> +test_expect_failure \
> +    'L: verify internal tree delta base' \
> +	'git fast-import <input &&
> +	A=$(git ls-tree L2 a | tr " " "\t" | cut -f 3) &&
> +	E=$(git ls-tree L2 a/e | tr " " "\t" | cut -f 3) &&
> +	cat input2 | sed -e "s/@A/$A/" -e "s/@E/$E/" >input &&
> +	git fast-import <input'

The description ("L: verify internal tree delta base" here) should
describe something we want to work --- a facility or a statement ---
and should leave out words like "verify" unless it is a test of
verification facilities.

In this example, I guess it is testing something like "delta base is
not corrupted when replacing one directory by another"?  (That's a
random, wild guess and not meant as an example to be used verbatim.)

I suppose I would be happier if we can find a way to reproduce this
without modifying the behavior in such an invasive way.  Which should
be easier while thinking about the fix, so I'll move on to that.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 0/2] fix data corruption in fast-import
  2011-08-12 10:32 [PATCH 0/3] fix data corruption in fast-import Dmitry Ivankov
                   ` (2 preceding siblings ...)
  2011-08-12 10:32 ` [PATCH 3/3] fast-import: prevent producing bad delta Dmitry Ivankov
@ 2011-08-14 18:32 ` Dmitry Ivankov
  2011-08-14 18:32 ` [PATCH v2 1/2] fast-import: add a test for tree delta base corruption Dmitry Ivankov
  2011-08-14 18:32 ` [PATCH v2 2/2] fast-import: prevent producing bad delta Dmitry Ivankov
  5 siblings, 0 replies; 14+ messages in thread
From: Dmitry Ivankov @ 2011-08-14 18:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

It turns out the bug is older than "M 040000.." command.
Managed to reproduce with just "C .." command from tags/v1.5.3-rc2~6^2
b6f3481b.. Teach fast-import to recursively copy files/directories (Jul 15 2007)

And even better, there is no need to add a explicit check for sha1 mismatch (we
may still want to have this, but it can go separately).

Basically the test does:
Fill two distinct directories old/a, old/b
Commit them 
(necessary to make trees have sha1 computed and thus become potential delta bases)
C old new
C old/a new/b
M ... new/b/new_file

new/b is stored as a delta against old/a, but with delta base pointing to old/b.
And so ls-tree new/b fails, fsck fails both with "failed to apply delta".

Dmitry Ivankov (2):
  fast-import: add a test for tree delta base corruption
  fast-import: prevent producing bad delta

 fast-import.c          |   35 ++++++++++++++++++++++++++++++-----
 t/t9300-fast-import.sh |   41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+), 5 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] fast-import: add a test for tree delta base corruption
  2011-08-12 10:32 [PATCH 0/3] fix data corruption in fast-import Dmitry Ivankov
                   ` (3 preceding siblings ...)
  2011-08-14 18:32 ` [PATCH v2 0/2] fix data corruption in fast-import Dmitry Ivankov
@ 2011-08-14 18:32 ` Dmitry Ivankov
  2011-08-14 18:32 ` [PATCH v2 2/2] fast-import: prevent producing bad delta Dmitry Ivankov
  5 siblings, 0 replies; 14+ messages in thread
From: Dmitry Ivankov @ 2011-08-14 18:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

fast-import is able to write imported tree objects in delta format.
It holds a tree structure in memory where each tree entry may have
a delta base sha1 assigned. When delta base data is needed it is
reconstructed from this in-memory structure. Though sometimes the
delta base data doesn't match the delta base sha1 so wrong or even
corrupt pack is produced.

Add a small test that produces a corrupt pack. It uses just tree
copy and file modification commands aside from the very basic commit
and blob commands.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 t/t9300-fast-import.sh |   41 +++++++++++++++++++++++++++++++++++++++++
 1 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index f256475..e2b94b5 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -734,6 +734,47 @@ test_expect_success \
 	 git diff-tree --abbrev --raw L^ L >output &&
 	 test_cmp expect output'
 
+cat >input <<INPUT_END
+blob
+mark :1
+data <<EOF
+the data
+EOF
+
+commit refs/heads/L2
+committer C O Mitter <committer@example.com> 1112912473 -0700
+data <<COMMIT
+init L2
+COMMIT
+M 644 :1 a/b/c
+M 644 :1 a/b/d
+M 644 :1 a/e/f
+
+commit refs/heads/L2
+committer C O Mitter <committer@example.com> 1112912473 -0700
+data <<COMMIT
+update L2
+COMMIT
+C a g
+C a/e g/b
+M 644 :1 g/b/h
+INPUT_END
+
+cat <<EOF >expect
+g/b/f
+g/b/h
+EOF
+
+test_expect_failure \
+    'L: nested tree copy does not corrupt deltas' \
+	'git fast-import <input &&
+	git ls-tree L2 g/b/ >tmp &&
+	cat tmp | cut -f 2 >actual &&
+	test_cmp expect actual &&
+	git fsck `git rev-parse L2`'
+
+git update-ref -d refs/heads/L2
+
 ###
 ### series M
 ###
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] fast-import: prevent producing bad delta
  2011-08-12 10:32 [PATCH 0/3] fix data corruption in fast-import Dmitry Ivankov
                   ` (4 preceding siblings ...)
  2011-08-14 18:32 ` [PATCH v2 1/2] fast-import: add a test for tree delta base corruption Dmitry Ivankov
@ 2011-08-14 18:32 ` Dmitry Ivankov
  2011-08-20  1:09   ` [PATCH v3] fast-import: do not write bad delta for replaced subtrees Jonathan Nieder
  5 siblings, 1 reply; 14+ messages in thread
From: Dmitry Ivankov @ 2011-08-14 18:32 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

To produce deltas for tree objects fast-import tracks two versions
of tree's entries - base and current one. Base version stands both
for a delta base of this tree, and for a entry inside a delta base
of a parent tree. So care should be taken to keep it in sync.

tree_content_set cuts away a whole subtree and replaces it with a
new one (or NULL for lazy load of a tree with known sha1). It
keeps a base sha1 for this subtree (needed for parent tree). And
here is the problem, 'subtree' tree root doesn't have the implied
base version entries.

Adjusting the subtree to include them would mean a deep rewrite of
subtree. Invalidating the subtree base version would mean recursive
invalidation of parents' base versions. So just mark this tree as
do-not-delta me. Abuse setuid bit for this purpose.

tree_content_replace is the same as tree_content_set except that is
is used to replace the root, so just clearing base sha1 here (instead
of setting the bit) is fine.

[di: log message]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c          |   35 ++++++++++++++++++++++++++++++-----
 t/t9300-fast-import.sh |    2 +-
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 7cc2262..0be7629 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -170,6 +170,11 @@ Format of STDIN stream:
 #define DEPTH_BITS 13
 #define MAX_DEPTH ((1<<DEPTH_BITS)-1)
 
+/*
+ * We abuse the setuid bit on directories to mean "do not delta".
+ */
+#define NO_DELTA S_ISUID
+
 struct object_entry {
 	struct pack_idx_entry idx;
 	struct object_entry *next;
@@ -1416,8 +1421,9 @@ static void mktree(struct tree_content *t, int v, struct strbuf *b)
 		struct tree_entry *e = t->entries[i];
 		if (!e->versions[v].mode)
 			continue;
-		strbuf_addf(b, "%o %s%c", (unsigned int)e->versions[v].mode,
-					e->name->str_dat, '\0');
+		strbuf_addf(b, "%o %s%c",
+			(unsigned int)(e->versions[v].mode & ~NO_DELTA),
+			e->name->str_dat, '\0');
 		strbuf_add(b, e->versions[v].sha1, 20);
 	}
 }
@@ -1427,7 +1433,7 @@ static void store_tree(struct tree_entry *root)
 	struct tree_content *t = root->tree;
 	unsigned int i, j, del;
 	struct last_object lo = { STRBUF_INIT, 0, 0, /* no_swap */ 1 };
-	struct object_entry *le;
+	struct object_entry *le = NULL;
 
 	if (!is_null_sha1(root->versions[1].sha1))
 		return;
@@ -1437,7 +1443,8 @@ static void store_tree(struct tree_entry *root)
 			store_tree(t->entries[i]);
 	}
 
-	le = find_object(root->versions[0].sha1);
+	if (!(root->versions[0].mode & NO_DELTA))
+		le = find_object(root->versions[0].sha1);
 	if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
 		mktree(t, 0, &old_tree);
 		lo.data = old_tree;
@@ -1471,6 +1478,7 @@ static void tree_content_replace(
 {
 	if (!S_ISDIR(mode))
 		die("Root cannot be a non-directory");
+	hashclr(root->versions[0].sha1);
 	hashcpy(root->versions[1].sha1, sha1);
 	if (root->tree)
 		release_tree_content_recursive(root->tree);
@@ -1515,6 +1523,23 @@ static int tree_content_set(
 				if (e->tree)
 					release_tree_content_recursive(e->tree);
 				e->tree = subtree;
+
+				/*
+				 * We need to leave e->versions[0].sha1 alone
+				 * to avoid modifying the preimage tree used
+				 * when writing out the parent directory.
+				 * But after replacing the subdir with a
+				 * completely different one, it's not a good
+				 * delta base any more, and besides, we've
+				 * thrown away the tree entries needed to
+				 * make a delta against it.
+				 *
+				 * So let's just explicitly disable deltas
+				 * for the subtree.
+				 */
+				if (S_ISDIR(e->versions[0].mode))
+					e->versions[0].mode |= NO_DELTA;
+
 				hashclr(root->versions[1].sha1);
 				return 1;
 			}
@@ -2929,7 +2954,7 @@ static void print_ls(int mode, const unsigned char *sha1, const char *path)
 		/* mode SP type SP object_name TAB path LF */
 		strbuf_reset(&line);
 		strbuf_addf(&line, "%06o %s %s\t",
-				mode, type, sha1_to_hex(sha1));
+				mode & ~NO_DELTA, type, sha1_to_hex(sha1));
 		quote_c_style(path, &line, NULL, 0);
 		strbuf_addch(&line, '\n');
 	}
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index e2b94b5..106e3f3 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -765,7 +765,7 @@ g/b/f
 g/b/h
 EOF
 
-test_expect_failure \
+test_expect_success \
     'L: nested tree copy does not corrupt deltas' \
 	'git fast-import <input &&
 	git ls-tree L2 g/b/ >tmp &&
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3] fast-import: do not write bad delta for replaced subtrees
  2011-08-14 18:32 ` [PATCH v2 2/2] fast-import: prevent producing bad delta Dmitry Ivankov
@ 2011-08-20  1:09   ` Jonathan Nieder
  2011-08-20  9:08     ` Andreas Schwab
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2011-08-20  1:09 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Dmitry Ivankov wrote:

> To produce deltas for tree objects fast-import tracks two versions
> of tree's entries - base and current one. Base version stands both
> for a delta base of this tree, and for a entry inside a delta base
> of a parent tree. So care should be taken to keep it in sync.

Thanks again for this.  Abusing (S_ISDIR | S_ISUID) still leaves a bad
taste in my mouth, but after your description I'm convinced that
behavior-wise it's the right thing to do.

I'm thinking of queueing the following to svn-fe-maint.  If there's
something wrong with it, I'd be happy to hear that now; otherwise,
we can put fixes on top.  In other words, "please speak now or forever
hold your peace".  Changes since v2:

 - clarify description

 - some tiny style nitpicks in the code and test

 - made a pass through checking uses of "mode" to make sure we are
   scrubbing out the S_ISUID bit before passing it to code that is not
   aware of that bit.

   The result is a few assert() calls to document those cases that
   required a second's thought and some extra scrubbing in tecmp0()
   when passing the mode to base_name_compare.  Although the latter is
   not needed and base_name_compare() only pays attention to the
   S_IFDIR bit, it seems best to stick to the expected interface and
   pass a real mode.

 - take care of a missed case where NO_DELTA was not being set:

	create branch "basis":

		M 100644 inline dir/hello.c
		data <<EOF
		hello
		EOF
		C dir/hello.c unrelated

	on branch "master":

		from refs/heads/basis
		D dir

	corrupt the subtree

		C unrelated dir/nothello.c

The patch has way too few tests.  Oh, well.

Bugs?  Improvements?

-- >8 --
Subject: fast-import: do not write bad delta for replaced subtrees

To produce deltas for tree objects, fast-import tracks two versions of
each tree entry - a base and the current version. The base version on
a tree stands both for a delta base of this tree, and for a entry
inside the delta base of the parent tree. So care needs to be taken to
keep them consistent.

Unfortunately this all gets forgotten when replacing one subtree by
another using tree_content_set.  When writing an entry representing
the new subtree, it keeps the old base sha1, since it is needed by the
parent tree.  But the new tree doesn't have the implied base version
entries, and when it is time to write it to pack, git writes an
invalid delta that is declared to have one base (the old tree name)
but actually has another one (the new tree for an "M" command, or the
tree's old base for an "R" or "C" command).

How to fix it?  Modifying the new subtree's entries to match the
declared base would be expensive, since it requires reading the tree
corresponding to the declared base from the object db and recursively
rewriting children's base versions to match.  Invalidating the parent
trees' bases would involve recursively walking up the tree and
disables deltas for each tree it touches, meaning a larger pack.
Let's just mark the new tree as do-not-delta-me instead.  We abuse the
setuid bit in the base "mode" field for this purpose.

tree_content_replace is in a similar predicament to tree_content_set,
except that because it is only used to replace the root, just
invalidating the base sha1 there (instead of setting the no-delta bit)
is fine.

Initial hack by Jonathan, test and description by Dmitry.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 fast-import.c          |   55 ++++++++++++++++++++++++++++++++++++++++++-----
 t/t9300-fast-import.sh |   40 ++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 65d65bf8..95919b63 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -170,6 +170,11 @@ Format of STDIN stream:
 #define DEPTH_BITS 13
 #define MAX_DEPTH ((1<<DEPTH_BITS)-1)
 
+/*
+ * We abuse the setuid bit on directories to mean "do not delta".
+ */
+#define NO_DELTA S_ISUID
+
 struct object_entry {
 	struct pack_idx_entry idx;
 	struct object_entry *next;
@@ -1380,9 +1385,12 @@ static int tecmp0 (const void *_a, const void *_b)
 {
 	struct tree_entry *a = *((struct tree_entry**)_a);
 	struct tree_entry *b = *((struct tree_entry**)_b);
+
 	return base_name_compare(
-		a->name->str_dat, a->name->str_len, a->versions[0].mode,
-		b->name->str_dat, b->name->str_len, b->versions[0].mode);
+		a->name->str_dat, a->name->str_len,
+					a->versions[0].mode & ~NO_DELTA,
+		b->name->str_dat, b->name->str_len,
+					b->versions[0].mode & ~NO_DELTA);
 }
 
 static int tecmp1 (const void *_a, const void *_b)
@@ -1405,6 +1413,14 @@ static void mktree(struct tree_content *t, int v, struct strbuf *b)
 		qsort(t->entries,t->entry_count,sizeof(t->entries[0]),tecmp1);
 
 	for (i = 0; i < t->entry_count; i++) {
+		/*
+		 * A hypothetical mode == (0 | NO_DELTA) would mean
+		 * "this version does not exist, and please don't
+		 * make deltas against it when writing a tree object
+		 * based on it".  That is spelled as "mode == 0".
+		 */
+		assert(t->entries[i]->versions[v].mode != NO_DELTA);
+
 		if (t->entries[i]->versions[v].mode)
 			maxlen += t->entries[i]->name->str_len + 34;
 	}
@@ -1415,8 +1431,9 @@ static void mktree(struct tree_content *t, int v, struct strbuf *b)
 		struct tree_entry *e = t->entries[i];
 		if (!e->versions[v].mode)
 			continue;
-		strbuf_addf(b, "%o %s%c", (unsigned int)e->versions[v].mode,
-					e->name->str_dat, '\0');
+		strbuf_addf(b, "%o %s%c",
+			(unsigned int)(e->versions[v].mode & ~NO_DELTA),
+			e->name->str_dat, '\0');
 		strbuf_add(b, e->versions[v].sha1, 20);
 	}
 }
@@ -1436,7 +1453,10 @@ static void store_tree(struct tree_entry *root)
 			store_tree(t->entries[i]);
 	}
 
-	le = find_object(root->versions[0].sha1);
+	if (root->versions[0].mode & NO_DELTA)
+		le = NULL;
+	else
+		le = find_object(root->versions[0].sha1);
 	if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
 		mktree(t, 0, &old_tree);
 		lo.data = old_tree;
@@ -1470,6 +1490,7 @@ static void tree_content_replace(
 {
 	if (!S_ISDIR(mode))
 		die("Root cannot be a non-directory");
+	hashclr(root->versions[0].sha1);
 	hashcpy(root->versions[1].sha1, sha1);
 	if (root->tree)
 		release_tree_content_recursive(root->tree);
@@ -1514,11 +1535,30 @@ static int tree_content_set(
 				if (e->tree)
 					release_tree_content_recursive(e->tree);
 				e->tree = subtree;
+
+				/*
+				 * We need to leave e->versions[0].sha1 alone
+				 * to avoid modifying the preimage tree used
+				 * when writing out the parent directory.
+				 * But after replacing the subdir with a
+				 * completely different one, e->versions[0]
+				 * is not a good delta base any more, and
+				 * besides, we've thrown away the tree
+				 * entries needed to make a delta against it.
+				 *
+				 * Let's just disable deltas when the time
+				 * comes to write this subtree to pack.
+				 */
+				if (S_ISDIR(e->versions[0].mode))
+					e->versions[0].mode |= NO_DELTA;
+
 				hashclr(root->versions[1].sha1);
 				return 1;
 			}
 			if (!S_ISDIR(e->versions[1].mode)) {
 				e->tree = new_tree_content(8);
+				if (S_ISDIR(e->versions[0].mode))
+					e->versions[0].mode |= NO_DELTA;
 				e->versions[1].mode = S_IFDIR;
 			}
 			if (!e->tree)
@@ -2918,6 +2958,9 @@ static void print_ls(int mode, const unsigned char *sha1, const char *path)
 		S_ISDIR(mode) ? tree_type :
 		blob_type;
 
+	/* NO_DELTA is only used with tree objects. */
+	assert(mode != NO_DELTA);
+
 	if (!mode) {
 		/* missing SP path LF */
 		strbuf_reset(&line);
@@ -2928,7 +2971,7 @@ static void print_ls(int mode, const unsigned char *sha1, const char *path)
 		/* mode SP type SP object_name TAB path LF */
 		strbuf_reset(&line);
 		strbuf_addf(&line, "%06o %s %s\t",
-				mode, type, sha1_to_hex(sha1));
+				mode & ~NO_DELTA, type, sha1_to_hex(sha1));
 		quote_c_style(path, &line, NULL, 0);
 		strbuf_addch(&line, '\n');
 	}
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 6b1ba6c8..180d357b 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -734,6 +734,46 @@ test_expect_success \
 	 git diff-tree --abbrev --raw L^ L >output &&
 	 test_cmp expect output'
 
+cat >input <<INPUT_END
+blob
+mark :1
+data <<EOF
+the data
+EOF
+
+commit refs/heads/L2
+committer C O Mitter <committer@example.com> 1112912473 -0700
+data <<COMMIT
+init L2
+COMMIT
+M 644 :1 a/b/c
+M 644 :1 a/b/d
+M 644 :1 a/e/f
+
+commit refs/heads/L2
+committer C O Mitter <committer@example.com> 1112912473 -0700
+data <<COMMIT
+update L2
+COMMIT
+C a g
+C a/e g/b
+M 644 :1 g/b/h
+INPUT_END
+
+cat <<EOF >expect
+g/b/f
+g/b/h
+EOF
+
+test_expect_success \
+    'L: modifying a copied tree does not produce a corrupt pack' \
+	'test_when_finished "git update-ref -d refs/heads/L2" &&
+	git fast-import <input &&
+	git ls-tree L2 g/b/ >tmp &&
+	cut -f 2 <tmp >actual &&
+	test_cmp expect actual &&
+	git fsck L2'
+
 ###
 ### series M
 ###
-- 
1.7.6

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] fast-import: do not write bad delta for replaced subtrees
  2011-08-20  1:09   ` [PATCH v3] fast-import: do not write bad delta for replaced subtrees Jonathan Nieder
@ 2011-08-20  9:08     ` Andreas Schwab
  2011-08-20 15:43       ` Jonathan Nieder
  0 siblings, 1 reply; 14+ messages in thread
From: Andreas Schwab @ 2011-08-20  9:08 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: Dmitry Ivankov, git, Shawn O. Pearce, David Barr

Jonathan Nieder <jrnieder@gmail.com> writes:

> Thanks again for this.  Abusing (S_ISDIR | S_ISUID) still leaves a bad
> taste in my mouth, but after your description I'm convinced that
> behavior-wise it's the right thing to do.

$ git grep S_ISUID
compat/mingw.h:#define S_ISUID 0

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3] fast-import: do not write bad delta for replaced subtrees
  2011-08-20  9:08     ` Andreas Schwab
@ 2011-08-20 15:43       ` Jonathan Nieder
  2011-08-20 17:22         ` [PATCH v4] " Dmitry Ivankov
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2011-08-20 15:43 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Dmitry Ivankov, git, Shawn O. Pearce, David Barr

Andreas Schwab wrote:

> $ git grep S_ISUID
> compat/mingw.h:#define S_ISUID 0

Good catch, thanks.  Fix below --- despite appearances, this "mode"
field is about tree objects in the repository and there is no need to
match the current platform's file status conventions.

Another worrisome detail is that the patch changes the semantics of
tree_entry.versions[0].mode (by introducing the NO_DELTA bit) without
changing its name.  It would be safer to rename it to te_mode or
something.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
diff --git i/fast-import.c w/fast-import.c
index 95919b63..45e2128a 100644
--- i/fast-import.c
+++ w/fast-import.c
@@ -173,7 +173,7 @@ Format of STDIN stream:
 /*
  * We abuse the setuid bit on directories to mean "do not delta".
  */
-#define NO_DELTA S_ISUID
+#define NO_DELTA 04000
 
 struct object_entry {
 	struct pack_idx_entry idx;
-- 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v4] fast-import: do not write bad delta for replaced subtrees
  2011-08-20 15:43       ` Jonathan Nieder
@ 2011-08-20 17:22         ` Dmitry Ivankov
  2011-08-20 17:48           ` Jonathan Nieder
  0 siblings, 1 reply; 14+ messages in thread
From: Dmitry Ivankov @ 2011-08-20 17:22 UTC (permalink / raw)
  To: git
  Cc: Jonathan Nieder, Andreas Schwab, Shawn O. Pearce, David Barr,
	Dmitry Ivankov

How about adding a new bit field "no_delta" instead? The patch is
smaller this way. Also could 04000 theoretically be S_IFDIR on some
platform?

Changes:
- switch to a separate no_delta bit in tree_entry, don't assume that it
  is used only for trees. Could be useful for blobs too once their delta
  base logic changes (now it's just delta against blob that was stored
  just before the current one)
- when setting no_delta = 1 don't check for S_ISDIR(versions[0].mode),
  this is a redundant check and logic duplication. Who knows, maybe some
  day we'll want to delta a tree against blob. :)
- removed the asserts on mode, as now "mode" meaning isn't changed
- removed the duplicated signed-of-by :)

-- >8 --
Subject: fast-import: do not write bad delta for replaced subtrees

To produce deltas for tree objects, fast-import tracks two versions of
each tree entry - a base and the current version. The base version on
a tree stands both for a delta base of this tree, and for a entry
inside the delta base of the parent tree. So care needs to be taken to
keep them consistent.

Unfortunately this all gets forgotten when replacing one subtree by
another using tree_content_set.  When writing an entry representing
the new subtree, it keeps the old base sha1, since it is needed by the
parent tree.  But the new tree doesn't have the implied base version
entries, and when it is time to write it to pack, git writes an
invalid delta that is declared to have one base (the old tree name)
but actually has another one (the new tree for an "M" command, or the
tree's old base for an "R" or "C" command).

How to fix it?  Modifying the new subtree's entries to match the
declared base would be expensive, since it requires reading the tree
corresponding to the declared base from the object db and recursively
rewriting children's base versions to match.  Invalidating the parent
trees' bases would involve recursively walking up the tree and
disables deltas for each tree it touches, meaning a larger pack.
Let's just mark the new tree as do-not-delta-me instead. Add a new bit
for tree_entry named no_delta. It is set to 1 when subtree is replaced
and reset back to 0 when we set a new legal delta base, that is when
e->versions[0] is changed.

tree_content_replace is in a similar predicament to tree_content_set,
except that because it is only used to replace the root, just
invalidating the base sha1 there (instead of setting the no-delta bit)
is fine.

Initial hack by Jonathan, test and description by Dmitry.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c          |   27 ++++++++++++++++++++++++++-
 t/t9300-fast-import.sh |   40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 1 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 7cc2262..3bae498 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -221,6 +221,7 @@ struct tree_entry {
 		uint16_t mode;
 		unsigned char sha1[20];
 	} versions[2];
+	unsigned no_delta : 1;
 };
 
 struct tree_content {
@@ -1368,6 +1369,7 @@ static void load_tree(struct tree_entry *root)
 		if (!c)
 			die("Corrupt mode in %s", sha1_to_hex(sha1));
 		e->versions[0].mode = e->versions[1].mode;
+		e->no_delta = 0;
 		e->name = to_atom(c, strlen(c));
 		c += e->name->str_len + 1;
 		hashcpy(e->versions[0].sha1, (unsigned char *)c);
@@ -1437,7 +1439,10 @@ static void store_tree(struct tree_entry *root)
 			store_tree(t->entries[i]);
 	}
 
-	le = find_object(root->versions[0].sha1);
+	if (root->no_delta)
+		le = NULL;
+	else
+		le = find_object(root->versions[0].sha1);
 	if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
 		mktree(t, 0, &old_tree);
 		lo.data = old_tree;
@@ -1453,6 +1458,7 @@ static void store_tree(struct tree_entry *root)
 		struct tree_entry *e = t->entries[i];
 		if (e->versions[1].mode) {
 			e->versions[0].mode = e->versions[1].mode;
+			e->no_delta = 0;
 			hashcpy(e->versions[0].sha1, e->versions[1].sha1);
 			t->entries[j++] = e;
 		} else {
@@ -1471,6 +1477,7 @@ static void tree_content_replace(
 {
 	if (!S_ISDIR(mode))
 		die("Root cannot be a non-directory");
+	hashclr(root->versions[0].sha1);
 	hashcpy(root->versions[1].sha1, sha1);
 	if (root->tree)
 		release_tree_content_recursive(root->tree);
@@ -1515,11 +1522,28 @@ static int tree_content_set(
 				if (e->tree)
 					release_tree_content_recursive(e->tree);
 				e->tree = subtree;
+
+				/*
+				 * We need to leave e->versions[0].sha1 alone
+				 * to avoid modifying the preimage tree used
+				 * when writing out the parent directory.
+				 * But after replacing the subdir with a
+				 * completely different one, e->versions[0]
+				 * is not a good delta base any more, and
+				 * besides, we've thrown away the tree
+				 * entries needed to make a delta against it.
+				 *
+				 * Let's just disable deltas when the time
+				 * comes to write this subtree to pack.
+				 */
+				e->no_delta = 1;
+
 				hashclr(root->versions[1].sha1);
 				return 1;
 			}
 			if (!S_ISDIR(e->versions[1].mode)) {
 				e->tree = new_tree_content(8);
+				e->no_delta = 1;
 				e->versions[1].mode = S_IFDIR;
 			}
 			if (!e->tree)
@@ -1537,6 +1561,7 @@ static int tree_content_set(
 	e = new_tree_entry();
 	e->name = to_atom(p, n);
 	e->versions[0].mode = 0;
+	e->no_delta = 0;
 	hashclr(e->versions[0].sha1);
 	t->entries[t->entry_count++] = e;
 	if (slash1) {
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index f256475..04fa70d 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -734,6 +734,46 @@ test_expect_success \
 	 git diff-tree --abbrev --raw L^ L >output &&
 	 test_cmp expect output'
 
+cat >input <<INPUT_END
+blob
+mark :1
+data <<EOF
+the data
+EOF
+
+commit refs/heads/L2
+committer C O Mitter <committer@example.com> 1112912473 -0700
+data <<COMMIT
+init L2
+COMMIT
+M 644 :1 a/b/c
+M 644 :1 a/b/d
+M 644 :1 a/e/f
+
+commit refs/heads/L2
+committer C O Mitter <committer@example.com> 1112912473 -0700
+data <<COMMIT
+update L2
+COMMIT
+C a g
+C a/e g/b
+M 644 :1 g/b/h
+INPUT_END
+
+cat <<EOF >expect
+g/b/f
+g/b/h
+EOF
+
+test_expect_success \
+    'L: modifying a copied tree does not produce a corrupt pack' \
+	'test_when_finished "git update-ref -d refs/heads/L2" &&
+	git fast-import <input &&
+	git ls-tree L2 g/b/ >tmp &&
+	cut -f 2 <tmp >actual &&
+	test_cmp expect actual &&
+	git fsck L2'
+
 ###
 ### series M
 ###
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v4] fast-import: do not write bad delta for replaced subtrees
  2011-08-20 17:22         ` [PATCH v4] " Dmitry Ivankov
@ 2011-08-20 17:48           ` Jonathan Nieder
  2011-08-20 18:28             ` Dmitry Ivankov
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Nieder @ 2011-08-20 17:48 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Andreas Schwab, Shawn O. Pearce, David Barr

Dmitry Ivankov wrote:

> How about adding a new bit field "no_delta" instead?

Currently the layout of "struct tree_entry_ms" is:

	uint16_t mode;			two bytes
	unsigned char sha1[20]		20 bytes

which adds up to 22 bytes.  Here is "struct tree_entry":

	struct tree_entry *tree;		one machine word
	struct atom_str *name;			one machine word
	struct tree_entry_ms versions[2];	44 bytes

Although it only looks like it adds one byte per tree entry, in
practice I suspect your patch adds four.  Is that worth it?  (The
answer might be yes.  I'm not sure.)

> The patch is
> smaller this way. Also could 04000 theoretically be S_IFDIR on some
> platform?

No, these modes are part of the format of objects as written on disk
and over the wire, so when we meet a platform with S_IFDIR != 040000,
there will have to be bigger changes (to distinguish between the
platform's idea of file status and git's idea of modes).

> - switch to a separate no_delta bit in tree_entry

If it doesn't cost too much, this is a good idea.

> - when setting no_delta = 1 don't check for S_ISDIR(versions[0].mode),
>   this is a redundant check and logic duplication. Who knows, maybe some
>   day we'll want to delta a tree against blob. :)

Why?  When versions[0] is not a tree, the hack is not needed, since
versions[0].mode and versions[0].sha1 accurately describe the delta
base and are not inconsistent with anything.

Thanks, that was helpful.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v4] fast-import: do not write bad delta for replaced subtrees
  2011-08-20 17:48           ` Jonathan Nieder
@ 2011-08-20 18:28             ` Dmitry Ivankov
  0 siblings, 0 replies; 14+ messages in thread
From: Dmitry Ivankov @ 2011-08-20 18:28 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Andreas Schwab, Shawn O. Pearce, David Barr

On Sat, Aug 20, 2011 at 11:48 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Dmitry Ivankov wrote:
>
>> How about adding a new bit field "no_delta" instead?
>
> Currently the layout of "struct tree_entry_ms" is:
>
>        uint16_t mode;                  two bytes
>        unsigned char sha1[20]          20 bytes
>
> which adds up to 22 bytes.  Here is "struct tree_entry":
>
>        struct tree_entry *tree;                one machine word
>        struct atom_str *name;                  one machine word
>        struct tree_entry_ms versions[2];       44 bytes
>
> Although it only looks like it adds one byte per tree entry, in
> practice I suspect your patch adds four.  Is that worth it?  (The
> answer might be yes.  I'm not sure.)
Hm, we can make mode 1 byte in fast-import (only 8 values are used).
But not in this patch of course.

>
>> The patch is
>> smaller this way. Also could 04000 theoretically be S_IFDIR on some
>> platform?
>
> No, these modes are part of the format of objects as written on disk
> and over the wire, so when we meet a platform with S_IFDIR != 040000,
> there will have to be bigger changes (to distinguish between the
> platform's idea of file status and git's idea of modes).
>
>> - switch to a separate no_delta bit in tree_entry
>
> If it doesn't cost too much, this is a good idea.
>
>> - when setting no_delta = 1 don't check for S_ISDIR(versions[0].mode),
>>   this is a redundant check and logic duplication. Who knows, maybe some
>>   day we'll want to delta a tree against blob. :)
>
> Why?  When versions[0] is not a tree, the hack is not needed, since
> versions[0].mode and versions[0].sha1 accurately describe the delta
> base and are not inconsistent with anything.
Oh, right you are. The new check and one in store_tree look the same
but the purpose differs.

>
> Thanks, that was helpful.
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2011-08-20 18:28 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-12 10:32 [PATCH 0/3] fix data corruption in fast-import Dmitry Ivankov
2011-08-12 10:32 ` [PATCH 1/3] fast-import: extract object preparation function Dmitry Ivankov
2011-08-12 10:32 ` [PATCH 2/3] fast-import: add a check for tree delta base sha1 Dmitry Ivankov
2011-08-13 21:02   ` Jonathan Nieder
2011-08-12 10:32 ` [PATCH 3/3] fast-import: prevent producing bad delta Dmitry Ivankov
2011-08-14 18:32 ` [PATCH v2 0/2] fix data corruption in fast-import Dmitry Ivankov
2011-08-14 18:32 ` [PATCH v2 1/2] fast-import: add a test for tree delta base corruption Dmitry Ivankov
2011-08-14 18:32 ` [PATCH v2 2/2] fast-import: prevent producing bad delta Dmitry Ivankov
2011-08-20  1:09   ` [PATCH v3] fast-import: do not write bad delta for replaced subtrees Jonathan Nieder
2011-08-20  9:08     ` Andreas Schwab
2011-08-20 15:43       ` Jonathan Nieder
2011-08-20 17:22         ` [PATCH v4] " Dmitry Ivankov
2011-08-20 17:48           ` Jonathan Nieder
2011-08-20 18:28             ` Dmitry Ivankov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.