All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta"
@ 2011-07-28  4:46 Dmitry Ivankov
  2011-07-28  4:46 ` [PATCH/WIP 1/7] fast-import: extract object preparation function Dmitry Ivankov
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-07-28  4:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

A very short summary. It was found [1] that fast-import sometimes can produce
broken packfiles (sha1 mismatch) or even wrong packfiles (data differs
from what was expected). This happens mostly on not so tiny svn-to-git or
even cvs-to-svn-to-git imports with all these copying across the tree
(simulating tags/branches as a directories in git, for example). But I won't
be surprised if this can happen without these operations too.

Technically, fast-import has in-memory tree representation where it stores
sha1's of some previous tree states (to make delta on them), but when it comes
to producing the delta, old sha1's tree content is fetched from the in-memory
node and it's children (not via sha1->object lookup). And these can turn out
to be unrelated to each other as some operations changes the children's states.

The most wanted bit for these patches is small testcases. Keeping in mind all
the in-memory tree state and fast-import logic is hard for me, so I wasn't able
to create small tests (the best is [2] - 15M archive + custom git builds + fix the
Makefile in [2] + a few minutes to reproduce).
Another good todo is to always avoid base sha1's mismatch (not just to avoid 
corruption if it is detected). I think I can do this, but I won't be sure in 
the code unless there is a bunch of good tests, this series is quite big already.

[1] http://thread.gmane.org/gmane.comp.version-control.git/176753
[2] http://thread.gmane.org/gmane.comp.version-control.git/176753/focus=177901

Dmitry Ivankov (7):
  fast-import: extract object preparation function
  fast-import: be saner with temporary trees
  fast-import: fix a data corruption in parse_ls
  fast-import: fix data corruption in store_tree
  fast-import: extract tree_content reading function
  fast-import: workaround data corruption
  fast-import: fix data corruption in load_tree

 fast-import.c |  169 +++++++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 135 insertions(+), 34 deletions(-)

-- 
1.7.3.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH/WIP 1/7] fast-import: extract object preparation function
  2011-07-28  4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
@ 2011-07-28  4:46 ` Dmitry Ivankov
  2011-07-28  4:46 ` [PATCH/WIP 2/7] fast-import: be saner with temporary trees Dmitry Ivankov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-07-28  4:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

We're constructing raw objects and compute their sha1's in fast-import
just before saving them.

Extract header and sha1 computations so that we can get sha1 without
actually saving the object.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   32 +++++++++++++++++++++++++-------
 1 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 9e8d186..05cc55e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1005,6 +1005,30 @@ static void cycle_packfile(void)
 	start_packfile();
 }
 
+static void prepare_object_hash(
+	enum object_type type,
+	struct strbuf *dat,
+	unsigned char *hdr_out,
+	unsigned long *hdrlen_out,
+	unsigned char *sha1_out
+)
+{
+	unsigned char hdr_[96];
+	unsigned char *hdr = hdr_out ? hdr_out : hdr_;
+	unsigned long hdrlen;
+	git_SHA_CTX c;
+
+	hdrlen = sprintf((char *)hdr,"%s %lu", typename(type),
+		(unsigned long)dat->len) + 1;
+	git_SHA1_Init(&c);
+	git_SHA1_Update(&c, hdr, hdrlen);
+	git_SHA1_Update(&c, dat->buf, dat->len);
+	git_SHA1_Final(sha1_out, &c);
+
+	if (hdrlen_out)
+		*hdrlen_out = hdrlen;
+}
+
 static int store_object(
 	enum object_type type,
 	struct strbuf *dat,
@@ -1017,15 +1041,9 @@ static int store_object(
 	unsigned char hdr[96];
 	unsigned char sha1[20];
 	unsigned long hdrlen, deltalen;
-	git_SHA_CTX c;
 	git_zstream s;
 
-	hdrlen = sprintf((char *)hdr,"%s %lu", typename(type),
-		(unsigned long)dat->len) + 1;
-	git_SHA1_Init(&c);
-	git_SHA1_Update(&c, hdr, hdrlen);
-	git_SHA1_Update(&c, dat->buf, dat->len);
-	git_SHA1_Final(sha1, &c);
+	prepare_object_hash(type, dat, hdr, &hdrlen, sha1);
 	if (sha1out)
 		hashcpy(sha1out, sha1);
 
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH/WIP 2/7] fast-import: be saner with temporary trees
  2011-07-28  4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
  2011-07-28  4:46 ` [PATCH/WIP 1/7] fast-import: extract object preparation function Dmitry Ivankov
@ 2011-07-28  4:46 ` Dmitry Ivankov
  2011-07-28  7:27   ` Jonathan Nieder
  2011-07-28  4:46 ` [PATCH/WIP 3/7] fast-import: fix a data corruption in parse_ls Dmitry Ivankov
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Dmitry Ivankov @ 2011-07-28  4:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

new_tree_entry() doesn't zero or otherwise initialize the returned
entry, neither does release_tree_entry(). So it is quite possible
to get previously released data in a new entry.

parse_ls doesn't set entry->versions[0] fields, but it does call
store_tree(entry) which looks for this base sha1 and tries to do
delta compression with that random object.

Reset entry->versions[0] fields to make things more predictable
and to avoid surprises here.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 05cc55e..da9cb62 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2968,6 +2968,9 @@ static void parse_ls(struct branch *b)
 	} else {
 		struct object_entry *e = parse_treeish_dataref(&p);
 		root = new_tree_entry();
+		hashclr(root->versions[0].sha1);
+		root->versions[0].mode = 0;
+		root->versions[1].mode = S_IFDIR;
 		hashcpy(root->versions[1].sha1, e->idx.sha1);
 		load_tree(root);
 		if (*p++ != ' ')
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH/WIP 3/7] fast-import: fix a data corruption in parse_ls
  2011-07-28  4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
  2011-07-28  4:46 ` [PATCH/WIP 1/7] fast-import: extract object preparation function Dmitry Ivankov
  2011-07-28  4:46 ` [PATCH/WIP 2/7] fast-import: be saner with temporary trees Dmitry Ivankov
@ 2011-07-28  4:46 ` Dmitry Ivankov
  2011-07-28  7:34   ` Jonathan Nieder
  2011-07-28  4:46 ` [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree Dmitry Ivankov
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Dmitry Ivankov @ 2011-07-28  4:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

store_tree sets versions[0] = versions[1] unconditionally. This is fine
if it is run from the very root. But if it's run for a intermediate
node in parse_ls, node's parent versions[0] can become invalid as it
references it's children versions[0].

Move dropping old version to a function and don't drop old version in
parse_ls. Also this split will allow to perform a few more fixes for
store_tree itself.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   42 ++++++++++++++++++++++++++----------------
 1 files changed, 26 insertions(+), 16 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index da9cb62..d5915b8 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1439,12 +1439,36 @@ static void mktree(struct tree_content *t, int v, struct strbuf *b)
 	}
 }
 
-static void store_tree(struct tree_entry *root)
+static void drop_old(struct tree_entry *root)
 {
 	struct tree_content *t = root->tree;
 	unsigned int i, j, del;
+
+	hashcpy(root->versions[0].sha1, root->versions[1].sha1);
+	root->versions[0].mode = root->versions[1].mode;
+
+	if (!t)
+		return;
+
+	for (i = 0, j = 0, del = 0; i < t->entry_count; i++) {
+		struct tree_entry *e = t->entries[i];
+		if (e->versions[1].mode) {
+			drop_old(e);
+			t->entries[j++] = e;
+		} else {
+			release_tree_entry(e);
+			del++;
+		}
+	}
+	t->entry_count -= del;
+}
+
+static void store_tree(struct tree_entry *root)
+{
+	struct tree_content *t = root->tree;
 	struct last_object lo = { STRBUF_INIT, 0, 0, /* no_swap */ 1 };
 	struct object_entry *le;
+	unsigned int i;
 
 	if (!is_null_sha1(root->versions[1].sha1))
 		return;
@@ -1464,20 +1488,7 @@ static void store_tree(struct tree_entry *root)
 
 	mktree(t, 1, &new_tree);
 	store_object(OBJ_TREE, &new_tree, &lo, root->versions[1].sha1, 0);
-
 	t->delta_depth = lo.depth;
-	for (i = 0, j = 0, del = 0; i < t->entry_count; i++) {
-		struct tree_entry *e = t->entries[i];
-		if (e->versions[1].mode) {
-			e->versions[0].mode = e->versions[1].mode;
-			hashcpy(e->versions[0].sha1, e->versions[1].sha1);
-			t->entries[j++] = e;
-		} else {
-			release_tree_entry(e);
-			del++;
-		}
-	}
-	t->entry_count -= del;
 }
 
 static void tree_content_replace(
@@ -2640,8 +2651,7 @@ static void parse_new_commit(void)
 
 	/* build the tree and the commit */
 	store_tree(&b->branch_tree);
-	hashcpy(b->branch_tree.versions[0].sha1,
-		b->branch_tree.versions[1].sha1);
+	drop_old(&b->branch_tree);
 
 	strbuf_reset(&new_data);
 	strbuf_addf(&new_data, "tree %s\n",
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree
  2011-07-28  4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
                   ` (2 preceding siblings ...)
  2011-07-28  4:46 ` [PATCH/WIP 3/7] fast-import: fix a data corruption in parse_ls Dmitry Ivankov
@ 2011-07-28  4:46 ` Dmitry Ivankov
  2011-07-28  7:42   ` Jonathan Nieder
  2011-07-28  4:46 ` [PATCH/WIP 5/7] fast-import: extract tree_content reading function Dmitry Ivankov
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 13+ messages in thread
From: Dmitry Ivankov @ 2011-07-28  4:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

store_tree didn't check for S_ISDIR(versions[1].mode), but it did check
for tree != NULL.

It's possible that tree == NULL && S_ISDIR(versions[1].mode) in which
case we need to load_tree and then to store_tree, and not to drop that
entry. Calling load_tree requires S_ISDIR check to be present, that's
why it is added..

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index d5915b8..d917ea6 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1474,8 +1474,11 @@ static void store_tree(struct tree_entry *root)
 		return;
 
 	for (i = 0; i < t->entry_count; i++) {
-		if (t->entries[i]->tree)
-			store_tree(t->entries[i]);
+		if (!S_ISDIR(t->entries[i]->versions[1].mode))
+			continue;
+		if (!t->entries[i]->tree)
+			load_tree(t->entries[i]);
+		store_tree(t->entries[i]);
 	}
 
 	le = find_object(root->versions[0].sha1);
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH/WIP 5/7] fast-import: extract tree_content reading function
  2011-07-28  4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
                   ` (3 preceding siblings ...)
  2011-07-28  4:46 ` [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree Dmitry Ivankov
@ 2011-07-28  4:46 ` Dmitry Ivankov
  2011-07-28  4:46 ` [PATCH/WIP 6/7] fast-import: workaround data corruption Dmitry Ivankov
  2011-07-28  4:46 ` [PATCH/WIP 7/7] fast-import: fix data corruption in load_tree Dmitry Ivankov
  6 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-07-28  4:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

It will be useful to fetch tree contents by a sha1. First, we can check
our in-memory tree against it. Second, we may need to read both old and
new tree contents and merge them in load_tree.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   19 +++++++++++--------
 1 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index d917ea6..9f0d2fe 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1344,19 +1344,14 @@ static const char *get_mode(const char *str, uint16_t *modep)
 	return str;
 }
 
-static void load_tree(struct tree_entry *root)
+static void load_tree_content(struct tree_content **root, unsigned char *sha1)
 {
-	unsigned char *sha1 = root->versions[1].sha1;
 	struct object_entry *myoe;
-	struct tree_content *t;
+	struct tree_content *t = *root;
 	unsigned long size;
 	char *buf;
 	const char *c;
 
-	root->tree = t = new_tree_content(8);
-	if (is_null_sha1(sha1))
-		return;
-
 	myoe = find_object(sha1);
 	if (myoe && myoe->pack_id != MAX_PACK_ID) {
 		if (myoe->type != OBJ_TREE)
@@ -1377,7 +1372,7 @@ static void load_tree(struct tree_entry *root)
 		struct tree_entry *e = new_tree_entry();
 
 		if (t->entry_count == t->entry_capacity)
-			root->tree = t = grow_tree_content(t, t->entry_count);
+			*root = t = grow_tree_content(t, t->entry_count);
 		t->entries[t->entry_count++] = e;
 
 		e->tree = NULL;
@@ -1394,6 +1389,14 @@ static void load_tree(struct tree_entry *root)
 	free(buf);
 }
 
+static void load_tree(struct tree_entry *root)
+{
+	root->tree = t = new_tree_content(8);
+	if (is_null_sha1(sha1))
+		return;
+       load_tree_content(&root->tree, root->versions[1].sha1);
+}
+
 static int tecmp0 (const void *_a, const void *_b)
 {
 	struct tree_entry *a = *((struct tree_entry**)_a);
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH/WIP 6/7] fast-import: workaround data corruption
  2011-07-28  4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
                   ` (4 preceding siblings ...)
  2011-07-28  4:46 ` [PATCH/WIP 5/7] fast-import: extract tree_content reading function Dmitry Ivankov
@ 2011-07-28  4:46 ` Dmitry Ivankov
  2011-07-28  6:31   ` Jonathan Nieder
  2011-07-28  4:46 ` [PATCH/WIP 7/7] fast-import: fix data corruption in load_tree Dmitry Ivankov
  6 siblings, 1 reply; 13+ messages in thread
From: Dmitry Ivankov @ 2011-07-28  4:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

fast-import keeps track of some delta-base for tree objects. When it is
time to compute the delta, base object is constructed from in-memory
tree representation using children's delta bases sha1. But these can be
unrelated due to several bugs, and it leads to object with wrong sha1
being delta-written to the packfile.

We have the base sha1 and what we think it's data is. Verify sha1 and if
it doesn't match, report it to stderr and don't use delta for this tree.

We could also die() here when bugs are fixed. Or we can see if the data
we've got is from our pack file and so still try to use it as a base.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   14 +++++++++++++-
 1 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 9f0d2fe..14a2a63 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1469,7 +1469,8 @@ static void drop_old(struct tree_entry *root)
 static void store_tree(struct tree_entry *root)
 {
 	struct tree_content *t = root->tree;
-	struct last_object lo = { STRBUF_INIT, 0, 0, /* no_swap */ 1 };
+	struct strbuf empty = STRBUF_INIT;
+	struct last_object lo = { empty, 0, 0, /* no_swap */ 1 };
 	struct object_entry *le;
 	unsigned int i;
 
@@ -1486,10 +1487,21 @@ static void store_tree(struct tree_entry *root)
 
 	le = find_object(root->versions[0].sha1);
 	if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
+		unsigned char sh[20];
 		mktree(t, 0, &old_tree);
 		lo.data = old_tree;
 		lo.offset = le->idx.offset;
 		lo.depth = t->delta_depth;
+
+		prepare_object_hash(OBJ_TREE, &old_tree, NULL, NULL, sh);
+		if (hashcmp(sh, root->versions[0].sha1)) {
+			fprintf(stderr, "internal sha1 delta base mismatch,"
+					" won't use delta for that tree\n");
+			lo.data = empty;
+			lo.offset = 0;
+			lo.depth = 0;
+		}
+
 	}
 
 	mktree(t, 1, &new_tree);
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH/WIP 7/7] fast-import: fix data corruption in load_tree
  2011-07-28  4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
                   ` (5 preceding siblings ...)
  2011-07-28  4:46 ` [PATCH/WIP 6/7] fast-import: workaround data corruption Dmitry Ivankov
@ 2011-07-28  4:46 ` Dmitry Ivankov
  6 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-07-28  4:46 UTC (permalink / raw)
  To: git; +Cc: Jonathan Nieder, Shawn O. Pearce, David Barr, Dmitry Ivankov

load_tree could be used to load a tree having different base and
current sha1. For example it can happens after a parent tree was
set by sha1 (it's tree becomes NULL, versions[0].sha1 remain and
versions[1].sha1 change). But it doesn't look at versions[0].sha1
and just loads a new version resetting the base one to the new one.
This corrupts parent tree delta.

Try to detect that case. Load both base and new trees and merge them
together so that mktree is able to produce both base and new trees
correctly.

There still may be a delta data corruption. For example tree_content_set
with subtree != NULL can produce subtree entries bases and subtree's new
parent base mismatch. tree_content_set is used in file_modify_cr - copy
and move trees by names. And another place is notes writing thing that
does some trees magic too.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
---
 fast-import.c |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 60 insertions(+), 8 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 14a2a63..feccd14 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1389,14 +1389,6 @@ static void load_tree_content(struct tree_content **root, unsigned char *sha1)
 	free(buf);
 }
 
-static void load_tree(struct tree_entry *root)
-{
-	root->tree = t = new_tree_content(8);
-	if (is_null_sha1(sha1))
-		return;
-       load_tree_content(&root->tree, root->versions[1].sha1);
-}
-
 static int tecmp0 (const void *_a, const void *_b)
 {
 	struct tree_entry *a = *((struct tree_entry**)_a);
@@ -1442,6 +1434,66 @@ static void mktree(struct tree_content *t, int v, struct strbuf *b)
 	}
 }
 
+static void load_tree(struct tree_entry *root)
+{
+	struct tree_content *oldt;
+	size_t n, i, j;
+
+	root->tree = new_tree_content(8);
+	if (is_null_sha1(root->versions[1].sha1)) {
+		if (!S_ISDIR(root->versions[0].mode) || is_null_sha1(root->versions[0].sha1) || !hashcmp(root->versions[0].sha1, root->versions[1].sha1))
+			return;
+		// looks like it is currently unreachable, but let it be for a while
+		load_tree_content(&root->tree, root->versions[0].sha1);
+		for (i = 0; i < root->tree->entry_count; ++i) {
+			root->tree->entries[i]->versions[1].mode = 0;
+			hashclr(root->tree->entries[i]->versions[1].sha1);
+		}
+		return;
+	}
+
+	load_tree_content(&root->tree, root->versions[1].sha1);
+	if (!S_ISDIR(root->versions[0].mode) || is_null_sha1(root->versions[0].sha1) || !hashcmp(root->versions[0].sha1, root->versions[1].sha1))
+ 		return;
+
+	oldt = new_tree_content(8);
+	load_tree_content(&oldt, root->versions[0].sha1);
+
+	qsort(root->tree->entries, root->tree->entry_count, sizeof(root->tree->entries[0]), tecmp1);
+	qsort(oldt->entries, oldt->entry_count, sizeof(oldt->entries[0]), tecmp1);
+
+	n = root->tree->entry_count;
+	i = 0;
+	j = 0;
+	while (i < n || j < oldt->entry_count) {
+		int cmp = i == n ? 1 : j == oldt->entry_count ? -1 : tecmp1(root->tree->entries + i, oldt->entries + j);
+		if (cmp > 0) {
+			if (root->tree->entry_count == root->tree->entry_capacity)
+				root->tree = grow_tree_content(root->tree, root->tree->entry_count);
+			oldt->entries[j]->versions[1].mode = 0;
+			hashclr(oldt->entries[j]->versions[1].sha1);
+			root->tree->entries[root->tree->entry_count++] = oldt->entries[j];
+			oldt->entries[j] = NULL;
+			++j;
+		} else if (cmp < 0) {
+			root->tree->entries[i]->versions[0].mode = 0;
+			hashclr(root->tree->entries[i]->versions[0].sha1);
+			++i;
+		} else {
+			root->tree->entries[i]->versions[0].mode = oldt->entries[j]->versions[1].mode;
+			hashcpy(root->tree->entries[i]->versions[0].sha1, oldt->entries[j]->versions[1].sha1);
+			++i;
+			++j;
+		}
+	}
+	for (j = 0; j < oldt->entry_count; ++j)
+		if (oldt->entries[j]) {
+			release_tree_entry(oldt->entries[j]);
+			oldt->entries[j] = NULL;
+		}
+	release_tree_content(oldt);
+}
+
 static void drop_old(struct tree_entry *root)
 {
 	struct tree_content *t = root->tree;
-- 
1.7.3.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH/WIP 6/7] fast-import: workaround data corruption
  2011-07-28  4:46 ` [PATCH/WIP 6/7] fast-import: workaround data corruption Dmitry Ivankov
@ 2011-07-28  6:31   ` Jonathan Nieder
  0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Nieder @ 2011-07-28  6:31 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Hi!

Dmitry Ivankov wrote:

> fast-import keeps track of some delta-base for tree objects. When it is
> time to compute the delta, base object is constructed from in-memory
> tree representation using children's delta bases sha1. But these can be
> unrelated due to several bugs, and it leads to object with wrong sha1
> being delta-written to the packfile.

This seems like as good a starting point as any.  Small language
nitpicks:

 - "some delta-base" is somewhat vague

 - missing article (probably "the") before "base object" and
   "children's"

 - does "children's delta bases sha1" mean "the pre-computed object
   names and modes in versions[0].{sha1,mode} for each tree entry" or
   something else?

 - when you say "unrelated": unrelated to what?

> We have the base sha1 and what we think it's data is. Verify sha1 and if
> it doesn't match, report it to stderr and don't use delta for this tree.

At any rate, if I understand correctly, the idea is that when store_tree()
is being called, root->versions[0].sha1 does not match the tree object
implied by root->tree->entries[*]->{name,versions[0].{mode,sha1}}.  This
patch adds a quick check to notice when that happens.

Makes a lot of sense, especially as a demonstration of the problem.
store_object() returns early in many cases so this probably does make
the bug easier to find (good).

I wonder if it would make more sense to perform this sanity check in
store_object() so blobs could benefit, too.  How much does the check slow
fast-import down (if at all)?

> We could also die() here when bugs are fixed.

If the check is fast, sure, why not? :)  The only reason I can think
of is that computing a SHA-1 is an O(size of object) cost which it
would be nice to avoid if profiling shows it is noticeable.

> Or we can see if the data
> we've got is from our pack file and so still try to use it as a base.

Can you elaborate?  When would versions[0].sha1 not match the data
but the data still be in the pack file?

> --- a/fast-import.c
> +++ b/fast-import.c
[...]
> @@ -1486,10 +1487,21 @@ static void store_tree(struct tree_entry *root)
>  
>  	le = find_object(root->versions[0].sha1);
>  	if (S_ISDIR(root->versions[0].mode) && le && le->pack_id == pack_id) {
> +		unsigned char sh[20];

What does "sh" stand for?

>  		mktree(t, 0, &old_tree);
>  		lo.data = old_tree;
>  		lo.offset = le->idx.offset;
>  		lo.depth = t->delta_depth;
> +
> +		prepare_object_hash(OBJ_TREE, &old_tree, NULL, NULL, sh);
> +		if (hashcmp(sh, root->versions[0].sha1)) {
> +			fprintf(stderr, "internal sha1 delta base mismatch,"
> +					" won't use delta for that tree\n");
> +			lo.data = empty;

To avoid a memory leak and introducing an unnecessary variable:

			strbuf_reset(&lo.data);

Thanks, that was interesting.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH/WIP 2/7] fast-import: be saner with temporary trees
  2011-07-28  4:46 ` [PATCH/WIP 2/7] fast-import: be saner with temporary trees Dmitry Ivankov
@ 2011-07-28  7:27   ` Jonathan Nieder
  0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Nieder @ 2011-07-28  7:27 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Dmitry Ivankov wrote:

> new_tree_entry() doesn't zero or otherwise initialize the returned
> entry, neither does release_tree_entry(). So it is quite possible
> to get previously released data in a new entry.

Thanks.  Background for the confused: new_tree_entry /
release_tree_entry manage a stack of tree_entry structs to use as
temporaries.  Initializing them is the responsibility of the caller,
both after allocation with xmalloc() when existing temporaries are
exhausted and after used entries are pushed with release_tree_entry().

> parse_ls doesn't set entry->versions[0] fields, but it does call
> store_tree(entry) which looks for this base sha1 and tries to do
> delta compression with that random object.
>
> Reset entry->versions[0] fields to make things more predictable
> and to avoid surprises here.
>
> Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>

Good idea --- "root" is invalid at this point.  Looks like a mistake
introduced by my tweaks to v1.7.5-rc0~3^2~33 (fast-import: add 'ls'
command, 2010-12-02).

But isn't store_tree() only called on "leaf" which is completely
zero-initialized?

So I don't see why this change would have any effect.  For what it's
worth, even so, except for the commit message,

Acked-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH/WIP 3/7] fast-import: fix a data corruption in parse_ls
  2011-07-28  4:46 ` [PATCH/WIP 3/7] fast-import: fix a data corruption in parse_ls Dmitry Ivankov
@ 2011-07-28  7:34   ` Jonathan Nieder
  0 siblings, 0 replies; 13+ messages in thread
From: Jonathan Nieder @ 2011-07-28  7:34 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Dmitry Ivankov wrote:

> store_tree sets versions[0] = versions[1] unconditionally. This is fine
> if it is run from the very root.

True.

> But if it's run for a intermediate
> node in parse_ls, node's parent versions[0] can become invalid as it
> references it's children versions[0].

A puzzle: when would parse_ls() call store_tree() on a subdirectory?
The store_tree() call is preceded by

	tree_content_get(root, p, &leaf);

which makes a deep copy of "root" in leaf (which seems to be leaked
--- oops).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree
  2011-07-28  4:46 ` [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree Dmitry Ivankov
@ 2011-07-28  7:42   ` Jonathan Nieder
  2011-07-28  8:11     ` Dmitry Ivankov
  0 siblings, 1 reply; 13+ messages in thread
From: Jonathan Nieder @ 2011-07-28  7:42 UTC (permalink / raw)
  To: Dmitry Ivankov; +Cc: git, Shawn O. Pearce, David Barr

Dmitry Ivankov wrote:

> --- a/fast-import.c
> +++ b/fast-import.c
> @@ -1474,8 +1474,11 @@ static void store_tree(struct tree_entry *root)
>  		return;
>  
>  	for (i = 0; i < t->entry_count; i++) {
> -		if (t->entries[i]->tree)
> -			store_tree(t->entries[i]);
> +		if (!S_ISDIR(t->entries[i]->versions[1].mode))
> +			continue;
> +		if (!t->entries[i]->tree)
> +			load_tree(t->entries[i]);
> +		store_tree(t->entries[i]);

How can this load_tree call work if t->entries[i]->versions[1].sha1
is not already in the object database?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree
  2011-07-28  7:42   ` Jonathan Nieder
@ 2011-07-28  8:11     ` Dmitry Ivankov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Ivankov @ 2011-07-28  8:11 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Shawn O. Pearce, David Barr

On Thu, Jul 28, 2011 at 1:42 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:
> Dmitry Ivankov wrote:
>
>> --- a/fast-import.c
>> +++ b/fast-import.c
>> @@ -1474,8 +1474,11 @@ static void store_tree(struct tree_entry *root)
>>               return;
>>
>>       for (i = 0; i < t->entry_count; i++) {
>> -             if (t->entries[i]->tree)
>> -                     store_tree(t->entries[i]);
>> +             if (!S_ISDIR(t->entries[i]->versions[1].mode))
>> +                     continue;
>> +             if (!t->entries[i]->tree)
>> +                     load_tree(t->entries[i]);
>> +             store_tree(t->entries[i]);
>
> How can this load_tree call work if t->entries[i]->versions[1].sha1
> is not already in the object database?
Well, it can be in the database and at the same time
t->entries[i]->tree == NULL.
In my testcase the commit is like

ls :10386 trunk
M 040000 e1b2ea9d3634cb7914044425ffae91945c41ac6a branches/cygnus
D branches/cygnus/gcc/assert.h
D branches/cygnus/gcc/collect2.c
D branches/cygnus/gcc/cccp.c
D branches/cygnus/gcc/c-parse.gperf
...
only D branches/cygnus/gcc/..... lines
progress done #here we call it, so it's when we store whole tree, some
subdirectory of branches/cygnus

the load_tree really gets called (with whole series applied to
fast-import) and the load works (sha1 =
2e90a035614854f626bb35eeac517f41ac84da27)
Though I don't yet see if not loading it breaks anything in my test
(with the whole series).

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-07-28  8:11 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-28  4:46 [PATCH/WIP 0/7] was: long fast-import errors out "failed to apply delta" Dmitry Ivankov
2011-07-28  4:46 ` [PATCH/WIP 1/7] fast-import: extract object preparation function Dmitry Ivankov
2011-07-28  4:46 ` [PATCH/WIP 2/7] fast-import: be saner with temporary trees Dmitry Ivankov
2011-07-28  7:27   ` Jonathan Nieder
2011-07-28  4:46 ` [PATCH/WIP 3/7] fast-import: fix a data corruption in parse_ls Dmitry Ivankov
2011-07-28  7:34   ` Jonathan Nieder
2011-07-28  4:46 ` [PATCH/WIP 4/7] fast-import: fix data corruption in store_tree Dmitry Ivankov
2011-07-28  7:42   ` Jonathan Nieder
2011-07-28  8:11     ` Dmitry Ivankov
2011-07-28  4:46 ` [PATCH/WIP 5/7] fast-import: extract tree_content reading function Dmitry Ivankov
2011-07-28  4:46 ` [PATCH/WIP 6/7] fast-import: workaround data corruption Dmitry Ivankov
2011-07-28  6:31   ` Jonathan Nieder
2011-07-28  4:46 ` [PATCH/WIP 7/7] fast-import: fix data corruption in load_tree Dmitry Ivankov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.