All of lore.kernel.org
 help / color / mirror / Atom feed
* "repack" when a needed commit object missing "kills" the repo
@ 2009-02-09  2:26 Björn Steinbrink
  2009-02-09  9:13 ` repack might cause having when commit objects are missing Björn Steinbrink
  0 siblings, 1 reply; 4+ messages in thread
From: Björn Steinbrink @ 2009-02-09  2:26 UTC (permalink / raw)
  To: Junio C Hamano, Linus Torvalds; +Cc: git

Hi,

I have a git-svn repo with a large grafts file to recreate merges done
in svn and in one case to simply change a commit's parent. I don't
recall the reason for the latter, it's been there for almost a year now,
probably some crap from the svn repo that I didn't want to have in my
git history. So the original history was like:

A---B---C

And the graft turns that into:

A---C

Earlier repacks seem to have already pruned the B commit object, so
removing the grafts correctly causes git to complain about that missing
object.

Now I decided to finally run filter-branch on that repo to get rid of
the huge grafts file. So I ran "git filter-branch -- --all", removed the
grafts file, dropped all the refs/original/ stuff and went on to "git
repack -adf". And when that finished, a _lot_ of objects were suddenly
missing, including some of the rewritten commits --> repo busted.
Fortunately I had made a backup. :-)

The pack-objects process complained about the B commit object that was
already missing earlier, and after some time I realised that I didn't
expire the reflogs, so the "broken" commit was still reachable through
them. So I retried with the reflogs empty, and yup, that worked well. So
missing objects seem to confuse pack-objects enough to stop it from
packing all of objects reachable through non-broken chains.

A backtrace from the point at which pack-objects complains about the
missing object shows this:

#0  error (err=0x4bff35 "Could not read %s") at usage.c:64
#1  0x0000000000459e3c in parse_commit (item=0x15f8c50) at commit.c:304
#2  0x000000000048d2ea in add_parents_to_list (revs=0x7fff5f8bd7c0, 
    commit=0x15f8a10, list=0x7fff5f8bd7c0, cache_ptr=0x0) at revision.c:514
#3  0x000000000048d75d in get_revision_1 (revs=0x7fff5f8bd7c0)
    at revision.c:1740
#4  0x000000000048d7cb in get_revision_internal (revs=0x7fff5f8bd7c0)
    at revision.c:1843
#5  0x000000000048da31 in get_revision (revs=0x7fff5f8bd7c0) at revision.c:1924
#6  0x0000000000471f0c in traverse_commit_list (revs=0x7fff5f8bd7c0, 
    show_commit=0x437ec0 <show_commit>, show_object=0x437e80 <show_object>)
    at list-objects.c:147
#7  0x0000000000437fc8 in get_object_list (ac=<value optimized out>, 
    av=<value optimized out>) at builtin-pack-objects.c:2059
#8  0x000000000043aa4e in cmd_pack_objects (argc=6, argv=0x7fff5f8bde38, 
    prefix=<value optimized out>) at builtin-pack-objects.c:2288
#9  0x0000000000404983 in handle_internal_command (argc=6, argv=0x7fff5f8bde38)
    at git.c:244
#10 0x0000000000404eb6 in main (argc=6, argv=0x7fff5f8bde38) at git.c:473

Maybe get_revision_internal() switching into boundary mode is bad there?
The failing parse_commit() call causes get_revision_1() to return NULL,
so get_revision_internal() assumes it ran out of commits, but in this
case, it "just" hit a bad commit.

Up to now, I don't have a small reproduction case yet and unfortunately
I can't make the repo with which I've seen the failure available.

Thanks,
Björn

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: repack might cause having when commit objects are missing
  2009-02-09  2:26 "repack" when a needed commit object missing "kills" the repo Björn Steinbrink
@ 2009-02-09  9:13 ` Björn Steinbrink
  2009-02-09  9:21   ` Björn Steinbrink
  0 siblings, 1 reply; 4+ messages in thread
From: Björn Steinbrink @ 2009-02-09  9:13 UTC (permalink / raw)
  To: Junio C Hamano, Linus Torvalds; +Cc: git

[topic fixed to resemble something sane...]

On 2009.02.09 03:26:42 +0100, Björn Steinbrink wrote:
> Up to now, I don't have a small reproduction case yet and unfortunately
> I can't make the repo with which I've seen the failure available.

Not a self-contained reproduction test, but at least it shows the
problem with a public repo. Assuming you have no refs referencing
a0325eb^

In a clone of git.git add to .git/info/grafts:
a0325eb2e982e25376e4cb89ebcac5d8d703548d 604dd0a0782b3e33bcc397c27e811ebc019c9f5a

Then:
git reflog expire --expire=0 --all
git repack -ad
rm .git/info/grafts
git repack -ad

The second repack wrecks the repo:

doener@atjola:git2 (master) $ cp /home/doener/grafts .git/info/
doener@atjola:git2 (master) $ git reflog expire --expire=0 --all
doener@atjola:git2 (master) $ git repack -ad
Counting objects: 92161, done.
Delta compression using 4 threads.
Compressing objects: 100% (23481/23481), done.
Writing objects: 100% (92161/92161), done.
Total 92161 (delta 67185), reused 92061 (delta 67092)
doener@atjola:git2 (master) $ rm .git/info/grafts
doener@atjola:git2 (master) $ git repack -ad
error: Could not read c0e2e12fa0babcc4ab28b95bc5ad4f86e139d6b4
Counting objects: 3195, done.
Delta compression using 4 threads.
Compressing objects: 100% (2245/2245), done.
Writing objects: 100% (3195/3195), done.
Total 3195 (delta 1263), reused 2418 (delta 886)
error: refs/heads/rebase does not point to a valid object!
[dozens of more broken refs]

Note how in the second repack, the walking machniery finds only 3195
objects, as opposed to more than 92k before.

I've also tried with threaded delta search disabled (pack.threads=1),
no difference.

Björn

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: repack might cause having when commit objects are missing
  2009-02-09  9:13 ` repack might cause having when commit objects are missing Björn Steinbrink
@ 2009-02-09  9:21   ` Björn Steinbrink
  2009-02-11  9:27     ` [PATCH] revision traversal and pack: notice and die on missing commit Junio C Hamano
  0 siblings, 1 reply; 4+ messages in thread
From: Björn Steinbrink @ 2009-02-09  9:21 UTC (permalink / raw)
  To: Junio C Hamano, Linus Torvalds; +Cc: git

On 2009.02.09 10:13:03 +0100, Björn Steinbrink wrote:
> [topic fixed to resemble something sane...]

I'll just give up... s/having/havoc/ FWIW

Björn, too stupid to type

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH] revision traversal and pack: notice and die on missing commit
  2009-02-09  9:21   ` Björn Steinbrink
@ 2009-02-11  9:27     ` Junio C Hamano
  0 siblings, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2009-02-11  9:27 UTC (permalink / raw)
  To: Björn Steinbrink; +Cc: Linus Torvalds, git

cc0e6c5 (Handle return code of parse_commit in revision machinery,
2007-05-04) attempted to tighten error checking in the revision machinery,
but it wasn't enough.  When get_revision_1() was asked for the next commit
to return, it tries to read and simplify the parents of the commit to be
returned, but an error while doing so was silently ignored and reported as
a truncated history to the caller instead.

This resulted in an early end of "git log" output or a pack that lacks
older commits from "git pack-objects", without any error indication in the
exit status from these commands, even though the underlying parse_commit()
issues an error message to the end user.

Note that the codepath in add_parents_list() that paints parents of an
UNINTERESTING commit UNINTERESTING silently ignores the error when
parse_commit() fails; this is deliberate and in line with aeeae1b
(revision traversal: allow UNINTERESTING objects to be missing,
2009-01-27).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Dscho, do not complain that I did not use test_commit, because this is
   meant to apply against maint-1.5.4, which happens to be the oldest
   maintenance track I am still keeping around.  The breakage actually
   goes even older; cc0e6c5 is *not* the culprit, but merely an incomplete
   attempt to fix this issue.

   There is a fairly obvious comflict when this patch is applied to maint
   and master, but I'll be merging the commit upward after a bit more
   testing.

 revision.c                     |    6 ++++--
 t/t5307-pack-missing-commit.sh |   39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 2 deletions(-)
 create mode 100755 t/t5307-pack-missing-commit.sh

diff --git a/revision.c b/revision.c
index d79f2b3..b8c015e 100644
--- a/revision.c
+++ b/revision.c
@@ -1484,14 +1484,16 @@ static struct commit *get_revision_1(struct rev_info *revs)
 			    (commit->date < revs->max_age))
 				continue;
 			if (add_parents_to_list(revs, commit, &revs->commits) < 0)
-				return NULL;
+				die("Failed to traverse parents of commit %s",
+				    sha1_to_hex(commit->object.sha1));
 		}
 
 		switch (simplify_commit(revs, commit)) {
 		case commit_ignore:
 			continue;
 		case commit_error:
-			return NULL;
+			die("Failed to simplify parents of commit %s",
+			    sha1_to_hex(commit->object.sha1));
 		default:
 			return commit;
 		}
diff --git a/t/t5307-pack-missing-commit.sh b/t/t5307-pack-missing-commit.sh
new file mode 100755
index 0000000..ae52a18
--- /dev/null
+++ b/t/t5307-pack-missing-commit.sh
@@ -0,0 +1,39 @@
+#!/bin/sh
+
+test_description='pack should notice missing commit objects'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+	for i in 1 2 3 4 5
+	do
+		echo "$i" >"file$i" &&
+		git add "file$i" &&
+		test_tick &&
+		git commit -m "$i" &&
+		git tag "tag$i"
+	done &&
+	obj=$(git rev-parse --verify tag3) &&
+	fanout=$(expr "$obj" : "\(..\)") &&
+	remainder=$(expr "$obj" : "..\(.*\)") &&
+	rm -f ".git/objects/$fanout/$remainder"
+'
+
+test_expect_success 'check corruption' '
+	test_must_fail git fsck
+'
+
+test_expect_success 'rev-list notices corruption (1)' '
+	test_must_fail git rev-list HEAD
+'
+
+test_expect_success 'rev-list notices corruption (2)' '
+	test_must_fail git rev-list --objects HEAD
+'
+
+test_expect_success 'pack-objects notices corruption' '
+	echo HEAD |
+	test_must_fail git pack-objects --revs pack
+'
+
+test_done
-- 
1.6.2.rc0.50.g0a27

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-02-11  9:29 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-09  2:26 "repack" when a needed commit object missing "kills" the repo Björn Steinbrink
2009-02-09  9:13 ` repack might cause having when commit objects are missing Björn Steinbrink
2009-02-09  9:21   ` Björn Steinbrink
2009-02-11  9:27     ` [PATCH] revision traversal and pack: notice and die on missing commit Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.