* Stack overflow at write_one()
@ 2011-11-19 20:27 Cesar Eduardo Barros
  2011-11-19 21:08 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Cesar Eduardo Barros @ 2011-11-19 20:27 UTC (permalink / raw)
  To: git

I have found a stack overflow in builtin/pack-objects.c:write_one(), 
where it calls itself endlessly. It is caused by an object_entry e 
whose e->delta->delta is e itself, but I have no idea how that happened.

First, the full story:

I used Google's repo tool to mirror AOSP to my machine. This mirrors 
several kernel trees (six, last time I counted) without sharing objects 
with one another. To save space, I decided to point their 
objects/info/alternates to my mirror of Linus's kernel tree (which 
should be safe, since Linus's tree only ever fast-forwards), and run "git 
gc" on them to create a smaller pack. This worked for all trees except 
one, where it core dumped (abrt report at 
https://bugzilla.redhat.com/show_bug.cgi?id=755132).

I compiled the latest git (v1.7.8-rc3-17-gf56ef11) to see if it still 
happened, and here is what I could get from gdb. I attached to the 
pack-objects process before it crashed (full command line "git 
pack-objects --keep-true-parents --honor-pack-keep --non-empty --all 
--reflog --unpack-unreachable --local --delta-base-offset 
/home/cesarb/src/bug755132/omap.git/objects/pack/.tmp-5171-pack"), 
continued, and let it crash:

(gdb) cont
Continuing.
[New Thread 0x7f3f2bad3700 (LWP 5205)]
[New Thread 0x7f3f2b2d2700 (LWP 5206)]
[New Thread 0x7f3f2aad1700 (LWP 5207)]
[New Thread 0x7f3f2a2d0700 (LWP 5208)]
[Thread 0x7f3f2b2d2700 (LWP 5206) exited]
[Thread 0x7f3f2bad3700 (LWP 5205) exited]
[Thread 0x7f3f2aad1700 (LWP 5207) exited]
[Thread 0x7f3f2a2d0700 (LWP 5208) exited]

Program received signal SIGSEGV, Segmentation fault.
0x00000000004472b9 in write_one (f=0x6a97db0, e=0x7f3f30233490,
     offset=0x7fff79b53908) at builtin/pack-objects.c:415
415	{

Unlike with Fedora's git binary, where the crash happened on a call 
instruction, this time it happened on a push instruction:

(gdb) disassemble
Dump of assembler code for function write_one:
    0x00000000004472b0 <+0>:	push   %r15
    0x00000000004472b2 <+2>:	push   %r14
    0x00000000004472b4 <+4>:	push   %r13
    0x00000000004472b6 <+6>:	mov    %rdx,%r13
=> 0x00000000004472b9 <+9>:	push   %r12
    0x00000000004472bb <+11>:	mov    %rdi,%r12

The last few frames on the stack show the endless recursion:

(gdb) where
#0  0x00000000004472b9 in write_one (f=0x6a97db0, e=0x7f3f30233490,
     offset=0x7fff79b53908) at builtin/pack-objects.c:415
#1  0x00000000004472ed in write_one (f=0x6a97db0, e=0x7f3f30277390,
     offset=0x7fff79b53908) at builtin/pack-objects.c:423
#2  0x00000000004472ed in write_one (f=0x6a97db0, e=0x7f3f30233490,
     offset=0x7fff79b53908) at builtin/pack-objects.c:423
#3  0x00000000004472ed in write_one (f=0x6a97db0, e=0x7f3f30277390,
     offset=0x7fff79b53908) at builtin/pack-objects.c:423
#4  0x00000000004472ed in write_one (f=0x6a97db0, e=0x7f3f30233490,
     offset=0x7fff79b53908) at builtin/pack-objects.c:423

And here is the loop in the data structures:

(gdb) p e
$1 = (struct object_entry *) 0x7f3f30233490
(gdb) p e->delta
$2 = (struct object_entry *) 0x7f3f30277390
(gdb) p e->delta->delta
$3 = (struct object_entry *) 0x7f3f30233490
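The loop above is a two-entry cycle in the delta pointers (e points at e->delta, which points back at e). As an illustration only, here is a minimal sketch of how such a cycle can be detected in constant memory with Floyd's tortoise-and-hare algorithm. It assumes a hypothetical, stripped-down struct entry that keeps just the delta pointer; it is not git's struct object_entry and not code from pack-objects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for git's struct object_entry:
 * only the delta-base pointer matters for this check. */
struct entry {
	struct entry *delta; /* base this entry is a delta against, or NULL */
};

/*
 * Floyd's tortoise-and-hare: "slow" follows one delta link per step,
 * "fast" follows two. If the chain ends in NULL, fast reaches it first
 * and there is no cycle; if the chain loops, fast eventually lands on
 * the same entry as slow. Returns 1 for a cycle, 0 otherwise.
 */
static int delta_chain_has_cycle(const struct entry *e)
{
	const struct entry *slow = e, *fast = e;

	assert(e != NULL); /* caller must pass a real entry */
	while (fast && fast->delta) {
		slow = slow->delta;
		fast = fast->delta->delta;
		if (slow == fast)
			return 1;
	}
	return 0;
}
```

For two entries whose delta fields point at each other, as in the gdb session, it returns 1; for a chain that ends in an entry with a NULL base it returns 0.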

Unfortunately, I do not know enough of git's internals to debug further. 
In case it helps, here are the contents of a few of these structures:

(gdb) p *e
$4 = {idx = {
     sha1 = "\257>J\241)\266\023\064\a\342J\320\375ӆ\262M\245", <incomplete sequence \356>, crc32 = 0, offset = 0}, size = 20, in_pack = 0x259b610,
   in_pack_offset = 231061238, delta = 0x7f3f30277390,
   delta_child = 0x7f3f30277390, delta_sibling = 0x7f3f30413b10,
   delta_data = 0x0, delta_size = 20, z_delta_size = 0, hash = 2099915708,
   type = OBJ_OFS_DELTA, in_pack_type = OBJ_OFS_DELTA,
   in_pack_header_size = 5 '\005', preferred_base = 0 '\000',
   no_try_delta = 0 '\000', tagged = 0 '\000', filled = 1 '\001'}
(gdb) p *(e->delta)
$5 = {idx = {
     sha1 = "\372\307\035\372\017\350\307\f\310R\t\236\006\034\063N*T\216\253",
     crc32 = 0, offset = 0}, size = 14, in_pack = 0x259b610,
   in_pack_offset = 39990, delta = 0x7f3f30233490,
   delta_child = 0x7f3f30233490, delta_sibling = 0x0, delta_data = 0x0,
   delta_size = 14, z_delta_size = 0, hash = 2099915708, type = OBJ_REF_DELTA,
   in_pack_type = OBJ_REF_DELTA, in_pack_header_size = 21 '\025',
   preferred_base = 0 '\000', no_try_delta = 0 '\000', tagged = 0 '\000',
   filled = 1 '\001'}
(gdb) p *(e->in_pack)
$6 = {next = 0x25a53c0, windows = 0x259bc40, pack_size = 449155894,
   index_data = 0x7f3f4f0a9000, index_size = 58351420, num_objects = 2083941,
   num_bad_objects = 0, bad_object_sha1 = 0x0, index_version = 2,
   mtime = 1321387261, pack_fd = -1, pack_local = 1, pack_keep = 0,
   do_not_close = 0, sha1 = "\371Q4\177.ȳv\364\246\332Z\234\025?\352ݠP\210",
   pack_name = 0x259b671 "/home/cesarb/src/bug755132/omap.git/objects/pack/pack-f951347f2ec8b376f4a6da5a9c153feadda05088.pack"}

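gdb prints the binary 20-byte sha1 fields above as C strings with octal escapes, which makes them hard to compare with the hex object names git reports. A minimal sketch of converting that rendering back to hex follows; gdb_unescape and to_hex are hypothetical helper names, not git or gdb functions. It assumes the input is only the text between the quotes and handles octal escapes, the common C letter escapes, and literal bytes (including multibyte characters gdb printed as-is); gdb annotations such as <incomplete sequence \356> have to be converted by hand.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: decode the body of a gdb-printed char array
 * (e.g. \372\307\035...) into raw bytes. Returns the number of bytes
 * written to out, or -1 on overflow or an unknown escape.
 */
static int gdb_unescape(const char *s, unsigned char *out, size_t cap)
{
	size_t n = 0;

	assert(s != NULL && out != NULL);
	while (*s) {
		unsigned char c = (unsigned char)*s++;

		if (c == '\\') {
			if (*s >= '0' && *s <= '7') {
				/* octal escape: up to three digits */
				unsigned v = 0;
				int digits = 0;

				while (digits < 3 && *s >= '0' && *s <= '7') {
					v = v * 8 + (unsigned)(*s++ - '0');
					digits++;
				}
				c = (unsigned char)v;
			} else {
				/* single-letter C escapes */
				switch (*s++) {
				case 'a': c = '\a'; break;
				case 'b': c = '\b'; break;
				case 'f': c = '\f'; break;
				case 'n': c = '\n'; break;
				case 'r': c = '\r'; break;
				case 't': c = '\t'; break;
				case 'v': c = '\v'; break;
				case '\\': c = '\\'; break;
				case '"': c = '"'; break;
				default: return -1;
				}
			}
		}
		if (n == cap)
			return -1;
		out[n++] = c; /* literal bytes, including UTF-8, pass through */
	}
	return (int)n;
}

/* Format len bytes as lowercase hex; hex must hold 2 * len + 1 chars. */
static void to_hex(const unsigned char *buf, size_t len, char *hex)
{
	static const char digits[] = "0123456789abcdef";

	for (size_t i = 0; i < len; i++) {
		hex[2 * i] = digits[buf[i] >> 4];
		hex[2 * i + 1] = digits[buf[i] & 0xf];
	}
	hex[2 * len] = '\0';
}
```

Given the sha1 string from $5 (the entry e->delta), this should yield fac71dfa0fe8c70cc852099e061c334e2a548eab, the first object named in the "recursive delta detected" warnings later in the thread.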
I tried using "git fsck" to see if it could find anything strange, but 
it seems to get stuck (using 100% CPU) after these lines:

[...]
Checking commit fb630b9fc902e24209166b1659a8b375bf38099c
Checking tree fc32c012c750084eb1d82782cee7c80a45a78289
Checking blob fc7bbba585cee2c2b0d5282c42fb986bfb032a0a
Checking commit fdcb23634c9b6649bb02c681033d4973491b0e35
Checking tree fe773cf73ff553249be2f24ddf770f5dc43a41f1
Checking blob fe67b5c79f0ff33d92ebe7469a89c5a5d044fc0a
Checking blob fe73276e026bf263f494a917c84c6a3fcaeaaeda
Checking tree fe30eda9d92d074816f9c3a47fd3ffb9b89ca835
Checking tree fe9c75396e6d433b289d0e40c7e47921b91cad3a
Checking blob ff3ed6086ce1c6b6b4b5111c034d14a208c0d045
Checking blob ff66638ff54d5ad7067e4f246d392059eef1a7bf
Checking tree ff126d2bc67017199049ddba761979f3bda57eb9

Unfortunately, the reproducer I have (a copy of both trees with 
objects/info/alternates modified) is 1.8G in size, and I do not know how 
to create a smaller reproducer. If you know of a command which would get 
more relevant information from them, just ask; I plan on keeping them 
around for a while.

-- 
Cesar Eduardo Barros
cesarb@cesarb.net
cesar.barros@gmail.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Stack overflow at write_one()
  2011-11-19 20:27 Stack overflow at write_one() Cesar Eduardo Barros
@ 2011-11-19 21:08 ` Junio C Hamano
  2011-11-19 21:46   ` Cesar Eduardo Barros
  0 siblings, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2011-11-19 21:08 UTC (permalink / raw)
  To: Cesar Eduardo Barros; +Cc: git

Already found the real cause (jGit bug) and workaround posted, I think.

See $gmane/185573

* Re: Stack overflow at write_one()
  2011-11-19 21:08 ` Junio C Hamano
@ 2011-11-19 21:46   ` Cesar Eduardo Barros
  2011-11-19 23:30     ` Shawn Pearce
  0 siblings, 1 reply; 6+ messages in thread
From: Cesar Eduardo Barros @ 2011-11-19 21:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Em 19-11-2011 19:08, Junio C Hamano escreveu:
> Already found the real cause (jGit bug) and workaround posted, I think.

I presume the cause then is what was fixed by 
http://egit.eclipse.org/w/?p=jgit.git;a=commit;h=2fbf296fda205446eac11a13abd4fcdb182f28d9 
?

> See $gmane/185573

That did it, thanks! The patch had an offset, a fuzz, and a reject, but 
it was easy to fix by hand.

$ ../git/git gc
Counting objects: 30254, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (6614/6614), done.
warning: recursive delta detected for object fac71dfa0fe8c70cc852099e061c334e2a548eab
warning: recursive delta detected for object 1b730f5b2e0bdb2a2206af8ed30170509e75a2f5
warning: recursive delta detected for object 2f25a87e67fa3a226e367b9e080f11aa90c9f953
warning: recursive delta detected for object d5e5eefac91788da9a94efe9a15e0b928a77489e
Writing objects: 100% (30254/30254), done.
Total 30254 (delta 24008), reused 28803 (delta 23266)

And after that the repack does not break anymore:

$ ../git/git gc
Counting objects: 30254, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (5876/5876), done.
Writing objects: 100% (30254/30254), done.
Total 30254 (delta 24008), reused 30254 (delta 24008)
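The patched git warns about the recursive deltas and still writes every object. The backtrace earlier shows write_one() calling itself for the delta base, which suggests the base is written before the delta that depends on it. Under that assumption, the sketch below is a hypothetical, simplified model of one way such a writer can notice a loop instead of overflowing the stack: mark an entry as in progress before writing its base, and if a base is still in progress, drop that delta and store the object whole. The names write_entry and enum write_state, and the text output, are illustrative only and are not git's actual code or pack format.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical simplified model, not git's struct object_entry. */
enum write_state { NOT_WRITTEN, WRITING, WRITTEN };

struct entry {
	const char *name;
	struct entry *delta;    /* delta base, or NULL if stored whole */
	enum write_state state;
};

/*
 * Write e's delta base first, then e itself. An entry stays WRITING
 * while its base chain is being written; meeting a base that is still
 * WRITING means the chain loops back, so the delta is dropped and e is
 * stored whole instead of recursing forever. Returns how many deltas
 * were dropped.
 */
static int write_entry(struct entry *e, FILE *out)
{
	int dropped = 0;

	assert(e != NULL);
	if (e->state == WRITTEN)
		return 0;
	e->state = WRITING;

	if (e->delta && e->delta->state == WRITING) {
		fprintf(stderr, "warning: recursive delta detected for %s\n",
			e->name);
		e->delta = NULL;        /* break the loop: store e whole */
		dropped = 1;
	} else if (e->delta) {
		dropped = write_entry(e->delta, out);
	}

	if (e->delta)
		fprintf(out, "%s (delta against %s)\n", e->name, e->delta->name);
	else
		fprintf(out, "%s (whole)\n", e->name);
	e->state = WRITTEN;
	return dropped;
}
```

With two entries a and b that are deltas of each other, writing a recurses into b, finds a still in progress, stores b whole, and then writes a as a delta against b; the recursion depth stays bounded by the length of the chain.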

-- 
Cesar Eduardo Barros
cesarb@cesarb.net
cesar.barros@gmail.com

* Re: Stack overflow at write_one()
  2011-11-19 21:46   ` Cesar Eduardo Barros
@ 2011-11-19 23:30     ` Shawn Pearce
  2011-11-20  0:02       ` Cesar Eduardo Barros
  0 siblings, 1 reply; 6+ messages in thread
From: Shawn Pearce @ 2011-11-19 23:30 UTC (permalink / raw)
  To: Cesar Eduardo Barros; +Cc: Junio C Hamano, git

On Sat, Nov 19, 2011 at 13:46, Cesar Eduardo Barros <cesarb@cesarb.net> wrote:
> Em 19-11-2011 19:08, Junio C Hamano escreveu:
>>
>> Already found the real cause (jGit bug) and workaround posted, I think.
>
> I presume the cause then is what was fixed by
> http://egit.eclipse.org/w/?p=jgit.git;a=commit;h=2fbf296fda205446eac11a13abd4fcdb182f28d9
> ?

Yes. The AOSP servers were all updated with the above JGit patch, so
the servers are no longer sending duplicate objects. But yes, for a
period of time there were duplicates in the kernel repositories,
particularly kernel/omap.

* Re: Stack overflow at write_one()
  2011-11-19 23:30     ` Shawn Pearce
@ 2011-11-20  0:02       ` Cesar Eduardo Barros
  2011-11-20  2:00         ` Shawn Pearce
  0 siblings, 1 reply; 6+ messages in thread
From: Cesar Eduardo Barros @ 2011-11-20  0:02 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git

Em 19-11-2011 21:30, Shawn Pearce escreveu:
> On Sat, Nov 19, 2011 at 13:46, Cesar Eduardo Barros<cesarb@cesarb.net>  wrote:
>> Em 19-11-2011 19:08, Junio C Hamano escreveu:
>>>
>>> Already found the real cause (jGit bug) and workaround posted, I think.
>>
>> I presume the cause then is what was fixed by
>> http://egit.eclipse.org/w/?p=jgit.git;a=commit;h=2fbf296fda205446eac11a13abd4fcdb182f28d9
>> ?
>
> Yes. The AOSP servers were all updated with the above JGit patch, so
> the servers are no longer sending duplicate objects. But yes, for a
> period of time there were duplicates in the kernel repositories,
> particularly kernel/omap.

So, would an alternative workaround in my situation be to delete 
kernel/omap.git and let repo sync recreate it? It seems repo does not 
have extra metadata anywhere else, so just removing the directory should 
be enough for it to clone again from scratch, hopefully getting a 
corrected pack from the server.

-- 
Cesar Eduardo Barros
cesarb@cesarb.net
cesar.barros@gmail.com

* Re: Stack overflow at write_one()
  2011-11-20  0:02       ` Cesar Eduardo Barros
@ 2011-11-20  2:00         ` Shawn Pearce
  0 siblings, 0 replies; 6+ messages in thread
From: Shawn Pearce @ 2011-11-20  2:00 UTC (permalink / raw)
  To: Cesar Eduardo Barros; +Cc: Junio C Hamano, git

On Sat, Nov 19, 2011 at 16:02, Cesar Eduardo Barros <cesarb@cesarb.net> wrote:
> Em 19-11-2011 21:30, Shawn Pearce escreveu:
>>
>> On Sat, Nov 19, 2011 at 13:46, Cesar Eduardo Barros<cesarb@cesarb.net>
>>  wrote:
>>>
>>> Em 19-11-2011 19:08, Junio C Hamano escreveu:
>>>>
>>>> Already found the real cause (jGit bug) and workaround posted, I think.
>>>
>>> I presume the cause then is what was fixed by
>>>
>>> http://egit.eclipse.org/w/?p=jgit.git;a=commit;h=2fbf296fda205446eac11a13abd4fcdb182f28d9
>>> ?
>>
>> Yes. The AOSP servers were all updated with the above JGit patch, so
>> the servers are no longer sending duplicate objects. But yes, for a
>> period of time there were duplicates in the kernel repositories,
>> particularly kernel/omap.
>
> So, would an alternative workaround in my situation be to delete
> kernel/omap.git and let repo sync recreate it? It seems repo does not have
> extra metadata anywhere else, so just removing the directory should be
> enough for it to clone again from scratch, hopefully getting a corrected
> pack from the server.

Yes. repo does not have extra state, so just removing the directory
and running `repo sync` again to clone the repository would correct
the local repository.

Thread overview: 6+ messages
2011-11-19 20:27 Stack overflow at write_one() Cesar Eduardo Barros
2011-11-19 21:08 ` Junio C Hamano
2011-11-19 21:46   ` Cesar Eduardo Barros
2011-11-19 23:30     ` Shawn Pearce
2011-11-20  0:02       ` Cesar Eduardo Barros
2011-11-20  2:00         ` Shawn Pearce
