On Wed, Aug 29, 2012 at 01:10:29AM +0200, Oliver wrote:
> On Tuesday 28 August 2012 19:16:39 Oliver wrote:
> > During testing I found that the kernel is indeed solid and does not panic;
> > however, I managed to make conntrackd eat 100% of a CPU core on one of the
> > pair and conntrack entries remained unevicted from the kernel until I killed
> > the conntrackd process.
> 
> having conntrackd running while the conntrack table is full is causing a GPF - 
> I have attached a dmesg output of the kernel panic resulting from a general 
> protection fault.
> 
> The first GPF is from the kernel patched with the code provided in my previous 
> e-mail (the one for v3.4.10 based on the patch you provided me)
> 
> the second is with only my original patch (the one-liner that checks the 
> dying bit in death_by_event) - although that's likely not relevant here since 
> that function is not part of the stack.

The problem seems to be the re-use of the conntrack timer. Races may
happen on entries that were just inserted into the dying list while
packets / ctnetlink still hold a reference to them.

Could you give this patch a try? Please remove the previous one I
sent.

Let me know if you hit more crashes. Thanks.