From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755996Ab1KBR1m (ORCPT ); Wed, 2 Nov 2011 13:27:42 -0400
Received: from mail-wy0-f174.google.com ([74.125.82.174]:35291 "EHLO
	mail-wy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755810Ab1KBR1k (ORCPT );
	Wed, 2 Nov 2011 13:27:40 -0400
Message-ID: <1320254854.2292.14.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC>
Subject: Re: Linux 3.1-rc9
From: Eric Dumazet
To: Thomas Gleixner
Cc: Simon Kirby, David Miller, Peter Zijlstra, Linus Torvalds,
	Linux Kernel Mailing List, Dave Jones, Martin Schwidefsky,
	Ingo Molnar, Network Development
Date: Wed, 02 Nov 2011 18:27:34 +0100
In-Reply-To:
References: <1318874090.4172.84.camel@twins>
	<1318879396.4172.92.camel@twins>
	<1318928713.21167.4.camel@twins>
	<20111018182046.GF1309@hostway.ca>
	<20111024190203.GA24410@hostway.ca>
	<20111025202049.GB25043@hostway.ca>
	<20111031173246.GA10614@hostway.ca>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.0-
Content-Transfer-Encoding: 8bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wednesday, 2 November 2011 at 17:40 +0100, Thomas Gleixner wrote:
> On Mon, 31 Oct 2011, Simon Kirby wrote:
> > On Tue, Oct 25, 2011 at 01:20:49PM -0700, Simon Kirby wrote:
> >
> > > On Mon, Oct 24, 2011 at 12:02:03PM -0700, Simon Kirby wrote:
> > >
> > > > Ok, hit the hang about 4 more times, but only this morning on a box with
> > > > a serial cable attached. Yay!
> > >
> > > Here's lockdep output from another box. This one looks a bit different.
> >
> > One more, again a bit different. The last few lockups have looked like
> > this. Not sure why, but we're hitting this at a few a day now. Thomas,
> > this is without your patch, but as you said, that's right before a free
> > and should print a separate lockdep warning.
> >
> > No "huh" lines until after the trace on this one. I'll move to 3.1 with
>
> That means that the lockdep warning hit in the same net_rx cycle
> before the leak was detected by the softirq code.
>
> > cherry-picked b0691c8e now.
>
> Can you please add the debug patch below and try the following:
>
> Enable CONFIG_FUNCTION_TRACER & CONFIG_FUNCTION_GRAPH_TRACER
>
> # cd $DEBUGFSMOUNTPOINT/tracing
> # echo sk_clone >set_ftrace_filter
> # echo function >current_tracer
> # echo 1 >options/func_stack_trace
>
> Now wait until it reproduces (which stops the trace) and read out
>
> # cat trace >/tmp/trace.txt
>
> Please provide the trace file along with the lockdep splat. That
> should tell us which callchain is responsible for the spinlock
> leakage.
>
> Thanks,
>
> 	tglx
>
> --------------->
>  kernel/softirq.c |    1 +
>  1 file changed, 1 insertion(+)
>
> Index: linux-2.6/kernel/softirq.c
> ===================================================================
> --- linux-2.6.orig/kernel/softirq.c
> +++ linux-2.6/kernel/softirq.c
> @@ -238,6 +238,7 @@ restart:
>  			h->action(h);
>  			trace_softirq_exit(vec_nr);
>  			if (unlikely(prev_count != preempt_count())) {
> +				tracing_off();
>  				printk(KERN_ERR "huh, entered softirq %u %s %p"
>  				       "with preempt_count %08x,"
>  				       " exited with %08x?\n", vec_nr,

I believe it might come from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

If inet_csk_route_child_sock() returns NULL, we don't release the
socket lock.
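
To illustrate the kind of leak I suspect, here is a minimal userspace
sketch in plain C. It is not the kernel code. Every name in it
(toy_lock, toy_sock, toy_route_child, create_child_leaky,
create_child_fixed, run_once) is hypothetical, and it assumes the child
socket is already locked when the route lookup runs. The counter
"held" plays the role of preempt_count: if it differs before and after
one pass, a lock was leaked on that path.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a spinlock; `held` imitates preempt_count. */
struct toy_lock { int held; };
struct toy_sock { struct toy_lock lock; };

static void toy_lock_acquire(struct toy_lock *l) { l->held++; }

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held > 0);	/* releasing an unheld lock is a bug */
	l->held--;
}

/* Hypothetical route lookup that may fail and return NULL. */
static void *toy_route_child(int route_ok)
{
	static int dummy_route;

	return route_ok ? &dummy_route : NULL;
}

/* Leaky variant: the failure path returns with the lock still held. */
struct toy_sock *create_child_leaky(struct toy_sock *newsk, int route_ok)
{
	toy_lock_acquire(&newsk->lock);	/* child starts out locked */
	if (!toy_route_child(route_ok))
		return NULL;		/* BUG: lock is never released */
	return newsk;			/* success: caller unlocks later */
}

/* Fixed variant: the failure path drops the lock before bailing out. */
struct toy_sock *create_child_fixed(struct toy_sock *newsk, int route_ok)
{
	toy_lock_acquire(&newsk->lock);
	if (!toy_route_child(route_ok)) {
		toy_lock_release(&newsk->lock);
		return NULL;
	}
	return newsk;
}

/*
 * One simulated softirq pass. Returns the drift of the lock count,
 * which is 0 when every acquire was matched by a release.
 */
int run_once(struct toy_sock *(*create)(struct toy_sock *, int),
	     int route_ok)
{
	struct toy_sock sk = { { 0 } };
	int before = sk.lock.held;
	struct toy_sock *child = create(&sk, route_ok);

	if (child)
		toy_lock_release(&child->lock);	/* normal unlock on success */
	return sk.lock.held - before;
}
```

With the leaky variant, run_once(create_child_leaky, 0) returns 1,
which is the same kind of mismatch the "huh, entered softirq" check
reports. The fixed variant returns 0 whether or not the lookup
succeeds.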