From mboxrd@z Thu Jan 1 00:00:00 1970
From: gowrishankar
Subject: Re: 2.6.33.[56]-rt23: howto create repeatable explosion in wakeup_next_waiter()
Date: Wed, 07 Jul 2010 19:41:41 +0530
Message-ID: <4C348B1D.5060008@linux.vnet.ibm.com>
References: <1278478019.10245.77.camel@marge.simson.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-rt-users@vger.kernel.org, Thomas Gleixner, Peter Zijlstra, Darren Hart
To: Mike Galbraith
Return-path:
Received: from [202.81.31.147] ([202.81.31.147]:48854 "EHLO e23smtp05.au.ibm.com" rhost-flags-FAIL-FAIL-OK-OK)
	by vger.kernel.org with ESMTP id S1755178Ab0GGOfB (ORCPT );
	Wed, 7 Jul 2010 10:35:01 -0400
Received: from d23relay05.au.ibm.com (d23relay05.au.ibm.com [202.81.31.247])
	by e23smtp05.au.ibm.com (8.14.4/8.13.1) with ESMTP id o67E7ZO1020307
	for ; Thu, 8 Jul 2010 00:07:35 +1000
Received: from d23av02.au.ibm.com (d23av02.au.ibm.com [9.190.235.138])
	by d23relay05.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o67EBlkd1556648
	for ; Thu, 8 Jul 2010 00:11:47 +1000
Received: from d23av02.au.ibm.com (loopback [127.0.0.1])
	by d23av02.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id o67EBlMI024077
	for ; Thu, 8 Jul 2010 00:11:47 +1000
In-Reply-To: <1278478019.10245.77.camel@marge.simson.net>
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

On Wednesday 07 July 2010 10:16 AM, Mike Galbraith wrote:
> Greetings,
>
> Stress testing, looking to trigger RCU stalls, I've managed to find a
> way to repeatably create fireworks. (got RCU stall, see attached)
>
> 1. download ltp-full-20100630. Needs to be this version because of
> testcase bustage in earlier versions, and must be built with gcc> 4.3,
> else testcases will segfault due to a gcc bug.
>
>

Hi Mike,
I have seen this segfault esp with GCC v4.3.4.
I am about to post this patch in ltp:

Signed-off-by: Gowrishankar
---
 testcases/realtime/include/librttest.h | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/testcases/realtime/include/librttest.h b/testcases/realtime/include/librttest.h
index e526ab4..273de6f 100644
--- a/testcases/realtime/include/librttest.h
+++ b/testcases/realtime/include/librttest.h
@@ -118,9 +118,9 @@ static inline int atomic_add(int i, atomic_t *v)
 	int __i;
 	__i = i;
 	asm volatile(
-		"lock; xaddl %0, %1;"
-		:"=r"(i)
-		:"m"(v->counter), "0"(i));
+		"lock; xaddl %1, %0;"
+		:"=m"(v->counter)
+		:"r"(i), "m" (v->counter));
 	return i + __i;
 #elif defined(__powerpc__)
 #define ISYNC_ON_SMP "\n\tisync\n"
--

Please let me know if this patch helps.

Thanks,
Gowri

> 2. apply patchlet so you can run testcases/realtime/perf/latency/run.sh
> at all.
>
> --- pthread_cond_many.c.org 2010-07-05 09:05:59.000000000 +0200
> +++ pthread_cond_many.c 2010-07-04 12:12:25.000000000 +0200
> @@ -259,7 +259,7 @@ void usage(void)
>
>  int parse_args(int c, char *v)
>  {
> -	int handled;
> +	int handled = 1;
>  	switch (c) {
>  		case 'h':
>  			usage();
>
> 3. add --realtime for no particular reason.
>
> --- run.sh.org 2010-07-06 15:54:58.000000000 +0200
> +++ run.sh 2010-07-06 16:37:34.000000000 +0200
> @@ -22,7 +22,7 @@ make
>  # process to run realtime. The remainder of the processes (if any)
>  # will run non-realtime in any case.
>
> -nthread=5000
> +nthread=500
>  iter=400
>  nproc=5
>
> @@ -39,7 +39,7 @@ i=0
>  i=1
>  while test $i -lt $nproc
>  do
> -	./pthread_cond_many --broadcast -i $iter -n $nthread> $nthread.$iter.$nproc.$i.out&
> +	./pthread_cond_many --realtime --broadcast -i $iter -n $nthread> $nthread.$iter.$nproc.$i.out&
>  	i=`expr $i + 1`
>  done
>  wait
>
> 4. run it.
>
> What happens here is we hit WARN_ON(pendowner->pi_blocked_on != waiter),
> this does not make it to consoles (poking sysrq-foo doesn't either).
> Next comes WARN_ON(!pendowner->pi_blocked_on), followed by the NULL
> explosion, which does make it to consoles.
>
> With explosion avoidance, I also see pendowner->pi_blocked_on->task ==
> NULL at times, but that, as !pendowner->pi_blocked_on, seems to be
> fallout. The start of bad juju is always pi_blocked_on != waiter.
>
> [ 141.609268] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
> [ 141.609268] IP: [] wakeup_next_waiter+0x12c/0x177
> [ 141.609268] PGD 20e174067 PUD 20e253067 PMD 0
> [ 141.609268] Oops: 0000 [#1] PREEMPT SMP
> [ 141.609268] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
> [ 141.609268] CPU 0
> [ 141.609268] Pid: 8154, comm: pthread_cond_ma Tainted: G W 2.6.33.6-rt23 #12 MS-7502/MS-7502
> [ 141.609268] RIP: 0010:[] [] wakeup_next_waiter+0x12c/0x177
> [ 141.609268] RSP: 0018:ffff88020e3cdd78 EFLAGS: 00010097
> [ 141.609268] RAX: 0000000000000000 RBX: ffff8801e8eba5c0 RCX: 0000000000000000
> [ 141.609268] RDX: ffff880028200000 RSI: 0000000000000046 RDI: 0000000000000009
> [ 141.609268] RBP: ffff88020e3cdda8 R08: 0000000000000002 R09: 0000000000000000
> [ 141.609268] R10: 0000000000000005 R11: 0000000000000000 R12: ffffffff81659068
> [ 141.609268] R13: ffff8801e8ebdb58 R14: 0000000000000000 R15: ffff8801e8ebac08
> [ 141.609268] FS: 00007f664d539700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> [ 141.609268] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 141.609268] CR2: 0000000000000058 CR3: 0000000214266000 CR4: 00000000000006f0
> [ 141.609268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 141.609268] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 141.609268] Process pthread_cond_ma (pid: 8154, threadinfo ffff88020e3cc000, task ffff88020e2a4700)
> [ 141.609268] Stack:
> [ 141.609268] 0000000000000000 ffffffff81659068 0000000000000202 0000000000000000
> [ 141.609268]<0> 0000000000000000 0000000080001fda ffff88020e3cddc8 ffffffff812fec48
> [ 141.609268]<0> ffffffff81659068 0000000000606300 ffff88020e3cddd8 ffffffff812ff1b9
> [ 141.609268] Call Trace:
> [ 141.609268] [] rt_spin_lock_slowunlock+0x43/0x61
> [ 141.609268] [] rt_spin_unlock+0x46/0x48
> [ 141.609268] [] do_futex+0x83c/0x935
> [ 141.609268] [] ? handle_mm_fault+0x6de/0x6f1
> [ 141.609268] [] ? do_futex+0x8f3/0x935
> [ 141.609268] [] sys_futex+0x142/0x154
> [ 141.609268] [] ? do_page_fault+0x23e/0x28e
> [ 141.609268] [] ? math_state_restore+0x3d/0x3f
> [ 141.609268] [] ? do_device_not_available+0xe/0x12
> [ 141.609268] [] system_call_fastpath+0x16/0x1b
> [ 141.609268] Code: c7 09 6d 41 81 e8 ac 34 fd ff 4c 39 ab 70 06 00 00 74 11 be 47 02 00 00 48 c7 c7 09 6d 41 81 e8 92 34 fd ff 48 8b 83 70 06 00 00<4c> 39 60 58 74 11 be 48 02 00 00 48 c7 c7 09 6d 41 81 e8 74 34
> [ 141.609268] RIP [] wakeup_next_waiter+0x12c/0x177
> [ 141.609268] RSP
> [ 141.609268] CR2: 0000000000000058
> [ 141.609268] ---[ end trace 58805b944e6f93ce ]---
> [ 141.609268] note: pthread_cond_ma[8154] exited with preempt_count 2
>
> (5. eyeball locks.. -> zzzzt -> report -> eyeball..)
>
> -Mike
>
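
A note on the atomic_add() change in the librttest.h patch above: xadd
writes the old memory value back into its register operand, so one way
to describe that to gcc is to mark both operands read-write. The sketch
below is not the patch as posted and has not been run against LTP; it
only illustrates that constraint style. The atomic_t layout, the
non-x86 fallback to __sync_add_and_fetch() and the atomic_add_demo()
helper are assumptions made for illustration.

```c
#include <assert.h>

/* Assumed layout, mirroring the usual single-counter atomic_t. */
typedef struct { volatile int counter; } atomic_t;

/* Atomically add i to v->counter and return the new value. */
static inline int atomic_add(int i, atomic_t *v)
{
#if defined(__i386__) || defined(__x86_64__)
	int old = i;
	/*
	 * xaddl stores the sum in memory and the previous memory value
	 * in the register, so both operands are read and written ("+").
	 */
	__asm__ __volatile__("lock; xaddl %0, %1"
			     : "+r"(old), "+m"(v->counter)
			     :
			     : "memory", "cc");
	return old + i;
#else
	/* Hypothetical fallback for this sketch: gcc builtin. */
	return __sync_add_and_fetch(&v->counter, i);
#endif
}

/* Hypothetical helper: exercises atomic_add() once, returns the counter. */
static int atomic_add_demo(void)
{
	atomic_t v = { 40 };
	int r = atomic_add(2, &v);

	assert(r == v.counter);
	return v.counter;
}
```

Calling atomic_add_demo() from a test program is enough to check that
the return value and the stored counter agree after a single add.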