From: Darren Hart <dvhltc@us.ibm.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>,
	linux-rt-users <linux-rt-users@vger.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>
Subject: Re: 2.6.33.[56]-rt23: howto create repeatable explosion in wakeup_next_waiter()
Date: Wed, 07 Jul 2010 07:03:44 -0700
Message-ID: <4C348940.9070805@us.ibm.com>
In-Reply-To: <alpine.LFD.2.00.1007071356480.2604@localhost.localdomain>

On 07/07/2010 04:57 AM, Thomas Gleixner wrote:
> Cc'ing Darren.
>
> On Wed, 7 Jul 2010, Mike Galbraith wrote:
>

Hi Mike,

>> Greetings,
>>
>> Stress testing, looking to trigger RCU stalls, I've managed to find a
>> way to repeatably create fireworks.  (got RCU stall, see attached)
>>
>> 1. download ltp-full-20100630.  Needs to be this version because of
>> testcase bustage in earlier versions, and must be built with gcc > 4.3,
>> else testcases will segfault due to a gcc bug.

Interesting, I had not hit any gcc-specific issues with this. Can you
point me to the bug?

>>
>> 2. apply patchlet so you can run testcases/realtime/perf/latency/run.sh
>> at all.
>>
>> --- pthread_cond_many.c.org	2010-07-05 09:05:59.000000000 +0200
>> +++ pthread_cond_many.c	2010-07-04 12:12:25.000000000 +0200
>> @@ -259,7 +259,7 @@ void usage(void)
>>
>>   int parse_args(int c, char *v)
>>   {
>> -	int handled;
>> +	int handled = 1;
>>           switch (c) {
>>   		case 'h':
>>   			usage();
>>
>> 3. add --realtime for no particular reason.
>>
>> --- run.sh.org	2010-07-06 15:54:58.000000000 +0200
>> +++ run.sh	2010-07-06 16:37:34.000000000 +0200
>> @@ -22,7 +22,7 @@ make
>>   # process to run realtime.  The remainder of the processes (if any)
>>   # will run non-realtime in any case.
>>
>> -nthread=5000
>> +nthread=500

Was this just to lighten the load, or was it required to reproduce?

>>   iter=400
>>   nproc=5
>>
>> @@ -39,7 +39,7 @@ i=0
>>   i=1
>>   while test $i -lt $nproc
>>   do
>> -        ./pthread_cond_many --broadcast -i $iter -n $nthread>  $nthread.$iter.$nproc.$i.out&
>> +        ./pthread_cond_many --realtime --broadcast -i $iter -n $nthread>  $nthread.$iter.$nproc.$i.out&
>>           i=`expr $i + 1`
>>   done
>>   wait
>>

We'll do an audit and see if any pthread_cond_many patches have been 
dropped, or just fix the above issues if not.
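
For reference, the shape of the handled fix is roughly the following.
This is only a minimal sketch, not the actual testcase source: the
usage() body, the default branch, and the return statement are my
assumptions, since the hunk above shows only the first few lines of
parse_args(). The point is that with the initializer, every path through
the switch returns a defined value.

```c
#include <stdio.h>

/* Hypothetical stand-in for the testcase's usage(); the real body is not shown. */
static void usage(void)
{
	printf("usage: pthread_cond_many [-h]\n");
}

/*
 * Sketch of an option callback that returns nonzero when it consumed
 * option c. Because handled starts at 1, a case that forgets to set it
 * still returns a defined value; only the default branch clears it.
 */
int parse_args(int c, char *v)
{
	int handled = 1;

	(void)v;	/* option argument is unused in this sketch */
	switch (c) {
	case 'h':
		usage();
		break;
	default:
		handled = 0;	/* assumed: unknown options are reported as unhandled */
		break;
	}
	return handled;
}
```

Without the initializer, the value returned for 'h' would be whatever
happened to be on the stack, which is undefined behavior in C.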


>> 4. run it.
>>

Which architecture?

Glibc version?

I see the kernel version is 2.6.33.6-rt23. Have you reproduced this on
earlier kernel versions as well? Any 2.6.31-rt kernel would be a good
data point.

Is this immediately reproducible for you?

I see a possible page fault in the stack trace (the handle_mm_fault and
do_page_fault entries). If you run with mlockall(), does the problem go
away? (I assume not, but it is an easy thing to test.)
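
In case it saves some time, the mlockall() experiment could look
something like this. It is a sketch only: lock_all_memory() is a name I
made up, and where to call it from in the testcase is left open. The
mlockall() call and the MCL_CURRENT and MCL_FUTURE flags are the
standard ones from <sys/mman.h>.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Lock all currently mapped pages, and all pages mapped in the future,
 * into RAM so they cannot be paged out.
 * Returns 0 on success and -1 on failure (for example when the process
 * lacks CAP_IPC_LOCK and RLIMIT_MEMLOCK is too small).
 */
int lock_all_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
		fprintf(stderr, "mlockall failed: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}
```

Calling it once early in main(), before the threads are created, should
be enough for the test. It will not rule out every fault, but it takes
paging of already-mapped memory out of the picture.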

Nothing comes to mind regarding the cause yet; I will have to dig into it.

--
Darren

>> What happens here is we hit WARN_ON(pendowner->pi_blocked_on != waiter),
>> this does not make it to consoles (poking sysrq-foo doesn't either).
>> Next comes WARN_ON(!pendowner->pi_blocked_on), followed by the NULL
>> explosion, which does make it to consoles.
>>
>> With explosion avoidance, I also see pendowner->pi_blocked_on->task ==
>> NULL at times, but that, as !pendowner->pi_blocked_on, seems to be
>> fallout.  The start of bad juju is always pi_blocked_on != waiter.
>>
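
The sequence described above can be modelled in user space roughly as
follows. This is a sketch for discussion, not kernel code: the struct
and enum names are hypothetical, heavily simplified stand-ins, and only
the two conditions (pi_blocked_on != waiter and !pi_blocked_on) are
taken from the report. In the oops below RAX is 0 and CR2 is 0x58, which
is consistent with a field being read through a NULL pi_blocked_on.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct lock_model;

struct waiter_model {
	struct lock_model *lock;	/* lock this waiter is queued on */
};

struct task_model {
	struct waiter_model *pi_blocked_on;	/* waiter the task blocks on, or NULL */
};

enum pendowner_state {
	PENDOWNER_OK,		/* pi_blocked_on == waiter, as expected */
	PENDOWNER_MISMATCH,	/* first warning: blocked on some other waiter */
	PENDOWNER_NULL,		/* second warning: dereferencing would fault */
};

/*
 * Classify the pending owner without dereferencing pi_blocked_on.
 * The NULL case is checked first because it is the one that would
 * fault; note that a NULL pi_blocked_on also differs from a non-NULL
 * waiter, so both warnings from the report would fire for it.
 */
enum pendowner_state check_pendowner(const struct task_model *pendowner,
				     const struct waiter_model *waiter)
{
	if (pendowner->pi_blocked_on == NULL)
		return PENDOWNER_NULL;
	if (pendowner->pi_blocked_on != waiter)
		return PENDOWNER_MISMATCH;
	return PENDOWNER_OK;
}
```

A guard like this only avoids the explosion. As noted above, the real
problem starts at the mismatch, so the interesting question is what
changes pi_blocked_on underneath the waker.
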
>> [  141.609268] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
>> [  141.609268] IP: [<ffffffff8106856d>] wakeup_next_waiter+0x12c/0x177
>> [  141.609268] PGD 20e174067 PUD 20e253067 PMD 0
>> [  141.609268] Oops: 0000 [#1] PREEMPT SMP
>> [  141.609268] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
>> [  141.609268] CPU 0
>> [  141.609268] Pid: 8154, comm: pthread_cond_ma Tainted: G        W  2.6.33.6-rt23 #12 MS-7502/MS-7502
>> [  141.609268] RIP: 0010:[<ffffffff8106856d>]  [<ffffffff8106856d>] wakeup_next_waiter+0x12c/0x177
>> [  141.609268] RSP: 0018:ffff88020e3cdd78  EFLAGS: 00010097
>> [  141.609268] RAX: 0000000000000000 RBX: ffff8801e8eba5c0 RCX: 0000000000000000
>> [  141.609268] RDX: ffff880028200000 RSI: 0000000000000046 RDI: 0000000000000009
>> [  141.609268] RBP: ffff88020e3cdda8 R08: 0000000000000002 R09: 0000000000000000
>> [  141.609268] R10: 0000000000000005 R11: 0000000000000000 R12: ffffffff81659068
>> [  141.609268] R13: ffff8801e8ebdb58 R14: 0000000000000000 R15: ffff8801e8ebac08
>> [  141.609268] FS:  00007f664d539700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
>> [  141.609268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  141.609268] CR2: 0000000000000058 CR3: 0000000214266000 CR4: 00000000000006f0
>> [  141.609268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  141.609268] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [  141.609268] Process pthread_cond_ma (pid: 8154, threadinfo ffff88020e3cc000, task ffff88020e2a4700)
>> [  141.609268] Stack:
>> [  141.609268]  0000000000000000 ffffffff81659068 0000000000000202 0000000000000000
>> [  141.609268]<0>  0000000000000000 0000000080001fda ffff88020e3cddc8 ffffffff812fec48
>> [  141.609268]<0>  ffffffff81659068 0000000000606300 ffff88020e3cddd8 ffffffff812ff1b9
>> [  141.609268] Call Trace:
>> [  141.609268]  [<ffffffff812fec48>] rt_spin_lock_slowunlock+0x43/0x61
>> [  141.609268]  [<ffffffff812ff1b9>] rt_spin_unlock+0x46/0x48
>> [  141.609268]  [<ffffffff81067d7f>] do_futex+0x83c/0x935
>> [  141.609268]  [<ffffffff810c26ce>] ? handle_mm_fault+0x6de/0x6f1
>> [  141.609268]  [<ffffffff81067e36>] ? do_futex+0x8f3/0x935
>> [  141.609268]  [<ffffffff81067fba>] sys_futex+0x142/0x154
>> [  141.609268]  [<ffffffff81020eb0>] ? do_page_fault+0x23e/0x28e
>> [  141.609268]  [<ffffffff81004aa7>] ? math_state_restore+0x3d/0x3f
>> [  141.609268]  [<ffffffff81004b08>] ? do_device_not_available+0xe/0x12
>> [  141.609268]  [<ffffffff81002c5b>] system_call_fastpath+0x16/0x1b
>> [  141.609268] Code: c7 09 6d 41 81 e8 ac 34 fd ff 4c 39 ab 70 06 00 00 74 11 be 47 02 00 00 48 c7 c7 09 6d 41 81 e8 92 34 fd ff 48 8b 83 70 06 00 00<4c>  39 60 58 74 11 be 48 02 00 00 48 c7 c7 09 6d 41 81 e8 74 34
>> [  141.609268] RIP  [<ffffffff8106856d>] wakeup_next_waiter+0x12c/0x177
>> [  141.609268]  RSP<ffff88020e3cdd78>
>> [  141.609268] CR2: 0000000000000058
>> [  141.609268] ---[ end trace 58805b944e6f93ce ]---
>> [  141.609268] note: pthread_cond_ma[8154] exited with preempt_count 2
>>
>> (5. eyeball locks.. ->  zzzzt ->  report ->  eyeball..)
>>
>> 	-Mike
>>


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
