linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Shaohua Li <shaohua.li@intel.com>
Cc: Tejun Heo <tj@kernel.org>, Vivek Goyal <vgoyal@redhat.com>,
	linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [patch]block: fix NULL icq_cache reference
Date: Thu, 19 Jan 2012 09:20:51 +0100	[thread overview]
Message-ID: <4F17D263.4090803@kernel.dk> (raw)
In-Reply-To: <1326937294.22361.649.camel@sli10-conroe>

On 01/19/2012 02:41 AM, Shaohua Li wrote:
> On Wed, 2012-01-18 at 08:07 -0800, Tejun Heo wrote:
>> Hello,
>>
>> On Wed, Jan 18, 2012 at 02:03:22PM +0800, Shaohua Li wrote:
>>> Subject: block: fix NULL icq_cache reference
>>>
>>> CPU0:					CPU1:
>>> 					switch from cfq to noop
>>> 					   elv_quiesce_start
>>> C: get_request
>>> A:   ioc_create_icq
>>>       alloc icq with cfq
>>> 					   B: do elevator switch
>>> 					       ioc_clear_queue
>>> 					   elv_quiesce_end
>>>       insert icq to ioc
>>> 					switch from noop to cfq
>>> 					    elv_quiesce_start
>>> 					    do elevator switch
>>> 					       ioc_clear_queue
>>> 					    elv_quiesce_end
>>> CPU0 leaves some icq to ioc list after elevator is switching from cfq to noop.
>>> in the second ioc_clear_queue, the ioc has icq in its list, but current
>>> elevator is noop. so ioc_exit_icq will get a NULL et->icq_cache.
>>>
>>> In above cases, if A runs after B, ioc_create_icq will have a NULL
>>> et->icq_cache, this will trigger another crash.
>>>
>>> Note, get_request caches et without lock hold. Between C and A, a elevator
>>> switch can start. But we will have elvpriv++, elv_quiesce_start will drain
>>> all requests first. So this will not trigger crash.
>>
>> Thanks a lot for tracking it down.
>>
>> Hmmm... but I'm having a difficult time following the description.
>> Maybe we can simplify a bit?  e.g. sth like the following?
>>
>>   Once a queue is quiesced, it's not supposed to have any elvpriv data
>>   or icq's, and elevator switching depends on that.  Request alloc
>>   path followed the rule for elvpriv data but forgot apply it to
>>   icq's leading to the following crash during elevator switch.
>>
>>   <oops log>
>>
>>   Fix it by not allocating icq's if ELVPRIV is not set for the
>>   request.
> I'm trying to explain why there is a crash, but fine to use your
> description.
> 
>>> Index: linux/block/blk-core.c
>>> ===================================================================
>>> --- linux.orig/block/blk-core.c	2012-01-18 12:44:13.000000000 +0800
>>> +++ linux/block/blk-core.c	2012-01-18 12:45:28.000000000 +0800
>>> @@ -872,11 +872,11 @@ retry:
>>>  	spin_unlock_irq(q->queue_lock);
>>>  
>>>  	/* create icq if missing */
>>> -	if (unlikely(et->icq_cache && !icq))
>>> +	if (unlikely(et->icq_cache && !icq && (rw_flags & REQ_ELVPRIV)))
>>>  		icq = ioc_create_icq(q, gfp_mask);
>>>  
>>>  	/* rqs are guaranteed to have icq on elv_set_request() if requested */
>>> -	if (likely(!et->icq_cache || icq))
>>> +	if (likely(!et->icq_cache || icq || !(rw_flags & REQ_ELVPRIV)))
>>>  		rq = blk_alloc_request(q, icq, rw_flags, gfp_mask);
>>
>> Hmmm... I was trying to avoid adding a goto label with the double
>> testing but with REQ_ELVPRIV test added, it looks more confusing.
>> Maybe something like the following is better?
>>
>> 	/* rqs are guaranteed to have icq on elv_set_request() if requested */
>> 	if ((rw_flags & REQ_ELVPRIV) && unlikely(et->icq_cache && !icq)) {
>> 		icq = ioc_create_icq(q, gfp_mask);
>> 		if (!icq)
>> 			goto fail_icq;
>> 	}
>> 	rq = blk_alloc_request(q, icq, rw_flags, gfp_mask);
>>  fail_icq:
> this is ok.
> 
> Subject: block: fix NULL icq_cache reference
> 
> Vivek reported a kernel crash:
> [   94.217015] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
> [   94.218004] IP: [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
> [   94.218004] PGD 13abda067 PUD 137d52067 PMD 0
> [   94.218004] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> [   94.218004] CPU 0
> [   94.218004] Modules linked in: [last unloaded: scsi_wait_scan]
> [   94.218004]
> [   94.218004] Pid: 0, comm: swapper/0 Not tainted 3.2.0+ #16 Hewlett-Packard HP xw6600 Workstation/0A9Ch
> [   94.218004] RIP: 0010:[<ffffffff81142fae>]  [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
> [   94.218004] RSP: 0018:ffff88013fc03de0  EFLAGS: 00010006
> [   94.218004] RAX: ffffffff81e0d020 RBX: ffff880138b3c680 RCX: 00000001801c001b
> [   94.218004] RDX: 00000000003aac1d RSI: ffff880138b3c680 RDI: ffffffff81142fae
> [   94.218004] RBP: ffff88013fc03e10 R08: ffff880137830238 R09: 0000000000000001
> [   94.218004] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [   94.218004] R13: ffffea0004e2cf00 R14: ffffffff812f6eb6 R15: 0000000000000246
> [   94.218004] FS:  0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
> [   94.218004] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   94.218004] CR2: 000000000000001c CR3: 00000001395ab000 CR4: 00000000000006f0
> [   94.218004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   94.218004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   94.218004] Process swapper/0 (pid: 0, threadinfo ffffffff81e00000, task ffffffff81e0d020)
> [   94.218004] Stack:
> [   94.218004]  0000000000000102 ffff88013fc0db20 ffffffff81e22700 ffff880139500f00
> [   94.218004]  0000000000000001 000000000000000a ffff88013fc03e20 ffffffff812f6eb6
> [   94.218004]  ffff88013fc03e90 ffffffff810c8da2 ffffffff81e01fd8 ffff880137830240
> [   94.218004] Call Trace:
> [   94.218004]  <IRQ>
> [   94.218004]  [<ffffffff812f6eb6>] icq_free_icq_rcu+0x16/0x20
> [   94.218004]  [<ffffffff810c8da2>] __rcu_process_callbacks+0x1c2/0x420
> [   94.218004]  [<ffffffff810c9038>] rcu_process_callbacks+0x38/0x250
> [   94.218004]  [<ffffffff810405ee>] __do_softirq+0xce/0x3e0
> [   94.218004]  [<ffffffff8108ed04>] ? clockevents_program_event+0x74/0x100
> [   94.218004]  [<ffffffff81090104>] ? tick_program_event+0x24/0x30
> [   94.218004]  [<ffffffff8183ed1c>] call_softirq+0x1c/0x30
> [   94.218004]  [<ffffffff8100422d>] do_softirq+0x8d/0xc0
> [   94.218004]  [<ffffffff81040c3e>] irq_exit+0xae/0xe0
> [   94.218004]  [<ffffffff8183f4be>] smp_apic_timer_interrupt+0x6e/0x99
> [   94.218004]  [<ffffffff8183e330>] apic_timer_interrupt+0x70/0x80
> 
> Once a queue is quiesced, it's not supposed to have any elvpriv data or
> icq's, and elevator switching depends on that.  Request alloc path
> followed the rule for elvpriv data but forgot apply it to icq's
> leading to the following crash during elevator switch. Fix it by not
> allocating icq's if ELVPRIV is not set for the request.
> 
> Reported-by: Vivek Goyal <vgoyal@redhat.com>
> Tested-by: Vivek Goyal <vgoyal@redhat.com>
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>

Thanks, applied.

-- 
Jens Axboe


      parent reply	other threads:[~2012-01-19  8:21 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-01-17 20:18 Kernel crash in icq_free_icq_rcu Vivek Goyal
2012-01-17 20:19 ` Jens Axboe
2012-01-17 20:40   ` Vivek Goyal
2012-01-17 20:42     ` Jens Axboe
2012-01-17 20:58       ` Vivek Goyal
2012-01-17 21:01         ` Vivek Goyal
2012-01-17 21:48 ` Tejun Heo
2012-01-17 22:07   ` Vivek Goyal
2012-01-18  1:01     ` Shaohua Li
2012-01-18  1:03       ` Tejun Heo
2012-01-18  1:05         ` Shaohua Li
2012-01-18  1:11           ` Tejun Heo
2012-01-18  1:30             ` Shaohua Li
2012-01-18  2:26               ` Shaohua Li
2012-01-18  4:23                 ` Shaohua Li
2012-01-18  6:03               ` Shaohua Li
2012-01-18 13:51                 ` Vivek Goyal
2012-01-18 14:20                   ` Vivek Goyal
2012-01-18 16:09                     ` Tejun Heo
2012-01-18 16:24                       ` Jens Axboe
2012-01-18 16:31                         ` Jens Axboe
2012-01-18 16:36                           ` Vivek Goyal
2012-01-18 17:10                             ` Tejun Heo
2012-01-18 19:07                               ` Jens Axboe
2012-01-18 19:05                             ` Jens Axboe
2012-01-18 16:55                           ` Tejun Heo
2012-01-18 16:07                 ` Tejun Heo
2012-01-19  1:41                   ` [patch]block: fix NULL icq_cache reference Shaohua Li
2012-01-19  1:43                     ` Tejun Heo
2012-01-19  8:20                     ` Jens Axboe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F17D263.4090803@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=shaohua.li@intel.com \
    --cc=tj@kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).