linux-kernel.vger.kernel.org archive mirror
* [patch]block: fix ioc locking warning
@ 2012-02-06  7:50 Shaohua Li
  2012-02-06  7:55 ` Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 49+ messages in thread
From: Shaohua Li @ 2012-02-06  7:50 UTC (permalink / raw)
  To: lkml; +Cc: Linus Torvalds, Jens Axboe, Tejun Heo, Knut Petersen, mroos

Meelis reported a warning:

WARNING: at kernel/timer.c:1122 run_timer_softirq+0x199/0x1ec()
Hardware name: 939Dual-SATA2
timer: cfq_idle_slice_timer+0x0/0xaa preempt leak: 00000102 -> 00000103
Modules linked in: sr_mod cdrom videodev media drm_kms_helper ohci_hcd ehci_hcd v4l2_compat_ioctl32 usbcore i2c_ali15x3 snd_seq drm snd_timer snd_seq
Pid: 0, comm: swapper Not tainted 3.3.0-rc2-00110-gd125666 #176
Call Trace:
 <IRQ>  [<ffffffff81022aaa>] warn_slowpath_common+0x7e/0x96
 [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
 [<ffffffff81022b56>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff8114c526>] ? cfq_idle_slice_timer+0xa1/0xaa
 [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
 [<ffffffff8102c124>] run_timer_softirq+0x199/0x1ec
 [<ffffffff81047a53>] ? timekeeping_get_ns+0x12/0x31
 [<ffffffff810145fd>] ? apic_write+0x11/0x13
 [<ffffffff81027475>] __do_softirq+0x74/0xfa
 [<ffffffff812f337a>] call_softirq+0x1a/0x30
 [<ffffffff81002ff9>] do_softirq+0x31/0x68
 [<ffffffff810276cf>] irq_exit+0x3d/0xa3
 [<ffffffff81014aca>] smp_apic_timer_interrupt+0x6b/0x77
 [<ffffffff812f2de9>] apic_timer_interrupt+0x69/0x70
 <EOI>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
 [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
 [<ffffffff8100801f>] ? default_idle+0x1e/0x32
 [<ffffffff81008019>] ? default_idle+0x18/0x32
 [<ffffffff810008b1>] cpu_idle+0x87/0xd1
 [<ffffffff812de861>] rest_init+0x85/0x89
 [<ffffffff81659a4d>] start_kernel+0x2eb/0x2f8
 [<ffffffff8165926e>] x86_64_start_reservations+0x7e/0x82
 [<ffffffff81659362>] x86_64_start_kernel+0xf0/0xf7

this_q == locked_q is possible. There are two problems here:
1. In the UP case, there is a preemption counter imbalance because
spin_trylock() always succeeds.
2. In the SMP case, the loop breaks too early.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Tested-by: Knut Petersen <Knut_Petersen@t-online.de>

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 27a06e0..7490b6d 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -204,7 +204,9 @@ void put_io_context(struct io_context *ioc, struct request_queue *locked_q)
 				spin_unlock(last_q->queue_lock);
 			last_q = NULL;
 
-			if (!spin_trylock(this_q->queue_lock))
+			/* spin_trylock() always succeeds in UP case */
+			if (this_q != locked_q &&
+			    !spin_trylock(this_q->queue_lock))
 				break;
 			last_q = this_q;
 			continue;




* Re: [patch]block: fix ioc locking warning
  2012-02-06  7:50 [patch]block: fix ioc locking warning Shaohua Li
@ 2012-02-06  7:55 ` Jens Axboe
  2012-02-06 15:12 ` Vivek Goyal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 49+ messages in thread
From: Jens Axboe @ 2012-02-06  7:55 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, Linus Torvalds, Tejun Heo, Knut Petersen, mroos

On 02/06/2012 08:50 AM, Shaohua Li wrote:
> Meelis reported a warning:
> 
> WARNING: at kernel/timer.c:1122 run_timer_softirq+0x199/0x1ec()
> Hardware name: 939Dual-SATA2
> timer: cfq_idle_slice_timer+0x0/0xaa preempt leak: 00000102 -> 00000103
> Modules linked in: sr_mod cdrom videodev media drm_kms_helper ohci_hcd ehci_hcd v4l2_compat_ioctl32 usbcore i2c_ali15x3 snd_seq drm snd_timer snd_seq
> Pid: 0, comm: swapper Not tainted 3.3.0-rc2-00110-gd125666 #176
> Call Trace:
>  <IRQ>  [<ffffffff81022aaa>] warn_slowpath_common+0x7e/0x96
>  [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
>  [<ffffffff81022b56>] warn_slowpath_fmt+0x41/0x43
>  [<ffffffff8114c526>] ? cfq_idle_slice_timer+0xa1/0xaa
>  [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
>  [<ffffffff8102c124>] run_timer_softirq+0x199/0x1ec
>  [<ffffffff81047a53>] ? timekeeping_get_ns+0x12/0x31
>  [<ffffffff810145fd>] ? apic_write+0x11/0x13
>  [<ffffffff81027475>] __do_softirq+0x74/0xfa
>  [<ffffffff812f337a>] call_softirq+0x1a/0x30
>  [<ffffffff81002ff9>] do_softirq+0x31/0x68
>  [<ffffffff810276cf>] irq_exit+0x3d/0xa3
>  [<ffffffff81014aca>] smp_apic_timer_interrupt+0x6b/0x77
>  [<ffffffff812f2de9>] apic_timer_interrupt+0x69/0x70
>  <EOI>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
>  [<ffffffff8100801f>] ? default_idle+0x1e/0x32
>  [<ffffffff81008019>] ? default_idle+0x18/0x32
>  [<ffffffff810008b1>] cpu_idle+0x87/0xd1
>  [<ffffffff812de861>] rest_init+0x85/0x89
>  [<ffffffff81659a4d>] start_kernel+0x2eb/0x2f8
>  [<ffffffff8165926e>] x86_64_start_reservations+0x7e/0x82
>  [<ffffffff81659362>] x86_64_start_kernel+0xf0/0xf7
> 
> this_q == locked_q is possible. There are two problems here:
> 1. In the UP case, there is a preemption counter imbalance because
> spin_trylock() always succeeds.
> 2. In the SMP case, the loop breaks too early.

Thanks Shaohua, applied!

-- 
Jens Axboe



* Re: [patch]block: fix ioc locking warning
  2012-02-06  7:50 [patch]block: fix ioc locking warning Shaohua Li
  2012-02-06  7:55 ` Jens Axboe
@ 2012-02-06 15:12 ` Vivek Goyal
  2012-02-06 16:09   ` Jens Axboe
  2012-02-06 16:22 ` Tejun Heo
  2012-02-08 18:07 ` walt
  3 siblings, 1 reply; 49+ messages in thread
From: Vivek Goyal @ 2012-02-06 15:12 UTC (permalink / raw)
  To: Shaohua Li
  Cc: lkml, Linus Torvalds, Jens Axboe, Tejun Heo, Knut Petersen, mroos

On Mon, Feb 06, 2012 at 03:50:11PM +0800, Shaohua Li wrote:
> Meelis reported a warning:
> 
> WARNING: at kernel/timer.c:1122 run_timer_softirq+0x199/0x1ec()
> Hardware name: 939Dual-SATA2
> timer: cfq_idle_slice_timer+0x0/0xaa preempt leak: 00000102 -> 00000103
> Modules linked in: sr_mod cdrom videodev media drm_kms_helper ohci_hcd ehci_hcd v4l2_compat_ioctl32 usbcore i2c_ali15x3 snd_seq drm snd_timer snd_seq
> Pid: 0, comm: swapper Not tainted 3.3.0-rc2-00110-gd125666 #176
> Call Trace:
>  <IRQ>  [<ffffffff81022aaa>] warn_slowpath_common+0x7e/0x96
>  [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
>  [<ffffffff81022b56>] warn_slowpath_fmt+0x41/0x43
>  [<ffffffff8114c526>] ? cfq_idle_slice_timer+0xa1/0xaa
>  [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
>  [<ffffffff8102c124>] run_timer_softirq+0x199/0x1ec
>  [<ffffffff81047a53>] ? timekeeping_get_ns+0x12/0x31
>  [<ffffffff810145fd>] ? apic_write+0x11/0x13
>  [<ffffffff81027475>] __do_softirq+0x74/0xfa
>  [<ffffffff812f337a>] call_softirq+0x1a/0x30
>  [<ffffffff81002ff9>] do_softirq+0x31/0x68
>  [<ffffffff810276cf>] irq_exit+0x3d/0xa3
>  [<ffffffff81014aca>] smp_apic_timer_interrupt+0x6b/0x77
>  [<ffffffff812f2de9>] apic_timer_interrupt+0x69/0x70
>  <EOI>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
>  [<ffffffff8100801f>] ? default_idle+0x1e/0x32
>  [<ffffffff81008019>] ? default_idle+0x18/0x32
>  [<ffffffff810008b1>] cpu_idle+0x87/0xd1
>  [<ffffffff812de861>] rest_init+0x85/0x89
>  [<ffffffff81659a4d>] start_kernel+0x2eb/0x2f8
>  [<ffffffff8165926e>] x86_64_start_reservations+0x7e/0x82
>  [<ffffffff81659362>] x86_64_start_kernel+0xf0/0xf7
> 
> this_q == locked_q is possible. There are two problems here:
> 1. In the UP case, there is a preemption counter imbalance because
> spin_trylock() always succeeds.
> 2. In the SMP case, the loop breaks too early.

Thanks Shaohua. So is this the case where there is more than one cic on
ioc->ioc_list and the first cic's queue is not the same as locked_queue,
but some cic other than the first has a queue that is the same as
locked_queue?

In that case the current code will still defer freeing of the ioc and
cic to a worker thread. So this patch introduces an optimization to
handle those cases and avoid calling the worker thread.

Secondly, it fixes the discrepancy in the preemption count on UP
machines, where we are left with one extra preemption count after
put_io_context() finishes. In the UP case spin_trylock() increments the
preemption count and always returns success. As this_q == locked_q we
never unlock this queue, and hence never decrement the preemption
count, resulting in the preemption count warning.
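
To make the UP half concrete, a rough sketch of the primitives involved
(illustrative macro names, not verbatim kernel source; the real
definitions live in include/linux/spinlock_api_up.h):

	/*
	 * On UP there is no lock word to contend on, so "trylock"
	 * reduces to disabling preemption and reporting success --
	 * it can never fail, even on a lock the caller already holds.
	 */
	#define up_spin_trylock(lock)	({ preempt_disable(); 1; })
	#define up_spin_unlock(lock)	preempt_enable()

That unmatched preempt_disable() is the "preempt leak: 00000102 ->
00000103" in the report above.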

The changelog was not obvious, at least to me. I wish it were a little
more descriptive. Anyway, the patch is already committed.

Thanks
Vivek


* Re: [patch]block: fix ioc locking warning
  2012-02-06 15:12 ` Vivek Goyal
@ 2012-02-06 16:09   ` Jens Axboe
  2012-02-06 16:37     ` Vivek Goyal
  0 siblings, 1 reply; 49+ messages in thread
From: Jens Axboe @ 2012-02-06 16:09 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Shaohua Li, lkml, Linus Torvalds, Tejun Heo, Knut Petersen, mroos

On 02/06/2012 04:12 PM, Vivek Goyal wrote:
> On Mon, Feb 06, 2012 at 03:50:11PM +0800, Shaohua Li wrote:
>> Meelis reported a warning:
>>
>> WARNING: at kernel/timer.c:1122 run_timer_softirq+0x199/0x1ec()
>> Hardware name: 939Dual-SATA2
>> timer: cfq_idle_slice_timer+0x0/0xaa preempt leak: 00000102 -> 00000103
>> Modules linked in: sr_mod cdrom videodev media drm_kms_helper ohci_hcd ehci_hcd v4l2_compat_ioctl32 usbcore i2c_ali15x3 snd_seq drm snd_timer snd_seq
>> Pid: 0, comm: swapper Not tainted 3.3.0-rc2-00110-gd125666 #176
>> Call Trace:
>>  <IRQ>  [<ffffffff81022aaa>] warn_slowpath_common+0x7e/0x96
>>  [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
>>  [<ffffffff81022b56>] warn_slowpath_fmt+0x41/0x43
>>  [<ffffffff8114c526>] ? cfq_idle_slice_timer+0xa1/0xaa
>>  [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
>>  [<ffffffff8102c124>] run_timer_softirq+0x199/0x1ec
>>  [<ffffffff81047a53>] ? timekeeping_get_ns+0x12/0x31
>>  [<ffffffff810145fd>] ? apic_write+0x11/0x13
>>  [<ffffffff81027475>] __do_softirq+0x74/0xfa
>>  [<ffffffff812f337a>] call_softirq+0x1a/0x30
>>  [<ffffffff81002ff9>] do_softirq+0x31/0x68
>>  [<ffffffff810276cf>] irq_exit+0x3d/0xa3
>>  [<ffffffff81014aca>] smp_apic_timer_interrupt+0x6b/0x77
>>  [<ffffffff812f2de9>] apic_timer_interrupt+0x69/0x70
>>  <EOI>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
>>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
>>  [<ffffffff8100801f>] ? default_idle+0x1e/0x32
>>  [<ffffffff81008019>] ? default_idle+0x18/0x32
>>  [<ffffffff810008b1>] cpu_idle+0x87/0xd1
>>  [<ffffffff812de861>] rest_init+0x85/0x89
>>  [<ffffffff81659a4d>] start_kernel+0x2eb/0x2f8
>>  [<ffffffff8165926e>] x86_64_start_reservations+0x7e/0x82
>>  [<ffffffff81659362>] x86_64_start_kernel+0xf0/0xf7
>>
>> this_q == locked_q is possible. There are two problems here:
>> 1. In the UP case, there is a preemption counter imbalance because
>> spin_trylock() always succeeds.
>> 2. In the SMP case, the loop breaks too early.
> 
> Thanks Shaohua. So is this the case where there is more than one cic on
> ioc->ioc_list and the first cic's queue is not the same as locked_queue,
> but some cic other than the first has a queue that is the same as
> locked_queue?
> 
> In that case the current code will still defer freeing of the ioc and
> cic to a worker thread. So this patch introduces an optimization to
> handle those cases and avoid calling the worker thread.
> 
> Secondly, it fixes the discrepancy in the preemption count on UP
> machines, where we are left with one extra preemption count after
> put_io_context() finishes. In the UP case spin_trylock() increments the
> preemption count and always returns success. As this_q == locked_q we
> never unlock this queue, and hence never decrement the preemption
> count, resulting in the preemption count warning.
> 
> The changelog was not obvious, at least to me. I wish it were a little
> more descriptive. Anyway, the patch is already committed.

We can always amend the changelog, so don't worry about it already being
committed. If you want to add/change something, just send it in.

-- 
Jens Axboe



* Re: [patch]block: fix ioc locking warning
  2012-02-06  7:50 [patch]block: fix ioc locking warning Shaohua Li
  2012-02-06  7:55 ` Jens Axboe
  2012-02-06 15:12 ` Vivek Goyal
@ 2012-02-06 16:22 ` Tejun Heo
  2012-02-08 18:07 ` walt
  3 siblings, 0 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-06 16:22 UTC (permalink / raw)
  To: Shaohua Li; +Cc: lkml, Linus Torvalds, Jens Axboe, Knut Petersen, mroos

Hello,

On Mon, Feb 06, 2012 at 03:50:11PM +0800, Shaohua Li wrote:
> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index 27a06e0..7490b6d 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -204,7 +204,9 @@ void put_io_context(struct io_context *ioc, struct request_queue *locked_q)
>  				spin_unlock(last_q->queue_lock);
>  			last_q = NULL;
>  
> -			if (!spin_trylock(this_q->queue_lock))
> +			/* spin_trylock() always succeeds in UP case */
> +			if (this_q != locked_q &&
> +			    !spin_trylock(this_q->queue_lock))

Yeah, this is the right thing to do.  I think the comment is slightly
misleading.  I'll prep a patch.

Thanks.

-- 
tejun


* Re: [patch]block: fix ioc locking warning
  2012-02-06 16:09   ` Jens Axboe
@ 2012-02-06 16:37     ` Vivek Goyal
  2012-02-06 16:44       ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Vivek Goyal @ 2012-02-06 16:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Shaohua Li, lkml, Linus Torvalds, Tejun Heo, Knut Petersen, mroos

On Mon, Feb 06, 2012 at 05:09:05PM +0100, Jens Axboe wrote:

[..]
> > The changelog was not obvious, at least to me. I wish it were a little
> > more descriptive. Anyway, the patch is already committed.
> 
> We can always amend the changelog, so don't worry about it already being
> committed. If you want to add/change something, just send it in.

How about something like the following?


block: Do not try to lock an already locked queue again

put_io_context() can be called with one of the request queue locks
already held. But if the locked queue's cic is not first in the
ioc->ioc_list, then we have two possible issues.

- Under the above condition, the current code bails out and schedules
  the worker thread to free the ioc. This can be optimized.

- It might happen that we received the queue locked but we still do the
  trylock on the queue. In the SMP case that's not a problem, as we
  fail to lock an already locked queue, but in the UP case we appear to
  succeed and in the process increment the preempt count. Once we are
  done with ioc_exit_icq(), we do not call spin_unlock() on the locked
  queue, as we are not supposed to. This leads to an imbalance in the
  preemption count, and the following warning was reported.

This patch fixes both of the above issues by making sure we do not try
to lock an already locked queue again.

WARNING: at kernel/timer.c:1122 run_timer_softirq+0x199/0x1ec()
Hardware name: 939Dual-SATA2
timer: cfq_idle_slice_timer+0x0/0xaa preempt leak: 00000102 -> 00000103
Modules linked in: sr_mod cdrom videodev media drm_kms_helper ohci_hcd ehci_hcd
v4l2_compat_ioctl32 usbcore i2c_ali15x3 snd_seq drm snd_timer snd_seq
Pid: 0, comm: swapper Not tainted 3.3.0-rc2-00110-gd125666 #176
Call Trace:
<IRQ>  [<ffffffff81022aaa>] warn_slowpath_common+0x7e/0x96
[<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
[<ffffffff81022b56>] warn_slowpath_fmt+0x41/0x43
[<ffffffff8114c526>] ? cfq_idle_slice_timer+0xa1/0xaa
[<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
[<ffffffff8102c124>] run_timer_softirq+0x199/0x1ec
[<ffffffff81047a53>] ? timekeeping_get_ns+0x12/0x31
[<ffffffff810145fd>] ? apic_write+0x11/0x13
[<ffffffff81027475>] __do_softirq+0x74/0xfa
[<ffffffff812f337a>] call_softirq+0x1a/0x30
[<ffffffff81002ff9>] do_softirq+0x31/0x68
[<ffffffff810276cf>] irq_exit+0x3d/0xa3
[<ffffffff81014aca>] smp_apic_timer_interrupt+0x6b/0x77
[<ffffffff812f2de9>] apic_timer_interrupt+0x69/0x70
<EOI>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
[<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
[<ffffffff8100801f>] ? default_idle+0x1e/0x32
[<ffffffff81008019>] ? default_idle+0x18/0x32
[<ffffffff810008b1>] cpu_idle+0x87/0xd1
[<ffffffff812de861>] rest_init+0x85/0x89
[<ffffffff81659a4d>] start_kernel+0x2eb/0x2f8
[<ffffffff8165926e>] x86_64_start_reservations+0x7e/0x82
[<ffffffff81659362>] x86_64_start_kernel+0xf0/0xf7

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Tested-by: Knut Petersen <Knut_Petersen@t-online.de>

Thanks
Vivek


* Re: [patch]block: fix ioc locking warning
  2012-02-06 16:37     ` Vivek Goyal
@ 2012-02-06 16:44       ` Tejun Heo
  2012-02-06 16:58         ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-06 16:44 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Jens Axboe, Shaohua Li, lkml, Linus Torvalds, Knut Petersen, mroos

Hello,

On Mon, Feb 06, 2012 at 11:37:21AM -0500, Vivek Goyal wrote:
> block: Do not try to lock an already locked queue again
> 
> put_io_context() can be called with one of the request queue locks
> already held. But if the locked queue's cic is not first in the
> ioc->ioc_list, then we have two possible issues.
> 
> - Under the above condition, the current code bails out and schedules
>   the worker thread to free the ioc. This can be optimized.
> 
> - It might happen that we received the queue locked but we still do the
>   trylock on the queue. In the SMP case that's not a problem, as we
>   fail to lock an already locked queue, but in the UP case we appear to
>   succeed and in the process increment the preempt count. Once we are
>   done with ioc_exit_icq(), we do not call spin_unlock() on the locked
>   queue, as we are not supposed to. This leads to an imbalance in the
>   preemption count, and the following warning was reported.
>
> This patch fixes both of the above issues by making sure we do not try
> to lock an already locked queue again.
> 
> WARNING: at kernel/timer.c:1122 run_timer_softirq+0x199/0x1ec()
> Hardware name: 939Dual-SATA2
> timer: cfq_idle_slice_timer+0x0/0xaa preempt leak: 00000102 -> 00000103
> Modules linked in: sr_mod cdrom videodev media drm_kms_helper ohci_hcd ehci_hcd
> v4l2_compat_ioctl32 usbcore i2c_ali15x3 snd_seq drm snd_timer snd_seq
> Pid: 0, comm: swapper Not tainted 3.3.0-rc2-00110-gd125666 #176
> Call Trace:
> <IRQ>  [<ffffffff81022aaa>] warn_slowpath_common+0x7e/0x96
> [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
> [<ffffffff81022b56>] warn_slowpath_fmt+0x41/0x43
> [<ffffffff8114c526>] ? cfq_idle_slice_timer+0xa1/0xaa
> [<ffffffff8114c485>] ? cfq_slice_expired+0x1d/0x1d
> [<ffffffff8102c124>] run_timer_softirq+0x199/0x1ec
> [<ffffffff81047a53>] ? timekeeping_get_ns+0x12/0x31
> [<ffffffff810145fd>] ? apic_write+0x11/0x13
> [<ffffffff81027475>] __do_softirq+0x74/0xfa
> [<ffffffff812f337a>] call_softirq+0x1a/0x30
> [<ffffffff81002ff9>] do_softirq+0x31/0x68
> [<ffffffff810276cf>] irq_exit+0x3d/0xa3
> [<ffffffff81014aca>] smp_apic_timer_interrupt+0x6b/0x77
> [<ffffffff812f2de9>] apic_timer_interrupt+0x69/0x70
> <EOI>  [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
> [<ffffffff81040136>] ? sched_clock_cpu+0x73/0x7d
> [<ffffffff8100801f>] ? default_idle+0x1e/0x32
> [<ffffffff81008019>] ? default_idle+0x18/0x32
> [<ffffffff810008b1>] cpu_idle+0x87/0xd1
> [<ffffffff812de861>] rest_init+0x85/0x89
> [<ffffffff81659a4d>] start_kernel+0x2eb/0x2f8
> [<ffffffff8165926e>] x86_64_start_reservations+0x7e/0x82
> [<ffffffff81659362>] x86_64_start_kernel+0xf0/0xf7
> 
> Signed-off-by: Shaohua Li <shaohua.li@intel.com>
> Reported-by: Meelis Roos <mroos@linux.ee>
> Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
> Tested-by: Knut Petersen <Knut_Petersen@t-online.de>

Yeah, this seems better to me.  Jens, if you're gonna amend the
commit, please consider collapsing the following patch into the
original patch too.  Thanks.

diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 7490b6d..12978fc 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -204,7 +204,14 @@ void put_io_context(struct io_context *ioc, struct request_queue *locked_q)
 				spin_unlock(last_q->queue_lock);
 			last_q = NULL;
 
-			/* spin_trylock() always succeeds in UP case */
+			/*
+			 * If icq for locked_q wasn't at the head of
+			 * icq_list, we can try to switch back to locked_q.
+			 * On SMP, the following locked_q test avoids
+			 * unnecessary deferring to release_work, on UP,
+			 * incorrect lock state transition (trylock
+			 * succeeding while holding the same lock).
+			 */
 			if (this_q != locked_q &&
 			    !spin_trylock(this_q->queue_lock))
 				break;


* Re: [patch]block: fix ioc locking warning
  2012-02-06 16:44       ` Tejun Heo
@ 2012-02-06 16:58         ` Linus Torvalds
  2012-02-06 17:27           ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2012-02-06 16:58 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Vivek Goyal, Jens Axboe, Shaohua Li, lkml, Knut Petersen, mroos

On Mon, Feb 6, 2012 at 8:44 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Yeah, this seems better to me.  Jens, if you're gonna amend the
> commit, please consider collapsing the following patch into the
> original patch too.  Thanks.

Guys, is it *really* worth it to do all these crazy games?

How bad is it to just always use the async freeing, instead of this
clearly very fragile crazy direct-freeing-with-serious-locking-issues
thing?

Sure, even ioc_release_fn() isn't trivial wrt lock handling, but at
least it doesn't have to play these *insane* games with recursive
locking.

And if workqueues are too expensive, what saner alternatives might
there be? This code really is insane.

                Linus


* Re: [patch]block: fix ioc locking warning
  2012-02-06 16:58         ` Linus Torvalds
@ 2012-02-06 17:27           ` Tejun Heo
  2012-02-06 20:16             ` Jens Axboe
  2012-02-06 20:36             ` [patch]block: fix ioc locking warning Tejun Heo
  0 siblings, 2 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-06 17:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vivek Goyal, Jens Axboe, Shaohua Li, lkml, Knut Petersen, mroos

Hello,

On Mon, Feb 06, 2012 at 08:58:49AM -0800, Linus Torvalds wrote:
> On Mon, Feb 6, 2012 at 8:44 AM, Tejun Heo <tj@kernel.org> wrote:
> >
> > Yeah, this seems better to me.  Jens, if you're gonna amend the
> > commit, please consider collapsing the following patch into the
> > original patch too.  Thanks.
> 
> Guys, is it *really* worth it to do all these crazy games?
>
> How bad is it to just always use the async freeing, instead of this
> clearly very fragile crazy direct-freeing-with-serious-locking-issues
> thing?

It's one wq scheduling on exit for any task which has issued an IO.  I
don't think it would matter except for task fork/exit microbenchs (or
workloads which approximate to that).  I'll get some measurements and
strip the optimization if it doesn't really show up.

Thanks.

--
tejun


* Re: [patch]block: fix ioc locking warning
  2012-02-06 17:27           ` Tejun Heo
@ 2012-02-06 20:16             ` Jens Axboe
  2012-02-06 21:54               ` [PATCH] block: strip out locking optimization in put_io_context() Tejun Heo
  2012-02-06 20:36             ` [patch]block: fix ioc locking warning Tejun Heo
  1 sibling, 1 reply; 49+ messages in thread
From: Jens Axboe @ 2012-02-06 20:16 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Vivek Goyal, Shaohua Li, lkml, Knut Petersen, mroos

On 2012-02-06 18:27, Tejun Heo wrote:
> Hello,
> 
> On Mon, Feb 06, 2012 at 08:58:49AM -0800, Linus Torvalds wrote:
>> On Mon, Feb 6, 2012 at 8:44 AM, Tejun Heo <tj@kernel.org> wrote:
>>>
>>> Yeah, this seems better to me.  Jens, if you're gonna amend the
>>> commit, please consider collapsing the following patch into the
>>> original patch too.  Thanks.
>>
>> Guys, is it *really* worth it to do all these crazy games?
>>
>> How bad is it to just always use the async freeing, instead of this
>> clearly very fragile crazy direct-freeing-with-serious-locking-issues
>> thing?
> 
> It's one wq scheduling on exit for any task which has issued an IO.  I
> don't think it would matter except for task fork/exit microbenchs (or
> workloads which approximate to that).  I'll get some measurements and
> strip the optimization if it doesn't really show up.

One (arguably stupid) thing that some users do do is something like:

$ find . -exec grep foo '{}' \;

So that would probably be a good pathological test case for this.

-- 
Jens Axboe



* Re: [patch]block: fix ioc locking warning
  2012-02-06 17:27           ` Tejun Heo
  2012-02-06 20:16             ` Jens Axboe
@ 2012-02-06 20:36             ` Tejun Heo
  2012-02-07  0:31               ` Shaohua Li
  1 sibling, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-06 20:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Vivek Goyal, Jens Axboe, Shaohua Li, lkml, Knut Petersen, mroos

On Mon, Feb 06, 2012 at 09:27:06AM -0800, Tejun Heo wrote:
> It's one wq scheduling on exit for any task which has issued an IO.  I
> don't think it would matter except for task fork/exit microbenchs (or
> workloads which approximate to that).  I'll get some measurements and
> strip the optimization if it doesn't really show up.

I'm still playing with test methods and getting numbers, but the
following is the "simplified" one of the three setups I'm testing
(the current code, simplified, and no optimization at all).  There
*seems* to be appreciable performance degradation on fork/exit w/ ioc
microbenchs when the optimization is stripped entirely, so I'm likely
to go with the following.  I'll post when I know more.

Thanks.
---
 block/blk-ioc.c |   50 +++++++++++++-------------------------------------
 1 file changed, 13 insertions(+), 37 deletions(-)

Index: work/block/blk-ioc.c
===================================================================
--- work.orig/block/blk-ioc.c
+++ work/block/blk-ioc.c
@@ -158,7 +158,6 @@ static void ioc_release_fn(struct work_s
  */
 void put_io_context(struct io_context *ioc, struct request_queue *locked_q)
 {
-	struct request_queue *last_q = locked_q;
 	unsigned long flags;
 
 	if (ioc == NULL)
@@ -171,50 +170,27 @@ void put_io_context(struct io_context *i
 	if (!atomic_long_dec_and_test(&ioc->refcount))
 		return;
 
-	/*
-	 * Destroy @ioc.  This is a bit messy because icq's are chained
-	 * from both ioc and queue, and ioc->lock nests inside queue_lock.
-	 * The inner ioc->lock should be held to walk our icq_list and then
-	 * for each icq the outer matching queue_lock should be grabbed.
-	 * ie. We need to do reverse-order double lock dancing.
-	 *
-	 * Another twist is that we are often called with one of the
-	 * matching queue_locks held as indicated by @locked_q, which
-	 * prevents performing double-lock dance for other queues.
-	 *
-	 * So, we do it in two stages.  The fast path uses the queue_lock
-	 * the caller is holding and, if other queues need to be accessed,
-	 * uses trylock to avoid introducing locking dependency.  This can
-	 * handle most cases, especially if @ioc was performing IO on only
-	 * single device.
-	 *
-	 * If trylock doesn't cut it, we defer to @ioc->release_work which
-	 * can do all the double-locking dancing.
-	 */
 	spin_lock_irqsave_nested(&ioc->lock, flags,
 				 ioc_release_depth(locked_q));
 
-	while (!hlist_empty(&ioc->icq_list)) {
+	/*
+	 * Due to locking order between queue_lock and ioc lock, proper icq
+	 * shoot down requires reverse double locking dance from another
+	 * context.  As there usually is no or one icq, try to handle those
+	 * cases synchronously.
+	 */
+	if (!hlist_empty(&ioc->icq_list)) {
 		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
 						struct io_cq, ioc_node);
-		struct request_queue *this_q = icq->q;
-
-		if (this_q != last_q) {
-			if (last_q && last_q != locked_q)
-				spin_unlock(last_q->queue_lock);
-			last_q = NULL;
-
-			if (!spin_trylock(this_q->queue_lock))
-				break;
-			last_q = this_q;
-			continue;
+		if (locked_q) {
+			if (locked_q == icq->q)
+				ioc_exit_icq(icq);
+		} else if (spin_trylock(icq->q->queue_lock)) {
+			ioc_exit_icq(icq);
+			spin_unlock(icq->q->queue_lock);
 		}
-		ioc_exit_icq(icq);
 	}
 
-	if (last_q && last_q != locked_q)
-		spin_unlock(last_q->queue_lock);
-
 	spin_unlock_irqrestore(&ioc->lock, flags);
 
 	/* if no icq is left, we're done; otherwise, kick release_work */


* [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-06 20:16             ` Jens Axboe
@ 2012-02-06 21:54               ` Tejun Heo
  2012-02-07  6:49                 ` Jens Axboe
  0 siblings, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-06 21:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Vivek Goyal, Shaohua Li, lkml, Knut Petersen, mroos

put_io_context() performed a complex trylock dance to avoid
deferring ioc release to a workqueue.  It was also broken on UP because
the trylock was always assumed to succeed, which resulted in an
unbalanced preemption count.

While there are ways to fix the UP breakage, even the most
pathological microbench (forced ioc allocation and tight fork/exit
loop) fails to show any appreciable performance benefit of the
optimization.  Strip it out.  If there turn out to be workloads which
are affected by this change, the simpler optimization from the
discussion thread can be applied later.

Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <1328514611.21268.66.camel@sli10-conroe>
---
I couldn't find any statistically meaningful advantage of the
optimization with tight fork/exit tests w/ forced ioc creation on
fork, which gotta be the most pathological test case for the code
path.  So, let's remove the ugly optimization.  If I missed sth, we
can resurrect the simpler optimization later.  Jens, this is on top of
linus#master without Shaohua's patch.

Thanks.
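
For reference, the pathological test was along these lines (a
reconstruction, not the actual script; using ioprio_set() to force
io_context allocation is our choice of mechanism -- set_task_ioprio()
calls get_task_io_context(), so each child allocates an ioc and then
puts it on exit):

	#include <sys/syscall.h>
	#include <sys/types.h>
	#include <sys/wait.h>
	#include <unistd.h>

	#define IOPRIO_WHO_PROCESS	1
	#define IOPRIO_CLASS_BE		2
	#define IOPRIO_CLASS_SHIFT	13

	int main(void)
	{
		int i;

		for (i = 0; i < 100000; i++) {
			pid_t pid = fork();

			if (pid == 0) {
				/* force ioc allocation in the child */
				syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
					(IOPRIO_CLASS_BE << IOPRIO_CLASS_SHIFT) | 4);
				/* the exit path drops the last ioc ref */
				_exit(0);
			}
			waitpid(pid, NULL, 0);
		}
		return 0;
	}

Timing a run of this under time(1) with and without the optimization is
the comparison referred to above.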

 block/blk-cgroup.c        |    2 -
 block/blk-core.c          |    2 -
 block/blk-ioc.c           |   90 +++++-----------------------------------------
 block/cfq-iosched.c       |    2 -
 fs/ioprio.c               |    2 -
 include/linux/blkdev.h    |    3 -
 include/linux/iocontext.h |    5 +-
 kernel/fork.c             |    2 -
 8 files changed, 18 insertions(+), 90 deletions(-)

Index: work/block/blk-cgroup.c
===================================================================
--- work.orig/block/blk-cgroup.c
+++ work/block/blk-cgroup.c
@@ -1659,7 +1659,7 @@ static void blkiocg_attach(struct cgroup
 		ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
 		if (ioc) {
 			ioc_cgroup_changed(ioc);
-			put_io_context(ioc, NULL);
+			put_io_context(ioc);
 		}
 	}
 }
Index: work/block/blk-core.c
===================================================================
--- work.orig/block/blk-core.c
+++ work/block/blk-core.c
@@ -642,7 +642,7 @@ static inline void blk_free_request(stru
 	if (rq->cmd_flags & REQ_ELVPRIV) {
 		elv_put_request(q, rq);
 		if (rq->elv.icq)
-			put_io_context(rq->elv.icq->ioc, q);
+			put_io_context(rq->elv.icq->ioc);
 	}
 
 	mempool_free(rq, q->rq.rq_pool);
Index: work/block/blk-ioc.c
===================================================================
--- work.orig/block/blk-ioc.c
+++ work/block/blk-ioc.c
@@ -29,21 +29,6 @@ void get_io_context(struct io_context *i
 }
 EXPORT_SYMBOL(get_io_context);
 
-/*
- * Releasing ioc may nest into another put_io_context() leading to nested
- * fast path release.  As the ioc's can't be the same, this is okay but
- * makes lockdep whine.  Keep track of nesting and use it as subclass.
- */
-#ifdef CONFIG_LOCKDEP
-#define ioc_release_depth(q)		((q) ? (q)->ioc_release_depth : 0)
-#define ioc_release_depth_inc(q)	(q)->ioc_release_depth++
-#define ioc_release_depth_dec(q)	(q)->ioc_release_depth--
-#else
-#define ioc_release_depth(q)		0
-#define ioc_release_depth_inc(q)	do { } while (0)
-#define ioc_release_depth_dec(q)	do { } while (0)
-#endif
-
 static void icq_free_icq_rcu(struct rcu_head *head)
 {
 	struct io_cq *icq = container_of(head, struct io_cq, __rcu_head);
@@ -75,11 +60,8 @@ static void ioc_exit_icq(struct io_cq *i
 	if (rcu_dereference_raw(ioc->icq_hint) == icq)
 		rcu_assign_pointer(ioc->icq_hint, NULL);
 
-	if (et->ops.elevator_exit_icq_fn) {
-		ioc_release_depth_inc(q);
+	if (et->ops.elevator_exit_icq_fn)
 		et->ops.elevator_exit_icq_fn(icq);
-		ioc_release_depth_dec(q);
-	}
 
 	/*
 	 * @icq->q might have gone away by the time RCU callback runs
@@ -149,79 +131,29 @@ static void ioc_release_fn(struct work_s
 /**
  * put_io_context - put a reference of io_context
  * @ioc: io_context to put
- * @locked_q: request_queue the caller is holding queue_lock of (hint)
  *
  * Decrement reference count of @ioc and release it if the count reaches
- * zero.  If the caller is holding queue_lock of a queue, it can indicate
- * that with @locked_q.  This is an optimization hint and the caller is
- * allowed to pass in %NULL even when it's holding a queue_lock.
+ * zero.
  */
-void put_io_context(struct io_context *ioc, struct request_queue *locked_q)
+void put_io_context(struct io_context *ioc)
 {
-	struct request_queue *last_q = locked_q;
 	unsigned long flags;
 
 	if (ioc == NULL)
 		return;
 
 	BUG_ON(atomic_long_read(&ioc->refcount) <= 0);
-	if (locked_q)
-		lockdep_assert_held(locked_q->queue_lock);
-
-	if (!atomic_long_dec_and_test(&ioc->refcount))
-		return;
 
 	/*
-	 * Destroy @ioc.  This is a bit messy because icq's are chained
-	 * from both ioc and queue, and ioc->lock nests inside queue_lock.
-	 * The inner ioc->lock should be held to walk our icq_list and then
-	 * for each icq the outer matching queue_lock should be grabbed.
-	 * ie. We need to do reverse-order double lock dancing.
-	 *
-	 * Another twist is that we are often called with one of the
-	 * matching queue_locks held as indicated by @locked_q, which
-	 * prevents performing double-lock dance for other queues.
-	 *
-	 * So, we do it in two stages.  The fast path uses the queue_lock
-	 * the caller is holding and, if other queues need to be accessed,
-	 * uses trylock to avoid introducing locking dependency.  This can
-	 * handle most cases, especially if @ioc was performing IO on only
-	 * single device.
-	 *
-	 * If trylock doesn't cut it, we defer to @ioc->release_work which
-	 * can do all the double-locking dancing.
+	 * Releasing ioc requires reverse order double locking and we may
+	 * already be holding a queue_lock.  Do it asynchronously from wq.
 	 */
-	spin_lock_irqsave_nested(&ioc->lock, flags,
-				 ioc_release_depth(locked_q));
-
-	while (!hlist_empty(&ioc->icq_list)) {
-		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
-						struct io_cq, ioc_node);
-		struct request_queue *this_q = icq->q;
-
-		if (this_q != last_q) {
-			if (last_q && last_q != locked_q)
-				spin_unlock(last_q->queue_lock);
-			last_q = NULL;
-
-			if (!spin_trylock(this_q->queue_lock))
-				break;
-			last_q = this_q;
-			continue;
-		}
-		ioc_exit_icq(icq);
+	if (atomic_long_dec_and_test(&ioc->refcount)) {
+		spin_lock_irqsave(&ioc->lock, flags);
+		if (!hlist_empty(&ioc->icq_list))
+			schedule_work(&ioc->release_work);
+		spin_unlock_irqrestore(&ioc->lock, flags);
 	}
-
-	if (last_q && last_q != locked_q)
-		spin_unlock(last_q->queue_lock);
-
-	spin_unlock_irqrestore(&ioc->lock, flags);
-
-	/* if no icq is left, we're done; otherwise, kick release_work */
-	if (hlist_empty(&ioc->icq_list))
-		kmem_cache_free(iocontext_cachep, ioc);
-	else
-		schedule_work(&ioc->release_work);
 }
 EXPORT_SYMBOL(put_io_context);
 
@@ -236,7 +168,7 @@ void exit_io_context(struct task_struct 
 	task_unlock(task);
 
 	atomic_dec(&ioc->nr_tasks);
-	put_io_context(ioc, NULL);
+	put_io_context(ioc);
 }
 
 /**
Index: work/block/cfq-iosched.c
===================================================================
--- work.orig/block/cfq-iosched.c
+++ work/block/cfq-iosched.c
@@ -1794,7 +1794,7 @@ __cfq_slice_expired(struct cfq_data *cfq
 		cfqd->active_queue = NULL;
 
 	if (cfqd->active_cic) {
-		put_io_context(cfqd->active_cic->icq.ioc, cfqd->queue);
+		put_io_context(cfqd->active_cic->icq.ioc);
 		cfqd->active_cic = NULL;
 	}
 }
Index: work/fs/ioprio.c
===================================================================
--- work.orig/fs/ioprio.c
+++ work/fs/ioprio.c
@@ -51,7 +51,7 @@ int set_task_ioprio(struct task_struct *
 	ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
 	if (ioc) {
 		ioc_ioprio_changed(ioc, ioprio);
-		put_io_context(ioc, NULL);
+		put_io_context(ioc);
 	}
 
 	return err;
Index: work/include/linux/blkdev.h
===================================================================
--- work.orig/include/linux/blkdev.h
+++ work/include/linux/blkdev.h
@@ -399,9 +399,6 @@ struct request_queue {
 	/* Throttle data */
 	struct throtl_data *td;
 #endif
-#ifdef CONFIG_LOCKDEP
-	int			ioc_release_depth;
-#endif
 };
 
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
Index: work/include/linux/iocontext.h
===================================================================
--- work.orig/include/linux/iocontext.h
+++ work/include/linux/iocontext.h
@@ -133,7 +133,7 @@ static inline struct io_context *ioc_tas
 
 struct task_struct;
 #ifdef CONFIG_BLOCK
-void put_io_context(struct io_context *ioc, struct request_queue *locked_q);
+void put_io_context(struct io_context *ioc);
 void exit_io_context(struct task_struct *task);
 struct io_context *get_task_io_context(struct task_struct *task,
 				       gfp_t gfp_flags, int node);
@@ -141,8 +141,7 @@ void ioc_ioprio_changed(struct io_contex
 void ioc_cgroup_changed(struct io_context *ioc);
 #else
 struct io_context;
-static inline void put_io_context(struct io_context *ioc,
-				  struct request_queue *locked_q) { }
+static inline void put_io_context(struct io_context *ioc) { }
 static inline void exit_io_context(struct task_struct *task) { }
 #endif
 
Index: work/kernel/fork.c
===================================================================
--- work.orig/kernel/fork.c
+++ work/kernel/fork.c
@@ -890,7 +890,7 @@ static int copy_io(unsigned long clone_f
 			return -ENOMEM;
 
 		new_ioc->ioprio = ioc->ioprio;
-		put_io_context(new_ioc, NULL);
+		put_io_context(new_ioc);
 	}
 #endif
 	return 0;


* Re: [patch]block: fix ioc locking warning
  2012-02-06 20:36             ` [patch]block: fix ioc locking warning Tejun Heo
@ 2012-02-07  0:31               ` Shaohua Li
  2012-02-07  0:39                 ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-07  0:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Vivek Goyal, Jens Axboe, lkml, Knut Petersen, mroos

2012/2/7 Tejun Heo <tj@kernel.org>:
> On Mon, Feb 06, 2012 at 09:27:06AM -0800, Tejun Heo wrote:
>> It's one wq scheduling on exit for any task which has issued an IO.  I
>> don't think it would matter except for task fork/exit microbenchs (or
>> workloads which approximate to that).  I'll get some measurements and
>> strip the optimization if it doesn't really show up.
>
> I'm still playing with test methods and getting numbers but the
> following is the simplified one of the three setups I'm playing with -
> the current one, simplified and no optimization.  There *seems* to be
> appreciable performance degradation on fork/exit w/ ioc microbenchs so
> I'm likely to go with the following.  I'll post when I know more.
Hi,
Since you are talking about performance, one of our microbenchmarks (swap)
shows a regression. Alex bisected it to b2efa05265d62bc2, which is related
to the ioc change. A little strange to me; I don't expect such a change to
cause a performance issue. I haven't double-checked the issue, but if you
have ideas, please let me know.

Thanks,
Shaohua


* Re: [patch]block: fix ioc locking warning
  2012-02-07  0:31               ` Shaohua Li
@ 2012-02-07  0:39                 ` Tejun Heo
  2012-02-07  0:43                   ` Shaohua Li
  0 siblings, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-07  0:39 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Linus Torvalds, Vivek Goyal, Jens Axboe, lkml, Knut Petersen, mroos

Hello,

On Tue, Feb 07, 2012 at 08:31:22AM +0800, Shaohua Li wrote:
> Since you are talking about performance, one of our microbenchmarks (swap)
> shows a regression. Alex bisected it to b2efa05265d62bc2, which is related
> to the ioc change. A little strange to me; I don't expect such a change to
> cause a performance issue. I haven't double-checked the issue, but if you
> have ideas, please let me know.

Hmmm... I think the only thing which can be made slower by that patch
is ioc release (ie. task exits).  What does the microbench do?

Thanks.

-- 
tejun


* Re: [patch]block: fix ioc locking warning
  2012-02-07  0:39                 ` Tejun Heo
@ 2012-02-07  0:43                   ` Shaohua Li
  2012-02-07  0:59                     ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-07  0:43 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Vivek Goyal, Jens Axboe, lkml, Knut Petersen, mroos

On Mon, 2012-02-06 at 16:39 -0800, Tejun Heo wrote:
> Hello,
> 
> On Tue, Feb 07, 2012 at 08:31:22AM +0800, Shaohua Li wrote:
> > Since you are talking about performance, one of our microbenchmarks (swap)
> > shows a regression. Alex bisected it to b2efa05265d62bc2, which is related
> > to the ioc change. A little strange to me; I don't expect such a change to
> > cause a performance issue. I haven't double-checked the issue, but if you
> > have ideas, please let me know.
> 
> Hmmm... I think the only thing which can be made slower by that patch
> is ioc release (ie. task exits).  What does the microbench do?
Quite simple: copy the kernel source to tmpfs in a tight memory
environment. This triggers swap. The bisection result appears stable.

Thanks,
Shaohua



* Re: [patch]block: fix ioc locking warning
  2012-02-07  0:43                   ` Shaohua Li
@ 2012-02-07  0:59                     ` Tejun Heo
  2012-02-07  1:10                       ` Shaohua Li
  2012-02-07  5:22                       ` Shaohua Li
  0 siblings, 2 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-07  0:59 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Linus Torvalds, Vivek Goyal, Jens Axboe, lkml, Knut Petersen, mroos

On Tue, Feb 07, 2012 at 08:43:50AM +0800, Shaohua Li wrote:
> Quite simple: copy the kernel source to tmpfs in a tight memory
> environment. This triggers swap. The bisection result appears stable.

Looked through the commit again, but it really doesn't change fast
paths.  The only paths which can be slower are ioc and q exits.  Can
you please do the following?

* Share the script used for the microbench.  Maybe it somehow exposes
  a slower exit path?  How much regression are we talking about?

* Test with the this_q != locked_q trylock patch.

Thanks.

-- 
tejun


* Re: [patch]block: fix ioc locking warning
  2012-02-07  0:59                     ` Tejun Heo
@ 2012-02-07  1:10                       ` Shaohua Li
  2012-02-07  1:33                         ` Shaohua Li
  2012-02-07  5:22                       ` Shaohua Li
  1 sibling, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-07  1:10 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Vivek Goyal, Jens Axboe, lkml, Knut Petersen, mroos

On Mon, 2012-02-06 at 16:59 -0800, Tejun Heo wrote:
> On Tue, Feb 07, 2012 at 08:43:50AM +0800, Shaohua Li wrote:
> > Quite simple: copy the kernel source to tmpfs in a tight memory
> > environment. This triggers swap. The bisection result appears stable.
> 
> Looked through the commit again, but it really doesn't change fast
> paths.  The only paths which can be slower are ioc and q exits.  Can
> you please do the following?
> 
> * Share the script used for the microbench.  Maybe it somehow exposes
>   a slower exit path?  How much regression are we talking about?
I need to strip the script out of our test framework, but it is
essentially a 'time cp -a tmpfs/kernelsource tmpfs/copy'. The time
increased ~20% with 3.3-rc1.

> * Test with the this_q != locked_q trylock patch.
sure.

Thanks,
Shaohua



* Re: [patch]block: fix ioc locking warning
  2012-02-07  1:10                       ` Shaohua Li
@ 2012-02-07  1:33                         ` Shaohua Li
  0 siblings, 0 replies; 49+ messages in thread
From: Shaohua Li @ 2012-02-07  1:33 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Vivek Goyal, Jens Axboe, lkml, Knut Petersen, mroos

On Tue, 2012-02-07 at 09:10 +0800, Shaohua Li wrote:
> On Mon, 2012-02-06 at 16:59 -0800, Tejun Heo wrote:
> > On Tue, Feb 07, 2012 at 08:43:50AM +0800, Shaohua Li wrote:
> > > Quite simple: copy the kernel source to tmpfs in a tight memory
> > > environment. This triggers swap. The bisection result appears stable.
> > 
> > Looked through the commit again, but it really doesn't change fast
> > paths.  The only paths which can be slower are ioc and q exits.  Can
> > you please do the following?
> > 
> > * Share the script used for the microbench.  Maybe it somehow exposes
> >   a slower exit path?  How much regression are we talking about?
> I need to strip the script out of our test framework, but it is
> essentially a 'time cp -a tmpfs/kernelsource tmpfs/copy'. The time
> increased ~20% with 3.3-rc1.
Oops, I got confused between our swap and swap-cp tests. The workload
with the regression is swap, which does 'time kernelbuild
tmpfs/kernelsource'. We run several such kernel builds at the same time
to trigger swap. Very sorry for giving you wrong info before.



* Re: [patch]block: fix ioc locking warning
  2012-02-07  0:59                     ` Tejun Heo
  2012-02-07  1:10                       ` Shaohua Li
@ 2012-02-07  5:22                       ` Shaohua Li
  2012-02-07 22:34                         ` Linus Torvalds
  1 sibling, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-07  5:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Vivek Goyal, Jens Axboe, lkml, Knut Petersen, mroos

On Mon, 2012-02-06 at 16:59 -0800, Tejun Heo wrote:
> * Test with the this_q != locked_q trylock patch.
This one slightly reduces the regression: without it the regression is
about 17%; with it, about 12%. The test result is stable.



* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-06 21:54               ` [PATCH] block: strip out locking optimization in put_io_context() Tejun Heo
@ 2012-02-07  6:49                 ` Jens Axboe
  2012-02-07 16:22                   ` Tejun Heo
  2012-02-07 23:00                   ` [PATCH] block: fix lockdep warning on io_context release put_io_context() Tejun Heo
  0 siblings, 2 replies; 49+ messages in thread
From: Jens Axboe @ 2012-02-07  6:49 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Vivek Goyal, Shaohua Li, lkml, Knut Petersen, mroos

On 02/06/2012 10:54 PM, Tejun Heo wrote:
> put_io_context() performed a complex trylock dance to avoid
> deferring ioc release to a workqueue.  It was also broken on UP because
> the trylock was always assumed to succeed, which resulted in an
> unbalanced preemption count.
> 
> While there are ways to fix the UP breakage, even the most
> pathological microbench (forced ioc allocation and tight fork/exit
> loop) fails to show any appreciable performance benefit of the
> optimization.  Strip it out.  If there turn out to be workloads which
> are affected by this change, the simpler optimization from the
> discussion thread can be applied later.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> LKML-Reference: <1328514611.21268.66.camel@sli10-conroe>
> ---
> I couldn't find any statistically meaningful advantage of the
> optimization with tight fork/exit tests w/ forced ioc creation on
> fork, which gotta be the most pathological test case for the code
> path.  So, let's remove the ugly optimization.  If I missed sth, we
> can resurrect the simpler optimization later.  Jens, this is on top of
> linus#master without Shaohua's patch.

OK, then I'm fine with cleaning it up. Applied, thanks Tejun.

-- 
Jens Axboe



* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-07  6:49                 ` Jens Axboe
@ 2012-02-07 16:22                   ` Tejun Heo
  2012-02-07 16:28                     ` Jens Axboe
  2012-02-07 23:00                   ` [PATCH] block: fix lockdep warning on io_context release put_io_context() Tejun Heo
  1 sibling, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-07 16:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Vivek Goyal, Shaohua Li, lkml, Knut Petersen, mroos

Hello, Jens.

On Tue, Feb 07, 2012 at 07:49:19AM +0100, Jens Axboe wrote:
> > I couldn't find any statistically meaningful advantage of the
> > optimization with tight fork/exit tests w/ forced ioc creation on
> > fork, which gotta be the most pathological test case for the code
> > path.  So, let's remove the ugly optimization.  If I missed sth, we
> > can resurrect the simpler optimization later.  Jens, this is on top of
> > linus#master without Shaohua's patch.
> 
> OK, then I'm fine with cleaning it up. Applied, thanks Tejun.

Hmmm... how about merging Shaohua's smaller fix first until we figure
out what's going on with the performance regression he's seeing?

Thanks.

-- 
tejun


* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-07 16:22                   ` Tejun Heo
@ 2012-02-07 16:28                     ` Jens Axboe
  2012-02-07 16:33                       ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Jens Axboe @ 2012-02-07 16:28 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Vivek Goyal, Shaohua Li, lkml, Knut Petersen, mroos

On 2012-02-07 17:22, Tejun Heo wrote:
> Hello, Jens.
> 
> On Tue, Feb 07, 2012 at 07:49:19AM +0100, Jens Axboe wrote:
>>> I couldn't find any statistically meaningful advantage of the
>>> optimization with tight fork/exit tests w/ forced ioc creation on
>>> fork, which gotta be the most pathological test case for the code
>>> path.  So, let's remove the ugly optimization.  If I missed sth, we
>>> can resurrect the simpler optimization later.  Jens, this is on top of
>>> linus#master without Shaohua's patch.
>>
>> OK, then I'm fine with cleaning it up. Applied, thanks Tejun.
> 
> Hmmm... how about merging Shaohua's smaller fix first until we figure
> out what's going on with the performance regression he's seeing?

That was already merged in my tree. I don't see how it makes much
difference in tracking the regression. You said that removing it made no
difference for the find test case, so I'd be more comfortable getting
rid of the nasty optimization.

I'll send it along with my pending patches tomorrow, so there's still a
full day to change things. We can just shuffle patches before then, not
a problem.

-- 
Jens Axboe



* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-07 16:28                     ` Jens Axboe
@ 2012-02-07 16:33                       ` Linus Torvalds
  2012-02-07 16:47                         ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2012-02-07 16:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, Vivek Goyal, Shaohua Li, lkml, Knut Petersen, mroos

On Tue, Feb 7, 2012 at 8:28 AM, Jens Axboe <axboe@kernel.dk> wrote:
>
> That was already merged in my tree. I don't see how it makes much
> difference in tracking the regression. You said that removing it made no
> difference for the find test case, so I'd be more comfortable getting
> rid of the nasty optimization.

Yeah, please just get rid of the crazy code. Maybe *that* fixes the
regression too, who knows?

For all we know, the "fast case" is what causes extra locking only to
then fail and not even be a fast-path.

I think our default action should always be to simplify and clean up
code, unless you have seriously hard numbers to show that the code
complexity is worth it.

                      Linus


* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-07 16:33                       ` Linus Torvalds
@ 2012-02-07 16:47                         ` Tejun Heo
  2012-02-07 17:17                           ` Tejun Heo
  2012-02-08  0:19                           ` Shaohua Li
  0 siblings, 2 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-07 16:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jens Axboe, Vivek Goyal, Shaohua Li, lkml, Knut Petersen, mroos

Hello,

On Tue, Feb 07, 2012 at 08:33:15AM -0800, Linus Torvalds wrote:
> Yeah, please just get rid of the crazy code. Maybe *that* fixes the
> regression too, who knows?
> 
> For all we know, the "fast case" is what causes extra locking only to
> then fail and not even be a fast-path.

Yeah, I was about to ask Shaohua to test the version w/o optimization.
With a heavily loaded request_queue, trylock failures could be frequent,
which I wasn't testing.

Shaohua, can you please test the version w/o optimization?  Also, can
you please give a bit more details on the setup?  Are there multiple
swap devices?  Is it SSD or rotating disk?

> I think our default action should always be to simplify and clean up
> code, unless you have seriously hard numbers to show that the code
> complexity is worth it.

Sure, it was originally all in put_io_context() and, while moving
things to the wq, it got progressively more complex and I lost my sense
of its complexity from staring at it too long.

Thanks.

-- 
tejun


* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-07 16:47                         ` Tejun Heo
@ 2012-02-07 17:17                           ` Tejun Heo
  2012-02-08  0:19                           ` Shaohua Li
  1 sibling, 0 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-07 17:17 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, Vivek Goyal, lkml, Linus Torvalds, Knut Petersen, mroos

On Tue, Feb 07, 2012 at 08:47:35AM -0800, Tejun Heo wrote:
> Yeah, I was about to ask Shaohua to test the version w/o optimization.
> With a heavily loaded request_queue, trylock failures could be frequent,
> which I wasn't testing.
> 
> Shaohua, can you please test the version w/o optimization?  Also, can
> you please give a bit more details on the setup?  Are there multiple
> swap devices?  Is it SSD or rotating disk?

Can you please also apply the following patch on top of
block/for-linus and see how it behaves?

Thanks.
---
 block/blk-ioc.c |   54 +++++++++++++++++-------------------------------------
 1 file changed, 17 insertions(+), 37 deletions(-)

Index: work/block/blk-ioc.c
===================================================================
--- work.orig/block/blk-ioc.c
+++ work/block/blk-ioc.c
@@ -79,52 +79,32 @@ static void ioc_release_fn(struct work_s
 {
 	struct io_context *ioc = container_of(work, struct io_context,
 					      release_work);
-	struct request_queue *last_q = NULL;
+	unsigned long flags;
 
-	spin_lock_irq(&ioc->lock);
+	/*
+	 * Exiting icq may call into put_io_context() through elevator
+	 * which will trigger lockdep warning.  The ioc's are guaranteed to
+	 * be different, use a different locking subclass here.  Using
+	 * irqsave variant as there's no spin_lock_irq_nested().
+	 */
+	spin_lock_irqsave_nested(&ioc->lock, flags, 1);
 
 	while (!hlist_empty(&ioc->icq_list)) {
 		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
 						struct io_cq, ioc_node);
-		struct request_queue *this_q = icq->q;
+		struct request_queue *q = icq->q;
 
-		if (this_q != last_q) {
-			/*
-			 * Need to switch to @this_q.  Once we release
-			 * @ioc->lock, it can go away along with @cic.
-			 * Hold on to it.
-			 */
-			__blk_get_queue(this_q);
-
-			/*
-			 * blk_put_queue() might sleep thanks to kobject
-			 * idiocy.  Always release both locks, put and
-			 * restart.
-			 */
-			if (last_q) {
-				spin_unlock(last_q->queue_lock);
-				spin_unlock_irq(&ioc->lock);
-				blk_put_queue(last_q);
-			} else {
-				spin_unlock_irq(&ioc->lock);
-			}
-
-			last_q = this_q;
-			spin_lock_irq(this_q->queue_lock);
-			spin_lock(&ioc->lock);
-			continue;
+		if (spin_trylock(q->queue_lock)) {
+			ioc_exit_icq(icq);
+			spin_unlock(q->queue_lock);
+		} else {
+			spin_unlock_irqrestore(&ioc->lock, flags);
+			cpu_relax();
+			spin_lock_irqsave_nested(&ioc->lock, flags, 1);
 		}
-		ioc_exit_icq(icq);
-	}
-
-	if (last_q) {
-		spin_unlock(last_q->queue_lock);
-		spin_unlock_irq(&ioc->lock);
-		blk_put_queue(last_q);
-	} else {
-		spin_unlock_irq(&ioc->lock);
 	}
 
+	spin_unlock_irqrestore(&ioc->lock, flags);
 	kmem_cache_free(iocontext_cachep, ioc);
 }
 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [patch]block: fix ioc locking warning
  2012-02-07  5:22                       ` Shaohua Li
@ 2012-02-07 22:34                         ` Linus Torvalds
  0 siblings, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2012-02-07 22:34 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Tejun Heo, Vivek Goyal, Jens Axboe, lkml, Knut Petersen, mroos

On Mon, Feb 6, 2012 at 9:22 PM, Shaohua Li <shaohua.li@intel.com> wrote:
> On Mon, 2012-02-06 at 16:59 -0800, Tejun Heo wrote:
>> * Test with the this_q != locked_q trylock patch.
> This one slightly reduces the regression. Without it, the regression is
> about 17%; with it, the regression is about 12%. The test result is stable.

What happens when you remove the fast-case entirely?

See the other patch by Tejun with the subject "[PATCH] block: strip
out locking optimization in put_io_context()"

               Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH] block: fix lockdep warning on io_context release put_io_context()
  2012-02-07  6:49                 ` Jens Axboe
  2012-02-07 16:22                   ` Tejun Heo
@ 2012-02-07 23:00                   ` Tejun Heo
  1 sibling, 0 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-07 23:00 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Linus Torvalds, Vivek Goyal, Shaohua Li, lkml, Knut Petersen, mroos

11a3122f6c "block: strip out locking optimization in put_io_context()"
removed the ioc_lock depth lockdep annotation along with the locking
optimization; however, while recursing from put_io_context() is no
longer possible, ioc_release_fn() may still end up putting the last
reference of another ioc through the elevator, which will grab ioc->lock,
triggering a spurious (as the ioc is always a different one) A-A deadlock
warning.

As this can only happen once from ioc_release_fn(), using a non-zero
subclass from ioc_release_fn() is enough.  Use subclass 1.
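
For illustration, here is a minimal sketch of that nesting with
made-up names (ioc_sketch/put_other/release_one are hypothetical;
only the spin_lock_irqsave_nested() usage mirrors the patch below):

#include <linux/spinlock.h>

/* pared-down stand-in for an io_context, for the sketch only */
struct ioc_sketch {
	spinlock_t lock;
};

/* inner put: plain subclass-0 acquisition of a *different* ioc */
static void put_other(struct ioc_sketch *other)
{
	spin_lock(&other->lock);	/* irqs already disabled by caller */
	/* ... drop icqs, maybe free ... */
	spin_unlock(&other->lock);
}

/*
 * Outer release: subclass 1 tells lockdep that the nested subclass-0
 * acquisition above is a different lock of the same class, not a
 * recursive A-A acquisition.  The irqsave variant is used because
 * there is no spin_lock_irq_nested().
 */
static void release_one(struct ioc_sketch *ioc, struct ioc_sketch *other)
{
	unsigned long flags;

	spin_lock_irqsave_nested(&ioc->lock, flags, 1);
	put_other(other);		/* e.g. reached via the elevator */
	spin_unlock_irqrestore(&ioc->lock, flags);
}

Lockdep treats (class, subclass) pairs as distinct, which is why the
nested acquisition no longer looks like an A-A deadlock.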

Signed-off-by: Tejun Heo <tj@kernel.org>
---
Jens, this one should go with the previous patch to avoid triggering
spurious lockdep warning.

Thanks.

 block/blk-ioc.c |   21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

Index: work/block/blk-ioc.c
===================================================================
--- work.orig/block/blk-ioc.c
+++ work/block/blk-ioc.c
@@ -80,8 +80,15 @@ static void ioc_release_fn(struct work_s
 	struct io_context *ioc = container_of(work, struct io_context,
 					      release_work);
 	struct request_queue *last_q = NULL;
+	unsigned long flags;
 
-	spin_lock_irq(&ioc->lock);
+	/*
+	 * Exiting icq may call into put_io_context() through elevator
+	 * which will trigger lockdep warning.  The ioc's are guaranteed to
+	 * be different, use a different locking subclass here.  Use
+	 * irqsave variant as there's no spin_lock_irq_nested().
+	 */
+	spin_lock_irqsave_nested(&ioc->lock, flags, 1);
 
 	while (!hlist_empty(&ioc->icq_list)) {
 		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
@@ -103,15 +110,15 @@ static void ioc_release_fn(struct work_s
 			 */
 			if (last_q) {
 				spin_unlock(last_q->queue_lock);
-				spin_unlock_irq(&ioc->lock);
+				spin_unlock_irqrestore(&ioc->lock, flags);
 				blk_put_queue(last_q);
 			} else {
-				spin_unlock_irq(&ioc->lock);
+				spin_unlock_irqrestore(&ioc->lock, flags);
 			}
 
 			last_q = this_q;
-			spin_lock_irq(this_q->queue_lock);
-			spin_lock(&ioc->lock);
+			spin_lock_irqsave(this_q->queue_lock, flags);
+			spin_lock_nested(&ioc->lock, 1);
 			continue;
 		}
 		ioc_exit_icq(icq);
@@ -119,10 +126,10 @@ static void ioc_release_fn(struct work_s
 
 	if (last_q) {
 		spin_unlock(last_q->queue_lock);
-		spin_unlock_irq(&ioc->lock);
+		spin_unlock_irqrestore(&ioc->lock, flags);
 		blk_put_queue(last_q);
 	} else {
-		spin_unlock_irq(&ioc->lock);
+		spin_unlock_irqrestore(&ioc->lock, flags);
 	}
 
 	kmem_cache_free(iocontext_cachep, ioc);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-07 16:47                         ` Tejun Heo
  2012-02-07 17:17                           ` Tejun Heo
@ 2012-02-08  0:19                           ` Shaohua Li
  2012-02-08  8:29                             ` Shaohua Li
  1 sibling, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-08  0:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

2012/2/8 Tejun Heo <tj@kernel.org>:
> Hello,
>
> On Tue, Feb 07, 2012 at 08:33:15AM -0800, Linus Torvalds wrote:
>> Yeah, please just get rid of the crazy code. Maybe *that* fixes the
>> regression too, who knows?
>>
>> For all we know, the "fast case" is what causes extra locking only to
>> then fail and not even be a fast-path.
>
> Yeah, I was about to ask Shaohua to test the version w/o optimization.
> With heavily loaded request_queue, trylock failure could be frequent,
> which I wasn't testing.
>
> Shaohua, can you please test the version w/o optimization?  Also, can
> you please give a bit more details on the setup?  Are there multiple
> swap devices?  Is it SSD or rotating disk?
The test adds mem=4G on a 2-socket, 16-CPU machine.
Just make several copies of the kernel source in tmpfs (so there is
swap, depending on your memory size) and run a kernel build in the
kernel source in the meantime.
There is only one swap device, which is a rotating disk.

I'll test both patches soon.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-08  0:19                           ` Shaohua Li
@ 2012-02-08  8:29                             ` Shaohua Li
  2012-02-08 16:29                               ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-08  8:29 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

2012/2/8 Shaohua Li <shaohua.li@intel.com>:
> 2012/2/8 Tejun Heo <tj@kernel.org>:
>> Hello,
>>
>> On Tue, Feb 07, 2012 at 08:33:15AM -0800, Linus Torvalds wrote:
>>> Yeah, please just get rid of the crazy code. Maybe *that* fixes the
>>> regression too, who knows?
>>>
>>> For all we know, the "fast case" is what causes extra locking only to
>>> then fail and not even be a fast-path.
>>
>> Yeah, I was about to ask Shaohua to test the version w/o optimization.
>> With heavily loaded request_queue, trylock failure could be frequent,
>> which I wasn't testing.
>>
>> Shaohua, can you please test the version w/o optimization?  Also, can
>> you please give a bit more details on the setup?  Are there multiple
>> swap devices?  Is it SSD or rotating disk?
> The test adds mem=4G on a 2-socket, 16-CPU machine.
> Just make several copies of the kernel source in tmpfs (so there is
> swap, depending on your memory size) and run a kernel build in the
> kernel source in the meantime.
> There is only one swap device, which is a rotating disk.
>
> I'll test both patches soon.
Tried all the options; the regression still exists. Any new ideas?
I'll spend some time on it to see if I can find anything.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-08  8:29                             ` Shaohua Li
@ 2012-02-08 16:29                               ` Tejun Heo
  2012-02-08 16:34                                 ` Linus Torvalds
  2012-02-09  6:22                                 ` Shaohua Li
  0 siblings, 2 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-08 16:29 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Linus Torvalds, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

Hello, Shaohua.

On Wed, Feb 08, 2012 at 04:29:53PM +0800, Shaohua Li wrote:
> > The test adds mem=4G on a 2-socket, 16-CPU machine.
> > Just make several copies of the kernel source in tmpfs (so there is
> > swap, depending on your memory size) and run a kernel build in the
> > kernel source in the meantime.
> > There is only one swap device, which is a rotating disk.
> >
> > I'll test both patches soon.
>
> Tried all the options; the regression still exists. Any new ideas?
> I'll spend some time on it to see if I can find anything.

Can you please try the following one?  Thanks a lot!

---
 block/blk-cgroup.c        |    2 
 block/blk-core.c          |    2 
 block/blk-ioc.c           |   99 +++++++++++++---------------------------------
 block/cfq-iosched.c       |    2 
 fs/ioprio.c               |    2 
 include/linux/iocontext.h |    8 +--
 kernel/fork.c             |    2 
 7 files changed, 37 insertions(+), 80 deletions(-)

Index: work/block/blk-ioc.c
===================================================================
--- work.orig/block/blk-ioc.c
+++ work/block/blk-ioc.c
@@ -8,9 +8,12 @@
 #include <linux/blkdev.h>
 #include <linux/bootmem.h>	/* for max_pfn/max_low_pfn */
 #include <linux/slab.h>
+#include <linux/rwlock.h>
 
 #include "blk.h"
 
+static DEFINE_RWLOCK(ioc_clear_rwlock);
+
 /*
  * For io context allocations
  */
@@ -45,7 +48,7 @@ static void ioc_exit_icq(struct io_cq *i
 	struct request_queue *q = icq->q;
 	struct elevator_type *et = q->elevator->type;
 
-	lockdep_assert_held(&ioc->lock);
+	//lockdep_assert_held(&ioc->lock);
 	lockdep_assert_held(q->queue_lock);
 
 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
@@ -71,63 +74,6 @@ static void ioc_exit_icq(struct io_cq *i
 	call_rcu(&icq->__rcu_head, icq_free_icq_rcu);
 }
 
-/*
- * Slow path for ioc release in put_io_context().  Performs double-lock
- * dancing to unlink all icq's and then frees ioc.
- */
-static void ioc_release_fn(struct work_struct *work)
-{
-	struct io_context *ioc = container_of(work, struct io_context,
-					      release_work);
-	struct request_queue *last_q = NULL;
-
-	spin_lock_irq(&ioc->lock);
-
-	while (!hlist_empty(&ioc->icq_list)) {
-		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
-						struct io_cq, ioc_node);
-		struct request_queue *this_q = icq->q;
-
-		if (this_q != last_q) {
-			/*
-			 * Need to switch to @this_q.  Once we release
-			 * @ioc->lock, it can go away along with @cic.
-			 * Hold on to it.
-			 */
-			__blk_get_queue(this_q);
-
-			/*
-			 * blk_put_queue() might sleep thanks to kobject
-			 * idiocy.  Always release both locks, put and
-			 * restart.
-			 */
-			if (last_q) {
-				spin_unlock(last_q->queue_lock);
-				spin_unlock_irq(&ioc->lock);
-				blk_put_queue(last_q);
-			} else {
-				spin_unlock_irq(&ioc->lock);
-			}
-
-			last_q = this_q;
-			spin_lock_irq(this_q->queue_lock);
-			spin_lock(&ioc->lock);
-			continue;
-		}
-		ioc_exit_icq(icq);
-	}
-
-	if (last_q) {
-		spin_unlock(last_q->queue_lock);
-		spin_unlock_irq(&ioc->lock);
-		blk_put_queue(last_q);
-	} else {
-		spin_unlock_irq(&ioc->lock);
-	}
-
-	kmem_cache_free(iocontext_cachep, ioc);
-}
-
 /**
  * put_io_context - put a reference of io_context
  * @ioc: io_context to put
@@ -135,7 +81,7 @@ static void ioc_release_fn(struct work_s
  * Decrement reference count of @ioc and release it if the count reaches
  * zero.
  */
-void put_io_context(struct io_context *ioc)
+void put_io_context(struct io_context *ioc, struct request_queue *locked_q)
 {
 	unsigned long flags;
 
@@ -144,16 +90,26 @@ void put_io_context(struct io_context *i
 
 	BUG_ON(atomic_long_read(&ioc->refcount) <= 0);
 
-	/*
-	 * Releasing ioc requires reverse order double locking and we may
-	 * already be holding a queue_lock.  Do it asynchronously from wq.
-	 */
-	if (atomic_long_dec_and_test(&ioc->refcount)) {
-		spin_lock_irqsave(&ioc->lock, flags);
-		if (!hlist_empty(&ioc->icq_list))
-			schedule_work(&ioc->release_work);
-		spin_unlock_irqrestore(&ioc->lock, flags);
+	if (!atomic_long_dec_and_test(&ioc->refcount))
+		return;
+
+	read_lock_irqsave(&ioc_clear_rwlock, flags);
+
+	while (!hlist_empty(&ioc->icq_list)) {
+		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
+						struct io_cq, ioc_node);
+		struct request_queue *q = icq->q;
+
+		if (q != locked_q)
+			spin_lock_nested(q->queue_lock, 1);
+
+		ioc_exit_icq(icq);
+
+		if (q != locked_q)
+			spin_unlock(q->queue_lock);
 	}
+
+	read_unlock_irqrestore(&ioc_clear_rwlock, flags);
 }
 EXPORT_SYMBOL(put_io_context);
 
@@ -168,7 +124,7 @@ void exit_io_context(struct task_struct 
 	task_unlock(task);
 
 	atomic_dec(&ioc->nr_tasks);
-	put_io_context(ioc);
+	put_io_context(ioc, NULL);
 }
 
 /**
@@ -181,6 +137,8 @@ void ioc_clear_queue(struct request_queu
 {
 	lockdep_assert_held(q->queue_lock);
 
+	write_lock(&ioc_clear_rwlock);
+
 	while (!list_empty(&q->icq_list)) {
 		struct io_cq *icq = list_entry(q->icq_list.next,
 					       struct io_cq, q_node);
@@ -190,6 +148,8 @@ void ioc_clear_queue(struct request_queu
 		ioc_exit_icq(icq);
 		spin_unlock(&ioc->lock);
 	}
+
+	write_unlock(&ioc_clear_rwlock);
 }
 
 void create_io_context_slowpath(struct task_struct *task, gfp_t gfp_flags,
@@ -208,7 +168,6 @@ void create_io_context_slowpath(struct t
 	spin_lock_init(&ioc->lock);
 	INIT_RADIX_TREE(&ioc->icq_tree, GFP_ATOMIC | __GFP_HIGH);
 	INIT_HLIST_HEAD(&ioc->icq_list);
-	INIT_WORK(&ioc->release_work, ioc_release_fn);
 
 	/*
 	 * Try to install.  ioc shouldn't be installed if someone else
Index: work/include/linux/iocontext.h
===================================================================
--- work.orig/include/linux/iocontext.h
+++ work/include/linux/iocontext.h
@@ -3,7 +3,6 @@
 
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
-#include <linux/workqueue.h>
 
 enum {
 	ICQ_IOPRIO_CHANGED,
@@ -113,8 +112,6 @@ struct io_context {
 	struct radix_tree_root	icq_tree;
 	struct io_cq __rcu	*icq_hint;
 	struct hlist_head	icq_list;
-
-	struct work_struct release_work;
 };
 
 static inline struct io_context *ioc_task_link(struct io_context *ioc)
@@ -133,7 +130,7 @@ static inline struct io_context *ioc_tas
 
 struct task_struct;
 #ifdef CONFIG_BLOCK
-void put_io_context(struct io_context *ioc);
+void put_io_context(struct io_context *ioc, struct request_queue *locked_q);
 void exit_io_context(struct task_struct *task);
 struct io_context *get_task_io_context(struct task_struct *task,
 				       gfp_t gfp_flags, int node);
@@ -141,7 +138,8 @@ void ioc_ioprio_changed(struct io_contex
 void ioc_cgroup_changed(struct io_context *ioc);
 #else
 struct io_context;
-static inline void put_io_context(struct io_context *ioc) { }
+static inline void put_io_context(struct io_context *ioc,
+				  struct request_queue *locked_q) { }
 static inline void exit_io_context(struct task_struct *task) { }
 #endif
 
Index: work/block/blk-cgroup.c
===================================================================
--- work.orig/block/blk-cgroup.c
+++ work/block/blk-cgroup.c
@@ -1659,7 +1659,7 @@ static void blkiocg_attach(struct cgroup
 		ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
 		if (ioc) {
 			ioc_cgroup_changed(ioc);
-			put_io_context(ioc);
+			put_io_context(ioc, NULL);
 		}
 	}
 }
Index: work/block/blk-core.c
===================================================================
--- work.orig/block/blk-core.c
+++ work/block/blk-core.c
@@ -642,7 +642,7 @@ static inline void blk_free_request(stru
 	if (rq->cmd_flags & REQ_ELVPRIV) {
 		elv_put_request(q, rq);
 		if (rq->elv.icq)
-			put_io_context(rq->elv.icq->ioc);
+			put_io_context(rq->elv.icq->ioc, q);
 	}
 
 	mempool_free(rq, q->rq.rq_pool);
Index: work/block/cfq-iosched.c
===================================================================
--- work.orig/block/cfq-iosched.c
+++ work/block/cfq-iosched.c
@@ -1787,7 +1787,7 @@ __cfq_slice_expired(struct cfq_data *cfq
 		cfqd->active_queue = NULL;
 
 	if (cfqd->active_cic) {
-		put_io_context(cfqd->active_cic->icq.ioc);
+		put_io_context(cfqd->active_cic->icq.ioc, cfqd->queue);
 		cfqd->active_cic = NULL;
 	}
 }
Index: work/fs/ioprio.c
===================================================================
--- work.orig/fs/ioprio.c
+++ work/fs/ioprio.c
@@ -51,7 +51,7 @@ int set_task_ioprio(struct task_struct *
 	ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
 	if (ioc) {
 		ioc_ioprio_changed(ioc, ioprio);
-		put_io_context(ioc);
+		put_io_context(ioc, NULL);
 	}
 
 	return err;
Index: work/kernel/fork.c
===================================================================
--- work.orig/kernel/fork.c
+++ work/kernel/fork.c
@@ -890,7 +890,7 @@ static int copy_io(unsigned long clone_f
 			return -ENOMEM;
 
 		new_ioc->ioprio = ioc->ioprio;
-		put_io_context(new_ioc);
+		put_io_context(new_ioc, NULL);
 	}
 #endif
 	return 0;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-08 16:29                               ` Tejun Heo
@ 2012-02-08 16:34                                 ` Linus Torvalds
  2012-02-08 16:49                                   ` Tejun Heo
  2012-02-09  6:22                                 ` Shaohua Li
  1 sibling, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2012-02-08 16:34 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Shaohua Li, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

On Wed, Feb 8, 2012 at 8:29 AM, Tejun Heo <tj@kernel.org> wrote:
>
> Can you please try the following one?  Thanks a lot!

If you can use it as a rwlock, why can't you do it with RCU?

Usually rwlocks are a bad idea. They tend to be more expensive than
spinlocks, and the extra parallelism is almost never noticeable
(except as "more cacheline bounces") for something that is appropriate
for a non-sleeping lock.

There's a *very* few situations where rwlock is the right thing, but
it really almost always is a horribly bad idea.
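
As a hedged sketch of that difference for a read-mostly lookup (generic
list code with made-up names, not the blk-ioc path): the rwlock read
side still writes the shared lock word on every reader, while the RCU
read side touches no shared state at all.

#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct item {
	int key;
	struct list_head node;
	struct rcu_head rcu;
};

static LIST_HEAD(items);
static DEFINE_RWLOCK(items_rwlock);	/* rwlock variant */
static DEFINE_SPINLOCK(items_wlock);	/* writer lock for the RCU variant */

/* rwlock read side: every reader bounces the rwlock cacheline */
static bool lookup_rw(int key)
{
	struct item *it;
	bool hit = false;

	read_lock(&items_rwlock);
	list_for_each_entry(it, &items, node) {
		if (it->key == key) {
			hit = true;
			break;
		}
	}
	read_unlock(&items_rwlock);
	return hit;
}

/* RCU read side: no stores to shared data on the read path */
static bool lookup_rcu(int key)
{
	struct item *it;
	bool hit = false;

	rcu_read_lock();
	list_for_each_entry_rcu(it, &items, node) {
		if (it->key == key) {
			hit = true;
			break;
		}
	}
	rcu_read_unlock();
	return hit;
}

/* writer: unlink under a spinlock, defer the free past all readers */
static void remove_item(struct item *it)
{
	spin_lock(&items_wlock);
	list_del_rcu(&it->node);
	spin_unlock(&items_wlock);
	kfree_rcu(it, rcu);
}

Writers still serialize against each other, but readers never write
shared state, which is where the cacheline-bounce argument comes from.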

                     Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-08 16:34                                 ` Linus Torvalds
@ 2012-02-08 16:49                                   ` Tejun Heo
  2012-02-08 16:56                                     ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-08 16:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Shaohua Li, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

Hello, Linus.

On Wed, Feb 08, 2012 at 08:34:53AM -0800, Linus Torvalds wrote:
> On Wed, Feb 8, 2012 at 8:29 AM, Tejun Heo <tj@kernel.org> wrote:
> >
> > Can you please try the following one?  Thanks a lot!
> 
> If you can use it as a rwlock, why can't you do it with RCU?

The original locking scheme was using RCU, which was very fragile and
broken in corner cases.  The locking restructuring was aimed at making
things simpler.  While the double locking isn't trivial, it's much
easier to grasp and get right than RCU.  We might have to revive RCU
if the regression can't be tackled otherwise, and it probably is
possible to do it more simply.  Let's see.

> Usually rwlocks are a bad idea. They tend to be more expensive than
> spinlocks, and the extra parallelism is almost never noticeable
> (except as "more cacheline bounces") for something that is appropriate
> for a non-sleeping lock.
>
> There's a *very* few situations where rwlock is the right thing, but
> it really almost always is a horribly bad idea.

I'm still a bit lost on where the regression is coming from and
*suspecting* that queue_lock contention is making the reverse locking
behave much worse than expected, so I mostly wanted to take that out
and see what happens.  An rwlock might increase locking overhead per
try, but it avoids the unlock/lock dancing.  I'll try to reproduce the
regression in a few days and do a better analysis.
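
Schematically, the reverse locking in question is the trylock-and-retry
pattern below - a minimal sketch with made-up lock names, not the
blk-ioc code itself.  The ordering established elsewhere is b before a,
so with a held, b may only be taken opportunistically:

#include <linux/spinlock.h>

static void work_under_both(spinlock_t *a, spinlock_t *b)
{
retry:
	spin_lock(a);
	if (!spin_trylock(b)) {
		/* someone holds b: back off fully to avoid ABBA deadlock */
		spin_unlock(a);
		cpu_relax();
		goto retry;
	}
	/* ... work requiring both a and b ... */
	spin_unlock(b);
	spin_unlock(a);
}

The cost shows up when b is contended: every failed trylock drops and
re-takes a, which is why a loaded queue_lock could make this behave
much worse than straight locking.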

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-08 16:49                                   ` Tejun Heo
@ 2012-02-08 16:56                                     ` Tejun Heo
  2012-02-08 17:23                                       ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-08 16:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Shaohua Li, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

On Wed, Feb 08, 2012 at 08:49:20AM -0800, Tejun Heo wrote:
> I'm still a bit lost on where the regression is coming from and
> *suspecting* that queue_lock contention is making the reverse locking
> behave much worse than expected, so I mostly wanted to take that out
> and see what happens.

IOW, we can achieve about the same thing by adding another lock in
request_queue.  The goal is using an inner lock for ioc clearing so
that queue_lock doesn't have to be grabbed inside ioc lock.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-08 16:56                                     ` Tejun Heo
@ 2012-02-08 17:23                                       ` Tejun Heo
  0 siblings, 0 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-08 17:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Shaohua Li, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

On Wed, Feb 08, 2012 at 08:56:11AM -0800, Tejun Heo wrote:
> On Wed, Feb 08, 2012 at 08:49:20AM -0800, Tejun Heo wrote:
> > I'm still a bit lost on where the regression is coming from and
> > *suspecting* that queue_lock contention is making the reverse locking
> > behave much worse than expected, so I mostly wanted to take that out
> > and see what happens.
> 
> IOW, we can achieve about the same thing by adding another lock in
> request_queue.  The goal is using an inner lock for ioc clearing so
> that queue_lock doesn't have to be grabbed inside ioc lock.

Urgh....... forget about the above message.  My head is still booting.
We can't do this w/ a per-queue lock, as we don't have a way to traverse
the associated queues w/o holding the lock, and we need read locking on
the ioc exit path as we may recurse through the elevator icq exit (it's
not about concurrency).  /me goes to brew coffee.

-- 
tejun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [patch]block: fix ioc locking warning
  2012-02-06  7:50 [patch]block: fix ioc locking warning Shaohua Li
                   ` (2 preceding siblings ...)
  2012-02-06 16:22 ` Tejun Heo
@ 2012-02-08 18:07 ` walt
  3 siblings, 0 replies; 49+ messages in thread
From: walt @ 2012-02-08 18:07 UTC (permalink / raw)
  To: linux-kernel

On 02/05/2012 11:50 PM, Shaohua Li wrote:

> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index 27a06e0..7490b6d 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -204,7 +204,9 @@ void put_io_context(struct io_context *ioc, struct request_queue *locked_q)
>   				spin_unlock(last_q->queue_lock);
>   			last_q = NULL;
>
> -			if (!spin_trylock(this_q->queue_lock))
> +			/* spin_trylock() always successes in UP case */
> +			if (this_q != locked_q &&
> +			    !spin_trylock(this_q->queue_lock))
>   				break;
>   			last_q = this_q;
>   			continue;

I can see from the rest of this thread that your patch fixes a multitude
of different backtraces, including (most importantly :) mine, thanks:

http://marc.info/?l=linux-kernel&m=132804761925018&w=2

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-08 16:29                               ` Tejun Heo
  2012-02-08 16:34                                 ` Linus Torvalds
@ 2012-02-09  6:22                                 ` Shaohua Li
  2012-02-09 17:59                                   ` Tejun Heo
  1 sibling, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-09  6:22 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Linus Torvalds, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

2012/2/9 Tejun Heo <tj@kernel.org>:
> Hello, Shaohua.
>
> On Wed, Feb 08, 2012 at 04:29:53PM +0800, Shaohua Li wrote:
>> > The test adds mem=4G on a 2-socket, 16-CPU machine.
>> > Just make several copies of the kernel source in tmpfs (so there is
>> > swap, depending on your memory size) and run a kernel build in the
>> > kernel source in the meantime.
>> > There is only one swap device, which is a rotating disk.
>> >
>> > I'll test both patches soon.
>>
>> Tried all the options; the regression still exists. Any new ideas?
>> I'll spend some time on it to see if I can find anything.
>
> Can you please try the following one?  Thanks a lot!
doesn't work.
I also double-confirmed that b2efa05265d62 causes the regression.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-09  6:22                                 ` Shaohua Li
@ 2012-02-09 17:59                                   ` Tejun Heo
  2012-02-09 18:07                                     ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-09 17:59 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Linus Torvalds, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

Hello,

On Thu, Feb 09, 2012 at 02:22:32PM +0800, Shaohua Li wrote:
> >> Tried all the options; the regression still exists. Any new ideas?
> >> I'll spend some time on it to see if I can find anything.
> >
> > Can you please try the following one?  Thanks a lot!
> doesn't work.
> I also double confirmed b2efa05265d62 causes the regression

I'll soon send an RCU-based version.  I'm still having trouble
reproducing the regression tho.  I've tried a few different things.

* Heavy thrashing: Disk IO dominates everything and CPUs aren't too
  busy.  While how swap behaves affects completion time, I don't see
  how a CPU locking issue comes into play at all under these
  circumstances.

* Some swap load: Simulated w/ a 1G memory limit and building a defconfig
  kernel in tmpfs.  Swap grows to a couple hundred megabytes but build
  time is still dominated by CPU.  I didn't see any meaningful
  difference before and after the commit - both in terms of wallclock
  and CPU times.

Maybe these two were too extreme to show the problem and I need to
push the memory limit a bit further, but it would be great if you
could give me a bit more detail about your testing.

* How much memory does the machine have?  How is the tmpfs set up and
  filled up?  What's the size of the tmpfs and the output of "free -m"
  right before the test starts?  While the test is running, how occupied
  are the CPUs?  On test completion, what's the output of "free -m"?

* What exactly is the test and what do you measure?  What does "12%
  regression" mean?  Is it wallclock time or CPU time?  If it's CPU
  time, does systime increase dominate the regression?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-09 17:59                                   ` Tejun Heo
@ 2012-02-09 18:07                                     ` Linus Torvalds
  2012-02-09 19:24                                       ` Tejun Heo
  2012-02-10  3:09                                       ` Shaohua Li
  0 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2012-02-09 18:07 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Shaohua Li, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

On Thu, Feb 9, 2012 at 9:59 AM, Tejun Heo <tj@kernel.org> wrote:
>
> * What exactly is the test and what do you measure?  What does "12%
>  regression" mean?  Is it wallclock time or CPU time?  If it's CPU
>  time, does systime increase dominate the regression?

Shaohua, it might be interesting to see a profile of the bad case.

Now, quite often these kinds of things don't show anything at all -
it's just due to cache issues and there's no obvious "we hold spinlock
X for 15 seconds total". But if it's actual lock contention rather
than just "more scheduling of worker threads", it should show up in
the profile quite clearly.

That said, I do think the RCU approach is the right one. The whole
delayed deallocation (and the replacement patch with rwlocks) really
smells like "badly done RCU-like behavior" to me.

                  Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-09 18:07                                     ` Linus Torvalds
@ 2012-02-09 19:24                                       ` Tejun Heo
  2012-02-09 23:48                                         ` Tejun Heo
  2012-02-10  3:09                                       ` Shaohua Li
  1 sibling, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-09 19:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Shaohua Li, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

On Thu, Feb 09, 2012 at 10:07:35AM -0800, Linus Torvalds wrote:
> On Thu, Feb 9, 2012 at 9:59 AM, Tejun Heo <tj@kernel.org> wrote:
> >
> > * What exactly is the test and what do you measure?  What does "12%
> >  regression" mean?  Is it wallclock time or CPU time?  If it's CPU
> >  time, does systime increase dominate the regression?
> 
> Shaohua, it might be interesting to see a profile of the bad case.

Yeap, if CPUs are taking more time to do stuff, it would be helpful to
obtain before and after profiles.

> Now, quite often these kinds of things don't show anything at all -
> it's just due to cache issues and there's no obvious "we hold spinlock
> X for 15 seconds total". But if it's actual lock contention rather
> than just "more scheduling of worker threads", it should show up in
> the profile quite clearly.

The weird thing is that if it were the wq, the rwlock patch should have
removed the regression.  It removed the reverse locking and the wq
deferring.  It does replace ioc locking with a global read lock, but it's
weird that that can show up as a >10% regression (whatever the measure
may be).  Even though kernel compiling is pretty fork/exit
intensive....

> That said, I do think the RCU approach is the right one. The whole
> delayed deallocation (and the replacement patch with rwlocks) really
> smells like "badly done RCU-like behavior" to me.

I'll probably post it in several hours and think it's gonna be pretty
well contained.  I probably avoided RCU too hard in that path because of
the original scary RCU usage.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-09 19:24                                       ` Tejun Heo
@ 2012-02-09 23:48                                         ` Tejun Heo
  2012-02-10  5:14                                           ` Shaohua Li
  0 siblings, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-09 23:48 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos, Linus Torvalds

Hello, Shaohua.

Can you please test the following one?  It's probably the simplest
version w/o RCU and wq deferring.  RCUfying isn't too bad but I'm
still a bit hesitant because RCU coverage needs to be extended to
request_queue via conditional synchronize_rcu() in queue exit path
(can't enforce delayed RCU free on request_queues and unconditional
synchronize_rcu() may cause excessive delay during boot for certain
configurations).  It now can be done in the block core layer proper so
it shouldn't be as bad tho.  If this too flops, I'll get to that.
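
Roughly, the conditional synchronize_rcu() being weighed here would
look like the sketch below (hypothetical: the saw_icqs flag does not
exist; only ioc_clear_queue() is real):

/*
 * Hypothetical queue-exit sketch: only queues that ever hosted icqs
 * pay for a grace period, so mass probe/release of request_queues at
 * boot doesn't serialize on synchronize_rcu().
 */
static void queue_exit_sketch(struct request_queue *q)
{
	spin_lock_irq(q->queue_lock);
	ioc_clear_queue(q);		/* unlink all icqs from @q */
	spin_unlock_irq(q->queue_lock);

	/* wait out RCU readers that may still dereference icq->q */
	if (q->saw_icqs)		/* made-up flag */
		synchronize_rcu();
}

Whether a per-queue flag like this would suffice depends on how icq
lifetimes interact with queue teardown; it's only meant to illustrate
the conditional grace period idea.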

Thanks.
---
 block/blk-cgroup.c        |    2 
 block/blk-core.c          |    2 
 block/blk-ioc.c           |  112 +++++++++++++++++-----------------------------
 block/cfq-iosched.c       |    2 
 fs/ioprio.c               |    2 
 include/linux/blkdev.h    |    3 +
 include/linux/iocontext.h |    8 +--
 kernel/fork.c             |    2 
 8 files changed, 54 insertions(+), 79 deletions(-)

Index: work/block/blk-cgroup.c
===================================================================
--- work.orig/block/blk-cgroup.c
+++ work/block/blk-cgroup.c
@@ -1659,7 +1659,7 @@ static void blkiocg_attach(struct cgroup
 		ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
 		if (ioc) {
 			ioc_cgroup_changed(ioc);
-			put_io_context(ioc);
+			put_io_context(ioc, NULL);
 		}
 	}
 }
Index: work/block/blk-core.c
===================================================================
--- work.orig/block/blk-core.c
+++ work/block/blk-core.c
@@ -642,7 +642,7 @@ static inline void blk_free_request(stru
 	if (rq->cmd_flags & REQ_ELVPRIV) {
 		elv_put_request(q, rq);
 		if (rq->elv.icq)
-			put_io_context(rq->elv.icq->ioc);
+			put_io_context(rq->elv.icq->ioc, q);
 	}
 
 	mempool_free(rq, q->rq.rq_pool);
Index: work/block/blk-ioc.c
===================================================================
--- work.orig/block/blk-ioc.c
+++ work/block/blk-ioc.c
@@ -29,6 +29,21 @@ void get_io_context(struct io_context *i
 }
 EXPORT_SYMBOL(get_io_context);
 
+/*
+ * Releasing ioc may nest into another put_io_context() leading to nested
+ * fast path release.  As the ioc's can't be the same, this is okay but
+ * makes lockdep whine.  Keep track of nesting and use it as subclass.
+ */
+#ifdef CONFIG_LOCKDEP
+#define ioc_release_depth(q)		((q) ? (q)->ioc_release_depth : 0)
+#define ioc_release_depth_inc(q)	(q)->ioc_release_depth++
+#define ioc_release_depth_dec(q)	(q)->ioc_release_depth--
+#else
+#define ioc_release_depth(q)		0
+#define ioc_release_depth_inc(q)	do { } while (0)
+#define ioc_release_depth_dec(q)	do { } while (0)
+#endif
+
 static void icq_free_icq_rcu(struct rcu_head *head)
 {
 	struct io_cq *icq = container_of(head, struct io_cq, __rcu_head);
@@ -71,63 +86,6 @@ static void ioc_exit_icq(struct io_cq *i
 	call_rcu(&icq->__rcu_head, icq_free_icq_rcu);
 }
 
-/*
- * Slow path for ioc release in put_io_context().  Performs double-lock
- * dancing to unlink all icq's and then frees ioc.
- */
-static void ioc_release_fn(struct work_struct *work)
-{
-	struct io_context *ioc = container_of(work, struct io_context,
-					      release_work);
-	struct request_queue *last_q = NULL;
-
-	spin_lock_irq(&ioc->lock);
-
-	while (!hlist_empty(&ioc->icq_list)) {
-		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
-						struct io_cq, ioc_node);
-		struct request_queue *this_q = icq->q;
-
-		if (this_q != last_q) {
-			/*
-			 * Need to switch to @this_q.  Once we release
-			 * @ioc->lock, it can go away along with @cic.
-			 * Hold on to it.
-			 */
-			__blk_get_queue(this_q);
-
-			/*
-			 * blk_put_queue() might sleep thanks to kobject
-			 * idiocy.  Always release both locks, put and
-			 * restart.
-			 */
-			if (last_q) {
-				spin_unlock(last_q->queue_lock);
-				spin_unlock_irq(&ioc->lock);
-				blk_put_queue(last_q);
-			} else {
-				spin_unlock_irq(&ioc->lock);
-			}
-
-			last_q = this_q;
-			spin_lock_irq(this_q->queue_lock);
-			spin_lock(&ioc->lock);
-			continue;
-		}
-		ioc_exit_icq(icq);
-	}
-
-	if (last_q) {
-		spin_unlock(last_q->queue_lock);
-		spin_unlock_irq(&ioc->lock);
-		blk_put_queue(last_q);
-	} else {
-		spin_unlock_irq(&ioc->lock);
-	}
-
-	kmem_cache_free(iocontext_cachep, ioc);
-}
-
 /**
  * put_io_context - put a reference of io_context
  * @ioc: io_context to put
@@ -135,7 +93,7 @@ static void ioc_release_fn(struct work_s
  * Decrement reference count of @ioc and release it if the count reaches
  * zero.
  */
-void put_io_context(struct io_context *ioc)
+void put_io_context(struct io_context *ioc, struct request_queue *locked_q)
 {
 	unsigned long flags;
 
@@ -144,16 +102,33 @@ void put_io_context(struct io_context *i
 
 	BUG_ON(atomic_long_read(&ioc->refcount) <= 0);
 
-	/*
-	 * Releasing ioc requires reverse order double locking and we may
-	 * already be holding a queue_lock.  Do it asynchronously from wq.
-	 */
-	if (atomic_long_dec_and_test(&ioc->refcount)) {
-		spin_lock_irqsave(&ioc->lock, flags);
-		if (!hlist_empty(&ioc->icq_list))
-			schedule_work(&ioc->release_work);
-		spin_unlock_irqrestore(&ioc->lock, flags);
+	if (!atomic_long_dec_and_test(&ioc->refcount))
+		return;
+
+	spin_lock_irqsave_nested(&ioc->lock, flags,
+				 ioc_release_depth(locked_q));
+
+	while (!hlist_empty(&ioc->icq_list)) {
+		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
+						struct io_cq, ioc_node);
+		struct request_queue *q = icq->q;
+
+		if (q == locked_q || spin_trylock(q->queue_lock)) {
+			ioc_release_depth_inc(q);
+			ioc_exit_icq(icq);
+			ioc_release_depth_dec(q);
+
+			if (q != locked_q)
+				spin_unlock(q->queue_lock);
+		} else {
+			spin_unlock_irqrestore(&ioc->lock, flags);
+			cpu_relax();
+			spin_lock_irqsave_nested(&ioc->lock, flags,
+						 ioc_release_depth(locked_q));
+		}
 	}
+
+	spin_unlock_irqrestore(&ioc->lock, flags);
 }
 EXPORT_SYMBOL(put_io_context);
 
@@ -168,7 +143,7 @@ void exit_io_context(struct task_struct 
 	task_unlock(task);
 
 	atomic_dec(&ioc->nr_tasks);
-	put_io_context(ioc);
+	put_io_context(ioc, NULL);
 }
 
 /**
@@ -208,7 +183,6 @@ void create_io_context_slowpath(struct t
 	spin_lock_init(&ioc->lock);
 	INIT_RADIX_TREE(&ioc->icq_tree, GFP_ATOMIC | __GFP_HIGH);
 	INIT_HLIST_HEAD(&ioc->icq_list);
-	INIT_WORK(&ioc->release_work, ioc_release_fn);
 
 	/*
 	 * Try to install.  ioc shouldn't be installed if someone else
Index: work/block/cfq-iosched.c
===================================================================
--- work.orig/block/cfq-iosched.c
+++ work/block/cfq-iosched.c
@@ -1787,7 +1787,7 @@ __cfq_slice_expired(struct cfq_data *cfq
 		cfqd->active_queue = NULL;
 
 	if (cfqd->active_cic) {
-		put_io_context(cfqd->active_cic->icq.ioc);
+		put_io_context(cfqd->active_cic->icq.ioc, cfqd->queue);
 		cfqd->active_cic = NULL;
 	}
 }
Index: work/fs/ioprio.c
===================================================================
--- work.orig/fs/ioprio.c
+++ work/fs/ioprio.c
@@ -51,7 +51,7 @@ int set_task_ioprio(struct task_struct *
 	ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
 	if (ioc) {
 		ioc_ioprio_changed(ioc, ioprio);
-		put_io_context(ioc);
+		put_io_context(ioc, NULL);
 	}
 
 	return err;
Index: work/include/linux/iocontext.h
===================================================================
--- work.orig/include/linux/iocontext.h
+++ work/include/linux/iocontext.h
@@ -3,7 +3,6 @@
 
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
-#include <linux/workqueue.h>
 
 enum {
 	ICQ_IOPRIO_CHANGED,
@@ -113,8 +112,6 @@ struct io_context {
 	struct radix_tree_root	icq_tree;
 	struct io_cq __rcu	*icq_hint;
 	struct hlist_head	icq_list;
-
-	struct work_struct release_work;
 };
 
 static inline struct io_context *ioc_task_link(struct io_context *ioc)
@@ -133,7 +130,7 @@ static inline struct io_context *ioc_tas
 
 struct task_struct;
 #ifdef CONFIG_BLOCK
-void put_io_context(struct io_context *ioc);
+void put_io_context(struct io_context *ioc, struct request_queue *locked_q);
 void exit_io_context(struct task_struct *task);
 struct io_context *get_task_io_context(struct task_struct *task,
 				       gfp_t gfp_flags, int node);
@@ -141,7 +138,8 @@ void ioc_ioprio_changed(struct io_contex
 void ioc_cgroup_changed(struct io_context *ioc);
 #else
 struct io_context;
-static inline void put_io_context(struct io_context *ioc) { }
+static inline void put_io_context(struct io_context *ioc,
+				  struct request_queue *locked_q) { }
 static inline void exit_io_context(struct task_struct *task) { }
 #endif
 
Index: work/kernel/fork.c
===================================================================
--- work.orig/kernel/fork.c
+++ work/kernel/fork.c
@@ -890,7 +890,7 @@ static int copy_io(unsigned long clone_f
 			return -ENOMEM;
 
 		new_ioc->ioprio = ioc->ioprio;
-		put_io_context(new_ioc);
+		put_io_context(new_ioc, NULL);
 	}
 #endif
 	return 0;
Index: work/include/linux/blkdev.h
===================================================================
--- work.orig/include/linux/blkdev.h
+++ work/include/linux/blkdev.h
@@ -399,6 +399,9 @@ struct request_queue {
 	/* Throttle data */
 	struct throtl_data *td;
 #endif
+#ifdef CONFIG_LOCKDEP
+	int			ioc_release_depth;
+#endif
 };
 
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-09 18:07                                     ` Linus Torvalds
  2012-02-09 19:24                                       ` Tejun Heo
@ 2012-02-10  3:09                                       ` Shaohua Li
  1 sibling, 0 replies; 49+ messages in thread
From: Shaohua Li @ 2012-02-10  3:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Tejun Heo, Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos

2012/2/10 Linus Torvalds <torvalds@linux-foundation.org>:
> On Thu, Feb 9, 2012 at 9:59 AM, Tejun Heo <tj@kernel.org> wrote:
>>
>> * What exactly is the test and what do you measure?  What does "12%
>>  regression" mean?  Is it wallclock time or CPU time?  If it's CPU
>>  time, does systime increase dominate the regression?
>
> Shaohua, it might be interesting to see a profile of the bad case.
>
> Now, quite often these kinds of things don't show anything at all -
> it's just due to cache issues and there's no obvious "we hold spinlock
> X for 15 seconds total". But if it's actual lock contention rather
> than just "more scheduling of worker threads", it should show up in
> the profile quite clearly.
>
> That said, I do think the RCU approach is the right one. The whole
> delayed deallocation (and the replacement patch with rwlocks) really
> smells like "badly done RCU-like behavior" to me.
It appears not to be a lock contention issue. The system is quite idle,
about 20% busy, and top shows no CPU is very busy. Before the test
runs, the system has about 2G of free memory (from vmstat).

System and user time aren't changed; only real time becomes longer.
This suggests IO is slower or there is more IO. But the vmstat and
iostat data don't show a significant difference between the good and
bad cases. There might be some access pattern change which makes swap
less efficient, or the working set is wrongly swapped out.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-09 23:48                                         ` Tejun Heo
@ 2012-02-10  5:14                                           ` Shaohua Li
  2012-02-10  8:48                                             ` Shaohua Li
  0 siblings, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-10  5:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos, Linus Torvalds

2012/2/10 Tejun Heo <tj@kernel.org>:
> Hello, Shaohua.
>
> Can you please test the following one?  It's probably the simplest
> version w/o RCU and wq deferring.  RCUfying isn't too bad but I'm
> still a bit hesitant because RCU coverage needs to be extended to
> request_queue via conditional synchronize_rcu() in queue exit path
> (can't enforce delayed RCU free on request_queues and unconditional
> synchronize_rcu() may cause excessive delay during boot for certain
> configurations).  It now can be done in the block core layer proper so
> it shouldn't be as bad tho.  If this too flops, I'll get to that.
doesn't work.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-10  5:14                                           ` Shaohua Li
@ 2012-02-10  8:48                                             ` Shaohua Li
  2012-02-11  2:17                                               ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-10  8:48 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos, Linus Torvalds

2012/2/10 Shaohua Li <shli@kernel.org>:
> 2012/2/10 Tejun Heo <tj@kernel.org>:
>> Hello, Shaohua.
>>
>> Can you please test the following one?  It's probably the simplest
>> version w/o RCU and wq deferring.  RCUfying isn't too bad but I'm
>> still a bit hesitant because RCU coverage needs to be extended to
>> request_queue via conditional synchronize_rcu() in queue exit path
>> (can't enforce delayed RCU free on request_queues and unconditional
>> synchronize_rcu() may cause excessive delay during boot for certain
>> configurations).  It now can be done in the block core layer proper so
>> it shouldn't be as bad tho.  If this too flops, I'll get to that.
> doesn't work.
I added a trace in the schedule_work code path of put_io_context, which
runs very rarely. So it's not lock contention for sure.
It sounds like the only difference between the good/bad cases is that
the good case runs with rcu_read_lock/rcu_read_unlock. I also checked
the slab info; the cfq-related slabs don't use too much memory, so it's
unlikely that RCU latency is using too much memory.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-10  8:48                                             ` Shaohua Li
@ 2012-02-11  2:17                                               ` Tejun Heo
  2012-02-11 11:35                                                 ` Jens Axboe
  2012-02-13  1:34                                                 ` Shaohua Li
  0 siblings, 2 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-11  2:17 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos, Linus Torvalds

Hello,

On Fri, Feb 10, 2012 at 04:48:49PM +0800, Shaohua Li wrote:
> >> Can you please test the following one?  It's probably the simplest
> >> version w/o RCU and wq deferring.  RCUfying isn't too bad but I'm
> >> still a bit hesitant because RCU coverage needs to be extended to
> >> request_queue via conditional synchronize_rcu() in queue exit path
> >> (can't enforce delayed RCU free on request_queues and unconditional
> >> synchronize_rcu() may cause excessive delay during boot for certain
> >> configurations).  It now can be done in the block core layer proper so
> >> it shouldn't be as bad tho.  If this too flops, I'll get to that.
> > doesn't work.
> I added a trace in the schedule_work code path of put_io_context, which
> runs very rarely. So it's not lock contention for sure.
> It sounds like the only difference between the good/bad cases is that
> the good case runs with rcu_read_lock/rcu_read_unlock. I also checked
> the slab info; the cfq-related slabs don't use too much memory, so it's
> unlikely that RCU latency is using too much memory.

Yeah, that makes much more sense.  It just isn't a hot enough path for
this sort of micro locking change to matter.  I think the problem is
that, after the change, the cfqqs aren't being expired immediately on
task exit.  I.e., while moving the cic destruction to the release path,
I accidentally removed the exit notification to cfq.  I'll come up with
a fix.
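
For reference, the notification in question is cfq's icq exit hook;
loosely paraphrased from cfq-iosched.c of this period (treat details
as approximate), it boils down to:

static void cfq_exit_icq(struct io_cq *icq)
{
	struct cfq_io_cq *cic = icq_to_cic(icq);
	struct cfq_data *cfqd = cic_to_cfqd(cic);

	/*
	 * Expire the exiting task's queues now instead of idling on a
	 * task that will never issue another request - this is what is
	 * lost when the hook stops being called at task exit.
	 */
	if (cic->cfqq[BLK_RW_ASYNC]) {
		cfq_exit_cfqq(cfqd, cic->cfqq[BLK_RW_ASYNC]);
		cic->cfqq[BLK_RW_ASYNC] = NULL;
	}
	if (cic->cfqq[BLK_RW_SYNC]) {
		cfq_exit_cfqq(cfqd, cic->cfqq[BLK_RW_SYNC]);
		cic->cfqq[BLK_RW_SYNC] = NULL;
	}
}

Without this running at task exit, cfq can keep idling on an exiting
task's cfqq until the last ioc reference is dropped, which would match
the longer real time with unchanged CPU time seen above.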

Thank you!

-- 
tejun

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-11  2:17                                               ` Tejun Heo
@ 2012-02-11 11:35                                                 ` Jens Axboe
  2012-02-13  1:34                                                 ` Shaohua Li
  1 sibling, 0 replies; 49+ messages in thread
From: Jens Axboe @ 2012-02-11 11:35 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Shaohua Li, Vivek Goyal, lkml, Knut Petersen, mroos, Linus Torvalds

On 2012-02-11 03:17, Tejun Heo wrote:
> Hello,
> 
> On Fri, Feb 10, 2012 at 04:48:49PM +0800, Shaohua Li wrote:
>>>> Can you please test the following one?  It's probably the simplest
>>>> version w/o RCU and wq deferring.  RCUfying isn't too bad but I'm
>>>> still a bit hesitant because RCU coverage needs to be extended to
>>>> request_queue via conditional synchronize_rcu() in queue exit path
>>>> (can't enforce delayed RCU free on request_queues and unconditional
>>>> synchronize_rcu() may cause excessive delay during boot for certain
>>>> configurations).  It now can be done in the block core layer proper so
>>>> it shouldn't be as bad tho.  If this too flops, I'll get to that.
>>> doesn't work.
>> I added a trace in the schedule_work code path of put_io_context, which
>> runs very rarely. So it's not lock contention for sure.
>> It sounds like the only difference between the good/bad cases is that
>> the good case runs with rcu_read_lock/rcu_read_unlock. I also checked
>> the slab info; the cfq-related slabs don't use too much memory, so it's
>> unlikely that RCU latency is using too much memory.
> 
> Yeah, that makes much more sense.  It just isn't a hot enough path for
> this sort of micro locking change to matter.  I think the problem is
> that, after the change, the cfqqs aren't being expired immediately on
> task exit.  I.e., while moving the cic destruction to the release path,
> I accidentally removed the exit notification to cfq.  I'll come up with
> a fix.

I was just thinking about that last night; the missing slice expiry on
task exit makes a LOT more sense than the locking changes.

I'm pushing what I have off to Linus today, since I'll be gone skiing
next week. I will check email regularly and be able to apply patches and
so forth; just a heads-up on availability.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-11  2:17                                               ` Tejun Heo
  2012-02-11 11:35                                                 ` Jens Axboe
@ 2012-02-13  1:34                                                 ` Shaohua Li
  2012-02-13 20:49                                                   ` Tejun Heo
  1 sibling, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-13  1:34 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos, Linus Torvalds

2012/2/11 Tejun Heo <tj@kernel.org>:
> Hello,
>
> On Fri, Feb 10, 2012 at 04:48:49PM +0800, Shaohua Li wrote:
>> >> Can you please test the following one?  It's probably the simplest
>> >> version w/o RCU and wq deferring.  RCUfying isn't too bad but I'm
>> >> still a bit hesitant because RCU coverage needs to be extended to
>> >> request_queue via conditional synchronize_rcu() in queue exit path
>> >> (can't enforce delayed RCU free on request_queues and unconditional
>> >> synchronize_rcu() may cause excessive delay during boot for certain
>> >> configurations).  It now can be done in the block core layer proper so
>> >> it shouldn't be as bad tho.  If this too flops, I'll get to that.
>> > doesn't work.
>> I added a trace in the schedule_work code path of put_io_context, which
>> runs very rarely. So it's not lock contention for sure.
>> It sounds like the only difference between the good/bad cases is that
>> the good case runs with rcu_read_lock/rcu_read_unlock. I also checked
>> the slab info; the cfq-related slabs don't use too much memory, so it's
>> unlikely that RCU latency is using too much memory.
>
> Yeah, that makes much more sense.  It just isn't a hot enough path for
> this sort of micro locking change to matter.  I think the problem is
> that, after the change, the cfqqs aren't being expired immediately on
> task exit.  I.e., while moving the cic destruction to the release path,
> I accidentally removed the exit notification to cfq.  I'll come up with
> a fix.
Ah, I felt something was a little strange looking at exit_io_context,
but didn't realize the exit notification was removed (I was confused by
put_io_context). This makes a lot of sense. Good catch!

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-13  1:34                                                 ` Shaohua Li
@ 2012-02-13 20:49                                                   ` Tejun Heo
  2012-02-14  2:36                                                     ` Shaohua Li
  0 siblings, 1 reply; 49+ messages in thread
From: Tejun Heo @ 2012-02-13 20:49 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos, Linus Torvalds

Hello, Shaohua.

Can you please test the following patch?  It's a combination of three
patches which invokes the elevator icq exit from exit_io_context().  This
unfortunately ends up adding another reverse locking loop, and using
RCU could be better; however, the change isn't trivial due to the
q->queue_lock modification during blk_cleanup_queue() and ioc cleanup
being called after that from blk_release_queue() - IOW, while holding
RCU, we might end up grabbing the wrong q lock (I don't think this is
a new problem).

Now that we have proper request draining on queue exit, we can
probably move ioc clearing and other operations to blk_cleanup_queue()
and then apply RCU, but that's for another merge window.

Thanks.

Index: work/block/blk-ioc.c
===================================================================
--- work.orig/block/blk-ioc.c
+++ work/block/blk-ioc.c
@@ -29,6 +29,21 @@ void get_io_context(struct io_context *i
 }
 EXPORT_SYMBOL(get_io_context);
 
+/*
+ * Exiting ioc may nest into another put_io_context() leading to nested
+ * fast path release.  As the ioc's can't be the same, this is okay but
+ * makes lockdep whine.  Keep track of nesting and use it as subclass.
+ */
+#ifdef CONFIG_LOCKDEP
+#define ioc_exit_depth(q)		((q) ? (q)->ioc_exit_depth : 0)
+#define ioc_exit_depth_inc(q)		(q)->ioc_exit_depth++
+#define ioc_exit_depth_dec(q)		(q)->ioc_exit_depth--
+#else
+#define ioc_exit_depth(q)		0
+#define ioc_exit_depth_inc(q)		do { } while (0)
+#define ioc_exit_depth_dec(q)		do { } while (0)
+#endif
+
 static void icq_free_icq_rcu(struct rcu_head *head)
 {
 	struct io_cq *icq = container_of(head, struct io_cq, __rcu_head);
@@ -36,17 +51,31 @@ static void icq_free_icq_rcu(struct rcu_
 	kmem_cache_free(icq->__rcu_icq_cache, icq);
 }
 
-/*
- * Exit and free an icq.  Called with both ioc and q locked.
- */
+/* Exit an icq. Called with both ioc and q locked. */
 static void ioc_exit_icq(struct io_cq *icq)
 {
+	struct elevator_type *et = icq->q->elevator->type;
+
+	if (icq->flags & ICQ_EXITED)
+		return;
+
+	if (et->ops.elevator_exit_icq_fn) {
+		ioc_exit_depth_inc(icq->q);
+		et->ops.elevator_exit_icq_fn(icq);
+		ioc_exit_depth_dec(icq->q);
+	}
+
+	icq->flags |= ICQ_EXITED;
+}
+
+/* Release an icq.  Called with both ioc and q locked. */
+static void ioc_destroy_icq(struct io_cq *icq)
+{
 	struct io_context *ioc = icq->ioc;
-	struct request_queue *q = icq->q;
-	struct elevator_type *et = q->elevator->type;
+	struct elevator_type *et = icq->q->elevator->type;
 
 	lockdep_assert_held(&ioc->lock);
-	lockdep_assert_held(q->queue_lock);
+	lockdep_assert_held(icq->q->queue_lock);
 
 	radix_tree_delete(&ioc->icq_tree, icq->q->id);
 	hlist_del_init(&icq->ioc_node);
@@ -60,8 +89,7 @@ static void ioc_exit_icq(struct io_cq *i
 	if (rcu_dereference_raw(ioc->icq_hint) == icq)
 		rcu_assign_pointer(ioc->icq_hint, NULL);
 
-	if (et->ops.elevator_exit_icq_fn)
-		et->ops.elevator_exit_icq_fn(icq);
+	ioc_exit_icq(icq);
 
 	/*
 	 * @icq->q might have gone away by the time RCU callback runs
@@ -71,70 +99,6 @@ static void ioc_exit_icq(struct io_cq *i
 	call_rcu(&icq->__rcu_head, icq_free_icq_rcu);
 }
 
-/*
- * Slow path for ioc release in put_io_context().  Performs double-lock
- * dancing to unlink all icq's and then frees ioc.
- */
-static void ioc_release_fn(struct work_struct *work)
-{
-	struct io_context *ioc = container_of(work, struct io_context,
-					      release_work);
-	struct request_queue *last_q = NULL;
-	unsigned long flags;
-
-	/*
-	 * Exiting icq may call into put_io_context() through elevator
-	 * which will trigger lockdep warning.  The ioc's are guaranteed to
-	 * be different, use a different locking subclass here.  Use
-	 * irqsave variant as there's no spin_lock_irq_nested().
-	 */
-	spin_lock_irqsave_nested(&ioc->lock, flags, 1);
-
-	while (!hlist_empty(&ioc->icq_list)) {
-		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
-						struct io_cq, ioc_node);
-		struct request_queue *this_q = icq->q;
-
-		if (this_q != last_q) {
-			/*
-			 * Need to switch to @this_q.  Once we release
-			 * @ioc->lock, it can go away along with @cic.
-			 * Hold on to it.
-			 */
-			__blk_get_queue(this_q);
-
-			/*
-			 * blk_put_queue() might sleep thanks to kobject
-			 * idiocy.  Always release both locks, put and
-			 * restart.
-			 */
-			if (last_q) {
-				spin_unlock(last_q->queue_lock);
-				spin_unlock_irqrestore(&ioc->lock, flags);
-				blk_put_queue(last_q);
-			} else {
-				spin_unlock_irqrestore(&ioc->lock, flags);
-			}
-
-			last_q = this_q;
-			spin_lock_irqsave(this_q->queue_lock, flags);
-			spin_lock_nested(&ioc->lock, 1);
-			continue;
-		}
-		ioc_exit_icq(icq);
-	}
-
-	if (last_q) {
-		spin_unlock(last_q->queue_lock);
-		spin_unlock_irqrestore(&ioc->lock, flags);
-		blk_put_queue(last_q);
-	} else {
-		spin_unlock_irqrestore(&ioc->lock, flags);
-	}
-
-	kmem_cache_free(iocontext_cachep, ioc);
-}
-
 /**
  * put_io_context - put a reference of io_context
  * @ioc: io_context to put
@@ -142,7 +106,7 @@ static void ioc_release_fn(struct work_s
  * Decrement reference count of @ioc and release it if the count reaches
  * zero.
  */
-void put_io_context(struct io_context *ioc)
+void put_io_context(struct io_context *ioc, struct request_queue *locked_q)
 {
 	unsigned long flags;
 
@@ -151,16 +115,33 @@ void put_io_context(struct io_context *i
 
 	BUG_ON(atomic_long_read(&ioc->refcount) <= 0);
 
+	if (!atomic_long_dec_and_test(&ioc->refcount))
+		return;
+
 	/*
-	 * Releasing ioc requires reverse order double locking and we may
-	 * already be holding a queue_lock.  Do it asynchronously from wq.
+	 * Need to grab both q and ioc locks from ioc side to release all
+	 * icqs.  Perform reverse double locking.
 	 */
-	if (atomic_long_dec_and_test(&ioc->refcount)) {
-		spin_lock_irqsave(&ioc->lock, flags);
-		if (!hlist_empty(&ioc->icq_list))
-			schedule_work(&ioc->release_work);
-		spin_unlock_irqrestore(&ioc->lock, flags);
+	spin_lock_irqsave_nested(&ioc->lock, flags, ioc_exit_depth(locked_q));
+
+	while (!hlist_empty(&ioc->icq_list)) {
+		struct io_cq *icq = hlist_entry(ioc->icq_list.first,
+						struct io_cq, ioc_node);
+		struct request_queue *q = icq->q;
+
+		if (q == locked_q || spin_trylock(q->queue_lock)) {
+			ioc_destroy_icq(icq);
+			if (q != locked_q)
+				spin_unlock(q->queue_lock);
+		} else {
+			spin_unlock_irqrestore(&ioc->lock, flags);
+			cpu_relax();
+			spin_lock_irqsave_nested(&ioc->lock, flags,
+						 ioc_exit_depth(locked_q));
+		}
 	}
+
+	spin_unlock_irqrestore(&ioc->lock, flags);
 }
 EXPORT_SYMBOL(put_io_context);
 
@@ -168,14 +149,40 @@ EXPORT_SYMBOL(put_io_context);
 void exit_io_context(struct task_struct *task)
 {
 	struct io_context *ioc;
+	struct io_cq *icq;
+	struct hlist_node *n;
 
 	task_lock(task);
 	ioc = task->io_context;
 	task->io_context = NULL;
 	task_unlock(task);
 
-	atomic_dec(&ioc->nr_tasks);
-	put_io_context(ioc);
+	if (!atomic_dec_and_test(&ioc->nr_tasks)) {
+		put_io_context(ioc, NULL);
+		return;
+	}
+
+	/*
+	 * Need ioc lock to walk icq_list and q lock to exit icq.  Perform
+	 * reverse double locking.
+	 */
+retry:
+	spin_lock_irq(&ioc->lock);
+	hlist_for_each_entry(icq, n, &ioc->icq_list, ioc_node) {
+		if (icq->flags & ICQ_EXITED)
+			continue;
+		if (spin_trylock(icq->q->queue_lock)) {
+			ioc_exit_icq(icq);
+			spin_unlock(icq->q->queue_lock);
+		} else {
+			spin_unlock_irq(&ioc->lock);
+			cpu_relax();
+			goto retry;
+		}
+	}
+	spin_unlock_irq(&ioc->lock);
+
+	put_io_context(ioc, NULL);
 }
 
 /**
@@ -194,7 +201,7 @@ void ioc_clear_queue(struct request_queu
 		struct io_context *ioc = icq->ioc;
 
 		spin_lock(&ioc->lock);
-		ioc_exit_icq(icq);
+		ioc_destroy_icq(icq);
 		spin_unlock(&ioc->lock);
 	}
 }
@@ -215,7 +222,6 @@ void create_io_context_slowpath(struct t
 	spin_lock_init(&ioc->lock);
 	INIT_RADIX_TREE(&ioc->icq_tree, GFP_ATOMIC | __GFP_HIGH);
 	INIT_HLIST_HEAD(&ioc->icq_list);
-	INIT_WORK(&ioc->release_work, ioc_release_fn);
 
 	/*
 	 * Try to install.  ioc shouldn't be installed if someone else
@@ -363,13 +369,13 @@ struct io_cq *ioc_create_icq(struct requ
 	return icq;
 }
 
-void ioc_set_changed(struct io_context *ioc, int which)
+void ioc_set_icq_flags(struct io_context *ioc, unsigned int flags)
 {
 	struct io_cq *icq;
 	struct hlist_node *n;
 
 	hlist_for_each_entry(icq, n, &ioc->icq_list, ioc_node)
-		set_bit(which, &icq->changed);
+		icq->flags |= flags;
 }
 
 /**
@@ -387,7 +393,7 @@ void ioc_ioprio_changed(struct io_contex
 
 	spin_lock_irqsave(&ioc->lock, flags);
 	ioc->ioprio = ioprio;
-	ioc_set_changed(ioc, ICQ_IOPRIO_CHANGED);
+	ioc_set_icq_flags(ioc, ICQ_IOPRIO_CHANGED);
 	spin_unlock_irqrestore(&ioc->lock, flags);
 }
 
@@ -404,11 +410,33 @@ void ioc_cgroup_changed(struct io_contex
 	unsigned long flags;
 
 	spin_lock_irqsave(&ioc->lock, flags);
-	ioc_set_changed(ioc, ICQ_CGROUP_CHANGED);
+	ioc_set_icq_flags(ioc, ICQ_CGROUP_CHANGED);
 	spin_unlock_irqrestore(&ioc->lock, flags);
 }
 EXPORT_SYMBOL(ioc_cgroup_changed);
 
+/**
+ * icq_get_changed - fetch and clear icq changed mask
+ * @icq: icq of interest
+ *
+ * Fetch and clear ICQ_*_CHANGED bits from @icq.  Grabs and releases
+ * @icq->ioc->lock.
+ */
+unsigned icq_get_changed(struct io_cq *icq)
+{
+	unsigned int changed = 0;
+	unsigned long flags;
+
+	if (unlikely(icq->flags & ICQ_CHANGED_MASK)) {
+		spin_lock_irqsave(&icq->ioc->lock, flags);
+		changed = icq->flags & ICQ_CHANGED_MASK;
+		icq->flags &= ~ICQ_CHANGED_MASK;
+		spin_unlock_irqrestore(&icq->ioc->lock, flags);
+	}
+	return changed;
+}
+EXPORT_SYMBOL(icq_get_changed);
+
 static int __init blk_ioc_init(void)
 {
 	iocontext_cachep = kmem_cache_create("blkdev_ioc",
Index: work/block/cfq-iosched.c
===================================================================
--- work.orig/block/cfq-iosched.c
+++ work/block/cfq-iosched.c
@@ -1787,7 +1787,7 @@ __cfq_slice_expired(struct cfq_data *cfq
 		cfqd->active_queue = NULL;
 
 	if (cfqd->active_cic) {
-		put_io_context(cfqd->active_cic->icq.ioc);
+		put_io_context(cfqd->active_cic->icq.ioc, cfqd->queue);
 		cfqd->active_cic = NULL;
 	}
 }
@@ -3470,20 +3470,20 @@ cfq_set_request(struct request_queue *q,
 	const int rw = rq_data_dir(rq);
 	const bool is_sync = rq_is_sync(rq);
 	struct cfq_queue *cfqq;
+	unsigned int changed;
 
 	might_sleep_if(gfp_mask & __GFP_WAIT);
 
 	spin_lock_irq(q->queue_lock);
 
 	/* handle changed notifications */
-	if (unlikely(cic->icq.changed)) {
-		if (test_and_clear_bit(ICQ_IOPRIO_CHANGED, &cic->icq.changed))
-			changed_ioprio(cic);
+	changed = icq_get_changed(&cic->icq);
+	if (unlikely(changed & ICQ_IOPRIO_CHANGED))
+		changed_ioprio(cic);
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
-		if (test_and_clear_bit(ICQ_CGROUP_CHANGED, &cic->icq.changed))
-			changed_cgroup(cic);
+	if (unlikely(changed & ICQ_CGROUP_CHANGED))
+		changed_cgroup(cic);
 #endif
-	}
 
 new_queue:
 	cfqq = cic_to_cfqq(cic, is_sync);
Index: work/include/linux/iocontext.h
===================================================================
--- work.orig/include/linux/iocontext.h
+++ work/include/linux/iocontext.h
@@ -3,11 +3,13 @@
 
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
-#include <linux/workqueue.h>
 
 enum {
-	ICQ_IOPRIO_CHANGED,
-	ICQ_CGROUP_CHANGED,
+	ICQ_IOPRIO_CHANGED	= 1 << 0,
+	ICQ_CGROUP_CHANGED	= 1 << 1,
+	ICQ_EXITED		= 1 << 2,
+
+	ICQ_CHANGED_MASK	= ICQ_IOPRIO_CHANGED | ICQ_CGROUP_CHANGED,
 };
 
 /*
@@ -88,7 +90,8 @@ struct io_cq {
 		struct rcu_head		__rcu_head;
 	};
 
-	unsigned long		changed;
+	/* ICQ_* flags.  Protected by ioc->lock. */
+	unsigned long		flags;
 };
 
 /*
@@ -113,8 +116,6 @@ struct io_context {
 	struct radix_tree_root	icq_tree;
 	struct io_cq __rcu	*icq_hint;
 	struct hlist_head	icq_list;
-
-	struct work_struct release_work;
 };
 
 static inline struct io_context *ioc_task_link(struct io_context *ioc)
@@ -133,15 +134,17 @@ static inline struct io_context *ioc_tas
 
 struct task_struct;
 #ifdef CONFIG_BLOCK
-void put_io_context(struct io_context *ioc);
+void put_io_context(struct io_context *ioc, struct request_queue *locked_q);
 void exit_io_context(struct task_struct *task);
 struct io_context *get_task_io_context(struct task_struct *task,
 				       gfp_t gfp_flags, int node);
 void ioc_ioprio_changed(struct io_context *ioc, int ioprio);
 void ioc_cgroup_changed(struct io_context *ioc);
+unsigned int icq_get_changed(struct io_cq *icq);
 #else
 struct io_context;
-static inline void put_io_context(struct io_context *ioc) { }
+static inline void put_io_context(struct io_context *ioc,
+				  struct request_queue *locked_q) { }
 static inline void exit_io_context(struct task_struct *task) { }
 #endif
 
Index: work/block/blk-cgroup.c
===================================================================
--- work.orig/block/blk-cgroup.c
+++ work/block/blk-cgroup.c
@@ -1659,7 +1659,7 @@ static void blkiocg_attach(struct cgroup
 		ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
 		if (ioc) {
 			ioc_cgroup_changed(ioc);
-			put_io_context(ioc);
+			put_io_context(ioc, NULL);
 		}
 	}
 }
Index: work/block/blk-core.c
===================================================================
--- work.orig/block/blk-core.c
+++ work/block/blk-core.c
@@ -642,7 +642,7 @@ static inline void blk_free_request(stru
 	if (rq->cmd_flags & REQ_ELVPRIV) {
 		elv_put_request(q, rq);
 		if (rq->elv.icq)
-			put_io_context(rq->elv.icq->ioc);
+			put_io_context(rq->elv.icq->ioc, q);
 	}
 
 	mempool_free(rq, q->rq.rq_pool);
Index: work/fs/ioprio.c
===================================================================
--- work.orig/fs/ioprio.c
+++ work/fs/ioprio.c
@@ -51,7 +51,7 @@ int set_task_ioprio(struct task_struct *
 	ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE);
 	if (ioc) {
 		ioc_ioprio_changed(ioc, ioprio);
-		put_io_context(ioc);
+		put_io_context(ioc, NULL);
 	}
 
 	return err;
Index: work/kernel/fork.c
===================================================================
--- work.orig/kernel/fork.c
+++ work/kernel/fork.c
@@ -910,7 +910,7 @@ static int copy_io(unsigned long clone_f
 			return -ENOMEM;
 
 		new_ioc->ioprio = ioc->ioprio;
-		put_io_context(new_ioc);
+		put_io_context(new_ioc, NULL);
 	}
 #endif
 	return 0;
Index: work/include/linux/blkdev.h
===================================================================
--- work.orig/include/linux/blkdev.h
+++ work/include/linux/blkdev.h
@@ -399,6 +399,9 @@ struct request_queue {
 	/* Throttle data */
 	struct throtl_data *td;
 #endif
+#ifdef CONFIG_LOCKDEP
+	int			ioc_exit_depth;
+#endif
 };
 
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
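
As an aside on the flag handling: the patch replaces the per-bit
test_and_clear_bit() calls in cfq_set_request() with a single
fetch-and-clear of a mask under the ioc lock.  A stand-alone sketch of
that pattern (user-space C, hypothetical names; the unlocked fast-path
check mirrors the kernel code, though strict C would want an atomic
load there):

#include <pthread.h>
#include <stdio.h>

enum {
	F_IOPRIO	= 1 << 0,
	F_CGROUP	= 1 << 1,

	F_CHANGED_MASK	= F_IOPRIO | F_CGROUP,
};

struct icq_like {
	pthread_mutex_t *lock;	/* stands in for icq->ioc->lock */
	unsigned long flags;	/* F_*, written only under @lock */
};

/* Fetch and clear the "changed" bits in one critical section, so a
 * concurrent setter is observed either fully or not at all. */
static unsigned int get_changed(struct icq_like *icq)
{
	unsigned int changed = 0;

	if (icq->flags & F_CHANGED_MASK) {	/* unlocked fast check */
		pthread_mutex_lock(icq->lock);
		changed = icq->flags & F_CHANGED_MASK;
		icq->flags &= ~F_CHANGED_MASK;
		pthread_mutex_unlock(icq->lock);
	}
	return changed;
}

int main(void)
{
	pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
	struct icq_like icq = { &m, F_IOPRIO };

	if (get_changed(&icq) & F_IOPRIO)
		puts("ioprio changed");
	return 0;
}

Grabbing all the changed bits in one locked section avoids taking the
ioc lock once per bit and keeps the ioprio and cgroup notifications
consistent with each other.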

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-13 20:49                                                   ` Tejun Heo
@ 2012-02-14  2:36                                                     ` Shaohua Li
  2012-02-14 16:39                                                       ` Tejun Heo
  0 siblings, 1 reply; 49+ messages in thread
From: Shaohua Li @ 2012-02-14  2:36 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos, Linus Torvalds

2012/2/14 Tejun Heo <tj@kernel.org>:
> Hello, Shaohua.
>
> Can you please test the following patch?  It's a combination of three
> patches which invokes elevator icq exit from exit_io_context().  This
> unfortunately ends up adding another reverse locking loop; using RCU
> could be better, but the change isn't trivial due to the q->queue_lock
> modification during blk_cleanup_queue() and ioc cleanup being called
> after that from blk_release_queue() - IOW, while holding RCU, we might
> end up grabbing the wrong q lock (I don't think this is a new problem).
>
> Now that we have proper request draining on queue exit, we can
> probably move ioc clearing and other operations to blk_cleanup_queue()
> and then apply RCU, but that's for another merge window.
This fixed the regression. Thanks!

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH] block: strip out locking optimization in put_io_context()
  2012-02-14  2:36                                                     ` Shaohua Li
@ 2012-02-14 16:39                                                       ` Tejun Heo
  0 siblings, 0 replies; 49+ messages in thread
From: Tejun Heo @ 2012-02-14 16:39 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Jens Axboe, Vivek Goyal, lkml, Knut Petersen, mroos, Linus Torvalds

Hello,

On Tue, Feb 14, 2012 at 10:36:36AM +0800, Shaohua Li wrote:
> 2012/2/14 Tejun Heo <tj@kernel.org>:
> > Now that we have proper request draining on queue exit, we can
> > probably move ioc clearing and other operations to blk_cleanup_queue()
> > and then apply RCU, but that's for another merge window.
>
> This fixed the regression. Thanks!

Ah... that's good news.  I'll split up the patches and post them.

Thank you very much for the bisection and testing all the different
patches!

-- 
tejun

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread

Thread overview: 49+ messages
2012-02-06  7:50 [patch]block: fix ioc locking warning Shaohua Li
2012-02-06  7:55 ` Jens Axboe
2012-02-06 15:12 ` Vivek Goyal
2012-02-06 16:09   ` Jens Axboe
2012-02-06 16:37     ` Vivek Goyal
2012-02-06 16:44       ` Tejun Heo
2012-02-06 16:58         ` Linus Torvalds
2012-02-06 17:27           ` Tejun Heo
2012-02-06 20:16             ` Jens Axboe
2012-02-06 21:54               ` [PATCH] block: strip out locking optimization in put_io_context() Tejun Heo
2012-02-07  6:49                 ` Jens Axboe
2012-02-07 16:22                   ` Tejun Heo
2012-02-07 16:28                     ` Jens Axboe
2012-02-07 16:33                       ` Linus Torvalds
2012-02-07 16:47                         ` Tejun Heo
2012-02-07 17:17                           ` Tejun Heo
2012-02-08  0:19                           ` Shaohua Li
2012-02-08  8:29                             ` Shaohua Li
2012-02-08 16:29                               ` Tejun Heo
2012-02-08 16:34                                 ` Linus Torvalds
2012-02-08 16:49                                   ` Tejun Heo
2012-02-08 16:56                                     ` Tejun Heo
2012-02-08 17:23                                       ` Tejun Heo
2012-02-09  6:22                                 ` Shaohua Li
2012-02-09 17:59                                   ` Tejun Heo
2012-02-09 18:07                                     ` Linus Torvalds
2012-02-09 19:24                                       ` Tejun Heo
2012-02-09 23:48                                         ` Tejun Heo
2012-02-10  5:14                                           ` Shaohua Li
2012-02-10  8:48                                             ` Shaohua Li
2012-02-11  2:17                                               ` Tejun Heo
2012-02-11 11:35                                                 ` Jens Axboe
2012-02-13  1:34                                                 ` Shaohua Li
2012-02-13 20:49                                                   ` Tejun Heo
2012-02-14  2:36                                                     ` Shaohua Li
2012-02-14 16:39                                                       ` Tejun Heo
2012-02-10  3:09                                       ` Shaohua Li
2012-02-07 23:00                   ` [PATCH] block: fix lockdep warning on io_context release put_io_context() Tejun Heo
2012-02-06 20:36             ` [patch]block: fix ioc locking warning Tejun Heo
2012-02-07  0:31               ` Shaohua Li
2012-02-07  0:39                 ` Tejun Heo
2012-02-07  0:43                   ` Shaohua Li
2012-02-07  0:59                     ` Tejun Heo
2012-02-07  1:10                       ` Shaohua Li
2012-02-07  1:33                         ` Shaohua Li
2012-02-07  5:22                       ` Shaohua Li
2012-02-07 22:34                         ` Linus Torvalds
2012-02-06 16:22 ` Tejun Heo
2012-02-08 18:07 ` walt
