* Warning from swake_up_all in 4.14.15-rt13 non-RT
@ 2018-03-05 15:08 Corey Minyard
  2018-03-06 17:46 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 23+ messages in thread

From: Corey Minyard @ 2018-03-05 15:08 UTC (permalink / raw)
To: linux-rt-users, linux-kernel, Tejun Heo

Starting with the change

  8a64547a07980f9d25e962a78c2e10ee82bdb742 fs/dcache: use swait_queue instead of waitqueue

we are getting the following warning when running with PREEMPT__LL when
inserting a USB drive.  This is on x86_64, 4.14.15-rt13.  It works fine
with PREEMPT_RT.

# [ 155.604042] usb 1-2: new high-speed USB device number 7 using xhci_hcd
[ 155.736588] usb 1-2: New USB device found, idVendor=0781, idProduct=5567
[ 155.743291] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 155.750423] usb 1-2: Product: Cruzer Blade
[ 155.754517] usb 1-2: Manufacturer: SanDisk
[ 155.758616] usb 1-2: SerialNumber: 4C530302900731101541
[ 155.764207] usb-storage 1-2:1.0: USB Mass Storage device detected
[ 155.770457] scsi host7: usb-storage 1-2:1.0
[ 156.831919] scsi 7:0:0:0: Direct-Access SanDisk Cruzer Blade 1.26 PQ: 0 ANSI: 6
[ 156.840160] sd 7:0:0:0: Attached scsi generic sg1 type 0
[ 156.845766] ------------[ cut here ]------------
[ 156.850387] WARNING: CPU: 0 PID: 36 at kernel/sched/swait.c:72 swake_up_all+0xb4/0xc0
[ 156.858208] Modules linked in:
[ 156.861259] CPU: 0 PID: 36 Comm: kworker/0:1 Not tainted 4.14.15-rt13 #1
[ 156.867950] Hardware name: Supermicro Super Server/To be filled by O.E.M., BIOS T20170302175436 03/02/2017
[ 156.877590] Workqueue: events_freezable usb_stor_scan_dwork
[ 156.883159] task: ffff8c7ead6c6a00 task.stack: ffffb19dc19d0000
[ 156.889072] RIP: 0010:swake_up_all+0xb4/0xc0
[ 156.893334] RSP: 0000:ffffb19dc19d3be0 EFLAGS: 00010046
[ 156.898550] RAX: 0000000000000046 RBX: ffff8c7eab451788 RCX: 0000000000000000
[ 156.905673] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8c7eab451770
[ 156.912798] RBP: ffff8c7eab451770 R08: 0000000000023ca0 R09: ffffffff8bb7663e
[ 156.919920] R10: ffffd80dd1a7dfc0 R11: 0000000000000000 R12: ffffb19dc19d3be0
[ 156.927045] R13: 0000000000000003 R14: ffff8c7ea9e0e800 R15: ffff8c7ea69e5000
[ 156.934171] FS: 0000000000000000(0000) GS:ffff8c7ebfc00000(0000) knlGS:0000000000000000
[ 156.942246] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 156.947983] CR2: 00000000fffd5000 CR3: 000000046bb4e000 CR4: 00000000003406f0
[ 156.955108] Call Trace:
[ 156.957556]  percpu_ref_kill_and_confirm+0x93/0xa0
[ 156.962345]  blk_freeze_queue_start+0x25/0x30
[ 156.966696]  blk_set_queue_dying+0x2b/0x90
[ 156.970786]  blk_cleanup_queue+0x28/0x110
[ 156.974793]  __scsi_remove_device+0x66/0x130
[ 156.979063]  scsi_probe_and_add_lun+0x878/0xbd0
[ 156.983587]  ? scsi_probe_and_add_lun+0x9df/0xbd0
[ 156.988285]  __scsi_scan_target+0x1e8/0x550
[ 156.992462]  ? __wake_up_common_lock+0x79/0x90
[ 156.996899]  scsi_scan_channel+0x5b/0x80
[ 157.000815]  scsi_scan_host_selected+0xbe/0xf0
[ 157.005252]  scsi_scan_host+0x15e/0x1a0
[ 157.009083]  usb_stor_scan_dwork+0x1d/0x80
[ 157.013177]  process_one_work+0x1dd/0x3e0
[ 157.017189]  worker_thread+0x26/0x400
[ 157.020844]  ? cancel_delayed_work+0x10/0x10
[ 157.025107]  kthread+0x116/0x130
[ 157.028333]  ? kthread_create_on_node+0x40/0x40
[ 157.032858]  ret_from_fork+0x35/0x40
[ 157.036435] Code: 49 39 c4 74 17 c6 45 00 00 fb 48 8d 7d 00 e8 c4 8a 97 00 48 8b 04 24 49 39 c4 75 b9 c6 45 00 00 fb 48 8d 64 24 10 5b 5d 41 5c c3
[ 157.055292] ---[ end trace 86c20fd8d6c01794 ]---
[ 157.060040] sd 7:0:0:0: [sdb] 15633408 512-byte logical blocks: (8.00 GB/7.45 GiB)
[ 157.070089] sd 7:0:0:0: [sdb] Write Protect is off
[ 157.075183] sd 7:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 157.100295]  sdb: sdb1
[ 157.103778] sd 7:0:0:0: [sdb] Attached SCSI disk
[ 157.379921] FAT-fs (sdb1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
The following change is the obvious reason:

--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -69,6 +69,7 @@ void swake_up_all(struct swait_queue_head *q)
 	struct swait_queue *curr;
 	LIST_HEAD(tmp);
 
+	WARN_ON(irqs_disabled());
 	raw_spin_lock_irq(&q->lock);
 	list_splice_init(&q->task_list, &tmp);
 	while (!list_empty(&tmp)) {

I've done a little bit of analysis here: percpu_ref_kill_and_confirm()
does spin_lock_irqsave() and then does a percpu_ref_put().  If the
refcount reaches zero, the release function of the refcount is called.
In this case, the block code has set this to
blk_queue_usage_counter_release(), which calls swake_up_all().

It seems like a bad idea to call percpu_ref_put() with interrupts
disabled.  This problem doesn't actually appear to be RT-related;
there's just no warning if the RT tree isn't used.

I'm not sure if it's best to just do the put outside the lock, or to
have a modified put function that returns a bool to indicate whether a
release is required, so the release function can be called outside the
lock.  I can do patches and test, but I'm hoping for a little guidance
here.

I'm also wondering why we don't have a warning like this in the
*_spin_lock_irq() macros, perhaps turned on with a debug option.  That
would catch things like this sooner.

Thanks,

-corey

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT
  2018-03-05 15:08 Warning from swake_up_all in 4.14.15-rt13 non-RT Corey Minyard
@ 2018-03-06 17:46 ` Sebastian Andrzej Siewior
  2018-03-06 22:51   ` Corey Minyard
  2018-03-07 15:45   ` Corey Minyard
  0 siblings, 2 replies; 23+ messages in thread

From: Sebastian Andrzej Siewior @ 2018-03-06 17:46 UTC (permalink / raw)
To: Corey Minyard
Cc: linux-rt-users, linux-kernel, Tejun Heo, Peter Zijlstra, Thomas Gleixner

On 2018-03-05 09:08:11 [-0600], Corey Minyard wrote:
> Starting with the change
>
> 8a64547a07980f9d25e962a78c2e10ee82bdb742 fs/dcache: use swait_queue instead
> of waitqueue
…
> The following change is the obvious reason:
>
> --- a/kernel/sched/swait.c
> +++ b/kernel/sched/swait.c
> @@ -69,6 +69,7 @@ void swake_up_all(struct swait_queue_head *q)
>  	struct swait_queue *curr;
>  	LIST_HEAD(tmp);
>
> +	WARN_ON(irqs_disabled());
>  	raw_spin_lock_irq(&q->lock);
>  	list_splice_init(&q->task_list, &tmp);
>  	while (!list_empty(&tmp)) {
>
> I've done a little bit of analysis here: percpu_ref_kill_and_confirm()
> does spin_lock_irqsave() and then does a percpu_ref_put().  If the
> refcount reaches zero, the release function of the refcount is called.
> In this case, the block code has set this to
> blk_queue_usage_counter_release(), which calls swake_up_all().
>
> It seems like a bad idea to call percpu_ref_put() with interrupts
> disabled.  This problem doesn't actually appear to be RT-related;
> there's just no warning if the RT tree isn't used.

Yeah, but vanilla uses wake_up(), which does spin_lock_irqsave(), so it
is not an issue there.

The odd part here is that percpu_ref_kill_and_confirm() does _irqsave(),
which suggests that it might be called from any context, and then it
does wait_event_lock_irq(), which enables interrupts again while it
waits.  So it can't be used from any context.
> I'm not sure if it's best to just do the put outside the lock, or to
> have a modified put function that returns a bool to know if a release
> is required, then the release function can be called outside the lock.
> I can do patches and test, but I'm hoping for a little guidance here.

swake_up_all() does raw_spin_lock_irq() because it should be called from
non-IRQ context.  And it drops the lock (+ IRQ enable) between wake-ups
in case we "need_resched()" because we woke a high-priority waiter.
There is the list_splice() because we wanted to drop the lock (and have
IRQs open) during the entire wake-up process, but finish_swait() may
happen during the wake-up, so we must hold the lock while the list item
is removed from the queue head.
I have no idea what the wisest thing to do here is.  The obvious fix
would be to use the irqsave() variant here and not drop the lock between
wake-ups.  That is essentially what swake_up_all_locked() does, which I
need for the completions (and based on some testing most users have one
waiter, except during PM and some crypto code).
It is probably no comparison to wake_up_q() (which does multiple
wake-ups without a context switch), but then that is how we did it
before.

Preferably we would have a proper list_splice() and some magic in the
"early" dequeue part that works.

> I'm also wondering why we don't have a warning like this in the
> *_spin_lock_irq() macros, perhaps turned on with a debug option.
> That would catch things like this sooner.

Ideally you would add lockdep_assert_irqs_enabled() to
local_irq_disable() so you would have it hidden behind lockdep with a
recursion check and everything.  But this needs a lot of headers like
task_struct so…
I once had WARN_ON_ONCE(irqs_disabled()) added to test-drive it and had
a few false positives in early boot or in constructs like
__run_hrtimer().  I didn't look at it further…

> Thanks,
>
> -corey

Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT
  2018-03-06 17:46 ` Sebastian Andrzej Siewior
@ 2018-03-06 22:51   ` Corey Minyard
  2018-03-07 15:45   ` Corey Minyard
  1 sibling, 0 replies; 23+ messages in thread

From: Corey Minyard @ 2018-03-06 22:51 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, linux-kernel, Tejun Heo, Peter Zijlstra,
	Thomas Gleixner, Kent Overstreet

On 03/06/2018 11:46 AM, Sebastian Andrzej Siewior wrote:
> On 2018-03-05 09:08:11 [-0600], Corey Minyard wrote:
>> Starting with the change
>>
>> 8a64547a07980f9d25e962a78c2e10ee82bdb742 fs/dcache: use swait_queue instead
>> of waitqueue
> …
>> The following change is the obvious reason:
>>
>> --- a/kernel/sched/swait.c
>> +++ b/kernel/sched/swait.c
>> @@ -69,6 +69,7 @@ void swake_up_all(struct swait_queue_head *q)
>>  	struct swait_queue *curr;
>>  	LIST_HEAD(tmp);
>>
>> +	WARN_ON(irqs_disabled());
>>  	raw_spin_lock_irq(&q->lock);
>>  	list_splice_init(&q->task_list, &tmp);
>>  	while (!list_empty(&tmp)) {
>>
>> I've done a little bit of analysis here: percpu_ref_kill_and_confirm()
>> does spin_lock_irqsave() and then does a percpu_ref_put().  If the
>> refcount reaches zero, the release function of the refcount is called.
>> In this case, the block code has set this to
>> blk_queue_usage_counter_release(), which calls swake_up_all().
>>
>> It seems like a bad idea to call percpu_ref_put() with interrupts
>> disabled.  This problem doesn't actually appear to be RT-related;
>> there's just no warning if the RT tree isn't used.
> Yeah, but vanilla uses wake_up(), which does spin_lock_irqsave(), so it
> is not an issue there.
>
> The odd part here is that percpu_ref_kill_and_confirm() does _irqsave(),
> which suggests that it might be called from any context, and then it
> does wait_event_lock_irq(), which enables interrupts again while it
> waits.  So it can't be used from any context.

I'm adding the author (Kent) to this email; I should have done that
originally.

You are right, it looks like all the percpu_ref_switch... and
percpu_ref_kill... functions are broken here.  I also don't understand
the need for a global lock for non-global data; it looks like this could
become a bottleneck on a big SMP system.

I'm going to spend some time with this and try to figure out what is
going on.  Hopefully Kent or Tejun can offer some insight.

-corey

>> I'm not sure if it's best to just do the put outside the lock, or to
>> have a modified put function that returns a bool to know if a release
>> is required, then the release function can be called outside the lock.
>> I can do patches and test, but I'm hoping for a little guidance here.
> swake_up_all() does raw_spin_lock_irq() because it should be called from
> non-IRQ context.  And it drops the lock (+ IRQ enable) between wake-ups
> in case we "need_resched()" because we woke a high-priority waiter.
> There is the list_splice() because we wanted to drop the lock (and have
> IRQs open) during the entire wake-up process, but finish_swait() may
> happen during the wake-up, so we must hold the lock while the list item
> is removed from the queue head.
> I have no idea what the wisest thing to do here is.  The obvious fix
> would be to use the irqsave() variant here and not drop the lock between
> wake-ups.  That is essentially what swake_up_all_locked() does, which I
> need for the completions (and based on some testing most users have one
> waiter, except during PM and some crypto code).
> It is probably no comparison to wake_up_q() (which does multiple
> wake-ups without a context switch), but then that is how we did it
> before.
>
> Preferably we would have a proper list_splice() and some magic in the
> "early" dequeue part that works.
>
>> I'm also wondering why we don't have a warning like this in the
>> *_spin_lock_irq() macros, perhaps turned on with a debug option.
>> That would catch things like this sooner.
> Ideally you would add lockdep_assert_irqs_enabled() to
> local_irq_disable() so you would have it hidden behind lockdep with a
> recursion check and everything.  But this needs a lot of headers like
> task_struct so…
> I once had WARN_ON_ONCE(irqs_disabled()) added to test-drive it and had
> a few false positives in early boot or in constructs like
> __run_hrtimer().  I didn't look at it further…
>
>> Thanks,
>>
>> -corey
> Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT
  2018-03-06 17:46 ` Sebastian Andrzej Siewior
  2018-03-06 22:51   ` Corey Minyard
@ 2018-03-07 15:45   ` Corey Minyard
  2018-03-08 17:41     ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 23+ messages in thread

From: Corey Minyard @ 2018-03-07 15:45 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, linux-kernel, Tejun Heo, Peter Zijlstra, Thomas Gleixner

On 03/06/2018 11:46 AM, Sebastian Andrzej Siewior wrote:
> On 2018-03-05 09:08:11 [-0600], Corey Minyard wrote:
>> Starting with the change
>>
>> 8a64547a07980f9d25e962a78c2e10ee82bdb742 fs/dcache: use swait_queue instead
>> of waitqueue
> …
>> The following change is the obvious reason:
>>
>> --- a/kernel/sched/swait.c
>> +++ b/kernel/sched/swait.c
>> @@ -69,6 +69,7 @@ void swake_up_all(struct swait_queue_head *q)
>>  	struct swait_queue *curr;
>>  	LIST_HEAD(tmp);
>>
>> +	WARN_ON(irqs_disabled());
>>  	raw_spin_lock_irq(&q->lock);
>>  	list_splice_init(&q->task_list, &tmp);
>>  	while (!list_empty(&tmp)) {
>>
>> I've done a little bit of analysis here: percpu_ref_kill_and_confirm()
>> does spin_lock_irqsave() and then does a percpu_ref_put().  If the
>> refcount reaches zero, the release function of the refcount is called.
>> In this case, the block code has set this to
>> blk_queue_usage_counter_release(), which calls swake_up_all().
>>
>> It seems like a bad idea to call percpu_ref_put() with interrupts
>> disabled.  This problem doesn't actually appear to be RT-related;
>> there's just no warning if the RT tree isn't used.
> Yeah, but vanilla uses wake_up(), which does spin_lock_irqsave(), so it
> is not an issue there.
>
> The odd part here is that percpu_ref_kill_and_confirm() does _irqsave(),
> which suggests that it might be called from any context, and then it
> does wait_event_lock_irq(), which enables interrupts again while it
> waits.  So it can't be used from any context.
>
>> I'm not sure if it's best to just do the put outside the lock, or to
>> have a modified put function that returns a bool to know if a release
>> is required, then the release function can be called outside the lock.
>> I can do patches and test, but I'm hoping for a little guidance here.
> swake_up_all() does raw_spin_lock_irq() because it should be called from
> non-IRQ context.  And it drops the lock (+ IRQ enable) between wake-ups
> in case we "need_resched()" because we woke a high-priority waiter.
> There is the list_splice() because we wanted to drop the lock (and have
> IRQs open) during the entire wake-up process, but finish_swait() may
> happen during the wake-up, so we must hold the lock while the list item
> is removed from the queue head.
> I have no idea what the wisest thing to do here is.  The obvious fix
> would be to use the irqsave() variant here and not drop the lock between
> wake-ups.  That is essentially what swake_up_all_locked() does, which I
> need for the completions (and based on some testing most users have one
> waiter, except during PM and some crypto code).
> It is probably no comparison to wake_up_q() (which does multiple
> wake-ups without a context switch), but then that is how we did it
> before.
>
> Preferably we would have a proper list_splice() and some magic in the
> "early" dequeue part that works.

Maybe just modify the block code to run the swake_up_all() call in a
workqueue or tasklet?  If you think that works, I'll create a patch,
test it, and submit it if all goes well.

Thanks,

-corey

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT
  2018-03-07 15:45   ` Corey Minyard
@ 2018-03-08 17:41     ` Sebastian Andrzej Siewior
  2018-03-08 19:54       ` Corey Minyard
  0 siblings, 1 reply; 23+ messages in thread

From: Sebastian Andrzej Siewior @ 2018-03-08 17:41 UTC (permalink / raw)
To: Corey Minyard
Cc: linux-rt-users, linux-kernel, Tejun Heo, Peter Zijlstra, Thomas Gleixner

On 2018-03-07 09:45:29 [-0600], Corey Minyard wrote:
> > I have no idea what the wisest thing to do here is.  The obvious fix
> > would be to use the irqsave() variant here and not drop the lock between
> > wake-ups.  That is essentially what swake_up_all_locked() does, which I
> > need for the completions (and based on some testing most users have one
> > waiter, except during PM and some crypto code).
> > It is probably no comparison to wake_up_q() (which does multiple
> > wake-ups without a context switch), but then that is how we did it
> > before.
> >
> > Preferably we would have a proper list_splice() and some magic in the
> > "early" dequeue part that works.
> >
> Maybe just modify the block code to run the swake_up_all() call in a
> workqueue or tasklet?  If you think that works, I'll create a patch,
> test it, and submit it if all goes well.

It will work, but I don't think pushing this into a workqueue/tasklet is
a good idea.  You want to wake up all waiters on waitqueue X (probably
one waiter), and instead there is one extra wakeup + ctx-switch which
does the final wakeup.

But now I had an idea: swake_up_all() could iterate over the list and,
instead of performing wakes, just wake_q_add() the tasks.  Drop the lock
and then wake_up_q().  So in case there is a wakeup pending and the task
removed itself from the list, the task may observe a spurious wakeup.

> Thanks,
>
> -corey

Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT
  2018-03-08 17:41     ` Sebastian Andrzej Siewior
@ 2018-03-08 19:54       ` Corey Minyard
  2018-03-09 11:04         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 23+ messages in thread

From: Corey Minyard @ 2018-03-08 19:54 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, linux-kernel, Tejun Heo, Peter Zijlstra, Thomas Gleixner

On 03/08/2018 11:41 AM, Sebastian Andrzej Siewior wrote:
> On 2018-03-07 09:45:29 [-0600], Corey Minyard wrote:
>>> I have no idea what the wisest thing to do here is.  The obvious fix
>>> would be to use the irqsave() variant here and not drop the lock between
>>> wake-ups.  That is essentially what swake_up_all_locked() does, which I
>>> need for the completions (and based on some testing most users have one
>>> waiter, except during PM and some crypto code).
>>> It is probably no comparison to wake_up_q() (which does multiple
>>> wake-ups without a context switch), but then that is how we did it
>>> before.
>>>
>>> Preferably we would have a proper list_splice() and some magic in the
>>> "early" dequeue part that works.
>>>
>> Maybe just modify the block code to run the swake_up_all() call in a
>> workqueue or tasklet?  If you think that works, I'll create a patch,
>> test it, and submit it if all goes well.
> It will work, but I don't think pushing this into a workqueue/tasklet is
> a good idea.  You want to wake up all waiters on waitqueue X (probably
> one waiter), and instead there is one extra wakeup + ctx-switch which
> does the final wakeup.

True, but this is an uncommon and already fairly expensive operation
being done.  Adding a context switch doesn't seem that bad.

> But now I had an idea: swake_up_all() could iterate over the list and,
> instead of performing wakes, just wake_q_add() the tasks.  Drop the lock
> and then wake_up_q().  So in case there is a wakeup pending and the task
> removed itself from the list, the task may observe a spurious wakeup.

That sounds promising, but where does wake_up_q() get called?  No matter
what, it's an expensive operation, and I'm not sure where you would put
it in this case.

I had another idea.  This is only occurring if RT is not enabled,
because with RT all the irq disable things go away and you are generally
running in task context.  So why not have a different version of
swake_up_all() for non-RT that does work from an irqs-off context?

-corey

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT
  2018-03-08 19:54       ` Corey Minyard
@ 2018-03-09 11:04         ` Sebastian Andrzej Siewior
  2018-03-09 13:29           ` Corey Minyard
  2018-03-09 17:46           ` Peter Zijlstra
  0 siblings, 2 replies; 23+ messages in thread

From: Sebastian Andrzej Siewior @ 2018-03-09 11:04 UTC (permalink / raw)
To: Corey Minyard, Peter Zijlstra, Thomas Gleixner, Steven Rostedt
Cc: linux-rt-users, linux-kernel, Tejun Heo

On 2018-03-08 13:54:17 [-0600], Corey Minyard wrote:
> > It will work, but I don't think pushing this into a workqueue/tasklet is
> > a good idea.  You want to wake up all waiters on waitqueue X (probably
> > one waiter), and instead there is one extra wakeup + ctx-switch which
> > does the final wakeup.
>
> True, but this is an uncommon and already fairly expensive operation
> being done.  Adding a context switch doesn't seem that bad.

Still, no need to make it more expensive if it can be avoided.

> > But now I had an idea: swake_up_all() could iterate over the list and,
> > instead of performing wakes, just wake_q_add() the tasks.  Drop the lock
> > and then wake_up_q().  So in case there is a wakeup pending and the task
> > removed itself from the list, the task may observe a spurious wakeup.
>
> That sounds promising, but where does wake_up_q() get called?  No matter
> what, it's an expensive operation, and I'm not sure where you would put
> it in this case.

Look at this:

Subject: [RFC PATCH RT] sched/swait: use WAKE_Q for possible multiple wake ups

Corey Minyard reported a swake_up_all() invocation with disabled
interrupts with the RT patch applied but RT disabled (the low-latency
config).  The reason why swake_up_all() avoids the irqsave variant is
that it shouldn't be called from an IRQ-disabled section.  The idea was
to wake one task after the other, enabling interrupts (and dropping the
lock) between the wake-ups so we can schedule away in case a task with a
higher priority was just woken up.
In RT we have swait-based completions, so I kind of needed a complete()
which could wake multiple sleepers without dropping the lock and
enabling interrupts.  To work around this shortcoming I propose to use
WAKE_Q: swake_up_all() will queue all to-be-woken tasks on a wake queue
with interrupts disabled, which should be "quick".  After dropping the
lock (and enabling interrupts) it can wake the tasks one after the
other.

Reported-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/swait.h     |  4 +++-
 kernel/sched/completion.c |  5 ++++-
 kernel/sched/swait.c      | 35 ++++++++++-------------------------
 3 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/include/linux/swait.h b/include/linux/swait.h
index 853f3e61a9f4..929721cffdb3 100644
--- a/include/linux/swait.h
+++ b/include/linux/swait.h
@@ -148,7 +148,9 @@ static inline bool swq_has_sleeper(struct swait_queue_head *wq)
 extern void swake_up(struct swait_queue_head *q);
 extern void swake_up_all(struct swait_queue_head *q);
 extern void swake_up_locked(struct swait_queue_head *q);
-extern void swake_up_all_locked(struct swait_queue_head *q);
+
+struct wake_q_head;
+extern void swake_add_all_wq(struct swait_queue_head *q, struct wake_q_head *wq);
 extern void __prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait);
 extern void prepare_to_swait(struct swait_queue_head *q, struct swait_queue *wait, int state);
diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index 0fe2982e46a0..461d992e30f9 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -15,6 +15,7 @@
 #include <linux/sched/signal.h>
 #include <linux/sched/debug.h>
 #include <linux/completion.h>
+#include <linux/sched/wake_q.h>
 
 /**
  * complete: - signals a single thread waiting on this completion
@@ -65,11 +66,13 @@ EXPORT_SYMBOL(complete);
 void complete_all(struct completion *x)
 {
 	unsigned long flags;
+	DEFINE_WAKE_Q(wq);
 
 	raw_spin_lock_irqsave(&x->wait.lock, flags);
 	x->done = UINT_MAX;
-	swake_up_all_locked(&x->wait);
+	swake_add_all_wq(&x->wait, &wq);
 	raw_spin_unlock_irqrestore(&x->wait.lock, flags);
+	wake_up_q(&wq);
 }
 EXPORT_SYMBOL(complete_all);
 
diff --git a/kernel/sched/swait.c b/kernel/sched/swait.c
index b14638a05ec9..1a09cc425bd8 100644
--- a/kernel/sched/swait.c
+++ b/kernel/sched/swait.c
@@ -2,6 +2,7 @@
 #include <linux/sched/signal.h>
 #include <linux/swait.h>
 #include <linux/suspend.h>
+#include <linux/sched/wake_q.h>
 
 void __init_swait_queue_head(struct swait_queue_head *q, const char *name,
 			     struct lock_class_key *key)
@@ -31,24 +32,19 @@ void swake_up_locked(struct swait_queue_head *q)
 }
 EXPORT_SYMBOL(swake_up_locked);
 
-void swake_up_all_locked(struct swait_queue_head *q)
+void swake_add_all_wq(struct swait_queue_head *q, struct wake_q_head *wq)
 {
 	struct swait_queue *curr;
-	int wakes = 0;
 
 	while (!list_empty(&q->task_list)) {
 
 		curr = list_first_entry(&q->task_list, typeof(*curr),
 					task_list);
-		wake_up_process(curr->task);
 		list_del_init(&curr->task_list);
-		wakes++;
+		wake_q_add(wq, curr->task);
 	}
-	if (pm_in_action)
-		return;
-	WARN(wakes > 2, "complete_all() with %d waiters\n", wakes);
 }
-EXPORT_SYMBOL(swake_up_all_locked);
+EXPORT_SYMBOL(swake_add_all_wq);
 
 void swake_up(struct swait_queue_head *q)
 {
@@ -66,25 +62,14 @@ EXPORT_SYMBOL(swake_up);
  */
 void swake_up_all(struct swait_queue_head *q)
 {
-	struct swait_queue *curr;
-	LIST_HEAD(tmp);
+	unsigned long flags;
+	DEFINE_WAKE_Q(wq);
 
-	WARN_ON(irqs_disabled());
-	raw_spin_lock_irq(&q->lock);
-	list_splice_init(&q->task_list, &tmp);
-	while (!list_empty(&tmp)) {
-		curr = list_first_entry(&tmp, typeof(*curr), task_list);
+	raw_spin_lock_irqsave(&q->lock, flags);
+	swake_add_all_wq(q, &wq);
+	raw_spin_unlock_irqrestore(&q->lock, flags);
 
-		wake_up_state(curr->task, TASK_NORMAL);
-		list_del_init(&curr->task_list);
-
-		if (list_empty(&tmp))
-			break;
-
-		raw_spin_unlock_irq(&q->lock);
-		raw_spin_lock_irq(&q->lock);
-	}
-	raw_spin_unlock_irq(&q->lock);
+	wake_up_q(&wq);
 }
 EXPORT_SYMBOL(swake_up_all);

> I had another idea.  This is only occurring if RT is not enabled,
> because with RT all the irq disable things go away and you are
> generally running in task context.  So why not have a different version
> of swake_up_all() for non-RT that does work from an irqs-off context?

With the patch above I have a puzzle piece which would allow using
swait-based completions upstream.  That ifdef would probably not help.

> -corey

Sebastian

^ permalink raw reply related	[flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT
  2018-03-09 11:04         ` Sebastian Andrzej Siewior
@ 2018-03-09 13:29           ` Corey Minyard
  2018-03-09 14:58             ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 23+ messages in thread

From: Corey Minyard @ 2018-03-09 13:29 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, Peter Zijlstra, Thomas Gleixner,
	Steven Rostedt
Cc: linux-rt-users, linux-kernel, Tejun Heo

On 03/09/2018 05:04 AM, Sebastian Andrzej Siewior wrote:
> On 2018-03-08 13:54:17 [-0600], Corey Minyard wrote:
>>> It will work, but I don't think pushing this into a workqueue/tasklet is
>>> a good idea.  You want to wake up all waiters on waitqueue X (probably
>>> one waiter), and instead there is one extra wakeup + ctx-switch which
>>> does the final wakeup.
>> True, but this is an uncommon and already fairly expensive operation
>> being done.  Adding a context switch doesn't seem that bad.
> Still, no need to make it more expensive if it can be avoided.
>
>>> But now I had an idea: swake_up_all() could iterate over the list and,
>>> instead of performing wakes, just wake_q_add() the tasks.  Drop the lock
>>> and then wake_up_q().  So in case there is a wakeup pending and the task
>>> removed itself from the list, the task may observe a spurious wakeup.
>> That sounds promising, but where does wake_up_q() get called?  No matter
>> what, it's an expensive operation, and I'm not sure where you would put
>> it in this case.
> Look at this:
> ...
>
> void swake_up(struct swait_queue_head *q)
> {
> @@ -66,25 +62,14 @@ EXPORT_SYMBOL(swake_up);
>  */
> void swake_up_all(struct swait_queue_head *q)
> {
> -	struct swait_queue *curr;
> -	LIST_HEAD(tmp);
> +	unsigned long flags;
> +	DEFINE_WAKE_Q(wq);
>
> -	WARN_ON(irqs_disabled());
> -	raw_spin_lock_irq(&q->lock);
> -	list_splice_init(&q->task_list, &tmp);
> -	while (!list_empty(&tmp)) {
> -		curr = list_first_entry(&tmp, typeof(*curr), task_list);
> +	raw_spin_lock_irqsave(&q->lock, flags);
> +	swake_add_all_wq(q, &wq);
> +	raw_spin_unlock_irqrestore(&q->lock, flags);
>
> -		wake_up_state(curr->task, TASK_NORMAL);
> -		list_del_init(&curr->task_list);
> -
> -		if (list_empty(&tmp))
> -			break;
> -
> -		raw_spin_unlock_irq(&q->lock);
> -		raw_spin_lock_irq(&q->lock);
> -	}
> -	raw_spin_unlock_irq(&q->lock);
> +	wake_up_q(&wq);

From what I can tell, wake_up_q() is unbounded, and you have undone what
the previous code had tried to accomplish.  In the scenario I'm talking
about, interrupts are still disabled here.  That's why I was asking
about where to put wake_up_q(); I knew you could put it here, but it
didn't seem to me to help at all.

> }
> EXPORT_SYMBOL(swake_up_all);
>
>> I had another idea.  This is only occurring if RT is not enabled,
>> because with RT all the irq disable things go away and you are
>> generally running in task context.  So why not have a different version
>> of swake_up_all() for non-RT that does work from an irqs-off context?
> With the patch above I have a puzzle piece which would allow using
> swait-based completions upstream.  That ifdef would probably not help.

I agree that having a bounded-time way to wake up a bunch of threads
while interrupts are disabled would solve a bunch of issues.  I just
don't see how it can be done without pushing it off to a softirq or
workqueue.

-corey

>> -corey
> Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT
  2018-03-09 13:29           ` Corey Minyard
@ 2018-03-09 14:58             ` Sebastian Andrzej Siewior
  2018-03-09 16:03               ` Corey Minyard
  0 siblings, 1 reply; 23+ messages in thread

From: Sebastian Andrzej Siewior @ 2018-03-09 14:58 UTC (permalink / raw)
To: Corey Minyard
Cc: Peter Zijlstra, Thomas Gleixner, Steven Rostedt, linux-rt-users,
	linux-kernel, Tejun Heo

On 2018-03-09 07:29:31 [-0600], Corey Minyard wrote:
> From what I can tell, wake_up_q() is unbounded, and you have undone what
> the previous code had tried to accomplish.  In the scenario I'm talking
> about, interrupts are still disabled here.  That's why I was asking
> about where to put wake_up_q(); I knew you could put it here, but it
> didn't seem to me to help at all.

So you are worried about unbounded latencies on !RT.  Okay.  For !RT
this does not help, but it is not worse than before (before the RT patch
was applied and changed things).
In fact it is better now (with the RT patch and this one), because
before that patch you would not only open interrupts between the
wake-ups, you would also leave the function with interrupts open, which
is wrong.  Any interrupt (or a context switch due to need_resched())
that would then invoke percpu_ref_put() would freeze the CPU/system.
Also, every user that invoked swake_up_all() with enabled interrupts
will still perform the wake-up with enabled interrupts.  So nothing
changes here.

> > > I had another idea.  This is only occurring if RT is not enabled,
> > > because with RT all the irq disable things go away and you are
> > > generally running in task context.  So why not have a different
> > > version of swake_up_all() for non-RT that does work from an irqs-off
> > > context?
> > With the patch above I have a puzzle piece which would allow using
> > swait-based completions upstream.  That ifdef would probably not help.

> I agree that having a bounded-time way to wake up a bunch of threads
> while interrupts are disabled would solve a bunch of issues.  I just
> don't see how it can be done without pushing it off to a softirq or
> workqueue.

True, but this is a different story.  We started with a WARN_ON() which
triggered correctly, and the problem it pointed to looks solved to me.
This "unbounded runtime during the wake-up of many tasks with interrupts
disabled via percpu_ref_kill() -> blk_queue_usage_counter_release()"
thing already exists in the vanilla kernel and does not exist with the
RT patch applied and RT enabled.
If you are affected by this and you don't like it - fine.  Using a
workqueue is one way of getting around it (a softirq is not preemptible
in !RT, so it wouldn't change much).  However, I see no benefit in
carrying such a patch because, as I said, only !RT is affected by this.

> -corey

Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT 2018-03-09 14:58 ` Sebastian Andrzej Siewior @ 2018-03-09 16:03 ` Corey Minyard 0 siblings, 0 replies; 23+ messages in thread From: Corey Minyard @ 2018-03-09 16:03 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Peter Zijlstra, Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel, Tejun Heo On 03/09/2018 08:58 AM, Sebastian Andrzej Siewior wrote: > On 2018-03-09 07:29:31 [-0600], Corey Minyard wrote: >> From what I can tell, wake_up_q() is unbounded, and you have undone what >> the previous code had tried to accomplish. In the scenario I'm talking >> about, >> interrupts are still disabled here. That's why I was asking about where to >> put >> wake_up_q(), I knew you could put it here, but it didn't seem to me to help >> at all. > So you are worried about unbound latencies on !RT. Okay. So for !RT this > does not help but it is not worse then before (before the RT patch was > applied and changed things). > In fact it is better now (with RT patch and this one) because before > that patch you would not only open interrupts between the wake up but you > would leave the function with interrupts open which is wrong. Any > interrupt (or a context switch due to need-resched() that would invoke > percpu_ref_put() would freeze the CPU/system. > Also, every user that invoked swake_up_all() with enabled interrupts > will still perform the wake up with enabled interrupts. So nothing > changes here. Ah, ok, that makes sense. Sorry, I was mixing things up. Yes. on RT this should fix the unbounded time issue, and it should also solve the interrupts disabled issue on !RT. I'll try this out. -corey >>>> I had another idea. This is only occurring if RT is not enabled, because >>>> with >>>> RT all the irq disable things go away and you are generally running in task >>>> context. So why not have a different version of swake_up_all() for non-RT >>>> that does work from irqs-off context? 
>>> With the patch above I have puzzle part which would allow to use swait >>> based completions upstream. That ifdef would probably not help. >> I agree that having a bounded time way to wake up a bunch of threads while >> interrupts are disabled would solve a bunch of issues. I just don't see how >> it >> can be done without pushing it off to a softirq or workqueue. > true but this is a different story. We started with a WARN_ON() which > triggered correctly and the problem it pointed to looks solved to me. > > This "unbounded runtime during the wake up of many tasks with interrupts > disabled via percpu_ref_kill() -> blk_queue_usage_counter_release()" > thing exists already in the vanilla kernel and does not exist > with the RT patch applied and RT enabled. If you are affected by this > and you don't like it - fine. Using a workqueue is one way of getting > around it (the softirq is not preemptible in !RT so it wouldn't change > much). However, I see no benefit in carrying such a patch because as I > said only !RT is affected by this. > >> -corey > Sebastian ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT 2018-03-09 11:04 ` Sebastian Andrzej Siewior 2018-03-09 13:29 ` Corey Minyard @ 2018-03-09 17:46 ` Peter Zijlstra 2018-03-09 20:25 ` Sebastian Andrzej Siewior 2018-03-09 22:02 ` Warning from swake_up_all in 4.14.15-rt13 non-RT Corey Minyard 1 sibling, 2 replies; 23+ messages in thread From: Peter Zijlstra @ 2018-03-09 17:46 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Corey Minyard, Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel, Tejun Heo On Fri, Mar 09, 2018 at 12:04:18PM +0100, Sebastian Andrzej Siewior wrote: > +void swake_add_all_wq(struct swait_queue_head *q, struct wake_q_head *wq) > { > struct swait_queue *curr; > > while (!list_empty(&q->task_list)) { > > curr = list_first_entry(&q->task_list, typeof(*curr), > task_list); > list_del_init(&curr->task_list); > + wake_q_add(wq, curr->task); > } > } > +EXPORT_SYMBOL(swake_add_all_wq); > > void swake_up(struct swait_queue_head *q) > { > @@ -66,25 +62,14 @@ EXPORT_SYMBOL(swake_up); > */ > void swake_up_all(struct swait_queue_head *q) > { > + unsigned long flags; > + DEFINE_WAKE_Q(wq); > > + raw_spin_lock_irqsave(&q->lock, flags); > + swake_add_all_wq(q, &wq); > + raw_spin_unlock_irqrestore(&q->lock, flags); > > + wake_up_q(&wq); > } > EXPORT_SYMBOL(swake_up_all); This is fundamentally wrong. The whole point of wake_up_all() is that _all_ is unbounded and should not ever land in a single critical section, be it IRQ or PREEMPT disabled. The above does both. Yes, wake_up_all() is crap, it is also fundamentally incompatible with in-*irq usage. Nothing to be done about that. So NAK on this. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT 2018-03-09 17:46 ` Peter Zijlstra @ 2018-03-09 20:25 ` Sebastian Andrzej Siewior 2018-03-09 22:26 ` Peter Zijlstra 2018-03-09 22:02 ` Warning from swake_up_all in 4.14.15-rt13 non-RT Corey Minyard 1 sibling, 1 reply; 23+ messages in thread From: Sebastian Andrzej Siewior @ 2018-03-09 20:25 UTC (permalink / raw) To: Peter Zijlstra Cc: Corey Minyard, Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel, Tejun Heo On 2018-03-09 18:46:05 [+0100], Peter Zijlstra wrote: > On Fri, Mar 09, 2018 at 12:04:18PM +0100, Sebastian Andrzej Siewior wrote: > > +void swake_add_all_wq(struct swait_queue_head *q, struct wake_q_head *wq) > > { > > struct swait_queue *curr; > > > > while (!list_empty(&q->task_list)) { > > > > curr = list_first_entry(&q->task_list, typeof(*curr), > > task_list); > > list_del_init(&curr->task_list); > > + wake_q_add(wq, curr->task); > > } > > } > > +EXPORT_SYMBOL(swake_add_all_wq); > > > > void swake_up(struct swait_queue_head *q) > > { > > @@ -66,25 +62,14 @@ EXPORT_SYMBOL(swake_up); > > */ > > void swake_up_all(struct swait_queue_head *q) > > { > > + unsigned long flags; > > + DEFINE_WAKE_Q(wq); > > > > + raw_spin_lock_irqsave(&q->lock, flags); > > + swake_add_all_wq(q, &wq); > > + raw_spin_unlock_irqrestore(&q->lock, flags); > > > > + wake_up_q(&wq); > > } > > EXPORT_SYMBOL(swake_up_all); > > This is fundamentally wrong. The whole point of wake_up_all() is that > _all_ is unbounded and should not ever land in a single critical > section, be it IRQ or PREEMPT disabled. The above does both. Is it just about the irqsave() usage or something else? I doubt it is the list walk. It is still unbound if not called from irq-off region. But it is now possible, I agree. The wake_q usage should be cheaper compared to IRQ off+on in each loop. And we wanted to do the wake ups with enabled interrupts - there is still the list_splice() from that attempt. Now it can be. 
> Yes, wake_up_all() is crap, it is also fundamentally incompatible with > in-*irq usage. Nothing to be done about that. I still have (or need) completions which are swait based and do complete_all(). There are complete_all() callers which wake more than one waiter (that is, PM and crypto from the reports I got once I added the WARN_ON()). The in-IRQ usage is !RT only and was there before. > So NAK on this. So I need completions to be swait based and do complete_all() from IRQ (on !RT, not RT). I have this one call which breaks the usage on !RT and has wake_up_all() in it in vanilla which needs an swait equivalent since it calls its callback from an rcu-sched section. Sebastian ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT 2018-03-09 20:25 ` Sebastian Andrzej Siewior @ 2018-03-09 22:26 ` Peter Zijlstra 2018-03-12 10:51 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2018-03-09 22:26 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Corey Minyard, Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel, Tejun Heo On Fri, Mar 09, 2018 at 09:25:50PM +0100, Sebastian Andrzej Siewior wrote: > Is it just about the irqsave() usage or something else? I doubt it is > the list walk. It is still unbound if not called from irq-off region. The current list walk is preemptible. You put the entire iteration (of unbound length) inside a single critical section which destroy RT. > But it is now possible, I agree. The wake_q usage should be cheaper > compared to IRQ off+on in each loop. And we wanted to do the wake ups > with enabled interrupts - there is still the list_splice() from that > attempt. Now it can be. Unbound is still unbound, inf/n := inf. A 'cheaper' unbound doesn't RT make. > > Yes, wake_up_all() is crap, it is also fundamentally incompatible with > > in-*irq usage. Nothing to be done about that. > I still have (or need) completions which are swait based and do > complete_all(). That's fine, as long as they're done from preemptible context. Back when we introduced swait this was an explicit design goal/limitation. And there were no in-irq users of this. > There are complete_all() caller which wake more than one > waiter (that is PM and crypto from the reports I got once I added the > WARN_ON())). > The in-IRQ usage is !RT only and was there before. Then that's broken and needs to be undone. Also, why did you need the WARN, lockdep should've equally triggered on this, no? > > So NAK on this. > So I need completions to be swait based and do complete_all() from IRQ > (on !RT, not RT). 
I have this one call which breaks the usage on !RT and > has wake_up_all() in it in vanilla which needs an swait equivalent since > it calls its callback from an rcu-sched section. Why isn't this a problem on RT? ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT 2018-03-09 22:26 ` Peter Zijlstra @ 2018-03-12 10:51 ` Sebastian Andrzej Siewior 2018-03-12 13:27 ` Peter Zijlstra 0 siblings, 1 reply; 23+ messages in thread From: Sebastian Andrzej Siewior @ 2018-03-12 10:51 UTC (permalink / raw) To: Peter Zijlstra Cc: Corey Minyard, Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel, Tejun Heo On 2018-03-09 23:26:43 [+0100], Peter Zijlstra wrote: > On Fri, Mar 09, 2018 at 09:25:50PM +0100, Sebastian Andrzej Siewior wrote: > > Is it just about the irqsave() usage or something else? I doubt it is > > the list walk. It is still unbound if not called from irq-off region. > > The current list walk is preemptible. You put the entire iteration (of > unbound length) inside a single critical section which destroy RT. I considered that list walk as cheap. We don't do any wake ups with the list walk - just mark the task for a later wake up. But if it is not I could add an upper limit of 20 iterations or so. > > But it is now possible, I agree. The wake_q usage should be cheaper > > compared to IRQ off+on in each loop. And we wanted to do the wake ups > > with enabled interrupts - there is still the list_splice() from that > > attempt. Now it can be. > > Unbound is still unbound, inf/n := inf. A 'cheaper' unbound doesn't RT > make. What I meant is that wake_q() is invoked with interrupts enabled and we don't need the IRQ on/off on each iteration. But as I said in the upper paragraph, I can add an upper limit for the list walk. And wake up itself is with enabled interrupts. > > > Yes, wake_up_all() is crap, it is also fundamentally incompatible with > > > in-*irq usage. Nothing to be done about that. > > I still have (or need) completions which are swait based and do > > complete_all(). > > That's fine, as long as they're done from preemptible context. Back when > we introduced swait this was an explicit design goal/limitation. And > there were no in-irq users of this. 
Yes at that time in !RT. wake_up() is using sleeping locks on RT and swait is the only thing that can be used there. So if I don't get rid of that !preemptible part, I'll try to switch to swait. > > There are complete_all() caller which wake more than one > > waiter (that is PM and crypto from the reports I got once I added the > > WARN_ON())). > > The in-IRQ usage is !RT only and was there before. > > Then that's broken and needs to be undone. Also, why did you need the > WARN, lockdep should've equally triggered on this, no? I added WARN_ON() and I didn't even think about lockdep. I wanted to see a warning even with lockdep off. After adding this for testing: { raw_spin_lock_irq(&lock1); raw_spin_unlock_irq(&lock1); raw_spin_lock_irq(&lock2); raw_spin_unlock_irq(&lock2); raw_spin_lock_irq(&lock1); raw_spin_lock_irq(&lock2); raw_spin_unlock_irq(&lock2); raw_spin_unlock_irq(&lock1); raw_spin_lock_irq(&lock2); raw_spin_lock_irq(&lock1); raw_spin_unlock_irq(&lock1); raw_spin_unlock_irq(&lock2); } I see only one complaint about the lock order in the last block. With that one gone there is no complaint about the second block. So no, lockdep does not report such things (this was just tested on RT and TIP). > > > So NAK on this. > > So I need completions to be swait based and do complete_all() from IRQ > > (on !RT, not RT). I have this one call which breaks the usage on !RT and > > has wake_up_all() in it in vanilla which needs an swait equivalent since > > it calls its callback from an rcu-sched section. > > Why isn't this a problem on RT? So we remain in the preempt_disable() section due to RCU-sched so we have this, yes. But the "disabled interrupts" part is due to spin_lock_irqsave() which is a non-issue on RT. So if we managed to get rid of the rcu-sched then the swait can go and we can stick with the wake_up_all() on RT, too. Sebastian ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT 2018-03-12 10:51 ` Sebastian Andrzej Siewior @ 2018-03-12 13:27 ` Peter Zijlstra 2018-03-12 14:11 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2018-03-12 13:27 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Corey Minyard, Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel, Tejun Heo On Mon, Mar 12, 2018 at 11:51:13AM +0100, Sebastian Andrzej Siewior wrote: > On 2018-03-09 23:26:43 [+0100], Peter Zijlstra wrote: > > On Fri, Mar 09, 2018 at 09:25:50PM +0100, Sebastian Andrzej Siewior wrote: > > > Is it just about the irqsave() usage or something else? I doubt it is > > > the list walk. It is still unbound if not called from irq-off region. > > > > The current list walk is preemptible. You put the entire iteration (of > > unbound length) inside a single critical section which destroy RT. > > I considered that list walk as cheap. We don't do any wake ups with the > list walk - just mark the task for a later wake up. But if it is not I > could add an upper limit of 20 iterations or so. So the problem is that as soon as this is exposed to userspace you've lost. If a user can stack like 10000 tasks on the completion before triggering it, you've got yourself a giant !preempt section. Yes the wake_q stuff is cheaper, but unbound is still unbound. wake_all must not be used from !preemptible (IRQ or otherwise) sections. And I'm not seeing how waking just the top 20 helps. > > Why isn't this a problem on RT? > So we remain in the preempt_disable() section due to RCU-sched so we > have this, yes. But the "disabled interrupts" part is due to > spin_lock_irqsave() which is a non-issue on RT. So if we managed to get > rid of the rcu-sched then the swait can go and we can stick with the > wake_up_all() on RT, too. OK, so for RT we simply lose the IRQ-disable thing, but it's still a !preemptible section. 
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT 2018-03-12 13:27 ` Peter Zijlstra @ 2018-03-12 14:11 ` Sebastian Andrzej Siewior 2018-03-12 14:29 ` Peter Zijlstra 0 siblings, 1 reply; 23+ messages in thread From: Sebastian Andrzej Siewior @ 2018-03-12 14:11 UTC (permalink / raw) To: Peter Zijlstra Cc: Corey Minyard, Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel, Tejun Heo On 2018-03-12 14:27:29 [+0100], Peter Zijlstra wrote: > On Mon, Mar 12, 2018 at 11:51:13AM +0100, Sebastian Andrzej Siewior wrote: > > On 2018-03-09 23:26:43 [+0100], Peter Zijlstra wrote: > > > On Fri, Mar 09, 2018 at 09:25:50PM +0100, Sebastian Andrzej Siewior wrote: > > > > Is it just about the irqsave() usage or something else? I doubt it is > > > > the list walk. It is still unbound if not called from irq-off region. > > > > > > The current list walk is preemptible. You put the entire iteration (of > > > unbound length) inside a single critical section which destroy RT. > > > > I considered that list walk as cheap. We don't do any wake ups with the > > list walk - just mark the task for a later wake up. But if it is not I > > could add an upper limit of 20 iterations or so. > > So the problem is that as soon as this is exposed to userspace you've > lost. I know. We had this very same problem with clock_nanosleep() which got solved after the timer rework. > If a user can stack like 10000 tasks on the completion before triggering > it, you've got yourself a giant !preempt section. Yes the wake_q stuff > is cheaper, but unbound is still unbound. > > wake_all must not be used from !preemptible (IRQ or otherwise) sections. > And I'm not seeing how waking just the top 20 helps. I assumed you complained about the unbounded list-walk with interrupts disabled (which is cheap but unbound is unbound). So here I suggested I move 20 entries off that list at a time and enable interrupts again so an interrupt could set need_resched. But if we get invoked !preemptible then nothing changes. 
> > > Why isn't this a problem on RT? > > So we remain in the preempt_disable() section due to RCU-sched so we > > have this, yes. But the "disabled interrupts" part is due to > > spin_lock_irqsave() which is a non-issue on RT. So if we managed to get > > rid of the rcu-sched then the swait can go and we can stick with the > > wake_up_all() on RT, too. > > OK, so for RT we simply loose the IRQ-disable thing, but its still a > !preemptible section. Exactly. The irqsave() was to guard the non-RT config which uses the same code. So do I understand you correctly that irqsave may remain for the !RT config but that invocation with disabled preemption due to sched_rcu (on RT, too) must go? Sebastian ^ permalink raw reply [flat|nested] 23+ messages in thread
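The "move 20 entries off that list at a time" idea discussed above can be sketched as a bounded drain loop. The following is a hypothetical single-threaded model, not kernel code: each call to `drain_batch()` stands for one irq-off critical section, and between calls interrupts would be re-enabled so a pending need_resched could be honoured; `batch` stands in for the proposed limit of ~20.

```c
#include <assert.h>
#include <stddef.h>

struct waiter { int woken; struct waiter *next; };
struct head   { struct waiter *first; };

/* One bounded "irq-off" section: wake at most batch waiters.
 * Returns the number of waiters woken in this batch. */
static int drain_batch(struct head *q, int batch)
{
    int n = 0;
    while (q->first && n < batch) {
        struct waiter *w = q->first;
        q->first = w->next;
        w->woken = 1;
        n++;
    }
    return n;
}

/* Full wake-up built from bounded sections; returns how many
 * lock-hold (irq-off) periods were needed. */
static int wake_all_batched(struct head *q, int batch)
{
    int sections = 0;
    while (q->first) {
        drain_batch(q, batch);   /* irq-off window, bounded by batch */
        sections++;              /* irqs re-enabled between windows */
    }
    return sections;
}
```

This only bounds each individual critical section, which is exactly Peter's objection: a caller that is !preemptible across the whole `wake_all_batched()` loop (e.g. inside an rcu-sched section) gains nothing from the batching.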
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT 2018-03-12 14:11 ` Sebastian Andrzej Siewior @ 2018-03-12 14:29 ` Peter Zijlstra 2018-03-12 19:51 ` Sebastian Andrzej Siewior 0 siblings, 1 reply; 23+ messages in thread From: Peter Zijlstra @ 2018-03-12 14:29 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Corey Minyard, Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel, Tejun Heo On Mon, Mar 12, 2018 at 03:11:07PM +0100, Sebastian Andrzej Siewior wrote: > I assumed you complained about the unbounded list-walk with interrupts > disabled (which is cheap but unbound is unbound). So here I suggested I > move 20 entries off that list a time and enable interrupts again so an > interrupt could set need_resched. > But if we get invoked !preemptible then nothing changes. Right, so any !preemptible invocation of wake_all is bad. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT 2018-03-12 14:29 ` Peter Zijlstra @ 2018-03-12 19:51 ` Sebastian Andrzej Siewior 2018-03-13 18:40 ` [RT PATCH 1/2] Revert "block: blk-mq: Use swait" Sebastian Andrzej Siewior 0 siblings, 1 reply; 23+ messages in thread From: Sebastian Andrzej Siewior @ 2018-03-12 19:51 UTC (permalink / raw) To: Peter Zijlstra Cc: Corey Minyard, Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel, Tejun Heo On 2018-03-12 15:29:33 [+0100], Peter Zijlstra wrote: > On Mon, Mar 12, 2018 at 03:11:07PM +0100, Sebastian Andrzej Siewior wrote: > > I assumed you complained about the unbounded list-walk with interrupts > > disabled (which is cheap but unbound is unbound). So here I suggested I > > move 20 entries off that list a time and enable interrupts again so an > > interrupt could set need_resched. > > But if we get invoked !preemptible then nothing changes. > > Right, so any !preemptible invocation of wake_all is bad. I doubt I can argue for the move of wake_up_all() into a workqueue in a sane way. Not to mention that it wouldn't do any good for me on RT since I can't schedule workqueues in !preemptible context. I've been staring at the original issue. The original commit that properly introduced RCU-sched [0] reads like "it could have been normal RCU but this one is faster in my micro bench so let's go for it". Probably because normal RCU invokes the callbacks once in a while. So we could switch it to "normal" RCU instead. Does anyone think that could be done? percpu_ref_get()/put() is a hot path, I am not sure how much worse it gets in a real benchmark. In the meantime I prepared a patch which reverts the swait conversion and has the callback invoke swork_queue(), which in turn moves the wake_up_all() into a different process. Since it triggers rarely, it should not cause performance regressions. And the wake_up_all() is preemptible, so RT is happy. 
[0] a4244454df12 ("percpu-refcount: use RCU-sched insted of normal RCU") --- a/block/blk-core.c +++ b/block/blk-core.c @@ -381,7 +381,7 @@ void blk_clear_preempt_only(struct reque spin_lock_irqsave(q->queue_lock, flags); queue_flag_clear(QUEUE_FLAG_PREEMPT_ONLY, q); - swake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_freeze_wq); spin_unlock_irqrestore(q->queue_lock, flags); } EXPORT_SYMBOL_GPL(blk_clear_preempt_only); @@ -659,7 +659,7 @@ void blk_set_queue_dying(struct request_ } /* Make blk_queue_enter() reexamine the DYING flag. */ - swake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_freeze_wq); } EXPORT_SYMBOL_GPL(blk_set_queue_dying); @@ -860,7 +860,7 @@ int blk_queue_enter(struct request_queue */ smp_rmb(); - ret = swait_event_interruptible(q->mq_freeze_wq, + ret = wait_event_interruptible(q->mq_freeze_wq, (atomic_read(&q->mq_freeze_depth) == 0 && (preempt || !blk_queue_preempt_only(q))) || blk_queue_dying(q)); @@ -876,12 +876,20 @@ void blk_queue_exit(struct request_queue percpu_ref_put(&q->q_usage_counter); } +static void blk_queue_usage_counter_release_swork(struct swork_event *sev) +{ + struct request_queue *q = + container_of(sev, struct request_queue, mq_pcpu_wake); + + wake_up_all(&q->mq_freeze_wq); +} + static void blk_queue_usage_counter_release(struct percpu_ref *ref) { struct request_queue *q = container_of(ref, struct request_queue, q_usage_counter); - swake_up_all(&q->mq_freeze_wq); + swork_queue(&q->mq_pcpu_wake); } static void blk_rq_timed_out_timer(struct timer_list *t) @@ -957,7 +965,8 @@ struct request_queue *blk_alloc_queue_no q->bypass_depth = 1; __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags); - init_swait_queue_head(&q->mq_freeze_wq); + init_waitqueue_head(&q->mq_freeze_wq); + INIT_SWORK(&q->mq_pcpu_wake, blk_queue_usage_counter_release_swork); /* * Init percpu_ref in atomic mode so that it's faster to shutdown. 
@@ -3843,6 +3852,7 @@ int __init blk_dev_init(void) blk_requestq_cachep = kmem_cache_create("request_queue", sizeof(struct request_queue), 0, SLAB_PANIC, NULL); + BUG_ON(swork_get()); #ifdef CONFIG_DEBUG_FS blk_debugfs_root = debugfs_create_dir("block", NULL); --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -133,14 +133,14 @@ EXPORT_SYMBOL_GPL(blk_freeze_queue_start void blk_mq_freeze_queue_wait(struct request_queue *q) { - swait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter)); + wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter)); } EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait); int blk_mq_freeze_queue_wait_timeout(struct request_queue *q, unsigned long timeout) { - return swait_event_timeout(q->mq_freeze_wq, + return wait_event_timeout(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter), timeout); } @@ -183,7 +183,7 @@ void blk_mq_unfreeze_queue(struct reques WARN_ON_ONCE(freeze_depth < 0); if (!freeze_depth) { percpu_ref_reinit(&q->q_usage_counter); - swake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_freeze_wq); } } EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue); --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -29,6 +29,7 @@ #include <linux/blkzoned.h> #include <linux/seqlock.h> #include <linux/u64_stats_sync.h> +#include <linux/swork.h> struct module; struct scsi_ioctl_command; @@ -647,7 +648,8 @@ struct request_queue { struct throtl_data *td; #endif struct rcu_head rcu_head; - struct swait_queue_head mq_freeze_wq; + wait_queue_head_t mq_freeze_wq; + struct swork_event mq_pcpu_wake; struct percpu_ref q_usage_counter; struct list_head all_q_node; ^ permalink raw reply [flat|nested] 23+ messages in thread
* [RT PATCH 1/2] Revert "block: blk-mq: Use swait" 2018-03-12 19:51 ` Sebastian Andrzej Siewior @ 2018-03-13 18:40 ` Sebastian Andrzej Siewior 2018-03-13 18:42 ` [RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context Sebastian Andrzej Siewior 0 siblings, 1 reply; 23+ messages in thread From: Sebastian Andrzej Siewior @ 2018-03-13 18:40 UTC (permalink / raw) To: linux-rt-users Cc: Peter Zijlstra, Corey Minyard, Thomas Gleixner, Steven Rostedt, linux-kernel, Tejun Heo This reverts commit "block: blk-mq: Use swait". The issue remains but will be fixed differently. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- block/blk-core.c | 6 +++--- block/blk-mq.c | 8 ++++---- include/linux/blkdev.h | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index ff1258ca236c..f9ac6f169c67 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -799,7 +799,7 @@ int blk_queue_enter(struct request_queue *q, bool nowait) */ smp_rmb(); - ret = swait_event_interruptible(q->mq_freeze_wq, + ret = wait_event_interruptible(q->mq_freeze_wq, !atomic_read(&q->mq_freeze_depth) || blk_queue_dying(q)); if (blk_queue_dying(q)) @@ -819,7 +819,7 @@ static void blk_queue_usage_counter_release(struct percpu_ref *ref) struct request_queue *q = container_of(ref, struct request_queue, q_usage_counter); - swake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_freeze_wq); } static void blk_rq_timed_out_timer(unsigned long data) @@ -895,7 +895,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) q->bypass_depth = 1; __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags); - init_swait_queue_head(&q->mq_freeze_wq); + init_waitqueue_head(&q->mq_freeze_wq); /* * Init percpu_ref in atomic mode so that it's faster to shutdown. 
diff --git a/block/blk-mq.c b/block/blk-mq.c index bbe43d32f71a..c5bd467dd97b 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -132,14 +132,14 @@ EXPORT_SYMBOL_GPL(blk_freeze_queue_start); void blk_mq_freeze_queue_wait(struct request_queue *q) { - swait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter)); + wait_event(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter)); } EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait); int blk_mq_freeze_queue_wait_timeout(struct request_queue *q, unsigned long timeout) { - return swait_event_timeout(q->mq_freeze_wq, + return wait_event_timeout(q->mq_freeze_wq, percpu_ref_is_zero(&q->q_usage_counter), timeout); } @@ -182,7 +182,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q) WARN_ON_ONCE(freeze_depth < 0); if (!freeze_depth) { percpu_ref_reinit(&q->q_usage_counter); - swake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_freeze_wq); } } EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue); @@ -263,7 +263,7 @@ void blk_mq_wake_waiters(struct request_queue *q) * dying, we need to ensure that processes currently waiting on * the queue are notified as well. */ - swake_up_all(&q->mq_freeze_wq); + wake_up_all(&q->mq_freeze_wq); } bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 6f278f1fd634..b68752bfb645 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -598,7 +598,7 @@ struct request_queue { struct throtl_data *td; #endif struct rcu_head rcu_head; - struct swait_queue_head mq_freeze_wq; + wait_queue_head_t mq_freeze_wq; struct percpu_ref q_usage_counter; struct list_head all_q_node; -- 2.16.2 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context
  2018-03-13 18:40 ` [RT PATCH 1/2] Revert "block: blk-mq: Use swait" Sebastian Andrzej Siewior
@ 2018-03-13 18:42 ` Sebastian Andrzej Siewior
  2018-03-13 20:10   ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-13 18:42 UTC (permalink / raw)
To: linux-rt-users
Cc: Peter Zijlstra, Corey Minyard, Thomas Gleixner, Steven Rostedt,
	linux-kernel, Tejun Heo

| BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914
| in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6
| 5 locks held by kworker/u257:6/255:
| #0:  ("events_unbound"){.+.+.+}, at: [<ffffffff8108edf1>] process_one_work+0x171/0x5e0
| #1:  ((&entry->work)){+.+.+.}, at: [<ffffffff8108edf1>] process_one_work+0x171/0x5e0
| #2:  (&shost->scan_mutex){+.+.+.}, at: [<ffffffffa000faa3>] __scsi_add_device+0xa3/0x130 [scsi_mod]
| #3:  (&set->tag_list_lock){+.+...}, at: [<ffffffff812f09fa>] blk_mq_init_queue+0x96a/0xa50
| #4:  (rcu_read_lock_sched){......}, at: [<ffffffff8132887d>] percpu_ref_kill_and_confirm+0x1d/0x120
| Preemption disabled at: [<ffffffff812eff76>] blk_mq_freeze_queue_start+0x56/0x70
|
| CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1
| Workqueue: events_unbound async_run_entry_fn
|  0000000000000003 ffff8800bc29f998 ffffffff815b3a12 0000000000000000
|  0000000000000000 ffff8800bc29f9b8 ffffffff8109aa16 ffff8800bc29fa28
|  ffff8800bc5d1bc8 ffff8800bc29f9e8 ffffffff815b8dd4 ffff880000000000
| Call Trace:
|  [<ffffffff815b3a12>] dump_stack+0x4f/0x7c
|  [<ffffffff8109aa16>] __might_sleep+0x116/0x190
|  [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60
|  [<ffffffff810b6089>] __wake_up+0x29/0x60
|  [<ffffffff812ee06e>] blk_mq_usage_counter_release+0x1e/0x20
|  [<ffffffff81328966>] percpu_ref_kill_and_confirm+0x106/0x120
|  [<ffffffff812eff76>] blk_mq_freeze_queue_start+0x56/0x70
|  [<ffffffff812f0000>] blk_mq_update_tag_set_depth+0x40/0xd0
|  [<ffffffff812f0a1c>] blk_mq_init_queue+0x98c/0xa50
|  [<ffffffffa000dcf0>] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod]
|  [<ffffffffa000ea35>] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod]
|  [<ffffffffa000f494>] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod]
|  [<ffffffffa000fb26>] __scsi_add_device+0x126/0x130 [scsi_mod]
|  [<ffffffffa013033f>] ata_scsi_scan_host+0xaf/0x200 [libata]
|  [<ffffffffa012b5b6>] async_port_probe+0x46/0x60 [libata]
|  [<ffffffff810978fb>] async_run_entry_fn+0x3b/0xf0
|  [<ffffffff8108ee81>] process_one_work+0x201/0x5e0

percpu_ref_kill_and_confirm() invokes blk_mq_usage_counter_release() in
an rcu-sched region. A swait-based wake queue can't be used here either,
because of the wake_up_all() usage and disabled interrupts in !RT
configs (as reported by Corey Minyard).

Use swork_queue() to invoke wake_up_all() from process context instead.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 block/blk-core.c       | 13 ++++++++++++-
 include/linux/blkdev.h |  2 ++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f9ac6f169c67..4db4051724ea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -814,12 +814,20 @@ void blk_queue_exit(struct request_queue *q)
 	percpu_ref_put(&q->q_usage_counter);
 }
 
+static void blk_queue_usage_counter_release_swork(struct swork_event *sev)
+{
+	struct request_queue *q =
+		container_of(sev, struct request_queue, mq_pcpu_wake);
+
+	wake_up_all(&q->mq_freeze_wq);
+}
+
 static void blk_queue_usage_counter_release(struct percpu_ref *ref)
 {
 	struct request_queue *q =
 		container_of(ref, struct request_queue, q_usage_counter);
 
-	wake_up_all(&q->mq_freeze_wq);
+	swork_queue(&q->mq_pcpu_wake);
 }
 
 static void blk_rq_timed_out_timer(unsigned long data)
@@ -896,6 +904,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	__set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
 
 	init_waitqueue_head(&q->mq_freeze_wq);
+	INIT_SWORK(&q->mq_pcpu_wake, blk_queue_usage_counter_release_swork);
 
 	/*
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
@@ -3623,6 +3632,8 @@ int __init blk_dev_init(void)
 	if (!kblockd_workqueue)
 		panic("Failed to create kblockd\n");
 
+	BUG_ON(swork_get());
+
 	request_cachep = kmem_cache_create("blkdev_requests",
 			sizeof(struct request), 0, SLAB_PANIC, NULL);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b68752bfb645..49b53ad6d2d6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -27,6 +27,7 @@
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
 #include <linux/blkzoned.h>
+#include <linux/swork.h>
 
 struct module;
 struct scsi_ioctl_command;
@@ -599,6 +600,7 @@ struct request_queue {
 #endif
 	struct rcu_head		rcu_head;
 	wait_queue_head_t	mq_freeze_wq;
+	struct swork_event	mq_pcpu_wake;
 	struct percpu_ref	q_usage_counter;
 	struct list_head	all_q_node;
-- 
2.16.2
* Re: [RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context
  2018-03-13 18:42 ` [RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context Sebastian Andrzej Siewior
@ 2018-03-13 20:10 ` Peter Zijlstra
  2018-03-14 15:23   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2018-03-13 20:10 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-users, Corey Minyard, Thomas Gleixner, Steven Rostedt,
	linux-kernel, Tejun Heo

On Tue, Mar 13, 2018 at 07:42:41PM +0100, Sebastian Andrzej Siewior wrote:
> +static void blk_queue_usage_counter_release_swork(struct swork_event *sev)
> +{
> +	struct request_queue *q =
> +		container_of(sev, struct request_queue, mq_pcpu_wake);
> +
> +	wake_up_all(&q->mq_freeze_wq);
> +}
> +
>  static void blk_queue_usage_counter_release(struct percpu_ref *ref)
>  {
>  	struct request_queue *q =
>  		container_of(ref, struct request_queue, q_usage_counter);
>  
> -	wake_up_all(&q->mq_freeze_wq);
> +	swork_queue(&q->mq_pcpu_wake);
>  }

Depending on if we expect there to actually be wakeups, you could do
something like:

	if (wq_has_sleepers(&q->mq_freeze_wq))
		swork_queue(&q->mq_pcpu_wake);

avoiding the whole workqueue thing in the case there wasn't anybody
waiting for it. But since I don't know this code, I can't say if it
makes sense or not. Tejun?
* Re: [RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context
  2018-03-13 20:10 ` Peter Zijlstra
@ 2018-03-14 15:23 ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 23+ messages in thread
From: Sebastian Andrzej Siewior @ 2018-03-14 15:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-rt-users, Corey Minyard, Thomas Gleixner, Steven Rostedt,
	linux-kernel, Tejun Heo

On 2018-03-13 21:10:39 [+0100], Peter Zijlstra wrote:
> On Tue, Mar 13, 2018 at 07:42:41PM +0100, Sebastian Andrzej Siewior wrote:
> > +static void blk_queue_usage_counter_release_swork(struct swork_event *sev)
> > +{
> > +	struct request_queue *q =
> > +		container_of(sev, struct request_queue, mq_pcpu_wake);
> > +
> > +	wake_up_all(&q->mq_freeze_wq);
> > +}
> > +
> >  static void blk_queue_usage_counter_release(struct percpu_ref *ref)
> >  {
> >  	struct request_queue *q =
> >  		container_of(ref, struct request_queue, q_usage_counter);
> >  
> > -	wake_up_all(&q->mq_freeze_wq);
> > +	swork_queue(&q->mq_pcpu_wake);
> >  }
> 
> Depending on if we expect there to actually be wakeups, you could do
> something like:
> 
> 	if (wq_has_sleepers(&q->mq_freeze_wq))
> 		swork_queue(&q->mq_pcpu_wake);
> 
> avoiding the whole workqueue thing in the case there wasn't anybody
> waiting for it. But since I don't know this code, I can't say if it
> makes sense or not. Tejun?

This pops up on boot and shows that there are no waiters, so I'm
considering taking this suggestion.

Sebastian
* Re: Warning from swake_up_all in 4.14.15-rt13 non-RT
  2018-03-09 17:46 ` Peter Zijlstra
  2018-03-09 20:25 ` Sebastian Andrzej Siewior
@ 2018-03-09 22:02 ` Corey Minyard
  1 sibling, 0 replies; 23+ messages in thread
From: Corey Minyard @ 2018-03-09 22:02 UTC (permalink / raw)
To: Peter Zijlstra, Sebastian Andrzej Siewior
Cc: Thomas Gleixner, Steven Rostedt, linux-rt-users, linux-kernel,
	Tejun Heo

On 03/09/2018 11:46 AM, Peter Zijlstra wrote:
> On Fri, Mar 09, 2018 at 12:04:18PM +0100, Sebastian Andrzej Siewior wrote:
>> +void swake_add_all_wq(struct swait_queue_head *q, struct wake_q_head *wq)
>> {
>> 	struct swait_queue *curr;
>>
>> 	while (!list_empty(&q->task_list)) {
>>
>> 		curr = list_first_entry(&q->task_list, typeof(*curr),
>> 					task_list);
>> 		list_del_init(&curr->task_list);
>> +		wake_q_add(wq, curr->task);
>> 	}
>> }
>> +EXPORT_SYMBOL(swake_add_all_wq);
>>
>> void swake_up(struct swait_queue_head *q)
>> {
>> @@ -66,25 +62,14 @@ EXPORT_SYMBOL(swake_up);
>>  */
>> void swake_up_all(struct swait_queue_head *q)
>> {
>> +	unsigned long flags;
>> +	DEFINE_WAKE_Q(wq);
>>
>> +	raw_spin_lock_irqsave(&q->lock, flags);
>> +	swake_add_all_wq(q, &wq);
>> +	raw_spin_unlock_irqrestore(&q->lock, flags);
>>
>> +	wake_up_q(&wq);
>> }
>> EXPORT_SYMBOL(swake_up_all);
>
> This is fundamentally wrong. The whole point of wake_up_all() is that
> _all_ is unbounded and should not ever land in a single critical
> section, be it IRQ or PREEMPT disabled. The above does both.

It seems to me to be better than what was there, certainly more
efficient. And if I understand this correctly it is unbounded when !RT,
but it is bounded on RT. And I'm biased, because it should fix my
problem :).

> Yes, wake_up_all() is crap, it is also fundamentally incompatible with
> in-*irq usage. Nothing to be done about that.
>
> So NAK on this.

So what would you suggest? At this point getting rid of all the users
of wake_up_all() from interrupt context is not really an option, though
as an eventual goal it would be good.

-corey
end of thread, other threads:[~2018-03-14 15:23 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-05 15:08 Warning from swake_up_all in 4.14.15-rt13 non-RT Corey Minyard
2018-03-06 17:46 ` Sebastian Andrzej Siewior
2018-03-06 22:51 ` Corey Minyard
2018-03-07 15:45 ` Corey Minyard
2018-03-08 17:41 ` Sebastian Andrzej Siewior
2018-03-08 19:54 ` Corey Minyard
2018-03-09 11:04 ` Sebastian Andrzej Siewior
2018-03-09 13:29 ` Corey Minyard
2018-03-09 14:58 ` Sebastian Andrzej Siewior
2018-03-09 16:03 ` Corey Minyard
2018-03-09 17:46 ` Peter Zijlstra
2018-03-09 20:25 ` Sebastian Andrzej Siewior
2018-03-09 22:26 ` Peter Zijlstra
2018-03-12 10:51 ` Sebastian Andrzej Siewior
2018-03-12 13:27 ` Peter Zijlstra
2018-03-12 14:11 ` Sebastian Andrzej Siewior
2018-03-12 14:29 ` Peter Zijlstra
2018-03-12 19:51 ` Sebastian Andrzej Siewior
2018-03-13 18:40 ` [RT PATCH 1/2] Revert "block: blk-mq: Use swait" Sebastian Andrzej Siewior
2018-03-13 18:42 ` [RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context Sebastian Andrzej Siewior
2018-03-13 20:10 ` Peter Zijlstra
2018-03-14 15:23 ` Sebastian Andrzej Siewior
2018-03-09 22:02 ` Warning from swake_up_all in 4.14.15-rt13 non-RT Corey Minyard