* [PATCH 0/2] locking/rwbase_rt: Use wake_q for lockless reader waker
@ 2021-09-01 22:28 Davidlohr Bueso
  2021-09-01 22:28 ` [PATCH 1/2] sched: Move wake_q code below try_to_wake_up() Davidlohr Bueso
  2021-09-01 22:28 ` [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer Davidlohr Bueso
  0 siblings, 2 replies; 9+ messages in thread
From: Davidlohr Bueso @ 2021-09-01 22:28 UTC (permalink / raw)
  To: tglx
  Cc: peterz, mingo, rostedt, longman, bigeasy, boqun.feng, dave, linux-kernel

Hi,

Patch 1 does some necessary code reordering. Patch 2 introduces
wake_up_q_state() and converts the rwbase_read_unlock() slowpath to use
a wake_q instead of holding the wait_lock across the entire wakeup.

I have tested equivalent changes in the preempt_rt kernel (v5.14.y-rt),
stressing mmap_sem, without anything falling out.

Thanks!

Davidlohr Bueso (2):
  sched: Move wake_q code below try_to_wake_up()
  locking/rwbase_rt: Lockless reader waking up a writer

 kernel/locking/rwbase_rt.c |   4 +-
 kernel/sched/core.c        | 182 +++++++++++++++++++------------------
 2 files changed, 99 insertions(+), 87 deletions(-)

--
2.26.2

^ permalink raw reply	[flat|nested] 9+ messages in thread
* [PATCH 1/2] sched: Move wake_q code below try_to_wake_up()
  2021-09-01 22:28 [PATCH 0/2] locking/rwbase_rt: Use wake_q for lockless reader waker Davidlohr Bueso
@ 2021-09-01 22:28 ` Davidlohr Bueso
  2021-09-01 22:28 ` [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer Davidlohr Bueso
  1 sibling, 0 replies; 9+ messages in thread
From: Davidlohr Bueso @ 2021-09-01 22:28 UTC (permalink / raw)
  To: tglx
  Cc: peterz, mingo, rostedt, longman, bigeasy, boqun.feng, dave,
	linux-kernel, Davidlohr Bueso

Such that wake_up_q() can make use of it instead of wake_up_process().
No change in code.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/sched/core.c | 172 ++++++++++++++++++++++----------------------
 1 file changed, 86 insertions(+), 86 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c4462c454ab9..7fc3d22bc6d8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -865,92 +865,6 @@ static bool set_nr_if_polling(struct task_struct *p)
 #endif
 #endif
 
-static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
-{
-	struct wake_q_node *node = &task->wake_q;
-
-	/*
-	 * Atomically grab the task, if ->wake_q is !nil already it means
-	 * it's already queued (either by us or someone else) and will get the
-	 * wakeup due to that.
-	 *
-	 * In order to ensure that a pending wakeup will observe our pending
-	 * state, even in the failed case, an explicit smp_mb() must be used.
-	 */
-	smp_mb__before_atomic();
-	if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
-		return false;
-
-	/*
-	 * The head is context local, there can be no concurrency.
-	 */
-	*head->lastp = node;
-	head->lastp = &node->next;
-	return true;
-}
-
-/**
- * wake_q_add() - queue a wakeup for 'later' waking.
- * @head: the wake_q_head to add @task to
- * @task: the task to queue for 'later' wakeup
- *
- * Queue a task for later wakeup, most likely by the wake_up_q() call in the
- * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
- * instantly.
- *
- * This function must be used as-if it were wake_up_process(); IOW the task
- * must be ready to be woken at this location.
- */
-void wake_q_add(struct wake_q_head *head, struct task_struct *task)
-{
-	if (__wake_q_add(head, task))
-		get_task_struct(task);
-}
-
-/**
- * wake_q_add_safe() - safely queue a wakeup for 'later' waking.
- * @head: the wake_q_head to add @task to
- * @task: the task to queue for 'later' wakeup
- *
- * Queue a task for later wakeup, most likely by the wake_up_q() call in the
- * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
- * instantly.
- *
- * This function must be used as-if it were wake_up_process(); IOW the task
- * must be ready to be woken at this location.
- *
- * This function is essentially a task-safe equivalent to wake_q_add(). Callers
- * that already hold reference to @task can call the 'safe' version and trust
- * wake_q to do the right thing depending whether or not the @task is already
- * queued for wakeup.
- */
-void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task)
-{
-	if (!__wake_q_add(head, task))
-		put_task_struct(task);
-}
-
-void wake_up_q(struct wake_q_head *head)
-{
-	struct wake_q_node *node = head->first;
-
-	while (node != WAKE_Q_TAIL) {
-		struct task_struct *task;
-
-		task = container_of(node, struct task_struct, wake_q);
-		/* Task can safely be re-inserted now: */
-		node = node->next;
-		task->wake_q.next = NULL;
-
-		/*
-		 * wake_up_process() executes a full barrier, which pairs with
-		 * the queueing in wake_q_add() so as not to miss wakeups.
-		 */
-		wake_up_process(task);
-		put_task_struct(task);
-	}
-}
-
 /*
  * resched_curr - mark rq's current task 'to be rescheduled now'.
  *
@@ -4172,6 +4086,92 @@ int wake_up_state(struct task_struct *p, unsigned int state)
 	return try_to_wake_up(p, state, 0);
 }
 
+static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
+{
+	struct wake_q_node *node = &task->wake_q;
+
+	/*
+	 * Atomically grab the task, if ->wake_q is !nil already it means
+	 * it's already queued (either by us or someone else) and will get the
+	 * wakeup due to that.
+	 *
+	 * In order to ensure that a pending wakeup will observe our pending
+	 * state, even in the failed case, an explicit smp_mb() must be used.
+	 */
+	smp_mb__before_atomic();
+	if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
+		return false;
+
+	/*
+	 * The head is context local, there can be no concurrency.
+	 */
+	*head->lastp = node;
+	head->lastp = &node->next;
+	return true;
+}
+
+/**
+ * wake_q_add() - queue a wakeup for 'later' waking.
+ * @head: the wake_q_head to add @task to
+ * @task: the task to queue for 'later' wakeup
+ *
+ * Queue a task for later wakeup, most likely by the wake_up_q() call in the
+ * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
+ * instantly.
+ *
+ * This function must be used as-if it were wake_up_process(); IOW the task
+ * must be ready to be woken at this location.
+ */
+void wake_q_add(struct wake_q_head *head, struct task_struct *task)
+{
+	if (__wake_q_add(head, task))
+		get_task_struct(task);
+}
+
+/**
+ * wake_q_add_safe() - safely queue a wakeup for 'later' waking.
+ * @head: the wake_q_head to add @task to
+ * @task: the task to queue for 'later' wakeup
+ *
+ * Queue a task for later wakeup, most likely by the wake_up_q() call in the
+ * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
+ * instantly.
+ *
+ * This function must be used as-if it were wake_up_process(); IOW the task
+ * must be ready to be woken at this location.
+ *
+ * This function is essentially a task-safe equivalent to wake_q_add(). Callers
+ * that already hold reference to @task can call the 'safe' version and trust
+ * wake_q to do the right thing depending whether or not the @task is already
+ * queued for wakeup.
+ */
+void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task)
+{
+	if (!__wake_q_add(head, task))
+		put_task_struct(task);
+}
+
+void wake_up_q(struct wake_q_head *head)
+{
+	struct wake_q_node *node = head->first;
+
+	while (node != WAKE_Q_TAIL) {
+		struct task_struct *task;
+
+		task = container_of(node, struct task_struct, wake_q);
+		/* Task can safely be re-inserted now: */
+		node = node->next;
+		task->wake_q.next = NULL;
+
+		/*
+		 * wake_up_process() executes a full barrier, which pairs with
+		 * the queueing in wake_q_add() so as not to miss wakeups.
+		 */
+		wake_up_process(task);
+		put_task_struct(task);
+	}
+}
+
 /*
  * Perform scheduler related setup for a newly forked process p.
  * p is forked by current.
--
2.26.2
* [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer
  2021-09-01 22:28 [PATCH 0/2] locking/rwbase_rt: Use wake_q for lockless reader waker Davidlohr Bueso
  2021-09-01 22:28 ` [PATCH 1/2] sched: Move wake_q code below try_to_wake_up() Davidlohr Bueso
@ 2021-09-01 22:28 ` Davidlohr Bueso
  2021-09-01 23:03   ` Davidlohr Bueso
                      ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Davidlohr Bueso @ 2021-09-01 22:28 UTC (permalink / raw)
  To: tglx
  Cc: peterz, mingo, rostedt, longman, bigeasy, boqun.feng, dave,
	linux-kernel, Davidlohr Bueso

As with the rest of our sleeping locks, use a wake_q to allow waking up
the writer without having to hold the wait_lock across the operation.
While wake_q is ideally suited for batching wakeups, single-wakeup usage
has still been shown to be beneficial vs the cost of try_to_wake_up()
when the lock is contended.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/rwbase_rt.c |  4 +++-
 kernel/sched/core.c        | 16 +++++++++++++---
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 4ba15088e640..3444bc709973 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -141,6 +141,7 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 {
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
 	struct task_struct *owner;
+	DEFINE_WAKE_Q(wake_q);
 
 	raw_spin_lock_irq(&rtm->wait_lock);
 	/*
@@ -151,9 +152,10 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 	 */
 	owner = rt_mutex_owner(rtm);
 	if (owner)
-		wake_up_state(owner, state);
+		wake_q_add(&wake_q, owner);
 
 	raw_spin_unlock_irq(&rtm->wait_lock);
+	wake_up_q_state(&wake_q, state);
 }
 
 static __always_inline void rwbase_read_unlock(struct rwbase_rt *rwb,
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7fc3d22bc6d8..22c77742f1a7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4151,7 +4151,7 @@ void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task)
 	put_task_struct(task);
 }
 
-void wake_up_q(struct wake_q_head *head)
+static void __wake_up_q(struct wake_q_head *head, unsigned int state)
 {
 	struct wake_q_node *node = head->first;
 
@@ -4164,14 +4164,24 @@ void wake_up_q(struct wake_q_head *head)
 		task->wake_q.next = NULL;
 
 		/*
-		 * wake_up_process() executes a full barrier, which pairs with
+		 * try_to_wake_up() executes a full barrier, which pairs with
 		 * the queueing in wake_q_add() so as not to miss wakeups.
 		 */
-		wake_up_process(task);
+		try_to_wake_up(task, state, 0);
 		put_task_struct(task);
 	}
 }
 
+void wake_up_q(struct wake_q_head *head)
+{
+	__wake_up_q(head, TASK_NORMAL);
+}
+
+void wake_up_q_state(struct wake_q_head *head, unsigned int state)
+{
+	__wake_up_q(head, state);
+}
+
 /*
  * Perform scheduler related setup for a newly forked process p.
  * p is forked by current.
--
2.26.2
* Re: [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer
  2021-09-01 22:28 ` [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer Davidlohr Bueso
@ 2021-09-01 23:03 ` Davidlohr Bueso
  2021-09-03 10:55   ` Sebastian Andrzej Siewior
  2021-09-13 12:20   ` Thomas Gleixner
  2 siblings, 0 replies; 9+ messages in thread
From: Davidlohr Bueso @ 2021-09-01 23:03 UTC (permalink / raw)
  To: tglx
  Cc: peterz, mingo, rostedt, longman, bigeasy, boqun.feng,
	linux-kernel, Davidlohr Bueso

On Wed, 01 Sep 2021, Bueso wrote:

>As with the rest of our sleeping locks, use a wake_q to
>allow waking up the writer without having to hold the
>wait_lock across the operation. While this is ideally
>for batching wakeups, single wakeup usage as still shown
>to be beneficial vs the cost of try_to_wakeup when the
>lock is contended.
>
>Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
>---
> kernel/locking/rwbase_rt.c |  4 +++-
> kernel/sched/core.c        | 16 +++++++++++++---

Bleh, this of course is missing:

diff --git a/include/linux/sched/wake_q.h b/include/linux/sched/wake_q.h
index 06cd8fb2f409..695efd987b56 100644
--- a/include/linux/sched/wake_q.h
+++ b/include/linux/sched/wake_q.h
@@ -62,5 +62,6 @@ static inline bool wake_q_empty(struct wake_q_head *head)
 extern void wake_q_add(struct wake_q_head *head, struct task_struct *task);
 extern void wake_q_add_safe(struct wake_q_head *head, struct task_struct *task);
 extern void wake_up_q(struct wake_q_head *head);
+extern void wake_up_q_state(struct wake_q_head *head, unsigned int state);
 
 #endif /* _LINUX_SCHED_WAKE_Q_H */
* Re: [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer
  2021-09-01 22:28 ` [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer Davidlohr Bueso
  2021-09-01 23:03   ` Davidlohr Bueso
@ 2021-09-03 10:55 ` Sebastian Andrzej Siewior
  2021-09-13 12:20   ` Thomas Gleixner
  2 siblings, 0 replies; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-09-03 10:55 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: tglx, peterz, mingo, rostedt, longman, boqun.feng,
	linux-kernel, Davidlohr Bueso

On 2021-09-01 15:28:25 [-0700], Davidlohr Bueso wrote:
> diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
> index 4ba15088e640..3444bc709973 100644
> --- a/kernel/locking/rwbase_rt.c
> +++ b/kernel/locking/rwbase_rt.c
> @@ -141,6 +141,7 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
>  {
>  	struct rt_mutex_base *rtm = &rwb->rtmutex;
>  	struct task_struct *owner;
> +	DEFINE_WAKE_Q(wake_q);
>  
>  	raw_spin_lock_irq(&rtm->wait_lock);
>  	/*
> @@ -151,9 +152,10 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
>  	 */
>  	owner = rt_mutex_owner(rtm);
>  	if (owner)
> -		wake_up_state(owner, state);
> +		wake_q_add(&wake_q, owner);
>  
>  	raw_spin_unlock_irq(&rtm->wait_lock);
> +	wake_up_q_state(&wake_q, state);
>  }

You keep the same wake_q in task_struct. Don't you miss states/wake-ups
if a task needs both wakes? One for TASK_NORMAL and one for
TASK_UNINTERRUPTIBLE?

Side note: this wake-up happens in an IRQ-off region, so there is no
PI-boosting dance around it, as there would be with a sleeping lock.

Sebastian
* Re: [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer
  2021-09-01 22:28 ` [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer Davidlohr Bueso
  2021-09-01 23:03   ` Davidlohr Bueso
  2021-09-03 10:55   ` Sebastian Andrzej Siewior
@ 2021-09-13 12:20 ` Thomas Gleixner
  2021-09-14 10:42   ` Thomas Gleixner
  2 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2021-09-13 12:20 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: peterz, mingo, rostedt, longman, bigeasy, boqun.feng, dave,
	linux-kernel, Davidlohr Bueso

On Wed, Sep 01 2021 at 15:28, Davidlohr Bueso wrote:
> diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
> index 4ba15088e640..3444bc709973 100644
> --- a/kernel/locking/rwbase_rt.c
> +++ b/kernel/locking/rwbase_rt.c
> @@ -141,6 +141,7 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
>  {
>  	struct rt_mutex_base *rtm = &rwb->rtmutex;
>  	struct task_struct *owner;
> +	DEFINE_WAKE_Q(wake_q);
>  
>  	raw_spin_lock_irq(&rtm->wait_lock);
>  	/*
> @@ -151,9 +152,10 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
>  	 */
>  	owner = rt_mutex_owner(rtm);
>  	if (owner)
> -		wake_up_state(owner, state);
> +		wake_q_add(&wake_q, owner);

That's broken for rw_locks. See commit
456cfbc65cd072f4f53936ee5a37eb1447a7d3ba.

Thanks,

        tglx
* Re: [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer
  2021-09-13 12:20 ` Thomas Gleixner
@ 2021-09-14 10:42 ` Thomas Gleixner
  2021-09-16 17:02   ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 9+ messages in thread
From: Thomas Gleixner @ 2021-09-14 10:42 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: peterz, mingo, rostedt, longman, bigeasy, boqun.feng, dave,
	linux-kernel, Davidlohr Bueso

On Mon, Sep 13 2021 at 14:20, Thomas Gleixner wrote:
> On Wed, Sep 01 2021 at 15:28, Davidlohr Bueso wrote:
>> diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
>> index 4ba15088e640..3444bc709973 100644
>> --- a/kernel/locking/rwbase_rt.c
>> +++ b/kernel/locking/rwbase_rt.c
>> @@ -141,6 +141,7 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
>>  {
>>  	struct rt_mutex_base *rtm = &rwb->rtmutex;
>>  	struct task_struct *owner;
>> +	DEFINE_WAKE_Q(wake_q);
>>  
>>  	raw_spin_lock_irq(&rtm->wait_lock);
>>  	/*
>> @@ -151,9 +152,10 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
>>  	 */
>>  	owner = rt_mutex_owner(rtm);
>>  	if (owner)
>> -		wake_up_state(owner, state);
>> +		wake_q_add(&wake_q, owner);
>
> That's broken for rw_locks. See commit 456cfbc65cd072f4f53936ee5a37eb1447a7d3ba.

Something like the untested below should work.

Thanks,

        tglx
---
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6bb116c559b4..9e04bca0c11e 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -459,6 +459,20 @@ static __always_inline void rt_mutex_wake_q_add(struct rt_wake_q_head *wqh,
 	}
 }
 
+static __always_inline void rt_mutex_wake_q_add_task(struct rt_wake_q_head *wqh,
+						     struct task_struct *task,
+						     unsigned int wake_state)
+{
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && wake_state != TASK_NORMAL) {
+		if (IS_ENABLED(CONFIG_PROVE_LOCKING))
+			WARN_ON_ONCE(wqh->rtlock_task);
+		get_task_struct(task);
+		wqh->rtlock_task = task;
+	} else {
+		wake_q_add(&wqh->head, task);
+	}
+}
+
 static __always_inline void rt_mutex_wake_up_q(struct rt_wake_q_head *wqh)
 {
 	if (IS_ENABLED(CONFIG_PREEMPT_RT) && wqh->rtlock_task) {
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 4ba15088e640..e011b347a2c5 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -141,8 +141,10 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 {
 	struct rt_mutex_base *rtm = &rwb->rtmutex;
 	struct task_struct *owner;
+	DEFINE_RT_WAKE_Q(wqh);
 
 	raw_spin_lock_irq(&rtm->wait_lock);
+
 	/*
 	 * Wake the writer, i.e. the rtmutex owner. It might release the
 	 * rtmutex concurrently in the fast path (due to a signal), but to
@@ -151,9 +153,12 @@ static void __sched __rwbase_read_unlock(struct rwbase_rt *rwb,
 	 */
 	owner = rt_mutex_owner(rtm);
 	if (owner)
-		wake_up_state(owner, state);
+		rt_mutex_wake_q_add_task(&wqh, owner, state);
 
+	/* Pairs with the preempt_enable in rt_mutex_wake_up_q() */
+	preempt_disable();
 	raw_spin_unlock_irq(&rtm->wait_lock);
+	rt_mutex_wake_up_q(&wqh);
 }
 
 static __always_inline void rwbase_read_unlock(struct rwbase_rt *rwb,
* Re: [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer
  2021-09-14 10:42 ` Thomas Gleixner
@ 2021-09-16 17:02 ` Sebastian Andrzej Siewior
  2021-09-16 17:27   ` Davidlohr Bueso
  0 siblings, 1 reply; 9+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-09-16 17:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Davidlohr Bueso, peterz, mingo, rostedt, longman, boqun.feng,
	linux-kernel, Davidlohr Bueso

On 2021-09-14 12:42:41 [+0200], Thomas Gleixner wrote:
> Something like the untested below should work.

works.

> Thanks,
>
>         tglx

Sebastian
* Re: [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer
  2021-09-16 17:02 ` Sebastian Andrzej Siewior
@ 2021-09-16 17:27 ` Davidlohr Bueso
  0 siblings, 0 replies; 9+ messages in thread
From: Davidlohr Bueso @ 2021-09-16 17:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Thomas Gleixner, Davidlohr Bueso, peterz, mingo, rostedt,
	longman, boqun.feng, linux-kernel

On 2021-09-16 10:02, Sebastian Andrzej Siewior wrote:
> On 2021-09-14 12:42:41 [+0200], Thomas Gleixner wrote:
>> Something like the untested below should work.
>
> works.

Works for me too. Also survived an overnight round of mmap_sem pounding
on v5.14-rt.

Thanks,
Davidlohr
end of thread, other threads:[~2021-09-16 17:36 UTC | newest]

Thread overview: 9+ messages
2021-09-01 22:28 [PATCH 0/2] locking/rwbase_rt: Use wake_q for lockless reader waker Davidlohr Bueso
2021-09-01 22:28 ` [PATCH 1/2] sched: Move wake_q code below try_to_wake_up() Davidlohr Bueso
2021-09-01 22:28 ` [PATCH 2/2] locking/rwbase_rt: Lockless reader waking up a writer Davidlohr Bueso
2021-09-01 23:03   ` Davidlohr Bueso
2021-09-03 10:55   ` Sebastian Andrzej Siewior
2021-09-13 12:20   ` Thomas Gleixner
2021-09-14 10:42     ` Thomas Gleixner
2021-09-16 17:02       ` Sebastian Andrzej Siewior
2021-09-16 17:27         ` Davidlohr Bueso