All of lore.kernel.org
 help / color / mirror / Atom feed
From: Davidlohr Bueso <dave@stgolabs.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Yongji Xie <elohimes@gmail.com>,
	mingo@redhat.com, will.deacon@arm.com,
	linux-kernel@vger.kernel.org, Xie Yongji <xieyongji@baidu.com>,
	zhangyu31@baidu.com, liuqi16@baidu.com, yuanlinsi01@baidu.com,
	nixun@baidu.com, lilin24@baidu.com, longman@redhat.com,
	andrea.parri@amarulasolutions.com
Subject: Re: [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil
Date: Mon, 17 Dec 2018 12:53:10 -0800	[thread overview]
Message-ID: <20181217205310.pvwcryyaqlrzmaex@linux-r8p5> (raw)
In-Reply-To: <20181217113718.GB4900@worktop.programming.kicks-ass.net>

On Mon, 17 Dec 2018, Peter Zijlstra wrote:
>
>I've put some patches here:
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/core
>
>Could you have a look?

So how about the following to reduce some of the performance penalty (at
the cost of more complexity)?

Thanks,
Davidlohr

----------8<-----------------------------------------------------------
[PATCH] sched/wake_q: Reduce reference counting for special users

Some users, specifically futexes and rwsems, required fixes
that allowed the callers to be safe when wakeups occur before
they are expected by wake_up_q(). Such scenarios also play
games and rely on reference counting, and until now were
pivoting on wake_q doing it. With the wake_q_add() call being
moved down, this can no longer be the case. As such we end up
with a double task refcounting overhead; and these callers
care enough about this (being rather core-ish).

This patch introduces a wake_q_add_tasksafe() call that serves
for callers that have already done refcounting and therefore the
task is 'safe' from wake_q point of view (int that it requires
reference throughout the entire queue/wakeup cycle). These
users also need to check the return value of the operation and
do the put() if necessary when the cmpxchg() fails. Regular users
of wake_q_add() that don't care about when the wakeup actually
happens can just ignore the return value.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/sched/wake_q.h |  7 ++++--
 kernel/futex.c               |  4 ++--
 kernel/locking/rwsem-xadd.c  |  7 +++---
 kernel/sched/core.c          | 53 +++++++++++++++++++++++++++++++-------------
 4 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/include/linux/sched/wake_q.h b/include/linux/sched/wake_q.h
index 545f37138057..8c1fc6434c6c 100644
--- a/include/linux/sched/wake_q.h
+++ b/include/linux/sched/wake_q.h
@@ -51,8 +51,11 @@ static inline void wake_q_init(struct wake_q_head *head)
 	head->lastp = &head->first;
 }
 
-extern void wake_q_add(struct wake_q_head *head,
-		       struct task_struct *task);
+extern bool wake_q_add(struct wake_q_head *head,
+		      struct task_struct *task);
+extern bool wake_q_add_tasksafe(struct wake_q_head *head,
+				struct task_struct *task);
+
 extern void wake_up_q(struct wake_q_head *head);
 
 #endif /* _LINUX_SCHED_WAKE_Q_H */
diff --git a/kernel/futex.c b/kernel/futex.c
index d14971f6ed3d..2ff7e811f13b 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1402,8 +1402,8 @@ static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q)
 	 * Queue the task for later wakeup for after we've released
 	 * the hb->lock. wake_q_add() grabs reference to p.
 	 */
-	wake_q_add(wake_q, p);
-	put_task_struct(p);
+	if (!wake_q_add_tasksafe(wake_q, p))
+		put_task_struct(p);
 }
 
 /*
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 50d9af615dc4..dea4dcf9d8f5 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -211,9 +211,10 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		 * Ensure issuing the wakeup (either by us or someone else)
 		 * after setting the reader waiter to nil.
 		 */
-		wake_q_add(wake_q, tsk);
-		/* wake_q_add() already take the task ref */
-		put_task_struct(tsk);
+		if (!wake_q_add_tasksafe(wake_q, tsk)) {
+			/* wake_q_add() already take the task ref */
+			put_task_struct(tsk);
+		}
 	}
 
 	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d740d7a3608d..2c1825fe46e6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -396,19 +396,8 @@ static bool set_nr_if_polling(struct task_struct *p)
 #endif
 #endif
 
-/**
- * wake_q_add() - queue a wakeup for 'later' waking.
- * @head: the wake_q_head to add @task to
- * @task: the task to queue for 'later' wakeup
- *
- * Queue a task for later wakeup, most likely by the wake_up_q() call in the
- * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
- * instantly.
- *
- * This function must be used as-if it were wake_up_process(); IOW the task
- * must be ready to be woken at this location.
- */
-void wake_q_add(struct wake_q_head *head, struct task_struct *task)
+bool __wake_q_add(struct wake_q_head *head,
+		  struct task_struct *task, bool tasksafe)
 {
 	struct wake_q_node *node = &task->wake_q;
 
@@ -422,15 +411,49 @@ void wake_q_add(struct wake_q_head *head, struct task_struct *task)
 	 */
 	smp_mb__before_atomic();
 	if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
-		return;
+		return false;
 
-	get_task_struct(task);
+	if (!tasksafe)
+		get_task_struct(task);
 
 	/*
 	 * The head is context local, there can be no concurrency.
 	 */
 	*head->lastp = node;
 	head->lastp = &node->next;
+	return true;
+}
+
+/**
+ * wake_q_add() - queue a wakeup for 'later' waking.
+ * @head: the wake_q_head to add @task to
+ * @task: the task to queue for 'later' wakeup
+ *
+ * Queue a task for later wakeup, most likely by the wake_up_q() call in the
+ * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
+ * instantly.
+ *
+ * This function must be used as-if it were wake_up_process(); IOW the task
+ * must be ready to be woken at this location.
+ *
+ * Returns whether or not the task was successfully queued for wakeup.
+ * If false, the task is already queued and can happen at any time after
+ * this call.
+ */
+bool wake_q_add(struct wake_q_head *head, struct task_struct *task)
+{
+	return __wake_q_add(head, task, false);
+}
+
+/*
+ * wake_q_add_tasksafe() is the same as the above wake_q_add(), except that
+ * the caller has already done the task reference counting for us. Normally
+ * the 'tasksafe' caller will check the return value and cleanup refcounting
+ * accordingly.
+ */
+bool wake_q_add_tasksafe(struct wake_q_head *head, struct task_struct *task)
+{
+	return __wake_q_add(head, task, true);
 }
 
 void wake_up_q(struct wake_q_head *head)
-- 
2.16.4


  parent reply	other threads:[~2018-12-17 20:53 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-29 12:50 [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil Yongji Xie
2018-11-29 13:12 ` Peter Zijlstra
2018-11-29 13:44   ` Peter Zijlstra
2018-11-29 14:02     ` Yongji Xie
2018-11-29 18:43     ` Davidlohr Bueso
2018-11-29 18:49       ` Waiman Long
2018-11-29 15:21   ` Waiman Long
2018-11-29 15:29     ` Waiman Long
2018-11-29 16:06     ` Peter Zijlstra
2018-11-29 17:02       ` Waiman Long
2018-11-29 17:27         ` Peter Zijlstra
2018-11-29 17:58           ` Waiman Long
2018-11-29 18:13             ` Peter Zijlstra
2018-11-29 18:17               ` Davidlohr Bueso
2018-11-29 18:08           ` Peter Zijlstra
2018-11-29 18:26             ` Waiman Long
2018-11-29 18:31               ` Will Deacon
2018-11-29 18:34                 ` Waiman Long
2018-11-29 22:05                   ` Peter Zijlstra
2018-11-30  9:34                     ` 答复: " Liu,Qi(ACU-T1)
2018-11-30 14:15                       ` Peter Zijlstra
2018-11-29 21:30               ` Davidlohr Bueso
2018-11-29 21:34                 ` Davidlohr Bueso
2018-11-29 22:17                   ` Peter Zijlstra
2018-11-30  9:30                     ` Andrea Parri
2018-12-03  5:31                     ` [PATCH -tip] kernel/sched,wake_q: Branch predict wake_q_add() cmpxchg Davidlohr Bueso
2018-12-03 16:10                       ` Waiman Long
2019-01-21 11:28                       ` [tip:locking/core] sched/wake_q: Add branch prediction hint to " tip-bot for Davidlohr Bueso
2018-12-10 15:12                     ` [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil Yongji Xie
2018-12-17 11:37                       ` Peter Zijlstra
2018-12-17 13:12                         ` Yongji Xie
2019-01-07 14:35                           ` Waiman Long
2019-01-07 15:31                             ` Peter Zijlstra
2019-01-07 15:35                               ` Waiman Long
2018-12-17 20:53                         ` Davidlohr Bueso [this message]
2018-12-18 13:10                           ` Peter Zijlstra
2018-12-18 13:14                             ` Peter Zijlstra
2018-12-18 17:27                               ` Davidlohr Bueso
2018-12-18 18:54                               ` [PATCH v2] sched/wake_q: Reduce reference counting for special users Davidlohr Bueso
2018-12-18 19:17                                 ` Waiman Long
2018-12-18 19:30                                   ` Davidlohr Bueso
2018-12-18 19:39                                     ` Davidlohr Bueso
2018-12-18 19:53                                       ` [PATCH v4] " Davidlohr Bueso
2018-12-18 20:35                                         ` Waiman Long
2019-01-21 16:02                                           ` Davidlohr Bueso
2019-01-22  8:55                                             ` Peter Zijlstra
2019-02-04  8:57                                         ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2019-02-07 19:30                                           ` Davidlohr Bueso
2019-02-12 14:14                                           ` Daniel Vacek
2019-01-21 11:28 ` [tip:locking/core] locking/rwsem: Fix (possible) missed wakeup tip-bot for Xie Yongji

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181217205310.pvwcryyaqlrzmaex@linux-r8p5 \
    --to=dave@stgolabs.net \
    --cc=andrea.parri@amarulasolutions.com \
    --cc=elohimes@gmail.com \
    --cc=lilin24@baidu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=liuqi16@baidu.com \
    --cc=longman@redhat.com \
    --cc=mingo@redhat.com \
    --cc=nixun@baidu.com \
    --cc=peterz@infradead.org \
    --cc=will.deacon@arm.com \
    --cc=xieyongji@baidu.com \
    --cc=yuanlinsi01@baidu.com \
    --cc=zhangyu31@baidu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.