* [PATCH 0/6] V2: Clarify/standardize memory barriers for ipc
@ 2019-10-12  5:49 Manfred Spraul
  2019-10-12  5:49 ` [PATCH 1/6] wake_q: Cleanup + Documentation update Manfred Spraul
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Manfred Spraul @ 2019-10-12  5:49 UTC (permalink / raw)
  To: LKML, Davidlohr Bueso, Waiman Long
  Cc: 1vier1, Andrew Morton, Peter Zijlstra, Jonathan Corbet, Manfred Spraul

Hi,

Updated series, based on input from Davidlohr:

- Mixing WRITE_ONCE() (when not holding a lock) and "normal" writes
  (when holding a lock) makes the code less readable.
  Thus use _ONCE() everywhere, i.e. both WRITE_ONCE() and READ_ONCE().

- According to my understanding, wake_q_add() does not contain a barrier
  that protects the refcount increase. Document that, and add the barrier
  to the ipc code.

- and, based on patch review: The V1 patch for ipc/sem.c is incorrect,
  ->state must be set to "-EINTR", not EINTR.

From V1:

The memory barriers in ipc are not properly documented, and at least
for some architectures insufficient:
Reading xyz->status without holding the lock provides only a control
dependency, thus smp_acquire__after_ctrl_dep() was missing in mqueue.c
and msg.c.
sem.c contained a full smp_mb(), which is not required.
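
For illustration, a minimal sketch of the lockless wait path in question
(simplified, not the literal ipc code):

	/* sleeper, after schedule() returned, without holding the lock: */
	if (READ_ONCE(ewp->state) == STATE_READY) {
		/*
		 * The check above creates only a control dependency;
		 * later loads (e.g. of the message) could otherwise be
		 * speculated before the state check.
		 */
		smp_acquire__after_ctrl_dep();
		return 0;
	}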

Patches:
Patch 1: Document the barrier rules for wake_q_add().

Patch 2: Remove code duplication.
@Davidlohr: There is no "Signed-off-by" in your mail, otherwise I would
list you as author.

Patches 3-5: Update the ipc code, especially add the missing
           smp_acquire__after_ctrl_dep().

Patch 6: Documentation for smp_mb__{before,after}_atomic():
Clarify that they are compatible with all RMW atomic operations,
not just the operations that do not return a value.

Open issues:
- Is my analysis regarding the refcount correct?

- Review other users of wake_q_add().

- More testing. I did some tests, but I doubt that the tests are
  sufficient to expose issues caused by incorrect memory barriers.

- Should I add a "Fixes:" or "Cc:stable"? The issues that I see are
  the missing smp_mb__after_ctrl_dep(), and WRITE_ONCE() vs.
  "ptr = NULL", and a risk regarding the refcount that I can't evaluate.


What do you think?

--
	Manfred


* [PATCH 1/6] wake_q: Cleanup + Documentation update.
  2019-10-12  5:49 [PATCH 0/6] V2: Clarify/standardize memory barriers for ipc Manfred Spraul
@ 2019-10-12  5:49 ` Manfred Spraul
  2019-10-14  6:34   ` Davidlohr Bueso
  2019-10-14 12:04   ` Peter Zijlstra
  2019-10-12  5:49 ` [PATCH 2/6] ipc/mqueue.c: Remove duplicated code Manfred Spraul
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Manfred Spraul @ 2019-10-12  5:49 UTC (permalink / raw)
  To: LKML, Davidlohr Bueso, Waiman Long
  Cc: 1vier1, Andrew Morton, Peter Zijlstra, Jonathan Corbet, Manfred Spraul

1) wake_q_add() contains a memory barrier, and callers such as
ipc/mqueue.c rely on this barrier.
Unfortunately, this is documented in ipc/mqueue.c, and not in the
description of wake_q_add().
Therefore: Update the documentation.
Removing/updating the comment in ipc/mqueue.c will happen with the next
patch in the series.

2) wake_q_add() ends with get_task_struct(), which is an
unordered refcount increase. Add a clear comment that the callers
are responsible for a barrier: most likely spin_unlock() or
smp_store_release() (see the sketch below).

3) wake_up_q() relies on the memory barrier in try_to_wake_up().
Add a comment, to simplify searching.

4) wake_q.next is accessed without synchronization by wake_q_add(),
using cmpxchg_relaxed(), and by wake_up_q().
Therefore: Use WRITE_ONCE() in wake_up_q(), to ensure that the
compiler doesn't perform any tricks.
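
A minimal sketch of what "callers are responsible" means in practice
(the names wake_q, waiter and waiters_lock are only illustrative):

	spin_lock(&waiters_lock);
	wake_q_add(&wake_q, waiter->task);	/* unordered refcount increase */
	/*
	 * Publish the wakeup condition only after the refcount increase:
	 * either rely on the RELEASE of the following spin_unlock(), or
	 * use smp_store_release() if the sleeper may check the condition
	 * without taking waiters_lock.
	 */
	smp_store_release(&waiter->state, STATE_READY);
	spin_unlock(&waiters_lock);
	wake_up_q(&wake_q);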

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
---
 kernel/sched/core.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd05a378631a..60ae574317fd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -440,8 +440,16 @@ static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
  * @task: the task to queue for 'later' wakeup
  *
  * Queue a task for later wakeup, most likely by the wake_up_q() call in the
- * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
- * instantly.
+ * same context, _HOWEVER_ this is not guaranteed. Especially, the wakeup
+ * may happen before the function returns.
+ *
+ * What is guaranteed is that there is a memory barrier before the wakeup,
+ * callers may rely on this barrier.
+ *
+ * On the other hand, the caller must guarantee that @task does not disappear
+ * before wake_q_add() completed. wake_q_add() does not contain any memory
+ * barrier to ensure ordering, thus the caller may need to use
+ * smp_store_release().
  *
  * This function must be used as-if it were wake_up_process(); IOW the task
  * must be ready to be woken at this location.
@@ -486,11 +494,14 @@ void wake_up_q(struct wake_q_head *head)
 		BUG_ON(!task);
 		/* Task can safely be re-inserted now: */
 		node = node->next;
-		task->wake_q.next = NULL;
+
+		WRITE_ONCE(task->wake_q.next, NULL);
 
 		/*
 		 * wake_up_process() executes a full barrier, which pairs with
 		 * the queueing in wake_q_add() so as not to miss wakeups.
+		 * The barrier is the smp_mb__after_spinlock() in
+		 * try_to_wake_up().
 		 */
 		wake_up_process(task);
 		put_task_struct(task);
-- 
2.21.0



* [PATCH 2/6] ipc/mqueue.c: Remove duplicated code
  2019-10-12  5:49 [PATCH 0/6] V2: Clarify/standardize memory barriers for ipc Manfred Spraul
  2019-10-12  5:49 ` [PATCH 1/6] wake_q: Cleanup + Documentation update Manfred Spraul
@ 2019-10-12  5:49 ` Manfred Spraul
  2019-10-14  6:35   ` Davidlohr Bueso
  2019-10-14 12:37   ` Peter Zijlstra
  2019-10-12  5:49 ` [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers Manfred Spraul
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Manfred Spraul @ 2019-10-12  5:49 UTC (permalink / raw)
  To: LKML, Davidlohr Bueso, Waiman Long
  Cc: 1vier1, Andrew Morton, Peter Zijlstra, Jonathan Corbet, Manfred Spraul

Patch entirely from Davidlohr:
pipelined_send() and pipelined_receive() are identical, so merge them.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
---
 ipc/mqueue.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 3d920ff15c80..be48c0ba92f7 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -918,17 +918,12 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
  * The same algorithm is used for senders.
  */
 
-/* pipelined_send() - send a message directly to the task waiting in
- * sys_mq_timedreceive() (without inserting message into a queue).
- */
-static inline void pipelined_send(struct wake_q_head *wake_q,
+static inline void __pipelined_op(struct wake_q_head *wake_q,
 				  struct mqueue_inode_info *info,
-				  struct msg_msg *message,
-				  struct ext_wait_queue *receiver)
+				  struct ext_wait_queue *this)
 {
-	receiver->msg = message;
-	list_del(&receiver->list);
-	wake_q_add(wake_q, receiver->task);
+	list_del(&this->list);
+	wake_q_add(wake_q, this->task);
 	/*
 	 * Rely on the implicit cmpxchg barrier from wake_q_add such
 	 * that we can ensure that updating receiver->state is the last
@@ -937,7 +932,19 @@ static inline void pipelined_send(struct wake_q_head *wake_q,
 	 * yet, at that point we can later have a use-after-free
 	 * condition and bogus wakeup.
 	 */
-	receiver->state = STATE_READY;
+        this->state = STATE_READY;
+}
+
+/* pipelined_send() - send a message directly to the task waiting in
+ * sys_mq_timedreceive() (without inserting message into a queue).
+ */
+static inline void pipelined_send(struct wake_q_head *wake_q,
+				  struct mqueue_inode_info *info,
+				  struct msg_msg *message,
+				  struct ext_wait_queue *receiver)
+{
+	receiver->msg = message;
+	__pipelined_op(wake_q, info, receiver);
 }
 
 /* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
@@ -955,9 +962,7 @@ static inline void pipelined_receive(struct wake_q_head *wake_q,
 	if (msg_insert(sender->msg, info))
 		return;
 
-	list_del(&sender->list);
-	wake_q_add(wake_q, sender->task);
-	sender->state = STATE_READY;
+	__pipelined_op(wake_q, info, sender);
 }
 
 static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
-- 
2.21.0



* [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers
  2019-10-12  5:49 [PATCH 0/6] V2: Clarify/standardize memory barriers for ipc Manfred Spraul
  2019-10-12  5:49 ` [PATCH 1/6] wake_q: Cleanup + Documentation update Manfred Spraul
  2019-10-12  5:49 ` [PATCH 2/6] ipc/mqueue.c: Remove duplicated code Manfred Spraul
@ 2019-10-12  5:49 ` Manfred Spraul
  2019-10-14  6:38   ` Davidlohr Bueso
  2019-10-14 12:59   ` Peter Zijlstra
  2019-10-12  5:49 ` [PATCH 4/6] ipc/msg.c: Update and document " Manfred Spraul
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Manfred Spraul @ 2019-10-12  5:49 UTC (permalink / raw)
  To: LKML, Davidlohr Bueso, Waiman Long
  Cc: 1vier1, Andrew Morton, Peter Zijlstra, Jonathan Corbet, Manfred Spraul

Update and document memory barriers for mqueue.c:
- ewp->state is read without any locks, thus READ_ONCE() is required.

- add smp_acquire__after_ctrl_dep() after the READ_ONCE(); we need
  acquire semantics if the value is STATE_READY (see the pairing sketch
  below).

- add an explicit memory barrier to __pipelined_op(); the
  refcount must have been increased before the updated state becomes
  visible.

- document why __set_current_state() may be used:
  Reading task->state cannot happen before the wake_q_add() call,
  which happens while holding info->lock. Thus the spin_unlock()
  is the RELEASE, and the spin_lock() is the ACQUIRE.

For completeness: there is also a 3 CPU szenario, if the to be woken
up task is already on another wake_q.
Then:
- CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
- CPU2: the spin_lock() of the waker is the ACQUIRE
- CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
- CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE
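
Condensed, the intended pairing looks roughly like this (a sketch of the
pattern only, the real code is in the diff below):

	/* waker: __pipelined_op(), with info->lock held */
	wake_q_add(wake_q, this->task);
	smp_store_release(&this->state, STATE_READY);

	/* sleeper: wq_sleep(), without info->lock */
	if (READ_ONCE(ewp->state) == STATE_READY) {
		smp_acquire__after_ctrl_dep();
		/* all stores before the release above are now visible */
	}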

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
---
 ipc/mqueue.c | 32 +++++++++++++++++++++-----------
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index be48c0ba92f7..b80574822f0a 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -646,18 +646,26 @@ static int wq_sleep(struct mqueue_inode_info *info, int sr,
 	wq_add(info, sr, ewp);
 
 	for (;;) {
+		/* memory barrier not required, we hold info->lock */
 		__set_current_state(TASK_INTERRUPTIBLE);
 
 		spin_unlock(&info->lock);
 		time = schedule_hrtimeout_range_clock(timeout, 0,
 			HRTIMER_MODE_ABS, CLOCK_REALTIME);
 
-		if (ewp->state == STATE_READY) {
+		if (READ_ONCE(ewp->state) == STATE_READY) {
+			/*
+			 * Pairs, together with READ_ONCE(), with
+			 * the barrier in __pipelined_op().
+			 */
+			smp_acquire__after_ctrl_dep();
 			retval = 0;
 			goto out;
 		}
 		spin_lock(&info->lock);
-		if (ewp->state == STATE_READY) {
+
+		/* we hold info->lock, so no memory barrier required */
+		if (READ_ONCE(ewp->state) == STATE_READY) {
 			retval = 0;
 			goto out_unlock;
 		}
@@ -925,14 +933,12 @@ static inline void __pipelined_op(struct wake_q_head *wake_q,
 	list_del(&this->list);
 	wake_q_add(wake_q, this->task);
 	/*
-	 * Rely on the implicit cmpxchg barrier from wake_q_add such
-	 * that we can ensure that updating receiver->state is the last
-	 * write operation: As once set, the receiver can continue,
-	 * and if we don't have the reference count from the wake_q,
-	 * yet, at that point we can later have a use-after-free
-	 * condition and bogus wakeup.
+	 * The barrier is required to ensure that the refcount increase
+	 * inside wake_q_add() is completed before the state is updated.
+	 *
+	 * The barrier pairs with READ_ONCE()+smp_acquire__after_ctrl_dep().
 	 */
-        this->state = STATE_READY;
+        smp_store_release(&this->state, STATE_READY);
 }
 
 /* pipelined_send() - send a message directly to the task waiting in
@@ -1049,7 +1055,9 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
 		} else {
 			wait.task = current;
 			wait.msg = (void *) msg_ptr;
-			wait.state = STATE_NONE;
+
+			/* memory barrier not required, we hold info->lock */
+			WRITE_ONCE(wait.state, STATE_NONE);
 			ret = wq_sleep(info, SEND, timeout, &wait);
 			/*
 			 * wq_sleep must be called with info->lock held, and
@@ -1152,7 +1160,9 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
 			ret = -EAGAIN;
 		} else {
 			wait.task = current;
-			wait.state = STATE_NONE;
+
+			/* memory barrier not required, we hold info->lock */
+			WRITE_ONCE(wait.state, STATE_NONE);
 			ret = wq_sleep(info, RECV, timeout, &wait);
 			msg_ptr = wait.msg;
 		}
-- 
2.21.0



* [PATCH 4/6] ipc/msg.c: Update and document memory barriers.
  2019-10-12  5:49 [PATCH 0/6] V2: Clarify/standardize memory barriers for ipc Manfred Spraul
                   ` (2 preceding siblings ...)
  2019-10-12  5:49 ` [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers Manfred Spraul
@ 2019-10-12  5:49 ` Manfred Spraul
  2019-10-12  5:49 ` [PATCH 5/6] ipc/sem.c: Document and update " Manfred Spraul
  2019-10-12  5:49 ` [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg() Manfred Spraul
  5 siblings, 0 replies; 23+ messages in thread
From: Manfred Spraul @ 2019-10-12  5:49 UTC (permalink / raw)
  To: LKML, Davidlohr Bueso, Waiman Long
  Cc: 1vier1, Andrew Morton, Peter Zijlstra, Jonathan Corbet, Manfred Spraul

Transfer findings from ipc/mqueue.c:
- A control barrier was missing for the lockless receive case.
  So in theory, not yet initialized data may have been copied
  to user space - obviously only for architectures where
  control barriers are not NOPs.

- use smp_store_release(). In theory, the refcount
  may have been decreased to 0 already when wake_q_add()
  tries to get a reference.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
---
 ipc/msg.c | 44 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 6 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 8dec945fa030..e6b20a7e6341 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -184,6 +184,10 @@ static inline void ss_add(struct msg_queue *msq,
 {
 	mss->tsk = current;
 	mss->msgsz = msgsz;
+	/*
+	 * No memory barrier required: we did ipc_lock_object(),
+	 * and the waker obtains that lock before calling wake_q_add().
+	 */
 	__set_current_state(TASK_INTERRUPTIBLE);
 	list_add_tail(&mss->list, &msq->q_senders);
 }
@@ -238,7 +242,14 @@ static void expunge_all(struct msg_queue *msq, int res,
 
 	list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) {
 		wake_q_add(wake_q, msr->r_tsk);
-		WRITE_ONCE(msr->r_msg, ERR_PTR(res));
+
+		/*
+		 * The barrier is required to ensure that the refcount increase
+		 * inside wake_q_add() is completed before the state is updated.
+		 *
+		 * The barrier pairs with READ_ONCE()+smp_acquire__after_ctrl_dep().
+		 */
+		smp_store_release(&msr->r_msg, ERR_PTR(res));
 	}
 }
 
@@ -798,13 +809,17 @@ static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg,
 			list_del(&msr->r_list);
 			if (msr->r_maxsize < msg->m_ts) {
 				wake_q_add(wake_q, msr->r_tsk);
-				WRITE_ONCE(msr->r_msg, ERR_PTR(-E2BIG));
+
+				/* See expunge_all regarding memory barrier */
+				smp_store_release(&msr->r_msg, ERR_PTR(-E2BIG));
 			} else {
 				ipc_update_pid(&msq->q_lrpid, task_pid(msr->r_tsk));
 				msq->q_rtime = ktime_get_real_seconds();
 
 				wake_q_add(wake_q, msr->r_tsk);
-				WRITE_ONCE(msr->r_msg, msg);
+
+				/* See expunge_all regarding memory barrier */
+				smp_store_release(&msr->r_msg, msg);
 				return 1;
 			}
 		}
@@ -1154,7 +1169,11 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
 			msr_d.r_maxsize = INT_MAX;
 		else
 			msr_d.r_maxsize = bufsz;
-		msr_d.r_msg = ERR_PTR(-EAGAIN);
+
+		/* memory barrier not required due to ipc_lock_object() */
+		WRITE_ONCE(msr_d.r_msg, ERR_PTR(-EAGAIN));
+
+		/* memory barrier not required, we own ipc_lock_object() */
 		__set_current_state(TASK_INTERRUPTIBLE);
 
 		ipc_unlock_object(&msq->q_perm);
@@ -1183,8 +1202,21 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
 		 * signal) it will either see the message and continue ...
 		 */
 		msg = READ_ONCE(msr_d.r_msg);
-		if (msg != ERR_PTR(-EAGAIN))
+		if (msg != ERR_PTR(-EAGAIN)) {
+			/*
+			 * Memory barrier for msr_d.r_msg
+			 * The smp_acquire__after_ctrl_dep(), together with the
+			 * READ_ONCE() above pairs with the barrier inside
+			 * wake_q_add().
+			 * The barrier protects the accesses to the message in
+			 * do_msg_fill(). In addition, the barrier protects user
+			 * space, too: User space may assume that all data from
+			 * the CPU that sent the message is visible.
+			 */
+			smp_acquire__after_ctrl_dep();
+
 			goto out_unlock1;
+		}
 
 		 /*
 		  * ... or see -EAGAIN, acquire the lock to check the message
@@ -1192,7 +1224,7 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
 		  */
 		ipc_lock_object(&msq->q_perm);
 
-		msg = msr_d.r_msg;
+		msg = READ_ONCE(msr_d.r_msg);
 		if (msg != ERR_PTR(-EAGAIN))
 			goto out_unlock0;
 
-- 
2.21.0



* [PATCH 5/6] ipc/sem.c: Document and update memory barriers
  2019-10-12  5:49 [PATCH 0/6] V2: Clarify/standardize memory barriers for ipc Manfred Spraul
                   ` (3 preceding siblings ...)
  2019-10-12  5:49 ` [PATCH 4/6] ipc/msg.c: Update and document " Manfred Spraul
@ 2019-10-12  5:49 ` Manfred Spraul
  2019-10-12  5:49 ` [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg() Manfred Spraul
  5 siblings, 0 replies; 23+ messages in thread
From: Manfred Spraul @ 2019-10-12  5:49 UTC (permalink / raw)
  To: LKML, Davidlohr Bueso, Waiman Long
  Cc: 1vier1, Andrew Morton, Peter Zijlstra, Jonathan Corbet, Manfred Spraul

The patch documents and updates the memory barriers in ipc/sem.c:
- Add smp_store_release() to wake_up_sem_queue_prepare() and
  document why it is needed.

- Read q->status using READ_ONCE()+smp_acquire__after_ctrl_dep(),
  as the pair for the barrier inside wake_up_sem_queue_prepare()
  (see the sketch below).

- Add comments to all barriers, and mention the rules in the block
  regarding locking.
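
A condensed sketch of the pairing for the lockless read of q->status
(case a in the comments below; illustrative only, the real code is in
the diff):

	/* waker: wake_up_sem_queue_prepare(), with sem_lock() held */
	wake_q_add(wake_q, q->sleeper);
	smp_store_release(&q->status, error);

	/* sleeper: do_semtimedop(), lockless exit path (case a) */
	error = READ_ONCE(queue.status);
	if (error != -EINTR) {
		smp_acquire__after_ctrl_dep();
		goto out_free;
	}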

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
---
 ipc/sem.c | 63 ++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 51 insertions(+), 12 deletions(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index ec97a7072413..c6c5954a2030 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -205,7 +205,9 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
  *
  * Memory ordering:
  * Most ordering is enforced by using spin_lock() and spin_unlock().
- * The special case is use_global_lock:
+ *
+ * Exceptions:
+ * 1) use_global_lock:
  * Setting it from non-zero to 0 is a RELEASE, this is ensured by
  * using smp_store_release().
  * Testing if it is non-zero is an ACQUIRE, this is ensured by using
@@ -214,6 +216,24 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
  * this smp_load_acquire(), this is guaranteed because the smp_load_acquire()
  * is inside a spin_lock() and after a write from 0 to non-zero a
  * spin_lock()+spin_unlock() is done.
+ *
+ * 2) queue.status:
+ * Initialization is done while holding sem_lock(), so no further barrier is
+ * required.
+ * Setting it to a result code is a RELEASE, this is ensured both by a
+ * smp_store_release() (for case a) and by holding sem_lock()
+ * (for case b).
+ * The ACQUIRE when reading the result code without holding sem_lock() is
+ * achieved by using READ_ONCE() + smp_acquire__after_ctrl_dep()
+ * (case a above).
+ * Reading the result code while holding sem_lock() needs no further
+ * barriers, the locks inside sem_lock() enforce ordering (case b above).
+ *
+ * 3) current->state:
+ * current->state is set to TASK_INTERRUPTIBLE while holding sem_lock().
+ * The wakeup is handled using the wake_q infrastructure. wake_q wakeups may
+ * happen immediately after calling wake_q_add. As wake_q_add() is called
+ * when holding sem_lock(), no further barriers are required.
  */
 
 #define sc_semmsl	sem_ctls[0]
@@ -766,15 +786,24 @@ static int perform_atomic_semop(struct sem_array *sma, struct sem_queue *q)
 static inline void wake_up_sem_queue_prepare(struct sem_queue *q, int error,
 					     struct wake_q_head *wake_q)
 {
+	/*
+	 * When the wakeup is performed, q->sleeper->state is read and later
+	 * set to TASK_RUNNING. This may happen at any time, even before
+	 * wake_q_add() returns. Memory ordering for q->sleeper->state is
+	 * enforced by sem_lock(): we own sem_lock now (that was the ACQUIRE),
+	 * and q->sleeper wrote q->sleeper->state before calling sem_unlock()
+	 * (->RELEASE).
+	 */
 	wake_q_add(wake_q, q->sleeper);
 	/*
-	 * Rely on the above implicit barrier, such that we can
-	 * ensure that we hold reference to the task before setting
-	 * q->status. Otherwise we could race with do_exit if the
-	 * task is awoken by an external event before calling
-	 * wake_up_process().
+	 * Here, we need a barrier to protect the refcount increase inside
+	 * wake_q_add().
+	 * case a: The barrier inside wake_q_add() pairs with
+	 *         READ_ONCE(q->status) + smp_acquire__after_ctrl_dep() in
+	 *         do_semtimedop().
+	 * case b: nothing, ordering is enforced by the locks in sem_lock().
 	 */
-	WRITE_ONCE(q->status, error);
+	smp_store_release(&q->status, error);
 }
 
 static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
@@ -2148,9 +2177,11 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 	}
 
 	do {
+		/* memory ordering ensured by the lock in sem_lock() */
 		WRITE_ONCE(queue.status, -EINTR);
 		queue.sleeper = current;
 
+		/* memory ordering is ensured by the lock in sem_lock() */
 		__set_current_state(TASK_INTERRUPTIBLE);
 		sem_unlock(sma, locknum);
 		rcu_read_unlock();
@@ -2174,12 +2205,16 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		error = READ_ONCE(queue.status);
 		if (error != -EINTR) {
 			/*
-			 * User space could assume that semop() is a memory
-			 * barrier: Without the mb(), the cpu could
-			 * speculatively read in userspace stale data that was
-			 * overwritten by the previous owner of the semaphore.
+			 * Memory barrier for queue.status, case a):
+			 * The smp_acquire__after_ctrl_dep(), together with the
+			 * READ_ONCE() above pairs with the barrier inside
+			 * wake_up_sem_queue_prepare().
+			 * The barrier protects user space, too: User space may
+			 * assume that all data from the CPU that did the wakeup
+			 * semop() is visible on the wakee CPU when the sleeping
+			 * semop() returns.
 			 */
-			smp_mb();
+			smp_acquire__after_ctrl_dep();
 			goto out_free;
 		}
 
@@ -2189,6 +2224,10 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
 		if (!ipc_valid_object(&sma->sem_perm))
 			goto out_unlock_free;
 
+		/*
+		 * No necessity for any barrier:
+		 * We are protected by sem_lock() (case b)
+		 */
 		error = READ_ONCE(queue.status);
 
 		/*
-- 
2.21.0



* [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()
  2019-10-12  5:49 [PATCH 0/6] V2: Clarify/standardize memory barriers for ipc Manfred Spraul
                   ` (4 preceding siblings ...)
  2019-10-12  5:49 ` [PATCH 5/6] ipc/sem.c: Document and update " Manfred Spraul
@ 2019-10-12  5:49 ` Manfred Spraul
  2019-10-14 13:03   ` Peter Zijlstra
  2019-10-15  1:26   ` Davidlohr Bueso
  5 siblings, 2 replies; 23+ messages in thread
From: Manfred Spraul @ 2019-10-12  5:49 UTC (permalink / raw)
  To: LKML, Davidlohr Bueso, Waiman Long
  Cc: 1vier1, Andrew Morton, Peter Zijlstra, Jonathan Corbet, Manfred Spraul

The documentation in memory-barriers.txt claims that
smp_mb__{before,after}_atomic() are for atomic ops that do not return a
value.

This is misleading and doesn't match the example in atomic_t.txt,
and e.g. smp_mb__before_atomic() can be, and is, used together with
cmpxchg_relaxed() in the wake_q code (see the example below).

The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
RMW atomic operation to a full memory barrier.
The return code of the atomic operation has no impact, so all of the
following examples are valid:

1)
	smp_mb__before_atomic();
	atomic_add();

2)
	smp_mb__before_atomic();
	atomic_xchg_relaxed();

3)
	smp_mb__before_atomic();
	atomic_fetch_add_relaxed();

Invalid would be:
	smp_mb__before_atomic();
	atomic_set();
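
For a concrete in-tree instance of the valid pattern, the wake_q code
referenced above does roughly the following (simplified from
__wake_q_add() in kernel/sched/core.c):

	smp_mb__before_atomic();
	if (cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL))
		return false;
	/* the barrier applies even when the cmpxchg fails */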

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 Documentation/memory-barriers.txt | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..52076b057400 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
  (*) smp_mb__before_atomic();
  (*) smp_mb__after_atomic();
 
-     These are for use with atomic (such as add, subtract, increment and
-     decrement) functions that don't return a value, especially when used for
-     reference counting.  These functions do not imply memory barriers.
+     These are for use with atomic RMW functions (such as add, subtract,
+     increment, decrement, failed conditional operations, ...) that do
+     not imply memory barriers, but where the code needs a memory barrier,
+     for example when used for reference counting.
 
-     These are also used for atomic bitop functions that do not return a
-     value (such as set_bit and clear_bit).
+     These are also used for atomic RMW bitop functions that do imply a full
+     memory barrier (such as set_bit and clear_bit).
 
      As an example, consider a piece of code that marks an object as being dead
      and then decrements the object's reference count:
-- 
2.21.0



* Re: [PATCH 1/6] wake_q: Cleanup + Documentation update.
  2019-10-12  5:49 ` [PATCH 1/6] wake_q: Cleanup + Documentation update Manfred Spraul
@ 2019-10-14  6:34   ` Davidlohr Bueso
  2019-10-14 12:04   ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2019-10-14  6:34 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Waiman Long, 1vier1, Andrew Morton, Peter Zijlstra,
	Jonathan Corbet

On Sat, 12 Oct 2019, Manfred Spraul wrote:

>1) wake_q_add() contains a memory barrier, and callers such as
>ipc/mqueue.c rely on this barrier.
>Unfortunately, this is documented in ipc/mqueue.c, and not in the
>description of wake_q_add().
>Therefore: Update the documentation.
>Removing/updating ipc/mqueue.c will happen with the next patch in the
>series.
>
>2) wake_q_add() ends with get_task_struct(), which is an
>unordered refcount increase. Add a clear comment that the callers
>are responsible for a barrier: most likely spin_unlock() or
>smp_store_release().
>
>3) wake_up_q() relies on the memory barrier in try_to_wake_up().
>Add a comment, to simplify searching.
>
>4) wake_q.next is accessed without synchronization by wake_q_add(),
>using cmpxchg_relaxed(), and by wake_up_q().
>Therefore: Use WRITE_ONCE in wake_up_q(), to ensure that the
>compiler doesn't perform any tricks.
>
>Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
>Cc: Davidlohr Bueso <dave@stgolabs.net>
>---
> kernel/sched/core.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
>diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>index dd05a378631a..60ae574317fd 100644
>--- a/kernel/sched/core.c
>+++ b/kernel/sched/core.c
>@@ -440,8 +440,16 @@ static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
>  * @task: the task to queue for 'later' wakeup
>  *
>  * Queue a task for later wakeup, most likely by the wake_up_q() call in the
>- * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
>- * instantly.
>+ * same context, _HOWEVER_ this is not guaranteed. Especially, the wakeup
>+ * may happen before the function returns.
>+ *
>+ * What is guaranteed is that there is a memory barrier before the wakeup,
>+ * callers may rely on this barrier.
>+ *
>+ * On the other hand, the caller must guarantee that @task does not disappear
>+ * before wake_q_add() completed. wake_q_add() does not contain any memory
>+ * barrier to ensure ordering, thus the caller may need to use
>+ * smp_store_release().

This is why we have wake_q_add_safe(). I think this last paragraph is unnecessary
and confusing.

Thanks,
Davidlohr

>  *
>  * This function must be used as-if it were wake_up_process(); IOW the task
>  * must be ready to be woken at this location.
>@@ -486,11 +494,14 @@ void wake_up_q(struct wake_q_head *head)
> 		BUG_ON(!task);
> 		/* Task can safely be re-inserted now: */
> 		node = node->next;
>-		task->wake_q.next = NULL;
>+
>+		WRITE_ONCE(task->wake_q.next, NULL);
>
> 		/*
> 		 * wake_up_process() executes a full barrier, which pairs with
> 		 * the queueing in wake_q_add() so as not to miss wakeups.
>+		 * The barrier is the smp_mb__after_spinlock() in
>+		 * try_to_wake_up().
> 		 */
> 		wake_up_process(task);
> 		put_task_struct(task);
>-- 
>2.21.0
>


* Re: [PATCH 2/6] ipc/mqueue.c: Remove duplicated code
  2019-10-12  5:49 ` [PATCH 2/6] ipc/mqueue.c: Remove duplicated code Manfred Spraul
@ 2019-10-14  6:35   ` Davidlohr Bueso
  2019-10-14 12:37   ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2019-10-14  6:35 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Waiman Long, 1vier1, Andrew Morton, Peter Zijlstra,
	Jonathan Corbet

On Sat, 12 Oct 2019, Manfred Spraul wrote:

>Patch entirely from Davidlohr:
>pipelined_send() and pipelined_receive() are identical, so merge them.
>

Sorry, yeah feel free to add my:

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

>Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
>Cc: Davidlohr Bueso <dave@stgolabs.net>
>---
> ipc/mqueue.c | 31 ++++++++++++++++++-------------
> 1 file changed, 18 insertions(+), 13 deletions(-)
>
>diff --git a/ipc/mqueue.c b/ipc/mqueue.c
>index 3d920ff15c80..be48c0ba92f7 100644
>--- a/ipc/mqueue.c
>+++ b/ipc/mqueue.c
>@@ -918,17 +918,12 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
>  * The same algorithm is used for senders.
>  */
>
>-/* pipelined_send() - send a message directly to the task waiting in
>- * sys_mq_timedreceive() (without inserting message into a queue).
>- */
>-static inline void pipelined_send(struct wake_q_head *wake_q,
>+static inline void __pipelined_op(struct wake_q_head *wake_q,
> 				  struct mqueue_inode_info *info,
>-				  struct msg_msg *message,
>-				  struct ext_wait_queue *receiver)
>+				  struct ext_wait_queue *this)
> {
>-	receiver->msg = message;
>-	list_del(&receiver->list);
>-	wake_q_add(wake_q, receiver->task);
>+	list_del(&this->list);
>+	wake_q_add(wake_q, this->task);
> 	/*
> 	 * Rely on the implicit cmpxchg barrier from wake_q_add such
> 	 * that we can ensure that updating receiver->state is the last
>@@ -937,7 +932,19 @@ static inline void pipelined_send(struct wake_q_head *wake_q,
> 	 * yet, at that point we can later have a use-after-free
> 	 * condition and bogus wakeup.
> 	 */
>-	receiver->state = STATE_READY;
>+        this->state = STATE_READY;
>+}
>+
>+/* pipelined_send() - send a message directly to the task waiting in
>+ * sys_mq_timedreceive() (without inserting message into a queue).
>+ */
>+static inline void pipelined_send(struct wake_q_head *wake_q,
>+				  struct mqueue_inode_info *info,
>+				  struct msg_msg *message,
>+				  struct ext_wait_queue *receiver)
>+{
>+	receiver->msg = message;
>+	__pipelined_op(wake_q, info, receiver);
> }
>
> /* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
>@@ -955,9 +962,7 @@ static inline void pipelined_receive(struct wake_q_head *wake_q,
> 	if (msg_insert(sender->msg, info))
> 		return;
>
>-	list_del(&sender->list);
>-	wake_q_add(wake_q, sender->task);
>-	sender->state = STATE_READY;
>+	__pipelined_op(wake_q, info, sender);
> }
>
> static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
>-- 
>2.21.0
>


* Re: [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers
  2019-10-12  5:49 ` [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers Manfred Spraul
@ 2019-10-14  6:38   ` Davidlohr Bueso
  2019-10-14 12:59   ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2019-10-14  6:38 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Waiman Long, 1vier1, Andrew Morton, Peter Zijlstra,
	Jonathan Corbet

On Sat, 12 Oct 2019, Manfred Spraul wrote:

>Update and document memory barriers for mqueue.c:
>- ewp->state is read without any locks, thus READ_ONCE is required.
>
>- add smp_acquire__after_ctrl_dep() after the READ_ONCE, we need
>  acquire semantics if the value is STATE_READY.
>
>- add an explicit memory barrier to __pipelined_op(), the
>  refcount must have been increased before the updated state becomes
>  visible
>
>- document why __set_current_state() may be used:
>  Reading task->state cannot happen before the wake_q_add() call,
>  which happens while holding info->lock. Thus the spin_unlock()
>  is the RELEASE, and the spin_lock() is the ACQUIRE.
>
>For completeness: there is also a 3 CPU szenario, if the to be woken
     		   	    	     	 ^^^ scenario

>up task is already on another wake_q.
>Then:
>- CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
>- CPU2: the spin_lock() of the waker is the ACQUIRE
>- CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
>- CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE
>
>Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
>Cc: Waiman Long <longman@redhat.com>
>Cc: Davidlohr Bueso <dave@stgolabs.net>

Without considering the smp_store_release() in __pipelined_op(), feel
free to add my:

Reviewed-by: Davidlohr Bueso <dbueso@suse.de>

>---
> ipc/mqueue.c | 32 +++++++++++++++++++++-----------
> 1 file changed, 21 insertions(+), 11 deletions(-)
>
>diff --git a/ipc/mqueue.c b/ipc/mqueue.c
>index be48c0ba92f7..b80574822f0a 100644
>--- a/ipc/mqueue.c
>+++ b/ipc/mqueue.c
>@@ -646,18 +646,26 @@ static int wq_sleep(struct mqueue_inode_info *info, int sr,
> 	wq_add(info, sr, ewp);
>
> 	for (;;) {
>+		/* memory barrier not required, we hold info->lock */
> 		__set_current_state(TASK_INTERRUPTIBLE);
>
> 		spin_unlock(&info->lock);
> 		time = schedule_hrtimeout_range_clock(timeout, 0,
> 			HRTIMER_MODE_ABS, CLOCK_REALTIME);
>
>-		if (ewp->state == STATE_READY) {
>+		if (READ_ONCE(ewp->state) == STATE_READY) {
>+			/*
>+			 * Pairs, together with READ_ONCE(), with
>+			 * the barrier in __pipelined_op().
>+			 */
>+			smp_acquire__after_ctrl_dep();
> 			retval = 0;
> 			goto out;
> 		}
> 		spin_lock(&info->lock);
>-		if (ewp->state == STATE_READY) {
>+
>+		/* we hold info->lock, so no memory barrier required */
>+		if (READ_ONCE(ewp->state) == STATE_READY) {
> 			retval = 0;
> 			goto out_unlock;
> 		}
>@@ -925,14 +933,12 @@ static inline void __pipelined_op(struct wake_q_head *wake_q,
> 	list_del(&this->list);
> 	wake_q_add(wake_q, this->task);
> 	/*
>-	 * Rely on the implicit cmpxchg barrier from wake_q_add such
>-	 * that we can ensure that updating receiver->state is the last
>-	 * write operation: As once set, the receiver can continue,
>-	 * and if we don't have the reference count from the wake_q,
>-	 * yet, at that point we can later have a use-after-free
>-	 * condition and bogus wakeup.
>+	 * The barrier is required to ensure that the refcount increase
>+	 * inside wake_q_add() is completed before the state is updated.
>+	 *
>+	 * The barrier pairs with READ_ONCE()+smp_acquire__after_ctrl_dep().
> 	 */
>-        this->state = STATE_READY;
>+        smp_store_release(&this->state, STATE_READY);
> }
>
> /* pipelined_send() - send a message directly to the task waiting in
>@@ -1049,7 +1055,9 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
> 		} else {
> 			wait.task = current;
> 			wait.msg = (void *) msg_ptr;
>-			wait.state = STATE_NONE;
>+
>+			/* memory barrier not required, we hold info->lock */
>+			WRITE_ONCE(wait.state, STATE_NONE);
> 			ret = wq_sleep(info, SEND, timeout, &wait);
> 			/*
> 			 * wq_sleep must be called with info->lock held, and
>@@ -1152,7 +1160,9 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
> 			ret = -EAGAIN;
> 		} else {
> 			wait.task = current;
>-			wait.state = STATE_NONE;
>+
>+			/* memory barrier not required, we hold info->lock */
>+			WRITE_ONCE(wait.state, STATE_NONE);
> 			ret = wq_sleep(info, RECV, timeout, &wait);
> 			msg_ptr = wait.msg;
> 		}
>-- 
>2.21.0
>


* Re: [PATCH 1/6] wake_q: Cleanup + Documentation update.
  2019-10-12  5:49 ` [PATCH 1/6] wake_q: Cleanup + Documentation update Manfred Spraul
  2019-10-14  6:34   ` Davidlohr Bueso
@ 2019-10-14 12:04   ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-14 12:04 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet

On Sat, Oct 12, 2019 at 07:49:53AM +0200, Manfred Spraul wrote:
> 1) wake_q_add() contains a memory barrier, and callers such as
> ipc/mqueue.c rely on this barrier.
> Unfortunately, this is documented in ipc/mqueue.c, and not in the
> description of wake_q_add().
> Therefore: Update the documentation.
> Removing/updating ipc/mqueue.c will happen with the next patch in the
> series.
> 
> 2) wake_q_add() ends with get_task_struct(), which is an
> unordered refcount increase. Add a clear comment that the callers
> are responsible for a barrier: most likely spin_unlock() or
> smp_store_release().
> 
> 3) wake_up_q() relies on the memory barrier in try_to_wake_up().
> Add a comment, to simplify searching.
> 
> 4) wake_q.next is accessed without synchronization by wake_q_add(),
> using cmpxchg_relaxed(), and by wake_up_q().
> Therefore: Use WRITE_ONCE in wake_up_q(), to ensure that the
> compiler doesn't perform any tricks.
> 
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> ---
>  kernel/sched/core.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index dd05a378631a..60ae574317fd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -440,8 +440,16 @@ static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
>   * @task: the task to queue for 'later' wakeup
>   *
>   * Queue a task for later wakeup, most likely by the wake_up_q() call in the
> - * same context, _HOWEVER_ this is not guaranteed, the wakeup can come
> - * instantly.
> + * same context, _HOWEVER_ this is not guaranteed. Especially, the wakeup
> + * may happen before the function returns.
> + *
> + * What is guaranteed is that there is a memory barrier before the wakeup,
> + * callers may rely on this barrier.

I would like to stress that this (wake_q_add) provides the same ordering
guarantees as a 'normal' wakeup.

> + *
> + * On the other hand, the caller must guarantee that @task does not disappear
> + * before wake_q_add() completed. wake_q_add() does not contain any memory
> + * barrier to ensure ordering, thus the caller may need to use
> + * smp_store_release().

Like Davidlohr, I'm slightly puzzled by this last paragraph.

>   *
>   * This function must be used as-if it were wake_up_process(); IOW the task
>   * must be ready to be woken at this location.
> @@ -486,11 +494,14 @@ void wake_up_q(struct wake_q_head *head)
>  		BUG_ON(!task);
>  		/* Task can safely be re-inserted now: */
>  		node = node->next;
> -		task->wake_q.next = NULL;
> +
> +		WRITE_ONCE(task->wake_q.next, NULL);
>  
>  		/*
>  		 * wake_up_process() executes a full barrier, which pairs with
>  		 * the queueing in wake_q_add() so as not to miss wakeups.
> +		 * The barrier is the smp_mb__after_spinlock() in
> +		 * try_to_wake_up().

We already have wake_up_process() documented as implying this barrier;
why do we want to mention the specifics here? That is, have a look at
the comments before try_to_wake_up().

>  		 */
>  		wake_up_process(task);
>  		put_task_struct(task);
> -- 
> 2.21.0
> 


* Re: [PATCH 2/6] ipc/mqueue.c: Remove duplicated code
  2019-10-12  5:49 ` [PATCH 2/6] ipc/mqueue.c: Remove duplicated code Manfred Spraul
  2019-10-14  6:35   ` Davidlohr Bueso
@ 2019-10-14 12:37   ` Peter Zijlstra
  1 sibling, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-14 12:37 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet

On Sat, Oct 12, 2019 at 07:49:54AM +0200, Manfred Spraul wrote:
> +static inline void __pipelined_op(struct wake_q_head *wake_q,
>  				  struct mqueue_inode_info *info,
> +				  struct ext_wait_queue *this)
>  {
> +	list_del(&this->list);
> +	wake_q_add(wake_q, this->task);
>  	/*
>  	 * Rely on the implicit cmpxchg barrier from wake_q_add such
>  	 * that we can ensure that updating receiver->state is the last
> @@ -937,7 +932,19 @@ static inline void pipelined_send(struct wake_q_head *wake_q,
>  	 * yet, at that point we can later have a use-after-free
>  	 * condition and bogus wakeup.
>  	 */
> +        this->state = STATE_READY;

  ^^^^^^^^^^ whitespace damage

> +}


* Re: [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers
  2019-10-12  5:49 ` [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers Manfred Spraul
  2019-10-14  6:38   ` Davidlohr Bueso
@ 2019-10-14 12:59   ` Peter Zijlstra
  2019-10-14 13:58     ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-14 12:59 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet

On Sat, Oct 12, 2019 at 07:49:55AM +0200, Manfred Spraul wrote:

>  	for (;;) {
> +		/* memory barrier not required, we hold info->lock */
>  		__set_current_state(TASK_INTERRUPTIBLE);
>  
>  		spin_unlock(&info->lock);
>  		time = schedule_hrtimeout_range_clock(timeout, 0,
>  			HRTIMER_MODE_ABS, CLOCK_REALTIME);
>  
> +		if (READ_ONCE(ewp->state) == STATE_READY) {
> +			/*
> +			 * Pairs, together with READ_ONCE(), with
> +			 * the barrier in __pipelined_op().
> +			 */
> +			smp_acquire__after_ctrl_dep();
>  			retval = 0;
>  			goto out;
>  		}
>  		spin_lock(&info->lock);
> +
> +		/* we hold info->lock, so no memory barrier required */
> +		if (READ_ONCE(ewp->state) == STATE_READY) {
>  			retval = 0;
>  			goto out_unlock;
>  		}
> @@ -925,14 +933,12 @@ static inline void __pipelined_op(struct wake_q_head *wake_q,
>  	list_del(&this->list);
>  	wake_q_add(wake_q, this->task);
>  	/*
> +	 * The barrier is required to ensure that the refcount increase
> +	 * inside wake_q_add() is completed before the state is updated.

fails to explain *why* this is important.

> +	 *
> +	 * The barrier pairs with READ_ONCE()+smp_acquire__after_ctrl_dep().
>  	 */
> +        smp_store_release(&this->state, STATE_READY);

You retained the whitespace damage.

And I'm terribly confused by this code, probably due to the lack of
'why' as per the above. What is this trying to do?

Are we worried about something like:

	A			B				C


				wq_sleep()
				  schedule_...();

								/* spuriuos wakeup */
								wake_up_process(B)

	wake_q_add(A)
	  if (cmpxchg()) // success

	->state = STATE_READY (reordered)

				  if (READ_ONCE() == STATE_READY)
				    goto out;

				exit();


	    get_task_struct() // UaF


Can we put the exact and full race in the comment please?


* Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()
  2019-10-12  5:49 ` [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg() Manfred Spraul
@ 2019-10-14 13:03   ` Peter Zijlstra
  2019-10-14 17:49     ` Manfred Spraul
  2019-10-15  1:26   ` Davidlohr Bueso
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-14 13:03 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet

On Sat, Oct 12, 2019 at 07:49:58AM +0200, Manfred Spraul wrote:
> The documentation in memory-barriers.txt claims that
> smp_mb__{before,after}_atomic() are for atomic ops that do not return a
> value.
> 
> This is misleading and doesn't match the example in atomic_t.txt,
> and e.g. smp_mb__before_atomic() may and is used together with
> cmpxchg_relaxed() in the wake_q code.
> 
> The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
> RMW atomic operation to a full memory barrier.
> The return code of the atomic operation has no impact, so all of the
> following examples are valid:

The return value of atomic ops is relevant insofar as
(traditionally) all value-returning atomic ops already implied full
barriers. That of course changed when we added the
_release/_acquire/_relaxed variants.

> 
> 1)
> 	smp_mb__before_atomic();
> 	atomic_add();
> 
> 2)
> 	smp_mb__before_atomic();
> 	atomic_xchg_relaxed();
> 
> 3)
> 	smp_mb__before_atomic();
> 	atomic_fetch_add_relaxed();
> 
> Invalid would be:
> 	smp_mb__before_atomic();
> 	atomic_set();
> 
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  Documentation/memory-barriers.txt | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1adbb8a371c7..52076b057400 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
>   (*) smp_mb__before_atomic();
>   (*) smp_mb__after_atomic();
>  
> -     These are for use with atomic (such as add, subtract, increment and
> -     decrement) functions that don't return a value, especially when used for
> -     reference counting.  These functions do not imply memory barriers.
> +     These are for use with atomic RMW functions (such as add, subtract,
> +     increment, decrement, failed conditional operations, ...) that do
> +     not imply memory barriers, but where the code needs a memory barrier,
> +     for example when used for reference counting.
>  
> -     These are also used for atomic bitop functions that do not return a
> -     value (such as set_bit and clear_bit).
> +     These are also used for atomic RMW bitop functions that do imply a full

s/do/do not/ ?

> +     memory barrier (such as set_bit and clear_bit).




* Re: [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers
  2019-10-14 12:59   ` Peter Zijlstra
@ 2019-10-14 13:58     ` Peter Zijlstra
  2019-10-14 18:06       ` Manfred Spraul
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-14 13:58 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet

On Mon, Oct 14, 2019 at 02:59:11PM +0200, Peter Zijlstra wrote:
> On Sat, Oct 12, 2019 at 07:49:55AM +0200, Manfred Spraul wrote:
> 
> >  	for (;;) {
> > +		/* memory barrier not required, we hold info->lock */
> >  		__set_current_state(TASK_INTERRUPTIBLE);
> >  
> >  		spin_unlock(&info->lock);
> >  		time = schedule_hrtimeout_range_clock(timeout, 0,
> >  			HRTIMER_MODE_ABS, CLOCK_REALTIME);
> >  
> > +		if (READ_ONCE(ewp->state) == STATE_READY) {
> > +			/*
> > +			 * Pairs, together with READ_ONCE(), with
> > +			 * the barrier in __pipelined_op().
> > +			 */
> > +			smp_acquire__after_ctrl_dep();
> >  			retval = 0;
> >  			goto out;
> >  		}
> >  		spin_lock(&info->lock);
> > +
> > +		/* we hold info->lock, so no memory barrier required */
> > +		if (READ_ONCE(ewp->state) == STATE_READY) {
> >  			retval = 0;
> >  			goto out_unlock;
> >  		}
> > @@ -925,14 +933,12 @@ static inline void __pipelined_op(struct wake_q_head *wake_q,
> >  	list_del(&this->list);
> >  	wake_q_add(wake_q, this->task);
> >  	/*
> > +	 * The barrier is required to ensure that the refcount increase
> > +	 * inside wake_q_add() is completed before the state is updated.
> 
> fails to explain *why* this is important.
> 
> > +	 *
> > +	 * The barrier pairs with READ_ONCE()+smp_acquire__after_ctrl_dep().
> >  	 */
> > +        smp_store_release(&this->state, STATE_READY);
> 
> You retained the whitespace damage.
> 
> And I'm terribly confused by this code, probably due to the lack of
> 'why' as per the above. What is this trying to do?
> 
> Are we worried about something like:
> 
> 	A			B				C
> 
> 
> 				wq_sleep()
> 				  schedule_...();
> 
> 								/* spuriuos wakeup */
> 								wake_up_process(B)
> 
> 	wake_q_add(A)
> 	  if (cmpxchg()) // success
> 
> 	->state = STATE_READY (reordered)
> 
> 				  if (READ_ONCE() == STATE_READY)
> 				    goto out;
> 
> 				exit();
> 
> 
> 	    get_task_struct() // UaF
> 
> 
> Can we put the exact and full race in the comment please?

Like Davidlohr already suggested, elsewhere we write it like so:


--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -930,15 +930,10 @@ static inline void __pipelined_op(struct
 				  struct mqueue_inode_info *info,
 				  struct ext_wait_queue *this)
 {
+	get_task_struct(this->task);
 	list_del(&this->list);
-	wake_q_add(wake_q, this->task);
-	/*
-	 * The barrier is required to ensure that the refcount increase
-	 * inside wake_q_add() is completed before the state is updated.
-	 *
-	 * The barrier pairs with READ_ONCE()+smp_acquire__after_ctrl_dep().
-	 */
-        smp_store_release(&this->state, STATE_READY);
+	smp_store_release(&this->state, STATE_READY);
+	wake_q_add_safe(wake_q, this->task);
 }
 
 /* pipelined_send() - send a message directly to the task waiting in


* Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()
  2019-10-14 13:03   ` Peter Zijlstra
@ 2019-10-14 17:49     ` Manfred Spraul
  2019-10-14 19:03       ` Davidlohr Bueso
                         ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Manfred Spraul @ 2019-10-14 17:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet

[-- Attachment #1: Type: text/plain, Size: 2707 bytes --]

Hello Peter,

On 10/14/19 3:03 PM, Peter Zijlstra wrote:
> On Sat, Oct 12, 2019 at 07:49:58AM +0200, Manfred Spraul wrote:
>> The documentation in memory-barriers.txt claims that
>> smp_mb__{before,after}_atomic() are for atomic ops that do not return a
>> value.
>>
>> This is misleading and doesn't match the example in atomic_t.txt,
>> and e.g. smp_mb__before_atomic() may and is used together with
>> cmpxchg_relaxed() in the wake_q code.
>>
>> The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
>> RMW atomic operation to a full memory barrier.
>> The return code of the atomic operation has no impact, so all of the
>> following examples are valid:
> The return value of atomic ops is relevant insofar as
> (traditionally) all value-returning atomic ops already implied full
> barriers. That of course changed when we added the
> _release/_acquire/_relaxed variants.
I've updated the Change description accordingly
>> 1)
>> 	smp_mb__before_atomic();
>> 	atomic_add();
>>
>> 2)
>> 	smp_mb__before_atomic();
>> 	atomic_xchg_relaxed();
>>
>> 3)
>> 	smp_mb__before_atomic();
>> 	atomic_fetch_add_relaxed();
>>
>> Invalid would be:
>> 	smp_mb__before_atomic();
>> 	atomic_set();
>>
>> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
>> Cc: Waiman Long <longman@redhat.com>
>> Cc: Davidlohr Bueso <dave@stgolabs.net>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> ---
>>   Documentation/memory-barriers.txt | 11 ++++++-----
>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
>> index 1adbb8a371c7..52076b057400 100644
>> --- a/Documentation/memory-barriers.txt
>> +++ b/Documentation/memory-barriers.txt
>> @@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
>>    (*) smp_mb__before_atomic();
>>    (*) smp_mb__after_atomic();
>>   
>> -     These are for use with atomic (such as add, subtract, increment and
>> -     decrement) functions that don't return a value, especially when used for
>> -     reference counting.  These functions do not imply memory barriers.
>> +     These are for use with atomic RMW functions (such as add, subtract,
>> +     increment, decrement, failed conditional operations, ...) that do
>> +     not imply memory barriers, but where the code needs a memory barrier,
>> +     for example when used for reference counting.
>>   
>> -     These are also used for atomic bitop functions that do not return a
>> -     value (such as set_bit and clear_bit).
>> +     These are also used for atomic RMW bitop functions that do imply a full
> s/do/do not/ ?
Sorry, yes, of course
>> +     memory barrier (such as set_bit and clear_bit).



[-- Attachment #2: 0001-Update-Documentation-for-_-acquire-release-relaxed.patch --]
[-- Type: text/x-patch, Size: 2379 bytes --]

From 61c85a56994e32ea393af9debef4cccd9cd24abd Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Fri, 11 Oct 2019 10:33:26 +0200
Subject: [PATCH] Update Documentation for _{acquire|release|relaxed}()

When the _{acquire|release|relaxed}() variants of some atomic
operations were added, Documentation/memory-barriers.txt was not
updated accordingly:

smp_mb__{before,after}_atomic() are now intended for all RMW operations
that do not imply a full memory barrier.

1)
	smp_mb__before_atomic();
	atomic_add();

2)
	smp_mb__before_atomic();
	atomic_xchg_relaxed();

3)
	smp_mb__before_atomic();
	atomic_fetch_add_relaxed();

Invalid would be:
	smp_mb__before_atomic();
	atomic_set();

Fixes: 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 Documentation/memory-barriers.txt | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..08090eea3751 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
  (*) smp_mb__before_atomic();
  (*) smp_mb__after_atomic();
 
-     These are for use with atomic (such as add, subtract, increment and
-     decrement) functions that don't return a value, especially when used for
-     reference counting.  These functions do not imply memory barriers.
+     These are for use with atomic RMW functions (such as add, subtract,
+     increment, decrement, failed conditional operations, ...) that do
+     not imply memory barriers, but where the code needs a memory barrier,
+     for example when used for reference counting.
 
-     These are also used for atomic bitop functions that do not return a
-     value (such as set_bit and clear_bit).
+     These are also used for atomic RMW bitop functions that do not imply a
+     full memory barrier (such as set_bit and clear_bit).
 
      As an example, consider a piece of code that marks an object as being dead
      and then decrements the object's reference count:
-- 
2.21.0
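
As an illustration of the relaxed-RMW case described in the changelog above, here is a simplified sketch of the wake_q pattern (loosely based on __wake_q_add() in kernel/sched/core.c; treat the names and details as an approximation rather than the exact upstream code):

	static bool __wake_q_add(struct wake_q_head *head, struct task_struct *task)
	{
		struct wake_q_node *node = &task->wake_q;

		/*
		 * cmpxchg_relaxed() implies no ordering on its own, and on
		 * failure it does not even perform a store.  The
		 * smp_mb__before_atomic() upgrades the RMW so that a
		 * concurrent waker observes our pending state even in the
		 * failed case.
		 */
		smp_mb__before_atomic();
		if (unlikely(cmpxchg_relaxed(&node->next, NULL, WAKE_Q_TAIL)))
			return false;	/* already queued by someone else */

		/* ... append @node to the context-local wake list ... */
		return true;
	}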


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers
  2019-10-14 13:58     ` Peter Zijlstra
@ 2019-10-14 18:06       ` Manfred Spraul
  0 siblings, 0 replies; 23+ messages in thread
From: Manfred Spraul @ 2019-10-14 18:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet

Hi Peter,

On 10/14/19 3:58 PM, Peter Zijlstra wrote:
> On Mon, Oct 14, 2019 at 02:59:11PM +0200, Peter Zijlstra wrote:
>> On Sat, Oct 12, 2019 at 07:49:55AM +0200, Manfred Spraul wrote:
>>
>>>   	for (;;) {
>>> +		/* memory barrier not required, we hold info->lock */
>>>   		__set_current_state(TASK_INTERRUPTIBLE);
>>>   
>>>   		spin_unlock(&info->lock);
>>>   		time = schedule_hrtimeout_range_clock(timeout, 0,
>>>   			HRTIMER_MODE_ABS, CLOCK_REALTIME);
>>>   
>>> +		if (READ_ONCE(ewp->state) == STATE_READY) {
>>> +			/*
>>> +			 * Pairs, together with READ_ONCE(), with
>>> +			 * the barrier in __pipelined_op().
>>> +			 */
>>> +			smp_acquire__after_ctrl_dep();
>>>   			retval = 0;
>>>   			goto out;
>>>   		}
>>>   		spin_lock(&info->lock);
>>> +
>>> +		/* we hold info->lock, so no memory barrier required */
>>> +		if (READ_ONCE(ewp->state) == STATE_READY) {
>>>   			retval = 0;
>>>   			goto out_unlock;
>>>   		}
>>> @@ -925,14 +933,12 @@ static inline void __pipelined_op(struct wake_q_head *wake_q,
>>>   	list_del(&this->list);
>>>   	wake_q_add(wake_q, this->task);
>>>   	/*
>>> +	 * The barrier is required to ensure that the refcount increase
>>> +	 * inside wake_q_add() is completed before the state is updated.
>> fails to explain *why* this is important.
>>
>>> +	 *
>>> +	 * The barrier pairs with READ_ONCE()+smp_mb__after_ctrl_dep().
>>>   	 */
>>> +        smp_store_release(&this->state, STATE_READY);
>> You retained the whitespace damage.
>>
>> And I'm terribly confused by this code, probably due to the lack of
>> 'why' as per the above. What is this trying to do?
>>
>> Are we worried about something like:
>>
>> 	A			B				C
>>
>>
>> 				wq_sleep()
>> 				  schedule_...();
>>
>> 								/* spurious wakeup */
>> 								wake_up_process(B)
>>
>> 	wake_q_add(A)
>> 	  if (cmpxchg()) // success
>>
>> 	->state = STATE_READY (reordered)
>>
>> 				  if (READ_ONCE() == STATE_READY)
>> 				    goto out;
>>
>> 				exit();
>>
>>
>> 	    get_task_struct() // UaF
>>
>>
>> Can we put the exact and full race in the comment please?

Yes, I'll do that. Actually, two threads are sufficient:

A                    B

WRITE_ONCE(wait.state, STATE_NONE);
schedule_hrtimeout()

                       wake_q_add(A)
                       if (cmpxchg()) // success
                       ->state = STATE_READY (reordered)

<timeout returns>
if (wait.state == STATE_READY) return;
sysret to user space
sys_exit()

                       get_task_struct() // UaF


> Like Davidlohr already suggested, elsewhere we write it like so:
>
>
> --- a/ipc/mqueue.c
> +++ b/ipc/mqueue.c
> @@ -930,15 +930,10 @@ static inline void __pipelined_op(struct
>   				  struct mqueue_inode_info *info,
>   				  struct ext_wait_queue *this)
>   {
> +	get_task_struct(this->task);
>   	list_del(&this->list);
> -	wake_q_add(wake_q, this->task);
> -	/*
> -	 * The barrier is required to ensure that the refcount increase
> -	 * inside wake_q_add() is completed before the state is updated.
> -	 *
> -	 * The barrier pairs with READ_ONCE()+smp_mb__after_ctrl_dep().
> -	 */
> -        smp_store_release(&this->state, STATE_READY);
> +	smp_store_release(&this->state, STATE_READY);
> +	wake_q_add_safe(wake_q, this->task);
>   }
>   
>   /* pipelined_send() - send a message directly to the task waiting in

Much better, I'll rewrite it and then resend the series.
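
For reference, a rough sketch of the difference between the two helpers as used above (simplified; not the exact kernel/sched/core.c implementation):

	/* wake_q_add(): acquires its own reference on @task internally. */
	wake_q_add(wake_q, this->task);

	/*
	 * wake_q_add_safe(): the caller must already hold a reference on
	 * @task, and the function consumes it -- either handing it over to
	 * the wake queue or dropping it if the task was already queued.
	 */
	get_task_struct(this->task);
	smp_store_release(&this->state, STATE_READY);
	wake_q_add_safe(wake_q, this->task);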

--

     Manfred


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()
  2019-10-14 17:49     ` Manfred Spraul
@ 2019-10-14 19:03       ` Davidlohr Bueso
  2019-10-15  7:45       ` Peter Zijlstra
  2019-10-15 20:31       ` Waiman Long
  2 siblings, 0 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2019-10-14 19:03 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Peter Zijlstra, LKML, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet

On Mon, 14 Oct 2019, Manfred Spraul wrote:
>I've updated the Change description accordingly

I continue to think my memory-barriers.txt change regarding failed
RMW operations is still worth having. Unless there are any strong
objections, could you also add that patch to the series?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()
  2019-10-12  5:49 ` [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg() Manfred Spraul
  2019-10-14 13:03   ` Peter Zijlstra
@ 2019-10-15  1:26   ` Davidlohr Bueso
  2019-10-15  7:19     ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Davidlohr Bueso @ 2019-10-15  1:26 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Waiman Long, 1vier1, Andrew Morton, Peter Zijlstra,
	Jonathan Corbet

On Sat, 12 Oct 2019, Manfred Spraul wrote:
>Invalid would be:
>	smp_mb__before_atomic();
>	atomic_set();

fyi I've caught a couple of naughty users:

   drivers/crypto/cavium/nitrox/nitrox_main.c
   drivers/gpu/drm/msm/adreno/a5xx_preempt.c

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()
  2019-10-15  1:26   ` Davidlohr Bueso
@ 2019-10-15  7:19     ` Peter Zijlstra
  2019-10-15 16:26       ` Davidlohr Bueso
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-15  7:19 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Manfred Spraul, LKML, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet, parri.andrea

On Mon, Oct 14, 2019 at 06:26:04PM -0700, Davidlohr Bueso wrote:
> On Sat, 12 Oct 2019, Manfred Spraul wrote:
> > Invalid would be:
> > 	smp_mb__before_atomic();
> > 	atomic_set();
> 
> fyi I've caught a couple of naughty users:
> 
>   drivers/crypto/cavium/nitrox/nitrox_main.c
>   drivers/gpu/drm/msm/adreno/a5xx_preempt.c

Yes, there's still some of that. Andrea went and killed a bunch a while
ago, I think.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()
  2019-10-14 17:49     ` Manfred Spraul
  2019-10-14 19:03       ` Davidlohr Bueso
@ 2019-10-15  7:45       ` Peter Zijlstra
  2019-10-15 20:31       ` Waiman Long
  2 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2019-10-15  7:45 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet

On Mon, Oct 14, 2019 at 07:49:56PM +0200, Manfred Spraul wrote:

> From 61c85a56994e32ea393af9debef4cccd9cd24abd Mon Sep 17 00:00:00 2001
> From: Manfred Spraul <manfred@colorfullife.com>
> Date: Fri, 11 Oct 2019 10:33:26 +0200
> Subject: [PATCH] Update Documentation for _{acquire|release|relaxed}()
> 
> When adding the _{acquire|release|relaxed}() variants of some atomic
> operations, it was forgotten to update Documentation/memory_barrier.txt:
> 
> smp_mb__{before,after}_atomic() is now indended for all RMW operations

"intended"

Otherwise it looks fine, thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()
  2019-10-15  7:19     ` Peter Zijlstra
@ 2019-10-15 16:26       ` Davidlohr Bueso
  0 siblings, 0 replies; 23+ messages in thread
From: Davidlohr Bueso @ 2019-10-15 16:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Manfred Spraul, LKML, Waiman Long, 1vier1, Andrew Morton,
	Jonathan Corbet, parri.andrea

On Tue, 15 Oct 2019, Peter Zijlstra wrote:

>On Mon, Oct 14, 2019 at 06:26:04PM -0700, Davidlohr Bueso wrote:
>> On Sat, 12 Oct 2019, Manfred Spraul wrote:
>> > Invalid would be:
>> > 	smp_mb__before_atomic();
>> > 	atomic_set();
>>
>> fyi I've caught a couple of naughty users:
>>
>>   drivers/crypto/cavium/nitrox/nitrox_main.c
>>   drivers/gpu/drm/msm/adreno/a5xx_preempt.c
>
>Yes, there's still some of that. Andrea went and killed a buch a while
>ago I think.

I sent these, which just do smp_mb():

https://lore.kernel.org/lkml/20191015161657.10760-1-dave@stgolabs.net
https://lore.kernel.org/lkml/20191015162144.fuyc25tdwvc6ddu3@linux-p48b
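
Roughly, the shape of those fixes (hypothetical example, not the actual driver code):

	/*
	 * Invalid: atomic_set() is a plain store, not an RMW operation,
	 * so smp_mb__before_atomic() does not apply to it.
	 */
	smp_mb__before_atomic();
	atomic_set(&obj->state, NEW_STATE);

	/* Fixed: use an explicit full barrier instead. */
	smp_mb();
	atomic_set(&obj->state, NEW_STATE);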

Similarly, I was doing some barrier auditing in btrfs code recently
(completely unrelated to these topics) and noted that there are some
cases where we can invert this exercise. IOW, callers doing an atomic
RMW along with an smp_mb(), which we can replace with these upgradable
calls and benefit, for example on x86.
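
As I understand it, the inverse transformation would look roughly like
this (hypothetical example; on x86 the __after_atomic() form is only a
compiler barrier because locked RMW instructions already imply a full
barrier):

	/*
	 * Before: an explicit full barrier next to an atomic RMW costs an
	 * extra mfence on x86.
	 */
	atomic_dec(&obj->pending);
	smp_mb();

	/*
	 * After: let the RMW provide the ordering; still a full barrier on
	 * all architectures.
	 */
	atomic_dec(&obj->pending);
	smp_mb__after_atomic();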

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg()
  2019-10-14 17:49     ` Manfred Spraul
  2019-10-14 19:03       ` Davidlohr Bueso
  2019-10-15  7:45       ` Peter Zijlstra
@ 2019-10-15 20:31       ` Waiman Long
  2 siblings, 0 replies; 23+ messages in thread
From: Waiman Long @ 2019-10-15 20:31 UTC (permalink / raw)
  To: Manfred Spraul, Peter Zijlstra
  Cc: LKML, Davidlohr Bueso, 1vier1, Andrew Morton, Jonathan Corbet

On 10/14/19 1:49 PM, Manfred Spraul wrote:
> Hello Peter,
>
> On 10/14/19 3:03 PM, Peter Zijlstra wrote:
>> On Sat, Oct 12, 2019 at 07:49:58AM +0200, Manfred Spraul wrote:
>>> The documentation in memory-barriers.txt claims that
>>> smp_mb__{before,after}_atomic() are for atomic ops that do not return a
>>> value.
>>>
>>> This is misleading and doesn't match the example in atomic_t.txt,
>>> and e.g. smp_mb__before_atomic() may and is used together with
>>> cmpxchg_relaxed() in the wake_q code.
>>>
>>> The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
>>> RMW atomic operation to a full memory barrier.
>>> The return code of the atomic operation has no impact, so all of the
>>> following examples are valid:
>> The value return of atomic ops is relevant in so far that
>> (traditionally) all value returning atomic ops already implied full
>> barriers. That of course changed when we added
>> _release/_acquire/_relaxed variants.
> I've updated the Change description accordingly
>>> 1)
>>>     smp_mb__before_atomic();
>>>     atomic_add();
>>>
>>> 2)
>>>     smp_mb__before_atomic();
>>>     atomic_xchg_relaxed();
>>>
>>> 3)
>>>     smp_mb__before_atomic();
>>>     atomic_fetch_add_relaxed();
>>>
>>> Invalid would be:
>>>     smp_mb__before_atomic();
>>>     atomic_set();
>>>
>>> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
>>> Cc: Waiman Long <longman@redhat.com>
>>> Cc: Davidlohr Bueso <dave@stgolabs.net>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> ---
>>>   Documentation/memory-barriers.txt | 11 ++++++-----
>>>   1 file changed, 6 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
>>> index 1adbb8a371c7..52076b057400 100644
>>> --- a/Documentation/memory-barriers.txt
>>> +++ b/Documentation/memory-barriers.txt
>>> @@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
>>>    (*) smp_mb__before_atomic();
>>>    (*) smp_mb__after_atomic();
>>>
>>> -     These are for use with atomic (such as add, subtract, increment and
>>> -     decrement) functions that don't return a value, especially when used for
>>> -     reference counting.  These functions do not imply memory barriers.
>>> +     These are for use with atomic RMW functions (such as add, subtract,
>>> +     increment, decrement, failed conditional operations, ...) that do
>>> +     not imply memory barriers, but where the code needs a memory barrier,
>>> +     for example when used for reference counting.
>>>
>>> -     These are also used for atomic bitop functions that do not return a
>>> -     value (such as set_bit and clear_bit).
>>> +     These are also used for atomic RMW bitop functions that do imply a full
>> s/do/do not/ ?
> Sorry, yes, of course

I was wondering the same thing. With the revised patch,

Acked-by: Waiman Long <longman@redhat.com>

>>> +     memory barrier (such as set_bit and clear_bit).
>
>


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2019-10-15 20:31 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-12  5:49 [PATCH 0/6] V2: Clarify/standardize memory barriers for ipc Manfred Spraul
2019-10-12  5:49 ` [PATCH 1/6] wake_q: Cleanup + Documentation update Manfred Spraul
2019-10-14  6:34   ` Davidlohr Bueso
2019-10-14 12:04   ` Peter Zijlstra
2019-10-12  5:49 ` [PATCH 2/6] ipc/mqueue.c: Remove duplicated code Manfred Spraul
2019-10-14  6:35   ` Davidlohr Bueso
2019-10-14 12:37   ` Peter Zijlstra
2019-10-12  5:49 ` [PATCH 3/6] ipc/mqueue.c: Update/document memory barriers Manfred Spraul
2019-10-14  6:38   ` Davidlohr Bueso
2019-10-14 12:59   ` Peter Zijlstra
2019-10-14 13:58     ` Peter Zijlstra
2019-10-14 18:06       ` Manfred Spraul
2019-10-12  5:49 ` [PATCH 4/6] ipc/msg.c: Update and document " Manfred Spraul
2019-10-12  5:49 ` [PATCH 5/6] ipc/sem.c: Document and update " Manfred Spraul
2019-10-12  5:49 ` [PATCH 6/6] Documentation/memory-barriers.txt: Clarify cmpxchg() Manfred Spraul
2019-10-14 13:03   ` Peter Zijlstra
2019-10-14 17:49     ` Manfred Spraul
2019-10-14 19:03       ` Davidlohr Bueso
2019-10-15  7:45       ` Peter Zijlstra
2019-10-15 20:31       ` Waiman Long
2019-10-15  1:26   ` Davidlohr Bueso
2019-10-15  7:19     ` Peter Zijlstra
2019-10-15 16:26       ` Davidlohr Bueso
