* [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation
2019-10-20 12:33 [PATCH 0/5] V3: Clarify/standardize memory barriers for ipc Manfred Spraul
@ 2019-10-20 12:33 ` Manfred Spraul
2019-11-01 16:49 ` Will Deacon
2019-10-20 12:33 ` [PATCH 2/5] ipc/mqueue.c: Remove duplicated code Manfred Spraul
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Manfred Spraul @ 2019-10-20 12:33 UTC (permalink / raw)
To: LKML, Davidlohr Bueso, Waiman Long
Cc: 1vier1, Andrew Morton, Peter Zijlstra, Manfred Spraul, Will Deacon
When adding the _{acquire|release|relaxed}() variants of some atomic
operations, it was forgotten to update Documentation/memory_barrier.txt:
smp_mb__{before,after}_atomic() is now intended for all RMW operations
that do not imply a memory barrier.
1)
smp_mb__before_atomic();
atomic_add();
2)
smp_mb__before_atomic();
atomic_xchg_relaxed();
3)
smp_mb__before_atomic();
atomic_fetch_add_relaxed();
Invalid would be:
smp_mb__before_atomic();
atomic_set();
In addition, the patch splits the long sentence into multiple shorter
sentences.
Fixes: 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
---
Documentation/memory-barriers.txt | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..fe43f4b30907 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1873,12 +1873,16 @@ There are some more advanced barrier functions:
(*) smp_mb__before_atomic();
(*) smp_mb__after_atomic();
- These are for use with atomic (such as add, subtract, increment and
- decrement) functions that don't return a value, especially when used for
- reference counting. These functions do not imply memory barriers.
-
- These are also used for atomic bitop functions that do not return a
- value (such as set_bit and clear_bit).
+ These are for use with atomic RMW functions that do not imply memory
+ barriers, but where the code needs a memory barrier. Examples for atomic
+ RMW functions that do not imply are memory barrier are e.g. add,
+ subtract, (failed) conditional operations, _relaxed functions,
+ but not atomic_read or atomic_set. A common example where a memory
+ barrier may be required is when atomic ops are used for reference
+ counting.
+
+ These are also used for atomic RMW bitop functions that do not imply a
+ memory barrier (such as set_bit and clear_bit).
As an example, consider a piece of code that marks an object as being dead
and then decrements the object's reference count:
--
2.21.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation
2019-10-20 12:33 ` [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation Manfred Spraul
@ 2019-11-01 16:49 ` Will Deacon
2019-11-06 19:23 ` Manfred Spraul
0 siblings, 1 reply; 10+ messages in thread
From: Will Deacon @ 2019-11-01 16:49 UTC (permalink / raw)
To: Manfred Spraul
Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
Peter Zijlstra, Will Deacon
Hi Manfred,
On Sun, Oct 20, 2019 at 02:33:01PM +0200, Manfred Spraul wrote:
> When adding the _{acquire|release|relaxed}() variants of some atomic
> operations, it was forgotten to update Documentation/memory_barrier.txt:
>
> smp_mb__{before,after}_atomic() is now intended for all RMW operations
> that do not imply a memory barrier.
[...]
> Documentation/memory-barriers.txt | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1adbb8a371c7..fe43f4b30907 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1873,12 +1873,16 @@ There are some more advanced barrier functions:
> (*) smp_mb__before_atomic();
> (*) smp_mb__after_atomic();
>
> - These are for use with atomic (such as add, subtract, increment and
> - decrement) functions that don't return a value, especially when used for
> - reference counting. These functions do not imply memory barriers.
> -
> - These are also used for atomic bitop functions that do not return a
> - value (such as set_bit and clear_bit).
> + These are for use with atomic RMW functions that do not imply memory
> + barriers, but where the code needs a memory barrier. Examples for atomic
> + RMW functions that do not imply are memory barrier are e.g. add,
typo: "are memory barrier"
> + subtract, (failed) conditional operations, _relaxed functions,
> + but not atomic_read or atomic_set. A common example where a memory
> + barrier may be required is when atomic ops are used for reference
> + counting.
> +
> + These are also used for atomic RMW bitop functions that do not imply a
> + memory barrier (such as set_bit and clear_bit).
Although I think this is correct, I really think we should instead refer to
Documentation/atomic_t.txt and get rid of this out of memory-barriers.txt
entirely. It's just duplication and is out of date.
Will
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation
2019-11-01 16:49 ` Will Deacon
@ 2019-11-06 19:23 ` Manfred Spraul
2019-11-07 11:22 ` Will Deacon
0 siblings, 1 reply; 10+ messages in thread
From: Manfred Spraul @ 2019-11-06 19:23 UTC (permalink / raw)
To: Will Deacon
Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
Peter Zijlstra, Will Deacon
[-- Attachment #1: Type: text/plain, Size: 710 bytes --]
Hi Will,
On 11/1/19 5:49 PM, Will Deacon wrote:
> Hi Manfred,
>
> On Sun, Oct 20, 2019 at 02:33:01PM +0200, Manfred Spraul wrote:
>> When adding the _{acquire|release|relaxed}() variants of some atomic
>> operations, it was forgotten to update Documentation/memory_barrier.txt:
>>
>> smp_mb__{before,after}_atomic() is now intended for all RMW operations
>> that do not imply a memory barrier.
> [...]
>
> Although I think this is correct, I really think we should instead refer to
> Documentation/atomic_t.txt and get rid of this out of memory-barriers.txt
> entirely. It's just duplication and is out of date.
So you would prefer the attached patch?
For me this would be fine, too.
--
Manfred
[-- Attachment #2: 0001-smp_mb__-before-after-_atomic-Update-Documentation.patch --]
[-- Type: text/x-patch, Size: 2346 bytes --]
From 8d2b219794221e3ef1a1ec90e0f4fe344af9a55d Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Fri, 11 Oct 2019 10:33:26 +0200
Subject: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation
When adding the _{acquire|release|relaxed}() variants of some atomic
operations, it was forgotten to update Documentation/memory_barrier.txt:
smp_mb__before_atomic and smp_mb__after_atomic can be combined with
all RMW operations that do not imply memory barriers.
In order to avoid that this happens again:
Remove the paragraph from Documentation/memory_barrier.txt, the functions
are sufficiently documented in Documentation/atomic_{t,bitops}.txt
Fixes: 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
---
Documentation/memory-barriers.txt | 20 +-------------------
1 file changed, 1 insertion(+), 19 deletions(-)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..16dfb4cde1e1 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1873,25 +1873,7 @@ There are some more advanced barrier functions:
(*) smp_mb__before_atomic();
(*) smp_mb__after_atomic();
- These are for use with atomic (such as add, subtract, increment and
- decrement) functions that don't return a value, especially when used for
- reference counting. These functions do not imply memory barriers.
-
- These are also used for atomic bitop functions that do not return a
- value (such as set_bit and clear_bit).
-
- As an example, consider a piece of code that marks an object as being dead
- and then decrements the object's reference count:
-
- obj->dead = 1;
- smp_mb__before_atomic();
- atomic_dec(&obj->ref_count);
-
- This makes sure that the death mark on the object is perceived to be set
- *before* the reference counter is decremented.
-
- See Documentation/atomic_{t,bitops}.txt for more information.
-
+ See Documentation/atomic_{t,bitops}.txt for information.
(*) dma_wmb();
(*) dma_rmb();
--
2.23.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation
2019-11-06 19:23 ` Manfred Spraul
@ 2019-11-07 11:22 ` Will Deacon
0 siblings, 0 replies; 10+ messages in thread
From: Will Deacon @ 2019-11-07 11:22 UTC (permalink / raw)
To: Manfred Spraul
Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Andrew Morton,
Peter Zijlstra, Will Deacon
Hi Manfred,
On Wed, Nov 06, 2019 at 08:23:03PM +0100, Manfred Spraul wrote:
> From 8d2b219794221e3ef1a1ec90e0f4fe344af9a55d Mon Sep 17 00:00:00 2001
> From: Manfred Spraul <manfred@colorfullife.com>
> Date: Fri, 11 Oct 2019 10:33:26 +0200
> Subject: [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation
>
> When adding the _{acquire|release|relaxed}() variants of some atomic
> operations, it was forgotten to update Documentation/memory_barrier.txt:
>
> smp_mb__before_atomic and smp_mb__after_atomic can be combined with
> all RMW operations that do not imply memory barriers.
>
> In order to avoid that this happens again:
> Remove the paragraph from Documentation/memory_barrier.txt, the functions
> are sufficiently documented in Documentation/atomic_{t,bitops}.txt
>
> Fixes: 654672d4ba1a ("locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations")
>
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> Acked-by: Waiman Long <longman@redhat.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
> Documentation/memory-barriers.txt | 20 +-------------------
> 1 file changed, 1 insertion(+), 19 deletions(-)
>
> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
> index 1adbb8a371c7..16dfb4cde1e1 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1873,25 +1873,7 @@ There are some more advanced barrier functions:
> (*) smp_mb__before_atomic();
> (*) smp_mb__after_atomic();
>
> - These are for use with atomic (such as add, subtract, increment and
> - decrement) functions that don't return a value, especially when used for
> - reference counting. These functions do not imply memory barriers.
> -
> - These are also used for atomic bitop functions that do not return a
> - value (such as set_bit and clear_bit).
> -
> - As an example, consider a piece of code that marks an object as being dead
> - and then decrements the object's reference count:
> -
> - obj->dead = 1;
> - smp_mb__before_atomic();
> - atomic_dec(&obj->ref_count);
> -
> - This makes sure that the death mark on the object is perceived to be set
> - *before* the reference counter is decremented.
> -
> - See Documentation/atomic_{t,bitops}.txt for more information.
> -
> + See Documentation/atomic_{t,bitops}.txt for information.
>
> (*) dma_wmb();
> (*) dma_rmb();
Thanks, I much prefer this approach:
Acked-by: Will Deacon <will@kernel.org>
Will
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/5] ipc/mqueue.c: Remove duplicated code
2019-10-20 12:33 [PATCH 0/5] V3: Clarify/standardize memory barriers for ipc Manfred Spraul
2019-10-20 12:33 ` [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation Manfred Spraul
@ 2019-10-20 12:33 ` Manfred Spraul
2019-10-22 22:43 ` Andrew Morton
2019-10-20 12:33 ` [PATCH 3/5] ipc/mqueue.c: Update/document memory barriers Manfred Spraul
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Manfred Spraul @ 2019-10-20 12:33 UTC (permalink / raw)
To: LKML, Davidlohr Bueso, Waiman Long
Cc: 1vier1, Andrew Morton, Peter Zijlstra, Manfred Spraul
Patch from Davidlohr, I just added this change log.
pipelined_send() and pipelined_receive() are identical, so merge them.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
ipc/mqueue.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 3d920ff15c80..270456530f6a 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -918,17 +918,12 @@ SYSCALL_DEFINE1(mq_unlink, const char __user *, u_name)
* The same algorithm is used for senders.
*/
-/* pipelined_send() - send a message directly to the task waiting in
- * sys_mq_timedreceive() (without inserting message into a queue).
- */
-static inline void pipelined_send(struct wake_q_head *wake_q,
+static inline void __pipelined_op(struct wake_q_head *wake_q,
struct mqueue_inode_info *info,
- struct msg_msg *message,
- struct ext_wait_queue *receiver)
+ struct ext_wait_queue *this)
{
- receiver->msg = message;
- list_del(&receiver->list);
- wake_q_add(wake_q, receiver->task);
+ list_del(&this->list);
+ wake_q_add(wake_q, this->task);
/*
* Rely on the implicit cmpxchg barrier from wake_q_add such
* that we can ensure that updating receiver->state is the last
@@ -937,7 +932,19 @@ static inline void pipelined_send(struct wake_q_head *wake_q,
* yet, at that point we can later have a use-after-free
* condition and bogus wakeup.
*/
- receiver->state = STATE_READY;
+ this->state = STATE_READY;
+}
+
+/* pipelined_send() - send a message directly to the task waiting in
+ * sys_mq_timedreceive() (without inserting message into a queue).
+ */
+static inline void pipelined_send(struct wake_q_head *wake_q,
+ struct mqueue_inode_info *info,
+ struct msg_msg *message,
+ struct ext_wait_queue *receiver)
+{
+ receiver->msg = message;
+ __pipelined_op(wake_q, info, receiver);
}
/* pipelined_receive() - if there is task waiting in sys_mq_timedsend()
@@ -955,9 +962,7 @@ static inline void pipelined_receive(struct wake_q_head *wake_q,
if (msg_insert(sender->msg, info))
return;
- list_del(&sender->list);
- wake_q_add(wake_q, sender->task);
- sender->state = STATE_READY;
+ __pipelined_op(wake_q, info, sender);
}
static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
--
2.21.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/5] ipc/mqueue.c: Remove duplicated code
2019-10-20 12:33 ` [PATCH 2/5] ipc/mqueue.c: Remove duplicated code Manfred Spraul
@ 2019-10-22 22:43 ` Andrew Morton
0 siblings, 0 replies; 10+ messages in thread
From: Andrew Morton @ 2019-10-22 22:43 UTC (permalink / raw)
To: Manfred Spraul; +Cc: LKML, Davidlohr Bueso, Waiman Long, 1vier1, Peter Zijlstra
On Sun, 20 Oct 2019 14:33:02 +0200 Manfred Spraul <manfred@colorfullife.com> wrote:
> Patch from Davidlohr, I just added this change log.
> pipelined_send() and pipelined_receive() are identical, so merge them.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
I redid the changelog as below:
From: Davidlohr Bueso <dave@stgolabs.net>
Subject: ipc/mqueue.c: remove duplicated code
pipelined_send() and pipelined_receive() are identical, so merge them.
[manfred@colorfullife.com: add changelog]
Link: http://lkml.kernel.org/r/20191020123305.14715-3-manfred@colorfullife.com
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 3/5] ipc/mqueue.c: Update/document memory barriers
2019-10-20 12:33 [PATCH 0/5] V3: Clarify/standardize memory barriers for ipc Manfred Spraul
2019-10-20 12:33 ` [PATCH 1/5] smp_mb__{before,after}_atomic(): Update Documentation Manfred Spraul
2019-10-20 12:33 ` [PATCH 2/5] ipc/mqueue.c: Remove duplicated code Manfred Spraul
@ 2019-10-20 12:33 ` Manfred Spraul
2019-10-20 12:33 ` [PATCH 4/5] ipc/msg.c: Update and document " Manfred Spraul
2019-10-20 12:33 ` [PATCH 5/5] ipc/sem.c: Document and update " Manfred Spraul
4 siblings, 0 replies; 10+ messages in thread
From: Manfred Spraul @ 2019-10-20 12:33 UTC (permalink / raw)
To: LKML, Davidlohr Bueso, Waiman Long
Cc: 1vier1, Andrew Morton, Peter Zijlstra, Manfred Spraul, Davidlohr Bueso
Update and document memory barriers for mqueue.c:
- ewp->state is read without any locks, thus READ_ONCE is required.
- add smp_aquire__after_ctrl_dep() after the READ_ONCE, we need
acquire semantics if the value is STATE_READY.
- use wake_q_add_safe()
- document why __set_current_state() may be used:
Reading task->state cannot happen before the wake_q_add() call,
which happens while holding info->lock. Thus the spin_unlock()
is the RELEASE, and the spin_lock() is the ACQUIRE.
For completeness: there is also a 3 CPU scenario, if the to be woken
up task is already on another wake_q.
Then:
- CPU1: spin_unlock() of the task that goes to sleep is the RELEASE
- CPU2: the spin_lock() of the waker is the ACQUIRE
- CPU2: smp_mb__before_atomic inside wake_q_add() is the RELEASE
- CPU3: smp_mb__after_spinlock() inside try_to_wake_up() is the ACQUIRE
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Waiman Long <longman@redhat.com>
---
ipc/mqueue.c | 92 ++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 78 insertions(+), 14 deletions(-)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 270456530f6a..49a05ba3000d 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -63,6 +63,66 @@ struct posix_msg_tree_node {
int priority;
};
+/*
+ * Locking:
+ *
+ * Accesses to a message queue are synchronized by acquiring info->lock.
+ *
+ * There are two notable exceptions:
+ * - The actual wakeup of a sleeping task is performed using the wake_q
+ * framework. info->lock is already released when wake_up_q is called.
+ * - The exit codepaths after sleeping check ext_wait_queue->state without
+ * any locks. If it is STATE_READY, then the syscall is completed without
+ * acquiring info->lock.
+ *
+ * MQ_BARRIER:
+ * To achieve proper release/acquire memory barrier pairing, the state is set to
+ * STATE_READY with smp_store_release(), and it is read with READ_ONCE followed
+ * by smp_acquire__after_ctrl_dep(). In addition, wake_q_add_safe() is used.
+ *
+ * This prevents the following races:
+ *
+ * 1) With the simple wake_q_add(), the task could be gone already before
+ * the increase of the reference happens
+ * Thread A
+ * Thread B
+ * WRITE_ONCE(wait.state, STATE_NONE);
+ * schedule_hrtimeout()
+ * wake_q_add(A)
+ * if (cmpxchg()) // success
+ * ->state = STATE_READY (reordered)
+ * <timeout returns>
+ * if (wait.state == STATE_READY) return;
+ * sysret to user space
+ * sys_exit()
+ * get_task_struct() // UaF
+ *
+ * Solution: Use wake_q_add_safe() and perform the get_task_struct() before
+ * the smp_store_release() that does ->state = STATE_READY.
+ *
+ * 2) Without proper _release/_acquire barriers, the woken up task
+ * could read stale data
+ *
+ * Thread A
+ * Thread B
+ * do_mq_timedreceive
+ * WRITE_ONCE(wait.state, STATE_NONE);
+ * schedule_hrtimeout()
+ * state = STATE_READY;
+ * <timeout returns>
+ * if (wait.state == STATE_READY) return;
+ * msg_ptr = wait.msg; // Access to stale data!
+ * receiver->msg = message; (reordered)
+ *
+ * Solution: use _release and _acquire barriers.
+ *
+ * 3) There is intentionally no barrier when setting current->state
+ * to TASK_INTERRUPTIBLE: spin_unlock(&info->lock) provides the
+ * release memory barrier, and the wakeup is triggered when holding
+ * info->lock, i.e. spin_lock(&info->lock) provided a pairing
+ * acquire memory barrier.
+ */
+
struct ext_wait_queue { /* queue of sleeping tasks */
struct task_struct *task;
struct list_head list;
@@ -646,18 +706,23 @@ static int wq_sleep(struct mqueue_inode_info *info, int sr,
wq_add(info, sr, ewp);
for (;;) {
+ /* memory barrier not required, we hold info->lock */
__set_current_state(TASK_INTERRUPTIBLE);
spin_unlock(&info->lock);
time = schedule_hrtimeout_range_clock(timeout, 0,
HRTIMER_MODE_ABS, CLOCK_REALTIME);
- if (ewp->state == STATE_READY) {
+ if (READ_ONCE(ewp->state) == STATE_READY) {
+ /* see MQ_BARRIER for purpose/pairing */
+ smp_acquire__after_ctrl_dep();
retval = 0;
goto out;
}
spin_lock(&info->lock);
- if (ewp->state == STATE_READY) {
+
+ /* we hold info->lock, so no memory barrier required */
+ if (READ_ONCE(ewp->state) == STATE_READY) {
retval = 0;
goto out_unlock;
}
@@ -923,16 +988,11 @@ static inline void __pipelined_op(struct wake_q_head *wake_q,
struct ext_wait_queue *this)
{
list_del(&this->list);
- wake_q_add(wake_q, this->task);
- /*
- * Rely on the implicit cmpxchg barrier from wake_q_add such
- * that we can ensure that updating receiver->state is the last
- * write operation: As once set, the receiver can continue,
- * and if we don't have the reference count from the wake_q,
- * yet, at that point we can later have a use-after-free
- * condition and bogus wakeup.
- */
- this->state = STATE_READY;
+ get_task_struct(this->task);
+
+ /* see MQ_BARRIER for purpose/pairing */
+ smp_store_release(&this->state, STATE_READY);
+ wake_q_add_safe(wake_q, this->task);
}
/* pipelined_send() - send a message directly to the task waiting in
@@ -1049,7 +1109,9 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
} else {
wait.task = current;
wait.msg = (void *) msg_ptr;
- wait.state = STATE_NONE;
+
+ /* memory barrier not required, we hold info->lock */
+ WRITE_ONCE(wait.state, STATE_NONE);
ret = wq_sleep(info, SEND, timeout, &wait);
/*
* wq_sleep must be called with info->lock held, and
@@ -1152,7 +1214,9 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
ret = -EAGAIN;
} else {
wait.task = current;
- wait.state = STATE_NONE;
+
+ /* memory barrier not required, we hold info->lock */
+ WRITE_ONCE(wait.state, STATE_NONE);
ret = wq_sleep(info, RECV, timeout, &wait);
msg_ptr = wait.msg;
}
--
2.21.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/5] ipc/msg.c: Update and document memory barriers.
2019-10-20 12:33 [PATCH 0/5] V3: Clarify/standardize memory barriers for ipc Manfred Spraul
` (2 preceding siblings ...)
2019-10-20 12:33 ` [PATCH 3/5] ipc/mqueue.c: Update/document memory barriers Manfred Spraul
@ 2019-10-20 12:33 ` Manfred Spraul
2019-10-20 12:33 ` [PATCH 5/5] ipc/sem.c: Document and update " Manfred Spraul
4 siblings, 0 replies; 10+ messages in thread
From: Manfred Spraul @ 2019-10-20 12:33 UTC (permalink / raw)
To: LKML, Davidlohr Bueso, Waiman Long
Cc: 1vier1, Andrew Morton, Peter Zijlstra, Manfred Spraul
Transfer findings from ipc/mqueue.c:
- A control barrier was missing for the lockless receive case
So in theory, not yet initialized data may have been copied
to user space - obviously only for architectures where
control barriers are not NOP.
- use smp_store_release(). In theory, the refount
may have been decreased to 0 already when wake_q_add()
tries to get a reference.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
---
ipc/msg.c | 43 ++++++++++++++++++++++++++++++++++++-------
1 file changed, 36 insertions(+), 7 deletions(-)
diff --git a/ipc/msg.c b/ipc/msg.c
index 8dec945fa030..192a9291a8ab 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -61,6 +61,16 @@ struct msg_queue {
struct list_head q_senders;
} __randomize_layout;
+/*
+ * MSG_BARRIER Locking:
+ *
+ * Similar to the optimization used in ipc/mqueue.c, one syscall return path
+ * does not acquire any locks when it sees that a message exists in
+ * msg_receiver.r_msg. Therefore r_msg is set using smp_store_release()
+ * and accessed using READ_ONCE()+smp_acquire__after_ctrl_dep(). In addition,
+ * wake_q_add_safe() is used. See ipc/mqueue.c for more details
+ */
+
/* one msg_receiver structure for each sleeping receiver */
struct msg_receiver {
struct list_head r_list;
@@ -184,6 +194,10 @@ static inline void ss_add(struct msg_queue *msq,
{
mss->tsk = current;
mss->msgsz = msgsz;
+ /*
+ * No memory barrier required: we did ipc_lock_object(),
+ * and the waker obtains that lock before calling wake_q_add().
+ */
__set_current_state(TASK_INTERRUPTIBLE);
list_add_tail(&mss->list, &msq->q_senders);
}
@@ -237,8 +251,11 @@ static void expunge_all(struct msg_queue *msq, int res,
struct msg_receiver *msr, *t;
list_for_each_entry_safe(msr, t, &msq->q_receivers, r_list) {
- wake_q_add(wake_q, msr->r_tsk);
- WRITE_ONCE(msr->r_msg, ERR_PTR(res));
+ get_task_struct(msr->r_tsk);
+
+ /* see MSG_BARRIER for purpose/pairing */
+ smp_store_release(&msr->r_msg, ERR_PTR(res));
+ wake_q_add_safe(wake_q, msr->r_tsk);
}
}
@@ -798,13 +815,17 @@ static inline int pipelined_send(struct msg_queue *msq, struct msg_msg *msg,
list_del(&msr->r_list);
if (msr->r_maxsize < msg->m_ts) {
wake_q_add(wake_q, msr->r_tsk);
- WRITE_ONCE(msr->r_msg, ERR_PTR(-E2BIG));
+
+ /* See expunge_all regarding memory barrier */
+ smp_store_release(&msr->r_msg, ERR_PTR(-E2BIG));
} else {
ipc_update_pid(&msq->q_lrpid, task_pid(msr->r_tsk));
msq->q_rtime = ktime_get_real_seconds();
wake_q_add(wake_q, msr->r_tsk);
- WRITE_ONCE(msr->r_msg, msg);
+
+ /* See expunge_all regarding memory barrier */
+ smp_store_release(&msr->r_msg, msg);
return 1;
}
}
@@ -1154,7 +1175,11 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
msr_d.r_maxsize = INT_MAX;
else
msr_d.r_maxsize = bufsz;
- msr_d.r_msg = ERR_PTR(-EAGAIN);
+
+ /* memory barrier not require due to ipc_lock_object() */
+ WRITE_ONCE(msr_d.r_msg, ERR_PTR(-EAGAIN));
+
+ /* memory barrier not required, we own ipc_lock_object() */
__set_current_state(TASK_INTERRUPTIBLE);
ipc_unlock_object(&msq->q_perm);
@@ -1183,8 +1208,12 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
* signal) it will either see the message and continue ...
*/
msg = READ_ONCE(msr_d.r_msg);
- if (msg != ERR_PTR(-EAGAIN))
+ if (msg != ERR_PTR(-EAGAIN)) {
+ /* see MSG_BARRIER for purpose/pairing */
+ smp_acquire__after_ctrl_dep();
+
goto out_unlock1;
+ }
/*
* ... or see -EAGAIN, acquire the lock to check the message
@@ -1192,7 +1221,7 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
*/
ipc_lock_object(&msq->q_perm);
- msg = msr_d.r_msg;
+ msg = READ_ONCE(msr_d.r_msg);
if (msg != ERR_PTR(-EAGAIN))
goto out_unlock0;
--
2.21.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 5/5] ipc/sem.c: Document and update memory barriers
2019-10-20 12:33 [PATCH 0/5] V3: Clarify/standardize memory barriers for ipc Manfred Spraul
` (3 preceding siblings ...)
2019-10-20 12:33 ` [PATCH 4/5] ipc/msg.c: Update and document " Manfred Spraul
@ 2019-10-20 12:33 ` Manfred Spraul
4 siblings, 0 replies; 10+ messages in thread
From: Manfred Spraul @ 2019-10-20 12:33 UTC (permalink / raw)
To: LKML, Davidlohr Bueso, Waiman Long
Cc: 1vier1, Andrew Morton, Peter Zijlstra, Manfred Spraul
The patch documents and updates the memory barriers in ipc/sem.c:
- Add smp_store_release() to wake_up_sem_queue_prepare() and
document why it is needed.
- Read q->status using READ_ONCE+smp_acquire__after_ctrl_dep().
as the pair for the barrier inside wake_up_sem_queue_prepare().
- Add comments to all barriers, and mention the rules in the block
regarding locking.
- Switch to using wake_q_add_safe().
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
---
ipc/sem.c | 66 ++++++++++++++++++++++++++++++++++---------------------
1 file changed, 41 insertions(+), 25 deletions(-)
diff --git a/ipc/sem.c b/ipc/sem.c
index ec97a7072413..c89734b200c6 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -205,15 +205,38 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
*
* Memory ordering:
* Most ordering is enforced by using spin_lock() and spin_unlock().
- * The special case is use_global_lock:
+ *
+ * Exceptions:
+ * 1) use_global_lock: (SEM_BARRIER_1)
* Setting it from non-zero to 0 is a RELEASE, this is ensured by
- * using smp_store_release().
+ * using smp_store_release(): Immediately after setting it to 0,
+ * a simple op can start.
* Testing if it is non-zero is an ACQUIRE, this is ensured by using
* smp_load_acquire().
* Setting it from 0 to non-zero must be ordered with regards to
* this smp_load_acquire(), this is guaranteed because the smp_load_acquire()
* is inside a spin_lock() and after a write from 0 to non-zero a
* spin_lock()+spin_unlock() is done.
+ *
+ * 2) queue.status: (SEM_BARRIER_2)
+ * Initialization is done while holding sem_lock(), so no further barrier is
+ * required.
+ * Setting it to a result code is a RELEASE, this is ensured by both a
+ * smp_store_release() (for case a) and while holding sem_lock()
+ * (for case b).
+ * The AQUIRE when reading the result code without holding sem_lock() is
+ * achieved by using READ_ONCE() + smp_acquire__after_ctrl_dep().
+ * (case a above).
+ * Reading the result code while holding sem_lock() needs no further barriers,
+ * the locks inside sem_lock() enforce ordering (case b above)
+ *
+ * 3) current->state:
+ * current->state is set to TASK_INTERRUPTIBLE while holding sem_lock().
+ * The wakeup is handled using the wake_q infrastructure. wake_q wakeups may
+ * happen immediately after calling wake_q_add. As wake_q_add_safe() is called
+ * when holding sem_lock(), no further barriers are required.
+ *
+ * See also ipc/mqueue.c for more details on the covered races.
*/
#define sc_semmsl sem_ctls[0]
@@ -344,12 +367,8 @@ static void complexmode_tryleave(struct sem_array *sma)
return;
}
if (sma->use_global_lock == 1) {
- /*
- * Immediately after setting use_global_lock to 0,
- * a simple op can start. Thus: all memory writes
- * performed by the current operation must be visible
- * before we set use_global_lock to 0.
- */
+
+ /* See SEM_BARRIER_1 for purpose/pairing */
smp_store_release(&sma->use_global_lock, 0);
} else {
sma->use_global_lock--;
@@ -400,7 +419,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
*/
spin_lock(&sem->lock);
- /* pairs with smp_store_release() */
+ /* see SEM_BARRIER_1 for purpose/pairing */
if (!smp_load_acquire(&sma->use_global_lock)) {
/* fast path successful! */
return sops->sem_num;
@@ -766,15 +785,12 @@ static int perform_atomic_semop(struct sem_array *sma, struct sem_queue *q)
static inline void wake_up_sem_queue_prepare(struct sem_queue *q, int error,
struct wake_q_head *wake_q)
{
- wake_q_add(wake_q, q->sleeper);
- /*
- * Rely on the above implicit barrier, such that we can
- * ensure that we hold reference to the task before setting
- * q->status. Otherwise we could race with do_exit if the
- * task is awoken by an external event before calling
- * wake_up_process().
- */
- WRITE_ONCE(q->status, error);
+ get_task_struct(q->sleeper);
+
+ /* see SEM_BARRIER_2 for purpuse/pairing */
+ smp_store_release(&q->status, error);
+
+ wake_q_add_safe(wake_q, q->sleeper);
}
static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
@@ -2148,9 +2164,11 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
}
do {
+ /* memory ordering ensured by the lock in sem_lock() */
WRITE_ONCE(queue.status, -EINTR);
queue.sleeper = current;
+ /* memory ordering is ensured by the lock in sem_lock() */
__set_current_state(TASK_INTERRUPTIBLE);
sem_unlock(sma, locknum);
rcu_read_unlock();
@@ -2173,13 +2191,8 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
*/
error = READ_ONCE(queue.status);
if (error != -EINTR) {
- /*
- * User space could assume that semop() is a memory
- * barrier: Without the mb(), the cpu could
- * speculatively read in userspace stale data that was
- * overwritten by the previous owner of the semaphore.
- */
- smp_mb();
+ /* see SEM_BARRIER_2 for purpose/pairing */
+ smp_acquire__after_ctrl_dep();
goto out_free;
}
@@ -2189,6 +2202,9 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
if (!ipc_valid_object(&sma->sem_perm))
goto out_unlock_free;
+ /*
+ * No necessity for any barrier: We are protect by sem_lock()
+ */
error = READ_ONCE(queue.status);
/*
--
2.21.0
^ permalink raw reply related [flat|nested] 10+ messages in thread