* [PATCH 0/2] ipc/sem.c: sem_lock fixes
@ 2016-07-13 5:06 Manfred Spraul
2016-07-13 5:06 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
2016-07-13 22:05 ` [PATCH 0/2] ipc/sem.c: sem_lock fixes Andrew Morton
0 siblings, 2 replies; 20+ messages in thread
From: Manfred Spraul @ 2016-07-13 5:06 UTC (permalink / raw)
To: H. Peter Anvin, Peter Zijlstra, Andrew Morton, Davidlohr Bueso
Cc: LKML, Thomas Gleixner, Ingo Molnar, 1vier1, felixh, Manfred Spraul
Hi Andrew, Hi Peter,
next version of the sem_lock() fixes:
The patches are again vs. tip.
Patch 1 is ready for merging, Patch 2 is for review.
- Patch 1 is the patch as in -next since January
It fixes the race that was found by Felix.
- Patch 2 removes the memory barriers that are part of the qspinlock
code.
- (The hysteresis patch would be patch 3. The risk of regressions
can't be ruled out, thus it must wait for benchmarks from real
workload tests)
Patches 1 and 2 were originally one patch; I've split them so that review
and backporting are simpler.
--
Manfred
* [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-07-13 5:06 [PATCH 0/2] ipc/sem.c: sem_lock fixes Manfred Spraul
@ 2016-07-13 5:06 ` Manfred Spraul
2016-07-13 5:06 ` [PATCH 2/2] ipc/sem.c: Remove duplicated memory barriers Manfred Spraul
2016-07-16 1:27 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Davidlohr Bueso
2016-07-13 22:05 ` [PATCH 0/2] ipc/sem.c: sem_lock fixes Andrew Morton
1 sibling, 2 replies; 20+ messages in thread
From: Manfred Spraul @ 2016-07-13 5:06 UTC (permalink / raw)
To: H. Peter Anvin, Peter Zijlstra, Andrew Morton, Davidlohr Bueso
Cc: LKML, Thomas Gleixner, Ingo Molnar, 1vier1, felixh,
Manfred Spraul, stable
Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:
sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operation is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)
As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).
The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.
The patch also updates stale documentation regarding the locking.
With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)
The alternative is to revert the patch that introduced the race.
The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait() or that the acquire
within spin_lock() is after setting the lock variable.
Background:
Here is the race of the current implementation:
Thread A: (simple op)
- does the first "sma->complex_count == 0" test
Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
drops sem_lock, sleeps
Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test
Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count
Thread A:
- does the complex_count test
Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.
Full memory barriers are required to synchronize changes of
complex_mode and the lock operations.
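As a reading aid, the barrier pairing that the patch establishes can be
condensed as follows (a sketch distilled from the hunks below, not
complete code):

  complex op (complexmode_enter()):
      WRITE_ONCE(sma->complex_mode, true);
      smp_mb();
      for each sem: spin_unlock_wait(&sem->lock);
      smp_rmb();

  simple op (sem_lock() fast path):
      if (!READ_ONCE(sma->complex_mode)) {
          spin_lock(&sem->lock);
          smp_mb();
          if (!smp_load_acquire(&sma->complex_mode))
              return sops->sem_num;	/* fast path */
          spin_unlock(&sem->lock);
      }

Either the complex op observes sem->lock held and waits until it is
released, or the simple op observes complex_mode == true and falls back
to the full lock.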
Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Reported-by: felixh@informatik.uni-bremen.de
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
---
include/linux/sem.h | 1 +
ipc/sem.c | 130 +++++++++++++++++++++++++++++++---------------------
2 files changed, 79 insertions(+), 52 deletions(-)
diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
+ bool complex_mode; /* no parallel simple ops */
};
#ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index ae72b3c..0da63c8 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -162,14 +162,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
/*
* Locking:
+ * a) global sem_lock() for read/write
* sem_undo.id_next,
* sem_array.complex_count,
- * sem_array.pending{_alter,_cont},
- * sem_array.sem_undo: global sem_lock() for read/write
- * sem_undo.proc_next: only "current" is allowed to read/write that field.
+ * sem_array.complex_mode
+ * sem_array.pending{_alter,_const},
+ * sem_array.sem_undo
*
+ * b) global or semaphore sem_lock() for read/write:
* sem_array.sem_base[i].pending_{const,alter}:
- * global or semaphore sem_lock() for read/write
+ * sem_array.complex_mode (for read)
+ *
+ * c) special:
+ * sem_undo_list.list_proc:
+ * * undo_list->lock for write
+ * * rcu for read
*/
#define sc_semmsl sem_ctls[0]
@@ -260,28 +267,59 @@ static void sem_rcu_free(struct rcu_head *head)
}
/*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
* Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- * that sem_perm.lock is free.
- * that a) sem_perm.lock is free and b) complex_count is 0.
*/
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
{
int i;
struct sem *sem;
- if (sma->complex_count) {
- /* The thread that increased sma->complex_count waited on
- * all sem->lock locks. Thus we don't need to wait again.
- */
+ if (sma->complex_mode) {
+ /* We are already in complex_mode. Nothing to do */
return;
}
+ WRITE_ONCE(sma->complex_mode, true);
+
+ /* We need a full barrier:
+ * The write to complex_mode must be visible
+ * before we read the first sem->lock spinlock state.
+ */
+ smp_mb();
for (i = 0; i < sma->sem_nsems; i++) {
sem = sma->sem_base + i;
spin_unlock_wait(&sem->lock);
}
+ /*
+ * spin_unlock_wait() is not a memory barrier, it is only a
+ * control barrier. The code must pair with spin_unlock(&sem->lock),
+ * thus just the control barrier is insufficient.
+ *
+ * smp_rmb() is sufficient, as writes cannot pass the control barrier.
+ */
+ smp_rmb();
+}
+
+/*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+ if (sma->complex_count) {
+ /* Complex ops are sleeping.
+ * We must stay in complex mode
+ */
+ return;
+ }
+ /*
+ * Immediately after setting complex_mode to false,
+ * a simple op can start. Thus: all memory writes
+ * performed by the current operation must be visible
+ * before we set complex_mode to false.
+ */
+ smp_store_release(&sma->complex_mode, false);
}
/*
@@ -300,56 +338,40 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Complex operation - acquire a full lock */
ipc_lock_object(&sma->sem_perm);
- /* And wait until all simple ops that are processed
- * right now have dropped their locks.
- */
- sem_wait_array(sma);
+ /* Prevent parallel simple ops */
+ complexmode_enter(sma);
return -1;
}
/*
* Only one semaphore affected - try to optimize locking.
- * The rules are:
- * - optimized locking is possible if no complex operation
- * is either enqueued or processed right now.
- * - The test for enqueued complex ops is simple:
- * sma->complex_count != 0
- * - Testing for complex ops that are processed right now is
- * a bit more difficult. Complex ops acquire the full lock
- * and first wait that the running simple ops have completed.
- * (see above)
- * Thus: If we own a simple lock and the global lock is free
- * and complex_count is now 0, then it will stay 0 and
- * thus just locking sem->lock is sufficient.
+ * Optimized locking is possible if no complex operation
+ * is either enqueued or processed right now.
+ *
+ * Both facts are tracked by complex_mode.
*/
sem = sma->sem_base + sops->sem_num;
- if (sma->complex_count == 0) {
+ /*
+ * Initial check for complex_mode. Just an optimization,
+ * no locking, no memory barrier.
+ */
+ if (!READ_ONCE(sma->complex_mode)) {
/*
* It appears that no complex operation is around.
* Acquire the per-semaphore lock.
*/
spin_lock(&sem->lock);
- /* Then check that the global lock is free */
- if (!spin_is_locked(&sma->sem_perm.lock)) {
- /*
- * We need a memory barrier with acquire semantics,
- * otherwise we can race with another thread that does:
- * complex_count++;
- * spin_unlock(sem_perm.lock);
- */
- smp_acquire__after_ctrl_dep();
+ /*
+ * A full barrier is required: the write of sem->lock
+ * must be visible before the read is executed
+ */
+ smp_mb();
- /*
- * Now repeat the test of complex_count:
- * It can't change anymore until we drop sem->lock.
- * Thus: if is now 0, then it will stay 0.
- */
- if (sma->complex_count == 0) {
- /* fast path successful! */
- return sops->sem_num;
- }
+ if (!smp_load_acquire(&sma->complex_mode)) {
+ /* fast path successful! */
+ return sops->sem_num;
}
spin_unlock(&sem->lock);
}
@@ -369,7 +391,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Not a false alarm, thus complete the sequence for a
* full lock.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);
return -1;
}
}
@@ -378,6 +400,7 @@ static inline void sem_unlock(struct sem_array *sma, int locknum)
{
if (locknum == -1) {
unmerge_queues(sma);
+ complexmode_tryleave(sma);
ipc_unlock_object(&sma->sem_perm);
} else {
struct sem *sem = sma->sem_base + locknum;
@@ -529,6 +552,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
}
sma->complex_count = 0;
+ sma->complex_mode = true; /* dropped by sem_unlock below */
INIT_LIST_HEAD(&sma->pending_alter);
INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
@@ -2184,10 +2208,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
/*
* The proc interface isn't aware of sem_lock(), it calls
* ipc_lock_object() directly (in sysvipc_find_ipc).
- * In order to stay compatible with sem_lock(), we must wait until
- * all simple semop() calls have left their critical regions.
+ * In order to stay compatible with sem_lock(), we must
+ * enter / leave complex_mode.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);
sem_otime = get_semotime(sma);
@@ -2204,6 +2228,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
sem_otime,
sma->sem_ctime);
+ complexmode_tryleave(sma);
+
return 0;
}
#endif
--
2.5.5
* [PATCH 2/2] ipc/sem.c: Remove duplicated memory barriers.
2016-07-13 5:06 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
@ 2016-07-13 5:06 ` Manfred Spraul
2016-07-13 16:16 ` Davidlohr Bueso
2016-07-16 1:27 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Davidlohr Bueso
1 sibling, 1 reply; 20+ messages in thread
From: Manfred Spraul @ 2016-07-13 5:06 UTC (permalink / raw)
To: H. Peter Anvin, Peter Zijlstra, Andrew Morton, Davidlohr Bueso
Cc: LKML, Thomas Gleixner, Ingo Molnar, 1vier1, felixh, Manfred Spraul
With 2c610022711 (locking/qspinlock: Fix spin_unlock_wait() some more),
memory barriers were added into spin_unlock_wait().
Thus another barrier is not required.
And as explained in 055ce0fd1b8 (locking/qspinlock: Add comments),
spin_lock() provides a barrier so that reads within the critical
section cannot happen before the write for the lock is visible.
I.e., spin_lock() provides an acquire barrier after the write of the lock
variable; this barrier pairs with the smp_mb() in complexmode_enter().
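The intended pairing after this patch, as a sketch:

  complexmode_enter():
      complex_mode = true;
      smp_mb();
      spin_unlock_wait(&sem->lock);   /* barriers inside, per 2c610022711 */

  sem_lock() fast path:
      spin_lock(&sem->lock);          /* acquire after the lock store,
                                       * per 055ce0fd1b8 */
      r = smp_load_acquire(&sma->complex_mode);

Either spin_unlock_wait() observes sem->lock taken, or the fast path
observes complex_mode == true.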
Please review!
For x86, the patch is safe. But I don't know enough about all archs
that support SMP.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
---
ipc/sem.c | 14 --------------
1 file changed, 14 deletions(-)
diff --git a/ipc/sem.c b/ipc/sem.c
index 0da63c8..d7b4212 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -291,14 +291,6 @@ static void complexmode_enter(struct sem_array *sma)
sem = sma->sem_base + i;
spin_unlock_wait(&sem->lock);
}
- /*
- * spin_unlock_wait() is not a memory barrier, it is only a
- * control barrier. The code must pair with spin_unlock(&sem->lock),
- * thus just the control barrier is insufficient.
- *
- * smp_rmb() is sufficient, as writes cannot pass the control barrier.
- */
- smp_rmb();
}
/*
@@ -363,12 +355,6 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
*/
spin_lock(&sem->lock);
- /*
- * A full barrier is required: the write of sem->lock
- * must be visible before the read is executed
- */
- smp_mb();
-
if (!smp_load_acquire(&sma->complex_mode)) {
/* fast path successful! */
return sops->sem_num;
--
2.5.5
* Re: [PATCH 2/2] ipc/sem.c: Remove duplicated memory barriers.
2016-07-13 5:06 ` [PATCH 2/2] ipc/sem.c: Remove duplicated memory barriers Manfred Spraul
@ 2016-07-13 16:16 ` Davidlohr Bueso
2016-07-13 18:37 ` Manfred Spraul
0 siblings, 1 reply; 20+ messages in thread
From: Davidlohr Bueso @ 2016-07-13 16:16 UTC (permalink / raw)
To: Manfred Spraul
Cc: H. Peter Anvin, Peter Zijlstra, Andrew Morton, LKML,
Thomas Gleixner, Ingo Molnar, 1vier1, felixh
Manfred, shouldn't this patch be part of patch 1 (as you add the unnecessary
barriers there)? IOW, can we have a single patch for all this?
Thanks,
Davidlohr
* Re: [PATCH 2/2] ipc/sem.c: Remove duplicated memory barriers.
2016-07-13 16:16 ` Davidlohr Bueso
@ 2016-07-13 18:37 ` Manfred Spraul
0 siblings, 0 replies; 20+ messages in thread
From: Manfred Spraul @ 2016-07-13 18:37 UTC (permalink / raw)
To: Davidlohr Bueso, Michael Ellerman
Cc: H. Peter Anvin, Peter Zijlstra, Andrew Morton, LKML,
Thomas Gleixner, Ingo Molnar, 1vier1, felixh
Hi Davidlohr,
On 07/13/2016 06:16 PM, Davidlohr Bueso wrote:
> Manfred, shouldn't this patch be part of patch 1 (as you add the
> unnecessary barriers there)? IOW, can we have a single patch for all this?
>
Two reasons:
- patch 1 is safe for backporting, patch 2 is not.
- patch 1 is safe on all architectures; for patch 2 I would like to get
some review feedback.
E.g., I just found/read 51d7d5205d33 ("powerpc: Add smp_mb() to
arch_spin_is_locked()"):
For powerpc, an smp_mb() was added into spin_is_locked(), more or less
for ipc/sem.c.
Patch 1 replaces the spin_is_locked() with smp_load_acquire().
Isn't that the proof that smp_mb() is required?
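For reference, if I recall 51d7d5205d33 correctly, it made powerpc's
arch_spin_is_locked() look roughly like this (sketch from memory, not a
verbatim quote of the commit):

	static inline int arch_spin_is_locked(arch_spinlock_t *lock)
	{
		smp_mb();
		return !arch_spin_value_unlocked(*lock);
	}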
--
Manfred
* Re: [PATCH 0/2] ipc/sem.c: sem_lock fixes
2016-07-13 5:06 [PATCH 0/2] ipc/sem.c: sem_lock fixes Manfred Spraul
2016-07-13 5:06 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
@ 2016-07-13 22:05 ` Andrew Morton
2016-07-14 16:40 ` Manfred Spraul
1 sibling, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2016-07-13 22:05 UTC (permalink / raw)
To: Manfred Spraul
Cc: H. Peter Anvin, Peter Zijlstra, Davidlohr Bueso, LKML,
Thomas Gleixner, Ingo Molnar, 1vier1, felixh
On Wed, 13 Jul 2016 07:06:50 +0200 Manfred Spraul <manfred@colorfullife.com> wrote:
> Hi Andrew, Hi Peter,
>
> next version of the sem_lock() fixes:
> The patches are again vs. tip.
>
> Patch 1 is ready for merging, Patch 2 is for review.
>
> - Patch 1 is the patch as in -next since January
> It fixes the race that was found by Felix.
> - Patch 2 removes the memory barriers that are part of the qspinlock
> code.
> - (The hysteresis patch would be patch 3. The risk of regressions
> can't be ruled out, thus it must wait for benchmarks from real
> workload tests)
I think you're saying that if these two patches cause performance
regressions, we will need ipc-sem-sem_lock-with-hysteresis.patch?
Is that even necessary? If your testing shows that
ipc-sem-sem_lock-with-hysteresis.patch makes things faster, then in it
goes, surely?
* Re: [PATCH 0/2] ipc/sem.c: sem_lock fixes
2016-07-13 22:05 ` [PATCH 0/2] ipc/sem.c: sem_lock fixes Andrew Morton
@ 2016-07-14 16:40 ` Manfred Spraul
0 siblings, 0 replies; 20+ messages in thread
From: Manfred Spraul @ 2016-07-14 16:40 UTC (permalink / raw)
To: Andrew Morton
Cc: H. Peter Anvin, Peter Zijlstra, Davidlohr Bueso, LKML,
Thomas Gleixner, Ingo Molnar, 1vier1, felixh
Hi Andrew,
On 07/14/2016 12:05 AM, Andrew Morton wrote:
> On Wed, 13 Jul 2016 07:06:50 +0200 Manfred Spraul <manfred@colorfullife.com> wrote:
>
>> Hi Andrew, Hi Peter,
>>
>> next version of the sem_lock() fixes:
>> The patches are again vs. tip.
>>
>> Patch 1 is ready for merging, Patch 2 is for review.
>>
>> - Patch 1 is the patch as in -next since January
>> It fixes the race that was found by Felix.
>> - Patch 2 removes the memory barriers that are part of the qspinlock
>> code.
>> - (The hysteresis patch would be patch 3. The risk of regressions
>> can't be ruled out, thus it must wait for benchmarks from real
>> workload tests)
> I think you're saying that if these two patches cause performance
> regressions, we will need ipc-sem-sem_lock-with-hysteresis.patch?
No, the two patches will not cause any regressions.
Commit 6062a8dc051 ("ipc,sem: fine grained locking for semtimedop") was
a huge scalability improvement for 99% of the users, but introduced a
regression for one workload.
No one complained about it, so the workload must be rare.
Patch 3 now fixes the regression.
But we have lived with the introduced regression for 3 years, so give me
(and Davidlohr, and anyone else who wants to help) some more time.
Also on my TODO list: the queue merge/unmerge logic, i.e. commit
f269f40ad5ae ("ipc/sem.c: always use only one queue for alter
operations") might also benefit from some hysteresis.
My proposal:
- patch 1: merge towards main tree.
- patch 2: needs an update. The barrier in sem_lock() is required. I'm not
yet sure about complexmode_enter(); perhaps with #ifndef
CONFIG_QUEUED_SPINLOCKS (untested sketch below).
- patch 3 should wait.
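Something like this, as an untested sketch of the idea (whether the
smp_rmb() really is redundant with qspinlocks is exactly what needs
review):

	for (i = 0; i < sma->sem_nsems; i++) {
		sem = sma->sem_base + i;
		spin_unlock_wait(&sem->lock);
	}
#ifndef CONFIG_QUEUED_SPINLOCKS
	/*
	 * Without qspinlocks, spin_unlock_wait() is only a control
	 * barrier; writes cannot pass it, so smp_rmb() is sufficient.
	 */
	smp_rmb();
#endif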
--
Manfred
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-07-13 5:06 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
2016-07-13 5:06 ` [PATCH 2/2] ipc/sem.c: Remove duplicated memory barriers Manfred Spraul
@ 2016-07-16 1:27 ` Davidlohr Bueso
1 sibling, 0 replies; 20+ messages in thread
From: Davidlohr Bueso @ 2016-07-16 1:27 UTC (permalink / raw)
To: Manfred Spraul
Cc: H. Peter Anvin, Peter Zijlstra, Andrew Morton, LKML,
Thomas Gleixner, Ingo Molnar, 1vier1, felixh, stable
On Wed, 13 Jul 2016, Manfred Spraul wrote:
>-static void sem_wait_array(struct sem_array *sma)
>+static void complexmode_enter(struct sem_array *sma)
> {
> int i;
> struct sem *sem;
>
>- if (sma->complex_count) {
>- /* The thread that increased sma->complex_count waited on
>- * all sem->lock locks. Thus we don't need to wait again.
>- */
>+ if (sma->complex_mode) {
>+ /* We are already in complex_mode. Nothing to do */
> return;
> }
>+ WRITE_ONCE(sma->complex_mode, true);
So we can actually save those READ/WRITE_ONCE calls for complex_mode as it's
a bool and therefore tearing is not an issue.
>+
>+ /* We need a full barrier:
>+ * The write to complex_mode must be visible
>+ * before we read the first sem->lock spinlock state.
>+ */
>+ smp_mb();
smp_store_mb()?
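For reference, the generic fallback in include/asm-generic/barrier.h is
(roughly, from memory):

	#define smp_store_mb(var, value) \
		do { WRITE_ONCE(var, value); smp_mb(); } while (0)

so the WRITE_ONCE() plus smp_mb() pair above could collapse into a single

	smp_store_mb(sma->complex_mode, true);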
> /*
>@@ -300,56 +338,40 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
> /* Complex operation - acquire a full lock */
> ipc_lock_object(&sma->sem_perm);
>
>- /* And wait until all simple ops that are processed
>- * right now have dropped their locks.
>- */
>- sem_wait_array(sma);
>+ /* Prevent parallel simple ops */
>+ complexmode_enter(sma);
> return -1;
nit and unrelated: we should probably use some better label here than a raw
-1 (although I don't see it changing, just for nicer reading), i.e.: SEM_OBJECT_LOCKED
Thanks,
Davidlohr
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-30 19:28 ` Manfred Spraul
@ 2016-07-01 16:52 ` Davidlohr Bueso
0 siblings, 0 replies; 20+ messages in thread
From: Davidlohr Bueso @ 2016-07-01 16:52 UTC (permalink / raw)
To: Manfred Spraul
Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1, felixh,
stable
On Thu, 30 Jun 2016, Manfred Spraul wrote:
>On 06/28/2016 07:27 AM, Davidlohr Bueso wrote:
>If I understand it right, it means that spin_lock() is both an acquire
>and a release - for qspinlocks.
I wouldn't say that: the _Q_LOCKED_VAL stores are still unordered inside
the acquire region but no further than the unlock store-release, which
is fine for regular mutual exclusion.
>Is this valid for all spinlock implementations, for all architectures?
>Otherwise: How can I detect in generic code if I can rely on a release
>inside spin_lock()?
With ticket locks this was never an issue as special lock readers
(spin_unlock_wait(), spin_is_locked()) will always see the lock taken as
they observe waiters by adding themselves to the tail -- something the above
commit mimics and piggybacks on to save an expensive smp_rmb().
In addition, see 726328d92a4 (locking/spinlock, arch: Update and fix
spin_unlock_wait() implementations).
Thanks,
Davidlohr
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-28 5:27 ` Davidlohr Bueso
@ 2016-06-30 19:28 ` Manfred Spraul
2016-07-01 16:52 ` Davidlohr Bueso
0 siblings, 1 reply; 20+ messages in thread
From: Manfred Spraul @ 2016-06-30 19:28 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1, felixh,
stable
On 06/28/2016 07:27 AM, Davidlohr Bueso wrote:
> On Thu, 23 Jun 2016, Manfred Spraul wrote:
>
>> What I'm not sure yet is if smp_load_acquire() is sufficient:
>>
>> Thread A:
>>> if (!READ_ONCE(sma->complex_mode)) {
>> The code is test_and_test, no barrier requirements for first test
>
> Yeah, it would just make us take the big lock unnecessarily, nothing
> fatal
> and I agree it's probably worth the optimization. It still might be worth
> commenting.
>
I'll extend the comment: "no locking and no memory barrier"
>>> /*
>>> * It appears that no complex operation is around.
>>> * Acquire the per-semaphore lock.
>>> */
>>> spin_lock(&sem->lock);
>>>
>>> if (!smp_load_acquire(&sma->complex_mode)) {
>>> /* fast path successful! */
>>> return sops->sem_num;
>>> }
>>> spin_unlock(&sem->lock);
>>> }
>>
>> Thread B:
>>> WRITE_ONCE(sma->complex_mode, true);
>>>
>>> /* We need a full barrier:
>>> * The write to complex_mode must be visible
>>> * before we read the first sem->lock spinlock state.
>>> */
>>> smp_mb();
>>>
>>> for (i = 0; i < sma->sem_nsems; i++) {
>>> sem = sma->sem_base + i;
>>> spin_unlock_wait(&sem->lock);
>>> }
>>
>> If thread A is allowed to issue read_spinlock;read complex_mode;write
>> spinlock, then thread B would not notice that thread A is in the
>> critical section
>
> Are you referring to the sem->lock word not being visibly locked
> before we
> read complex_mode (for the second time)? This issue was fixed in
> 2c610022711
> (locking/qspinlock: Fix spin_unlock_wait() some more). So
> smp_load_acquire
> should be enough afaict, or are you referring to something else?
>
You are right, I didn't read this patch fully.
If I understand it right, it means that spin_lock() is both an acquire
and a release - for qspinlocks.
Is this valid for all spinlock implementations, for all architectures?
Otherwise: How can I detect in generic code if I can rely on a release
inside spin_lock()?
--
Manfred
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-23 19:22 ` Manfred Spraul
@ 2016-06-28 5:27 ` Davidlohr Bueso
2016-06-30 19:28 ` Manfred Spraul
0 siblings, 1 reply; 20+ messages in thread
From: Davidlohr Bueso @ 2016-06-28 5:27 UTC (permalink / raw)
To: Manfred Spraul
Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1, felixh,
stable
On Thu, 23 Jun 2016, Manfred Spraul wrote:
>What I'm not sure yet is if smp_load_acquire() is sufficient:
>
>Thread A:
>> if (!READ_ONCE(sma->complex_mode)) {
>The code is test_and_test, no barrier requirements for first test
Yeah, it would just make us take the big lock unnecessarily, nothing fatal
and I agree it's probably worth the optimization. It still might be worth
commenting.
>> /*
>> * It appears that no complex operation is around.
>> * Acquire the per-semaphore lock.
>> */
>> spin_lock(&sem->lock);
>>
>> if (!smp_load_acquire(&sma->complex_mode)) {
>> /* fast path successful! */
>> return sops->sem_num;
>> }
>> spin_unlock(&sem->lock);
>> }
>
>Thread B:
>> WRITE_ONCE(sma->complex_mode, true);
>>
>> /* We need a full barrier:
>> * The write to complex_mode must be visible
>> * before we read the first sem->lock spinlock state.
>> */
>> smp_mb();
>>
>> for (i = 0; i < sma->sem_nsems; i++) {
>> sem = sma->sem_base + i;
>> spin_unlock_wait(&sem->lock);
>> }
>
>If thread A is allowed to issue read_spinlock;read complex_mode;write
>spinlock, then thread B would not notice that thread A is in the
>critical section
Are you referring to the sem->lock word not being visibly locked before we
read complex_mode (for the second time)? This issue was fixed in 2c610022711
(locking/qspinlock: Fix spin_unlock_wait() some more). So smp_load_acquire
should be enough afaict, or are you referring to something else?
Thanks,
Davidlohr
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-25 17:37 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
@ 2016-06-25 19:41 ` Holger Hoffstätte
0 siblings, 0 replies; 20+ messages in thread
From: Holger Hoffstätte @ 2016-06-25 19:41 UTC (permalink / raw)
To: stable; +Cc: linux-kernel
On Sat, 25 Jun 2016 19:37:51 +0200, Manfred Spraul wrote:
> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
> race:
>
> sem_lock has a fast path that allows parallel simple operations.
> There are two reasons why a simple operation cannot run in parallel:
> - a non-simple operation is ongoing (sma->sem_perm.lock held)
> - a complex operation is sleeping (sma->complex_count != 0)
>
> As both facts are stored independently, a thread can bypass the current
> checks by sleeping in the right positions. See below for more details
> (or kernel bugzilla 105651).
>
> The patch fixes that by creating one variable (complex_mode)
> that tracks both reasons why parallel operations are not possible.
>
> The patch also updates stale documentation regarding the locking.
>
> With regards to stable kernels:
> The patch is required for all kernels that include the
> commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)
>
> The alternative is to revert the patch that introduced the race.
>
> Background:
> Here is the race of the current implementation:
>
> Thread A: (simple op)
> - does the first "sma->complex_count == 0" test
>
> Thread B: (complex op)
> - does sem_lock(): This includes an array scan. But the scan can't
> find Thread A, because Thread A does not own sem->lock yet.
> - the thread does the operation, increases complex_count,
> drops sem_lock, sleeps
>
> Thread A:
> - spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
> - sleeps before the complex_count test
>
> Thread C: (complex op)
> - does sem_lock (no array scan, complex_count==1)
> - wakes up Thread B.
> - decrements complex_count
>
> Thread A:
> - does the complex_count test
>
> Bug:
> Now both thread A and thread C operate on the same array, without
> any synchronization.
>
> Full memory barriers are required to synchronize changes of
> complex_mode and the lock operations.
>
> Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
> Reported-by: felixh@informatik.uni-bremen.de
> Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
> Cc: <stable@vger.kernel.org>
<snip>
>
> - /* Then check that the global lock is free */
> - if (!spin_is_locked(&sma->sem_perm.lock)) {
> - /*
> - * We need a memory barrier with acquire semantics,
> - * otherwise we can race with another thread that does:
> - * complex_count++;
> - * spin_unlock(sem_perm.lock);
> - */
> - smp_acquire__after_ctrl_dep();
This won't apply to -stable because smp_acquire__after_ctrl_dep() was only
recently added. I could merge this over 4.4.x by replacing it with the previous
definition ipc_smp_acquire__after_spin_is_unlocked() (just as you did in the
first version of the patch).
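For reference, the old helper was (from memory, roughly) defined in
ipc/sem.c as:

	/*
	 * spin_unlock_wait() and !spin_is_locked() are not memory
	 * barriers, they are only control barriers; smp_rmb() is
	 * sufficient, as writes cannot pass the control barrier.
	 */
	#define ipc_smp_acquire__after_spin_is_unlocked()	smp_rmb()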
hth,
Holger
* [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-25 17:37 Manfred Spraul
@ 2016-06-25 17:37 ` Manfred Spraul
2016-06-25 19:41 ` Holger Hoffstätte
0 siblings, 1 reply; 20+ messages in thread
From: Manfred Spraul @ 2016-06-25 17:37 UTC (permalink / raw)
To: H. Peter Anvin, Peter Zijlstra, Andrew Morton, Davidlohr Bueso
Cc: LKML, Thomas Gleixner, Ingo Molnar, 1vier1, felixh,
Manfred Spraul, stable
Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:
sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operation is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)
As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).
The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.
The patch also updates stale documentation regarding the locking.
With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)
The alternative is to revert the patch that introduced the race.
Background:
Here is the race of the current implementation:
Thread A: (simple op)
- does the first "sma->complex_count == 0" test
Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
drops sem_lock, sleeps
Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test
Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count
Thread A:
- does the complex_count test
Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.
Full memory barriers are required to synchronize changes of
complex_mode and the lock operations.
Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Reported-by: felixh@informatik.uni-bremen.de
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
---
include/linux/sem.h | 1 +
ipc/sem.c | 122 ++++++++++++++++++++++++++++++----------------------
2 files changed, 71 insertions(+), 52 deletions(-)
diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
+ bool complex_mode; /* no parallel simple ops */
};
#ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index ae72b3c..538f43a 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -162,14 +162,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
/*
* Locking:
+ * a) global sem_lock() for read/write
* sem_undo.id_next,
* sem_array.complex_count,
- * sem_array.pending{_alter,_cont},
- * sem_array.sem_undo: global sem_lock() for read/write
- * sem_undo.proc_next: only "current" is allowed to read/write that field.
+ * sem_array.complex_mode
+ * sem_array.pending{_alter,_const},
+ * sem_array.sem_undo
*
+ * b) global or semaphore sem_lock() for read/write:
* sem_array.sem_base[i].pending_{const,alter}:
- * global or semaphore sem_lock() for read/write
+ * sem_array.complex_mode (for read)
+ *
+ * c) special:
+ * sem_undo_list.list_proc:
+ * * undo_list->lock for write
+ * * rcu for read
*/
#define sc_semmsl sem_ctls[0]
@@ -260,23 +267,25 @@ static void sem_rcu_free(struct rcu_head *head)
}
/*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
* Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- * that sem_perm.lock is free.
- * that a) sem_perm.lock is free and b) complex_count is 0.
*/
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
{
int i;
struct sem *sem;
- if (sma->complex_count) {
- /* The thread that increased sma->complex_count waited on
- * all sem->lock locks. Thus we don't need to wait again.
- */
+ if (sma->complex_mode) {
+ /* We are already in complex_mode. Nothing to do */
return;
}
+ WRITE_ONCE(sma->complex_mode, true);
+
+ /* We need a full barrier:
+ * The write to complex_mode must be visible
+ * before we read the first sem->lock spinlock state.
+ */
+ smp_mb();
for (i = 0; i < sma->sem_nsems; i++) {
sem = sma->sem_base + i;
@@ -285,6 +294,27 @@ static void sem_wait_array(struct sem_array *sma)
}
/*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+ if (sma->complex_count) {
+ /* Complex ops are sleeping.
+ * We must stay in complex mode
+ */
+ return;
+ }
+ /*
+ * Immediately after setting complex_mode to false,
+ * a simple op can start. Thus: all memory writes
+ * performed by the current operation must be visible
+ * before we set complex_mode to false.
+ */
+ smp_store_release(&sma->complex_mode, false);
+}
+
+/*
* If the request contains only one semaphore operation, and there are
* no complex transactions pending, lock only the semaphore involved.
* Otherwise, lock the entire semaphore array, since we either have
@@ -300,56 +330,40 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Complex operation - acquire a full lock */
ipc_lock_object(&sma->sem_perm);
- /* And wait until all simple ops that are processed
- * right now have dropped their locks.
- */
- sem_wait_array(sma);
+ /* Prevent parallel simple ops */
+ complexmode_enter(sma);
return -1;
}
/*
* Only one semaphore affected - try to optimize locking.
- * The rules are:
- * - optimized locking is possible if no complex operation
- * is either enqueued or processed right now.
- * - The test for enqueued complex ops is simple:
- * sma->complex_count != 0
- * - Testing for complex ops that are processed right now is
- * a bit more difficult. Complex ops acquire the full lock
- * and first wait that the running simple ops have completed.
- * (see above)
- * Thus: If we own a simple lock and the global lock is free
- * and complex_count is now 0, then it will stay 0 and
- * thus just locking sem->lock is sufficient.
+ * Optimized locking is possible if no complex operation
+ * is either enqueued or processed right now.
+ *
+ * Both facts are tracked by complex_mode.
*/
sem = sma->sem_base + sops->sem_num;
- if (sma->complex_count == 0) {
+ /*
+ * Initial check for complex_mode. Just an optimization,
+ * no locking.
+ */
+ if (!READ_ONCE(sma->complex_mode)) {
/*
* It appears that no complex operation is around.
* Acquire the per-semaphore lock.
*/
spin_lock(&sem->lock);
- /* Then check that the global lock is free */
- if (!spin_is_locked(&sma->sem_perm.lock)) {
- /*
- * We need a memory barrier with acquire semantics,
- * otherwise we can race with another thread that does:
- * complex_count++;
- * spin_unlock(sem_perm.lock);
- */
- smp_acquire__after_ctrl_dep();
+ /*
+ * A full barrier is required: the write of sem->lock
+ * must be visible before the read is executed
+ */
+ smp_mb();
- /*
- * Now repeat the test of complex_count:
- * It can't change anymore until we drop sem->lock.
- * Thus: if is now 0, then it will stay 0.
- */
- if (sma->complex_count == 0) {
- /* fast path successful! */
- return sops->sem_num;
- }
+ if (!READ_ONCE(sma->complex_mode)) {
+ /* fast path successful! */
+ return sops->sem_num;
}
spin_unlock(&sem->lock);
}
@@ -369,7 +383,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Not a false alarm, thus complete the sequence for a
* full lock.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);
return -1;
}
}
@@ -378,6 +392,7 @@ static inline void sem_unlock(struct sem_array *sma, int locknum)
{
if (locknum == -1) {
unmerge_queues(sma);
+ complexmode_tryleave(sma);
ipc_unlock_object(&sma->sem_perm);
} else {
struct sem *sem = sma->sem_base + locknum;
@@ -529,6 +544,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
}
sma->complex_count = 0;
+ sma->complex_mode = true; /* dropped by sem_unlock below */
INIT_LIST_HEAD(&sma->pending_alter);
INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
@@ -2184,10 +2200,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
/*
* The proc interface isn't aware of sem_lock(), it calls
* ipc_lock_object() directly (in sysvipc_find_ipc).
- * In order to stay compatible with sem_lock(), we must wait until
- * all simple semop() calls have left their critical regions.
+ * In order to stay compatible with sem_lock(), we must
+ * enter / leave complex_mode.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);
sem_otime = get_semotime(sma);
@@ -2204,6 +2220,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
sem_otime,
sma->sem_ctime);
+ complexmode_tryleave(sma);
+
return 0;
}
#endif
--
2.5.5
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-21 0:30 ` Davidlohr Bueso
@ 2016-06-23 19:22 ` Manfred Spraul
2016-06-28 5:27 ` Davidlohr Bueso
0 siblings, 1 reply; 20+ messages in thread
From: Manfred Spraul @ 2016-06-23 19:22 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1, felixh,
stable
On 06/21/2016 02:30 AM, Davidlohr Bueso wrote:
> On Sat, 18 Jun 2016, Manfred Spraul wrote:
>
>> diff --git a/include/linux/sem.h b/include/linux/sem.h
>> index 976ce3a..d0efd6e 100644
>> --- a/include/linux/sem.h
>> +++ b/include/linux/sem.h
>> @@ -21,6 +21,7 @@ struct sem_array {
>> struct list_head list_id; /* undo requests on this array */
>> int sem_nsems; /* no. of semaphores in array */
>> int complex_count; /* pending complex operations */
>
> I see this patch also takes care of complex_count needing READ/WRITE_ONCE
> as you change that to always be done with the big sem_perm lock.
>
>> + bool complex_mode; /* no parallel simple ops */
>
> But I'm wondering about races with this one. Doesn't complex_mode need
> acquire/release semantics?
>
Yes. complex_mode needs acquire/release.
>> };
>>
>
> [...]
>
>> /*
>> - * Wait until all currently ongoing simple ops have completed.
>> + * Enter the mode suitable for non-simple operations:
>> * Caller must own sem_perm.lock.
>> - * New simple ops cannot start, because simple ops first check
>> - * that sem_perm.lock is free.
>> - * that a) sem_perm.lock is free and b) complex_count is 0.
>> */
>> -static void sem_wait_array(struct sem_array *sma)
>> +static void complexmode_enter(struct sem_array *sma)
>> {
>> int i;
>> struct sem *sem;
>>
>> - if (sma->complex_count) {
>> - /* The thread that increased sma->complex_count waited on
>> - * all sem->lock locks. Thus we don't need to wait again.
>> - */
>> + if (sma->complex_mode) {
>> + /* We are already in complex_mode. Nothing to do */
>
> This complex_mode load is serialized because both complexmode_enter() and
> _tryleave(), which are the main calls that modify the variable, are
> serialized
> by sem_perm lock, right?
>
Exactly. Changes to complex_mode are protected by perm.lock.
>> return;
>> }
>
> Btw, I like that this logic is much simpler, just by reading the
> comments :)
>
>> + WRITE_ONCE(sma->complex_mode, true);
>> +
>> + /* We need a full barrier:
>> + * The write to complex_mode must be visible
>> + * before we read the first sem->lock spinlock state.
>> + */
>> + smp_mb();
>>
Theoretically: smp_store_acquire(). But this doesn't exist, so smp_mb().
>> for (i = 0; i < sma->sem_nsems; i++) {
>> sem = sma->sem_base + i;
>> @@ -285,6 +294,29 @@ static void sem_wait_array(struct sem_array *sma)
>> }
>>
>> /*
>> + * Try to leave the mode that disallows simple operations:
>> + * Caller must own sem_perm.lock.
>> + */
>> +static void complexmode_tryleave(struct sem_array *sma)
>> +{
>> + if (sma->complex_count) {
>> + /* Complex ops are sleeping.
>> + * We must stay in complex mode
>> + */
>> + return;
>> + }
>> + /*
>> + * Immediately after setting complex_mode to false,
>> + * a simple op can start. Thus: all memory writes
>> + * performed by the current operation must be visible
>> + * before we set complex_mode to false.
>> + */
>> + smp_wmb();
>> +
>> + WRITE_ONCE(sma->complex_mode, false);
>
> smp_store_release()? See below.
>
Yes
>> +}
>> +
>> +/*
>> * If the request contains only one semaphore operation, and there are
>> * no complex transactions pending, lock only the semaphore involved.
>> * Otherwise, lock the entire semaphore array, since we either have
>> @@ -300,56 +332,38 @@ static inline int sem_lock(struct sem_array
>> *sma, struct sembuf *sops,
>> /* Complex operation - acquire a full lock */
>> ipc_lock_object(&sma->sem_perm);
>>
>> - /* And wait until all simple ops that are processed
>> - * right now have dropped their locks.
>> - */
>> - sem_wait_array(sma);
>> + /* Prevent parallel simple ops */
>> + complexmode_enter(sma);
>> return -1;
>> }
>>
>> /*
>> * Only one semaphore affected - try to optimize locking.
>> - * The rules are:
>> - * - optimized locking is possible if no complex operation
>> - * is either enqueued or processed right now.
>> - * - The test for enqueued complex ops is simple:
>> - * sma->complex_count != 0
>> - * - Testing for complex ops that are processed right now is
>> - * a bit more difficult. Complex ops acquire the full lock
>> - * and first wait that the running simple ops have completed.
>> - * (see above)
>> - * Thus: If we own a simple lock and the global lock is free
>> - * and complex_count is now 0, then it will stay 0 and
>> - * thus just locking sem->lock is sufficient.
>> + * Optimized locking is possible if no complex operation
>> + * is either enqueued or processed right now.
>> + *
>> + * Both facts are tracked by complex_mode.
>> */
>> sem = sma->sem_base + sops->sem_num;
>>
>> - if (sma->complex_count == 0) {
>> + /*
>> + * Initial check for complex_mode. Just an optimization,
>> + * no locking.
>> + */
>> + if (!READ_ONCE(sma->complex_mode)) {
>
> We have no lock (which is the whole point), I think we need stronger
> guarantees here to avoid racing with another thread that holds sem_perm
> lock and is entering complexmode for the first time or vice versa with
> tryleave(). An smp_load_acquire here would pair with the suggested
> smp_store_release calls.
>
Yes, you are right.
What I'm not sure yet is if smp_load_acquire() is sufficient:
Thread A:
> if (!READ_ONCE(sma->complex_mode)) {
The code is test-and-test: there are no barrier requirements for the first test.
> /*
> * It appears that no complex operation is around.
> * Acquire the per-semaphore lock.
> */
> spin_lock(&sem->lock);
>
> if (!smp_load_acquire(&sma->complex_mode)) {
> /* fast path successful! */
> return sops->sem_num;
> }
> spin_unlock(&sem->lock);
> }
Thread B:
> WRITE_ONCE(sma->complex_mode, true);
>
> /* We need a full barrier:
> * The write to complex_mode must be visible
> * before we read the first sem->lock spinlock state.
> */
> smp_mb();
>
> for (i = 0; i < sma->sem_nsems; i++) {
> sem = sma->sem_base + i;
> spin_unlock_wait(&sem->lock);
> }
If thread A is allowed to issue read spinlock; read complex_mode; write
spinlock, then thread B would not notice that thread A is in the
critical section.
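In litmus-test form, the outcome I'm worried about would be (a sketch;
r1/r2 are hypothetical observed values):

	Initially: sma->complex_mode == false, sem->lock unlocked

	Thread A (simple op)            Thread B (complex op)
	-----------------------------   ------------------------------------
	spin_lock(&sem->lock);          WRITE_ONCE(sma->complex_mode, true);
	r1 = sma->complex_mode;         smp_mb();
	                                r2 = (sem->lock observed unlocked);

	Feared outcome: r1 == false && r2 == true, i.e. A's complex_mode
	read escapes above its lock store, B's spin_unlock_wait() sees the
	lock free, and both enter their critical sections.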
> Thanks,
> Davidlohr
I'll update the patch.
--
Manfred
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-20 23:04 ` Andrew Morton
@ 2016-06-23 18:55 ` Manfred Spraul
-1 siblings, 0 replies; 20+ messages in thread
From: Manfred Spraul @ 2016-06-23 18:55 UTC (permalink / raw)
To: Andrew Morton
Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, LKML, linux-next, 1vier1, Davidlohr Bueso,
felixh, stable
On 06/21/2016 01:04 AM, Andrew Morton wrote:
> On Sat, 18 Jun 2016 22:02:21 +0200 Manfred Spraul <manfred@colorfullife.com> wrote:
>
>> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:
>>
>> sem_lock has a fast path that allows parallel simple operations.
>> There are two reasons why a simple operation cannot run in parallel:
> - a non-simple operation is ongoing (sma->sem_perm.lock held)
>> - a complex operation is sleeping (sma->complex_count != 0)
>>
>> As both facts are stored independently, a thread can bypass the current
>> checks by sleeping in the right positions. See below for more details
>> (or kernel bugzilla 105651).
>>
>> The patch fixes that by creating one variable (complex_mode)
>> that tracks both reasons why parallel operations are not possible.
>>
>> The patch also updates stale documentation regarding the locking.
>>
>> With regards to stable kernels:
>> The patch is required for all kernels that include the commit 6d07b68ce16a
>> ("ipc/sem.c: optimize sem_lock()") (3.10?)
> I've had this in -mm (and -next) since January 4, without issues. I
> put it on hold because Davidlohr expressed concern about performance
> regressions.
I had several ideas for how to fix it. The initial ideas probably had
performance issues.
The current one doesn't have any issues. It just took longer than
expected to test it.
> Your [2/2] should prevent those regressions (yes?) so I assume that any
> kernel which has [1/2] really should have [2/2] as well. But without
> any quantitative information, this is all mad guesswork.
>
> What to do?
[2/2] is an improvement; it handles one case better than the current code.
If you want:
3.10 improved scalability, but it introduced a performance regression
for one use case.
[with 3.10, simple ops became parallel, but complex ops had to perform a
"for_each_sem() {spin_unlock_wait()}"]
The patch fixes that.
--
Manfred
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-18 20:02 ` Manfred Spraul
@ 2016-06-21 0:30 ` Davidlohr Bueso
2016-06-23 19:22 ` Manfred Spraul
-1 siblings, 1 reply; 20+ messages in thread
From: Davidlohr Bueso @ 2016-06-21 0:30 UTC (permalink / raw)
To: Manfred Spraul
Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, Andrew Morton, LKML, linux-next, 1vier1, felixh,
stable
On Sat, 18 Jun 2016, Manfred Spraul wrote:
>diff --git a/include/linux/sem.h b/include/linux/sem.h
>index 976ce3a..d0efd6e 100644
>--- a/include/linux/sem.h
>+++ b/include/linux/sem.h
>@@ -21,6 +21,7 @@ struct sem_array {
> struct list_head list_id; /* undo requests on this array */
> int sem_nsems; /* no. of semaphores in array */
> int complex_count; /* pending complex operations */
I see this patch also takes care of complex_count needing READ/WRITE_ONCE
as you change that to always be done with the big sem_perm lock.
>+ bool complex_mode; /* no parallel simple ops */
But I'm wondering about races with this one. Doesn't complex_mode need
acquire/release semantics?
> };
>
[...]
> /*
>- * Wait until all currently ongoing simple ops have completed.
>+ * Enter the mode suitable for non-simple operations:
> * Caller must own sem_perm.lock.
>- * New simple ops cannot start, because simple ops first check
>- * that sem_perm.lock is free.
>- * that a) sem_perm.lock is free and b) complex_count is 0.
> */
>-static void sem_wait_array(struct sem_array *sma)
>+static void complexmode_enter(struct sem_array *sma)
> {
> int i;
> struct sem *sem;
>
>- if (sma->complex_count) {
>- /* The thread that increased sma->complex_count waited on
>- * all sem->lock locks. Thus we don't need to wait again.
>- */
>+ if (sma->complex_mode) {
>+ /* We are already in complex_mode. Nothing to do */
This complex_mode load is serialized because both complexmode_enter() and
_tryleave(), which are the main calls that modify the variable, are serialized
by sem_perm lock, right?
> return;
> }
Btw, I like that this logic is much simpler, just by reading the comments :)
>+ WRITE_ONCE(sma->complex_mode, true);
>+
>+ /* We need a full barrier:
>+ * The write to complex_mode must be visible
>+ * before we read the first sem->lock spinlock state.
>+ */
>+ smp_mb();
>
> for (i = 0; i < sma->sem_nsems; i++) {
> sem = sma->sem_base + i;
>@@ -285,6 +294,29 @@ static void sem_wait_array(struct sem_array *sma)
> }
>
> /*
>+ * Try to leave the mode that disallows simple operations:
>+ * Caller must own sem_perm.lock.
>+ */
>+static void complexmode_tryleave(struct sem_array *sma)
>+{
>+ if (sma->complex_count) {
>+ /* Complex ops are sleeping.
>+ * We must stay in complex mode
>+ */
>+ return;
>+ }
>+ /*
>+ * Immediately after setting complex_mode to false,
>+ * a simple op can start. Thus: all memory writes
>+ * performed by the current operation must be visible
>+ * before we set complex_mode to false.
>+ */
>+ smp_wmb();
>+
>+ WRITE_ONCE(sma->complex_mode, false);
smp_store_release()? See below.
>+}
>+
>+/*
> * If the request contains only one semaphore operation, and there are
> * no complex transactions pending, lock only the semaphore involved.
> * Otherwise, lock the entire semaphore array, since we either have
>@@ -300,56 +332,38 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
> /* Complex operation - acquire a full lock */
> ipc_lock_object(&sma->sem_perm);
>
>- /* And wait until all simple ops that are processed
>- * right now have dropped their locks.
>- */
>- sem_wait_array(sma);
>+ /* Prevent parallel simple ops */
>+ complexmode_enter(sma);
> return -1;
> }
>
> /*
> * Only one semaphore affected - try to optimize locking.
>- * The rules are:
>- * - optimized locking is possible if no complex operation
>- * is either enqueued or processed right now.
>- * - The test for enqueued complex ops is simple:
>- * sma->complex_count != 0
>- * - Testing for complex ops that are processed right now is
>- * a bit more difficult. Complex ops acquire the full lock
>- * and first wait that the running simple ops have completed.
>- * (see above)
>- * Thus: If we own a simple lock and the global lock is free
>- * and complex_count is now 0, then it will stay 0 and
>- * thus just locking sem->lock is sufficient.
>+ * Optimized locking is possible if no complex operation
>+ * is either enqueued or processed right now.
>+ *
>+ * Both facts are tracked by complex_mode.
> */
> sem = sma->sem_base + sops->sem_num;
>
>- if (sma->complex_count == 0) {
>+ /*
>+ * Initial check for complex_mode. Just an optimization,
>+ * no locking.
>+ */
>+ if (!READ_ONCE(sma->complex_mode)) {
We have no lock (which is the whole point), I think we need stronger
guarantees here to avoid racing with another thread that holds sem_perm
lock and is entering complexmode for the first time or vice versa with
tryleave(). An smp_load_acquire here would pair with the suggested
smp_store_release calls.
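As a userspace illustration of that acquire/release pairing, here is a
minimal, self-contained C11 sketch (not kernel code, all names made up;
memory_order_release/acquire play the roles of smp_store_release() and
smp_load_acquire()):

	#include <assert.h>
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdbool.h>

	static int payload;                      /* stands in for array state */
	static atomic_bool complex_mode = true;  /* starts in complex mode */

	static void *complex_op(void *arg)
	{
		(void)arg;
		payload = 42;	/* writes done while in complex mode */
		/* release: publish payload before leaving complex mode */
		atomic_store_explicit(&complex_mode, false, memory_order_release);
		return NULL;
	}

	static void *simple_op(void *arg)
	{
		(void)arg;
		/* acquire: if we observe complex mode off, we also observe payload */
		if (!atomic_load_explicit(&complex_mode, memory_order_acquire))
			assert(payload == 42);
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;
		pthread_create(&a, NULL, complex_op, NULL);
		pthread_create(&b, NULL, simple_op, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		return 0;
	}

(build with: cc -pthread)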
Thanks,
Davidlohr
* Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-18 20:02 ` Manfred Spraul
@ 2016-06-20 23:04 ` Andrew Morton
-1 siblings, 0 replies; 20+ messages in thread
From: Andrew Morton @ 2016-06-20 23:04 UTC (permalink / raw)
To: Manfred Spraul
Cc: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, LKML, linux-next, 1vier1, Davidlohr Bueso,
felixh, stable
On Sat, 18 Jun 2016 22:02:21 +0200 Manfred Spraul <manfred@colorfullife.com> wrote:
> Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:
>
> sem_lock has a fast path that allows parallel simple operations.
> There are two reasons why a simple operation cannot run in parallel:
> - a non-simple operation is ongoing (sma->sem_perm.lock held)
> - a complex operation is sleeping (sma->complex_count != 0)
>
> As both facts are stored independently, a thread can bypass the current
> checks by sleeping in the right positions. See below for more details
> (or kernel bugzilla 105651).
>
> The patch fixes that by creating one variable (complex_mode)
> that tracks both reasons why parallel operations are not possible.
>
> The patch also updates stale documentation regarding the locking.
>
> With regards to stable kernels:
> The patch is required for all kernels that include the commit 6d07b68ce16a
> ("ipc/sem.c: optimize sem_lock()") (3.10?)
I've had this in -mm (and -next) since January 4, without issues. I
put it on hold because Davidlohr expressed concern about performance
regressions.
Your [2/2] should prevent those regressions (yes?) so I assume that any
kernel which has [1/2] really should have [2/2] as well. But without
any quantitative information, this is all mad guesswork.
What to do?
(The [2/2] changelog should explain that it is the cure to [1/2]'s
regressions, btw).
* [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
2016-06-15 5:23 linux-next: manual merge of the akpm-current tree with the tip tree Stephen Rothwell
@ 2016-06-18 20:02 ` Manfred Spraul
0 siblings, 0 replies; 20+ messages in thread
From: Manfred Spraul @ 2016-06-18 20:02 UTC (permalink / raw)
To: Stephen Rothwell, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
Peter Zijlstra, Andrew Morton
Cc: LKML, linux-next, 1vier1, Davidlohr Bueso, felixh,
Manfred Spraul, stable
Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a race:
sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operation is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)
As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).
The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.
The patch also updates stale documentation regarding the locking.
With regards to stable kernels:
The patch is required for all kernels that include the commit 6d07b68ce16a
("ipc/sem.c: optimize sem_lock()") (3.10?)
The alternative is to revert the patch that introduced the race.
Background:
Here is the race in the current implementation (a condensed sketch
follows the walkthrough below):
Thread A: (simple op)
- does the first "sma->complex_count == 0" test
Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
drops sem_lock, sleeps
Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test
Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count
Thread A:
- does the complex_count test
Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.
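For illustration, here is a condensed sketch of the pre-patch fast
path, simplified from the "-" lines in the diff below; it is not the
verbatim kernel code, and the helper name is invented for the sketch.
The two windows of the race are marked:

static int sem_lock_fastpath(struct sem_array *sma, struct sembuf *sops)
{
	struct sem *sem = sma->sem_base + sops->sem_num;

	if (sma->complex_count == 0) {		/* Thread A: first test */
		/*
		 * Window 1: Thread B can run a full sem_lock() here.
		 * B's array scan misses A (A does not own sem->lock
		 * yet); B bumps complex_count, unlocks and sleeps.
		 */
		spin_lock(&sem->lock);
		if (!spin_is_locked(&sma->sem_perm.lock)) {
			smp_acquire__after_ctrl_dep();
			/*
			 * Window 2: Thread C takes the global lock
			 * (complex_count == 1, so no array scan),
			 * wakes B and decrements complex_count to 0 ...
			 */
			if (sma->complex_count == 0)
				/* ... so the recheck passes: A enters
				 * the fast path while C owns the array.
				 */
				return sops->sem_num;
		}
		spin_unlock(&sem->lock);
	}
	return -1;	/* caller falls back to the full lock */
}

The fix below collapses both tests into a single complex_mode flag:
it is set (with a full barrier) before a complex op proceeds, and it
is only cleared again once complex_count has dropped back to 0.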
Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Reported-by: felixh@informatik.uni-bremen.de
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
---
diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
+ bool complex_mode; /* no parallel simple ops */
};
#ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index ae72b3c..db2e6fc 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -162,14 +162,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);
/*
* Locking:
+ * a) global sem_lock() for read/write
* sem_undo.id_next,
* sem_array.complex_count,
- * sem_array.pending{_alter,_cont},
- * sem_array.sem_undo: global sem_lock() for read/write
- * sem_undo.proc_next: only "current" is allowed to read/write that field.
+ * sem_array.complex_mode
+ * sem_array.pending{_alter,_const},
+ * sem_array.sem_undo
*
+ * b) global or semaphore sem_lock() for read/write:
* sem_array.sem_base[i].pending_{const,alter}:
- * global or semaphore sem_lock() for read/write
+ * sem_array.complex_mode (for read)
+ *
+ * c) special:
+ * sem_undo_list.list_proc:
+ * * undo_list->lock for write
+ * * rcu for read
*/
#define sc_semmsl sem_ctls[0]
@@ -260,23 +267,25 @@ static void sem_rcu_free(struct rcu_head *head)
}
/*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
* Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- * that sem_perm.lock is free.
- * that a) sem_perm.lock is free and b) complex_count is 0.
*/
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
{
int i;
struct sem *sem;
- if (sma->complex_count) {
- /* The thread that increased sma->complex_count waited on
- * all sem->lock locks. Thus we don't need to wait again.
- */
+ if (sma->complex_mode) {
+ /* We are already in complex_mode. Nothing to do */
return;
}
+ WRITE_ONCE(sma->complex_mode, true);
+
+ /* We need a full barrier:
+ * The write to complex_mode must be visible
+ * before we read the first sem->lock spinlock state.
+ */
+ smp_mb();
for (i = 0; i < sma->sem_nsems; i++) {
sem = sma->sem_base + i;
@@ -285,6 +294,29 @@ static void sem_wait_array(struct sem_array *sma)
}
/*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+ if (sma->complex_count) {
+ /* Complex ops are sleeping.
+ * We must stay in complex mode
+ */
+ return;
+ }
+ /*
+ * Immediately after setting complex_mode to false,
+ * a simple op can start. Thus: all memory writes
+ * performed by the current operation must be visible
+ * before we set complex_mode to false.
+ */
+ smp_wmb();
+
+ WRITE_ONCE(sma->complex_mode, false);
+}
+
+/*
* If the request contains only one semaphore operation, and there are
* no complex transactions pending, lock only the semaphore involved.
* Otherwise, lock the entire semaphore array, since we either have
@@ -300,56 +332,38 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Complex operation - acquire a full lock */
ipc_lock_object(&sma->sem_perm);
- /* And wait until all simple ops that are processed
- * right now have dropped their locks.
- */
- sem_wait_array(sma);
+ /* Prevent parallel simple ops */
+ complexmode_enter(sma);
return -1;
}
/*
* Only one semaphore affected - try to optimize locking.
- * The rules are:
- * - optimized locking is possible if no complex operation
- * is either enqueued or processed right now.
- * - The test for enqueued complex ops is simple:
- * sma->complex_count != 0
- * - Testing for complex ops that are processed right now is
- * a bit more difficult. Complex ops acquire the full lock
- * and first wait that the running simple ops have completed.
- * (see above)
- * Thus: If we own a simple lock and the global lock is free
- * and complex_count is now 0, then it will stay 0 and
- * thus just locking sem->lock is sufficient.
+ * Optimized locking is possible if no complex operation
+ * is either enqueued or processed right now.
+ *
+ * Both facts are tracked by complex_mode.
*/
sem = sma->sem_base + sops->sem_num;
- if (sma->complex_count == 0) {
+ /*
+ * Initial check for complex_mode. Just an optimization,
+ * no locking.
+ */
+ if (!READ_ONCE(sma->complex_mode)) {
/*
* It appears that no complex operation is around.
* Acquire the per-semaphore lock.
*/
spin_lock(&sem->lock);
- /* Then check that the global lock is free */
- if (!spin_is_locked(&sma->sem_perm.lock)) {
- /*
- * We need a memory barrier with acquire semantics,
- * otherwise we can race with another thread that does:
- * complex_count++;
- * spin_unlock(sem_perm.lock);
- */
- smp_acquire__after_ctrl_dep();
-
- /*
- * Now repeat the test of complex_count:
- * It can't change anymore until we drop sem->lock.
- * Thus: if is now 0, then it will stay 0.
- */
- if (sma->complex_count == 0) {
- /* fast path successful! */
- return sops->sem_num;
- }
+ /* Now repeat the test for complex_mode.
+ * A memory barrier is provided by the spin_lock()
+ * above.
+ */
+ if (!READ_ONCE(sma->complex_mode)) {
+ /* fast path successful! */
+ return sops->sem_num;
}
spin_unlock(&sem->lock);
}
@@ -369,7 +383,7 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Not a false alarm, thus complete the sequence for a
* full lock.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);
return -1;
}
}
@@ -378,6 +392,7 @@ static inline void sem_unlock(struct sem_array *sma, int locknum)
{
if (locknum == -1) {
unmerge_queues(sma);
+ complexmode_tryleave(sma);
ipc_unlock_object(&sma->sem_perm);
} else {
struct sem *sem = sma->sem_base + locknum;
@@ -529,6 +544,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
}
sma->complex_count = 0;
+ sma->complex_mode = true; /* dropped by sem_unlock below */
INIT_LIST_HEAD(&sma->pending_alter);
INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
@@ -2184,10 +2200,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
/*
* The proc interface isn't aware of sem_lock(), it calls
* ipc_lock_object() directly (in sysvipc_find_ipc).
- * In order to stay compatible with sem_lock(), we must wait until
- * all simple semop() calls have left their critical regions.
+ * In order to stay compatible with sem_lock(), we must
+ * enter / leave complex_mode.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);
sem_otime = get_semotime(sma);
@@ -2204,6 +2220,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
sem_otime,
sma->sem_ctime);
+ complexmode_tryleave(sma);
+
return 0;
}
#endif
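In userspace terms, a "simple" operation is a semop() call with a
single sembuf, which is eligible for the per-semaphore fast path in
sem_lock(); a call that updates several semaphores atomically is
"complex" and takes the global lock. A minimal sketch using the
standard SysV calls (error handling omitted for brevity):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

int main(void)
{
	/* Create a private set with two semaphores. */
	int semid = semget(IPC_PRIVATE, 2, IPC_CREAT | 0600);

	/* Simple op: one sembuf touching one semaphore. */
	struct sembuf simple = { .sem_num = 0, .sem_op = 1, .sem_flg = 0 };
	semop(semid, &simple, 1);

	/* Complex op: two semaphores modified atomically. This forces
	 * the global lock in sem_lock(); if it cannot proceed it
	 * sleeps and raises complex_count. */
	struct sembuf pair[2] = {
		{ .sem_num = 0, .sem_op = -1, .sem_flg = 0 },
		{ .sem_num = 1, .sem_op = 1,  .sem_flg = 0 },
	};
	semop(semid, pair, 2);

	semctl(semid, 0, IPC_RMID);	/* clean up the set */
	return 0;
}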
^ permalink raw reply related [flat|nested] 20+ messages in thread
end of thread, other threads:[~2016-07-16 1:27 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-13 5:06 [PATCH 0/2] ipc/sem.c: sem_lock fixes Manfred Spraul
2016-07-13 5:06 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
2016-07-13 5:06 ` [PATCH 2/2] ipc/sem.c: Remove duplicated memory barriers Manfred Spraul
2016-07-13 16:16 ` Davidlohr Bueso
2016-07-13 18:37 ` Manfred Spraul
2016-07-16 1:27 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Davidlohr Bueso
2016-07-13 22:05 ` [PATCH 0/2] ipc/sem.c: sem_lock fixes Andrew Morton
2016-07-14 16:40 ` Manfred Spraul
-- strict thread matches above, loose matches on Subject: below --
2016-06-25 17:37 Manfred Spraul
2016-06-25 17:37 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
2016-06-25 19:41 ` Holger Hoffstätte
2016-06-15 5:23 linux-next: manual merge of the akpm-current tree with the tip tree Stephen Rothwell
2016-06-18 20:02 ` [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race Manfred Spraul
2016-06-20 23:04 ` Andrew Morton
2016-06-23 18:55 ` Manfred Spraul
2016-06-21 0:30 ` Davidlohr Bueso
2016-06-23 19:22 ` Manfred Spraul
2016-06-28 5:27 ` Davidlohr Bueso
2016-06-30 19:28 ` Manfred Spraul
2016-07-01 16:52 ` Davidlohr Bueso