From: Manfred Spraul <manfred@colorfullife.com>
To: Davidlohr Bueso <dave@stgolabs.net>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-next@vger.kernel.org, 1vier1@web.de,
	felixh@informatik.uni-bremen.de, stable@vger.kernel.org
Subject: Re: [PATCH 1/2] ipc/sem.c: Fix complex_count vs. simple op race
Date: Thu, 23 Jun 2016 21:22:00 +0200
Message-ID: <b5b4e032-8b1c-f102-ac23-3b331c2ed615@colorfullife.com>
In-Reply-To: <20160621003015.GA15593@linux-80c1.suse>

On 06/21/2016 02:30 AM, Davidlohr Bueso wrote:
> On Sat, 18 Jun 2016, Manfred Spraul wrote:
>
>> diff --git a/include/linux/sem.h b/include/linux/sem.h
>> index 976ce3a..d0efd6e 100644
>> --- a/include/linux/sem.h
>> +++ b/include/linux/sem.h
>> @@ -21,6 +21,7 @@ struct sem_array {
>>     struct list_head    list_id;    /* undo requests on this array */
>>     int            sem_nsems;    /* no. of semaphores in array */
>>     int            complex_count;    /* pending complex operations */
>
> I see this patch also takes care of complex_count needing READ/WRITE_ONCE,
> since you change it to always be accessed under the big sem_perm lock.
>
>> +    bool            complex_mode;    /* no parallel simple ops */
>
> But I'm wondering about races with this one. Doesn't complex_mode need
> acquire/release semantics?
>
Yes. complex_mode needs acquire/release.

>> };
>>
>
> [...]
>
>> /*
>> - * Wait until all currently ongoing simple ops have completed.
>> + * Enter the mode suitable for non-simple operations:
>>  * Caller must own sem_perm.lock.
>> - * New simple ops cannot start, because simple ops first check
>> - * that sem_perm.lock is free.
>> - * that a) sem_perm.lock is free and b) complex_count is 0.
>>  */
>> -static void sem_wait_array(struct sem_array *sma)
>> +static void complexmode_enter(struct sem_array *sma)
>> {
>>     int i;
>>     struct sem *sem;
>>
>> -    if (sma->complex_count)  {
>> -        /* The thread that increased sma->complex_count waited on
>> -         * all sem->lock locks. Thus we don't need to wait again.
>> -         */
>> +    if (sma->complex_mode)  {
>> +        /* We are already in complex_mode. Nothing to do */
>
> This complex_mode load is serialized because both complexmode_enter() and
> _tryleave(), which are the main calls that modify the variable, are
> serialized by sem_perm lock, right?
>
Exactly. Changes to complex_mode are protected by sem_perm.lock.
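
As a sketch of that serialization (the tryleave call site is not in the
quoted hunks, so its exact placement below is my assumption):

	/* complex path in sem_lock(): */
	ipc_lock_object(&sma->sem_perm);	/* takes sem_perm.lock */
	complexmode_enter(sma);

	/* and on the unlock side, still under sem_perm.lock: */
	complexmode_tryleave(sma);
	ipc_unlock_object(&sma->sem_perm);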

>>         return;
>>     }
>
> Btw, I like that this logic is much simpler, just by reading the 
> comments :)
>
>> +    WRITE_ONCE(sma->complex_mode, true);
>> +
>> +    /* We need a full barrier:
>> +     * The write to complex_mode must be visible
>> +     * before we read the first sem->lock spinlock state.
>> +     */
>> +    smp_mb();
>>
Theoretically smp_store_acquire() would be the right primitive here, but
it doesn't exist, so smp_mb() it is.
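(A single-primitive spelling should also work, assuming smp_store_mb() is
still defined as WRITE_ONCE() followed by smp_mb():)

	/* sketch: store plus full barrier in one primitive */
	smp_store_mb(sma->complex_mode, true);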
>>     for (i = 0; i < sma->sem_nsems; i++) {
>>         sem = sma->sem_base + i;
>> @@ -285,6 +294,29 @@ static void sem_wait_array(struct sem_array *sma)
>> }
>>
>> /*
>> + * Try to leave the mode that disallows simple operations:
>> + * Caller must own sem_perm.lock.
>> + */
>> +static void complexmode_tryleave(struct sem_array *sma)
>> +{
>> +    if (sma->complex_count)  {
>> +        /* Complex ops are sleeping.
>> +         * We must stay in complex mode
>> +         */
>> +        return;
>> +    }
>> +    /*
>> +     * Immediately after setting complex_mode to false,
>> +     * a simple op can start. Thus: all memory writes
>> +     * performed by the current operation must be visible
>> +     * before we set complex_mode to false.
>> +     */
>> +    smp_wmb();
>> +
>> +    WRITE_ONCE(sma->complex_mode, false);
>
> smp_store_release()? See below.
>
Yes. As a sketch, complexmode_tryleave() would then become:
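
	static void complexmode_tryleave(struct sem_array *sma)
	{
		if (sma->complex_count) {
			/* Complex ops are sleeping; stay in complex mode. */
			return;
		}
		/*
		 * smp_store_release() orders all memory writes of the
		 * current operation before the store that re-allows
		 * simple ops, replacing the explicit smp_wmb() +
		 * WRITE_ONCE() pair.
		 */
		smp_store_release(&sma->complex_mode, false);
	}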
>> +}
>> +
>> +/*
>>  * If the request contains only one semaphore operation, and there are
>>  * no complex transactions pending, lock only the semaphore involved.
>>  * Otherwise, lock the entire semaphore array, since we either have
>> @@ -300,56 +332,38 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
>>         /* Complex operation - acquire a full lock */
>>         ipc_lock_object(&sma->sem_perm);
>>
>> -        /* And wait until all simple ops that are processed
>> -         * right now have dropped their locks.
>> -         */
>> -        sem_wait_array(sma);
>> +        /* Prevent parallel simple ops */
>> +        complexmode_enter(sma);
>>         return -1;
>>     }
>>
>>     /*
>>      * Only one semaphore affected - try to optimize locking.
>> -     * The rules are:
>> -     * - optimized locking is possible if no complex operation
>> -     *   is either enqueued or processed right now.
>> -     * - The test for enqueued complex ops is simple:
>> -     *      sma->complex_count != 0
>> -     * - Testing for complex ops that are processed right now is
>> -     *   a bit more difficult. Complex ops acquire the full lock
>> -     *   and first wait that the running simple ops have completed.
>> -     *   (see above)
>> -     *   Thus: If we own a simple lock and the global lock is free
>> -     *    and complex_count is now 0, then it will stay 0 and
>> -     *    thus just locking sem->lock is sufficient.
>> +     * Optimized locking is possible if no complex operation
>> +     * is either enqueued or processed right now.
>> +     *
>> +     * Both facts are tracked by complex_mode.
>>      */
>>     sem = sma->sem_base + sops->sem_num;
>>
>> -    if (sma->complex_count == 0) {
>> +    /*
>> +     * Initial check for complex_mode. Just an optimization,
>> +     * no locking.
>> +     */
>> +    if (!READ_ONCE(sma->complex_mode)) {
>
> We have no lock (which is the whole point), so I think we need stronger
> guarantees here to avoid racing with another thread that holds the
> sem_perm lock and is entering complexmode for the first time, or vice
> versa with tryleave(). An smp_load_acquire() here would pair with the
> suggested smp_store_release() calls.
>
Yes, you are right.
What I'm not sure about yet is whether smp_load_acquire() is sufficient:

Thread A:
>        if (!READ_ONCE(sma->complex_mode)) {
The code is test-and-test style; there are no barrier requirements for the first test.
>                 /*
>                  * It appears that no complex operation is around.
>                  * Acquire the per-semaphore lock.
>                  */
>                 spin_lock(&sem->lock);
>
>                 if (!smp_load_acquire(&sma->complex_mode)) {
>                         /* fast path successful! */
>                         return sops->sem_num;
>                 }
>                 spin_unlock(&sem->lock);
>         }

Thread B:
>        WRITE_ONCE(sma->complex_mode, true);
>
>         /* We need a full barrier:
>          * The write to complex_mode must be visible
>          * before we read the first sem->lock spinlock state.
>          */
>         smp_mb();
>
>         for (i = 0; i < sma->sem_nsems; i++) {
>                 sem = sma->sem_base + i;
>                 spin_unlock_wait(&sem->lock);
>         }

If thread A is allowed to execute its accesses in the order "read
spinlock; read complex_mode; write spinlock", i.e. if the store that
acquires sem->lock can become visible only after the smp_load_acquire()
of complex_mode, then thread B would not notice that thread A is in the
critical section.
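
If that reordering is possible, one conservative way to forbid it (a
sketch only; whether it is really needed is exactly the open question
above) would be a full barrier between taking the per-semaphore lock
and re-reading complex_mode:

	spin_lock(&sem->lock);
	/*
	 * Full barrier: make the sem->lock acquisition visible before
	 * we (re-)read complex_mode, so that a complex op that already
	 * set complex_mode and is spinning in spin_unlock_wait()
	 * cannot miss us.
	 */
	smp_mb();
	if (!READ_ONCE(sma->complex_mode)) {
		/* fast path successful! */
		return sops->sem_num;
	}
	spin_unlock(&sem->lock);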

> Thanks,
> Davidlohr


I'll update the patch.


--

     Manfred
