From: Waiman Long <llong@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <longman@redhat.com>, Ingo Molnar <mingo@redhat.com>,
	Will Deacon <will.deacon@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, x86@kernel.org,
	Davidlohr Bueso <dave@stgolabs.net>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	huang ying <huang.ying.caritas@gmail.com>
Subject: Re: [PATCH v4 14/16] locking/rwsem: Guard against making count negative
Date: Tue, 23 Apr 2019 10:31:42 -0400	[thread overview]
Message-ID: <a2baa104-6526-a129-0409-e66ebc098e2a@redhat.com> (raw)
In-Reply-To: <20190423141714.GO11158@hirez.programming.kicks-ass.net>

On 4/23/19 10:17 AM, Peter Zijlstra wrote:
> On Sun, Apr 21, 2019 at 05:07:56PM -0400, Waiman Long wrote:
>
>> How about the following chunks to disable preemption temporarily for the
>> increment-check-decrement sequence?
>>
>> diff --git a/include/linux/preempt.h b/include/linux/preempt.h
>> index dd92b1a93919..4cc03ac66e13 100644
>> --- a/include/linux/preempt.h
>> +++ b/include/linux/preempt.h
>> @@ -250,6 +250,8 @@ do { \
>>  #define preempt_enable_notrace()               barrier()
>>  #define preemptible()                          0
>>  
>> +#define __preempt_disable_nop  /* preempt_disable() is nop */
>> +
>>  #endif /* CONFIG_PREEMPT_COUNT */
>>  
>>  #ifdef MODULE
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 043fd29b7534..54029e6af17b 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -256,11 +256,64 @@ static inline struct task_struct *rwsem_get_owner(struct r
>>         return (struct task_struct *) (cowner
>>                 ? cowner | (sowner & RWSEM_NONSPINNABLE) : sowner);
>>  }
>> +
>> +/*
>> + * If __preempt_disable_nop is defined, calling preempt_disable() and
>> + * preempt_enable() directly is the most efficient way. Otherwise, it may
>> + * be more efficient to disable and enable interrupts instead to disable
>> + * preemption temporarily.
>> + */
>> +#ifdef __preempt_disable_nop
>> +#define disable_preemption()   preempt_disable()
>> +#define enable_preemption()    preempt_enable()
>> +#else
>> +#define disable_preemption()   local_irq_disable()
>> +#define enable_preemption()    local_irq_enable()
>> +#endif
> I'm not aware of an architecture where disabling interrupts is faster
> than disabling preemption.

I have actually done some performance tests measuring the effects of
disabling interrupts versus disabling preemption on readers (on an x86-64 system).

  Threads    Before patch    Disable irq    Disable preemption
  -------    ------------    -----------    ------------------
     1          9,088          8,766           9,172
     2          9,296          9,169           8,707
     4         11,192         11,205          10,712
     8         11,329         11,332          11,213

For the uncontended case, disabling interrupts is slower. The slowdown goes
away once the rwsem becomes contended. So it may not be a good idea to use
interrupt disabling as a proxy for disabling preemption.

BTW, the preemption count is not enabled in typical distro production
kernels like RHEL, so preempt_disable() is just a compiler barrier there. It
is turned on in the debug kernel, though.
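
For reference, this is roughly what include/linux/preempt.h does in the two
configurations (a simplified sketch, not the exact upstream code):

#ifdef CONFIG_PREEMPT_COUNT
/* Bump the per-task preempt count so the scheduler won't preempt us. */
#define preempt_disable() \
do { \
	preempt_count_inc(); \
	barrier(); \
} while (0)
#else
/* No preempt count: just compiler barriers, essentially free. */
#define preempt_disable()	barrier()
#define preempt_enable()	barrier()
#endif

So the extra cost in the reader fast path would only show up when
CONFIG_PREEMPT_COUNT is enabled.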


>> +/*
>> + * When the owner task structure pointer is merged into the count, fewer bits
>> + * will be available for readers. Therefore, there is a very slight chance
>> + * that the reader count may overflow. We try to prevent that from happening
>> + * by checking for the MS bit of the count and failing the trylock attempt
>> + * if this bit is set.
>> + *
>> + * With preemption enabled, there is a remote possibility that preemption
>> + * can happen in the narrow timing window between incrementing and
>> + * decrementing the reader count, and the task is then put to sleep for a
>> + * considerable amount of time. If a sufficient number of such unfortunate
>> + * sequences of events happen, we may still overflow the reader count.
>> + * To avoid that possibility, we have to disable preemption for the
>> + * whole increment-check-decrement sequence.
>> + *
>> + * The function returns true if there are too many readers and the count
>> + * has already been properly decremented so the reader must go directly
>> + * into the wait list.
>> + */
>> +static inline bool rwsem_read_trylock(struct rw_semaphore *sem, long *cnt)
>> +{
>> +       bool wait = false;      /* Wait now flag */
>> +
>> +       disable_preemption();
>> +       *cnt = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
>> +       if (unlikely(*cnt < 0)) {
>> +               atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
>> +               wait = true;
>> +       }
>> +       enable_preemption();
>> +       return wait;
>> +}
>>  #else /* !CONFIG_RWSEM_OWNER_COUNT */
> This also means you have to ensure CONFIG_NR_CPUS < 32K for
> RWSEM_OWNER_COUNT.


Yes, that can be done.
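
One way to enforce that could be a build-time check along these lines (just a
sketch of the idea, not necessarily how it will be done in the patch):

#if defined(CONFIG_RWSEM_OWNER_COUNT) && (CONFIG_NR_CPUS >= 32768)
#error "CONFIG_RWSEM_OWNER_COUNT requires CONFIG_NR_CPUS < 32K"
#endif

or an equivalent restriction in whatever Kconfig logic enables
RWSEM_OWNER_COUNT.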


>
>>  static inline struct task_struct *rwsem_get_owner(struct rw_semaphore *sem)
>>  {
>>         return READ_ONCE(sem->owner);
>>  }
>> +
>> +static inline bool rwsem_read_trylock(struct rw_semaphore *sem, long *cnt)
>> +{
>> +       *cnt = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
>> +       return false;
>> +}
>>  #endif /* CONFIG_RWSEM_OWNER_COUNT */
>>  
>>  /*
>> @@ -981,32 +1034,18 @@ static inline void clear_wr_nonspinnable(struct rw_semaph
>>   * Wait for the read lock to be granted
>>   */
>>  static struct rw_semaphore __sched *
>> -rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, long count)
>> +rwsem_down_read_slowpath(struct rw_semaphore *sem, int state, const bool wait)
>>  {
>> -       long adjustment = -RWSEM_READER_BIAS;
>> +       long count, adjustment = -RWSEM_READER_BIAS;
>>         bool wake = false;
>>         struct rwsem_waiter waiter;
>>         DEFINE_WAKE_Q(wake_q);
>>  
>> -       if (unlikely(count < 0)) {
>> +       if (unlikely(wait)) {
>>                 /*
>> -                * The sign bit has been set meaning that too many active
>> -                * readers are present. We need to decrement reader count &
>> -                * enter wait queue immediately to avoid overflowing the
>> -                * reader count.
>> -                *
>> -                * As preemption is not disabled, there is a remote
>> -                * possibility that preemption can happen in the narrow
>> -                * timing window between incrementing and decrementing
>> -                * the reader count and the task is put to sleep for a
>> -                * considerable amount of time. If sufficient number
>> -                * of such unfortunate sequence of events happen, we
>> -                * may still overflow the reader count. It is extremely
>> -                * unlikey, though. If this is a concern, we should consider
>> -                * disable preemption during this timing window to make
>> -                * sure that such unfortunate event will not happen.
>> +                * The reader count has already been decremented and the
>> +                * reader should go directly into the wait list now.
>>                  */
>> -               atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
>>                 adjustment = 0;
>>                 goto queue;
>>         }
>> @@ -1358,11 +1397,12 @@ static struct rw_semaphore *rwsem_downgrade_wake(struct
>>   */
>>  inline void __down_read(struct rw_semaphore *sem)
>>  {
>> -       long tmp = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
>> -                                                &sem->count);
>> +       long tmp;
>> +       bool wait;
>>  
>> +       wait = rwsem_read_trylock(sem, &tmp);
>>         if (unlikely(tmp & RWSEM_READ_FAILED_MASK)) {
>> -               rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE, tmp);
>> +               rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE, wait);
>>                 DEBUG_RWSEMS_WARN_ON(!is_rwsem_reader_owned(sem), sem);
>>         } else {
>>                 rwsem_set_reader_owned(sem);
> I think I prefer that function returning/taking the bias/adjustment
> value instead of a bool, if it is all the same.

Sure, I can do that.
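
Something along these lines, perhaps (a rough sketch only, not the final
patch):

static inline long rwsem_read_trylock(struct rw_semaphore *sem, long *cnt)
{
	long adjustment = -RWSEM_READER_BIAS;

	disable_preemption();
	*cnt = atomic_long_fetch_add_acquire(RWSEM_READER_BIAS, &sem->count);
	if (unlikely(*cnt < 0)) {
		/* Too many readers: undo the bias now, queue directly. */
		atomic_long_add(-RWSEM_READER_BIAS, &sem->count);
		adjustment = 0;
	}
	enable_preemption();
	return adjustment;
}

The slowpath would then take the returned adjustment directly instead of
recomputing it from a boolean.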

Cheers,
Longman


Thread overview: 112+ messages
2019-04-13 17:22 [PATCH v4 00/16] locking/rwsem: Rwsem rearchitecture part 2 Waiman Long
2019-04-13 17:22 ` [PATCH v4 01/16] locking/rwsem: Prevent unneeded warning during locking selftest Waiman Long
2019-04-18  8:04   ` [tip:locking/core] " tip-bot for Waiman Long
2019-04-13 17:22 ` [PATCH v4 02/16] locking/rwsem: Make owner available even if !CONFIG_RWSEM_SPIN_ON_OWNER Waiman Long
2019-04-13 17:22 ` [PATCH v4 03/16] locking/rwsem: Remove rwsem_wake() wakeup optimization Waiman Long
2019-04-13 17:22 ` [PATCH v4 04/16] locking/rwsem: Implement a new locking scheme Waiman Long
2019-04-16 13:22   ` Peter Zijlstra
2019-04-16 13:32     ` Waiman Long
2019-04-16 14:18       ` Peter Zijlstra
2019-04-16 14:42         ` Peter Zijlstra
2019-04-13 17:22 ` [PATCH v4 05/16] locking/rwsem: Merge rwsem.h and rwsem-xadd.c into rwsem.c Waiman Long
2019-04-13 17:22 ` [PATCH v4 06/16] locking/rwsem: Code cleanup after files merging Waiman Long
2019-04-16 16:01   ` Peter Zijlstra
2019-04-16 16:17     ` Peter Zijlstra
2019-04-16 19:45       ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 07/16] locking/rwsem: Implement lock handoff to prevent lock starvation Waiman Long
2019-04-16 14:12   ` Peter Zijlstra
2019-04-16 20:26     ` Waiman Long
2019-04-16 21:07       ` Waiman Long
2019-04-17  7:13         ` Peter Zijlstra
2019-04-17 16:22           ` Waiman Long
2019-04-16 15:49   ` Peter Zijlstra
2019-04-16 16:15     ` Peter Zijlstra
2019-04-16 18:41       ` Waiman Long
2019-04-16 18:16     ` Waiman Long
2019-04-16 18:32       ` Peter Zijlstra
2019-04-17  7:35       ` Peter Zijlstra
2019-04-17 16:35         ` Waiman Long
2019-04-17  8:05       ` Peter Zijlstra
2019-04-17 16:39         ` Waiman Long
2019-04-18  8:22           ` Peter Zijlstra
2019-04-17  8:17   ` Peter Zijlstra
2019-04-13 17:22 ` [PATCH v4 08/16] locking/rwsem: Make rwsem_spin_on_owner() return owner state Waiman Long
2019-04-17  9:00   ` Peter Zijlstra
2019-04-17 16:42     ` Waiman Long
2019-04-17 10:19   ` Peter Zijlstra
2019-04-17 16:53     ` Waiman Long
2019-04-17 12:41   ` Peter Zijlstra
2019-04-17 12:47     ` Peter Zijlstra
2019-04-17 18:29       ` Waiman Long
2019-04-18  8:39         ` Peter Zijlstra
2019-04-17 13:00     ` Peter Zijlstra
2019-04-17 18:50       ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 09/16] locking/rwsem: Ensure an RT task will not spin on reader Waiman Long
2019-04-17 13:18   ` Peter Zijlstra
2019-04-17 18:47     ` Waiman Long
2019-04-18  8:52       ` Peter Zijlstra
2019-04-18 13:27         ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 10/16] locking/rwsem: Wake up almost all readers in wait queue Waiman Long
2019-04-16 16:50   ` Davidlohr Bueso
2019-04-16 17:37     ` Waiman Long
2019-04-17 13:39   ` Peter Zijlstra
2019-04-17 17:16     ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 11/16] locking/rwsem: Enable readers spinning on writer Waiman Long
2019-04-17 13:56   ` Peter Zijlstra
2019-04-17 17:34     ` Waiman Long
2019-04-18  8:57       ` Peter Zijlstra
2019-04-18 14:35         ` Waiman Long
2019-04-17 13:58   ` Peter Zijlstra
2019-04-17 17:45     ` Waiman Long
2019-04-18  9:00       ` Peter Zijlstra
2019-04-18 13:40         ` Waiman Long
2019-04-17 14:05   ` Peter Zijlstra
2019-04-17 17:51     ` Waiman Long
2019-04-18  9:11       ` Peter Zijlstra
2019-04-18 14:37         ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 12/16] locking/rwsem: Enable time-based spinning on reader-owned rwsem Waiman Long
2019-04-18 13:06   ` Peter Zijlstra
2019-04-18 15:15     ` Waiman Long
2019-04-19  7:56       ` Peter Zijlstra
2019-04-19 14:33         ` Waiman Long
2019-04-19 15:36           ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 13/16] locking/rwsem: Add more rwsem owner access helpers Waiman Long
2019-04-13 17:22 ` [PATCH v4 14/16] locking/rwsem: Guard against making count negative Waiman Long
2019-04-18 13:51   ` Peter Zijlstra
2019-04-18 14:08     ` Waiman Long
2019-04-18 14:30       ` Peter Zijlstra
2019-04-18 14:40       ` Peter Zijlstra
2019-04-18 14:54         ` Waiman Long
2019-04-19 10:26           ` Peter Zijlstra
2019-04-19 12:02             ` Peter Zijlstra
2019-04-19 13:03               ` Peter Zijlstra
2019-04-19 13:15                 ` Peter Zijlstra
2019-04-19 19:39                   ` Waiman Long
2019-04-21 21:07                     ` Waiman Long
2019-04-23 14:17                       ` Peter Zijlstra
2019-04-23 14:31                         ` Waiman Long [this message]
2019-04-23 16:27                         ` Linus Torvalds
2019-04-23 19:12                           ` Waiman Long
2019-04-23 19:34                             ` Peter Zijlstra
2019-04-23 19:41                               ` Waiman Long
2019-04-23 19:55                                 ` [PATCH] bpf: Fix preempt_enable_no_resched() abuse Peter Zijlstra
2019-04-23 20:03                                   ` [PATCH] trace: " Peter Zijlstra
2019-04-23 23:58                                     ` Steven Rostedt
2019-04-29  6:39                                     ` [tip:sched/core] " tip-bot for Peter Zijlstra
2019-04-29 13:31                                       ` Steven Rostedt
2019-04-29 14:08                                         ` Ingo Molnar
2019-04-23 20:27                                   ` [PATCH] bpf: " Linus Torvalds
2019-04-23 20:35                                     ` Peter Zijlstra
2019-04-23 20:45                                       ` Linus Torvalds
2019-04-24 13:19                                       ` Peter Zijlstra
2019-04-25 21:23                                   ` Alexei Starovoitov
2019-04-26  7:14                                     ` Peter Zijlstra
2019-04-24  7:09                             ` [PATCH v4 14/16] locking/rwsem: Guard against making count negative Peter Zijlstra
2019-04-24 16:49                               ` Waiman Long
2019-04-24 17:01                                 ` Peter Zijlstra
2019-04-24 17:10                                   ` Waiman Long
2019-04-24 17:56                                   ` Linus Torvalds
2019-04-13 17:22 ` [PATCH v4 15/16] locking/rwsem: Merge owner into count on x86-64 Waiman Long
2019-04-18 14:28   ` Peter Zijlstra
2019-04-18 14:40     ` Waiman Long
2019-04-13 17:22 ` [PATCH v4 16/16] locking/rwsem: Remove redundant computation of writer lock word Waiman Long
