linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: paulmck@kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>, Will Deacon <will@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Randy Dunlap <rdunlap@infradead.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Logan Gunthorpe <logang@deltatee.com>,
	Kurt Schwemmer <kurt.schwemmer@microsemi.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci@vger.kernel.org, Felipe Balbi <balbi@kernel.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	linux-usb@vger.kernel.org, Kalle Valo <kvalo@codeaurora.org>,
	"David S. Miller" <davem@davemloft.net>,
	linux-wireless@vger.kernel.org, netdev@vger.kernel.org,
	Oleg Nesterov <oleg@redhat.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Arnd Bergmann <arnd@arndb.de>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [patch V2 08/15] Documentation: Add lock ordering and nesting documentation
Date: Thu, 19 Mar 2020 19:02:17 +0100	[thread overview]
Message-ID: <874kuk5il2.fsf@nanos.tec.linutronix.de> (raw)
In-Reply-To: <20200318223137.GW3199@paulmck-ThinkPad-P72>

Paul,

"Paul E. McKenney" <paulmck@kernel.org> writes:

> On Wed, Mar 18, 2020 at 09:43:10PM +0100, Thomas Gleixner wrote:
>
> Mostly native-English-speaker services below, so please feel free to
> ignore.  The one place I made a substantive change, I marked it "@@@".
> I only did about half of this document, but should this prove useful,
> I will do the other half later.

Native speaker services are always useful and appreciated.

>> +The kernel provides a variety of locking primitives which can be divided
>> +into two categories:
>> +
>> + - Sleeping locks
>> + - Spinning locks
>> +
>> +This document describes the lock types at least at the conceptual level and
>> +provides rules for nesting of lock types also under the aspect of PREEMPT_RT.
>
> I suggest something like this:
>
> This document conceptually describes these lock types and provides rules
> for their nesting, including the rules for use under PREEMPT_RT.

Way better :)

>> +Sleeping locks can only be acquired in preemptible task context.
>> +
>> +Some of the implementations allow try_lock() attempts from other contexts,
>> +but that has to be really evaluated carefully including the question
>> +whether the unlock can be done from that context safely as well.
>> +
>> +Note that some lock types change their implementation details when
>> +debugging is enabled, so this should be really only considered if there is
>> +no other option.
>
> How about something like this?
>
> Although implementations allow try_lock() from other contexts, it is
> necessary to carefully evaluate the safety of unlock() as well as of
> try_lock().  Furthermore, it is also necessary to evaluate the debugging
> versions of these primitives.  In short, don't acquire sleeping locks
> from other contexts unless there is no other option.

Yup.

>> +Sleeping lock types:
>> +
>> + - mutex
>> + - rt_mutex
>> + - semaphore
>> + - rw_semaphore
>> + - ww_mutex
>> + - percpu_rw_semaphore
>> +
>> +On a PREEMPT_RT enabled kernel the following lock types are converted to
>> +sleeping locks:
>
> On PREEMPT_RT kernels, these lock types are converted to sleeping
> locks:

Ok.

>> + - spinlock_t
>> + - rwlock_t
>> +
>> +Spinning locks
>> +--------------
>> +
>> + - raw_spinlock_t
>> + - bit spinlocks
>> +
>> +On a non PREEMPT_RT enabled kernel the following lock types are spinning
>> +locks as well:
>
> On non-PREEMPT_RT kernels, these lock types are also spinning locks:

Ok.

>> + - spinlock_t
>> + - rwlock_t
>> +
>> +Spinning locks implicitly disable preemption and the lock / unlock functions
>> +can have suffixes which apply further protections:
>> +
>> + ===================  ====================================================
>> + _bh()                Disable / enable bottom halves (soft interrupts)
>> + _irq()               Disable / enable interrupts
>> + _irqsave/restore()   Save and disable / restore interrupt disabled state
>> + ===================  ====================================================
>> +
>> +
>> +rtmutex
>> +=======
>> +
>> +RT-mutexes are mutexes with support for priority inheritance (PI).
>> +
>> +PI has limitations on non PREEMPT_RT enabled kernels due to preemption and
>> +interrupt disabled sections.
>> +
>> +On a PREEMPT_RT enabled kernel most of these sections are fully
>> +preemptible. This is possible because PREEMPT_RT forces most executions
>> +into task context, especially interrupt handlers and soft interrupts, which
>> +allows to substitute spinlock_t and rwlock_t with RT-mutex based
>> +implementations.
>
> PI clearly cannot preempt preemption-disabled or interrupt-disabled
> regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
> execute most such regions of code in preemptible task context, especially
> interrupt handlers and soft interrupts.  This conversion allows spinlock_t
> and rwlock_t to be implemented via RT-mutexes.

Nice.

>> +
>> +raw_spinlock_t and spinlock_t
>> +=============================
>> +
>> +raw_spinlock_t
>> +--------------
>> +
>> +raw_spinlock_t is a strict spinning lock implementation regardless of the
>> +kernel configuration including PREEMPT_RT enabled kernels.
>> +
>> +raw_spinlock_t is to be used only in real critical core code, low level
>> +interrupt handling and places where protecting (hardware) state is required
>> +to be safe against preemption and eventually interrupts.
>> +
>> +Another reason to use raw_spinlock_t is when the critical section is tiny
>> +to avoid the overhead of spinlock_t on a PREEMPT_RT enabled kernel in the
>> +contended case.
>
> raw_spinlock_t is a strict spinning lock implementation in all kernels,
> including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
> core code, low level interrupt handling and places where disabling
> preemption or interrupts is required, for example, to safely access
> hardware state.  raw_spinlock_t can sometimes also be used when the
> critical section is tiny and the lock is lightly contended, thus avoiding
> RT-mutex overhead.
>
> @@@  I added the point about the lock being lightly contended.

Hmm, not sure. The point is that if the critical section is small the
overhead of cross CPU boosting along with the resulting IPIs is going to
be at least an order of magnitude larger. And on contention this is just
pushing the raw_spinlock contention off to the raw_spinlock in the rt
mutex plus the owning tasks pi_lock which makes things even worse.

>> + - The hard interrupt related suffixes for spin_lock / spin_unlock
>> +   operations (_irq, _irqsave / _irqrestore) do not affect the CPUs
>                                                                   CPU's

Si senor!

>> +   interrupt disabled state
>> +
>> + - The soft interrupt related suffix (_bh()) is still disabling the
>> +   execution of soft interrupts, but contrary to a non PREEMPT_RT enabled
>> +   kernel, which utilizes the preemption count, this is achieved by a per
>> +   CPU bottom half locking mechanism.
>
>  - The soft interrupt related suffix (_bh()) still disables softirq
>    handlers.  However, unlike non-PREEMPT_RT kernels (which disable
>    preemption to get this effect), PREEMPT_RT kernels use a per-CPU
>    per-bottom-half locking mechanism.

it's not per-bottom-half anymore. That turned out to be dangerous due to
dependencies between BH types, e.g. network and timers.

I hope I was able to encourage you to comment on the other half as well :)

Thanks,

        tglx

  reply	other threads:[~2020-03-19 18:02 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-18 20:43 [patch V2 00/15] Lock ordering documentation and annotation for lockdep Thomas Gleixner
2020-03-18 20:43 ` [patch V2 01/15] PCI/switchtec: Fix init_completion race condition with poll_wait() Thomas Gleixner
2020-03-18 21:25   ` Bjorn Helgaas
2020-03-18 20:43 ` [patch V2 02/15] pci/switchtec: Replace completion wait queue usage for poll Thomas Gleixner
2020-03-18 21:26   ` Bjorn Helgaas
2020-03-18 22:11   ` Logan Gunthorpe
2020-03-18 20:43 ` [patch V2 03/15] usb: gadget: Use completion interface instead of open coding it Thomas Gleixner
2020-03-19  8:41   ` Greg Kroah-Hartman
2020-03-18 20:43 ` [patch V2 04/15] orinoco_usb: Use the regular completion interfaces Thomas Gleixner
2020-03-19  8:40   ` Greg Kroah-Hartman
2020-03-18 20:43 ` [patch V2 05/15] acpi: Remove header dependency Thomas Gleixner
2020-03-18 20:43 ` [patch V2 06/15] rcuwait: Add @state argument to rcuwait_wait_event() Thomas Gleixner
2020-03-20  5:36   ` Davidlohr Bueso
2020-03-20  8:45     ` Sebastian Andrzej Siewior
2020-03-20  8:58       ` Davidlohr Bueso
2020-03-20  9:48   ` [PATCH 0/5] Remove mm.h from arch/*/include/asm/uaccess.h Sebastian Andrzej Siewior
2020-03-20  9:48     ` [PATCH 1/5] nds32: Remove mm.h from asm/uaccess.h Sebastian Andrzej Siewior
2020-03-20  9:48     ` [PATCH 2/5] csky: " Sebastian Andrzej Siewior
2020-03-21 11:24       ` Guo Ren
2020-03-21 12:08         ` Thomas Gleixner
2020-03-21 14:11           ` Guo Ren
2020-03-20  9:48     ` [PATCH 3/5] hexagon: " Sebastian Andrzej Siewior
2020-03-20  9:48     ` [PATCH 4/5] ia64: " Sebastian Andrzej Siewior
2020-03-20  9:48     ` [PATCH 5/5] microblaze: " Sebastian Andrzej Siewior
2020-03-18 20:43 ` [patch V2 07/15] powerpc/ps3: Convert half completion to rcuwait Thomas Gleixner
2020-03-19  9:00   ` Sebastian Andrzej Siewior
2020-03-19  9:18     ` Peter Zijlstra
2020-03-19  9:21   ` Davidlohr Bueso
2020-03-19 10:04   ` Christoph Hellwig
2020-03-19 10:26     ` Sebastian Andrzej Siewior
2020-03-20  0:01       ` Geoff Levand
2020-03-20  0:45       ` Michael Ellerman
2020-03-21 10:41     ` Thomas Gleixner
2020-03-18 20:43 ` [patch V2 08/15] Documentation: Add lock ordering and nesting documentation Thomas Gleixner
2020-03-18 22:31   ` Paul E. McKenney
2020-03-19 18:02     ` Thomas Gleixner [this message]
2020-03-20 16:01       ` Paul E. McKenney
2020-03-20 19:51         ` Thomas Gleixner
2020-03-20 21:02           ` Paul E. McKenney
2020-03-20 22:36             ` Thomas Gleixner
2020-03-21  2:29               ` Paul E. McKenney
2020-03-21 10:26                 ` Thomas Gleixner
2020-03-21 17:23                   ` Paul E. McKenney
2020-03-19  8:51   ` Davidlohr Bueso
2020-03-19 15:04   ` Jonathan Corbet
2020-03-19 18:04     ` Thomas Gleixner
2020-03-21 21:21   ` Joel Fernandes
2020-03-21 21:49     ` Thomas Gleixner
2020-03-22  1:36       ` Joel Fernandes
2020-03-18 20:43 ` [patch V2 09/15] timekeeping: Split jiffies seqlock Thomas Gleixner
2020-03-18 20:43 ` [patch V2 10/15] sched/swait: Prepare usage in completions Thomas Gleixner
2020-03-18 20:43 ` [patch V2 11/15] completion: Use simple wait queues Thomas Gleixner
2020-03-18 22:28   ` Logan Gunthorpe
2020-03-19  0:33   ` Joel Fernandes
2020-03-19  0:44     ` Thomas Gleixner
2020-03-19  8:42   ` Greg Kroah-Hartman
2020-03-19 17:12   ` Linus Torvalds
2020-03-19 23:25   ` Julian Calaby
2020-03-20  6:59     ` Christoph Hellwig
2020-03-20  9:01   ` Davidlohr Bueso
2020-03-20  8:50 ` [patch V2 00/15] Lock ordering documentation and annotation for lockdep Davidlohr Bueso
2020-03-20  8:55 ` [PATCH 16/15] rcuwait: Get rid of stale name comment Davidlohr Bueso
2020-03-20  8:55   ` [PATCH 17/15] rcuwait: Inform rcuwait_wake_up() users if a wakeup was attempted Davidlohr Bueso
2020-03-20  9:13     ` Sebastian Andrzej Siewior
2020-03-20 10:44     ` Peter Zijlstra
2020-03-20  8:55   ` [PATCH 18/15] kvm: Replace vcpu->swait with rcuwait Davidlohr Bueso
2020-03-20 11:20     ` Paolo Bonzini
2020-03-20 12:54     ` Peter Zijlstra
2020-03-22 16:33       ` Davidlohr Bueso
2020-03-22 22:32         ` Peter Zijlstra
2020-03-20  8:55   ` [PATCH 19/15] sched/swait: Reword some of the main description Davidlohr Bueso
2020-03-20  9:19     ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874kuk5il2.fsf@nanos.tec.linutronix.de \
    --to=tglx@linutronix.de \
    --cc=arnd@arndb.de \
    --cc=balbi@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=bigeasy@linutronix.de \
    --cc=dave@stgolabs.net \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=joel@joelfernandes.org \
    --cc=kurt.schwemmer@microsemi.com \
    --cc=kvalo@codeaurora.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=linux-wireless@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=logang@deltatee.com \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=netdev@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).