bpf.vger.kernel.org archive mirror
From: Thomas Gleixner <tglx@linutronix.de>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	Daniel Borkmann <daniel@iogearbox.net>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>, Peter Zijlstra <peterz@infradead.org>,
	Clark Williams <williams@redhat.com>
Subject: Re: [PATCH] BPF: Disable on PREEMPT_RT
Date: Fri, 18 Oct 2019 02:22:40 +0200 (CEST)
Message-ID: <alpine.DEB.2.21.1910180152110.1869@nanos.tec.linutronix.de>
In-Reply-To: <CAADnVQJPJubTx0TxcXnbCfavcQDZeu8VTnYYpa8JYpWw9Ze4qg@mail.gmail.com>

On Thu, 17 Oct 2019, Alexei Starovoitov wrote:
> On Thu, Oct 17, 2019 at 2:54 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > I'm all ears for an alternative solution. Here are the pain points:
> 
> Let's talk about them one by one.
> 
> >   #1) BPF disables preemption unconditionally with no way to do a proper RT
> >       substitution like most other infrastructure in the kernel provides
> >       via spinlocks or other locking primitives.
> 
> Kernel has a ton of code that disables preemption.
> Why BPF is somehow special?
> Are you saying RT kernel doesn't disable preemption at all?
> I'm complete noob in RT.

The basic principle of RT is to break up the arbitrarily long
preemption/interrupt disabled sections of the mainline kernel.

Most preempt/interrupt disabled sections are implicit, created by taking
locks (spinlock, rwlock). Just a few are explicit, created by issuing
preempt_disable() or local_irq_disable().

RT substitutes spinlock/rwlock with RT aware counterparts which

 - Do not disable preemption/interrupts

 - Prevent migration to keep the implicit migrate disable semantics
   of preempt disable

 - Convert the underlying lock primitive to a priority inheritance aware
   mechanism, aka rtmutex.

In order to make the above work, RT forces interrupt and soft interrupt
processing into thread context except for interrupts which are explicitly
marked as interrupt safe (IRQF_NO_THREAD).

As a consequence most of the kernel code becomes fully preemptible. Of
course there are still code parts which require that preemption/interrupts
are hard disabled. That's pretty much initial low level entry code, hard
interrupt handling code (which just wakes up the threads), context switch
code and some other rather low level functions (vmenter/exit, ...).

That also requires that we still have locks which disable
preemption/interrupts. That's why we have raw_spinlock and
spinlock. spinlock is substituted with an RT primitive while raw_spinlock
behaves like the traditional spinlock on a non-RT kernel (disables
preemption/interrupts).

But that also means any code which explicitly disables preemption or
interrupts without taking a spin/rw lock can trigger the following issues:

  - Calling into code which must be preemptible/sleepable on RT
    results in a might-sleep splat.

  - Running for what is, in RT terms, a potentially unbounded or
    undesirable length of time without any chance for the scheduler to
    control it.
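
As a rough illustration of the first issue, here is a toy userspace model
in C. It is not kernel code: every model_* name is hypothetical, the
counter only stands in for the kernel's preempt count, and the "splat" is
a plain message on stderr. The only point is to show why a lock that may
sleep on RT must not be taken inside an explicitly preempt disabled
section.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; assumptions for illustration. */
static int model_preempt_count;   /* > 0 means "preemption disabled" */
static int model_splats;          /* number of might-sleep violations */

static void model_preempt_disable(void)
{
    model_preempt_count++;
}

static void model_preempt_enable(void)
{
    assert(model_preempt_count > 0);   /* catch unbalanced enable */
    model_preempt_count--;
}

/* Models the check a sleeping lock performs before it may block. */
static void model_might_sleep(const char *where)
{
    if (model_preempt_count > 0) {
        model_splats++;
        fprintf(stderr, "model splat: may sleep in atomic context (%s)\n",
                where);
    }
}

/* On RT a spinlock may sleep, so the model checks the context first. */
static void model_rt_spin_lock(void)
{
    model_might_sleep("model_rt_spin_lock");
}

static void model_rt_spin_unlock(void)
{
}

/* Explicit preempt disable around a lock which sleeps on RT. */
static int model_bad_pattern(void)
{
    int before = model_splats;

    model_preempt_disable();
    model_rt_spin_lock();
    model_rt_spin_unlock();
    model_preempt_enable();
    return model_splats - before;
}

/* The lock alone provides the protection; no explicit disable. */
static int model_good_pattern(void)
{
    int before = model_splats;

    model_rt_spin_lock();
    model_rt_spin_unlock();
    return model_splats - before;
}
```

In this model, model_bad_pattern() records one violation and
model_good_pattern() records none.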

Aside from that, RT has a stricter view of lock ownership because almost
all lock primitives except real counting semaphores are substituted by
priority inheritance aware counterparts. PI aware locks not only have the
requirement that they can only be taken in preemptible context (see above),
they also have a strict locker == unlocker requirement for obvious reasons.
up_read_non_owner() obviously can't fulfil that requirement.

I surely answered more than your initial question and probably not enough,
so feel free to ask for clarification.

Thanks for caring!

       Thomas



