Re: [PATCH] BPF: Disable on PREEMPT_RT

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: David Miller <davem@davemloft.net>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	Daniel Borkmann <daniel@iogearbox.net>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>, Peter Zijlstra <peterz@infradead.org>,
	Clark Williams <williams@redhat.com>
Subject: Re: [PATCH] BPF: Disable on PREEMPT_RT
Date: Thu, 17 Oct 2019 22:52:24 -0700
Message-ID: <20191018055222.cwx5dmj6pppqzcpc@ast-mbp>
In-Reply-To: <alpine.DEB.2.21.1910180152110.1869@nanos.tec.linutronix.de>

On Fri, Oct 18, 2019 at 02:22:40AM +0200, Thomas Gleixner wrote:
> 
> But that also means any code which explicitly disables preemption or
> interrupts without taking a spin/rw lock can trigger the following issues:
> 
>   - Calling into code which requires to be preemptible/sleepable on RT
>     results in a might sleep splat.
> 
>   - Has in RT terms potentially unbounded or undesired runtime length without
>     any chance for the scheduler to control it.

Much appreciate the explanation. A few more questions:
There is a ton of kernel code that does preempt_disable()
and proceeds to do per-cpu things. How is that handled in RT?
Are you saying that every preempt_disable() has to be paired with some lock?
I don't think that's a practical requirement to fulfill, so I probably
misunderstood something.

In BPF we disable preemption because of per-cpu maps and per-cpu data structures
that are shared between bpf program execution and kernel execution.
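
For illustration, the per-cpu point can be sketched with a small user-space
model. This is not kernel code: every model_* name below is a hypothetical
stand-in invented for this sketch, and the "scheduler" is a plain function
call. It only shows why a read-modify-write of a per-cpu slot wants
preemption disabled around it: otherwise the task may be migrated in the
middle and finish the update on a slot that belongs to a different CPU.

```c
#define MODEL_NR_CPUS 2

/* Hypothetical stand-ins for kernel state; assumptions of this sketch. */
static int model_preempt_count;               /* > 0: preemption disabled */
static int model_cpu;                         /* CPU the task runs on now */
static long model_percpu_hits[MODEL_NR_CPUS]; /* one slot per CPU */

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

/*
 * A point where the scheduler could move the task to target_cpu.
 * In this model, migration happens only while preemption is enabled.
 * Returns 1 if the task was migrated, 0 otherwise.
 */
static int model_resched_point(int target_cpu)
{
	if (model_preempt_count > 0)
		return 0;
	model_cpu = target_cpu;
	return 1;
}

/*
 * Read-modify-write of the current CPU's slot with a scheduling point in
 * the middle. Returns 1 if the task stayed on the same CPU for the whole
 * update, 0 if it finished the update after being migrated.
 */
static int model_percpu_inc(int protect, int target_cpu)
{
	int start_cpu, end_cpu;
	long val;

	if (protect)
		model_preempt_disable();

	start_cpu = model_cpu;
	val = model_percpu_hits[start_cpu];     /* read this CPU's slot */
	model_resched_point(target_cpu);        /* possible migration */
	model_percpu_hits[start_cpu] = val + 1; /* write back to the old slot */
	end_cpu = model_cpu;

	if (protect)
		model_preempt_enable();

	return end_cpu == start_cpu;
}
```

With protect set, the task is still on its starting CPU when it writes the
slot back. Without it, the write lands on a slot whose CPU the task has
already left, where it can race with whatever runs there next.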

BPF doesn't call into code that might sleep.
BPF also doesn't have unbounded runtime.
So the two issues above are actually non-issues.

Maybe we should go back to the concerns that prompted this patch.
Do you have any numbers from production that show that BPF is causing
unbounded latency for RT workloads? If it's all purely theoretical, then
we should share knowledge of how the different systems behave
instead of building walls. It feels to me that there are no
actual issues, only misunderstandings.

All that aside, I'm working on new BPF program categories that
will be fully preemptible and sleepable. That requirement came
from tracing long ago. The verifier infrastructure wasn't ready
at that time. Now we can do it.
BPF programs will be able to do copy_from_user() and take faults.
preempt_disable() and rcu_read_lock() regions will be controlled by
the verifier. We will have to support all existing semantics though.