Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Yonghong Song <yhs@fb.com>, Jann Horn <jannh@google.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andriin@fb.com>, Martin KaFai Lau <kafai@fb.com>,
	bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	kernel-team <kernel-team@fb.com>, X86 ML <x86@kernel.org>,
	KP Singh <kpsingh@kernel.org>
Subject: Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
Date: Thu, 28 Jan 2021 18:32:59 -0800	[thread overview]
Message-ID: <20210129023259.wffchzof4rlw5pvs@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CALCETrU2pJUufTvr0c9jM2+ymSnYFWJ_mSJnCeEzaAiySkQyiA@mail.gmail.com>

On Thu, Jan 28, 2021 at 06:18:37PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 28, 2021 at 5:53 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 05:31:35PM -0800, Andy Lutomirski wrote:
> > >
> > > What exactly could the fault code even do to fix this up?  Something like:
> > >
> > > if (addr == 0 && SMAP off && error_code says it's kernel mode && we
> > > don't have permission to map NULL) {
> > >   special care for bpf;
> > > }
> >
> > right. where 'special care' is checking extable and skipping single
> > load instruction.
> >
> > > This seems arbitrary and fragile.  And it's not obviously
> > > implementable safely without taking locks,
> >
> > search_bpf_extables() needs rcu_read_lock() only.
> > Not the locks you're talking about.
> 
> I mean the locks in the if statement.  How am I supposed to tell
> whether this fault is a special bpf fault or a normal page fault
> without taking a lock to look up the VMA or to do some other hack?

search_bpf_extables() only needs a faulting rip.
No need to lookup vma.
if (addr == 0 && search_bpf_extables(regs->ip)...
is trivial enough and won't penalize page faults in general.
These conditions are not going to happen in the normal kernel code.

> Also, why should BPF get special dispensation to generate a special
> kind of nonsensical page fault that no other kernel code is allowed to
> do?

bpf is special, since it cares about performance a lot
and saving branches in critical path is important.

> 
> >
> > > which we really ought not
> > > to be doing from inside arbitrary bpf programs.
> >
> > Not in arbitrary progs. It's only available to progs loaded by root.
> >
> > > Keep in mind that, if
> > > SMAP is unavailable, the fault code literally does not know whether
> > > the page fault originated form a valid uaccess region.
> > >
> > > There's also always the possibility that you simultaneously run bpf
> > > and something like Wine or DOSEMU2 that actually maps the zero page,
> > > in which case the effect of the bpf code is going to be quite erratic
> > > and will depend on which mm is loaded.
> >
> > It's tracing. Folks who write those progs accepted the fact that
> > the data returned by probe_read is not going to be 100% accurate.
> 
> Is this really all tracing or is some of it actual live network code?

Networking bpf progs don't use this. bpf tracing does.
But I'm not sure why you're asking.
Tracing has higher performance demands than networking.

> >
> > bpf jit can insert !null check before every such special load
> > (since it knows all of them), but it's an obvious performance loss
> > that would be good to avoid. If fault handler can do this
> > if (addr == 0 && ...) search_bpf_extables()
> > at least in some conditions and cpu flags it's already a win.
> > In all other cases bpf jit will insert !null checks.
> 
> Again, there is no guarantee that a page fault is even generated for
> this.  And it doesn't seem very reasonable for the fault code to have
> to decide whether a NULL pointer dereference is a special BPF fault or
> a real user access fault against a VMA at address 9.

The faulting address and faulting ip will precisely identify this situation.
There is no guess work.