* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
       [not found] ` <97EF0157-F068-4234-B826-C08B7324F356@amacapital.net>
@ 2021-01-29 23:11   ` Alexei Starovoitov
  2021-01-29 23:30     ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29 23:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Andy Lutomirski, Yonghong Song, Jann Horn,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, bpf,
	Alexei Starovoitov, kernel-team, X86 ML, KP Singh

On Fri, Jan 29, 2021 at 08:00:57AM -0800, Andy Lutomirski wrote:
> 
> > On Jan 29, 2021, at 1:21 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > On Thu, Jan 28, 2021 at 04:32:34PM -0800, Andy Lutomirski wrote:
> > 
> >> I spoke too soon.  get_kernel_nofault() is just fine.  You have
> >> inlined something like __get_kernel_nofault(), which is incorrect.
> >> get_kernel_nofault() would have done the right thing.
> > 
> > Correct, the pagefault_disable() is critical.

What specifically are you referring to?
As far as I can see the current->pagefault_disabled is an artifact of the past.
It doesn't provide any additional information to the fault handler beyond
what extable already does. Consider:

current->pagefault_disabled++
some C code
asm ("load") // with extable annotation
some other C code
current->pagefault_disabled--

If there is a fault in the C code, the fault handler will do the wrong thing,
since the fault is technically disabled only for the asm("load").
The handler will go into bad_area_nosemaphore() instead of find_vma().

The load instruction is annotated in extable.
The fault handler instead of:
  if (faulthandler_disabled) search_exception_tables()
could do:
 search_exception_tables()
right away without sacrificing anything.
If the fault is on one of the special asm("load") instructions the intent
of the code is obvious: this is a non-faulting load that should be fixed up.
Of course the search of the extable should happen before taking the mmap lock.
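
For reference, the fixup that runs when such an extable entry matches is
tiny. Roughly what this era's arch/x86/net/bpf_jit_comp.c registers
(a sketch, not verbatim):

bool ex_handler_bpf(const struct exception_table_entry *x,
                    struct pt_regs *regs, int trapnr,
                    unsigned long error_code, unsigned long fault_addr)
{
        /* upper bits of fixup hold the pt_regs offset of the dest register */
        u32 reg = x->fixup >> 8;

        /* zero the destination register and skip over the faulting load */
        *(unsigned long *)((void *)regs + reg) = 0;
        regs->ip += x->fixup & 0xff;
        return true;
}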

imo pagefault_disabled can be removed.

> Also the _allowed() part. The bounds check is required.

Up-thread I was saying that JIT is inlining copy_from_kernel_nofault().
That's not quite correct. When the code was written it was inlining
bpf_probe_read(). Back then _kernel vs _user distinction didn't exist.
So the bounds check wasn't there. The verifier side was designed
for kernel pointers and NULL from the beginning. No user pointer
(aside from NULL) would ever go through this path.
Today I agree that the range check is necessary.
The question is where to do this check.
I see two options:
- the JIT can emit it
- fault handler can do it, since %rip clearly says that it's JITed asm load.
The intent of the code is not ambiguous.
The intent of the fault is not ambiguous either.
More so the mmap lock and find_vma should not be done.

I prefer the latter approach, which can be implemented as:
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..7846a95436c1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1248,6 +1248,12 @@ void do_user_addr_fault(struct pt_regs *regs,
        if (unlikely(kprobe_page_fault(regs, X86_TRAP_PF)))
                return;

+       if (!address && (e = search_bpf_extables(regs->ip))) {
+               handler = ex_fixup_handler(e);
+               handler(e, regs, trapnr, error_code, fault_addr);
+               return;
+       }
+

Compared to the former option it saves bpf prog cycles.

Consider that users do not write a->b->c just to walk NULL pointers.
The bpf program authors are not writing junk programs.
If the program is walking the pointers it expects to read useful data.
If one of those pointers can be NULL the program author will
add an if (!ptr) check there. The author will use a blind a->b->c
only for cases where these pointers will most likely be non-NULL.
"Most likely" means that in 99.9% of the situations there will be no faults.
When people write bpf tracing progs they care about the perf overhead.
Obviously if a prog is causing faults somebody will notice and will
slap the bpf author for causing a performance regression.
So faulting on NULL is very rare in practice.
There is a selftest in selftests/bpf for faulting on NULL, but it's only a test.
If the NULL check is added by the JIT it will not be hit 99.9% of the time.
It will be a pure overhead for program execution.
Hence my preference for the latter approach.


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29 23:11   ` [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly Alexei Starovoitov
@ 2021-01-29 23:30     ` Andy Lutomirski
  2021-01-30  2:54       ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-29 23:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Andy Lutomirski, Yonghong Song, Jann Horn,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, bpf,
	Alexei Starovoitov, kernel-team, X86 ML, KP Singh

On Fri, Jan 29, 2021 at 3:12 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jan 29, 2021 at 08:00:57AM -0800, Andy Lutomirski wrote:
> >
> > > On Jan 29, 2021, at 1:21 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Jan 28, 2021 at 04:32:34PM -0800, Andy Lutomirski wrote:
> > >
> > >> I spoke too soon.  get_kernel_nofault() is just fine.  You have
> > >> inlined something like __get_kernel_nofault(), which is incorrect.
> > >> get_kernel_nofault() would have done the right thing.
> > >
> > > Correct, the pagefault_disable() is critical.
>
> What specifically are you referring to?
> As far as I can see the current->pagefault_disabled is an artifact of the past.
> It doesn't provide any additional information to the fault handler beyond
> what extable already does. Consider:
>
> current->pagefault_disabled++
> some C code
> asm ("load") // with extable annotation
> some other C code
> current->pagefault_disabled--

pagefault_disabled is not about providing information to the fault
handler.  It's about changing the semantics of a fault when accessing
a user pointer.  There are two choices of semantics:

1. normal (faulthandler_disabled() returns false): accesses to user
memory can sleep, and accesses to valid VMAs are resolved.  exception
handlers for valid memory accesses are not called if the fault is
resolved.

2. faulthandler_disabled() returns true: accesses to user memory
cannot sleep.  If actual processing would be needed to resolve the
fault, it's an error instead.

This is used for two purposes: optimistic locking and best-effort
tracing.  There are code paths that might prefer to access user memory
with locks held.  They can pagefault_disable(), try the access, and
take the slow path involving dropping locks if the first try fails.
And tracing (e.g. perf call stacks) might need to access user memory
in a context in which they cannot sleep.  They can disable pagefaults
and try.  If they fail, then the user gets an incomplete call stack.
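
A minimal sketch of the optimistic-locking pattern (hypothetical call
site, not any particular kernel user):

        spin_lock(&lock);
        pagefault_disable();
        /* fails instead of sleeping if the page isn't resident */
        ret = __copy_from_user_inatomic(dst, uptr, len);
        pagefault_enable();
        spin_unlock(&lock);
        if (ret) {
                /* slow path: no locks held, the fault can be resolved */
                ret = copy_from_user(dst, uptr, len);
        }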

This is specifically about user memory access.  Sure, bpf could do
pagefault_disable(), but that makes no sense -- bpf is trying to do a
*kernel* access and is not validating the pointer.  On x86, which is a
poor architecture in this regard, the memory access instructions don't
adequately distinguish between user and kernel access, and bounds
checks are necessary.
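
The check itself is cheap. A simplified sketch in the spirit of x86's
copy_from_kernel_nofault_allowed() (ptr_is_kernel is an illustrative
name; the real check also has to worry about the canonical-address hole):

        static bool ptr_is_kernel(const void *unsafe_src)
        {
                /* reject all of userspace plus the guard page above it */
                return (unsigned long)unsafe_src >= TASK_SIZE_MAX + PAGE_SIZE;
        }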

>
> If there is a fault in the C code, the fault handler will do the wrong thing,
> since the fault is technically disabled only for the asm("load").
> The handler will go into bad_area_nosemaphore() instead of find_vma().
>
> The load instruction is annotated in extable.
> The fault handler instead of:
>   if (faulthandler_disabled) search_exception_tables()
> could do:
>  search_exception_tables()
> right away without sacrificing anything.

This would sacrifice correctness in the vastly more common case of
get_user() / put_user() / copy_to/from_user(), which wants to sleep,
not return -EFAULT.

> If the fault is on one of the special asm("load") instructions the intent
> of the code is obvious: this is a non-faulting load that should be fixed up.
> Of course the search of the extable should happen before taking the mmap lock.
>
> imo pagefault_disabled can be removed.
>
> > Also the _allowed() part. The bounds check is required.
>
> Up-thread I was saying that JIT is inlining copy_from_kernel_nofault().
> That's not quite correct. When the code was written it was inlining
> bpf_probe_read(). Back then _kernel vs _user distinction didn't exist.
> So the bounds check wasn't there. The verifier side was designed
> for kernel pointers and NULL from the beginning. No user pointer
> (aside from NULL) would ever go through this path.
> Today I agree that the range check is necessary.
> The question is where to do this check.
> I see two options:
> - the JIT can emit it
> - fault handler can do it, since %rip clearly says that it's JITed asm load.

The fault handler *can't* do it correctly because the zero page
might actually be mapped.

Frankly, inlining this at all is a bit bogus, because some day we're
going to improve the range check to check whether the pointer points
to actual memory, and BPF should really do the same thing.  This will
be too big to inline nicely.

> The intent of the code is not ambiguous.
> The intent of the fault is not ambiguous either.
> More so the mmap lock and find_vma should not be done.
>
> I prefer the latter approach, which can be implemented as:
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f1f1b5a0956a..7846a95436c1 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1248,6 +1248,12 @@ void do_user_addr_fault(struct pt_regs *regs,
>         if (unlikely(kprobe_page_fault(regs, X86_TRAP_PF)))
>                 return;
>
> +       if (!address && (e = search_bpf_extables(regs->ip))) {

That's quite the set of cache lines that ought not to be touched in
the normal fault path.


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29 23:30     ` Andy Lutomirski
@ 2021-01-30  2:54       ` Alexei Starovoitov
  2021-01-31 17:32         ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-30  2:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Peter Zijlstra, Andy Lutomirski, Yonghong Song, Jann Horn,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, bpf,
	Alexei Starovoitov, kernel-team, X86 ML, KP Singh

On Fri, Jan 29, 2021 at 03:30:39PM -0800, Andy Lutomirski wrote:
> On Fri, Jan 29, 2021 at 3:12 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Jan 29, 2021 at 08:00:57AM -0800, Andy Lutomirski wrote:
> > >
> > > > On Jan 29, 2021, at 1:21 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > >
> > > > On Thu, Jan 28, 2021 at 04:32:34PM -0800, Andy Lutomirski wrote:
> > > >
> > > >> I spoke too soon.  get_kernel_nofault() is just fine.  You have
> > > >> inlined something like __get_kernel_nofault(), which is incorrect.
> > > >> get_kernel_nofault() would have done the right thing.
> > > >
> > > > Correct, the pagefault_disable() is critical.
> >
> > What specifically are you referring to?
> > As far as I can see the current->pagefault_disabled is an artifact of the past.
> > It doesn't provide any additional information to the fault handler beyond
> > what extable already does. Consider:
> >
> > current->pagefault_disabled++
> > some C code
> > asm ("load") // with extable annotation
> > some other C code
> > current->pagefault_disabled--
> 
> pagefault_disabled is not about providing information to the fault
> handler.  It's about changing the semantics of a fault when accessing
> a user pointer.  There are two choices of semantics:
> 
> 1. normal (faulthandler_disabled() returns false): accesses to user
> memory can sleep, and accesses to valid VMAs are resolved.  exception
> handlers for valid memory accesses are not called if the fault is
> resolved.
> 
> 2. faulthandler_disabled() returns true: accesses to user memory
> cannot sleep.  If actual processing would be needed to resolve the
> fault, it's an error instead.
> 
> This is used for two purposes: optimistic locking and best-effort
> tracing.  There are code paths that might prefer to access user memory
> with locks held.  They can pagefault_disable(), try the access, and
> take the slow path involving dropping locks if the first try fails.
> And tracing (e.g. perf call stacks) might need to access user memory
> in a context in which they cannot sleep.  They can disable pagefaults
> and try.  If they fail, then the user gets an incomplete call stack.

got it. thanks for the explanation. all makes sense.

> This is specifically about user memory access.  Sure, bpf could do
> pagefault_disable(), but that makes no sense -- bpf is trying to do a
> *kernel* access and is not validating the pointer.  

right.

> On x86, which is a
> poor architecture in this regard, the memory access instructions don't
> adequately distinguish between user and kernel access, and bounds
> checks are necessary.
> 
> >
> > If there is a fault in the C code, the fault handler will do the wrong thing,
> > since the fault is technically disabled only for the asm("load").
> > The handler will go into bad_area_nosemaphore() instead of find_vma().
> >
> > The load instruction is annotated in extable.
> > The fault handler instead of:
> >   if (faulthandler_disabled) search_exception_tables()
> > could do:
> >  search_exception_tables()
> > right away without sacrificing anything.
> 
> This would sacrifice correctness in the vastly more common case of
> get_user() / put_user() / copy_to/from_user(), which wants to sleep,
> not return -EFAULT.

got it. agree.

> 
> > If the fault is on one of the special asm("load") instructions the intent
> > of the code is obvious: this is a non-faulting load that should be fixed up.
> > Of course the search of the extable should happen before taking the mmap lock.
> >
> > imo pagefault_disabled can be removed.
> >
> > > Also the _allowed() part. The bounds check is required.
> >
> > Up-thread I was saying that JIT is inlining copy_from_kernel_nofault().
> > That's not quite correct. When the code was written it was inlining
> > bpf_probe_read(). Back then _kernel vs _user distinction didn't exist.
> > So the bounds check wasn't there. The verifier side was designed
> > for kernel pointers and NULL from the beginning. No user pointer
> > (aside from NULL) would ever go through this path.
> > Today I agree that the range check is necessary.
> > The question is where to do this check.
> > I see two options:
> > - the JIT can emit it
> > - fault handler can do it, since %rip clearly says that it's JITed asm load.
> 
> The fault handler *can't* do it correctly because the zero page
> might actually be mapped.

Yes. The zero page can be mapped. The fault won't happen
and the bpf load will read garbage from the user's page zero.
My understanding is that this is extremely rare. Unpriv users cannot map zero.
So there is no security concern for a tracing prog accidentally stumbling there.
You mentioned that "wine" does it? Sure. It won't affect bpf tracing.

> Frankly, inlining this at all is a bit bogus, because some day we're
> going to improve the range check to check whether the pointer points
> to actual memory, and BPF should really do the same thing.  This will
> be too big to inline nicely.

Inlining of bpf_probe_read_kernel is absolutely critical to performance.
Before we added this feature people were reporting that this was the
hottest function in tracing progs. The map lookups and prog body
itself were 2nd and 3rd.
In networking the program has to be fast too. The program authors count
every nanosecond to process millions of packets per second.
Then people do 'perf report' and see that bpf prog consumes 10% of cpu.
The prog is performing firewall or load balancer function.
That's useful cpu burn. It's acceptable cpu usage.
Whereas anything tracing is considered overhead. People deploy
tracing bpf prog and somebody notices 0.5% cpu regression.
They yell and complain. Then folks tune the tracing progs to be as
invisible as possible.

> > The intent of the code is not ambiguous.
> > The intent of the fault is not ambiguous either.
> > More so the mmap lock and find_vma should not be done.
> >
> > I prefer the latter approach, which can be implemented as:
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index f1f1b5a0956a..7846a95436c1 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -1248,6 +1248,12 @@ void do_user_addr_fault(struct pt_regs *regs,
> >         if (unlikely(kprobe_page_fault(regs, X86_TRAP_PF)))
> >                 return;
> >
> > +       if (!address && (e = search_bpf_extables(regs->ip))) {
> 
> That's quite the set of cache lines that ought not to be touched in
> the normal fault path.

There is a small mistake above. It should have been:
if (!(address >> PAGE_SHIFT) && search_bpf_extables...

User space mapping of the zero page is rare. The kernel faulting there
is even rarer. I think it's possible if a root user process maps the
zero page on purpose and then does a syscall with addresses from the zero page.
So this check is guaranteed to preserve the speed of the normal fault path.
Whereas the second lookup, search_bpf_extables(), guarantees that it's
faulting on a specific load inside a bpf prog and not in any other
part of the kernel. get_user()/put_user() faults won't land here.

Anyway I'll prototype adding a !NULL check to the JITed code and benchmark
the difference with and without, so we have concrete numbers to talk about.
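
In C terms the JIT-emitted guard would make each BPF_PROBE_MEM load
behave like this sketch (hypothetical model with illustrative names;
the real thing would be emitted as x86 insns):

        static u64 probe_mem_load(const u64 *src)
        {
                if (unlikely(!src))
                        return 0;  /* same result the extable fixup produces */
                /* the load keeps its extable entry for other bad addresses */
                return *src;
        }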


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-30  2:54       ` Alexei Starovoitov
@ 2021-01-31 17:32         ` Andy Lutomirski
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-31 17:32 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, Andy Lutomirski, Yonghong Song, Jann Horn,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, bpf,
	Alexei Starovoitov, kernel-team, X86 ML, KP Singh

On Fri, Jan 29, 2021 at 6:54 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Jan 29, 2021 at 03:30:39PM -0800, Andy Lutomirski wrote:
> > On Fri, Jan 29, 2021 at 3:12 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Jan 29, 2021 at 08:00:57AM -0800, Andy Lutomirski wrote:
> > > >
> > > > > On Jan 29, 2021, at 1:21 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > > >
> > > > > On Thu, Jan 28, 2021 at 04:32:34PM -0800, Andy Lutomirski wrote:
> > > > >
> > > > >> I spoke too soon.  get_kernel_nofault() is just fine.  You have
> > > > >> inlined something like __get_kernel_nofault(), which is incorrect.
> > > > >> get_kernel_nofault() would have done the right thing.
> > > > >
> > > > > Correct, the pagefault_disable() is critical.
> > >
> > > What specifically are you referring to?
> > > As far as I can see the current->pagefault_disabled is an artifact of the past.
> > > It doesn't provide any additional information to the fault handler beyond
> > > what extable already does. Consider:
> > >
> > > current->pagefault_disabled++
> > > some C code
> > > asm ("load") // with extable annotation
> > > some other C code
> > > current->pagefault_disabled--
> >
> > pagefault_disabled is not about providing information to the fault
> > handler.  It's about changing the semantics of a fault when accessing
> > a user pointer.  There are two choices of semantics:
> >
> > 1. normal (faulthandler_disabled() returns false): accesses to user
> > memory can sleep, and accesses to valid VMAs are resolved.  exception
> > handlers for valid memory accesses are not called if the fault is
> > resolved.
> >
> > 2. faulthandler_disabled() returns true: accesses to user memory
> > cannot sleep.  If actual processing would be needed to resolve the
> > fault, it's an error instead.
> >
> > This is used for two purposes: optimistic locking and best-effort
> > tracing.  There are code paths that might prefer to access user memory
> > with locks held.  They can pagefault_disable(), try the access, and
> > take the slow path involving dropping locks if the first try fails.
> > And tracing (e.g. perf call stacks) might need to access user memory
> > in a context in which they cannot sleep.  They can disable pagefaults
> > and try.  If they fail, then the user gets an incomplete call stack.
>
> got it. thanks for the explanation. all makes sense.
>
> > This is specifically about user memory access.  Sure, bpf could do
> > pagefault_disable(), but that makes no sense -- bpf is trying to do a
> > *kernel* access and is not validating the pointer.
>
> right.
>
> > On x86, which is a
> > poor architecture in this regard, the memory access instructions don't
> > adequately distinguish between user and kernel access, and bounds
> > checks are necessary.
> >
> > >
> > > If there is a fault in the C code, the fault handler will do the wrong thing,
> > > since the fault is technically disabled only for the asm("load").
> > > The handler will go into bad_area_nosemaphore() instead of find_vma().
> > >
> > > The load instruction is annotated in extable.
> > > The fault handler instead of:
> > >   if (faulthandler_disabled) search_exception_tables()
> > > could do:
> > >  search_exception_tables()
> > > right away without sacrificing anything.
> >
> > This would sacrifice correctness in the vastly more common case of
> > get_user() / put_user() / copy_to/from_user(), which wants to sleep,
> > not return -EFAULT.
>
> got it. agree.
>
> >
> > > If the fault is on one of the special asm("load") instructions the intent
> > > of the code is obvious: this is a non-faulting load that should be fixed up.
> > > Of course the search of the extable should happen before taking the mmap lock.
> > >
> > > imo pagefault_disabled can be removed.
> > >
> > > > Also the _allowed() part. The bounds check is required.
> > >
> > > Up-thread I was saying that JIT is inlining copy_from_kernel_nofault().
> > > That's not quite correct. When the code was written it was inlining
> > > bpf_probe_read(). Back then _kernel vs _user distinction didn't exist.
> > > So the bounds check wasn't there. The verifier side was designed
> > > for kernel pointers and NULL from the beginning. No user pointer
> > > (aside from NULL) would ever go through this path.
> > > Today I agree that the range check is necessary.
> > > The question is where to do this check.
> > > I see two options:
> > > - the JIT can emit it
> > > - fault handler can do it, since %rip clearly says that it's JITed asm load.
> >
> > The fault handler *can't* do it correctly because the zero page
> > might actually be mapped.
>
> Yes. The zero page can be mapped. The fault won't happen
> and the bpf load will read garbage from the user's page zero.
> My understanding is that this is extremely rare. Unpriv users cannot map zero.
> So there is no security concern for a tracing prog accidentally stumbling there.
> You mentioned that "wine" does it? Sure. It won't affect bpf tracing.
>
> > Frankly, inlining this at all is a bit bogus, because some day we're
> > going to improve the range check to check whether the pointer points
> > to actual memory, and BPF should really do the same thing.  This will
> > be too big to inline nicely.
>
> Inlining of bpf_probe_read_kernel is absolutely critical to performance.
> Before we added this feature people were reporting that this was the
> hottest function in tracing progs. The map lookups and prog body
> itself were 2nd and 3rd.
> In networking the program has to be fast too. The program authors count
> every nanosecond to process millions of packets per second.

Does the networking case have appropriate locking to make sure you
don't follow a wild pointer?

For that matter, even for tracing, you should probably make sure that
you do real validation of any pointers you follow, as opposed to just
checking for being too close to zero.  If a BPF tracing program racily
follows a wild pointer, it could hit actual mapped user memory with
complex semantics (e.g. userfaultfd, something backed by FUSE, or
something backed by a filesystem that you are busily tracing), and the
system will deadlock or worse.

--Andy


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  3:09                             ` Andy Lutomirski
@ 2021-01-29  3:26                               ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29  3:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Yonghong Song, Jann Horn, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, bpf, Alexei Starovoitov,
	kernel-team, X86 ML, KP Singh

On Thu, Jan 28, 2021 at 07:09:29PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Jan 28, 2021, at 6:33 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> > 
> > On Thu, Jan 28, 2021 at 06:18:37PM -0800, Andy Lutomirski wrote:
> >>> On Thu, Jan 28, 2021 at 5:53 PM Alexei Starovoitov
> >>> <alexei.starovoitov@gmail.com> wrote:
> >>> 
> >>> On Thu, Jan 28, 2021 at 05:31:35PM -0800, Andy Lutomirski wrote:
> >>>> 
> >>>> What exactly could the fault code even do to fix this up?  Something like:
> >>>> 
> >>>> if (addr == 0 && SMAP off && error_code says it's kernel mode && we
> >>>> don't have permission to map NULL) {
> >>>>  special care for bpf;
> >>>> }
> >>> 
> >>> right. where 'special care' is checking extable and skipping single
> >>> load instruction.
> >>> 
> >>>> This seems arbitrary and fragile.  And it's not obviously
> >>>> implementable safely without taking locks,
> >>> 
> >>> search_bpf_extables() needs rcu_read_lock() only.
> >>> Not the locks you're talking about.
> >> 
> >> I mean the locks in the if statement.  How am I supposed to tell
> >> whether this fault is a special bpf fault or a normal page fault
> >> without taking a lock to look up the VMA or to do some other hack?
> > 
> > search_bpf_extables() only needs a faulting rip.
> > No need to lookup vma.
> > if (addr == 0 && search_bpf_extables(regs->ip)...
> > is trivial enough and won't penalize page faults in general.
> > These conditions are not going to happen in the normal kernel code.
> 
> The need to make this decision will happen in normal code.
> 
> The page fault code is a critical piece of the kernel. The absolute top priority is correctness.  Then performance for the actual common case, which is a regular page fault against a valid VMA. Everything else is lower priority.
> 
> This means I’m not about to add an if (special bpf insn) before the VMA lookup. And by the time we do the VMA lookup, we have already lost.

What's the difference between a bpf prog and a kprobe? Not much from the page fault pov.
They both do things that are not normal for the rest of the kernel.

if (kprobe_page_fault())
is handled first by the fault handler.
So there is a precedent to handle special things.

> You could try to play games with pagefault_disable(), but that will have its own problems.
> 
> 
> > The faulting address and faulting ip will precisely identify this situation.
> > There is no guess work.
> 
> If there is a fault on an instruction with an exception handler and the faulting address is a valid user pointer,
> it is absolutely ambiguous whether it’s BPF chasing a NULL pointer or something else following a valid user pointer.

It's not ambiguous. Please take a look at search_bpf_extables.
It's not like the whole bpf program is one special region.
Only certain specific instructions are in the extable. That table is dynamically
generated by the JIT, just like kernel modules' extables.
bpf helpers are not in the extable. Most of the bpf prog code is not in it either.
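
The lookup itself is small and lock-free. Roughly what this era's
kernel/bpf/core.c does (a sketch, not verbatim):

const struct exception_table_entry *search_bpf_extables(unsigned long addr)
{
        const struct exception_table_entry *e = NULL;
        struct bpf_prog *prog;

        rcu_read_lock();
        prog = bpf_prog_ksym_find(addr);  /* which JITed prog owns this rip? */
        if (!prog || !prog->aux->num_exentries)
                goto out;
        e = search_extable(prog->aux->extable,
                           prog->aux->num_exentries, addr);
out:
        rcu_read_unlock();
        return e;
}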


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  2:32                           ` Alexei Starovoitov
@ 2021-01-29  3:09                             ` Andy Lutomirski
  2021-01-29  3:26                               ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-29  3:09 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Yonghong Song, Jann Horn, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, bpf, Alexei Starovoitov,
	kernel-team, X86 ML, KP Singh



> On Jan 28, 2021, at 6:33 PM, Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> 
> On Thu, Jan 28, 2021 at 06:18:37PM -0800, Andy Lutomirski wrote:
>>> On Thu, Jan 28, 2021 at 5:53 PM Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>> 
>>> On Thu, Jan 28, 2021 at 05:31:35PM -0800, Andy Lutomirski wrote:
>>>> 
>>>> What exactly could the fault code even do to fix this up?  Something like:
>>>> 
>>>> if (addr == 0 && SMAP off && error_code says it's kernel mode && we
>>>> don't have permission to map NULL) {
>>>>  special care for bpf;
>>>> }
>>> 
>>> right. where 'special care' is checking extable and skipping single
>>> load instruction.
>>> 
>>>> This seems arbitrary and fragile.  And it's not obviously
>>>> implementable safely without taking locks,
>>> 
>>> search_bpf_extables() needs rcu_read_lock() only.
>>> Not the locks you're talking about.
>> 
>> I mean the locks in the if statement.  How am I supposed to tell
>> whether this fault is a special bpf fault or a normal page fault
>> without taking a lock to look up the VMA or to do some other hack?
> 
> search_bpf_extables() only needs a faulting rip.
> No need to lookup vma.
> if (addr == 0 && search_bpf_extables(regs->ip)...
> is trivial enough and won't penalize page faults in general.
> These conditions are not going to happen in the normal kernel code.

The need to make this decision will happen in normal code.

The page fault code is a critical piece of the kernel. The absolute top priority is correctness.  Then performance for the actual common case, which is a regular page fault against a valid VMA. Everything else is lower priority.

This means I’m not about to add an if (special bpf insn) before the VMA lookup. And by the time we do the VMA lookup, we have already lost.

You could try to play games with pagefault_disable(), but that will have its own problems.


> The faulting address and faulting ip will precisely identify this situation.
> There is no guess work.

If there is a fault on an instruction with an exception handler and the faulting address is a valid user pointer, it is absolutely ambiguous whether it’s BPF chasing a NULL pointer or something else following a valid user pointer.

The only reason this thing works somewhat reliably right now is that SMAP lessens the ambiguity to some extent.



* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  2:18                         ` Andy Lutomirski
@ 2021-01-29  2:32                           ` Alexei Starovoitov
  2021-01-29  3:09                             ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29  2:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yonghong Song, Jann Horn, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Thu, Jan 28, 2021 at 06:18:37PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 28, 2021 at 5:53 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 05:31:35PM -0800, Andy Lutomirski wrote:
> > >
> > > What exactly could the fault code even do to fix this up?  Something like:
> > >
> > > if (addr == 0 && SMAP off && error_code says it's kernel mode && we
> > > don't have permission to map NULL) {
> > >   special care for bpf;
> > > }
> >
> > right. where 'special care' is checking extable and skipping single
> > load instruction.
> >
> > > This seems arbitrary and fragile.  And it's not obviously
> > > implementable safely without taking locks,
> >
> > search_bpf_extables() needs rcu_read_lock() only.
> > Not the locks you're talking about.
> 
> I mean the locks in the if statement.  How am I supposed to tell
> whether this fault is a special bpf fault or a normal page fault
> without taking a lock to look up the VMA or to do some other hack?

search_bpf_extables() only needs a faulting rip.
No need to lookup vma.
if (addr == 0 && search_bpf_extables(regs->ip)...
is trivial enough and won't penalize page faults in general.
These conditions are not going to happen in the normal kernel code.

> Also, why should BPF get special dispensation to generate a special
> kind of nonsensical page fault that no other kernel code is allowed to
> do?

bpf is special, since it cares about performance a lot
and saving branches in the critical path is important.

> 
> >
> > > which we really ought not
> > > to be doing from inside arbitrary bpf programs.
> >
> > Not in arbitrary progs. It's only available to progs loaded by root.
> >
> > > Keep in mind that, if
> > > SMAP is unavailable, the fault code literally does not know whether
> > > the page fault originated from a valid uaccess region.
> > >
> > > There's also always the possibility that you simultaneously run bpf
> > > and something like Wine or DOSEMU2 that actually maps the zero page,
> > > in which case the effect of the bpf code is going to be quite erratic
> > > and will depend on which mm is loaded.
> >
> > It's tracing. Folks who write those progs accepted the fact that
> > the data returned by probe_read is not going to be 100% accurate.
> 
> Is this really all tracing or is some of it actual live network code?

Networking bpf progs don't use this. bpf tracing does.
But I'm not sure why you're asking.
Tracing has higher performance demands than networking.

> >
> > bpf jit can insert !null check before every such special load
> > (since it knows all of them), but it's an obvious performance loss
> > that would be good to avoid. If fault handler can do this
> > if (addr == 0 && ...) search_bpf_extables()
> > at least in some conditions and cpu flags it's already a win.
> > In all other cases bpf jit will insert !null checks.
> 
> Again, there is no guarantee that a page fault is even generated for
> this.  And it doesn't seem very reasonable for the fault code to have
> to decide whether a NULL pointer dereference is a special BPF fault or
> a real user access fault against a VMA at address 9.

The faulting address and faulting ip will precisely identify this situation.
There is no guess work.


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  1:53                       ` Alexei Starovoitov
@ 2021-01-29  2:18                         ` Andy Lutomirski
  2021-01-29  2:32                           ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-29  2:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Yonghong Song, Jann Horn, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, bpf, Alexei Starovoitov,
	kernel-team, X86 ML, KP Singh

On Thu, Jan 28, 2021 at 5:53 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 05:31:35PM -0800, Andy Lutomirski wrote:
> >
> > What exactly could the fault code even do to fix this up?  Something like:
> >
> > if (addr == 0 && SMAP off && error_code says it's kernel mode && we
> > don't have permission to map NULL) {
> >   special care for bpf;
> > }
>
> right. where 'special care' is checking extable and skipping single
> load instruction.
>
> > This seems arbitrary and fragile.  And it's not obviously
> > implementable safely without taking locks,
>
> search_bpf_extables() needs rcu_read_lock() only.
> Not the locks you're talking about.

I mean the locks in the if statement.  How am I supposed to tell
whether this fault is a special bpf fault or a normal page fault
without taking a lock to look up the VMA or to do some other hack?
Also, why should BPF get special dispensation to generate a special
kind of nonsensical page fault that no other kernel code is allowed to
do?

>
> > which we really ought not
> > to be doing from inside arbitrary bpf programs.
>
> Not in arbitrary progs. It's only available to progs loaded by root.
>
> > Keep in mind that, if
> > SMAP is unavailable, the fault code literally does not know whether
> > the page fault originated from a valid uaccess region.
> >
> > There's also always the possibility that you simultaneously run bpf
> > and something like Wine or DOSEMU2 that actually maps the zero page,
> > in which case the effect of the bpf code is going to be quite erratic
> > and will depend on which mm is loaded.
>
> It's tracing. Folks who write those progs accepted the fact that
> the data returned by probe_read is not going to be 100% accurate.

Is this really all tracing or is some of it actual live network code?

>
> bpf jit can insert !null check before every such special load
> (since it knows all of them), but it's an obvious performance loss
> that would be good to avoid. If fault handler can do this
> if (addr == 0 && ...) search_bpf_extables()
> at least in some conditions and cpu flags it's already a win.
> In all other cases bpf jit will insert !null checks.

Again, there is no guarantee that a page fault is even generated for
this.  And it doesn't seem very reasonable for the fault code to have
to decide whether a NULL pointer dereference is a special BPF fault or
a real user access fault against a VMA at address 9.

--Andy


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  1:31                     ` Andy Lutomirski
@ 2021-01-29  1:53                       ` Alexei Starovoitov
  2021-01-29  2:18                         ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29  1:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yonghong Song, Jann Horn, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Thu, Jan 28, 2021 at 05:31:35PM -0800, Andy Lutomirski wrote:
> 
> What exactly could the fault code even do to fix this up?  Something like:
> 
> if (addr == 0 && SMAP off && error_code says it's kernel mode && we
> don't have permission to map NULL) {
>   special care for bpf;
> }

right. where 'special care' is checking extable and skipping single
load instruction.

> This seems arbitrary and fragile.  And it's not obviously
> implementable safely without taking locks, 

search_bpf_extables() needs rcu_read_lock() only.
Not the locks you're talking about.

> which we really ought not
> to be doing from inside arbitrary bpf programs. 

Not in arbitrary progs. It's only available to progs loaded by root.

> Keep in mind that, if
> SMAP is unavailable, the fault code literally does not know whether
> the page fault originated from a valid uaccess region.
> 
> There's also always the possibility that you simultaneously run bpf
> and something like Wine or DOSEMU2 that actually maps the zero page,
> in which case the effect of the bpf code is going to be quite erratic
> and will depend on which mm is loaded.

It's tracing. Folks who write those progs accepted the fact that
the data returned by probe_read is not going to be 100% accurate.

bpf jit can insert !null check before every such special load
(since it knows all of them), but it's an obvious performance loss
that would be good to avoid. If fault handler can do this
if (addr == 0 && ...) search_bpf_extables() 
at least in some conditions and cpu flags it's already a win.
In all other cases bpf jit will insert !null checks.


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  1:04                   ` Alexei Starovoitov
@ 2021-01-29  1:31                     ` Andy Lutomirski
  2021-01-29  1:53                       ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-29  1:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Yonghong Song, Jann Horn, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, bpf, Alexei Starovoitov,
	kernel-team, X86 ML, KP Singh

On Thu, Jan 28, 2021 at 5:04 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 04:45:41PM -0800, Andy Lutomirski wrote:
> > On Thu, Jan 28, 2021 at 4:41 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Jan 28, 2021 at 04:29:51PM -0800, Andy Lutomirski wrote:
> > > > BPF generated a NULL pointer dereference (where NULL is a user
> > > > pointer) and expected it to recover cleanly. What exactly am I
> > > > supposed to debug?  IMO the only thing wrong with the x86 code is that
> > > > it doesn't complain more loudly.  I will fix that, too.
> > >
> > > are you saying that NULL is a _user_ pointer?!
> > > It's NULL. All zeros.
> > > probe_read_kernel(NULL) was returning EFAULT on it and should continue doing so.
> >
> > probe_read_kernel() does not exist.  get_kernel_nofault() returns -ERANGE.
>
> That was an old name. bpf_probe_read_kernel() is using copy_from_kernel_nofault() now.
>
> > And yes, NULL is a user pointer.  I can write you a little Linux
> > program that maps some real valid data at user address 0.  As I noted
>
> are you sure? I thought mmap of addr zero was disallowed long ago.

Quite sure.

#define _GNU_SOURCE

#include <stdio.h>
#include <sys/mman.h>
#include <err.h>

int main()
{
    int *ptr = mmap(0, 4096, PROT_READ | PROT_WRITE,
                    MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
    if (ptr == MAP_FAILED)
        err(1, "mmap");

    *ptr = 1;
    printf("I just wrote %d to %p\n", *ptr, ptr);
    return 0;
}

Whether this works or not depends on a complicated combination of
sysctl settings, process capabilities, and whether SELinux feels like
adding its own restrictions.  But it does work on current kernels.
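
(The main knob is the vm.mmap_min_addr sysctl, typically 4096 or 65536
by default; with it set to 0, or with CAP_SYS_RAWIO, the mmap(0, ...)
above succeeds.)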

>
> > when I first analyzed this bug, because NULL is a user address, bpf is
> > incorrectly triggering the *user* fault handling code, and that code
> > is objecting.
> >
> > I propose the following fix to the x86 code.  I'll send it as a real
> > patch tomorrow.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=f61282777772f375bba7130ae39ccbd7e83878b2
>
> You realize that you propose to panic kernels for all existing tracing users, right?
>
> Do you have a specific security concern with treating fault on NULL special?

The security concern is probably not that severe because of the
aforementioned restrictions, but I haven't analyzed it very carefully.
But I do have a *functionality* concern.  As the original email that
prompted this whole discussion noted, the current x86 fault code does
not do what you want it to do if SMAP is off.  The fact that it mostly
works with SMAP on is luck.  With SMAP off, we will behave erratically
at best.

What exactly could the fault code even do to fix this up?  Something like:

if (addr == 0 && SMAP off && error_code says it's kernel mode && we
don't have permission to map NULL) {
  special care for bpf;
}

This seems arbitrary and fragile.  And it's not obviously
implementable safely without taking locks, which we really ought not
to be doing from inside arbitrary bpf programs.  Keep in mind that, if
SMAP is unavailable, the fault code literally does not know whether
the page fault originated from a valid uaccess region.

There's also always the possibility that you simultaneously run bpf
and something like Wine or DOSEMU2 that actually maps the zero page,
in which case the effect of the bpf code is going to be quite erratic
and will depend on which mm is loaded.

--Andy


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  1:03             ` Jann Horn
@ 2021-01-29  1:08               ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29  1:08 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andy Lutomirski, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Fri, Jan 29, 2021 at 02:03:02AM +0100, Jann Horn wrote:
> On Fri, Jan 29, 2021 at 1:43 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Fri, Jan 29, 2021 at 01:35:16AM +0100, Jann Horn wrote:
> > > On Fri, Jan 29, 2021 at 1:11 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > > On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> > > > > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > > > > that means you have to inline a valid implementation.  In particular,
> > > > > you need to check that you're accessing *kernel* memory.  Just like
> > > >
> > > > That check is on the verifier side. It only does it for kernel
> > > > pointers with known types.
> > > > In a sequence a->b->c the verifier guarantees that 'a' is a valid
> > > > kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> > > > points to a valid kernel field which is also a pointer.
> > > > What it doesn't check is that b != null, so
> > > > that users don't have to write silly code with 'if (p)' after every
> > > > dereference.
> > >
> > > How is that supposed to work? If I e.g. have a pointer to a
> > > task_struct, and I do something like:
> > >
> > > task->mm->mmap->vm_file->f_inode
> > >
> > > and another thread concurrently mutates the VMA tree and frees the VMA
> > > that we're traversing here, how can BPF guarantee that
> > > task->mm->mmap->vm_file is a valid pointer and not whatever garbage we
> > > read from freed memory?
> >
> > Please read upthread. Every -> is replaced with probe_kernel_read.
> That's what kprobes were doing for years. That's what bpf was
> > doing for years.
> 
> Uh... but -> on PTR_TO_BTF_ID pointers is not replaced with
> probe_kernel_read() and can be done directly with BPF_LDX, right?

Almost. They're replaced with BPF_LDX | BPF_PROBE_MEM.
The interpreter is calling bpf_probe_read_kernel() to process them.
The JIT is replacing these insns with atomic loads and extable entries.
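
The interpreter side is literally one helper call per such load.
Roughly what this era's kernel/bpf/core.c does for the 8-byte case
(macro-expanded sketch):

        LDX_PROBE_MEM_DW:
                bpf_probe_read_kernel(&DST, 8, (const void *)(long)SRC);
                CONT;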

> And
> dereferencing a PTR_TO_BTF_ID pointer returns another PTR_TO_BTF_ID
> pointer if type information is available, right? (See
> btf_struct_access().) And stuff like BPF LSM programs or some of the
> XDP stuff receives BTF-typed pointers to kernel data structures as
> arguments, right?
> 
> And as an example, this is visible in
> tools/testing/selftests/bpf/progs/ima.c , which does:
> 
> SEC("lsm.s/bprm_committed_creds")
> int BPF_PROG(ima, struct linux_binprm *bprm)
> {
>   u32 pid = bpf_get_current_pid_tgid() >> 32;
> 
>   if (pid == monitored_pid)
>     ima_hash_ret = bpf_ima_inode_hash(bprm->file->f_inode,
>            &ima_hash, sizeof(ima_hash));
> 
>   return 0;
> }
> 
> As far as I can tell, we are getting a BTF-typed pointer "bprm", doing
> dereferences that again yield BTF-typed pointers, and then pass the
> BTF-typed pointer at the end of it into bpf_ima_inode_hash()?

correct. all of them bpf_probe_read_kernel() == copy_from_kernel_nofault().
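
And copy_from_kernel_nofault() is where both pieces under discussion
live: the range check up front and pagefault_disable() around the
access. Roughly, from this era's mm/maccess.c (a sketch, not verbatim):

long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
{
        if (!copy_from_kernel_nofault_allowed(src, size))
                return -ERANGE;         /* the bounds check */

        pagefault_disable();
        /* word-at-a-time copy; every access has an extable entry */
        copy_from_kernel_nofault_loop(dst, src, size, u64, Efault);
        copy_from_kernel_nofault_loop(dst, src, size, u32, Efault);
        copy_from_kernel_nofault_loop(dst, src, size, u16, Efault);
        copy_from_kernel_nofault_loop(dst, src, size, u8, Efault);
        pagefault_enable();
        return 0;
Efault:
        pagefault_enable();
        return -EFAULT;
}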


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:45                 ` Andy Lutomirski
@ 2021-01-29  1:04                   ` Alexei Starovoitov
  2021-01-29  1:31                     ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29  1:04 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yonghong Song, Jann Horn, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Thu, Jan 28, 2021 at 04:45:41PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 28, 2021 at 4:41 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 04:29:51PM -0800, Andy Lutomirski wrote:
> > > BPF generated a NULL pointer dereference (where NULL is a user
> > > pointer) and expected it to recover cleanly. What exactly am I
> > > supposed to debug?  IMO the only thing wrong with the x86 code is that
> > > it doesn't complain more loudly.  I will fix that, too.
> >
> > are you saying that NULL is a _user_ pointer?!
> > It's NULL. All zeros.
> > probe_read_kernel(NULL) was returning EFAULT on it and should continue doing so.
> 
> probe_read_kernel() does not exist.  get_kernel_nofault() returns -ERANGE.

That was an old name. bpf_probe_read_kernel() is using copy_from_kernel_nofault() now.

> And yes, NULL is a user pointer.  I can write you a little Linux
> program that maps some real valid data at user address 0.  As I noted

are you sure? I thought mmap of addr zero was disallowed long ago.

> when I first analyzed this bug, because NULL is a user address, bpf is
> incorrectly triggering the *user* fault handling code, and that code
> is objecting.
> 
> I propose the following fix to the x86 code.  I'll send it as a real
> patch tomorrow.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=f61282777772f375bba7130ae39ccbd7e83878b2

You realize that you propose to panic kernels for all existing tracing users, right?

Do you have a specific security concern with treating fault on NULL special?


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:43           ` Alexei Starovoitov
@ 2021-01-29  1:03             ` Jann Horn
  2021-01-29  1:08               ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Jann Horn @ 2021-01-29  1:03 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Fri, Jan 29, 2021 at 1:43 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Fri, Jan 29, 2021 at 01:35:16AM +0100, Jann Horn wrote:
> > On Fri, Jan 29, 2021 at 1:11 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > > On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> > > > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > > > that means you have to inline a valid implementation.  In particular,
> > > > you need to check that you're accessing *kernel* memory.  Just like
> > >
> > > That check is on the verifier side. It only does it for kernel
> > > pointers with known types.
> > > In a sequence a->b->c the verifier guarantees that 'a' is a valid
> > > kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> > > points to a valid kernel field which is also a pointer.
> > > What it doesn't check is that b != null, so
> > > that users don't have to write silly code with 'if (p)' after every
> > > dereference.
> >
> > How is that supposed to work? If I e.g. have a pointer to a
> > task_struct, and I do something like:
> >
> > task->mm->mmap->vm_file->f_inode
> >
> > and another thread concurrently mutates the VMA tree and frees the VMA
> > that we're traversing here, how can BPF guarantee that
> > task->mm->mmap->vm_file is a valid pointer and not whatever garbage we
> > read from freed memory?
>
> Please read upthread. Every -> is replaced with probe_kernel_read.
> That's what kprobes were doing for years. That's what bpf was
> doing for years.

Uh... but -> on PTR_TO_BTF_ID pointers is not replaced with
probe_kernel_read() and can be done directly with BPF_LDX, right? And
dereferencing a PTR_TO_BTF_ID pointer returns another PTR_TO_BTF_ID
pointer if type information is available, right? (See
btf_struct_access().) And stuff like BPF LSM programs or some of the
XDP stuff receives BTF-typed pointers to kernel data structures as
arguments, right?

And as an example, this is visible in
tools/testing/selftests/bpf/progs/ima.c , which does:

SEC("lsm.s/bprm_committed_creds")
int BPF_PROG(ima, struct linux_binprm *bprm)
{
  u32 pid = bpf_get_current_pid_tgid() >> 32;

  if (pid == monitored_pid)
    ima_hash_ret = bpf_ima_inode_hash(bprm->file->f_inode,
           &ima_hash, sizeof(ima_hash));

  return 0;
}

As far as I can tell, we are getting a BTF-typed pointer "bprm", doing
dereferences that again yield BTF-typed pointers, and then pass the
BTF-typed pointer at the end of it into bpf_ima_inode_hash()?

What am I missing?


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:41               ` Alexei Starovoitov
@ 2021-01-29  0:45                 ` Andy Lutomirski
  2021-01-29  1:04                   ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-29  0:45 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Yonghong Song, Jann Horn, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, bpf, Alexei Starovoitov,
	kernel-team, X86 ML, KP Singh

On Thu, Jan 28, 2021 at 4:41 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 04:29:51PM -0800, Andy Lutomirski wrote:
> > BPF generated a NULL pointer dereference (where NULL is a user
> > pointer) and expected it to recover cleanly. What exactly am I
> > supposed to debug?  IMO the only thing wrong with the x86 code is that
> > it doesn't complain more loudly.  I will fix that, too.
>
> are you saying that NULL is a _user_ pointer?!
> It's NULL. All zeros.
> probe_read_kernel(NULL) was returning EFAULT on it and should continue doing so.

probe_read_kernel() does not exist.  get_kernel_nofault() returns -ERANGE.

And yes, NULL is a user pointer.  I can write you a little Linux
program that maps some real valid data at user address 0.  As I noted
when I first analyzed this bug, because NULL is a user address, bpf is
incorrectly triggering the *user* fault handling code, and that code
is objecting.

I propose the following fix to the x86 code.  I'll send it as a real
patch tomorrow.

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/fixes&id=f61282777772f375bba7130ae39ccbd7e83878b2

--Andy


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:35         ` Jann Horn
@ 2021-01-29  0:43           ` Alexei Starovoitov
  2021-01-29  1:03             ` Jann Horn
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29  0:43 UTC (permalink / raw)
  To: Jann Horn
  Cc: Andy Lutomirski, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Fri, Jan 29, 2021 at 01:35:16AM +0100, Jann Horn wrote:
> On Fri, Jan 29, 2021 at 1:11 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> > > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > > that means you have to inline a valid implementation.  In particular,
> > > you need to check that you're accessing *kernel* memory.  Just like
> >
> > That check is on the verifier side. It only does it for kernel
> > pointers with known types.
> > In a sequence a->b->c the verifier guarantees that 'a' is a valid
> > kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> > points to a valid kernel field which is also a pointer.
> > What it doesn't check is that b != null, so
> > that users don't have to write silly code with 'if (p)' after every
> > dereference.
> 
> How is that supposed to work? If I e.g. have a pointer to a
> task_struct, and I do something like:
> 
> task->mm->mmap->vm_file->f_inode
> 
> and another thread concurrently mutates the VMA tree and frees the VMA
> that we're traversing here, how can BPF guarantee that
> task->mm->mmap->vm_file is a valid pointer and not whatever garbage we
> read from freed memory?

Please read upthread. Every -> is replaced with probe_kernel_read.
That's what kprobes were doing for years. That's what bpf was
doing for years.


* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:29             ` Andy Lutomirski
  2021-01-29  0:32               ` Andy Lutomirski
@ 2021-01-29  0:41               ` Alexei Starovoitov
  2021-01-29  0:45                 ` Andy Lutomirski
  1 sibling, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29  0:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yonghong Song, Jann Horn, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Thu, Jan 28, 2021 at 04:29:51PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 28, 2021 at 4:26 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 04:18:24PM -0800, Andy Lutomirski wrote:
> > > On Thu, Jan 28, 2021 at 4:11 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> > > > >
> > > > > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > > > > that means you have to inline a valid implementation.  In particular,
> > > > > you need to check that you're accessing *kernel* memory.  Just like
> > > >
> > > > That check is on the verifier side. It only does it for kernel
> > > > pointers with known types.
> > > > In a sequence a->b->c the verifier guarantees that 'a' is a valid
> > > > kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> > > > points to a valid kernel field which is also a pointer.
> > > > What it doesn't check is that b != null, so
> > > > that users don't have to write silly code with 'if (p)' after every
> > > > dereference.
> > >
> > > That sounds like a verifier and/or JIT bug.  If you have a pointer p
> > > (doesn't matter whether p is what you call a or a->b) and you have not
> > > confirmed that p points to the kernel range, you may not generate a
> > > load from that pointer.
> >
> > Please read the explanation again. It's an inlined probe_kernel_read.
> 
> Can you point me at the uninlined implementation?  Does it still
> exist?  I see get_kernel_nofault(), which is currently buggy, and I
> will fix it.
> 
> >
> > > >
> > > > > how get_user() validates that the pointer points into user memory,
> > > > > your helper should bounds check the pointer.  On x86, you could check
> > > > > the high bit.
> > > > >
> > > > > As an extra complication, we should really add logic to
> > > > > get_kernel_nofault() to verify that the pointer points into actual
> > > > > memory as opposed to MMIO space (or future incoherent MKTME space or
> > > > > something like that, sigh).  This will severely complicate inlining
> > > > > it.  And we should *really* make the same fix to get_kernel_nofault()
> > > > > -- it should validate that the pointer is a kernel pointer.
> > > > >
> > > > > Is this really worth inlining instead of having the BPF JIT generate
> > > > > an out of line call to a real C function?  That would let us put in a
> > > > > sane implementation.
> > > >
> > > > It's out of the question.
> > > > JIT cannot generate a helper call for single bpf insn without huge overhead.
> > > > All registers are used. It needs full save/restore, stack increase, etc.
> > > >
> > > > Anyhow I bet the bug we're discussing has nothing to do with bpf and jit.
> > > > Something got changed and now probe_kernel_read(NULL) warns on !SMAP.
> > > > This is something to debug.
> > >
> > > The bug is in bpf.
> >
> > If you don't care to debug please don't provide wrong guesses.
> 
> BPF generated a NULL pointer dereference (where NULL is a user
> pointer) and expected it to recover cleanly. What exactly am I
> supposed to debug?  IMO the only thing wrong with the x86 code is that
> it doesn't complain more loudly.  I will fix that, too.

are you saying that NULL is a _user_ pointer?!
It's NULL. All zeros.
probe_read_kernel(NULL) was returning EFAULT on it and should continue doing so.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:11       ` Alexei Starovoitov
  2021-01-29  0:18         ` Andy Lutomirski
@ 2021-01-29  0:35         ` Jann Horn
  2021-01-29  0:43           ` Alexei Starovoitov
  1 sibling, 1 reply; 30+ messages in thread
From: Jann Horn @ 2021-01-29  0:35 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Fri, Jan 29, 2021 at 1:11 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > that means you have to inline a valid implementation.  In particular,
> > you need to check that you're accessing *kernel* memory.  Just like
>
> That check is on the verifier side. It only does it for kernel
> pointers with known types.
> In a sequence a->b->c the verifier guarantees that 'a' is a valid
> kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> points to a valid kernel field which is also a pointer.
> What it doesn't check is that b != null, so
> that users don't have to write silly code with 'if (p)' after every
> dereference.

How is that supposed to work? If I e.g. have a pointer to a
task_struct, and I do something like:

task->mm->mmap->vm_file->f_inode

and another thread concurrently mutates the VMA tree and frees the VMA
that we're traversing here, how can BPF guarantee that
task->mm->mmap->vm_file is a valid pointer and not whatever garbage we
read from freed memory?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:29             ` Andy Lutomirski
@ 2021-01-29  0:32               ` Andy Lutomirski
  2021-01-29  0:41               ` Alexei Starovoitov
  1 sibling, 0 replies; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-29  0:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Alexei Starovoitov, Yonghong Song, Jann Horn, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, bpf, Alexei Starovoitov,
	kernel-team, X86 ML, KP Singh

On Thu, Jan 28, 2021 at 4:29 PM Andy Lutomirski <luto@kernel.org> wrote:
>
> On Thu, Jan 28, 2021 at 4:26 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 04:18:24PM -0800, Andy Lutomirski wrote:
> > > On Thu, Jan 28, 2021 at 4:11 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> > > > >
> > > > > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > > > > that means you have to inline a valid implementation.  In particular,
> > > > > you need to check that you're accessing *kernel* memory.  Just like
> > > >
> > > > That check is on the verifier side. It only does it for kernel
> > > > pointers with known types.
> > > > In a sequence a->b->c the verifier guarantees that 'a' is a valid
> > > > kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> > > > points to a valid kernel field which is also a pointer.
> > > > What it doesn't check is that b != null, so
> > > > that users don't have to write silly code with 'if (p)' after every
> > > > dereference.
> > >
> > > That sounds like a verifier and/or JIT bug.  If you have a pointer p
> > > (doesn't matter whether p is what you call a or a->b) and you have not
> > > confirmed that p points to the kernel range, you may not generate a
> > > load from that pointer.
> >
> > Please read the explanation again. It's an inlined probe_kernel_read.
>
> Can you point me at the uninlined implementation?  Does it still
> exist?  I see get_kernel_nofault(), which is currently buggy, and I
> will fix it.

I spoke too soon.  get_kernel_nofault() is just fine.  You have
inlined something like __get_kernel_nofault(), which is incorrect.
get_kernel_nofault() would have done the right thing.
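
(Roughly the shape of the uninlined path -- a sketch of the mm/maccess.c
structure, not a verbatim copy:)

	long copy_from_kernel_nofault(void *dst, const void *src, size_t size)
	{
		/* the part the JIT's inlined load lost: */
		if (!copy_from_kernel_nofault_allowed(src, size))
			return -ERANGE;		/* bounds check */

		pagefault_disable();
		/* ... __get_kernel_nofault() loads with extable fixup ... */
		pagefault_enable();
		return 0;
	}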

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:26           ` Alexei Starovoitov
@ 2021-01-29  0:29             ` Andy Lutomirski
  2021-01-29  0:32               ` Andy Lutomirski
  2021-01-29  0:41               ` Alexei Starovoitov
  0 siblings, 2 replies; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-29  0:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Yonghong Song, Jann Horn, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, bpf, Alexei Starovoitov,
	kernel-team, X86 ML, KP Singh

On Thu, Jan 28, 2021 at 4:26 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 04:18:24PM -0800, Andy Lutomirski wrote:
> > On Thu, Jan 28, 2021 at 4:11 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> > > >
> > > > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > > > that means you have to inline a valid implementation.  In particular,
> > > > you need to check that you're accessing *kernel* memory.  Just like
> > >
> > > That check is on the verifier side. It only does it for kernel
> > > pointers with known types.
> > > In a sequence a->b->c the verifier guarantees that 'a' is a valid
> > > kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> > > points to a valid kernel field which is also a pointer.
> > > What it doesn't check is that b != null, so
> > > that users don't have to write silly code with 'if (p)' after every
> > > dereference.
> >
> > That sounds like a verifier and/or JIT bug.  If you have a pointer p
> > (doesn't matter whether p is what you call a or a->b) and you have not
> > confirmed that p points to the kernel range, you may not generate a
> > load from that pointer.
>
> Please read the explanation again. It's an inlined probe_kernel_read.

Can you point me at the uninlined implementation?  Does it still
exist?  I see get_kernel_nofault(), which is currently buggy, and I
will fix it.

>
> > >
> > > > how get_user() validates that the pointer points into user memory,
> > > > your helper should bounds check the pointer.  On x86, you could check
> > > > the high bit.
> > > >
> > > > As an extra complication, we should really add logic to
> > > > get_kernel_nofault() to verify that the pointer points into actual
> > > > memory as opposed to MMIO space (or future incoherent MKTME space or
> > > > something like that, sigh).  This will severely complicate inlining
> > > > it.  And we should *really* make the same fix to get_kernel_nofault()
> > > > -- it should validate that the pointer is a kernel pointer.
> > > >
> > > > Is this really worth inlining instead of having the BPF JIT generate
> > > > an out of line call to a real C function?  That would let us put in a
> > > > sane implementation.
> > >
> > > It's out of the question.
> > > JIT cannot generate a helper call for single bpf insn without huge overhead.
> > > All registers are used. It needs full save/restore, stack increase, etc.
> > >
> > > Anyhow I bet the bug we're discussing has nothing to do with bpf and jit.
> > > Something got changed and now probe_kernel_read(NULL) warns on !SMAP.
> > > This is something to debug.
> >
> > The bug is in bpf.
>
> If you don't care to debug please don't provide wrong guesses.

BPF generated a NULL pointer dereference (where NULL is a user
pointer) and expected it to recover cleanly. What exactly am I
supposed to debug?  IMO the only thing wrong with the x86 code is that
it doesn't complain more loudly.  I will fix that, too.

--Andy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:18         ` Andy Lutomirski
@ 2021-01-29  0:26           ` Alexei Starovoitov
  2021-01-29  0:29             ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29  0:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yonghong Song, Jann Horn, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Thu, Jan 28, 2021 at 04:18:24PM -0800, Andy Lutomirski wrote:
> On Thu, Jan 28, 2021 at 4:11 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> > >
> > > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > > that means you have to inline a valid implementation.  In particular,
> > > you need to check that you're accessing *kernel* memory.  Just like
> >
> > That check is on the verifier side. It only does it for kernel
> > pointers with known types.
> > In a sequence a->b->c the verifier guarantees that 'a' is a valid
> > kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> > points to a valid kernel field which is also a pointer.
> > What it doesn't check is that b != null, so
> > that users don't have to write silly code with 'if (p)' after every
> > dereference.
> 
> That sounds like a verifier and/or JIT bug.  If you have a pointer p
> (doesn't matter whether p is what you call a or a->b) and you have not
> confirmed that p points to the kernel range, you may not generate a
> load from that pointer.

Please read the explanation again. It's an inlined probe_kernel_read.

> >
> > > how get_user() validates that the pointer points into user memory,
> > > your helper should bounds check the pointer.  On x86, you could check
> > > the high bit.
> > >
> > > As an extra complication, we should really add logic to
> > > get_kernel_nofault() to verify that the pointer points into actual
> > > memory as opposed to MMIO space (or future incoherent MKTME space or
> > > something like that, sigh).  This will severely complicate inlining
> > > it.  And we should *really* make the same fix to get_kernel_nofault()
> > > -- it should validate that the pointer is a kernel pointer.
> > >
> > > Is this really worth inlining instead of having the BPF JIT generate
> > > an out of line call to a real C function?  That would let us put in a
> > > sane implementation.
> >
> > It's out of the question.
> > JIT cannot generate a helper call for single bpf insn without huge overhead.
> > All registers are used. It needs full save/restore, stack increase, etc.
> >
> > Anyhow I bet the bug we're discussing has nothing to do with bpf and jit.
> > Something got changed and now probe_kernel_read(NULL) warns on !SMAP.
> > This is something to debug.
> 
> The bug is in bpf.

If you don't care to debug please don't provide wrong guesses.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-29  0:11       ` Alexei Starovoitov
@ 2021-01-29  0:18         ` Andy Lutomirski
  2021-01-29  0:26           ` Alexei Starovoitov
  2021-01-29  0:35         ` Jann Horn
  1 sibling, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-29  0:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andy Lutomirski, Yonghong Song, Jann Horn, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, bpf, Alexei Starovoitov,
	kernel-team, X86 ML, KP Singh

On Thu, Jan 28, 2021 at 4:11 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> >
> > Okay, so I guess you're trying to inline probe_read_kernel().  But
> > that means you have to inline a valid implementation.  In particular,
> > you need to check that you're accessing *kernel* memory.  Just like
>
> That check is on the verifier side. It only does it for kernel
> pointers with known types.
> In a sequence a->b->c the verifier guarantees that 'a' is a valid
> kernel pointer and it's also !null. Then it guarantees that offsetof(b)
> points to a valid kernel field which is also a pointer.
> What it doesn't check is that b != null, so
> that users don't have to write silly code with 'if (p)' after every
> dereference.

That sounds like a verifier and/or JIT bug.  If you have a pointer p
(doesn't matter whether p is what you call a or a->b) and you have not
confirmed that p points to the kernel range, you may not generate a
load from that pointer.

>
> > how get_user() validates that the pointer points into user memory,
> > your helper should bounds check the pointer.  On x86, you could check
> > the high bit.
> >
> > As an extra complication, we should really add logic to
> > get_kernel_nofault() to verify that the pointer points into actual
> > memory as opposed to MMIO space (or future incoherent MKTME space or
> > something like that, sigh).  This will severely complicate inlining
> > it.  And we should *really* make the same fix to get_kernel_nofault()
> > -- it should validate that the pointer is a kernel pointer.
> >
> > Is this really worth inlining instead of having the BPF JIT generate
> > an out of line call to a real C function?  That would let us put in a
> > sane implementation.
>
> It's out of the question.
> JIT cannot generate a helper call for single bpf insn without huge overhead.
> All registers are used. It needs full save/restore, stack increase, etc.
>
> Anyhow I bet the bug we're discussing has nothing to do with bpf and jit.
> Something got changed and now probe_kernel_read(NULL) warns on !SMAP.
> This is something to debug.

The bug is in bpf.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-28 23:51     ` Andy Lutomirski
@ 2021-01-29  0:11       ` Alexei Starovoitov
  2021-01-29  0:18         ` Andy Lutomirski
  2021-01-29  0:35         ` Jann Horn
  0 siblings, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2021-01-29  0:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Yonghong Song, Jann Horn, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Thu, Jan 28, 2021 at 03:51:13PM -0800, Andy Lutomirski wrote:
> 
> Okay, so I guess you're trying to inline probe_read_kernel().  But
> that means you have to inline a valid implementation.  In particular,
> you need to check that you're accessing *kernel* memory.  Just like

That check is on the verifier side. It only does it for kernel
pointers with known types.
In a sequence a->b->c the verifier guarantees that 'a' is a valid
kernel pointer and it's also !null. Then it guarantees that offsetof(b)
points to a valid kernel field which is also a pointer.
What it doesn't check is that b != null, so
that users don't have to write silly code with 'if (p)' after every
dereference.

> how get_user() validates that the pointer points into user memory,
> your helper should bounds check the pointer.  On x86, you could check
> the high bit.
> 
> As an extra complication, we should really add logic to
> get_kernel_nofault() to verify that the pointer points into actual
> memory as opposed to MMIO space (or future incoherent MKTME space or
> something like that, sigh).  This will severely complicate inlining
> it.  And we should *really* make the same fix to get_kernel_nofault()
> -- it should validate that the pointer is a kernel pointer.
> 
> Is this really worth inlining instead of having the BPF JIT generate
> an out of line call to a real C function?  That would let us put in a
> sane implementation.

It's out of the question.
JIT cannot generate a helper call for single bpf insn without huge overhead.
All registers are used. It needs full save/restore, stack increase, etc.

Anyhow I bet the bug we're discussing has nothing to do with bpf and jit.
Something got changed and now probe_kernel_read(NULL) warns on !SMAP.
This is something to debug.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-28  1:05   ` Yonghong Song
@ 2021-01-28 23:51     ` Andy Lutomirski
  2021-01-29  0:11       ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-28 23:51 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Andy Lutomirski, Jann Horn, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, bpf, Alexei Starovoitov, kernel-team, X86 ML,
	KP Singh

On Wed, Jan 27, 2021 at 5:06 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 1/27/21 1:47 PM, Andy Lutomirski wrote:
> > On Mon, Jan 25, 2021 at 4:12 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >> When reviewing patch ([1]), which adds a script to run bpf selftest
> >> through qemu at /sbin/init stage, I found the following kernel bug
> >> warning:
> >>
> >> [  112.118892] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1351
> >> [  112.119805] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 354, name: new_name
> >> [  112.120512] 3 locks held by new_name/354:
> >> [  112.120868]  #0: ffff88800476e0a0 (&p->lock){+.+.}-{3:3}, at: bpf_seq_read+0x3a/0x3d0
> >> [  112.121573]  #1: ffffffff82d69800 (rcu_read_lock){....}-{1:2}, at: bpf_iter_run_prog+0x5/0x160
> >> [  112.122348]  #2: ffff8880061c2088 (&mm->mmap_lock#2){++++}-{3:3}, at: exc_page_fault+0x1a1/0x640
> >> [  112.123128] Preemption disabled at:
> >> [  112.123130] [<ffffffff8108f913>] migrate_disable+0x33/0x80
> >> [  112.123942] CPU: 0 PID: 354 Comm: new_name Tainted: G           O      5.11.0-rc4-00524-g6e66fbb
> >> 10597-dirty #1249
> >> [  112.124822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01
> >> /2014
> >> [  112.125614] Call Trace:
> >> [  112.125835]  dump_stack+0x77/0x97
> >> [  112.126137]  ___might_sleep.cold.119+0xf2/0x106
> >> [  112.126537]  exc_page_fault+0x1c1/0x640
> >> [  112.126888]  asm_exc_page_fault+0x1e/0x30
> >> [  112.127241] RIP: 0010:bpf_prog_0a182df2d34af188_dump_bpf_prog+0xf5/0xb3c
> >> [  112.127825] Code: 00 00 8b 7d f4 41 8b 76 44 48 39 f7 73 06 48 01 fb 49 89 df 4c 89 7d d8 49 8b
> >> bd 20 01 00 00 48 89 7d e0 49 8b bd e0 00 00 00 <48> 8b 7f 20 48 01 d7 48 89 7d e8 48 89 e9 48 83 c
> >> 1 d0 48 8b 7d c8
> >> [  112.129433] RSP: 0018:ffffc9000035fdc8 EFLAGS: 00010282
> >> [  112.129895] RAX: 0000000000000000 RBX: ffff888005a49458 RCX: 0000000000000024
> >> [  112.130509] RDX: 00000000000002f0 RSI: 0000000000000509 RDI: 0000000000000000
> >> [  112.131126] RBP: ffffc9000035fe20 R08: 0000000000000001 R09: 0000000000000000
> >> [  112.131737] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000400
> >> [  112.132355] R13: ffff888006085800 R14: ffff888004718540 R15: ffff888005a49458
> >> [  112.132990]  ? bpf_prog_0a182df2d34af188_dump_bpf_prog+0xc8/0xb3c
> >> [  112.133526]  bpf_iter_run_prog+0x75/0x160
> >> [  112.133880]  __bpf_prog_seq_show+0x39/0x40
> >> [  112.134258]  bpf_seq_read+0xf6/0x3d0
> >> [  112.134582]  vfs_read+0xa3/0x1b0
> >> [  112.134873]  ksys_read+0x4f/0xc0
> >> [  112.135166]  do_syscall_64+0x2d/0x40
> >> [  112.135482]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>
> >> To reproduce the issue, apply patch [1] and use the following script:
> >>    tools/testing/selftests/bpf/run_in_vm.sh -- cat /sys/fs/bpf/progs.debug
> >>
> >> The reason for the above kernel warning is that the bpf program
> >> tries to dereference address 0, which is not caught
> >> by the bpf exception handling logic.
> >>
> >> ...
> >> SEC("iter/bpf_prog")
> >> int dump_bpf_prog(struct bpf_iter__bpf_prog *ctx)
> >> {
> >>          struct bpf_prog *prog = ctx->prog;
> >>          struct bpf_prog_aux *aux;
> >>          ...
> >>          if (!prog)
> >>                  return 0;
> >>          aux = prog->aux;
> >>          ...
> >>          ... aux->dst_prog->aux->name ...
> >>          return 0;
> >> }
> >>
> >> If aux->dst_prog is a NULL pointer, a fault will happen when trying
> >> to access aux->dst_prog->aux.
> >>
> >
> > Which would be a bug in the bpf verifier, no?  This is a bpf program
> > that apparently passed verification, and it somehow dereferenced a
> > NULL pointer.
> >
> > Let's enumerate some cases.
> >
> > 1. x86-like architecture, SMAP enabled.  The CPU determines that this
> > is bogus, and bpf gets lucky, because the x86 code runs the exception
> > handler instead of summarily OOPSing.  IMO it would be entirely
> > reasonable to OOPS.
> >
> > 2 x86-like architecture, SMAP disabled.  This looks like a valid user
> > access, and for all bpf knows, 0 might be an actual mapped address,
> > and it might have userfaultfd attached, etc.  And it's called from a
> > non-sleepable context.  The warning is entirely correct.
> >
> > 3. s390x-like architecture.  This is a dereference of a bad kernel
> > pointer.  OOPS, unless you get lucky.
> >
> >
> > This patch is totally bogus.  You can't work around this by some magic
> > exception handling.  Unless I'm missing something big, this is a bpf
> > bug.  The following is not a valid kernel code pattern:
> >
> > label:
> >    dereference might-be-NULL kernel pointer
> > _EXHANDLER_...(label, ...); /* try to recover */
> >
> > This appears to have been introduced by:
> >
> > commit 3dec541b2e632d630fe7142ed44f0b3702ef1f8c
> > Author: Alexei Starovoitov <ast@kernel.org>
> > Date:   Tue Oct 15 20:25:03 2019 -0700
> >
> >      bpf: Add support for BTF pointers to x86 JIT
> >
> > Alexei, this looks like a very long-winded and ultimately incorrect
> > way to remove the branch from:
> >
> > if (ptr)
> >    val = *ptr;
> >
> > Wouldn't it be better to either just emit the branch directly or to
> > make sure that the pointer is valid in the first place?
>
> Let me explain the motivation for this patch.
>
> Previously, for any kernel data structure access,
> people had to use the bpf_probe_read() or bpf_probe_read_kernel()
> helper even though most of these accesses are okay and will not
> fault. For example, for
>     int t = a->b->c->d
> three bpf_probe_read() calls will be needed, e.g.,
>     bpf_probe_read_kernel(&t1, sizeof(t1), &a->b);
>     bpf_probe_read_kernel(&t2, sizeof(t2), &t1->c);
>     bpf_probe_read_kernel(&t, sizeof(t), &t2->d);
>
> If there is a fault, the bpf_probe_read_kernel() helper will
> suppress the exception, clear the dest buffer and
> return.
>
> The above usage of bpf_probe_read_kernel()
> is complicated and not C-like, and bpf developers
> do not like it.
>
> bcc (https://github.com/iovisor/bcc/) permits
> users to write "a->b->c->d" style accesses and uses a
> clang rewriter to convert them to a series of
> bpf_probe_read_kernel()'s. But most users are
> directly using clang to compile their programs, so
> they have to write the bpf_probe_read_kernel()'s
> by themselves.
>
> The motivation here is to improve the user experience so
> the user can just write
>     int t = a->b->c->d
> and the kernel will automatically take care of exceptions
> and maintain bpf_probe_read_kernel() semantics.
> So what the above patch essentially does is check whether "regs->ip"
> is one of the ips which perform a "bpf_probe_read_kernel()" (actually
> a direct access); if so, it fixes up the exception (clears the dest register)
> and returns.

Okay, so I guess you're trying to inline probe_read_kernel().  But
that means you have to inline a valid implementation.  In particular,
you need to check that you're accessing *kernel* memory.  Just like
how get_user() validates that the pointer points into user memory,
your helper should bounds check the pointer.  On x86, you could check
the high bit.
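
(On x86-64 such a check could be as simple as this sketch, assuming the
usual address-space split where kernel addresses have the top bit set:)

	static bool ptr_is_kernel(const void *ptr)
	{
		return (long)ptr < 0;	/* top bit set => kernel half */
	}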

As an extra complication, we should really add logic to
get_kernel_nofault() to verify that the pointer points into actual
memory as opposed to MMIO space (or future incoherent MKTME space or
something like that, sigh).  This will severely complicate inlining
it.  And we should *really* make the same fix to get_kernel_nofault()
-- it should validate that the pointer is a kernel pointer.

Is this really worth inlining instead of having the BPF JIT generate
an out of line call to a real C function?  That would let us put in a
sane implementation.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-27 21:47 ` Andy Lutomirski
@ 2021-01-28  1:05   ` Yonghong Song
  2021-01-28 23:51     ` Andy Lutomirski
  0 siblings, 1 reply; 30+ messages in thread
From: Yonghong Song @ 2021-01-28  1:05 UTC (permalink / raw)
  To: Andy Lutomirski, Jann Horn, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, kernel-team, X86 ML, KP Singh



On 1/27/21 1:47 PM, Andy Lutomirski wrote:
> On Mon, Jan 25, 2021 at 4:12 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> When reviewing patch ([1]), which adds a script to run bpf selftest
>> through qemu at /sbin/init stage, I found the following kernel bug
>> warning:
>>
>> [  112.118892] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1351
>> [  112.119805] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 354, name: new_name
>> [  112.120512] 3 locks held by new_name/354:
>> [  112.120868]  #0: ffff88800476e0a0 (&p->lock){+.+.}-{3:3}, at: bpf_seq_read+0x3a/0x3d0
>> [  112.121573]  #1: ffffffff82d69800 (rcu_read_lock){....}-{1:2}, at: bpf_iter_run_prog+0x5/0x160
>> [  112.122348]  #2: ffff8880061c2088 (&mm->mmap_lock#2){++++}-{3:3}, at: exc_page_fault+0x1a1/0x640
>> [  112.123128] Preemption disabled at:
>> [  112.123130] [<ffffffff8108f913>] migrate_disable+0x33/0x80
>> [  112.123942] CPU: 0 PID: 354 Comm: new_name Tainted: G           O      5.11.0-rc4-00524-g6e66fbb
>> 10597-dirty #1249
>> [  112.124822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01
>> /2014
>> [  112.125614] Call Trace:
>> [  112.125835]  dump_stack+0x77/0x97
>> [  112.126137]  ___might_sleep.cold.119+0xf2/0x106
>> [  112.126537]  exc_page_fault+0x1c1/0x640
>> [  112.126888]  asm_exc_page_fault+0x1e/0x30
>> [  112.127241] RIP: 0010:bpf_prog_0a182df2d34af188_dump_bpf_prog+0xf5/0xb3c
>> [  112.127825] Code: 00 00 8b 7d f4 41 8b 76 44 48 39 f7 73 06 48 01 fb 49 89 df 4c 89 7d d8 49 8b
>> bd 20 01 00 00 48 89 7d e0 49 8b bd e0 00 00 00 <48> 8b 7f 20 48 01 d7 48 89 7d e8 48 89 e9 48 83 c
>> 1 d0 48 8b 7d c8
>> [  112.129433] RSP: 0018:ffffc9000035fdc8 EFLAGS: 00010282
>> [  112.129895] RAX: 0000000000000000 RBX: ffff888005a49458 RCX: 0000000000000024
>> [  112.130509] RDX: 00000000000002f0 RSI: 0000000000000509 RDI: 0000000000000000
>> [  112.131126] RBP: ffffc9000035fe20 R08: 0000000000000001 R09: 0000000000000000
>> [  112.131737] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000400
>> [  112.132355] R13: ffff888006085800 R14: ffff888004718540 R15: ffff888005a49458
>> [  112.132990]  ? bpf_prog_0a182df2d34af188_dump_bpf_prog+0xc8/0xb3c
>> [  112.133526]  bpf_iter_run_prog+0x75/0x160
>> [  112.133880]  __bpf_prog_seq_show+0x39/0x40
>> [  112.134258]  bpf_seq_read+0xf6/0x3d0
>> [  112.134582]  vfs_read+0xa3/0x1b0
>> [  112.134873]  ksys_read+0x4f/0xc0
>> [  112.135166]  do_syscall_64+0x2d/0x40
>> [  112.135482]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> To reproduce the issue, apply patch [1] and use the following script:
>>    tools/testing/selftests/bpf/run_in_vm.sh -- cat /sys/fs/bpf/progs.debug
>>
>> The reason for the above kernel warning is that the bpf program
>> tries to dereference address 0, which is not caught
>> by the bpf exception handling logic.
>>
>> ...
>> SEC("iter/bpf_prog")
>> int dump_bpf_prog(struct bpf_iter__bpf_prog *ctx)
>> {
>>          struct bpf_prog *prog = ctx->prog;
>>          struct bpf_prog_aux *aux;
>>          ...
>>          if (!prog)
>>                  return 0;
>>          aux = prog->aux;
>>          ...
>>          ... aux->dst_prog->aux->name ...
>>          return 0;
>> }
>>
>> If aux->dst_prog is a NULL pointer, a fault will happen when trying
>> to access aux->dst_prog->aux.
>>
> 
> Which would be a bug in the bpf verifier, no?  This is a bpf program
> that apparently passed verification, and it somehow dereferenced a
> NULL pointer.
> 
> Let's enumerate some cases.
> 
> 1. x86-like architecture, SMAP enabled.  The CPU determines that this
> is bogus, and bpf gets lucky, because the x86 code runs the exception
> handler instead of summarily OOPSing.  IMO it would be entirely
> reasonable to OOPS.
> 
> 2 x86-like architecture, SMAP disabled.  This looks like a valid user
> access, and for all bpf knows, 0 might be an actual mapped address,
> and it might have userfaultfd attached, etc.  And it's called from a
> non-sleepable context.  The warning is entirely correct.
> 
> 3. s390x-like architecture.  This is a dereference of a bad kernel
> pointer.  OOPS, unless you get lucky.
> 
> 
> This patch is totally bogus.  You can't work around this by some magic
> exception handling.  Unless I'm missing something big, this is a bpf
> bug.  The following is not a valid kernel code pattern:
> 
> label:
>    dereference might-be-NULL kernel pointer
> _EXHANDLER_...(label, ...); /* try to recover */
> 
> This appears to have been introduced by:
> 
> commit 3dec541b2e632d630fe7142ed44f0b3702ef1f8c
> Author: Alexei Starovoitov <ast@kernel.org>
> Date:   Tue Oct 15 20:25:03 2019 -0700
> 
>      bpf: Add support for BTF pointers to x86 JIT
> 
> Alexei, this looks like a very long-winded and ultimately incorrect
> way to remove the branch from:
> 
> if (ptr)
>    val = *ptr;
> 
> Wouldn't it be better to either just emit the branch directly or to
> make sure that the pointer is valid in the first place?

Let me explain the motivation for this patch.

Previously, for any kernel data structure access,
people had to use the bpf_probe_read() or bpf_probe_read_kernel()
helper even though most of these accesses are okay and will not
fault. For example, for
    int t = a->b->c->d
three bpf_probe_read() calls will be needed, e.g.,
    bpf_probe_read_kernel(&t1, sizeof(t1), &a->b);
    bpf_probe_read_kernel(&t2, sizeof(t2), &t1->c);
    bpf_probe_read_kernel(&t, sizeof(t), &t2->d);

If there is a fault, the bpf_probe_read_kernel() helper will
suppress the exception, clear the dest buffer and
return.

The above usage of bpf_probe_read_kernel()
is complicated and not C-like, and bpf developers
do not like it.

bcc (https://github.com/iovisor/bcc/) permits
users to write "a->b->c->d" style accesses and uses a
clang rewriter to convert them to a series of
bpf_probe_read_kernel()'s. But most users are
directly using clang to compile their programs, so
they have to write the bpf_probe_read_kernel()'s
by themselves.

The motivation here is to improve the user experience so
the user can just write
    int t = a->b->c->d
and the kernel will automatically take care of exceptions
and maintain bpf_probe_read_kernel() semantics.
So what the above patch essentially does is check whether "regs->ip"
is one of the ips which perform a "bpf_probe_read_kernel()" (actually
a direct access); if so, it fixes up the exception (clears the dest register)
and returns.
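
(The fixup itself is tiny -- roughly what ex_handler_bpf() in
arch/x86/net/bpf_jit_comp.c does, sketched here with the fixup-field
encoding assumed:)

	static bool ex_handler_bpf(const struct exception_table_entry *x,
				   struct pt_regs *regs, int trapnr,
				   unsigned long error_code,
				   unsigned long fault_addr)
	{
		u32 reg = x->fixup >> 8;	/* offset of dest reg in pt_regs */

		/* clear the destination register ... */
		*(unsigned long *)((void *)regs + reg) = 0;
		/* ... and skip over the faulting load */
		regs->ip += x->fixup & 0xff;
		return true;
	}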

For a->b->c->d, some users may add an "if (ptr)" check
for some of them, and if that is the
case, the compiler/verifier will honor that. Some users
may not add one if they are certain from the code that in most
or all cases the pointer will not be null. From the verifier's
perspective, it is hard to decide whether
a->b or a->b->c is null or not, and adding null checks
to every kernel dereference might be excessive. Also, it is
not 100% certain that with a null check the pointer
dereference will absolutely not produce a fault.
If there is no such guarantee, then bpf_probe_read_kernel()
will still be needed.

I see the page fault handler specially processes page faults for
kprobes and vsyscall. Maybe page faults generated by special
bpf insns (simulating bpf_probe_read_kernel()) can also
be specially processed there.

> 
> --Andy
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-26  0:12 Yonghong Song
  2021-01-26  9:15 ` Peter Zijlstra
@ 2021-01-27 21:47 ` Andy Lutomirski
  2021-01-28  1:05   ` Yonghong Song
  1 sibling, 1 reply; 30+ messages in thread
From: Andy Lutomirski @ 2021-01-27 21:47 UTC (permalink / raw)
  To: Yonghong Song, Jann Horn, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau
  Cc: bpf, Alexei Starovoitov, kernel-team, X86 ML, KP Singh

On Mon, Jan 25, 2021 at 4:12 PM Yonghong Song <yhs@fb.com> wrote:
>
> When reviewing patch ([1]), which adds a script to run bpf selftest
> through qemu at /sbin/init stage, I found the following kernel bug
> warning:
>
> [  112.118892] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1351
> [  112.119805] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 354, name: new_name
> [  112.120512] 3 locks held by new_name/354:
> [  112.120868]  #0: ffff88800476e0a0 (&p->lock){+.+.}-{3:3}, at: bpf_seq_read+0x3a/0x3d0
> [  112.121573]  #1: ffffffff82d69800 (rcu_read_lock){....}-{1:2}, at: bpf_iter_run_prog+0x5/0x160
> [  112.122348]  #2: ffff8880061c2088 (&mm->mmap_lock#2){++++}-{3:3}, at: exc_page_fault+0x1a1/0x640
> [  112.123128] Preemption disabled at:
> [  112.123130] [<ffffffff8108f913>] migrate_disable+0x33/0x80
> [  112.123942] CPU: 0 PID: 354 Comm: new_name Tainted: G           O      5.11.0-rc4-00524-g6e66fbb
> 10597-dirty #1249
> [  112.124822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01
> /2014
> [  112.125614] Call Trace:
> [  112.125835]  dump_stack+0x77/0x97
> [  112.126137]  ___might_sleep.cold.119+0xf2/0x106
> [  112.126537]  exc_page_fault+0x1c1/0x640
> [  112.126888]  asm_exc_page_fault+0x1e/0x30
> [  112.127241] RIP: 0010:bpf_prog_0a182df2d34af188_dump_bpf_prog+0xf5/0xb3c
> [  112.127825] Code: 00 00 8b 7d f4 41 8b 76 44 48 39 f7 73 06 48 01 fb 49 89 df 4c 89 7d d8 49 8b
> bd 20 01 00 00 48 89 7d e0 49 8b bd e0 00 00 00 <48> 8b 7f 20 48 01 d7 48 89 7d e8 48 89 e9 48 83 c
> 1 d0 48 8b 7d c8
> [  112.129433] RSP: 0018:ffffc9000035fdc8 EFLAGS: 00010282
> [  112.129895] RAX: 0000000000000000 RBX: ffff888005a49458 RCX: 0000000000000024
> [  112.130509] RDX: 00000000000002f0 RSI: 0000000000000509 RDI: 0000000000000000
> [  112.131126] RBP: ffffc9000035fe20 R08: 0000000000000001 R09: 0000000000000000
> [  112.131737] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000400
> [  112.132355] R13: ffff888006085800 R14: ffff888004718540 R15: ffff888005a49458
> [  112.132990]  ? bpf_prog_0a182df2d34af188_dump_bpf_prog+0xc8/0xb3c
> [  112.133526]  bpf_iter_run_prog+0x75/0x160
> [  112.133880]  __bpf_prog_seq_show+0x39/0x40
> [  112.134258]  bpf_seq_read+0xf6/0x3d0
> [  112.134582]  vfs_read+0xa3/0x1b0
> [  112.134873]  ksys_read+0x4f/0xc0
> [  112.135166]  do_syscall_64+0x2d/0x40
> [  112.135482]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> To reproduce the issue, apply patch [1] and use the following script:
>   tools/testing/selftests/bpf/run_in_vm.sh -- cat /sys/fs/bpf/progs.debug
>
> The reason for the above kernel warning is that the bpf program
> tries to dereference address 0, which is not caught
> by the bpf exception handling logic.
>
> ...
> SEC("iter/bpf_prog")
> int dump_bpf_prog(struct bpf_iter__bpf_prog *ctx)
> {
>         struct bpf_prog *prog = ctx->prog;
>         struct bpf_prog_aux *aux;
>         ...
>         if (!prog)
>                 return 0;
>         aux = prog->aux;
>         ...
>         ... aux->dst_prog->aux->name ...
>         return 0;
> }
>
> If aux->dst_prog is a NULL pointer, a fault will happen when trying
> to access aux->dst_prog->aux.
>

Which would be a bug in the bpf verifier, no?  This is a bpf program
that apparently passed verification, and it somehow dereferenced a
NULL pointer.

Let's enumerate some cases.

1. x86-like architecture, SMAP enabled.  The CPU determines that this
is bogus, and bpf gets lucky, because the x86 code runs the exception
handler instead of summarily OOPSing.  IMO it would be entirely
reasonable to OOPS.

2 x86-like architecture, SMAP disabled.  This looks like a valid user
access, and for all bpf knows, 0 might be an actual mapped address,
and it might have userfaultfd attached, etc.  And it's called from a
non-sleepable context.  The warning is entirely correct.

3. s390x-like architecture.  This is a dereference of a bad kernel
pointer.  OOPS, unless you get lucky.


This patch is totally bogus.  You can't work around this by some magic
exception handling.  Unless I'm missing something big, this is a bpf
bug.  The following is not a valid kernel code pattern:

label:
  dereference might-be-NULL kernel pointer
_EXHANDLER_...(label, ...); /* try to recover */

This appears to have been introduced by:

commit 3dec541b2e632d630fe7142ed44f0b3702ef1f8c
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Tue Oct 15 20:25:03 2019 -0700

    bpf: Add support for BTF pointers to x86 JIT

Alexei, this looks like a very long-winded and ultimately incorrect
way to remove the branch from:

if (ptr)
  val = *ptr;

Wouldn't it be better to either just emit the branch directly or to
make sure that the pointer is valid in the first place?

--Andy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-26 19:21   ` Yonghong Song
  2021-01-27  9:30     ` Peter Zijlstra
@ 2021-01-27 21:07     ` Andrii Nakryiko
  1 sibling, 0 replies; 30+ messages in thread
From: Andrii Nakryiko @ 2021-01-27 21:07 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Peter Zijlstra, bpf, Alexei Starovoitov, Daniel Borkmann,
	Kernel Team, x86, KP Singh

On Tue, Jan 26, 2021 at 2:57 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 1/26/21 1:15 AM, Peter Zijlstra wrote:
> > On Mon, Jan 25, 2021 at 04:12:19PM -0800, Yonghong Song wrote:
> >> When the test is run normally after the login prompt, cpu_feature_enabled(X86_FEATURE_SMAP)
> >> is true, bad_area_nosemaphore() is called and then fixup_exception() is called,
> >> where the bpf-specific handler is able to fix up the exception.
> >>
> >> But when the test is run at /sbin/init time, cpu_feature_enabled(X86_FEATURE_SMAP) is
> >> false, and control reaches
> >
> > That makes no sense, cpu features should be set in stone long before we
> > reach userspace.
>
> You are correct. Sorry for the misleading description. The reason is that I
> use two qemu scripts, one from my local environment and the other from the
> CI test, since each works in its own setting. I thought they should have
> roughly the same kernel setup, but apparently they do not.
>
> For my local qemu script, I have "-cpu host" option and with this,
> X86_FEATURE_SMAP is set up probably in function get_cpu_cap(), file
> arch/x86/kernel/cpu/common.c.
>
> For CI qemu script (in
> https://lore.kernel.org/bpf/20210123004445.299149-1-kpsingh@kernel.org/),
> the "-cpu kvm64" is the qemu argument. This cpu does not
> enable X86_FEATURE_SMAP, so we will see the kernel warning.
>
> Changing the CI script to use "-cpu host" resolved the issue. I think "-cpu
> kvm64" may emulate old x86 CPUs and omit X86_FEATURE_SMAP.
>
> Do we have any x64 cpus which does not support X86_FEATURE_SMAP?

We don't control what CPUs are used in our CIs, which is why we didn't
use "-cpu host". Is there some other way to make necessary features be
available in qemu for this to work and not emit warnings?

But also, what will happen in this case on CPUs that really don't
support X86_FEATURE_SMAP? Should that be addressed instead?


>
> For CI, "-cpu kvm64" is good as it can specify the number
> of cores with e.g. "-smp 4" regardless of the underlying host's # of cores.
> I think we could change CI to use "-cpu host" by ensuring the CI host
> has at least 4 cores.
>
> Thanks.
>
>
> >
> >> To fix the issue, before the above mmap_read_trylock(), we will check
> >> whether the faulting ip can be served by the bpf exception handler; if
> >> so, the exception will be fixed up and we return.
> >
> >
> >
> >> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> >> index f1f1b5a0956a..e8182d30bf67 100644
> >> --- a/arch/x86/mm/fault.c
> >> +++ b/arch/x86/mm/fault.c
> >> @@ -1317,6 +1317,15 @@ void do_user_addr_fault(struct pt_regs *regs,
> >>              if (emulate_vsyscall(hw_error_code, regs, address))
> >>                      return;
> >>      }
> >> +
> >> +#ifdef CONFIG_BPF_JIT
> >> +    /*
> >> +     * Faults incurred by bpf program might need emulation, i.e.,
> >> +     * clearing the dest register.
> >> +     */
> >> +    if (fixup_bpf_exception(regs, X86_TRAP_PF, hw_error_code, address))
> >> +            return;
> >> +#endif
> >>   #endif
> >
> > NAK, this is broken. You're now disallowing faults that should've gone
> > through.
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-26 19:21   ` Yonghong Song
@ 2021-01-27  9:30     ` Peter Zijlstra
  2021-01-27 21:07     ` Andrii Nakryiko
  1 sibling, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2021-01-27  9:30 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, kernel-team, x86, KP Singh

On Tue, Jan 26, 2021 at 11:21:01AM -0800, Yonghong Song wrote:
> Do we have any x64 cpus which does not support X86_FEATURE_SMAP?

x64 is not an architecture I know of. If you typoed x86_64 then sure, I
think SMAP is fairly recent, Wikipedia seems to suggest Broadwell and
later. AMD seems to sport it since Zen.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-26  9:15 ` Peter Zijlstra
@ 2021-01-26 19:21   ` Yonghong Song
  2021-01-27  9:30     ` Peter Zijlstra
  2021-01-27 21:07     ` Andrii Nakryiko
  0 siblings, 2 replies; 30+ messages in thread
From: Yonghong Song @ 2021-01-26 19:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, kernel-team, x86, KP Singh



On 1/26/21 1:15 AM, Peter Zijlstra wrote:
> On Mon, Jan 25, 2021 at 04:12:19PM -0800, Yonghong Song wrote:
>> When the test is run normally after the login prompt, cpu_feature_enabled(X86_FEATURE_SMAP)
>> is true, bad_area_nosemaphore() is called and then fixup_exception() is called,
>> where the bpf-specific handler is able to fix up the exception.
>>
>> But when the test is run at /sbin/init time, cpu_feature_enabled(X86_FEATURE_SMAP) is
>> false, and control reaches
> 
> That makes no sense, cpu features should be set in stone long before we
> reach userspace.

You are correct. Sorry for the misleading description. The reason is that I
use two qemu scripts, one from my local environment and the other from the
CI test, since each works in its own setting. I thought they should have
roughly the same kernel setup, but apparently they do not.

For my local qemu script, I have "-cpu host" option and with this,
X86_FEATURE_SMAP is set up probably in function get_cpu_cap(), file
arch/x86/kernel/cpu/common.c.

For CI qemu script (in 
https://lore.kernel.org/bpf/20210123004445.299149-1-kpsingh@kernel.org/),
the "-cpu kvm64" is the qemu argument. This cpu does not
enable X86_FEATURE_SMAP, so we will see the kernel warning.

Changing the CI script to use "-cpu host" resolved the issue. I think "-cpu
kvm64" may emulate old x86 CPUs and omit X86_FEATURE_SMAP.

Do we have any x64 cpus which does not support X86_FEATURE_SMAP?

For CI, "-cpu kvm64" is good as it can specify the number
of cores with e.g. "-smp 4" regardless of the underlying host's # of cores.
I think we could change CI to use "-cpu host" by ensuring the CI host
has at least 4 cores.

Thanks.


> 
>> To fix the issue, before the above mmap_read_trylock(), we will check
>> whether the faulting ip can be served by the bpf exception handler; if
>> so, the exception will be fixed up and we return.
> 
> 
> 
>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>> index f1f1b5a0956a..e8182d30bf67 100644
>> --- a/arch/x86/mm/fault.c
>> +++ b/arch/x86/mm/fault.c
>> @@ -1317,6 +1317,15 @@ void do_user_addr_fault(struct pt_regs *regs,
>>   		if (emulate_vsyscall(hw_error_code, regs, address))
>>   			return;
>>   	}
>> +
>> +#ifdef CONFIG_BPF_JIT
>> +	/*
>> +	 * Faults incurred by bpf program might need emulation, i.e.,
>> +	 * clearing the dest register.
>> +	 */
>> +	if (fixup_bpf_exception(regs, X86_TRAP_PF, hw_error_code, address))
>> +		return;
>> +#endif
>>   #endif
> 
> NAK, this is broken. You're now disallowing faults that should've gone
> through.
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
  2021-01-26  0:12 Yonghong Song
@ 2021-01-26  9:15 ` Peter Zijlstra
  2021-01-26 19:21   ` Yonghong Song
  2021-01-27 21:47 ` Andy Lutomirski
  1 sibling, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2021-01-26  9:15 UTC (permalink / raw)
  To: Yonghong Song
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, kernel-team, x86, KP Singh

On Mon, Jan 25, 2021 at 04:12:19PM -0800, Yonghong Song wrote:
> When the test is run normally after the login prompt, cpu_feature_enabled(X86_FEATURE_SMAP)
> is true, bad_area_nosemaphore() is called and then fixup_exception() is called,
> where the bpf-specific handler is able to fix up the exception.
>
> But when the test is run at /sbin/init time, cpu_feature_enabled(X86_FEATURE_SMAP) is
> false, and control reaches

That makes no sense, cpu features should be set in stone long before we
reach userspace.

> To fix the issue, before the above mmap_read_trylock(), we will check
> whether the faulting ip can be served by the bpf exception handler; if
> so, the exception will be fixed up and we return.



> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f1f1b5a0956a..e8182d30bf67 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1317,6 +1317,15 @@ void do_user_addr_fault(struct pt_regs *regs,
>  		if (emulate_vsyscall(hw_error_code, regs, address))
>  			return;
>  	}
> +
> +#ifdef CONFIG_BPF_JIT
> +	/*
> +	 * Faults incurred by bpf program might need emulation, i.e.,
> +	 * clearing the dest register.
> +	 */
> +	if (fixup_bpf_exception(regs, X86_TRAP_PF, hw_error_code, address))
> +		return;
> +#endif
>  #endif

NAK, this is broken. You're now disallowing faults that should've gone
through.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly
@ 2021-01-26  0:12 Yonghong Song
  2021-01-26  9:15 ` Peter Zijlstra
  2021-01-27 21:47 ` Andy Lutomirski
  0 siblings, 2 replies; 30+ messages in thread
From: Yonghong Song @ 2021-01-26  0:12 UTC (permalink / raw)
  To: bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, x86, KP Singh

When reviewing patch ([1]), which adds a script to run bpf selftest
through qemu at /sbin/init stage, I found the following kernel bug
warning:

[  112.118892] BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1351
[  112.119805] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 354, name: new_name
[  112.120512] 3 locks held by new_name/354:
[  112.120868]  #0: ffff88800476e0a0 (&p->lock){+.+.}-{3:3}, at: bpf_seq_read+0x3a/0x3d0
[  112.121573]  #1: ffffffff82d69800 (rcu_read_lock){....}-{1:2}, at: bpf_iter_run_prog+0x5/0x160
[  112.122348]  #2: ffff8880061c2088 (&mm->mmap_lock#2){++++}-{3:3}, at: exc_page_fault+0x1a1/0x640
[  112.123128] Preemption disabled at:
[  112.123130] [<ffffffff8108f913>] migrate_disable+0x33/0x80
[  112.123942] CPU: 0 PID: 354 Comm: new_name Tainted: G           O      5.11.0-rc4-00524-g6e66fbb
10597-dirty #1249
[  112.124822] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01
/2014
[  112.125614] Call Trace:
[  112.125835]  dump_stack+0x77/0x97
[  112.126137]  ___might_sleep.cold.119+0xf2/0x106
[  112.126537]  exc_page_fault+0x1c1/0x640
[  112.126888]  asm_exc_page_fault+0x1e/0x30
[  112.127241] RIP: 0010:bpf_prog_0a182df2d34af188_dump_bpf_prog+0xf5/0xb3c
[  112.127825] Code: 00 00 8b 7d f4 41 8b 76 44 48 39 f7 73 06 48 01 fb 49 89 df 4c 89 7d d8 49 8b
bd 20 01 00 00 48 89 7d e0 49 8b bd e0 00 00 00 <48> 8b 7f 20 48 01 d7 48 89 7d e8 48 89 e9 48 83 c
1 d0 48 8b 7d c8
[  112.129433] RSP: 0018:ffffc9000035fdc8 EFLAGS: 00010282
[  112.129895] RAX: 0000000000000000 RBX: ffff888005a49458 RCX: 0000000000000024
[  112.130509] RDX: 00000000000002f0 RSI: 0000000000000509 RDI: 0000000000000000
[  112.131126] RBP: ffffc9000035fe20 R08: 0000000000000001 R09: 0000000000000000
[  112.131737] R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000400
[  112.132355] R13: ffff888006085800 R14: ffff888004718540 R15: ffff888005a49458
[  112.132990]  ? bpf_prog_0a182df2d34af188_dump_bpf_prog+0xc8/0xb3c
[  112.133526]  bpf_iter_run_prog+0x75/0x160
[  112.133880]  __bpf_prog_seq_show+0x39/0x40
[  112.134258]  bpf_seq_read+0xf6/0x3d0
[  112.134582]  vfs_read+0xa3/0x1b0
[  112.134873]  ksys_read+0x4f/0xc0
[  112.135166]  do_syscall_64+0x2d/0x40
[  112.135482]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

To reproduce the issue, apply patch [1] and use the following script:
  tools/testing/selftests/bpf/run_in_vm.sh -- cat /sys/fs/bpf/progs.debug

The reason for the above kernel warning is that the bpf program
tries to dereference address 0, which is not caught
by the bpf exception handling logic.

...
SEC("iter/bpf_prog")
int dump_bpf_prog(struct bpf_iter__bpf_prog *ctx)
{
	struct bpf_prog *prog = ctx->prog;
	struct bpf_prog_aux *aux;
	...
	if (!prog)
		return 0;
	aux = prog->aux;
	...
	... aux->dst_prog->aux->name ...
	return 0;
}

If aux->dst_prog is a NULL pointer, a fault will happen when trying
to access aux->dst_prog->aux.

In the arch/x86/mm/fault.c function do_user_addr_fault(), we have the following code:
         if (unlikely(cpu_feature_enabled(X86_FEATURE_SMAP) &&
                      !(hw_error_code & X86_PF_USER) &&
                      !(regs->flags & X86_EFLAGS_AC)))
         {
                 bad_area_nosemaphore(regs, hw_error_code, address);
                 return;
         }

When the test is run normally after the login prompt, cpu_feature_enabled(X86_FEATURE_SMAP)
is true, bad_area_nosemaphore() is called and then fixup_exception() is called,
where the bpf-specific handler is able to fix up the exception.

But when the test is run at /sbin/init time, cpu_feature_enabled(X86_FEATURE_SMAP) is
false, and control reaches
         if (unlikely(!mmap_read_trylock(mm))) {
                 if (!user_mode(regs) && !search_exception_tables(regs->ip)) {
                         /*
                          * Fault from code in kernel from
                          * which we do not expect faults.
                          */
                         bad_area_nosemaphore(regs, hw_error_code, address);
                         return;
                 }
retry:
                 mmap_read_lock(mm);
         } else {
                 /*
                  * The above down_read_trylock() might have succeeded in
                  * which case we'll have missed the might_sleep() from
                  * down_read():
                  */
                 might_sleep();
         }
and might_sleep() is triggered and the above kernel warning is printed.

To fix the issue, before the above mmap_read_trylock(), we will check
whether the faulting ip can be served by the bpf exception handler; if
so, the exception will be fixed up and we return.

[1] https://lore.kernel.org/bpf/20210123004445.299149-1-kpsingh@kernel.org/

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
---
 arch/x86/include/asm/extable.h |  3 +++
 arch/x86/mm/extable.c          | 14 ++++++++++++++
 arch/x86/mm/fault.c            |  9 +++++++++
 3 files changed, 26 insertions(+)

diff --git a/arch/x86/include/asm/extable.h b/arch/x86/include/asm/extable.h
index 1f0cbc52937c..1e99da15336c 100644
--- a/arch/x86/include/asm/extable.h
+++ b/arch/x86/include/asm/extable.h
@@ -38,6 +38,9 @@ enum handler_type {
 
 extern int fixup_exception(struct pt_regs *regs, int trapnr,
 			   unsigned long error_code, unsigned long fault_addr);
+extern int fixup_bpf_exception(struct pt_regs *regs, int trapnr,
+			       unsigned long error_code,
+			       unsigned long fault_addr);
 extern int fixup_bug(struct pt_regs *regs, int trapnr);
 extern enum handler_type ex_get_fault_handler_type(unsigned long ip);
 extern void early_fixup_exception(struct pt_regs *regs, int trapnr);
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index b93d6cd08a7f..311da42c0aec 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -155,6 +155,20 @@ enum handler_type ex_get_fault_handler_type(unsigned long ip)
 		return EX_HANDLER_OTHER;
 }
 
+int fixup_bpf_exception(struct pt_regs *regs, int trapnr,
+			unsigned long error_code, unsigned long fault_addr)
+{
+	const struct exception_table_entry *e;
+	ex_handler_t handler;
+
+	e = search_bpf_extables(regs->ip);
+	if (!e)
+		return 0;
+
+	handler = ex_fixup_handler(e);
+	return handler(e, regs, trapnr, error_code, fault_addr);
+}
+
 int fixup_exception(struct pt_regs *regs, int trapnr, unsigned long error_code,
 		    unsigned long fault_addr)
 {
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..e8182d30bf67 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1317,6 +1317,15 @@ void do_user_addr_fault(struct pt_regs *regs,
 		if (emulate_vsyscall(hw_error_code, regs, address))
 			return;
 	}
+
+#ifdef CONFIG_BPF_JIT
+	/*
+	 * Faults incurred by bpf program might need emulation, i.e.,
+	 * clearing the dest register.
+	 */
+	if (fixup_bpf_exception(regs, X86_TRAP_PF, hw_error_code, address))
+		return;
+#endif
 #endif
 
 	/*
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2021-01-31 20:05 UTC | newest]

Thread overview: 30+ messages
-- links below jump to the message on this page --
     [not found] <YBPToXfWV1ekRo4q@hirez.programming.kicks-ass.net>
     [not found] ` <97EF0157-F068-4234-B826-C08B7324F356@amacapital.net>
2021-01-29 23:11   ` [PATCH bpf] x86/bpf: handle bpf-program-triggered exceptions properly Alexei Starovoitov
2021-01-29 23:30     ` Andy Lutomirski
2021-01-30  2:54       ` Alexei Starovoitov
2021-01-31 17:32         ` Andy Lutomirski
2021-01-26  0:12 Yonghong Song
2021-01-26  9:15 ` Peter Zijlstra
2021-01-26 19:21   ` Yonghong Song
2021-01-27  9:30     ` Peter Zijlstra
2021-01-27 21:07     ` Andrii Nakryiko
2021-01-27 21:47 ` Andy Lutomirski
2021-01-28  1:05   ` Yonghong Song
2021-01-28 23:51     ` Andy Lutomirski
2021-01-29  0:11       ` Alexei Starovoitov
2021-01-29  0:18         ` Andy Lutomirski
2021-01-29  0:26           ` Alexei Starovoitov
2021-01-29  0:29             ` Andy Lutomirski
2021-01-29  0:32               ` Andy Lutomirski
2021-01-29  0:41               ` Alexei Starovoitov
2021-01-29  0:45                 ` Andy Lutomirski
2021-01-29  1:04                   ` Alexei Starovoitov
2021-01-29  1:31                     ` Andy Lutomirski
2021-01-29  1:53                       ` Alexei Starovoitov
2021-01-29  2:18                         ` Andy Lutomirski
2021-01-29  2:32                           ` Alexei Starovoitov
2021-01-29  3:09                             ` Andy Lutomirski
2021-01-29  3:26                               ` Alexei Starovoitov
2021-01-29  0:35         ` Jann Horn
2021-01-29  0:43           ` Alexei Starovoitov
2021-01-29  1:03             ` Jann Horn
2021-01-29  1:08               ` Alexei Starovoitov
