All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Jann Horn <jannh@google.com>
Cc: bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Matthew Garrett <mjg59@google.com>
Subject: Re: BPF map freezing is unreliable; can we instead just inline constants?
Date: Tue, 14 Apr 2020 21:59:05 -0700	[thread overview]
Message-ID: <CAEf4BzZ2vmdmn111KXXrp3qp1qLb4iMjUJ11Cj06SOGeOB6_Qg@mail.gmail.com> (raw)
In-Reply-To: <CAG48ez0mmVtBVTjy-KmpUnvJ52O=EYKwJWoCxcXH8O6zCG1QHA@mail.gmail.com>

On Tue, Apr 14, 2020 at 3:50 PM Jann Horn <jannh@google.com> wrote:
>
> On Tue, Apr 14, 2020 at 9:46 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> > On Tue, Apr 14, 2020 at 9:07 AM Jann Horn <jannh@google.com> wrote:
> > > On Fri, Apr 10, 2020 at 10:48 PM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Fri, Apr 10, 2020 at 1:47 AM Jann Horn <jannh@google.com> wrote:
> > > > > On Fri, Apr 10, 2020 at 1:33 AM Andrii Nakryiko
> > > > > <andrii.nakryiko@gmail.com> wrote:
> > > > > > On Wed, Apr 8, 2020 at 12:42 PM Jann Horn <jannh@google.com> wrote:
> > > > > > >
> > > > > > > Hi!
> > > > > > >
> > > > > > > I saw that BPF allows root to create frozen maps, for which the
> > > > > > > verifier then assumes that they contain constant values. However, map
> > > > > > > freezing is pretty wobbly:
> [...]
> > > > > > > 3. It is assumed that a memory mapping can't be used to write to a
> > > > > > > page anymore after the mapping has been removed; but actually,
> > > > > > > userspace can grab references to pages in a VMA and use those
> > > > > > > references to write to the VMA's pages after the VMA has already been
> > > > > > > closed. (crasher attached as bpf-constant-data-uffd.c, compile with
> > > > > > > "gcc -pthread ...")
> > > > > >
> > > > > > Please help me understand how that works (assuming we drop
> > > > > > VM_MAYWRITE, of course). You mmap() as R/W, then munmap(), then
> > > > > > freeze(). After munmap() refcount of writable pages should drop to
> > > > > > zero. And mmap'ed address should be invalid and unmapped. I'm missing
> > > > > > how after munmap() parallel thread still can write to that memory
> > > > > > page?
> > > > >
> > > > > The mmap()/munmap() syscalls place references to the pages the kernel
> > > > > is using in the page tables of the process. Some other syscalls (such
> > > > > as process_vm_writev()) can read these page table entries, take their
> > > > > own references on those backing pages, and then continue to access
> > > > > those pages even after they've been removed from the task's page
> > > > > tables by munmap(). This works as long as the page table entries don't
> > > > > have magic marker bits on them that prohibit this, which you get if
> > > > > you use something like remap_pfn_range() in a loop instead of
> > > > > remap_vmalloc_range() - but the memory mappings created by that
> > > > > syscall are weird, and e.g. some syscalls like read() and write()
> > > > > might sometimes fail if the buffer argument points into such a memory
> > > > > region.
> > > >
> > > > So mmap() subsystem won't event know about those extra references and
> > > > thus we can't really account that in our code, right? That's sad, but
> > > > hopefully those APIs are root-only?
> > >
> > > Nope, those APIs are reachable by normal users. These extra references
> > > are counted on the page refcount - since they have to be tracked
> > > somehow - but as far as I know, that refcount can also be elevated
> > > spuriously, so triggering hard errors based on it is probably not a
> > > good idea.
> > >
> >
> > Just trying to educate myself and you seem to know a lot about this.
> > If we think about regular file memory-mapping with mmap(). According
> > to this, it seems like it would be possible to mmap() file as writable
> > first, do some changes and then munmap. At this point some application
> > would assume that file can't be modified anymore and can be treated as
> > read-only, yet, it's possible that some other process/thread can just
> > go and still modify file contents. Do I understand correctly?
>
> Yep, exactly.
>
> There are some longstanding issues around this stuff - e.g. in some
> contrived scenarios, this can mean that when you call read() and
> fork() at the same time in a multithreaded process, the data you read
> becomes visible in the child instead of in the parent
> (https://lore.kernel.org/lkml/CAG48ez17Of=dnymzm8GAN_CNG1okMg1KTeMtBQhXGP2dyB5uJw@mail.gmail.com/).
> At least a while ago, it could also cause crashes in filesystem code
> (see e.g. https://lwn.net/Articles/774411/), and cause issues for code
> that wants to compute stable checksums of pages
> (https://lwn.net/Articles/787636/); I'm not sure what the state of
> that stuff is.

Oh, ok, thanks for details. This is... illuminating for sure...

>
> > > > > > > Is there a reason why the verifier doesn't replace loads from frozen
> > > > > > > maps with the values stored in those maps? That seems like it would be
> > > > > > > not only easier to secure, but additionally more performant.
> > > > > >
> > > > > > Verifier doesn't always know exact offset at which program is going to
> > > > > > read read-only map contents. So something like this works:
> > > > > >
> > > > > > const volatile long arr[256];
> > > > > >
> > > > > > int x = rand() % 256;
> > > > > > int y = arr[x];
> > > > > >
> > > > > > In this case verifier doesn't really know the value of y, so it can't
> > > > > > be inlined. Then you can have code in which in one branch register is
> > > > > > loaded with known value, but in another branch same register gets some
> > > > > > value at random offset. Constant tracking is code path-sensitive,
> > > > > > while instructions are shared between different code paths. Unless I'm
> > > > > > missing what you are proposing :)
> > > > >
> > > > > Ah, I missed that possibility. But is that actually something that
> > > > > people do in practice? Or would it be okay for the verifier to just
> > > > > assume an unknown value in these cases?
> > > >
> > > > Verifier will assume unknown value for the branch that has variable
> > > > offset. It can't do the same for another branch (with constant offset)
> > > > because it might not yet have encountered branch with variable offset.
> > > > But either way, you were proposing to rewrite instruction and inline
> > > > read constant, and I don't think it's possible because of this.
> > >
> > > Ah, I see what you mean. That sucks. I guess that means that to fix
> > > this up properly in such edgecases, we'd have to, for each memory
> > > read, keep track of all the values that we want to hardcode for it,
> > > and then generate branches in the unlikely case that the instruction
> > > was reached on paths that expect different values?
> >
> > I guess, though that sounds extreme and extremely unlikely.
>
> It just seems kinda silly to me to have extra memory loads if we know
> that those loads will in most cases load the same fixed value on every
> execution... but oh well.
>
> > I'd say
> > the better way would be to implement immutable BPF maps from the time
> > they are created. E.g., at the time of creating map, you specify extra
> > flag BPF_F_IMMUTABLE and specify pointer to a blob of memory with
> > key/value pairs in it.
>
> It seems a bit hacky to me to add a new special interface for
> populating an immutable map. Wouldn't it make more sense to add a flag
> for "you can't use mmap on this map", or "I plan to freeze this map",
> or something like that, and keep the freezing API?

"you can't use mmap on this map" is default behavior, unless you
specify BPF_F_MMAPABLE. "I plan to freeze this map" could be added,
but how that would help existing users that freeze and mmap()?
Disallowing those now would be a breaking change.

Currently, libbpf is using freezing for .rodata variables, but it
doesn't mmap() before freezing. What we are talking about is malicious
user trying to cause a crash, which given everything is under root is
a bit of a moot point. So I don't know if we actually want to fix
anything here, given that lots of filesystems are already broken for a
while for similar reasons... But it's certainly good to know about
issues like this.

  reply	other threads:[~2020-04-15  4:59 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-08 16:56 BPF map freezing is unreliable; can we instead just inline constants? Jann Horn
2020-04-09 23:33 ` Andrii Nakryiko
2020-04-10  8:46   ` Jann Horn
2020-04-10 20:48     ` Andrii Nakryiko
2020-04-14 16:07       ` Jann Horn
2020-04-14 19:46         ` Andrii Nakryiko
2020-04-14 22:50           ` Jann Horn
2020-04-15  4:59             ` Andrii Nakryiko [this message]
2020-04-15 19:07               ` Jann Horn
2020-04-15 20:03                 ` Jann Horn
2020-04-16  5:42                   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4BzZ2vmdmn111KXXrp3qp1qLb4iMjUJ11Cj06SOGeOB6_Qg@mail.gmail.com \
    --to=andrii.nakryiko@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jannh@google.com \
    --cc=mjg59@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.