linux-kernel.vger.kernel.org archive mirror
From: Jann Horn <jannh@google.com>
To: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>
Cc: Network Development <netdev@vger.kernel.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Michael Kerrisk-manpages <mtk.manpages@gmail.com>,
	linux-man <linux-man@vger.kernel.org>
Subject: BPF: RCU use-after-reallocation of hash table elements?
Date: Wed, 3 Oct 2018 18:15:08 +0200
Message-ID: <CAG48ez0Q3BaFsA_FUs1ZCAMRZtXkq5o_VCbU8CgTEVwa2HRZfQ@mail.gmail.com>

Hi!

Note: I haven't tested any of this; feel free to tell me that I've
completely misunderstood how all this works.

The BPF manpage, at the moment, states about BPF hash tables:

       BPF_MAP_TYPE_HASH
              Hash-table maps have the following characteristics:

              *  Maps are created and destroyed by user-space programs.
                 Both user-space and eBPF programs can perform lookup,
                 update, and delete operations.

              *  The kernel takes care of allocating and freeing key/value
                 pairs.

              *  The map_update_elem() helper will fail to insert new
                 element when the max_entries limit is reached.  (This ensures
                 that eBPF programs cannot exhaust memory.)

              *  map_update_elem() replaces existing elements atomically.

              Hash-table maps are optimized for speed of lookup.

This documentation claims that elements are replaced "atomically", and
that the kernel "takes care of allocating and freeing key/value
pairs". But as far as I can tell, that's not quite the whole story, at
least since commit 6c90598174322b8888029e40dd84a4eb01f56afe (first in
4.6).

Unless a BPF hash table is created with the (undocumented) flag
BPF_F_NO_PREALLOC, the kernel now actually pre-allocates the hash
table elements. Hash table elements can be freed and reused for new
allocations (!) without waiting for an RCU grace period: freed
elements are immediately pushed onto the percpu freelist, and can be
immediately reused from there. The most obvious consequence of this is
that if a BPF program looks up a hash table entry and then reads the
value, the value can be replaced with a new value in between. A more
subtle consequence is that BPF map lookups can return false-positive
results: as far as I can tell, if a lookup compares its key against an
element while that element is being freed and reused for a different
key, and the first half of the lookup key matches the old key while
the second half matches the new key, the lookup can report a match for
a key that was never in the map.

If what I'm saying is correct, I'm not sure what the best fix is.

Add a grace period when freeing hash map entries, and add a new -EBUSY
return value for attempts to create hash map entries when all free
entries are waiting for the end of an RCU grace period?

Add a grace period when freeing hash map entries, and use
synchronize_rcu() when inserting BPF hashmap entries from userspace
and all free entries are waiting for RCU? But that still leaves the
bpf_map_update_elem_proto helper that can be called from BPF.
Deprecate that helper for access to hash maps?

Document the race, and advise people who use BPF for
non-performance-tracing purposes (where occasional false positives
actually matter) to use BPF_F_NO_PREALLOC?

Add some sort of sequence lock to BPF (yuck)?

Thread overview: 3+ messages
2018-10-03 16:15 Jann Horn [this message]
2018-10-03 17:05 ` BPF: RCU use-after-reallocation of hash table elements? Alexei Starovoitov
2018-10-03 17:18   ` Jann Horn
