From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>
Subject: [PATCH 4.14 66/71] bpf: avoid false sharing of map refcount with max_entries
Date: Mon, 29 Jan 2018 13:57:34 +0100
Message-ID: <20180129123832.040554129@linuxfoundation.org>
In-Reply-To: <20180129123827.271171825@linuxfoundation.org>

4.14-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <daniel@iogearbox.net>

[ upstream commit be95a845cc4402272994ce290e3ad928aff06cb9 ]

In addition to commit b2157399cc98 ("bpf: prevent out-of-bounds
speculation"), also change the layout of struct bpf_map so that
fast-path members like max_entries no longer suffer false sharing
when the map's reference counter is altered. To that end, force
the two groups of members into separate cachelines.

pahole dump after change:

  struct bpf_map {
        const struct bpf_map_ops  * ops;                 /*     0     8 */
        struct bpf_map *           inner_map_meta;       /*     8     8 */
        void *                     security;             /*    16     8 */
        enum bpf_map_type          map_type;             /*    24     4 */
        u32                        key_size;             /*    28     4 */
        u32                        value_size;           /*    32     4 */
        u32                        max_entries;          /*    36     4 */
        u32                        map_flags;            /*    40     4 */
        u32                        pages;                /*    44     4 */
        u32                        id;                   /*    48     4 */
        int                        numa_node;            /*    52     4 */
        bool                       unpriv_array;         /*    56     1 */

        /* XXX 7 bytes hole, try to pack */

        /* --- cacheline 1 boundary (64 bytes) --- */
        struct user_struct *       user;                 /*    64     8 */
        atomic_t                   refcnt;               /*    72     4 */
        atomic_t                   usercnt;              /*    76     4 */
        struct work_struct         work;                 /*    80    32 */
        char                       name[16];             /*   112    16 */
        /* --- cacheline 2 boundary (128 bytes) --- */

        /* size: 128, cachelines: 2, members: 17 */
        /* sum members: 121, holes: 1, sum holes: 7 */
  };

Now all members in the first cacheline are read-only throughout
the lifetime of the map; they are set up once during map creation.
The overall struct size and number of cachelines do not change with
the reordering. struct bpf_map is usually embedded as the first
member of the map structs in specific map implementations, so also
avoid having those members sit at the end of struct bpf_map, where
they could share a cacheline with the first map values, e.g. in the
array map. Remote CPUs could trigger map updates for those values
just as well (intentionally dirtying members like max_entries with
ease) while having subsequent values in cache.
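
For illustration only, the resulting split can be mimicked in
userspace C11. This is a minimal sketch and not the kernel code:
struct demo_map, DEMO_CACHELINE and demo_same_line() are hypothetical
names, _Alignas(64) stands in for ____cacheline_aligned, plain int
and char arrays stand in for atomic_t and struct work_struct, and a
64-byte cacheline is assumed as in the pahole dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHELINE 64	/* assumption: 64-byte cachelines */

/* Hypothetical userspace stand-in for struct bpf_map. */
struct demo_map {
	/* 1st cacheline: set up once at map creation, read in fast path. */
	_Alignas(DEMO_CACHELINE) const void *ops;
	void *inner_map_meta;
	void *security;
	uint32_t map_type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;
	uint32_t map_flags;
	uint32_t pages;
	uint32_t id;
	int numa_node;
	bool unpriv_array;

	/* 2nd cacheline: members that are written at runtime. */
	_Alignas(DEMO_CACHELINE) void *user;
	int refcnt;		/* stand-in for atomic_t */
	int usercnt;		/* stand-in for atomic_t */
	char work[32];		/* stand-in for struct work_struct */
	char name[16];
};

/* True if two member offsets fall into the same cacheline. */
static inline bool demo_same_line(size_t a, size_t b)
{
	return a / DEMO_CACHELINE == b / DEMO_CACHELINE;
}

/* Compile-time checks of the property the reordering is after. */
static_assert(offsetof(struct demo_map, max_entries) / DEMO_CACHELINE == 0,
	      "max_entries must sit in the first cacheline");
static_assert(offsetof(struct demo_map, refcnt) / DEMO_CACHELINE == 1,
	      "refcnt must sit in the second cacheline");
static_assert(sizeof(struct demo_map) == 2 * DEMO_CACHELINE,
	      "struct must span exactly two cachelines");
```

The static assertions only compare cacheline indices rather than
exact offsets, so the sketch does not depend on the pointer size of
the build target.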

Quoting from Google's Project Zero blog [1]:

  Additionally, at least on the Intel machine on which this was
  tested, bouncing modified cache lines between cores is slow,
  apparently because the MESI protocol is used for cache coherence
  [8]. Changing the reference counter of an eBPF array on one
  physical CPU core causes the cache line containing the reference
  counter to be bounced over to that CPU core, making reads of the
  reference counter on all other CPU cores slow until the changed
  reference counter has been written back to memory. Because the
  length and the reference counter of an eBPF array are stored in
  the same cache line, this also means that changing the reference
  counter on one physical CPU core causes reads of the eBPF array's
  length to be slow on other physical CPU cores (intentional false
  sharing).

While this doesn't 'control' the out-of-bounds speculation through
masking the index as in commit b2157399cc98, triggering a manipulation
of the map's reference counter is really trivial, so let's not allow
it to easily affect max_entries.
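
For illustration only, the pre-patch situation can be sketched in
userspace C11 as follows. The names old_demo_map, old_demo_get() and
old_demo_index_ok() are hypothetical, atomic_int stands in for
atomic_t, a 64-byte cacheline is assumed, and the position of
max_entries right after value_size is an assumption taken from the
member order in the pahole dump, since the hunk context does not
show that line.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OLD_CACHELINE 64	/* assumption: 64-byte cachelines */

/* Hypothetical stand-in for the head of the pre-patch struct bpf_map. */
struct old_demo_map {
	atomic_int refcnt;	/* written on every reference get/put */
	uint32_t map_type;
	uint32_t key_size;
	uint32_t value_size;
	uint32_t max_entries;	/* assumed position, not shown in the hunks */
};

/* Both members sit in the same line, so a refcnt write dirties it. */
static_assert(offsetof(struct old_demo_map, refcnt) / OLD_CACHELINE ==
	      offsetof(struct old_demo_map, max_entries) / OLD_CACHELINE,
	      "refcnt and max_entries share a cacheline");

/* Control path: taking a reference writes to the shared line. */
static inline void old_demo_get(struct old_demo_map *map)
{
	atomic_fetch_add(&map->refcnt, 1);
}

/* Fast path: the bounds check has to read that same line. */
static inline bool old_demo_index_ok(const struct old_demo_map *map,
				     uint32_t index)
{
	return index < map->max_entries;
}
```

With this layout, every old_demo_get() on one CPU forces other CPUs
to refetch the line before old_demo_index_ok() can complete, which
is the window the quoted blog post describes.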

Splitting into separate cachelines also generally makes sense from
a performance perspective, in that the fast path won't take a cache
miss when the map gets pinned, reused in other progs, etc. from the
control path; it thus also avoids unintentional false sharing.

  [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.html

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/bpf.h |   21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -42,7 +42,14 @@ struct bpf_map_ops {
 };
 
 struct bpf_map {
-	atomic_t refcnt;
+	/* 1st cacheline with read-mostly members of which some
+	 * are also accessed in fast-path (e.g. ops, max_entries).
+	 */
+	const struct bpf_map_ops *ops ____cacheline_aligned;
+	struct bpf_map *inner_map_meta;
+#ifdef CONFIG_SECURITY
+	void *security;
+#endif
 	enum bpf_map_type map_type;
 	u32 key_size;
 	u32 value_size;
@@ -52,11 +59,15 @@ struct bpf_map {
 	u32 id;
 	int numa_node;
 	bool unpriv_array;
-	struct user_struct *user;
-	const struct bpf_map_ops *ops;
-	struct work_struct work;
+	/* 7 bytes hole */
+
+	/* 2nd cacheline with misc members to avoid false sharing
+	 * particularly with refcounting.
+	 */
+	struct user_struct *user ____cacheline_aligned;
+	atomic_t refcnt;
 	atomic_t usercnt;
-	struct bpf_map *inner_map_meta;
+	struct work_struct work;
 };
 
 /* function argument constraints */
