netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jesper Dangaard Brouer <brouer@redhat.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Sasha Levin <sashal@kernel.org>,
	netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: [PATCH AUTOSEL 4.19 218/219] xdp: fix cpumap redirect SKB creation bug
Date: Fri, 22 Nov 2019 00:49:09 -0500	[thread overview]
Message-ID: <20191122054911.1750-210-sashal@kernel.org> (raw)
In-Reply-To: <20191122054911.1750-1-sashal@kernel.org>

From: Jesper Dangaard Brouer <brouer@redhat.com>

[ Upstream commit 676e4a6fe703f2dae699ee9d56f14516f9ada4ea ]

We want to avoid leaking pointer info from xdp_frame (that is placed in
top of frame) like commit 6dfb970d3dbd ("xdp: avoid leaking info stored in
frame data on page reuse"), and followup commit 97e19cce05e5 ("bpf:
reserve xdp_frame size in xdp headroom") that reserve this headroom.

These changes also affected how cpumap constructed SKBs, as xdpf->headroom
size changed, the skb data starting point were in-effect shifted with 32
bytes (sizeof xdp_frame). This was still okay, as the cpumap frame_size
calculation also included xdpf->headroom which were reduced by same amount.

A bug was introduced in commit 77ea5f4cbe20 ("bpf/cpumap: make sure
frame_size for build_skb is aligned if headroom isn't"), where the
xdpf->headroom became part of the SKB_DATA_ALIGN rounding up. This
round-up to find the frame_size is in principle still correct as it does
not exceed the 2048 bytes frame_size (which is max for ixgbe and i40e),
but the 32 bytes offset of pkt_data_start puts this over the 2048 bytes
limit. This cause skb_shared_info to spill into next frame. It is a little
hard to trigger, as the SKB need to use above 15 skb_shinfo->frags[] as
far as I calculate. This does happen in practise for TCP streams when
skb_try_coalesce() kicks in.

KASAN can be used to detect these wrong memory accesses, I've seen:
 BUG: KASAN: use-after-free in skb_try_coalesce+0x3cb/0x760
 BUG: KASAN: wild-memory-access in skb_release_data+0xe2/0x250

Driver veth also construct a SKB from xdp_frame in this way, but is not
affected, as it doesn't reserve/deduct the room (used by xdp_frame) from
the SKB headroom. Instead is clears the pointers via xdp_scrub_frame(),
and allows SKB to use this area.

The fix in this patch is to do like veth and instead allow SKB to (re)use
the area occupied by xdp_frame, by clearing via xdp_scrub_frame().  (This
does kill the idea of the SKB being able to access (mem) info from this
area, but I guess it was a bad idea anyhow, and it was already killed by
the veth changes.)

Fixes: 77ea5f4cbe20 ("bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/bpf/cpumap.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 8974b3755670e..3c18260403dde 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -162,10 +162,14 @@ static void cpu_map_kthread_stop(struct work_struct *work)
 static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
 					 struct xdp_frame *xdpf)
 {
+	unsigned int hard_start_headroom;
 	unsigned int frame_size;
 	void *pkt_data_start;
 	struct sk_buff *skb;
 
+	/* Part of headroom was reserved to xdpf */
+	hard_start_headroom = sizeof(struct xdp_frame) +  xdpf->headroom;
+
 	/* build_skb need to place skb_shared_info after SKB end, and
 	 * also want to know the memory "truesize".  Thus, need to
 	 * know the memory frame size backing xdp_buff.
@@ -183,15 +187,15 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
 	 * is not at a fixed memory location, with mixed length
 	 * packets, which is bad for cache-line hotness.
 	 */
-	frame_size = SKB_DATA_ALIGN(xdpf->len + xdpf->headroom) +
+	frame_size = SKB_DATA_ALIGN(xdpf->len + hard_start_headroom) +
 		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
-	pkt_data_start = xdpf->data - xdpf->headroom;
+	pkt_data_start = xdpf->data - hard_start_headroom;
 	skb = build_skb(pkt_data_start, frame_size);
 	if (!skb)
 		return NULL;
 
-	skb_reserve(skb, xdpf->headroom);
+	skb_reserve(skb, hard_start_headroom);
 	__skb_put(skb, xdpf->len);
 	if (xdpf->metasize)
 		skb_metadata_set(skb, xdpf->metasize);
@@ -205,6 +209,9 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
 	 * - RX ring dev queue index	(skb_record_rx_queue)
 	 */
 
+	/* Allow SKB to reuse area used by xdp_frame */
+	xdp_scrub_frame(xdpf);
+
 	return skb;
 }
 
-- 
2.20.1


      parent reply	other threads:[~2019-11-22  5:53 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20191122054911.1750-1-sashal@kernel.org>
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 044/219] mwifiex: fix potential NULL dereference and use after free Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 045/219] mwifiex: debugfs: correct histogram spacing, formatting Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 046/219] brcmfmac: set F2 watermark to 256 for 4373 Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 047/219] brcmfmac: set SDIO F1 MesBusyCtrl for CYW4373 Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 048/219] rtl818x: fix potential use after free Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 059/219] iwlwifi: move iwl_nvm_check_version() into dvm Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 060/219] iwlwifi: mvm: force TCM re-evaluation on TCM resume Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 061/219] iwlwifi: pcie: fix erroneous print Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 062/219] iwlwifi: pcie: set cmd_len in the correct place Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 067/219] VSOCK: bind to random port for VMADDR_PORT_ANY Sasha Levin
2019-11-22  5:46 ` [PATCH AUTOSEL 4.19 087/219] net/mlx5: Continue driver initialization despite debugfs failure Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 088/219] netfilter: nf_nat_sip: fix RTP/RTCP source port translations Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 090/219] bnxt_en: Return linux standard errors in bnxt_ethtool.c Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 091/219] bnxt_en: Save ring statistics before reset Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 092/219] bnxt_en: query force speeds before disabling autoneg mode Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 105/219] vxlan: Fix error path in __vxlan_dev_create() Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 115/219] brcmfmac: Fix access point mode Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 116/219] ath6kl: Only use match sets when firmware supports it Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 117/219] ath6kl: Fix off by one error in scan completion Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 130/219] bpf/cpumap: make sure frame_size for build_skb is aligned if headroom isn't Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 138/219] net/netlink_compat: Fix a missing check of nla_parse_nested Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 139/219] net/net_namespace: Check the return value of register_pernet_subsys() Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 140/219] libceph: drop last_piece logic from write_partial_message_data() Sasha Levin
2019-11-22 14:00   ` Ilya Dryomov
2019-12-01 14:34     ` Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 145/219] net: (cpts) fix a missing check of clk_prepare Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 146/219] net: stmicro: " Sasha Levin
2019-11-22  5:47 ` [PATCH AUTOSEL 4.19 147/219] net: dsa: bcm_sf2: Propagate error value from mdio_write Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 148/219] atl1e: checking the status of atl1e_write_phy_reg Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 149/219] tipc: fix a missing check of genlmsg_put Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 150/219] net: marvell: fix a missing check of acpi_match_device Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 151/219] net/wan/fsl_ucc_hdlc: Avoid double free in ucc_hdlc_probe() Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 158/219] netfilter: nf_tables: fix a missing check of nla_put_failure Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 159/219] xprtrdma: Prevent leak of rpcrdma_rep objects Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 170/219] tipc: fix memory leak in tipc_nl_compat_publ_dump Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 171/219] net/core/neighbour: tell kmemleak about hash tables Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 174/219] net/core/neighbour: fix kmemleak minimal reference count for " Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 177/219] sfc: suppress duplicate nvmem partition types in efx_ef10_mtd_probe Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 178/219] ip_tunnel: Make none-tunnel-dst tunnel port work with lwtunnel Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 179/219] decnet: fix DN_IFREQ_SIZE Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 180/219] net/smc: prevent races between smc_lgr_terminate() and smc_conn_free() Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 181/219] net/smc: don't wait for send buffer space when data was already sent Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 184/219] net/smc: fix sender_free computation Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 186/219] net/smc: fix byte_order for rx_curs_confirmed Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 187/219] tipc: fix skb may be leaky in tipc_link_input Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 189/219] sfc: initialise found bitmap in efx_ef10_mtd_probe Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 190/219] geneve: change NET_UDP_TUNNEL dependency to select Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 191/219] net: fix possible overflow in __sk_mem_raise_allocated() Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 192/219] net: ip_gre: do not report erspan_ver for gre or gretap Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 193/219] net: ip6_gre: do not report erspan_ver for ip6gre or ip6gretap Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 194/219] sctp: don't compare hb_timer expire date before starting it Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 195/219] bpf: decrease usercnt if bpf_map_new_fd() fails in bpf_map_get_fd_by_id() Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 197/219] net: dev: Use unsigned integer as an argument to left-shift Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 199/219] bpf: drop refcount if bpf_map_new_fd() fails in map_create() Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 200/219] net: hns3: Change fw error code NOT_EXEC to NOT_SUPPORTED Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 201/219] net: hns3: fix PFC not setting problem for DCB module Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 202/219] net: hns3: fix an issue for hclgevf_ae_get_hdev Sasha Levin
2019-11-22  5:48 ` [PATCH AUTOSEL 4.19 203/219] net: hns3: fix an issue for hns3_update_new_int_gl Sasha Levin
2019-11-22  5:49 ` Sasha Levin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191122054911.1750-210-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).