linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: stable@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	"David S . Miller" <davem@davemloft.net>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 4.4 33/65] bpf: generally move prog destruction to RCU deferral
Date: Thu, 25 Oct 2018 10:16:33 -0400	[thread overview]
Message-ID: <20181025141705.213937-33-sashal@kernel.org> (raw)
In-Reply-To: <20181025141705.213937-1-sashal@kernel.org>

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit 1aacde3d22c42281236155c1ef6d7a5aa32a826b ]

Jann Horn reported following analysis that could potentially result
in a very hard to trigger (if not impossible) UAF race, to quote his
event timeline:

 - Set up a process with threads T1, T2 and T3
 - Let T1 set up a socket filter F1 that invokes another filter F2
   through a BPF map [tail call]
 - Let T1 trigger the socket filter via a unix domain socket write,
   don't wait for completion
 - Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
 - Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
 - Let T3 close the file descriptor for F2, dropping the reference
   count of F2 to 2
 - At this point, T1 should have looked up F2 from the map, but not
   finished executing it
 - Let T3 remove F2 from the BPF map, dropping the reference count of
   F2 to 1
 - Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
   the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
   via schedule_work()
 - At this point, the BPF program could be freed
 - BPF execution is still running in a freed BPF program

While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
event fd we're doing the syscall on doesn't disappear from underneath us
for whole syscall time, it may not be the case for the bpf fd used as
an argument only after we did the put. It needs to be a valid fd pointing
to a BPF program at the time of the call to make the bpf_prog_get() and
while T2 gets preempted, F2 must have dropped reference to 1 on the other
CPU. The fput() from the close() in T3 should also add additionally delay
to the reference drop via exit_task_work() when bpf_prog_release() gets
called as well as scheduling bpf_prog_free_deferred().

That said, it makes nevertheless sense to move the BPF prog destruction
generally after RCU grace period to guarantee that such scenario above,
but also others as recently fixed in ceb56070359b ("bpf, perf: delay release
of BPF prog after grace period") with regards to tail calls won't happen.
Integrating bpf_prog_free_deferred() directly into the RCU callback is
not allowed since the invocation might happen from either softirq or
process context, so we're not permitted to block. Reviewing all bpf_prog_put()
invocations from eBPF side (note, cBPF -> eBPF progs don't use this for
their destruction) with call_rcu() look good to me.

Since we don't know whether at the time of attaching the program, we're
already part of a tail call map, we need to use RCU variant. However, due
to this, there won't be severely more stress on the RCU callback queue:
situations with above bpf_prog_get() and bpf_prog_put() combo in practice
normally won't lead to releases, but even if they would, enough effort/
cycles have to be put into loading a BPF program into the kernel already.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/bpf.h   |  5 -----
 kernel/bpf/arraymap.c |  4 +---
 kernel/bpf/syscall.c  | 13 +++----------
 kernel/events/core.c  |  2 +-
 4 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 132585a7fbd8..bae3da5bcda0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -177,7 +177,6 @@ void bpf_register_map_type(struct bpf_map_type_list *tl);
 struct bpf_prog *bpf_prog_get(u32 ufd);
 struct bpf_prog *bpf_prog_inc(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
-void bpf_prog_put_rcu(struct bpf_prog *prog);
 
 struct bpf_map *bpf_map_get_with_uref(u32 ufd);
 struct bpf_map *__bpf_map_get(struct fd f);
@@ -208,10 +207,6 @@ static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 static inline void bpf_prog_put(struct bpf_prog *prog)
 {
 }
-
-static inline void bpf_prog_put_rcu(struct bpf_prog *prog)
-{
-}
 #endif /* CONFIG_BPF_SYSCALL */
 
 /* verifier prototypes for helper functions called from eBPF programs */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 0eb11b4ac4c7..daa4e0782cf7 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -270,9 +270,7 @@ static void *prog_fd_array_get_ptr(struct bpf_map *map, int fd)
 
 static void prog_fd_array_put_ptr(void *ptr)
 {
-	struct bpf_prog *prog = ptr;
-
-	bpf_prog_put_rcu(prog);
+	bpf_prog_put(ptr);
 }
 
 /* decrement refcnt of all bpf_progs that are stored in this map */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4b9bbfe764e8..04fc1022ad9f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -487,7 +487,7 @@ static void bpf_prog_uncharge_memlock(struct bpf_prog *prog)
 	free_uid(user);
 }
 
-static void __prog_put_common(struct rcu_head *rcu)
+static void __bpf_prog_put_rcu(struct rcu_head *rcu)
 {
 	struct bpf_prog_aux *aux = container_of(rcu, struct bpf_prog_aux, rcu);
 
@@ -496,17 +496,10 @@ static void __prog_put_common(struct rcu_head *rcu)
 	bpf_prog_free(aux->prog);
 }
 
-/* version of bpf_prog_put() that is called after a grace period */
-void bpf_prog_put_rcu(struct bpf_prog *prog)
-{
-	if (atomic_dec_and_test(&prog->aux->refcnt))
-		call_rcu(&prog->aux->rcu, __prog_put_common);
-}
-
 void bpf_prog_put(struct bpf_prog *prog)
 {
 	if (atomic_dec_and_test(&prog->aux->refcnt))
-		__prog_put_common(&prog->aux->rcu);
+		call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
 }
 EXPORT_SYMBOL_GPL(bpf_prog_put);
 
@@ -514,7 +507,7 @@ static int bpf_prog_release(struct inode *inode, struct file *filp)
 {
 	struct bpf_prog *prog = filp->private_data;
 
-	bpf_prog_put_rcu(prog);
+	bpf_prog_put(prog);
 	return 0;
 }
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 68b75dfceb0c..21e825250402 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7139,7 +7139,7 @@ static void perf_event_free_bpf_prog(struct perf_event *event)
 	prog = event->tp_event->prog;
 	if (prog && event->tp_event->bpf_prog_owner == event) {
 		event->tp_event->prog = NULL;
-		bpf_prog_put_rcu(prog);
+		bpf_prog_put(prog);
 	}
 }
 
-- 
2.17.1


  parent reply	other threads:[~2018-10-25 14:18 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-25 14:16 [PATCH AUTOSEL 4.4 01/65] KEYS: put keyring if install_session_keyring_to_cred() fails Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 02/65] ipv6: suppress sparse warnings in IP6_ECN_set_ce() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 03/65] net: drop write-only stack variable Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 04/65] ser_gigaset: use container_of() instead of detour Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 05/65] tracing: Skip more functions when doing stack tracing of events Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 06/65] ARM: dts: apq8064: add ahci ports-implemented mask Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 07/65] x86/mm/pat: Prevent hang during boot when mapping pages Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 08/65] btrfs: cleaner_kthread() doesn't need explicit freeze Sasha Levin
2018-10-25 15:07   ` David Sterba
2018-10-25 20:07     ` Sasha Levin
2018-10-26  6:58       ` Jiri Kosina
2018-10-26 10:57         ` Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 09/65] radix-tree: fix radix_tree_iter_retry() for tagged iterators Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 10/65] af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 11/65] net/mlx4_en: Resolve dividing by zero in 32-bit system Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 12/65] ipv6: orphan skbs in reassembly unit Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 13/65] um: Avoid longjmp/setjmp symbol clashes with libpthread.a Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 14/65] sched/cgroup: Fix cgroup entity load tracking tear-down Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 15/65] btrfs: don't create or leak aliased root while cleaning up orphans Sasha Levin
2018-10-25 15:12   ` David Sterba
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 16/65] thermal: allow spear-thermal driver to be a module Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 17/65] thermal: allow u8500-thermal " Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 18/65] tpm: fix: return rc when devm_add_action() fails Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 19/65] x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 20/65] aacraid: Start adapter after updating number of MSIX vectors Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 21/65] perf/core: Don't leak event in the syscall error path Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 22/65] [media] usbvision: revert commit 588afcc1 Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 23/65] MIPS: Fix FCSR Cause bit handling for correct SIGFPE issue Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 24/65] ASoC: ak4613: Enable cache usage to fix crashes on resume Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 25/65] ASoC: wm8940: " Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 26/65] CIFS: handle guest access errors to Windows shares Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 27/65] arm64: Fix potential race with hardware DBM in ptep_set_access_flags() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 28/65] xfrm: Clear sk_dst_cache when applying per-socket policy Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 29/65] scsi: Add STARGET_CREATED_REMOVE state to scsi_target_state Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 30/65] sparc/pci: Refactor dev_archdata initialization into pci_init_dev_archdata Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 31/65] sch_red: update backlog as well Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 32/65] usb-storage: fix bogus hardware error messages for ATA pass-thru devices Sasha Levin
2018-10-25 14:16 ` Sasha Levin [this message]
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 34/65] drm/nouveau/fbcon: fix oops without fbdev emulation Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 35/65] fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 36/65] ixgbevf: Fix handling of NAPI budget when multiple queues are enabled per vector Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 37/65] net/mlx5e: Fix LRO modify Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 38/65] net/mlx5e: Correctly handle RSS indirection table when changing number of channels Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 39/65] ixgbe: fix RSS limit for X550 Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 40/65] ixgbe: Correct X550EM_x revision check Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 41/65] ALSA: timer: Fix zero-division by continue of uninitialized instance Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 42/65] vti6: flush x-netns xfrm cache when vti interface is removed Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 43/65] gro: Allow tunnel stacking in the case of FOU/GUE Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 44/65] brcmfmac: Fix glom_skb leak in brcmf_sdiod_recv_chain Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 45/65] l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 46/65] tty: serial: sprd: fix error return code in sprd_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 47/65] video: fbdev: pxa3xx_gcu: fix error return code in pxa3xx_gcu_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 48/65] sparc64 mm: Fix more TSB sizing issues Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 49/65] gpu: host1x: fix error return code in host1x_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 50/65] sparc64: Fix exception handling in UltraSPARC-III memcpy Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 51/65] gpio: msic: fix error return code in platform_msic_gpio_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 52/65] usb: imx21-hcd: fix error return code in imx21_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 53/65] usb: ehci-omap: fix error return code in ehci_hcd_omap_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 54/65] usb: dwc3: omap: fix error return code in dwc3_omap_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 55/65] spi/bcm63xx-hspi: fix error return code in bcm63xx_hsspi_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 56/65] MIPS: Handle non word sized instructions when examining frame Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 57/65] spi/bcm63xx: fix error return code in bcm63xx_spi_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 58/65] spi: xlp: fix error return code in xlp_spi_probe() Sasha Levin
2018-10-25 14:16 ` [PATCH AUTOSEL 4.4 59/65] ASoC: spear: fix error return code in spdif_in_probe() Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 60/65] PM / devfreq: tegra: fix error return code in tegra_devfreq_probe() Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 61/65] bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 62/65] scsi: aacraid: Fix typo in blink status Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 63/65] MIPS: microMIPS: Fix decoding of swsp16 instruction Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 64/65] igb: Remove superfluous reset to PHY and page 0 selection Sasha Levin
2018-10-25 14:17 ` [PATCH AUTOSEL 4.4 65/65] MIPS: DEC: Fix an int-handler.S CPU_DADDI_WORKAROUNDS regression Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181025141705.213937-33-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).