All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev, Kuniyuki Iwashima <kuniyu@amazon.com>,
	Paolo Abeni <pabeni@redhat.com>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 36/79] udp: Update reuse->has_conns under reuseport_lock.
Date: Thu, 27 Oct 2022 18:55:46 +0200	[thread overview]
Message-ID: <20221027165055.592854693@linuxfoundation.org> (raw)
In-Reply-To: <20221027165054.270676357@linuxfoundation.org>

From: Kuniyuki Iwashima <kuniyu@amazon.com>

[ Upstream commit 69421bf98482d089e50799f45e48b25ce4a8d154 ]

When we call connect() for a UDP socket in a reuseport group, we have
to update sk->sk_reuseport_cb->has_conns to 1.  Otherwise, the kernel
could select a unconnected socket wrongly for packets sent to the
connected socket.

However, the current way to set has_conns is illegal and possible to
trigger that problem.  reuseport_has_conns() changes has_conns under
rcu_read_lock(), which upgrades the RCU reader to the updater.  Then,
it must do the update under the updater's lock, reuseport_lock, but
it doesn't for now.

For this reason, there is a race below where we fail to set has_conns
resulting in the wrong socket selection.  To avoid the race, let's split
the reader and updater with proper locking.

 cpu1                               cpu2
+----+                             +----+

__ip[46]_datagram_connect()        reuseport_grow()
.                                  .
|- reuseport_has_conns(sk, true)   |- more_reuse = __reuseport_alloc(more_socks_size)
|  .                               |
|  |- rcu_read_lock()
|  |- reuse = rcu_dereference(sk->sk_reuseport_cb)
|  |
|  |                               |  /* reuse->has_conns == 0 here */
|  |                               |- more_reuse->has_conns = reuse->has_conns
|  |- reuse->has_conns = 1         |  /* more_reuse->has_conns SHOULD BE 1 HERE */
|  |                               |
|  |                               |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
|  |                               |                     more_reuse)
|  `- rcu_read_unlock()            `- kfree_rcu(reuse, rcu)
|
|- sk->sk_state = TCP_ESTABLISHED

Note the likely(reuse) in reuseport_has_conns_set() is always true,
but we put the test there for ease of review.  [0]

For the record, usually, sk_reuseport_cb is changed under lock_sock().
The only exception is reuseport_grow() & TCP reqsk migration case.

  1) shutdown() TCP listener, which is moved into the latter part of
     reuse->socks[] to migrate reqsk.

  2) New listen() overflows reuse->socks[] and call reuseport_grow().

  3) reuse->max_socks overflows u16 with the new listener.

  4) reuseport_grow() pops the old shutdown()ed listener from the array
     and update its sk->sk_reuseport_cb as NULL without lock_sock().

shutdown()ed TCP sk->sk_reuseport_cb can be changed without lock_sock(),
but, reuseport_has_conns_set() is called only for UDP under lock_sock(),
so likely(reuse) never be false in reuseport_has_conns_set().

[0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/

Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/net/sock_reuseport.h | 11 +++++------
 net/core/sock_reuseport.c    | 16 ++++++++++++++++
 net/ipv4/datagram.c          |  2 +-
 net/ipv4/udp.c               |  2 +-
 net/ipv6/datagram.c          |  2 +-
 net/ipv6/udp.c               |  2 +-
 6 files changed, 25 insertions(+), 10 deletions(-)

diff --git a/include/net/sock_reuseport.h b/include/net/sock_reuseport.h
index 0e558ca7afbf..6348c6f26903 100644
--- a/include/net/sock_reuseport.h
+++ b/include/net/sock_reuseport.h
@@ -39,21 +39,20 @@ extern struct sock *reuseport_select_sock(struct sock *sk,
 extern int reuseport_attach_prog(struct sock *sk, struct bpf_prog *prog);
 extern int reuseport_detach_prog(struct sock *sk);
 
-static inline bool reuseport_has_conns(struct sock *sk, bool set)
+static inline bool reuseport_has_conns(struct sock *sk)
 {
 	struct sock_reuseport *reuse;
 	bool ret = false;
 
 	rcu_read_lock();
 	reuse = rcu_dereference(sk->sk_reuseport_cb);
-	if (reuse) {
-		if (set)
-			reuse->has_conns = 1;
-		ret = reuse->has_conns;
-	}
+	if (reuse && reuse->has_conns)
+		ret = true;
 	rcu_read_unlock();
 
 	return ret;
 }
 
+void reuseport_has_conns_set(struct sock *sk);
+
 #endif  /* _SOCK_REUSEPORT_H */
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index f478c65a281b..364cf6c6912b 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -18,6 +18,22 @@ DEFINE_SPINLOCK(reuseport_lock);
 
 static DEFINE_IDA(reuseport_ida);
 
+void reuseport_has_conns_set(struct sock *sk)
+{
+	struct sock_reuseport *reuse;
+
+	if (!rcu_access_pointer(sk->sk_reuseport_cb))
+		return;
+
+	spin_lock_bh(&reuseport_lock);
+	reuse = rcu_dereference_protected(sk->sk_reuseport_cb,
+					  lockdep_is_held(&reuseport_lock));
+	if (likely(reuse))
+		reuse->has_conns = 1;
+	spin_unlock_bh(&reuseport_lock);
+}
+EXPORT_SYMBOL(reuseport_has_conns_set);
+
 static int reuseport_sock_index(struct sock *sk,
 				const struct sock_reuseport *reuse,
 				bool closed)
diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
index 4a8550c49202..112c6e892d30 100644
--- a/net/ipv4/datagram.c
+++ b/net/ipv4/datagram.c
@@ -70,7 +70,7 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
 	}
 	inet->inet_daddr = fl4->daddr;
 	inet->inet_dport = usin->sin_port;
-	reuseport_has_conns(sk, true);
+	reuseport_has_conns_set(sk);
 	sk->sk_state = TCP_ESTABLISHED;
 	sk_set_txhash(sk);
 	inet->inet_id = prandom_u32();
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4446aa8237ff..b093daaa3deb 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -446,7 +446,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 			result = lookup_reuseport(net, sk, skb,
 						  saddr, sport, daddr, hnum);
 			/* Fall back to scoring if group has connections */
-			if (result && !reuseport_has_conns(sk, false))
+			if (result && !reuseport_has_conns(sk))
 				return result;
 
 			result = result ? : sk;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 206f66310a88..f4559e5bc84b 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -256,7 +256,7 @@ int __ip6_datagram_connect(struct sock *sk, struct sockaddr *uaddr,
 		goto out;
 	}
 
-	reuseport_has_conns(sk, true);
+	reuseport_has_conns_set(sk);
 	sk->sk_state = TCP_ESTABLISHED;
 	sk_set_txhash(sk);
 out:
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9b504bf49214..514e6a55959f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -179,7 +179,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 			result = lookup_reuseport(net, sk, skb,
 						  saddr, sport, daddr, hnum);
 			/* Fall back to scoring if group has connections */
-			if (result && !reuseport_has_conns(sk, false))
+			if (result && !reuseport_has_conns(sk))
 				return result;
 
 			result = result ? : sk;
-- 
2.35.1




  parent reply	other threads:[~2022-10-27 17:05 UTC|newest]

Thread overview: 98+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-27 16:55 [PATCH 5.10 00/79] 5.10.151-rc1 review Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 01/79] ocfs2: clear dinode links count in case of error Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 02/79] ocfs2: fix BUG when iput after ocfs2_mknod fails Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 03/79] selinux: enable use of both GFP_KERNEL and GFP_ATOMIC in convert_context() Greg Kroah-Hartman
2022-10-27 16:55   ` Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 04/79] cpufreq: qcom: fix writes in read-only memory region Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 05/79] i2c: qcom-cci: Fix ordering of pm_runtime_xx and i2c_add_adapter Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 06/79] cpufreq: tegra194: Fix module loading Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 07/79] x86/microcode/AMD: Apply the patch early on every logical thread Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 08/79] hwmon/coretemp: Handle large core ID value Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 09/79] ata: ahci-imx: Fix MODULE_ALIAS Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 10/79] ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 11/79] cpufreq: qcom: fix memory leak in error path Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 12/79] kvm: Add support for arch compat vm ioctls Greg Kroah-Hartman
2022-10-30  9:54   ` Pavel Machek
2022-10-27 16:55 ` [PATCH 5.10 13/79] KVM: arm64: vgic: Fix exit condition in scan_its_table() Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 14/79] media: mceusb: set timeout to at least timeout provided Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 15/79] media: venus: dec: Handle the case where find_format fails Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 16/79] bpf: Generate BTF_KIND_FLOAT when linking vmlinux Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 17/79] kbuild: Quote OBJCOPY var to avoid a pahole call break the build Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 18/79] kbuild: skip per-CPU BTF generation for pahole v1.18-v1.21 Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 19/79] kbuild: Unify options for BTF generation for vmlinux and modules Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 20/79] kbuild: Add skip_encoding_btf_enum64 option to pahole Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 21/79] block: wbt: Remove unnecessary invoking of wbt_update_limits in wbt_init Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 22/79] blk-wbt: call rq_qos_add() after wb_normal is initialized Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 23/79] arm64: errata: Remove AES hwcap for COMPAT tasks Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 24/79] r8152: add PID for the Lenovo OneLink+ Dock Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 25/79] btrfs: fix processing of delayed data refs during backref walking Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 26/79] btrfs: fix processing of delayed tree block " Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 27/79] ACPI: extlog: Handle multiple records Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 28/79] tipc: Fix recognition of trial period Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 29/79] tipc: fix an information leak in tipc_topsrv_kern_subscr Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 30/79] i40e: Fix DMA mappings leak Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 31/79] HID: magicmouse: Do not set BTN_MOUSE on double report Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 32/79] sfc: Change VF mac via PF as first preference if available Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 33/79] net/atm: fix proc_mpc_write incorrect return value Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 34/79] net: phy: dp83867: Extend RX strap quirk for SGMII mode Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 35/79] tcp: Add num_closed_socks to struct sock_reuseport Greg Kroah-Hartman
2022-10-27 19:53   ` Kuniyuki Iwashima
2022-10-28  6:17     ` Greg KH
2022-10-28 17:05       ` Kuniyuki Iwashima
2022-10-29  6:27         ` Greg KH
2022-10-27 16:55 ` Greg Kroah-Hartman [this message]
2022-10-27 16:55 ` [PATCH 5.10 37/79] cifs: Fix xid leak in cifs_copy_file_range() Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 38/79] cifs: Fix xid leak in cifs_flock() Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 39/79] cifs: Fix xid leak in cifs_ses_add_channel() Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 40/79] net: hsr: avoid possible NULL deref in skb_clone() Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 41/79] ionic: catch NULL pointer issue on reconfig Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 42/79] nvme-hwmon: rework to avoid devm allocation Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 43/79] nvme-hwmon: Return error code when registration fails Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 44/79] nvme-hwmon: consistently ignore errors from nvme_hwmon_init Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 45/79] nvme-hwmon: kmalloc the NVME SMART log buffer Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 46/79] net: sched: cake: fix null pointer access issue when cake_init() fails Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 47/79] net: sched: delete duplicate cleanup of backlog and qlen Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 48/79] net: sched: sfb: fix null pointer access issue when sfb_init() fails Greg Kroah-Hartman
2022-10-27 16:55 ` [PATCH 5.10 49/79] sfc: include vport_id in filter spec hash and equal() Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 50/79] net: hns: fix possible memory leak in hnae_ae_register() Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 51/79] net: sched: fix race condition in qdisc_graft() Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 52/79] net: phy: dp83822: disable MDI crossover status change interrupt Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 53/79] iommu/vt-d: Allow NVS regions in arch_rmrr_sanity_check() Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 54/79] iommu/vt-d: Clean up si_domain in the init_dmars() error path Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 55/79] drm/virtio: Use appropriate atomic state in virtio_gpu_plane_cleanup_fb() Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 56/79] dmaengine: mxs-dma: Remove the unused .id_table Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 57/79] dmaengine: mxs: use platform_driver_register Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 58/79] tracing: Simplify conditional compilation code in tracing_set_tracer() Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 59/79] tracing: Do not free snapshot if tracer is on cmdline Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 60/79] xen: assume XENFEAT_gnttab_map_avail_bits being set for pv guests Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 61/79] xen/gntdev: Accommodate VMA splitting Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 62/79] mmc: sdhci-tegra: Use actual clock rate for SW tuning correction Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 63/79] riscv: Add machine name to kernel boot log and stack dump output Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 64/79] riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 65/79] perf pmu: Validate raw event with sysfs exported format bits Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 66/79] perf: Skip and warn on unknown format configN attrs Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 67/79] fcntl: make F_GETOWN(EX) return 0 on dead owner task Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 68/79] fcntl: fix potential deadlocks for &fown_struct.lock Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 69/79] arm64: dts: qcom: sc7180-trogdor: Fixup modem memory region Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 70/79] arm64: topology: move store_cpu_topology() to shared code Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 71/79] riscv: topology: fix default topology reporting Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 72/79] perf/x86/intel/pt: Relax address filter validation Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 73/79] hv_netvsc: Fix race between VF offering and VF association message from host Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 74/79] [PATCH v3] ACPI: video: Force backlight native for more TongFang devices Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 75/79] x86/Kconfig: Drop check for -mabi=ms for CONFIG_EFI_STUB Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 76/79] Makefile.debug: re-enable debug info for .S files Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 77/79] mmc: core: Add SD card quirk for broken discard Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 78/79] blk-wbt: fix that rwb->wc is always set to 1 in wbt_init() Greg Kroah-Hartman
2022-10-27 16:56 ` [PATCH 5.10 79/79] mm: /proc/pid/smaps_rollup: fix no vmas null-deref Greg Kroah-Hartman
2022-10-27 18:10 ` [PATCH 5.10 00/79] 5.10.151-rc1 review Guenter Roeck
2022-10-27 19:25   ` Greg Kroah-Hartman
2022-10-27 19:27     ` Pavel Machek
2022-10-27 19:39       ` Guenter Roeck
2022-10-27 19:54         ` Florian Fainelli
2022-10-27 19:49       ` Linus Torvalds
2022-10-28 11:01         ` Greg Kroah-Hartman
2022-10-28 10:47 ` Sudip Mukherjee (Codethink)
2022-10-28 10:58   ` Greg Kroah-Hartman
2022-10-28 11:58 ` Jon Hunter
2022-10-28 12:21 ` Pavel Machek
2022-10-28 13:59 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221027165055.592854693@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=kuniyu@amazon.com \
    --cc=pabeni@redhat.com \
    --cc=patches@lists.linux.dev \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.