linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
	David Ahern <dsahern@kernel.org>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 5.4 13/90] ipv6: Use global sernum for dst validation with nexthop objects
Date: Wed, 13 May 2020 11:44:09 +0200	[thread overview]
Message-ID: <20200513094410.412050680@linuxfoundation.org> (raw)
In-Reply-To: <20200513094408.810028856@linuxfoundation.org>

From: David Ahern <dsahern@kernel.org>

[ Upstream commit 8f34e53b60b337e559f1ea19e2780ff95ab2fa65 ]

Nik reported a bug with pcpu dst cache when nexthop objects are
used illustrated by the following:
    $ ip netns add foo
    $ ip -netns foo li set lo up
    $ ip -netns foo addr add 2001:db8:11::1/128 dev lo
    $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
    $ ip li add veth1 type veth peer name veth2
    $ ip li set veth1 up
    $ ip addr add 2001:db8:10::1/64 dev veth1
    $ ip li set dev veth2 netns foo
    $ ip -netns foo li set veth2 up
    $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
    $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Create a pcpu entry on cpu 0:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1

    Re-add the route entry:
    $ ip -6 ro del 2001:db8:11::1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Route get on cpu 0 returns the stale pcpu:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
    RTNETLINK answers: Network is unreachable

    While cpu 1 works:
    $ taskset -a -c 1 ip -6 route get 2001:db8:11::1
    2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium

Conversion of FIB entries to work with external nexthop objects
missed an important difference between IPv4 and IPv6 - how dst
entries are invalidated when the FIB changes. IPv4 has a per-network
namespace generation id (rt_genid) that is bumped on changes to the FIB.
Checking if a dst_entry is still valid means comparing rt_genid in the
rtable to the current value of rt_genid for the namespace.

IPv6 also has a per network namespace counter, fib6_sernum, but the
count is saved per fib6_node. With the per-node counter only dst_entries
based on fib entries under the node are invalidated when changes are
made to the routes - limiting the scope of invalidations. IPv6 uses a
reference in the rt6_info, 'from', to track the corresponding fib entry
used to create the dst_entry. When validating a dst_entry, the 'from'
is used to backtrack to the fib6_node and check the sernum of it to the
cookie passed to the dst_check operation.

With the inline format (nexthop definition inline with the fib6_info),
dst_entries cached in the fib6_nh have a 1:1 correlation between fib
entries, nexthop data and dst_entries. With external nexthops, IPv6
looks more like IPv4 which means multiple fib entries across disparate
fib6_nodes can all reference the same fib6_nh. That means validation
of dst_entries based on external nexthops needs to use the IPv4 format
- the per-network namespace counter.

Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
rt6_get_cookie to return sernum if it is set and update dst_check for
IPv6 to look for sernum set and based the check on it if so. Finally,
rt6_get_pcpu_route needs to validate the cached entry before returning
a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
__mkroute_output for IPv4).

This problem only affects routes using the new, external nexthops.

Thanks to the kbuild test robot for catching the IS_ENABLED needed
around rt_genid_ipv6 before I sent this out.

Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects")
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/ip6_fib.h       |    4 ++++
 include/net/net_namespace.h |    7 +++++++
 net/ipv6/route.c            |   25 +++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -177,6 +177,7 @@ struct fib6_info {
 struct rt6_info {
 	struct dst_entry		dst;
 	struct fib6_info __rcu		*from;
+	int				sernum;
 
 	struct rt6key			rt6i_dst;
 	struct rt6key			rt6i_src;
@@ -260,6 +261,9 @@ static inline u32 rt6_get_cookie(const s
 	struct fib6_info *from;
 	u32 cookie = 0;
 
+	if (rt->sernum)
+		return rt->sernum;
+
 	rcu_read_lock();
 
 	from = rcu_dereference(rt->from);
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -428,6 +428,13 @@ static inline int rt_genid_ipv4(struct n
 	return atomic_read(&net->ipv4.rt_genid);
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static inline int rt_genid_ipv6(const struct net *net)
+{
+	return atomic_read(&net->ipv6.fib6_sernum);
+}
+#endif
+
 static inline void rt_genid_bump_ipv4(struct net *net)
 {
 	atomic_inc(&net->ipv4.rt_genid);
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1388,9 +1388,18 @@ static struct rt6_info *ip6_rt_pcpu_allo
 	}
 	ip6_rt_copy_init(pcpu_rt, res);
 	pcpu_rt->rt6i_flags |= RTF_PCPU;
+
+	if (f6i->nh)
+		pcpu_rt->sernum = rt_genid_ipv6(dev_net(dev));
+
 	return pcpu_rt;
 }
 
+static bool rt6_is_valid(const struct rt6_info *rt6)
+{
+	return rt6->sernum == rt_genid_ipv6(dev_net(rt6->dst.dev));
+}
+
 /* It should be called with rcu_read_lock() acquired */
 static struct rt6_info *rt6_get_pcpu_route(const struct fib6_result *res)
 {
@@ -1398,6 +1407,19 @@ static struct rt6_info *rt6_get_pcpu_rou
 
 	pcpu_rt = this_cpu_read(*res->nh->rt6i_pcpu);
 
+	if (pcpu_rt && pcpu_rt->sernum && !rt6_is_valid(pcpu_rt)) {
+		struct rt6_info *prev, **p;
+
+		p = this_cpu_ptr(res->nh->rt6i_pcpu);
+		prev = xchg(p, NULL);
+		if (prev) {
+			dst_dev_put(&prev->dst);
+			dst_release(&prev->dst);
+		}
+
+		pcpu_rt = NULL;
+	}
+
 	return pcpu_rt;
 }
 
@@ -2599,6 +2621,9 @@ static struct dst_entry *ip6_dst_check(s
 
 	rt = container_of(dst, struct rt6_info, dst);
 
+	if (rt->sernum)
+		return rt6_is_valid(rt) ? dst : NULL;
+
 	rcu_read_lock();
 
 	/* All IPV6 dsts are created with ->obsolete set to the value



  parent reply	other threads:[~2020-05-13  9:48 UTC|newest]

Thread overview: 95+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-13  9:43 [PATCH 5.4 00/90] 5.4.41-rc1 review Greg Kroah-Hartman
2020-05-13  9:43 ` [PATCH 5.4 01/90] USB: serial: qcserial: Add DW5816e support Greg Kroah-Hartman
2020-05-13  9:43 ` [PATCH 5.4 02/90] nvme: refactor nvme_identify_ns_descs error handling Greg Kroah-Hartman
2020-05-13  9:43 ` [PATCH 5.4 03/90] nvme: fix possible hang when ns scanning fails during error recovery Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 04/90] tracing/kprobes: Fix a double initialization typo Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 05/90] net: macb: Fix runtime PM refcounting Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 06/90] drm/amdgpu: move kfd suspend after ip_suspend_phase1 Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 07/90] drm/amdgpu: drop redundant cg/pg ungate on runpm enter Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 08/90] vt: fix unicode console freeing with a common interface Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 09/90] tty: xilinx_uartps: Fix missing id assignment to the console Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 10/90] devlink: fix return value after hitting end in region read Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 11/90] dp83640: reverse arguments to list_add_tail Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 12/90] fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checks Greg Kroah-Hartman
2020-05-13  9:44 ` Greg Kroah-Hartman [this message]
2020-05-13  9:44 ` [PATCH 5.4 14/90] mlxsw: spectrum_acl_tcam: Position vchunk in a vregion list properly Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 15/90] neigh: send protocol value in neighbor create notification Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 16/90] net: dsa: Do not leave DSA master with NULL netdev_ops Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 17/90] net: macb: fix an issue about leak related system resources Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 18/90] net: macsec: preserve ingress frame ordering Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 19/90] net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 20/90] net_sched: sch_skbprio: add message validation to skbprio_change() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 21/90] net: stricter validation of untrusted gso packets Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 22/90] net: tc35815: Fix phydev supported/advertising mask Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 23/90] net/tls: Fix sk_psock refcnt leak in bpf_exec_tx_verdict() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 24/90] net/tls: Fix sk_psock refcnt leak when in tls_data_ready() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 25/90] net: usb: qmi_wwan: add support for DW5816e Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 26/90] nfp: abm: fix a memory leak bug Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 27/90] sch_choke: avoid potential panic in choke_reset() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 28/90] sch_sfq: validate silly quantum values Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 29/90] tipc: fix partial topology connection closure Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 30/90] tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040 Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 31/90] bnxt_en: Fix VF anti-spoof filter setup Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 32/90] bnxt_en: Reduce BNXT_MSIX_VEC_MAX value to supported CQs per PF Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 33/90] bnxt_en: Improve AER slot reset Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 34/90] bnxt_en: Return error when allocating zero size context memory Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 35/90] bnxt_en: Fix VLAN acceleration handling in bnxt_fix_features() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 36/90] net/mlx5: DR, On creation set CQs arm_db member to right value Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 37/90] net/mlx5: Fix forced completion access non initialized command entry Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 38/90] net/mlx5: Fix command entry leak in Internal Error State Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 39/90] net: mvpp2: prevent buffer overflow in mvpp22_rss_ctx() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 40/90] net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 41/90] HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 42/90] sctp: Fix bundling of SHUTDOWN with COOKIE-ACK Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 43/90] Revert "HID: wacom: generic: read the number of expected touches on a per collection basis" Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 44/90] HID: usbhid: Fix race between usbhid_close() and usbhid_stop() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 45/90] HID: wacom: Report 2nd-gen Intuos Pro S center button status over BT Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 46/90] USB: uas: add quirk for LaCie 2Big Quadra Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 47/90] usb: chipidea: msm: Ensure proper controller reset using role switch API Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 48/90] USB: serial: garmin_gps: add sanity checking for data length Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 49/90] tracing: Add a vmalloc_sync_mappings() for safe measure Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 50/90] crypto: arch/nhpoly1305 - process in explicit 4k chunks Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 51/90] KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 52/90] KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 53/90] KVM: arm: vgic: Fix limit condition when writing to GICD_I[CS]ACTIVER Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 54/90] KVM: arm64: Fix 32bit PC wrap-around Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 55/90] arm64: hugetlb: avoid potential NULL dereference Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 56/90] drm: ingenic-drm: add MODULE_DEVICE_TABLE Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 57/90] ipc/mqueue.c: change __do_notify() to bypass check_kill_permission() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 58/90] epoll: atomically remove wait entry on wake up Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 59/90] eventpoll: fix missing wakeup for ovflist in ep_poll_callback Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 60/90] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 61/90] mm: limit boost_watermark on small zones Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 62/90] ceph: fix endianness bug when handling MDS session feature bits Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 63/90] ceph: demote quotarealm lookup warning to a debug message Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 64/90] staging: gasket: Check the return value of gasket_get_bar_index() Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 65/90] coredump: fix crash when umh is disabled Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 66/90] riscv: set max_pfn to the PFN of the last page Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 67/90] iocost: protect iocg->abs_vdebt with iocg->waitq.lock Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 68/90] batman-adv: fix batadv_nc_random_weight_tq Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 69/90] batman-adv: Fix refcnt leak in batadv_show_throughput_override Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 70/90] batman-adv: Fix refcnt leak in batadv_store_throughput_override Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 71/90] batman-adv: Fix refcnt leak in batadv_v_ogm_process Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 72/90] x86/entry/64: Fix unwind hints in register clearing code Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 73/90] x86/entry/64: Fix unwind hints in kernel exit path Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 74/90] x86/entry/64: Fix unwind hints in rewind_stack_do_exit() Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 75/90] x86/unwind/orc: Dont skip the first frame for inactive tasks Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 76/90] x86/unwind/orc: Prevent unwinding before ORC initialization Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 77/90] x86/unwind/orc: Fix error path for bad ORC entry type Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 78/90] x86/unwind/orc: Fix premature unwind stoppage due to IRET frames Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 79/90] KVM: x86: Fixes posted interrupt check for IRQs delivery modes Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 80/90] arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 81/90] netfilter: nat: never update the UDP checksum when its 0 Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 82/90] netfilter: nf_osf: avoid passing pointer to local var Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 83/90] objtool: Fix stack offset tracking for indirect CFAs Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 84/90] iommu/virtio: Reverse arguments to list_add Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 85/90] scripts/decodecode: fix trapping instruction formatting Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 86/90] mm, memcg: fix error return value of mem_cgroup_css_alloc() Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 87/90] bdi: move bdi_dev_name out of line Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 88/90] bdi: add a ->dev_name field to struct backing_dev_info Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 89/90] fsnotify: replace inode pointer with an object id Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 90/90] fanotify: merge duplicate events on parent and child Greg Kroah-Hartman
2020-05-13 13:46 ` [PATCH 5.4 00/90] 5.4.41-rc1 review Jon Hunter
2020-05-13 17:03 ` Guenter Roeck
2020-05-13 17:50 ` Naresh Kamboju
2020-05-13 23:01 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200513094410.412050680@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nikolay@cumulusnetworks.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).