All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Nikolay Aleksandrov <nikolay@cumulusnetworks.com>,
	David Ahern <dsahern@kernel.org>,
	"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 5.4 13/90] ipv6: Use global sernum for dst validation with nexthop objects
Date: Wed, 13 May 2020 11:44:09 +0200	[thread overview]
Message-ID: <20200513094410.412050680@linuxfoundation.org> (raw)
In-Reply-To: <20200513094408.810028856@linuxfoundation.org>

From: David Ahern <dsahern@kernel.org>

[ Upstream commit 8f34e53b60b337e559f1ea19e2780ff95ab2fa65 ]

Nik reported a bug with pcpu dst cache when nexthop objects are
used illustrated by the following:
    $ ip netns add foo
    $ ip -netns foo li set lo up
    $ ip -netns foo addr add 2001:db8:11::1/128 dev lo
    $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
    $ ip li add veth1 type veth peer name veth2
    $ ip li set veth1 up
    $ ip addr add 2001:db8:10::1/64 dev veth1
    $ ip li set dev veth2 netns foo
    $ ip -netns foo li set veth2 up
    $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
    $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Create a pcpu entry on cpu 0:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1

    Re-add the route entry:
    $ ip -6 ro del 2001:db8:11::1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Route get on cpu 0 returns the stale pcpu:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
    RTNETLINK answers: Network is unreachable

    While cpu 1 works:
    $ taskset -a -c 1 ip -6 route get 2001:db8:11::1
    2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium

Conversion of FIB entries to work with external nexthop objects
missed an important difference between IPv4 and IPv6 - how dst
entries are invalidated when the FIB changes. IPv4 has a per-network
namespace generation id (rt_genid) that is bumped on changes to the FIB.
Checking if a dst_entry is still valid means comparing rt_genid in the
rtable to the current value of rt_genid for the namespace.

IPv6 also has a per network namespace counter, fib6_sernum, but the
count is saved per fib6_node. With the per-node counter only dst_entries
based on fib entries under the node are invalidated when changes are
made to the routes - limiting the scope of invalidations. IPv6 uses a
reference in the rt6_info, 'from', to track the corresponding fib entry
used to create the dst_entry. When validating a dst_entry, the 'from'
is used to backtrack to the fib6_node and check the sernum of it to the
cookie passed to the dst_check operation.

With the inline format (nexthop definition inline with the fib6_info),
dst_entries cached in the fib6_nh have a 1:1 correlation between fib
entries, nexthop data and dst_entries. With external nexthops, IPv6
looks more like IPv4 which means multiple fib entries across disparate
fib6_nodes can all reference the same fib6_nh. That means validation
of dst_entries based on external nexthops needs to use the IPv4 format
- the per-network namespace counter.

Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
rt6_get_cookie to return sernum if it is set and update dst_check for
IPv6 to look for sernum set and based the check on it if so. Finally,
rt6_get_pcpu_route needs to validate the cached entry before returning
a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
__mkroute_output for IPv4).

This problem only affects routes using the new, external nexthops.

Thanks to the kbuild test robot for catching the IS_ENABLED needed
around rt_genid_ipv6 before I sent this out.

Fixes: 5b98324ebe29 ("ipv6: Allow routes to use nexthop objects")
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/net/ip6_fib.h       |    4 ++++
 include/net/net_namespace.h |    7 +++++++
 net/ipv6/route.c            |   25 +++++++++++++++++++++++++
 3 files changed, 36 insertions(+)

--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -177,6 +177,7 @@ struct fib6_info {
 struct rt6_info {
 	struct dst_entry		dst;
 	struct fib6_info __rcu		*from;
+	int				sernum;
 
 	struct rt6key			rt6i_dst;
 	struct rt6key			rt6i_src;
@@ -260,6 +261,9 @@ static inline u32 rt6_get_cookie(const s
 	struct fib6_info *from;
 	u32 cookie = 0;
 
+	if (rt->sernum)
+		return rt->sernum;
+
 	rcu_read_lock();
 
 	from = rcu_dereference(rt->from);
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -428,6 +428,13 @@ static inline int rt_genid_ipv4(struct n
 	return atomic_read(&net->ipv4.rt_genid);
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static inline int rt_genid_ipv6(const struct net *net)
+{
+	return atomic_read(&net->ipv6.fib6_sernum);
+}
+#endif
+
 static inline void rt_genid_bump_ipv4(struct net *net)
 {
 	atomic_inc(&net->ipv4.rt_genid);
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1388,9 +1388,18 @@ static struct rt6_info *ip6_rt_pcpu_allo
 	}
 	ip6_rt_copy_init(pcpu_rt, res);
 	pcpu_rt->rt6i_flags |= RTF_PCPU;
+
+	if (f6i->nh)
+		pcpu_rt->sernum = rt_genid_ipv6(dev_net(dev));
+
 	return pcpu_rt;
 }
 
+static bool rt6_is_valid(const struct rt6_info *rt6)
+{
+	return rt6->sernum == rt_genid_ipv6(dev_net(rt6->dst.dev));
+}
+
 /* It should be called with rcu_read_lock() acquired */
 static struct rt6_info *rt6_get_pcpu_route(const struct fib6_result *res)
 {
@@ -1398,6 +1407,19 @@ static struct rt6_info *rt6_get_pcpu_rou
 
 	pcpu_rt = this_cpu_read(*res->nh->rt6i_pcpu);
 
+	if (pcpu_rt && pcpu_rt->sernum && !rt6_is_valid(pcpu_rt)) {
+		struct rt6_info *prev, **p;
+
+		p = this_cpu_ptr(res->nh->rt6i_pcpu);
+		prev = xchg(p, NULL);
+		if (prev) {
+			dst_dev_put(&prev->dst);
+			dst_release(&prev->dst);
+		}
+
+		pcpu_rt = NULL;
+	}
+
 	return pcpu_rt;
 }
 
@@ -2599,6 +2621,9 @@ static struct dst_entry *ip6_dst_check(s
 
 	rt = container_of(dst, struct rt6_info, dst);
 
+	if (rt->sernum)
+		return rt6_is_valid(rt) ? dst : NULL;
+
 	rcu_read_lock();
 
 	/* All IPV6 dsts are created with ->obsolete set to the value



  parent reply	other threads:[~2020-05-13  9:48 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-13  9:43 [PATCH 5.4 00/90] 5.4.41-rc1 review Greg Kroah-Hartman
2020-05-13  9:43 ` [PATCH 5.4 01/90] USB: serial: qcserial: Add DW5816e support Greg Kroah-Hartman
2020-05-13  9:43 ` [PATCH 5.4 02/90] nvme: refactor nvme_identify_ns_descs error handling Greg Kroah-Hartman
2020-05-13  9:43 ` [PATCH 5.4 03/90] nvme: fix possible hang when ns scanning fails during error recovery Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 04/90] tracing/kprobes: Fix a double initialization typo Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 05/90] net: macb: Fix runtime PM refcounting Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 06/90] drm/amdgpu: move kfd suspend after ip_suspend_phase1 Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 07/90] drm/amdgpu: drop redundant cg/pg ungate on runpm enter Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 08/90] vt: fix unicode console freeing with a common interface Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 09/90] tty: xilinx_uartps: Fix missing id assignment to the console Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 10/90] devlink: fix return value after hitting end in region read Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 11/90] dp83640: reverse arguments to list_add_tail Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 12/90] fq_codel: fix TCA_FQ_CODEL_DROP_BATCH_SIZE sanity checks Greg Kroah-Hartman
2020-05-13  9:44 ` Greg Kroah-Hartman [this message]
2020-05-13  9:44 ` [PATCH 5.4 14/90] mlxsw: spectrum_acl_tcam: Position vchunk in a vregion list properly Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 15/90] neigh: send protocol value in neighbor create notification Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 16/90] net: dsa: Do not leave DSA master with NULL netdev_ops Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 17/90] net: macb: fix an issue about leak related system resources Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 18/90] net: macsec: preserve ingress frame ordering Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 19/90] net/mlx4_core: Fix use of ENOSPC around mlx4_counter_alloc() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 20/90] net_sched: sch_skbprio: add message validation to skbprio_change() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 21/90] net: stricter validation of untrusted gso packets Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 22/90] net: tc35815: Fix phydev supported/advertising mask Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 23/90] net/tls: Fix sk_psock refcnt leak in bpf_exec_tx_verdict() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 24/90] net/tls: Fix sk_psock refcnt leak when in tls_data_ready() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 25/90] net: usb: qmi_wwan: add support for DW5816e Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 26/90] nfp: abm: fix a memory leak bug Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 27/90] sch_choke: avoid potential panic in choke_reset() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 28/90] sch_sfq: validate silly quantum values Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 29/90] tipc: fix partial topology connection closure Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 30/90] tunnel: Propagate ECT(1) when decapsulating as recommended by RFC6040 Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 31/90] bnxt_en: Fix VF anti-spoof filter setup Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 32/90] bnxt_en: Reduce BNXT_MSIX_VEC_MAX value to supported CQs per PF Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 33/90] bnxt_en: Improve AER slot reset Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 34/90] bnxt_en: Return error when allocating zero size context memory Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 35/90] bnxt_en: Fix VLAN acceleration handling in bnxt_fix_features() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 36/90] net/mlx5: DR, On creation set CQs arm_db member to right value Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 37/90] net/mlx5: Fix forced completion access non initialized command entry Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 38/90] net/mlx5: Fix command entry leak in Internal Error State Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 39/90] net: mvpp2: prevent buffer overflow in mvpp22_rss_ctx() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 40/90] net: mvpp2: cls: Prevent buffer overflow in mvpp2_ethtool_cls_rule_del() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 41/90] HID: wacom: Read HID_DG_CONTACTMAX directly for non-generic devices Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 42/90] sctp: Fix bundling of SHUTDOWN with COOKIE-ACK Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 43/90] Revert "HID: wacom: generic: read the number of expected touches on a per collection basis" Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 44/90] HID: usbhid: Fix race between usbhid_close() and usbhid_stop() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 45/90] HID: wacom: Report 2nd-gen Intuos Pro S center button status over BT Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 46/90] USB: uas: add quirk for LaCie 2Big Quadra Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 47/90] usb: chipidea: msm: Ensure proper controller reset using role switch API Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 48/90] USB: serial: garmin_gps: add sanity checking for data length Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 49/90] tracing: Add a vmalloc_sync_mappings() for safe measure Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 50/90] crypto: arch/nhpoly1305 - process in explicit 4k chunks Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 51/90] KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 52/90] KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 53/90] KVM: arm: vgic: Fix limit condition when writing to GICD_I[CS]ACTIVER Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 54/90] KVM: arm64: Fix 32bit PC wrap-around Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 55/90] arm64: hugetlb: avoid potential NULL dereference Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 56/90] drm: ingenic-drm: add MODULE_DEVICE_TABLE Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 57/90] ipc/mqueue.c: change __do_notify() to bypass check_kill_permission() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 58/90] epoll: atomically remove wait entry on wake up Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 59/90] eventpoll: fix missing wakeup for ovflist in ep_poll_callback Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 60/90] mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous() Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 61/90] mm: limit boost_watermark on small zones Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 62/90] ceph: fix endianness bug when handling MDS session feature bits Greg Kroah-Hartman
2020-05-13  9:44 ` [PATCH 5.4 63/90] ceph: demote quotarealm lookup warning to a debug message Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 64/90] staging: gasket: Check the return value of gasket_get_bar_index() Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 65/90] coredump: fix crash when umh is disabled Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 66/90] riscv: set max_pfn to the PFN of the last page Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 67/90] iocost: protect iocg->abs_vdebt with iocg->waitq.lock Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 68/90] batman-adv: fix batadv_nc_random_weight_tq Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 69/90] batman-adv: Fix refcnt leak in batadv_show_throughput_override Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 70/90] batman-adv: Fix refcnt leak in batadv_store_throughput_override Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 71/90] batman-adv: Fix refcnt leak in batadv_v_ogm_process Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 72/90] x86/entry/64: Fix unwind hints in register clearing code Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 73/90] x86/entry/64: Fix unwind hints in kernel exit path Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 74/90] x86/entry/64: Fix unwind hints in rewind_stack_do_exit() Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 75/90] x86/unwind/orc: Dont skip the first frame for inactive tasks Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 76/90] x86/unwind/orc: Prevent unwinding before ORC initialization Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 77/90] x86/unwind/orc: Fix error path for bad ORC entry type Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 78/90] x86/unwind/orc: Fix premature unwind stoppage due to IRET frames Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 79/90] KVM: x86: Fixes posted interrupt check for IRQs delivery modes Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 80/90] arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory() Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 81/90] netfilter: nat: never update the UDP checksum when its 0 Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 82/90] netfilter: nf_osf: avoid passing pointer to local var Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 83/90] objtool: Fix stack offset tracking for indirect CFAs Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 84/90] iommu/virtio: Reverse arguments to list_add Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 85/90] scripts/decodecode: fix trapping instruction formatting Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 86/90] mm, memcg: fix error return value of mem_cgroup_css_alloc() Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 87/90] bdi: move bdi_dev_name out of line Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 88/90] bdi: add a ->dev_name field to struct backing_dev_info Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 89/90] fsnotify: replace inode pointer with an object id Greg Kroah-Hartman
2020-05-13  9:45 ` [PATCH 5.4 90/90] fanotify: merge duplicate events on parent and child Greg Kroah-Hartman
     [not found] ` <20200513094408.810028856-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>
2020-05-13 13:46   ` [PATCH 5.4 00/90] 5.4.41-rc1 review Jon Hunter
2020-05-13 13:46     ` Jon Hunter
2020-05-13 17:03 ` Guenter Roeck
2020-05-13 17:50 ` Naresh Kamboju
2020-05-13 23:01 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200513094410.412050680@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=dsahern@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nikolay@cumulusnetworks.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.