stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Wei Wang <weiwan@google.com>,
	Ido Schimmel <idosch@idosch.org>,
	Jesse Hathaway <jesse@mbuki-mvuki.org>,
	Martin KaFai Lau <kafai@fb.com>, David Ahern <dsahern@gmail.com>,
	Ido Schimmel <idosch@mellanox.com>,
	"David S. Miller" <davem@davemloft.net>,
	Carsten Schmid <carsten_schmid@mentor.com>
Subject: [PATCH 4.14 09/30] ipv4: fix race condition between route lookup and invalidation
Date: Mon,  8 Feb 2021 16:00:55 +0100	[thread overview]
Message-ID: <20210208145805.625201180@linuxfoundation.org> (raw)
In-Reply-To: <20210208145805.239714726@linuxfoundation.org>

From: Wei Wang <weiwan@google.com>

[ upstream commit 5018c59607a511cdee743b629c76206d9c9e6d7b ]

Jesse and Ido reported the following race condition:
<CPU A, t0> - Received packet A is forwarded and cached dst entry is
taken from the nexthop ('nhc->nhc_rth_input'). Calls skb_dst_set()

<t1> - Given Jesse has busy routers ("ingesting full BGP routing tables
from multiple ISPs"), route is added / deleted and rt_cache_flush() is
called

<CPU B, t2> - Received packet B tries to use the same cached dst entry
from t0, but rt_cache_valid() is no longer true and it is replaced in
rt_cache_route() by the newer one. This calls dst_dev_put() on the
original dst entry which assigns the blackhole netdev to 'dst->dev'

<CPU A, t3> - dst_input(skb) is called on packet A and it is dropped due
to 'dst->dev' being the blackhole netdev

There are 2 issues in the v4 routing code:
1. A per-netns counter is used to do the validation of the route. That
means whenever a route is changed in the netns, users of all routes in
the netns needs to redo lookup. v6 has an implementation of only
updating fn_sernum for routes that are affected.
2. When rt_cache_valid() returns false, rt_cache_route() is called to
throw away the current cache, and create a new one. This seems
unnecessary because as long as this route does not change, the route
cache does not need to be recreated.

To fully solve the above 2 issues, it probably needs quite some code
changes and requires careful testing, and does not suite for net branch.

So this patch only tries to add the deleted cached rt into the uncached
list, so user could still be able to use it to receive packets until
it's done.

Fixes: 95c47f9cf5e0 ("ipv4: call dst_dev_put() properly")
Signed-off-by: Wei Wang <weiwan@google.com>
Reported-by: Ido Schimmel <idosch@idosch.org>
Reported-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Tested-by: Jesse Hathaway <jesse@mbuki-mvuki.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Carsten Schmid <carsten_schmid@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/ipv4/route.c |   38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1437,6 +1437,24 @@ static bool rt_bind_exception(struct rta
 	return ret;
 }
 
+struct uncached_list {
+	spinlock_t		lock;
+	struct list_head	head;
+};
+
+static DEFINE_PER_CPU_ALIGNED(struct uncached_list, rt_uncached_list);
+
+static void rt_add_uncached_list(struct rtable *rt)
+{
+	struct uncached_list *ul = raw_cpu_ptr(&rt_uncached_list);
+
+	rt->rt_uncached_list = ul;
+
+	spin_lock_bh(&ul->lock);
+	list_add_tail(&rt->rt_uncached, &ul->head);
+	spin_unlock_bh(&ul->lock);
+}
+
 static bool rt_cache_route(struct fib_nh *nh, struct rtable *rt)
 {
 	struct rtable *orig, *prev, **p;
@@ -1456,7 +1474,7 @@ static bool rt_cache_route(struct fib_nh
 	prev = cmpxchg(p, orig, rt);
 	if (prev == orig) {
 		if (orig) {
-			dst_dev_put(&orig->dst);
+			rt_add_uncached_list(orig);
 			dst_release(&orig->dst);
 		}
 	} else {
@@ -1467,24 +1485,6 @@ static bool rt_cache_route(struct fib_nh
 	return ret;
 }
 
-struct uncached_list {
-	spinlock_t		lock;
-	struct list_head	head;
-};
-
-static DEFINE_PER_CPU_ALIGNED(struct uncached_list, rt_uncached_list);
-
-static void rt_add_uncached_list(struct rtable *rt)
-{
-	struct uncached_list *ul = raw_cpu_ptr(&rt_uncached_list);
-
-	rt->rt_uncached_list = ul;
-
-	spin_lock_bh(&ul->lock);
-	list_add_tail(&rt->rt_uncached, &ul->head);
-	spin_unlock_bh(&ul->lock);
-}
-
 static void ipv4_dst_destroy(struct dst_entry *dst)
 {
 	struct dst_metrics *p = (struct dst_metrics *)DST_METRICS_PTR(dst);



  parent reply	other threads:[~2021-02-08 15:13 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08 15:00 [PATCH 4.14 00/30] 4.14.221-rc1 review Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 01/30] USB: serial: cp210x: add pid/vid for WSDA-200-USB Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 02/30] USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000 Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 03/30] USB: serial: option: Adding support for Cinterion MV31 Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 04/30] Input: i8042 - unbreak Pegatron C15B Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 05/30] arm64: dts: ls1046a: fix dcfg address range Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 06/30] net: lapb: Copy the skb before sending a packet Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 07/30] objtool: Support Clang non-section symbols in ORC generation Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 08/30] elfcore: fix building with clang Greg Kroah-Hartman
2021-02-08 15:00 ` Greg Kroah-Hartman [this message]
2021-02-08 15:00 ` [PATCH 4.14 10/30] USB: gadget: legacy: fix an error code in eth_bind() Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 11/30] USB: usblp: dont call usb_set_interface if theres a single alt Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 12/30] usb: dwc2: Fix endpoint direction check in ep_from_windex Greg Kroah-Hartman
2021-02-08 15:00 ` [PATCH 4.14 13/30] ovl: fix dentry leak in ovl_get_redirect Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 14/30] mac80211: fix station rate table updates on assoc Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 15/30] kretprobe: Avoid re-registration of the same kretprobe earlier Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 16/30] xhci: fix bounce buffer usage for non-sg list case Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 17/30] cifs: report error instead of invalid when revalidating a dentry fails Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 18/30] smb3: Fix out-of-bounds bug in SMB2_negotiate() Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 19/30] mmc: core: Limit retries when analyse of SDIO tuples fails Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 20/30] nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 21/30] ARM: footbridge: fix dc21285 PCI configuration accessors Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 22/30] mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 23/30] mm: hugetlb: fix a race between isolating and freeing page Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 24/30] mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 25/30] mm: thp: fix MADV_REMOVE deadlock on shmem THP Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 26/30] x86/build: Disable CET instrumentation in the kernel Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 27/30] x86/apic: Add extra serialization for non-serializing MSRs Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 28/30] Input: xpad - sync supported devices with fork on GitHub Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 29/30] iommu/vt-d: Do not use flush-queue when caching-mode is on Greg Kroah-Hartman
2021-02-08 15:01 ` [PATCH 4.14 30/30] net: dsa: mv88e6xxx: override existent unicast portvec in port_fdb_add Greg Kroah-Hartman
2021-02-09 17:34 ` [PATCH 4.14 00/30] 4.14.221-rc1 review Naresh Kamboju
2021-02-09 18:10 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210208145805.625201180@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=carsten_schmid@mentor.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=idosch@idosch.org \
    --cc=idosch@mellanox.com \
    --cc=jesse@mbuki-mvuki.org \
    --cc=kafai@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=weiwan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).