From: stranche@codeaurora.org
To: David Ahern <dsahern@gmail.com>
Cc: Wei Wang <weiwan@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Martin KaFai Lau <kafai@fb.com>,
	Mahesh Bandewar <maheshb@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Subject: Re: Refcount mismatch when unregistering netdevice from kernel
Date: Mon, 04 Jan 2021 20:05:17 -0700
Message-ID: <9f25d75823a73c6f0f556f0905f931d1@codeaurora.org>
In-Reply-To: <839f0ad6-83c1-1df6-c34d-b844c52ba771@gmail.com>

On 2020-12-11 09:10, David Ahern wrote:

>>>> Could we further distinguish between dst added to the uncached list by
>>>> icmp6_dst_alloc() and xfrm6_fill_dst(), and confirm which ones are the
>>>> ones leaking reference?
>>>> I suspect it would be the xfrm ones, but I think it is worth verifying.
>>>> 
>> 
>> After digging into the DST allocation/destroy a bit more, it seems that
>> there are some cases where the DST's refcount does not hit zero, causing
>> them to never be freed and release their references.
>> One case comes from here on the IPv6 packet output path (these DST
>> structs would hold references to both the inet6_dev and the netdevice)
>> ip6_pol_route_output+0x20/0x2c -> ip6_pol_route+0x1dc/0x34c ->
>> rt6_make_pcpu_route+0x18/0xf4 -> ip6_rt_pcpu_alloc+0xb4/0x19c
> 
> This is the normal data path, and this refers to a per-cpu dst cache.
> Delete the route and the cached entries get removed.
> 

After tracing all the DST entries created by the system, we've been able
to see that all unfreed DST entries belong to the same route on the
system. One is the main rt6_info struct it references and the rest are
percpu copies of it.

>> 
>> We also see two DSTs where they are stored as the xdst->rt entry on the
>> XFRM path that do not get released. One is allocated by the same path as
>> above, and the other like this
>> xfrm6_esp_err+0x7c/0xd4 -> esp6_err+0xc8/0x100 ->
>> ip6_update_pmtu+0xc8/0x100 -> __ip6_rt_update_pmtu+0x248/0x434 ->
>> ip6_rt_cache_alloc+0xa0/0x1dc
> 
> This entry goes into an exception cache. I have lost track of kernel
> versions and features. Try listing the route cache to see these:  ip -6
> ro ls cache

Thanks for the tip here. We've further seen that the route that refers to
these unfreed DSTs is always a cached exception route. After tracing the
routes as well, we can see that the fib6_info struct for this route is
never freed either, thus preventing any of the DSTs associated with it
from being cleaned up and releasing their refcounts on the device. In
fact, we can see that the fib6_info struct is no longer present in the
main fib6 tree after a period of time. The last time we're able to see the
pointer to the route in the tree is during a route replace operation from
userspace, but it seems that the fib6_info is not fully released. In
particular, the exception cache is not flushed out for the route during
the replace operation like it is during a standard fib6_del_route() call.
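
To make the accounting concrete, here is a toy userspace model of the
behaviour described above. It is only a sketch and not kernel code: the
toy_* structs and functions are made up for illustration and merely
assume that each cached exception DST pins one reference on the device
until something walks the old route's exception list and drops it.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins only; these are NOT the kernel's real structures. */
struct toy_dev {
	int refcnt;			/* references held on the device */
};

struct toy_dst {
	struct toy_dev *dev;		/* each cached dst pins the device */
	struct toy_dst *next;
};

struct toy_route {
	struct toy_dst *exceptions;	/* exception dsts hanging off the route */
};

/* Model an exception entry being created (e.g. after a Packet Too Big). */
static void toy_add_exception(struct toy_route *rt, struct toy_dev *dev)
{
	struct toy_dst *dst = malloc(sizeof(*dst));

	assert(dst != NULL);
	dst->dev = dev;
	dev->refcnt++;
	dst->next = rt->exceptions;
	rt->exceptions = dst;
}

/* Model flushing the exception list: drop every device reference. */
static void toy_flush_exceptions(struct toy_route *rt)
{
	while (rt->exceptions) {
		struct toy_dst *dst = rt->exceptions;

		rt->exceptions = dst->next;
		dst->dev->refcnt--;
		free(dst);
	}
}

/*
 * Model a route replace. Returns how many device references are still
 * held once the old route has been unlinked from the tree.
 */
static int toy_replace_route(int flush_on_replace, int nr_exceptions)
{
	struct toy_dev dev = { .refcnt = 0 };
	struct toy_route old_rt = { .exceptions = NULL };
	int i, still_held;

	for (i = 0; i < nr_exceptions; i++)
		toy_add_exception(&old_rt, &dev);

	if (flush_on_replace)
		toy_flush_exceptions(&old_rt);

	/* The old route is unreachable from here on in the real scenario. */
	still_held = dev.refcnt;

	/* Tidy up so the toy itself does not leak memory. */
	toy_flush_exceptions(&old_rt);
	return still_held;
}
```

In this model, toy_replace_route(0, 2) reports two references still held
on the device after the replace, while toy_replace_route(1, 2) reports
none.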

We're able to reproduce the refcount mismatch after some experimentation
as well. Essentially, it consists of:
1) Add a default route (ip -6 route add dev XXX default)
2) Force the creation of an exception route by manually injecting an
   ICMPv6 Packet Too Big into the device.
3) Replace the default route (ip -6 route change dev XXX default)
4) Delete the device (ip link del XXX)

After adding a call to flush out the exception cache for the route, the
mismatch is no longer seen:
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 7a0c877..95e4310 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1215,6 +1215,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
                 }
                 nsiblings = iter->fib6_nsiblings;
                 iter->fib6_node = NULL;
+               rt6_flush_exceptions(iter);
                 fib6_purge_rt(iter, fn, info->nl_net);
                 if (rcu_access_pointer(fn->rr_ptr) == iter)
                         fn->rr_ptr = NULL;
