linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Julian Anastasov <ja@ssi.bg>
To: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: "David S. Miller" <davem@davemloft.net>,
	David Ahern <dsahern@gmail.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	syzbot <syzbot+30209ea299c09d8785c9@syzkaller.appspotmail.com>,
	ddstreet@ieee.org, dvyukov@google.com,
	linux-kernel <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, syzkaller-bugs@googlegroups.com,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH] ipv4: Delete uncached routes upon unregistration of loopback device.
Date: Sat, 4 May 2019 23:13:45 +0300 (EEST)	[thread overview]
Message-ID: <alpine.LFD.2.21.1905042115250.6546@ja.home.ssi.bg> (raw)
In-Reply-To: <519ea12b-4c24-9e8e-c5eb-ca02c9c7d264@i-love.sakura.ne.jp>


	Hello,

On Sat, 4 May 2019, Tetsuo Handa wrote:

> syzbot is hitting infinite loop when a loopback device in a namespace is
> unregistered [1]. This is because rt_flush_dev() is moving the refcount of
> "any device to unregister" to "a loopback device in that namespace" but
> nobody can drop the refcount moved from non loopback devices when the
> loopback device in that namespace is unregistered.
> 
> This behavior was introduced by commit caacf05e5ad1abf0 ("ipv4: Properly
> purge netdev references on uncached routes.") but there is no description
> why we have to temporarily move the refcount to "a loopback device in that
> namespace" and why it is safe to do so, for rt_flush_dev() becomes a no-op
> when "a loopback device in that namespace" is about to be unregistered.
> 
> Since I don't know the reason, this patch breaks the infinite loop by
> deleting the uncached route (which eventually drops the refcount via
> dst_destroy()) when "a loopback device in that namespace" is unregistered
> rather than when "non-loopback devices in that namespace" is unregistered.

	There is one simple rule: code that holds device references should
catch device events and to put the references. This should happen
not after the NETDEV_UNREGISTER event.

	But there are users such as dsts that have longer life because
there are other objects that can hold dsts - sockets, for example.
Sockets can hold dst as result of route resolving (sk_dst_cache) and to 
reuse it as long as it is valid. And many sockets can point to same dst.
Obviously, socket can be idle while device disappears. We do not
propagate device events to every socket. What we do is to mark the dst
entry as invalid and to drop its dev reference. So, the problem can be
noticed on next sending when the cached dst is revalidated. As the dst 
entry is marked as obsolete, it will be dropped.

	What you see in rt_flush_dev() is that the IPv4 route subsystem
drops its dev references at the right time. Why net->loopback_dev ?
Because we prefer the leaked dsts to prevent netns to be released, so
that we have more info where is the problem. If we account the leaks
to init netns, nobody will notice them. These are leaks that missed
many cleanup steps: device events and even net-exit events. If you
see "lo" in error messages, it is more likely a dst leak, i.e.
we got a dst reference but dst_release() was not called before netns
cleanup. In such case you need to track not the dev references but the
dst references. If another device is shown, it is not a dst leak
but some dev leak (dev_put not called).

> [1] https://syzkaller.appspot.com/bug?id=bae9a2236bfede42cf3d219e6bf6740c583568a4
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reported-by: syzbot <syzbot+30209ea299c09d8785c9@syzkaller.appspotmail.com>
> ---
>  net/ipv4/route.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 6fdf1c195d8e..7e865c11d4f3 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1522,15 +1522,21 @@ void rt_flush_dev(struct net_device *dev)
>  {
>  	struct net *net = dev_net(dev);
>  	struct rtable *rt;
> +	struct rtable *tmp;
>  	int cpu;
>  
>  	for_each_possible_cpu(cpu) {
>  		struct uncached_list *ul = &per_cpu(rt_uncached_list, cpu);
>  
>  		spin_lock_bh(&ul->lock);
> -		list_for_each_entry(rt, &ul->head, rt_uncached) {
> +		list_for_each_entry_safe(rt, tmp, &ul->head, rt_uncached) {
>  			if (rt->dst.dev != dev)
>  				continue;
> +			if (dev == net->loopback_dev) {
> +				list_del_init(&rt->rt_uncached);
> +				ip_rt_put(rt);
> +				continue;
> +			}
>  			rt->dst.dev = net->loopback_dev;
>  			dev_hold(rt->dst.dev);
>  			dev_put(dev);
> -- 
> 2.17.1

Regards

--
Julian Anastasov <ja@ssi.bg>

  parent reply	other threads:[~2019-05-04 20:14 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-15 18:50 unregister_netdevice: waiting for DEV to become free (2) syzbot
2018-08-15 20:28 ` syzbot
2018-08-15 20:41   ` Dmitry Vyukov
2018-08-20  4:31 ` syzbot
2018-08-20 12:55   ` Julian Anastasov
2018-08-21  5:40     ` Cong Wang
2018-08-22  4:11       ` Julian Anastasov
2019-04-15 13:36     ` Tetsuo Handa
2019-04-15 15:35       ` David Ahern
2019-04-21 20:41         ` Stephen Suryaputra
2019-04-22 14:58           ` David Ahern
2019-04-22 16:04             ` Eric Dumazet
2019-04-22 16:09               ` Eric Dumazet
2019-04-16 14:00       ` Tetsuo Handa
2019-04-26 13:43         ` Tetsuo Handa
2019-04-27 17:16           ` David Ahern
2019-04-27 22:33             ` Tetsuo Handa
2019-04-27 23:52               ` Eric Dumazet
2019-04-28  4:22                 ` Tetsuo Handa
2019-04-28 15:04                   ` Eric Dumazet
2019-04-29 18:34                   ` David Ahern
2019-04-29 18:43                     ` David Ahern
2019-05-01 13:38                       ` Tetsuo Handa
2019-05-01 14:52                         ` David Ahern
2019-05-01 16:16                           ` Tetsuo Handa
2019-05-04 14:52                             ` [PATCH] ipv4: Delete uncached routes upon unregistration of loopback device Tetsuo Handa
2019-05-04 15:56                               ` Eric Dumazet
2019-05-04 17:09                                 ` Tetsuo Handa
2019-05-04 17:24                                   ` Eric Dumazet
2019-05-04 20:13                               ` Julian Anastasov [this message]
2019-11-28  9:56     ` unregister_netdevice: waiting for DEV to become free (2) Tetsuo Handa
2019-11-29  5:54       ` Lukas Bulwahn
2019-11-29  6:51       ` Jouni Högander
2019-12-05 10:00       ` Jouni Högander
2019-12-05 11:00         ` Tetsuo Handa
2019-12-16 11:12           ` Tetsuo Handa
2019-12-17  7:08             ` Jouni Högander
2019-10-11 10:14   ` Tetsuo Handa
2019-10-11 15:12     ` Alexei Starovoitov
2019-10-16 10:34       ` Toke Høiland-Jørgensen
2019-11-15  9:43         ` Tetsuo Handa
2019-11-21 11:36           ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.21.1905042115250.6546@ja.home.ssi.bg \
    --to=ja@ssi.bg \
    --cc=davem@davemloft.net \
    --cc=ddstreet@ieee.org \
    --cc=dsahern@gmail.com \
    --cc=dvyukov@google.com \
    --cc=eric.dumazet@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=syzbot+30209ea299c09d8785c9@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=torvalds@linux-foundation.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).