netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taehee Yoo <ap420073@gmail.com>
To: Eric Dumazet <edumazet@google.com>, Dmitry Vyukov <dvyukov@google.com>
Cc: syzbot <syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com>,
	"Jiří Pírko" <jiri@resnulli.us>,
	davem@davemloft.net, kuba@kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	pabeni@redhat.com, syzkaller-bugs@googlegroups.com,
	"Cong Wang" <xiyou.wangcong@gmail.com>
Subject: Re: [syzbot] kernel panic: kernel stack overflow
Date: Fri, 21 Oct 2022 20:08:13 +0900	[thread overview]
Message-ID: <8bac748a-3309-b249-c098-f0a86ed7d384@gmail.com> (raw)
In-Reply-To: <f22f16ec-e78b-bf9e-ea43-5232b2403fa1@gmail.com>

Hi,

2022. 10. 14. 오전 12:00에 Taehee Yoo 이(가) 쓴 글:
 > Hi,
 >
 > On 10/12/22 21:19, Eric Dumazet wrote:
 >  > On Wed, Oct 12, 2022 at 12:53 AM Dmitry Vyukov <dvyukov@google.com>
 > wrote:
 >  >>
 >  >> On Wed, 12 Oct 2022 at 09:48, syzbot
 >  >> <syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com> wrote:
 >  >>>
 >  >>> Hello,
 >  >>>
 >  >>> syzbot found the following issue on:
 >  >>>
 >  >>> HEAD commit:    bbed346d5a96 Merge branch 'for-next/core' into
 > for-kernelci
 >  >>> git tree:
 > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 
for-kernelci
 >  >>> console output:
 > https://syzkaller.appspot.com/x/log.txt?x=14a03a2a880000
 >  >>> kernel config:
 > https://syzkaller.appspot.com/x/.config?x=aae2d21e7dd80684
 >  >>> dashboard link:
 > https://syzkaller.appspot.com/bug?extid=60748c96cf5c6df8e581
 >  >>> compiler:       Debian clang version
 > 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld
 > (GNU Binutils for Debian) 2.35.2
 >  >>> userspace arch: arm64
 >  >>>
 >  >>> Unfortunately, I don't have any reproducer for this issue yet.
 >  >>>
 >  >>> Downloadable assets:
 >  >>> disk image:
 > 
https://storage.googleapis.com/syzbot-assets/11078f50b80b/disk-bbed346d.raw.xz 

 >
 >  >>> vmlinux:
 > 
https://storage.googleapis.com/syzbot-assets/398e5f1e6c84/vmlinux-bbed346d.xz 

 >
 >  >>>
 >  >>> IMPORTANT: if you fix the issue, please add the following tag to
 > the commit:
 >  >>> Reported-by: syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com
 >  >>
 >  >> +Jiri
 >  >>
 >  >> It looks like the issue is with the team device. It seems to call
 >  >> itself infinitely.
 >  >> team_device_event was mentioned in stack overflow bugs in the past:
 >  >>
 > 
https://groups.google.com/g/syzkaller-bugs/search?q=%22team_device_event%22
 >  >>
 >  >
 >  >
 >  > Taehee Yoo, can you take a look ?
 >  >
 >  > Patch series of yours was supposed to limit max nest level to 8
 >  >
 >  >
 > 
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=65921376425fc9c8b7ce647e1f7989f7cdf5dd70 

 >
 >  >
 >
 > I found a reproducer.
 >
 > #test_team.sh
 > ip link add dummy0 type dummy
 > ip link set dummy0 up
 > for a1 in {0..1}
 > do
 >          ip link add team$a1 type team
 >          for a2 in {0..1}
 >          do
 >                  ip link add team$a1$a2 master team$a1 type team
 >                  for a3 in {0..1}
 >                  do
 >                          ip link add team$a1$a2$a3 master team$a1$a2
 > type team
 >                          for a4 in {0..1}
 >                          do
 >                                  ip link add team$a1$a2$a3$a4 master
 > team$a1$a2$a3 type team
 >                                  for a5 in {0..1}
 >                                  do
 >                                          ip link add team$a1$a2$a3$a4$a5
 > master team$a1$a2$a3$a4 type team
 >                                          for a6 in {0..1}
 >                                          do
 >                                                  ip link add
 > team$a1$a2$a3$a4$a5$a6 master team$a1$a2$a3$a4$a5 type team
 >                                                  ip link add
 > macvlan$a1$a2$a3$a4$a5$a6 link dummy0 master team$a1$a2$a3$a4$a5$a6 type
 > macvlan
 >                                                  ip link set
 > macvlan$a1$a2$a3$a4$a5$a6 up
 >                                                  ip link set
 > team$a1$a2$a3$a4$a5$a6 up
 >                                          done
 >                                          ip link set 
team$a1$a2$a3$a4$a5 up
 >                                  done
 >                                  ip link set team$a1$a2$a3$a4 up
 >                          done
 >                          ip link set team$a1$a2$a3 up
 >                  done
 >                  ip link set team$a1$a2 up
 >          done
 >          ip link set team$a1 up
 > done
 >
 > #test_ethtool.sh
 > for a1 in {0..1}
 > do
 >          ethtool -K team$a1 lro $1
 >          for a2 in {0..1}
 >          do
 >                  ethtool -K team$a1$a2 lro $1
 >                  for a3 in {0..1}
 >                  do
 >                          ethtool -K team$a1$a2$a3 lro $1
 >                          for a4 in {0..1}
 >                          do
 >                                  ethtool -K team$a1$a2$a3$a4 lro $1
 >                                  for a5 in {0..1}
 >                                  do
 >                                          ethtool -K team$a1$a2$a3$a4$a5
 > lro $1
 >                                          for a6 in {0..1}
 >                                          do
 >                                                  ethtool -K
 > team$a1$a2$a3$a4$a5$a6 lro $1
 >                                                  ethtool -K
 > macvlan$a1$a2$a3$a4$a5$a6 lro $1
 >                                          done
 >                                  done
 >                          done
 >                  done
 >          done
 > done
 >
 > shell#1
 > bash test_team.sh
 > while :
 > do
 > bash test_ethtool.sh on
 > done
 > shell#2
 > while :
 > do
 > bash test_ethtool.sh off
 > done
 >
 > We can see a very similar call trace with the above reproducer.
 > I think it is the same issue.
 > Could you please test it?
 >
 > And, I found the fixed same issue too.
 > 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.0&id=dd912306ff008891c82cd9f63e8181e47a9cb2fb 

 >
 > https://groups.google.com/g/syzkaller-bugs/c/-5OV1OW-dS4/m/o2Oq6AYSAwAJ
 >

I found the root cause of this issue.

This is simpler reproducer.

ip link add team0 type team
ethtool -K team0 lro on
for i in {1..100}
do
         ip link add team$i master team0 type team
         ethtool -K team$i lro on
done

ethtool -K team0 lro off

The above graph is like below:
        team0
          |
   +------+------+-----+-----+
   |      |      |     |     |
team1  team2  team3  ...  team100

int __netdev_update_features(struct net_device *dev)
{
         struct net_device *upper, *lower;
         netdev_features_t features;
         struct list_head *iter;
         int err = -1;
...
sync_lower:
         /* some features must be disabled on lower devices when disabled
          * on an upper device (think: bonding master or bridge)
          */
         netdev_for_each_lower_dev(dev, lower, iter)
                 netdev_sync_lower_features(dev, lower, features);
...


static void netdev_sync_lower_features(struct net_device *upper,
         struct net_device *lower, netdev_features_t features)
{
         netdev_features_t upper_disables = NETIF_F_UPPER_DISABLES;
         netdev_features_t feature;
         int feature_bit;

         for_each_netdev_feature(upper_disables, feature_bit) {
                 feature = __NETIF_F_BIT(feature_bit);
                 if (!(features & feature) && (lower->features & feature)) {
                         netdev_dbg(upper, "Disabling feature %pNF on 
lower dev %s.\n",
                                    &feature, lower->name);
                         lower->wanted_features &= ~feature;
                         __netdev_update_features(lower);

                         if (unlikely(lower->features & feature))
                                 netdev_WARN(upper, "failed to disable 
%pNF on %s!\n",
                                             &feature, lower->name);
                         else
                                 netdev_features_change(lower);<-----HERE
                 }
         }
}

void netdev_features_change(struct net_device *dev)
{
         call_netdevice_notifiers(NETDEV_FEAT_CHANGE, dev);
}

The code looks like an iterator.
But it would work recursively because of notification.

When team0's feature(LRO) is changed with <ethtool -K team0 lro off>", 
__netdev_update_features(team0) is called.
__netdev_update_features(team0) internally sends NETDEV_FEAT_CHANGE 
event to all lower interfaces(team1, team2, ... team100).
team1 will receive NETDEV_FEAT_CHANGE, and it sends NETDEV_FEAT_CHANGE 
to the upper interface(team0).
team0 will receive NETDEV_FEAT_CHANGE again, and it sends 
NETDEV_FEAT_CHANGE to the all lower interfaces(team1, team2, ... team100).
(At this point, team1 flag was already set, so it will be skipped.)
team2 will receive NETDEV_FEAT_CHANGE, and it sends NETDEV_FEAT_CHANGE 
to the upper interface(team0).
team0 will receive NETDEV_FEAT_CHANGE again again, and it sends 
NETDEV_FEAT_CHANGE to the all lower interfaces(team1, team2, ... team100).
(team1, team2 skipped.)
...
So, if there are a few lower interfaces(roughly under 30 lower 
interfaces), it anyway works even if internally works recursively.
But so many lower interfaces exist, stack overflow will occur.
This is the root cause of this issue.

I think synchronization direction should be one way.
Up or Down.
It means that if the team0 interface can send the NETDEV_FEAT_CHANGE 
notification event to the lower interface,
the lower interfaces should be disallowed to send NETDEV_FEAT_CHANGE 
event to the upper interface.

bonding has same issue.

  parent reply	other threads:[~2022-10-21 11:08 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-12  7:48 [syzbot] kernel panic: kernel stack overflow syzbot
2022-10-12  7:53 ` Dmitry Vyukov
2022-10-12 12:19   ` Eric Dumazet
2022-10-12 13:31     ` Taehee Yoo
2022-10-13 15:00     ` Taehee Yoo
2022-10-13 16:04       ` Eric Dumazet
2022-10-21 11:08       ` Taehee Yoo [this message]
2023-04-07  7:22         ` wangyufen
2023-04-07  8:17           ` Taehee Yoo
2022-10-12 13:11   ` Jiri Pirko
2022-10-12 13:54     ` Dmitry Vyukov
2022-10-12 15:08       ` Jiri Pirko
2022-10-12 16:42         ` Eric Dumazet
2022-10-13  7:11           ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8bac748a-3309-b249-c098-f0a86ed7d384@gmail.com \
    --to=ap420073@gmail.com \
    --cc=davem@davemloft.net \
    --cc=dvyukov@google.com \
    --cc=edumazet@google.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=syzbot+60748c96cf5c6df8e581@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).