From: "Václav Zindulka" <vaclav.zindulka@tlapnet.cz>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Linux Kernel Network Developers <netdev@vger.kernel.org>
Subject: Re: iproute2: tc deletion freezes whole server
Date: Fri, 27 Mar 2020 11:35:56 +0100	[thread overview]
Message-ID: <CANxWus8YFkWPELmau=tbTXYa8ezyMsC5M+vLrNPoqbOcrLo0Cg@mail.gmail.com> (raw)
In-Reply-To: <CAM_iQpWLK8ZKShdsWNQrbhFa2B9V8e+OSNRQ_06zyNmDToq5ew@mail.gmail.com>

On Thu, Mar 26, 2020 at 6:38 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
>Are the stack traces captured by perf any different with unpatched?

I've compared the initial perf reports. There is a difference in the call
stack between the sfp+ interfaces and the ifb and metallic interfaces:
while the ifb and metallic interfaces behave normally, the call stack on
the sfp+ interfaces is different. I've added a perf record of the eno1
metallic interface here
https://github.com/zvalcav/tc-kernel/tree/master/20200325/Intel-82599ES
so you can compare the behavior of all of them. I've also added a perf
report of the patched kernel.

--100.00%--entry_SYSCALL_64_after_hwframe
          do_syscall_64
          __sys_sendmsg
          ___sys_sendmsg
          ____sys_sendmsg
          sock_sendmsg
          netlink_sendmsg
          netlink_unicast
          netlink_rcv_skb                            <-- call stack is similar to other interfaces
          |
           --99.98%--rtnetlink_rcv_msg
                     tc_get_qdisc
                     qdisc_graft                     <-- call stack splits in this function - dev_deactivate is called only on sfp+ interfaces
                     |
                     |--98.36%--dev_deactivate       <-- 98% calls of fq_codel_reset() function are done here. These must be the excessive calls - see below.
                     |          dev_deactivate_many
                     |          |
                     |          |--49.28%--dev_deactivate_queue.constprop.57
                     |          |          |
                     |          |           --49.27%--qdisc_reset
                     |          |                     hfsc_reset_qdisc
                     |          |                     |
                     |          |                      --48.69%--qdisc_reset
                     |          |                                |
                     |          |                                 --47.23%--fq_codel_reset     <-- half of excessive function calls come from here
                     |          |                                           |
                     |          |                                           |--3.20%--codel_vars_init
                     |          |                                           |
                     |          |                                           |--1.64%--rtnl_kfree_skbs
                     |          |                                           |
                     |          |                                            --0.81%--memset_erms
                     |          |
                     |           --49.08%--qdisc_reset
                     |                     hfsc_reset_qdisc
                     |                     |
                     |                      --48.53%--qdisc_reset
                     |                                |
                     |                                 --47.09%--fq_codel_reset                <-- other half of excessive function calls come from here
                     |                                           |
                     |                                           |--2.90%--codel_vars_init
                     |                                           |
                     |                                           |--1.61%--rtnl_kfree_skbs
                     |                                           |
                     |                                            --0.82%--memset_erms
                     |
                      --1.62%--qdisc_destroy         <-- here are remaining, (I suppose) regular calls. Call stack is similar to other interfaces here.
                                |
                                |--0.86%--hfsc_destroy_qdisc
                                |          |
                                |           --0.82%--hfsc_destroy_class
                                |                     |
                                |                      --0.81%--qdisc_destroy
                                |                                |
                                |                                 --0.72%--fq_codel_reset
                                |
                                 --0.77%--hfsc_reset_qdisc
                                           |
                                            --0.75%--qdisc_reset
                                                      |
                                                       --0.70%--fq_codel_reset
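
To make the multiplication behind those percentages easier to see, here
is a rough user-space model of the cascade. This is not kernel code, and
the TX_QUEUES and HFSC_CLASSES numbers are made up just for illustration;
only the 1024-flow inner loop matches what I see in the printks below.
The structure follows the perf graph: every deactivated TX queue walks
the HFSC class tree, and every class reset walks every fq_codel flow.

/* model.c - rough user-space model of the reset cascade, not kernel code.
 * TX_QUEUES and HFSC_CLASSES are hypothetical numbers for illustration;
 * FQ_CODEL_FLOWS matches the "passes: 1024" seen in the printks below. */
#include <stdio.h>

#define TX_QUEUES      64
#define HFSC_CLASSES   2000
#define FQ_CODEL_FLOWS 1024

static unsigned long reset_calls;
static unsigned long flow_passes;

static void fq_codel_reset_model(void)
{
	/* fq_codel_reset() walks every flow even when the qdisc is empty */
	reset_calls++;
	for (int i = 0; i < FQ_CODEL_FLOWS; i++)
		flow_passes++;
}

static void hfsc_reset_qdisc_model(void)
{
	/* hfsc_reset_qdisc() resets the leaf qdisc of every class */
	for (int cl = 0; cl < HFSC_CLASSES; cl++)
		fq_codel_reset_model();
}

static void dev_deactivate_model(void)
{
	/* dev_deactivate_many() walks every TX queue of the device */
	for (int q = 0; q < TX_QUEUES; q++)
		hfsc_reset_qdisc_model();
}

int main(void)
{
	dev_deactivate_model();
	printf("fq_codel_reset calls: %lu, flow passes: %lu\n",
	       reset_calls, flow_passes);
	return 0;
}

Even with made-up numbers like these the call count explodes; in my setup
it reaches the ~1.7 million fq_codel_reset() calls counted below.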

> On Thu, Mar 26, 2020 at 10:07 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Thu, Mar 26, 2020 at 7:24 AM Václav Zindulka
> > <vaclav.zindulka@tlapnet.cz> wrote:
> > >
> > > > On Wed, Mar 25, 2020 at 6:43 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > > > > Are you able to test an experimental patch attached in this email?
> > > >
> > > > Sure. I'll compile new kernel tomorrow. Thank you for quick patch.
> > > > I'll let you know as soon as I have anything.
> > >
> > > I've compiled kernel with the patch you provided and tested it.
> > > However the problem is not solved. It behaves exactly the same way.
> > > I'm trying to put some printk into the fq_codel_reset() to test it
> > > more.
> >
> > Are the stack traces captured by perf any different with unpatched?
>
> Wait, it seems my assumption of refcnt==0 is wrong even for deletion,
> in qdisc_graft() the last refcnt is put after dev_deactivate()...

Your assumption is not totally wrong. I have added some printks into the
fq_codel_reset() function. The final passes during deletion go through
the if condition you added in the patch - 13706 of them. But the rest,
and the vast majority - 1768074 - go through the regular routine. 1024 is
the value of i in the for loop.
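
For reference, this is roughly what the instrumentation looks like. It is
a paraphrase, not the exact diff I applied, and I'm not reproducing the
condition from your experimental patch, so the early-return branch is
only indicated by a comment:

/* Paraphrased instrumentation in net/sched/sch_fq_codel.c, not the exact
 * diff I applied. The "patch exit" counter sits in the branch added by
 * the experimental patch; its condition is omitted here. */
static void fq_codel_reset(struct Qdisc *sch)
{
	struct fq_codel_sched_data *q = qdisc_priv(sch);
	int i;

	/* ... the early return added by the experimental patch prints
	 * "Function fq_codel_reset - patch exit - passes: ..." here ... */

	for (i = 0; i < q->flows_cnt; i++) {
		/* existing per-flow work: drop queued skbs, codel_vars_init() */
	}

	printk(KERN_INFO "Function fq_codel_reset - regular exit - passes: %d\n", i);
}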

But the most significant problem is that fq_codel_reset() is called far
too many times. According to kern.log it was being called about 77,000
times per second for almost 24 seconds, which adds up to the over 1.7
million calls counted during the deletion of the rules.

Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791176] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791184] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791192] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791199] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791207] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791214] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791221] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791228] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791236] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791243] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791251] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791258] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791266] Function fq_codel_reset - regular exit - passes: 1024
Mar 26 15:26:23 shaperd-prelouc1-dev kernel: [  110.791273] Function fq_codel_reset - regular exit - passes: 1024

And there are another 1.7 million lines like these... The function
doesn't take much time according to the timestamps, but it is called far
too many times.

Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017811] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017823] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017836] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017849] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017862] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017874] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 15:26:46 shaperd-prelouc1-dev kernel: [  134.017886] Function fq_codel_reset - patch exit - passes: 1024
Mar 26 21:05:54 shaperd-prelouc1-dev kernel: [  134.017899] Function fq_codel_reset - patch exit - passes: 1024

>
> Let me think about how we could distinguish this case from other
> reset cases.

As shown above, the problem is most probably the excessive number of
calls to fq_codel_reset(). I'm not surprised it freezes the whole server
or the network adapter.

Thank you

Thread overview: 32+ messages
2020-03-22 18:06 iproute2: tc deletion freezes whole server Václav Zindulka
2020-03-23 18:17 ` Cong Wang
2020-03-24 16:27   ` Václav Zindulka
2020-03-24 22:57     ` Cong Wang
2020-03-25 11:27       ` Václav Zindulka
2020-03-25 13:33         ` Václav Zindulka
2020-03-25 17:43         ` Cong Wang
2020-03-25 18:23           ` Václav Zindulka
2020-03-26 14:24             ` Václav Zindulka
2020-03-26 17:07               ` Cong Wang
2020-03-26 17:38                 ` Cong Wang
2020-03-27 10:35                   ` Václav Zindulka [this message]
2020-03-28 13:04                     ` Václav Zindulka
2020-03-31  5:59                       ` Cong Wang
2020-03-31 12:46                         ` Václav Zindulka
2020-04-08 20:18                           ` Cong Wang
2020-04-12 20:17                             ` Václav Zindulka
2020-04-13 17:28                               ` Cong Wang
2020-04-15 15:01                                 ` Václav Zindulka
2020-04-30 12:40                                   ` Václav Zindulka
2020-05-04 17:46                                     ` Cong Wang
2020-05-04 20:36                                       ` Cong Wang
2020-05-05  8:46                                         ` Václav Zindulka
2020-05-07 18:52                                           ` Cong Wang
2020-05-08 13:59                                             ` Václav Zindulka
2020-05-17 19:35                                               ` Cong Wang
2020-05-18 14:16                                                 ` Václav Zindulka
2020-05-18 18:22                                                   ` Cong Wang
2020-05-19  8:04                                                     ` Václav Zindulka
2020-05-19 17:56                                                       ` Cong Wang
2020-05-24 10:03                                                         ` Václav Zindulka
2020-05-26 18:37                                                           ` Cong Wang
