From: Eric Dumazet <eric.dumazet@gmail.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: netdev <netdev@vger.kernel.org>, Alexander Duyck <aduyck@mirantis.com>
Subject: Re: [RFC] net: remove busylock
Date: Thu, 19 May 2016 11:56:30 -0700
Message-ID: <1463684190.18194.228.camel@edumazet-glaptop3.roam.corp.google.com>
In-Reply-To: <CAKgT0UfBxW=KpqJux+tjyNpHQUHhZ5Laiqnt5FPs=jpkBJWrHA@mail.gmail.com>

On Thu, 2016-05-19 at 11:03 -0700, Alexander Duyck wrote:
> On Thu, May 19, 2016 at 10:08 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > busylock was added at the time we had expensive ticket spinlocks
> >
> > (commit 79640a4ca6955e3ebdb7038508fa7a0cd7fa5527 ("net: add additional
> > lock to qdisc to increase throughput"))
> >
> > Now that kernel spinlocks are MCS, this busylock thing is no longer
> > relevant. It is slowing things down a bit.
> >
> >
> > With HTB qdisc, here are the numbers for 200 concurrent TCP_RR, on a host with 48 hyperthreads.
> >
> > lpaa5:~# sar -n DEV 4 4 |grep eth0
> > 10:05:44         eth0 798951.25 798951.75  52276.22  52275.26      0.00      0.00      0.50
> > 10:05:48         eth0 798576.00 798572.75  52251.24  52250.39      0.00      0.00      0.75
> > 10:05:52         eth0 798746.00 798748.75  52262.89  52262.13      0.00      0.00      0.50
> > 10:05:56         eth0 798303.25 798291.50  52235.22  52233.10      0.00      0.00      0.50
> > Average:         eth0 798644.12 798641.19  52256.39  52255.22      0.00      0.00      0.56
> >
> > Disabling busylock (by using a local sysctl)
> >
> > lpaa5:~# sar -n DEV 4 4 |grep eth0
> > 10:05:14         eth0 864085.75 864091.50  56538.09  56537.46      0.00      0.00      0.50
> > 10:05:18         eth0 864734.75 864729.25  56580.35  56579.05      0.00      0.00      0.75
> > 10:05:22         eth0 864366.00 864361.50  56556.74  56555.00      0.00      0.00      0.50
> > 10:05:26         eth0 864246.50 864248.75  56549.19  56547.65      0.00      0.00      0.50
> > Average:         eth0 864358.25 864357.75  56556.09  56554.79      0.00      0.00      0.56
> >
> > That would be an 8% increase.
> 
> The main point of the busy lock is to deal with the bulk throughput
> case, not the latency case which would be relatively well behaved.
> The problem wasn't really related to lock bouncing slowing things
> down.  It was the fairness between the threads that was killing us
> because the dequeue needs to have priority.



> 
> The main problem that the busy lock solved was the fact that you could
> start a number of stream tests equal to the number of CPUs in a given
> system and the result was that the performance would drop off a cliff
> and you would drop almost all the packets for almost all the streams
> because the qdisc never had a chance to drain because it would be CPU
> - 1 enqueues, followed by 1 dequeue.
> 
> What we need if we are going to get rid of busy lock would be some
> sort of priority locking mechanism that would allow the dequeue thread
> to jump to the head of the line if it is attempting to take the lock.
> Otherwise you end up spending all your time enqueuing packets into
> oblivion because the qdiscs just overflow without the busy lock in
> place.


Removing busylock helped in all cases I tested (at least on x86, as
David pointed out).

As I said, we need to revisit busylock now that spinlocks are different.
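
For readers who do not have the code in mind, here is a rough userspace
sketch of the busylock idea as described in this thread: enqueuers that
see the qdisc busy first serialize on a second lock, so the dequeuing
thread competes with at most one of them for the main lock. This is not
the kernel code. pthread mutexes stand in for spinlocks, and every name
here (toy_qdisc, toy_enqueue, toy_dequeue_all) is made up for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a qdisc; only counts packets. */
struct toy_qdisc {
    pthread_mutex_t root_lock; /* protects qlen */
    pthread_mutex_t busylock;  /* taken only by contended enqueuers */
    atomic_bool running;       /* true while a thread is dequeuing */
    int qlen;
    int limit;
};

static void toy_qdisc_init(struct toy_qdisc *q, int limit)
{
    assert(limit > 0);
    pthread_mutex_init(&q->root_lock, NULL);
    pthread_mutex_init(&q->busylock, NULL);
    atomic_init(&q->running, false);
    q->qlen = 0;
    q->limit = limit;
}

/* Returns true if the packet was queued, false if it was dropped. */
static bool toy_enqueue(struct toy_qdisc *q)
{
    /* If someone is already dequeuing, queue up on busylock first. */
    bool contended = atomic_load(&q->running);
    bool queued = false;

    if (contended)
        pthread_mutex_lock(&q->busylock);
    pthread_mutex_lock(&q->root_lock);
    if (q->qlen < q->limit) {
        q->qlen++;
        queued = true;
    }
    pthread_mutex_unlock(&q->root_lock);
    if (contended)
        pthread_mutex_unlock(&q->busylock);
    return queued;
}

/* The dequeuer never touches busylock, only root_lock. */
static int toy_dequeue_all(struct toy_qdisc *q)
{
    int sent;

    atomic_store(&q->running, true);
    pthread_mutex_lock(&q->root_lock);
    sent = q->qlen;
    q->qlen = 0;
    pthread_mutex_unlock(&q->root_lock);
    atomic_store(&q->running, false);
    return sent;
}
```

The question in this thread is whether that extra lock still pays for
itself now that the main lock is a queued lock.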

In one case (20 concurrent UDP netperf streams), I even got a roughly
500% increase in transmitted packets per second.

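With busylock (sar columns after the interface name are rxpck/s, txpck/s,
rxkB/s, txkB/s, rxcmp/s, txcmp/s, rxmcst/s; grep filters out the header):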

lpaa5:~# sar -n DEV 4 4|grep eth0
11:33:34         eth0      9.00 115057.00      1.60  38426.92      0.00      0.00      0.50
11:33:38         eth0     13.50 113237.75      2.04  37819.69      0.00      0.00      0.75
11:33:42         eth0     13.50 111492.25      1.76  37236.58      0.00      0.00      0.75
11:33:46         eth0     12.75 111401.50      2.40  37205.93      0.00      0.00      0.75
Average:         eth0     12.19 112797.12      1.95  37672.28      0.00      0.00      0.69

Packets are dropped in HTB because we hit the limit of 1000 packets there:

- 100.00%  netperf  [kernel.kallsyms]  [k] kfree_skb
   - kfree_skb
      -  100.00% htb_enqueue
            __dev_queue_xmit
            dev_queue_xmit
            ip_finish_output2
            ip_finish_output
            ip_output
            ip_local_out
         +  ip_send_skb


Presumably it would help tremendously if the actual kfree_skb()
was done after the qdisc lock is released, i.e. not from the
qdisc->enqueue() method.
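
To make the idea concrete, here is a minimal userspace sketch, not a
patch: the enqueue path chains a dropped packet onto a caller-supplied
list instead of freeing it, and the caller frees that list only after
dropping the lock. A pthread mutex stands in for the qdisc lock, and all
names (toy_pkt, toy_queue, toy_enqueue_locked, toy_xmit, the to_free
parameter) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct toy_pkt {
    struct toy_pkt *next;
};

struct toy_queue {
    pthread_mutex_t lock;
    struct toy_pkt *head, *tail;
    int qlen, limit;
};

static void toy_queue_init(struct toy_queue *q, int limit)
{
    assert(limit > 0);
    pthread_mutex_init(&q->lock, NULL);
    q->head = q->tail = NULL;
    q->qlen = 0;
    q->limit = limit;
}

/*
 * Called with q->lock held. On overflow the packet is not freed here;
 * it is pushed onto *to_free so the caller can free it later.
 */
static int toy_enqueue_locked(struct toy_queue *q, struct toy_pkt *p,
                              struct toy_pkt **to_free)
{
    if (q->qlen >= q->limit) {
        p->next = *to_free;
        *to_free = p;
        return -1;
    }
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
    q->qlen++;
    return 0;
}

/* Returns how many packets were freed, all of them outside the lock. */
static int toy_xmit(struct toy_queue *q, struct toy_pkt *p)
{
    struct toy_pkt *to_free = NULL;
    int freed = 0;

    pthread_mutex_lock(&q->lock);
    toy_enqueue_locked(q, p, &to_free);
    pthread_mutex_unlock(&q->lock);

    /* The expensive part happens here, with the lock already released. */
    while (to_free) {
        struct toy_pkt *next = to_free->next;

        free(to_free);
        to_free = next;
        freed++;
    }
    return freed;
}
```

The point of the split is that the lock hold time on the drop path
shrinks to a couple of pointer writes, whatever the cost of the free.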


Without busylock:

lpaa5:~# sar -n DEV 4 4|grep eth0
11:41:12         eth0     11.00 669053.50      1.99 223452.30      0.00      0.00      0.75
11:41:16         eth0      8.50 669513.25      2.27 223605.55      0.00      0.00      0.75
11:41:20         eth0      3.50 669426.50      0.90 223577.19      0.00      0.00      0.50
11:41:24         eth0      8.25 669284.00      1.42 223529.79      0.00      0.00      0.50
Average:         eth0      7.81 669319.31      1.65 223541.21      0.00      0.00      0.62
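
For anyone who wants to check the percentages, a small helper that
computes the relative change from the sar "Average:" lines. It assumes
the second numeric column is txpck/s, as in standard sar -n DEV output;
pct_increase and print_gains are names made up for this note.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from 'before' to 'after', in percent. */
static double pct_increase(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Average txpck/s with busylock vs. without, for both tests. */
static void print_gains(void)
{
    printf("TCP_RR: %+.1f%%\n", pct_increase(798641.19, 864357.75));
    printf("UDP:    %+.1f%%\n", pct_increase(112797.12, 669319.31));
}
```

This gives about 8.2% for the 200 TCP_RR flows and about 493% for the
20 UDP flows, i.e. close to 6 times the packet rate.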
