From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [RFC] net: remove busylock
Date: Thu, 19 May 2016 11:56:30 -0700
Message-ID: <1463684190.18194.228.camel@edumazet-glaptop3.roam.corp.google.com>
References: <1463677716.18194.203.camel@edumazet-glaptop3.roam.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: netdev, Alexander Duyck
To: Alexander Duyck
Sender: netdev-owner@vger.kernel.org

On Thu, 2016-05-19 at 11:03 -0700, Alexander Duyck wrote:
> On Thu, May 19, 2016 at 10:08 AM, Eric Dumazet wrote:
> > busylock was added at the time we had expensive ticket spinlocks
> >
> > (commit 79640a4ca6955e3ebdb7038508fa7a0cd7fa5527 ("net: add additional
> > lock to qdisc to increase throughput"))
> >
> > Now that kernel spinlocks are MCS, this busylock thing is no longer
> > relevant. It is slowing things down a bit.
> >
> > With HTB qdisc, here are the numbers for 200 concurrent TCP_RR, on a
> > host with 48 hyperthreads.
> >
> > lpaa5:~# sar -n DEV 4 4 | grep eth0
> > 10:05:44   eth0   798951.25   798951.75   52276.22   52275.26   0.00   0.00   0.50
> > 10:05:48   eth0   798576.00   798572.75   52251.24   52250.39   0.00   0.00   0.75
> > 10:05:52   eth0   798746.00   798748.75   52262.89   52262.13   0.00   0.00   0.50
> > 10:05:56   eth0   798303.25   798291.50   52235.22   52233.10   0.00   0.00   0.50
> > Average:   eth0   798644.12   798641.19   52256.39   52255.22   0.00   0.00   0.56
> >
> > Disabling busylock (by using a local sysctl):
> >
> > lpaa5:~# sar -n DEV 4 4 | grep eth0
> > 10:05:14   eth0   864085.75   864091.50   56538.09   56537.46   0.00   0.00   0.50
> > 10:05:18   eth0   864734.75   864729.25   56580.35   56579.05   0.00   0.00   0.75
> > 10:05:22   eth0   864366.00   864361.50   56556.74   56555.00   0.00   0.00   0.50
> > 10:05:26   eth0   864246.50   864248.75   56549.19   56547.65   0.00   0.00   0.50
> > Average:   eth0   864358.25   864357.75   56556.09   56554.79   0.00   0.00   0.56
> >
> > That would be an 8 % increase.
>
> The main point of the busy lock is to deal with the bulk throughput
> case, not the latency case, which is relatively well behaved. The
> problem wasn't really lock bouncing slowing things down; it was the
> lack of fairness between the threads that was killing us, because the
> dequeue needs to have priority.
>
> The main problem the busy lock solved was that you could start a
> number of stream tests equal to the number of CPUs in a given system,
> and performance would drop off a cliff: you would drop almost all the
> packets for almost all the streams, because the qdisc never had a
> chance to drain. It would be CPU - 1 enqueues, followed by 1 dequeue.
>
> What we need, if we are going to get rid of the busy lock, is some
> sort of priority locking mechanism that would allow the dequeue thread
> to jump to the head of the line when it attempts to take the lock.
> Otherwise you end up spending all your time enqueuing packets into
> oblivion, because the qdiscs just overflow without the busy lock in
> place.

Removing busylock helped in all cases I tested (at least on x86, as
David pointed out).

As I said, we need to revisit busylock now that spinlocks are different.

In one case (20 concurrent UDP netperf), I even got a 500 % increase.

With busylock:

lpaa5:~# sar -n DEV 4 4 | grep eth0
11:33:34   eth0   9.00    115057.00   1.60   38426.92   0.00   0.00   0.50
11:33:38   eth0   13.50   113237.75   2.04   37819.69   0.00   0.00   0.75
11:33:42   eth0   13.50   111492.25   1.76   37236.58   0.00   0.00   0.75
11:33:46   eth0   12.75   111401.50   2.40   37205.93   0.00   0.00   0.75
Average:   eth0   12.19   112797.12   1.95   37672.28   0.00   0.00   0.69

Packets are dropped in HTB because we hit a limit of 1000 packets there:

- 100.00% netperf [kernel.kallsyms] [k] kfree_skb
   - kfree_skb
      - 100.00% htb_enqueue
           __dev_queue_xmit
           dev_queue_xmit
           ip_finish_output2
           ip_finish_output
           ip_output
           ip_local_out
         + ip_send_skb

Presumably it would help tremendously if the actual kfree_skb() was
done after the qdisc lock is released, i.e. not from the
qdisc->enqueue() method.

Without busylock:

lpaa5:~# sar -n DEV 4 4 | grep eth0
11:41:12   eth0   11.00   669053.50   1.99   223452.30   0.00   0.00   0.75
11:41:16   eth0   8.50    669513.25   2.27   223605.55   0.00   0.00   0.75
11:41:20   eth0   3.50    669426.50   0.90   223577.19   0.00   0.00   0.50
11:41:24   eth0   8.25    669284.00   1.42   223529.79   0.00   0.00   0.50
Average:   eth0   7.81    669319.31   1.65   223541.21   0.00   0.00   0.62