From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: [net-next PATCH V1 1/3] net: bulk alloc and reuse of SKBs in NAPI context
Date: Tue, 10 May 2016 08:44:13 -0700
Message-ID: <1462895053.23934.86.camel@edumazet-glaptop3.roam.corp.google.com>
References: <20160509134352.3573.37044.stgit@firesoul>
 <20160509134429.3573.4048.stgit@firesoul>
 <20160509215956.19ec1c10@redhat.com>
 <20160510143017.212c3846@redhat.com>
 <1462888134.23934.60.camel@edumazet-glaptop3.roam.corp.google.com>
 <20160510164857.72c8a1cb@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Alexander Duyck, Netdev, "David S. Miller", Saeed Mahameed,
 Or Gerlitz, Eugenia Emantayev
To: Jesper Dangaard Brouer
Return-path:
Received: from mail-pf0-f170.google.com ([209.85.192.170]:35279 "EHLO
 mail-pf0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1751330AbcEJPoQ (ORCPT ); Tue, 10 May 2016 11:44:16 -0400
Received: by mail-pf0-f170.google.com with SMTP id 77so7145007pfv.2
 for ; Tue, 10 May 2016 08:44:15 -0700 (PDT)
In-Reply-To: <20160510164857.72c8a1cb@redhat.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Tue, 2016-05-10 at 16:48 +0200, Jesper Dangaard Brouer wrote:
> On Tue, 10 May 2016 06:48:54 -0700
> Eric Dumazet wrote:
>
> > On Tue, 2016-05-10 at 14:30 +0200, Jesper Dangaard Brouer wrote:
> >
> > > Disable busy poll on both client and server, Not patched:
> > >
> > > $ netperf -H 198.18.40.2 -t TCP_RR -l 60 -T 6,6 -Cc
> > > MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 port 0 AF_INET to 198.18.40.2
> > > () port 0 AF_INET : histogram : demo : first burst 0 : cpu bind
> > > Local /Remote
> > > Socket Size   Request Resp.  Elapsed Trans.    CPU    CPU    S.dem   S.dem
> > > Send   Recv   Size    Size   Time    Rate      local  remote local   remote
> > > bytes  bytes  bytes   bytes  secs.   per sec   % S    % S    us/Tr   us/Tr
> > >
> > > 16384  87380  1       1      60.00   78077.55  3.74   2.69   3.830   8.265
> > > 16384  87380
> >
> > Tell us more about the -T6,6
> >
> > For example how many TX/RX queues you have on the NIC, and which cpus
> > service interrupts.
>
> The -T6,6 option:
>  -T lcpu,rcpu      Request netperf/netserver be bound to local/remote cpu
>

Sure, I know -T option in netperf.

> I use the option to get more stable results.  If I don't pin/bind the
> CPU netperf/netserver is running on then the CPU scheduler will migrate
> the processes around.  This gives unpredictable results, worst for the
> busy_poll tests.  Especially if the RX softirq runs on the same CPU
> (also true if it runs on a HyperThread sibling).
>
> Netperf client (8 cores i7-4790K CPU @ 4.00GHz) RX:8 and TX:8 queues.
> Netserver server (2x 12 cores E5-2630 @ 2.30GHz) RX:8 and TX:24 queues.
> Driver mlx4.
> Disabled GRO to hit code path I changed in patch 2.

But are you using stuff like aRFS, RPS, RFS?

Each netperf run lands on different cpus, and we know that results can
have a 25% variability because of that, even more on 2-node systems.

By forcing -T6,6 you force the netperf/netserver cpu, not the RX queues.

A nice effort would be to be able to choose the source in the 4-tuple at
connect() time so that we know that Toeplitz hash will select the
'correct' RX queue.
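
The idea in the last paragraph can be sketched roughly as follows. This is
only an illustration, not what netperf or the mlx4 driver does. It assumes
the NIC hashes the TCP/IPv4 4-tuple with a standard Toeplitz function and
maps the hash to a queue through a round-robin indirection table. The key
below is the well-known Microsoft RSS verification key, not the key of any
real NIC. The client address 198.18.40.1, the destination port 12865 and
the helper names are all hypothetical.

```python
import ipaddress
import struct

# Assumption: Microsoft RSS verification key, used here as a stand-in.
# A real NIC has its own key, which must be read from the device.
RSS_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """For every set input bit, XOR in the 32-bit key window at that bit offset."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result


def rss_hash_tcp4(src_ip, src_port, dst_ip, dst_port, key=RSS_KEY):
    """Hash input order: source address, destination address, source port, destination port."""
    data = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )
    return toeplitz_hash(key, data)


def rx_queue(hash_value, n_queues, table_size=128):
    """Assumption: indirection table of table_size entries filled round-robin."""
    return (hash_value % table_size) % n_queues


def find_source_port(src_ip, dst_ip, dst_port, want_queue, n_queues,
                     ports=range(32768, 61000)):
    """Return the first source port whose 4-tuple hashes to want_queue, else None."""
    for port in ports:
        h = rss_hash_tcp4(src_ip, port, dst_ip, dst_port)
        if rx_queue(h, n_queues) == want_queue:
            return port
    return None


if __name__ == "__main__":
    # Hypothetical flow towards the netserver host, aiming at RX queue 6 of 8.
    port = find_source_port("198.18.40.1", "198.18.40.2", 12865,
                            want_queue=6, n_queues=8)
    print("candidate source port:", port)
```

This only predicts the RX queue on the receiving host for packets carrying
that exact 4-tuple. Replies travel with source and destination swapped, so
the client-side queue needs a second lookup with the reversed tuple. In
practice the key and indirection table would have to be read from the NIC
(for example with `ethtool -x`), and the client would bind() to the chosen
port before calling connect().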