From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: Netperf UDP issue with connected sockets
Date: Mon, 21 Nov 2016 10:10:57 -0800
Message-ID: <1479751857.8455.419.camel@edumazet-glaptop3.roam.corp.google.com>
References: <20140903165943.372b897b@redhat.com>
	<1409757426.26422.41.camel@edumazet-glaptop2.roam.corp.google.com>
	<20161116131609.4e5726b4@redhat.com>
	<7c4b43a4-74bf-1ee2-6f0d-17783b5d8fcb@hpe.com>
	<20161116234022.2bad179b@redhat.com>
	<1479342849.8455.233.camel@edumazet-glaptop3.roam.corp.google.com>
	<20161117091638.5fab8494@redhat.com>
	<1479388850.8455.240.camel@edumazet-glaptop3.roam.corp.google.com>
	<20161117144248.23500001@redhat.com>
	<1479392258.8455.249.camel@edumazet-glaptop3.roam.corp.google.com>
	<20161117155753.17b76f5a@redhat.com>
	<1479399679.8455.255.camel@edumazet-glaptop3.roam.corp.google.com>
	<20161117193021.580589ae@redhat.com>
	<1479408683.8455.273.camel@edumazet-glaptop3.roam.corp.google.com>
	<20161121170351.50a09ee1@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Rick Jones, netdev@vger.kernel.org, Saeed Mahameed, Tariq Toukan
To: Jesper Dangaard Brouer
Return-path:
Received: from mail-pf0-f195.google.com ([209.85.192.195]:33481 "EHLO
	mail-pf0-f195.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752299AbcKUSLA (ORCPT );
	Mon, 21 Nov 2016 13:11:00 -0500
Received: by mail-pf0-f195.google.com with SMTP id 144so18723647pfv.0
	for ; Mon, 21 Nov 2016 10:10:59 -0800 (PST)
In-Reply-To: <20161121170351.50a09ee1@redhat.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Mon, 2016-11-21 at 17:03 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 17 Nov 2016 10:51:23 -0800
> Eric Dumazet wrote:
>
> > On Thu, 2016-11-17 at 19:30 +0100, Jesper Dangaard Brouer wrote:
> >
> > > The point is I can see a socket Send-Q forming, thus we do know the
> > > application has something to send, and thus a possibility for
> > > non-opportunistic bulking. Allowing/implementing bulk enqueue from
> > > the socket layer into the qdisc layer should be fairly simple (and
> > > the rest of xmit_more is already in place).
> >
> > As I said, you are fooled by TX completions.
>
> Obviously TX completions play a role, yes, and I bet I can adjust the
> TX completion to cause xmit_more to happen, at the expense of
> introducing added latency.
>
> The point is that the "bloated" spinlock in __dev_queue_xmit is still
> caused by the MMIO tailptr/doorbell. The added cost occurs when
> enqueueing packets, and results in the inability to get enough packets
> into the qdisc to keep xmit_more going (on my system). I argue that a
> bulk enqueue API would allow us to get past the hurdle of
> transitioning into xmit_more mode more easily.

This is very nice, but we already have bulk enqueue, it is called
xmit_more.

The kernel does not know that your application is sending another
packet right after the one you just sent.

xmit_more is therefore not often used: applications/stacks send many
small packets one at a time, the qdisc stays empty (one enqueued packet
is immediately dequeued, so skb->xmit_more is 0), and is often even
bypassed entirely (TCQ_F_CAN_BYPASS).

Not sure if this has been tried before, but the doorbell avoidance
could be done by the driver itself, because it knows a TX completion
will come shortly (well... if softirqs are not delayed too much!)
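For reference, today a driver only honors the per-skb hint; roughly
like this in a 4.x kernel, where the hint lives in skb->xmit_more (the
foo_* names are made up for illustration, this is not any real
driver's code):

static netdev_tx_t foo_start_xmit(struct sk_buff *skb,
				  struct net_device *dev)
{
	struct foo_ring *ring = foo_tx_ring(dev, skb);

	foo_post_descriptor(ring, skb);		/* fill a TX descriptor */

	/* Ring the doorbell only when the stack says nothing more is
	 * coming right behind this packet. With an empty or bypassed
	 * qdisc (TCQ_F_CAN_BYPASS), xmit_more is 0 for every packet,
	 * so we pay one MMIO write per packet.
	 */
	if (!skb->xmit_more)
		foo_write_doorbell(ring);

	return NETDEV_TX_OK;
}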
Doorbell would be forced only if:

    ("skb->xmit_more is not set" AND "TX engine is not 'started' yet")
    OR
    (too many [1] packets were put in the TX ring buffer, no point in
     deferring more)

Start the pump, but once it is started, let the doorbells be done by
TX completion.

ndo_start_xmit and the TX completion handler would have to maintain a
shared state describing whether packets were posted but the doorbell
was deferred; see the sketch below.

Note that "TX completion" here means "at least one packet was
drained"; otherwise busy polling, constantly calling napi->poll(),
would force a doorbell too soon for devices sharing a NAPI for both RX
and TX.

But then, maybe busy poll would like to force a doorbell...

I could try these ideas on mlx4 shortly.

[1] The limit could be derived from the active "ethtool -c" params,
    e.g. tx-frames.
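For illustration only, a minimal sketch of that shared state (the
foo_* names, the foo_tx_in_flight() helper and the fields are all made
up, this is not mlx4 code, and a real driver would serialize the xmit
and completion paths with its existing ring locking):

struct foo_ring {
	/* ... descriptor ring state ... */
	unsigned int	pending;	/* descriptors posted, doorbell not rung */
	bool		started;	/* TX engine already kicked */
	unsigned int	tx_frames;	/* [1] taken from ethtool -c */
};

static void foo_write_doorbell(struct foo_ring *ring)
{
	/* MMIO write telling the NIC that new descriptors are ready */
	ring->pending = 0;
	ring->started = true;
}

static netdev_tx_t foo_start_xmit(struct sk_buff *skb,
				  struct net_device *dev)
{
	struct foo_ring *ring = foo_tx_ring(dev, skb);

	foo_post_descriptor(ring, skb);
	ring->pending++;

	/* Force the doorbell only to start the pump, or when too many
	 * packets were posted without one; otherwise rely on the
	 * upcoming TX completion to ring it for us.
	 */
	if ((!skb->xmit_more && !ring->started) ||
	    ring->pending >= ring->tx_frames)
		foo_write_doorbell(ring);

	return NETDEV_TX_OK;
}

static void foo_tx_completion(struct foo_ring *ring, unsigned int drained)
{
	/* Flush a deferred doorbell only if this completion actually
	 * drained packets; a poll that drained nothing (e.g. busy
	 * polling on a NAPI shared between RX and TX) must not ring
	 * the doorbell yet.
	 */
	if (!drained)
		return;

	if (ring->pending)
		foo_write_doorbell(ring);
	else if (!foo_tx_in_flight(ring))
		ring->started = false;	/* pump stopped; next xmit restarts it */
}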