From mboxrd@z Thu Jan 1 00:00:00 1970 From: Willem de Bruijn Subject: Re: [PATCH net-next] virtio_net: ethtool tx napi configuration Date: Fri, 14 Sep 2018 00:46:16 -0400 Message-ID: References: <20180909224449.203593-1-willemdebruijn.kernel@gmail.com> <2fd8d46f-7466-e507-af31-c587693beb6e@gmail.com> <4a3d69be-8651-e36d-bc14-5a3f1f23d155@redhat.com> <6db3c755-a1dc-dcc9-e110-bfc38143e83d@redhat.com> <0af0043d-c13c-e68e-795d-ef62901f57fc@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Cc: Florian Fainelli , Network Development , David Miller , caleb.raitto@gmail.com, "Michael S. Tsirkin" , "Jon Olson (Google Drive)" , Willem de Bruijn To: Jason Wang Return-path: Received: from mail-ed1-f66.google.com ([209.85.208.66]:45342 "EHLO mail-ed1-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726396AbeINJ7f (ORCPT ); Fri, 14 Sep 2018 05:59:35 -0400 Received: by mail-ed1-f66.google.com with SMTP id p52-v6so6336772eda.12 for ; Thu, 13 Sep 2018 21:46:53 -0700 (PDT) In-Reply-To: <0af0043d-c13c-e68e-795d-ef62901f57fc@redhat.com> Sender: netdev-owner@vger.kernel.org List-ID: On Thu, Sep 13, 2018 at 11:53 PM Jason Wang wrote: > > > > On 2018=E5=B9=B409=E6=9C=8814=E6=97=A5 11:40, Willem de Bruijn wrote: > > On Thu, Sep 13, 2018 at 11:27 PM Jason Wang wrote= : > >> > >> > >> On 2018=E5=B9=B409=E6=9C=8813=E6=97=A5 22:58, Willem de Bruijn wrote: > >>> On Thu, Sep 13, 2018 at 5:02 AM Jason Wang wrot= e: > >>>> > >>>> On 2018=E5=B9=B409=E6=9C=8813=E6=97=A5 07:27, Willem de Bruijn wrote= : > >>>>> On Wed, Sep 12, 2018 at 3:11 PM Willem de Bruijn > >>>>> wrote: > >>>>>> On Wed, Sep 12, 2018 at 2:16 PM Florian Fainelli wrote: > >>>>>>> On 9/12/2018 11:07 AM, Willem de Bruijn wrote: > >>>>>>>> On Wed, Sep 12, 2018 at 1:42 PM Florian Fainelli wrote: > >>>>>>>>> On 9/9/2018 3:44 PM, Willem de Bruijn wrote: > >>>>>>>>>> From: Willem de Bruijn > >>>>>>>>>> > >>>>>>>>>> Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) ha= ndlers. > >>>>>>>>>> Interrupt moderation is currently not supported, so these acce= pt and > >>>>>>>>>> display the default settings of 0 usec and 1 frame. > >>>>>>>>>> > >>>>>>>>>> Toggle tx napi through a bit in tx-frames. So as to not interf= ere > >>>>>>>>>> with possible future interrupt moderation, use bit 10, well ou= tside > >>>>>>>>>> the reasonable range of real interrupt moderation values. > >>>>>>>>>> > >>>>>>>>>> Changes are not atomic. The tx IRQ, napi BH and transmit path = must > >>>>>>>>>> be quiesced when switching modes. Only allow changing this set= ting > >>>>>>>>>> when the device is down. > >>>>>>>>> Humm, would not a private ethtool flag to switch TX NAPI on/off= be more > >>>>>>>>> appropriate rather than use the coalescing configuration API he= re? > >>>>>>>> What do you mean by private ethtool flag? A new field in ethtool > >>>>>>>> --features (-k)? > >>>>>>> I meant using ethtool_drvinfo::n_priv_flags, ETH_SS_PRIV_FLAGS an= d then > >>>>>>> ETHTOOL_GFPFLAGS and ETHTOOL_SPFLAGS to control the toggling of t= hat > >>>>>>> private flag. mlx5 has a number of privates flags for instance. > >>>>>> Interesting, thanks! I was not at all aware of those ethtool flags= . > >>>>>> Am having a look. It definitely looks promising. > >>>>> Okay, I made that change. That is indeed much cleaner, thanks. > >>>>> Let me send the patch, initially as RFC. > >>>>> > >>>>> I've observed one issue where if we toggle the flag before bringing > >>>>> up the device, it hits a kernel BUG at include/linux/netdevice.h:51= 5 > >>>>> > >>>>> BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state)); > >>>> This reminds me that we need to check netif_running() before trying = to > >>>> enable and disable tx napi in ethtool_set_coalesce(). > >>> The first iteration of my patch checked IFF_UP and effectively > >>> only allowed the change when not running. What do you mean > >>> by need to check? > >> I mean if device is not up, there's no need to toggle napi state and t= x > >> lock. > >> > >>> And to respond to the other follow-up notes at once: > >>> > >>>> Consider we may have interrupt moderation in the future, I tend to u= se > >>>> set_coalesce. Otherwise we may need two steps to enable moderation: > >>>> > >>>> - tx-napi on > >>>> - set_coalesce > >>> FWIW, I don't care strongly whether we do this through coalesce or pr= iv_flags. > >> Ok. > > Since you prefer coalesce, let's go with that (and a revision of your > > latest patch). > > Good to know this. > > >>>>> + if (!napi_weight) > >>>>> + virtqueue_enable_cb(vi->sq[i].vq); > >>>> I don't get why we need to disable enable cb here. > >>> To avoid entering no-napi mode with too few descriptors to > >>> make progress and no way to get out of that state. This is a > >>> pretty crude attempt at handling that, admittedly. > >> But in this case, we will call enable_cb_delayed() and we will finally > >> get a interrupt? > > Right. It's a bit of a roundabout way to ensure that > > netif_tx_wake_queue and thus eventually free_old_xmit_skbs are called. > > It might make more sense to just wake the device without going through > > an interrupt. > > I'm not sure I get this. If we don't enable tx napi, we tend to delay TX > interrupt if we found the ring is about to full to avoid interrupt > storm, so we're probably ok in this case. I'm only concerned about the transition state when converting from napi to no-napi when the queue is stopped and tx interrupt disabled. With napi mode the interrupt is only disabled if napi is scheduled, in which case it will eventually reenable the interrupt. But when switching to no-napi mode in this state no progress will be made. But it seems this cannot happen. When converting to no-napi mode, set_coalesce waits for napi to complete in napi_disable. So the interrupt should always start enabled when transitioning into no-napi mode.