From mboxrd@z Thu Jan 1 00:00:00 1970 From: Willem de Bruijn Subject: Re: [PATCH net-next] virtio_net: ethtool tx napi configuration Date: Thu, 13 Sep 2018 23:40:51 -0400 Message-ID: References: <20180909224449.203593-1-willemdebruijn.kernel@gmail.com> <2fd8d46f-7466-e507-af31-c587693beb6e@gmail.com> <4a3d69be-8651-e36d-bc14-5a3f1f23d155@redhat.com> <6db3c755-a1dc-dcc9-e110-bfc38143e83d@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Cc: Florian Fainelli , Network Development , David Miller , caleb.raitto@gmail.com, "Michael S. Tsirkin" , "Jon Olson (Google Drive)" , Willem de Bruijn To: Jason Wang Return-path: Received: from mail-ed1-f67.google.com ([209.85.208.67]:45217 "EHLO mail-ed1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725953AbeINIx7 (ORCPT ); Fri, 14 Sep 2018 04:53:59 -0400 Received: by mail-ed1-f67.google.com with SMTP id p52-v6so6258143eda.12 for ; Thu, 13 Sep 2018 20:41:28 -0700 (PDT) In-Reply-To: <6db3c755-a1dc-dcc9-e110-bfc38143e83d@redhat.com> Sender: netdev-owner@vger.kernel.org List-ID: On Thu, Sep 13, 2018 at 11:27 PM Jason Wang wrote: > > > > On 2018=E5=B9=B409=E6=9C=8813=E6=97=A5 22:58, Willem de Bruijn wrote: > > On Thu, Sep 13, 2018 at 5:02 AM Jason Wang wrote: > >> > >> > >> On 2018=E5=B9=B409=E6=9C=8813=E6=97=A5 07:27, Willem de Bruijn wrote: > >>> On Wed, Sep 12, 2018 at 3:11 PM Willem de Bruijn > >>> wrote: > >>>> On Wed, Sep 12, 2018 at 2:16 PM Florian Fainelli wrote: > >>>>> > >>>>> On 9/12/2018 11:07 AM, Willem de Bruijn wrote: > >>>>>> On Wed, Sep 12, 2018 at 1:42 PM Florian Fainelli wrote: > >>>>>>> > >>>>>>> On 9/9/2018 3:44 PM, Willem de Bruijn wrote: > >>>>>>>> From: Willem de Bruijn > >>>>>>>> > >>>>>>>> Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) hand= lers. > >>>>>>>> Interrupt moderation is currently not supported, so these accept= and > >>>>>>>> display the default settings of 0 usec and 1 frame. > >>>>>>>> > >>>>>>>> Toggle tx napi through a bit in tx-frames. So as to not interfer= e > >>>>>>>> with possible future interrupt moderation, use bit 10, well outs= ide > >>>>>>>> the reasonable range of real interrupt moderation values. > >>>>>>>> > >>>>>>>> Changes are not atomic. The tx IRQ, napi BH and transmit path mu= st > >>>>>>>> be quiesced when switching modes. Only allow changing this setti= ng > >>>>>>>> when the device is down. > >>>>>>> Humm, would not a private ethtool flag to switch TX NAPI on/off b= e more > >>>>>>> appropriate rather than use the coalescing configuration API here= ? > >>>>>> What do you mean by private ethtool flag? A new field in ethtool > >>>>>> --features (-k)? > >>>>> I meant using ethtool_drvinfo::n_priv_flags, ETH_SS_PRIV_FLAGS and = then > >>>>> ETHTOOL_GFPFLAGS and ETHTOOL_SPFLAGS to control the toggling of tha= t > >>>>> private flag. mlx5 has a number of privates flags for instance. > >>>> Interesting, thanks! I was not at all aware of those ethtool flags. > >>>> Am having a look. It definitely looks promising. > >>> Okay, I made that change. That is indeed much cleaner, thanks. > >>> Let me send the patch, initially as RFC. > >>> > >>> I've observed one issue where if we toggle the flag before bringing > >>> up the device, it hits a kernel BUG at include/linux/netdevice.h:515 > >>> > >>> BUG_ON(!test_bit(NAPI_STATE_SCHED, &n->state)); > >> This reminds me that we need to check netif_running() before trying to > >> enable and disable tx napi in ethtool_set_coalesce(). > > The first iteration of my patch checked IFF_UP and effectively > > only allowed the change when not running. What do you mean > > by need to check? > > I mean if device is not up, there's no need to toggle napi state and tx > lock. > > > > > And to respond to the other follow-up notes at once: > > > >> Consider we may have interrupt moderation in the future, I tend to use > >> set_coalesce. Otherwise we may need two steps to enable moderation: > >> > >> - tx-napi on > >> - set_coalesce > > FWIW, I don't care strongly whether we do this through coalesce or priv= _flags. > > Ok. Since you prefer coalesce, let's go with that (and a revision of your latest patch). > > >>> + if (!napi_weight) > >>> + virtqueue_enable_cb(vi->sq[i].vq); > >> I don't get why we need to disable enable cb here. > > To avoid entering no-napi mode with too few descriptors to > > make progress and no way to get out of that state. This is a > > pretty crude attempt at handling that, admittedly. > > But in this case, we will call enable_cb_delayed() and we will finally > get a interrupt? Right. It's a bit of a roundabout way to ensure that netif_tx_wake_queue and thus eventually free_old_xmit_skbs are called. It might make more sense to just wake the device without going through an interrupt.