From mboxrd@z Thu Jan  1 00:00:00 1970
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Subject: Re: [PATCH net-next] virtio_net: force_napi_tx module param.
Date: Tue, 24 Jul 2018 16:52:53 -0400
Message-ID: <CAF=yD-KT4_yLq2OehHhdFc=9R0eRrKMHq2ae9NndJWqc+8h=LA@mail.gmail.com>
References: <20180723231119.142904-1-caleb.raitto@gmail.com>
 <20180724134138-mutt-send-email-mst@kernel.org> <CAF=yD-JmhKHhXcUdfm39t=OV+5VMT90XdGWaaKSWK+kG5UBh8Q@mail.gmail.com>
 <20180724212238-mutt-send-email-mst@kernel.org>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: caleb.raitto@gmail.com, Jason Wang <jasowang@redhat.com>,
        David Miller <davem@davemloft.net>,
        Network Development <netdev@vger.kernel.org>,
        Caleb Raitto <caraitto@google.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-ed1-f65.google.com ([209.85.208.65]:33234 "EHLO
        mail-ed1-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S2388691AbeGXWBs (ORCPT
        <rfc822;netdev@vger.kernel.org>); Tue, 24 Jul 2018 18:01:48 -0400
Received: by mail-ed1-f65.google.com with SMTP id x5-v6so5259384edr.0
        for <netdev@vger.kernel.org>; Tue, 24 Jul 2018 13:53:30 -0700 (PDT)
In-Reply-To: <20180724212238-mutt-send-email-mst@kernel.org>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On Tue, Jul 24, 2018 at 2:39 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jul 24, 2018 at 10:01:39AM -0400, Willem de Bruijn wrote:
> > On Tue, Jul 24, 2018 at 6:44 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jul 23, 2018 at 04:11:19PM -0700, Caleb Raitto wrote:
> > > > From: Caleb Raitto <caraitto@google.com>
> > > >
> > > > The driver disables tx napi if it's not certain that completions will
> > > > be processed affine with tx service.
> > > >
> > > > Its heuristic doesn't account for some scenarios where it is, such as
> > > > when the queue pair count matches the core but not hyperthread count.
> > > >
> > > > Allow userspace to override the heuristic. This is an alternative
> > > > solution to that in the linked patch. That added more logic in the
> > > > kernel for these cases, but the agreement was that this was better left
> > > > to user control.
> > > >
> > > > Do not expand the existing napi_tx variable to a ternary value,
> > > > because doing so can break user applications that expect
> > > > boolean ('Y'/'N') instead of integer output. Add a new param instead.
> > > >
> > > > Link: https://patchwork.ozlabs.org/patch/725249/
> > > > Acked-by: Willem de Bruijn <willemb@google.com>
> > > > Acked-by: Jon Olson <jonolson@google.com>
> > > > Signed-off-by: Caleb Raitto <caraitto@google.com>
> > >
> > > Is there a reason the same rule should apply to all devices?
> > > If not shouldn't this be an ethtool option?
> >
> > It not very likely that a guest will have multiple virtio_net devices,
> > some of which have affinity_hint_set and some do not?
>
> Just to answer this question, I do hear a lot about guests with multiple
> virtio net interfaces.  These might be just the noisy few, but they do
> exist.
>
> > I'd really rather not add the extra option at all, but remove
> > the affinity_hint_set requirement for now. Without more data,
> > I understand the concern about cacheline bouncing if napi-tx
> > would becomes the default at some point and we don't have
> > data on this by then. But while it isn't default and a user has to
> > opt in to napi-tx to try it that seems enough guardrail to me.
> >
> > The original reason was lack of data on whether napi-tx may suffer
> > from cache invalidations when tx and rx softirq are on different cpus
> > and we enable tx descriptor cleaning from the rx handler (i.e., on ACK).
> > >From those initial numbers it seemed to be a win even with those
> > invalidations.
> >
> >   https://patchwork.ozlabs.org/patch/746232/
> >
> > In lieu of removing the affinity_hint_set, this boolean is the least amount
> > of both new interface and implementation to allow experimentation. We
> > can easily leave it as a noop eventually when we are confident that
> > napi-tx can be enabled even without affinity. By comparison, an ethtool
> > param would be quite a bit of new logic.
>
> So it boils down to this: if we believe napi tx is
> always a good idea, then this flag is there just in case,
> and the patch is fine. If it's good for some workloads
> but not others, and will be for a while, we are better off
> with an ethtool flag.
>
> What's the case here?

Based on benchmark results so far, it looks like the first. It's markedly
better for some and in the noise for most.

The crux is the "for a while". We undoubtedly will find some cases that
we need to fix before flipping the default.

The choice is between using napi_tx vs adding a separate ethtool function
to do essentially the same. This patch is the simpler option, and easier to
backport to guest kernels.

> OTOH if you want to add more trick to the affinity hint, such
> as the mentioned above # of queues matching core count,
> that is also fine IMHO.

>>From the above linked patch, I understand that there are yet
other special cases in production, such as a hard cap on #tx queues to
32 regardless of number of vcpus. I don't think we want to add special
cases for all this kind of business logic in the kernel.