All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Wei Wang <weiwan@google.com>
Cc: jasowang@redhat.com,
	Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	David Miller <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	virtualization@lists.linux-foundation.org,
	Willem de Bruijn <willemb@google.com>
Subject: Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi
Date: Thu, 30 Sep 2021 01:40:02 -0400	[thread overview]
Message-ID: <20210930013920-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAEA6p_CQwn1BrU=t3yAmmKUgn9vWfkao_2c-FrqBk0qK0r7shQ@mail.gmail.com>

On Wed, Sep 29, 2021 at 04:08:29PM -0700, Wei Wang wrote:
> On Wed, Sep 29, 2021 at 2:53 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote:
> > > On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > > > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > >
> > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > > > > > > > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > > > > [ 3588.017942]  <IRQ>
> > > > > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > > > > [ 3588.017966]  </IRQ>
> > > > > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > > > > > > > > > > and a few tx interrupts.
> > > > > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > > > > > > > > > > get triggered very close to each other, and gets handled in one round
> > > > > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's
> > > > > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > > > > > > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > > > > > > > > > >
> > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > > > > > > > > > hundred such events, since boot, which is a small number compared real
> > > > > > > > > > > > interrupt load.
> > > > > > > > > > >
> > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > > > > > > > > > note_interrupt that applies here.
> > > > > > > > > > >
> > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after
> > > > > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > > > > > > > > > long as this rate of events is very low compared to useful interrupts,
> > > > > > > > > > > > and total interrupt count is greatly reduced vs not having work
> > > > > > > > > > > > stealing, it is a net win.
> > > > > > > > > >
> > > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is
> > > > > > > > > > surely an even greater win to disable interrupts while polling like
> > > > > > > > > > this.  Might be tricky to detect, disabling/enabling aggressively every
> > > > > > > > > > time even if there's nothing in the queue is sure to cause lots of cache
> > > > > > > > > > line bounces, and we don't want to enable callbacks if they were not
> > > > > > > > > > enabled e.g. by start_xmit ...  Some kind of counter?
> > > > > > > > >
> > > > > > > > > Yes. It was known that the work stealing is more effective in some
> > > > > > > > > workloads than others. But a 99% spurious rate I had not anticipated.
> > > > > > > > >
> > > > > > > > > Most interesting is the number of interrupts suppressed as a result of
> > > > > > > > > the feature. That is not captured by this statistic.
> > > > > > > > >
> > > > > > > > > In any case, we'll take a step back to better understand behavior. And
> > > > > > > > > especially why this high spurious rate exhibits in this workload with
> > > > > > > > > many concurrent flows.
> > > > > > > >
> > > > > > > >
> > > > > > > > I've been thinking about it. Imagine work stealing working perfectly.
> > > > > > > > Each time we xmit a packet, it is stolen and freed.
> > > > > > > > Since xmit enables callbacks (just in case!) we also
> > > > > > > > get an interrupt, which is automatically spurious.
> > > > > > > >
> > > > > > > > My conclusion is that we shouldn't just work around it but instead
> > > > > > > > (or additionally?)
> > > > > > > > reduce the number of interrupts by disabling callbacks e.g. when
> > > > > > > > a. we are currently stealing packets
> > > > > > > > or
> > > > > > > > b. we stole all packets
> > > > > >
> > > > > > Agreed. This might prove a significant performance gain at the same time :)
> > > > > >
> > > > > > > >
> > > > > > > Thinking along this line, that probably means, we should disable cb on
> > > > > > > the tx virtqueue, when scheduling the napi work on the rx side, and
> > > > > > > reenable it after the rx napi work is done?
> > > > > > > Also, I wonder if it is too late to disable cb at the point we start
> > > > > > > to steal pkts or have stolen all pkts.
> > > > > >
> > > > > > The earlier the better. I see no benefit to delay until the rx handler
> > > > > > actually runs.
> > > > > >
> > > > >
> > > > > I've been thinking more on this. I think the fundamental issue here is
> > > > > that the rx napi handler virtnet_poll() does the tx side work by
> > > > > calling virtnet_poll_cleantx() without any notification to the tx
> > > > > side.
> > > > > I am thinking, in virtnet_poll(), instead of directly call
> > > > > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to
> > > > > schedule the tx side napi, and let the tx napi handler do the cleaning
> > > > > work. This way, we automatically call virtqueue_disable_cb() on the tx
> > > > > vq, and after the tx work is done, virtqueue_napi_complete() is called
> > > > > to re-enable the cb on the tx side. This way, the tx side knows what
> > > > > has been done, and will likely reduce the # of spurious tx interrupts?
> > > > > And I don't think there is much cost in doing that, since
> > > > > napi_schedule() basically queues the tx napi to the back of its
> > > > > napi_list, and serves it right after the rx napi handler is done.
> > > > > What do you guys think? I could quickly test it up to see if it solves
> > > > > the issue.
> > > >
> > > >
> > > > Sure pls test. I think you will want to disable event index
> > > > for now to make sure disable cb is not a nop (I am working on
> > > > fixing that).
> > > >
> > >
> > > Hi Michael and Jason,
> > >
> > > I'd like to follow up on this issue a bit more.
> > > I've done some more investigation into this issue:
> > > 1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb
> > > aggressively"), we are still seeing this issue with a tcp_stream test
> > > with 240 flows.
> > > 2. We've tried with the following patch to suppress cleaning tx queue
> > > from rx napi handler for 10% of the time:
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 79bd2585ec6b..711768dbc617 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > >                 return;
> > >
> > >         if (__netif_tx_trylock(txq)) {
> > > +               if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10))
> > > +                       goto unlock;
> > >                 do {
> > >                         virtqueue_disable_cb(sq->vq);
> > >                         free_old_xmit_skbs(sq, true);
> > > @@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > >                 if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> > >                         netif_tx_wake_queue(txq);
> > >
> > > +unlock:
> > >                 __netif_tx_unlock(txq);
> > >         }
> > >  }
> > > This also does not help. It turns out skipping 10% is just not enough.
> > > We have to skip for 50% of the time in order for the warning to be
> > > suppressed.
> > > And this does not seem to be a viable solution since how much we skip
> > > probably will depend on the traffic pattern.
> > >
> > > My questions here:
> > > 1. Michael mentioned that if we use split queues with event idx, the
> > > interrupts are not actually being disabled. Is this still the case? If
> > > so, is that also the cause for so many spurious interrupts?
> > > 2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up
> > > virtio_disable_cb"). I am not quite sure, would that change help
> > > reduce the # of spurious interrupts we see if we use split queues with
> > > event idx? From my limited understanding, that patch skips calling
> > > virtqueue_disable_cb_split() if event_trigger is set for split queues.
> > >
> > > BTW, I have the setup to reproduce this issue easily. So do let me
> > > know if you have other ideas on how to fix it.
> > >
> > > Thanks.
> > > Wei
> >
> > I think that commit is needed to fix the issue, yes.
> >
> > My suggestion is to try v5.14 in its entirety
> > rather than cherry-picking.
> >
> > If you see that the issue is
> > fixed there I can point you to a list of commit to backport.
> >
> 
> Thanks Michael. It does seem with 5.14 kernel, I no longer see this
> issue happening, which is great!
> Could you point me to a list of commits that fixed it?

Try this:

a7766ef18b33 virtio_net: disable cb aggressively
8d622d21d248 virtio: fix up virtio_disable_cb
22bc63c58e87 virtio_net: move txq wakeups under tx q lock
5a2f966d0f3f virtio_net: move tx vq operation under tx queue lock


> > >
> > > > > > > Because the steal work is done
> > > > > > > in the napi handler of the rx queue. But the tx interrupt must have
> > > > > > > been raised before that. Will we come back to process the tx interrupt
> > > > > > > again after we re-enabled the cb on the tx side?
> > > > > > >
> > > > > > > > This should be enough to reduce the chances below 99% ;)
> > > > > > > >
> > > > > > > > One annoying thing is that with split and event index, we do not disable
> > > > > > > > interrupts. Could be worth revisiting, for now maybe just disable the
> > > > > > > > event index feature? I am not sure it is actually worth it with
> > > > > > > > stealing.
> > > > > >
> > > > > > With event index, we suppress interrupts when another interrupt is
> > > > > > already pending from a previous packet, right? When the previous
> > > > > > position of the producer is already beyond the consumer. It doesn't
> > > > > > matter whether the previous packet triggered a tx interrupt or
> > > > > > deferred to an already scheduled rx interrupt? From that seems fine to
> > > > > > leave it out.
> > > >
> >


WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Wei Wang <weiwan@google.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Willem de Bruijn <willemb@google.com>,
	virtualization@lists.linux-foundation.org,
	Jakub Kicinski <kuba@kernel.org>,
	David Miller <davem@davemloft.net>
Subject: Re: [PATCH net] virtio-net: suppress bad irq warning for tx napi
Date: Thu, 30 Sep 2021 01:40:02 -0400	[thread overview]
Message-ID: <20210930013920-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <CAEA6p_CQwn1BrU=t3yAmmKUgn9vWfkao_2c-FrqBk0qK0r7shQ@mail.gmail.com>

On Wed, Sep 29, 2021 at 04:08:29PM -0700, Wei Wang wrote:
> On Wed, Sep 29, 2021 at 2:53 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Sep 29, 2021 at 01:21:58PM -0700, Wei Wang wrote:
> > > On Mon, Apr 12, 2021 at 10:16 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > On Fri, Feb 05, 2021 at 02:28:33PM -0800, Wei Wang wrote:
> > > > > On Thu, Feb 4, 2021 at 12:48 PM Willem de Bruijn
> > > > > <willemdebruijn.kernel@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Feb 3, 2021 at 6:53 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > >
> > > > > > > On Wed, Feb 3, 2021 at 3:10 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Feb 03, 2021 at 01:24:08PM -0500, Willem de Bruijn wrote:
> > > > > > > > > On Wed, Feb 3, 2021 at 5:42 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 02, 2021 at 07:06:53PM -0500, Willem de Bruijn wrote:
> > > > > > > > > > > On Tue, Feb 2, 2021 at 6:53 PM Willem de Bruijn <willemb@google.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 2, 2021 at 6:47 PM Wei Wang <weiwan@google.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Feb 2, 2021 at 3:12 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Jan 28, 2021 at 04:21:36PM -0800, Wei Wang wrote:
> > > > > > > > > > > > > > > With the implementation of napi-tx in virtio driver, we clean tx
> > > > > > > > > > > > > > > descriptors from rx napi handler, for the purpose of reducing tx
> > > > > > > > > > > > > > > complete interrupts. But this could introduce a race where tx complete
> > > > > > > > > > > > > > > interrupt has been raised, but the handler found there is no work to do
> > > > > > > > > > > > > > > because we have done the work in the previous rx interrupt handler.
> > > > > > > > > > > > > > > This could lead to the following warning msg:
> > > > > > > > > > > > > > > [ 3588.010778] irq 38: nobody cared (try booting with the
> > > > > > > > > > > > > > > "irqpoll" option)
> > > > > > > > > > > > > > > [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> > > > > > > > > > > > > > > 5.3.0-19-generic #20~18.04.2-Ubuntu
> > > > > > > > > > > > > > > [ 3588.017940] Call Trace:
> > > > > > > > > > > > > > > [ 3588.017942]  <IRQ>
> > > > > > > > > > > > > > > [ 3588.017951]  dump_stack+0x63/0x85
> > > > > > > > > > > > > > > [ 3588.017953]  __report_bad_irq+0x35/0xc0
> > > > > > > > > > > > > > > [ 3588.017955]  note_interrupt+0x24b/0x2a0
> > > > > > > > > > > > > > > [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> > > > > > > > > > > > > > > [ 3588.017957]  handle_irq_event+0x3b/0x60
> > > > > > > > > > > > > > > [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> > > > > > > > > > > > > > > [ 3588.017961]  handle_irq+0x20/0x30
> > > > > > > > > > > > > > > [ 3588.017964]  do_IRQ+0x50/0xe0
> > > > > > > > > > > > > > > [ 3588.017966]  common_interrupt+0xf/0xf
> > > > > > > > > > > > > > > [ 3588.017966]  </IRQ>
> > > > > > > > > > > > > > > [ 3588.017989] handlers:
> > > > > > > > > > > > > > > [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> > > > > > > > > > > > > > > [ 3588.025099] Disabling IRQ #38
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This patch adds a new param to struct vring_virtqueue, and we set it for
> > > > > > > > > > > > > > > tx virtqueues if napi-tx is enabled, to suppress the warning in such
> > > > > > > > > > > > > > > case.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Fixes: 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi")
> > > > > > > > > > > > > > > Reported-by: Rick Jones <jonesrick@google.com>
> > > > > > > > > > > > > > > Signed-off-by: Wei Wang <weiwan@google.com>
> > > > > > > > > > > > > > > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This description does not make sense to me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > irq X: nobody cared
> > > > > > > > > > > > > > only triggers after an interrupt is unhandled repeatedly.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So something causes a storm of useless tx interrupts here.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let's find out what it was please. What you are doing is
> > > > > > > > > > > > > > just preventing linux from complaining.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The traffic that causes this warning is a netperf tcp_stream with at
> > > > > > > > > > > > > least 128 flows between 2 hosts. And the warning gets triggered on the
> > > > > > > > > > > > > receiving host, which has a lot of rx interrupts firing on all queues,
> > > > > > > > > > > > > and a few tx interrupts.
> > > > > > > > > > > > > And I think the scenario is: when the tx interrupt gets fired, it gets
> > > > > > > > > > > > > coalesced with the rx interrupt. Basically, the rx and tx interrupts
> > > > > > > > > > > > > get triggered very close to each other, and gets handled in one round
> > > > > > > > > > > > > of do_IRQ(). And the rx irq handler gets called first, which calls
> > > > > > > > > > > > > virtnet_poll(). However, virtnet_poll() calls virtnet_poll_cleantx()
> > > > > > > > > > > > > to try to do the work on the corresponding tx queue as well. That's
> > > > > > > > > > > > > why when tx interrupt handler gets called, it sees no work to do.
> > > > > > > > > > > > > And the reason for the rx handler to handle the tx work is here:
> > > > > > > > > > > > > https://lists.linuxfoundation.org/pipermail/virtualization/2017-April/034740.html
> > > > > > > > > > > >
> > > > > > > > > > > > Indeed. It's not a storm necessarily. The warning occurs after one
> > > > > > > > > > > > hundred such events, since boot, which is a small number compared real
> > > > > > > > > > > > interrupt load.
> > > > > > > > > > >
> > > > > > > > > > > Sorry, this is wrong. It is the other call to __report_bad_irq from
> > > > > > > > > > > note_interrupt that applies here.
> > > > > > > > > > >
> > > > > > > > > > > > Occasionally seeing an interrupt with no work is expected after
> > > > > > > > > > > > 7b0411ef4aa6 ("virtio-net: clean tx descriptors from rx napi"). As
> > > > > > > > > > > > long as this rate of events is very low compared to useful interrupts,
> > > > > > > > > > > > and total interrupt count is greatly reduced vs not having work
> > > > > > > > > > > > stealing, it is a net win.
> > > > > > > > > >
> > > > > > > > > > Right, but if 99900 out of 100000 interrupts were wasted, then it is
> > > > > > > > > > surely an even greater win to disable interrupts while polling like
> > > > > > > > > > this.  Might be tricky to detect, disabling/enabling aggressively every
> > > > > > > > > > time even if there's nothing in the queue is sure to cause lots of cache
> > > > > > > > > > line bounces, and we don't want to enable callbacks if they were not
> > > > > > > > > > enabled e.g. by start_xmit ...  Some kind of counter?
> > > > > > > > >
> > > > > > > > > Yes. It was known that the work stealing is more effective in some
> > > > > > > > > workloads than others. But a 99% spurious rate I had not anticipated.
> > > > > > > > >
> > > > > > > > > Most interesting is the number of interrupts suppressed as a result of
> > > > > > > > > the feature. That is not captured by this statistic.
> > > > > > > > >
> > > > > > > > > In any case, we'll take a step back to better understand behavior. And
> > > > > > > > > especially why this high spurious rate exhibits in this workload with
> > > > > > > > > many concurrent flows.
> > > > > > > >
> > > > > > > >
> > > > > > > > I've been thinking about it. Imagine work stealing working perfectly.
> > > > > > > > Each time we xmit a packet, it is stolen and freed.
> > > > > > > > Since xmit enables callbacks (just in case!) we also
> > > > > > > > get an interrupt, which is automatically spurious.
> > > > > > > >
> > > > > > > > My conclusion is that we shouldn't just work around it but instead
> > > > > > > > (or additionally?)
> > > > > > > > reduce the number of interrupts by disabling callbacks e.g. when
> > > > > > > > a. we are currently stealing packets
> > > > > > > > or
> > > > > > > > b. we stole all packets
> > > > > >
> > > > > > Agreed. This might prove a significant performance gain at the same time :)
> > > > > >
> > > > > > > >
> > > > > > > Thinking along this line, that probably means, we should disable cb on
> > > > > > > the tx virtqueue, when scheduling the napi work on the rx side, and
> > > > > > > reenable it after the rx napi work is done?
> > > > > > > Also, I wonder if it is too late to disable cb at the point we start
> > > > > > > to steal pkts or have stolen all pkts.
> > > > > >
> > > > > > The earlier the better. I see no benefit to delay until the rx handler
> > > > > > actually runs.
> > > > > >
> > > > >
> > > > > I've been thinking more on this. I think the fundamental issue here is
> > > > > that the rx napi handler virtnet_poll() does the tx side work by
> > > > > calling virtnet_poll_cleantx() without any notification to the tx
> > > > > side.
> > > > > I am thinking, in virtnet_poll(), instead of directly call
> > > > > virtnet_poll_cleantx(), why not do virtqueue_napi_schedule() to
> > > > > schedule the tx side napi, and let the tx napi handler do the cleaning
> > > > > work. This way, we automatically call virtqueue_disable_cb() on the tx
> > > > > vq, and after the tx work is done, virtqueue_napi_complete() is called
> > > > > to re-enable the cb on the tx side. This way, the tx side knows what
> > > > > has been done, and will likely reduce the # of spurious tx interrupts?
> > > > > And I don't think there is much cost in doing that, since
> > > > > napi_schedule() basically queues the tx napi to the back of its
> > > > > napi_list, and serves it right after the rx napi handler is done.
> > > > > What do you guys think? I could quickly test it up to see if it solves
> > > > > the issue.
> > > >
> > > >
> > > > Sure pls test. I think you will want to disable event index
> > > > for now to make sure disable cb is not a nop (I am working on
> > > > fixing that).
> > > >
> > >
> > > Hi Michael and Jason,
> > >
> > > I'd like to follow up on this issue a bit more.
> > > I've done some more investigation into this issue:
> > > 1. With Michael's recent patch: a7766ef18b336 ("virtio_net: disable cb
> > > aggressively"), we are still seeing this issue with a tcp_stream test
> > > with 240 flows.
> > > 2. We've tried with the following patch to suppress cleaning tx queue
> > > from rx napi handler for 10% of the time:
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 79bd2585ec6b..711768dbc617 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1510,6 +1510,8 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > >                 return;
> > >
> > >         if (__netif_tx_trylock(txq)) {
> > > +               if (virtqueue_more_used(sq->vq) && !prandom_u32_max(10))
> > > +                       goto unlock;
> > >                 do {
> > >                         virtqueue_disable_cb(sq->vq);
> > >                         free_old_xmit_skbs(sq, true);
> > > @@ -1518,6 +1520,7 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> > >                 if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> > >                         netif_tx_wake_queue(txq);
> > >
> > > +unlock:
> > >                 __netif_tx_unlock(txq);
> > >         }
> > >  }
> > > This also does not help. It turns out skipping 10% is just not enough.
> > > We have to skip for 50% of the time in order for the warning to be
> > > suppressed.
> > > And this does not seem to be a viable solution since how much we skip
> > > probably will depend on the traffic pattern.
> > >
> > > My questions here:
> > > 1. Michael mentioned that if we use split queues with event idx, the
> > > interrupts are not actually being disabled. Is this still the case? If
> > > so, is that also the cause for so many spurious interrupts?
> > > 2. Michael also submitted another patch: 8d622d21d248 ("virtio: fix up
> > > virtio_disable_cb"). I am not quite sure, would that change help
> > > reduce the # of spurious interrupts we see if we use split queues with
> > > event idx? From my limited understanding, that patch skips calling
> > > virtqueue_disable_cb_split() if event_trigger is set for split queues.
> > >
> > > BTW, I have the setup to reproduce this issue easily. So do let me
> > > know if you have other ideas on how to fix it.
> > >
> > > Thanks.
> > > Wei
> >
> > I think that commit is needed to fix the issue, yes.
> >
> > My suggestion is to try v5.14 in its entirety
> > rather than cherry-picking.
> >
> > If you see that the issue is
> > fixed there I can point you to a list of commit to backport.
> >
> 
> Thanks Michael. It does seem with 5.14 kernel, I no longer see this
> issue happening, which is great!
> Could you point me to a list of commits that fixed it?

Try this:

a7766ef18b33 virtio_net: disable cb aggressively
8d622d21d248 virtio: fix up virtio_disable_cb
22bc63c58e87 virtio_net: move txq wakeups under tx q lock
5a2f966d0f3f virtio_net: move tx vq operation under tx queue lock


> > >
> > > > > > > Because the steal work is done
> > > > > > > in the napi handler of the rx queue. But the tx interrupt must have
> > > > > > > been raised before that. Will we come back to process the tx interrupt
> > > > > > > again after we re-enabled the cb on the tx side?
> > > > > > >
> > > > > > > > This should be enough to reduce the chances below 99% ;)
> > > > > > > >
> > > > > > > > One annoying thing is that with split and event index, we do not disable
> > > > > > > > interrupts. Could be worth revisiting, for now maybe just disable the
> > > > > > > > event index feature? I am not sure it is actually worth it with
> > > > > > > > stealing.
> > > > > >
> > > > > > With event index, we suppress interrupts when another interrupt is
> > > > > > already pending from a previous packet, right? When the previous
> > > > > > position of the producer is already beyond the consumer. It doesn't
> > > > > > matter whether the previous packet triggered a tx interrupt or
> > > > > > deferred to an already scheduled rx interrupt? From that seems fine to
> > > > > > leave it out.
> > > >
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

  reply	other threads:[~2021-09-30  5:40 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-29  0:21 [PATCH net] virtio-net: suppress bad irq warning for tx napi Wei Wang
2021-02-02  2:24 ` Jakub Kicinski
2021-02-02  3:06 ` Jason Wang
2021-02-02  3:06   ` Jason Wang
2021-02-02 14:37   ` Willem de Bruijn
2021-02-02 14:37     ` Willem de Bruijn
2021-02-03  5:33     ` Jason Wang
2021-02-03  5:33       ` Jason Wang
2021-02-03 18:28       ` Willem de Bruijn
2021-02-03 18:28         ` Willem de Bruijn
2021-02-04  3:06         ` Jason Wang
2021-02-04  3:06           ` Jason Wang
2021-02-04 20:50           ` Willem de Bruijn
2021-02-04 20:50             ` Willem de Bruijn
2021-02-08  3:29             ` Jason Wang
2021-02-08  3:29               ` Jason Wang
2021-02-08 19:08               ` Willem de Bruijn
2021-02-08 19:08                 ` Willem de Bruijn
2021-02-09  3:27                 ` Jason Wang
2021-02-09  3:27                   ` Jason Wang
2021-02-09 14:58                   ` Willem de Bruijn
2021-02-09 14:58                     ` Willem de Bruijn
2021-02-09 18:00                     ` Wei Wang
2021-02-10  9:14                       ` Michael S. Tsirkin
2021-02-10  9:14                         ` Michael S. Tsirkin
2021-02-11  0:13                         ` Wei Wang
2021-02-18  5:39                         ` Jason Wang
2021-02-18  5:39                           ` Jason Wang
2021-02-21 11:33                           ` Michael S. Tsirkin
2021-02-21 11:33                             ` Michael S. Tsirkin
2021-02-02 23:11 ` Michael S. Tsirkin
2021-02-02 23:11   ` Michael S. Tsirkin
2021-02-02 23:47   ` Wei Wang
2021-02-02 23:53     ` Willem de Bruijn
2021-02-03  0:06       ` Willem de Bruijn
2021-02-03 10:38         ` Michael S. Tsirkin
2021-02-03 10:38           ` Michael S. Tsirkin
2021-02-03 18:24           ` Willem de Bruijn
2021-02-03 18:24             ` Willem de Bruijn
2021-02-03 23:09             ` Michael S. Tsirkin
2021-02-03 23:09               ` Michael S. Tsirkin
2021-02-03 23:52               ` Wei Wang
2021-02-04 20:47                 ` Willem de Bruijn
2021-02-04 20:47                   ` Willem de Bruijn
2021-02-05 22:28                   ` Wei Wang
2021-04-13  5:15                     ` Michael S. Tsirkin
2021-04-13  5:15                       ` Michael S. Tsirkin
2021-09-29 20:21                       ` Wei Wang
2021-09-29 21:53                         ` Michael S. Tsirkin
2021-09-29 21:53                           ` Michael S. Tsirkin
2021-09-29 23:08                           ` Wei Wang
2021-09-30  5:40                             ` Michael S. Tsirkin [this message]
2021-09-30  5:40                               ` Michael S. Tsirkin
2021-09-30 18:10                             ` Dave Taht
2021-04-12 22:08 ` Michael S. Tsirkin
2021-04-12 22:08   ` Michael S. Tsirkin
2021-04-12 22:33   ` Michael S. Tsirkin
2021-04-12 22:33     ` Michael S. Tsirkin
2021-04-12 23:14     ` David Miller
2021-04-12 23:14       ` David Miller
2021-04-13  4:24       ` Michael S. Tsirkin
2021-04-13  4:24         ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210930013920-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=davem@davemloft.net \
    --cc=jasowang@redhat.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=weiwan@google.com \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.