From: Kevin Mitchell <kevmitch@arista.com>
To: Antoine Tenart <atenart@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: new warning caused by ("net-sysfs: update the queue counts in the unregistration path")
Date: Wed, 28 Sep 2022 16:20:33 -0700
Message-ID: <YzTWwf/FyzBKGaww@chmeee>
In-Reply-To: <166435838013.3919.14607521178984182789@kwain>

On Wed, Sep 28, 2022 at 11:46:20AM +0200, Antoine Tenart wrote:
> Quoting Kevin Mitchell (2022-09-28 03:27:46)
> > With the inclusion of d7dac083414e ("net-sysfs: update the queue counts in the
> > unregistration path"), we have started to see the following message during one
> > of our stress tests that brings an interface up and down while continuously
> > trying to send out packets on it:
> >
> > et3_11_1 selects TX queue 0, but real number of TX queues is 0
> >
> > It seems that this is a result of a race between remove_queue_kobjects() and
> > netdev_cap_txqueue() for the last packets before setting dev->flags &= ~IFF_UP
> > in __dev_close_many(). When this message is displayed, netdev_cap_txqueue()
> > selects queue 0 anyway (the noop queue at this point). As it did before the
> > above commit, that queue (which I guess is still around due to reference
> > counting) proceeds to drop the packet and return NET_XMIT_CN. So there doesn't
> > appear to be a functional change. However, the warning message seems to be
> > spurious if not slightly confusing.
>
> Do you know the call traces leading to this? Also, I'm not 100% sure I
> follow, as remove_queue_kobjects is called in the unregistration path
> while the test is setting the iface up & down. What driver is used?

Sorry, my language was imprecise. The device is being unregistered and
re-registered. The driver is out of tree for our front panel ports. I don't
think this is specific to the driver, but I'd be happy to be convinced
otherwise.

The call trace to the queue removal is

[  628.165565]  dump_stack+0x74/0x90
(remove_queue_kobjects)
[  628.165569]  netdev_unregister_kobject+0x7a/0xb3
[  628.165572]  rollback_registered_many+0x560/0x5c4
[  628.165576]  unregister_netdevice_queue+0xa3/0xfc
[  628.165578]  unregister_netdev+0x1e/0x25
[  628.165589]  fdev_free+0x26e/0x29d [strata_dma_drv]

The call trace to the warning message is

[ 1094.355489]  dump_stack+0x74/0x90
(netdev_cap_txqueue)
[ 1094.355495]  netdev_core_pick_tx+0x91/0xaf
[ 1094.355500]  __dev_queue_xmit+0x249/0x602
[ 1094.355503]  ? printk+0x58/0x6f
[ 1094.355510]  dev_queue_xmit+0x10/0x12
[ 1094.355518]  packet_sendmsg+0xe88/0xeee
[ 1094.355524]  ? update_curr+0x6b/0x15d
[ 1094.355530]  sock_sendmsg_nosec+0x12/0x1d
[ 1094.355533]  sock_write_iter+0x8a/0xb6
[ 1094.355539]  new_sync_write+0x7c/0xb4
[ 1094.355543]  vfs_write+0xfe/0x12a
[ 1094.355547]  ksys_write+0x6e/0xb9
[ 1094.355552]  ? exit_to_user_mode_prepare+0xd3/0xf0
[ 1094.355555]  __x64_sys_write+0x1a/0x1c
[ 1094.355559]  do_syscall_64+0x31/0x40
[ 1094.355564]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

>
> As you said and looking around queue 0 is somewhat special and used as a
> fallback. My suggestion would be to 1) check if the above race is
> expected 2) if yes, a possible solution would be not to warn when
> real_num_tx_queues == 0 as in such cases selecting queue 0 would be the
> expected fallback (and you might want to check places like [1]).

Yes, this is exactly where it is happening, and that sounds like a good idea to
me. As far as I can tell, the message is completely innocuous. If there really
are no cases where this warning is useful when real_num_tx_queues == 0, I could
submit a patch to suppress it in that case.
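
To make the suggestion concrete, here is a rough userspace model of the check,
not the actual patch. The struct, the function name and the warning counter are
all made up for illustration; the real helper is netdev_cap_txqueue() and it
warns through a ratelimited printk rather than fprintf(). The only point is the
extra real_num_tx_queues != 0 test around the message.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two struct net_device fields used here. */
struct model_netdev {
	const char *name;
	unsigned int real_num_tx_queues;
};

/* Counts emitted warnings so the behaviour can be checked in a test. */
static unsigned int model_warnings;

/*
 * Model of the capping logic: an out-of-range index always falls back to
 * queue 0, but the warning is skipped when the device has no real TX
 * queues, since queue 0 is the expected fallback in that case.
 */
static uint16_t model_cap_txqueue(const struct model_netdev *dev,
				  uint16_t queue_index)
{
	if (queue_index >= dev->real_num_tx_queues) {
		if (dev->real_num_tx_queues != 0) {
			fprintf(stderr,
				"%s selects TX queue %u, but real number of TX queues is %u\n",
				dev->name, (unsigned int)queue_index,
				dev->real_num_tx_queues);
			model_warnings++;
		}
		return 0;
	}
	return queue_index;
}
```

With this model, a device with zero real TX queues silently gets queue 0, while
a device with, say, 4 queues that is handed index 7 still produces the warning
and falls back to queue 0.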

>
> Thanks,
> Antoine
>
> [1] https://elixir.bootlin.com/linux/latest/source/net/core/dev.c#L4126

