* [PATCH net] net: don't let netpoll invoke NAPI if in xmit context
@ 2023-03-31  2:21 Jakub Kicinski
  2023-03-31  2:41 ` Eric Dumazet
  2023-04-02 12:30 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 5+ messages in thread
From: Jakub Kicinski @ 2023-03-31  2:21 UTC (permalink / raw)
  To: davem
  Cc: netdev, edumazet, pabeni, Jakub Kicinski, Roman Gushchin, leitao,
	shemminger

Commit 0db3dc73f7a3 ("[NETPOLL]: tx lock deadlock fix") narrowed
down the region under netif_tx_trylock() inside netpoll_send_skb().
(At that point in time netif_tx_trylock() would lock all queues of
the device.) Taking the tx lock was problematic because the driver's
cleanup method may take the same lock. So the change made us hold
the xmit lock only around xmit, and expected the driver to take
care of locking within ->ndo_poll_controller().

Unfortunately this only works if netpoll isn't itself called with
the xmit lock already held. Netpoll code is careful and uses
trylock(). The drivers, however, may be using plain lock().
Printing while holding the xmit lock is going to result in rare
deadlocks.
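
Roughly, the call chain that can deadlock looks like this (the driver-side
frames are schematic, not taken from any specific driver):

  dev_queue_xmit()
    HARD_TX_LOCK()                   <- this CPU now owns txq->_xmit_lock
    ->ndo_start_xmit()
      netdev_err(dev, ...)           <- console happens to be netconsole
        netpoll_send_skb()           <- cannot get the queue, retries
          netpoll_poll_dev()         <- "tickle" the device between retries
            ->ndo_poll_controller() or NAPI poll in the driver
              netif_tx_lock()        <- plain lock(): deadlock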

Luckily we record the xmit lock owners, so we can scan all the queues,
the same way we scan NAPI owners. If any of the xmit locks is held
by the local CPU, we'd better not attempt any polling.

It would be nice if we could narrow down the check to only the NAPIs
and the queue we're trying to use. I don't see a way to do that now.

Reported-by: Roman Gushchin <roman.gushchin@linux.dev>
Fixes: 0db3dc73f7a3 ("[NETPOLL]: tx lock deadlock fix")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: leitao@debian.org
CC: shemminger@linux.foundation.org
---
 net/core/netpoll.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index a089b704b986..e6a739b1afa9 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -137,6 +137,20 @@ static void queue_process(struct work_struct *work)
 	}
 }
 
+static int netif_local_xmit_active(struct net_device *dev)
+{
+	int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
+
+		if (READ_ONCE(txq->xmit_lock_owner) == smp_processor_id())
+			return 1;
+	}
+
+	return 0;
+}
+
 static void poll_one_napi(struct napi_struct *napi)
 {
 	int work;
@@ -183,7 +197,10 @@ void netpoll_poll_dev(struct net_device *dev)
 	if (!ni || down_trylock(&ni->dev_lock))
 		return;
 
-	if (!netif_running(dev)) {
+	/* Some drivers will take the same locks in poll and xmit,
+	 * we can't poll if local CPU is already in xmit.
+	 */
+	if (!netif_running(dev) || netif_local_xmit_active(dev)) {
 		up(&ni->dev_lock);
 		return;
 	}
-- 
2.39.2



* Re: [PATCH net] net: don't let netpoll invoke NAPI if in xmit context
  2023-03-31  2:21 [PATCH net] net: don't let netpoll invoke NAPI if in xmit context Jakub Kicinski
@ 2023-03-31  2:41 ` Eric Dumazet
  2023-03-31  3:23   ` Jakub Kicinski
  2023-04-02 12:30 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2023-03-31  2:41 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, pabeni, Roman Gushchin, leitao, shemminger

On Fri, Mar 31, 2023 at 4:23 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> Commit 0db3dc73f7a3 ("[NETPOLL]: tx lock deadlock fix") narrowed
> down the region under netif_tx_trylock() inside netpoll_send_skb().
> (At that point in time netif_tx_trylock() would lock all queues of
> the device.) Taking the tx lock was problematic because the driver's
> cleanup method may take the same lock. So the change made us hold
> the xmit lock only around xmit, and expected the driver to take
> care of locking within ->ndo_poll_controller().
>
> Unfortunately this only works if netpoll isn't itself called with
> the xmit lock already held. Netpoll code is careful and uses
> trylock(). The drivers, however, may be using plain lock().
> Printing while holding the xmit lock is going to result in rare
> deadlocks.
>
> Luckily we record the xmit lock owners, so we can scan all the queues,
> the same way we scan NAPI owners. If any of the xmit locks is held
> by the local CPU, we'd better not attempt any polling.
>
> It would be nice if we could narrow down the check to only the NAPIs
> and the queue we're trying to use. I don't see a way to do that now.
>
> Reported-by: Roman Gushchin <roman.gushchin@linux.dev>
> Fixes: 0db3dc73f7a3 ("[NETPOLL]: tx lock deadlock fix")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: leitao@debian.org
> CC: shemminger@linux.foundation.org
> ---
>  net/core/netpoll.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index a089b704b986..e6a739b1afa9 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -137,6 +137,20 @@ static void queue_process(struct work_struct *work)
>         }
>  }
>
> +static int netif_local_xmit_active(struct net_device *dev)
> +{
> +       int i;
> +
> +       for (i = 0; i < dev->num_tx_queues; i++) {
> +               struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
> +
> +               if (READ_ONCE(txq->xmit_lock_owner) == smp_processor_id())
> +                       return 1;
> +       }
> +

Resending in plain text mode for the list, sorry for the duplicate.

Note that we update WRITE_ONCE(txq->xmit_lock_owner, cpu) _after_
spin_lock(&txq->_xmit_lock);

So there is, I think, a tiny window in which the check can miss that we
already took the spinlock, but I do not see how to avoid it without
excessive cost.
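
For reference, __netif_tx_lock() is roughly this (trimmed to the relevant
part):

static inline void __netif_tx_lock(struct netdev_queue *txq, int cpu)
{
	spin_lock(&txq->_xmit_lock);
	/* A netpoll firing on this CPU between these two statements still
	 * sees xmit_lock_owner == -1 and may go ahead and poll.
	 */
	WRITE_ONCE(txq->xmit_lock_owner, cpu);
}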


> +       return 0;
> +}
> +
>  static void poll_one_napi(struct napi_struct *napi)
>  {
>         int work;
> @@ -183,7 +197,10 @@ void netpoll_poll_dev(struct net_device *dev)
>         if (!ni || down_trylock(&ni->dev_lock))
>                 return;
>
> -       if (!netif_running(dev)) {
> +       /* Some drivers will take the same locks in poll and xmit,
> +        * we can't poll if local CPU is already in xmit.
> +        */
> +       if (!netif_running(dev) || netif_local_xmit_active(dev)) {
>                 up(&ni->dev_lock);
>                 return;
>         }
> --
> 2.39.2
>


* Re: [PATCH net] net: don't let netpoll invoke NAPI if in xmit context
  2023-03-31  2:41 ` Eric Dumazet
@ 2023-03-31  3:23   ` Jakub Kicinski
  2023-03-31  3:34     ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Kicinski @ 2023-03-31  3:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, netdev, pabeni, Roman Gushchin, leitao, shemminger

On Fri, 31 Mar 2023 04:41:23 +0200 Eric Dumazet wrote:
> Note that we update WRITE_ONCE(txq->xmit_lock_owner, cpu) _after_
> spin_lock(&txq->_xmit_lock);
> 
> So there is a tiny window I think, for missing that we got the
> spinlock, but I do not see how to avoid it without excessive cost.

Ugh, true. Hopefully the chances of taking an IRQ which tries to print
something between those two instructions are fairly low. 

I was considering using dev_recursion_level(), but AFAICT we don't
currently bump it when dequeuing from the qdisc...
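
For reference, that would amount to something like this in
netpoll_poll_dev() (a sketch only; as said, it would not cover transmits
done while dequeuing from the qdisc today):

	/* Alternative to scanning every queue's xmit_lock_owner:
	 * skip polling when this CPU is already inside the xmit path,
	 * per the per-CPU recursion counter. Not viable as-is, since
	 * the counter is not bumped when dequeuing from the qdisc.
	 */
	if (dev_recursion_level()) {
		up(&ni->dev_lock);
		return;
	}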


* Re: [PATCH net] net: don't let netpoll invoke NAPI if in xmit context
  2023-03-31  3:23   ` Jakub Kicinski
@ 2023-03-31  3:34     ` Eric Dumazet
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2023-03-31  3:34 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: davem, netdev, pabeni, Roman Gushchin, leitao, shemminger

On Fri, Mar 31, 2023 at 5:23 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 31 Mar 2023 04:41:23 +0200 Eric Dumazet wrote:
> > Note that we update WRITE_ONCE(txq->xmit_lock_owner, cpu) _after_
> > spin_lock(&txq->_xmit_lock);
> >
> > So there is, I think, a tiny window in which the check can miss that we
> > already took the spinlock, but I do not see how to avoid it without
> > excessive cost.
>
> Ugh, true. Hopefully the chances of taking an IRQ which tries to print
> something between those two instructions are fairly low.
>
> I was considering using dev_recursion_level() but AFAICT we don't
> currently bump it when dequeuing from the qdisc..

SGTM, thanks.

Reviewed-by: Eric Dumazet <edumazet@google.com>


* Re: [PATCH net] net: don't let netpoll invoke NAPI if in xmit context
  2023-03-31  2:21 [PATCH net] net: don't let netpoll invoke NAPI if in xmit context Jakub Kicinski
  2023-03-31  2:41 ` Eric Dumazet
@ 2023-04-02 12:30 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-04-02 12:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, roman.gushchin, leitao, shemminger

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 30 Mar 2023 19:21:44 -0700 you wrote:
> Commit 0db3dc73f7a3 ("[NETPOLL]: tx lock deadlock fix") narrowed
> down the region under netif_tx_trylock() inside netpoll_send_skb().
> (At that point in time netif_tx_trylock() would lock all queues of
> the device.) Taking the tx lock was problematic because the driver's
> cleanup method may take the same lock. So the change made us hold
> the xmit lock only around xmit, and expected the driver to take
> care of locking within ->ndo_poll_controller().
> 
> [...]

Here is the summary with links:
  - [net] net: don't let netpoll invoke NAPI if in xmit context
    https://git.kernel.org/netdev/net/c/275b471e3d2d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


