linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] netpoll: fix netpoll lockups
@ 2006-12-12 10:16 Ingo Molnar
  2006-12-12 12:49 ` [patch] net, 8139too.c: fix netpoll deadlock Ingo Molnar
  2006-12-12 16:13 ` [patch] netpoll: fix netpoll lockups Linus Torvalds
  0 siblings, 2 replies; 10+ messages in thread
From: Ingo Molnar @ 2006-12-12 10:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, David S. Miller, Thomas Gleixner, linux-kernel

Subject: [patch] netpoll: fix netpoll lockups
From: Ingo Molnar <mingo@elte.hu>

current -git doesnt boot on my laptop due to the following netpoll 
breakages:

 - unlock the tx lock in the else branch too ...
 - use irq-safe locking instead of bh-safe locking, netpoll is
   often called from irq context.

with this patch -git boots fine with lockdep enabled and there are no 
locking complaints and everything works fine. (The netpoll_send_skb() 
portion of this patch was based on Andrew's bh-locking based netpoll 
patch in -mm.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 net/core/netpoll.c |   39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

Index: linux-hres-timers.q/net/core/netpoll.c
===================================================================
--- linux-hres-timers.q.orig/net/core/netpoll.c
+++ linux-hres-timers.q/net/core/netpoll.c
@@ -55,6 +55,7 @@ static void queue_process(struct work_st
 	struct netpoll_info *npinfo =
 		container_of(work, struct netpoll_info, tx_work.work);
 	struct sk_buff *skb;
+	unsigned long flags;
 
 	while ((skb = skb_dequeue(&npinfo->txq))) {
 		struct net_device *dev = skb->dev;
@@ -64,15 +65,19 @@ static void queue_process(struct work_st
 			continue;
 		}
 
-		netif_tx_lock_bh(dev);
+		local_irq_save(flags);
+		netif_tx_lock(dev);
 		if (netif_queue_stopped(dev) ||
 		    dev->hard_start_xmit(skb, dev) != NETDEV_TX_OK) {
 			skb_queue_head(&npinfo->txq, skb);
-			netif_tx_unlock_bh(dev);
+			netif_tx_unlock(dev);
+			local_irq_restore(flags);
 
 			schedule_delayed_work(&npinfo->tx_work, HZ/10);
 			return;
 		}
+		netif_tx_unlock(dev);
+		local_irq_restore(flags);
 	}
 }
 
@@ -231,7 +236,7 @@ repeat:
 static void netpoll_send_skb(struct netpoll *np, struct sk_buff *skb)
 {
 	int status = NETDEV_TX_BUSY;
-	unsigned long tries;
+	unsigned long tries, flags;
  	struct net_device *dev = np->dev;
  	struct netpoll_info *npinfo = np->dev->npinfo;
 
@@ -242,22 +247,26 @@ static void netpoll_send_skb(struct netp
 
 	/* don't get messages out of order, and no recursion */
 	if (skb_queue_len(&npinfo->txq) == 0 &&
-	    npinfo->poll_owner != smp_processor_id() &&
-	    netif_tx_trylock(dev)) {
-		/* try until next clock tick */
-		for (tries = jiffies_to_usecs(1)/USEC_PER_POLL; tries > 0; --tries) {
-			if (!netif_queue_stopped(dev))
-				status = dev->hard_start_xmit(skb, dev);
+		    npinfo->poll_owner != smp_processor_id()) {
+		local_irq_save(flags);	/* Where's netif_tx_trylock_irqsave()? */
+		if (netif_tx_trylock(dev)) {
+			/* try until next clock tick */
+			for (tries = jiffies_to_usecs(1)/USEC_PER_POLL;
+					tries > 0; --tries) {
+				if (!netif_queue_stopped(dev))
+					status = dev->hard_start_xmit(skb, dev);
 
-			if (status == NETDEV_TX_OK)
-				break;
+				if (status == NETDEV_TX_OK)
+					break;
 
-			/* tickle device maybe there is some cleanup */
-			netpoll_poll(np);
+				/* tickle device maybe there is some cleanup */
+				netpoll_poll(np);
 
-			udelay(USEC_PER_POLL);
+				udelay(USEC_PER_POLL);
+			}
+			netif_tx_unlock(dev);
 		}
-		netif_tx_unlock(dev);
+		local_irq_restore(flags);
 	}
 
 	if (status != NETDEV_TX_OK) {

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch] net, 8139too.c: fix netpoll deadlock
  2006-12-12 10:16 [patch] netpoll: fix netpoll lockups Ingo Molnar
@ 2006-12-12 12:49 ` Ingo Molnar
  2006-12-12 13:28   ` Jeff Garzik
  2006-12-12 16:13 ` [patch] netpoll: fix netpoll lockups Linus Torvalds
  1 sibling, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2006-12-12 12:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, David S. Miller, Thomas Gleixner, linux-kernel

Subject: [patch] net, 8139too.c: fix netpoll deadlock
From: Ingo Molnar <mingo@elte.hu>

fix deadlock in the 8139too driver: poll handlers should never forcibly 
enable local interrupts, because they might be used by netpoll/printk 
from IRQ context.

=================================
[ INFO: inconsistent lock state ]
2.6.19 #11
---------------------------------
inconsistent {softirq-on-W} -> {in-softirq-W} usage.
swapper/1 [HC0[0]:SC1[1]:HE1:SE0] takes:
 (&npinfo->poll_lock){-+..}, at: [<c0350a41>] net_rx_action+0x64/0x1de
{softirq-on-W} state was registered at:
  [<c0134c86>] mark_lock+0x5b/0x39c
  [<c0135012>] mark_held_locks+0x4b/0x68
  [<c01351e9>] trace_hardirqs_on+0x115/0x139
  [<c02879e6>] rtl8139_poll+0x3d7/0x3f4
  [<c035c85d>] netpoll_poll+0x82/0x32f
  [<c035c775>] netpoll_send_skb+0xc9/0x12f
  [<c035cdcc>] netpoll_send_udp+0x253/0x25b
  [<c0288463>] write_msg+0x40/0x65
  [<c011cead>] __call_console_drivers+0x45/0x51
  [<c011cf16>] _call_console_drivers+0x5d/0x61
  [<c011d4fb>] release_console_sem+0x11f/0x1d8
  [<c011d7d7>] register_console+0x1ac/0x1b3
  [<c02883f8>] init_netconsole+0x55/0x67
  [<c010040c>] init+0x9a/0x24e
  [<c01049cf>] kernel_thread_helper+0x7/0x10
  [<ffffffff>] 0xffffffff
irq event stamp: 819992
hardirqs last  enabled at (819992): [<c0350a16>] net_rx_action+0x39/0x1de
hardirqs last disabled at (819991): [<c0350b1e>] net_rx_action+0x141/0x1de
softirqs last  enabled at (817552): [<c01214e4>] __do_softirq+0xa3/0xa8
softirqs last disabled at (819987): [<c0106051>] do_softirq+0x5b/0xc9

other info that might help us debug this:
no locks held by swapper/1.

stack backtrace:
 [<c0104d88>] dump_trace+0x63/0x1e8
 [<c0104f26>] show_trace_log_lvl+0x19/0x2e
 [<c010532d>] show_trace+0x12/0x14
 [<c0105343>] dump_stack+0x14/0x16
 [<c0134980>] print_usage_bug+0x23c/0x246
 [<c0134d33>] mark_lock+0x108/0x39c
 [<c01356a7>] __lock_acquire+0x361/0x9ed
 [<c0136018>] lock_acquire+0x56/0x72
 [<c03aff1f>] _spin_lock+0x35/0x42
 [<c0350a41>] net_rx_action+0x64/0x1de
 [<c0121493>] __do_softirq+0x52/0xa8
 [<c0106051>] do_softirq+0x5b/0xc9
 [<c0121338>] irq_exit+0x3c/0x48
 [<c0106163>] do_IRQ+0xa4/0xbd
 [<c01047c6>] common_interrupt+0x2e/0x34
 [<c011db92>] vprintk+0x2c0/0x309
 [<c011dbf6>] printk+0x1b/0x1d
 [<c01003f2>] init+0x80/0x24e
 [<c01049cf>] kernel_thread_helper+0x7/0x10
 =======================

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 drivers/net/8139too.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-hres-timers.q/drivers/net/8139too.c
===================================================================
--- linux-hres-timers.q.orig/drivers/net/8139too.c
+++ linux-hres-timers.q/drivers/net/8139too.c
@@ -2131,14 +2131,15 @@ static int rtl8139_poll(struct net_devic
 	}
 
 	if (done) {
+		unsigned long flags;
 		/*
 		 * Order is important since data can get interrupted
 		 * again when we think we are done.
 		 */
-		local_irq_disable();
+		local_irq_save(flags);
 		RTL_W16_F(IntrMask, rtl8139_intr_mask);
 		__netif_rx_complete(dev);
-		local_irq_enable();
+		local_irq_restore(flags);
 	}
 	spin_unlock(&tp->rx_lock);
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] net, 8139too.c: fix netpoll deadlock
  2006-12-12 12:49 ` [patch] net, 8139too.c: fix netpoll deadlock Ingo Molnar
@ 2006-12-12 13:28   ` Jeff Garzik
  2006-12-12 21:49     ` Francois Romieu
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Garzik @ 2006-12-12 13:28 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, David S. Miller, Thomas Gleixner,
	linux-kernel

Ingo Molnar wrote:
> Subject: [patch] net, 8139too.c: fix netpoll deadlock
> From: Ingo Molnar <mingo@elte.hu>
> 
> fix deadlock in the 8139too driver: poll handlers should never forcibly 
> enable local interrupts, because they might be used by netpoll/printk 
> from IRQ context.

ACK

(I'll queue it, if Linus doesn't pick it up; please CC me in the future)



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] netpoll: fix netpoll lockups
  2006-12-12 10:16 [patch] netpoll: fix netpoll lockups Ingo Molnar
  2006-12-12 12:49 ` [patch] net, 8139too.c: fix netpoll deadlock Ingo Molnar
@ 2006-12-12 16:13 ` Linus Torvalds
  2006-12-12 16:20   ` [patch] netpoll: fix netpoll lockup Ingo Molnar
  1 sibling, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2006-12-12 16:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, David S. Miller, Thomas Gleixner, Herbert Xu,
	Linux Kernel Mailing List



On Tue, 12 Dec 2006, Ingo Molnar wrote:
> 
> current -git doesnt boot on my laptop due to the following netpoll 
> breakages:
> 
>  - unlock the tx lock in the else branch too ...
>  - use irq-safe locking instead of bh-safe locking, netpoll is
>    often called from irq context.

This one doesn't apply for me any more, probably because David checked in 
the patch from Andrew that fixed at least _part_ of the problem.

Davem, Ingo, Herbert, can you verify whether the fixes in the current -git 
tree replace this patch from Ingo, or whether Ingo's patch is still needed 
and just needs to be refreshed.

		Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [patch] netpoll: fix netpoll lockup
  2006-12-12 16:13 ` [patch] netpoll: fix netpoll lockups Linus Torvalds
@ 2006-12-12 16:20   ` Ingo Molnar
  2006-12-12 21:00     ` Herbert Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2006-12-12 16:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, David S. Miller, Thomas Gleixner, Herbert Xu,
	Linux Kernel Mailing List


* Linus Torvalds <torvalds@osdl.org> wrote:

> On Tue, 12 Dec 2006, Ingo Molnar wrote:
> > 
> > current -git doesnt boot on my laptop due to the following netpoll 
> > breakages:
> > 
> >  - unlock the tx lock in the else branch too ...
> >  - use irq-safe locking instead of bh-safe locking, netpoll is
> >    often called from irq context.
> 
> This one doesn't apply for me any more, probably because David checked 
> in the patch from Andrew that fixed at least _part_ of the problem.
> 
> Davem, Ingo, Herbert, can you verify whether the fixes in the current 
> -git tree replace this patch from Ingo, or whether Ingo's patch is 
> still needed and just needs to be refreshed.

the first half of it is still needed - find the delta patch ontop of 
current -git below.

	Ingo

------------------------>
Subject: [patch] netpoll: fix netpoll lockup
From: Ingo Molnar <mingo@elte.hu>

current -git doesnt boot on my laptop due to netpoll
not unlocking the tx lock in the else branch.

booted this up on my laptop with lockdep enabled and there are
no locking complaints and it works fine.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 net/core/netpoll.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: linux-hres-timers.q/net/core/netpoll.c
===================================================================
--- linux-hres-timers.q.orig/net/core/netpoll.c
+++ linux-hres-timers.q/net/core/netpoll.c
@@ -55,6 +55,7 @@ static void queue_process(struct work_st
 	struct netpoll_info *npinfo =
 		container_of(work, struct netpoll_info, tx_work.work);
 	struct sk_buff *skb;
+	unsigned long flags;
 
 	while ((skb = skb_dequeue(&npinfo->txq))) {
 		struct net_device *dev = skb->dev;
@@ -64,15 +65,19 @@ static void queue_process(struct work_st
 			continue;
 		}
 
-		netif_tx_lock_bh(dev);
+		local_irq_save(flags);
+		netif_tx_lock(dev);
 		if (netif_queue_stopped(dev) ||
 		    dev->hard_start_xmit(skb, dev) != NETDEV_TX_OK) {
 			skb_queue_head(&npinfo->txq, skb);
-			netif_tx_unlock_bh(dev);
+			netif_tx_unlock(dev);
+			local_irq_restore(flags);
 
 			schedule_delayed_work(&npinfo->tx_work, HZ/10);
 			return;
 		}
+		netif_tx_unlock(dev);
+		local_irq_restore(flags);
 	}
 }
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] netpoll: fix netpoll lockup
  2006-12-12 16:20   ` [patch] netpoll: fix netpoll lockup Ingo Molnar
@ 2006-12-12 21:00     ` Herbert Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Herbert Xu @ 2006-12-12 21:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Andrew Morton, David S. Miller, Thomas Gleixner,
	Linux Kernel Mailing List

On Tue, Dec 12, 2006 at 05:20:42PM +0100, Ingo Molnar wrote:
> 
> the first half of it is still needed - find the delta patch ontop of 
> current -git below.

The unlock in the else branch is definitely needed.  However, since
queue_process is always run from process context we don't need the
IRQ disabling.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] net, 8139too.c: fix netpoll deadlock
  2006-12-12 13:28   ` Jeff Garzik
@ 2006-12-12 21:49     ` Francois Romieu
  2006-12-13  1:12       ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Romieu @ 2006-12-12 21:49 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Ingo Molnar, Linus Torvalds, Andrew Morton, David S. Miller,
	Thomas Gleixner, linux-kernel

Jeff Garzik <jeff@garzik.org> :
> Ingo Molnar wrote:
[...]
> >fix deadlock in the 8139too driver: poll handlers should never forcibly 
> >enable local interrupts, because they might be used by netpoll/printk 
> >from IRQ context.
> 
> ACK
> 
> (I'll queue it, if Linus doesn't pick it up; please CC me in the future)

I have lived with the "NAPI ->poll() handler runs in BH irq enabled context"
rule for years. Is it definitely false/dead ?

If so at least 8139cp needs the same fix.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] net, 8139too.c: fix netpoll deadlock
  2006-12-12 21:49     ` Francois Romieu
@ 2006-12-13  1:12       ` Ingo Molnar
  2007-02-14  2:30         ` Atsushi Nemoto
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2006-12-13  1:12 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Jeff Garzik, Linus Torvalds, Andrew Morton, David S. Miller,
	Thomas Gleixner, linux-kernel


* Francois Romieu <romieu@fr.zoreil.com> wrote:

> > (I'll queue it, if Linus doesn't pick it up; please CC me in the 
> > future)
> 
> I have lived with the "NAPI ->poll() handler runs in BH irq enabled 
> context" rule for years. Is it definitely false/dead ?
> 
> If so at least 8139cp needs the same fix.

hm, this isnt really about NAPI polling, but about the 
netconsole/netpoll/netdump poll_controller() handler.

with netconsole, printk can be called from IRQ context (and is 
frequently from IRQ context during bootup or module initialization), so 
a BH rule isnt enough for them.

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] net, 8139too.c: fix netpoll deadlock
  2006-12-13  1:12       ` Ingo Molnar
@ 2007-02-14  2:30         ` Atsushi Nemoto
  2007-02-16  4:33           ` Atsushi Nemoto
  0 siblings, 1 reply; 10+ messages in thread
From: Atsushi Nemoto @ 2007-02-14  2:30 UTC (permalink / raw)
  To: mingo; +Cc: romieu, jeff, torvalds, akpm, davem, tglx, linux-kernel

Let me resume two months old topic...

On Wed, 13 Dec 2006 02:12:31 +0100, Ingo Molnar <mingo@elte.hu> wrote:
> > I have lived with the "NAPI ->poll() handler runs in BH irq enabled 
> > context" rule for years. Is it definitely false/dead ?
> > 
> > If so at least 8139cp needs the same fix.
> 
> hm, this isnt really about NAPI polling, but about the 
> netconsole/netpoll/netdump poll_controller() handler.
> 
> with netconsole, printk can be called from IRQ context (and is 
> frequently from IRQ context during bootup or module initialization), so 
> a BH rule isnt enough for them.

I see NAPI poll routine might be called with interrupt disabled.

Many (all?) NAPI drivers call netif_receive_skb() from its poll
routine (as described in NAPI-HOWTO.txt), but I thought
netif_receive_skb() cannot be called from irq context or irq disabled.
So it seems the problem is not solved completely.  Or am I missing
something?

---
Atsushi Nemoto

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] net, 8139too.c: fix netpoll deadlock
  2007-02-14  2:30         ` Atsushi Nemoto
@ 2007-02-16  4:33           ` Atsushi Nemoto
  0 siblings, 0 replies; 10+ messages in thread
From: Atsushi Nemoto @ 2007-02-16  4:33 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, romieu, jeff, torvalds, akpm, davem, tglx

On Wed, 14 Feb 2007 11:30:25 +0900 (JST), Atsushi Nemoto <anemo@mba.ocn.ne.jp> wrote:
> > hm, this isnt really about NAPI polling, but about the 
> > netconsole/netpoll/netdump poll_controller() handler.
> > 
> > with netconsole, printk can be called from IRQ context (and is 
> > frequently from IRQ context during bootup or module initialization), so 
> > a BH rule isnt enough for them.
> 
> I see NAPI poll routine might be called with interrupt disabled.
> 
> Many (all?) NAPI drivers call netif_receive_skb() from its poll
> routine (as described in NAPI-HOWTO.txt), but I thought
> netif_receive_skb() cannot be called from irq context or irq disabled.
> So it seems the problem is not solved completely.  Or am I missing
> something?

Any comments for this issue?  If my understanding was correct, I think
add some checking to netif_receive_skb() is better then fixing all
poll routines.  Is this patch acceptable?


Subject: fix irq problem with NAPI + NETPOLL

It seems netif_receive_skb() was designed not to call from irq
context, but NAPI + NETPOLL break this rule.  If netif_receive_skb()
was called from irq context, redirect to netif_rx() instead of
processing the skb in that context.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
---
--- linux-2.6.20/net/core/dev.c	2007-02-05 03:44:54.000000000 +0900
+++ linux/net/core/dev.c	2007-02-16 13:19:06.000000000 +0900
@@ -1769,8 +1769,15 @@ int netif_receive_skb(struct sk_buff *sk
 	__be16 type;
 
 	/* if we've gotten here through NAPI, check netpoll */
-	if (skb->dev->poll && netpoll_rx(skb))
-		return NET_RX_DROP;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	if (skb->dev->poll && skb->dev->poll_controller) {
+		/* NAPI poll might be called in irq context on NETPOLL */
+		if (in_irq() || irqs_disabled())
+			return netif_rx(skb);
+		if (netpoll_rx(skb))
+			return NET_RX_DROP;
+	}
+#endif
 
 	if (!skb->tstamp.off_sec)
 		net_timestamp(skb);

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-02-16  4:33 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-12-12 10:16 [patch] netpoll: fix netpoll lockups Ingo Molnar
2006-12-12 12:49 ` [patch] net, 8139too.c: fix netpoll deadlock Ingo Molnar
2006-12-12 13:28   ` Jeff Garzik
2006-12-12 21:49     ` Francois Romieu
2006-12-13  1:12       ` Ingo Molnar
2007-02-14  2:30         ` Atsushi Nemoto
2007-02-16  4:33           ` Atsushi Nemoto
2006-12-12 16:13 ` [patch] netpoll: fix netpoll lockups Linus Torvalds
2006-12-12 16:20   ` [patch] netpoll: fix netpoll lockup Ingo Molnar
2006-12-12 21:00     ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).