All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] b44.c box lockup fix (netconsole): ratelimit NAPI poll error message
@ 2009-11-25 21:35 Andreas Mohr
  2009-11-25 22:56 ` David Miller
  2009-11-30  8:15 ` David Miller
  0 siblings, 2 replies; 5+ messages in thread
From: Andreas Mohr @ 2009-11-25 21:35 UTC (permalink / raw)
  To: davem; +Cc: zambrano, dave, mb, netdev, linux-kernel

Hello all,

my OpenWrt MIPSEL ASUS WL-500gP v2 kept locking up during driver
debugging with kobject debugging turned on (when debugging non-working
drivers such as USB, ftdi_sio and usb_audio).
Turns out that it's not very helpful to have a netconsole-deployed
network driver's interrupt handler message spew noise permanently
(see comment in patch).

See
http://bugzilla.kernel.org/show_bug.cgi?id=14691
for background information.

Tested on 2.6.30.9 and working (now normal debug messages manage
to come through, too, yet box still locked up, simply due to excessive
_normal_ log messages such as kobject logging, but that's _expected_
and user-correctable).

checkpatch.pl'd, applies cleanly to -rc8.
Likely -stable candidate material.

r8169 (and perhaps other netconsole-capable drivers?)
might need such a correction, too, if the
                        if (likely(napi_schedule_prep(&tp->napi)))
                                __napi_schedule(&tp->napi);
                        else if (netif_msg_intr(tp)) {
                                printk(KERN_INFO "%s: interrupt %04x in poll\n",
                                dev->name, status);
                        }
is similarly problematic.
BTW: this patch here is to be seen separate from the r8169 interrupt handler
rework as discussed at "r8169: avoid losing MSI interrupts", since even with a
"working" interrupt handler (i.e. one that closes the "new interrupt" race window,
which the one in b44 currently probably doesn't) such error messages
have a fatal feedback loop in the case of netconsole deployment.

Thanks,

Signed-off-by: Andreas Mohr <andi@lisas.de>

--- linux-2.6.30.9/drivers/net/b44.c.orig	2009-11-25 20:49:48.000000000 +0100
+++ linux-2.6.30.9/drivers/net/b44.c	2009-11-25 21:26:34.000000000 +0100
@@ -913,8 +913,18 @@
 			__b44_disable_ints(bp);
 			__napi_schedule(&bp->napi);
 		} else {
-			printk(KERN_ERR PFX "%s: Error, poll already scheduled\n",
-			       dev->name);
+			/* netconsole fix!!:
+			 * without ratelimiting, this message would:
+			 * immediately find its way out to b44 netconsole ->
+			 * new IRQ -> re-encounter failed napi_schedule_prep()
+			 * -> error message -> ad nauseam -> box LOCKUP.
+			 * See thread "r8169: avoid losing MSI interrupts"
+			 * for further inspiration. */
+			if (printk_ratelimit())
+				printk(KERN_ERR PFX
+					"%s: Error, poll already scheduled; "
+					"istat 0x%x, imask 0x%x\n",
+						dev->name, istat, imask);
 		}
 
 irq_ack:

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] b44.c box lockup fix (netconsole): ratelimit NAPI poll error message
  2009-11-25 21:35 [PATCH] b44.c box lockup fix (netconsole): ratelimit NAPI poll error message Andreas Mohr
@ 2009-11-25 22:56 ` David Miller
  2009-11-30  8:15 ` David Miller
  1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2009-11-25 22:56 UTC (permalink / raw)
  To: andi; +Cc: zambrano, dave, mb, netdev, linux-kernel

From: Andreas Mohr <andi@lisas.de>
Date: Wed, 25 Nov 2009 22:35:46 +0100

> Turns out that it's not very helpful to have a netconsole-deployed
> network driver's interrupt handler message spew noise permanently
> (see comment in patch).

It's not even an illegal state, especially with shared interrupts.
The warning should never be emitted.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] b44.c box lockup fix (netconsole): ratelimit NAPI poll error message
  2009-11-25 21:35 [PATCH] b44.c box lockup fix (netconsole): ratelimit NAPI poll error message Andreas Mohr
  2009-11-25 22:56 ` David Miller
@ 2009-11-30  8:15 ` David Miller
  2009-11-30  8:16   ` David Miller
  2009-11-30 12:17   ` Andreas Mohr
  1 sibling, 2 replies; 5+ messages in thread
From: David Miller @ 2009-11-30  8:15 UTC (permalink / raw)
  To: andi; +Cc: zambrano, dave, mb, netdev, linux-kernel

From: Andreas Mohr <andi@lisas.de>
Date: Wed, 25 Nov 2009 22:35:46 +0100

> See
> http://bugzilla.kernel.org/show_bug.cgi?id=14691
> for background information.

The patch below is what I'll check in to fix this, thanks.

As for the r8169 side, that case is much more complicated
to fix.  That driver messes with the interrupt masking
before the NAPI schedule check, instead of after it's
sure that NAPI isn't already scheduled like b44 does.

Therefore we might need to undo that programming or move
it into the code block where __napi_schedule() is actually
invoked.

I'll queue this b44 patch up for -stable too.

b44: Fix wedge when using netconsole.

Fixes kernel bugzilla #14691

Due to the way netpoll works, it is perfectly legal to see
NAPI already scheduled when new device events are pending
in b44_interrupt().

So logging a message about it is wrong and in fact harmful.

Based upon a patch by Andreas Mohr.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/b44.c |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/drivers/net/b44.c b/drivers/net/b44.c
index e046943..2a91323 100644
--- a/drivers/net/b44.c
+++ b/drivers/net/b44.c
@@ -912,9 +912,6 @@ static irqreturn_t b44_interrupt(int irq, void *dev_id)
 			bp->istat = istat;
 			__b44_disable_ints(bp);
 			__napi_schedule(&bp->napi);
-		} else {
-			printk(KERN_ERR PFX "%s: Error, poll already scheduled\n",
-			       dev->name);
 		}
 
 irq_ack:
-- 
1.6.5


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] b44.c box lockup fix (netconsole): ratelimit NAPI poll error message
  2009-11-30  8:15 ` David Miller
@ 2009-11-30  8:16   ` David Miller
  2009-11-30 12:17   ` Andreas Mohr
  1 sibling, 0 replies; 5+ messages in thread
From: David Miller @ 2009-11-30  8:16 UTC (permalink / raw)
  To: andi; +Cc: zambrano, dave, mb, netdev, linux-kernel


Could someone email andi@lisas.de privately and let him know
that he isn't seeing any of my emails:

<andi@lisas.de>: host rhlx01.hs-esslingen.de[129.143.116.10] said: 550 5.7.1
    <74-93-104-97-Washington.hfc.comcastbusiness.net[74.93.104.97]>: Client
    host rejected: connections from dialup hosts not accepted. use your ISPs
    mail relay. (in reply to RCPT TO command)

Thanks.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] b44.c box lockup fix (netconsole): ratelimit NAPI poll error message
  2009-11-30  8:15 ` David Miller
  2009-11-30  8:16   ` David Miller
@ 2009-11-30 12:17   ` Andreas Mohr
  1 sibling, 0 replies; 5+ messages in thread
From: Andreas Mohr @ 2009-11-30 12:17 UTC (permalink / raw)
  To: David Miller; +Cc: zambrano, dave, mb, netdev, linux-kernel

Hi,

now managed to see the replies, thanks for the notification Michael!
(I'm afraid our SMTP trap won't get lifted since it's probably
justified given the SPAM situation).

> The patch below is what I'll check in to fix this, thanks.

Seems fine.

> I'll queue this b44 patch up for -stable too.

Dito.

Thank you very much,

Andreas Mohr

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2009-11-30 12:17 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-25 21:35 [PATCH] b44.c box lockup fix (netconsole): ratelimit NAPI poll error message Andreas Mohr
2009-11-25 22:56 ` David Miller
2009-11-30  8:15 ` David Miller
2009-11-30  8:16   ` David Miller
2009-11-30 12:17   ` Andreas Mohr

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.