All of lore.kernel.org
 help / color / mirror / Atom feed
* netconsole hangs w/ alt-sysrq-t
@ 2004-04-22 15:29 Jeff Moyer
  2004-04-25 19:15 ` Matt Mackall
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Moyer @ 2004-04-22 15:29 UTC (permalink / raw)
  To: linux-kernel

If netconsole is enabled, and you hit Alt-Sysrq-t, then it will print a
small amount of output to the console(s) and then hang the system.  In this
case, I'm using the e100 driver, and we end up exhausting the available
cbs.  Since we are in interrupt context, the driver's poll routine is never
run, and we loop infinitely waiting for resources to free up that never
will.  Kernel version is 2.6.5.

No easy solutions jump out at me, unfortunately.  Comments?

Regards,

Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-04-22 15:29 netconsole hangs w/ alt-sysrq-t Jeff Moyer
@ 2004-04-25 19:15 ` Matt Mackall
  2004-04-28 12:44   ` Jeff Moyer
  0 siblings, 1 reply; 18+ messages in thread
From: Matt Mackall @ 2004-04-25 19:15 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Thu, Apr 22, 2004 at 11:29:33AM -0400, Jeff Moyer wrote:
> If netconsole is enabled, and you hit Alt-Sysrq-t, then it will print a
> small amount of output to the console(s) and then hang the system.  In this
> case, I'm using the e100 driver, and we end up exhausting the available
> cbs.  Since we are in interrupt context, the driver's poll routine is never
> run, and we loop infinitely waiting for resources to free up that never
> will.  Kernel version is 2.6.5.

Can you try 2.6.6-rc2? It has a fix to congestion handling that should
address this.

-- 
Matt Mackall : http://www.selenic.com : Linux development and consulting

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-04-25 19:15 ` Matt Mackall
@ 2004-04-28 12:44   ` Jeff Moyer
  2004-04-28 14:03     ` Matt Mackall
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Moyer @ 2004-04-28 12:44 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:

mpm> On Thu, Apr 22, 2004 at 11:29:33AM -0400, Jeff Moyer wrote:
>> If netconsole is enabled, and you hit Alt-Sysrq-t, then it will print a
>> small amount of output to the console(s) and then hang the system.  In
>> this case, I'm using the e100 driver, and we end up exhausting the
>> available cbs.  Since we are in interrupt context, the driver's poll
>> routine is never run, and we loop infinitely waiting for resources to
>> free up that never will.  Kernel version is 2.6.5.

mpm> Can you try 2.6.6-rc2? It has a fix to congestion handling that should
mpm> address this.

Is the attached patch the change you are referring to?  If so, I don't see
how this would fix the problem.  I ended up deferring netpoll writes to
process context, which has been working fine for me.  Have I missed
something?

-Jeff

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/04/13 14:07:53-04:00 mpm@selenic.com 
#   [PATCH] netpoll transmit busy bugfix
#   
#   Fix for handling of full transmit queue when netpoll trap is enabled.
#   
#   From Stelian Pop <stelian@popies.net>
# 
# net/core/netpoll.c
#   2004/04/10 16:19:31-04:00 mpm@selenic.com +3 -9
#   netpoll transmit busy bugfix
# 
diff -Nru a/net/core/netpoll.c b/net/core/netpoll.c
--- a/net/core/netpoll.c	Wed Apr 28 05:43:08 2004
+++ b/net/core/netpoll.c	Wed Apr 28 05:43:08 2004
@@ -163,21 +163,15 @@
 	spin_lock(&np->dev->xmit_lock);
 	np->dev->xmit_lock_owner = smp_processor_id();
 
-	if (netif_queue_stopped(np->dev)) {
-		np->dev->xmit_lock_owner = -1;
-		spin_unlock(&np->dev->xmit_lock);
-
-		netpoll_poll(np);
-		goto repeat;
-	}
-
 	status = np->dev->hard_start_xmit(skb, np->dev);
 	np->dev->xmit_lock_owner = -1;
 	spin_unlock(&np->dev->xmit_lock);
 
 	/* transmit busy */
-	if(status)
+	if(status) {
+		netpoll_poll(np);
 		goto repeat;
+	}
 }
 
 void netpoll_send_udp(struct netpoll *np, const char *msg, int len)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-04-28 12:44   ` Jeff Moyer
@ 2004-04-28 14:03     ` Matt Mackall
  2004-04-28 14:07       ` Jeff Moyer
  0 siblings, 1 reply; 18+ messages in thread
From: Matt Mackall @ 2004-04-28 14:03 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Wed, Apr 28, 2004 at 08:44:47AM -0400, Jeff Moyer wrote:
> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:
> 
> mpm> On Thu, Apr 22, 2004 at 11:29:33AM -0400, Jeff Moyer wrote:
> >> If netconsole is enabled, and you hit Alt-Sysrq-t, then it will print a
> >> small amount of output to the console(s) and then hang the system.  In
> >> this case, I'm using the e100 driver, and we end up exhausting the
> >> available cbs.  Since we are in interrupt context, the driver's poll
> >> routine is never run, and we loop infinitely waiting for resources to
> >> free up that never will.  Kernel version is 2.6.5.
> 
> mpm> Can you try 2.6.6-rc2? It has a fix to congestion handling that should
> mpm> address this.
> 
> Is the attached patch the change you are referring to?  If so, I don't see
> how this would fix the problem.  I ended up deferring netpoll writes to
> process context, which has been working fine for me.  Have I missed
> something?

Well process context defeats the purpose. Ok, I've more closely read
your report and if I understand correctly, you're using the NAPI
version of e100? There's some magic NAPI bits in netpoll_poll that
might help here:

        if(trapped && np->dev->poll &&
           test_bit(__LINK_STATE_RX_SCHED, &np->dev->state))
                np->dev->poll(np->dev, &budget);

Perhaps we need to pull the trapped test out of there. Then with any
luck, dev->hard_start_xmit will return non-zero in netpoll_send_skb,
we'll call netpoll_poll to pump the card, and we'll be able to flush
it.

-- 
Matt Mackall : http://www.selenic.com : Linux development and consulting

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-04-28 14:03     ` Matt Mackall
@ 2004-04-28 14:07       ` Jeff Moyer
  2004-04-28 14:27         ` Matt Mackall
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Moyer @ 2004-04-28 14:07 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:

mpm> On Wed, Apr 28, 2004 at 08:44:47AM -0400, Jeff Moyer wrote:
>> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall
>> <mpm@selenic.com> adds:
>> 
mpm> On Thu, Apr 22, 2004 at 11:29:33AM -0400, Jeff Moyer wrote:
>> >> If netconsole is enabled, and you hit Alt-Sysrq-t, then it will print
>> a >> small amount of output to the console(s) and then hang the system.
>> In >> this case, I'm using the e100 driver, and we end up exhausting the
>> >> available cbs.  Since we are in interrupt context, the driver's poll
>> >> routine is never run, and we loop infinitely waiting for resources to
>> >> free up that never will.  Kernel version is 2.6.5.
>> 
mpm> Can you try 2.6.6-rc2? It has a fix to congestion handling that should
mpm> address this.
>> Is the attached patch the change you are referring to?  If so, I don't
>> see how this would fix the problem.  I ended up deferring netpoll writes
>> to process context, which has been working fine for me.  Have I missed
>> something?

mpm> Well process context defeats the purpose. Ok, I've more closely read
mpm> your report and if I understand correctly, you're using the NAPI
mpm> version of e100? There's some magic NAPI bits in netpoll_poll that
mpm> might help here:

Yes, sorry I didn't specify that earlier.

mpm>         if(trapped && np->dev->poll && test_bit(__LINK_STATE_RX_SCHED,
mpm> &np->dev->state))
np-> dev->poll(np->dev, &budget);

mpm> Perhaps we need to pull the trapped test out of there. Then with any
mpm> luck, dev->hard_start_xmit will return non-zero in netpoll_send_skb,
mpm> we'll call netpoll_poll to pump the card, and we'll be able to flush
mpm> it.

I don't think so.  You can end up in code running in interrupt context that
is not designed to (ip routing code, etc).  I've been down that path
already.  I only defer to process context if irqs_disabled().

Other suggestions?

-Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-04-28 14:07       ` Jeff Moyer
@ 2004-04-28 14:27         ` Matt Mackall
  2004-04-28 18:23           ` Jeff Moyer
  0 siblings, 1 reply; 18+ messages in thread
From: Matt Mackall @ 2004-04-28 14:27 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Wed, Apr 28, 2004 at 10:07:17AM -0400, Jeff Moyer wrote:
> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:
> 
> mpm> On Wed, Apr 28, 2004 at 08:44:47AM -0400, Jeff Moyer wrote:
> >> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall
> >> <mpm@selenic.com> adds:
> >> 
> mpm> On Thu, Apr 22, 2004 at 11:29:33AM -0400, Jeff Moyer wrote:
> >> >> If netconsole is enabled, and you hit Alt-Sysrq-t, then it will print
> >> a >> small amount of output to the console(s) and then hang the system.
> >> In >> this case, I'm using the e100 driver, and we end up exhausting the
> >> >> available cbs.  Since we are in interrupt context, the driver's poll
> >> >> routine is never run, and we loop infinitely waiting for resources to
> >> >> free up that never will.  Kernel version is 2.6.5.
> >> 
> mpm> Can you try 2.6.6-rc2? It has a fix to congestion handling that should
> mpm> address this.
> >> Is the attached patch the change you are referring to?  If so, I don't
> >> see how this would fix the problem.  I ended up deferring netpoll writes
> >> to process context, which has been working fine for me.  Have I missed
> >> something?
> 
> mpm> Well process context defeats the purpose. Ok, I've more closely read
> mpm> your report and if I understand correctly, you're using the NAPI
> mpm> version of e100? There's some magic NAPI bits in netpoll_poll that
> mpm> might help here:
> 
> Yes, sorry I didn't specify that earlier.
> 
> mpm>         if(trapped && np->dev->poll && test_bit(__LINK_STATE_RX_SCHED,
> mpm> &np->dev->state))
> np-> dev->poll(np->dev, &budget);
> 
> mpm> Perhaps we need to pull the trapped test out of there. Then with any
> mpm> luck, dev->hard_start_xmit will return non-zero in netpoll_send_skb,
> mpm> we'll call netpoll_poll to pump the card, and we'll be able to flush
> mpm> it.
> 
> I don't think so.  You can end up in code running in interrupt context that
> is not designed to (ip routing code, etc).  I've been down that path
> already.  I only defer to process context if irqs_disabled().

Fair enough. Turning on trapped basically short circuits the rest of
the NAPI code so that stuff doesn't hit the stack when we call ->poll.
Could you try doing a netpoll_set_trap(1)/(0) around the call to ->poll
and see if that actually lets the thing work? Then we can try to
figure out the right way to do this.

My point about deferring still stands, as when we're dumping an oops,
we can't really expect that we'll ever get to a process context.

-- 
Matt Mackall : http://www.selenic.com : Linux development and consulting

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-04-28 14:27         ` Matt Mackall
@ 2004-04-28 18:23           ` Jeff Moyer
  2004-05-20  2:20             ` klogd, netconsole without /dev/console? Tom Oehser
  2004-06-25 15:50             ` netconsole hangs w/ alt-sysrq-t Jeff Moyer
  0 siblings, 2 replies; 18+ messages in thread
From: Jeff Moyer @ 2004-04-28 18:23 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:

[snip]

mpm> Well process context defeats the purpose. Ok, I've more closely read
mpm> your report and if I understand correctly, you're using the NAPI
mpm> version of e100? There's some magic NAPI bits in netpoll_poll that
mpm> might help here:
>> Yes, sorry I didn't specify that earlier.
>> 
mpm> if(trapped && np->dev->poll && test_bit(__LINK_STATE_RX_SCHED,
mpm> &np->dev->state))
np-> dev->poll(np->dev, &budget);
>>
mpm> Perhaps we need to pull the trapped test out of there. Then with any
mpm> luck, dev->hard_start_xmit will return non-zero in netpoll_send_skb,
mpm> we'll call netpoll_poll to pump the card, and we'll be able to flush
mpm> it.
>> I don't think so.  You can end up in code running in interrupt context
>> that is not designed to (ip routing code, etc).  I've been down that
>> path already.  I only defer to process context if irqs_disabled().

mpm> Fair enough. Turning on trapped basically short circuits the rest of
mpm> the NAPI code so that stuff doesn't hit the stack when we call ->poll.
mpm> Could you try doing a netpoll_set_trap(1)/(0) around the call to
mpm> ->poll and see if that actually lets the thing work? Then we can try
mpm> to figure out the right way to do this.

Okay, I tried that before and I still ran into problems.  Just to be sure,
I grabbed 2.6.6-rc3 and tried your suggestion.  The resulting stack trace
is at the end of this message.  I can't explain how we get from
netif_receive_skb to anywhere up the stack, though.  Perhaps you'll have
better luck with it.

Note that this is alt-sysrq-t output, so the first bits of stack you see
are not relevant.  What counts is everything after the Badness in
local_bh_enable.

mpm> My point about deferring still stands, as when we're dumping an oops,
mpm> we can't really expect that we'll ever get to a process context.

Sure, but you can explicitly flush the queue in such a circumstance, though
this is still not an ideal solution.

-Jeff


pdflush       S 00000000     0     9      4            10     8 (L-TLB)
cfe03f7c 00000046 00000000 00000000 00000000 00000000 cfe03f48 c011fa9c
       5ec54655 123c321a 00000000 00000002 cff725d0 cff736b0 cff736d0 c1244ce0
       0000119b 4318ade1 00000006 cfe12650 cfe12808 00000046 00000000 00000003
Call Trace:
 [<c011fa9c>] recalc_task_prio+0x8c/0x1a0
 [<c014bb80>] pdflush+0x0/0x20
 [<c014b918>] __pdflush+0xb8/0x320
 [<c014bb9a>] pdflush+0x1a/0x20
 [<c014bb80>] pdflush+0x0/0x20
 [<c013b09c>] Badness in local_bh_enable at kernel/softirq.c:136
Call Trace:
 [<c012ab64>] local_bh_enable+0x84/0x90
 [<c02b638e>] neigh_lookup+0x7e/0xb0
 [<c02f3260>] arp_process+0x200/0x530
 [<c0120e9b>] rebalance_tick+0x3b/0xf0
 [<c02b1363>] netif_receive_skb+0x1d3/0x280
 [<d08ff2d7>] e100_poll+0x6e7/0x760 [e100]
 [<d08fe4b4>] e100_xmit_frame+0x284/0x400 [e100]
 [<d08febf0>] e100_poll+0x0/0x760 [e100]
 [<c02bd13e>] netpoll_poll+0x5e/0x60
 [<c02bd413>] netpoll_send_skb+0xc3/0x120
 [<d086a058>] write_msg+0x58/0x60 [netconsole]
 [<d086a000>] write_msg+0x0/0x60 [netconsole]
 [<c01263c6>] __call_console_drivers+0x56/0x60
 [<c0126927>] release_console_sem+0x77/0x140
 [<c0126790>] printk+0x1c0/0x270
 [<c014bb80>] pdflush+0x0/0x20
 [<c013b09c>] kthread+0x9c/0xb0
 [<c0107f35>] show_trace+0xa5/0xc0
 [<c013b09c>] kthread+0x9c/0xb0
 [<c0107fcb>] show_stack+0x7b/0x90
 [<c012280c>] show_state+0x5c/0x90
 [<c022042e>] __handle_sysrq_nolock+0x6e/0xe6
 [<c02203a0>] handle_sysrq+0x40/0x60
 [<c021a662>] kbd_event+0x32/0x70
 [<c0294ab5>] input_event+0xf5/0x400
 [<c0297cbf>] atkbd_report_key+0x2f/0x80
 [<c0297f33>] atkbd_interrupt+0x223/0x530
 [<c029bb46>] serio_interrupt+0x66/0x70
 [<c029c464>] i8042_interrupt+0xf4/0x190
 [<c0109560>] handle_IRQ_event+0x30/0x60
 [<c01099f1>] do_IRQ+0x121/0x290
 =======================
 [<c0107ba4>] common_interrupt+0x18/0x20
 [<c011007b>] transmeta_identify+0x2b/0x60
 [<c011856f>] apm_bios_call_simple+0x9f/0xe0
 [<c01186d8>] apm_do_idle+0x18/0x70
 [<c01187dc>] apm_cpu_idle+0x7c/0x150
 [<c0105030>] default_idle+0x0/0x40
 [<c01050f6>] cpu_idle+0x46/0x50
 [<c03dd8ff>] start_kernel+0x19f/0x200
 [<c03dd480>] unknown_bootoption+0x0/0x130

^ permalink raw reply	[flat|nested] 18+ messages in thread

* klogd, netconsole without /dev/console?
  2004-04-28 18:23           ` Jeff Moyer
@ 2004-05-20  2:20             ` Tom Oehser
  2004-06-25 15:50             ` netconsole hangs w/ alt-sysrq-t Jeff Moyer
  1 sibling, 0 replies; 18+ messages in thread
From: Tom Oehser @ 2004-05-20  2:20 UTC (permalink / raw)
  Cc: linux-kernel


 If I start klogd with a -c, it turns off _both_ netconsole _and_ /dev/console.

 If I start klogd without -c, I get both of them on.

 Shouldn't there be a way to have klogd and netconsole but no /dev/console?

-Tom

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-04-28 18:23           ` Jeff Moyer
  2004-05-20  2:20             ` klogd, netconsole without /dev/console? Tom Oehser
@ 2004-06-25 15:50             ` Jeff Moyer
  2004-06-25 23:27               ` Matt Mackall
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Moyer @ 2004-06-25 15:50 UTC (permalink / raw)
  To: Matt Mackall, linux-kernel

==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Jeff Moyer <jmoyer@redhat.com> adds:

==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:
jmoyer> [snip]

mpm> Well process context defeats the purpose. Ok, I've more closely read
mpm> your report and if I understand correctly, you're using the NAPI
mpm> version of e100? There's some magic NAPI bits in netpoll_poll that
mpm> might help here:
>>> Yes, sorry I didn't specify that earlier.
>>> 
mpm> if(trapped && np->dev->poll && test_bit(__LINK_STATE_RX_SCHED,
mpm> &np->dev->state))
np-> dev->poll(np->dev, &budget);
>>>
mpm> Perhaps we need to pull the trapped test out of there. Then with any
mpm> luck, dev->hard_start_xmit will return non-zero in netpoll_send_skb,
mpm> we'll call netpoll_poll to pump the card, and we'll be able to flush
mpm> it.
>>> I don't think so.  You can end up in code running in interrupt context
>>> that is not designed to (ip routing code, etc).  I've been down that
>>> path already.  I only defer to process context if irqs_disabled().

mpm> Fair enough. Turning on trapped basically short circuits the rest of
mpm> the NAPI code so that stuff doesn't hit the stack when we call ->poll.
mpm> Could you try doing a netpoll_set_trap(1)/(0) around the call to
mpm> ->poll and see if that actually lets the thing work? Then we can try
mpm> to figure out the right way to do this.

You will still hit the stack in the case of netconsole, since it doesn't
register an rx hook and netif_receive_skb has this:

#ifdef CONFIG_NETPOLL_RX
	if (skb->dev->netpoll_rx && skb->dev->poll && netpoll_rx(skb)) {
		kfree_skb(skb);
		return NET_RX_DROP;
	}
#endif

So we fall through and end up calling code which doesn't want to run in
interrupt context.

One solution that I've come up with, but may not be in the spirit of
netpoll, is to change that test above to the following:

	if (unlikely(netpoll_trap())) {
		if (skb->dev->netpoll_rx && skb->dev->poll)
			netpoll_rx(skb);
		kfree_skb(skb);
		return NET_RX_DROP;
	}

This changes semantics, in that the rx routine will only be called if we've
decided to "trap" the network stack.  Note also that this will cause us to
drop packets while doing our logging.

Matt, any thoughts on this?

Regards,

Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-06-25 15:50             ` netconsole hangs w/ alt-sysrq-t Jeff Moyer
@ 2004-06-25 23:27               ` Matt Mackall
  2004-06-28 14:48                 ` Jeff Moyer
  0 siblings, 1 reply; 18+ messages in thread
From: Matt Mackall @ 2004-06-25 23:27 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Fri, Jun 25, 2004 at 11:50:25AM -0400, Jeff Moyer wrote:
> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Jeff Moyer <jmoyer@redhat.com> adds:
> 
> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:
> jmoyer> [snip]
> 
> mpm> Well process context defeats the purpose. Ok, I've more closely read
> mpm> your report and if I understand correctly, you're using the NAPI
> mpm> version of e100? There's some magic NAPI bits in netpoll_poll that
> mpm> might help here:
> >>> Yes, sorry I didn't specify that earlier.
> >>> 
> mpm> if(trapped && np->dev->poll && test_bit(__LINK_STATE_RX_SCHED,
> mpm> &np->dev->state))
> np-> dev->poll(np->dev, &budget);
> >>>
> mpm> Perhaps we need to pull the trapped test out of there. Then with any
> mpm> luck, dev->hard_start_xmit will return non-zero in netpoll_send_skb,
> mpm> we'll call netpoll_poll to pump the card, and we'll be able to flush
> mpm> it.
> >>> I don't think so.  You can end up in code running in interrupt context
> >>> that is not designed to (ip routing code, etc).  I've been down that
> >>> path already.  I only defer to process context if irqs_disabled().
> 
> mpm> Fair enough. Turning on trapped basically short circuits the rest of
> mpm> the NAPI code so that stuff doesn't hit the stack when we call ->poll.
> mpm> Could you try doing a netpoll_set_trap(1)/(0) around the call to
> mpm> ->poll and see if that actually lets the thing work? Then we can try
> mpm> to figure out the right way to do this.
> 
> You will still hit the stack in the case of netconsole, since it doesn't
> register an rx hook and netif_receive_skb has this:
> 
> #ifdef CONFIG_NETPOLL_RX
> 	if (skb->dev->netpoll_rx && skb->dev->poll && netpoll_rx(skb)) {
> 		kfree_skb(skb);
> 		return NET_RX_DROP;
> 	}
> #endif

Ok, that is indeed a problem.
 
> So we fall through and end up calling code which doesn't want to run in
> interrupt context.
> 
> One solution that I've come up with, but may not be in the spirit of
> netpoll, is to change that test above to the following:
> 
> 	if (unlikely(netpoll_trap())) {
> 		if (skb->dev->netpoll_rx && skb->dev->poll)
> 			netpoll_rx(skb);
> 		kfree_skb(skb);
> 		return NET_RX_DROP;
> 	}
>
> This changes semantics, in that the rx routine will only be called if we've
> decided to "trap" the network stack.  Note also that this will cause us to
> drop packets while doing our logging.

Huh. I'm not sure if that covers everything.

We want to: 

a) push stuff through netpoll_rx() iff rx is enabled
b) drop packets when netpoll_rx() says to
c) drop packets when we're trapped for debugging/netdump
d) drop packets when we manually call dev->poll
e) keep as little overhead as possible

The above breaks (a) when we're not trapped (consider breaking in with
debugger) and (b) in any case.

My current thought is we want to get this down to a single test in the
fastpath by replacing netpoll_rx in skb->dev with a flag that is equal
to netpoll_trapped() | dev->netpoll_rx, which we manipulate when call
dev->poll within netpoll. 

How's that sound?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-06-25 23:27               ` Matt Mackall
@ 2004-06-28 14:48                 ` Jeff Moyer
  2004-06-28 15:19                   ` Matt Mackall
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Moyer @ 2004-06-28 14:48 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:

[snip]

mpm> Huh. I'm not sure if that covers everything.

mpm> We want to:

mpm> a) push stuff through netpoll_rx() iff rx is enabled b) drop packets
mpm> when netpoll_rx() says to c) drop packets when we're trapped for
mpm> debugging/netdump d) drop packets when we manually call dev->poll e)
mpm> keep as little overhead as possible

mpm> The above breaks (a) when we're not trapped (consider breaking in with
mpm> debugger) and (b) in any case.

mpm> My current thought is we want to get this down to a single test in the
mpm> fastpath by replacing netpoll_rx in skb->dev with a flag that is equal
mpm> to netpoll_trapped() | dev->netpoll_rx, which we manipulate when call
dev-> poll within netpoll.

mpm> How's that sound?

That sounds fine.  So, in the netif_receive_skb function, we'd now have
something like:

	if (skb->dev->netpoll_rx && netpoll_rx(skb)) {
		kfree_skb(skb);
		return NET_RX_DROP;
	}

I removed the test for skb->dev->poll, since this is the NAPI code path, I
think it should always be present, right?

And then at the top of the netpoll_rx routine:

	if (!(skb->dev->netpoll_rx & NETPOLL_RX_ENABLED))
		goto out;

The routine, as before, returns the value of trapped.  And now,
netpoll_poll would do this:

	if(np->dev->poll && test_bit(__LINK_STATE_RX_SCHED, &np->dev->state)) {
		np->dev->netpoll_rx |= NETPOLL_RX_TRAPPED;
		if (trapped)
			np->dev->poll(np->dev, &budget);
		else {
			trapped = 1;
			np->dev->poll(np->dev, &budget);
			trapped = 0;
		}
		np->dev->netpoll_rx &= ~NETPOLL_RX_TRAPPED;
	}

Is this more in line with what you had in mind?  One thing I would consider
changing, here, is having netpoll_set_trap take a struct netpoll argument.
That way, we could have netpoll_set_trap manipulate the NETPOLL_RX_TRAPPED
bit.  This would make the trap specific to each interface, but I think that
may be desirable.

-Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-06-28 14:48                 ` Jeff Moyer
@ 2004-06-28 15:19                   ` Matt Mackall
  2004-06-28 16:58                     ` Jeff Moyer
  0 siblings, 1 reply; 18+ messages in thread
From: Matt Mackall @ 2004-06-28 15:19 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Mon, Jun 28, 2004 at 10:48:41AM -0400, Jeff Moyer wrote:
> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:
> 
> [snip]
> 
> mpm> Huh. I'm not sure if that covers everything.
> 
> mpm> We want to:
> 
> mpm> a) push stuff through netpoll_rx() iff rx is enabled b) drop packets
> mpm> when netpoll_rx() says to c) drop packets when we're trapped for
> mpm> debugging/netdump d) drop packets when we manually call dev->poll e)
> mpm> keep as little overhead as possible
> 
> mpm> The above breaks (a) when we're not trapped (consider breaking in with
> mpm> debugger) and (b) in any case.
> 
> mpm> My current thought is we want to get this down to a single test in the
> mpm> fastpath by replacing netpoll_rx in skb->dev with a flag that is equal
> mpm> to netpoll_trapped() | dev->netpoll_rx, which we manipulate when call
> dev-> poll within netpoll.
> 
> mpm> How's that sound?
> 
> That sounds fine.  So, in the netif_receive_skb function, we'd now have
> something like:
> 
> 	if (skb->dev->netpoll_rx && netpoll_rx(skb)) {
> 		kfree_skb(skb);
> 		return NET_RX_DROP;
> 	}
> 
> I removed the test for skb->dev->poll, since this is the NAPI code path, I
> think it should always be present, right?

No, I think we get to this on the non-NAPI route as well. The ->poll
check keeps us from filtering twice.

> And then at the top of the netpoll_rx routine:
> 
> 	if (!(skb->dev->netpoll_rx & NETPOLL_RX_ENABLED))
> 		goto out;

Let's explicitly return 1.
 
> The routine, as before, returns the value of trapped.  And now,
> netpoll_poll would do this:
> 
> 	if(np->dev->poll && test_bit(__LINK_STATE_RX_SCHED, &np->dev->state)) {
> 		np->dev->netpoll_rx |= NETPOLL_RX_TRAPPED;
> 		if (trapped)
> 			np->dev->poll(np->dev, &budget);
> 		else {
> 			trapped = 1;
> 			np->dev->poll(np->dev, &budget);
> 			trapped = 0;
> 		}
> 		np->dev->netpoll_rx &= ~NETPOLL_RX_TRAPPED;
> 	}
> 
> Is this more in line with what you had in mind? 

Yes. Let's make that NETPOLL_RX_DROP to disambiguate the
trapped/dropped-for-polling. And we can do a simple
increment/decrement on trapped itself - but do we still need it here
now that we've shorted out the NAPI path with RX_DROP?

There might be some new atomicity or memory barrier issues here, be
mindful.

> One thing I would consider changing, here, is having
> netpoll_set_trap take a struct netpoll argument. That way, we could
> have netpoll_set_trap manipulate the NETPOLL_RX_TRAPPED bit. This
> would make the trap specific to each interface, but I think that may
> be desirable.

It doesn't really make sense for trapped to be non-global - it means
the machine is in polling-only mode, eg debugger or the like. It's
rather misusing it to kill the NAPI receive side for netpoll_poll.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-06-28 15:19                   ` Matt Mackall
@ 2004-06-28 16:58                     ` Jeff Moyer
  2004-06-29 12:36                       ` Jeff Moyer
  2004-07-01 16:52                       ` Matt Mackall
  0 siblings, 2 replies; 18+ messages in thread
From: Jeff Moyer @ 2004-06-28 16:58 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:

mpm> On Mon, Jun 28, 2004 at 10:48:41AM -0400, Jeff Moyer wrote:
>> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall
>> <mpm@selenic.com> adds:
>> 
>> [snip]
>> 
mpm> Huh. I'm not sure if that covers everything.
>>
mpm> We want to:
>>
mpm> a) push stuff through netpoll_rx() iff rx is enabled b) drop packets
mpm> when netpoll_rx() says to c) drop packets when we're trapped for
mpm> debugging/netdump d) drop packets when we manually call dev->poll e)
mpm> keep as little overhead as possible
>>
mpm> The above breaks (a) when we're not trapped (consider breaking in with
mpm> debugger) and (b) in any case.
>>
mpm> My current thought is we want to get this down to a single test in the
mpm> fastpath by replacing netpoll_rx in skb->dev with a flag that is equal
mpm> to netpoll_trapped() | dev->netpoll_rx, which we manipulate when call
dev-> poll within netpoll.
>>
mpm> How's that sound?
>> That sounds fine.  So, in the netif_receive_skb function, we'd now have
>> something like:
>> 
>> if (skb->dev->netpoll_rx && netpoll_rx(skb)) { kfree_skb(skb); return
>> NET_RX_DROP; }
>> 
>> I removed the test for skb->dev->poll, since this is the NAPI code path,
>> I think it should always be present, right?

mpm> No, I think we get to this on the non-NAPI route as well. The ->poll
mpm> check keeps us from filtering twice.

I'll have to take your word for it on this one, as I can't find a way into
this code for the non-napi case.  Would anyone care to give an
authoritative answer on this?

>> And then at the top of the netpoll_rx routine:
>> 
>> if (!(skb->dev->netpoll_rx & NETPOLL_RX_ENABLED)) goto out;

mpm> Let's explicitly return 1.

Ahh, yes, that makes things much more clear.
 
>> The routine, as before, returns the value of trapped.  And now,
>> netpoll_poll would do this:
>> 
>> if(np->dev->poll && test_bit(__LINK_STATE_RX_SCHED, &np->dev->state)) {
np-> dev->netpoll_rx |= NETPOLL_RX_TRAPPED;
>> if (trapped)
np-> dev->poll(np->dev, &budget);
>> else { trapped = 1;
np-> dev->poll(np->dev, &budget);
>> trapped = 0; }
np-> dev->netpoll_rx &= ~NETPOLL_RX_TRAPPED;
>> }
>> 
>> Is this more in line with what you had in mind?

mpm> Yes. Let's make that NETPOLL_RX_DROP to disambiguate the
mpm> trapped/dropped-for-polling. And we can do a simple
mpm> increment/decrement on trapped itself - but do we still need it here
mpm> now that we've shorted out the NAPI path with RX_DROP?

Right.  The code becomes much cleaner with this change (we don't need to
increment/decrement trapped).

mpm> There might be some new atomicity or memory barrier issues here, be
mpm> mindful.

I'll think about this some more.

>> One thing I would consider changing, here, is having netpoll_set_trap
>> take a struct netpoll argument. That way, we could have netpoll_set_trap
>> manipulate the NETPOLL_RX_TRAPPED bit. This would make the trap specific
>> to each interface, but I think that may be desirable.

mpm> It doesn't really make sense for trapped to be non-global - it means
mpm> the machine is in polling-only mode, eg debugger or the like. It's
mpm> rather misusing it to kill the NAPI receive side for netpoll_poll.

Fair enough.  Attached is a patch with these changes.  I've removed
CONFIG_NETPOLL_RX as it no longer seems useful.  Let me know what you
think.

One other thing we might consider is adding a call to touch_nmi_watchdog()
to zap_completion_queue.

Thanks!

Jeff

--- linux-2.6.7/arch/sparc64/defconfig.orig	2004-06-28 12:37:24.000000000 -0400
+++ linux-2.6.7/arch/sparc64/defconfig	2004-06-28 12:37:27.000000000 -0400
@@ -705,7 +705,6 @@
 #
 CONFIG_NET_PKTGEN=m
 CONFIG_NETPOLL=y
-# CONFIG_NETPOLL_RX is not set
 # CONFIG_NETPOLL_TRAP is not set
 CONFIG_NET_POLL_CONTROLLER=y
 CONFIG_HAMRADIO=y
--- linux-2.6.7/arch/ia64/configs/sn2_defconfig.orig	2004-06-28 12:37:50.000000000 -0400
+++ linux-2.6.7/arch/ia64/configs/sn2_defconfig	2004-06-28 12:37:53.000000000 -0400
@@ -636,7 +636,6 @@
 #
 # CONFIG_BT is not set
 CONFIG_NETPOLL=y
-# CONFIG_NETPOLL_RX is not set
 # CONFIG_NETPOLL_TRAP is not set
 CONFIG_NET_POLL_CONTROLLER=y
 
--- linux-2.6.7/arch/ppc64/configs/pSeries_defconfig.orig	2004-06-28 12:39:00.000000000 -0400
+++ linux-2.6.7/arch/ppc64/configs/pSeries_defconfig	2004-06-28 12:39:03.000000000 -0400
@@ -544,7 +544,6 @@
 #
 # CONFIG_BT is not set
 CONFIG_NETPOLL=y
-CONFIG_NETPOLL_RX=y
 CONFIG_NETPOLL_TRAP=y
 CONFIG_NET_POLL_CONTROLLER=y
 
--- linux-2.6.7/arch/ppc64/configs/iSeries_defconfig.orig	2004-06-28 12:39:19.000000000 -0400
+++ linux-2.6.7/arch/ppc64/configs/iSeries_defconfig	2004-06-28 12:39:23.000000000 -0400
@@ -358,7 +358,6 @@
 #
 # CONFIG_NET_PKTGEN is not set
 CONFIG_NETPOLL=y
-CONFIG_NETPOLL_RX=y
 CONFIG_NETPOLL_TRAP=y
 CONFIG_NET_POLL_CONTROLLER=y
 # CONFIG_HAMRADIO is not set
--- linux-2.6.7/arch/ppc64/defconfig.orig	2004-06-28 12:40:06.000000000 -0400
+++ linux-2.6.7/arch/ppc64/defconfig	2004-06-28 12:40:08.000000000 -0400
@@ -544,8 +544,7 @@
 #
 # CONFIG_BT is not set
 CONFIG_NETPOLL=y
-CONFIG_NETPOLL_RX=y
 CONFIG_NETPOLL_TRAP=y
 CONFIG_NET_POLL_CONTROLLER=y
 
--- linux-2.6.7/arch/x86_64/defconfig.orig	2004-06-28 12:40:21.000000000 -0400
+++ linux-2.6.7/arch/x86_64/defconfig	2004-06-28 12:40:24.000000000 -0400
@@ -399,7 +399,6 @@
 #
 # CONFIG_NET_PKTGEN is not set
 CONFIG_NETPOLL=y
-# CONFIG_NETPOLL_RX is not set
 # CONFIG_NETPOLL_TRAP is not set
 CONFIG_NET_POLL_CONTROLLER=y
 # CONFIG_HAMRADIO is not set
--- linux-2.6.7/include/linux/netdevice.h.orig	2004-06-28 12:05:36.000000000 -0400
+++ linux-2.6.7/include/linux/netdevice.h	2004-06-28 12:13:36.000000000 -0400
@@ -459,7 +459,7 @@
 						     unsigned char *haddr);
 	int			(*neigh_setup)(struct net_device *dev, struct neigh_parms *);
 	int			(*accept_fastpath)(struct net_device *, struct dst_entry*);
-#ifdef CONFIG_NETPOLL_RX
+#ifdef CONFIG_NETPOLL
 	int			netpoll_rx;
 #endif
 #ifdef CONFIG_NET_POLL_CONTROLLER
--- linux-2.6.7/net/core/dev.c.orig	2004-06-28 12:05:22.000000000 -0400
+++ linux-2.6.7/net/core/dev.c	2004-06-28 12:40:52.000000000 -0400
@@ -1575,7 +1575,7 @@
 	struct softnet_data *queue;
 	unsigned long flags;
 
-#ifdef CONFIG_NETPOLL_RX
+#ifdef CONFIG_NETPOLL
 	if (skb->dev->netpoll_rx && netpoll_rx(skb)) {
 		kfree_skb(skb);
 		return NET_RX_DROP;
@@ -1737,7 +1737,7 @@
 	int ret = NET_RX_DROP;
 	unsigned short type;
 
-#ifdef CONFIG_NETPOLL_RX
+#ifdef CONFIG_NETPOLL
 	if (skb->dev->netpoll_rx && skb->dev->poll && netpoll_rx(skb)) {
 		kfree_skb(skb);
 		return NET_RX_DROP;
--- linux-2.6.7/net/core/netpoll.c.orig	2004-06-28 12:05:28.000000000 -0400
+++ linux-2.6.7/net/core/netpoll.c	2004-06-28 12:11:45.000000000 -0400
@@ -29,6 +29,9 @@
 #define MAX_SKBS 32
 #define MAX_UDP_CHUNK 1460
 
+#define NETPOLL_RX_ENABLED  1
+#define NETPOLL_RX_DROP     2
+
 static spinlock_t skb_list_lock = SPIN_LOCK_UNLOCKED;
 static int nr_skbs;
 static struct sk_buff *skbs;
@@ -70,9 +73,12 @@
 	np->dev->poll_controller(np->dev);
 
 	/* If scheduling is stopped, tickle NAPI bits */
-	if(trapped && np->dev->poll &&
-	   test_bit(__LINK_STATE_RX_SCHED, &np->dev->state))
+	if (np->dev->poll && 
+	    test_bit(__LINK_STATE_RX_SCHED, &np->dev->state)) {
+		np->dev->netpoll_rx |= NETPOLL_RX_DROP;
 		np->dev->poll(np->dev, &budget);
+		np->dev->netpoll_rx &= ~NETPOLL_RX_DROP;
+	}
 	zap_completion_queue();
 }
 
@@ -333,6 +339,9 @@
 	struct list_head *p;
 	unsigned long flags;
 
+	if (!(skb->dev->netpoll_rx & NETPOLL_RX_ENABLED))
+		return 1;
+
 	if (skb->dev->type != ARPHRD_ETHER)
 		goto out;
 
@@ -591,10 +600,7 @@
 	if(np->rx_hook) {
 		unsigned long flags;
 
-#ifdef CONFIG_NETPOLL_RX
-		np->dev->netpoll_rx = 1;
-#endif
-
+		np->dev->netpoll_rx = NETPOLL_RX_ENABLED;
 		spin_lock_irqsave(&rx_list_lock, flags);
 		list_add(&np->rx_list, &rx_list);
 		spin_unlock_irqrestore(&rx_list_lock, flags);
@@ -613,9 +619,7 @@
 
 		spin_lock_irqsave(&rx_list_lock, flags);
 		list_del(&np->rx_list);
-#ifdef CONFIG_NETPOLL_RX
-		np->dev->netpoll_rx = 0;
-#endif
+		np->dev->netpoll_rx &= ~NETPOLL_RX_ENABLED;
 		spin_unlock_irqrestore(&rx_list_lock, flags);
 	}
 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-06-28 16:58                     ` Jeff Moyer
@ 2004-06-29 12:36                       ` Jeff Moyer
  2004-07-01 16:52                       ` Matt Mackall
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff Moyer @ 2004-06-29 12:36 UTC (permalink / raw)
  To: Matt Mackall, linux-kernel


[snip]

@@ -70,9 +73,12 @@
 	np->dev->poll_controller(np->dev);
 
 	/* If scheduling is stopped, tickle NAPI bits */
-	if(trapped && np->dev->poll &&
-	   test_bit(__LINK_STATE_RX_SCHED, &np->dev->state))
+	if (np->dev->poll && 
+	    test_bit(__LINK_STATE_RX_SCHED, &np->dev->state)) {
+		np->dev->netpoll_rx |= NETPOLL_RX_DROP;
 		np->dev->poll(np->dev, &budget);
+		np->dev->netpoll_rx &= ~NETPOLL_RX_DROP;
+	}
 	zap_completion_queue();
 }
 
Silly me.  This still needs to set trapped (and still needs more thinking
about smp issues).

-Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-06-28 16:58                     ` Jeff Moyer
  2004-06-29 12:36                       ` Jeff Moyer
@ 2004-07-01 16:52                       ` Matt Mackall
  2004-07-01 16:54                         ` Jeff Moyer
  1 sibling, 1 reply; 18+ messages in thread
From: Matt Mackall @ 2004-07-01 16:52 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Mon, Jun 28, 2004 at 12:58:22PM -0400, Jeff Moyer wrote:
> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:
> mpm> No, I think we get to this on the non-NAPI route as well. The ->poll
> mpm> check keeps us from filtering twice.
> 
> I'll have to take your word for it on this one, as I can't find a way into
> this code for the non-napi case.  Would anyone care to give an
> authoritative answer on this?

I could be mistaken, but that's my recollection from last summer.
Hopefully I'll have some spare cycles for this next week.
 
> One other thing we might consider is adding a call to touch_nmi_watchdog()
> to zap_completion_queue.

Not sure how I feel about that yet. If we can't get out of the guts of
netpoll in a timely fashion, then perhaps that's an indication that
the watchdog should indeed fire. Describe the scenario where you think
we should do this, please.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-07-01 16:52                       ` Matt Mackall
@ 2004-07-01 16:54                         ` Jeff Moyer
  2004-07-01 17:05                           ` Matt Mackall
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Moyer @ 2004-07-01 16:54 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:

mpm> On Mon, Jun 28, 2004 at 12:58:22PM -0400, Jeff Moyer wrote:
>> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall
>> <mpm@selenic.com> adds:
mpm> No, I think we get to this on the non-NAPI route as well. The ->poll
mpm> check keeps us from filtering twice.
>> I'll have to take your word for it on this one, as I can't find a way
>> into this code for the non-napi case.  Would anyone care to give an
>> authoritative answer on this?

mpm> I could be mistaken, but that's my recollection from last summer.
mpm> Hopefully I'll have some spare cycles for this next week.
 
>> One other thing we might consider is adding a call to
>> touch_nmi_watchdog() to zap_completion_queue.

mpm> Not sure how I feel about that yet. If we can't get out of the guts of
mpm> netpoll in a timely fashion, then perhaps that's an indication that
mpm> the watchdog should indeed fire. Describe the scenario where you think
mpm> we should do this, please.

Alt-Sysrq-t from the keyboard on a heavily-loaded UP system.

-Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-07-01 16:54                         ` Jeff Moyer
@ 2004-07-01 17:05                           ` Matt Mackall
  2004-07-01 17:53                             ` Jeff Moyer
  0 siblings, 1 reply; 18+ messages in thread
From: Matt Mackall @ 2004-07-01 17:05 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: linux-kernel

On Thu, Jul 01, 2004 at 12:54:50PM -0400, Jeff Moyer wrote:
> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:
> 
> mpm> On Mon, Jun 28, 2004 at 12:58:22PM -0400, Jeff Moyer wrote:
> >> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall
> >> <mpm@selenic.com> adds:
> mpm> No, I think we get to this on the non-NAPI route as well. The ->poll
> mpm> check keeps us from filtering twice.
> >> I'll have to take your word for it on this one, as I can't find a way
> >> into this code for the non-napi case.  Would anyone care to give an
> >> authoritative answer on this?
> 
> mpm> I could be mistaken, but that's my recollection from last summer.
> mpm> Hopefully I'll have some spare cycles for this next week.
>  
> >> One other thing we might consider is adding a call to
> >> touch_nmi_watchdog() to zap_completion_queue.
> 
> mpm> Not sure how I feel about that yet. If we can't get out of the guts of
> mpm> netpoll in a timely fashion, then perhaps that's an indication that
> mpm> the watchdog should indeed fire. Describe the scenario where you think
> mpm> we should do this, please.
> 
> Alt-Sysrq-t from the keyboard on a heavily-loaded UP system.

Is it because we're spinning for too long waiting for skbs or is it
just because sysrq-t takes that long to dump its guts? How does the
watchdog get tickled in the sysrq-t over slow serial console case?

In other words, would this better be done between dumps in the sysrq-t
loop where we actually know we're making forward progress?

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: netconsole hangs w/ alt-sysrq-t
  2004-07-01 17:05                           ` Matt Mackall
@ 2004-07-01 17:53                             ` Jeff Moyer
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Moyer @ 2004-07-01 17:53 UTC (permalink / raw)
  To: Matt Mackall; +Cc: linux-kernel

==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall <mpm@selenic.com> adds:

mpm> On Thu, Jul 01, 2004 at 12:54:50PM -0400, Jeff Moyer wrote:
>> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall
>> <mpm@selenic.com> adds:
>> 
mpm> On Mon, Jun 28, 2004 at 12:58:22PM -0400, Jeff Moyer wrote:
>> >> ==> Regarding Re: netconsole hangs w/ alt-sysrq-t; Matt Mackall >>
>> <mpm@selenic.com> adds:
mpm> No, I think we get to this on the non-NAPI route as well. The ->poll
mpm> check keeps us from filtering twice.
>> >> I'll have to take your word for it on this one, as I can't find a way
>> >> into this code for the non-napi case.  Would anyone care to give an
>> >> authoritative answer on this?
>> 
mpm> I could be mistaken, but that's my recollection from last summer.
mpm> Hopefully I'll have some spare cycles for this next week.
>> >> One other thing we might consider is adding a call to >>
>> touch_nmi_watchdog() to zap_completion_queue.
>> 
mpm> Not sure how I feel about that yet. If we can't get out of the guts of
mpm> netpoll in a timely fashion, then perhaps that's an indication that
mpm> the watchdog should indeed fire. Describe the scenario where you think
mpm> we should do this, please.
>> Alt-Sysrq-t from the keyboard on a heavily-loaded UP system.

mpm> Is it because we're spinning for too long waiting for skbs or is it
mpm> just because sysrq-t takes that long to dump its guts? How does the
mpm> watchdog get tickled in the sysrq-t over slow serial console case?

mpm> In other words, would this better be done between dumps in the sysrq-t
mpm> loop where we actually know we're making forward progress?

Good point.  I tend to center my thinking too much around netdump.  You're
right, if we're waiting too long for skb's, we probably should trigger the
NMI watchdog.  It also appears that show_state includes a call to
touch_nmi_watchdog in its loop.

-Jeff

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2004-07-01 17:55 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-04-22 15:29 netconsole hangs w/ alt-sysrq-t Jeff Moyer
2004-04-25 19:15 ` Matt Mackall
2004-04-28 12:44   ` Jeff Moyer
2004-04-28 14:03     ` Matt Mackall
2004-04-28 14:07       ` Jeff Moyer
2004-04-28 14:27         ` Matt Mackall
2004-04-28 18:23           ` Jeff Moyer
2004-05-20  2:20             ` klogd, netconsole without /dev/console? Tom Oehser
2004-06-25 15:50             ` netconsole hangs w/ alt-sysrq-t Jeff Moyer
2004-06-25 23:27               ` Matt Mackall
2004-06-28 14:48                 ` Jeff Moyer
2004-06-28 15:19                   ` Matt Mackall
2004-06-28 16:58                     ` Jeff Moyer
2004-06-29 12:36                       ` Jeff Moyer
2004-07-01 16:52                       ` Matt Mackall
2004-07-01 16:54                         ` Jeff Moyer
2004-07-01 17:05                           ` Matt Mackall
2004-07-01 17:53                             ` Jeff Moyer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.