linux-kernel.vger.kernel.org archive mirror
* 2.4.7 softirq incorrectness.
@ 2001-07-22 20:44 Rusty Russell
  2001-07-22 23:34 ` Andrea Arcangeli
  0 siblings, 1 reply; 41+ messages in thread
From: Rusty Russell @ 2001-07-22 20:44 UTC (permalink / raw)
  To: Andrea Arcangeli, torvalds; +Cc: linux-kernel

This current code is bogus.  Consider:
	spin_lock_irqsave(flags);	
	cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
	spin_unlock_irqrestore(flags);

Oops... softirq not run until the next interrupt.  So, EITHER:

1) Make local_irq_restore check for pending softirqs as we do for
   local_bh_enable(), and get rid of the wakeup_softirqd() in
   cpu_raise_softirq().  i.e. all "exits" from in_interrupt == true are
   symmetrical.

*OR*

2) Change the check in cpu_raise_softirq to:
	if (!in_hw_irq_handler(cpu))
		wakeup_softirqd(cpu);

   and implement in_hw_irq_handler() on all platforms.  Then get rid of
   the test in local_bh_enable().

Please pick one approach or the other, and stick with it!  The current
code does half of each, badly. 8(

Thanks,
Rusty.
--
Premature optmztion is rt of all evl. --DK

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.4.7 softirq incorrectness.
  2001-07-22 20:44 2.4.7 softirq incorrectness Rusty Russell
@ 2001-07-22 23:34 ` Andrea Arcangeli
  2001-07-23  9:06   ` Rusty Russell
                     ` (3 more replies)
  0 siblings, 4 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-22 23:34 UTC (permalink / raw)
  To: Rusty Russell; +Cc: torvalds, linux-kernel

On Mon, Jul 23, 2001 at 06:44:10AM +1000, Rusty Russell wrote:
> This current code is bogus.  Consider:
> 	spin_lock_irqsave(flags);	
> 	cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
> 	spin_unlock_irqrestore(flags);

What kernel are you looking at? There's no such code in 2.4.7; the only
two places that raise NET_RX_SOFTIRQ are in dev.c, in net_rx_action
and netif_rx:

Here is the one in netif_rx:

			__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
			local_irq_restore(flags);

Here is the one in net_rx_action:

	/* This already runs in BH context, no need to wake up BH's */
	__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
	local_irq_enable();

> Oops... softirq not run until the next interrupt.  So, EITHER:

The first netif_rx is required to run from an interrupt handler
(otherwise we would have used cpu_raise_softirq and not
__cpu_raise_softirq), so we cannot miss the do_softirq in the return
path from do_IRQ(), and we cannot end up waiting for the next incoming
interrupt (if we have an overflow of the do_softirq loop, ksoftirqd
will take care of it without waiting for the next interrupt, as could
instead happen in old 2.4 kernels).

The second one, in net_rx_action, runs inside the softirq code itself,
which marks itself runnable again; this generates a do_softirq overflow
that is again handled gracefully by ksoftirqd without waiting for the
next interrupt (in old 2.4 kernels you had to wait for the next irq
instead).

I cannot see any problem.

Andrea


* Re: 2.4.7 softirq incorrectness.
  2001-07-22 23:34 ` Andrea Arcangeli
@ 2001-07-23  9:06   ` Rusty Russell
  2001-07-23 14:29     ` Andrea Arcangeli
  2001-07-23  9:25   ` Kai Germaschewski
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: Rusty Russell @ 2001-07-23  9:06 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: torvalds, linux-kernel

In message <20010723013416.B23517@athlon.random> you write:
> On Mon, Jul 23, 2001 at 06:44:10AM +1000, Rusty Russell wrote:
> > This current code is bogus.  Consider:
> > 	spin_lock_irqsave(flags);	
> > 	cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
> > 	spin_unlock_irqrestore(flags);
> 
> What kernel are you looking at? There's no such code in 2.4.7, the only

Oh, so it's only a trap *waiting* to happen.  That's OK then!

> The first netif_rx is required to run from interrupt handler (otherwise
> we should have executed cpu_raise_softirq and not
> __cpu_raise_softirq)

Aside: why does it do a local_irq_save() if it's always run from an
interrupt handler?

> I cannot see any problem.

Why not fix all the cases?  Why have this weird secret rule that
cpu_raise_softirq() should not be called with irqs disabled?

Call me old-fashioned, but why not *fix* the problem, if you're going
to rewrite this code... again...

Rusty.
--
Premature optmztion is rt of all evl. --DK


* Re: 2.4.7 softirq incorrectness.
  2001-07-22 23:34 ` Andrea Arcangeli
  2001-07-23  9:06   ` Rusty Russell
@ 2001-07-23  9:25   ` Kai Germaschewski
  2001-07-23 11:12     ` Jeff Garzik
  2001-07-23 14:18     ` Andrea Arcangeli
  2001-07-23 12:05   ` David S. Miller
  2001-07-23 22:24   ` Alexey Kuznetsov
  3 siblings, 2 replies; 41+ messages in thread
From: Kai Germaschewski @ 2001-07-23  9:25 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Rusty Russell, torvalds, linux-kernel

On Mon, 23 Jul 2001, Andrea Arcangeli wrote:

> here the one in netif_rx:
> 
> 			__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
> 			local_irq_restore(flags);
> [...] 

> The first netif_rx is required to run from interrupt handler (otherwise
> we should have executed cpu_raise_softirq and not __cpu_raise_softirq)
> so we cannot miss the do_softirq in the return path from do_IRQ() and so
> we cannot wait for the next incoming interrupt (if we have a overflow of
> the do_softirq loop ksoftirqd will take care of it without waiting for
> the next interrupt as it could instead happen in old 2.4 kernels).

Hmmh, wait a second. I take it that means calling netif_rx not from 
hard-irq context, but e.g. from bh is a bug? (Even if the only consequence 
is delaying the processing by up to one timer tick?)

If so, I believe this bug exists in a couple of places. One example is -
of course - the ISDN code, where netif_rx() is called from bh context. But
I would think that e.g. ppp_generic is affected as well.

Could you clarify this? Maybe adding something like the following to 
netif_rx() is a good idea?

	if (!in_irq())
		printk(KERN_WARNING "netif_rx called not in_irq() from %p\n",
		       __builtin_return_address(0));

--Kai



* Re: 2.4.7 softirq incorrectness.
  2001-07-23  9:25   ` Kai Germaschewski
@ 2001-07-23 11:12     ` Jeff Garzik
  2001-07-23 14:18     ` Andrea Arcangeli
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff Garzik @ 2001-07-23 11:12 UTC (permalink / raw)
  To: Kai Germaschewski; +Cc: Andrea Arcangeli, Rusty Russell, torvalds, linux-kernel

Kai Germaschewski wrote:
> 
> On Mon, 23 Jul 2001, Andrea Arcangeli wrote:
> 
> > here the one in netif_rx:
> >
> >                       __cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
> >                       local_irq_restore(flags);
> > [...]
> 
> > The first netif_rx is required to run from interrupt handler (otherwise
> > we should have executed cpu_raise_softirq and not __cpu_raise_softirq)
> > so we cannot miss the do_softirq in the return path from do_IRQ() and so
> > we cannot wait for the next incoming interrupt (if we have a overflow of
> > the do_softirq loop ksoftirqd will take care of it without waiting for
> > the next interrupt as it could instead happen in old 2.4 kernels).
> 
> Hmmh, wait a second. I take it that means calling netif_rx not from
> hard-irq context, but e.g. from bh is a bug? (Even if the only consequence
> is delaying the processing by up to one timer tick?)
> 
> If so, I believe this bug exists in a couple of a places. One example is -
> of course - the ISDN code, where netif_rx() is called from bh context. But
> I would think that e.g. ppp_generic is affected as well.

No: as long as you mark the bh (or use local_bh_disable/local_bh_enable)
you will run the softirqs.

-- 
Jeff Garzik      | "I wouldn't be so judgemental
Building 1024    |  if you weren't such a sick freak."
MandrakeSoft     |             -- goats.com


* Re: 2.4.7 softirq incorrectness.
  2001-07-22 23:34 ` Andrea Arcangeli
  2001-07-23  9:06   ` Rusty Russell
  2001-07-23  9:25   ` Kai Germaschewski
@ 2001-07-23 12:05   ` David S. Miller
  2001-07-23 14:31     ` Andrea Arcangeli
  2001-07-23 22:24   ` Alexey Kuznetsov
  3 siblings, 1 reply; 41+ messages in thread
From: David S. Miller @ 2001-07-23 12:05 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Andrea Arcangeli, torvalds, linux-kernel


Rusty Russell writes:
 > In message <20010723013416.B23517@athlon.random> you write:
 > > What kernel are you looking at? There's no such code in 2.4.7, the only
 > 
 > Oh, so it's only a trap *waiting* to happen.  That's OK then!
 ...
 > Why not fix all the cases?  Why have this weird secret rule that
 > cpu_raise_softirq() should not be called with irqs disabled?

Why keep it secret?  I think Andrea is exactly right here, and we
should just comment this restriction.  That's all.

Later,
David S. Miller
davem@redhat.com


* Re: 2.4.7 softirq incorrectness.
  2001-07-23  9:25   ` Kai Germaschewski
  2001-07-23 11:12     ` Jeff Garzik
@ 2001-07-23 14:18     ` Andrea Arcangeli
  1 sibling, 0 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-23 14:18 UTC (permalink / raw)
  To: Kai Germaschewski; +Cc: Rusty Russell, torvalds, linux-kernel

On Mon, Jul 23, 2001 at 11:25:23AM +0200, Kai Germaschewski wrote:
> Hmmh, wait a second. I take it that means calling netif_rx not from 
> hard-irq context, but e.g. from bh is a bug? (Even if the only consequence 

No, it isn't a bug either: calling it from irq or bh context is
perfectly fine w.r.t. softirq latency. If you post it from a softirq,
ksoftirqd will handle the overflow event exactly as in net_rx_action.

Andrea


* Re: 2.4.7 softirq incorrectness.
  2001-07-23  9:06   ` Rusty Russell
@ 2001-07-23 14:29     ` Andrea Arcangeli
  2001-07-24  9:35       ` Rusty Russell
  0 siblings, 1 reply; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-23 14:29 UTC (permalink / raw)
  To: Rusty Russell; +Cc: torvalds, linux-kernel

On Mon, Jul 23, 2001 at 07:06:40PM +1000, Rusty Russell wrote:
> Aside: why does it do a local_irq_save() if it's always run from an
> interrupt handler?

To avoid corrupting the backlog with nested irqs.

However, if you want a micro-optimization, you can sti before
__cpu_raise_softirq: as of 2.4.7, __cpu_raise_softirq is required to be
atomic with respect to irqs (but it doesn't need to be atomic with
respect to SMP). In the x86 port it is handled as a single non-locked
bts instruction, so it can be run with irqs enabled.

Here is the optimization:

--- 2.4.7aa1/net/core/dev.c.~1~	Sat Jul 21 00:04:34 2001
+++ 2.4.7aa1/net/core/dev.c	Mon Jul 23 16:21:35 2001
@@ -1217,10 +1217,10 @@
 enqueue:
 			dev_hold(skb->dev);
 			__skb_queue_tail(&queue->input_pkt_queue,skb);
+			local_irq_restore(flags);
 
 			/* Runs from irqs or BH's, no need to wake BH */
 			__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
-			local_irq_restore(flags);
 #ifndef OFFLINE_SAMPLE
 			get_sample_stats(this_cpu);
 #endif
@@ -1529,10 +1529,10 @@
 
 	local_irq_disable();
 	netdev_rx_stat[this_cpu].time_squeeze++;
+	local_irq_enable();
 
 	/* This already runs in BH context, no need to wake up BH's */
 	__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
-	local_irq_enable();
 
 	NET_PROFILE_LEAVE(softnet_process);
 	return;

> > I cannot see any problem.
> 
> Why not fix all the cases?  Why have this wierd secret rule that
> cpu_raise_softirq() should not be called with irqs disabled?

cpu_raise_softirq _can_ be called with irqs disabled too as of now;
whether irqs are enabled or disabled has no influence at all on
cpu_raise_softirq.

Whether you are running in an irq handler or not does matter, though:
if you are running in an irq handler, do_IRQ will take care of the
latency; if you are running in normal kernel code, ksoftirqd will take
care of it. Both cases are handled correctly.

> Call me old-fashioned, but why not *fix* the problem, if you're going
> to rewrite this code... again...

There's no problem at all to fix; everything is just fine as of 2.4.7,
period.

Andrea


* Re: 2.4.7 softirq incorrectness.
  2001-07-23 12:05   ` David S. Miller
@ 2001-07-23 14:31     ` Andrea Arcangeli
  0 siblings, 0 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-23 14:31 UTC (permalink / raw)
  To: David S. Miller; +Cc: Rusty Russell, torvalds, linux-kernel

On Mon, Jul 23, 2001 at 05:05:00AM -0700, David S. Miller wrote:
> 
> Rusty Russell writes:
>  > In message <20010723013416.B23517@athlon.random> you write:
>  > > What kernel are you looking at? There's no such code in 2.4.7, the only
>  > 
>  > Oh, so it's only a trap *waiting* to happen.  That's OK then!
>  ...
>  > Why not fix all the cases?  Why have this weird secret rule that
>  > cpu_raise_softirq() should not be called with irqs disabled?
> 
> Why keep it secret?  I think Andrea is exactly right here, and we
> should just comment this restriction.  That's all.

There's no such restriction as far as I can tell (see my previous email
for details).

Andrea


* Re: 2.4.7 softirq incorrectness.
  2001-07-22 23:34 ` Andrea Arcangeli
                     ` (2 preceding siblings ...)
  2001-07-23 12:05   ` David S. Miller
@ 2001-07-23 22:24   ` Alexey Kuznetsov
  2001-07-25 22:23     ` Andrea Arcangeli
  3 siblings, 1 reply; 41+ messages in thread
From: Alexey Kuznetsov @ 2001-07-23 22:24 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello!

> The first netif_rx is required to run from interrupt handler 

No! netif_rx() is called from _any_ context. Check with grep.
So, this must be repaired in some way.

Actually, the assumption that local_bh_enable() etc. does not happen
with irqs disabled was the biggest hole in Ingo's patch: all the
functions that ever do local_irq_save() assume that they _can_ be
called with irqs disabled (otherwise they would use
local_irq_disable() instead) and, hence, with some spinlocks held.



> for the next interrupt (in old 2.4 kernels you had to wait for the next
> irq instead).

Well, not "had to": a small delay was the purpose of this call anyway.
Actually, it can now be replaced with a direct schedule of ksoftirqd,
because we surely have a softirq flood when this place is reached.

Alexey



* Re: 2.4.7 softirq incorrectness.
  2001-07-23 14:29     ` Andrea Arcangeli
@ 2001-07-24  9:35       ` Rusty Russell
  2001-07-25 19:33         ` Andrea Arcangeli
  0 siblings, 1 reply; 41+ messages in thread
From: Rusty Russell @ 2001-07-24  9:35 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: torvalds, linux-kernel

In message <20010723162909.D822@athlon.random> you write:
> > Why not fix all the cases?  Why have this weird secret rule that
> > cpu_raise_softirq() should not be called with irqs disabled?
> 
> cpu_raise_softirq _can_ be called with irq disabled too just now, irq
> enabled or disabled has no influence at all on cpu_raise_softirq.

No, you are wrong.  If I do (NOT in a hw interrupt handler):

	local_irq_save(flags);
	...
	cpu_raise_softirq(smp_processor_id(), FOO_SOFTIRQ);
	...
	local_irq_restore(flags);

ksoftirqd won't get woken, and the FOO soft irq won't get run until
the next interrupt comes in.  You solved (horribly) the analogous case
for local_bh_disable/enable, but not this one.

Below is the patch suggested in my previous email (x86 only, untested).  I also
added a couple of comments.  There's still the issue of stack
overflows if you get hit hard enough with interrupts (do_softirq is
exposed to reentry for short periods), but that's separate.

Linus, please consider,
Rusty.
--
Premature optmztion is rt of all evl. --DK

diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.7-official/arch/i386/kernel/irq.c working-2.4.7-unclean/arch/i386/kernel/irq.c
--- linux-2.4.7-official/arch/i386/kernel/irq.c	Sat Jul  7 07:48:47 2001
+++ working-2.4.7-unclean/arch/i386/kernel/irq.c	Tue Jul 24 17:29:19 2001
@@ -576,6 +576,7 @@
 	unsigned int status;
 
 	kstat.irqs[cpu][irq]++;
+	in_hw_irq(cpu) = 1;
 	spin_lock(&desc->lock);
 	desc->handler->ack(irq);
 	/*
@@ -633,6 +634,7 @@
 	 */
 	desc->handler->end(irq);
 	spin_unlock(&desc->lock);
+	in_hw_irq(cpu) = 0;
 
 	if (softirq_pending(cpu))
 		do_softirq();
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.7-official/include/asm-i386/hardirq.h working-2.4.7-unclean/include/asm-i386/hardirq.h
--- linux-2.4.7-official/include/asm-i386/hardirq.h	Sun Jul 22 13:13:21 2001
+++ working-2.4.7-unclean/include/asm-i386/hardirq.h	Tue Jul 24 18:40:09 2001
@@ -13,6 +13,7 @@
 	unsigned int __syscall_count;
 	struct task_struct * __ksoftirqd_task; /* waitqueue is too large */
 	unsigned int __nmi_count;	/* arch dependent */
+	int __in_hw_irq;
 } ____cacheline_aligned irq_cpustat_t;
 
 #include <linux/irq_cpustat.h>	/* Standard mappings for irq_cpustat_t above */
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.7-official/include/asm-i386/softirq.h working-2.4.7-unclean/include/asm-i386/softirq.h
--- linux-2.4.7-official/include/asm-i386/softirq.h	Sun Jul 22 13:13:21 2001
+++ working-2.4.7-unclean/include/asm-i386/softirq.h	Tue Jul 24 19:07:24 2001
@@ -4,46 +4,15 @@
 #include <asm/atomic.h>
 #include <asm/hardirq.h>
 
-#define __cpu_bh_enable(cpu) \
+#define cpu_bh_enable(cpu) \
 		do { barrier(); local_bh_count(cpu)--; } while (0)
 #define cpu_bh_disable(cpu) \
 		do { local_bh_count(cpu)++; barrier(); } while (0)
 
 #define local_bh_disable()	cpu_bh_disable(smp_processor_id())
-#define __local_bh_enable()	__cpu_bh_enable(smp_processor_id())
+#define local_bh_enable()	cpu_bh_enable(smp_processor_id())
 
 #define in_softirq() (local_bh_count(smp_processor_id()) != 0)
-
-/*
- * NOTE: this assembly code assumes:
- *
- *    (char *)&local_bh_count - 8 == (char *)&softirq_pending
- *
- * If you change the offsets in irq_stat then you have to
- * update this code as well.
- */
-#define local_bh_enable()						\
-do {									\
-	unsigned int *ptr = &local_bh_count(smp_processor_id());	\
-									\
-	barrier();							\
-	if (!--*ptr)							\
-		__asm__ __volatile__ (					\
-			"cmpl $0, -8(%0);"				\
-			"jnz 2f;"					\
-			"1:;"						\
-									\
-			".section .text.lock,\"ax\";"			\
-			"2: pushl %%eax; pushl %%ecx; pushl %%edx;"	\
-			"call %c1;"					\
-			"popl %%edx; popl %%ecx; popl %%eax;"		\
-			"jmp 1b;"					\
-			".previous;"					\
-									\
-		: /* no output */					\
-		: "r" (ptr), "i" (do_softirq)				\
-		/* no registers clobbered */ );				\
-} while (0)
 
 #define __cpu_raise_softirq(cpu, nr) __set_bit(nr, &softirq_pending(cpu))
 
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.7-official/include/linux/irq_cpustat.h working-2.4.7-unclean/include/linux/irq_cpustat.h
--- linux-2.4.7-official/include/linux/irq_cpustat.h	Sun Jul 22 13:13:24 2001
+++ working-2.4.7-unclean/include/linux/irq_cpustat.h	Tue Jul 24 18:40:09 2001
@@ -31,6 +31,7 @@
 #define local_bh_count(cpu)	__IRQ_STAT((cpu), __local_bh_count)
 #define syscall_count(cpu)	__IRQ_STAT((cpu), __syscall_count)
 #define ksoftirqd_task(cpu)	__IRQ_STAT((cpu), __ksoftirqd_task)
+#define in_hw_irq(cpu)		__IRQ_STAT((cpu), __in_hw_irq)
   /* arch dependent irq_stat fields */
 #define nmi_count(cpu)		__IRQ_STAT((cpu), __nmi_count)		/* i386, ia64 */
 
diff -urN -I \$.*\$ --exclude TAGS -X /home/rusty/devel/kernel/kernel-patches/current-dontdiff --minimal linux-2.4.7-official/kernel/softirq.c working-2.4.7-unclean/kernel/softirq.c
--- linux-2.4.7-official/kernel/softirq.c	Sun Jul 22 13:13:25 2001
+++ working-2.4.7-unclean/kernel/softirq.c	Tue Jul 24 19:26:03 2001
@@ -68,6 +68,7 @@
 	long flags;
 	__u32 mask;
 
+	/* Prevent reentry: we always hold an irq or bh count */
 	if (in_interrupt())
 		return;
 
@@ -102,8 +103,10 @@
 			mask &= ~pending;
 			goto restart;
 		}
-		__local_bh_enable();
+		local_bh_enable();
 
+		/* More came in while we were processing?  Let
+                   softirqd handle the overload. */
 		if (pending)
 			wakeup_softirqd(cpu);
 	}
@@ -115,16 +118,10 @@
 {
 	__cpu_raise_softirq(cpu, nr);
 
-	/*
-	 * If we're in an interrupt or bh, we're done
-	 * (this also catches bh-disabled code). We will
-	 * actually run the softirq once we return from
-	 * the irq or bh.
-	 *
-	 * Otherwise we wake up ksoftirqd to make sure we
-	 * schedule the softirq soon.
-	 */
-	if (!(local_irq_count(cpu) | local_bh_count(cpu)))
+	/* If we're in an interrupt handler, then softirqs get
+	   processed on the way out.  For all other (rarer) cases,
+	   wake softirqd to deal with it soon. */
+	if (!in_hw_irq(cpu))
 		wakeup_softirqd(cpu);
 }
 


* Re: 2.4.7 softirq incorrectness.
  2001-07-24  9:35       ` Rusty Russell
@ 2001-07-25 19:33         ` Andrea Arcangeli
  2001-07-26 20:26           ` Rusty Russell
  0 siblings, 1 reply; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-25 19:33 UTC (permalink / raw)
  To: Rusty Russell; +Cc: torvalds, linux-kernel

On Tue, Jul 24, 2001 at 07:35:10PM +1000, Rusty Russell wrote:
> In message <20010723162909.D822@athlon.random> you write:
> > > Why not fix all the cases?  Why have this weird secret rule that
> > > cpu_raise_softirq() should not be called with irqs disabled?
> > 
> > cpu_raise_softirq _can_ be called with irq disabled too just now, irq
> > enabled or disabled has no influence at all on cpu_raise_softirq.
> 
> No, you are wrong.  If I do (NOT in a hw interrupt handler):
> 
> 	local_irq_save(flags);
> 	...
> 	cpu_raise_softirq(smp_processor_id(), FOO_SOFTIRQ);
> 	...
> 	local_irq_restore(flags);
> 
> ksoftirqd won't get woken, and the FOO soft irq won't get run until

You are wrong, please check again all the code involved.

inline void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
{
	__cpu_raise_softirq(cpu, nr);

	/*
	 * If we're in an interrupt or bh, we're done
	 * (this also catches bh-disabled code). We will
	 * actually run the softirq once we return from
	 * the irq or bh.
	 *
	 * Otherwise we wake up ksoftirqd to make sure we
	 * schedule the softirq soon.
	 */
	if (!(local_irq_count(cpu) | local_bh_count(cpu)))
		wakeup_softirqd(cpu);
}

If you are not in a hw interrupt, local_irq_count is zero, so you will
run wakeup_softirqd().

Whether irqs are enabled or disabled has no influence on the logic.

> the next interrupt comes in.  You solved (horribly) the analagous case
> for local_bh_disable/enable, but not this one.

I didn't change local_bh_enable at all (except a fix for a missing
barrier()); local_bh_enable/disable was solved by Ingo in 2.4.6.

> Below as suggested in my previous email (x86 only, untested).  I also

It seems you're duplicating the local_irq_count functionality, plus you
break local_bh_enable: from local_bh_enable you want to run the softirq
immediately.

> added a couple of comments.  There's still the issue of stack
> overflows if you get hit hard enough with interrupts (do_softirq is
> exposed to reentry for short periods), but that's separate.

do_softirq can be re-entered, but on re-entry it will return
immediately because local_bh_count will be non-zero in that case, so
the only stack overhead of a re-entry is a few words and it cannot do
any harm (if the stack overflows it cannot be because of do_softirq
re-entry).

Andrea


* Re: 2.4.7 softirq incorrectness.
  2001-07-23 22:24   ` Alexey Kuznetsov
@ 2001-07-25 22:23     ` Andrea Arcangeli
  2001-07-26 17:46       ` kuznet
                         ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-25 22:23 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: linux-kernel

On Tue, Jul 24, 2001 at 02:24:47AM +0400, Alexey Kuznetsov wrote:
> Hello!
> 
> > The first netif_rx is required to run from interrupt handler 
> 
> No! netif_rx() is called from _any_ context. Check with grep.

Originally it was cpu_raise_softirq, but David asked to add the __, so
Linus added the comment as well:

		/* Runs from irqs or BH's, no need to wake BH */

At that time I checked loopback, which runs under the bh, so it's ok too.

> So, this must be repaired in some way.

Yes.  If ethertap or others run outside bh and irq context, they could
do "if (pending) do_softirq()" by hand (or, at worst, wake up
ksoftirqd by hand) after netif_rx.

> Actually, assumption that local_bh_enable() etc does not happen
> with disabled irq was the biggest hole in Ingo's patch: all the functions

I hoped that was never the case, because when you serialize against a
bh it's because you are using the bh logic instead of irqs, and the
whole point of the bh logic is to leave irqs enabled. But I'm not
surprised some problem actually triggered because of this change.

Andrea


* Re: 2.4.7 softirq incorrectness.
  2001-07-25 22:23     ` Andrea Arcangeli
@ 2001-07-26 17:46       ` kuznet
  2001-07-26 18:03         ` Jeff Garzik
  2001-07-26 18:29         ` Andrea Arcangeli
  2001-07-27  0:47       ` Maksim Krasnyanskiy
  2001-07-27  9:34       ` David S. Miller
  2 siblings, 2 replies; 41+ messages in thread
From: kuznet @ 2001-07-26 17:46 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello!

> At that time I checked loopback that runs under the bh so it's ok too.

Well, it was not alone. I just looked at a couple of places where
netif_rx is used. One is right, another (looping multicasts) is wrong. :-)

So, is plainly raising a softirq and leaving it raised before returning
to normal context not a bug? If so, then no problem.
The worst that can happen is that it will work as it did earlier, right?
And we are allowed to yield bhs at any point we desire. Nice.

Actually, also I was afraid opposite thing: netif_rx was used to allow
to restart processing of skb, when we were in wrong context or were afraid
recursion. And the situation, when it is called with disabled irqs and/or
raised spinlock_irq (it was valid very recently!), is undetectable.
Actually, I hope such places are absent, networking core does not use
irq protection at all, except for netif_rx() yet. :-)


> after netif_rx.

But why not just do local_bh_disable(); netif_rx(); local_bh_enable()?
Is this not right?

Alexey


* Re: 2.4.7 softirq incorrectness.
  2001-07-26 17:46       ` kuznet
@ 2001-07-26 18:03         ` Jeff Garzik
  2001-07-26 18:29         ` Andrea Arcangeli
  1 sibling, 0 replies; 41+ messages in thread
From: Jeff Garzik @ 2001-07-26 18:03 UTC (permalink / raw)
  To: kuznet; +Cc: Andrea Arcangeli, linux-kernel

kuznet@ms2.inr.ac.ru wrote:
> > after netif_rx.
> 
> But why not to do just local_bh_disable(); netif_rx(); local_bh_enable()?
> Is this not right?

That's fine, and was one of the suggested solutions during the earlier
discussion of Andrea's fix.

-- 
Jeff Garzik      | "Mind if I drive?" -Sam
Building 1024    | "Not if you don't mind me clawing at the dash
MandrakeSoft     |  and shrieking like a cheerleader." -Max


* Re: 2.4.7 softirq incorrectness.
  2001-07-26 17:46       ` kuznet
  2001-07-26 18:03         ` Jeff Garzik
@ 2001-07-26 18:29         ` Andrea Arcangeli
  2001-07-27 16:48           ` kuznet
  2001-07-27 18:31           ` Maksim Krasnyanskiy
  1 sibling, 2 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-26 18:29 UTC (permalink / raw)
  To: kuznet; +Cc: linux-kernel

On Thu, Jul 26, 2001 at 09:46:52PM +0400, kuznet@ms2.inr.ac.ru wrote:
> Hello!
> 
> > At that time I checked loopback that runs under the bh so it's ok too.
> 
> Well, it was not alone. I just looked at couple of places, when
> netif_rx was used. One is right, another (looping multicasts) is wrong. :-)
> 
> So, is plain raising softirq and leaving it raised before return
> to normal context not a bug? If so, then no problems.
> The worst, which can happen is that it will work as earlier, right?

It depends what you mean by 'normal context'. If you mean 'userspace
context' then it is a bug, and in 2.4.5 we would have been catching
that case in entry.S.

If there are lots of users of netif_rx outside bh or irq context, I
guess this is the simpler way:

--- 2.4.7/net/core/dev.c	Sat Jul 21 00:04:34 2001
+++ 2.4.7aa1/net/core/dev.c	Thu Jul 26 20:05:26 2001
@@ -1217,10 +1217,10 @@
 enqueue:
 			dev_hold(skb->dev);
 			__skb_queue_tail(&queue->input_pkt_queue,skb);
+			local_irq_restore(flags);
 
 			/* Runs from irqs or BH's, no need to wake BH */
-			__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
-			local_irq_restore(flags);
+			cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
 #ifndef OFFLINE_SAMPLE
 			get_sample_stats(this_cpu);
 #endif
@@ -1529,10 +1529,10 @@
 
 	local_irq_disable();
 	netdev_rx_stat[this_cpu].time_squeeze++;
+	local_irq_enable();
 
 	/* This already runs in BH context, no need to wake up BH's */
-	__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
-	local_irq_enable();
+	cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
 
 	NET_PROFILE_LEAVE(softnet_process);
 	return;

> And we are allowed to yuild bhs at any point, when we desire. Nice.
> 
> Actually, also I was afraid opposite thing: netif_rx was used to allow
> to restart processing of skb, when we were in wrong context or were afraid
> recursion. And the situation, when it is called with disabled irqs and/or
> raised spinlock_irq (it was valid very recently!), is undetectable.

It should be detectable with this debugging code (untested but trivially
fixable if it doesn't compile):

--- 2.4.7aa1/include/asm-i386/softirq.h.~1~	Wed Jul 25 22:38:08 2001
+++ 2.4.7aa1/include/asm-i386/softirq.h	Thu Jul 26 20:22:28 2001
@@ -25,7 +25,11 @@
 #define local_bh_enable()						\
 do {									\
 	unsigned int *ptr = &local_bh_count(smp_processor_id());	\
+	unsigned long flags;						\
 									\
+	__save_flags(flags);						\
+	if (!(flags & (1 << 9)))					\
+		BUG();							\
 	barrier();							\
 	if (!--*ptr)							\
 		__asm__ __volatile__ (					\

> Actually, I hope such places are absent, networking core does not use
> irq protection at all, except for netif_rx() yet. :-)

I hope too :).

> > after netif_rx.
> 
> But why not to do just local_bh_disable(); netif_rx(); local_bh_enable()?
> Is this not right?

That is certainly right. However, it is slower than just doing
if (pending) do_softirq() after netif_rx().

Andrea


* Re: 2.4.7 softirq incorrectness.
  2001-07-25 19:33         ` Andrea Arcangeli
@ 2001-07-26 20:26           ` Rusty Russell
  0 siblings, 0 replies; 41+ messages in thread
From: Rusty Russell @ 2001-07-26 20:26 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: torvalds, linux-kernel

In message <20010725213351.A32148@athlon.random> you write:
> You are wrong, please check again all the code involved.

Apologies: you are absolutely correct, disregard my crap patch.
local_irq_save does not increment local_irq_count, unlike local_bh_disable.
Oops.

> do_softirq can be re-entered but on re-entry it will return immediately

"immediately" leaves a narrow window.  Sure, it'd be really hard to
hit, but it's still there.

Rusty.
--
Premature optmztion is rt of all evl. --DK


* Re: 2.4.7 softirq incorrectness.
  2001-07-25 22:23     ` Andrea Arcangeli
  2001-07-26 17:46       ` kuznet
@ 2001-07-27  0:47       ` Maksim Krasnyanskiy
  2001-07-27 15:01         ` Andrea Arcangeli
  2001-07-27  9:34       ` David S. Miller
  2 siblings, 1 reply; 41+ messages in thread
From: Maksim Krasnyanskiy @ 2001-07-27  0:47 UTC (permalink / raw)
  To: Andrea Arcangeli, kuznet; +Cc: linux-kernel


>If there are lots of users of netif_rx outside bh or irq context I guess
>this is the simpler way is:
I'm not sure about "lots", but there are certainly some. The TUN/TAP driver, for example:
it calls netif_rx from user context (syscall). 

> > > after netif_rx.
> > 
> > But why not to do just local_bh_disable(); netif_rx(); local_bh_enable()?
> > Is this not right?
>
>That is certainly right. However it is slower than just doing if (pending) do_softirq()  after netif_rx().
Should we then create a generic function (something like netif_rx_from_user) that will call do_softirq 
after calling netif_rx?

btw, I think I just found another problem with softirqs. Actually with tasklets. Maybe you guys can help me with that.
The problem is that my tasklet is getting run on 2 cpus simultaneously, and according to the description in linux/interrupt.h 
it shouldn't be.
I have a simple tx_task. All it does is send queued stuff to the device. Whenever I need to send something 
I queue it and do tasklet_schedule(tx_task). Everything works just fine, but on an SMP machine I noticed that sometimes 
data is sent in the wrong order. And the only reason why reordering could happen is if several tx_tasks are running at the 
same time. So I added a simple check in the tasklet function to complain if it's already running. 
Here is the simplified code example:

tasklet_init(&hdev->tx_task, hci_tx_task, (unsigned long) hdev);

static void hci_tx_task(unsigned long arg)
{
         struct hci_dev *hdev = (struct hci_dev *) arg;
         struct sk_buff *skb;

         static __u32 r = 0;

         if (test_and_set_bit(1, &r)) {
                 ERR("already running cpu %d", smp_processor_id());
                 return;
         }

         /* Send next queued packet */
         while ((skb = skb_dequeue(&hdev->raw_q)))
                 hci_send_frame(skb);

         clear_bit(1, &r);
}

tasklet_schedule(&hdev->tx_task);

Now everything is in order and I'm getting a bunch of "already running" messages with different cpu ids.
(hci_tx_task is never called directly).

So, I looked at the tasklet_schedule and tasklet_action code. And it seems to me that a tasklet can be scheduled on several 
cpus at the same time. Here is the scenario:
         tasklet X is running on cpu 1. tasklet_action clears the STATE_SCHED bit and calls the tasklet function.
         tasklet_schedule(taskletX) is called on cpu 2. Since STATE_SCHED is cleared, tasklet X is scheduled.
         tasklet_action is called on cpu 2 (tasklet X's function is still running on cpu 1)
In that case tasklet_action on cpu 2 should hit:
         if (!tasklet_trylock(t))
                 BUG();
because the tasklet is still locked by cpu 1. For some reason it doesn't though (I don't see any "kernel bug" messages). 
But my tx_task is run a second time.

Comments ?

Max

Maksim Krasnyanskiy	
Senior Kernel Engineer
Qualcomm Incorporated

maxk@qualcomm.com
http://bluez.sf.net
http://vtun.sf.net


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.4.7 softirq incorrectness.
  2001-07-25 22:23     ` Andrea Arcangeli
  2001-07-26 17:46       ` kuznet
  2001-07-27  0:47       ` Maksim Krasnyanskiy
@ 2001-07-27  9:34       ` David S. Miller
  2001-07-27 17:01         ` kuznet
  2 siblings, 1 reply; 41+ messages in thread
From: David S. Miller @ 2001-07-27  9:34 UTC (permalink / raw)
  To: kuznet; +Cc: Andrea Arcangeli, linux-kernel


kuznet@ms2.inr.ac.ru writes:
 > So, is plain raising softirq and leaving it raised before return
 > to normal context not a bug? If so, then no problems.

Do you mean "user context" when you say normal?  Or just arbitrary
non-interrupt context?  In fact, to me, no specific execution context
stands out as "normal".  All of them are normal
:-))))

 > > after netif_rx.
 > 
 > But why not to do just local_bh_disable(); netif_rx(); local_bh_enable()?
 > Is this not right?

As Jeff mentioned, this is the cure we agreed to in earlier softirq
postings.

The reason I pushed to have netif_FOO use __cpu_raise_softirq() was
that I felt sick to my stomach when I saw a whole new function call
added to that spot.  It is one of the most important paths in packet
processing, and it runs regardless of the protocol you use (well, except
perhaps AF_UNIX :-)))

Let us just fix the odd places that aren't calling things in the
correct environment.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.4.7 softirq incorrectness.
  2001-07-27  0:47       ` Maksim Krasnyanskiy
@ 2001-07-27 15:01         ` Andrea Arcangeli
  0 siblings, 0 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-27 15:01 UTC (permalink / raw)
  To: Maksim Krasnyanskiy; +Cc: kuznet, linux-kernel

On Thu, Jul 26, 2001 at 05:47:50PM -0700, Maksim Krasnyanskiy wrote:
> Should we then create a generic function (something like netif_rx_from_user) that will call do_softirq 
> after calling netif_rx?

creating such a function is certainly ok (it must first check pending()
before running do_softirq of course). The name shouldn't be "from user"
because we actually call it from normal kernel context.
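The pattern being discussed can be sketched in user space (every name below is a hypothetical stand-in for the kernel primitive of the same flavor, modeling only the control flow, not the real implementation):

```c
/* A sketch of the wrapper Andrea describes: raise the softirq via
 * netif_rx(), then flush with do_softirq() only when something is
 * actually pending, so process-context callers don't leave the
 * softirq stranded until the next interrupt. */

static unsigned long pending_mask;   /* models softirq_pending(cpu) */
static int do_softirq_calls;         /* counts how often we flushed */

#define NET_RX_SOFTIRQ_BIT 3

static void cpu_raise_softirq_sim(int nr)
{
    pending_mask |= 1UL << nr;       /* mark the softirq pending */
}

static void do_softirq_sim(void)
{
    pending_mask = 0;                /* "run" every pending softirq */
    do_softirq_calls++;
}

/* netif_rx() stand-in: queues the packet (elided) and raises NET_RX */
static void netif_rx_sim(void)
{
    cpu_raise_softirq_sim(NET_RX_SOFTIRQ_BIT);
}

/* the proposed wrapper for process-context callers such as TUN/TAP:
 * check pending first so the do_softirq call is only paid when needed */
static void netif_rx_process_ctx_sim(void)
{
    netif_rx_sim();
    if (pending_mask)
        do_softirq_sim();
}
```

This keeps the hot netif_rx() path free of the extra function call while still flushing promptly for the few process-context callers.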

> I queue it and do tasklet_schedule(tx_task). Everything works just fine but on SMP machine I noticed that sometimes 
> data is sent in the wrong order. And the only reason why reordering could happen is if several tx_tasks are running at the 

Do you use tasklet_enable? This patch fixes a bug in tasklet_enable
(bug found by David Mosberger). We are thinking about more CPU-friendly ways
to handle tasklet_disable. Linus just had a suggestion, but I don't
have time right now to think much about the alternate approaches (I'm at
OLS); I will do so next week. If you are using tasklet_enable you may want
to give it a spin.

--- 2.4.7/include/linux/interrupt.h	Sun Jul 22 01:16:45 2001
+++ 2.4.7aa1/include/linux/interrupt.h	Thu Jul 26 21:08:16 2001
@@ -139,24 +139,26 @@
 static inline void tasklet_disable_nosync(struct tasklet_struct *t)
 {
 	atomic_inc(&t->count);
+	smp_mb__after_atomic_inc();
 }
 
 static inline void tasklet_disable(struct tasklet_struct *t)
 {
 	tasklet_disable_nosync(t);
 	tasklet_unlock_wait(t);
+	smp_mb();
 }
 
 static inline void tasklet_enable(struct tasklet_struct *t)
 {
-	if (atomic_dec_and_test(&t->count))
-		tasklet_schedule(t);
+	smp_mb__before_atomic_dec();
+	atomic_dec(&t->count);
 }
 
 static inline void tasklet_hi_enable(struct tasklet_struct *t)
 {
-	if (atomic_dec_and_test(&t->count))
-		tasklet_hi_schedule(t);
+	smp_mb__before_atomic_dec();
+	atomic_dec(&t->count);
 }
 
 extern void tasklet_kill(struct tasklet_struct *t);

Andrea

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.4.7 softirq incorrectness.
  2001-07-26 18:29         ` Andrea Arcangeli
@ 2001-07-27 16:48           ` kuznet
  2001-07-27 18:31           ` Maksim Krasnyanskiy
  1 sibling, 0 replies; 41+ messages in thread
From: kuznet @ 2001-07-27 16:48 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Hello!

> If there are lots of users of netif_rx outside bh or irq context I guess
> this is the simpler way is:

I prefer to grep for netif_rx for now.

> It should be detectable with this debugging code (untested but trivially
> fixable if it doesn't compile):

It is worth adding to the official kernel.

Alexey

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.4.7 softirq incorrectness.
  2001-07-27  9:34       ` David S. Miller
@ 2001-07-27 17:01         ` kuznet
  0 siblings, 0 replies; 41+ messages in thread
From: kuznet @ 2001-07-27 17:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: andrea, linux-kernel

Hello!

> Do you mean "user context" when you say normal? 

Why "user" then? :-)

Well, after some digging in my brain I find that I use home-made terminology:
non-interrupt/non-bh context is called "process context" here. :-)
"Normal context" is process context with BHs and irqs enabled.


> The reason I pushed to have netif_FOO use __cpu_raise_softirq() was
> that I felt sick to my stomache when I saw a new whole function call
> added to that spot. 

I experience even more unpleasant feelings when thinking about what would happen
if netif_rx() without these __ were called from under an irq-protected spinlock.

> Let us just fix the odd places that aren't calling things in the
> correct environment.

I caught you! :-) Each "context" is normal, but the "environment" can still
be wrong. 

Yes, agreed. It is the simplest solution. In any case, all the instances
of netif_rx are to be checked for spinlock safety.

Alexey

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.4.7 softirq incorrectness.
  2001-07-26 18:29         ` Andrea Arcangeli
  2001-07-27 16:48           ` kuznet
@ 2001-07-27 18:31           ` Maksim Krasnyanskiy
  2001-07-27 18:59             ` kuznet
  2001-07-27 19:21             ` Maksim Krasnyanskiy
  1 sibling, 2 replies; 41+ messages in thread
From: Maksim Krasnyanskiy @ 2001-07-27 18:31 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: kuznet, linux-kernel


> > Should we then create a generic function (something like netif_rx_from_user) that will call do_softirq 
> > after calling netif_rx?
>
>creating such a function is certainly ok (it must first check pending()
>before running do_softirq of course). The name shouldn't be "from user"
>because we actually call it from normal kernel context.
Sure.

> > I queue it and do tasklet_schedule(tx_task). Everything works just fine but on SMP machine I noticed that sometimes 
> > data is sent in the wrong order. And the only reason why reordering could happen is if several tx_tasks are running at the 
>
>Do you use tasklet_enable ? 
Yep. To sync rx and tx tasks.

>This patch fixes a bug in tasklet_enable.
>(bug found by David Mosemberg) We are thinking at more CPU friendly ways
>to handle the tasklet_disable, Linus just had a suggestion, but I don't
>have time right now to think much about the alternate approches (i'm at
>ols), I will do next week. If you are usng tasklet_enable you may want
>to give it a spin.
Applied to 2.4.8-pre1. It didn't make any difference. 
Also, it doesn't fix the scenario that I described (reschedule while running). I'm still wondering why I don't hit that trylock/BUG 
in tasklet_action.

Thanks
Max

Maksim Krasnyanskiy	
Senior Kernel Engineer
Qualcomm Incorporated

maxk@qualcomm.com
http://bluez.sf.net
http://vtun.sf.net


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.4.7 softirq incorrectness.
  2001-07-27 18:31           ` Maksim Krasnyanskiy
@ 2001-07-27 18:59             ` kuznet
  2001-07-27 19:21             ` Maksim Krasnyanskiy
  1 sibling, 0 replies; 41+ messages in thread
From: kuznet @ 2001-07-27 18:59 UTC (permalink / raw)
  To: Maksim Krasnyanskiy; +Cc: andrea, linux-kernel

Hello!

> Applied to 2.4.8-pre1. Didn't make any difference.

Yes, that hole is mostly theoretical, at least on Intel. It seems gcc is still
not clever enough to reorder atomic operations.


> Also it doesn't fix the scenario that I described (reschedule while running). I'm still wondering why don't I hit that trylock/BUG 
> in tasklet_action.

How old is the problem? Was it always present?

To be honest, this is too strong a bug to believe in at all. :-)

Alexey

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.4.7 softirq incorrectness.
  2001-07-27 18:31           ` Maksim Krasnyanskiy
  2001-07-27 18:59             ` kuznet
@ 2001-07-27 19:21             ` Maksim Krasnyanskiy
  2001-07-27 19:35               ` kuznet
  2001-07-28  0:52               ` [PATCH] [IMPORTANT] " Maksim Krasnyanskiy
  1 sibling, 2 replies; 41+ messages in thread
From: Maksim Krasnyanskiy @ 2001-07-27 19:21 UTC (permalink / raw)
  To: kuznet; +Cc: andrea, linux-kernel


> > Also it doesn't fix the scenario that I described (reschedule while running). I'm still wondering why don't I hit that trylock/BUG 
> > in tasklet_action.
>
>How old is the problem? Was it always present?
That's a good question. The data ordering check in the Bluetooth tools was introduced pretty recently.
So before 2.4.7 I wasn't paying attention to it and therefore didn't notice any problems with 
tasklets.

>To be honest, this is too strong bug to believe to this at all. :-)
:) Agreed. But I checked all my code; tx_task is called from the tasklet only. And I do see that it's getting run on several 
cpus at the same time.

Also, don't you agree that it's possible (at least in theory) to hit that trylock/BUG in tasklet_action?

Max

Maksim Krasnyanskiy	
Senior Kernel Engineer
Qualcomm Incorporated

maxk@qualcomm.com
http://bluez.sf.net
http://vtun.sf.net


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: 2.4.7 softirq incorrectness.
  2001-07-27 19:21             ` Maksim Krasnyanskiy
@ 2001-07-27 19:35               ` kuznet
  2001-07-28  0:52               ` [PATCH] [IMPORTANT] " Maksim Krasnyanskiy
  1 sibling, 0 replies; 41+ messages in thread
From: kuznet @ 2001-07-27 19:35 UTC (permalink / raw)
  To: Maksim Krasnyanskiy; +Cc: andrea, linux-kernel

Hello!

> Also, don't you agree that it's possible (at least in theory) to hit that trylock/BUG in tasklet_action?

Alas, I am not an expert in this area after Ingo's patch. Let's learn
together. At first sight, it should crash at this BUG() instead
of serializing, indeed. :-)

I am still afraid to boot kernels after 2.4.5. I feel discomfort. :-)

Alexey

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-27 19:21             ` Maksim Krasnyanskiy
  2001-07-27 19:35               ` kuznet
@ 2001-07-28  0:52               ` Maksim Krasnyanskiy
  2001-07-28 17:41                 ` kuznet
                                   ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: Maksim Krasnyanskiy @ 2001-07-28  0:52 UTC (permalink / raw)
  To: kuznet; +Cc: andrea Arcangeli, linux-kernel, torvalds, mingo


> > Also don't you agree with that it's possible (at least in theory) to hit that trylock/BUG in tasklet_action ?
>
>Alas, I am not an expert in this area after Ingo's patch. Let's learn
>together. At first sight, it must crash at this BUG() instead of serialization, indeed. :-)
>
>I am still afraid to boot kernels after 2.4.5. Feel discomfort. :-)
You have very valid reasons for this :). 
I just tried 2.4.5 and guess what: it works _correctly_, i.e. it does not run the same tasklet on several cpus.

I looked at tasklet_schedule and friends, and what I found surprised me quite a bit. How did this get in? ;)
Somebody misunderstood the purpose of the STATE_RUN (tasklet_trylock/unlock) and STATE_SCHED bits.
Here is how it should work (Alex, correct me if I'm wrong here).
The purpose of STATE_SCHED is to protect a tasklet from being _scheduled_ on several cpus at the same time. 
When we run a tasklet we unlink it from the queue and clear STATE_SCHED to allow it to be scheduled again.
The purpose of STATE_RUN is to protect a tasklet from being _run_ on several cpus at the same time.
Before we run a tasklet we set STATE_RUN (tasklet_trylock), and we clear STATE_RUN (tasklet_unlock) when 
we're done with this tasklet.
In other words, we run a tasklet with STATE_RUN set (locked tasklet) but with STATE_SCHED cleared.

In 2.4.7 and 2.4.8pre1 (and probably in 2.4.6),
tasklet_schedule calls tasklet_unlock after it schedules the tasklet, which is _totally_ wrong
(that's why I was not hitting that trylock/BUG in tasklet_action).
tasklet_action assumes that finding a locked tasklet in our sched queue is a BUG, which is again wrong. 
A locked tasklet in the sched queue is perfectly fine; it just means that the tasklet is still running on another cpu and we 
should just reschedule it (and that's what the original code does, btw).
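The two-bit protocol and the 2.4.7 breakage can be modeled in a few lines of user-space C. Everything here is a hypothetical single-threaded stand-in; it encodes only the state transitions, not real SMP concurrency, and the "buggy" flag stands in for 2.4.7's misplaced tasklet_unlock in tasklet_schedule:

```c
/* Model of the SCHED/RUN tasklet bits described above. */

enum { SCHED_BIT = 0, RUN_BIT = 1 };

static unsigned long tstate;
static int queued;          /* times the tasklet entered a queue       */
static int concurrent_runs; /* would-be second entries into t->func    */
static int running;         /* t->func currently "executing"           */

static int test_and_set(unsigned long *w, int bit)
{
    int old = (int)((*w >> bit) & 1);
    *w |= 1UL << bit;
    return old;
}

static void clear_b(unsigned long *w, int bit)
{
    *w &= ~(1UL << bit);
}

/* tasklet_schedule: set SCHED, queue only if it was clear.  With
 * buggy_247 != 0 we also drop the RUN lock after queuing, which is
 * what the 2.4.7 code does. */
static void schedule_sim(int buggy_247)
{
    if (!test_and_set(&tstate, SCHED_BIT)) {
        queued++;
        if (buggy_247)
            clear_b(&tstate, RUN_BIT);  /* the bogus tasklet_unlock */
    }
}

/* the entry of tasklet_action for one dequeued tasklet: take the RUN
 * lock, clear SCHED, then "run" the function */
static void action_begin_sim(void)
{
    if (test_and_set(&tstate, RUN_BIT))
        return;                 /* still running elsewhere: requeue   */
    clear_b(&tstate, SCHED_BIT);
    if (running)
        concurrent_runs++;      /* two cpus inside t->func at once    */
    running = 1;
}
```

With the correct protocol a second cpu's action fails the trylock and requeues; with the 2.4.7-style unlock the RUN bit is cleared while the function is still running, so the second cpu walks straight into the function, which is exactly the reordering Maksim observed.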

Also, I think uninlining tasklet_schedule was not a good idea. It's kinda critical, especially for the already-scheduled case:
we make a function call just to find out that the tasklet is already scheduled.  And there is no point in disabling irqs if
we're not going to schedule the tasklet. And there is no point in locking the tasklet on UP machines.

So, the patch (Alex, you can safely boot the latest kernels now :)). It restores the original (correct) functionality and makes the tasklet 
code a little more readable. It also includes the patch to tasklet_enable/disable suggested by Andrea. 
Tested on my dual PPro SMP box, PII UP and Sparc64 UP.

Linus, please consider applying it, because the current code is _broken_.
(http://bluez.sf.net/tasklet_patch.gz in case my mailer messed it up)

Thanks
Max

--- linux/kernel/softirq.c.old  Fri Jul 27 17:19:52 2001
+++ linux/kernel/softirq.c      Fri Jul 27 15:57:50 2001
@@ -140,43 +140,7 @@
  /* Tasklets */
  
  struct tasklet_head tasklet_vec[NR_CPUS] __cacheline_aligned;
-
-void tasklet_schedule(struct tasklet_struct *t)
-{
-       unsigned long flags;
-       int cpu;
-
-       cpu = smp_processor_id();
-       local_irq_save(flags);
-       /*
-        * If nobody is running it then add it to this CPU's
-        * tasklet queue.
-        */
-       if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
-               t->next = tasklet_vec[cpu].list;
-               tasklet_vec[cpu].list = t;
-               cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
-               tasklet_unlock(t);
-       }
-       local_irq_restore(flags);
-}
-
-void tasklet_hi_schedule(struct tasklet_struct *t)
-{
-       unsigned long flags;
-       int cpu;
-
-       cpu = smp_processor_id();
-       local_irq_save(flags);
-
-       if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
-               t->next = tasklet_hi_vec[cpu].list;
-               tasklet_hi_vec[cpu].list = t;
-               cpu_raise_softirq(cpu, HI_SOFTIRQ);
-               tasklet_unlock(t);
-       }
-       local_irq_restore(flags);
-}
+struct tasklet_head tasklet_hi_vec[NR_CPUS] __cacheline_aligned;
  
  static void tasklet_action(struct softirq_action *a)
  {
@@ -193,16 +157,16 @@
  
                 list = list->next;
  
-               if (!tasklet_trylock(t))
-                       BUG();
-               if (!atomic_read(&t->count)) {
-                       if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
-                               BUG();
-                       t->func(t->data);
+               if (tasklet_trylock(t)) {
+                       if (!atomic_read(&t->count)) {
+                               if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
+                                       BUG();
+                               t->func(t->data);
+                               tasklet_unlock(t);
+                               continue;
+                       }
                         tasklet_unlock(t);
-                       continue;
                 }
-               tasklet_unlock(t);
  
                 local_irq_disable();
                 t->next = tasklet_vec[cpu].list;
@@ -212,10 +176,6 @@
         }
  }
  
-
-
-struct tasklet_head tasklet_hi_vec[NR_CPUS] __cacheline_aligned;
-
  static void tasklet_hi_action(struct softirq_action *a)
  {
         int cpu = smp_processor_id();
@@ -231,16 +191,16 @@
  
                 list = list->next;
  
-               if (!tasklet_trylock(t))
-                       BUG();
-               if (!atomic_read(&t->count)) {
-                       if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
-                               BUG();
-                       t->func(t->data);
+               if (tasklet_trylock(t)) {
+                       if (!atomic_read(&t->count)) {
+                               if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
+                                       BUG();
+                               t->func(t->data);
+                               tasklet_unlock(t);
+                               continue;
+                       }
                         tasklet_unlock(t);
-                       continue;
                 }
-               tasklet_unlock(t);
  
                 local_irq_disable();
                 t->next = tasklet_hi_vec[cpu].list;
@@ -249,7 +209,6 @@
                 local_irq_enable();
         }
  }
-
  
  void tasklet_init(struct tasklet_struct *t,
                   void (*func)(unsigned long), unsigned long data)
--- linux/include/linux/interrupt.h.old Fri Jul 27 17:20:11 2001
+++ linux/include/linux/interrupt.h     Fri Jul 27 17:26:37 2001
@@ -114,49 +114,93 @@
  #define DECLARE_TASKLET_DISABLED(name, func, data) \
  struct tasklet_struct name = { NULL, 0, ATOMIC_INIT(1), func, data }
  
+#define        TASKLET_STATE_SCHED 0x01        /* Tasklet is scheduled for execution */
+#define        TASKLET_STATE_RUN   0x02        /* Tasklet is running */
  
-enum
+struct tasklet_head {
+       struct tasklet_struct *list;
+} __attribute__ ((__aligned__(SMP_CACHE_BYTES)));
+
+extern  struct tasklet_head tasklet_vec[NR_CPUS];
+extern  struct tasklet_head tasklet_hi_vec[NR_CPUS];
+
+#ifdef CONFIG_SMP
+static inline int tasklet_trylock(struct tasklet_struct *t)
  {
-       TASKLET_STATE_SCHED,    /* Tasklet is scheduled for execution */
-       TASKLET_STATE_RUN       /* Tasklet is running */
-};
+       return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
+}
  
-struct tasklet_head
+static inline void tasklet_unlock(struct tasklet_struct *t)
  {
-       struct tasklet_struct *list;
-} __attribute__ ((__aligned__(SMP_CACHE_BYTES)));
+       smp_mb__before_clear_bit(); 
+       clear_bit(TASKLET_STATE_RUN, &(t)->state);
+}
  
-extern struct tasklet_head tasklet_vec[NR_CPUS];
-extern struct tasklet_head tasklet_hi_vec[NR_CPUS];
+static inline void tasklet_unlock_wait(struct tasklet_struct *t)
+{
+       while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
+}
+#else
+#define tasklet_trylock(t) 1
+#define tasklet_unlock_wait(t)
+#define tasklet_unlock(t)
+#endif
+
+static inline void tasklet_schedule(struct tasklet_struct *t)
+{
+       unsigned long flags;
+       int cpu;
+
+       cpu = smp_processor_id();
+
+       if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
+               local_irq_save(flags);
+               t->next = tasklet_vec[cpu].list;
+               tasklet_vec[cpu].list = t;
+               cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
+               local_irq_restore(flags);
+       }
+}
  
-#define tasklet_trylock(t) (!test_and_set_bit(TASKLET_STATE_RUN, &(t)->state))
-#define tasklet_unlock(t) do { smp_mb__before_clear_bit(); clear_bit(TASKLET_STATE_RUN, &(t)->state); } while(0)
-#define tasklet_unlock_wait(t) while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
+static inline void tasklet_hi_schedule(struct tasklet_struct *t)
+{
+       unsigned long flags;
+       int cpu;
+
+       cpu = smp_processor_id();
  
-extern void tasklet_schedule(struct tasklet_struct *t);
-extern void tasklet_hi_schedule(struct tasklet_struct *t);
+       if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
+               local_irq_save(flags);
+               t->next = tasklet_hi_vec[cpu].list;
+               tasklet_hi_vec[cpu].list = t;
+               cpu_raise_softirq(cpu, HI_SOFTIRQ);
+               local_irq_restore(flags);
+       }
+}
  
  static inline void tasklet_disable_nosync(struct tasklet_struct *t)
  {
         atomic_inc(&t->count);
+       smp_mb__after_atomic_inc();
  }
  
  static inline void tasklet_disable(struct tasklet_struct *t)
  {
         tasklet_disable_nosync(t);
         tasklet_unlock_wait(t);
+       smp_mb();
  }
  
  static inline void tasklet_enable(struct tasklet_struct *t)
  {
-       if (atomic_dec_and_test(&t->count))
-               tasklet_schedule(t);
+       smp_mb__before_atomic_dec();
+       atomic_dec(&t->count);
  }
  
  static inline void tasklet_hi_enable(struct tasklet_struct *t)
  {
-       if (atomic_dec_and_test(&t->count))
-               tasklet_hi_schedule(t);
+       smp_mb__before_atomic_dec();
+       atomic_dec(&t->count);
  }
  
  extern void tasklet_kill(struct tasklet_struct *t);

Maksim Krasnyanskiy	
Senior Kernel Engineer
Qualcomm Incorporated

maxk@qualcomm.com
http://bluez.sf.net
http://vtun.sf.net


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-28  0:52               ` [PATCH] [IMPORTANT] " Maksim Krasnyanskiy
@ 2001-07-28 17:41                 ` kuznet
  2001-07-28 18:02                   ` Andrea Arcangeli
  2001-07-28 17:54                 ` Andrea Arcangeli
  2001-07-30 18:32                 ` Maksim Krasnyanskiy
  2 siblings, 1 reply; 41+ messages in thread
From: kuznet @ 2001-07-28 17:41 UTC (permalink / raw)
  To: Maksim Krasnyanskiy; +Cc: andrea, linux-kernel, torvalds, mingo, Dave Miller

Hello!

> Here is how it should work (Alex correct me if I'm wrong here).

You _were_ right. Now... I still do not know; I can only comment
and state that the current code really looks funny. :-)


> Purpose of STATE_SCHED is to protect tasklet from being _scheduled_ on the several cpus at the same time. 

Rather, its purpose was not protective. When it was set, it meant that
the function _will_ be run. tasklet_action was allowed to reset it
at any time before the function is called. But not after, of course.


> When we run a tasklet we unlink it from the queue and clear STATE_SCHED to allow it to be scheduled again.

This can be done later, but before the function is called.


> tasklet_schedule calls tasklet_unlock after it schedules tasklet,

Hmm... but this opens one more bug: are schedules not lost when
they are made while the tasklet is running?


> we're not gonna schedule tasklet. And there is no point in locking tasklet on the UP machines.

It can be converted to a spinlock. I felt discomfort creating a spinlock
which never spins. :-)


And one more question:

> -               cpu_raise_softirq(cpu, TASKLET_SOFTIRQ); <<<<
> -               tasklet_unlock(t);
> -       }
> -       local_irq_restore(flags);			   <<<<

But Andrea has just taught me that it is invalid to call cpu_raise_softirq
in such a context. No difference from netif_rx() here; all the issues are
the same.


>  (Alex you can safely boot latest kernels now :)).

Thank you. I am not afraid of booting non-working kernels, even ones like this. :-)
I am afraid when I do not feel the ground. After your analysis even the direction to
the ground is lost. :-)

Alexey

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-28  0:52               ` [PATCH] [IMPORTANT] " Maksim Krasnyanskiy
  2001-07-28 17:41                 ` kuznet
@ 2001-07-28 17:54                 ` Andrea Arcangeli
  2001-07-28 19:17                   ` Andrea Arcangeli
  2001-07-30 18:32                 ` Maksim Krasnyanskiy
  2 siblings, 1 reply; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-28 17:54 UTC (permalink / raw)
  To: Maksim Krasnyanskiy; +Cc: kuznet, linux-kernel, torvalds, mingo

On Fri, Jul 27, 2001 at 05:52:01PM -0700, Maksim Krasnyanskiy wrote:
> tasklet_schedule calls tasklet_unlock after it schedules tasklet, which is _totaly_ wrong.

yes, it was a leftover from 2.4.6. Your diff -u 2.4.7 2.4.5 is correct
and I also suggest it for mainline inclusion, thanks.

Andrea

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-28 17:41                 ` kuznet
@ 2001-07-28 18:02                   ` Andrea Arcangeli
  2001-07-28 19:02                     ` kuznet
  0 siblings, 1 reply; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-28 18:02 UTC (permalink / raw)
  To: kuznet; +Cc: Maksim Krasnyanskiy, linux-kernel, torvalds, mingo, Dave Miller

On Sat, Jul 28, 2001 at 09:41:41PM +0400, kuznet@ms2.inr.ac.ru wrote:
> > -               cpu_raise_softirq(cpu, TASKLET_SOFTIRQ); <<<<
> > -               tasklet_unlock(t);
> > -       }
> > -       local_irq_restore(flags);			   <<<<
> 
> But Andrea has just tought me that this is invalid to call cpu_raise_softirq
> in such context. No differences of netif_rx() here, all the issues are
> the same.

cpu_raise_softirq is valid in any context. Calling cpu_raise_softirq
there was correct (__cpu_raise_softirq would have been too weak).

For performance reasons we should instead use __cpu_raise_softirq (instead of
cpu_raise_softirq) in tasklet_action, which runs within do_softirq, but
that is a minor optimization.
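The distinction can be sketched with hypothetical user-space stand-ins (only the control flow described in this thread is modeled; the real 2.4 code has more going on):

```c
/* __cpu_raise_softirq only marks the softirq pending; cpu_raise_softirq
 * additionally wakes ksoftirqd when we are not in interrupt context, so
 * a softirq raised from process context (possibly with irqs disabled)
 * is not left stranded until the next interrupt arrives. */

static unsigned long sim_pending;
static int ksoftirqd_wakeups;
static int in_interrupt_flag;   /* models in_interrupt() */

static void __cpu_raise_softirq_sim(int nr)
{
    sim_pending |= 1UL << nr;   /* just set the pending bit */
}

static void cpu_raise_softirq_sim(int nr)
{
    __cpu_raise_softirq_sim(nr);
    if (!in_interrupt_flag)
        ksoftirqd_wakeups++;    /* models wakeup_softirqd(cpu) */
}
```

Inside do_softirq (as in tasklet_action) the bare __ version suffices, because do_softirq itself will notice the bit before returning; anywhere else the full version is the safe choice.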

> I am afraid, when do not feel ground. After your analysis even direction to
> ground is lost. :-)

:) As far as I can see, returning to the correct 2.4.5 tasklet logic will
fix the tasklet problem (only tasklets had a problem in 2.4.7).

Andrea

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-28 18:02                   ` Andrea Arcangeli
@ 2001-07-28 19:02                     ` kuznet
  2001-07-28 19:32                       ` Andrea Arcangeli
  0 siblings, 1 reply; 41+ messages in thread
From: kuznet @ 2001-07-28 19:02 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: maxk, linux-kernel, torvalds, mingo, davem

Hello!

> cpu_raise_softirq is valid in any context. calling cpu_raise_softirq
> there was correct (__cpu_raise_softirq would been too weak).

I see now, the picture clears.


> fix the tasklet problem (only tasklets had a problem in 2.4.7).

I said the problem was not in the code, but in understanding it.
I am still not 100% sure what is legal and what is not. :-)

E.g., Andrea, teach me how to do the following thing (not for the released
kernel, just for me): I want to schedule a softirq, but I do not want
this softirq to eat all the cpu. It looks natural to use ksoftirqd for this.

Alexey

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-28 17:54                 ` Andrea Arcangeli
@ 2001-07-28 19:17                   ` Andrea Arcangeli
  0 siblings, 0 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-28 19:17 UTC (permalink / raw)
  To: Maksim Krasnyanskiy; +Cc: kuznet, linux-kernel, torvalds, mingo

On Sat, Jul 28, 2001 at 07:54:04PM +0200, Andrea Arcangeli wrote:
> On Fri, Jul 27, 2001 at 05:52:01PM -0700, Maksim Krasnyanskiy wrote:
> > tasklet_schedule calls tasklet_unlock after it schedules tasklet, which is _totaly_ wrong.
> 
> yes, it was a leftover from 2.4.6. Your diff -u 2.4.7 2.4.5 is correct
> and I also suggest it for for mainline inclusion, thanks.

I did some minor further improvements to it while merging it into my tree
and then rediffed it. This one also adds the debug check for
local_bh_enable in a cli'ed region. (Note for other archs:
__cpu_raise_softirq is required to be atomic with respect to irqs, but it
doesn't need to be atomic with respect to smp; on x86 it is implemented
as a non-locked bts, for example.)

diff -urN 2.4.7/include/asm-i386/softirq.h softirq/include/asm-i386/softirq.h
--- 2.4.7/include/asm-i386/softirq.h	Wed Jul 25 22:38:08 2001
+++ softirq/include/asm-i386/softirq.h	Sat Jul 28 21:11:35 2001
@@ -25,7 +25,11 @@
 #define local_bh_enable()						\
 do {									\
 	unsigned int *ptr = &local_bh_count(smp_processor_id());	\
+	unsigned long flags;						\
 									\
+	__save_flags(flags);						\
+	if (!(flags & (1 << 9)))					\
+		BUG();							\
 	barrier();							\
 	if (!--*ptr)							\
 		__asm__ __volatile__ (					\
diff -urN 2.4.7/include/linux/interrupt.h softirq/include/linux/interrupt.h
--- 2.4.7/include/linux/interrupt.h	Sun Jul 22 01:16:45 2001
+++ softirq/include/linux/interrupt.h	Sat Jul 28 21:09:15 2001
@@ -118,7 +118,7 @@
 enum
 {
 	TASKLET_STATE_SCHED,	/* Tasklet is scheduled for execution */
-	TASKLET_STATE_RUN	/* Tasklet is running */
+	TASKLET_STATE_RUN	/* Tasklet is running (SMP only) */
 };
 
 struct tasklet_head
@@ -129,34 +129,80 @@
 extern struct tasklet_head tasklet_vec[NR_CPUS];
 extern struct tasklet_head tasklet_hi_vec[NR_CPUS];
 
-#define tasklet_trylock(t) (!test_and_set_bit(TASKLET_STATE_RUN, &(t)->state))
-#define tasklet_unlock(t) do { smp_mb__before_clear_bit(); clear_bit(TASKLET_STATE_RUN, &(t)->state); } while(0)
-#define tasklet_unlock_wait(t) while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
+#ifdef CONFIG_SMP
+static inline int tasklet_trylock(struct tasklet_struct *t)
+{
+	return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
+}
+
+static inline void tasklet_unlock(struct tasklet_struct *t)
+{
+	smp_mb__before_clear_bit(); 
+	clear_bit(TASKLET_STATE_RUN, &(t)->state);
+}
+
+static inline void tasklet_unlock_wait(struct tasklet_struct *t)
+{
+	while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
+}
+#else
+#define tasklet_trylock(t) 1
+#define tasklet_unlock_wait(t) do { } while (0)
+#define tasklet_unlock(t) do { } while (0)
+#endif
+
+static inline void tasklet_schedule(struct tasklet_struct *t)
+{
+	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
+		int cpu = smp_processor_id();
+		unsigned long flags;
+
+		local_irq_save(flags);
+		t->next = tasklet_vec[cpu].list;
+		tasklet_vec[cpu].list = t;
+		cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
+		local_irq_restore(flags);
+	}
+}
+
+static inline void tasklet_hi_schedule(struct tasklet_struct *t)
+{
+	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
+		int cpu = smp_processor_id();
+		unsigned long flags;
+
+		local_irq_save(flags);
+		t->next = tasklet_hi_vec[cpu].list;
+		tasklet_hi_vec[cpu].list = t;
+		cpu_raise_softirq(cpu, HI_SOFTIRQ);
+		local_irq_restore(flags);
+	}
+}
 
-extern void tasklet_schedule(struct tasklet_struct *t);
-extern void tasklet_hi_schedule(struct tasklet_struct *t);
 
 static inline void tasklet_disable_nosync(struct tasklet_struct *t)
 {
 	atomic_inc(&t->count);
+	smp_mb__after_atomic_inc();
 }
 
 static inline void tasklet_disable(struct tasklet_struct *t)
 {
 	tasklet_disable_nosync(t);
 	tasklet_unlock_wait(t);
+	smp_mb();
 }
 
 static inline void tasklet_enable(struct tasklet_struct *t)
 {
-	if (atomic_dec_and_test(&t->count))
-		tasklet_schedule(t);
+	smp_mb__before_atomic_dec();
+	atomic_dec(&t->count);
 }
 
 static inline void tasklet_hi_enable(struct tasklet_struct *t)
 {
-	if (atomic_dec_and_test(&t->count))
-		tasklet_hi_schedule(t);
+	smp_mb__before_atomic_dec();
+	atomic_dec(&t->count);
 }
 
 extern void tasklet_kill(struct tasklet_struct *t);
diff -urN 2.4.7/kernel/softirq.c softirq/kernel/softirq.c
--- 2.4.7/kernel/softirq.c	Sat Jul 21 00:04:34 2001
+++ softirq/kernel/softirq.c	Sat Jul 28 21:09:46 2001
@@ -143,43 +143,7 @@
 /* Tasklets */
 
 struct tasklet_head tasklet_vec[NR_CPUS] __cacheline_aligned;
-
-void tasklet_schedule(struct tasklet_struct *t)
-{
-	unsigned long flags;
-	int cpu;
-
-	cpu = smp_processor_id();
-	local_irq_save(flags);
-	/*
-	 * If nobody is running it then add it to this CPU's
-	 * tasklet queue.
-	 */
-	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
-		t->next = tasklet_vec[cpu].list;
-		tasklet_vec[cpu].list = t;
-		cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
-		tasklet_unlock(t);
-	}
-	local_irq_restore(flags);
-}
-
-void tasklet_hi_schedule(struct tasklet_struct *t)
-{
-	unsigned long flags;
-	int cpu;
-
-	cpu = smp_processor_id();
-	local_irq_save(flags);
-
-	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
-		t->next = tasklet_hi_vec[cpu].list;
-		tasklet_hi_vec[cpu].list = t;
-		cpu_raise_softirq(cpu, HI_SOFTIRQ);
-		tasklet_unlock(t);
-	}
-	local_irq_restore(flags);
-}
+struct tasklet_head tasklet_hi_vec[NR_CPUS] __cacheline_aligned;
 
 static void tasklet_action(struct softirq_action *a)
 {
@@ -196,29 +160,25 @@
 
 		list = list->next;
 
-		if (!tasklet_trylock(t))
-			BUG();
-		if (!atomic_read(&t->count)) {
-			if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
-				BUG();
-			t->func(t->data);
+		if (tasklet_trylock(t)) {
+			if (!atomic_read(&t->count)) {
+				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
+					BUG();
+				t->func(t->data);
+				tasklet_unlock(t);
+				continue;
+			}
 			tasklet_unlock(t);
-			continue;
 		}
-		tasklet_unlock(t);
 
 		local_irq_disable();
 		t->next = tasklet_vec[cpu].list;
 		tasklet_vec[cpu].list = t;
-		cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
 		local_irq_enable();
+		__cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
 	}
 }
 
-
-
-struct tasklet_head tasklet_hi_vec[NR_CPUS] __cacheline_aligned;
-
 static void tasklet_hi_action(struct softirq_action *a)
 {
 	int cpu = smp_processor_id();
@@ -234,22 +194,22 @@
 
 		list = list->next;
 
-		if (!tasklet_trylock(t))
-			BUG();
-		if (!atomic_read(&t->count)) {
-			if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
-				BUG();
-			t->func(t->data);
+		if (tasklet_trylock(t)) {
+			if (!atomic_read(&t->count)) {
+				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
+					BUG();
+				t->func(t->data);
+				tasklet_unlock(t);
+				continue;
+			}
 			tasklet_unlock(t);
-			continue;
 		}
-		tasklet_unlock(t);
 
 		local_irq_disable();
 		t->next = tasklet_hi_vec[cpu].list;
 		tasklet_hi_vec[cpu].list = t;
-		cpu_raise_softirq(cpu, HI_SOFTIRQ);
 		local_irq_enable();
+		__cpu_raise_softirq(cpu, HI_SOFTIRQ);
 	}
 }
 
diff -urN 2.4.7/net/core/dev.c softirq/net/core/dev.c
--- 2.4.7/net/core/dev.c	Sat Jul 21 00:04:34 2001
+++ softirq/net/core/dev.c	Sat Jul 28 21:10:20 2001
@@ -1217,10 +1217,10 @@
 enqueue:
 			dev_hold(skb->dev);
 			__skb_queue_tail(&queue->input_pkt_queue,skb);
+			local_irq_restore(flags);
 
 			/* Runs from irqs or BH's, no need to wake BH */
 			__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
-			local_irq_restore(flags);
 #ifndef OFFLINE_SAMPLE
 			get_sample_stats(this_cpu);
 #endif
@@ -1529,10 +1529,10 @@
 
 	local_irq_disable();
 	netdev_rx_stat[this_cpu].time_squeeze++;
+	local_irq_enable();
 
 	/* This already runs in BH context, no need to wake up BH's */
 	__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
-	local_irq_enable();
 
 	NET_PROFILE_LEAVE(softnet_process);
 	return;

Andrea


* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-28 19:02                     ` kuznet
@ 2001-07-28 19:32                       ` Andrea Arcangeli
  2001-07-28 23:28                         ` Alexey Kuznetsov
  0 siblings, 1 reply; 41+ messages in thread
From: Andrea Arcangeli @ 2001-07-28 19:32 UTC (permalink / raw)
  To: kuznet; +Cc: maxk, linux-kernel, torvalds, mingo, davem

On Sat, Jul 28, 2001 at 11:02:07PM +0400, kuznet@ms2.inr.ac.ru wrote:
> E.g., Andrea, teach me how to do the following (not for the released
> kernel, just for me): I want to schedule a softirq, but I do not want
> that softirq to eat all the cpu. It looks natural to use ksoftirqd for this.

yes, ksoftirqd should just prevent you from eating all the cpu even if
you keep posting the softirq all the time. ksoftirqd is reniced to +20,
so it should be pretty nice to the other tasks in the system.

If you want to delay the softirq and run it at a low frequency then you
should use a timer or another functionality (the softirq is required to
run ASAP always).

I hope I didn't misunderstand the question; if I did, please correct
me.

Andrea

PS. I will be offline shortly so I may not be able to answer further
emails until Monday.


* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-28 19:32                       ` Andrea Arcangeli
@ 2001-07-28 23:28                         ` Alexey Kuznetsov
  2001-07-29 17:07                           ` Linus Torvalds
  0 siblings, 1 reply; 41+ messages in thread
From: Alexey Kuznetsov @ 2001-07-28 23:28 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: maxk, linux-kernel, torvalds, mingo, davem

Hello!

> yes, ksoftirqd should just prevent you from eating all the cpu 

I see now, you completely killed the dead loops from Ingo's patch!
It is your original approach and I am happy with it.

Before falling into euphoria, one last question:
Is Ingo really happy with this? He complained about latency,
and it is no better than in 2.4.5 (with the cpu_idle fix) :-)

> I hope I didn't misunderstand the question; if I did, please correct
> me.

You really misunderstood it a bit, but solely because the question
was meaningless in the context of 2.4.7. My apologies. :-)

Alexey


* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-28 23:28                         ` Alexey Kuznetsov
@ 2001-07-29 17:07                           ` Linus Torvalds
  2001-07-29 17:52                             ` kuznet
  0 siblings, 1 reply; 41+ messages in thread
From: Linus Torvalds @ 2001-07-29 17:07 UTC (permalink / raw)
  To: Alexey Kuznetsov; +Cc: Andrea Arcangeli, maxk, linux-kernel, mingo, davem


On Sun, 29 Jul 2001, Alexey Kuznetsov wrote:
>
> Before falling into euphoria, one last question:
> Is Ingo really happy with this? He complained about latency,
> and it is no better than in 2.4.5 (with the cpu_idle fix) :-)

I think the latency issue was really the fact that we weren't always
running softirqs in a timely fashion after they had been disabled by a
"disable_bh()". That is fixed with the new softirq stuff, regardless of
the other issues.

But it would be good to have some SPECweb runs etc. done to verify.

Ingo?

		Linus



* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-29 17:07                           ` Linus Torvalds
@ 2001-07-29 17:52                             ` kuznet
  2001-07-30 18:50                               ` Ingo Molnar
  0 siblings, 1 reply; 41+ messages in thread
From: kuznet @ 2001-07-29 17:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: andrea, maxk, linux-kernel, mingo, davem

Hello!

> I think the latency issue was really the fact that we weren't always
> running softirqs in a timely fashion after they had been disabled by a
> "disable_bh()". That is fixed with the new softirq stuff, regardless of
> the other issues.

I hope so too. Actually, this observation was the main argument for
ksoftirqd and against instant restart. Ingo objected, but finally the
issue was buried... or I lost the discussion while not reading the
mailing lists for a while.

Alexey


* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-28  0:52               ` [PATCH] [IMPORTANT] " Maksim Krasnyanskiy
  2001-07-28 17:41                 ` kuznet
  2001-07-28 17:54                 ` Andrea Arcangeli
@ 2001-07-30 18:32                 ` Maksim Krasnyanskiy
  2 siblings, 0 replies; 41+ messages in thread
From: Maksim Krasnyanskiy @ 2001-07-30 18:32 UTC (permalink / raw)
  To: kuznet; +Cc: andrea, linux-kernel, torvalds, mingo, Dave Miller


> > tasklet_schedule calls tasklet_unlock after it schedules tasklet,
>
>Hmm... but this opens one more bug: are schedules not lost, when
>they are made while tasklet is running ?
Yes, it does lose them: a schedule made while the tasklet is running is dropped.

btw, here is an idea: maybe we should reschedule the tasklet on the same
cpu it's running on. It should improve cache usage and so on.
I'm thinking about something like this.

static inline void tasklet_schedule(struct tasklet_struct *t)
{
         if (test_bit(TASKLET_STATE_RUN, &t->state)) {
                 set_bit(TASKLET_NEED_RESCHED, &t->state);
         } else if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
                 int cpu = smp_processor_id();
                 unsigned long flags;

                 local_irq_save(flags);
                 t->next = tasklet_vec[cpu].list;
                 tasklet_vec[cpu].list = t;
                 cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
                 local_irq_restore(flags);
         }
}

static void tasklet_action(struct softirq_action *a)
{
         int cpu = smp_processor_id();
         struct tasklet_struct *list;

         local_irq_disable();
         list = tasklet_vec[cpu].list;
         tasklet_vec[cpu].list = NULL;
         local_irq_enable();

         while (list) {
                 struct tasklet_struct *t = list;

                 list = list->next;

                 if (tasklet_trylock(t)) {
                         if (!atomic_read(&t->count)) {
                                 if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
                                         BUG();
                                 t->func(t->data);
                                 tasklet_unlock(t);
                                 if (test_and_clear_bit(TASKLET_NEED_RESCHED, &t->state))
                                    goto resched;
                                 continue;
                         }
                         tasklet_unlock(t);
                 }

resched:
                 local_irq_disable();
                 t->next = tasklet_vec[cpu].list;
                 tasklet_vec[cpu].list = t;
                 cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
                 local_irq_enable();
         }
}

There is a small window there, but this is just an RFC. So if the tasklet
is already running we set the NEED_RESCHED bit and tasklet_action
reschedules the tasklet on the same cpu (currently we may reschedule it
on another cpu).
Comments?

Max

Maksim Krasnyanskiy	
Senior Kernel Engineer
Qualcomm Incorporated

maxk@qualcomm.com
http://bluez.sf.net
http://vtun.sf.net



* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-29 17:52                             ` kuznet
@ 2001-07-30 18:50                               ` Ingo Molnar
  2001-07-30 22:47                                 ` Nigel Gamble
  2001-07-31 18:08                                 ` kuznet
  0 siblings, 2 replies; 41+ messages in thread
From: Ingo Molnar @ 2001-07-30 18:50 UTC (permalink / raw)
  To: kuznet; +Cc: Linus Torvalds, andrea, maxk, linux-kernel, davem


> > I think the latency issue was really the fact that we weren't always
> > running softirqs in a timely fashion after they had been disabled by a
> > "disable_bh()". That is fixed with the new softirq stuff, regardless of
> > the other issues.

nope. i observed latency issues with restart + ksoftirqd as well. [when i
first saw these latency problems i basically had ksoftirqd implemented
independently from your patch, and threw the idea away because it was
insufficient from the latency point of view.] Those latencies are harder
to observe because they are not 1/HZ anymore but several hundred millisecs
at most. Plus, like i said previously, pushing IRQ context work into a
scheduler-level context 'feels' incorrect to me - it only makes the
latencies less visible. I'll do some measurements.

	Ingo



* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-30 18:50                               ` Ingo Molnar
@ 2001-07-30 22:47                                 ` Nigel Gamble
  2001-07-30 22:56                                   ` Christoph Hellwig
  2001-07-31 18:08                                 ` kuznet
  1 sibling, 1 reply; 41+ messages in thread
From: Nigel Gamble @ 2001-07-30 22:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, linux-kernel

On Mon, 30 Jul 2001, Ingo Molnar wrote:
> > > I think the latency issue was really the fact that we weren't always
> > > running softirqs in a timely fashion after they had been disabled by a
> > > "disable_bh()". That is fixed with the new softirq stuff, regardless of
> > > the other issues.
> 
> nope. i observed latency issues with restart + ksoftirqd as well. [when i
> first saw these latency problems i basically had ksoftirqd implemented
> independently from your patch, and threw the idea away because it was
> insufficient from the latency point of view.] Those latencies are harder
> to observe because they are not 1/HZ anymore but several hundred millisecs
> at most. Plus, like i said previously, pushing IRQ context work into a
> scheduler-level context 'feels' incorrect to me - it only makes the
> latencies less visible. I'll do some measurements.

If you schedule ksoftirqd as SCHED_FIFO or SCHED_RR, and run with the
preemptible kernel patch, you can get maximum latencies down to a few
hundred microseconds most of the time, with typical latencies of a few
tens of microseconds.   Perhaps you could also measure that
configuration?  (Latest preemptible kernel patch is available at
kpreempt.sourceforge.net).

I'd like to see all the various execution contexts of Linux (irqs,
softirqs, tasklets, kernel daemons) become (real-time where
necessary) kernel threads like ksoftirqd, scheduled with the appropriate
scheduling class and priority.  The resulting kernel code would be much
simpler and more maintainable, and it would make it possible to change
the scheduling priority of the threads to optimize for different
application loads.

Nigel Gamble                                    nigel@nrg.org
Mountain View, CA, USA.                         http://www.nrg.org/

MontaVista Software                             nigel@mvista.com



* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-30 22:47                                 ` Nigel Gamble
@ 2001-07-30 22:56                                   ` Christoph Hellwig
  0 siblings, 0 replies; 41+ messages in thread
From: Christoph Hellwig @ 2001-07-30 22:56 UTC (permalink / raw)
  To: Nigel Gamble; +Cc: Linus Torvalds, linux-kernel, mingo

In article <Pine.LNX.4.05.10107301529300.11108-100000@cosmic.nrg.org> you wrote:
> I'd like to see all the various execution contexts of Linux (irqs,
> softirqs, tasklets, kernel daemons) all become (real-time where
> necessary) kernel threads like ksoftirqd, scheduled with the appropriate
> scheduling class and priority.  The resulting kernel code would be much
> simpler and more maintainable; and it would make it possible to change
> the scheduling priority of the threads to optimize for different
> application loads.

That's how Solaris does it, btw.  From time to time Sun seems to get
something right ;)

	Christoph

-- 
Whip me.  Beat me.  Make me maintain AIX.


* Re: [PATCH] [IMPORTANT] Re: 2.4.7 softirq incorrectness.
  2001-07-30 18:50                               ` Ingo Molnar
  2001-07-30 22:47                                 ` Nigel Gamble
@ 2001-07-31 18:08                                 ` kuznet
  1 sibling, 0 replies; 41+ messages in thread
From: kuznet @ 2001-07-31 18:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: torvalds, andrea, maxk, linux-kernel, davem

Hello!

> nope. i observed latency issues with restart + ksoftirqd as well.

Ingo, what is the impact on total performance, then?

Just to remind you: merely stating that "latency is bad" still does not
imply that dead loops are inevitable. We really could try more
sophisticated policing of softirqs (e.g. counting cycles spent in them
and breaking the loop, falling back to ksoftirqd, once they preempt
threads for more than some threshold). But it is still not clear that
this is worth doing. If you see an impact, let's try this way.

Probably the priority of ksoftirqd should be tuned. It seems it should
still have a high priority, a bit below real-time.

Alexey


end of thread, other threads:[~2001-07-31 18:09 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-07-22 20:44 2.4.7 softirq incorrectness Rusty Russell
2001-07-22 23:34 ` Andrea Arcangeli
2001-07-23  9:06   ` Rusty Russell
2001-07-23 14:29     ` Andrea Arcangeli
2001-07-24  9:35       ` Rusty Russell
2001-07-25 19:33         ` Andrea Arcangeli
2001-07-26 20:26           ` Rusty Russell
2001-07-23  9:25   ` Kai Germaschewski
2001-07-23 11:12     ` Jeff Garzik
2001-07-23 14:18     ` Andrea Arcangeli
2001-07-23 12:05   ` David S. Miller
2001-07-23 14:31     ` Andrea Arcangeli
2001-07-23 22:24   ` Alexey Kuznetsov
2001-07-25 22:23     ` Andrea Arcangeli
2001-07-26 17:46       ` kuznet
2001-07-26 18:03         ` Jeff Garzik
2001-07-26 18:29         ` Andrea Arcangeli
2001-07-27 16:48           ` kuznet
2001-07-27 18:31           ` Maksim Krasnyanskiy
2001-07-27 18:59             ` kuznet
2001-07-27 19:21             ` Maksim Krasnyanskiy
2001-07-27 19:35               ` kuznet
2001-07-28  0:52               ` [PATCH] [IMPORTANT] " Maksim Krasnyanskiy
2001-07-28 17:41                 ` kuznet
2001-07-28 18:02                   ` Andrea Arcangeli
2001-07-28 19:02                     ` kuznet
2001-07-28 19:32                       ` Andrea Arcangeli
2001-07-28 23:28                         ` Alexey Kuznetsov
2001-07-29 17:07                           ` Linus Torvalds
2001-07-29 17:52                             ` kuznet
2001-07-30 18:50                               ` Ingo Molnar
2001-07-30 22:47                                 ` Nigel Gamble
2001-07-30 22:56                                   ` Christoph Hellwig
2001-07-31 18:08                                 ` kuznet
2001-07-28 17:54                 ` Andrea Arcangeli
2001-07-28 19:17                   ` Andrea Arcangeli
2001-07-30 18:32                 ` Maksim Krasnyanskiy
2001-07-27  0:47       ` Maksim Krasnyanskiy
2001-07-27 15:01         ` Andrea Arcangeli
2001-07-27  9:34       ` David S. Miller
2001-07-27 17:01         ` kuznet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).