* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
@ 2001-10-03 14:15 Manfred Spraul
  2001-10-03 15:09 ` jamal
  0 siblings, 1 reply; 151+ messages in thread
From: Manfred Spraul @ 2001-10-03 14:15 UTC (permalink / raw)
  To: jamal; +Cc: linux-kernel, Ingo Molnar, Andreas Dilger, linux-netdev

> On Wed, 3 Oct 2001, jamal wrote:
> > On Wed, 3 Oct 2001, Ingo Molnar wrote:
> > >
> > > but the objectives, judging from the description you gave, are i
> > > think largely orthogonal,  with some overlapping in the polling
> > > part.
> >
> > yes. We've done a lot of thorough, well-thought-out work in that area
> > and i think it would be a sin to throw it out.
> >
>
> I hit the send button too fast...
> The dynamic irq limiting (it must not be set by a system admin, to
> preserve the work-conserving principle) could be used as a last resort.
> The point is, if you are not generating a lot of interrupts to begin
> with (as is the case with NAPI), i don't see the irq rate limiting
> kicking in at all.

A few notes, as seen from low-end NICs:

Forcing an irq limit without asking the driver is bad - it must be the
other way around.
E.g. the Winbond NIC contains a bug that forces it to one interrupt per
tx packet, but I can switch to rx polling/mitigation.
I'm sure the ne2k-pci users would also complain if a fixed irq limit
were added - I bet the majority of the drivers perform worse with a
fixed limit, only some perform better, and most perform best if they are
given a notice that they should reduce their irq rate (e.g. disable the
rx_packet and tx_packet interrupts, leave the error interrupts on, and
do the rx_packet/tx_packet work in the poll handler - see the sketch
below).
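
A minimal sketch of what I mean in the driver (the register bits and
helper names are made up, not from any real driver):

	/* Hypothetical: on a "reduce your irq rate" hint, mask the
	 * per-packet rx/tx interrupts but keep the error interrupts on;
	 * the rx/tx work then happens in the poll handler. */
	static void mynic_set_mitigation(struct net_device *dev, int on)
	{
		struct mynic_priv *np = dev->priv;
		u32 mask = INTR_ERRORS;		/* errors stay enabled */

		if (!on)
			mask |= INTR_RX_DONE | INTR_TX_DONE;
		writel(mask, np->ioaddr + MYNIC_IMR);
	}

	static void mynic_poll(struct net_device *dev)
	{
		mynic_rx(dev);		/* drain the rx ring */
		mynic_tx_reap(dev);	/* reclaim finished tx descriptors */
	}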

But a hint for the driver ("now switch mitigation on/off") seems to be a
good idea. And that hint should not be the return value of netif_rx -
what if the driver is only sending packets?
What if it's not even a network driver?

NAPI seems to be very promising to fix the total system overload case
(so many packets arrive that despite irq mitigation the system is still
overloaded).

But the implementation of irq mitigation is driver specific, and a 10
millisecond stop is far too long.

--
    Manfred




^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 14:15 [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5 Manfred Spraul
@ 2001-10-03 15:09 ` jamal
  2001-10-03 18:37   ` Davide Libenzi
  0 siblings, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-03 15:09 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-kernel, Ingo Molnar, Andreas Dilger, linux-netdev



On Wed, 3 Oct 2001, Manfred Spraul wrote:

> > On Wed, 3 Oct 2001, jamal wrote:
> > > On Wed, 3 Oct 2001, Ingo Molnar wrote:
> > > >
> > > > but the objectives, judging from the description you gave, are i
> > > > think largely orthogonal,  with some overlapping in the polling
> > > > part.
> > >
> > > yes. We've done a lot of thorough, well-thought-out work in that
> > > area and i think it would be a sin to throw it out.
> > >
> >
> > I hit the send button too fast...
> > The dynamic irq limiting (it must not be set by a system admin, to
> > preserve the work-conserving principle) could be used as a last
> > resort. The point is, if you are not generating a lot of interrupts
> > to begin with (as is the case with NAPI), i don't see the irq rate
> > limiting kicking in at all.
>
> A few notes, as seen from low-end NICs:
>
> Forcing an irq limit without asking the driver is bad - it must be the
> other way around.
> E.g. the Winbond NIC contains a bug that forces it to one interrupt per
> tx packet, but I can switch to rx polling/mitigation.

Indeed, this is a weird case that we have not encountered, but it does
make the point that the driver knows best what to do.

> I'm sure the ne2k-pci users would also complain if a fixed irq limit
> were added - I bet the majority of the drivers perform worse with a
> fixed limit, only some perform better, and most perform best if they are
> given a notice that they should reduce their irq rate (e.g. disable the
> rx_packet and tx_packet interrupts, leave the error interrupts on, and
> do the rx_packet/tx_packet work in the poll handler).
>

agreed. The reaction should be left to the driver's policy.

> But a hint for the driver ("now switch mitigation on/off") seems to be a
> good idea. And that hint should not be the return value of netif_rx -
> what if the driver is only sending packets?
> What if it's not even a network driver?

For 2.4, unfortunately there was no other way to pass that feedback
without the driver sending a packet up the stack. Our system feedback
probe is based on sampling the backlog queue.
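
In driver terms that feedback looks roughly like this in the rx path
(the congestion levels are the 2.4 netif_rx() return values; the
mitigation helper is hypothetical):

	switch (netif_rx(skb)) {
	case NET_RX_CN_HIGH:		/* backlog queue nearly full */
	case NET_RX_DROP:
		mynic_set_mitigation(dev, 1);	/* hypothetical helper */
		break;
	case NET_RX_SUCCESS:
	case NET_RX_CN_LOW:
		mynic_set_mitigation(dev, 0);
		break;
	}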

> NAPI seems to be very promising to fix the total system overload case
> (so many packets arrive that despite irq mitigation the system is still
> overloaded).
>
> But the implementation of irq mitigation is driver specific, and a 10
> millisecond stop is far too long.
>

violent agreement.

cheers,
jamal



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 15:09 ` jamal
@ 2001-10-03 18:37   ` Davide Libenzi
  0 siblings, 0 replies; 151+ messages in thread
From: Davide Libenzi @ 2001-10-03 18:37 UTC (permalink / raw)
  To: jamal
  Cc: Manfred Spraul, linux-kernel, Ingo Molnar, Andreas Dilger, linux-netdev

On Wed, 3 Oct 2001, jamal wrote:
> > NAPI seems to be very promising to fix the total system overload case
> > (so many packets arrive that despite irq mitigation the system is still
> > overloaded).
> >
> > But the implementation of irq mitigation is driver specific, and a 10
> > millisecond stop is far too long.
> >
>
> violent agreement.

Ingo's solution moves the mitigation control into the kernel, with the
immediate advantage that it'll work right now with existing drivers.
I think the idea of kirqpoll is good, but the long-term solution should
be to move the mitigation knowledge inside the drivers, which would
register their own kirqpoll callbacks when they're going to mask irqs
(rough sketch below).
That way the "intelligence" about irq rates is left in the place where
there's the most knowledge about the nature of the I/O traffic.
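
Something like this, API-wise (completely hypothetical, just to show the
shape of the idea):

	/* the driver registers a poll callback when it decides to mask
	 * its irq; the kirqpoll loop then runs the callback instead of
	 * waiting for interrupts */
	struct kirqpoll_ops {
		void (*poll)(void *data);
		void *data;
	};

	int kirqpoll_register(int irq, struct kirqpoll_ops *ops);
	void kirqpoll_unregister(int irq);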



- Davide



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 16:11                                       ` Alan Cox
  2001-10-08 16:11                                         ` jamal
@ 2001-10-10 16:26                                         ` Pavel Machek
  1 sibling, 0 replies; 151+ messages in thread
From: Pavel Machek @ 2001-10-10 16:26 UTC (permalink / raw)
  To: Alan Cox
  Cc: jamal, Jeff Garzik, Andrea Arcangeli, Ingo Molnar, Linux-Kernel,
	netdev, Linus Torvalds

Hi!

> > Agreed if you add the polling cardbus bit.
> > Note polling cardbus would require more changes than the above.
> 
> I don't think it does. There are two pieces to the problem
> 
> 	a)	Not dying horribly
> 	b)	Handling it elegantly
> 
> b) is driver specific (NAPI etc) and I think well understood, to the
> point that it's being used already for performance reasons
> 
> a) is as simple as 
> 
> 	if(stuck_in_irq(foo) && irq_shared(foo))
> 	{
> 		disable_real_irq(foo);
> 		timer_fake_irq_foo();
> 	}

I'd kill the irq_shared() test, and add a printk :-).
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:35                                   ` Alan Cox
  2001-10-08 15:57                                     ` jamal
@ 2001-10-10 16:25                                     ` Pavel Machek
  1 sibling, 0 replies; 151+ messages in thread
From: Pavel Machek @ 2001-10-10 16:25 UTC (permalink / raw)
  To: Alan Cox
  Cc: jamal, Jeff Garzik, Andrea Arcangeli, Ingo Molnar, Linux-Kernel,
	netdev, Linus Torvalds

Hi!

> Even at 200Hz polling a typical cardbus card with say 32 ring buffer slots
> can process 6000pps.

On my velo, I have pcmcia but don't quite know how to drive it properly.
I have not figured out interrupts, so I ran the ne2000 in polling mode.
0.5 MB/sec is not bad for hardware as slow as the velo (see sig). The
next experiment was ATA flash. I had to bump HZ to 1000 for it, and I'm
getting spurious "unexpected interrupt" messages, but it works
surprisingly well.
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 14:45 jamal
  2001-10-09  0:36 ` Scott Laird
@ 2001-10-09  4:04 ` Werner Almesberger
  1 sibling, 0 replies; 151+ messages in thread
From: Werner Almesberger @ 2001-10-09  4:04 UTC (permalink / raw)
  To: jamal; +Cc: linux-kernel, netdev

jamal wrote:
> - Linux is already "very modular" as a router with both the traffic
> control framework and netfilter. I like their language specification etc;
> ours is a little more primitive in comparison.

I guess you're talking about iproute2/tc ;-) Things are better with tcng:
http://tcng.sourceforge.net/

Click covers more areas than just Traffic Control, though.

- Werner

-- 
  _________________________________________________________________________
 / Werner Almesberger, Lausanne, CH                    wa@almesberger.net /
/_http://icawww.epfl.ch/almesberger/_____________________________________/

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-09  0:36 ` Scott Laird
@ 2001-10-09  3:17   ` jamal
  0 siblings, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-09  3:17 UTC (permalink / raw)
  To: Scott Laird; +Cc: linux-kernel, netdev, Bernd Eckenfels



On Mon, 8 Oct 2001, Scott Laird wrote:

>
>
> On Mon, 8 Oct 2001, jamal wrote:
> >
> > Several things to note/observe:
> > - They use a very specialized piece of hardware (with two PCI buses).
>
> Huh?  It was just an L440GX, which was probably the single most common PC
> server board for a while in 1999-2000.  Most of VA Linux's systems used
> them.  I wouldn't call them "very specialized."
>

Ok, sorry, you are right: not very high end, but not exactly cheap even
at the time to have a motherboard with two PCI busses (i for one would
have been delighted to have access to one even today);
nevertheless, impressive numbers still.
I could achieve an MLFFR of ~200Kpps on an el-cheapo PII with 4-port znyx
cards on an ASUS that has a single PCI bus; and from what Donald Becker
was saying we could probably do better with 4 interface cards rather than
a single 4-port card, due to bus mastership issues.
I suppose that's why Robert can pull more packets on only two gige NICs on
a single bus. He's more than likely hitting PCI bottlenecks at this point.
A second PCI bus with a second set of cards should help (dis)prove this
theory.

> > - Robert's results on single-PCI-bus hardware showed ~360Kpps routing
> > vs Click's 435Kpps. This is not "far off" given the differences in
> > hardware. What would be really interesting is to have the Click folks
> > post their latency results. I am curious as to what the purely polling
> > scheme they have would achieve (as opposed to NAPI, which is a mixture
> > of interrupts and polls).
>
> Their 'TOCS00' paper lists a 29us one-way latency on page 22.
>

That's a very good number. I wonder what it means, though, and at what
rates those numbers are extracted. For example, in some of the tests i
run on the znyx card with only two ports generating traffic, you can
observe a rough latency of around 33us up to about the MLFFR, and then
the latency jumps sharply to hundreds of us. In fact at 147Kpps input,
you observe anywhere up to 800us, although we are clearly flat at the
MLFFR throughput on the output. These numbers might also be affected by
the latency measurement scheme used.

> Click looks interesting, much more so than most academic network projects,
> but I'm still not sure if it'd really be useful in most "real"

agreed, although i think we need to have more research of the type that
click is bringing ...

> environments.  It looks too flexible for most people to manage.  It'd be
> an interesting addition to my test lab, though :-).

indeed.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 14:45 jamal
@ 2001-10-09  0:36 ` Scott Laird
  2001-10-09  3:17   ` jamal
  2001-10-09  4:04 ` Werner Almesberger
  1 sibling, 1 reply; 151+ messages in thread
From: Scott Laird @ 2001-10-09  0:36 UTC (permalink / raw)
  To: jamal; +Cc: linux-kernel, netdev, Bernd Eckenfels



On Mon, 8 Oct 2001, jamal wrote:
>
> Several things to note/observe:
> - They use a very specialized piece of hardware (with two PCI buses).

Huh?  It was just an L440GX, which was probably the single most common PC
server board for a while in 1999-2000.  Most of VA Linux's systems used
them.  I wouldn't call them "very specialized."

> - Robert's results on single-PCI-bus hardware showed ~360Kpps routing
> vs Click's 435Kpps. This is not "far off" given the differences in
> hardware. What would be really interesting is to have the Click folks
> post their latency results. I am curious as to what the purely polling
> scheme they have would achieve (as opposed to NAPI, which is a mixture
> of interrupts and polls).

Their 'TOCS00' paper lists a 29us one-way latency on page 22.

Click looks interesting, much more so than most academic network projects,
but I'm still not sure if it'd really be useful in most "real"
environments.  It looks too flexible for most people to manage.  It'd be
an interesting addition to my test lab, though :-).


Scott


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-05 19:17                         ` kuznet
  2001-10-08 13:58                           ` jamal
@ 2001-10-08 17:42                           ` Robert Olsson
  2001-10-08 17:39                             ` jamal
  1 sibling, 1 reply; 151+ messages in thread
From: Robert Olsson @ 2001-10-08 17:42 UTC (permalink / raw)
  To: jamal
  Cc: kuznet, Andreas Dilger, Robert.Olsson, mingo, linux-kernel, bcrl,
	netdev, torvalds, alan


jamal writes:
 > 
 > This was Robert actually; the conclusion was that interrupts are very
 > expensive. If we can get rid of as many of them as possible, we get a
 > side benefit. I can't find the old data, but Robert has some data over
 > here: http://robur.slu.se/Linux/net-development/experiments/010301


 Jamal! 

 I think you meant:
 http://robur.slu.se/Linux/net-development/experiments/010313

 That was an MB with a PIC irq controller; IO-APIC boards do a lot better.

 Cheers.

						--ro

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 17:42                           ` Robert Olsson
@ 2001-10-08 17:39                             ` jamal
  0 siblings, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-08 17:39 UTC (permalink / raw)
  To: Robert Olsson
  Cc: kuznet, Andreas Dilger, mingo, linux-kernel, bcrl, netdev,
	torvalds, alan



On Mon, 8 Oct 2001, Robert Olsson wrote:

>  I think you meant:
>  http://robur.slu.se/Linux/net-development/experiments/010313
>
>  That was an MB with a PIC irq controller; IO-APIC boards do a lot better.
>

Oops, yes, i am sorry (only 12 days difference ;->)

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 16:11                                       ` Alan Cox
@ 2001-10-08 16:11                                         ` jamal
  2001-10-10 16:26                                         ` Pavel Machek
  1 sibling, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-08 16:11 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jeff Garzik, Andrea Arcangeli, Ingo Molnar, Linux-Kernel, netdev,
	Linus Torvalds



On Mon, 8 Oct 2001, Alan Cox wrote:

> > Agreed if you add the polling cardbus bit.
> > Note polling cardbus would require more changes than the above.
>
> I don't think it does.

I was responding to your earlier comment that:

> Once you disable the IRQ and kick over to polling the cardbus and the
> ethernet both still get regular service. Ok so your pps rate and your
> latency are unpleasant, but you are not dead.

basically pointing out that more work will be needed to get Ingo's patch
to poll the cardbus and eth0 in the example i gave;
those changes will have to be per driver. Did i miss something?
I agree with your other points there.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:57                                     ` jamal
@ 2001-10-08 16:11                                       ` Alan Cox
  2001-10-08 16:11                                         ` jamal
  2001-10-10 16:26                                         ` Pavel Machek
  0 siblings, 2 replies; 151+ messages in thread
From: Alan Cox @ 2001-10-08 16:11 UTC (permalink / raw)
  To: jamal
  Cc: Alan Cox, Jeff Garzik, Andrea Arcangeli, Ingo Molnar,
	Linux-Kernel, netdev, Linus Torvalds

> Agreed if you add the polling cardbus bit.
> Note polling cardbus would require more changes than the above.

I don't think it does. There are two pieces to the problem

	a)	Not dying horribly
	b)	Handling it elegantly

b) is driver specific (NAPI etc) and I think well understood, to the
point that it's being used already for performance reasons

a) is as simple as 

	if(stuck_in_irq(foo) && irq_shared(foo))
	{
		disable_real_irq(foo);
		timer_fake_irq_foo();
	}

We know spoofing a shared irq is safe.
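
And timer_fake_irq_foo() can be as dumb as a self-rearming timer (all
the names here are illustrative, like the ones above):

	static struct timer_list fake_irq_timer;

	static void fake_irq_poll(unsigned long irq)
	{
		run_irq_handlers(irq);	/* call the actions as if the
					   irq had fired */
		if (still_stuck(irq)) {
			/* keep polling, one tick at a time */
			fake_irq_timer.expires = jiffies + 1;
			add_timer(&fake_irq_timer);
		} else {
			enable_real_irq(irq);	/* back to irq mode */
		}
	}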

Alan

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:35                                   ` Alan Cox
@ 2001-10-08 15:57                                     ` jamal
  2001-10-08 16:11                                       ` Alan Cox
  2001-10-10 16:25                                     ` Pavel Machek
  1 sibling, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-08 15:57 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jeff Garzik, Andrea Arcangeli, Ingo Molnar, Linux-Kernel, netdev,
	Linus Torvalds



On Mon, 8 Oct 2001, Alan Cox wrote:

> > I hear you, but I think isolation is important;
> > If i am telneted (literal example here) onto that machine (note eth0 is
> > not cardbus based) and cardbus is causing the loops then iam screwed.
> > [The same applies to everything that shares interupts]
>
> Worst case it sucks, but it isn't dead.
>
> Once you disable the IRQ and kick over to polling, the cardbus and the
> ethernet both still get regular service. Ok, so your pps rate and your
> latency are unpleasant, but you are not dead.
>

Agreed if you add the polling cardbus bit.
Note polling cardbus would require more changes than the above.
My concern was more the following: this is a temporary solution - a
quickie, if you will. The proper solution is to have the isolation part.
If we push this in, doesn't it result in procrastination of "we'll do it
later"? Why not do it properly, since this was never a showstopper to
begin with? [The showstopper was networking]

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:24                             ` Andrea Arcangeli
@ 2001-10-08 15:35                               ` Alan Cox
  0 siblings, 0 replies; 151+ messages in thread
From: Alan Cox @ 2001-10-08 15:35 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Alan Cox, Jeff Garzik, Ingo Molnar, jamal, Linux-Kernel, netdev,
	Linus Torvalds

> Another thing I said recently is that the hardirq airbag has nothing to
> do with softirqs, and that's right. Patches messing with the softirq
> logic as a function of the hardirq airbag are just totally broken, or at
> least confusing, because they were incidentally merged together by
> mistake.

Agreed

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:20                                 ` jamal
@ 2001-10-08 15:35                                   ` Alan Cox
  2001-10-08 15:57                                     ` jamal
  2001-10-10 16:25                                     ` Pavel Machek
  0 siblings, 2 replies; 151+ messages in thread
From: Alan Cox @ 2001-10-08 15:35 UTC (permalink / raw)
  To: jamal
  Cc: Alan Cox, Jeff Garzik, Andrea Arcangeli, Ingo Molnar,
	Linux-Kernel, netdev, Linus Torvalds

> I hear you, but I think isolation is important;
> if i am telneted (literal example here) onto that machine (note eth0 is
> not cardbus based) and the cardbus is causing the loops, then i am
> screwed. [The same applies to everything that shares interrupts]

Worst case it sucks, but it isn't dead.

Once you disable the IRQ and kick over to polling, the cardbus and the
ethernet both still get regular service. Ok, so your pps rate and your
latency are unpleasant, but you are not dead.

For a shared IRQ we know we can safely switch to a 200Hz poll of shared
irq lines marked 'stuck'. The problem ones are non-shared ISA devices going
mad - there you have to be careful not to fake more irqs than real ones
are delivered, since some ISA device drivers "know" the IRQ is for them.

Even at 200Hz polling a typical cardbus card with say 32 ring buffer slots
can process 6000pps.
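(That is 200 polls/sec * 32 slots/poll = 6400 pps, rounded down a bit.)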

Alan

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:12                           ` Alan Cox
  2001-10-08 15:09                             ` jamal
@ 2001-10-08 15:24                             ` Andrea Arcangeli
  2001-10-08 15:35                               ` Alan Cox
  1 sibling, 1 reply; 151+ messages in thread
From: Andrea Arcangeli @ 2001-10-08 15:24 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jeff Garzik, Ingo Molnar, jamal, Linux-Kernel, netdev, Linus Torvalds

On Mon, Oct 08, 2001 at 04:12:53PM +0100, Alan Cox wrote:
> "Driver killed because the air bag enable is off by default and only
> mentioned on page 87 of the handbook in a footnote"

Nobody suggested not adding "an airbag" by default.

In fact the polling isn't an airbag at all: when you poll you're flying,
so you never need an airbag; only when you're on the ground may you need
the airbag.

Another thing I said recently is that the hardirq airbag has nothing to
do with softirqs, and that's right. Patches messing with the softirq
logic as a function of the hardirq airbag are just totally broken, or at
least confusing, because they were incidentally merged together by
mistake.

Andrea

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:09                             ` jamal
@ 2001-10-08 15:22                               ` Alan Cox
  2001-10-08 15:20                                 ` jamal
  0 siblings, 1 reply; 151+ messages in thread
From: Alan Cox @ 2001-10-08 15:22 UTC (permalink / raw)
  To: jamal
  Cc: Alan Cox, Jeff Garzik, Andrea Arcangeli, Ingo Molnar,
	Linux-Kernel, netdev, Linus Torvalds

> On Mon, 8 Oct 2001, Alan Cox wrote:
> 
> > NAPI is important - the irq disable tactic is a last resort. If the right
> > hardware is irq flood aware it should only ever trigger to save us from
> > irq routing errors (eg cardbus hangs)
> 
> Agreed. As long as the IRQ flood protector can do proper isolation.
> Here's hat i see on my dell latitude laptop with a built in ethernet (not
> cardbus related ;->)

It doesn't save you from horrible performance - NAPI is there to do
that. It saves you from a dead box. You can at least rmmod the cardbus
controller with protection in place (or go looking for the problem with
a debugger)

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:22                               ` Alan Cox
@ 2001-10-08 15:20                                 ` jamal
  2001-10-08 15:35                                   ` Alan Cox
  0 siblings, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-08 15:20 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jeff Garzik, Andrea Arcangeli, Ingo Molnar, Linux-Kernel, netdev,
	Linus Torvalds



On Mon, 8 Oct 2001, Alan Cox wrote:

> It doesn't save you from horrible performance - NAPI is there to do
> that. It saves you from a dead box. You can at least rmmod the cardbus
> controller with protection in place (or go looking for the problem with
> a debugger)

I hear you, but I think isolation is important;
if i am telneted (literal example here) onto that machine (note eth0 is
not cardbus based) and the cardbus is causing the loops, then i am
screwed. [The same applies to everything that shares interrupts]

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:00                       ` Alan Cox
  2001-10-08 15:03                         ` Jeff Garzik
@ 2001-10-08 15:19                         ` Andrea Arcangeli
  1 sibling, 0 replies; 151+ messages in thread
From: Andrea Arcangeli @ 2001-10-08 15:19 UTC (permalink / raw)
  To: Alan Cox
  Cc: Ingo Molnar, jamal, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Linus Torvalds

On Mon, Oct 08, 2001 at 04:00:36PM +0100, Alan Cox wrote:
> > Of course we agree that such a "polling router/firewall" behaviour must
> > not be the default but it must be enabled on demand by the admin via
> > sysctl or whatever else userspace API. And I don't see any problem with
> > that.
> 
> No I don't agree. "Stop random end users crashing my machine at will" is not
> a magic sysctl option - its a default. 

The "random user hanging my machine" has nothing to do with "it is ok in
a router to dedicate one cpu to polling".

The whole email was about "in a router it is ok to poll". I'm not saying
"to solve the flood problem you should be forced to turn on polling".

I also said that if you turn on polling you also solve the DoS, yes, but
that was just a side note. My only implicit thought behind the side note
was that most machines sensitive to the DoS are routers, where people
want the max performance and where they can dedicate one cpu (even on
UP) to polling. So the only argument I can make is that the portion of
the userbase concerned about the "current" hardirq DoS would decrease
significantly if a polling method became available in linux.

I'm certainly not saying that the "stop random user crashing my machine
at will" should be a sysctl option and not the default.

Andrea

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:03                         ` Jeff Garzik
@ 2001-10-08 15:12                           ` Alan Cox
  2001-10-08 15:09                             ` jamal
  2001-10-08 15:24                             ` Andrea Arcangeli
  0 siblings, 2 replies; 151+ messages in thread
From: Alan Cox @ 2001-10-08 15:12 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Alan Cox, Andrea Arcangeli, Ingo Molnar, jamal, Linux-Kernel,
	netdev, Linus Torvalds

> I think (Ingo's?) analogy of an airbag was appropriate, if that's indeed
> how the code winds up functioning.

Very much so

"Driver killed because the air bag enable is off by default and only
mentioned on page 87 of the handbook in a footnote"

> Having a mechanism that prevents what would otherwise be a lockup is
> useful.  NAPI is useful.  Having both would be nice :)

NAPI is important - the irq disable tactic is a last resort. If the right
hardware is irq flood aware it should only ever trigger to save us from
irq routing errors (eg cardbus hangs) 

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08  0:31                     ` Andrea Arcangeli
  2001-10-08  4:58                       ` Bernd Eckenfels
  2001-10-08 15:00                       ` Alan Cox
@ 2001-10-08 15:10                       ` bill davidsen
  2 siblings, 0 replies; 151+ messages in thread
From: bill davidsen @ 2001-10-08 15:10 UTC (permalink / raw)
  To: linux-kernel

In article <20011008023118.L726@athlon.random> andrea@suse.de wrote:

| You're perfectly right that it's not ok for a generic computing
| environment to spend lots of cpu in polling, but it is clear that in a
| dedicated router/firewall we can just shut down the NIC interrupt forever
| via disable_irq (no matter if the nic supports hw flow control or not,
| and in turn no matter if the kid tries to spam the machine with small
| packets) and dedicate 1 cpu to the polling work, with ksoftirqd polling
| the NIC forever, to deliver maximal routing performance or something
| like that. ksoftirqd will ensure fairness with the userspace load as
| well. You probably wouldn't get a benefit with tux, because you would
| potentially lose way too much cpu with true polling and your traffic
| is mostly going from the server to the clients, not the other way around
| (plus the clients use delayed acks etc..), but the world isn't just
| tux.
| 
| Of course we agree that such a "polling router/firewall" behaviour must
| not be the default but it must be enabled on demand by the admin via a
| sysctl or whatever other userspace API. And I don't see any problem with
| that.

  Depending on implementation, this may be an acceptable default,
assuming that the code can determine when too many irqs are being
serviced. There are many servers, and even workstations in campus
environments, which would benefit from changing to polling under burst
load. I don't know that even a router need be locked in that state;
nor should it stay there under normal load.

  I'm convinced that polling under heavy load is beneficial or
non-harmful on virtually every type of networked machine. Actually, any
machine subject to interrupt storms qualifies: many device-interface or
control systems can get high rates due to physical events; networking is
just a common case.

-- 
bill davidsen <davidsen@tmr.com>
 "If I were a diplomat, in the best case I'd go hungry.  In the worst
  case, people would die."
		-- Robert Lipe

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:12                           ` Alan Cox
@ 2001-10-08 15:09                             ` jamal
  2001-10-08 15:22                               ` Alan Cox
  2001-10-08 15:24                             ` Andrea Arcangeli
  1 sibling, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-08 15:09 UTC (permalink / raw)
  To: Alan Cox
  Cc: Jeff Garzik, Andrea Arcangeli, Ingo Molnar, Linux-Kernel, netdev,
	Linus Torvalds



On Mon, 8 Oct 2001, Alan Cox wrote:

> NAPI is important - the irq disable tactic is a last resort. If the right
> hardware is irq flood aware it should only ever trigger to save us from
> irq routing errors (eg cardbus hangs)

Agreed. As long as the IRQ flood protector can do proper isolation.
Here's what i see on my Dell Latitude laptop with built-in ethernet (not
cardbus related ;->)

-------------------------------
[root@jzny /root]# cat /proc/interrupts
           CPU0
  0:   29408219          XT-PIC  timer
  1:     332192          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
 10:     643040          XT-PIC  Texas Instruments PCI1410 PC card Cardbus
Controller, eth0
 11:         17          XT-PIC  usb-uhci
 12:    2207062          XT-PIC  PS/2 Mouse
 14:     307504          XT-PIC  ide0
NMI:          0
LOC:          0
ERR:          0
MIS:          0
-----------------------------

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08 15:00                       ` Alan Cox
@ 2001-10-08 15:03                         ` Jeff Garzik
  2001-10-08 15:12                           ` Alan Cox
  2001-10-08 15:19                         ` Andrea Arcangeli
  1 sibling, 1 reply; 151+ messages in thread
From: Jeff Garzik @ 2001-10-08 15:03 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrea Arcangeli, Ingo Molnar, jamal, Linux-Kernel, netdev,
	Linus Torvalds

On Mon, 8 Oct 2001, Alan Cox wrote:
> > Of course we agree that such a "polling router/firewall" behaviour must
> > not be the default but it must be enabled on demand by the admin via a
> > sysctl or whatever other userspace API. And I don't see any problem
> > with that.
> 
> No I don't agree. "Stop random end users crashing my machine at will" is
> not a magic sysctl option - it's a default.

I think (Ingo's?) analogy of an airbag was appropriate, if that's indeed
how the code winds up functioning.

Having a mechanism that prevents what would otherwise be a lockup is
useful.  NAPI is useful.  Having both would be nice :)

	Jeff





^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08  0:31                     ` Andrea Arcangeli
  2001-10-08  4:58                       ` Bernd Eckenfels
@ 2001-10-08 15:00                       ` Alan Cox
  2001-10-08 15:03                         ` Jeff Garzik
  2001-10-08 15:19                         ` Andrea Arcangeli
  2001-10-08 15:10                       ` bill davidsen
  2 siblings, 2 replies; 151+ messages in thread
From: Alan Cox @ 2001-10-08 15:00 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ingo Molnar, jamal, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Linus Torvalds,
	Alan Cox

> Of course we agree that such a "polling router/firewall" behaviour must
> not be the default but it must be enabled on demand by the admin via a
> sysctl or whatever other userspace API. And I don't see any problem with
> that.

No I don't agree. "Stop random end users crashing my machine at will" is
not a magic sysctl option - it's a default.


Alan

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
@ 2001-10-08 14:45 jamal
  2001-10-09  0:36 ` Scott Laird
  2001-10-09  4:04 ` Werner Almesberger
  0 siblings, 2 replies; 151+ messages in thread
From: jamal @ 2001-10-08 14:45 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Bernd Eckenfels



>Yes, have a look at the work of the Click Modular Router PPL from MIT,
>which has a Polling Router Module Implementation that outperforms Linux
>Kernel Routing by far (according to their paper :)

I have read the click paper; i also just looked at the code, and it seems
the tulip driver they use has the same roots as ours (based on Alexey's
initial HFC driver).

Several things to note/observe:
- They use a very specialized piece of hardware (with two PCI buses).
- Robert's results on single-PCI-bus hardware showed ~360Kpps routing
vs Click's 435Kpps. This is not "far off" given the differences in
hardware. What would be really interesting is to have the Click folks
post their latency results. I am curious as to what the purely polling
scheme they have would achieve (as opposed to NAPI, which is a mixture
of interrupts and polls).
- Linux is already "very modular" as a router with both the traffic
control framework and netfilter. I like their language specification etc;
ours is a little more primitive in comparison.
- Click seems to only run on a system that is designated as a router (as
you seem to point out).

Linux has a few other perks, but the above were to compare the two.

> You can find the Link to Click somewhere on my Page:
> http://www.freefire.org/tools/index.en.php3 in the Operating System
> section (i think)

Nice web page and collection, btw. The right web page seems to be:
http://www.freefire.org/tools/index.en.php3

I looked at the latest Click paper on SMP. It would help if they were
aware of what's happening on Linux (since it seems to be their primary
OS). softnet does what they are asking for, sans the scheduling (which in
Linux proper is done via the IRQ scheduling). They also have a way for
the admin to specify the scheduling scheme, which is nice, but i am not
sure it is very valuable; I'll read the paper again to avoid hasty
judgement. It would be nice to work with the Click people (at least to
avoid redundant work, and maybe to get Linux mentioned in their paper --
they even mention ALTQ but forget Linux, which is more advanced ;->).

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-05 19:17                         ` kuznet
@ 2001-10-08 13:58                           ` jamal
  2001-10-08 17:42                           ` Robert Olsson
  1 sibling, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-08 13:58 UTC (permalink / raw)
  To: kuznet
  Cc: Andreas Dilger, Robert.Olsson, mingo, linux-kernel, bcrl, netdev,
	torvalds, alan



On Fri, 5 Oct 2001 kuznet@ms2.inr.ac.ru wrote:

> Hello!
>
> > One question which I have is why would you ever want to continue polling
> > if there is no work to be done?  Is it a tradeoff between the amount of
> > time to handle an IRQ vs. the time to do a poll?
>
> Yes. IRQs, even taken alone, eat a non-trivial amount of resources.
>
> Actually, I remember Jamal worked with a machine which had no io-apic,
> and irq ack/mask/unmask alone ate >15% of cpu there. :-)
>

This was Robert actually; the conclusion was that interrupts are very
expensive. If we can get rid of as many of them as possible, we get a
side benefit. I can't find the old data, but Robert has some data over
here:
http://robur.slu.se/Linux/net-development/experiments/010301



> "some hysteresis" is right word. This loop is an experiment with still
> unknown result yet. Originally, Jamal proposed to spin several times.
> I killed this.

It was a good idea that you killed it, now that i think about it in
retrospect; the solution is much cleaner without it.

> Robert proposed to also check for an infinite loop. (Note, the
> jiffies check is just a way to get rid of completely idle devices;
> one jiffie is a long enough time to be considered infinite.)
>

In my opinion we really don't need this. I did some quick testing with
and without it, and i don't see any differences.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-08  0:31                     ` Andrea Arcangeli
@ 2001-10-08  4:58                       ` Bernd Eckenfels
  2001-10-08 15:00                       ` Alan Cox
  2001-10-08 15:10                       ` bill davidsen
  2 siblings, 0 replies; 151+ messages in thread
From: Bernd Eckenfels @ 2001-10-08  4:58 UTC (permalink / raw)
  To: linux-kernel

In article <20011008023118.L726@athlon.random> you wrote:
> You're perfectly right that it's not ok for a generic computing
> environment to spend lots of cpu in polling, but it is clear that in a
> dedicated router/firewall we can just shut down the NIC interrupt forever
> via disable_irq (no matter if the nic supports hw flow control or not,
> and in turn no matter if the kid tries to spam the machine with small
> packets) and dedicate 1 cpu to the polling work, with ksoftirqd polling
> the NIC forever, to deliver maximal routing performance or something
> like that.

Yes, have a look at the work of the Click Modular Router PPL from MIT,
which has a Polling Router Module Implementation that outperforms Linux
Kernel Routing by far (according to their paper :)

You can find the Link to Click somewhere on my Page:
http://www.freefire.org/tools/index.en.php3 in the Operating System
section (i think)

I can recommend the click-paper.pdf

Greetings
Bernd

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:51                   ` Ingo Molnar
  2001-10-04  0:46                     ` jamal
@ 2001-10-08  0:31                     ` Andrea Arcangeli
  2001-10-08  4:58                       ` Bernd Eckenfels
                                         ` (2 more replies)
  1 sibling, 3 replies; 151+ messages in thread
From: Andrea Arcangeli @ 2001-10-08  0:31 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox

[ I hope not to reiterate the obvious, I didn't read every single email
  of this thread ]

> > > In a generic computing environment i want to spend cycles doing useful
> > > work, not polling. Even the quick kpolld hack [which i dropped, so please
> > > dont regard it as a 'competitor' patch] i consider superior to this, as i
> > > can renice kpolld to reduce polling. (plus kpolld sucks up available idle
> > > cycles as well.) Unless i royally misunderstand it, i cannot stop the
> > > above code from wasting my cycles, and if that is true i do not want to
> > > see it in the kernel proper in this form.
> 
> On Wed, 3 Oct 2001, jamal wrote:
> > The interupt just flags "i, netdev, have work to do"; [...]
On Wed, Oct 03, 2001 at 06:51:55PM +0200, Ingo Molnar wrote:
> (and the only thing i pointed out was that the patch as-is did not limit
> the amount of polling done.)

You're perfectly right that it's not ok for a generic computing
environment to spend lots of cpu in polling, but it is clear that in a
dedicated router/firewall we can just shut down the NIC interrupt forever
via disable_irq (no matter if the nic supports hw flow control or not,
and in turn no matter if the kid tries to spam the machine with small
packets) and dedicate 1 cpu to the polling work, with ksoftirqd polling
the NIC forever, to deliver maximal routing performance or something
like that. ksoftirqd will ensure fairness with the userspace load as
well. You probably wouldn't get a benefit with tux, because you would
potentially lose way too much cpu with true polling and your traffic
is mostly going from the server to the clients, not the other way around
(plus the clients use delayed acks etc..), but the world isn't just
tux.
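
Conceptually something like this (pure sketch; dev->poll here is the
poll entry point from the NAPI patches, nothing in the stock kernel):

	disable_irq(dev->irq);		/* NIC irqs off, forever */
	for (;;) {			/* one cpu, ksoftirqd context */
		int budget = 64;

		dev->poll(dev, &budget);	/* drain the rx ring */
		if (current->need_resched)
			schedule();	/* fairness with userspace */
	}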

Of course we agree that such a "polling router/firewall" behaviour must
not be the default but it must be enabled on demand by the admin via a
sysctl or whatever other userspace API. And I don't see any problem with
that.

Andrea

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 10:25         ` Ingo Molnar
@ 2001-10-07 20:37           ` Andrea Arcangeli
  0 siblings, 0 replies; 151+ messages in thread
From: Andrea Arcangeli @ 2001-10-07 20:37 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: BALBIR SINGH, Linus Torvalds, linux-kernel

Ingo, could you explain one basic thing to me?

What the hell has the hardirq rate limit logic to do with softirqs?

(btw, I don't know why you call it irq-rewrite; you didn't rewrite
anything, you just added an irq flood avoidance feature by leaving the
irq disabled when you detect an irq flood coming in)

hardirqs have nothing to do with softirqs. Softirqs, as their name
suggests, are a totally software thing: they're generated by software.
Incidentally, for the network stack they're posted from hardirq handlers
because network cards are irq driven, but that's just a special case (of
course it is the common case); it is not the general case.

Your hardirq rate limit logic that leaves the irq disabled for some time
is certainly needed from a security standpoint, to avoid a DoS if
untrusted users can generate a flood of irqs using some device - unless
the device provides a way to flow-control the irq rate (which, as I
understand it, most hardware that can generate a flood of irqs provides
anyway).

As far as I can tell, any change to the softirq logic is completely
orthogonal to the hardirq changes. Changing both things together, or
seeing any connection between the two, just shows a very limited,
network-oriented view of the whole picture about the softirqs.

Now I'm not saying that I don't want to change anything in the softirq
logic; for example, the deschedule logic made lots of sense and I can see
the benefit for users like the network stack.

Andrea

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-05 18:48                       ` Andreas Dilger
  2001-10-05 19:07                         ` Davide Libenzi
  2001-10-05 19:17                         ` kuznet
@ 2001-10-07  6:11                         ` Robert Olsson
  2 siblings, 0 replies; 151+ messages in thread
From: Robert Olsson @ 2001-10-07  6:11 UTC (permalink / raw)
  To: kuznet
  Cc: Andreas Dilger, Robert.Olsson, mingo, hadi, linux-kernel, bcrl,
	netdev, torvalds, alan


kuznet@ms2.inr.ac.ru writes:

 > "some hysteresis" is right word. This loop is an experiment with still
 > unknown result yet. Originally, Jamal proposed to spin several times.
 > I killed this. Robert proposed to check inifinite loop yet. (Note,
 > jiffies check is just a way to get rid of completely idle devices,
 > one jiffie is enough lonf time to be considered infinite).
 > 

 And from our discussion about packet reordering we get even more
 motivation for the "extra polls", not only to save IRQs.

 We may expand this to others too...

 As polling lists are per CPU, and consecutive polls stay within the same
 CPU, the device becomes bound to one CPU. We are protected against packet
 reordering as long as there are consecutive polls.

 I've consulted some CS people who have worked on these issues, and I have
 understood that packet reordering is a non-trivial problem, at least with
 a general approach.

 So to me it seems we do very well with a very simple scheme, and as I
 understand it, all SMP networking will benefit from this.

 Our "field-test" indicates that the packet load is still well distributed 
 among the CPUs.

 So maybe the showstopper comes out as a showwinner. :-)

 Cheers.

						--ro

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-05 18:48                       ` Andreas Dilger
  2001-10-05 19:07                         ` Davide Libenzi
@ 2001-10-05 19:17                         ` kuznet
  2001-10-08 13:58                           ` jamal
  2001-10-08 17:42                           ` Robert Olsson
  2001-10-07  6:11                         ` Robert Olsson
  2 siblings, 2 replies; 151+ messages in thread
From: kuznet @ 2001-10-05 19:17 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Robert.Olsson, mingo, hadi, linux-kernel, bcrl, netdev, torvalds, alan

Hello!

> One question which I have is why would you ever want to continue polling
> if there is no work to be done?  Is it a tradeoff between the amount of
> time to handle an IRQ vs. the time to do a poll?

Yes. IRQs, even taken alone, eat a non-trivial amount of resources.

Actually, I remember Jamal worked with a machine which had no io-apic,
and irq ack/mask/unmask alone ate >15% of cpu there. :-)

>						  An assumption that if
> there was previous network traffic there is likely to be more the next
> time the interface is checked (assuming you have other work to do between
> the time you last polled the device and the next poll)?

Exactly.

Note also that the testing of "goto not_done" was made in a pure
environment: a dedicated router. Continuous polling is an evident
advantage in this situation; only power is eaten. I would not enable
this on a notebook. :-)


> Is enabling/disabling of the RX interrupts on the network card an issue
> in the sense of "you need to wait X us after writing to this register
> for it to take effect" or other issue which makes it preferrable to have
> some "hysteresis" between changing state from IRQ-driven to polling?

"some hysteresis" is right word. This loop is an experiment with still
unknown result yet. Originally, Jamal proposed to spin several times.
I killed this. Robert proposed to check inifinite loop yet. (Note,
jiffies check is just a way to get rid of completely idle devices,
one jiffie is enough lonf time to be considered infinite).

Alexey

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-05 18:48                       ` Andreas Dilger
@ 2001-10-05 19:07                         ` Davide Libenzi
  2001-10-05 19:17                         ` kuznet
  2001-10-07  6:11                         ` Robert Olsson
  2 siblings, 0 replies; 151+ messages in thread
From: Davide Libenzi @ 2001-10-05 19:07 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Robert Olsson, mingo, jamal, linux-kernel, Alexey Kuznetsov,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox

On Fri, 5 Oct 2001, Andreas Dilger wrote:

> On Oct 05, 2001  16:52 +0200, Robert Olsson wrote:
> >  > If you get to the stage where you are turning off IRQs and going to a
> >  > polling mode, then don't turn IRQs back on until you have a poll (or
> >  > two or whatever) that there is no work to be done.  This will at worst
> >  > give you 50% polling success, but in practice you wouldn't start polling
> >  > until there is lots of work to be done, so the real success rate will
> >  > be much higher.
> >  >
> >  > At this point (no work to be done when polling) clearly no
> >  > interrupts would be generated (because no packets have arrived), so it
> >  > should be reasonable to turn interrupts back on and stop polling (assuming
> >  > non-broken hardware).  You now go back to interrupt-driven work until
> >  > the rate increases again.  This means you limit IRQ rates when needed,
> >  > but only do one or two excess polls before going back to IRQ-driven work.
> >
> >  Yes this has been considered, and actually I think Jamal did this in
> >  one of the pre-NAPI patches. I tried something similar... but instead
> >  of using a number of excess polls I was doing excess polls for a short
> >  time (a jiffie). This was the showstopper mentioned in the previous
> >  mails. :-)
>
> (Note that I hadn't read the NAPI paper until after I posted the above, and
> it appears that I was describing pretty much what NAPI already does ;-).  In
> light of that, I wholly agree that NAPI is a superior solution for handling
> IRQ overload from a network device.
>
> >  Anyway it is up to the driver to decide this policy. If the driver
> >  returns "not_done" it is simply polled again. So low-rate network
> >  drivers can have a different policy compared to an OC-48 driver. Even
> >  continuous polling is therefore possible, and even showstoppers. :-)
> >  There is protection against polling livelocks.
>
> One question which I have is why would you ever want to continue polling
> if there is no work to be done?

According to the doc, the poll is stopped when 1) there are no more
packets to be fetched from the DMA ring, or 2) the quota is reached.




- Davide



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-05 14:52                     ` Robert Olsson
@ 2001-10-05 18:48                       ` Andreas Dilger
  2001-10-05 19:07                         ` Davide Libenzi
                                           ` (2 more replies)
  0 siblings, 3 replies; 151+ messages in thread
From: Andreas Dilger @ 2001-10-05 18:48 UTC (permalink / raw)
  To: Robert Olsson
  Cc: mingo, jamal, linux-kernel, Alexey Kuznetsov, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox

On Oct 05, 2001  16:52 +0200, Robert Olsson wrote:
>  > If you get to the stage where you are turning off IRQs and going to a
>  > polling mode, then don't turn IRQs back on until you have a poll (or
>  > two or whatever) that there is no work to be done.  This will at worst
>  > give you 50% polling success, but in practice you wouldn't start polling
>  > until there is lots of work to be done, so the real success rate will
>  > be much higher.
>  > 
>  > At this point (no work to be done when polling) clearly no interrupts
>  > would be generated (because no packets have arrived), so it
>  > should be reasonable to turn interrupts back on and stop polling (assuming
>  > non-broken hardware).  You now go back to interrupt-driven work until
>  > the rate increases again.  This means you limit IRQ rates when needed,
>  > but only do one or two excess polls before going back to IRQ-driven work.
> 
>  Yes this has been considered, and actually I think Jamal did this in one
>  of the pre-NAPI patches. I tried something similar... but instead of
>  using a number of excess polls I was doing excess polls for a short time
>  (a jiffie). This was the showstopper mentioned in the previous mails. :-)

(Note that I hadn't read the NAPI paper until after I posted the above, and
it appears that I was describing pretty much what NAPI already does ;-).  In
light of that, I wholly agree that NAPI is a superior solution for handling
IRQ overload from a network device.

>  Anyway it is up to the driver to decide this policy. If the driver
>  returns "not_done" it is simply polled again. So low-rate network
>  drivers can have a different policy compared to an OC-48 driver. Even
>  continuous polling is therefore possible, and even showstoppers. :-)
>  There is protection against polling livelocks.

One question which I have is why would you ever want to continue polling
if there is no work to be done?  Is it a tradeoff between the amount of
time to handle an IRQ vs. the time to do a poll?  An assumption that if
there was previous network traffic there is likely to be more the next
time the interface is checked (assuming you have other work to do between
the time you last polled the device and the next poll)?

Is the enabling/disabling of the RX interrupts on the network card an
issue in the sense of "you need to wait X us after writing to this
register for it to take effect", or some other issue which makes it
preferable to have some "hysteresis" when changing state from IRQ-driven
to polling?

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:35                       ` Ingo Molnar
  2001-10-04 11:41                         ` jamal
@ 2001-10-05 16:42                         ` kuznet
  1 sibling, 0 replies; 151+ messages in thread
From: kuznet @ 2001-10-05 16:42 UTC (permalink / raw)
  To: mingo; +Cc: hadi, linux-kernel, Robert.Olsson, bcrl, netdev, torvalds, alan

Hello!

> i'm asking the following thing. dev->quota, as i read the patch now, can
> cause extra calls to ->poll() even though the RX ring of that particular
> device is empty and the driver has indicated it's done processing RX
> packets. (i'm now assuming that the extra-polling-for-a-jiffy line in the
> current patch is removed - that one is a showstopper to begin with.) Is
> this claim of mine correct?

No.

If the ring is empty, the device is removed from the poll list and
dev->poll is not called any more.

dev->quota is there to preempt service when the ring does not want to
clear. In that case work remains for the next round, after all the rest
of the interfaces have been served. Really, it is there to give the user
control over the distribution of cpu time between interfaces when the
cpu is 100% utilized and we have to drop something. Devices with lower
weights will get less service.
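
Simplified, the softirq round looks about like this (a sketch of the
logic, not the literal patch):

	/* one pass over the per-cpu poll list */
	while (!list_empty(&queue->poll_list)) {
		struct net_device *dev;

		dev = list_entry(queue->poll_list.next,
				 struct net_device, poll_list);

		if (dev->poll(dev, &budget) == 0)
			/* ring drained: dev->poll removed the device
			 * from the list and re-enabled its irq */
			continue;

		/* quota exhausted, ring still not clear: recharge from
		 * the weight and move to the tail, so the other
		 * interfaces are served before the next round */
		dev->quota += dev->weight;
		list_del(&dev->poll_list);
		list_add_tail(&dev->poll_list, &queue->poll_list);
	}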


> packets. (i'm now assuming that the extra-polling-for-a-jiffy line in the

It is not so bogus with the current kernel and a working ksoftirqd.

The goal was to check what really happens when we enforce polling
even when the machine is generally happy. For me it is not evident a
priori: is more cpu eaten uselessly, or less, due to the absent irqs?
Note that on a dedicated router it is pretty normal to spin in the
context of ksoftirqd, switching to control tasks when required.
And, actually, it is an amazing feature of the scheme that it is so
easy to add such an option.

Anyway, as far as I remember, the question remained unanswered. :-)
Robert even observed that only 9% of cpu was eaten, which surely
cannot be true. :-)

Alexey

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 21:28                 ` Alex Bligh - linux-kernel
                                     ` (2 preceding siblings ...)
  2001-10-04 22:10                   ` Alan Cox
@ 2001-10-05 15:22                   ` Robert Olsson
  3 siblings, 0 replies; 151+ messages in thread
From: Robert Olsson @ 2001-10-05 15:22 UTC (permalink / raw)
  To: Alan Cox
  Cc: linux-kernel, mingo, jamal, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Linus Torvalds,
	Simon Kirby


Alan Cox writes:

 > You only think that. After a few minutes the kiddie pulls down your routing
 > because your route daemons execute no code. Also during the attack your sshd
 > won't run, so you can't log in to find out what is up.

Indeed.

I have a real example from a university core router with BGP and full 
Internet routing. I managed to get in via ssh during the DoS attack. 
We see that the 5 min dropping rate is about the same as the input 
rate. The duration of this attack was more than half an hour and BGP survived 
and the box was pretty manageable. This was with a hacked tulip driver 
switching to RX-polling at high loads.
 
eth2: UP Locked MII Full Duplex Link UP
Admin up    6 day(s) 13 hour(s) 47 min 51 sec 
Last input  NOW
Last output NOW
5min RX bit/s   23.9 M
5min TX bit/s   1.1 M
5min RX pkts/s  46439        
5min TX pkts/s  759          
5min TX errors  0            
5min RX errors  0            
5min RX dropped 47038        
5min TX dropped 0            
5min collisions 0            

Well, this was a router, but I think we will very soon have the same demands
for most Internet servers.

Cheers.
						--ro



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 21:08                   ` Robert Olsson
  2001-10-03 22:22                     ` Andreas Dilger
@ 2001-10-05 14:52                     ` Robert Olsson
  2001-10-05 18:48                       ` Andreas Dilger
  1 sibling, 1 reply; 151+ messages in thread
From: Robert Olsson @ 2001-10-05 14:52 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Robert Olsson, mingo, jamal, linux-kernel, Alexey Kuznetsov,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox


Andreas Dilger writes:

 > If you get to the stage where you are turning off IRQs and going to a
 > polling mode, then don't turn IRQs back on until you have a poll (or
 > two or whatever) that there is no work to be done.  This will at worst
 > give you 50% polling success, but in practise you wouldn't start polling
 > until there is lots of work to be done, so the real success rate will
 > be much higher.
 > 
 > At this point (no work to be done when polling) clearly no
 > interrupts would be generated (because no packets have arrived), so it
 > should be reasonable to turn interrupts back on and stop polling (assuming
 > non-broken hardware).  You now go back to interrupt-driven work until
 > the rate increases again.  This means you limit IRQ rates when needed,
 > but only do one or two excess polls before going back to IRQ-driven work.

 Hello!

 Yes, this has been considered, and actually I think Jamal did this in one of
 the pre-NAPI patches. I tried something similar... but instead of using a
 number of excess polls I was doing excess polls for a short time (a jiffie).
 This was the showstopper mentioned in the previous mails. :-)

 Anyway it is up to the driver to decide this policy. If the driver returns 
 "not_done" it is simply polled again. So low-rate network drivers can have 
 a different policy compared to an OC-48 driver. Even continuous polling is
 therefore possible, and even showstoppers. :-)  There is protection against
 polling livelocks.
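
 In code the contract looks roughly like this (identifiers invented; a
 sketch of the policy split, not the exact interface of the patch):

#define POLL_DONE	0	/* ring drained: leave poll list, re-enable RX irq */
#define POLL_NOT_DONE	1	/* keep calling me */

struct ring {
	int pkts;
};

static int rx_one(struct ring *r)
{
	if (!r->pkts)
		return 0;
	r->pkts--;		/* pass one packet up the stack */
	return 1;
}

int my_poll(struct ring *r, int *budget)
{
	while (*budget > 0 && rx_one(r))
		(*budget)--;

	if (r->pkts)
		return POLL_NOT_DONE;	/* quota used up, poll me again */

	/*
	 * Policy point: a low-rate NIC re-enables its RX irq here; an
	 * OC-48 driver could return POLL_NOT_DONE regardless and stay
	 * in polling mode.
	 */
	return POLL_DONE;
}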

 Cheers.
						--ro

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 13:38             ` Robert Olsson
  2001-10-04 21:22               ` Alex Bligh - linux-kernel
@ 2001-10-05 14:32               ` Robert Olsson
  1 sibling, 0 replies; 151+ messages in thread
From: Robert Olsson @ 2001-10-05 14:32 UTC (permalink / raw)
  To: Alex Bligh - linux-kernel
  Cc: Robert Olsson, jamal, Ingo Molnar, linux-kernel,
	Alexey Kuznetsov, Benjamin LaHaise, netdev, Linus Torvalds,
	Alan Cox


Alex Bligh - linux-kernel writes:

 > I seem to remember jamal saying the NAPI stuff was available
 > since 2.(early). Is there a stable 2.2.20 patch?


 Hello!

 The current NAPI incarnation came first for 2.4.3 and bears the ANK trademark.
 Jamal had pre-NAPI patches long before that, and we have been testing/profiling
 polling and flow control versions of popular network drivers in the lab and 
 on highly loaded Internet sites for a long time. I consider the NAPI
 work to have been initiated by Jamal at OLS two years ago. No, I don't know
 of any usable code for 2.2.*

 Cheers.

						--ro





^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-05  0:00                           ` Ben Greear
  2001-10-05  0:18                             ` Davide Libenzi
@ 2001-10-05  2:01                             ` jamal
  1 sibling, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-05  2:01 UTC (permalink / raw)
  To: Ben Greear
  Cc: Linus Torvalds, Robert Love, Benjamin LaHaise,
	Alex Bligh - linux-kernel, mingo, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, netdev, Alan Cox, Simon Kirby



On Thu, 4 Oct 2001, Ben Greear wrote:

> Linus Torvalds wrote:
> >
> > On 4 Oct 2001, Robert Love wrote:
> > >
> > > Agreed.  I am actually amazed that the opposite of what is happening
> > > does not happen -- that more people aren't clamoring for this solution.
> >
> > Ehh.. I think that most people who are against Ingo's patches are so
> > mainly because there _is_ an alternative that looks nicer.
> >
> >                 Linus
>
> The alternative (NAPI) only works with Tulip and Intel NICs, it seems.
> When the alternative works for every driver known (including 3rd party
> ones, like the e100), then it will truly be an alternative.  Until
> then, it will be a great feature for those who can use it, and the
> rest of the poor folks will need a big generic hammer.
>

Ben,
Let's put some reality check and history in, just for entertainment value:
it took ten years of Linux existence (and I am just using you
as an example, no pun intended) to realize your life was actually
an emergency that depended on Ingo's patch. Maybe I am being cruel,
so let's backtrack only over the last 4 years, since Alexey first had the
HFC in there; I am willing to bet a large amount of money that you didn't
once use it, or even care to post a query asking whether such a thing
existed. OK, so let's assume you didn't know it existed ... over a year back
I posted widely on it in conjunction with the return-code extension to
netif_rx() ... and still you didn't care that much, although you seem to be
a user of one of the converted drivers -- the tulip, and in particular the
znyx hardware which was used in the testing. IIRC, you actually said
something on that post .. Then one bright early morn, Eastern time zone,
Ingo appears, not in the form of atoms but rather electrons masquerading
as bits ...

cheers,
jamal

PS:- I am going to try and mitigate myself from this thread now; my
email-sending rate will be drastically reduced.


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-05  0:00                           ` Ben Greear
@ 2001-10-05  0:18                             ` Davide Libenzi
  2001-10-05  2:01                             ` jamal
  1 sibling, 0 replies; 151+ messages in thread
From: Davide Libenzi @ 2001-10-05  0:18 UTC (permalink / raw)
  To: Ben Greear
  Cc: Linus Torvalds, Robert Love, Benjamin LaHaise,
	Alex Bligh - linux-kernel, mingo, jamal, linux-kernel,
	Alexey Kuznetsov, Robert Olsson, netdev, Alan Cox, Simon Kirby

On Thu, 4 Oct 2001, Ben Greear wrote:

> Linus Torvalds wrote:
> >
> > On 4 Oct 2001, Robert Love wrote:
> > >
> > > Agreed.  I am actually amazed that the opposite of what is happening
> > > does not happen -- that more people aren't clamoring for this solution.
> >
> > Ehh.. I think that most people who are against Ingo's patches are so
> > mainly because there _is_ an alternative that looks nicer.
> >
> >                 Linus
>
> The alternative (NAPI) only works with Tulip and Intel NICs, it seems.
> When the alternative works for every driver known (including 3rd party
> ones, like the e100), then it will truly be an alternative.  Until
> then, it will be a great feature for those who can use it, and the
> rest of the poor folks will need a big generic hammer.

NAPI needs aware drivers and introduces changes to the queue processing
(packets left in the DMA ring), and it'll be 2.5.x at the earliest.
It's clearly a nicer solution that does not suffer from the drawbacks that
Ingo's code has.
Ingo's patch is more hack-ish, but it addresses the problem with minimal
changes.




- Davide



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 23:51                         ` Linus Torvalds
@ 2001-10-05  0:00                           ` Ben Greear
  2001-10-05  0:18                             ` Davide Libenzi
  2001-10-05  2:01                             ` jamal
  0 siblings, 2 replies; 151+ messages in thread
From: Ben Greear @ 2001-10-05  0:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Robert Love, Benjamin LaHaise, Alex Bligh - linux-kernel, mingo,
	jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson, netdev,
	Alan Cox, Simon Kirby

Linus Torvalds wrote:
> 
> On 4 Oct 2001, Robert Love wrote:
> >
> > Agreed.  I am actually amazed that the opposite of what is happening
> > does not happen -- that more people aren't clamoring for this solution.
> 
> Ehh.. I think that most people who are against Ingo's patches are so
> mainly because there _is_ an alternative that looks nicer.
> 
>                 Linus

The alternative (NAPI) only works with Tulip and Intel NICs, it seems.
When the alternative works for every driver known (including 3rd party
ones, like the e100), then it will truly be an alternative.  Until
then, it will be a great feature for those who can use it, and the
rest of the poor folks will need a big generic hammer.

From personal experience, I imagine the problem is also that it was
not invented here, where "here" is where each of us sits.  And I include
myself in that bias!

Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 23:47                       ` Robert Love
@ 2001-10-04 23:51                         ` Linus Torvalds
  2001-10-05  0:00                           ` Ben Greear
  0 siblings, 1 reply; 151+ messages in thread
From: Linus Torvalds @ 2001-10-04 23:51 UTC (permalink / raw)
  To: Robert Love
  Cc: Benjamin LaHaise, Alex Bligh - linux-kernel, mingo, jamal,
	linux-kernel, Alexey Kuznetsov, Robert Olsson, netdev, Alan Cox,
	Simon Kirby


On 4 Oct 2001, Robert Love wrote:
>
> Agreed.  I am actually amazed that the opposite of what is happening
> does not happen -- that more people aren't clamoring for this solution.

Ehh.. I think that most people who are against Ingo's patches are so
mainly because there _is_ an alternative that looks nicer.

		Linus


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 23:20                     ` Alex Bligh - linux-kernel
  2001-10-04 23:26                       ` Benjamin LaHaise
@ 2001-10-04 23:47                       ` Robert Love
  2001-10-04 23:51                         ` Linus Torvalds
  1 sibling, 1 reply; 151+ messages in thread
From: Robert Love @ 2001-10-04 23:47 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Alex Bligh - linux-kernel, mingo, jamal, linux-kernel,
	Alexey Kuznetsov, Robert Olsson, netdev, Linus Torvalds,
	Alan Cox, Simon Kirby

On Thu, 2001-10-04 at 19:26, Benjamin LaHaise wrote:
> Frankly I'm sick of this entire discussion where people claim that no 
> form of interrupt throttling is ever needed.  It's an emergency measure 
> that is needed under some circumstances as very few drivers properly 
> protect against this kind of DoS.  Drivers that do things correctly will 
> never trigger the hammer.  Plus it's configurable.  If you'd bothered to 
> read and understand the rest of this thread you wouldn't have posted.

Agreed.  I am actually amazed that the opposite of what is happening
does not happen -- that more people aren't clamoring for this solution.

Six months ago I was testing some TCP application and by accident placed
a sendto() in an infinite loop.  The destination of the packets (on my
LAN) locked up completely!  And this was a powerful Pentium III with a
3c905 NIC.  Not acceptable.
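
For the record, the "accident" needs nothing exotic.  A hypothetical
reconstruction, with UDP standing in for whatever the test app did, and a
made-up destination address:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

int main(void)
{
	int s = socket(AF_INET, SOCK_DGRAM, 0);
	struct sockaddr_in dst;
	char buf[32] = "x";		/* small packets hurt the most */

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(9);	/* discard port */
	inet_pton(AF_INET, "192.168.1.2", &dst.sin_addr);

	for (;;)			/* the infinite loop in question */
		sendto(s, buf, sizeof(buf), 0,
		       (struct sockaddr *)&dst, sizeof(dst));
}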

	Robert Love


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 23:25                     ` Alex Bligh - linux-kernel
@ 2001-10-04 23:34                       ` Simon Kirby
  0 siblings, 0 replies; 151+ messages in thread
From: Simon Kirby @ 2001-10-04 23:34 UTC (permalink / raw)
  To: Alex Bligh - linux-kernel; +Cc: mingo, linux-kernel

On Fri, Oct 05, 2001 at 12:25:41AM +0100, Alex Bligh - linux-kernel wrote:

> > Ingo is not limiting interrupts to make it drop packets and forget things
> > just so that userspace can proceed.  Instead, he is postponing servicing
> > of the interrupts so that the card can batch up more packets and the
> > interrupt will retrieve more at once rather than continually leaving and
> > entering the interrupt to just pick up a few packets.  Without this, the
> > interrupt will starve everything else, and nothing will get done.
> 
> Ah, OK. In this case we are already looking at interrupt coalescing at the
> firmware level, which mitigates this 'earlier on'; however, even this
> strategy fails at higher pps levels, i.e. in these circumstances
> the card buffer is already full-ish, as the interrupt has already been
> postponed, and postponing it further can only cause dropped packets
> through buffer overrun.

Right.  But right now, the fact that the packets are so small and are
arriving so fast makes the interrupt handler overhead starve everything
else, and interrupt mitigation can make a box that would otherwise be
dead work properly.  If the box gets even more packets and the CPU
saturates, then the box would have been dead without the patch anyway.

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 22:10                   ` Alan Cox
@ 2001-10-04 23:28                     ` Alex Bligh - linux-kernel
  0 siblings, 0 replies; 151+ messages in thread
From: Alex Bligh - linux-kernel @ 2001-10-04 23:28 UTC (permalink / raw)
  To: Alan Cox, linux-kernel
  Cc: mingo, jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Simon Kirby,
	Alex Bligh - linux-kernel



--On Thursday, 04 October, 2001 11:10 PM +0100 Alan Cox 
<alan@lxorguk.ukuu.org.uk> wrote:

> You only think that. After a few minutes the kiddie pulls down your
> routing because your route daemons execute no code. Also during the
> attack your sshd won't run, so you can't log in to find out what is up.

There is truth in this, which is why doing something like
a crude WRED on the card, in the firmware
(i.e. before it sends the data into user space), is something
we looked at but never got round to.

--
Alex Bligh

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 23:20                     ` Alex Bligh - linux-kernel
@ 2001-10-04 23:26                       ` Benjamin LaHaise
  2001-10-04 23:47                       ` Robert Love
  1 sibling, 0 replies; 151+ messages in thread
From: Benjamin LaHaise @ 2001-10-04 23:26 UTC (permalink / raw)
  To: Alex Bligh - linux-kernel
  Cc: mingo, jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby

On Fri, Oct 05, 2001 at 12:20:34AM +0100, Alex Bligh - linux-kernel wrote:
> Rather than bugging the author of the driver card, we've actually
> been trying to fix it, down to rewriting the firmware. So for
> this purpose I/we am/are the driver maintainer thanks. However,
> there are limitations like bus speed which mean that in practice
> if we receive a large enough number of small packets each second,
> the box will saturate.

Not if the driver has a decent irq mitigation scheme and uses the 
hw flow control + NAPI bits.

> Not sure this required jumping down my throat.

Frankly I'm sick of this entire discussion where people claim that no 
form of interrupt throttling is ever needed.  It's an emergency measure 
that is needed under some circumstances as very few drivers properly 
protect against this kind of DoS.  Drivers that do things correctly will 
never trigger the hammer.  Plus it's configurable.  If you'd bothered to 
read and understand the rest of this thread you wouldn't have posted.

		-ben

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 22:01                   ` Simon Kirby
@ 2001-10-04 23:25                     ` Alex Bligh - linux-kernel
  2001-10-04 23:34                       ` Simon Kirby
  0 siblings, 1 reply; 151+ messages in thread
From: Alex Bligh - linux-kernel @ 2001-10-04 23:25 UTC (permalink / raw)
  To: Simon Kirby, Alex Bligh - linux-kernel
  Cc: mingo, linux-kernel, Alex Bligh - linux-kernel



--On Thursday, 04 October, 2001 3:01 PM -0700 Simon Kirby 
<sim@netnation.com> wrote:

> Ingo is not limiting interrupts to make it drop packets and forget things
> just so that userspace can proceed.  Instead, he is postponing servicing
> of the interrupts so that the card can batch up more packets and the
> interrupt will retrieve more at once rather than continually leaving and
> entering the interrupt to just pick up a few packets.  Without this, the
> interrupt will starve everything else, and nothing will get done.

Ah, OK. In this case we are already looking at interrupt coalescing at the
firmware level, which mitigates this 'earlier on'; however, even this
strategy fails at higher pps levels, i.e. in these circumstances
the card buffer is already full-ish, as the interrupt has already been
postponed, and postponing it further can only cause dropped packets
through buffer overrun.

--
Alex Bligh

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 21:49                   ` Benjamin LaHaise
@ 2001-10-04 23:20                     ` Alex Bligh - linux-kernel
  2001-10-04 23:26                       ` Benjamin LaHaise
  2001-10-04 23:47                       ` Robert Love
  0 siblings, 2 replies; 151+ messages in thread
From: Alex Bligh - linux-kernel @ 2001-10-04 23:20 UTC (permalink / raw)
  To: Benjamin LaHaise, Alex Bligh - linux-kernel
  Cc: mingo, jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby,
	Alex Bligh - linux-kernel



--On Thursday, 04 October, 2001 5:49 PM -0400 Benjamin LaHaise 
<bcrl@redhat.com> wrote:

>> In at least one environment known to me (router), I'd rather it
>> kept accepting packets, and f/w'ing them, and didn't switch VTs etc.
>> By dropping down performance, you've made the DoS attack even
>> more successful than it would otherwise have been (the kiddie
>> looks at effect on the host at the end).
>
> Then bug the driver author of your ethernet cards or turn the hammer off.
>  You're the sysadmin, you know that your system is unusual.  Deal with it.

The hammer has an average age of 13 years and is difficult to turn off,
unfortunately.

Rather than bugging the author of the card's driver, we've actually
been trying to fix it, down to rewriting the firmware. So for
this purpose I/we am/are the driver maintainer, thanks. However,
there are limitations like bus speed which mean that in practice
if we receive a large enough number of small packets each second,
the box will saturate.

My point was merely that some applications (and using a Linux
box as a router is not that 'unusual') want to deprioritize
different things under resource starvation. Changing the default,
in an unconfigurable way, isn't a great idea. Sure, dealing
with external resource exhaustion for hosts is indeed a good
idea. I was just suggesting that it wasn't always what you
wanted to do.

Not sure this required jumping down my throat.

--
Alex Bligh

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 21:28                 ` Alex Bligh - linux-kernel
  2001-10-04 21:49                   ` Benjamin LaHaise
  2001-10-04 22:01                   ` Simon Kirby
@ 2001-10-04 22:10                   ` Alan Cox
  2001-10-04 23:28                     ` Alex Bligh - linux-kernel
  2001-10-05 15:22                   ` Robert Olsson
  3 siblings, 1 reply; 151+ messages in thread
From: Alan Cox @ 2001-10-04 22:10 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox, Simon Kirby

> In at least one environment known to me (router), I'd rather it
> kept accepting packets, and f/w'ing them, and didn't switch VTs etc.
> By dropping down performance, you've made the DoS attack even
> more successful than it would otherwise have been (the kiddie
> looks at effect on the host at the end).

You only think that. After a few minutes the kiddie pulls down your routing
because your route daemons execute no code. Also during the attack your sshd
won't run, so you can't log in to find out what is up.

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 21:28                 ` Alex Bligh - linux-kernel
  2001-10-04 21:49                   ` Benjamin LaHaise
@ 2001-10-04 22:01                   ` Simon Kirby
  2001-10-04 23:25                     ` Alex Bligh - linux-kernel
  2001-10-04 22:10                   ` Alan Cox
  2001-10-05 15:22                   ` Robert Olsson
  3 siblings, 1 reply; 151+ messages in thread
From: Simon Kirby @ 2001-10-04 22:01 UTC (permalink / raw)
  To: Alex Bligh - linux-kernel; +Cc: mingo, linux-kernel

On Thu, Oct 04, 2001 at 10:28:17PM +0100, Alex Bligh - linux-kernel wrote:

> In at least one environment known to me (router), I'd rather it
> kept accepting packets, and f/w'ing them, and didn't switch VTs etc.
> By dropping down performance, you've made the DoS attack even
> more successful than it would otherwise have been (the kiddie
> looks at effect on the host at the end).

No.

Ingo is not limiting interrupts to make it drop packets and forget things
just so that userspace can proceed.  Instead, he is postponing servicing
of the interrupts so that the card can batch up more packets and the
interrupt will retrieve more at once rather than continually leaving and
entering the interrupt to just pick up a few packets.  Without this, the
interrupt will starve everything else, and nothing will get done.

By postponing servicing of the interrupt (and thus increasing latency
slightly), throughput will actually increase.

Obviously, if the card Rx buffers overflow because the interrupts weren't
serviced quickly enough, then packets will be dropped.  This is still
better than the machine not being able to actually do anything with the
received packets (and also not able to do anything else such as allow the
administrator to figure out what is happening).
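
The arithmetic makes the point: a fixed cost is paid per interrupt, so
amortizing it over more packets frees the CPU.  A toy model (all the
numbers are invented):

/*
 * Toy model of why batching wins: a fixed overhead is paid per
 * interrupt, so taking more packets per interrupt frees the CPU.
 * All costs here are invented for illustration.
 */
#include <stdio.h>

int main(void)
{
	double irq_overhead_us = 10.0;	/* entry/exit, cache damage, ... */
	double per_packet_us = 2.0;	/* real RX work per packet */
	double pps = 100000.0;		/* offered load */
	int batch;

	for (batch = 1; batch <= 64; batch *= 4) {
		double cpu = pps * (per_packet_us + irq_overhead_us / batch);
		printf("%2d packets/irq: %7.0f us CPU per second (%3.0f%%)\n",
		       batch, cpu, cpu / 10000.0);
	}
	return 0;
}

At one packet per interrupt this example box needs 120% of a CPU just for
RX; at 64 packets per interrupt, about 22%.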

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 21:28                 ` Alex Bligh - linux-kernel
@ 2001-10-04 21:49                   ` Benjamin LaHaise
  2001-10-04 23:20                     ` Alex Bligh - linux-kernel
  2001-10-04 22:01                   ` Simon Kirby
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 151+ messages in thread
From: Benjamin LaHaise @ 2001-10-04 21:49 UTC (permalink / raw)
  To: Alex Bligh - linux-kernel
  Cc: mingo, jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby

On Thu, Oct 04, 2001 at 10:28:17PM +0100, Alex Bligh - linux-kernel wrote:
> In at least one environment known to me (router), I'd rather it
> kept accepting packets, and f/w'ing them, and didn't switch VTs etc.
> By dropping down performance, you've made the DoS attack even
> more successful than it would otherwise have been (the kiddie
> looks at effect on the host at the end).

Then bug the driver author of your ethernet cards or turn the hammer off.  
You're the sysadmin, you know that your system is unusual.  Deal with it.

		-ben

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 14:51               ` Ingo Molnar
  2001-10-03 15:14                 ` jamal
@ 2001-10-04 21:28                 ` Alex Bligh - linux-kernel
  2001-10-04 21:49                   ` Benjamin LaHaise
                                     ` (3 more replies)
  1 sibling, 4 replies; 151+ messages in thread
From: Alex Bligh - linux-kernel @ 2001-10-04 21:28 UTC (permalink / raw)
  To: mingo, jamal
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby,
	Alex Bligh - linux-kernel



--On Wednesday, 03 October, 2001 4:51 PM +0200 Ingo Molnar <mingo@elte.hu> 
wrote:

> your refusal to accept this problem as an existing and real problem is
> really puzzling me.

In at least one environment known to me (router), I'd rather it
kept accepting packets, and f/w'ing them, and didn't switch VTs etc.
By dropping down performance, you've made the DoS attack even
more successful than it would otherwise have been (the kiddie
looks at effect on the host at the end).

--
Alex Bligh

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 13:38             ` Robert Olsson
@ 2001-10-04 21:22               ` Alex Bligh - linux-kernel
  2001-10-05 14:32               ` Robert Olsson
  1 sibling, 0 replies; 151+ messages in thread
From: Alex Bligh - linux-kernel @ 2001-10-04 21:22 UTC (permalink / raw)
  To: Robert Olsson, jamal
  Cc: Ingo Molnar, linux-kernel, Alexey Kuznetsov, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Alex Bligh - linux-kernel

>  > The paper is at: http://www.cyberus.ca/~hadi/usenix-paper.tgz
>  > Robert can point you to the latest patches.
>
>
>  Current code... there are still some parts we'd like to do better.
>
>  Available via ftp from robur.slu.se:/pub/Linux/net-development/NAPI/
>  2.4.10-poll.pat

I seem to remember jamal saying the NAPI stuff was available
since 2.(early). Is there a stable 2.2.20 patch?

--
Alex Bligh

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 19:00                             ` jamal
@ 2001-10-04 21:16                               ` Ion Badulescu
  0 siblings, 0 replies; 151+ messages in thread
From: Ion Badulescu @ 2001-10-04 21:16 UTC (permalink / raw)
  To: jamal; +Cc: linux-kernel

On Thu, 4 Oct 2001, jamal wrote:

> I could write a small HOWTO, at least for HWFLOWCONTROL, since that doesn't
> need anything fancy.

That'd be very nice.

Thanks,
Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 18:55                           ` Ion Badulescu
@ 2001-10-04 19:00                             ` jamal
  2001-10-04 21:16                               ` Ion Badulescu
  0 siblings, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-04 19:00 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: linux-kernel



On Thu, 4 Oct 2001, Ion Badulescu wrote:

> On Thu, 4 Oct 2001 07:54:19 -0400 (EDT), jamal <hadi@cyberus.ca> wrote:
>
> > This has nothing to do with specific hardware, although I see your point.
> > Send me an eepro and I'll at least add hardware flow control for you.
> > The API is simple; it's up to the driver maintainers to use it. This
> > discussion is good for making people aware of those drivers.
>
> A bit of documentation for the hardware flow control API would help as
> well. The API might be fine and dandy, but if all you have is a couple of
> modified drivers -- some of which are not even in the standard kernel --
> then you can bet not many driver writers are going to even be aware of it,
> let alone care to implement it.

I could write a small HOWTO, at least for HWFLOWCONTROL, since that doesn't
need anything fancy.

>
> For instance: in 2.2.19, the help text for CONFIG_NET_HW_FLOWCONTROL says
> only tulip supports it in the standard kernel -- yet I can't find that
> support anywhere in drivers/net/*.c, tulip.c included.
>

That's dated. It means a doc is needed.
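
Until that HOWTO exists, the rough shape is below -- written from memory
of the 2.4 tree, so treat every identifier as something to verify against
include/linux/netdevice.h; enable_rx()/disable_rx() are stand-ins for
whatever the driver actually does:

#ifdef CONFIG_NET_HW_FLOWCONTROL
static int my_fc_bit = -1;

/* The stack calls this back when congestion has cleared: restart RX. */
static void my_xon(struct net_device *dev)
{
	enable_rx(dev);			/* driver-specific, invented name */
}
#endif

static int my_open(struct net_device *dev)
{
#ifdef CONFIG_NET_HW_FLOWCONTROL
	my_fc_bit = netdev_register_fc(dev, my_xon);
#endif
	/* ... normal ring and irq setup ... */
	return 0;
}

static void my_rx(struct net_device *dev, struct sk_buff *skb)
{
	switch (netif_rx(skb)) {
	case NET_RX_CN_HIGH:		/* stack is choking */
#ifdef CONFIG_NET_HW_FLOWCONTROL
		disable_rx(dev);	/* no RX until my_xon() fires */
#endif
		break;
	default:			/* CN_LOW/CN_MOD: could throttle
					   mitigation gradually */
		break;
	}
}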

> In 2.4.10 tulip finally supports it (and I'm definitely going to take a
> closer look), but that's about it. And tulip is definitely the wrong
> example to pick if you want a nice and clean model for your driver.
>

I like the tulip code.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 11:54                         ` jamal
  2001-10-04 15:03                           ` Tim Hockin
@ 2001-10-04 18:55                           ` Ion Badulescu
  2001-10-04 19:00                             ` jamal
  1 sibling, 1 reply; 151+ messages in thread
From: Ion Badulescu @ 2001-10-04 18:55 UTC (permalink / raw)
  To: jamal; +Cc: linux-kernel

On Thu, 4 Oct 2001 07:54:19 -0400 (EDT), jamal <hadi@cyberus.ca> wrote:

> This has nothing to do with specific hardware, although I see your point.
> Send me an eepro and I'll at least add hardware flow control for you.
> The API is simple; it's up to the driver maintainers to use it. This
> discussion is good for making people aware of those drivers.

A bit of documentation for the hardware flow control API would help as 
well. The API might be fine and dandy, but if all you have is a couple of 
modified drivers -- some of which are not even in the standard kernel -- 
then you can bet not many driver writers are going to even be aware of it, 
let alone care to implement it.

For instance: in 2.2.19, the help text for CONFIG_NET_HW_FLOWCONTROL says 
only tulip supports it in the standard kernel -- yet I can't find that 
support anywhere in drivers/net/*.c, tulip.c included.

In 2.4.10 tulip finally supports it (and I'm definitely going to take a 
closer look), but that's about it. And tulip is definitely the wrong 
example to pick if you want a nice and clean model for your driver.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 17:40                           ` Andreas Dilger
@ 2001-10-04 18:33                             ` jamal
  0 siblings, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-04 18:33 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Ingo Molnar, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Simon Kirby



On Thu, 4 Oct 2001, Andreas Dilger wrote:

> On Oct 04, 2001  07:34 -0400, jamal wrote:
> > 1) you shut down shared interupts; take a look at this posting by Marcus
> > Sundberg <marcus@cendio.se>
> >
> > ---------------
> >
> >   0:    7602983          XT-PIC  timer
> >   1:      10575          XT-PIC  keyboard
> >   2:          0          XT-PIC  cascade
> >   8:          1          XT-PIC  rtc
> >  11:    1626004          XT-PIC  Toshiba America Info Systems ToPIC95 PCI \
> >          to Cardbus Bridge with ZV Support, Toshiba America Info Systems \
> >          ToPIC95 PCI to Cardbus Bridge with ZV Support (#2), usb-uhci, \
> >          eth0, BreezeCom Card, Intel 440MX, irda0
> >  12:       1342          XT-PIC  PS/2 Mouse
> >  14:      23605          XT-PIC  ide0
> >
> > -----------------------------
> >
> > Now you go and shut down IRQ 11 and punish all devices there. If you can
> > avoid that, it is acceptable as a temporary replacement to be upgraded to
> > a better scheme.
>
> Well, if we fall back to polling devices if the IRQ is disabled, then the
> shared interrupt case can be handled as well.  However, there were complaints
> about the patch when Ingo had device polling included, as opposed to just
> IRQ mitigation.
>

I don't think you've followed the discussions too well, and normally I
wouldn't respond, but you addressed me. Ingo's netdevice polling is not the
right approach; please look at NAPI and read the paper. NAPI does
all of what you've been suggesting. We are not even discussing that at this
point. We are discussing the sledgehammer effect, and how you could break a
finger or two trying to kill that fly with it. The example above
illustrates it.

cheers,
jamal



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  7:41                         ` Henning P. Schmiedehausen
  2001-10-04 16:09                           ` Ben Greear
@ 2001-10-04 18:30                           ` Christopher E. Brown
  1 sibling, 0 replies; 151+ messages in thread
From: Christopher E. Brown @ 2001-10-04 18:30 UTC (permalink / raw)
  To: hps; +Cc: linux-kernel



On Thu, 4 Oct 2001, Henning P. Schmiedehausen wrote:
>
> Does it finally do speed and duplex auto negotiation with Cisco
> Catalyst Switches? Something I never ever got to work with various 2.0
> and 2.2 drivers, mode settings, Catalyst settings, IOS versions and
> almost anything else that I ever tried.
>
> 	Regards
> 		Henning


	Let's be fair here: while there are issues with some brands of
tulip card, Cisco is often to blame as well.  There are known issues
with N-WAY autoneg on many Ciscos, switches *and* routers.


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 15:56                           ` Ben Greear
@ 2001-10-04 18:23                             ` jamal
  0 siblings, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-04 18:23 UTC (permalink / raw)
  To: Ben Greear
  Cc: Simon Kirby, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Alan Cox



On Thu, 4 Oct 2001, Ben Greear wrote:

> jamal wrote:
> >
> > On Wed, 3 Oct 2001, Ben Greear wrote:
> >
> > > The tulip driver only started working for my DLINK 4-port NIC after
> > > about 2.4.8, and last I checked the ZYNX 4-port still refuses to work,
> > > so I wouldn't consider it a paradigm of stability and grace quite yet.
> >
> > The tests in www.cyberus.ca/~hadi/247-res/ were done with 4-port znyx
> > cards using 2.4.7.
> > What kind of problems are you having? Maybe i can help.
>
> Mostly problems with auto-negotiation it seems.  Earlier 2.4 kernels
> just would never go 100bt/FD.  Later (broken) versions would claim to
> be 100bt/FD, but they still showed lots of collisions and frame errors.
>
> I'll try the ZYNX on the latest kernel in the next few days and let you
> know what I find...

Please do.

>
> > My point is that the API exists. Driver owners could use it; this
> > discussion seems to have at least helped to point out the existence of the
> > API. Alexey has had the hardware flow control in there since 2.1.x -- use
> > that at least. In my opinion, Ingo's patch is too radical to be allowed
> > in when we are approaching stability. And it is a lazy way of solving the
> > problem
>
> The API has been there since 2.1.x, and yet few drivers support it?  I
> can see why Ingo decided to fix the problem generically.

That logic is convoluted.

> > > cat /proc/net/softnet_stat
> > > 2b85c320 0000d374 6524ce48 00000000 00000000 00000000 00000000 00000000
> > > 2b8b5e29 0000d615 653eba32 00000000 00000000 00000000 00000000 00000000
>
> So you're printing out counters in HEX??  This seems like one place where a
> nice base-10 number would be appropriate :)

It's mostly for formatting reasons:
2b85c320 is 730186528 (and that won't fit on one line)
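
Anyone who really wants decimal can convert outside the kernel; a trivial
userspace helper (field meanings vary by kernel version, this only
converts the hex):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/net/softnet_stat", "r");
	char line[256];

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		char *tok;
		for (tok = strtok(line, " \n"); tok; tok = strtok(NULL, " \n"))
			printf("%10lu ", strtoul(tok, NULL, 16));
		putchar('\n');
	}
	fclose(f);
	return 0;
}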

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:33                 ` Linus Torvalds
                                     ` (2 preceding siblings ...)
  2001-10-04  4:12                   ` bill davidsen
@ 2001-10-04 18:16                   ` Alan Cox
  3 siblings, 0 replies; 151+ messages in thread
From: Alan Cox @ 2001-10-04 18:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Greear, jamal, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Alan Cox

>  (a) is not a major security issue. If you allow untrusted users full
>      100/1000Mbps access to your internal network, you have _other_
>      security issues, like packet sniffing etc that are much much MUCH
>      worse. So the packet flooding thing is very much a corner case, and
>      claiming that we have a big problem is silly.

Not nowadays. 100Mbit pipes to the backbone are routine for web serving in
the real world - at least the paying end (aka porn).

Alan

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 17:32                             ` Henning P. Schmiedehausen
@ 2001-10-04 18:03                               ` Ben Greear
  0 siblings, 0 replies; 151+ messages in thread
From: Ben Greear @ 2001-10-04 18:03 UTC (permalink / raw)
  To: hps; +Cc: linux-kernel

"Henning P. Schmiedehausen" wrote:

> >several 2-port EEPRO based NICs out there that work really well
> >too, but they are expensive...
> 
> Hm. If I really need more NICs than PCI slots, I normally use a
> Router. And I've even toyed a little with a Gigabit card linked to a
> Cisco C3524XL using a certain 802.1q unofficial extension to the Linux
> kernel to try and provide 24 100 MBit Ethernet Interfaces from a
> single Linux Box [2].

I wrote (one of) the VLAN patches, and I've brought up 4k
VLAN interfaces.  Let me or the vlan@scry.wanfear.com mailing list
know if you have trouble with my VLAN patch...  My vlan patch
can be found:
http://www.candelatech.com/~greear/vlan.html


Enjoy,
Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 11:34                         ` jamal
@ 2001-10-04 17:40                           ` Andreas Dilger
  2001-10-04 18:33                             ` jamal
  0 siblings, 1 reply; 151+ messages in thread
From: Andreas Dilger @ 2001-10-04 17:40 UTC (permalink / raw)
  To: jamal
  Cc: Ingo Molnar, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Simon Kirby

On Oct 04, 2001  07:34 -0400, jamal wrote:
> 1) you shut down shared interupts; take a look at this posting by Marcus
> Sundberg <marcus@cendio.se>
> 
> ---------------
> 
>   0:    7602983          XT-PIC  timer
>   1:      10575          XT-PIC  keyboard
>   2:          0          XT-PIC  cascade
>   8:          1          XT-PIC  rtc
>  11:    1626004          XT-PIC  Toshiba America Info Systems ToPIC95 PCI \
>          to Cardbus Bridge with ZV Support, Toshiba America Info Systems \
>          ToPIC95 PCI to Cardbus Bridge with ZV Support (#2), usb-uhci, \
>          eth0, BreezeCom Card, Intel 440MX, irda0
>  12:       1342          XT-PIC  PS/2 Mouse
>  14:      23605          XT-PIC  ide0
> 
> -----------------------------
> 
> Now you go and shut down IRQ 11 and punish all devices there. If you can
> avoid that, it is acceptable as a temporary replacement to be upgraded to
> a better scheme.

Well, if we fall back to polling devices if the IRQ is disabled, then the
shared interrupt case can be handled as well.  However, there were complaints
about the patch when Ingo had device polling included, as opposed to just
IRQ mitigation.

> 2) By not being granular enough and shutting down sources of noise, you
> are actually not being effective in increasing system utilization.

Well, since the IRQ itself uses system resources, if it is disabled it will
allow those resources to actually do something (i.e. polling instead, when
we know there is a lot of work to do).

Even if it does not have polling in the patch, the choice is to turn off
the IRQ, or have the system hang because it can not make any progress
because of the high number of interrupts.  If your patch ensures that the
network IRQ load is kept down, then Ingo's will never be activated.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 22:22                     ` Andreas Dilger
@ 2001-10-04 17:32                       ` Davide Libenzi
  0 siblings, 0 replies; 151+ messages in thread
From: Davide Libenzi @ 2001-10-04 17:32 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Robert Olsson, mingo, jamal, linux-kernel, Alexey Kuznetsov,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox

On Wed, 3 Oct 2001, Andreas Dilger wrote:

> If you get to the stage where you are turning off IRQs and going to a
> polling mode, then don't turn IRQs back on until you have a poll (or
> two or whatever) that there is no work to be done.  This will at worst
> give you 50% polling success, but in practise you wouldn't start polling
> until there is lots of work to be done, so the real success rate will
> be much higher.
>
> At this point (no work to be done when polling) clearly no
> interrupts would be generated (because no packets have arrived), so it
> should be reasonable to turn interrupts back on and stop polling (assuming
> non-broken hardware).  You now go back to interrupt-driven work until
> the rate increases again.  This means you limit IRQ rates when needed,
> but only do one or two excess polls before going back to IRQ-driven work.
>
> Granted, I don't know what the overhead of turning the IRQs on and off
> is, but since we do it all the time already (for each ISR) it can't be
> that bad.
>
> If you are always having work to do when polling, then interrupts will
> never be turned on again, but who cares at that point because the work
> is getting done?  Similarly, if you have IRQs disabled, but are sharing
> IRQs there is nothing wrong in polling all devices sharing that IRQ
> (at least conceptually).
>
> I don't know much about IRQ handlers, but I assume that this is already
> what happens if you are sharing an IRQ - you don't know which of many
> sources it comes from, so you poll all of them to see if they have any
> work to be done.  If you are polling some of the shared-IRQ devices too
> frequently (i.e. they never have work to do), you could have some sort
> of progressive backoff, so you skip polling those for a growing number
> of polls (this could also be set by the driver if it knows that it could
> only generate real work every X ms, so we skip about X/poll_rate polls).

This seems a pretty nice solution that achieves 1) limiting the irq
frequency and 2) avoiding the huge shared-irq latency caused by the irq
masking. Having per-irq poll callbacks could give the opportunity to poll
the sharing devices from time to time during the offending device's poll
loop.



- Davide



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 16:09                           ` Ben Greear
@ 2001-10-04 17:32                             ` Henning P. Schmiedehausen
  2001-10-04 18:03                               ` Ben Greear
  0 siblings, 1 reply; 151+ messages in thread
From: Henning P. Schmiedehausen @ 2001-10-04 17:32 UTC (permalink / raw)
  To: linux-kernel

Ben Greear <greearb@candelatech.com> writes:

>"Henning P. Schmiedehausen" wrote:
>> 
>> Does it finally do speed and duplex auto negotiation with Cisco
>> Catalyst Switches? Something I never ever got to work with various 2.0
>> and 2.2 drivers, mode settings, Catalyst settings, IOS versions and
>> almost anything else that I ever tried.

>Check the latest driver, it works with my IBM switch, and with other
>EEPRO and Tulip NICs now, so it may work for you.  The DLINK 4-port

Hi,

thanks for the suggestion, but I'm actually sold on using eepro100 and
3c59x NICs, both flavours never gave me any trouble (yes, I know about
the 3c59x and I was always careful to choose either the "B" with 2.0
and early 2.2 and now the "C" with later 2.2. Call me a snob for going
the "FreeBSD way" and choosing HW that works and not taking the
challenge to bring even the most obscure HW lying in a bin at a
customer to work but telling the customer "you can now buy a new,
guaranteed flawlessly performing NIC for $25 or pay me for four hours
trying to get _that_ NIC to work. I charge a little more than $25 per
hour..". Got them every time. ;-)

Basically I burned [1] all my tulip NICs a long time ago.

>several 2-port EEPRO based NICs out there that work really well
>too, but they are expensive...

Hm. If I really need more NICs than PCI slots, I normally use a
Router. And I've even toyed a little with a Gigabit card linked to a
Cisco C3524XL using a certain 802.1q unofficial extension to the Linux
kernel to try and provide 24 100 MBit Ethernet Interfaces from a
single Linux Box [2].

	Regards
		Henning

[1] Put them in the unavoidable Windows NT and 2000 boxes where most of them
    with "vendor supported, MHL approved, certified and signed drivers" 
    crash and burn as happily as under Linux. But then it is the fault of 
    "the other consultant". I don't do Windows. 

[2] Didn't work, though. Got a C7206 instead. O:-)


-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen       -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH     hps@intermeta.de

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   info@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20   

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  7:41                         ` Henning P. Schmiedehausen
@ 2001-10-04 16:09                           ` Ben Greear
  2001-10-04 17:32                             ` Henning P. Schmiedehausen
  2001-10-04 18:30                           ` Christopher E. Brown
  1 sibling, 1 reply; 151+ messages in thread
From: Ben Greear @ 2001-10-04 16:09 UTC (permalink / raw)
  To: hps; +Cc: linux-kernel

"Henning P. Schmiedehausen" wrote:
> 
> Ben Greear <greearb@candelatech.com> writes:
> 
> >jamal wrote:
> >>
> >> I think you can save yourself a lot of pain today by going to a "better
> >> driver"/hardware. Switch to a tulip based board; in particular one which
> >> is based on the 21143 chipset. Compile in hardware traffic control and
> >> save yourself some pain.
> 
> >The tulip driver only started working for my DLINK 4-port NIC
> >after about 2.4.8, and last I checked the ZYNX 4-port still refuses
> >to work, so I wouldn't consider it a paradigm of
> >stability and grace quite yet.  Regardless of that, it is often
> >impossible to trade NICS (think built-in 1U servers), and claiming
> >to only work correctly on certain hardware (and potentially lock up
> >hard on other hardware) is a pretty sorry state of affairs...
> 
> Does it finally do speed and duplex auto negotiation with Cisco
> Catalyst Switches? Something I never ever got to work with various 2.0
> and 2.2 drivers, mode settings, Catalyst settings, IOS versions and
> almost anything else that I ever tried.

Check the latest driver, it works with my IBM switch, and with other
EEPRO and Tulip NICs now, so it may work for you.  The DLINK 4-port
is actually the only one I know of that I have ever gotten to fully
function.  The ZYNX would kind of work at half-duplex for a while,
and an ancient Adaptec I tried locks the whole computer on insmod
of its driver (IRQ routing issues, someone guessed...)  There are
several 2-port EEPRO based NICs out there that work really well
too, but they are expensive...

Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 11:47                         ` jamal
@ 2001-10-04 15:56                           ` Ben Greear
  2001-10-04 18:23                             ` jamal
  0 siblings, 1 reply; 151+ messages in thread
From: Ben Greear @ 2001-10-04 15:56 UTC (permalink / raw)
  To: jamal
  Cc: Simon Kirby, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Alan Cox

jamal wrote:
> 
> On Wed, 3 Oct 2001, Ben Greear wrote:
> 
> > The tulip driver only started working for my DLINK 4-port NIC after
> > about 2.4.8, and last I checked the ZYNX 4-port still refuses to work,
> > so I wouldn't consider it a paradigm of stability and grace quite yet.
> 
> The tests in www.cyberus.ca/~hadi/247-res/ were done with 4-port znyx
> cards using 2.4.7.
> What kind of problems are you having? Maybe i can help.

Mostly problems with auto-negotiation it seems.  Earlier 2.4 kernels
just would never go 100bt/FD.  Later (broken) versions would claim to
be 100bt/FD, but they still showed lots of collisions and frame errors.

I'll try the ZYNX on the latest kernel in the next few days and let you
know what I find...

> My point is that the API exists. Driver owners could use it; this
> discussion seems to have at least helped to point out the existence of the
> API. Alexey has had the hardware flow control in there since 2.1.x -- use
> that at least. In my opinion, Ingo's patch is too radical to be allowed
> in when we are approaching stability. And it is a lazy way of solving the
> problem

The API has been there since 2.1.x, and yet few drivers support it?  I
can see why Ingo decided to fix the problem generically.  I think it would
be great if his code printed a log message upon triggering that basically said:
"You should get yourself a NAPI enabled driver that does flow-control if
possible."  That may give the appropriate visibility to the issue and let
the driver writers improve their drivers accordingly...

Ben

> 
> cheers,
> jamal
> 

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04 11:54                         ` jamal
@ 2001-10-04 15:03                           ` Tim Hockin
  2001-10-04 18:55                           ` Ion Badulescu
  1 sibling, 0 replies; 151+ messages in thread
From: Tim Hockin @ 2001-10-04 15:03 UTC (permalink / raw)
  To: jamal
  Cc: Simon Kirby, Ben Greear, Ingo Molnar, linux-kernel,
	Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise, netdev,
	Alan Cox

> This has nothing to do with specific hardware, although I see your point.
> Send me an eepro and I'll at least add hardware flow control for you.
> The API is simple; it's up to the driver maintainers to use it. This
> discussion is good for making people aware of those drivers.


Is there a place where this is explained?  I'd be happy to make the drivers
I work on support this.  It's like ethtool -- easy to do, but no one has
done it because they didn't know.


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  0:44                     ` jamal
  2001-10-04  6:35                       ` Ingo Molnar
@ 2001-10-04 13:05                       ` Robert Olsson
  1 sibling, 0 replies; 151+ messages in thread
From: Robert Olsson @ 2001-10-04 13:05 UTC (permalink / raw)
  To: mingo
  Cc: jamal, Alexey Kuznetsov, linux-kernel, Robert Olsson, bcrl,
	netdev, Linus Torvalds, Alan Cox


Ingo Molnar writes:
 > 
 > i'm asking the following thing. dev->quota, as i read the patch now, can
 > cause extra calls to ->poll() even though the RX ring of that particular
 > device is empty and the driver has indicated it's done processing RX
 > packets. (i'm now assuming that the extra-polling-for-a-jiffy line in the
 > current patch is removed - that one is a showstopper to begin with.) Is
 > this claim of mine correct?

 Hello!

 Well, I'm the one to blame... :-) This comes from my experiments with
 delaying in polling before going back into RX-irq-enable mode. This is one
 of the areas to be addressed further with NAPI. And this code was not in
 any of the files that I announced, I think..?

 Cheers.

						--ro

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  8:45                       ` Simon Kirby
@ 2001-10-04 11:54                         ` jamal
  2001-10-04 15:03                           ` Tim Hockin
  2001-10-04 18:55                           ` Ion Badulescu
  0 siblings, 2 replies; 151+ messages in thread
From: jamal @ 2001-10-04 11:54 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Ben Greear, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Alan Cox



On Thu, 4 Oct 2001, Simon Kirby wrote:

> On Wed, Oct 03, 2001 at 09:04:22PM -0400, jamal wrote:
>
> > I think you can save yourself a lot of pain today by going to a "better
> > driver"/hardware. Switch to a tulip based board; in particular one which
> > is based on the 21143 chipset. Compile in hardware traffic control and
> > save yourself some pain.
>
> Or an Acenic-based card, but that's more expensive.
>
> The problem we had with Tulip-based cards is that it's hard to find a
> good model (variant) that is supported with different kernel versions and
> stock drivers, doesn't change internally with time, and is easily
> distinguishable by our hardware suppliers.  "Intel EtherExpress PRO100+"
> is difficult to get wrong, and there are generally less issues with
> driver compatibility because there are many fewer (no) clones, just a few
> different board revisions.  The same goes with 3COM 905/980s, etc.
>
> I'm not saying Tulips aren't better (they probably are, competition is
> good), but eepro100s are quite simple (and have been reliable for our
> servers much more than 3com 905s and other cards have been in the past).
>

It has nothing to do with specific hardware, although i see your point.
Send me an eepro and i'll at least add hardware flow control for you.
The API is simple; it's up to the driver maintainers to use it. This
discussion is at least good for making driver maintainers aware of it.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:52                         ` Ingo Molnar
@ 2001-10-04 11:50                           ` jamal
  0 siblings, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-04 11:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Ben Greear, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox, Simon Kirby



On Thu, 4 Oct 2001, Ingo Molnar wrote:

>
> On Wed, 3 Oct 2001, Ben Greear wrote:
>
> > > so far your approach is that of a shotgun, i.e. "let me fire into
> > > that crowd and i'll hit my target but don't care if i take down a few
> > > more"; regardless of how noble the reasoning is, it's, as Linus
> > > described it, a sledgehammer.
> >
> > Aye, but by shooting this target and getting a few bystanders, you save
> > everyone else...  (And it's only a flesh wound!!)
>
> especially considering that the current code nukes the whole city ;)
>

Ingo, cut down on the bad mushrooms ;->

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:50                       ` Ingo Molnar
@ 2001-10-04 11:49                         ` jamal
  0 siblings, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-04 11:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Simon Kirby, Linus Torvalds, Ben Greear, linux-kernel,
	Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise, netdev,
	Alan Cox



On Thu, 4 Oct 2001, Ingo Molnar wrote:

>
> On Wed, 3 Oct 2001, jamal wrote:
>
> > I think you can save yourself a lot of pain today by going to a
> > "better driver"/hardware. Switch to a tulip based board; [...]
>
> This is not an option in many cases. (eg. where a company standardizes on
> something non-tulip, or due to simple financial/organizational reasons.)
> What you say is the approach i see in the FreeBSD camp frequently: "use
> these [limited set of] wonderful cards and drivers, the rest sucks
> hardware-design-wise and we don't really care about them", an elitist
> attitude i strongly disagree with.
>

It is not elitist. Maybe we can force people to use the API now; it
exists. And hardware flow control does not require special hardware
features. As well, NAPI kills the requirement for mitigation in the future.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:47                       ` Ben Greear
  2001-10-04  7:41                         ` Henning P. Schmiedehausen
@ 2001-10-04 11:47                         ` jamal
  2001-10-04 15:56                           ` Ben Greear
  1 sibling, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-04 11:47 UTC (permalink / raw)
  To: Ben Greear
  Cc: Simon Kirby, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Alan Cox



On Wed, 3 Oct 2001, Ben Greear wrote:

> The tulip driver only started working for my DLINK 4-port NIC after
> about 2.4.8, and last I checked the ZYNX 4-port still refuses to work,
> so I wouldn't consider it a paradigm of stability and grace quite yet.

The tests in www.cyberus.ca/~hadi/247-res/ were done with 4-port znyx
cards using 2.4.7.
What kind of problems are you having? Maybe i can help.

> Regardless of that, it is often impossible to trade NICS (think
> built-in 1U servers), and claiming to only work correctly on certain
> hardware (and potentially lock up hard on other hardware) is a pretty
> sorry state of affairs...

My point is that the API exists. Driver owners could use it; this
discussion seems to have at least helped to point out the existence of
the API. Alexey has had hardware flow control in there since 2.1.x; use
that at least. In my opinion, Ingo's patch is too radical to be allowed
in when we are approaching stability. And it is a lazy way of solving the
problem.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:35                       ` Ingo Molnar
@ 2001-10-04 11:41                         ` jamal
  2001-10-05 16:42                         ` kuznet
  1 sibling, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-04 11:41 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexey Kuznetsov, linux-kernel, Robert Olsson, bcrl, netdev,
	Linus Torvalds, Alan Cox



On Thu, 4 Oct 2001, Ingo Molnar wrote:

> i'm asking the following thing. dev->quota, as i read the patch now, can
> cause extra calls to ->poll() even though the RX ring of that particular
> device is empty and the driver has indicated it's done processing RX
> packets. (i'm now assuming that the extra-polling-for-a-jiffy line in the
> current patch is removed - that one is a showstopper to begin with.) Is
> this claim of mine correct?

There should be no extra calls to ->poll(), and if there are we should
fix them. Take a look at the state machine i posted earlier.
The one-liner is removed.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  8:25 Magnus Redin
@ 2001-10-04 11:39 ` Trever L. Adams
  0 siblings, 0 replies; 151+ messages in thread
From: Trever L. Adams @ 2001-10-04 11:39 UTC (permalink / raw)
  To: Magnus Redin; +Cc: Linux Kernel Mailing List

On Thu, 2001-10-04 at 04:25, Magnus Redin wrote:
> 
> Linus writes:
> > Note that the big question here is WHO CARES?
> 
> Everybody building firewalls, routers, high performance web servers
> and broadband content servers with a Linux kernel.
> Everyody having a 100 Mbit/s external connection. 
> 
> 100 Mbit/s access is not uncommon for broadband access, at least in
> Sweden. There are right now a few hundred thousand twisted pair Cat 5
> and 5E installations into peoples homes with 100 Mbit/s
> equipment. Most of them are right now throttled to 10 Mbit/s to save
> upstream bandwidth but that will change as soon as we get more TV
> channels on the broadband nets. Cat 5E cabling is specified to be able
> to get gigabit into the homes to minimise the risk of the cabling
> becoming worthless in 10 or 20 years.

For businesses in some parts of the country, this is also becoming more
common (though it is usually 10 Mbit/s).  I believe that this will become
more and more common.

I do not agree with Linus's concept that you are foolish to allow people
"untrusted direct access", in so far as it applies to "no one would/will
allow high speed connections to their machines."  Linus, dial-up
connections may not be a thing of the past for years to come, but what
we call high-speed is indeed changing.  Let us not let Linux fall
behind.  (AirSwitch in Utah offers 10 Mbit/s to the home in at least
Utah County.)

As for the technical debate of how to do this load limiting or
performance enhancement... I say do what is best on technical grounds...
not on bad assumptions.  This may mean that the other set of patches
going around may be best, or it may mean Ingo's is best or maybe
something entirely different.  I personally do not know!

Trever Adams


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:28                       ` Ingo Molnar
@ 2001-10-04 11:34                         ` jamal
  2001-10-04 17:40                           ` Andreas Dilger
  0 siblings, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-04 11:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby



On Thu, 4 Oct 2001, Ingo Molnar wrote:

>
> On Wed, 3 Oct 2001, jamal wrote:
>
> > > which in turn stops that device as well sooner or later. Optionally,
> > > in the future, this can be made more finegrained for chipsets that
> > > support device-independent IRQ mitigation features, like the USB 2.0
> > > EHCI feature mentioned by David Brownell.
>
> > I think each subsystem should be in charge of its own fate. USB applies
> > in whatever subsystem it belongs to. Cooperating subsystems do what
> > is best for the system.
>
> this is a claim that is nearly perverse and shows a fundamental
> misunderstanding of how Linux handles error situations. Perhaps we should
> never check NULL pointer dereference in the networking code? Should the
> NMI oopser not debug networking related lockups? Should we never print a
> warning message on a double enable_irq() in a bad networking driver?
>
> *of course* if a chipset supports IRQ mitigation then the generic IRQ code
> can be enabled to use it. We can have networking devices over USB as well.
> USB is a bus protocol that provides access to devices, not just a
> 'subsystem'. And *of course*, the IRQ code is completely right to do
> various sanity checks - as it does today.
>
> Linux has various safety nets in various places - always had. It's always
> the history of problems in a certain area, the seriousness and impact of
> the problem, and the intrusiveness of the safety approach that decides
> whether some safety net is added or not, whether it's put under
> CONFIG_KERNEL_DEBUG or not. While everybody is free to disagree about the
> importance of this particular safety net, just saying 'do not mess with
> *our* interrupts' sounds rather childish. Especially considering that
> tools are available to trigger lockups via broadband access. Especially
> considering that just a few mails earlier you claimed that such lockups do
> not even exist. To quote that paragraph of yours:
>

Your scheme is definitely a safety net, no doubt. But it is incomplete.
Whatever subsystem/softirq/process is in charge of the USB devices is the
next level of delegation. And of course the driver knows best what is good
for the goose. We delegate at each level of the hierarchy, and the
ultimate authority is your code, when it is done right.
But i think we are deviating. We started this with network drivers, which
is where the real proven issue is.

> # Date: Wed, 3 Oct 2001 08:49:51 -0400 (EDT)
> # From: jamal <hadi@cyberus.ca>
>
> [...]
> # You don't need the patch for 2.4 to work against any lockups. And
> # in fact i am surprised that you observe _any_ lockups on a PIII which
> # are not observed on my PII. Linux as is, without any tuneups, can
> # handle up to about 40000 packets/sec input before you start observing
> # user space starvations. This is about 30Mbps at 64 byte packets; it's
> # about 60Mbps at 128 byte packets and comfortable at 100Mbps with a byte
> # size of 256. We really don't have a problem at 100Mbps.
>
> so you should never see any lockups.
>

I meant for your P3, since i see none on my P2 at 100Mbps with 256-byte
packets. But you probably meant user-space starvation at maybe twice
that rate, to which i agree and apologize for the misunderstanding.

> > Your patch with Linus' idea of "flag mask" would be more acceptable as
> > a last resort. All subsystems should be cooperative and we resort to
> > this to send misbehaving kids to their room.
>
> i have nothing against it in 2.5, of course. Until then => my patch adds
> an irq.c daddy that sends the bad kids to their room.

Until then, change the eepro to use at least hardware flow control.
If it has mitigation, use the return codes from netif_rx(). Let's see if
that doesn't help you; yes, it's a pain, but it avoids a lot of unknowns
which your patch introduces.
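
As a sketch (nic_set_mitigation() is a made-up per-driver helper; the
NET_RX_* codes are the congestion feedback that comes with the flow
control work):

    /*
     * Feed the netif_rx() congestion feedback back into the driver's
     * own irq mitigation setting.
     */
    static void nic_rx_packet(struct net_device *dev, struct sk_buff *skb)
    {
            switch (netif_rx(skb)) {
            case NET_RX_SUCCESS:
            case NET_RX_CN_LOW:
                    nic_set_mitigation(dev, 0);  /* stack keeps up */
                    break;
            case NET_RX_CN_MOD:
            case NET_RX_CN_HIGH:
                    nic_set_mitigation(dev, 1);  /* back off the irq rate */
                    break;
            case NET_RX_DROP:
                    nic_set_mitigation(dev, 1);  /* backlog full: throttle hard */
                    break;
            }
    }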

> > > Your NAPI patch, or any driver/subsystem that does flowcontrol accurately
> > > should never be affected by it in any way. No overhead, no performance
> > > hit.
> >
> > so far your approach is that of a shotgun [...]
>
> i'm not sure what this has to do with your NAPI patch. You should never
> see the code trigger. It's an unused sledgehammer (or shotgun) put into
> the garage, as far as NAPI is concerned. And besides, there are lots of
> people on your continent that believe in spare shotguns ;)
>
> i'd rather compare this approach to an airbag, or perhaps shackles.
> Interrupt auto-limiting, despite your absurd and misleading analogy, does
> not 'destroy' or 'kill' anything. It merely limits an IRQ source for up to
> 10 msecs (if HZ is 1000 then it's only 1 msec), if that IRQ source has
> been detected to be critically misbehaving.

Well, i meant two things:
1) you shut down shared interrupts; take a look at this posting by Marcus
Sundberg <marcus@cendio.se>

---------------

  0:    7602983          XT-PIC  timer
  1:      10575          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
 11:    1626004          XT-PIC  Toshiba America Info Systems ToPIC95 PCI to Cardbus Bridge with ZV Support, Toshiba America Info Systems ToPIC95 PCI to Cardbus Bridge with ZV Support (#2), usb-uhci, eth0, BreezeCom Card, Intel 440MX, irda0
 12:       1342          XT-PIC  PS/2 Mouse
 14:      23605          XT-PIC  ide0

-----------------------------

Now you go and shut down IRQ 11 and punish all devices there. If you can
avoid that, it is acceptable as a temporary replacement to be upgraded to
a better scheme.

2) By not being granular enough and shutting down sources of noise, you
are actually not being effective in increasing system utilization. We've
beaten this to death.

cheers,
jamal



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  9:49       ` BALBIR SINGH
@ 2001-10-04 10:25         ` Ingo Molnar
  2001-10-07 20:37           ` Andrea Arcangeli
  0 siblings, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-04 10:25 UTC (permalink / raw)
  To: BALBIR SINGH; +Cc: Linus Torvalds, linux-kernel


On Thu, 4 Oct 2001, BALBIR SINGH wrote:

> Shouldn't the interrupt mitigation be on a per CPU basis? [...]

this was done by an earlier version of the patch, but it's wrong. An IRQ
cannot arrive to multiple CPUs at once (well, normal device interrupts at
least) - it will arrive either to some random CPU, or can be bound via
/proc/irq/N/smp_affinity. (there are architectures that do
soft-distribution of interrupts, but that can be considered pseudo-random)

But in both cases, it's the actual, per-irq IRQ load that matters. If one
CPU is hogged by IRQ handlers that is not an issue - other CPUs can still
take over the work. If *all* CPUs are hogged then the patch detects the
overload.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  9:22     ` Ingo Molnar
@ 2001-10-04  9:49       ` BALBIR SINGH
  2001-10-04 10:25         ` Ingo Molnar
  0 siblings, 1 reply; 151+ messages in thread
From: BALBIR SINGH @ 2001-10-04  9:49 UTC (permalink / raw)
  To: mingo; +Cc: Linus Torvalds, linux-kernel

Sorry if I missed something in the patch, but here is a question.

Shouldn't the interrupt mitigation be on a per CPU basis?
What I mean is that if a particular CPU is hogged due to some
interrupt, that interrupt should be mitigated on that particular CPU
and not on all CPUs in the system. So, unless an interrupt ends up
taking a lot of time on all CPUs it should still have a chance to
do something.

This could probably help in distributing the interrupts more evenly and
fairly on an SMP system or vice-versa.

Balbir



Ingo Molnar wrote:

>On Thu, 4 Oct 2001, BALBIR SINGH wrote:
>
>>Ingo, is it possible to provide an interface (optional interface) to
>>drivers, so that they can decide how many interrupts are too many?
>>
>
>well, it existed, and i can add it back - i dont have any strong feelings
>either.
>
>>Drivers who feel that they should go in for interrupt mitigation have
>>the option of deciding to go for it.
>>
>
>in those cases the 'irq overload' code should not trigger. It's not the
>rate of interrupts that matters, it's the amount of time we spend in irq
>contexts. The code counts the number of times we 'interrupt an interrupt
>context'. Interrupting an irq-context is a sign of irq overload. The code
>goes into 'overload mode' (and disables that particular interrupt source
>for the rest of the current timer tick) only if more than 97% of all
>interrupts from that source 'interrupt an irq context'. (ie. irq load is
>really high.) As any statistical method it has some inaccuracy, but
>'statistically' it gets things right.
>
>	Ingo
>
>






^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  9:19   ` BALBIR SINGH
@ 2001-10-04  9:22     ` Ingo Molnar
  2001-10-04  9:49       ` BALBIR SINGH
  0 siblings, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-04  9:22 UTC (permalink / raw)
  To: BALBIR SINGH; +Cc: Linus Torvalds, linux-kernel


On Thu, 4 Oct 2001, BALBIR SINGH wrote:

> Ingo, is it possible to provide an interface (optional interface) to
> drivers, so that they can decide how many interrupts are too many?

well, it existed, and i can add it back - i dont have any strong feelings
either.

> Drivers who feel that they should go in for interrupt mitigation have
> the option of deciding to go for it.

in those cases the 'irq overload' code should not trigger. It's not the
rate of interrupts that matters, it's the amount of time we spend in irq
contexts. The code counts the number of times we 'interrupt an interrupt
context'. Interrupting an irq-context is a sign of irq overload. The code
goes into 'overload mode' (and disables that particular interrupt source
for the rest of the current timer tick) only if more than 97% of all
interrupts from that source 'interrupt an irq context'. (ie. irq load is
really high.) As any statistical method it has some inaccuracy, but
'statistically' it gets things right.
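
In sketch form (the two counters here are illustrative bookkeeping, not
the literal patch; IRQ_MITIGATED and __disable_irq() are the patch's own):

    #define OVERLOAD_PERCENT 97

    /* accounting done on each interrupt from this source */
    static void note_irq(irq_desc_t *desc, int interrupted_irq_context)
    {
            desc->total_count++;
            if (interrupted_irq_context)
                    desc->nested_count++;
    }

    /* evaluated within the current timer tick */
    static void check_overload(irq_desc_t *desc, int irq)
    {
            if (desc->total_count &&
                desc->nested_count * 100 >
                        desc->total_count * OVERLOAD_PERCENT) {
                    /*
                     * more than 97% of this source's interrupts arrived
                     * while we were already in irq context: disable the
                     * source for the rest of the tick.
                     */
                    desc->status |= IRQ_MITIGATED | IRQ_PENDING;
                    __disable_irq(desc, irq);
            }
            desc->total_count = desc->nested_count = 0;
    }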

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 18:23 ` Ingo Molnar
@ 2001-10-04  9:19   ` BALBIR SINGH
  2001-10-04  9:22     ` Ingo Molnar
  0 siblings, 1 reply; 151+ messages in thread
From: BALBIR SINGH @ 2001-10-04  9:19 UTC (permalink / raw)
  To: mingo; +Cc: Linus Torvalds, linux-kernel

Ingo, is it possible to provide an interface (optional interface) to drivers,
so that they can decide how many interrupts are too many? Drivers who feel
that they should go in for interrupt mitigation have the option of deciding
to go for it.

Of course, you could also have a ceiling on the maximum number of interrupts,
but the ceiling should be user configurable (using sysctl or /proc); this
would enable administrators to configure their systems depending on what kind
of devices (with shared interrupts or not) they have.

Just my 2cents,
Balbir


Ingo Molnar wrote:

>On Wed, 3 Oct 2001, Linus Torvalds wrote:
>
>>Now test it again with the disk interrupt being shared with the
>>network card.
>>
>>Doesn't happen? It sure does. [...]
>>
>
>yes, disk IRQs might be delayed in that case. Without this mechanizm there
>is a lockup.
>
>>Which is why I like the NAPI approach.  If somebody overloads my
>>network card, my USB camera doesn't stop working.
>>
>
>i agree that NAPI is a better approach. And IRQ overload does not happen
>on cards that have hardware-based irq mitigation support already. (and i
>should note that those cards will likely perform even faster with NAPI.)
>
>>I don't disagree with your patch as a last resort when all else fails,
>>but I _do_ disagree with it as a network load limiter.
>>
>
>okay - i removed those parts already (kpolld) in today's patch. (It
>initially was an experiment to prove that this is the only problem we are
>facing under such loads.)
>
>	Ingo
>
>






^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  1:04                     ` jamal
  2001-10-04  6:47                       ` Ben Greear
  2001-10-04  6:50                       ` Ingo Molnar
@ 2001-10-04  8:45                       ` Simon Kirby
  2001-10-04 11:54                         ` jamal
  2 siblings, 1 reply; 151+ messages in thread
From: Simon Kirby @ 2001-10-04  8:45 UTC (permalink / raw)
  To: jamal
  Cc: Ben Greear, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Alan Cox

On Wed, Oct 03, 2001 at 09:04:22PM -0400, jamal wrote:

> I think you can save yourself a lot of pain today by going to a "better
> driver"/hardware. Switch to a tulip based board; in particular one which
> is based on the 21143 chipset. Compile in hardware traffic control and
> save yourself some pain.

Or an Acenic-based card, but that's more expensive.

The problem we had with Tulip-based cards is that it's hard to find a
good model (variant) that is supported with different kernel versions and
stock drivers, doesn't change internally with time, and is easily
distinguishable by our hardware suppliers.  "Intel EtherExpress PRO100+"
is difficult to get wrong, and there are generally fewer issues with
driver compatibility because there are many fewer (no) clones, just a few
different board revisions.  The same goes with 3COM 905/980s, etc.

I'm not saying Tulips aren't better (they probably are, competition is
good), but eepro100s are quite simple (and have been reliable for our
servers much more than 3com 905s and other cards have been in the past).

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
@ 2001-10-04  8:25 Magnus Redin
  2001-10-04 11:39 ` Trever L. Adams
  0 siblings, 1 reply; 151+ messages in thread
From: Magnus Redin @ 2001-10-04  8:25 UTC (permalink / raw)
  To: linux-kernel


Linus writes:
> Note that the big question here is WHO CARES?

Everybody building firewalls, routers, high performance web servers
and broadband content servers with a Linux kernel.
Everybody with a 100 Mbit/s external connection.

100 Mbit/s access is not uncommon for broadband access, at least in
Sweden. There are right now a few hundred thousand twisted pair Cat 5
and 5E installations into people's homes with 100 Mbit/s
equipment. Most of them are right now throttled to 10 Mbit/s to save
upstream bandwidth but that will change as soon as we get more TV
channels on the broadband nets. Cat 5E cabling is specified to be able
to get gigabit into the homes to minimise the risk of the cabling
becoming worthless in 10 or 20 years.

A 100 Mbit/s untrusted connection is a reality for quite a few people,
and it's not unreasonable for Linux users when it costs $20-$30 per
month. The peering connection will probably be too weak at that price,
but you still get thousands of untrusted neighbours with a full
100 Mbit/s to your computer.

Btw, I work in production and customer support at a company building
Linux-based firewalls. I am unfortunately not a developer, but it is
great fun to read the kernel mailing list and watch misfeatures and
bugs being discovered, discussed and eradicated. Who needs to watch
football when there is the Linux VM battle of wits and engineering?

Best regards,
---
Magnus Redin  <redin@ingate.com>   Ingate - Firewall with SIP & NAT
Ingate System AB  +46 13 214600    http://www.ingate.com/


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:47                       ` Ben Greear
@ 2001-10-04  7:41                         ` Henning P. Schmiedehausen
  2001-10-04 16:09                           ` Ben Greear
  2001-10-04 18:30                           ` Christopher E. Brown
  2001-10-04 11:47                         ` jamal
  1 sibling, 2 replies; 151+ messages in thread
From: Henning P. Schmiedehausen @ 2001-10-04  7:41 UTC (permalink / raw)
  To: linux-kernel

Ben Greear <greearb@candelatech.com> writes:

>jamal wrote:
>> 
>> I think you can save yourself a lot of pain today by going to a "better
>> driver"/hardware. Switch to a tulip based board; in particular one which
>> is based on the 21143 chipset. Compile in hardware traffic control and
>> save yourself some pain.

>The tulip driver only started working for my DLINK 4-port NIC
>after about 2.4.8, and last I checked the ZYNX 4-port still refuses
>to work, so I wouldn't consider it a paradigm of
>stability and grace quite yet.  Regardless of that, it is often
>impossible to trade NICS (think built-in 1U servers), and claiming
>to only work correctly on certain hardware (and potentially lock up
>hard on other hardware) is a pretty sorry state of affairs...

Does it finally do speed and duplex auto negotiation with Cisco
Catalyst Switches? Something I never ever got to work with various 2.0
and 2.2 drivers, mode settings, Catalyst settings, IOS versions and
almost anything else that I ever tried. 

	Regards
		Henning
-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen       -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH     hps@intermeta.de

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   info@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20   

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:55                         ` Jeff Garzik
@ 2001-10-04  6:56                           ` Ingo Molnar
  0 siblings, 0 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-04  6:56 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Ben Greear, jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox, Simon Kirby


On Thu, 4 Oct 2001, Jeff Garzik wrote:

> On Wed, 3 Oct 2001, Ben Greear wrote:
> > That requires re-writing all the drivers, right?
>
> NAPI? [...]

Ben is talking about the long-planned "irq_action->handler() returns a
code that indicates progress" approach Linus talked about. *that* requires
changing every driver, since every IRQ handler prototype that is
'void' now needs to be changed to return 'int'. (the change is trivial,
but intrusive.)
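
As a sketch (the helpers are made up; only the prototype change matters):

    /* before: handlers return nothing, so the generic irq code cannot
     * tell whether the device really had work pending */
    void my_interrupt(int irq, void *dev_id, struct pt_regs *regs);

    /* after: handlers report progress, so do_IRQ() can spot a screaming
     * source on a shared line.  device_has_work() and service_device()
     * are hypothetical names. */
    int my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
            if (!device_has_work(dev_id))
                    return 0;       /* not ours / no progress */
            service_device(dev_id);
            return 1;               /* did real work */
    }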

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:50                       ` Ben Greear
  2001-10-04  6:52                         ` Ingo Molnar
@ 2001-10-04  6:55                         ` Jeff Garzik
  2001-10-04  6:56                           ` Ingo Molnar
  1 sibling, 1 reply; 151+ messages in thread
From: Jeff Garzik @ 2001-10-04  6:55 UTC (permalink / raw)
  To: Ben Greear
  Cc: jamal, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Linus Torvalds,
	Alan Cox, Simon Kirby

On Wed, 3 Oct 2001, Ben Greear wrote:
> That requires re-writing all the drivers, right?

NAPI?  No.  Mainly you move some existing code into a separate function.

	Jeff




^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  6:50                       ` Ben Greear
@ 2001-10-04  6:52                         ` Ingo Molnar
  2001-10-04 11:50                           ` jamal
  2001-10-04  6:55                         ` Jeff Garzik
  1 sibling, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-04  6:52 UTC (permalink / raw)
  To: Ben Greear
  Cc: jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox, Simon Kirby


On Wed, 3 Oct 2001, Ben Greear wrote:

> > so far your approach is that of a shotgun, i.e. "let me fire into
> > that crowd and i'll hit my target but don't care if i take down a few
> > more"; regardless of how noble the reasoning is, it's, as Linus
> > described it, a sledgehammer.
>
> Aye, but by shooting this target and getting a few bystanders, you save
> everyone else...  (And it's only a flesh wound!!)

especially considering that the current code nukes the whole city ;)

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  1:04                     ` jamal
  2001-10-04  6:47                       ` Ben Greear
@ 2001-10-04  6:50                       ` Ingo Molnar
  2001-10-04 11:49                         ` jamal
  2001-10-04  8:45                       ` Simon Kirby
  2 siblings, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-04  6:50 UTC (permalink / raw)
  To: jamal
  Cc: Simon Kirby, Linus Torvalds, Ben Greear, linux-kernel,
	Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise, netdev,
	Alan Cox


On Wed, 3 Oct 2001, jamal wrote:

> I think you can save yourself a lot of pain today by going to a
> "better driver"/hardware. Switch to a tulip based board; [...]

This is not an option in many cases. (eg. where a company standardizes on
something non-tulip, or due to simple financial/organizational reasons.)
What you say is the approach i see in the FreeBSD camp frequently: "use
these [limited set of] wonderful cards and drivers, the rest sucks
hardware-design-wise and we don't really care about them", an elitist
attitude i strongly disagree with.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  0:53                     ` jamal
  2001-10-04  6:28                       ` Ingo Molnar
@ 2001-10-04  6:50                       ` Ben Greear
  2001-10-04  6:52                         ` Ingo Molnar
  2001-10-04  6:55                         ` Jeff Garzik
  1 sibling, 2 replies; 151+ messages in thread
From: Ben Greear @ 2001-10-04  6:50 UTC (permalink / raw)
  To: jamal
  Cc: Ingo Molnar, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox, Simon Kirby

jamal wrote:

> Your patch with Linus' idea of "flag mask" would be more acceptable as a
> last resort. All subsystems should be cooperative and we resort to this to
> send misbehaving kids to their room.

That requires re-writing all the drivers, right?  Seems a very bad
thing to do in 2.4.

> 
> > Your NAPI patch, or any driver/subsystem that does flowcontrol accurately
> > should never be affected by it in any way. No overhead, no performance
> > hit.
> 
> so far your approach is that of a shotgun, i.e. "let me fire into
> that crowd and i'll hit my target but don't care if i take down a few
> more"; regardless of how noble the reasoning is, it's, as Linus
> described it, a sledgehammer.

Aye, but by shooting this target and getting a few bystanders, you save
everyone else...  (And it's only a flesh wound!!)

Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  1:04                     ` jamal
@ 2001-10-04  6:47                       ` Ben Greear
  2001-10-04  7:41                         ` Henning P. Schmiedehausen
  2001-10-04 11:47                         ` jamal
  2001-10-04  6:50                       ` Ingo Molnar
  2001-10-04  8:45                       ` Simon Kirby
  2 siblings, 2 replies; 151+ messages in thread
From: Ben Greear @ 2001-10-04  6:47 UTC (permalink / raw)
  To: jamal
  Cc: Simon Kirby, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Alan Cox

jamal wrote:
> 
> On Wed, 3 Oct 2001, Simon Kirby wrote:
> 
> > On Wed, Oct 03, 2001 at 09:33:12AM -0700, Linus Torvalds wrote:
> >
> > Actually, the way I first started looking at this problem is the result
> > of a few attacks that have happened on our network.  It's not just a
> > while(1) sendto(); UDP spamming program that triggers it -- TCP SYN
> > floods show the problem as well, and _there is no way_ to protect against
> > this without using syncookies or some similar method that can only be
> > done on the receiving TCP stack only.
> >
> > At one point, one of our webservers received 30-40Mbit/sec of SYN packets
> > sustained for almost 24 hours.  Needless to say, the machine was not
> > happy.
> >
> 
> I think you can save yourself a lot of pain today by going to a "better
> driver"/hardware. Switch to a tulip based board; in particular one which
> is based on the 21143 chipset. Compile in hardware traffic control and
> save yourself some pain.

The tulip driver only started working for my DLINK 4-port NIC
after about 2.4.8, and last I checked the ZYNX 4-port still refuses
to work, so I wouldn't consider it a paradigm of
stability and grace quite yet.  Regardless of that, it is often
impossible to trade NICS (think built-in 1U servers), and claiming
to only work correctly on certain hardware (and potentially lock up
hard on other hardware) is a pretty sorry state of affairs...

Ben

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  0:44                     ` jamal
@ 2001-10-04  6:35                       ` Ingo Molnar
  2001-10-04 11:41                         ` jamal
  2001-10-05 16:42                         ` kuznet
  2001-10-04 13:05                       ` Robert Olsson
  1 sibling, 2 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-04  6:35 UTC (permalink / raw)
  To: jamal
  Cc: Alexey Kuznetsov, linux-kernel, Robert Olsson, bcrl, netdev,
	Linus Torvalds, Alan Cox


On Wed, 3 Oct 2001, jamal wrote:

> > i'm worried by the dev->quota variable a bit. As visible now in the
> > 2.4.10-poll.pat and tulip-NAPI-010910.tar.gz code, it keeps calling the
> > ->poll() function until dev->quota is gone. I think it should only keep
> > calling the function until the rx ring is fully processed - and it should
> > re-enable the receiver afterwards, when exiting net_rx_action.
>
> This would result in an unfairness. Think of one device which receives
> packets so fast that it takes most of the CPU capacity just to
> process them.

no, i asked something else.

i'm asking the following thing. dev->quota, as i read the patch now, can
cause extra calls to ->poll() even though the RX ring of that particular
device is empty and the driver has indicated it's done processing RX
packets. (i'm now assuming that the extra-polling-for-a-jiffy line in the
current patch is removed - that one is a showstopper to begin with.) Is
this claim of mine correct?
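
In sketch form, the loop shape i'm asking about is roughly this
(simplified; poll_list and the helper names are illustrative, not the
literal patch):

    /*
     * Rough sketch of a quota-driven rx softirq loop; names such as
     * next_polled_device() and reenable_rx_irq() are made up.
     */
    static void net_rx_action_sketch(void)
    {
            while (!list_empty(&poll_list)) {
                    struct net_device *dev = next_polled_device(&poll_list);

                    /*
                     * ->poll() returns nonzero while rx work remains.  The
                     * question: with leftover dev->quota, can this loop
                     * re-enter ->poll() after the driver already reported
                     * an empty rx ring?
                     */
                    if (!dev->poll(dev) || dev->quota <= 0) {
                            remove_from_poll_list(dev);
                            reenable_rx_irq(dev);   /* back to irq mode */
                    }
            }
    }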

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  0:53                     ` jamal
@ 2001-10-04  6:28                       ` Ingo Molnar
  2001-10-04 11:34                         ` jamal
  2001-10-04  6:50                       ` Ben Greear
  1 sibling, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-04  6:28 UTC (permalink / raw)
  To: jamal
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby


On Wed, 3 Oct 2001, jamal wrote:

> > which in turn stops that device as well sooner or later. Optionally,
> > in the future, this can be made more finegrained for chipsets that
> > support device-independent IRQ mitigation features, like the USB 2.0
> > EHCI feature mentioned by David Brownell.

> I think each subsystem should be in charge of its own fate. USB applies
> in whatever subsystem it belongs to. Cooperating subsystems do what
> is best for the system.

this is a claim that is nearly perverse and shows a fundamental
misunderstanding of how Linux handles error situations. Perhaps we should
never check NULL pointer dereference in the networking code? Should the
NMI oopser not debug networking related lockups? Should we never print a
warning message on a double enable_irq() in a bad networking driver?

*of course* if a chipset supports IRQ mitigation then the generic IRQ code
can be enabled to use it. We can have networking devices over USB as well.
USB is a bus protocol that provides access to devices, not just a
'subsystem'. And *of course*, the IRQ code is completely right to do
various sanity checks - as it does today.

Linux has various safety nets in various places - always had. It's always
the history of problems in a certain area, the seriousness and impact of
the problem, and the intrusiveness of the safety approach that decides
whether some safety net is added or not, whether it's put under
CONFIG_KERNEL_DEBUG or not. While everybody is free to disagree about the
importance of this particular safety net, just saying 'do not mess with
*our* interrupts' sounds rather childish. Especially considering that
tools are available to trigger lockups via broadband access. Especially
considering that just a few mails earlier you claimed that such lockups do
not even exist. To quote that paragraph of yours:

# Date: Wed, 3 Oct 2001 08:49:51 -0400 (EDT)
# From: jamal <hadi@cyberus.ca>

[...]
# You don't need the patch for 2.4 to work against any lockups. And
# in fact i am surprised that you observe _any_ lockups on a PIII which
# are not observed on my PII. Linux as is, without any tuneups, can
# handle up to about 40000 packets/sec input before you start observing
# user space starvations. This is about 30Mbps at 64 byte packets; it's
# about 60Mbps at 128 byte packets and comfortable at 100Mbps with a byte
# size of 256. We really don't have a problem at 100Mbps.

so you should never see any lockups.

> Your patch with Linus' idea of "flag mask" would be more acceptable as
> a last resort. All subsystems should be cooperative and we resort to
> this to send misbehaving kids to their room.

i have nothing against it in 2.5, of course. Until then => my patch adds
an irq.c daddy that sends the bad kids to their room.

> > Your NAPI patch, or any driver/subsystem that does flowcontrol accurately
> > should never be affected by it in any way. No overhead, no performance
> > hit.
>
> so far your approach is that of a shotgun [...]

i'm not sure what this has to do with your NAPI patch. You should never
see the code trigger. It's an unused sledgehammer (or shotgun) put into
the garage, as far as NAPI is concerned. And besides, there are lots of
people on your continent that believe in spare shotguns ;)

i'd rather compare this approach to an airbag, or perhaps shackles.
Interrupt auto-limiting, despite your absurd and misleading analogy, does
not 'destroy' or 'kill' anything. It merely limits an IRQ source for up to
10 msecs (if HZ is 1000 then it's only 1 msec), if that IRQ source has
been detected to be critically misbehaving.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:33                 ` Linus Torvalds
  2001-10-03 17:25                   ` Ingo Molnar
  2001-10-03 20:02                   ` Simon Kirby
@ 2001-10-04  4:12                   ` bill davidsen
  2001-10-04 18:16                   ` Alan Cox
  3 siblings, 0 replies; 151+ messages in thread
From: bill davidsen @ 2001-10-04  4:12 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.33.0110030920500.9427-100000@penguin.transmeta.com> 
    torvalds@transmeta.com wrote:
>Note that the big question here is WHO CARES?
>
>There are two issues, and they are independent:
> (a) handling of network packet flooding nicely
> (b) handling screaming devices nicely.
>
>First off, some comments:
> (a) is not a major security issue. If you allow untrusted users full
>     100/1000Mbps access to your internal network, you have _other_
>     security issues, like packet sniffing etc that are much much MUCH
>     worse. So the packet flooding thing is very much a corner case, and
>     claiming that we have a big problem is silly.

  Did something give you the idea that this only happens on internal
networks? Generally we have untrusted users on external networks, and
lots of them. I have seen problems on heavily loaded DNS and news
servers, and can easily imagine that routers would get it as well. It
doesn't take someone running a load generator to generate load! I have a
syslog server which gets packets from the cluster, and the irq rate on
that gets high enough to worry me, although that tends to be spike load.

|      HOWEVER, (a) _can_ be a performance issue under benchmark load.
|      Benchmarks (unlike real life) are almost always set up to have full
|      network bandwidth access, and can show this issue.

| Ingo tries to fix both of these with a sledgehammer. I'd rather use a bit
| more finesse, and as I do not actually agree with the people who seem to
| think that this is a major problem TODAY, I'll be more than happy to have
| people think about it. The NAPI people have thought about it - but it has
| obviously not been discussed _nearly_ widely enough.

  It is a problem which happens today, on production servers in use
today, and is currently solved by using more servers than would be
needed if the system didn't fall over under this type of load.

| I personally am very nervous about Ingo's approach. I do not believe that
| it will work well over a wide range of machines, and I suspect that the
| "tunables" have been tuned for one load and one machine. I would not be
| surprised if Ingo finds that trying to put the machine under heavy disk
| load with multiple disk controllers might also cause interrupt mitigation,
| which would be unacceptably BAD.

  I will agree that some care is going to be needed to avoid choking the
system, but honestly I doubt that there will be a rush of people going
out and bothering with the feature unless they need it. There is some
rate limiting stuff in iptables, and I would bet a six pack of good beer
that very few people bother to use it at all unless they are having a
problem. I don't recall any posts saying "I shot myself in the foot with
packet rate limiting."

  As I understand the patch, it applies to individual irqs and not to the
system as a whole. I admit I read the description and not the source.
But even with multiple SCSI controllers, I can't imagine hitting 20k
irq/sec, which you can with a few NICs. I am amazed that Linux can
function at 70k context switches/sec, but it sure doesn't function well!

  I think the potential for harm is pretty small, and generally when you
have the problem you run vmstat (or vmstat2) to see what's happening,
and if the system melts just after irq rate hits N, you might start with
80% of N as a first guess. The performance of a locked-up system is
worse than one dropping packets.

  The full fix you want is probably a good thing for 2.5, I think it's
just too radical to drop into a stable series (my opinion only).

-- 
bill davidsen <davidsen@tmr.com>
 "If I were a diplomat, in the best case I'd go hungry.  In the worst
  case, people would die."
		-- Robert Lipe

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03  8:38         ` Ingo Molnar
@ 2001-10-04  3:50           ` bill davidsen
  0 siblings, 0 replies; 151+ messages in thread
From: bill davidsen @ 2001-10-04  3:50 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.33.0110022256430.2543-100000@localhost.localdomain> 
    mingo@elte.hu wrote:

>there are *tons* of disadvantages if IRQs are shared. In any
>high-performance environment, not having enough interrupt sources is a
>sizing or hw design mistake. You can have up to 200 interrupts even on a
>PC, using multiple IO-APICs. Any decent server board distributes interrupt
>sources properly. Shared interrupts are a legacy of the PC design, and we
>are moving away from it slowly but surely. Especially under gigabit loads
>there are several PCI busses anyway, so getting non-shared interrupts is
>not only easy but a necessity as well. There is no law in physics that
>somehow mandates or prefers the sharing of interrupt vectors: devices are
>distinct, they use up distinct slots in the board. The PCI bus can get
>multiple IRQ sources out of a single card, so even multi-controller cards
>are covered.

  Sharing irq between unrelated devices is probably evil in all cases,
but for identical devices like multiple NICs, the shared irq results in
*one* irq call, followed by polling the devices connected, which can be
lower overhead than servicing N interrupts on a multi-NIC system.
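
  Roughly, the 2.4-style dispatch for a shared line looks like this
(slightly simplified from the kernel's action-list walk): one interrupt
invokes each registered handler in turn, and each driver cheaply polls
its own device's status.

    static void handle_shared_irq(int irq, struct irqaction *action,
                                  struct pt_regs *regs)
    {
            do {
                    /* each handler checks its own device's status */
                    action->handler(irq, action->dev_id, regs);
                    action = action->next;
            } while (action);
    }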

  Shared interrupts predate the PC by a decade (or more), so the comment
about the "PC design" is not relevant. In general, polling multiple
devices costs less CPU than servicing the same i/o via a larger number of
entries to the interrupt handler. The polling offers the possibility of
lowering the number of context switches, which are far more expensive than
checking a device.

  In serial and network devices the poll is often unavoidable; unless
you use one irq for send and one for receive, you will be doing a bit of
polling in any case.

-- 
bill davidsen <davidsen@tmr.com>
 "If I were a diplomat, in the best case I'd go hungry.  In the worst
  case, people would die."
		-- Robert Lipe

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  1:30                       ` Benjamin LaHaise
  2001-10-03 22:31                         ` Rob Landley
@ 2001-10-04  1:39                         ` jamal
  1 sibling, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-04  1:39 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: kuznet, mingo, linux-kernel, Robert.Olsson, netdev, torvalds, alan



On Wed, 3 Oct 2001, Benjamin LaHaise wrote:

> On Wed, Oct 03, 2001 at 09:10:10PM -0400, jamal wrote:
> > > Well, this sounds like a 2.5 patch.  When do we get to merge it?
> >
> >
> > It is backward compatible with the 2.4 netif_rx(), which means it can
> > go in now. The problem is that netdrivers that want to use the
> > interface have to be morphed.
>
> I'm alluding to the fact that we need a place to put in-development patches.
>

Sorry ;-> Yes, where is 2.5 again? ;->

> > As a general disclaimer, i really dont mean to put down Ingo's efforts i
> > just think the irq mitigation idea as is now is wrong for both 2.4 and 2.5
>
> What is your solution to the problem?  Leaving it up to the driver authors
> doesn't work as they're not perfect.  Yes, drivers should attempt to do a
> good job at irq mitigation, but sometimes a safety net is needed.
>

To be honest i am getting a little nervous about what i saw in something
that seems to be a stable kernel. I was nervous when i saw ksoftirq, but
it's already in there. I think we can use the ksoftirq replacement pending
testing to show if latency is improved. I have time this weekend; if that
patch can be isolated it can be tested with NAPI etc.
As for the irq mitigation, in its current form it is insufficient, but it
would be OK to go into 2.5 with plans to then implement the isolation
feature. I would put NAPI into this same category. We can then backport
both to 2.4.
With current 2.4, i say yes, we leave it to the drivers (and in fact claim
we have a sustainable solution if it is conformed to).

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  1:10                     ` jamal
@ 2001-10-04  1:30                       ` Benjamin LaHaise
  2001-10-03 22:31                         ` Rob Landley
  2001-10-04  1:39                         ` jamal
  0 siblings, 2 replies; 151+ messages in thread
From: Benjamin LaHaise @ 2001-10-04  1:30 UTC (permalink / raw)
  To: jamal; +Cc: kuznet, mingo, linux-kernel, Robert.Olsson, netdev, torvalds, alan

On Wed, Oct 03, 2001 at 09:10:10PM -0400, jamal wrote:
> > Well, this sounds like a 2.5 patch.  When do we get to merge it?
> 
> 
> It is backward compatible with the 2.4 netif_rx(), which means it can go
> in now. The problem is that netdrivers that want to use the interface
> have to be morphed.

I'm alluding to the fact that we need a place to put in-development patches.

> As a general disclaimer, i really don't mean to put down Ingo's efforts;
> i just think the irq mitigation idea as it is now is wrong for both 2.4
> and 2.5.

What is your solution to the problem?  Leaving it up to the driver authors 
doesn't work as they're not perfect.  Yes, drivers should attempt to do a 
good job at irq mitigation, but sometimes a safety net is needed.

		-ben

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 19:03                   ` Benjamin LaHaise
@ 2001-10-04  1:10                     ` jamal
  2001-10-04  1:30                       ` Benjamin LaHaise
  0 siblings, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-04  1:10 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: kuznet, mingo, linux-kernel, Robert.Olsson, netdev, torvalds, alan



On Wed, 3 Oct 2001, Benjamin LaHaise wrote:

> On Wed, Oct 03, 2001 at 08:53:58PM +0400, kuznet@ms2.inr.ac.ru wrote:
> > Citing my old explanation:
> >
> > >"Polling" is not a real polling in fact, it just accepts irqs as
> > >events waking rx softirq with blocking subsequent irqs.
> > >Actual receive happens at softirq.
> > >
> > >Seems, this approach solves the worst half of livelock problem completely:
> > >irqs are throttled and tuned to load automatically.
> > >Well, and drivers become cleaner.
>
> Well, this sounds like a 2.5 patch.  When do we get to merge it?


It is backward compatible with the 2.4 netif_rx(), which means it can go
in now. The problem is that netdrivers that want to use the interface have
to be morphed.
As a general disclaimer, i really don't mean to put down Ingo's efforts;
i just think the irq mitigation idea as it is now is wrong for both 2.4
and 2.5.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 20:02                   ` Simon Kirby
@ 2001-10-04  1:04                     ` jamal
  2001-10-04  6:47                       ` Ben Greear
                                         ` (2 more replies)
  0 siblings, 3 replies; 151+ messages in thread
From: jamal @ 2001-10-04  1:04 UTC (permalink / raw)
  To: Simon Kirby
  Cc: Linus Torvalds, Ben Greear, Ingo Molnar, linux-kernel,
	Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise, netdev,
	Alan Cox



On Wed, 3 Oct 2001, Simon Kirby wrote:

> On Wed, Oct 03, 2001 at 09:33:12AM -0700, Linus Torvalds wrote:
>
> Actually, the way I first started looking at this problem is the result
> of a few attacks that have happened on our network.  It's not just a
> while(1) sendto(); UDP spamming program that triggers it -- TCP SYN
> floods show the problem as well, and _there is no way_ to protect against
> this without using syncookies or some similar method that can only be
> done on the receiving TCP stack.
>
> At one point, one of our webservers received 30-40Mbit/sec of SYN packets
> sustained for almost 24 hours.  Needless to say, the machine was not
> happy.
>

I think you can save yourself a lot of pain today by going to a "better
driver"/hardware. Switch to a tulip based board; in particular one which
is based on the 21143 chipset. Compile in hardware traffic control and
save yourself some pain.
The interface was published but so far only the tulip conforms to it.
It can sustain up to about 90% of the wire rate before it starts
dropping. And at those rates you still have plenty of CPU available.
The ingress policer in the traffic control code might also be able to
help; however, CPU cycles are already wasted by the time that code is hit.
With NAPI you should be able to push the filtering much lower in the
stack.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 17:28                   ` Ingo Molnar
@ 2001-10-04  0:53                     ` jamal
  2001-10-04  6:28                       ` Ingo Molnar
  2001-10-04  6:50                       ` Ben Greear
  0 siblings, 2 replies; 151+ messages in thread
From: jamal @ 2001-10-04  0:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby



On Wed, 3 Oct 2001, Ingo Molnar wrote:

>
> On Wed, 3 Oct 2001, jamal wrote:
>
> > use the netif_rx() return code and hardware flowcontrol to fix it.
>
> i'm using hardware flowcontrol in the patch, but at a different, higher
> level. This part of the do_IRQ() code disables the offending IRQ source:
>
> 	[...]
>         desc->status |= IRQ_MITIGATED|IRQ_PENDING;
>         __disable_irq(desc, irq);
>
> which in turn stops that device as well sooner or later. Optionally, in
> the future, this can be made more finegrained for chipsets that support
> device-independent IRQ mitigation features, like the USB 2.0 EHCI feature
> mentioned by David Brownell.
>

I think each subsystem should be in charge of its own fate. USB applies in
whatever subsystem it belongs to. Cooperating subsystems do what is best
for the system.

> i'd prefer it if all subsystems and drivers in the kernel behaved properly
> and limited their IRQ load - but this does not always happen and users are
> hit by irq overload situations.
>

Your patch with Linus' idea of "flag mask" would be more acceptable as a
last resort. All subsystems should be cooperative and we resort to this to
send misbehaving kids to their room.

> Your NAPI patch, or any driver/subsystem that does flowcontrol accurately
> should never be affected by it in any way. No overhead, no performance
> hit.

so far your approach is that of a shotgun, i.e. "let me fire into
that crowd and i'll hit my target, but i don't care if i take down a few
more"; regardless of how noble the reasoning is, it is, as Linus described
it, a sledgehammer.

cheers,
jamal



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:51                   ` Ingo Molnar
@ 2001-10-04  0:46                     ` jamal
  2001-10-08  0:31                     ` Andrea Arcangeli
  1 sibling, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-04  0:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox



On Wed, 3 Oct 2001, Ingo Molnar wrote:

>
> On Wed, 3 Oct 2001, jamal wrote:
>
> (and the only thing i pointed out was that the patch as-is did not limit
> the amount of polling done.)

you mean in the softirq or the one line in the driver?

>
> > > *if* you can make polling a success in ~90% of the time we enter
> > > tulip_poll() under non-specific server load (ie. not routing), then i
> > > think you have really good metrics.
> >
> > we can make it 100% successful; i mentioned that we only do work, if
> > there is work to be done.
>
> can you really make it 100% successful for rx? I.e. do you only ever call
> the ->poll() function if there is a new packet waiting? How do you know
> with 100% probability that someone on the network just sent a new packet?
> (without receiving an interrupt to begin with, that is.)
>

Take a look at what i think is the NAPI state machine, pending a nod
from Alexey and Robert:
http://www.cyberus.ca/~hadi/NAPI-SM.ps.gz

cheers,
jamal



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 17:06                   ` Ingo Molnar
@ 2001-10-04  0:44                     ` jamal
  2001-10-04  6:35                       ` Ingo Molnar
  2001-10-04 13:05                       ` Robert Olsson
  0 siblings, 2 replies; 151+ messages in thread
From: jamal @ 2001-10-04  0:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexey Kuznetsov, linux-kernel, Robert Olsson, bcrl, netdev,
	Linus Torvalds, Alan Cox



On Wed, 3 Oct 2001, Ingo Molnar wrote:

> i like this approach very much, and indeed this is not polling in any way.
>
> i'm worried by the dev->quota variable a bit. As visible now in the
> 2.4.10-poll.pat and tulip-NAPI-010910.tar.gz code, it keeps calling the
> ->poll() function until dev->quota is gone. I think it should only keep
> calling the function until the rx ring is fully processed - and it should
> re-enable the receiver afterwards, when exiting net_rx_action.

This would result in unfairness. Think of one device which receives
packets so fast that it takes most of the CPU capacity just to process
them.

cheers,
jamal



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-04  1:30                       ` Benjamin LaHaise
@ 2001-10-03 22:31                         ` Rob Landley
  2001-10-04  1:39                         ` jamal
  1 sibling, 0 replies; 151+ messages in thread
From: Rob Landley @ 2001-10-03 22:31 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel

On Wednesday 03 October 2001 21:30, Benjamin LaHaise wrote:
> On Wed, Oct 03, 2001 at 09:10:10PM -0400, jamal wrote:
> > > Well, this sounds like a 2.5 patch.  When do we get to merge it?
> >
> > It is backward compatible to 2.4 netif_rx() which means it can go in now.
> > The problem is netdrivers that want to use the interface have to be
> > morphed.
>
> I'm alluding to the fact that we need a place to put in-development
> patches.

Such as a 2.5 kernel tree? :)

Sorry, couldn't resist.  It was just hanging there...  *Sniff*  I tried.  I 
was weak...!

Rob

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 21:08                   ` Robert Olsson
@ 2001-10-03 22:22                     ` Andreas Dilger
  2001-10-04 17:32                       ` Davide Libenzi
  2001-10-05 14:52                     ` Robert Olsson
  1 sibling, 1 reply; 151+ messages in thread
From: Andreas Dilger @ 2001-10-03 22:22 UTC (permalink / raw)
  To: Robert Olsson
  Cc: mingo, jamal, linux-kernel, Alexey Kuznetsov, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox

On Oct 03, 2001  23:08 +0200, Robert Olsson wrote:
> Ingo Molnar writes:
>  > (i did not criticize the list_add/list_del in any way, it's obviously
>  > correct to cycle the polled devices. I highlighted that code only to show
>  > that the current patch as-is polls too aggressively for generic server
>  > load.)
> 
>  Yes I think we need some data here... 
> 
>  > can you really make it 100% successful for rx? I.e. do you only ever call
>  > the ->poll() function if there is a new packet waiting? How do you know
>  > with 100% probability that someone on the network just sent a new packet?
>  > (without receiving an interrupt to begin with, that is.)
> 
>  Well we need RX-interrupts not to spin away the CPU or exhaust the PCI-
>  bus. The NAPI scheme is simple: turn off RX-interrupts when the first packet
>  comes and have the kernel pull packets from the RX-ring.
> 
>  I tried pure polling... it's easy, just have your driver return
>  "not_done" all the time. Not a good idea. :-) Maybe as a softirq test.

I think it is rather easy to make this self-regulating (I may be wrong).

If you get to the stage where you are turning off IRQs and going to a
polling mode, then don't turn IRQs back on until you have a poll (or
two or whatever) that finds no work to be done.  This will at worst
give you 50% polling success, but in practice you wouldn't start polling
until there is lots of work to be done, so the real success rate will
be much higher.

At this point (no work to be done when polling) clearly no interrupts
would be generated (because no packets have arrived), so it should be
reasonable to turn interrupts back on and stop polling (assuming
non-broken hardware).  You now go back to interrupt-driven work until
the rate increases again.  This means you limit IRQ rates when needed,
but only do one or two excess polls before going back to IRQ-driven work.
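
Something like this, as a minimal sketch in C (all the my_*() names are
made up, this is not a real driver API):

	/* poll with rx interrupts masked; give up only after a couple
	 * of empty polls, then re-arm the interrupt */
	static void my_poll_until_idle(struct my_dev *dev)
	{
		int idle_polls = 0;

		my_disable_rx_irq(dev);
		while (idle_polls < 2) {
			if (my_process_rx_ring(dev) == 0)
				idle_polls++;		/* nothing there */
			else
				idle_polls = 0;		/* found work */
		}
		my_enable_rx_irq(dev);	/* back to interrupt-driven mode */
	}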

Granted, I don't know what the overhead of turning the IRQs on and off
is, but since we do it all the time already (for each ISR) it can't be
that bad.

If you are always having work to do when polling, then interrupts will
never be turned on again, but who cares at that point because the work
is getting done?  Similarly, if you have IRQs disabled, but are sharing
IRQs there is nothing wrong in polling all devices sharing that IRQ
(at least conceptually).

I don't know much about IRQ handlers, but I assume that this is already
what happens if you are sharing an IRQ - you don't know which of many
sources it comes from, so you poll all of them to see if they have any
work to be done.  If you are polling some of the shared-IRQ devices too
frequently (i.e. they never have work to do), you could have some sort
of progressive backoff, so you skip polling those for a growing number
of polls (this could also be set by the driver if it knows that it could
only generate real work every X ms, so we skip about X/poll_rate polls).
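
That backoff could look something like this (again, hypothetical field
and function names):

	/* skip polling a device that keeps coming up empty; double the
	 * skip count on each idle poll, up to a cap */
	if (dev->skip_count > 0) {
		dev->skip_count--;		/* not polled this round */
	} else if (my_poll_device(dev) == 0) {
		dev->backoff = dev->backoff ? 2 * dev->backoff : 1;
		if (dev->backoff > MY_MAX_SKIP)
			dev->backoff = MY_MAX_SKIP;
		dev->skip_count = dev->backoff;
	} else {
		dev->backoff = 0;		/* it had work: poll eagerly */
	}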

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 15:56                 ` jamal
  2001-10-03 16:51                   ` Ingo Molnar
@ 2001-10-03 21:08                   ` Robert Olsson
  2001-10-03 22:22                     ` Andreas Dilger
  2001-10-05 14:52                     ` Robert Olsson
  1 sibling, 2 replies; 151+ messages in thread
From: Robert Olsson @ 2001-10-03 21:08 UTC (permalink / raw)
  To: mingo
  Cc: jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox


Ingo Molnar writes:

 > (i did not criticize the list_add/list_del in any way, it's obviously
 > correct to cycle the polled devices. I highlighted that code only to show
 > that the current patch as-is polls too aggressively for generic server
 > load.)

 Yes I think we need some data here... 

 > can you really make it 100% successful for rx? I.e. do you only ever call
 > the ->poll() function if there is a new packet waiting? How do you know
 > with 100% probability that someone on the network just sent a new packet?
 > (without receiving an interrupt to begin with, that is.)

 Well we need RX-interrupts not to spin away the CPU or exhaust the PCI-
 bus. The NAPI scheme is simple: turn off RX-interrupts when the first packet
 comes and have the kernel pull packets from the RX-ring.

 I tried pure polling... it's easy, just have your driver return
 "not_done" all the time. Not a good idea. :-) Maybe as a softirq test.
 
 If the device has more packets to deliver than is "allowed" we put it
 back on the list and the polling process can give the next device its
 share, and so on. So we handle screaming network devices and packet
 flooding nicely and deliver decent performance even under those
 circumstances.
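
 In driver terms the contract is roughly the following (a sketch only;
 the real ->poll() in the patches differs in details):

	/* process at most min(*budget, dev->quota) packets; return 1
	 * ("not_done") if the ring still has more, so the device is
	 * put back on the poll list */
	static int my_poll(struct net_device *dev, int *budget)
	{
		int allowed = *budget < dev->quota ? *budget : dev->quota;
		int done = my_rx_ring_work(dev, allowed);

		*budget -= done;
		dev->quota -= done;
		if (my_rx_ring_has_more(dev))
			return 1;		/* not_done */
		my_enable_rx_irq(dev);		/* drained, back to irqs */
		return 0;
	}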

 As you have seen from some code fragments, we have played with some
 mechanisms to delay the transition from polling back to irq-enable. I
 think I accepted not_done polls within the same jiffy in some of the
 tests. I agree other variants are possible and hopefully better.

 SMP is another area: robustness and performance of course, but with
 SMP we also have to deal with packet reordering, which is something we
 really want to minimize. Even here I think the NAPI polling scheme is
 interesting: during consecutive polls the device is bound to the same
 CPU, so no packet reordering should occur.

 And from the data we have now we can see that packet load is evenly
 distributed among the different CPUs and should follow smp_affinity.

 Cheers.

						--ro


 

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 18:11                     ` Linus Torvalds
@ 2001-10-03 20:41                       ` Jeremy Hansen
  0 siblings, 0 replies; 151+ messages in thread
From: Jeremy Hansen @ 2001-10-03 20:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


I better go check my pants...

Thanks
-jeremy

On Wed, 3 Oct 2001, Linus Torvalds wrote:

> In article <Pine.LNX.4.33.0110031920410.9973-100000@localhost.localdomain>,
> Ingo Molnar  <mingo@elte.hu> wrote:
> >
> >well, just tested my RAID testsystem as well. I have not tested heavy
> >IO-related IRQ load with the patch before (so it was not tuned for that
> >test in any way), but did so now: an IO test running on 12 disks, (5 IO
> >interfaces: 3 SCSI cards and 2 IDE interfaces) producing 150 MB/sec block
> >IO load and a fair number of SCSI and IDE interrupts, did not trigger the
> >overload code.
> 
> Now test it again with the disk interrupt being shared with the network
> card.
> 
> Doesn't happen? It sure does. It happens more often especially on
> slightly lower-end machines (on laptops it's downright disgusting how
> often _every_ single PCI device ends up sharing the same interrupt).
> 
> And as the lower-end machines are the ones that probably can be forced
> to trigger the whole thing more often, this is a real issue.
> 
> And on my "high-end" machine, I actually have USB and ethernet on the
> same interrupt.  It would be kind of nasty if heavy network traffic
> makes my camera stop working... 
> 
> The fact is, there is never any good reason for limiting "trusted"
> interrupts, ie anything that is internal to the box.  Things like disks,
> graphics controllers etc. 
> 
> Which is why I like the NAPI approach.  If somebody overloads my network
> card, my USB camera doesn't stop working. 
> 
> I don't disagree with your patch as a last resort when all else fails,
> but I _do_ disagree with it as a network load limiter.
> 
> 			Linus

-- 
The trouble with being poor is that it takes up all your time.


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:33                 ` Linus Torvalds
  2001-10-03 17:25                   ` Ingo Molnar
@ 2001-10-03 20:02                   ` Simon Kirby
  2001-10-04  1:04                     ` jamal
  2001-10-04  4:12                   ` bill davidsen
  2001-10-04 18:16                   ` Alan Cox
  3 siblings, 1 reply; 151+ messages in thread
From: Simon Kirby @ 2001-10-03 20:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Greear, jamal, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Alan Cox

On Wed, Oct 03, 2001 at 09:33:12AM -0700, Linus Torvalds wrote:

> Note that the big question here is WHO CARES?
> 
> There are two issues, and they are independent:
>  (a) handling of network packet flooding nicely
>  (b) handling screaming devices nicely.
> 
> First off, some comments:
>  (a) is not a major security issue. If you allow untrusted users full
>      100/1000Mbps access to your internal network, you have _other_
>      security issues, like packet sniffing etc that are much much MUCH
>      worse. So the packet flooding thing is very much a corner case, and
>      claiming that we have a big problem is silly.
> 
>      HOWEVER, (a) _can_ be a performance issue under benchmark load.
>      Benchmarks (unlike real life) are almost always set up to have full
>      network bandwidth access, and can show this issue.

Actually, the way I first started looking at this problem is the result
of a few attacks that have happened on our network.  It's not just a
while(1) sendto(); UDP spamming program that triggers it -- TCP SYN
floods show the problem as well, and _there is no way_ to protect against
this without using syncookies or some similar method that can only be
done on the receiving TCP stack.
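
(For reference, that stock defence is a one-line sysctl, assuming the
kernel was built with CONFIG_SYN_COOKIES:

    echo 1 > /proc/sys/net/ipv4/tcp_syncookies

It helps the TCP stack survive the flood, but does nothing about the raw
interrupt load, which is the part at issue in this thread.)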

At one point, one of our webservers received 30-40Mbit/sec of SYN packets
sustained for almost 24 hours.  Needless to say, the machine was not
happy.

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:53                 ` kuznet
  2001-10-03 17:06                   ` Ingo Molnar
@ 2001-10-03 19:03                   ` Benjamin LaHaise
  2001-10-04  1:10                     ` jamal
  1 sibling, 1 reply; 151+ messages in thread
From: Benjamin LaHaise @ 2001-10-03 19:03 UTC (permalink / raw)
  To: kuznet; +Cc: mingo, hadi, linux-kernel, Robert.Olsson, netdev, torvalds, alan

On Wed, Oct 03, 2001 at 08:53:58PM +0400, kuznet@ms2.inr.ac.ru wrote:
> Citing my old explanation:
> 
> >"Polling" is not a real polling in fact, it just accepts irqs as
> >events waking rx softirq with blocking subsequent irqs.
> >Actual receive happens at softirq.
> >
> >Seems, this approach solves the worst half of livelock problem completely:
> >irqs are throttled and tuned to load automatically.
> >Well, and drivers become cleaner.

Well, this sounds like a 2.5 patch.  When do we get to merge it?

		-ben

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
       [not found] <200110031811.f93IBoN10026@penguin.transmeta.com>
@ 2001-10-03 18:23 ` Ingo Molnar
  2001-10-04  9:19   ` BALBIR SINGH
  0 siblings, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03 18:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


On Wed, 3 Oct 2001, Linus Torvalds wrote:

> Now test it again with the disk interrupt being shared with the
> network card.
>
> Doesn't happen? It sure does. [...]

yes, disk IRQs might be delayed in that case. Without this mechanism there
is a lockup.

> Which is why I like the NAPI approach.  If somebody overloads my
> network card, my USB camera doesn't stop working.

i agree that NAPI is a better approach. And IRQ overload does not happen
on cards that have hardware-based irq mitigation support already. (and i
should note that those cards will likely perform even faster with NAPI.)

> I don't disagree with your patch as a last resort when all else fails,
> but I _do_ disagree with it as a network load limiter.

okay - i removed those parts already (kpolld) in today's patch. (It
initially was an experiment to prove that this is the only problem we are
facing under such loads.)

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 17:25                   ` Ingo Molnar
@ 2001-10-03 18:11                     ` Linus Torvalds
  2001-10-03 20:41                       ` Jeremy Hansen
  0 siblings, 1 reply; 151+ messages in thread
From: Linus Torvalds @ 2001-10-03 18:11 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.33.0110031920410.9973-100000@localhost.localdomain>,
Ingo Molnar  <mingo@elte.hu> wrote:
>
>well, just tested my RAID testsystem as well. I have not tested heavy
>IO-related IRQ load with the patch before (so it was not tuned for that
>test in any way), but did so now: an IO test running on 12 disks, (5 IO
>interfaces: 3 SCSI cards and 2 IDE interfaces) producing 150 MB/sec block
>IO load and a fair number of SCSI and IDE interrupts, did not trigger the
>overload code.

Now test it again with the disk interrupt being shared with the network
card.

Doesn't happen? It sure does. It happens more often especially on
slightly lower-end machines (on laptops it's downright disgusting how
often _every_ single PCI device ends up sharing the same interrupt).

And as the lower-end machines are the ones that probably can be forced
to trigger the whole thing more often, this is a real issue.

And on my "high-end" machine, I actually have USB and ethernet on the
same interrupt.  It would be kind of nasty if heavy network traffic
makes my camera stop working... 

The fact is, there is never any good reason for limiting "trusted"
interrupts, ie anything that is internal to the box.  Things like disks,
graphics controllers etc. 

Which is why I like the NAPI approach.  If somebody overloads my network
card, my USB camera doesn't stop working. 

I don't disagree with your patch as a last resort when all else fails,
but I _do_ disagree with it as a network load limiter.

			Linus

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 15:14                 ` jamal
@ 2001-10-03 17:28                   ` Ingo Molnar
  2001-10-04  0:53                     ` jamal
  0 siblings, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03 17:28 UTC (permalink / raw)
  To: jamal
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby


On Wed, 3 Oct 2001, jamal wrote:

> use the netif_rx() return code and hardware flowcontrol to fix it.

i'm using hardware flowcontrol in the patch, but at a different, higher
level. This part of the do_IRQ() code disables the offending IRQ source:

	[...]
        desc->status |= IRQ_MITIGATED|IRQ_PENDING;
        __disable_irq(desc, irq);

which in turn stops that device as well sooner or later. Optionally, in
the future, this can be made more fine-grained for chipsets that support
device-independent IRQ mitigation features, like the USB 2.0 EHCI feature
mentioned by David Brownell.
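
the detection side looks schematically like this (the real auto-tuning
in the patch is more involved, and the counter names here are made up):

        /* count irqs in a 1-second window; mitigate the source when
         * it exceeds the allowed rate */
        desc->irq_count++;
        if (time_after(jiffies, desc->window_start + HZ)) {
                desc->window_start = jiffies;
                desc->irq_count = 0;
        } else if (desc->irq_count > max_irq_rate) {
                desc->status |= IRQ_MITIGATED|IRQ_PENDING;
                __disable_irq(desc, irq);
        }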

i'd prefer it if all subsystems and drivers in the kernel behaved properly
and limited their IRQ load - but this does not always happen and users are
hit by irq overload situations.

Your NAPI patch, or any driver/subsystem that does flowcontrol accurately
should never be affected by it in any way. No overhead, no performance
hit.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:33                 ` Linus Torvalds
@ 2001-10-03 17:25                   ` Ingo Molnar
  2001-10-03 18:11                     ` Linus Torvalds
  2001-10-03 20:02                   ` Simon Kirby
                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03 17:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ben Greear, jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Alan Cox


On Wed, 3 Oct 2001, Linus Torvalds wrote:

> [...] I would not be surprised if Ingo finds that trying to put the
> machine under heavy disk load with multiple disk controllers might
> also cause interrupt mitigation, which would be unacceptably BAD.

well, just tested my RAID testsystem as well. I have not tested heavy
IO-related IRQ load with the patch before (so it was not tuned for that
test in any way), but did so now: an IO test running on 12 disks, (5 IO
interfaces: 3 SCSI cards and 2 IDE interfaces) producing 150 MB/sec block
IO load and a fair number of SCSI and IDE interrupts, did not trigger the
overload code. I started the network overload utility during this test,
and the code detected overload on the network interrupt (and only on the
network interrupt). IO load is still high (down to 130 MB/sec), while a
fair amount of networking load is handled as well. (While there certainly
are higher IO loads on some Linux boxes, mine should be above the average
IO traffic.)

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:53                 ` kuznet
@ 2001-10-03 17:06                   ` Ingo Molnar
  2001-10-04  0:44                     ` jamal
  2001-10-03 19:03                   ` Benjamin LaHaise
  1 sibling, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03 17:06 UTC (permalink / raw)
  To: Alexey Kuznetsov
  Cc: hadi, linux-kernel, Robert.Olsson, bcrl, netdev, Linus Torvalds,
	Alan Cox


On Wed, 3 Oct 2001 kuznet@ms2.inr.ac.ru wrote:

> Ingo, "polling" is the wrong name. It does not poll. :-)

ok. i was also misled by a quick hack in the source code :)

> Actually, this misnomer is the worst thing which I worried about.

i think something like: 'offloading hardirq work into softirqs' covers the
concept better, right?

> Citing my old explanation:
>
> > "Polling" is not a real polling in fact, it just accepts irqs as
> > events waking rx softirq with blocking subsequent irqs.
> > Actual receive happens at softirq.
> >
> > Seems, this approach solves the worst half of livelock problem
> > completely: irqs are throttled and tuned to load automatically.
> > Well, and drivers become cleaner.

i like this approach very much, and indeed this is not polling in any way.

i'm worried by the dev->quota variable a bit. As visible now in the
2.4.10-poll.pat and tulip-NAPI-010910.tar.gz code, it keeps calling the
->poll() function until dev->quota is gone. I think it should only keep
calling the function until the rx ring is fully processed - and it should
re-enable the receiver afterwards, when exiting net_rx_action.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 15:28               ` Ingo Molnar
  2001-10-03 15:56                 ` jamal
@ 2001-10-03 16:53                 ` kuznet
  2001-10-03 17:06                   ` Ingo Molnar
  2001-10-03 19:03                   ` Benjamin LaHaise
  1 sibling, 2 replies; 151+ messages in thread
From: kuznet @ 2001-10-03 16:53 UTC (permalink / raw)
  To: mingo; +Cc: hadi, linux-kernel, Robert.Olsson, bcrl, netdev, torvalds, alan

Hello!

> In a generic computing environment i want to spend cycles doing useful
> work, not polling.

Ingo, "polling" is the wrong name. It does not poll. :-)
Actually, this misnomer is the worst thing which I worried about.

Citing my old explanation:

>"Polling" is not a real polling in fact, it just accepts irqs as
>events waking rx softirq with blocking subsequent irqs.
>Actual receive happens at softirq.
>
>Seems, this approach solves the worst half of livelock problem completely:
>irqs are throttled and tuned to load automatically.
>Well, and drivers become cleaner.
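
In driver terms the irq handler under this scheme becomes roughly the
following (a sketch; my_disable_rx_irq() is hypothetical, and the name
of the schedule helper varies between patch versions, so treat it as
approximate):

	static void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
	{
		struct net_device *dev = dev_id;

		my_disable_rx_irq(dev);		/* block subsequent rx irqs */
		netif_rx_schedule(dev);		/* actual receive at softirq */
	}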

Alexey

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 15:56                 ` jamal
@ 2001-10-03 16:51                   ` Ingo Molnar
  2001-10-04  0:46                     ` jamal
  2001-10-08  0:31                     ` Andrea Arcangeli
  2001-10-03 21:08                   ` Robert Olsson
  1 sibling, 2 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03 16:51 UTC (permalink / raw)
  To: jamal
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox


On Wed, 3 Oct 2001, jamal wrote:

> this code was added by Robert to check something; can't remember the
> details on that specific date. [...]

ok.

> > +         while (!list_empty(&queue->poll_list)) {
> > +                 struct net_device *dev;
> > [...]
> > +                 if (dev->quota <= 0 || dev->poll(dev, &budget)) {
> > +                         local_irq_disable();
> > +                         list_del(&dev->poll_list);
> > +                         list_add_tail(&dev->poll_list, &queue->poll_list);

> You misunderstood. This is to enforce fairness. [...]

(i did not criticize the list_add/list_del in any way, it's obviously
correct to cycle the polled devices. I highlighted that code only to show
that the current patch as-is polls too aggressively for generic server
load.)

> Read the paper.

(i prefer source code. Can i assume the 'authoritative' patch to be the one
with the "goto not_done;" line removed, correct?)

> > In a generic computing environment i want to spend cycles doing useful
> > work, not polling. Even the quick kpolld hack [which i dropped, so please
> > don't regard it as a 'competitor' patch] i consider superior to this, as i
> > can renice kpolld to reduce polling. (plus kpolld sucks up available idle
> > cycles as well.) Unless i royally misunderstand it, i cannot stop the
> > above code from wasting my cycles, and if that is true i do not want to
> > see it in the kernel proper in this form.

> The interrupt just flags "i, netdev, have work to do"; [...]

(and the only thing i pointed out was that the patch as-is did not limit
the amount of polling done.)

> > *if* you can make polling a success in ~90% of the time we enter
> > tulip_poll() under non-specific server load (ie. not routing), then i
> > think you have really good metrics.
>
> we can make it 100% successful; i mentioned that we only do work, if
> there is work to be done.

can you really make it 100% successful for rx? I.e. do you only ever call
the ->poll() function if there is a new packet waiting? How do you know
with 100% probability that someone on the network just sent a new packet?
(without receiving an interrupt to begin with, that is.)

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 15:42               ` Ben Greear
  2001-10-03 15:58                 ` jamal
@ 2001-10-03 16:33                 ` Linus Torvalds
  2001-10-03 17:25                   ` Ingo Molnar
                                     ` (3 more replies)
  1 sibling, 4 replies; 151+ messages in thread
From: Linus Torvalds @ 2001-10-03 16:33 UTC (permalink / raw)
  To: Ben Greear
  Cc: jamal, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Alan Cox


On Wed, 3 Oct 2001, Ben Greear wrote:
>
> Will NAPI patch, as it sits today, fix all IRQ lockup problems for
> all drivers (as Ingo's patch claims to do), or will it just fix
> drivers (eepro, tulip) that have been integrated with it?

Note that the big question here is WHO CARES?

There are two issues, and they are independent:
 (a) handling of network packet flooding nicely
 (b) handling screaming devices nicely.

First off, some comments:
 (a) is not a major security issue. If you allow untrusted users full
     100/1000Mbps access to your internal network, you have _other_
     security issues, like packet sniffing etc that are much much MUCH
     worse. So the packet flooding thing is very much a corner case, and
     claiming that we have a big problem is silly.

     HOWEVER, (a) _can_ be a performance issue under benchmark load.
     Benchmarks (unlike real life) are almost always set up to have full
     network bandwidth access, and can show this issue.

 (b) is to a large degree due to a stupid driver interface. I've wanted to
     change the IRQ handler functions to return a flag mask for about
     three years, but with hundreds of drivers it's always been a bit too
     painful.

     Why do we want to return a flag mask? Because we want the _driver_ to
     be able to say "shut me up" (if the driver cannot shut itself up and
     wants to throttle), and we want the _driver_ to be able to say "Hmm,
     that interrupt was not for me", so that the higher levels can quickly
     figure out if we have the case of us having two drivers but three
     devices, and the third device screaming its head off.
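
     A sketch of the kind of interface I mean - the flag names here are
     invented for illustration, not an existing kernel API:

	#define IRQ_NONE	0x1	/* "that interrupt was not for me" */
	#define IRQ_HANDLED	0x2	/* handled normally */
	#define IRQ_THROTTLE	0x4	/* "shut me up for a while" */

	static int my_eth_interrupt(int irq, void *dev_id, struct pt_regs *regs)
	{
		if (!my_card_raised_irq(dev_id))
			return IRQ_NONE;	/* lets the core spot screamers */
		if (my_rx_overloaded(dev_id))
			return IRQ_HANDLED | IRQ_THROTTLE;
		return IRQ_HANDLED;
	}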

Ingo tries to fix both of these with a sledgehammer. I'd rather use a bit
more finesse, and as I do not actually agree with the people who seem to
think that this is a major problem TODAY, I'll be more than happy to have
people think about it. The NAPI people have thought about it - but it has
obviously not been discussed _nearly_ widely enough.

I personally am very nervous about Ingo's approach. I do not believe that
it will work well over a wide range of machines, and I suspect that the
"tunables" have been tuned for one load and one machine. I would not be
surprised if Ingo finds that trying to put the machine under heavy disk
load with multiple disk controllers might also cause interrupt mitigation,
which would be unacceptably BAD.

			Linus


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:09                   ` Ben Greear
  2001-10-03 16:14                     ` Ingo Molnar
@ 2001-10-03 16:20                     ` Jeff Garzik
  1 sibling, 0 replies; 151+ messages in thread
From: Jeff Garzik @ 2001-10-03 16:20 UTC (permalink / raw)
  To: Ben Greear
  Cc: jamal, Ingo Molnar, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, Benjamin LaHaise, netdev, Linus Torvalds,
	Alan Cox

On Wed, 3 Oct 2001, Ben Greear wrote:
> jamal wrote:
> > On Wed, 3 Oct 2001, Ben Greear wrote:
> > > jamal wrote:
> > > > No. NAPI is for any type of network activities not just for routers or
> > > > sniffers. It works just fine with servers. What do you see in there that
> > > > will make it not work with servers?
> > >
> > > Will NAPI patch, as it sits today, fix all IRQ lockup problems for
> > > all drivers (as Ingo's patch claims to do), or will it just fix
> > > drivers (eepro, tulip) that have been integrated with it?
> > 
> > Unfortunately amongst the three of us tulip seemed to be the most common.
> > Robert has a gige intel. So patches appear only for those two drivers. I
> > could write up a document on how to change drivers.
> 
> So, couldn't your NAPI patch be used by drivers that are updated, and
> let Ingo's patch be a catch-all for un-fixed drivers?  As we move foward,
> more and more drivers support your version, and Ingo's patch becomes less
> utilized.  So long as the patches are tuned such that yours keeps Ingo's
> from being triggered on devices you support, there should be no real
> conflict, eh?

The main thing for me is that jamal/robert/ANK's work has been
undergoing research and refinement for a while now, with very promising
results combined with minimal impact on network drivers.

Any of Ingo's solutions need to be tested in a variety of situations
before we can jump on it with any confidence.

For example, although Ingo dismisses shared-irq situations as
an uninteresting case, we need to take that case into account as well,
because starvation can definitely occur.

I'm all for trying out ideas and test patches, but something as core as
hard IRQ handling needs a lot of testing and research in many different
real world situations before we use it.

So far I do not agree that there is a magic bullet...

	Jeff




^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 16:09                   ` Ben Greear
@ 2001-10-03 16:14                     ` Ingo Molnar
  2001-10-03 16:20                     ` Jeff Garzik
  1 sibling, 0 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03 16:14 UTC (permalink / raw)
  To: Ben Greear
  Cc: jamal, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox


On Wed, 3 Oct 2001, Ben Greear wrote:

> So, couldn't your NAPI patch be used by drivers that are updated, and
> let Ingo's patch be a catch-all for un-fixed drivers?  As we move
> forward, more and more drivers support your version, and Ingo's patch
> becomes less utilized.  So long as the patches are tuned such that
> yours keeps Ingo's from being triggered on devices you support, there
> should be no real conflict, eh?

exactly. auto-mitigation will not hurt NAPI-enabled devices in the least.
Also, auto-mitigation is device-independent.

perhaps Jamal misunderstood the nature of my patch, so i'd like to state
it again: auto-mitigation is a feature that is not triggered normally. I
did a quick hack yesterday to include kpolld - that was a mistake, i was
wrong, and i've removed it. kpolld was mostly an experiment to prove that
TCP network connections can be fully functional during extreme overload
situations as well. Also, auto-mitigation will be a nice mechanism to make
people more aware of the NAPI patch: if they ever notice 'Possible IRQ
overload:' messages then they can be told to try the NAPI patches.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 15:58                 ` jamal
@ 2001-10-03 16:09                   ` Ben Greear
  2001-10-03 16:14                     ` Ingo Molnar
  2001-10-03 16:20                     ` Jeff Garzik
  0 siblings, 2 replies; 151+ messages in thread
From: Ben Greear @ 2001-10-03 16:09 UTC (permalink / raw)
  To: jamal
  Cc: Ingo Molnar, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox

jamal wrote:
> 
> On Wed, 3 Oct 2001, Ben Greear wrote:
> 
> > jamal wrote:
> >
> > > No. NAPI is for any type of network activities not just for routers or
> > > sniffers. It works just fine with servers. What do you see in there that
> > > will make it not work with servers?
> >
> > Will NAPI patch, as it sits today, fix all IRQ lockup problems for
> > all drivers (as Ingo's patch claims to do), or will it just fix
> > drivers (eepro, tulip) that have been integrated with it?
> 
> Unfortunately amongst the three of us tulip seemed to be the most common.
> Robert has a gige intel. So patches appear only for those two drivers. I
> could write up a document on how to change drivers.
> 

So, couldn't your NAPI patch be used by drivers that are updated, and
let Ingo's patch be a catch-all for un-fixed drivers?  As we move forward,
more and more drivers support your version, and Ingo's patch becomes less
utilized.  So long as the patches are tuned such that yours keeps Ingo's
from being triggered on devices you support, there should be no real
conflict, eh?

Ben

> cheers,
> jamal

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 15:42               ` Ben Greear
@ 2001-10-03 15:58                 ` jamal
  2001-10-03 16:09                   ` Ben Greear
  2001-10-03 16:33                 ` Linus Torvalds
  1 sibling, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-03 15:58 UTC (permalink / raw)
  To: Ben Greear
  Cc: Ingo Molnar, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox



On Wed, 3 Oct 2001, Ben Greear wrote:

> jamal wrote:
>
> > No. NAPI is for any type of network activities not just for routers or
> > sniffers. It works just fine with servers. What do you see in there that
> > will make it not work with servers?
>
> Will NAPI patch, as it sits today, fix all IRQ lockup problems for
> all drivers (as Ingo's patch claims to do), or will it just fix
> drivers (eepro, tulip) that have been integrated with it?

Unfortunately amongst the three of us tulip seemed to be the most common.
Robert has a gige intel. So patches appear only for those two drivers. I
could write up a document on how to change drivers.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 15:28               ` Ingo Molnar
@ 2001-10-03 15:56                 ` jamal
  2001-10-03 16:51                   ` Ingo Molnar
  2001-10-03 21:08                   ` Robert Olsson
  2001-10-03 16:53                 ` kuznet
  1 sibling, 2 replies; 151+ messages in thread
From: jamal @ 2001-10-03 15:56 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox



On Wed, 3 Oct 2001, Ingo Molnar wrote:

>
> On Wed, 3 Oct 2001, jamal wrote:
>
> > No. NAPI is for any type of network activities not just for routers or
> > sniffers. It works just fine with servers. What do you see in there
> > that will make it not work with servers?
>
> eg. such solutions in tulip-NAPI-010910:
>
>          /* For now we do this to avoid getting into
>             IRQ mode too quickly */
>
>          if( jiffies - dev->last_rx  == 0 ) goto not_done;
>          [...]
>  not_done:
>          [...]
>          return 1;

this code was added by Robert to check something; can't remember the
details on that specific date. The goal is to test various workloads and
conditions before reaching conclusions, so it might have been valid on
that day only.
Take it out and things should work just fine.

> combined with this code in the net_rx_action softirq handler:
>
> +         while (!list_empty(&queue->poll_list)) {
> +                 struct net_device *dev;
> [...]
> +                 if (dev->quota <= 0 || dev->poll(dev, &budget)) {
> +                         local_irq_disable();
> +                         list_del(&dev->poll_list);
> +                         list_add_tail(&dev->poll_list, &queue->poll_list);
>
> while the stated goal of NAPI is to do 'intelligent, feedback based
> polling', apparently the code is not trusting its own metrics, and is
> forcing the interface into polling mode if we are still within the same 10
> msec period of time, or if we have looped 300 times (default
> netdev_max_backlog value). Not very intelligent IMO.
>

You misunderstood. This is to enforce fairness. Read the paper. When
you have one device sending 100Kpps and another sending 1pps to the stack,
you want to make sure that the 1pps device doesn't get starved -- that is
what the above code is for (hence the round-robin scheduling and the
quota per device).

> In a generic computing environment i want to spend cycles doing useful
> work, not polling. Even the quick kpolld hack [which i dropped, so please
> dont regard it as a 'competitor' patch] i consider superior to this, as i
> can renice kpolld to reduce polling. (plus kpolld sucks up available idle
> cycles as well.) Unless i royally misunderstand it, i cannot stop the
> above code from wasting my cycles, and if that is true i do not want to
> see it in the kernel proper in this form.
>

Again, you misunderstood. Please spend a few more minutes reading the code
and i must insist you read the paper ;->
The interrupt just flags "i, netdev, have work to do"; the poll thread
grabs packets off it when the softirq gets scheduled. So we don't do
unnecessary polling; we only poll when there is work to be done.
In the low-load case this solution reduces to an interrupt-driven
system and scales to the system/CPU capacity.

> if the only thing done by a system is processing network packets, then
> polling is a very nice solution for high loads. So do not take my comments
> as an attack against polling.
>

The poll thread is run as a softirq, just as the other half of networking
is today. And so it should be, because networking is extremely important
as a subsystem.

> *if* you can make polling a success in ~90% of the time we enter
> tulip_poll() under non-specific server load (ie. not routing), then i
> think you have really good metrics.

we can make it 100% successful; i mentioned that we only do work if there
is work to be done.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 13:03             ` jamal
  2001-10-03 13:25               ` jamal
  2001-10-03 15:28               ` Ingo Molnar
@ 2001-10-03 15:42               ` Ben Greear
  2001-10-03 15:58                 ` jamal
  2001-10-03 16:33                 ` Linus Torvalds
  2 siblings, 2 replies; 151+ messages in thread
From: Ben Greear @ 2001-10-03 15:42 UTC (permalink / raw)
  To: jamal
  Cc: Ingo Molnar, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox

jamal wrote:

> No. NAPI is for any type of network activities not just for routers or
> sniffers. It works just fine with servers. What do you see in there that
> will make it not work with servers?

Will NAPI patch, as it sits today, fix all IRQ lockup problems for
all drivers (as Ingo's patch claims to do), or will it just fix
drivers (eepro, tulip) that have been integrated with it?

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 13:03             ` jamal
  2001-10-03 13:25               ` jamal
@ 2001-10-03 15:28               ` Ingo Molnar
  2001-10-03 15:56                 ` jamal
  2001-10-03 16:53                 ` kuznet
  2001-10-03 15:42               ` Ben Greear
  2 siblings, 2 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03 15:28 UTC (permalink / raw)
  To: jamal
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox


On Wed, 3 Oct 2001, jamal wrote:

> No. NAPI is for any type of network activities not just for routers or
> sniffers. It works just fine with servers. What do you see in there
> that will make it not work with servers?

eg. such solutions in tulip-NAPI-010910:

         /* For now we do this to avoid getting into
            IRQ mode too quickly */

         if( jiffies - dev->last_rx  == 0 ) goto not_done;
         [...]
 not_done:
         [...]
         return 1;

combined with this code in the net_rx_action softirq handler:

+         while (!list_empty(&queue->poll_list)) {
+                 struct net_device *dev;
[...]
+                 if (dev->quota <= 0 || dev->poll(dev, &budget)) {
+                         local_irq_disable();
+                         list_del(&dev->poll_list);
+                         list_add_tail(&dev->poll_list, &queue->poll_list);

while the stated goal of NAPI is to do 'intelligent, feedback based
polling', apparently the code is not trusting its own metrics, and is
forcing the interface into polling mode if we are still within the same 10
msec period of time, or if we have looped 300 times (default
netdev_max_backlog value). Not very intelligent IMO.

In a generic computing environment i want to spend cycles doing useful
work, not polling. Even the quick kpolld hack [which i dropped, so please
don't regard it as a 'competitor' patch] i consider superior to this, as i
can renice kpolld to reduce polling. (plus kpolld sucks up available idle
cycles as well.) Unless i royally misunderstand it, i cannot stop the
above code from wasting my cycles, and if that is true i do not want to
see it in the kernel proper in this form.

if the only thing done by a system is processing network packets, then
polling is a very nice solution for high loads. So do not take my comments
as an attack against polling.

*if* you can make polling a success in ~90% of the time we enter
tulip_poll() under non-specific server load (ie. not routing), then i
think you have really good metrics.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 14:51               ` Ingo Molnar
@ 2001-10-03 15:14                 ` jamal
  2001-10-03 17:28                   ` Ingo Molnar
  2001-10-04 21:28                 ` Alex Bligh - linux-kernel
  1 sibling, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-03 15:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby



On Wed, 3 Oct 2001, Ingo Molnar wrote:

Robert has a driver extension (part of Alexey's iputils) that can
generate around 140Kpps (for 100Mbps) and about 900Kpps for the e1000, but
i'll take a look at Simon's stuff if it is available. Marc Boucher has
something that is an in-kernel client/server as well.

> 10.0.3.4 is running vanilla 2.4.11-pre2 UP, a 466 MHz PII box with enough
> RAM, using eepro100. The system effectively locks up - even in the full
> knowledge of what is happening, i can hardly switch consoles, let alone do
> anything like ifconfig eth0 down to fix the lockup. While this kind of
> load is present the only option is to power-cycle the box. SysRq does not
> work.

use the netif_rx() return code and hardware flowcontrol to fix it.

> and frankly, this has been well-known for a long time - it's just since
> Simon sent me this testcode that i realized how trivial it is. Alexey told
> me about Linux routers effectively locking up if put under 100 mbit IRQ
> load more than a year ago, when i first tried to fix softirq latencies. I
> think if you are doing networking patches then you should be aware of it
> as well.
>

I am fully aware of it. We have progressed extensively since then. Look at
NAPI.

> your refusal to accept this problem as an existing and real problem is
> really puzzling me.
>

I must have miscommunicated. I am not saying there is no problem,
otherwise i wouldn't be working on this to begin with. I am just against
your shotgun approach.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 12:49             ` jamal
@ 2001-10-03 14:51               ` Ingo Molnar
  2001-10-03 15:14                 ` jamal
  2001-10-04 21:28                 ` Alex Bligh - linux-kernel
  0 siblings, 2 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03 14:51 UTC (permalink / raw)
  To: jamal
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby


On Wed, 3 Oct 2001, jamal wrote:

> You don't need the patch for 2.4 to work against any lockups. And
> in fact i am surprised that you observe _any_ lockups on a PIII which
> are not observed on my PII. [...]

as mentioned before, it's dead easy to lock up current kernels via high
enough networking irq/softirq load:

    box:~> wc -l udpspam.c
    131 udpspam.c

    box:~> ./udpspam 10.0.3.4

10.0.3.4 is running vanilla 2.4.11-pre2 UP, a 466 MHz PII box with enough
RAM, using eepro100. The system effectively locks up - even in the full
knowledge of what is happening, i can hardly switch consoles, let alone do
anything like ifconfig eth0 down to fix the lockup. While this kind of
load is present the only option is to power-cycle the box. SysRq does not
work.

(ask Simon for the code.)

and frankly, this has been well-known for a long time - it's just since
Simon sent me this testcode that i realized how trivial it is. Alexey told
me about Linux routers effectively locking up if put under 100 mbit IRQ
load more than a year ago, when i first tried to fix softirq latencies. I
think if you are doing networking patches then you should be aware of it
as well.

your refusal to accept this problem as an existing and real problem is
really puzzling me.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03  9:22         ` Ingo Molnar
@ 2001-10-03 14:06           ` David Brownell
  0 siblings, 0 replies; 151+ messages in thread
From: David Brownell @ 2001-10-03 14:06 UTC (permalink / raw)
  To: mingo; +Cc: lkml

USB 2.0 host controllers (EHCI) support a kind of hardware
level interrupt mitigation, whereby a register controls interrupt
latency.  The controller can delay interrupts from 1-64 microframes, 
where microframe = 125usec, and the current driver defaults that
latency to 1 microframe (best overall performance) but sets that
from a module parameter.
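
Concretely, that register field is the Interrupt Threshold Control
(bits 23:16 of USBCMD in the EHCI spec); programming it is roughly the
following, with the helper itself being hypothetical:

	static void ehci_set_irq_threshold(struct ehci_regs *regs, u8 uframes)
	{
		u32 cmd = readl(&regs->command);

		cmd &= ~(0xff << 16);		/* clear the threshold field */
		cmd |= (u32) uframes << 16;	/* 1..64 microframes */
		writel(cmd, &regs->command);
	}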

I've only read the discussion via archive, so I might have missed
something, but I didn't see what I had hoped to see:  a feedback
mechanism so drivers (PCI in the case of EHCI) can learn that
decreasing the IRQ rate would be good, or later that it's OK to
increase it again.  (Seems like Alan Cox suggested as much too ...)

I saw several suggestions specific to the networking layer,
but I'd sure hope to see mechanisms in place that work for
non-network drivers.  Someday; right now highspeed USB
devices (480 MBit/sec) aren't common yet, mostly disks, and
motherboard chipsets don't yet support it.

- Dave




^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03  9:38           ` Ingo Molnar
  2001-10-03 13:03             ` jamal
@ 2001-10-03 13:38             ` Robert Olsson
  2001-10-04 21:22               ` Alex Bligh - linux-kernel
  2001-10-05 14:32               ` Robert Olsson
  1 sibling, 2 replies; 151+ messages in thread
From: Robert Olsson @ 2001-10-03 13:38 UTC (permalink / raw)
  To: jamal
  Cc: Ingo Molnar, linux-kernel, Alexey Kuznetsov, Robert Olsson,
	Benjamin LaHaise, netdev, Linus Torvalds, Alan Cox



jamal writes:

 > The paper is at: http://www.cyberus.ca/~hadi/usenix-paper.tgz
 > Robert can point you to the latest patches.


 Current code... there are still some parts we would like to improve.

 Available via ftp from robur.slu.se:/pub/Linux/net-development/NAPI/
 2.4.10-poll.pat
 
 The original code:

 ANK-NAPI-tulip-only.pat
 ANK-NAPI-kernel-only.pat

 And for GIGE there is a e1000 driver in test. 

 Cheers.

						--ro


 

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03 13:03             ` jamal
@ 2001-10-03 13:25               ` jamal
  2001-10-03 15:28               ` Ingo Molnar
  2001-10-03 15:42               ` Ben Greear
  2 siblings, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-03 13:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox



On Wed, 3 Oct 2001, jamal wrote:

>
>
> On Wed, 3 Oct 2001, Ingo Molnar wrote:
>
> >
> > but the objectives, judging from the description you gave, are i think
> > largely orthogonal,  with some overlapping in the polling part.
>
> yes. We've done a lot of thoroughly thought-out work in that area and i
> think it would be a sin to throw it out.
>

I hit the send button too fast..
The dynamic irq limiting (it must not be set by a system admin, to preserve
the principle of work conservation) could be used as a last resort. The
point is, if you are not generating a lot of interrupts to begin with (as
is the case with NAPI), i don't see the irq rate limiting kicking in at
all. Maybe for broken drivers, and perhaps for devices other than those
within the network subsystem (i think we've pretty much taken care of the
network subsystem). But you must fix the irq sharing issue first and be
able to precisely isolate and punish the rude devices.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03  9:38           ` Ingo Molnar
@ 2001-10-03 13:03             ` jamal
  2001-10-03 13:25               ` jamal
                                 ` (2 more replies)
  2001-10-03 13:38             ` Robert Olsson
  1 sibling, 3 replies; 151+ messages in thread
From: jamal @ 2001-10-03 13:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox



On Wed, 3 Oct 2001, Ingo Molnar wrote:

>
> On Tue, 2 Oct 2001, jamal wrote:
>
> > This already is done in the current NAPI patch which you should have
> > seen by now. [...]
>

The paper is at: http://www.cyberus.ca/~hadi/usenix-paper.tgz
Robert can point you to the latest patches.

>
> but the objectives, judging from the description you gave, are i think
> largely orthogonal,  with some overlapping in the polling part.

yes. We've done a lot of thoroughly thought-out work in that area and i
think it would be a sin to throw it out.

> The polling
> part of my patch is just a few quick lines here and there and it's not
> intrusive at all.

NAPI is not intrusive either; it is backward compatible.

> I needed it to make sure all problems are solved and
> that the system & network is actually usable in overload situations.
>

And you can; look at my previous email. I would rather patch 2.4 to use
NAPI than see your patch in there.

> you i think are concentrating on router performance (i'd add dedicated
> networking appliances to the list), using cooperative drivers. I'm trying to
> solve a DoS attack against 2.4 boxes, and i'm trying to guarantee the
> uninterrupted (pun unintended) functioning of the system from the point of
> the IRQ handler code.

No. NAPI is for any type of network activities not just for routers or
sniffers. It works just fine with servers. What do you see in there that
will make it not work with servers?

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03  8:34           ` Ingo Molnar
  2001-10-03  9:29             ` Helge Hafting
@ 2001-10-03 12:49             ` jamal
  2001-10-03 14:51               ` Ingo Molnar
  1 sibling, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-03 12:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby




On Wed, 3 Oct 2001, Ingo Molnar wrote:

>
> On Tue, 2 Oct 2001, jamal wrote:
>
> > [...] please have the courtesy of at least posting results/numbers of
> > how this improved things and under what workloads and conditions.
> > [...]
>
> 500 MHz PIII UP server, 433 MHz client over a single 100 mbit ethernet
> using Simon Kirby's udpspam tool to overload the server. Result: 2.4.10
> locks up before the patch. 2.4.10 with the first generation irqrate patch
> applied protects against the lockup (if max_rate is correct), but results
> in dropped packets. The auto-tuning+polling patch results in a working
> system and working network, no lockup and no dropped packets. Why this
> happened and how it happened has been discussed extensively.
>
> (the effect of polling-driven networking is just an extra and unintended
> bonus side-effect.)
>

This is insufficient, and, no pun intended, you must be joking if you intend
to put this patch into the kernel based on these observations.

For sample data look at: http://www.cyberus.ca/~hadi/247-res/
We've been collecting data for about a year and fixing the patches, and we
still don't think we cover the full range (hopefully other people will help
with that when we merge).

You don't need the patch for 2.4 to work against any lockups. In fact I am
surprised that you observe _any_ lockups on a PIII which are not observed
on my PII. Linux as is, without any tune-ups, can handle up to about 40,000
packets/sec of input before you start observing user-space starvation. This
is about 30Mbps at 64-byte packets; it's about 60Mbps at 128-byte packets,
and comfortable at 100Mbps with a packet size of 256 bytes. We really don't
have a problem at 100Mbps.

There are several solutions in 2.4, and I suggest you try those first.

1) One that has been around since 2.1 is hardware flow control.
First you need to register callbacks to throttle your device on/off.
Typically the xoff() callback will involve the driver turning off the
receive and receive_nobuf interrupt sources, and the xon() callback will
undo this. The network subsystem observes congestion levels by the size of
the backlog queue: it shuts off devices when it is overloaded and
unthrottles them when conditions get better.

2) An upgrade to the above, introduced in 2.4: instead of waiting until you
get shut off because of an overloaded system, you can do something about
it... use the return values from netif_rx to make decisions. The return
value indicates whether the system is getting congested or not. The value
is computed from a moving-window average of the backlog queue and so is a
pretty good reflection of congestion levels. A typical use of the return
value is to tune the mitigation registers: if the congestion levels are
approaching the high watermark you back off, and if they indicate things
are getting better you increase your packet rate to the stack (see the
sketch below).

Since you seem to be unaware of the above, I would suggest you try them out
first.
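
As a rough illustration of that netif_rx feedback loop (the NET_RX_* return
codes are real 2.4 values from <linux/netdevice.h>; the mydev_* helpers and
the delay policy are invented for this sketch, not taken from any real
driver):

	#include <linux/netdevice.h>
	#include <linux/skbuff.h>

	/* rx path sketch: let the stack's congestion feedback drive the
	 * NIC's interrupt mitigation. All mydev_* names are hypothetical. */
	static void mydev_rx(struct net_device *dev)
	{
		struct sk_buff *skb;

		while ((skb = mydev_next_rx_skb(dev)) != NULL) {
			switch (netif_rx(skb)) {
			case NET_RX_SUCCESS:
			case NET_RX_CN_LOW:
				/* stack keeps up: deliver eagerly */
				mydev_set_rx_irq_delay(dev, 0);
				break;
			case NET_RX_CN_MOD:
			case NET_RX_CN_HIGH:
				/* backlog growing: batch more packets per irq */
				mydev_increase_rx_irq_delay(dev);
				break;
			case NET_RX_DROP:
				/* backlog full: back off hard */
				mydev_set_rx_irq_delay(dev, MYDEV_MAX_RX_DELAY);
				break;
			}
		}
	}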

NAPI builds upon the above and introduces a more generic solution.

cheers,
jamal



^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02 22:00         ` jamal
  2001-10-03  8:34           ` Ingo Molnar
@ 2001-10-03  9:38           ` Ingo Molnar
  2001-10-03 13:03             ` jamal
  2001-10-03 13:38             ` Robert Olsson
  1 sibling, 2 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03  9:38 UTC (permalink / raw)
  To: jamal
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox


On Tue, 2 Oct 2001, jamal wrote:

> This already is done in the current NAPI patch which you should have
> seen by now. [...]

(I searched the web and mailing list archives and haven't found it (in fact
this is the first mention I saw) - could you give me a link so I can take
a look at it? I just found your slides but no link to actual code.
Thanks!)

But the objectives, judging from the description you gave, are I think
largely orthogonal, with some overlap in the polling part. The polling
part of my patch is just a few quick lines here and there and it's not
intrusive at all. I needed it to make sure all problems are solved and
that the system & network are actually usable in overload situations.

You, I think, are concentrating on router performance (I'd add dedicated
networking appliances to the list), using cooperative drivers. I'm trying
to solve a DoS attack against 2.4 boxes, and I'm trying to guarantee the
uninterrupted (pun unintended) functioning of the system from the point
of view of the IRQ handler code.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-03  8:34           ` Ingo Molnar
@ 2001-10-03  9:29             ` Helge Hafting
  2001-10-03 12:49             ` jamal
  1 sibling, 0 replies; 151+ messages in thread
From: Helge Hafting @ 2001-10-03  9:29 UTC (permalink / raw)
  To: mingo, linux-kernel

Ingo Molnar wrote:

> 500 MHz PIII UP server, 433 MHz client over a single 100 mbit ethernet
> using Simon Kirby's udpspam tool to overload the server. Result: 2.4.10
> locks up before the patch. 2.4.10 with the first generation irqrate patch
> applied protects against the lockup (if max_rate is correct), but results
> in dropped packets. The auto-tuning+polling patch results in a working
> system and working network, no lockup and no dropped packets. Why this
> happened and how it happened has been discussed extensively.

I hope we get some variant of this in 2.4.  A device callback
stopping rx interrupts only is of course even better, but
won't that be 2.5 stuff?

Helge Hafting

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02  5:55       ` Ben Greear
@ 2001-10-03  9:22         ` Ingo Molnar
  2001-10-03 14:06           ` David Brownell
  0 siblings, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03  9:22 UTC (permalink / raw)
  To: Ben Greear
  Cc: Benjamin LaHaise, jamal, linux-kernel, Alexey Kuznetsov,
	Robert Olsson, netdev


On Mon, 1 Oct 2001, Ben Greear wrote:

> So, when you turn off the IRQs, are the drivers somehow made aware of
> this so that they can go into polling mode? That might fix the 10ms
> latency/starvation problem that bothers me...

The latest -D9 patch does this. If drivers provide a (backwards
compatible) ->poll_controller() call then they can be polled by kpolld.
There are also a few points within the networking code that trigger a poll
pass, to make sure events are processed even if networking-intensive
applications take away all CPU time from kpolld. The device is only polled
if the IRQ is detected to be in overload mode. IRQ-overload protection
does not depend on the availability of the ->poll_controller(). The
poll_controller() call is very simple for most drivers. (It has to be
per-driver, because not all drivers advance their state purely via their
device interrupts.)
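
A rough sketch of what such a hook might look like (->poll_controller and
kpolld come from the -D9 patch as described above; the body shown here,
which simply reuses the driver's 2.4-style interrupt handler, and the
mydev_* names are assumptions, not the actual patch code):

	/* Hypothetical driver hook: run the normal interrupt work with the
	 * device irq masked, so kpolld can advance driver state even while
	 * the irq itself is throttled. */
	static void mydev_poll_controller(struct net_device *dev)
	{
		disable_irq(dev->irq);
		mydev_interrupt(dev->irq, dev, NULL);	/* (irq, dev_id, regs) */
		enable_irq(dev->irq);
	}

	/* wired up at probe time, alongside the usual 2.4 methods:
	 *	dev->poll_controller = mydev_poll_controller;	*/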

But kpolld itself and auto-mitigation are not limited to networking - any
other driver framework that has high-irq-load problems can use them.

> I'm more worried about dropped pkts.  If you can receive 10k packets
> per second, then you can receive (lose) 100 packets in 10ms....

Yep - this does not happen anymore, at least under the loads I tested,
which otherwise choke a purely irq-driven machine. (It will happen in a
gradual way if load is increased further, but that is natural.)

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02  6:50 ` Marcus Sundberg
@ 2001-10-03  8:47   ` Ingo Molnar
  0 siblings, 0 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03  8:47 UTC (permalink / raw)
  To: Marcus Sundberg; +Cc: linux-kernel


On 2 Oct 2001, Marcus Sundberg wrote:

> Guess my P3-based laptop doesn't count as modern then:
>
>   0:    7602983          XT-PIC  timer
>   1:      10575          XT-PIC  keyboard
>   2:          0          XT-PIC  cascade
>   8:          1          XT-PIC  rtc
>  11:  1626004 XT-PIC Toshiba America Info Systems ToPIC95 PCI to
> Cardbus Bridge with ZV Support, Toshiba America Info Systems ToPIC95
> PCI to Cardbus Bridge with ZV Support (#2), usb-uhci, eth0, BreezeCom
> Card, Intel 440MX, irda0

ugh!

> I can't even imagine why they did it like this...

Well, you aren't going to be using it as a webserver, I guess? :) But the
costs on desktops are minimal. It's the high-irq-rate server environments
that want separate irq sources.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02 12:10       ` jamal
  2001-10-02 22:00         ` jamal
@ 2001-10-03  8:38         ` Ingo Molnar
  2001-10-04  3:50           ` bill davidsen
  1 sibling, 1 reply; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03  8:38 UTC (permalink / raw)
  To: jamal
  Cc: Benjamin LaHaise, linux-kernel, Alexey Kuznetsov, Robert Olsson, netdev


On Tue, 2 Oct 2001, jamal wrote:

> You are still missing the point (by harping on the literal meaning of
> the example I provided); the point is: fine-grained control vs. shutting
> down the whole IRQ.

I'm convinced that this is a minor detail.

There are *tons* of disadvantages if IRQs are shared. In any
high-performance environment, not having enough interrupt sources is a
sizing or hw design mistake. You can have up to 200 interrupts even on a
PC, using multiple IO-APICs. Any decent server board distributes interrupt
sources properly. Shared interrupts are a legacy of the PC design, and we
are moving away from it slowly but surely. Especially under gigabit loads
there are several PCI busses anyway, so getting non-shared interrupts is
not only easy but a necessity as well. There is no law in physics that
somehow mandates or prefers the sharing of interrupt vectors: devices are
distinct, they use up distinct slots in the board. The PCI bus can get
multiple IRQ sources out of a single card, so even multi-controller cards
are covered.

I fully agree that both the irq code and drivers themselves have to handle
shared interrupts correctly, and we should not penalize shared interrupts
unnecessarily, but do they have to influence our design decisions too
much? Nope.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02 22:00         ` jamal
@ 2001-10-03  8:34           ` Ingo Molnar
  2001-10-03  9:29             ` Helge Hafting
  2001-10-03 12:49             ` jamal
  2001-10-03  9:38           ` Ingo Molnar
  1 sibling, 2 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-03  8:34 UTC (permalink / raw)
  To: jamal
  Cc: linux-kernel, Alexey Kuznetsov, Robert Olsson, Benjamin LaHaise,
	netdev, Linus Torvalds, Alan Cox, Simon Kirby


On Tue, 2 Oct 2001, jamal wrote:

> [...] please have the courtesy of at least posting results/numbers of
> how this improved things and under what workloads and conditions.
> [...]

500 MHz PIII UP server, 433 MHz client over a single 100 mbit ethernet
using Simon Kirby's udpspam tool to overload the server. Result: 2.4.10
locks up before the patch. 2.4.10 with the first generation irqrate patch
applied protects against the lockup (if max_rate is correct), but results
in dropped packets. The auto-tuning+polling patch results in a working
system and working network, no lockup and no dropped packets. Why this
happened and how it happened has been discussed extensively.

(the effect of polling-driven networking is just an extra and unintended
bonus side-effect.)

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02 12:10       ` jamal
@ 2001-10-02 22:00         ` jamal
  2001-10-03  8:34           ` Ingo Molnar
  2001-10-03  9:38           ` Ingo Molnar
  2001-10-03  8:38         ` Ingo Molnar
  1 sibling, 2 replies; 151+ messages in thread
From: jamal @ 2001-10-02 22:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, kuznet, Robert Olsson, Benjamin LaHaise, netdev,
	Linus Torvalds, Alan Cox


Ingo Molnar <mingo@elte.hu> wrote:

>> Silencing a specific target cannot be done by IRQ masking; you have to
>> ask the controller to shut up. It may be that the default "shut up"
>> handler is disable_irq, but that is non-optimal.

>This could be done later on, but I think this is out of the question for
>2.4, as it needs extensive changes in the irq handler and network driver
>APIs.

This already is done in the current NAPI patch which you should have seen
by now. NAPI is backward compatible: it would work just fine with 2.4, and
drivers can be upgraded slowly.
If there's anything that should make it into 2.4 then it should be NAPI
(with some components from your code that still need to be proven under
different workloads).

>> And how do you select max_rate sanely? [...]

>> Saying "hey, that's the user's problem", is _not_ a solution. It needs
>> to have some automatic cut-off that finds the right sustainable rate
>> automatically, instead of hardcoding random default values and asking
>> the user to know the unknowable.

>Good point. I did not ignore this problem, I was just unable to find any
>solution that felt robust, so I convinced myself that max_rate is the
>best idea :-)

If you haven't taken a look at NAPI, please do so instead of creating these
nightly brainstorm patches. With all due respect, if you insist on doing
that, please have the courtesy of at least posting results/numbers of how
this improved things and under what workloads and conditions.
I do believe that some of the pieces of what you have would help -- in
conjunction with NAPI.
A scenario where we have ksoftirqd appearing, then disappearing, and then
a new kpolld showing up just indicates very bad engineering/juju which
seems to be based on pulling tricks out of a hat.

Let's work together instead of creating chaos.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02 14:30   ` Alan Cox
@ 2001-10-02 20:51     ` Ingo Molnar
  0 siblings, 0 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-02 20:51 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ben Greear, linux-kernel


On Tue, 2 Oct 2001, Alan Cox wrote:

> What you really care about is limiting the total amount of CPU time
> used for interrupt processing so that usermode progress is made.
> [...]

Exactly. The estimator in -D9 tries to achieve precisely this: both
hardirqs and softirqs are measured.

> Silencing a specific target cannot be done by IRQ masking; you have to
> ask the controller to shut up. It may be that the default "shut up"
> handler is disable_irq, but that is non-optimal.

This could be done later on, but I think this is out of the question for
2.4, as it needs extensive changes in the irq handler and network driver
APIs.

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02 17:03       ` Robert Olsson
  2001-10-02 17:37         ` jamal
@ 2001-10-02 19:46         ` Andreas Dilger
  1 sibling, 0 replies; 151+ messages in thread
From: Andreas Dilger @ 2001-10-02 19:46 UTC (permalink / raw)
  To: Robert Olsson
  Cc: Ben Greear, Benjamin LaHaise, jamal, linux-kernel, kuznet,
	Ingo Molnar, netdev

On Oct 02, 2001  19:03 +0200, Robert Olsson wrote:
> Jamal mentioned some of the polling efforts for Linux. I can give some
> experimental data here with GigE. Motivation, implementation etc. are in
> a paper to be presented at USENIX Oakland.

How do you determine the polling rate?  I take it that this is a different
patch than Ingo's?

> Iface   MTU Met  RX-OK RX-ERR RX-DRP RX-OVR  TX-OK TX-ERR TX-DRP TX-OVR Flags
> eth0   1500   0 4031309 7803725 7803725 5968699    22     0      0      0 BRU
> eth1   1500   0     18      0      0      0 4031305      0      0      0 BRU
> 
> The RX-ERR, RX-DRP numbers are bugs in the e1000 driver. Anyway, we are
> getting 40% of the packet storm routed, with an estimated throughput of
> about 350,000 p/s.

Are you sure they are "bugs" and not dropped packets?  It seems to me that
RX-ERR == RX-DRP, which suggests that the receive buffers are full
on the card and are not being emptied quickly enough (or maybe that is
indicated by RX-OVR...).  I don't know whether it is _possible_ to empty
the buffers quickly enough; I suppose CPU usage info would also shed some
light on that.

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02 17:03       ` Robert Olsson
@ 2001-10-02 17:37         ` jamal
  2001-10-02 19:46         ` Andreas Dilger
  1 sibling, 0 replies; 151+ messages in thread
From: jamal @ 2001-10-02 17:37 UTC (permalink / raw)
  To: Robert Olsson
  Cc: Ben Greear, Benjamin LaHaise, linux-kernel, kuznet, Ingo Molnar, netdev



Some data on less worthy cards (i.e. 10/100) on 2.4.7
can be found at:
http://www.cyberus.ca/~hadi/247-res/

cheers,
jamal

On Tue, 2 Oct 2001, Robert Olsson wrote:

>
>
> Hello!
>
> Jamal mentioned some of the polling efforts for Linux. I can give some
> experimental data here with GigE. Motivation, implementation etc. are in
> a paper to be presented at USENIX Oakland.
>
> Below is an IP forwarding test: 10 million 64-byte packets injected into
> eth0 at a speed of 890,000 p/s, received, routed, and TX:ed on eth1.
>
> PIII @ 933 MHz, kernel UP 2.4.10 with the polling patch. The NICs are
> e1000: eth0 (irq=24) and eth1 (irq=26).
>
>
> Iface   MTU Met  RX-OK RX-ERR RX-DRP RX-OVR  TX-OK TX-ERR TX-DRP TX-OVR Flags
> eth0   1500   0 4031309 7803725 7803725 5968699     22      0      0      0 BRU
> eth1   1500   0     18      0      0      0 4031305      0      0      0 BRU
>
>
> The RX-ERR, RX-DRP numbers are bugs in the e1000 driver. Anyway, we are
> getting 40% of the packet storm routed, with an estimated throughput of
> about 350,000 p/s.
>
>  irq       CPU0
>  24:      80652   IO-APIC-level  e1000
>  26:         41   IO-APIC-level  e1000
>
> For RX (polling) we use only the irq 24 interrupts. TX is mitigated (not
> polled) in this run. We see a lot more interrupts for the same amount of
> packets; I think we can actually tune this a bit... And I should also say
> that RxIntDelay=0 (e1000 driver), so there is no latency before the
> driver registers for polling with the kernel.
>
> USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
> root         3  3.0  0.0     0     0  ?  SWN  12:51   0:11 (ksoftirqd_CPU0)
>
> The polling (softirq) is now handled by ksoftirqd, but I have seen a
> patch from Ingo that schedules without the need for ksoftirqd.
>
> Also note that during poll we disable only RX interrupts, so all other
> device interrupts/functions are handled properly.
>
> And tulip variants of this are in production use and seem very solid.
> The kernel code part bears the ANK trademark. :-)
>
> Cheers.
>
> 						--ro
>


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02  5:13     ` Benjamin LaHaise
  2001-10-02  5:55       ` Ben Greear
  2001-10-02 12:10       ` jamal
@ 2001-10-02 17:03       ` Robert Olsson
  2001-10-02 17:37         ` jamal
  2001-10-02 19:46         ` Andreas Dilger
  2 siblings, 2 replies; 151+ messages in thread
From: Robert Olsson @ 2001-10-02 17:03 UTC (permalink / raw)
  To: Ben Greear
  Cc: Benjamin LaHaise, jamal, linux-kernel, kuznet, Robert Olsson,
	Ingo Molnar, netdev



Hello!

Jamal mentioned some of the polling efforts for Linux. I can give some
experimental data here with GigE. Motivation, implementation etc. are in
a paper to be presented at USENIX Oakland.

Below is an IP forwarding test: 10 million 64-byte packets injected into
eth0 at a speed of 890,000 p/s, received, routed, and TX:ed on eth1.

PIII @ 933 MHz, kernel UP 2.4.10 with the polling patch. The NICs are
e1000: eth0 (irq=24) and eth1 (irq=26).


Iface   MTU Met  RX-OK RX-ERR RX-DRP RX-OVR  TX-OK TX-ERR TX-DRP TX-OVR Flags
eth0   1500   0 4031309 7803725 7803725 5968699     22      0      0      0 BRU
eth1   1500   0     18      0      0      0 4031305      0      0      0 BRU


The RX-ERR, RX-DRP numbers are bugs in the e1000 driver. Anyway, we are
getting 40% of the packet storm routed, with an estimated throughput of
about 350,000 p/s.

 irq       CPU0       
 24:      80652   IO-APIC-level  e1000
 26:         41   IO-APIC-level  e1000

For RX (polling) we use only the irq 24 interrupts. TX is mitigated (not
polled) in this run. We see a lot more interrupts for the same amount of
packets; I think we can actually tune this a bit... And I should also say
that RxIntDelay=0 (e1000 driver), so there is no latency before the driver
registers for polling with the kernel.

USER       PID %CPU %MEM  SIZE   RSS TTY STAT START   TIME COMMAND
root         3  3.0  0.0     0     0  ?  SWN  12:51   0:11 (ksoftirqd_CPU0)

The polling (softirq) is now handled by ksoftirqd, but I have seen a patch
from Ingo that schedules without the need for ksoftirqd.

Also note that during poll we disable only RX interrupts, so all other
device interrupts/functions are handled properly.

And tulip variants of this are in production use and seem very solid. The
kernel code part bears the ANK trademark. :-)

Cheers.
 
						--ro

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-01 22:50 ` Ben Greear
@ 2001-10-02 14:30   ` Alan Cox
  2001-10-02 20:51     ` Ingo Molnar
  0 siblings, 1 reply; 151+ messages in thread
From: Alan Cox @ 2001-10-02 14:30 UTC (permalink / raw)
  To: Ben Greear; +Cc: mingo, linux-kernel

> I'm all for anything that speeds up (and makes more reliable) high network
> speeds, but I often run with 8+ ethernet devices, so IRQs have to be shared,
> and a 10ms lockdown on an interface could lose lots of packets.  Although
> it's not a perfect solution, maybe you could (in the kernel) multiply the
> max by the number of things using that IRQ?  For example, if you have four
> ethernet drivers on one IRQ, then let that IRQ fire 4 times faster than
> normal before putting it in lockdown...

What you really care about is limiting the total amount of CPU time used for
interrupt processing so that usermode progress is made. The network layer
shows this up particularly badly because (and it's kind of hard to avoid
this) it frees resources on the hardware before userspace has processed them.

Silencing a specific target cannot be done by IRQ masking; you have to
ask the controller to shut up. It may be that the default "shut up"
handler is disable_irq, but that is non-optimal.

Having driver callbacks as part of the irq handler also massively improves
the effect of the event, because faced with an IRQ storm a card can

-	decide if it is the guilty party

If so

-	consider switching to polled mode
-	change its ring buffer size to reduce IRQ load and up latency
	as a tradeoff
-	anything else magical the hardware has (like retuning irq
	mitigation registers)
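
A sketch of what such a per-driver callback could look like (the
irq_overload hook and every mydev_* helper are hypothetical names invented
for this illustration; nothing like this existed in stock 2.4):

	/* Hypothetical "IRQ storm" callback: the core irq code would call
	 * this instead of blindly doing disable_irq() on the shared line. */
	static void mydev_irq_overload(int irq, void *dev_id)
	{
		struct net_device *dev = dev_id;

		if (!mydev_caused_storm(dev))
			return;			/* not the guilty party */

		/* trade latency for fewer interrupts */
		mydev_grow_rx_ring(dev);
		mydev_retune_mitigation(dev);

		/* or give up on rx interrupts entirely */
		mydev_switch_to_rx_polling(dev);
	}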

Alan

	

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02  5:13     ` Benjamin LaHaise
  2001-10-02  5:55       ` Ben Greear
@ 2001-10-02 12:10       ` jamal
  2001-10-02 22:00         ` jamal
  2001-10-03  8:38         ` Ingo Molnar
  2001-10-02 17:03       ` Robert Olsson
  2 siblings, 2 replies; 151+ messages in thread
From: jamal @ 2001-10-02 12:10 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel, kuznet, Robert Olsson, Ingo Molnar, netdev



On Tue, 2 Oct 2001, Benjamin LaHaise wrote:

> On Mon, Oct 01, 2001 at 09:54:49PM -0400, jamal wrote:
>
> > And how does /proc/irq/NR/max_rate solve this?
> > I have a feeling you are trying to say that varying /proc/irq/NR/max_rate
> > gives opportunity for user processes to execute;
> > note, although that is bad logic, you could also modify the high and low
> > watermarks for when we have congestion in the backlog queue
> > (This is already doable via /proc)
>
> The high and low watermarks are only sufficient if the task the machine is
> performing is limited to bh mode operations.  What I mean is that user space
> can be starved by the cyclic nature of the network queues: they will
> eventually be emptied, at which time more interrupts will be permitted.
>

Which is what hardware flow control has been doing since the 2.1 days.

> > It is unfair to add any latency to a device that didn't cause or
> > contribute to the havoc.
>
> I disagree.  When a machine is overloaded, everything gets slower.  But a
> side effect of delaying interrupts is that more work gets done for each
> irq handler that is run and efficiency goes up.  The hard part is balancing
> the two in an attempt to achieve a steady rate of progress.
>

Let me see if I understand this:
- scheme 1: shut down only actions (within a device) that are contributing
to the overload, and only they get affected because they are misbehaving;
when things get better (and we know when they get better), we turn them on
again
- scheme 2: shut down the IRQ, which might in fact include other devices,
for a jiffy or two (which doesn't mean the condition got better)

Are you saying that you disagree that scheme 1 is better?

> > I think you missed my point. I am saying there is more than one source
> > of interrupt for that same IRQ number that you are indiscriminately
> > shutting down in a network device.
>
> You're missing the effect that irq throttling has: it results in a system
> that is effectively running in "polled" mode.  Information does get
> processed, and thruput remains high; it is just that some additional
> latency is found in operations, which is acceptable by definition as
> the system is under extreme load.

Sure. Just like the giant bottom-half lock is acceptable when you can do
fine-grained locking ;->
Don't preach polling to me; I am already a convert, and you attended the
presentation I gave. We've had patches for months which have been running
on live systems. We were just waiting for 2.5 ...

>
> > So, assuming that tx complete interrupts do actually shut you down
> > (although I doubt that very much, given the classical Donald Becker tx
> > descriptor pruning) pick another interrupt source, let's say MII link
> > status; why do you want to kill that when it is not causing any noise
> > but is a source of good asynchronous information (that could be used
> > for example in HA systems)?
>
> That information will eventually be picked up.  I doubt the extra latency
> will be of significant note.  If it is, you've got realtime concerns,
> which is not our goal to address at this time.
>

You are still missing the point (by harping on the literal meaning of the
example I provided); the point is: fine-grained control vs. shutting down
the whole IRQ.

>
> > and what is this "known safe limit"? ;->
>
> It's system dependent.  It's load dependent.  For a short list of the number
> of factors that you have to include to compute this:
>
> 	- number of cycles userspace needs to run
> 	- number of cache misses that userspace is forced to
> 	  incur due to irq handlers running
> 	- amount of time to dedicate to the irq handler
> 	- variance due to error path handling
> 	- increased system cpu usage due to higher memory load
> 	- front side bus speed of cpu
> 	- speed of cpu
> 	- length of cpu pipelines
> 	- time spent waiting on io cycles
> 	.....
>
> It is non-trivial to determine a limit.  And trying to tune a system
> automatically is just as hard: which factor do you choose for the system
> to attempt to tune itself with?  How does that choice affect users who
> want to tune for other loads?  What if latency is more important than
> dropping data?
>
> There are a lot of choices as to how we handle these situations.  They
> all involve tradeoffs of one kind or another.  Personally, I have a
> preference towards irq rate limiting as I have measured the tradeoff
> between latency and thruput, and by putting that control in the hands of
> the admin, the choice that is best for the real load of the system is
> not made at compile time.
>
> If you look at what other operating systems do to schedule interrupts
> as tasks and then look at the actual cost, is it really something we
> want to do?  Linux has made a point of keeping things as simple as
> possible, and it has brought us great wins because we do not have the
> overhead that other, more complicated systems have chosen.  It might
> be a loss in a specific case to rate limit interrupts, but if that is
> so, just change the rate.  What can you say about the dynamic self
> tuning techniques that didn't take into account that particular type
> of load?  Recompiling is not always an option.
>

I am not sure where you are getting the opinion that there is recompiling
involved, or how what we have is complex (the patch is much smaller than
what Ingo posted).
And no, you don't need to maintain any state for all those things in your
list; in 2.4, which is a good start, you have the system load being probed
via a second-order effect, i.e. the growth rate of the backlog queue is a
good indicator that the system is not pulling packets off fast enough.
This is a very good measure of _all_ those items on your list. I am not
saying it is God's answer, merely pointing out that it is a good indicator
which doesn't need to maintain state for 1000 items or cause additional
computations on the datapath.
We get a fairly early warning that we are about to be overloaded. We can
then shut off the offending device's _receive_ interrupt source when it
doesn't heed the congestion notification advice we've been giving it. It
heeds the advice by mitigating.
In the 2.5 patch (I should say it is actually a clean patch to 2.4, and is
backward compatible) we worked around the fact that the 2.4 solution
requires a specific NIC feature (mitigation), among a lot of other things.
In fact we have already proven that mitigation is only good when you have
one or two NICs on the system.

> > What we are providing is actually a scheme to exactly measure that
> > "known safe limit" you are referring to without depending on someone
> > having to tell you "here's a good number for that 8-way Xeon".
> > If there is system capacity available, why the fsck is it not being
> > used?
>
> That's a choice for the admin to make.  Sometimes having reserves that aren't
> used is a safety net that people are willing to pay for.  ext2 has by
> default a reserve that isn't normally used.  Do people complain?  No.  It
> buys several useful features (resistance against fragmentation, space for
> daemon temporary files on disk full, ...) that pay dividends of the cost.
>

I am not sure whether you are trolling or not. We are talking about a
work-conserving system, and you compare it to a reservation system.

> Is irq throttling the be all and end all?  No.  Can other techniques work
> better?  Yes.  Always?  No.  And nothing prevents us from using this and
> other techniques together.  Please don't dismiss it solely because you
> see cases that it doesn't handle.
>

I am not dismissing the whole patch. I most definitely dismiss those
two ideas I pointed out.

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-01 22:16 Ingo Molnar
                   ` (3 preceding siblings ...)
  2001-10-01 23:03 ` Linus Torvalds
@ 2001-10-02  6:50 ` Marcus Sundberg
  2001-10-03  8:47   ` Ingo Molnar
  4 siblings, 1 reply; 151+ messages in thread
From: Marcus Sundberg @ 2001-10-02  6:50 UTC (permalink / raw)
  To: linux-kernel

mingo@elte.hu (Ingo Molnar) writes:

> (note that in case of shared interrupts, another 'innocent' device might
> stay disabled for some short amount of time as well - but this is not an
> issue because this mitigation does not make that device inoperable, it
> just delays its interrupt by up to 10 msecs. Plus, modern systems have
> properly distributed interrupts.)

Guess my P3-based laptop doesn't count as modern then:

  0:    7602983          XT-PIC  timer
  1:      10575          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  8:          1          XT-PIC  rtc
 11:    1626004          XT-PIC  Toshiba America Info Systems ToPIC95 PCI to Cardbus Bridge with ZV Support, Toshiba America Info Systems ToPIC95 PCI to Cardbus Bridge with ZV Support (#2), usb-uhci, eth0, BreezeCom Card, Intel 440MX, irda0
 12:       1342          XT-PIC  PS/2 Mouse
 14:      23605          XT-PIC  ide0

I can't even imagine why they did it like this...

//Marcus
-- 
---------------------------------+---------------------------------
         Marcus Sundberg         |      Phone: +46 707 452062
   Embedded Systems Consultant   |     Email: marcus@cendio.se
        Cendio Systems AB        |      http://www.cendio.com

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02  5:13     ` Benjamin LaHaise
@ 2001-10-02  5:55       ` Ben Greear
  2001-10-03  9:22         ` Ingo Molnar
  2001-10-02 12:10       ` jamal
  2001-10-02 17:03       ` Robert Olsson
  2 siblings, 1 reply; 151+ messages in thread
From: Ben Greear @ 2001-10-02  5:55 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: jamal, linux-kernel, kuznet, Robert Olsson, Ingo Molnar, netdev

Benjamin LaHaise wrote:

> You're missing the effect that irq throttling has: it results in a system
> that is effectively running in "polled" mode.  Information does get
> processed, and thruput remains high; it is just that some additional
> latency is found in operations, which is acceptable by definition as
> the system is under extreme load.

So, when you turn off the IRQs, are the drivers somehow made
aware of this so that they can go into polling mode?  That might
fix the 10ms latency/starvation problem that bothers me...

Assuming it is fairly easy to put a driver into polling mode, if
you are explicitly told to do so, maybe this generic IRQ coalescing
could be the thing that generically poked all drivers.  Drivers
that are too primitive to understand or deal with polling can just
wait their 10ms, but smarter ones will happily poll away until told
not to by the IRQ load limiter...

> That information will eventually be picked up.  I doubt the extra latency
> will be of significant note.  If it is, you've got realtime concerns,
> which is not our goal to address at this time.

I'm more worried about dropped pkts.  If you can receive 10k packets per second,
then you can receive (lose) 100 packets in 10ms....

> 
> > and what is this "known safe limit"? ;->
> 
> It's system dependent.  It's load dependent.  For a short list of the number
> of factors that you have to include to compute this:
> 
>         - number of cycles userspace needs to run
>         - number of cache misses that userspace is forced to
>           incur due to irq handlers running
>         - amount of time to dedicate to the irq handler
>         - variance due to error path handling
>         - increased system cpu usage due to higher memory load
>         - front side bus speed of cpu
>         - speed of cpu
>         - length of cpu pipelines
>         - time spent waiting on io cycles
>         .....

Hopefully, at the very worst, you can have configurables like:
 - User-space responsiveness v/s kernel IRQ handling,
   range of 1 to 100, where 100 == userRules.
 - Latency: 1 who cares, so long as work happens, to 100: Fast and furious, or not at all.

In other words, for god's sake don't make me have to understand how my
cache and CPU pipeline works!! :)


- Another Ben

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02  1:54   ` jamal
@ 2001-10-02  5:13     ` Benjamin LaHaise
  2001-10-02  5:55       ` Ben Greear
                         ` (2 more replies)
  0 siblings, 3 replies; 151+ messages in thread
From: Benjamin LaHaise @ 2001-10-02  5:13 UTC (permalink / raw)
  To: jamal; +Cc: linux-kernel, kuznet, Robert Olsson, Ingo Molnar, netdev

On Mon, Oct 01, 2001 at 09:54:49PM -0400, jamal wrote:
> I am not sure what you are getting at. CPU load is of course a function of
> the CPU capacity. Assuming that interrupts are the only source of system
> load is just bad engineering.

Indeed.  I didn't mean to exclude anything by omission.

> And how does /proc/irq/NR/max_rate solve this?
> I have a feeling you are trying to say that varying /proc/irq/NR/max_rate
> gives opportunity for user processes to execute;
> note, although that is bad logic, you could also modify the high and low
> watermarks for when we have congestion in the backlog queue
> (This is already doable via /proc)

The high and low watermarks are only sufficient if the task the machine is 
performing is limited to bh mode operations.  What I mean is that user space 
can be starved by the cyclic nature of the network queues: they will 
eventually be emptied, at which time more interrupts will be permitted.

> It is unfair to add any latency to a device that didn't cause or
> contribute to the havoc.

I disagree.  When a machine is overloaded, everything gets slower.  But a 
side effect of delaying interrupts is that more work gets done for each 
irq handler that is run and efficiency goes up.  The hard part is balancing 
the two in an attempt to achieve a steady rate of progress.

> I think you missed my point. I am saying there is more than one source
> of interrupt for that same IRQ number that you are indiscriminately
> shutting down in a network device.

You're missing the effect that irq throttling has: it results in a system
that is effectively running in "polled" mode.  Information does get
processed, and thruput remains high; it is just that some additional
latency is found in operations, which is acceptable by definition as
the system is under extreme load.

> So, assuming that tx complete interrupts do actually shut you down
> (although I doubt that very much, given the classical Donald Becker tx
> descriptor pruning) pick another interrupt source, let's say MII link
> status; why do you want to kill that when it is not causing any noise
> but is a source of good asynchronous information (that could be used
> for example in HA systems)?

That information will eventually be picked up.  I doubt the extra latency 
will be of significant note.  If it is, you've got realtime concerns, 
which is not our goal to address at this time.


> and what is this "known safe limit"? ;->

It's system dependent.  It's load dependent.  For a short list of the number
of factors that you have to include to compute this:

	- number of cycles userspace needs to run
	- number of cache misses that userspace is forced to 
	  incur due to irq handlers running
	- amount of time to dedicate to the irq handler
	- variance due to error path handling
	- increased system cpu usage due to higher memory load
	- front side bus speed of cpu
	- speed of cpu
	- length of cpu pipelines
	- time spent waiting on io cycles
	.....

It is non-trivial to determine a limit.  And trying to tune a system 
automatically is just as hard: which factor do you choose for the system 
to attempt to tune itself with?  How does that choice affect users who 
want to tune for other loads?  What if latency is more important than 
dropping data?

There are a lot of choices as to how we handle these situations.  They 
all involve tradeoffs of one kind or another.  Personally, I have a 
preference towards irq rate limiting as I have measured the tradeoff 
between latency and thruput, and by putting that control in the hands of 
the admin, the choice that is best for the real load of the system is 
not made at compile time.

If you look at what other operating systems do to schedule interrupts 
as tasks and then look at the actual cost, is it really something we
want to do?  Linux has made a point of keeping things as simple as 
possible, and it has brought us great wins because we do not have the 
overhead that other, more complicated systems have chosen.  It might 
be a loss in a specific case to rate limit interrupts, but if that is 
so, just change the rate.  What can you say about the dynamic self 
tuning techniques that didn't take into account that particular type 
of load?  Recompiling is not always an option.

> What we are providing is actually a scheme to exactly measure that
> "known safe limit" you are referring to without depending on someone
> having to tell you "here's a good number for that 8-way Xeon".
> If there is system capacity available, why the fsck is it not being used?

That's a choice for the admin to make.  Sometimes having reserves that aren't 
used is a safety net that people are willing to pay for.  ext2 has by 
default a reserve that isn't normally used.  Do people complain?  No.  It 
buys several useful features (resistance against fragmentation, space for 
daemon temporary files on disk full, ...) that pay dividends of the cost.

Is irq throttling the be all and end all?  No.  Can other techniques work
better?  Yes.  Always?  No.  And nothing prevents us from using this and 
other techniques together.  Please don't dismiss it solely because you 
see cases that it doesn't handle.

		-ben

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02  1:04 ` Benjamin LaHaise
@ 2001-10-02  1:54   ` jamal
  2001-10-02  5:13     ` Benjamin LaHaise
  0 siblings, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-02  1:54 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: linux-kernel, kuznet, Robert Olsson, Ingo Molnar, netdev



On Mon, 1 Oct 2001, Benjamin LaHaise wrote:

> On Mon, Oct 01, 2001 at 08:41:20PM -0400, jamal wrote:
> >
> > >The new mechanism:
> > >
> > >- the irq handling code has been extended to support 'soft mitigation',
> > >  ie. to mitigate the rate of hardware interrupts, without support from
> > >  the actual hardware. There is a reasonable default, but the value can
> > >  also be decreased/increased on a per-irq basis via
> > > /proc/irq/NR/max_rate.
> >
> > I am sorry, but this is bogus. There is no _reasonable value_. A
> > reasonable value is dependent on system load, and has never been and
> > never will be measured by interrupt rates, even in non-work-conserving
> > schemes.
>
> It is not dependent on system load, but rather on the performance of the
> CPU and the number of interrupt sources in the system.

I am not sure what you are getting at. CPU load is of course a function of
the CPU capacity. Assuming that interrupts are the only source of system
load is just bad engineering.

>
> > There is already a feedback system built into 2.4 that measures system
> > load by the rate at which the system processes the backlog queue. Look
> > at the netif_rx return values. The only driver that currently utilizes
> > this is the tulip. Look at the tulip code.
> > This, in conjunction with hardware flow control, should give you a
> > sustainable system.
>
> Not quite.  You're still ignoring the effect of interrupts on the users'
> ability to execute instructions during their timeslice.
>

And how does /proc/irq/NR/max_rate solve this?
I have a feeling you are trying to say that varying /proc/irq/NR/max_rate
gives opportunity for user processes to execute;
note, although that is bad logic, you could also modify the high and low
watermarks for when we have congestion in the backlog queue
(This is already doable via /proc)

> > [Granted that mitigation is a hardware-specific solution; the scheme we
> > presented at the kernel summit is the next level to this and will not
> > be dependent on h/ware.]
> >
> > >(note that in case of shared interrupts, another 'innocent' device might
> > >stay disabled for some short amount of time as well - but this is not an
> > >issue because this mitigation does not make that device inoperable, it
> > >just delays its interrupt by up to 10 msecs. Plus, modern systems have
> > >properly distributed interrupts.)
> >
> > This is a _really bad_ idea, not just because you are punishing other
> > devices.
>
> I'm afraid I have to disagree with you on this statement.  What I will
> agree with is that 10msec is too much.
>

It is unfair to add any latency to a device that didn't cause or
contribute to the havoc.


> > Let's take network devices as an example: we don't want to disable
> > interrupts; we want to disable offending actions within the device. For
> > example, it is ok to disable/mitigate receive interrupts because they
> > are overloading the system, but not transmit completion, because that
> > will add to the overall latency.
>
> Wrong.  Let me introduce you to my 486DX/33.  It has PCI.  I'm putting my
> gige card into the poor beast.  Transmitting full out, it can receive a
> sufficiently high number of tx done interrupts that it has no CPU cycles
> left to run, say, gated in userspace.
>

I think you missed my point. I am saying there is more than one source of
interrupt for that same IRQ number that you are indiscriminately shutting
down in a network device.
So, assuming that tx complete interrupts do actually shut you down
(although I doubt that very much, given the classical Donald Becker tx
descriptor pruning) pick another interrupt source, let's say MII link
status; why do you want to kill that when it is not causing any noise but
is a source of good asynchronous information (that could be used for
example in HA systems)?

> Falling back to polled operation is a well known technique in realtime and
> reliable systems.  By limiting the interrupt rate to a known safe limit,
> the system will remain responsive to non-interrupt tasks even under heavy
> interrupt loads.  This is the point at which a thruput graph on a slow
> machine shows a complete breakdown in performance, which is always possible
> on a slow enough CPU with a high performance device that takes input from
> a remotely controlled user.  This is *required*, and is not optional, and
> there is no way that a system can avoid it without making every interrupt
> a task, but that's a mess nobody wants to see in Linux.
>

and what is this "known safe limit"? ;->
What we are providing is actually a scheme to exactly measure that "known
safe limit" you are referring to without depending on someone having to
tell you "here's a good number for that 8-way Xeon".
If there is system capacity available, why the fsck is it not being used?

cheers,
jamal


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-02  0:41 jamal
@ 2001-10-02  1:04 ` Benjamin LaHaise
  2001-10-02  1:54   ` jamal
  0 siblings, 1 reply; 151+ messages in thread
From: Benjamin LaHaise @ 2001-10-02  1:04 UTC (permalink / raw)
  To: jamal; +Cc: linux-kernel, kuznet, Robert Olsson, Ingo Molnar, netdev

On Mon, Oct 01, 2001 at 08:41:20PM -0400, jamal wrote:
> 
> >The new mechanism:
> >
> >- the irq handling code has been extended to support 'soft mitigation',
> >  ie. to mitigate the rate of hardware interrupts, without support from
> >  the actual hardware. There is a reasonable default, but the value can
> >  also be decreased/increased on a per-irq basis via
> > /proc/irq/NR/max_rate.
> 
> I am sorry, but this is bogus. There is no _reasonable value_. A
> reasonable value is dependent on system load, and has never been and
> never will be measured by interrupt rates, even in non-work-conserving
> schemes.

It is not dependent on system load, but rather on the performance of the
CPU and the number of interrupt sources in the system.

> There is already a feedback system built into 2.4 that measures system
> load by the rate at which the system processes the backlog queue. Look
> at the netif_rx return values. The only driver that currently utilizes
> this is the tulip. Look at the tulip code.
> This, in conjunction with hardware flow control, should give you a
> sustainable system.

Not quite.  You're still ignoring the effect of interrupts on the users' 
ability to execute instructions during their timeslice.

> [Granted that mitigation is a hardware-specific solution; the scheme we
> presented at the kernel summit is the next level to this and will not be
> dependent on h/ware.]
> 
> >(note that in case of shared interrupts, another 'innocent' device might
> >stay disabled for some short amount of time as well - but this is not an
> >issue because this mitigation does not make that device inoperable, it
> >just delays its interrupt by up to 10 msecs. Plus, modern systems have
> >properly distributed interrupts.)
> 
> This is a _really bad_ idea, not just because you are punishing other
> devices.

I'm afraid I have to disagree with you on this statement.  What I will 
agree with is that 10msec is too much.

> Let's take network devices as an example: we don't want to disable
> interrupts; we want to disable offending actions within the device. For
> example, it is ok to disable/mitigate receive interrupts because they are
> overloading the system, but not transmit completion, because that will
> add to the overall latency.

Wrong.  Let me introduce you to my 486DX/33.  It has PCI.  I'm putting my
gige card into the poor beast.  Transmitting full out, it can receive a
sufficiently high number of tx done interrupts that it has no CPU cycles left
to run, say, gated in userspace.

Falling back to polled operation is a well known technique in realtime and 
reliable systems.  By limiting the interrupt rate to a known safe limit, 
the system will remain responsive to non-interrupt tasks even under heavy 
interrupt loads.  This is the point at which a thruput graph on a slow 
machine shows a complete breakdown in performance, which is always possible 
on a slow enough CPU with a high performance device that takes input from 
a remotely controlled user.  This is *required*, and is not optional, and 
there is no way that a system can avoid it without making every interrupt 
a task, but that's a mess nobody wants to see in Linux.

		-ben


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
@ 2001-10-02  0:41 jamal
  2001-10-02  1:04 ` Benjamin LaHaise
  0 siblings, 1 reply; 151+ messages in thread
From: jamal @ 2001-10-02  0:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: kuznet, Robert Olsson, Ingo Molnar, netdev


>The new mechanism:
>
>- the irq handling code has been extended to support 'soft mitigation',
>  ie. to mitigate the rate of hardware interrupts, without support from
>  the actual hardware. There is a reasonable default, but the value can
>  also be decreased/increased on a per-irq basis via
> /proc/irq/NR/max_rate.

I am sorry, but this is bogus. There is no _reasonable value_. A reasonable
value is dependent on system load, and has never been and never will be
measured by interrupt rates, even in non-work-conserving schemes.
There is already a feedback system built into 2.4 that measures system
load by the rate at which the system processes the backlog queue. Look
at the netif_rx return values. The only driver that currently utilizes
this is the tulip. Look at the tulip code.
This, in conjunction with hardware flow control, should give you a
sustainable system.
[Granted that mitigation is a hardware-specific solution; the scheme we
presented at the kernel summit is the next level to this and will not be
dependent on h/ware.]

>(note that in case of shared interrupts, another 'innocent' device might
>stay disabled for some short amount of time as well - but this is not an
>issue because this mitigation does not make that device inoperable, it
>just delays its interrupt by up to 10 msecs. Plus, modern systems have
>properly distributed interrupts.)

This is a _really bad_ idea, not just because you are punishing other
devices.
Let's take network devices as an example: we don't want to disable
interrupts; we want to disable offending actions within the device. For
example, it is ok to disable/mitigate receive interrupts because they are
overloading the system, but not transmit completion, because that will
add to the overall latency.

cheers,
jamal


PS: we have been testing what was presented at the kernel summit for the
last few months with very promising results, both on live systems and on
experimental setups where data is generated at very high rates with
hardware traffic generators.


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-01 22:16 Ingo Molnar
                   ` (2 preceding siblings ...)
  2001-10-01 22:50 ` Ben Greear
@ 2001-10-01 23:03 ` Linus Torvalds
  2001-10-02  6:50 ` Marcus Sundberg
  4 siblings, 0 replies; 151+ messages in thread
From: Linus Torvalds @ 2001-10-01 23:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Alan Cox, Alexey Kuznetsov, Andrea Arcangeli, Simon Kirby


On Tue, 2 Oct 2001, Ingo Molnar wrote:
>
> - the irq handling code has been extended to support 'soft mitigation',
>   ie. to mitigate the rate of hardware interrupts, without support from
>   the actual hardware. There is a reasonable default, but the value can
>   also be decreased/increased on a per-irq basis via /proc/irq/NR/max_rate.

And how do you select max_rate sanely? It depends on how heavy each
interrupt is, the speed of the CPU etc etc. A rate that works for a
network card with a certain packet size may be completely ineffective on
the same machine with the same network card but a different packet size.

When you select the wrong number, you slow the system down for no good
reason (too low a number) or your mitigation has zero effect because the
system can't do that many interrupts per tick anyway (too high a number).

Saying "hey, that's the users problem", is _not_ a solution. It needs to
have some automatic cut-off that finds the right sustainable rate
automatically, instead of hardcoding random default values and asking the
user to know the unknowable.

Automatically doing the right thing may be hard, but it should be
solvable. In particular, something like the following _may_ be a workable
approach, rather than having a hardcoded limit:

 - have a notion of "made progress". Certain events count as progress, and
   will reset the interrupt count.
	Examples of "progress":
		- idle task loop
		- a context switch

 - depend on the fact that on a PC, the timer interrupt has the highest
   priority, and make the timer interrupt do something like

	if (!made_progress) {
		disable_next_irq = 1;
	} else
		made_progress = 0;

 - have all other interrupts do something like

	if (disable_next_irq)
		goto mitigate;

which just says that we mitigate an irq _only_ if we didn't make any
progress at all, rather than mitigating on some random count that can
never be perfect.

(Tweak to suit your own definition of "made progress" - maybe you'd like
to require more than just a context switch).
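
Pulled together, the suggestion might look something like the sketch below
(made_progress and disable_next_irq come from the pseudocode above; where
the progress hooks get called, and what "mitigate" does, are assumptions):

	static int made_progress, disable_next_irq;

	/* called from the points that count as "progress":
	 * the idle loop, a context switch in schedule(), ... */
	void note_progress(void)
	{
		made_progress = 1;
	}

	/* from the timer interrupt, which has the highest priority on a PC */
	void timer_tick_check(void)
	{
		if (!made_progress)
			disable_next_irq = 1;	/* starved: arm mitigation */
		else
			made_progress = 0;	/* consume the progress mark */
	}

	/* at the top of every other interrupt handler */
	int should_mitigate_irq(void)
	{
		if (disable_next_irq) {
			disable_next_irq = 0;
			return 1;	/* caller masks the irq for now */
		}
		return 0;
	}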

		Linus


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-01 22:16 Ingo Molnar
  2001-10-01 22:26 ` Tim Hockin
  2001-10-01 22:36 ` Andreas Dilger
@ 2001-10-01 22:50 ` Ben Greear
  2001-10-02 14:30   ` Alan Cox
  2001-10-01 23:03 ` Linus Torvalds
  2001-10-02  6:50 ` Marcus Sundberg
  4 siblings, 1 reply; 151+ messages in thread
From: Ben Greear @ 2001-10-01 22:50 UTC (permalink / raw)
  To: mingo; +Cc: linux-kernel

Ingo Molnar wrote:

> (note that in case of shared interrupts, another 'innocent' device might
> stay disabled for some short amount of time as well - but this is not an
> issue because this mitigation does not make that device inoperable, it
> just delays its interrupt by up to 10 msecs. Plus, modern systems have
> properly distributed interrupts.)

I'm all for anything that speeds up (and makes more reliable) high network
speeds, but I often run with 8+ ethernet devices, so IRQs have to be shared,
and a 10ms lockdown on an interface could lose lots of packets.  Although
it's not a perfect solution, maybe you could (in the kernel) multiply the
max by the number of things using that IRQ?  For example, if you have four
ethernet drivers on one IRQ, then let that IRQ fire 4 times faster than
normal before putting it in lockdown...
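
A tiny sketch of that idea against 2.4's irq_desc_t (the action-list walk
uses real 2.4 structures; the per-jiffy irq_count and the max_rate argument
are guesses at how the irqrate patch might keep its state):

	#include <linux/irq.h>
	#include <linux/interrupt.h>

	/* Scale the limit by the number of handlers sharing the line, so a
	 * shared IRQ may fire proportionally faster before lockdown. */
	static int irq_over_rate(irq_desc_t *desc, unsigned int max_rate)
	{
		unsigned int nr_actions = 0;
		struct irqaction *a;

		for (a = desc->action; a; a = a->next)
			nr_actions++;	/* one entry per device on this IRQ */

		return desc->irq_count > max_rate * nr_actions;
	}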

Do you have any idea how many packets-per-second you can get out of a
system (obviously, your system of choice) using your updated code?

(I'm running about 7k packets-per-second tx, and 7k rx, on 3 EEPRO ports
simultaneously on a 1GHz PIII and 2.4.9-pre10...  This is from user-space,
so much of the CPU is spent hauling my packets to and from the device..)

Ben

-- 
Ben Greear <greearb@candelatech.com>          <Ben_Greear@excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-01 22:26 ` Tim Hockin
@ 2001-10-01 22:50   ` Ingo Molnar
  0 siblings, 0 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-01 22:50 UTC (permalink / raw)
  To: Tim Hockin
  Cc: linux-kernel, Linus Torvalds, Alan Cox, Alexey Kuznetsov,
	Andrea Arcangeli, Simon Kirby


On Mon, 1 Oct 2001, Tim Hockin wrote:

> Our solution/needs are slightly different - we want to service as many
> interrupts as possible and do as much network traffic as possible, and
> interactive-tasks be damned.

the patch in fact enables this too: you can more aggressively get irqs
and softirqs executed by increasing max_rate just above the 'critical'
rate you can measure. (and the blocked-interrupts period of time will be
enough to let the softirq work finish.) So in fact you might even
end up having higher performance by blocking interrupts in a certain
portion of a timer tick - backlogged work will be processed. Via max_rate
you can partition the percentage of CPU time dedicated to softirq and
process work. (which in your case would be softirq-only work - which
should not be underestimated either.)
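
As an illustration of that tuning (the figure is invented, not a
measurement): a box whose softirq processing starts lagging at around
30k irqs/sec could pin the limit just above that via the patch's
set_irq_rate() hook:

	#define CRITICAL_IRQ_RATE 30000	/* hypothetical, measured per box */

	static void tune_for_softirq_throughput(unsigned int irq)
	{
		/* ~10% headroom above the measured critical rate */
		set_irq_rate(irq, CRITICAL_IRQ_RATE + CRITICAL_IRQ_RATE / 10);
	}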

	Ingo


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-01 22:16 Ingo Molnar
  2001-10-01 22:26 ` Tim Hockin
@ 2001-10-01 22:36 ` Andreas Dilger
  2001-10-01 22:50 ` Ben Greear
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 151+ messages in thread
From: Andreas Dilger @ 2001-10-01 22:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Linus Torvalds, Alan Cox, Alexey Kuznetsov,
	Andrea Arcangeli, Simon Kirby

On Oct 02, 2001  00:16 +0200, Ingo Molnar wrote:
> - the irq handling code has been extended to support 'soft mitigation',
>   ie. to mitigate the rate of hardware interrupts, without support from
>   the actual hardware. There is a reasonable default, but the value can
>   also be decreased/increased on a per-irq basis via /proc/irq/NR/max_rate.
> 
> the method is the following. We count the number of interrupts serviced,
> and if within a jiffy there are more than max_rate/HZ interrupts, the code
> disables the IRQ source and marks it as IRQ_MITIGATED. On the next timer
> interrupt the irq_rate_check() function is called, which makes sure that
> 'blocked' irqs are restarted & handled properly.

How far a step is it from a mitigated IRQ (mitigated because of too high
an interrupt rate) to a polled interface (e.g. for network cards)?  This
has been discussed a number of times as a way to improve overall
performance on busy network systems.

Conceivably, a network card could tune max_rate to a value where it is
more efficient (CPU-wise) to poll the interface instead of using IRQs.
However, waiting for the next regular timer interrupt may be too long
(resulting in lost packets as buffers overflow).  Would it also be
possible for a driver to register a "maximum delay" between servicing
interrupts (within reason, on a non-RT system) so that it can say "I
have X kB of buffers, and the maximum line rate is Y kB/s, so I need
to be serviced within X/Y s when polling without losing data"?
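
As a back-of-the-envelope check (the helper is hypothetical - nothing
like it exists in the patch): the posted code re-enables a mitigated
irq on the next timer tick, i.e. after up to 1000000/HZ microseconds,
so the X kB / Y kB/s deadline is only safe if it exceeds one tick:

	/* buf_kb kilobytes of buffering at a worst-case line rate of
	 * rate_kbps kilobytes/sec overflows after buf_kb/rate_kbps
	 * seconds; compare that against one timer tick */
	static int mitigation_delay_is_safe(unsigned int buf_kb,
					    unsigned int rate_kbps)
	{
		unsigned int max_delay_usec = 1000000 / HZ;
		unsigned int overflow_usec = (buf_kb * 1000000) / rate_kbps;

		return overflow_usec > max_delay_usec;
	}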

Cheers, Andreas
--
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


^ permalink raw reply	[flat|nested] 151+ messages in thread

* Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
  2001-10-01 22:16 Ingo Molnar
@ 2001-10-01 22:26 ` Tim Hockin
  2001-10-01 22:50   ` Ingo Molnar
  2001-10-01 22:36 ` Andreas Dilger
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 151+ messages in thread
From: Tim Hockin @ 2001-10-01 22:26 UTC (permalink / raw)
  To: mingo
  Cc: linux-kernel, Linus Torvalds, Alan Cox, Alexey Kuznetsov,
	Andrea Arcangeli, Simon Kirby

> - a little utility written by Simon Kirby proved that no matter how much
>   softirq throttling, it's easy to lock up a pretty powerful Linux
>   box via a high rate of network interrupts, from relatively low-powered
>   clients as well. 2.4.6, 2.4.7, 2.4.10 all lock up. Alexey said it as
>   well that it's still easy to lock up low-powered Linux routers via more
>   or less normal traffic.

We proved this a year+ ago.  We've got some code brewing to do fair sharing
of IRQs for heavy load situations.  I don't have all the details, but
eventually...

> i've tested the patch on both UP, SMP, XT-PIC and APIC systems, it
> correctly limits network interrupt rates (and other device interrupt
> rates) to the given limit. I've done stress-testing as well. The patch is
> against 2.4.11-pre1, but it applies just fine to the -ac tree as well.

Our solution/needs are slightly different - we want to service as many
interrupts as possible and do as much network traffic as possible, and
interactive-tasks be damned.

^ permalink raw reply	[flat|nested] 151+ messages in thread

* [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
@ 2001-10-01 22:16 Ingo Molnar
  2001-10-01 22:26 ` Tim Hockin
                   ` (4 more replies)
  0 siblings, 5 replies; 151+ messages in thread
From: Ingo Molnar @ 2001-10-01 22:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Alan Cox, Alexey Kuznetsov, Andrea Arcangeli,
	Simon Kirby

[-- Attachment #1: Type: TEXT/PLAIN, Size: 6202 bytes --]


to sum things up, we have three main problem areas that are connected to
hardirq and softirq processing:

- a little utility written by Simon Kirby proved that no matter how much
  softirq throttling, it's easy to lock up a pretty powerful Linux
  box via a high rate of network interrupts, from relatively low-powered
  clients as well. 2.4.6, 2.4.7, 2.4.10 all lock up. Alexey said it as
  well that it's still easy to lock up low-powered Linux routers via more
  or less normal traffic.

- prior to 2.4.7 we used to 'leak' softirq handling => we ended up missing
  softirqs in a number of circumstances. Stock 2.4.10 still has a number
  of places that do this too.

- a number of people have reported gigabit performance problems (some
  people reported a 10-20% drop in performance under load) since
  ksoftirqd was added - ksoftirqd itself having been added to fix some
  of the 2.4.6 softirq-handling latency problems.

we also have another problem that often pops up when the BIOS goes bad or
a device driver makes some mistake:

- Linux often 'locks up' if it gets into an 'interrupt storm' - when an
  interrupt source sends a very high rate of interrupts. This can be
  seen as boot-time hangs and module-insert-time hangs as well.

the attached patch, while a bit radical, is i believe a robust solution to
all four problems. It gives gigabit performance back, avoids the lockups
and attempts to reach as short a softirq-processing latency as possible.

the new mechanism:

- the irq handling code has been extended to support 'soft mitigation',
  ie. to mitigate the rate of hardware interrupts, without support from
  the actual hardware. There is a reasonable default, but the value can
  also be decreased/increased on a per-irq basis via /proc/irq/NR/max_rate.

the method is the following. We count the number of interrupts serviced,
and if within a jiffy there are more than max_rate/HZ interrupts, the code
disables the IRQ source and marks it as IRQ_MITIGATED. On the next timer
interrupt the irq_rate_check() function is called, which makes sure that
'blocked' irqs are restarted & handled properly. The interrupt is disabled
in the interrupt controller, which has the nice side-effect of fixing and
blocking interrupt storms. (The support code for 'soft mitigation' is
designed to be very lightweight, it's a decrement and a test in the IRQ
handling hot path.)

(note that in case of shared interrupts, another 'innocent' device might
stay disabled for some short amount of time as well - but this is not an
issue because this mitigation does not make that device inoperable, it
just delays its interrupt by up to 10 msecs. Plus, modern systems have
properly distributed interrupts.)

- softirq code got simplified significantly. The concept is to 'handle all
  pending softirqs' - just as the hardware IRQ code 'handles all hardware
  interrupts that were passed to it'. Since most of the time there is a
  direct relationship between softirq work and hardirq work, the
  mitigation of hardirqs mitigates softirq load as well.

- ksoftirqd is gone, there is never any softirq pending while
  softirq-unaware code is executing.

- the tasklet code needed some cleanup along the way, and it also gained
  some restart-on-enable and restart-on-unlock properties that it lacked
  before (but which are desirable).

due to these changes, the linecount in softirq.c got smaller by 25%.
[i dropped the unwakeup change - but that one could be useful in the VM,
to eg. unwakeup bdflush or kswapd.]

- drivers can optionally use the set_irq_rate(irq, new_rate) call to
  change the current IRQ rate. Drivers are the ones who know best what
  kind of loads to expect from the hardware, so they might want to
  influence this value. Also, drivers that implement IRQ mitigation
  themselves in hardware, can effectively disable the soft-mitigation code
  by using a very high rate value.
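
A driver-side sketch of that hook (the device and the numbers are
invented; mynic_open is hypothetical):

	static int mynic_open(struct net_device *dev)
	{
		/*
		 * This NIC mitigates rx/tx interrupts in hardware, so
		 * effectively opt out of the soft limit with a very
		 * high rate value.
		 */
		set_irq_rate(dev->irq, 1000000);

		/* ... the normal open path would follow ... */
		return 0;
	}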

what is the concept behind all this? Simplicity, and conceptual clarity.
We were clearly heading in the wrong direction: putting more complexity
into the core softirq code to handle some really extreme and unusual
cases. Also, softirqs were slowly morphing into something process-ish -
but in Linux we already have a concept of processes, so we'd have two
duelling concepts. (We still have tasklets, which are not really
processes - they are single-threaded paths of execution.)

with this patch, softirqs can again be what they should be: lightweight
'interrupt code' that processes hard-IRQ events but still does this with
interrupts enabled, to allow for low hard-IRQ latencies. Anything that is
conceptually heavyweight IMO does not belong in softirqs; it should be
moved into process context. That will take care of CPU-time usage
accounting and CPU-time-limiting and priority issues as well.

(the patch also imports the latency and softirq-restart fixes from my
previous softirq patches.)

i've tested the patch on both UP, SMP, XT-PIC and APIC systems, it
correctly limits network interrupt rates (and other device interrupt
rates) to the given limit. I've done stress-testing as well. The patch is
against 2.4.11-pre1, but it applies just fine to the -ac tree as well.

with a high irq-rate limit set, ping flooding has this effect on the
test-system:

 [root@mars /root]# vmstat 1
    procs                      memory    swap          io
  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in
  0  0  0      0 877024   1140  11364   0   0    12     0 30960
  0  0  0      0 877024   1140  11364   0   0     0     0 30950
  0  0  0      0 877024   1140  11364   0   0     0     0 30520

ie. 30k interrupts/sec. With the max_rate set to 1000 interrupts/sec:

 [root@mars /root]# echo 1000 > /proc/irq/21/max_rate
 [root@mars /root]# vmstat 1
    procs                      memory    swap          io
  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in
  0  0  0      0 877004   1144  11372   0   0     0     0 1112
  0  0  0      0 877004   1144  11372   0   0     0     0 1111
  0  0  0      0 877004   1144  11372   0   0     0     0 1111

so it works just fine here. Interactive tasks are still snappy over the
same interface.

Comments, reports, suggestions and testing feedback are more than welcome,

	Ingo

[-- Attachment #2: Type: TEXT/PLAIN, Size: 26601 bytes --]

--- linux/kernel/ksyms.c.orig	Mon Oct  1 21:52:32 2001
+++ linux/kernel/ksyms.c	Mon Oct  1 21:52:43 2001
@@ -538,8 +538,6 @@
 EXPORT_SYMBOL(tasklet_kill);
 EXPORT_SYMBOL(__run_task_queue);
 EXPORT_SYMBOL(do_softirq);
-EXPORT_SYMBOL(raise_softirq);
-EXPORT_SYMBOL(cpu_raise_softirq);
 EXPORT_SYMBOL(__tasklet_schedule);
 EXPORT_SYMBOL(__tasklet_hi_schedule);
 
--- linux/kernel/softirq.c.orig	Mon Oct  1 21:52:32 2001
+++ linux/kernel/softirq.c	Mon Oct  1 21:53:52 2001
@@ -44,26 +44,11 @@
 
 static struct softirq_action softirq_vec[32] __cacheline_aligned;
 
-/*
- * we cannot loop indefinitely here to avoid userspace starvation,
- * but we also don't want to introduce a worst case 1/HZ latency
- * to the pending events, so lets the scheduler to balance
- * the softirq load for us.
- */
-static inline void wakeup_softirqd(unsigned cpu)
-{
-	struct task_struct * tsk = ksoftirqd_task(cpu);
-
-	if (tsk && tsk->state != TASK_RUNNING)
-		wake_up_process(tsk);
-}
-
 asmlinkage void do_softirq()
 {
 	int cpu = smp_processor_id();
 	__u32 pending;
 	long flags;
-	__u32 mask;
 
 	if (in_interrupt())
 		return;
@@ -75,7 +60,6 @@
 	if (pending) {
 		struct softirq_action *h;
 
-		mask = ~pending;
 		local_bh_disable();
 restart:
 		/* Reset the pending bitmask before enabling irqs */
@@ -95,152 +79,130 @@
 		local_irq_disable();
 
 		pending = softirq_pending(cpu);
-		if (pending & mask) {
-			mask &= ~pending;
+		if (pending)
 			goto restart;
-		}
 		__local_bh_enable();
-
-		if (pending)
-			wakeup_softirqd(cpu);
 	}
 
 	local_irq_restore(flags);
 }
 
-/*
- * This function must run with irq disabled!
- */
-inline void cpu_raise_softirq(unsigned int cpu, unsigned int nr)
-{
-	__cpu_raise_softirq(cpu, nr);
-
-	/*
-	 * If we're in an interrupt or bh, we're done
-	 * (this also catches bh-disabled code). We will
-	 * actually run the softirq once we return from
-	 * the irq or bh.
-	 *
-	 * Otherwise we wake up ksoftirqd to make sure we
-	 * schedule the softirq soon.
-	 */
-	if (!(local_irq_count(cpu) | local_bh_count(cpu)))
-		wakeup_softirqd(cpu);
-}
-
-void raise_softirq(unsigned int nr)
-{
-	long flags;
-
-	local_irq_save(flags);
-	cpu_raise_softirq(smp_processor_id(), nr);
-	local_irq_restore(flags);
-}
-
 void open_softirq(int nr, void (*action)(struct softirq_action*), void *data)
 {
 	softirq_vec[nr].data = data;
 	softirq_vec[nr].action = action;
 }
 
-
 /* Tasklets */
 
 struct tasklet_head tasklet_vec[NR_CPUS] __cacheline_aligned;
 struct tasklet_head tasklet_hi_vec[NR_CPUS] __cacheline_aligned;
 
-void __tasklet_schedule(struct tasklet_struct *t)
+static inline void __tasklet_enable(struct tasklet_struct *t,
+					struct tasklet_head *vec, int softirq)
 {
 	int cpu = smp_processor_id();
-	unsigned long flags;
 
-	local_irq_save(flags);
-	t->next = tasklet_vec[cpu].list;
-	tasklet_vec[cpu].list = t;
-	cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
-	local_irq_restore(flags);
+	smp_mb__before_atomic_dec();
+	if (!atomic_dec_and_test(&t->count))
+		return;
+
+	local_irq_disable();
+	/*
+	 * Being able to clear the SCHED bit from 1 to 0 means
+	 * we got the right to handle this tasklet.
+	 * Setting it from 0 to 1 means we can queue it.
+	 */
+	if (test_and_clear_bit(TASKLET_STATE_SCHED, &t->state) && !t->next) {
+		if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state)) {
+
+			t->next = (vec + cpu)->list;
+			(vec + cpu)->list = t;
+			__cpu_raise_softirq(cpu, softirq);
+		}
+	}
+	local_irq_enable();
+	rerun_softirqs(cpu);
 }
 
-void __tasklet_hi_schedule(struct tasklet_struct *t)
+void tasklet_enable(struct tasklet_struct *t)
+{
+	__tasklet_enable(t, tasklet_vec, TASKLET_SOFTIRQ);
+}
+
+void tasklet_hi_enable(struct tasklet_struct *t)
+{
+	__tasklet_enable(t, tasklet_hi_vec, HI_SOFTIRQ);
+}
+
+static inline void __tasklet_sched(struct tasklet_struct *t,
+					struct tasklet_head *vec, int softirq)
 {
 	int cpu = smp_processor_id();
 	unsigned long flags;
 
 	local_irq_save(flags);
-	t->next = tasklet_hi_vec[cpu].list;
-	tasklet_hi_vec[cpu].list = t;
-	cpu_raise_softirq(cpu, HI_SOFTIRQ);
+	t->next = (vec + cpu)->list;
+	(vec + cpu)->list = t;
+	__cpu_raise_softirq(cpu, softirq);
 	local_irq_restore(flags);
+	rerun_softirqs(cpu);
 }
 
-static void tasklet_action(struct softirq_action *a)
+void __tasklet_schedule(struct tasklet_struct *t)
 {
-	int cpu = smp_processor_id();
-	struct tasklet_struct *list;
-
-	local_irq_disable();
-	list = tasklet_vec[cpu].list;
-	tasklet_vec[cpu].list = NULL;
-	local_irq_enable();
-
-	while (list) {
-		struct tasklet_struct *t = list;
-
-		list = list->next;
-
-		if (tasklet_trylock(t)) {
-			if (!atomic_read(&t->count)) {
-				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
-					BUG();
-				t->func(t->data);
-				tasklet_unlock(t);
-				continue;
-			}
-			tasklet_unlock(t);
-		}
+	__tasklet_sched(t, tasklet_vec, TASKLET_SOFTIRQ);
+}
 
-		local_irq_disable();
-		t->next = tasklet_vec[cpu].list;
-		tasklet_vec[cpu].list = t;
-		__cpu_raise_softirq(cpu, TASKLET_SOFTIRQ);
-		local_irq_enable();
-	}
+void __tasklet_hi_schedule(struct tasklet_struct *t)
+{
+	__tasklet_sched(t, tasklet_hi_vec, HI_SOFTIRQ);
 }
 
-static void tasklet_hi_action(struct softirq_action *a)
+static inline void __tasklet_action(struct softirq_action *a,
+					struct tasklet_head *vec)
 {
 	int cpu = smp_processor_id();
 	struct tasklet_struct *list;
 
 	local_irq_disable();
-	list = tasklet_hi_vec[cpu].list;
-	tasklet_hi_vec[cpu].list = NULL;
+	list = (vec + cpu)->list;
+	(vec + cpu)->list = NULL;
 	local_irq_enable();
 
 	while (list) {
 		struct tasklet_struct *t = list;
 
 		list = list->next;
+		t->next = NULL;
 
-		if (tasklet_trylock(t)) {
-			if (!atomic_read(&t->count)) {
-				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
-					BUG();
-				t->func(t->data);
-				tasklet_unlock(t);
-				continue;
-			}
+repeat:
+		if (!tasklet_trylock(t))
+			continue;
+		if (atomic_read(&t->count)) {
 			tasklet_unlock(t);
+			continue;
 		}
-
-		local_irq_disable();
-		t->next = tasklet_hi_vec[cpu].list;
-		tasklet_hi_vec[cpu].list = t;
-		__cpu_raise_softirq(cpu, HI_SOFTIRQ);
-		local_irq_enable();
+		if (test_and_clear_bit(TASKLET_STATE_SCHED, &t->state)) {
+			t->func(t->data);
+			tasklet_unlock(t);
+			if (test_bit(TASKLET_STATE_SCHED, &t->state))
+				goto repeat;
+			continue;
+		}
+		tasklet_unlock(t);
 	}
 }
 
+static void tasklet_action(struct softirq_action *a)
+{
+	__tasklet_action(a, tasklet_vec);
+}
+
+static void tasklet_hi_action(struct softirq_action *a)
+{
+	__tasklet_action(a, tasklet_hi_vec);
+}
 
 void tasklet_init(struct tasklet_struct *t,
 		  void (*func)(unsigned long), unsigned long data)
@@ -268,8 +230,6 @@
 	clear_bit(TASKLET_STATE_SCHED, &t->state);
 }
 
-
-
 /* Old style BHs */
 
 static void (*bh_base[32])(void);
@@ -325,7 +285,7 @@
 {
 	int i;
 
-	for (i=0; i<32; i++)
+	for (i = 0; i < 32; i++)
 		tasklet_init(bh_task_vec+i, bh_action, i);
 
 	open_softirq(TASKLET_SOFTIRQ, tasklet_action, NULL);
@@ -358,61 +318,3 @@
 			f(data);
 	}
 }
-
-static int ksoftirqd(void * __bind_cpu)
-{
-	int bind_cpu = *(int *) __bind_cpu;
-	int cpu = cpu_logical_map(bind_cpu);
-
-	daemonize();
-	current->nice = 19;
-	sigfillset(&current->blocked);
-
-	/* Migrate to the right CPU */
-	current->cpus_allowed = 1UL << cpu;
-	while (smp_processor_id() != cpu)
-		schedule();
-
-	sprintf(current->comm, "ksoftirqd_CPU%d", bind_cpu);
-
-	__set_current_state(TASK_INTERRUPTIBLE);
-	mb();
-
-	ksoftirqd_task(cpu) = current;
-
-	for (;;) {
-		if (!softirq_pending(cpu))
-			schedule();
-
-		__set_current_state(TASK_RUNNING);
-
-		while (softirq_pending(cpu)) {
-			do_softirq();
-			if (current->need_resched)
-				schedule();
-		}
-
-		__set_current_state(TASK_INTERRUPTIBLE);
-	}
-}
-
-static __init int spawn_ksoftirqd(void)
-{
-	int cpu;
-
-	for (cpu = 0; cpu < smp_num_cpus; cpu++) {
-		if (kernel_thread(ksoftirqd, (void *) &cpu,
-				  CLONE_FS | CLONE_FILES | CLONE_SIGNAL) < 0)
-			printk("spawn_ksoftirqd() failed for cpu %d\n", cpu);
-		else {
-			while (!ksoftirqd_task(cpu_logical_map(cpu))) {
-				current->policy |= SCHED_YIELD;
-				schedule();
-			}
-		}
-	}
-
-	return 0;
-}
-
-__initcall(spawn_ksoftirqd);
--- linux/kernel/timer.c.orig	Tue Aug 21 14:26:19 2001
+++ linux/kernel/timer.c	Mon Oct  1 21:52:43 2001
@@ -674,6 +674,7 @@
 void do_timer(struct pt_regs *regs)
 {
 	(*(unsigned long *)&jiffies)++;
+	irq_rate_check();
 #ifndef CONFIG_SMP
 	/* SMP process accounting uses the local APIC timer */
 
--- linux/include/linux/netdevice.h.orig	Mon Oct  1 21:52:28 2001
+++ linux/include/linux/netdevice.h	Mon Oct  1 23:07:44 2001
@@ -486,8 +486,9 @@
 		local_irq_save(flags);
 		dev->next_sched = softnet_data[cpu].output_queue;
 		softnet_data[cpu].output_queue = dev;
-		cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
+		__cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
 		local_irq_restore(flags);
+		rerun_softirqs(cpu);
 	}
 }
 
@@ -535,8 +536,9 @@
 		local_irq_save(flags);
 		skb->next = softnet_data[cpu].completion_queue;
 		softnet_data[cpu].completion_queue = skb;
-		cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
+		__cpu_raise_softirq(cpu, NET_TX_SOFTIRQ);
 		local_irq_restore(flags);
+		rerun_softirqs(cpu);
 	}
 }
 
--- linux/include/linux/interrupt.h.orig	Mon Oct  1 21:52:32 2001
+++ linux/include/linux/interrupt.h	Mon Oct  1 23:07:33 2001
@@ -74,9 +74,15 @@
 asmlinkage void do_softirq(void);
 extern void open_softirq(int nr, void (*action)(struct softirq_action*), void *data);
 extern void softirq_init(void);
-#define __cpu_raise_softirq(cpu, nr) do { softirq_pending(cpu) |= 1UL << (nr); } while (0)
-extern void FASTCALL(cpu_raise_softirq(unsigned int cpu, unsigned int nr));
-extern void FASTCALL(raise_softirq(unsigned int nr));
+extern void show_stack(unsigned long* esp);
+#define __cpu_raise_softirq(cpu, nr) \
+		do { softirq_pending(cpu) |= 1UL << (nr); } while (0)
+
+#define rerun_softirqs(cpu) 					\
+do {								\
+	if (!(local_irq_count(cpu) | local_bh_count(cpu)))	\
+		do_softirq();					\
+} while (0);
 
 
 
@@ -182,18 +188,8 @@
 	smp_mb();
 }
 
-static inline void tasklet_enable(struct tasklet_struct *t)
-{
-	smp_mb__before_atomic_dec();
-	atomic_dec(&t->count);
-}
-
-static inline void tasklet_hi_enable(struct tasklet_struct *t)
-{
-	smp_mb__before_atomic_dec();
-	atomic_dec(&t->count);
-}
-
+extern void tasklet_enable(struct tasklet_struct *t);
+extern void tasklet_hi_enable(struct tasklet_struct *t);
 extern void tasklet_kill(struct tasklet_struct *t);
 extern void tasklet_init(struct tasklet_struct *t,
 			 void (*func)(unsigned long), unsigned long data);
@@ -263,5 +259,6 @@
 extern unsigned long probe_irq_on(void);	/* returns 0 on failure */
 extern int probe_irq_off(unsigned long);	/* returns 0 or negative on failure */
 extern unsigned int probe_irq_mask(unsigned long);	/* returns mask of ISA interrupts */
+extern void irq_rate_check(void);
 
 #endif
--- linux/include/linux/irq.h.orig	Mon Oct  1 21:52:32 2001
+++ linux/include/linux/irq.h	Mon Oct  1 23:07:19 2001
@@ -31,6 +31,7 @@
 #define IRQ_LEVEL	64	/* IRQ level triggered */
 #define IRQ_MASKED	128	/* IRQ masked - shouldn't be seen again */
 #define IRQ_PER_CPU	256	/* IRQ is per CPU */
+#define IRQ_MITIGATED	512	/* IRQ got rate-limited */
 
 /*
  * Interrupt controller descriptor. This is all we need
@@ -62,6 +63,7 @@
 	struct irqaction *action;	/* IRQ action list */
 	unsigned int depth;		/* nested irq disables */
 	spinlock_t lock;
+	unsigned int count;
 } ____cacheline_aligned irq_desc_t;
 
 extern irq_desc_t irq_desc [NR_IRQS];
--- linux/include/asm-i386/irq.h.orig	Mon Oct  1 23:06:53 2001
+++ linux/include/asm-i386/irq.h	Mon Oct  1 23:07:06 2001
@@ -33,6 +33,7 @@
 extern void disable_irq(unsigned int);
 extern void disable_irq_nosync(unsigned int);
 extern void enable_irq(unsigned int);
+extern void set_irq_rate(unsigned int irq, unsigned int rate);
 
 #ifdef CONFIG_X86_LOCAL_APIC
 #define ARCH_HAS_NMI_WATCHDOG		/* See include/linux/nmi.h */
--- linux/include/asm-mips/softirq.h.orig	Mon Oct  1 21:52:32 2001
+++ linux/include/asm-mips/softirq.h	Mon Oct  1 21:52:43 2001
@@ -40,6 +40,4 @@
 
 #define in_softirq() (local_bh_count(smp_processor_id()) != 0)
 
-#define __cpu_raise_softirq(cpu, nr)	set_bit(nr, &softirq_pending(cpu))
-
 #endif /* _ASM_SOFTIRQ_H */
--- linux/include/asm-mips64/softirq.h.orig	Mon Oct  1 21:52:32 2001
+++ linux/include/asm-mips64/softirq.h	Mon Oct  1 21:52:43 2001
@@ -39,19 +39,4 @@
 
 #define in_softirq() (local_bh_count(smp_processor_id()) != 0)
 
-extern inline void __cpu_raise_softirq(int cpu, int nr)
-{
-	unsigned int *m = (unsigned int *) &softirq_pending(cpu);
-	unsigned int temp;
-
-	__asm__ __volatile__(
-		"1:\tll\t%0, %1\t\t\t# __cpu_raise_softirq\n\t"
-		"or\t%0, %2\n\t"
-		"sc\t%0, %1\n\t"
-		"beqz\t%0, 1b"
-		: "=&r" (temp), "=m" (*m)
-		: "ir" (1UL << nr), "m" (*m)
-		: "memory");
-}
-
 #endif /* _ASM_SOFTIRQ_H */
--- linux/net/core/dev.c.orig	Mon Oct  1 21:52:32 2001
+++ linux/net/core/dev.c	Mon Oct  1 21:52:43 2001
@@ -1218,8 +1218,9 @@
 			dev_hold(skb->dev);
 			__skb_queue_tail(&queue->input_pkt_queue,skb);
 			/* Runs from irqs or BH's, no need to wake BH */
-			cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
+			__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
 			local_irq_restore(flags);
+			rerun_softirqs(this_cpu);
 #ifndef OFFLINE_SAMPLE
 			get_sample_stats(this_cpu);
 #endif
@@ -1529,8 +1530,9 @@
 	local_irq_disable();
 	netdev_rx_stat[this_cpu].time_squeeze++;
 	/* This already runs in BH context, no need to wake up BH's */
-	cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
+	__cpu_raise_softirq(this_cpu, NET_RX_SOFTIRQ);
 	local_irq_enable();
+	rerun_softirqs(this_cpu);
 
 	NET_PROFILE_LEAVE(softnet_process);
 	return;
--- linux/arch/i386/kernel/irq.c.orig	Mon Oct  1 21:52:28 2001
+++ linux/arch/i386/kernel/irq.c	Mon Oct  1 23:06:26 2001
@@ -18,6 +18,7 @@
  */
 
 #include <linux/config.h>
+#include <linux/compiler.h>
 #include <linux/ptrace.h>
 #include <linux/errno.h>
 #include <linux/signal.h>
@@ -68,7 +69,24 @@
 irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned =
 	{ [0 ... NR_IRQS-1] = { 0, &no_irq_type, NULL, 0, SPIN_LOCK_UNLOCKED}};
 
-static void register_irq_proc (unsigned int irq);
+#define DEFAULT_IRQ_RATE 20000
+
+/*
+ * Maximum number of interrupts allowed, per second.
+ * Individual values can be set via echoing the new
+ * decimal value into /proc/irq/IRQ/max_rate.
+ */
+static unsigned int irq_rate [NR_IRQS] =
+		{ [0 ... NR_IRQS-1] = DEFAULT_IRQ_RATE };
+
+/*
+ * Print warnings only once. We reset it to 1 if rate
+ * limit has been changed.
+ */
+static unsigned int rate_warning [NR_IRQS] =
+		{ [0 ... NR_IRQS-1] = 1 };
+
+static void register_irq_proc(unsigned int irq);
 
 /*
  * Special irq handlers.
@@ -230,35 +248,8 @@
 	show_stack(NULL);
 	printk("\n");
 }
-	
-#define MAXCOUNT 100000000
 
-/*
- * I had a lockup scenario where a tight loop doing
- * spin_unlock()/spin_lock() on CPU#1 was racing with
- * spin_lock() on CPU#0. CPU#0 should have noticed spin_unlock(), but
- * apparently the spin_unlock() information did not make it
- * through to CPU#0 ... nasty, is this by design, do we have to limit
- * 'memory update oscillation frequency' artificially like here?
- *
- * Such 'high frequency update' races can be avoided by careful design, but
- * some of our major constructs like spinlocks use similar techniques,
- * it would be nice to clarify this issue. Set this define to 0 if you
- * want to check whether your system freezes.  I suspect the delay done
- * by SYNC_OTHER_CORES() is in correlation with 'snooping latency', but
- * i thought that such things are guaranteed by design, since we use
- * the 'LOCK' prefix.
- */
-#define SUSPECTED_CPU_OR_CHIPSET_BUG_WORKAROUND 0
-
-#if SUSPECTED_CPU_OR_CHIPSET_BUG_WORKAROUND
-# define SYNC_OTHER_CORES(x) udelay(x+1)
-#else
-/*
- * We have to allow irqs to arrive between __sti and __cli
- */
-# define SYNC_OTHER_CORES(x) __asm__ __volatile__ ("nop")
-#endif
+#define MAXCOUNT 100000000
 
 static inline void wait_on_irq(int cpu)
 {
@@ -276,7 +267,7 @@
 				break;
 
 		/* Duh, we have to loop. Release the lock to avoid deadlocks */
-		clear_bit(0,&global_irq_lock);
+		clear_bit(0, &global_irq_lock);
 
 		for (;;) {
 			if (!--count) {
@@ -284,7 +275,8 @@
 				count = ~0;
 			}
 			__sti();
-			SYNC_OTHER_CORES(cpu);
+			/* Allow irqs to arrive */
+			__asm__ __volatile__ ("nop");
 			__cli();
 			if (irqs_running())
 				continue;
@@ -467,6 +459,13 @@
  * controller lock. 
  */
  
+inline void __disable_irq(irq_desc_t *desc, unsigned int irq)
+{
+	if (!desc->depth++) {
+		desc->status |= IRQ_DISABLED;
+		desc->handler->disable(irq);
+	}
+}
 /**
  *	disable_irq_nosync - disable an irq without waiting
  *	@irq: Interrupt to disable
@@ -485,10 +484,7 @@
 	unsigned long flags;
 
 	spin_lock_irqsave(&desc->lock, flags);
-	if (!desc->depth++) {
-		desc->status |= IRQ_DISABLED;
-		desc->handler->disable(irq);
-	}
+	__disable_irq(desc, irq);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 
@@ -516,23 +512,8 @@
 	}
 }
 
-/**
- *	enable_irq - enable handling of an irq
- *	@irq: Interrupt to enable
- *
- *	Undoes the effect of one call to disable_irq().  If this
- *	matches the last disable, processing of interrupts on this
- *	IRQ line is re-enabled.
- *
- *	This function may be called from IRQ context.
- */
- 
-void enable_irq(unsigned int irq)
+static inline void __enable_irq(irq_desc_t *desc, unsigned int irq)
 {
-	irq_desc_t *desc = irq_desc + irq;
-	unsigned long flags;
-
-	spin_lock_irqsave(&desc->lock, flags);
 	switch (desc->depth) {
 	case 1: {
 		unsigned int status = desc->status & ~IRQ_DISABLED;
@@ -551,9 +532,69 @@
 		printk("enable_irq(%u) unbalanced from %p\n", irq,
 		       __builtin_return_address(0));
 	}
+}
+
+/**
+ *	enable_irq - enable handling of an irq
+ *	@irq: Interrupt to enable
+ *
+ *	Undoes the effect of one call to disable_irq().  If this
+ *	matches the last disable, processing of interrupts on this
+ *	IRQ line is re-enabled.
+ *
+ *	This function may be called from IRQ context.
+ */
+ 
+void enable_irq(unsigned int irq)
+{
+	irq_desc_t *desc = irq_desc + irq;
+	unsigned long flags;
+
+	spin_lock_irqsave(&desc->lock, flags);
+	__enable_irq(desc, irq);
 	spin_unlock_irqrestore(&desc->lock, flags);
 }
 
+void set_irq_rate(unsigned int irq, unsigned int rate)
+{
+	if (rate < 2*HZ)
+		rate = 2*HZ;
+	if (irq_rate[irq] != rate)
+		rate_warning[irq] = 1;
+	irq_rate[irq] = rate;
+}
+
+static inline void __handle_mitigated(irq_desc_t *desc, unsigned int irq)
+{
+	desc->status &= ~IRQ_MITIGATED;
+	__enable_irq(desc, irq);
+}
+
+/*
+ * This function, provided by every architecture, resets
+ * the irq-limit counters in every jiffy. Overhead is
+ * fairly small, since it gets the spinlock only if the IRQ
+ * got mitigated.
+ */
+
+void irq_rate_check(void)
+{
+	unsigned long flags;
+	irq_desc_t *desc;
+	int i;
+
+	for (i = 0; i < NR_IRQS; i++) {
+		desc = irq_desc + i;
+		if (desc->count <= 1) {
+			spin_lock_irqsave(&desc->lock, flags);
+			if (desc->status & IRQ_MITIGATED)
+				__handle_mitigated(desc, i);
+			spin_unlock_irqrestore(&desc->lock, flags);
+		}
+		desc->count = irq_rate[i] / HZ;
+	}
+}
+
 /*
  * do_IRQ handles all normal device IRQ's (the special
  * SMP cross-CPU interrupts have their own specific
@@ -585,6 +626,13 @@
 	   WAITING is used by probe to mark irqs that are being tested
 	   */
 	status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
+	/*
+	 * One decrement and one branch (test for zero) into
+	 * an unlikely-predicted branch. It cannot be cheaper
+	 * than this.
+	 */
+	if (unlikely(!--desc->count))
+		goto mitigate;
 	status |= IRQ_PENDING; /* we _want_ to handle it */
 
 	/*
@@ -639,6 +687,27 @@
 	if (softirq_pending(cpu))
 		do_softirq();
 	return 1;
+
+mitigate:
+	/*
+	 * We take a slightly longer path here to not put
+	 * overhead into the IRQ hotpath:
+	 */
+	desc->count = 1;
+	if (status & IRQ_MITIGATED)
+		goto out;
+	/*
+	 * Disable interrupt source. It will be re-enabled
+	 * by the next timer interrupt - and possibly be
+	 * restarted if needed.
+	 */
+	desc->status |= IRQ_MITIGATED | IRQ_PENDING;
+	__disable_irq(desc, irq);
+	if (rate_warning[irq]) {
+		printk(KERN_WARNING "Rate limit of %d irqs/sec exceeded for IRQ%d! Throttling irq source.\n", irq_rate[irq], irq);
+		rate_warning[irq] = 0;
+	}
+	goto out;
 }
 
 /**
@@ -809,7 +878,7 @@
 	 * something may have generated an irq long ago and we want to
 	 * flush such a longstanding irq before considering it as spurious. 
 	 */
-	for (i = NR_IRQS-1; i > 0; i--)  {
+	for (i = NR_IRQS-1; i > 0; i--) {
 		desc = irq_desc + i;
 
 		spin_lock_irq(&desc->lock);
@@ -1030,9 +1099,49 @@
 static struct proc_dir_entry * root_irq_dir;
 static struct proc_dir_entry * irq_dir [NR_IRQS];
 
+#define DEC_DIGITS 9
+
+/*
+ * Parses from 0 to 999999999. More than enough for IRQ purposes.
+ */
+static unsigned int parse_dec_value(const char *buffer,
+		unsigned long count, unsigned long *ret)
+{
+	unsigned char decnum [DEC_DIGITS];
+	unsigned long value;
+	int i;
+
+	if (!count)
+		return -EINVAL;
+	if (count > DEC_DIGITS)
+		count = DEC_DIGITS;
+	if (copy_from_user(decnum, buffer, count))
+		return -EFAULT;
+
+	/*
+	 * Parse the first 9 characters as a decimal string,
+	 * any non-decimal char is end-of-string.
+	 */
+	value = 0;
+
+	for (i = 0; i < count; i++) {
+		unsigned int c = decnum[i];
+
+		switch (c) {
+			case '0' ... '9': c -= '0'; break;
+		default:
+			goto out;
+		}
+		value = value * 10 + c;
+	}
+out:
+	*ret = value;
+	return 0;
+}
+
 #define HEX_DIGITS 8
 
-static unsigned int parse_hex_value (const char *buffer,
+static unsigned int parse_hex_value(const char *buffer,
 		unsigned long count, unsigned long *ret)
 {
 	unsigned char hexnum [HEX_DIGITS];
@@ -1071,18 +1180,17 @@
 
 #if CONFIG_SMP
 
-static struct proc_dir_entry * smp_affinity_entry [NR_IRQS];
-
 static unsigned long irq_affinity [NR_IRQS] = { [0 ... NR_IRQS-1] = ~0UL };
-static int irq_affinity_read_proc (char *page, char **start, off_t off,
+
+static int irq_affinity_read_proc(char *page, char **start, off_t off,
 			int count, int *eof, void *data)
 {
-	if (count < HEX_DIGITS+1)
+	if (count <= HEX_DIGITS)
 		return -EINVAL;
 	return sprintf (page, "%08lx\n", irq_affinity[(long)data]);
 }
 
-static int irq_affinity_write_proc (struct file *file, const char *buffer,
+static int irq_affinity_write_proc(struct file *file, const char *buffer,
 					unsigned long count, void *data)
 {
 	int irq = (long) data, full_count = count, err;
@@ -1109,16 +1217,16 @@
 
 #endif
 
-static int prof_cpu_mask_read_proc (char *page, char **start, off_t off,
+static int prof_cpu_mask_read_proc(char *page, char **start, off_t off,
 			int count, int *eof, void *data)
 {
 	unsigned long *mask = (unsigned long *) data;
-	if (count < HEX_DIGITS+1)
+	if (count <= HEX_DIGITS)
 		return -EINVAL;
 	return sprintf (page, "%08lx\n", *mask);
 }
 
-static int prof_cpu_mask_write_proc (struct file *file, const char *buffer,
+static int prof_cpu_mask_write_proc(struct file *file, const char *buffer,
 					unsigned long count, void *data)
 {
 	unsigned long *mask = (unsigned long *) data, full_count = count, err;
@@ -1132,10 +1240,45 @@
 	return full_count;
 }
 
+static int irq_rate_read_proc(char *page, char **start, off_t off,
+			int count, int *eof, void *data)
+{
+	int irq = (int) data;
+	if (count <= DEC_DIGITS)
+		return -EINVAL;
+	return sprintf (page, "%d\n", irq_rate[irq]);
+}
+
+static int irq_rate_write_proc(struct file *file, const char *buffer,
+					unsigned long count, void *data)
+{
+	int irq = (int) data;
+	unsigned long full_count = count, err;
+	unsigned long new_value;
+
+	/* do not allow the timer interrupt to be rate-limited ... :-| */
+	if (!irq)
+		return -EINVAL;
+	err = parse_dec_value(buffer, count, &new_value);
+	if (err)
+		return err;
+
+	/*
+	 * Do not allow a frequency to be lower than 1 interrupt
+	 * per jiffy.
+	 */
+	if (!new_value)
+		return -EINVAL;
+
+	set_irq_rate(irq, new_value);
+	return full_count;
+}
+
 #define MAX_NAMELEN 10
 
-static void register_irq_proc (unsigned int irq)
+static void register_irq_proc(unsigned int irq)
 {
+	struct proc_dir_entry *entry;
 	char name [MAX_NAMELEN];
 
 	if (!root_irq_dir || (irq_desc[irq].handler == &no_irq_type) ||
@@ -1148,28 +1291,32 @@
 	/* create /proc/irq/1234 */
 	irq_dir[irq] = proc_mkdir(name, root_irq_dir);
 
-#if CONFIG_SMP
-	{
-		struct proc_dir_entry *entry;
+	/* create /proc/irq/1234/max_rate */
+	entry = create_proc_entry("max_rate", 0600, irq_dir[irq]);
 
-		/* create /proc/irq/1234/smp_affinity */
-		entry = create_proc_entry("smp_affinity", 0600, irq_dir[irq]);
+	if (entry) {
+		entry->nlink = 1;
+		entry->data = (void *)irq;
+		entry->read_proc = irq_rate_read_proc;
+		entry->write_proc = irq_rate_write_proc;
+	}
 
-		if (entry) {
-			entry->nlink = 1;
-			entry->data = (void *)(long)irq;
-			entry->read_proc = irq_affinity_read_proc;
-			entry->write_proc = irq_affinity_write_proc;
-		}
+#if CONFIG_SMP
+	/* create /proc/irq/1234/smp_affinity */
+	entry = create_proc_entry("smp_affinity", 0600, irq_dir[irq]);
 
-		smp_affinity_entry[irq] = entry;
+	if (entry) {
+		entry->nlink = 1;
+		entry->data = (void *)(long)irq;
+		entry->read_proc = irq_affinity_read_proc;
+		entry->write_proc = irq_affinity_write_proc;
 	}
 #endif
 }
 
 unsigned long prof_cpu_mask = -1;
 
-void init_irq_proc (void)
+void init_irq_proc(void)
 {
 	struct proc_dir_entry *entry;
 	int i;
@@ -1181,7 +1328,7 @@
 	entry = create_proc_entry("prof_cpu_mask", 0600, root_irq_dir);
 
 	if (!entry)
-	    return;
+		return;
 
 	entry->nlink = 1;
 	entry->data = (void *)&prof_cpu_mask;

^ permalink raw reply	[flat|nested] 151+ messages in thread
