linux-kernel.vger.kernel.org archive mirror
* RE: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
@ 2003-03-06 20:01 Nakajima, Jun
  0 siblings, 0 replies; 11+ messages in thread
From: Nakajima, Jun @ 2003-03-06 20:01 UTC (permalink / raw)
  To: Arjan van de Ven, Kamble, Nitin A
  Cc: Andrew Morton, linux-kernel, kai.bankett, mingo, Mallick, Asit K,
	Saxena, Sunil

I think tuning for NUMA issues is a different matter. The intention/scope of this patch was to provide efficient interrupt routing in software for dual/SMP P4P-based systems. Although we found this improved older systems as well, there was no need to do it earlier, since it was done by the platform chipsets and software did not have anything to do. 

Jun

> -----Original Message-----
> From: Arjan van de Ven [mailto:arjan@fenrus.demon.nl]
> Sent: Wednesday, March 05, 2003 10:27 AM
> To: Kamble, Nitin A
> Cc: Andrew Morton; linux-kernel@vger.kernel.org; kai.bankett@ontika.net;
> mingo@redhat.com; Nakajima, Jun; Mallick, Asit K; Saxena, Sunil
> Subject: RE: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-
> fixes
> 
> On Wed, 2003-03-05 at 05:21, Kamble, Nitin A wrote:
> > There are few issues we found with the user level daemon approach.
> >
> >    Static binding compatibility: With the user level daemon, users can
> > not
> > use the /proc/irq/i/smp_affinity interface for the static binding of
> > interrupts.
> 
> no they can just write/change the config file, with a gui if needed
> 
> >
> >   There is some information which is only available in the kernel today,
> 
> there's also some information only available to userspace today that the
> userspace daemon can and does use.
> 
> > Also the future implementation might need more kernel data. This is
> > important for interfaces such as NAPI, where interrupts handling changes
> > on the fly.
> 
> ehm. almost. but napi isn't it ....
> 
> and the userspace side can easily have a system vendor provided file
> that represents all kinds of very specific system info about the numa
> structure..... working with every kernel out there.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
@ 2003-03-05 19:57 Kamble, Nitin A
  0 siblings, 0 replies; 11+ messages in thread
From: Kamble, Nitin A @ 2003-03-05 19:57 UTC (permalink / raw)
  To: Kai Bankett
  Cc: linux-kernel, kai.bankett, mingo, akpm, Nakajima, Jun, Mallick,
	Asit K, Saxena, Sunil

> -----Original Message-----
> From: Kai Bankett [mailto:chaosman@ontika.net]
> Are you really sure that option 2 looks better on a static and heavy
> interrupt load ?
> If the load is generated by few heavy sources (sources_count <
> count(cpus)) why not distributed them (mostly) statically across the
> available cpus ? What gain do you have by rotating them round robin in
> this case ?
> I think round robin only starts making sense if the number of heavy
> sources is > number of physical cpus.

[NK] If there is no rotation at all, it is the same as statically
binding the IRQs to CPUs. And with the netstat benchmark, kirq performed
about 10% better than carefully statically bound IRQs. That happens
because, after handling the interrupt, the benchmark also has to do some
processing of its own; if all the threads do that processing at roughly
equal speed, overall performance is better. If one thread is faster and
another is slower, the slow one slows down the whole system.

Thanks,
Nitin
> 
> Kai


* RE: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
  2003-03-05  4:21 Kamble, Nitin A
  2003-03-05  4:38 ` Jeff Garzik
@ 2003-03-05 18:26 ` Arjan van de Ven
  1 sibling, 0 replies; 11+ messages in thread
From: Arjan van de Ven @ 2003-03-05 18:26 UTC (permalink / raw)
  To: Kamble, Nitin A
  Cc: Andrew Morton, linux-kernel, kai.bankett, mingo, Nakajima, Jun,
	Mallick, Asit K, Saxena, Sunil

[-- Attachment #1: Type: text/plain, Size: 965 bytes --]

On Wed, 2003-03-05 at 05:21, Kamble, Nitin A wrote:
> There are few issues we found with the user level daemon approach.
>   
>    Static binding compatibility: With the user level daemon, users can
> not  
> use the /proc/irq/i/smp_affinity interface for the static binding of
> interrupts.

no they can just write/change the config file, with a gui if needed

> 
>   There is some information which is only available in the kernel today,

there's also some information only available to userspace today that the
userspace daemon can and does use.

> Also the future implementation might need more kernel data. This is
> important for interfaces such as NAPI, where interrupts handling changes
> on the fly.

ehm. almost. but napi isn't it ....

and the userspace side can easily have a system vendor provided file
that represents all kinds of very specific system info about the numa
structure..... working with every kernel out there.




* Re: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
  2003-03-05  4:38 ` Jeff Garzik
@ 2003-03-05 15:46   ` Jason Lunz
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Lunz @ 2003-03-05 15:46 UTC (permalink / raw)
  To: linux-kernel

jgarzik@pobox.com said:
> Further, for NAPI and networking in general, it is recommended to bind
> each NIC to a single interrupt, and never change that binding. 

I assume you mean "bind each NIC interrupt to a single CPU" here. I've
done quite a lot of benchmarking on dual SMP that shows that for
high-load networking, you basically have two cases:

 - the irq load is less than what can be handled by one CPU. This is
   the case, for example, with a NAPI e1000 driver under any load on a
   >1 GHz SMP machine. Even with two e1000 cards under extreme load,
   one CPU can run the interrupt handlers with cycles to spare (thanks
   to NAPI). This config (all NIC interrupts on CPU0) is optimal as
   long as that CPU doesn't become saturated. Trying to distribute the
   interrupt load across multiple CPUs incurs a measurable performance
   loss, probably due to cache effects.
 
 - the irq load is enough to livelock one CPU. It's easy for this to
   happen with gigE NICs on a non-NAPI kernel, for example. In this
   case, you're better off binding each heavy interrupt source to a
   different CPU.

2.4's default behavior isn't optimal in either case.

> Delivering a single NIC's interrupts to multiple CPUs leads to a 
> noticeable performance loss.  This is why some people complain that 
> their specific network setups are faster on a uniprocessor kernel than 
> an SMP kernel.

This is what I've seen as well. The good news is that you can pretty
much recapture the uniprocessor performance by binding all heavy
interrupt sources to one CPU, as long as that CPU can handle it. And any
modern machine with a NAPI kernel _can_ handle any realistic gigE load.

I should mention that these results are all measurements of gigabit
bridge performance, where every frame needs to be received on one NIC
and sent on the other. So there are obvious cache benefits to doing it
all on one CPU.

-- 
Jason Lunz			Reflex Security
lunz@reflexsecurity.com		http://www.reflexsecurity.com/



* Re: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
  2003-03-04 23:33 Kamble, Nitin A
  2003-03-04 23:51 ` Andrew Morton
@ 2003-03-05 10:48 ` Kai Bankett
  1 sibling, 0 replies; 11+ messages in thread
From: Kai Bankett @ 2003-03-05 10:48 UTC (permalink / raw)
  To: Kamble, Nitin A
  Cc: linux-kernel, kai.bankett, mingo, akpm, Nakajima, Jun, Mallick,
	Asit K, Saxena, Sunil

>
>
>  2. Or move the heavy imbalance around all the cpus in the round robin 
>     fashion at high rate.
>
>
>Both the solutions will eliminate the bouncing behavior. The current 
>implementation is based on the option 2, with the only difference of 
>lower rate of distribution (5 sec).  The optimal option is workload 
>dependant. With static and heavy interrupt load, the option 2 looks 
>better, while with random interrupt load the option 1 is good enough.
>
>  
>
Hi Nitin,

Thanks much for your response !
Are you really sure that option 2 looks better on a static and heavy 
interrupt load ?
If the load is generated by few heavy sources (sources_count < 
count(cpus)) why not distributed them (mostly) statically across the 
available cpus ? What gain do you have by rotating them round robin in 
this case ?
I think round robin only starts making sense if the number of heavy 
sources is > number of physical cpus.

Kai



* Re: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
  2003-03-05  4:21 Kamble, Nitin A
@ 2003-03-05  4:38 ` Jeff Garzik
  2003-03-05 15:46   ` Jason Lunz
  2003-03-05 18:26 ` Arjan van de Ven
  1 sibling, 1 reply; 11+ messages in thread
From: Jeff Garzik @ 2003-03-05  4:38 UTC (permalink / raw)
  To: Kamble, Nitin A
  Cc: Andrew Morton, linux-kernel, kai.bankett, mingo, Nakajima, Jun,
	Mallick, Asit K, Saxena, Sunil

Kamble, Nitin A wrote:
> There are few issues we found with the user level daemon approach.

Thanks much for the response!


>    Static binding compatibility: With the user level daemon, users can
> not  
> use the /proc/irq/i/smp_affinity interface for the static binding of
> interrupts.

Not terribly accurate:  in "one-shot" mode, where the daemon balances 
irqs once at startup, users can change smp_affinity all they want.

In the normal continuous-balance mode, it is quite easy to have the 
daemon either (a) notice changes users make or (b) be configured 
directly.  The daemon does not do (a) or (b) currently, but either is a 
simple change.


>   There is some information which is only available in the kernel today,
> Also the future implementation might need more kernel data. This is
> important for interfaces such as NAPI, where interrupts handling changes
> on the fly.

This depends on the information :)  Some information that is useful for 
balancing is only [easily] available from userspace.   In-kernel 
information may be easily exported through "sysfs", which is designed to 
export in-kernel information.

Further, for NAPI and networking in general, it is recommended to bind 
each NIC to a single interrupt, and never change that binding. 
Delivering a single NIC's interrupts to multiple CPUs leads to a 
noticeable performance loss.  This is why some people complain that 
their specific network setups are faster on a uniprocessor kernel than 
an SMP kernel.

I have not examined interrupt delivery for other peripherals, such as 
ATA or SCSI hosts, but for networking you definitely want to statically 
bind each NIC's irqs to a separate CPU, and then not touch that binding.

Best regards, and thanks again for your valuable feedback,

	Jeff





* RE: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
@ 2003-03-05  4:21 Kamble, Nitin A
  2003-03-05  4:38 ` Jeff Garzik
  2003-03-05 18:26 ` Arjan van de Ven
  0 siblings, 2 replies; 11+ messages in thread
From: Kamble, Nitin A @ 2003-03-05  4:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, kai.bankett, mingo, Nakajima, Jun, Mallick, Asit K,
	Saxena, Sunil

There are a few issues we found with the user-level daemon approach.

   Static binding compatibility: With the user-level daemon, users
cannot use the /proc/irq/i/smp_affinity interface for static binding of
interrupts.

  There is some information which is only available in the kernel today,
and future implementations might need more kernel data. This is
important for interfaces such as NAPI, where interrupt handling changes
on the fly.

Thanks,
Nitin

> Now there has been some discussion as to whether these algorithmic
> decisions can be moved out of the kernel altogether.  And with periods
> of one and five seconds that does appear to be feasible.
> 
> I believe that you have looked at this before and encountered some
> problem with it.  Could you please describe what happened there?


* Re: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
  2003-03-04 23:33 Kamble, Nitin A
@ 2003-03-04 23:51 ` Andrew Morton
  2003-03-05 10:48 ` Kai Bankett
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2003-03-04 23:51 UTC (permalink / raw)
  To: Kamble, Nitin A
  Cc: linux-kernel, kai.bankett, mingo, jun.nakajima, asit.k.mallick,
	sunil.saxena

"Kamble, Nitin A" <nitin.a.kamble@intel.com> wrote:
>
> Both the solutions will eliminate the bouncing behavior. The current 
> implementation is based on the option 2, with the only difference of 
> lower rate of distribution (5 sec).  The optimal option is workload 
> dependant. With static and heavy interrupt load, the option 2 looks 
> better, while with random interrupt load the option 1 is good enough.

OK, thanks.

Now there has been some discussion as to whether these algorithmic decisions
can be moved out of the kernel altogether.  And with periods of one and five
seconds that does appear to be feasible.

I believe that you have looked at this before and encountered some problem
with it.  Could you please describe what happened there?



* Re: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
@ 2003-03-04 23:33 Kamble, Nitin A
  2003-03-04 23:51 ` Andrew Morton
  2003-03-05 10:48 ` Kai Bankett
  0 siblings, 2 replies; 11+ messages in thread
From: Kamble, Nitin A @ 2003-03-04 23:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: kai.bankett, mingo, akpm, Nakajima, Jun, Mallick, Asit K, Saxena, Sunil

Hi Andrew, Kai,

  The bouncing is seen in some particular cases because of the round-robin
IRQ distribution. In some cases (such as a single heavy interrupt source
on a 2-way SMP system), binding heavy interrupt sources to different CPUs
is not going to remove the imbalance completely. In that case we fall
back to Ingo's round-robin approach. We have studied the previous
round-robin interrupt distribution implemented in the kernel, and we
found that, at very high interrupt rates, system performance increased
with the period of the round-robin distribution. Please see the original
LKML posting for more details:
http://www.uwsg.indiana.edu/hypermail/linux/kernel/0212.2/1122.html 

So if there is significant imbalance left after binding the IRQs to
CPUs, there are two options now:

  1. Do not move it around. Let the significant imbalance stick to a
     particular cpu.

  2. Or move the heavy imbalance around all the cpus in round-robin 
     fashion at a high rate.

Either option could also be made configurable in the kernel.

Both solutions will eliminate the bouncing behavior. The current 
implementation is based on option 2, with the only difference being a 
lower distribution rate (5 sec).  The optimal option is workload 
dependent: with a static and heavy interrupt load, option 2 looks 
better, while with a random interrupt load option 1 is good enough.

Thanks & Regards,
Nitin




* Re: [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
  2003-03-04 16:33 Kai Bankett
@ 2003-03-04 16:45 ` Jeff Garzik
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Garzik @ 2003-03-04 16:45 UTC (permalink / raw)
  To: Kai Bankett; +Cc: mingo, linux-kernel

Would it be possible for you to test Arjan's irqbalance daemon?

We believe it is a superior solution to in-kernel irq balancing, but it
can also be safely used in addition to the in-kernel balancer.
(We just have not run benchmarks to prove this yet :))

http://people.redhat.com/arjanv/irqbalance/

This userspace solution is shipping with current Red Hat, and is
portable to non-ia32 architectures.

	Jeff





* [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes
@ 2003-03-04 16:33 Kai Bankett
  2003-03-04 16:45 ` Jeff Garzik
  0 siblings, 1 reply; 11+ messages in thread
From: Kai Bankett @ 2003-03-04 16:33 UTC (permalink / raw)
  To: mingo, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 760 bytes --]

Hi,

This patch does the following:

* Reduces bouncing of IRQs between CPUs
  If an IRQ has already been moved, check whether a further move makes
  sense (prevents bouncing back and forth)

* Brings interrupts back to IRQ_PRIMARY_CPU (default = 0)
  If the interrupts/time rate drops, the IRQ is routed back to the
  default CPU

* Introduces an irq_desc[irq].processor value
  This is needed to decide which IRQ has to be routed back to the
  default CPU.

* Displays the irq_desc[irq].processor value in '/proc/interrupts'

* Does not start kirqd if fewer than 2 CPUs are online at boot time
  In any case, the rest of the logic would not be able to recognize CPUs
  added at a later time.

* Fixes timer_irq_works() - it used an 'unsigned int' to store the
  jiffies value

[-- Attachment #2: diffstat.txt --]
[-- Type: text/plain, Size: 247 bytes --]

 arch/i386/kernel/io_apic.c  |  118 +++++++++++++++++++++++++++++---------------
 arch/i386/kernel/irq.c      |   11 ++++
 arch/i386/kernel/irq.c.orig |only
 include/linux/irq.h         |    3 +
 4 files changed, 92 insertions(+), 40 deletions(-)

[-- Attachment #3: balance_irq-2.5.63-bk7.patch --]
[-- Type: text/x-diff, Size: 8524 bytes --]

diff -u -r linux-2.5.63/arch/i386/kernel/io_apic.c linux-2.5.63.new/arch/i386/kernel/io_apic.c
--- linux-2.5.63/arch/i386/kernel/io_apic.c	2003-03-04 13:07:12.000000000 +0100
+++ linux-2.5.63.new/arch/i386/kernel/io_apic.c	2003-03-04 13:43:40.000000000 +0100
@@ -18,6 +18,8 @@
  *					and Rolf G. Tews
  *					for testing these extensively
  *	Paul Diefenbaugh	:	Added full ACPI support
+ *	Kai Bankett		:	Improved interrupt distribution
+ *					and stickiness
  */
 
 #include <linux/mm.h>
@@ -214,7 +216,7 @@
 # include <linux/timer.h>	/* time_after() */
  
 # if CONFIG_BALANCED_IRQ_DEBUG
-#  define TDprintk(x...) do { printk("<%ld:%s:%d>: ", jiffies, __FILE__, __LINE__); printk(x); } while (0)
+#  define TDprintk(x...) do { printk("<%lu:%s:%d>: ", jiffies, __FILE__, __LINE__); printk(x); } while (0)
 #  define Dprintk(x...) do { TDprintk(x); } while (0)
 # else
 #  define TDprintk(x...) 
@@ -232,9 +234,17 @@
 	unsigned long irq;
 } irq_cpu_data[NR_CPUS];
 
+struct irq_cpu_sum {
+	unsigned long total;
+} irq_cpu_total[NR_CPUS];
+
+/* fall back to this CPU-no for all interrupts */
+#define IRQ_PRIMARY_CPU 0
+
 #define CPU_IRQ(cpu)		(irq_cpu_data[cpu].irq)
 #define LAST_CPU_IRQ(cpu,irq)   (irq_cpu_data[cpu].last_irq[irq])
 #define IRQ_DELTA(cpu,irq) 	(irq_cpu_data[cpu].irq_delta[irq])
+#define CPU_IRQ_TOTAL(cpu)	(irq_cpu_total[cpu].total)
 
 #define IDLE_ENOUGH(cpu,now) \
 		(idle_cpu(cpu) && ((now) - irq_stat[(cpu)].idle_timestamp > 1))
@@ -252,7 +262,7 @@
 
 long balanced_irq_interval = MAX_BALANCED_IRQ_INTERVAL;
 					 
-static inline void balance_irq(int cpu, int irq);
+static inline void balance_irq(int cpu, int irq, int on_primary);
 
 static inline void rotate_irqs_among_cpus(unsigned long useful_load_threshold)
 {
@@ -265,7 +275,8 @@
 			/* Is it a significant load ?  */
 			if (IRQ_DELTA(CPU_TO_PACKAGEINDEX(i),j) < useful_load_threshold)
 				continue;
-			balance_irq(i, j);
+			/* balance (no primary force) */
+			balance_irq(i, j, 0);
 		}
 	}
 	balanced_irq_interval = max((long)MIN_BALANCED_IRQ_INTERVAL,
@@ -279,7 +290,7 @@
 	unsigned long max_cpu_irq = 0, min_cpu_irq = (~0);
 	unsigned long move_this_load = 0;
 	int max_loaded = 0, min_loaded = 0;
-	unsigned long useful_load_threshold = balanced_irq_interval + 10;
+	unsigned long useful_load_threshold = balanced_irq_interval + 200;
 	int selected_irq;
 	int tmp_loaded, first_attempt = 1;
 	unsigned long tmp_cpu_irq;
@@ -293,6 +304,7 @@
 		if (!cpu_online(i))
 			continue;
 		package_index = CPU_TO_PACKAGEINDEX(i);
+		CPU_IRQ_TOTAL(package_index) = 0;
 		for (j = 0; j < NR_IRQS; j++) {
 			unsigned long value_now, delta;
 			/* Is this an active IRQ? */
@@ -306,6 +318,18 @@
 			/* Determine the activity per processor per IRQ */
 			delta = value_now - LAST_CPU_IRQ(i,j);
 
+			/* Switch back to primary cpu if not loaded */
+			if ((i == irq_desc[j].processor) &&
+			    (delta < useful_load_threshold) &&
+			    (irq_desc[j].processor != IRQ_PRIMARY_CPU)) {
+				/* move back irq */
+				balance_irq(irq_desc[j].processor,j,1);
+				continue;
+			}
+
+			/* update irq total counter */
+			CPU_IRQ_TOTAL(package_index) += delta;
+
 			/* Update last_cpu_irq[][] for the next time */
 			LAST_CPU_IRQ(i,j) = value_now;
 
@@ -441,6 +465,7 @@
 		Dprintk("irq = %d moved to cpu = %d\n", selected_irq, min_loaded);
 		/* mark for change destination */
 		spin_lock(&desc->lock);
+		irq_desc[selected_irq].processor = min_loaded;
 		pending_irq_balance_apicid[selected_irq] = cpu_to_logical_apicid(min_loaded);
 		spin_unlock(&desc->lock);
 		/* Since we made a change, come back sooner to 
@@ -460,62 +485,75 @@
 	return;
 }
 
-static unsigned long move(int curr_cpu, unsigned long allowed_mask, unsigned long now, int direction)
+static inline void balance_irq (int cpu, int irq, int on_primary)
 {
-	int search_idle = 1;
-	int cpu = curr_cpu;
-
-	goto inside;
-
-	do {
-		if (unlikely(cpu == curr_cpu))
-			search_idle = 0;
-inside:
-		if (direction == 1) {
-			cpu++;
-			if (cpu >= NR_CPUS)
-				cpu = 0;
-		} else {
-			cpu--;
-			if (cpu == -1)
-				cpu = NR_CPUS-1;
-		}
-	} while (!cpu_online(cpu) || !IRQ_ALLOWED(cpu,allowed_mask) ||
-			(search_idle && !IDLE_ENOUGH(cpu,now)));
-
-	return cpu;
-}
-
-static inline void balance_irq (int cpu, int irq)
-{
-	unsigned long now = jiffies;
 	unsigned long allowed_mask;
-	unsigned int new_cpu;
+	unsigned long tmp_cur_irq;
+	unsigned int i, new_cpu;
 		
 	if (irqbalance_disabled)
 		return;
 
 	allowed_mask = cpu_online_map & irq_affinity[irq];
-	new_cpu = move(cpu, allowed_mask, now, 1);
+
+	if (on_primary == 1) {
+		new_cpu = IRQ_PRIMARY_CPU;
+		goto do_work;
+	}
+
+	/* Does it make sense to balance? */
+	new_cpu = IRQ_PRIMARY_CPU;
+	tmp_cur_irq = ULONG_MAX;
+
+	for (i = 0; i < NR_CPUS; i++) {
+		if (!cpu_online(i) || !IRQ_ALLOWED(i,allowed_mask))
+			continue;
+		if (CPU_IRQ_TOTAL(CPU_TO_PACKAGEINDEX(i)) < tmp_cur_irq) {
+			tmp_cur_irq = CPU_IRQ_TOTAL(CPU_TO_PACKAGEINDEX(i));
+			new_cpu = i;
+		}
+	}
+	if (CPU_IRQ_TOTAL(CPU_TO_PACKAGEINDEX(new_cpu)) + IRQ_DELTA(CPU_TO_PACKAGEINDEX(cpu),irq)
+			>= CPU_IRQ_TOTAL(CPU_TO_PACKAGEINDEX(cpu))) {
+		Dprintk("balanced_irq: Balance makes no sense\n");
+		return;
+	}
+
+do_work:
 	if (cpu != new_cpu) {
 		irq_desc_t *desc = irq_desc + irq;
 		spin_lock(&desc->lock);
+		irq_desc[irq].processor = new_cpu;
 		pending_irq_balance_apicid[irq] = cpu_to_logical_apicid(new_cpu);
 		spin_unlock(&desc->lock);
-	}
+	} else
+		Dprintk("balance_irq: irq-switch senseless (cpu == new_cpu)\n");
 }
 
 int balanced_irq(void *unused)
 {
 	int i;
+	int cpu_count = 0;
 	unsigned long prev_balance_time = jiffies;
 	long time_remaining = balanced_irq_interval;
 
+	/* push everything to CPU(IRQ_PRIMARY_CPU)
+	   to give us a starting point. */
+	for (i = 0; i < NR_IRQS; i++) {
+		pending_irq_balance_apicid[i] = cpu_to_logical_apicid(IRQ_PRIMARY_CPU);
+		irq_desc[i].processor = 0;
+	}
+
+	/* if running only with one cpu - balance_irq does not make sense */
+	for (i = 0; i < NR_CPUS; i++) {
+		if (cpu_online(i))
+			cpu_count++;
+	}
+	if (cpu_count < 2)
+		return 0;
+
 	daemonize("kirqd");
 	
-	/* push everything to CPU 0 to give us a starting point.  */
-	for (i = 0 ; i < NR_IRQS ; i++)
-		pending_irq_balance_apicid[i] = cpu_to_logical_apicid(0);
 	for (;;) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		time_remaining = schedule_timeout(time_remaining);
@@ -1566,7 +1604,7 @@
  */
 static int __init timer_irq_works(void)
 {
-	unsigned int t1 = jiffies;
+	unsigned long t1 = jiffies;
 
 	local_irq_enable();
 	/* Let ten ticks pass... */
@@ -1579,7 +1617,7 @@
 	 * might have cached one ExtINT interrupt.  Finally, at
 	 * least one tick may be lost due to delays.
 	 */
-	if (jiffies - t1 > 4)
+	if (time_after(jiffies, t1 + 4))
 		return 1;
 
 	return 0;
diff -u -r linux-2.5.63/arch/i386/kernel/irq.c linux-2.5.63.new/arch/i386/kernel/irq.c
--- linux-2.5.63/arch/i386/kernel/irq.c	2003-03-04 13:07:12.000000000 +0100
+++ linux-2.5.63.new/arch/i386/kernel/irq.c	2003-03-04 13:18:22.000000000 +0100
@@ -65,8 +65,13 @@
 /*
  * Controller mappings for all interrupt sources:
  */
+#if defined(CONFIG_X86_IO_APIC)
+irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned =
+	{ [0 ... NR_IRQS-1] = { 0, &no_irq_type, NULL, 0, 0, SPIN_LOCK_UNLOCKED}};
+#else
 irq_desc_t irq_desc[NR_IRQS] __cacheline_aligned =
 	{ [0 ... NR_IRQS-1] = { 0, &no_irq_type, NULL, 0, SPIN_LOCK_UNLOCKED}};
+#endif
 
 static void register_irq_proc (unsigned int irq);
 
@@ -140,6 +145,9 @@
 	for (j=0; j<NR_CPUS; j++)
 		if (cpu_online(j))
 			p += seq_printf(p, "CPU%d       ",j);
+#if CONFIG_X86_IO_APIC
+	p += seq_printf(p, "ON_CPU");
+#endif
 	seq_putc(p, '\n');
 
 	for (i = 0 ; i < NR_IRQS ; i++) {
@@ -155,6 +163,9 @@
 				p += seq_printf(p, "%10u ",
 					     kstat_cpu(j).irqs[i]);
 #endif
+#if CONFIG_X86_IO_APIC
+		seq_printf(p, " %11i", irq_desc[i].processor);
+#endif
 		seq_printf(p, " %14s", irq_desc[i].handler->typename);
 		seq_printf(p, "  %s", action->name);
 
Only in linux-2.5.63.new/arch/i386/kernel: irq.c.orig
diff -u -r linux-2.5.63/include/linux/irq.h linux-2.5.63.new/include/linux/irq.h
--- linux-2.5.63/include/linux/irq.h	2003-02-24 20:05:29.000000000 +0100
+++ linux-2.5.63.new/include/linux/irq.h	2003-03-03 12:37:06.000000000 +0100
@@ -61,6 +61,9 @@
 	hw_irq_controller *handler;
 	struct irqaction *action;	/* IRQ action list */
 	unsigned int depth;		/* nested irq disables */
+#if defined(CONFIG_X86_IO_APIC)
+	unsigned int processor;
+#endif
 	spinlock_t lock;
 } ____cacheline_aligned irq_desc_t;
 


Thread overview: 11+ messages
2003-03-06 20:01 [PATCH][IO_APIC] 2.5.63bk7 irq_balance improvments / bug-fixes Nakajima, Jun
  -- strict thread matches above, loose matches on Subject: below --
2003-03-05 19:57 Kamble, Nitin A
2003-03-05  4:21 Kamble, Nitin A
2003-03-05  4:38 ` Jeff Garzik
2003-03-05 15:46   ` Jason Lunz
2003-03-05 18:26 ` Arjan van de Ven
2003-03-04 23:33 Kamble, Nitin A
2003-03-04 23:51 ` Andrew Morton
2003-03-05 10:48 ` Kai Bankett
2003-03-04 16:33 Kai Bankett
2003-03-04 16:45 ` Jeff Garzik
