linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RE: Updated MSI Patches
@ 2003-08-13  1:34 Nakajima, Jun
  2003-08-13  1:37 ` Zwane Mwaikambo
  0 siblings, 1 reply; 35+ messages in thread
From: Nakajima, Jun @ 2003-08-13  1:34 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Jeff Garzik, Nguyen, Tom L, Linux Kernel, long

Salability means many things. I'm not sure which aspect you are talking
about, but a good thing with MSI is that it does not depend on I/O
APICs. A typical I/O APIC has 24 RTEs, and we need about 10 I/O APICs to
consume the vectors. 

You can make this scale on SMP systems. The vector-based approach would
be easier to extend it per-CPU because vectors are inherently per-CPU,
compared with IRQ-based. Today we have IRQ balancing as you know, and
the key part is to bind IRQ to a particular CPU (BTW, with our patch it
happens automatically because the balancer does not care if they are IRQ
or vector). 

Thanks,
Jun

> -----Original Message-----
> From: Zwane Mwaikambo [mailto:zwane@linuxpower.ca]
> Sent: Tuesday, August 12, 2003 5:49 PM
> To: Nakajima, Jun
> Cc: Jeff Garzik; Nguyen, Tom L; Linux Kernel; long
> Subject: RE: Updated MSI Patches
> 
> On Tue, 12 Aug 2003, Nakajima, Jun wrote:
> 
> > static unsigned int startup_edge_ioapic_vector(unsigned int vector)
> > {
> > 	int was_pending = 0;
> > 	unsigned long flags;
> > 	int irq = vector_to_irq (vector);
> >
> > 	spin_lock_irqsave(&ioapic_lock, flags);
> > 	if (irq < 16) {
> > 		disable_8259A_irq(irq);
> > 		if (i8259A_irq_pending(irq))
> > 			was_pending = 1;
> > 	}
> > 	__unmask_IO_APIC_irq(irq);
> > 	spin_unlock_irqrestore(&ioapic_lock, flags);
> >
> > 	return was_pending;
> > }
> 
> Hmm that doesn't look too bad, i like, ok another question, do you
think
> this could be made to scale on systems with lots of devices requiring
> vectors?
> 
> Thanks,
> 	Zwane


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
  2003-08-13  1:34 Updated MSI Patches Nakajima, Jun
@ 2003-08-13  1:37 ` Zwane Mwaikambo
  0 siblings, 0 replies; 35+ messages in thread
From: Zwane Mwaikambo @ 2003-08-13  1:37 UTC (permalink / raw)
  To: Nakajima, Jun; +Cc: Jeff Garzik, Nguyen, Tom L, Linux Kernel, long

On Tue, 12 Aug 2003, Nakajima, Jun wrote:

> Salability means many things. I'm not sure which aspect you are talking
> about, but a good thing with MSI is that it does not depend on I/O
> APICs. A typical I/O APIC has 24 RTEs, and we need about 10 I/O APICs to
> consume the vectors. 

Yes there are already a number of systems which exhibit this problem (see 
the URL in the next comment)
 
> You can make this scale on SMP systems. The vector-based approach would
> be easier to extend it per-CPU because vectors are inherently per-CPU,
> compared with IRQ-based. Today we have IRQ balancing as you know, and
> the key part is to bind IRQ to a particular CPU (BTW, with our patch it
> happens automatically because the balancer does not care if they are IRQ
> or vector). 

Sorry, yeah i meant device scalability essentially, the ability to plug in 
a fair amount of these and other devices into the same system. The per cpu 
IDT route is also worthwhile following a bit later on. I've already 
written something along the lines of a per node IDT (you could make the 
granularity per cpu of course) if you're interested you can have a look 
here http://www.osdl.org/projects/numaqhwspprt/results/patch-numaq-highirq10 it 
also has a companion patch to dynamically allocate the irq_desc's which 
applies on top of it http://function.linuxpower.ca/patches/patch-dynirq-wli

But back to the subject, your patch looks good, i'll test it for 
regression (no MSI devices but regression testing is just as important) 
when you next post.

Thanks!
	Zwane

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Updated MSI Patches
@ 2003-10-03 22:15 long
  0 siblings, 0 replies; 35+ messages in thread
From: long @ 2003-10-03 22:15 UTC (permalink / raw)
  To: jgarzik; +Cc: jun.nakajima, linux-kernel, tom.l.nguyen

Hi Jeff,

Thank you for your feedback on MSI patches based on kernel 
2.6.0-test5. Below are the updated patches, which are based
on kernel 2.6.0-test6 and include minor modification to
reflect your input.

Thanks,
Long
---------------------------------------------------------
diff -urN linux-2.6.0-test6/arch/i386/Kconfig linux-260t6-vectorbase-testbox/arch/i386/Kconfig
--- linux-2.6.0-test6/arch/i386/Kconfig	2003-09-27 20:50:10.000000000 -0400
+++ linux-260t6-vectorbase-testbox/arch/i386/Kconfig	2003-09-30 17:14:41.000000000 -0400
@@ -1030,6 +1030,17 @@
  	depends on PCI && ((PCI_GODIRECT || PCI_GOANY) || X86_VISWS)
 	default y
 
+config PCI_USE_VECTOR
+	bool "PCI_USE_VECTOR"
+	default n
+	help
+	   This replaces the current existing IRQ-based index interrupt scheme
+	   with the vector-base index scheme. The advantages of vector base over	   IRQ base are listed below:
+	   1) Support MSI implementation.
+	   2) Support future IOxAPIC hotplug
+
+	   If you don't know what to do here, say N.
+
 source "drivers/pci/Kconfig"
 
 config ISA
diff -urN linux-2.6.0-test6/arch/i386/kernel/i8259.c linux-260t6-vectorbase-testbox/arch/i386/kernel/i8259.c
--- linux-2.6.0-test6/arch/i386/kernel/i8259.c	2003-09-27 20:51:02.000000000 -0400
+++ linux-260t6-vectorbase-testbox/arch/i386/kernel/i8259.c	2003-09-30 17:14:41.000000000 -0400
@@ -419,8 +419,10 @@
 	 * us. (some of these will be overridden and become
 	 * 'special' SMP interrupts)
 	 */
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
 		int vector = FIRST_EXTERNAL_VECTOR + i;
+		if (i >= NR_IRQS)
+			break;
 		if (vector != SYSCALL_VECTOR) 
 			set_intr_gate(vector, interrupt[i]);
 	}
diff -urN linux-2.6.0-test6/arch/i386/kernel/io_apic.c linux-260t6-vectorbase-testbox/arch/i386/kernel/io_apic.c
--- linux-2.6.0-test6/arch/i386/kernel/io_apic.c	2003-09-27 20:50:15.000000000 -0400
+++ linux-260t6-vectorbase-testbox/arch/i386/kernel/io_apic.c	2003-09-30 17:14:41.000000000 -0400
@@ -76,6 +76,14 @@
 	int apic, pin, next;
 } irq_2_pin[PIN_MAP_SIZE];
 
+#ifdef CONFIG_PCI_USE_VECTOR
+int vector_irq[NR_IRQS] = { [0 ... NR_IRQS -1] = -1};
+#define vector_to_irq(vector) 	\
+	(platform_legacy_irq(vector) ? vector : vector_irq[vector])
+#else
+#define vector_to_irq(vector)	(vector)
+#endif
+
 /*
  * The common case is 1:1 IRQ<->pin mappings. Sometimes there are
  * shared ISA-space IRQs, so we have to support them. We are super
@@ -249,7 +257,7 @@
 			clear_IO_APIC_pin(apic, pin);
 }
 
-static void set_ioapic_affinity(unsigned int irq, cpumask_t cpumask)
+static void set_ioapic_affinity_irq(unsigned int irq, cpumask_t cpumask)
 {
 	unsigned long flags;
 	int pin;
@@ -670,13 +678,13 @@
 
 __setup("noirqbalance", irqbalance_disable);
 
-static void set_ioapic_affinity(unsigned int irq, cpumask_t mask);
+static void set_ioapic_affinity_irq(unsigned int irq, cpumask_t mask);
 
 static inline void move_irq(int irq)
 {
 	/* note - we hold the desc->lock */
 	if (unlikely(!cpus_empty(pending_irq_balance_cpumask[irq]))) {
-		set_ioapic_affinity(irq, pending_irq_balance_cpumask[irq]);
+		set_ioapic_affinity_irq(irq, pending_irq_balance_cpumask[irq]);
 		cpus_clear(pending_irq_balance_cpumask[irq]);
 	}
 }
@@ -853,7 +861,7 @@
 			if (irq_entry == -1)
 				continue;
 			irq = pin_2_irq(irq_entry, ioapic, pin);
-			set_ioapic_affinity(irq, mask);
+			set_ioapic_affinity_irq(irq, mask);
 		}
 
 	}
@@ -1156,12 +1164,40 @@
 		current_vector = FIRST_DEVICE_VECTOR + offset;
 	}
 
+#ifdef CONFIG_PCI_USE_VECTOR
+	vector_irq[current_vector] = irq;
+#endif
+	
 	IO_APIC_VECTOR(irq) = current_vector;
+
 	return current_vector;
 }
 
-static struct hw_interrupt_type ioapic_level_irq_type;
-static struct hw_interrupt_type ioapic_edge_irq_type;
+static struct hw_interrupt_type ioapic_level_type;
+static struct hw_interrupt_type ioapic_edge_type;
+
+#define IOAPIC_AUTO	-1
+#define IOAPIC_EDGE	0
+#define IOAPIC_LEVEL	1
+
+static inline void ioapic_register_intr(int irq, int vector, unsigned long trigger)
+{
+	if (use_pci_vector() && !platform_legacy_irq(irq)) {
+		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
+				trigger == IOAPIC_LEVEL)
+			irq_desc[vector].handler = &ioapic_level_type;
+		else
+			irq_desc[vector].handler = &ioapic_edge_type;
+		set_intr_gate(vector, interrupt[vector]);
+	} else	{
+		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
+				trigger == IOAPIC_LEVEL)
+			irq_desc[irq].handler = &ioapic_level_type;
+		else
+			irq_desc[irq].handler = &ioapic_edge_type;
+		set_intr_gate(vector, interrupt[irq]);
+	}
+}
 
 void __init setup_IO_APIC_irqs(void)
 {
@@ -1219,13 +1255,7 @@
 		if (IO_APIC_IRQ(irq)) {
 			vector = assign_irq_vector(irq);
 			entry.vector = vector;
-
-			if (IO_APIC_irq_trigger(irq))
-				irq_desc[irq].handler = &ioapic_level_irq_type;
-			else
-				irq_desc[irq].handler = &ioapic_edge_irq_type;
-
-			set_intr_gate(vector, interrupt[irq]);
+			ioapic_register_intr(irq, vector, IOAPIC_AUTO);
 		
 			if (!apic && (irq < 16))
 				disable_8259A_irq(irq);
@@ -1272,7 +1302,7 @@
 	 * The timer IRQ doesn't have to know that behind the
 	 * scene we have a 8259A-master in AEOI mode ...
 	 */
-	irq_desc[0].handler = &ioapic_edge_irq_type;
+	irq_desc[0].handler = &ioapic_edge_type;
 
 	/*
 	 * Add it to the IO-APIC irq-routing table:
@@ -1762,9 +1792,6 @@
  * that was delayed but this is now handled in the device
  * independent code.
  */
-#define enable_edge_ioapic_irq unmask_IO_APIC_irq
-
-static void disable_edge_ioapic_irq (unsigned int irq) { /* nothing */ }
 
 /*
  * Starting up a edge-triggered IO-APIC interrupt is
@@ -1775,7 +1802,6 @@
  * This is not complete - we should be able to fake
  * an edge even if it isn't on the 8259A...
  */
-
 static unsigned int startup_edge_ioapic_irq(unsigned int irq)
 {
 	int was_pending = 0;
@@ -1793,8 +1819,6 @@
 	return was_pending;
 }
 
-#define shutdown_edge_ioapic_irq	disable_edge_ioapic_irq
-
 /*
  * Once we have recorded IRQ_PENDING already, we can mask the
  * interrupt for real. This prevents IRQ storms from unhandled
@@ -1809,9 +1833,6 @@
 	ack_APIC_irq();
 }
 
-static void end_edge_ioapic_irq (unsigned int i) { /* nothing */ }
-
-
 /*
  * Level triggered interrupts can just be masked,
  * and shutting down and starting up the interrupt
@@ -1833,10 +1854,6 @@
 	return 0; /* don't check for pending */
 }
 
-#define shutdown_level_ioapic_irq	mask_IO_APIC_irq
-#define enable_level_ioapic_irq		unmask_IO_APIC_irq
-#define disable_level_ioapic_irq	mask_IO_APIC_irq
-
 static void end_level_ioapic_irq (unsigned int irq)
 {
 	unsigned long v;
@@ -1863,6 +1880,7 @@
  * The idea is from Manfred Spraul.  --macro
  */
 	i = IO_APIC_VECTOR(irq);
+
 	v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
 
 	ack_APIC_irq();
@@ -1897,7 +1915,57 @@
 	}
 }
 
-static void mask_and_ack_level_ioapic_irq (unsigned int irq) { /* nothing */ }
+#ifdef CONFIG_PCI_USE_VECTOR
+static unsigned int startup_edge_ioapic_vector(unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+
+	return startup_edge_ioapic_irq(irq);
+}
+
+static void ack_edge_ioapic_vector(unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+	
+	ack_edge_ioapic_irq(irq);
+}
+
+static unsigned int startup_level_ioapic_vector (unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+
+	return startup_level_ioapic_irq (irq);
+}
+
+static void end_level_ioapic_vector (unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+
+	end_level_ioapic_irq(irq);
+}
+
+static void mask_IO_APIC_vector (unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+
+	mask_IO_APIC_irq(irq);
+}
+
+static void unmask_IO_APIC_vector (unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+	
+	unmask_IO_APIC_irq(irq);
+}
+
+static void set_ioapic_affinity_vector (unsigned int vector, 
+					unsigned long cpu_mask)
+{
+	int irq = vector_to_irq(vector);
+
+	set_ioapic_affinity_irq(irq, cpu_mask);
+}
+#endif
 
 /*
  * Level and edge triggered IO-APIC interrupts need different handling,
@@ -1907,26 +1975,25 @@
  * edge-triggered handler, without risking IRQ storms and other ugly
  * races.
  */
-
-static struct hw_interrupt_type ioapic_edge_irq_type = {
+static struct hw_interrupt_type ioapic_edge_type = {
 	.typename 	= "IO-APIC-edge",
-	.startup 	= startup_edge_ioapic_irq,
-	.shutdown 	= shutdown_edge_ioapic_irq,
-	.enable 	= enable_edge_ioapic_irq,
-	.disable 	= disable_edge_ioapic_irq,
-	.ack 		= ack_edge_ioapic_irq,
-	.end 		= end_edge_ioapic_irq,
+	.startup 	= startup_edge_ioapic,
+	.shutdown 	= shutdown_edge_ioapic,
+	.enable 	= enable_edge_ioapic,
+	.disable 	= disable_edge_ioapic,
+	.ack 		= ack_edge_ioapic,
+	.end 		= end_edge_ioapic,
 	.set_affinity 	= set_ioapic_affinity,
 };
 
-static struct hw_interrupt_type ioapic_level_irq_type = {
+static struct hw_interrupt_type ioapic_level_type = {
 	.typename 	= "IO-APIC-level",
-	.startup 	= startup_level_ioapic_irq,
-	.shutdown 	= shutdown_level_ioapic_irq,
-	.enable 	= enable_level_ioapic_irq,
-	.disable 	= disable_level_ioapic_irq,
-	.ack 		= mask_and_ack_level_ioapic_irq,
-	.end 		= end_level_ioapic_irq,
+	.startup 	= startup_level_ioapic,
+	.shutdown 	= shutdown_level_ioapic,
+	.enable 	= enable_level_ioapic,
+	.disable 	= disable_level_ioapic,
+	.ack 		= mask_and_ack_level_ioapic,
+	.end 		= end_level_ioapic,
 	.set_affinity 	= set_ioapic_affinity,
 };
 
@@ -1946,7 +2013,13 @@
 	 * 0x80, because int 0x80 is hm, kind of importantish. ;)
 	 */
 	for (irq = 0; irq < NR_IRQS ; irq++) {
-		if (IO_APIC_IRQ(irq) && !IO_APIC_VECTOR(irq)) {
+		int tmp = irq;
+		if (use_pci_vector()) {
+			if (!platform_legacy_irq(tmp))  
+				if ((tmp = vector_to_irq(tmp)) == -1)
+					continue;
+		}
+		if (IO_APIC_IRQ(tmp) && !IO_APIC_VECTOR(tmp)) {
 			/*
 			 * Hmm.. We don't have an entry for this,
 			 * so default to an old-fashioned 8259
@@ -2378,10 +2451,12 @@
 		"IRQ %d Mode:%i Active:%i)\n", ioapic, 
 		mp_ioapics[ioapic].mpc_apicid, pin, entry.vector, irq, edge_level, active_high_low);
 
+ 	if (use_pci_vector() && !platform_legacy_irq(irq))
+		irq = IO_APIC_VECTOR(irq);
 	if (edge_level) {
-	irq_desc[irq].handler = &ioapic_level_irq_type;
+		irq_desc[irq].handler = &ioapic_level_type;
 	} else {
-		irq_desc[irq].handler = &ioapic_edge_irq_type;
+		irq_desc[irq].handler = &ioapic_edge_type;
 	}
 
 	set_intr_gate(entry.vector, interrupt[irq]);
diff -urN linux-2.6.0-test6/arch/i386/kernel/mpparse.c linux-260t6-vectorbase-testbox/arch/i386/kernel/mpparse.c
--- linux-2.6.0-test6/arch/i386/kernel/mpparse.c	2003-09-27 20:50:15.000000000 -0400
+++ linux-260t6-vectorbase-testbox/arch/i386/kernel/mpparse.c	2003-09-30 17:14:41.000000000 -0400
@@ -1136,15 +1136,19 @@
 		if ((1<<bit) & mp_ioapic_routing[ioapic].pin_programmed[idx]) {
 			printk(KERN_DEBUG "Pin %d-%d already programmed\n",
 				mp_ioapic_routing[ioapic].apic_id, ioapic_pin);
-			entry->irq = irq;
+ 			if (use_pci_vector() && !platform_legacy_irq(irq))
+ 				irq = IO_APIC_VECTOR(irq);
+ 			entry->irq = irq;
 			continue;
 		}
 
 		mp_ioapic_routing[ioapic].pin_programmed[idx] |= (1<<bit);
 
-		if (!io_apic_set_pci_routing(ioapic, ioapic_pin, irq, edge_level, active_high_low))
-			entry->irq = irq;
-
+		if (!io_apic_set_pci_routing(ioapic, ioapic_pin, irq, edge_level, active_high_low)) {
+ 			if (use_pci_vector() && !platform_legacy_irq(irq))
+ 				irq = IO_APIC_VECTOR(irq);
+ 			entry->irq = irq;
+ 		}
 		printk(KERN_DEBUG "%02x:%02x:%02x[%c] -> %d-%d -> IRQ %d\n",
 			entry->id.segment, entry->id.bus, 
 			entry->id.device, ('A' + entry->pin), 
diff -urN linux-2.6.0-test6/arch/i386/pci/irq.c linux-260t6-vectorbase-testbox/arch/i386/pci/irq.c
--- linux-2.6.0-test6/arch/i386/pci/irq.c	2003-09-27 20:51:16.000000000 -0400
+++ linux-260t6-vectorbase-testbox/arch/i386/pci/irq.c	2003-09-30 17:14:41.000000000 -0400
@@ -745,6 +745,10 @@
 							bridge->bus->number, PCI_SLOT(bridge->devfn), pin, irq);
 				}
 				if (irq >= 0) {
+					if (use_pci_vector() && 
+						!platform_legacy_irq(irq))
+						irq = IO_APIC_VECTOR(irq);
+
 					printk(KERN_INFO "PCI->APIC IRQ transform: (B%d,I%d,P%d) -> %d\n",
 						dev->bus->number, PCI_SLOT(dev->devfn), pin, irq);
 					dev->irq = irq;
diff -urN linux-2.6.0-test6/include/asm-i386/hw_irq.h linux-260t6-vectorbase-testbox/include/asm-i386/hw_irq.h
--- linux-2.6.0-test6/include/asm-i386/hw_irq.h	2003-09-27 20:51:15.000000000 -0400
+++ linux-260t6-vectorbase-testbox/include/asm-i386/hw_irq.h	2003-09-30 17:14:41.000000000 -0400
@@ -41,6 +41,7 @@
 asmlinkage void error_interrupt(void);
 asmlinkage void spurious_interrupt(void);
 asmlinkage void thermal_interrupt(struct pt_regs);
+#define platform_legacy_irq(irq)	((irq) < 16)
 #endif
 
 void mask_irq(unsigned int irq);
diff -urN linux-2.6.0-test6/include/asm-i386/io_apic.h linux-260t6-vectorbase-testbox/include/asm-i386/io_apic.h
--- linux-2.6.0-test6/include/asm-i386/io_apic.h	2003-09-27 20:50:36.000000000 -0400
+++ linux-260t6-vectorbase-testbox/include/asm-i386/io_apic.h	2003-09-30 17:14:41.000000000 -0400
@@ -13,6 +13,46 @@
 
 #ifdef CONFIG_X86_IO_APIC
 
+#ifdef CONFIG_PCI_USE_VECTOR
+static inline int use_pci_vector(void)	{return 1;}
+static inline void disable_edge_ioapic_vector(unsigned int vector) { }
+static inline void mask_and_ack_level_ioapic_vector(unsigned int vector) { }
+static inline void end_edge_ioapic_vector (unsigned int vector) { }
+#define startup_level_ioapic	startup_level_ioapic_vector
+#define shutdown_level_ioapic	mask_IO_APIC_vector
+#define enable_level_ioapic	unmask_IO_APIC_vector
+#define disable_level_ioapic	mask_IO_APIC_vector
+#define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_vector	
+#define end_level_ioapic	end_level_ioapic_vector
+#define set_ioapic_affinity	set_ioapic_affinity_vector
+
+#define startup_edge_ioapic 	startup_edge_ioapic_vector
+#define shutdown_edge_ioapic 	disable_edge_ioapic_vector
+#define enable_edge_ioapic 	unmask_IO_APIC_vector
+#define disable_edge_ioapic 	disable_edge_ioapic_vector
+#define ack_edge_ioapic 	ack_edge_ioapic_vector
+#define end_edge_ioapic 	end_edge_ioapic_vector
+#else
+static inline int use_pci_vector(void)	{return 0;}
+static inline void disable_edge_ioapic_irq(unsigned int irq) { }
+static inline void mask_and_ack_level_ioapic_irq(unsigned int irq) { }
+static inline void end_edge_ioapic_irq (unsigned int irq) { }
+#define startup_level_ioapic	startup_level_ioapic_irq
+#define shutdown_level_ioapic	mask_IO_APIC_irq
+#define enable_level_ioapic	unmask_IO_APIC_irq
+#define disable_level_ioapic	mask_IO_APIC_irq
+#define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_irq	
+#define end_level_ioapic	end_level_ioapic_irq
+#define set_ioapic_affinity	set_ioapic_affinity_irq
+
+#define startup_edge_ioapic 	startup_edge_ioapic_irq
+#define shutdown_edge_ioapic 	disable_edge_ioapic_irq
+#define enable_edge_ioapic 	unmask_IO_APIC_irq
+#define disable_edge_ioapic 	disable_edge_ioapic_irq
+#define ack_edge_ioapic 	ack_edge_ioapic_irq
+#define end_edge_ioapic 	end_edge_ioapic_irq
+#endif
+
 #define APIC_MISMATCH_DEBUG
 
 #define IO_APIC_BASE(idx) \
diff -urN linux-2.6.0-test6/include/asm-i386/mach-default/irq_vectors.h linux-260t6-vectorbase-testbox/include/asm-i386/mach-default/irq_vectors.h
--- linux-2.6.0-test6/include/asm-i386/mach-default/irq_vectors.h	2003-09-27 20:50:14.000000000 -0400
+++ linux-260t6-vectorbase-testbox/include/asm-i386/mach-default/irq_vectors.h	2003-09-30 17:14:41.000000000 -0400
@@ -76,11 +76,21 @@
  * Since vectors 0x00-0x1f are used/reserved for the CPU,
  * the usable vector space is 0x20-0xff (224 vectors)
  */
+/*
+ * The maximum number of vectors supported by i386 processors
+ * is limited to 256. For processors other than i386, NR_VECTORS
+ * should be changed accordingly. 
+ */
+#define NR_VECTORS 256
+#ifdef CONFIG_PCI_USE_VECTOR
+#define NR_IRQS FIRST_SYSTEM_VECTOR
+#else
 #ifdef CONFIG_X86_IO_APIC
 #define NR_IRQS 224
 #else
 #define NR_IRQS 16
 #endif
+#endif
 
 #define FPU_IRQ			13
 
diff -urN linux-260t6-vectorbase-testbox/arch/i386/Kconfig linux-260t6-msi-testbox/arch/i386/Kconfig
--- linux-260t6-vectorbase-testbox/arch/i386/Kconfig	2003-09-30 17:14:41.000000000 -0400
+++ linux-260t6-msi-testbox/arch/i386/Kconfig	2003-09-30 19:09:30.000000000 -0400
@@ -1032,6 +1032,7 @@
 
 config PCI_USE_VECTOR
 	bool "PCI_USE_VECTOR"
+	depends on X86_LOCAL_APIC
 	default n
 	help
 	   This replaces the current existing IRQ-based index interrupt scheme
@@ -1039,6 +1040,12 @@
 	   1) Support MSI implementation.
 	   2) Support future IOxAPIC hotplug
 
+	   Note that this enables MSI, Message Signaled Interrupt, on all
+	   MSI capable device functions detected if users also install the 
+	   MSI patch. Message Signal Interrupt enables an MSI-capable
+	   hardware device to send an inbound Memory Write on its PCI bus
+	   instead of asserting IRQ signal on device IRQ pin.
+
 	   If you don't know what to do here, say N.
 
 source "drivers/pci/Kconfig"
diff -urN linux-260t6-vectorbase-testbox/arch/i386/kernel/io_apic.c linux-260t6-msi-testbox/arch/i386/kernel/io_apic.c
--- linux-260t6-vectorbase-testbox/arch/i386/kernel/io_apic.c	2003-09-30 17:14:41.000000000 -0400
+++ linux-260t6-msi-testbox/arch/i386/kernel/io_apic.c	2003-09-30 19:09:30.000000000 -0400
@@ -296,7 +296,7 @@
 
 extern cpumask_t irq_affinity[NR_IRQS];
 
-static cpumask_t __cacheline_aligned pending_irq_balance_cpumask[NR_IRQS];
+cpumask_t __cacheline_aligned pending_irq_balance_cpumask[NR_IRQS];
 
 #define IRQBALANCE_CHECK_ARCH -999
 static int irqbalance_disabled = IRQBALANCE_CHECK_ARCH;
@@ -1148,6 +1148,7 @@
 
 int irq_vector[NR_IRQS] = { FIRST_DEVICE_VECTOR , 0 };
 
+#ifndef CONFIG_PCI_USE_VECTOR
 static int __init assign_irq_vector(int irq)
 {
 	static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
@@ -1164,14 +1165,13 @@
 		current_vector = FIRST_DEVICE_VECTOR + offset;
 	}
 
-#ifdef CONFIG_PCI_USE_VECTOR
-	vector_irq[current_vector] = irq;
-#endif
-	
 	IO_APIC_VECTOR(irq) = current_vector;
 
 	return current_vector;
 }
+#else
+extern int assign_irq_vector(int irq);
+#endif
 
 static struct hw_interrupt_type ioapic_level_type;
 static struct hw_interrupt_type ioapic_edge_type;
diff -urN linux-260t6-vectorbase-testbox/arch/i386/pci/irq.c linux-260t6-msi-testbox/arch/i386/pci/irq.c
--- linux-260t6-vectorbase-testbox/arch/i386/pci/irq.c	2003-09-30 17:14:41.000000000 -0400
+++ linux-260t6-msi-testbox/arch/i386/pci/irq.c	2003-09-30 19:09:30.000000000 -0400
@@ -680,8 +680,10 @@
 		    	if ( dev2->irq && dev2->irq != irq && \
 			(!(pci_probe & PCI_USE_PIRQ_MASK) || \
 			((1 << dev2->irq) & mask)) ) {
+#ifndef CONFIG_PCI_USE_VECTOR
 		    		printk(KERN_INFO "IRQ routing conflict for %s, have irq %d, want irq %d\n",
 				       pci_name(dev2), dev2->irq, irq);
+#endif
 		    		continue;
 		    	}
 			dev2->irq = irq;
diff -urN linux-260t6-vectorbase-testbox/Documentation/MSI-HOWTO.txt linux-260t6-msi-testbox/Documentation/MSI-HOWTO.txt
--- linux-260t6-vectorbase-testbox/Documentation/MSI-HOWTO.txt	1969-12-31 19:00:00.000000000 -0500
+++ linux-260t6-msi-testbox/Documentation/MSI-HOWTO.txt	2003-10-02 12:32:39.941240072 -0400
@@ -0,0 +1,321 @@
+		The MSI Driver Guide HOWTO
+	Tom L Nguyen tom.l.nguyen@intel.com
+			10/03/2003
+
+1. About this guide
+
+This guide describes the basics of Message Signaled Interrupts(MSI), the 
+advantages of using MSI over traditional interrupt mechanisms, and how 
+to enable your driver to use MSI or MSI-X. Also included is a Frequently 
+Asked Questions.
+
+2. Copyright 2003 Intel Corporation 
+
+3. What is MSI/MSI-X?
+
+Message Signaled Interrupt (MSI), as described in the PCI Local Bus 
+Specification Revision 2.3 or latest, is an optional feature, and a 
+required feature for PCI Express devices. MSI enables a device function
+to request service by sending an Inbound Memory Write on its PCI bus to
+the FSB as a Message Signal Interrupt transaction. Because MSI is 
+generated in the form of a Memory Write, all transaction conditions, 
+such as a Retry, Master-Abort, Target-Abort or normal completion, are 
+supported.
+
+A PCI device that supports MSI must also support pin IRQ assertion 
+interrupt mechanism to provide backward compatibility for systems that
+do not support MSI. In Systems, which support MSI, the bus driver is
+responsible for initializing the message address and message data of
+the device function's MSI/MSI-X capability structure during device
+initial configuration. 
+
+An MSI capable device function indicates MSI support by implementing 
+the MSI/MSI-X capability structure in its PCI capability list. The 
+device function may implement both the MSI capability structure and 
+the MSI-X capability structure; however, the bus driver should not 
+enable both, but instead enable only the MSI-X capability structure.
+
+The MSI capability structure contains Message Control register,
+Message Address register and Message Data register. These registers
+provide the bus driver control over MSI. The Message Control register
+indicates the MSI capability supported by the device. The Message
+Address register specifies the target address and the Message Data
+register specifies the characteristics of the message. To request
+service, the device function writes the content of the Message Data
+register to the target address. The device and its software driver 
+are prohibited from writing to these registers.
+
+The MSI-X capability structure is an optional extension to MSI. It 
+uses an independent and separate capability structure. There are 
+some key advantages to implementing the MSI-X capability structure 
+over the MSI capability structure as described below.
+
+	- Support a larger maximum number of vectors per function.
+
+	- Provide the ability for system software to configure 
+	each vector with an independent message address and message 
+	data, specified by a table that resides in Memory Space. 
+
+        - MSI and MSI-X both support per-vector masking. Per-vector
+	masking is an optional extension of MSI but a required
+	feature for MSI-X. Per-vector masking provides the kernel
+	the ability to mask/unmask MSI when servicing its software
+	interrupt service routing handler. If per-vector masking is
+	not supported, then the device driver should provide the 
+	hardware/software synchronization to ensure that the device
+	generates MSI when the driver wants it to do so. 
+
+4. Why use MSI? 
+
+As a benefit the simplification of board design, MSI allows board 
+designers to remove out of band interrupt routing. MSI is another 
+step towards a legacy-free environment.
+
+Due to increasing pressure on chipset and processor packages to 
+reduce pin count, the need for interrupt pins is expected to 
+diminish over time. Devices, due to pin constraints, may implement
+messages to increase performance. 
+
+PCI Express endpoints uses INTx emulation (in-band messages) instead
+of IRQ pin assertion. Using INTx emulation requires interrupt 
+sharing among devices connected to the same node (PCI bridge) while
+MSI is unique (non-shared) and does not require BIOS configuration
+support. As a result, the PCI Express technology requires MSI 
+support for better interrupt performance.
+ 
+Using MSI enables the device functions to support two or more 
+vectors, which can be configure to target different CPU's to 
+increase scalability. 
+
+5. Configuring a driver to use MSI/MSI-X
+
+By default, the kernel will not enable MSI/MSI-X on all devices that
+support this capability once the patch is installed. A kernel 
+configuration option must be selected to enable MSI/MSI-X support.
+
+5.1 Including MSI support into the kernel
+
+To include MSI support into the kernel requires users to patch the
+VECTOR-base patch first and then the MSI patch because the MSI 
+support needs VECTOR based scheme. Once these patches are installed,
+setting CONFIG_PCI_USE_VECTOR enables the VECTOR based scheme and 
+the option for MSI-capable device drivers to selectively enable MSI
+(using pci_enable_msi as desribed below).
+
+Since the target of the inbound message is the local APIC, providing
+CONFIG_PCI_USE_VECTOR is dependent on whether CONFIG_X86_LOCAL_APIC
+is enabled or not.
+
+int pci_enable_msi(struct pci_dev *)
+ 
+With this new API, any existing device driver, which like to have 
+MSI enabled on its device function, must call this explicitly. A 
+successful call will initialize the MSI/MSI-X capability structure
+with ONE vector, regardless of whether the device function is 
+capable of supporting multiple messages. This vector replaces the
+pre-assigned dev->irq with a new MSI vector. To avoid the conflict
+of new assigned vector with existing pre-assigned vector requires
+the device driver to call this API before calling request_irq(...).
+
+The below diagram shows the events, which switches the interrupt
+mode on the MSI-capable device function between MSI mode and 
+PIN-IRQ assertion mode.
+
+	 ------------   pci_enable_msi 	 ------------------------
+	|	     | <===============	| 			 |
+	| MSI MODE   |	  	     	| PIN-IRQ ASSERTION MODE |	
+	| 	     | ===============>	|			 |	
+ 	 ------------	free_irq      	 ------------------------
+
+5.2 Configuring for MSI support
+
+Due to the non-contiguous fashion in vector assignment of the 
+existing Linux kernel, this patch does not support multiple 
+messages regardless of the device function is capable of supporting
+more than one vector. The bus driver initializes only entry 0 of
+this capability if pci_enable_msi(...) is called successfully by 
+the device driver.
+
+5.3 Configuring for MSI-X support
+
+Both the MSI capability structure and the MSI-X capability structure
+share the same above semantics; however, due to the ability of the
+system software to configure each vector of the MSI-X capability
+structure with an independent message address and message data, the
+non-contiguous fashion in vector assignment of the existing Linux 
+kernel has no impact on supporting multiple messages on an MSI-X 
+capable device functions. By default, as mentioned above, ONE vector
+should be always allocated to the MSI-X capability structure at 
+entry 0. The bus driver does not initialize other entries of the 
+MSI-X table.
+ 
+Note that the PCI subsystem should have full control of a MSI-X 
+table that resides in Memory Space. The software device driver 
+should not access this table. 
+
+To request for additional vectors, the device software driver should
+call function msi_alloc_vectors(). It is recommended that the 
+software driver should call this function once during the 
+initialization phase of the device driver.
+
+The function msi_alloc_vectors(), once invoked, enables either 
+all or nothing, depending on the current availability of vector 
+resources. If no vector resources are available, the device function
+still works with ONE vector. If the vector resources are available
+for the number of vectors requested by the driver, this function 
+will reconfigure the MSI-X capability structure of the device with
+additional messages, starting from entry 1. To emphasize this
+reason, for example, the device may be capable for supporting the 
+maximum of 32 vectors while its software driver usually may request
+4 vectors.
+
+For each vector, after this successful call, the device driver is 
+responsible to call other functions like request_irq(), enable_irq(),
+etc. to enable this vector with its corresponding interrupt service
+handler. It is the device driver's choice to have all vectors shared
+the same interrupt service handler or each vector with a unique 
+interrupt service handler. 
+
+In addition to the function msi_alloc_vectors(), another function 
+msi_free_vectors() is provided to allow the software driver to 
+release a number of vectors back to the vector resources. Once 
+invoked, the PCI subsystem disables (masks) each vector released. 
+These vectors are no longer valid for the hardware device and its
+software driver to use. Like free_irq, it recommends that the
+device driver should also call msi_free_vectors to release all
+additional vectors previously requested.
+
+int msi_alloc_vectors(struct pci_dev *dev, int *vector, int nvec)
+
+This API enables the software driver to request the PCI subsystem 
+for additional messages. Depending on the number of vectors 
+available, the PCI subsystem enables either all or nothing. 
+
+Argument dev points to the device (pci_dev) structure.
+Argument vector is a pointer of integer type. The number of 
+elements is indicated in argument nvec.
+Argument nvec is an integer indicating the number of messages 
+requested.
+A return of zero indicates that the number of allocated vector is 
+successfully allocated. Otherwise, indicate resources not 
+available.
+
+int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec)
+
+This API enables the software driver to inform the PCI subsystem
+that it is willing to release a number of vectors back to the 
+MSI resource pool. Once invoked, the PCI subsystem disables each
+MSI-X entry associated with each vector stored in the argument 2.
+These vectors are no longer valid for the hardware device and 
+its software driver to use.
+
+Argument dev points to the device (pci_dev) structure.
+Argument vector is a pointer of integer type. The number of 
+elements is indicated in argument nvec.
+Argument nvec is an integer indicating the number of messages 
+released. 
+A return of zero indicates that the number of allocated vectors 
+is successfully released. Otherwise, indicates a failure.
+
+5.4 Hardware requirements for MSI support
+MSI support requires support from both system hardware and 
+individual hardware device functions.
+
+5.4.1 System hardware support
+Since the target of MSI address is the local APIC CPU, enabling
+MSI support in Linux kernel is dependent on whether existing 
+system hardware supports local APIC. Users should verify their
+system whether it runs when CONFIG_X86_LOCAL_APIC=y.  
+
+In SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
+however, in UP environment, users must manually set 
+CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting 
+CONFIG_PCI_USE_VECTOR enables the VECTOR based scheme and 
+the option for MSI-capable device drivers to selectively enable 
+MSI (using pci_enable_msi as desribed below).
+ 
+Note that CONFIG_X86_IO_APIC setting is irrelevant because MSI
+vector is allocated new during runtime and MSI support does not
+depend on BIOS support. This key independency enables MSI support
+on future IOxAPIC free platform.
+
+5.4.2 Device hardware support
+The hardware device function supports MSI by indicating the 
+MSI/MSI-X capability structure on its PCI capability list. By
+default, this capability structure will not be initialized by
+the kernel to enable MSI during the system boot. In other words,
+the device function is running on its default pin assertion mode.
+Note that in many cases the hardware supporting MSI have bugs,
+which may result in system hang. The software driver of specific
+MSI-capable hardware is responsible for whether calling 
+pci_enable_msi or not. A return of zero indicates the kernel 
+successfully initializes the MSI/MSI-X capability structure of the
+device funtion. The device function is now running on MSI mode.
+ 
+5.5 How to tell whether MSI is enabled on device function
+
+At the driver level, a return of zero from pci_enable_msi(...) 
+indicates to the device driver that its device function is 
+initialized successfully and ready to run in MSI mode.
+
+At the user level, users can use command 'cat /proc/interrupts' 
+to display the vector allocated for the device and its interrupt
+mode, as shown below. 
+
+           CPU0       CPU1       
+  0:     324639          0    IO-APIC-edge  timer
+  1:       1186          0    IO-APIC-edge  i8042
+  2:          0          0          XT-PIC  cascade
+ 12:       2797          0    IO-APIC-edge  i8042
+ 14:       6543          0    IO-APIC-edge  ide0
+ 15:          1          0    IO-APIC-edge  ide1
+169:          0          0   IO-APIC-level  uhci-hcd
+185:          0          0   IO-APIC-level  uhci-hcd
+193:        138         10         PCI MSI  aic79xx
+201:         30          0         PCI MSI  aic79xx
+225:         30          0   IO-APIC-level  aic7xxx
+233:         30          0   IO-APIC-level  aic7xxx
+NMI:          0          0 
+LOC:     324553     325068 
+ERR:          0
+MIS:          0
+
+6. FAQ
+
+Q1. Are there any limitations on using the MSI?
+
+A1. If the PCI device supports MSI and conforms to the
+specification and the platform supports the APIC local bus,
+then using MSI should work.
+
+Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
+AMD processors)? In P3 IPI's are transmitted on the APIC local
+bus and in P4 and Xeon they are transmitted on the system
+bus. Are there any implications with this?
+
+A2. MSI support enables a PCI device sending an inbound
+memory write (0xfeexxxxx as target address) on its PCI bus
+directly to the FSB. Since the message address has a
+redirection hint bit cleared, it should work.
+
+Q3. The target address 0xfeexxxxx will be translated by the
+Host Bridge into an interrupt message. Are there any
+limitations on the chipsets such as Intel 8xx, Intel e7xxx,
+or VIA?
+
+A3. If these chipsets support an inbound memory write with
+target address set as 0xfeexxxxx, as conformed to PCI 
+specification 2.3 or latest, then it should work.
+
+Q4. From the driver point of view, if the MSI is lost because
+of the errors occur during inbound memory write, then it may
+wait for ever. Is there a mechanism for it to recover?
+
+A4. Since the target of the transaction is an inbound memory
+write, all transaction termination conditions (Retry,
+Master-Abort, Target-Abort, or normal completion) are
+supported. A device sending an MSI must abide by all the PCI
+rules and conditions regarding that inbound memory write. So,
+if a retry is signaled it must retry, etc... We believe that
+the recommendation for Abort is also a retry (refer to PCI
+specification 2.3 or latest).
diff -urN linux-260t6-vectorbase-testbox/drivers/acpi/pci_irq.c linux-260t6-msi-testbox/drivers/acpi/pci_irq.c
--- linux-260t6-vectorbase-testbox/drivers/acpi/pci_irq.c	2003-09-27 20:50:19.000000000 -0400
+++ linux-260t6-msi-testbox/drivers/acpi/pci_irq.c	2003-09-30 19:09:30.000000000 -0400
@@ -234,7 +234,7 @@
                           PCI Interrupt Routing Support
    -------------------------------------------------------------------------- */
 
-static int
+int
 acpi_pci_irq_lookup (struct pci_bus *bus, int device, int pin)
 {
 	struct acpi_prt_entry	*entry = NULL;
diff -urN linux-260t6-vectorbase-testbox/drivers/pci/Makefile linux-260t6-msi-testbox/drivers/pci/Makefile
--- linux-260t6-vectorbase-testbox/drivers/pci/Makefile	2003-09-27 20:51:00.000000000 -0400
+++ linux-260t6-msi-testbox/drivers/pci/Makefile	2003-09-30 19:09:30.000000000 -0400
@@ -28,6 +28,7 @@
 obj-$(CONFIG_SGI_IP27) += setup-irq.o
 obj-$(CONFIG_SGI_IP32) += setup-irq.o
 obj-$(CONFIG_X86_VISWS) += setup-irq.o
+obj-$(CONFIG_PCI_USE_VECTOR) += msi.o
 
 # Cardbus & CompactPCI use setup-bus
 obj-$(CONFIG_HOTPLUG) += setup-bus.o
diff -urN linux-260t6-vectorbase-testbox/drivers/pci/msi.c linux-260t6-msi-testbox/drivers/pci/msi.c
--- linux-260t6-vectorbase-testbox/drivers/pci/msi.c	1969-12-31 19:00:00.000000000 -0500
+++ linux-260t6-msi-testbox/drivers/pci/msi.c	2003-10-02 12:06:58.941507640 -0400
@@ -0,0 +1,1061 @@
+/*
+ * linux/drivers/pci/msi.c
+ */
+
+#include <linux/mm.h>
+#include <linux/irq.h>		
+#include <linux/interrupt.h>
+#include <linux/init.h>		
+#include <linux/config.h>      	
+#include <linux/ioport.h>       
+#include <linux/smp_lock.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/acpi.h>
+
+#include <asm/errno.h>
+#include <asm/io.h>		
+#include <asm/smp.h>
+#include <asm/desc.h>		
+#include <mach_apic.h>
+
+#include <linux/pci_msi.h>
+
+_DEFINE_DBG_BUFFER
+
+static spinlock_t msi_lock = SPIN_LOCK_UNLOCKED;
+static struct msi_desc* msi_desc[NR_IRQS] = { [0 ... NR_IRQS-1] = NULL };
+static kmem_cache_t* msi_cachep;
+
+static int pci_msi_enable = 1;
+static int nr_alloc_vectors = 0;
+static int nr_released_vectors = 0;
+static int nr_reserved_vectors = NR_HP_RESERVED_VECTORS;
+static int nr_msix_devices = 0;
+
+#ifndef CONFIG_X86_IO_APIC
+int vector_irq[NR_IRQS] = { [0 ... NR_IRQS -1] = -1};
+int irq_vector[NR_IRQS] = { FIRST_DEVICE_VECTOR , 0 };
+#endif
+
+static void msi_cache_ctor(void *p, kmem_cache_t *cache, unsigned long flags)
+{
+	memset(p, 0, NR_IRQS * sizeof(struct msi_desc));
+}
+
+static int msi_cache_init(void)
+{
+	msi_cachep = kmem_cache_create("msi_cache", 
+			NR_IRQS * sizeof(struct msi_desc),
+		       	0, SLAB_HWCACHE_ALIGN, msi_cache_ctor, NULL);
+	if (!msi_cachep)
+		return -ENOMEM;
+	
+	return 0;
+}
+
+static void msi_set_mask_bit(unsigned int vector, int flag)
+{
+	struct msi_desc *entry;
+	
+	entry = (struct msi_desc *)msi_desc[vector];
+	if (!entry || !entry->dev || !entry->mask_base)
+		return;
+	switch (entry->msi_attrib.type) {
+	case PCI_CAP_ID_MSI:
+	{
+		int		pos;
+		unsigned int	mask_bits;
+
+		pos = entry->mask_base;
+	        entry->dev->bus->ops->read(entry->dev->bus, entry->dev->devfn, 
+				pos, 4, &mask_bits);
+		mask_bits &= ~(1);
+		mask_bits |= flag;
+	        entry->dev->bus->ops->write(entry->dev->bus, entry->dev->devfn, 
+				pos, 4, mask_bits);
+		break;
+	}
+	case PCI_CAP_ID_MSIX:
+	{
+		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
+		writel(flag, entry->mask_base + offset);
+		break;
+	}
+	default:
+		break;
+	} 
+}
+
+#ifdef CONFIG_SMP
+static void set_msi_affinity(unsigned int vector, unsigned long cpu_mask)
+{
+	struct msi_desc *entry;
+	struct msg_address address;	
+	unsigned int dest_id;
+
+	entry = (struct msi_desc *)msi_desc[vector];
+	if (!entry || !entry->dev) 
+		return;
+	
+	switch (entry->msi_attrib.type) {
+	case PCI_CAP_ID_MSI:
+	{
+		int pos;
+
+   		if (!(pos = pci_find_capability(entry->dev, PCI_CAP_ID_MSI)))
+			return;
+	       	
+	        entry->dev->bus->ops->read(entry->dev->bus, entry->dev->devfn, 
+			msi_lower_address_reg(pos), 4, 
+			&address.lo_address.value);
+		dest_id = (address.lo_address.u.dest_id & 
+			MSI_ADDRESS_HEADER_MASK) |
+			(cpu_mask_to_apicid(cpu_mask) << MSI_TARGET_CPU_SHIFT);
+		address.lo_address.u.dest_id = dest_id; 
+		entry->msi_attrib.current_cpu = cpu_mask_to_apicid(cpu_mask); 
+		entry->dev->bus->ops->write(entry->dev->bus, entry->dev->devfn,
+			msi_lower_address_reg(pos), 4, 
+			address.lo_address.value);
+		break;
+	}
+	case PCI_CAP_ID_MSIX:
+	{
+		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET;
+
+		address.lo_address.value = readl(entry->mask_base + offset);
+		dest_id = (address.lo_address.u.dest_id & 
+			MSI_ADDRESS_HEADER_MASK) |
+			(cpu_mask_to_apicid(cpu_mask) << MSI_TARGET_CPU_SHIFT);
+		address.lo_address.u.dest_id = dest_id; 
+		entry->msi_attrib.current_cpu = cpu_mask_to_apicid(cpu_mask); 
+		writel(address.lo_address.value, entry->mask_base + offset);
+		break;
+	}
+	default:
+		break;
+	}
+}
+
+static inline void move_msi(int vector)
+{
+	if (unlikely(pending_irq_balance_cpumask[vector])) {
+		set_msi_affinity(vector, pending_irq_balance_cpumask[vector]);
+		pending_irq_balance_cpumask[vector] = 0;
+	}
+}
+#endif
+
+static void mask_MSI_irq(unsigned int vector)
+{
+	msi_set_mask_bit(vector, 1);
+}
+
+static void unmask_MSI_irq(unsigned int vector)
+{
+	msi_set_mask_bit(vector, 0);
+}
+
+static unsigned int startup_msi_irq_wo_maskbit(unsigned int vector) 
+{
+	return 0;	/* never anything pending */
+}
+
+static void pci_disable_msi(unsigned int vector);
+static void shutdown_msi_irq(unsigned int vector) 
+{
+	pci_disable_msi(vector);
+}
+
+#define shutdown_msi_irq_wo_maskbit	shutdown_msi_irq 
+static void enable_msi_irq_wo_maskbit(unsigned int vector) {}
+static void disable_msi_irq_wo_maskbit(unsigned int vector) {}
+static void ack_msi_irq_wo_maskbit(unsigned int vector) {}
+static void end_msi_irq_wo_maskbit(unsigned int vector) 
+{
+	move_msi(vector);
+	ack_APIC_irq();	
+}
+
+static unsigned int startup_msi_irq_w_maskbit(unsigned int vector)
+{ 
+	unmask_MSI_irq(vector);
+	return 0;	/* never anything pending */
+}
+
+#define shutdown_msi_irq_w_maskbit	shutdown_msi_irq
+#define enable_msi_irq_w_maskbit	unmask_MSI_irq
+#define disable_msi_irq_w_maskbit	mask_MSI_irq
+#define ack_msi_irq_w_maskbit		mask_MSI_irq
+
+static void end_msi_irq_w_maskbit(unsigned int vector) 
+{
+	move_msi(vector);
+	unmask_MSI_irq(vector);	
+	ack_APIC_irq();
+}
+
+/*
+ * Interrupt Type for MSI-X PCI/PCI-X/PCI-Express Devices,
+ * which implement the MSI-X Capability Structure. 
+ */
+static struct hw_interrupt_type msix_irq_type = {
+	.typename	= "PCI MSI-X",
+	.startup	= startup_msi_irq_w_maskbit,
+	.shutdown	= shutdown_msi_irq_w_maskbit,
+	.enable		= enable_msi_irq_w_maskbit,
+	.disable	= disable_msi_irq_w_maskbit,
+	.ack		= ack_msi_irq_w_maskbit,		
+	.end		= end_msi_irq_w_maskbit,	
+	.set_affinity	= set_msi_irq_affinity
+};
+
+/*
+ * Interrupt Type for MSI PCI/PCI-X/PCI-Express Devices,
+ * which implement the MSI Capability Structure with 
+ * Mask-and-Pending Bits. 
+ */
+static struct hw_interrupt_type msi_irq_w_maskbit_type = {
+	.typename	= "PCI MSI",
+	.startup	= startup_msi_irq_w_maskbit,
+	.shutdown	= shutdown_msi_irq_w_maskbit,
+	.enable		= enable_msi_irq_w_maskbit,
+	.disable	= disable_msi_irq_w_maskbit,
+	.ack		= ack_msi_irq_w_maskbit,		
+	.end		= end_msi_irq_w_maskbit,	
+	.set_affinity	= set_msi_irq_affinity
+};
+
+/*
+ * Interrupt Type for MSI PCI/PCI-X/PCI-Express Devices,
+ * which implement the MSI Capability Structure without 
+ * Mask-and-Pending Bits. 
+ */
+static struct hw_interrupt_type msi_irq_wo_maskbit_type = {
+	.typename	= "PCI MSI",
+	.startup	= startup_msi_irq_wo_maskbit,
+	.shutdown	= shutdown_msi_irq_wo_maskbit,
+	.enable		= enable_msi_irq_wo_maskbit,
+	.disable	= disable_msi_irq_wo_maskbit,
+	.ack		= ack_msi_irq_wo_maskbit,		
+	.end		= end_msi_irq_wo_maskbit,	
+	.set_affinity	= set_msi_irq_affinity
+};
+
+static void msi_data_init(struct msg_data *msi_data, 
+			  unsigned int vector)
+{
+	memset(msi_data, 0, sizeof(struct msg_data));
+	msi_data->vector = (u8)vector;
+	msi_data->delivery_mode = MSI_DELIVERY_MODE;
+	msi_data->level = MSI_LEVEL_MODE;
+	msi_data->trigger = MSI_TRIGGER_MODE;	
+}
+
+static void msi_address_init(struct msg_address *msi_address)
+{
+	unsigned int	dest_id;
+	
+	memset(msi_address, 0, sizeof(struct msg_address));
+	msi_address->hi_address = (u32)0;
+	dest_id = (MSI_ADDRESS_HEADER << MSI_ADDRESS_HEADER_SHIFT) |
+		 (MSI_TARGET_CPU << MSI_TARGET_CPU_SHIFT);
+	msi_address->lo_address.u.dest_mode = MSI_LOGICAL_MODE;
+	msi_address->lo_address.u.redirection_hint = MSI_REDIRECTION_HINT_MODE;
+	msi_address->lo_address.u.dest_id = dest_id; 
+}
+
+static int pci_vector_resources(void) 					
+{	
+	static int res = -EINVAL;
+	int nr_free_vectors;
+	
+	if (res == -EINVAL) {
+		int i, repeat;
+		for (i = NR_REPEATS; i > 0; i--) {				
+			if ((FIRST_DEVICE_VECTOR + i * 8) > FIRST_SYSTEM_VECTOR)
+				continue; 					
+			break; 							
+		} 								
+		i++; 								
+		repeat = (FIRST_SYSTEM_VECTOR - FIRST_DEVICE_VECTOR)/i; 	
+		res = i * repeat - NR_RESERVED_VECTORS + 1; 
+	}
+
+	nr_free_vectors = res + nr_released_vectors - nr_alloc_vectors;	
+
+	return nr_free_vectors;
+}
+
+int assign_irq_vector(int irq)
+{
+	static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
+	
+	if (irq != MSI_AUTO && IO_APIC_VECTOR(irq) > 0)
+		return IO_APIC_VECTOR(irq);
+next:
+	current_vector += 8;
+	if (current_vector == SYSCALL_VECTOR)
+		goto next;
+
+	if (current_vector > FIRST_SYSTEM_VECTOR) {
+		offset++;
+		current_vector = FIRST_DEVICE_VECTOR + offset;
+	}
+
+	if (current_vector == FIRST_SYSTEM_VECTOR) 
+		return -ENOSPC;
+	
+	vector_irq[current_vector] = irq;
+	if (irq != MSI_AUTO) 
+		IO_APIC_VECTOR(irq) = current_vector;
+
+	nr_alloc_vectors++;
+
+	return current_vector;
+}
+
+static int assign_msi_vector(void)
+{
+	static int new_vector_avail = 1; 	
+	int vector;
+	unsigned long flags;
+	
+	/* 
+	 * msi_lock is provided to ensure that successful allocation of MSI
+	 * vector is assigned unique among drivers.
+	 */
+	spin_lock_irqsave(&msi_lock, flags);
+	if (!(pci_vector_resources() > 0)) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EBUSY;
+	}
+
+	if (!new_vector_avail) {
+		/*
+	 	 * vector_irq[] = -1 indicates that this specific vector is:
+	 	 * - assigned for MSI (since MSI have no associated IRQ) or
+	 	 * - assigned for legacy if less than 16, or
+	 	 * - having no corresponding 1:1 vector-to-IOxAPIC IRQ mapping
+	 	 * vector_irq[] = 0 indicates that this vector, previously 
+		 * assigned for MSI, is freed by hotplug removed operations. 
+		 * This vector will be reused for any subsequent hotplug added 
+		 * operations.
+	 	 * vector_irq[] > 0 indicates that this vector is assigned for 
+		 * IOxAPIC IRQs. This vector and its value provides a 1-to-1 
+		 * vector-to-IOxAPIC IRQ mapping.
+	 	 */   
+		for (vector = FIRST_DEVICE_VECTOR; vector < NR_IRQS; vector++) {
+			if (vector_irq[vector] != 0) 
+				continue;
+			vector_irq[vector] = -1;	
+			nr_released_vectors--;
+			spin_unlock_irqrestore(&msi_lock, flags);
+			return vector;
+		}
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EBUSY;
+	}	
+
+	vector = assign_irq_vector(MSI_AUTO);
+	if (vector  == (FIRST_SYSTEM_VECTOR - 8))  
+		new_vector_avail = 0;
+
+	spin_unlock_irqrestore(&msi_lock, flags);
+	return vector;
+}
+
+static int get_new_vector(void)
+{
+	int vector;
+	
+	if ((vector = assign_msi_vector()) > 0)
+		set_intr_gate(vector, interrupt[vector]);
+
+	return vector;
+}
+
+static int msi_init(void)
+{
+	static int status = -ENOMEM;
+	
+	if (!status)
+		return status;
+
+	if ((status = msi_cache_init()) < 0) {
+		pci_msi_enable = 0;
+		printk(KERN_INFO "WARNING: MSI INIT FAILURE\n");
+		return status;
+	}
+	printk(KERN_INFO "MSI INIT SUCCESS\n");
+	
+	return status;
+}
+	
+static int get_msi_vector(struct pci_dev *dev)
+{
+	return get_new_vector();
+}
+
+static struct msi_desc* alloc_msi_entry(void)
+{
+	struct msi_desc *entry;
+
+	entry = (struct msi_desc*) kmem_cache_alloc(msi_cachep, SLAB_KERNEL);
+	if (!entry)
+		return NULL;
+
+	memset(entry, 0, sizeof(struct msi_desc));
+	entry->link.tail = entry->link.head = 0;	/* single message */
+	entry->dev = NULL;
+	
+	return entry;
+}
+
+static void attach_msi_entry(struct msi_desc *entry, int vector)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	msi_desc[vector] = entry;
+	spin_unlock_irqrestore(&msi_lock, flags);
+}
+
+static void irq_handler_init(int cap_id, int pos, int mask)
+{
+	spin_lock(&irq_desc[pos].lock);
+	if (cap_id == PCI_CAP_ID_MSIX)
+		irq_desc[pos].handler = &msix_irq_type;
+	else {
+		if (!mask)
+			irq_desc[pos].handler = &msi_irq_wo_maskbit_type;
+		else
+			irq_desc[pos].handler = &msi_irq_w_maskbit_type;
+	}
+	spin_unlock(&irq_desc[pos].lock);
+}
+
+static void enable_msi_mode(struct pci_dev *dev, int pos, int type)
+{
+	u32 control;
+			
+	dev->bus->ops->read(dev->bus, dev->devfn, 
+		msi_control_reg(pos), 2, &control);
+	if (type == PCI_CAP_ID_MSI) {
+		/* Set enabled bits to single MSI & enable MSI_enable bit */
+		msi_enable(control, 1);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	} else {
+		msix_enable(control);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	}
+    	if (pci_find_capability(dev, PCI_CAP_ID_EXP)) {
+		/* PCI Express Endpoint device detected */
+		u32 cmd;
+	        dev->bus->ops->read(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
+		cmd |= PCI_COMMAND_INTX_DISABLE;
+	        dev->bus->ops->write(dev->bus, dev->devfn, PCI_COMMAND, 2, cmd);
+	}
+}
+
+static void disable_msi_mode(struct pci_dev *dev, int pos, int type)
+{
+	u32 control;
+	
+	dev->bus->ops->read(dev->bus, dev->devfn, 
+		msi_control_reg(pos), 2, &control);
+	if (type == PCI_CAP_ID_MSI) {
+		/* Set enabled bits to single MSI & enable MSI_enable bit */
+		msi_disable(control);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	} else {
+		msix_disable(control);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	}
+    	if (pci_find_capability(dev, PCI_CAP_ID_EXP)) {
+		/* PCI Express Endpoint device detected */
+		u32 cmd;
+	        dev->bus->ops->read(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
+		cmd &= ~PCI_COMMAND_INTX_DISABLE;
+	        dev->bus->ops->write(dev->bus, dev->devfn, PCI_COMMAND, 2, cmd);
+	}
+}
+
+static int msi_lookup_vector(struct pci_dev *dev)
+{
+	int vector;
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	for (vector = FIRST_DEVICE_VECTOR; vector < NR_IRQS; vector++) {
+		if (!msi_desc[vector] || msi_desc[vector]->dev != dev ||
+			msi_desc[vector]->msi_attrib.entry_nr ||
+			msi_desc[vector]->msi_attrib.default_vector != dev->irq)
+			continue;	/* not entry 0, skip */
+		spin_unlock_irqrestore(&msi_lock, flags);
+		/* This pre-assigned entry-0 MSI vector for this device 
+		   already exits. Override dev->irq with this vector */ 
+		dev->irq = vector;	
+		return 0;
+	}
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	return -EACCES;
+}
+
+void pci_scan_msi_device(struct pci_dev *dev)
+{
+	if (!dev)
+		return;
+
+   	if (pci_find_capability(dev, PCI_CAP_ID_MSIX) > 0) {
+		nr_reserved_vectors++;
+		nr_msix_devices++;
+	} else if (pci_find_capability(dev, PCI_CAP_ID_MSI) > 0)
+		nr_reserved_vectors++;
+}
+
+/**
+ * msi_capability_init - configure device's MSI capability structure
+ * @dev: pointer to the pci_dev data structure of MSI device function
+ * 
+ * Setup the MSI capability structure of device funtion with a single
+ * MSI vector, regardless of device function is capable of handling
+ * multiple messages. A return of zero indicates the successful setup
+ * of an entry zero with the new MSI vector or non-zero for otherwise.  
+ **/
+static int msi_capability_init(struct pci_dev *dev) 
+{
+	struct msi_desc *entry;
+	struct msg_address address; 
+	struct msg_data data;
+	int pos, vector;
+	u32 control;
+	
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
+	if (!pos)
+		return -EINVAL; 
+
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 
+		2, &control);
+	if (control & PCI_MSI_FLAGS_ENABLE)
+		return 0;
+	
+	vector = dev->irq;
+	if (vector > 0 && !msi_lookup_vector(dev)) {
+		/* Lookup Sucess */
+		enable_msi_mode(dev, pos, PCI_CAP_ID_MSI);
+		return 0;
+	}
+	/* MSI Entry Initialization */
+	if (!(entry = alloc_msi_entry()))
+		return -ENOMEM;
+	
+	if ((vector = get_msi_vector(dev)) < 0) {
+		kmem_cache_free(msi_cachep, entry);
+		return -EBUSY;
+	}
+	entry->msi_attrib.type = PCI_CAP_ID_MSI;
+	entry->msi_attrib.entry_nr = 0;
+	entry->msi_attrib.maskbit = is_mask_bit_support(control);
+	entry->msi_attrib.default_vector = dev->irq;
+	dev->irq = vector;	/* save default pre-assigned ioapic vector */
+	entry->dev = dev;
+	if (is_mask_bit_support(control)) { 
+		entry->mask_base = msi_mask_bits_reg(pos, 
+				is_64bit_address(control));
+	}
+	/* Replace with MSI handler */
+	irq_handler_init(PCI_CAP_ID_MSI, vector, entry->msi_attrib.maskbit);
+	/* Configure MSI capability structure */
+	msi_address_init(&address);
+	msi_data_init(&data, vector);
+	entry->msi_attrib.current_cpu = ((address.lo_address.u.dest_id >>
+				MSI_TARGET_CPU_SHIFT) & MSI_TARGET_CPU_MASK);
+	dev->bus->ops->write(dev->bus, dev->devfn, msi_lower_address_reg(pos), 
+				4, address.lo_address.value);
+	if (is_64bit_address(control)) {
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_upper_address_reg(pos), 4, address.hi_address);
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_data_reg(pos, 1), 2, *((u32*)&data));
+	} else
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_data_reg(pos, 0), 2, *((u32*)&data));
+	if (entry->msi_attrib.maskbit) {
+		unsigned int maskbits, temp;
+		/* All MSIs are unmasked by default, Mask them all */
+	        dev->bus->ops->read(dev->bus, dev->devfn, 
+			msi_mask_bits_reg(pos, is_64bit_address(control)), 4,
+			&maskbits);
+		temp = (1 << multi_msi_capable(control));
+		temp = ((temp - 1) & ~temp);
+		maskbits |= temp;
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_mask_bits_reg(pos, is_64bit_address(control)), 4,
+			maskbits);
+	}
+	attach_msi_entry(entry, vector);
+	/* Set MSI enabled bits	 */
+	enable_msi_mode(dev, pos, PCI_CAP_ID_MSI);
+
+	return 0;
+}
+
+/**
+ * msix_capability_init - configure device's MSI-X capability 
+ * @dev: pointer to the pci_dev data structure of MSI-X device function
+ * 
+ * Setup the MSI-X capability structure of device funtion with a
+ * single MSI-X vector. A return of zero indicates the successful setup
+ * of an entry zero with the new MSI-X vector or non-zero for otherwise.  
+ * To request for additional MSI-X vectors, the device drivers are
+ * required to utilize the following supported APIs:
+ * 1) msi_alloc_vectors(...) for requesting one or more MSI-X vectors
+ * 2) msi_free_vectors(...) for releasing one or more MSI-X vectors 
+ *    back to PCI subsystem before calling free_irq(...)
+ **/
+static int msix_capability_init(struct pci_dev	*dev) 
+{
+	struct msi_desc *entry;
+	struct msg_address address; 
+	struct msg_data data;
+	int vector, pos, dev_msi_cap;
+	u32 phys_addr, table_offset;
+	u32 control;
+	u8 bir;
+	void *base;
+	
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+	if (!pos)
+		return -EINVAL;  
+
+	/* Request & Map MSI-X table region */
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 2, 
+		&control);
+	if (control & PCI_MSIX_FLAGS_ENABLE)
+		return 0;
+
+	vector = dev->irq;
+	if (vector > 0 && !msi_lookup_vector(dev)) {
+		/* Lookup Sucess */
+		enable_msi_mode(dev, pos, PCI_CAP_ID_MSIX);
+		return 0;
+	}
+
+	dev_msi_cap = multi_msix_capable(control);
+	dev->bus->ops->read(dev->bus, dev->devfn, 
+		msix_table_offset_reg(pos), 4, &table_offset);
+	bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
+	phys_addr = pci_resource_start (dev, bir);
+	phys_addr += (u32)(table_offset & ~PCI_MSIX_FLAGS_BIRMASK);
+	if (!request_mem_region(phys_addr, 
+		dev_msi_cap * PCI_MSIX_ENTRY_SIZE,
+		"MSI-X iomap Failure"))
+		return -ENOMEM;
+	base = ioremap_nocache(phys_addr, dev_msi_cap * PCI_MSIX_ENTRY_SIZE);
+	if (base == NULL) 
+		goto free_region; 
+	/* MSI Entry Initialization */
+	entry = alloc_msi_entry();
+	if (!entry) 
+		goto free_iomap; 
+	if ((vector = get_msi_vector(dev)) < 0)
+		goto free_entry;
+       	
+	entry->msi_attrib.type = PCI_CAP_ID_MSIX;
+	entry->msi_attrib.entry_nr = 0;
+	entry->msi_attrib.maskbit = 1;
+	entry->msi_attrib.default_vector = dev->irq;
+	dev->irq = vector;	/* save default pre-assigned ioapic vector */
+	entry->dev = dev;
+	entry->mask_base = (unsigned long)base; 
+	/* Replace with MSI handler */
+	irq_handler_init(PCI_CAP_ID_MSIX, vector, 1);
+	/* Configure MSI-X capability structure */
+	msi_address_init(&address);
+	msi_data_init(&data, vector);
+	entry->msi_attrib.current_cpu = ((address.lo_address.u.dest_id >>
+				MSI_TARGET_CPU_SHIFT) & MSI_TARGET_CPU_MASK);
+	writel(address.lo_address.value, base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+	writel(address.hi_address, base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
+	writel(*(u32*)&data, base + PCI_MSIX_ENTRY_DATA_OFFSET);
+	/* Initialize all entries from 1 up to 0 */
+	for (pos = 1; pos < dev_msi_cap; pos++) {
+		writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+		writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
+		writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_DATA_OFFSET);
+	}
+	attach_msi_entry(entry, vector);
+	/* Set MSI enabled bits	 */
+	enable_msi_mode(dev, pos, PCI_CAP_ID_MSIX);
+
+	return 0;
+
+free_entry:
+	kmem_cache_free(msi_cachep, entry);
+free_iomap:
+	iounmap(base);
+free_region: 
+	release_mem_region(phys_addr, dev_msi_cap * PCI_MSIX_ENTRY_SIZE);
+
+	return ((vector < 0) ? -EBUSY : -ENOMEM);
+}
+
+/** 
+ * pci_enable_msi - configure device's MSI(X) capability structure
+ * @dev: pointer to the pci_dev data structure of MSI(X) device function
+ *
+ * Setup the MSI/MSI-X capability structure of device function with
+ * a single MSI(X) vector upon its software driver call to request for
+ * MSI(X) mode enabled on its hardware device function. A return of zero
+ * indicates the successful setup of an entry zero with the new MSI(X)
+ * vector or non-zero for otherwise.  
+ **/
+int pci_enable_msi(struct pci_dev* dev)
+{
+	int status = -EINVAL;
+
+	if (!pci_msi_enable || !dev)
+ 		return status;	
+	
+	if (msi_init() < 0)
+		return -ENOMEM;
+		
+	if ((status = msix_capability_init(dev)) == -EINVAL)
+		status = msi_capability_init(dev);
+	if (!status)
+		nr_reserved_vectors--;
+
+	return status;
+}
+
+static int msi_free_vector(struct pci_dev* dev, int vector);
+static void pci_disable_msi(unsigned int vector)
+{
+	int head, tail, type, default_vector;
+	struct msi_desc *entry;
+	struct pci_dev *dev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[vector];
+	if (!entry || !entry->dev) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return;
+	}
+	dev = entry->dev;
+	type = entry->msi_attrib.type;
+	head = entry->link.head;
+	tail = entry->link.tail;
+	default_vector = entry->msi_attrib.default_vector;
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	disable_msi_mode(dev, pci_find_capability(dev, type), type);
+	/* Restore dev->irq to its default pin-assertion vector */
+	dev->irq = default_vector;
+	if (type == PCI_CAP_ID_MSIX && head != tail) {
+		/* Bad driver, which do not call msi_free_vectors before exit. 
+		   We must do a cleanup here */
+		while (1) {
+			spin_lock_irqsave(&msi_lock, flags);
+			entry = msi_desc[vector];
+			head = entry->link.head;
+			tail = entry->link.tail;
+			spin_unlock_irqrestore(&msi_lock, flags);
+			if (tail == head)
+				break;
+			if (msi_free_vector(dev, entry->link.tail))
+				break;
+		}
+	}
+}
+
+static int msi_alloc_vector(struct pci_dev* dev, int head)
+{
+	struct msi_desc *entry;
+	struct msg_address address; 
+	struct msg_data data;
+	int i, offset, pos, dev_msi_cap, vector;
+	u32 low_address, control;
+	unsigned long base = 0L;
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[dev->irq];
+	if (!entry) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EINVAL;
+	}
+	base = entry->mask_base; 
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos),	
+		2, &control);
+	dev_msi_cap = multi_msix_capable(control);
+	for (i = 1; i < dev_msi_cap; i++) {
+		if (!(low_address = readl(base + i * PCI_MSIX_ENTRY_SIZE)))
+			 break;
+	}
+	if (i >= dev_msi_cap) 
+		return -EINVAL;
+	
+	/* MSI Entry Initialization */
+	if (!(entry = alloc_msi_entry()))
+		return -ENOMEM;
+		
+	if ((vector = get_new_vector()) < 0) {
+		kmem_cache_free(msi_cachep, entry);
+		return vector;
+	}
+	entry->msi_attrib.type = PCI_CAP_ID_MSIX;
+	entry->msi_attrib.entry_nr = i;
+	entry->msi_attrib.maskbit = 1;
+	entry->dev = dev;
+	entry->link.head = head;
+	entry->mask_base = base; 
+	irq_handler_init(PCI_CAP_ID_MSIX, vector, 1);
+	/* Configure MSI-X capability structure */
+	msi_address_init(&address);
+	msi_data_init(&data, vector);
+	entry->msi_attrib.current_cpu = ((address.lo_address.u.dest_id >>
+				MSI_TARGET_CPU_SHIFT) & MSI_TARGET_CPU_MASK);
+	offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
+	writel(address.lo_address.value, base + offset + 
+		PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+	writel(address.hi_address, base + offset +
+		PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
+	writel(*(u32*)&data, base + offset + PCI_MSIX_ENTRY_DATA_OFFSET);
+	writel(1, base + offset + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
+	attach_msi_entry(entry, vector);
+	
+	return vector;
+}
+
+static int msi_free_vector(struct pci_dev* dev, int vector)
+{
+	struct msi_desc *entry;
+	int entry_nr, type;
+	unsigned long base = 0L;
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[vector];
+	if (!entry || entry->dev != dev) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EINVAL;
+	}
+	type = entry->msi_attrib.type;
+	entry_nr = entry->msi_attrib.entry_nr;
+	base = entry->mask_base;
+	if (entry->link.tail != entry->link.head) {
+		msi_desc[entry->link.head]->link.tail = entry->link.tail;
+		if (entry->link.tail) 
+			msi_desc[entry->link.tail]->link.head = entry->link.head; 
+	}
+	entry->dev = NULL;
+	vector_irq[vector] = 0;	
+	nr_released_vectors++;
+	msi_desc[vector] = NULL;
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	kmem_cache_free(msi_cachep, entry);
+	if (type == PCI_CAP_ID_MSIX) {
+		int offset;
+
+		offset = entry_nr * PCI_MSIX_ENTRY_SIZE;
+		writel(1, base + offset + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
+		writel(0, base + offset + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+	}
+	
+	return 0;
+}
+
+/** 
+ * msi_alloc_vectors - allocate additional MSI-X vectors
+ * @dev: pointer to the pci_dev data structure of MSI-X device function
+ * @vector: pointer to an array of new allocated MSI-X vectors
+ * @nvec: number of MSI-X vectors requested for allocation by device driver
+ *
+ * Allocate additional MSI-X vectors requested by device driver. A 
+ * return of zero indicates the successful setup of MSI-X capability
+ * structure with new allocated MSI-X vectors or non-zero for otherwise.  
+ **/
+int msi_alloc_vectors(struct pci_dev* dev, int *vector, int nvec)
+{
+	struct msi_desc *entry;
+	int i, head, pos, vec, free_vectors, alloc_vectors;
+	int *vectors = (int *)vector;
+	u32 control;
+	unsigned long flags;
+
+	if (!pci_msi_enable || !dev)
+ 		return -EINVAL;
+
+   	if (!(pos = pci_find_capability(dev, PCI_CAP_ID_MSIX)))
+ 		return -EINVAL;
+
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 			2, &control);
+	if (nvec > multi_msix_capable(control))
+		return -EINVAL;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[dev->irq];
+	if (!entry || entry->dev != dev ||		/* legal call */
+	   entry->msi_attrib.type != PCI_CAP_ID_MSIX || /* must be MSI-X */ 
+	   entry->link.head != entry->link.tail) {	/* already multi */ 
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EINVAL;
+	}
+	/*
+	 * msi_lock is provided to ensure that enough vectors resources are
+	 * available before granting.
+	 */
+	free_vectors = pci_vector_resources(); 
+	/* Ensure that each MSI/MSI-X device has one vector reserved by 
+	   default to avoid any MSI-X driver to take all available 
+ 	   resources */
+	free_vectors -= nr_reserved_vectors;
+	/* Find the average of free vectors among MSI-X devices */
+	if (nr_msix_devices > 0)
+		free_vectors /= nr_msix_devices;
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	if (nvec > free_vectors) 
+		return -EBUSY;
+
+	alloc_vectors = 0;
+	head = dev->irq;
+	for (i = 0; i < nvec; i++) {
+		if ((vec = msi_alloc_vector(dev, head)) < 0) 
+			break;
+		*(vectors + i) = vec;
+		head = vec;
+		alloc_vectors++;
+	}
+	if (alloc_vectors != nvec) {
+		for (i = 0; i < alloc_vectors; i++) {
+			vec = *(vectors + i);
+			msi_free_vector(dev, vec);
+		}
+		spin_lock_irqsave(&msi_lock, flags);
+		msi_desc[dev->irq]->link.tail = msi_desc[dev->irq]->link.head;
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EBUSY;
+	}
+	if (nr_msix_devices > 0)
+		nr_msix_devices--;
+
+	return 0;
+}
+
+/** 
+ * msi_free_vectors - reclaim MSI-X vectors to unused state
+ * @dev: pointer to the pci_dev data structure of MSI-X device function
+ * @vector: pointer to an array of released MSI-X vectors
+ * @nvec: number of MSI-X vectors requested for release by device driver
+ *
+ * Reclaim MSI-X vectors released by device driver to unused state, 
+ * which may be used later on. A return of zero indicates the 
+ * success or non-zero for otherwise. Device driver should call this
+ * before calling function free_irq.  
+ **/
+int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec)
+{
+	struct msi_desc *entry;
+	int i;
+	unsigned long flags;
+
+	if (!pci_msi_enable)
+ 		return -EINVAL;
+	
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[dev->irq];
+	if (!entry || entry->dev != dev || 
+	   	entry->msi_attrib.type != PCI_CAP_ID_MSIX || 
+		entry->link.head == entry->link.tail) {	/* Nothing to free */
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EINVAL;
+	}
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	for (i = 0; i < nvec; i++) {
+		if (*(vector + i) == dev->irq)
+			continue;/* Don't free entry 0 if mistaken by driver */
+		msi_free_vector(dev, *(vector + i)); 
+	}
+
+	return 0;
+}
+
+/**
+ * msi_remove_pci_irq_vectors - reclaim MSI(X) vectors to unused state
+ * @dev: pointer to the pci_dev data structure of MSI(X) device function
+ * 
+ * Being called during hotplug remove, from which the device funciton
+ * is hot-removed. All previous assigned MSI/MSI-X vectors, if 
+ * allocated for this device function, are reclaimed to unused state,
+ * which may be used later on.
+ **/ 
+void msi_remove_pci_irq_vectors(struct pci_dev* dev)
+{
+	struct msi_desc *entry;
+	int type;
+	unsigned long flags;
+
+	if (!pci_msi_enable)
+ 		return;
+	
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[dev->irq];
+	if (!entry || entry->dev != dev) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return;
+	}
+	type = entry->msi_attrib.type;
+	spin_unlock_irqrestore(&msi_lock, flags);
+		
+	msi_free_vector(dev, dev->irq);
+	if (type == PCI_CAP_ID_MSIX) {
+		int i, pos, dev_msi_cap;
+		u32 phys_addr, table_offset;
+		u32 control;
+		u8 bir;
+
+   		pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+		dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 			2, &control);
+		dev_msi_cap = multi_msix_capable(control);
+		dev->bus->ops->read(dev->bus, dev->devfn, 
+			msix_table_offset_reg(pos), 4, &table_offset);
+		bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
+		phys_addr = pci_resource_start (dev, bir);
+		phys_addr += (u32)(table_offset & ~PCI_MSIX_FLAGS_BIRMASK);
+		for (i = FIRST_DEVICE_VECTOR; i < NR_IRQS; i++) {
+			spin_lock_irqsave(&msi_lock, flags);
+			if (!msi_desc[i] || msi_desc[i]->dev != dev) {
+				spin_unlock_irqrestore(&msi_lock, flags);
+				continue;
+			}
+			spin_unlock_irqrestore(&msi_lock, flags);
+			msi_free_vector(dev, i);
+		}
+		writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
+		iounmap((void*)entry->mask_base);
+		release_mem_region(phys_addr, dev_msi_cap * PCI_MSIX_ENTRY_SIZE);
+	} 
+	nr_reserved_vectors++;
+}
+
+EXPORT_SYMBOL(pci_enable_msi);
+EXPORT_SYMBOL(msi_alloc_vectors);
+EXPORT_SYMBOL(msi_free_vectors);
diff -urN linux-260t6-vectorbase-testbox/drivers/pci/probe.c linux-260t6-msi-testbox/drivers/pci/probe.c
--- linux-260t6-vectorbase-testbox/drivers/pci/probe.c	2003-09-27 20:50:38.000000000 -0400
+++ linux-260t6-msi-testbox/drivers/pci/probe.c	2003-09-30 19:09:30.000000000 -0400
@@ -552,6 +552,7 @@
 		struct pci_dev *dev;
 
 		dev = pci_scan_device(bus, devfn);
+		pci_scan_msi_device(dev);
 		if (func == 0) {
 			if (!dev)
 				break;
diff -urN linux-260t6-vectorbase-testbox/drivers/pci/remove.c linux-260t6-msi-testbox/drivers/pci/remove.c
--- linux-260t6-vectorbase-testbox/drivers/pci/remove.c	2003-09-27 20:50:10.000000000 -0400
+++ linux-260t6-msi-testbox/drivers/pci/remove.c	2003-09-30 19:09:30.000000000 -0400
@@ -14,6 +14,8 @@
 {
 	int i;
 
+ 	msi_remove_pci_irq_vectors(dev);
+ 
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *res = dev->resource + i;
 		if (res->parent)
diff -urN linux-260t6-vectorbase-testbox/include/linux/pci.h linux-260t6-msi-testbox/include/linux/pci.h
--- linux-260t6-vectorbase-testbox/include/linux/pci.h	2003-09-27 20:50:20.000000000 -0400
+++ linux-260t6-msi-testbox/include/linux/pci.h	2003-09-30 19:09:30.000000000 -0400
@@ -36,6 +36,7 @@
 #define  PCI_COMMAND_WAIT 	0x80	/* Enable address/data stepping */
 #define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
 #define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
+#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
 
 #define PCI_STATUS		0x06	/* 16 bits */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
@@ -198,6 +199,8 @@
 #define  PCI_CAP_ID_MSI		0x05	/* Message Signalled Interrupts */
 #define  PCI_CAP_ID_CHSWP	0x06	/* CompactPCI HotSwap */
 #define  PCI_CAP_ID_PCIX	0x07	/* PCI-X */
+#define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
+#define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
 #define PCI_CAP_FLAGS		2	/* Capability defined flags (16 bits) */
 #define PCI_CAP_SIZEOF		4
@@ -275,11 +278,13 @@
 #define  PCI_MSI_FLAGS_QSIZE	0x70	/* Message queue size configured */
 #define  PCI_MSI_FLAGS_QMASK	0x0e	/* Maximum queue size available */
 #define  PCI_MSI_FLAGS_ENABLE	0x01	/* MSI feature enabled */
+#define  PCI_MSI_FLAGS_MASKBIT	0x100	/* 64-bit mask bits allowed */
 #define PCI_MSI_RFU		3	/* Rest of capability flags */
 #define PCI_MSI_ADDRESS_LO	4	/* Lower 32 bits */
 #define PCI_MSI_ADDRESS_HI	8	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
 #define PCI_MSI_DATA_32		8	/* 16 bits of data for 32-bit devices */
 #define PCI_MSI_DATA_64		12	/* 16 bits of data for 64-bit devices */
+#define PCI_MSI_MASK_BIT	16	/* Mask bits register */
 
 /* CompactPCI Hotswap Register */
 
@@ -696,6 +701,18 @@
 extern struct pci_dev *isa_bridge;
 #endif
 
+#ifndef CONFIG_PCI_USE_VECTOR
+static inline void pci_scan_msi_device(struct pci_dev *dev) {}
+static inline int pci_enable_msi(struct pci_dev *dev) {return -1;}
+static inline void msi_remove_pci_irq_vectors(struct pci_dev *dev) {}
+#else
+extern void pci_scan_msi_device(struct pci_dev *dev);
+extern int pci_enable_msi(struct pci_dev *dev);
+extern void msi_remove_pci_irq_vectors(struct pci_dev *dev);
+extern int msi_alloc_vectors(struct pci_dev* dev, int *vector, int nvec);
+extern int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec);
+#endif
+
 #endif /* CONFIG_PCI */
 
 /* Include architecture-dependent settings and functions */
diff -urN linux-260t6-vectorbase-testbox/include/linux/pci_msi.h linux-260t6-msi-testbox/include/linux/pci_msi.h
--- linux-260t6-vectorbase-testbox/include/linux/pci_msi.h	1969-12-31 19:00:00.000000000 -0500
+++ linux-260t6-msi-testbox/include/linux/pci_msi.h	2003-10-02 12:00:00.498120704 -0400
@@ -0,0 +1,193 @@
+/*
+ *	../include/linux/pci_msi.h
+ *
+ */
+
+#ifndef _ASM_PCI_MSI_H
+#define _ASM_PCI_MSI_H
+
+#include <linux/pci.h>
+
+#define MSI_AUTO -1
+#define NR_REPEATS	23
+#define NR_RESERVED_VECTORS 3 /*FIRST_DEVICE_VECTOR,FIRST_SYSTEM_VECTOR,0x80 */	
+
+/* 
+ * Assume the maximum number of hot plug slots supported by the system is about
+ * ten. The worstcase is that each of these slots is hot-added with a device, 
+ * which has two MSI/MSI-X capable functions. To avoid any MSI-X driver, which
+ * attempts to request all available vectors, NR_HP_RESERVED_VECTORS is defined
+ * as below to ensure at least one message is assigned to each detected MSI/
+ * MSI-X device function.
+ */
+#define NR_HP_RESERVED_VECTORS 	20
+
+extern int vector_irq[NR_IRQS];
+extern cpumask_t pending_irq_balance_cpumask[NR_IRQS];
+extern void (*interrupt[NR_IRQS])(void);
+
+#ifdef CONFIG_SMP
+#define set_msi_irq_affinity	set_msi_affinity
+#else
+#define set_msi_irq_affinity	NULL
+static inline void move_msi(int vector) {}
+#endif
+
+#ifndef CONFIG_X86_IO_APIC
+static inline int get_ioapic_vector(struct pci_dev *dev) { return -1;}
+static inline void restore_ioapic_irq_handler(int irq) {}
+#else
+extern void restore_ioapic_irq_handler(int irq);
+#endif
+
+/*
+ * MSI-X Address Register
+ */
+#define PCI_MSIX_FLAGS_QSIZE		0x7FF
+#define PCI_MSIX_FLAGS_ENABLE		(1 << 15)
+#define PCI_MSIX_FLAGS_BIRMASK		(7 << 0)
+#define PCI_MSIX_FLAGS_BITMASK		(1 << 0)
+
+#define PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET	0
+#define PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET	4
+#define PCI_MSIX_ENTRY_DATA_OFFSET		8
+#define PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET	12
+#define PCI_MSIX_ENTRY_SIZE			16
+
+#define msi_control_reg(base)		(base + PCI_MSI_FLAGS)
+#define msi_lower_address_reg(base)	(base + PCI_MSI_ADDRESS_LO)
+#define msi_upper_address_reg(base)	(base + PCI_MSI_ADDRESS_HI)
+#define msi_data_reg(base, is64bit)	\
+	( (is64bit == 1) ? base+PCI_MSI_DATA_64 : base+PCI_MSI_DATA_32 )
+#define msi_mask_bits_reg(base, is64bit) \
+	( (is64bit == 1) ? base+PCI_MSI_MASK_BIT : base+PCI_MSI_MASK_BIT-4)
+#define msi_disable(control)		control &= ~PCI_MSI_FLAGS_ENABLE
+#define multi_msi_capable(control) \
+	(1 << ((control & PCI_MSI_FLAGS_QMASK) >> 1))
+#define multi_msi_enable(control, num) \
+	control |= (((num >> 1) << 4) & PCI_MSI_FLAGS_QSIZE);
+#define is_64bit_address(control)	(control & PCI_MSI_FLAGS_64BIT)	
+#define is_mask_bit_support(control)	(control & PCI_MSI_FLAGS_MASKBIT)
+#define msi_enable(control, num) multi_msi_enable(control, num); \
+	control |= PCI_MSI_FLAGS_ENABLE
+ 
+#define msix_control_reg		msi_control_reg
+#define msix_table_offset_reg(base)	(base + 0x04)
+#define msix_pba_offset_reg(base)	(base + 0x08)
+#define msix_enable(control)	 	control |= PCI_MSIX_FLAGS_ENABLE
+#define msix_disable(control)	 	control &= ~PCI_MSIX_FLAGS_ENABLE
+#define msix_table_size(control) 	((control & PCI_MSIX_FLAGS_QSIZE)+1)
+#define multi_msix_capable		msix_table_size
+#define msix_unmask(address)	 	(address & ~PCI_MSIX_FLAGS_BITMASK)
+#define msix_mask(address)		(address | PCI_MSIX_FLAGS_BITMASK)  
+#define msix_is_pending(address) 	(address & PCI_MSIX_FLAGS_PENDMASK)
+
+extern char __dbg_str_buf[256];
+#define _DEFINE_DBG_BUFFER	char __dbg_str_buf[256];
+#define _DBG_K_TRACE_ENTRY	((unsigned int)0x00000001)
+#define _DBG_K_TRACE_EXIT	((unsigned int)0x00000002)
+#define _DBG_K_INFO		((unsigned int)0x00000004)
+#define _DBG_K_ERROR		((unsigned int)0x00000008)
+#define _DBG_K_TRACE	(_DBG_K_TRACE_ENTRY | _DBG_K_TRACE_EXIT)
+
+#define _DEBUG_LEVEL	(_DBG_K_INFO | _DBG_K_ERROR | _DBG_K_TRACE)
+#define _DBG_PRINT( dbg_flags, args... )		\
+if ( _DEBUG_LEVEL & (dbg_flags) )			\
+{							\
+	int len;					\
+	len = sprintf(__dbg_str_buf, "%s:%d: %s ", 	\
+		__FILE__, __LINE__, __FUNCTION__ ); 	\
+	sprintf(__dbg_str_buf + len, args);		\
+	printk(KERN_INFO "%s\n", __dbg_str_buf);	\
+}
+
+#define MSI_FUNCTION_TRACE_ENTER	\
+	_DBG_PRINT (_DBG_K_TRACE_ENTRY, "%s", "[Entry]");
+#define MSI_FUNCTION_TRACE_EXIT		\
+	_DBG_PRINT (_DBG_K_TRACE_EXIT, "%s", "[Entry]");
+	
+/*
+ * MSI Defined Data Structures
+ */
+#define MSI_ADDRESS_HEADER		0xfee
+#define MSI_ADDRESS_HEADER_SHIFT	12
+#define MSI_ADDRESS_HEADER_MASK		0xfff000
+#define MSI_TARGET_CPU_SHIFT		4
+#define MSI_TARGET_CPU_MASK		0xff
+#define MSI_DELIVERY_MODE		0
+#define MSI_LEVEL_MODE			1	/* Edge always assert */
+#define MSI_TRIGGER_MODE		0	/* MSI is edge sensitive */
+#define MSI_LOGICAL_MODE		1
+#define MSI_REDIRECTION_HINT_MODE	0
+#ifdef CONFIG_SMP
+#define MSI_TARGET_CPU			logical_smp_processor_id()
+#else
+#define MSI_TARGET_CPU			TARGET_CPUS
+#endif
+
+struct msg_data {
+#if defined(__LITTLE_ENDIAN_BITFIELD)	
+	__u32	vector		:  8;		
+	__u32	delivery_mode	:  3;	/* 000b: FIXED | 001b: lowest prior */
+	__u32	reserved_1	:  3;
+	__u32	level		:  1;	/* 0: deassert | 1: assert */	
+	__u32	trigger		:  1;	/* 0: edge | 1: level */	
+	__u32	reserved_2	: 16;
+#elif defined(__BIG_ENDIAN_BITFIELD)	
+	__u32	reserved_2	: 16;
+	__u32	trigger		:  1;	/* 0: edge | 1: level */	
+	__u32	level		:  1;	/* 0: deassert | 1: assert */	
+	__u32	reserved_1	:  3;
+	__u32	delivery_mode	:  3;	/* 000b: FIXED | 001b: lowest prior */
+	__u32	vector		:  8;		
+#else
+#error "Bitfield endianness not defined! Check your byteorder.h"
+#endif
+} __attribute__ ((packed));
+
+struct msg_address {
+	union {
+		struct { 
+#if defined(__LITTLE_ENDIAN_BITFIELD)	
+			__u32	reserved_1	:  2;		
+			__u32	dest_mode	:  1;	/*0:physic | 1:logic */
+			__u32	redirection_hint:  1;  	/*0: dedicated CPU 
+							  1: lowest priority */
+			__u32	reserved_2	:  4;   	
+ 			__u32	dest_id		: 24;	/* Destination ID */
+#elif defined(__BIG_ENDIAN_BITFIELD)	
+ 			__u32	dest_id		: 24;	/* Destination ID */
+			__u32	reserved_2	:  4;   	
+			__u32	redirection_hint:  1;  	/*0: dedicated CPU 
+							  1: lowest priority */
+			__u32	dest_mode	:  1;	/*0:physic | 1:logic */
+			__u32	reserved_1	:  2;		
+#else
+#error "Bitfield endianness not defined! Check your byteorder.h"
+#endif
+      		}u;
+       		__u32  value;
+	}lo_address;
+	__u32 	hi_address;
+} __attribute__ ((packed));
+
+struct msi_desc {
+	struct { 	
+		__u8	type	: 5; 	/* {0: unused, 5h:MSI, 11h:MSI-X} */
+		__u8	maskbit	: 1; 	/* mask-pending bit supported ?   */
+		__u8	reserved: 2; 	/* reserved			  */
+		__u8	entry_nr;    	/* specific enabled entry 	  */
+		__u8	default_vector; /* default pre-assigned vector    */ 
+		__u8	current_cpu; 	/* current destination cpu	  */
+	}msi_attrib;
+
+	struct {
+		__u16	head; 
+		__u16	tail; 
+	}link;
+
+	unsigned long mask_base;
+	struct pci_dev *dev;
+};
+
+#endif /* _ASM_PCI_MSI_H */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-10-01 18:26 Nguyen, Tom L
  0 siblings, 0 replies; 35+ messages in thread
From: Nguyen, Tom L @ 2003-10-01 18:26 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Nakajima, Jun, linux-kernel, linux-pci, Nguyen, Tom L

On Wednesday, October 01, 2003, Jeff Garzik wrote:
>> +Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
>> +AMD processors)? In P3 IPI's are transmitted on the APIC local
>> +bus and in P4 and Xeon they are transmitted on the system
>> +bus. Are there any implications with this?
>> +
>> +A2. MSI support enables a PCI device sending an inbound
>> +memory write (0xfeexxxxx as target address) on its PCI bus
>> +directly to the FSB. Since the message address has a
>> +redirection hint bit cleared, it should work.
>> +
>> +Q3. The target address 0xfeexxxxx will be translated by the
>> +Host Bridge into an interrupt message. Are there any
>> +limitations on the chipsets such as Intel 8xx, Intel e7xxx,
>> +or VIA?
>> +
>> +A3. If these chipsets support an inbound memory write with
>> +target address set as 0xfeexxxxx, as conformed to PCI 
>> +specification 2.3 or latest, then it should work.

>I would prefer a section in the documentation that spells out, in plain 
>English, exactly what hardware support is needed for MSI to work.  i.e.

>"APIC required and must be enabled"
>"Northbridge must support MSI transactions"
>etc.

>Using that information, users and developers may know _exactly_ whether 
>they can use MSI or not.  From "Q2/A2" and "Q3/A3" above, it is not 
>clear to me.
Agree. Thanks!

>> +static struct hw_interrupt_type msix_irq_type = {
>> +	"PCI MSI-X",
>> +	startup_msi_irq_w_maskbit,
>> +	shutdown_msi_irq_w_maskbit,
>> +	enable_msi_irq_w_maskbit,
>> +	disable_msi_irq_w_maskbit,
>> +	act_msi_irq_w_maskbit,		
>> +	end_msi_irq_w_maskbit,	
>> +	set_msi_irq_affinity
>> +};
>> +
>> +/*
>> + * Interrupt Type for MSI PCI/PCI-X/PCI-Express Devices,
>> + * which implement the MSI Capability Structure with 
>> + * Mask-and-Pending Bits. 
>> + */
>> +static struct hw_interrupt_type msi_irq_w_maskbit_type = {
>> +	"PCI MSI",
>> +	startup_msi_irq_w_maskbit,
>> +	shutdown_msi_irq_w_maskbit,
>> +	enable_msi_irq_w_maskbit,
>> +	disable_msi_irq_w_maskbit,
>> +	act_msi_irq_w_maskbit,		
>> +	end_msi_irq_w_maskbit,	
>> +	set_msi_irq_affinity
>> +};
>> +
>> +/*
>> + * Interrupt Type for MSI PCI/PCI-X/PCI-Express Devices,
>> + * which implement the MSI Capability Structure without 
>> + * Mask-and-Pending Bits. 
>> + */
>> +static struct hw_interrupt_type msi_irq_wo_maskbit_type = {
>> +	"PCI MSI",
>> +	startup_msi_irq_wo_maskbit,
>> +	shutdown_msi_irq_wo_maskbit,
>> +	enable_msi_irq_wo_maskbit,
>> +	disable_msi_irq_wo_maskbit,
>> +	act_msi_irq_wo_maskbit,		
>> +	end_msi_irq_wo_maskbit,	
>> +	set_msi_irq_affinity
>> +};

>Suggest converting these structure initializers to C99 style:
>
>	.member_name		= value,

Agree. Thanks!

>> +/**
>> + * msix_capability_init: configure device's MSI-X capability structure
>> + * argument: dev of struct pci_dev type
>> + * 
>> + * description: called to configure the device's MSI-X capability structure 
>> + * with a single MSI. To request for additional MSI vectors, the device drivers
>> + * are required to utilize the following supported APIs:
>> + * 1) msi_alloc_vectors(...) for requesting one or more MSI
>> + * 2) msi_free_vectors(...) for releasing one or more MSI back to PCI subsystem
>> + *

>When added header comments, please use the official kernel style, so 
>that it may be picked up automatically by the kernel documentation 
>processing system (Documentation/DocBook/*).  Specifically,
>
>	/**
>
>begins a header (you did this),
>
>	*	my_function - description
>
>begins a function header (you need to use " - "),
>
>	*	@arg1: description...
>	*	@arg2: description...
>
>describes the arguments.

>The rest is fine.  Note that lines ending with a colon ("blah blah:\n") 
>are bolded, and considered paragraph leaders.

Agree. Thanks for this input!

>> @@ -564,6 +565,9 @@
>>  		/* Fix up broken headers */
>>  		pci_fixup_device(PCI_FIXUP_HEADER, dev);
>>  
>> +		/* Fix up broken MSI hardware */
>> +		pci_fixup_device(PCI_FIXUP_MSI, dev);
>> +
>>  		/*
>>  		 * Add the device to our list of discovered devices
>>  		 * and the bus list for fixup functions, etc.

>Why does MSI need a separate list of fixups?
You are right. Sorry, I forgot to remove these lines after testing 
first alternative "quirk" solution. Will remove these lines.

>> +struct msg_data {	
>> +	__u32	vector	:  8,		
>> +	delivery_mode	:  3,	/*	000b: FIXED
>> +			 		001b: lowest priority
>> +			 	 	111b: ExtINT */	
>> +	reserved_1	:  3,
>> +	level		:  1,	/* 0: deassert, 1: assert */	
>> +	trigger		:  1,	/* 0: edge, 1: level */	
>> +	reserved_2	: 16;
>> +} __attribute__ ((packed));
>> +
>> +struct msg_address {
>> +	union {
>> +		struct { __u32
>> +			reserved_1		:  2,		
>> +			dest_mode		:  1,	/* 0: physical, 
>> +							   1: logical */	
>> +			redirection_hint        :  1,  	/* 0: dedicated CPU 
>> +							   1: lowest priority */
>> +			reserved_2		:  8,   	
>> + 			dest_id			:  8,	/* Destination ID */	
>> +			header			: 12;	/* FEEH */
>> +      	}u;
>> +       	__u32  value;
>> +	}lo_address;
>> +	__u32 	hi_address;
>> +} __attribute__ ((packed));
>> +
>> +struct msi_desc {
>> +	struct {
>> +		__u32	type	: 5, /* {0: unused, 5h:MSI, 11h:MSI-X} 	  */
>> +			maskbit	: 1, /* mask-pending bit supported ?      */
>> +			reserved: 2, /* reserved			  */
>> +			entry_nr: 8, /* specific enabled entry 		  */
>> +			default_vector: 8, /* default pre-assigned vector */ 
>> +			current_cpu: 8; /* current destination cpu	  */
>> +	}msi_attrib;
>> +
>> +	struct {
>> +		__u32	head	: 16, 
>> +			tail	: 16; 
>> +	}link;
>> +
>> +	unsigned long mask_base;
>> +	struct pci_dev *dev;
>> +};

>These bitfields should be defined in big-endian and little-endian forms, 
>shouldn't they?  grep for __LITTLE_ENDIAN_BITFIELD and 
>__BIG_ENDIAN_BITFIELD in include/*/*.h.

Thanks for your input. We'll make change appropriately.

Thanks,
Long

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-10-01 15:37 long
@ 2003-10-01 17:09 ` Jeff Garzik
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2003-10-01 17:09 UTC (permalink / raw)
  To: long; +Cc: jun.nakajima, linux-kernel, linux-pci, tom.l.nguyen

long wrote:
> Here is an updated version of the MSI patch based on kernel linux-2.6.0-test5.
> Thanks for all the input received from the community, below patches include
> the following changes from the previous patches, which are based on kernel
> linux-2.6.0-test2:

Lookin' real good to me.


> +Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
> +AMD processors)? In P3 IPI's are transmitted on the APIC local
> +bus and in P4 and Xeon they are transmitted on the system
> +bus. Are there any implications with this?
> +
> +A2. MSI support enables a PCI device sending an inbound
> +memory write (0xfeexxxxx as target address) on its PCI bus
> +directly to the FSB. Since the message address has a
> +redirection hint bit cleared, it should work.
> +
> +Q3. The target address 0xfeexxxxx will be translated by the
> +Host Bridge into an interrupt message. Are there any
> +limitations on the chipsets such as Intel 8xx, Intel e7xxx,
> +or VIA?
> +
> +A3. If these chipsets support an inbound memory write with
> +target address set as 0xfeexxxxx, as conformed to PCI 
> +specification 2.3 or latest, then it should work.

I would prefer a section in the documentation that spells out, in plain 
English, exactly what hardware support is needed for MSI to work.  i.e.

"APIC required and must be enabled"
"Northbridge must support MSI transactions"
etc.

Using that information, users and developers may know _exactly_ whether 
they can use MSI or not.  From "Q2/A2" and "Q3/A3" above, it is not 
clear to me.


> +static struct hw_interrupt_type msix_irq_type = {
> +	"PCI MSI-X",
> +	startup_msi_irq_w_maskbit,
> +	shutdown_msi_irq_w_maskbit,
> +	enable_msi_irq_w_maskbit,
> +	disable_msi_irq_w_maskbit,
> +	act_msi_irq_w_maskbit,		
> +	end_msi_irq_w_maskbit,	
> +	set_msi_irq_affinity
> +};
> +
> +/*
> + * Interrupt Type for MSI PCI/PCI-X/PCI-Express Devices,
> + * which implement the MSI Capability Structure with 
> + * Mask-and-Pending Bits. 
> + */
> +static struct hw_interrupt_type msi_irq_w_maskbit_type = {
> +	"PCI MSI",
> +	startup_msi_irq_w_maskbit,
> +	shutdown_msi_irq_w_maskbit,
> +	enable_msi_irq_w_maskbit,
> +	disable_msi_irq_w_maskbit,
> +	act_msi_irq_w_maskbit,		
> +	end_msi_irq_w_maskbit,	
> +	set_msi_irq_affinity
> +};
> +
> +/*
> + * Interrupt Type for MSI PCI/PCI-X/PCI-Express Devices,
> + * which implement the MSI Capability Structure without 
> + * Mask-and-Pending Bits. 
> + */
> +static struct hw_interrupt_type msi_irq_wo_maskbit_type = {
> +	"PCI MSI",
> +	startup_msi_irq_wo_maskbit,
> +	shutdown_msi_irq_wo_maskbit,
> +	enable_msi_irq_wo_maskbit,
> +	disable_msi_irq_wo_maskbit,
> +	act_msi_irq_wo_maskbit,		
> +	end_msi_irq_wo_maskbit,	
> +	set_msi_irq_affinity
> +};

Suggest converting these structure initializers to C99 style:

	.member_name		= value,



> +/**
> + * msix_capability_init: configure device's MSI-X capability structure
> + * argument: dev of struct pci_dev type
> + * 
> + * description: called to configure the device's MSI-X capability structure 
> + * with a single MSI. To request for additional MSI vectors, the device drivers
> + * are required to utilize the following supported APIs:
> + * 1) msi_alloc_vectors(...) for requesting one or more MSI
> + * 2) msi_free_vectors(...) for releasing one or more MSI back to PCI subsystem
> + *

When added header comments, please use the official kernel style, so 
that it may be picked up automatically by the kernel documentation 
processing system (Documentation/DocBook/*).  Specifically,

	/**

begins a header (you did this),

	*	my_function - description

begins a function header (you need to use " - "),

	*	@arg1: description...
	*	@arg2: description...

describes the arguments.

The rest is fine.  Note that lines ending with a colon ("blah blah:\n") 
are bolded, and considered paragraph leaders.


> diff -urN linux-260t5-vectorbase-testbox/drivers/pci/probe.c linux-260t5-msi-testbox/drivers/pci/probe.c
> --- linux-260t5-vectorbase-testbox/drivers/pci/probe.c	2003-09-08 15:50:17.000000000 -0400
> +++ linux-260t5-msi-testbox/drivers/pci/probe.c	2003-09-22 12:08:56.000000000 -0400
> @@ -552,6 +552,7 @@
>  		struct pci_dev *dev;
>  
>  		dev = pci_scan_device(bus, devfn);
> +		pci_scan_msi_device(dev);
>  		if (func == 0) {
>  			if (!dev)
>  				break;
> @@ -564,6 +565,9 @@
>  		/* Fix up broken headers */
>  		pci_fixup_device(PCI_FIXUP_HEADER, dev);
>  
> +		/* Fix up broken MSI hardware */
> +		pci_fixup_device(PCI_FIXUP_MSI, dev);
> +
>  		/*
>  		 * Add the device to our list of discovered devices
>  		 * and the bus list for fixup functions, etc.

Why does MSI need a separate list of fixups?

AFAICS adding broken MSI devices to either the PCI_FIXUP_HEADER or 
PCI_FIXUP_FINAL lists is entirely appropriate.


> +struct msg_data {	
> +	__u32	vector	:  8,		
> +	delivery_mode	:  3,	/*	000b: FIXED
> +			 		001b: lowest priority
> +			 	 	111b: ExtINT */	
> +	reserved_1	:  3,
> +	level		:  1,	/* 0: deassert, 1: assert */	
> +	trigger		:  1,	/* 0: edge, 1: level */	
> +	reserved_2	: 16;
> +} __attribute__ ((packed));
> +
> +struct msg_address {
> +	union {
> +		struct { __u32
> +			reserved_1		:  2,		
> +			dest_mode		:  1,	/* 0: physical, 
> +							   1: logical */	
> +			redirection_hint        :  1,  	/* 0: dedicated CPU 
> +							   1: lowest priority */
> +			reserved_2		:  8,   	
> + 			dest_id			:  8,	/* Destination ID */	
> +			header			: 12;	/* FEEH */
> +      	}u;
> +       	__u32  value;
> +	}lo_address;
> +	__u32 	hi_address;
> +} __attribute__ ((packed));
> +
> +struct msi_desc {
> +	struct {
> +		__u32	type	: 5, /* {0: unused, 5h:MSI, 11h:MSI-X} 	  */
> +			maskbit	: 1, /* mask-pending bit supported ?      */
> +			reserved: 2, /* reserved			  */
> +			entry_nr: 8, /* specific enabled entry 		  */
> +			default_vector: 8, /* default pre-assigned vector */ 
> +			current_cpu: 8; /* current destination cpu	  */
> +	}msi_attrib;
> +
> +	struct {
> +		__u32	head	: 16, 
> +			tail	: 16; 
> +	}link;
> +
> +	unsigned long mask_base;
> +	struct pci_dev *dev;
> +};

These bitfields should be defined in big-endian and little-endian forms, 
shouldn't they?  grep for __LITTLE_ENDIAN_BITFIELD and 
__BIG_ENDIAN_BITFIELD in include/*/*.h.

	Jeff




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Updated MSI Patches
@ 2003-10-01 15:37 long
  2003-10-01 17:09 ` Jeff Garzik
  0 siblings, 1 reply; 35+ messages in thread
From: long @ 2003-10-01 15:37 UTC (permalink / raw)
  To: jun.nakajima, linux-kernel, linux-pci, tom.l.nguyen

Hi,

Here is an updated version of the MSI patch based on kernel linux-2.6.0-test5.
Thanks for all the input received from the community, below patches include
the following changes from the previous patches, which are based on kernel
linux-2.6.0-test2:

- Reconstructing a vector-base patch.
- Adding a new API "int pci_enable_msi(struct pci_dev *dev)" based on the
input from the community to handle buggy PCI/PCI-X hardware when MSI is 
enabled. Any MSI/MSI-X capable device driver, which wants to have MSI enabled
on its hardware device, must call this API to request for MSI mode.
- Adding a set_msi_affinity feature.
- Adding a feature, which dynamically detects any MSI/MSI-X capable devices 
and establishes a reserved policy to ensure that no MSI-X device driver can
claim all available vectors.
- Resolving issue with kernel build when configuring system in UP mode without
setting CONFIG_X86_LOCAL_APIC to 'Y'. This solution provides MSI support
independent from IOxAPICs (IOxAPIC free).
- Renaming pci_msi.c to msi.c and moving this file under ../drivers/pci to 
have code share as much as possible across all Linux architectures.
- Replacing all direct memory access with readl/writel.
- Using kmem_cache instead of kmalloc to allocate msi_desc entries.
- Putting the EXPORT_SYMBOL in the file msi.c
- Removing boot parameters "pci_nomsi", "device_msi=", and "device_nomsi=".

Thanks,
Long

----------------------------------------------------------------------------
diff -urN linux-2.6.0-test5/arch/i386/Kconfig linux-260t5-vectorbase-testbox/arch/i386/Kconfig
--- linux-2.6.0-test5/arch/i386/Kconfig	2003-09-08 15:49:53.000000000 -0400
+++ linux-260t5-vectorbase-testbox/arch/i386/Kconfig	2003-09-24 08:11:43.000000000 -0400
@@ -1086,6 +1086,17 @@
  	depends on PCI && ((PCI_GODIRECT || PCI_GOANY) || X86_VISWS)
 	default y
 
+config PCI_USE_VECTOR
+	bool "PCI_USE_VECTOR"
+	default n
+	help
+	   This replaces the current existing IRQ-based index interrupt scheme
+	   with the vector-base index scheme. The advantages of vector base over	   IRQ base are listed below:
+	   1) Support MSI implementation.
+	   2) Support future IOxAPIC hotplug
+
+	   If you don't know what to do here, say N.
+
 source "drivers/pci/Kconfig"
 
 config ISA
diff -urN linux-2.6.0-test5/arch/i386/kernel/i8259.c linux-260t5-vectorbase-testbox/arch/i386/kernel/i8259.c
--- linux-2.6.0-test5/arch/i386/kernel/i8259.c	2003-09-08 15:50:28.000000000 -0400
+++ linux-260t5-vectorbase-testbox/arch/i386/kernel/i8259.c	2003-09-24 08:11:43.000000000 -0400
@@ -419,8 +419,10 @@
 	 * us. (some of these will be overridden and become
 	 * 'special' SMP interrupts)
 	 */
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
 		int vector = FIRST_EXTERNAL_VECTOR + i;
+		if (i >= NR_IRQS)
+			break;
 		if (vector != SYSCALL_VECTOR) 
 			set_intr_gate(vector, interrupt[i]);
 	}
diff -urN linux-2.6.0-test5/arch/i386/kernel/io_apic.c linux-260t5-vectorbase-testbox/arch/i386/kernel/io_apic.c
--- linux-2.6.0-test5/arch/i386/kernel/io_apic.c	2003-09-08 15:50:01.000000000 -0400
+++ linux-260t5-vectorbase-testbox/arch/i386/kernel/io_apic.c	2003-09-25 07:29:13.000000000 -0400
@@ -76,6 +76,14 @@
 	int apic, pin, next;
 } irq_2_pin[PIN_MAP_SIZE];
 
+#ifdef CONFIG_PCI_USE_VECTOR
+int vector_irq[NR_IRQS] = { [0 ... NR_IRQS -1] = -1};
+#define vector_to_irq(vector) 	\
+	(platform_legacy_irq(vector) ? vector : vector_irq[vector])
+#else
+#define vector_to_irq(vector)	(vector)
+#endif
+
 /*
  * The common case is 1:1 IRQ<->pin mappings. Sometimes there are
  * shared ISA-space IRQs, so we have to support them. We are super
@@ -249,7 +257,7 @@
 			clear_IO_APIC_pin(apic, pin);
 }
 
-static void set_ioapic_affinity(unsigned int irq, cpumask_t cpumask)
+static void set_ioapic_affinity_irq(unsigned int irq, cpumask_t cpumask)
 {
 	unsigned long flags;
 	int pin;
@@ -670,13 +678,13 @@
 
 __setup("noirqbalance", irqbalance_disable);
 
-static void set_ioapic_affinity(unsigned int irq, cpumask_t mask);
+static void set_ioapic_affinity_irq(unsigned int irq, cpumask_t mask);
 
 static inline void move_irq(int irq)
 {
 	/* note - we hold the desc->lock */
 	if (unlikely(!cpus_empty(pending_irq_balance_cpumask[irq]))) {
-		set_ioapic_affinity(irq, pending_irq_balance_cpumask[irq]);
+		set_ioapic_affinity_irq(irq, pending_irq_balance_cpumask[irq]);
 		cpus_clear(pending_irq_balance_cpumask[irq]);
 	}
 }
@@ -853,7 +861,7 @@
 			if (irq_entry == -1)
 				continue;
 			irq = pin_2_irq(irq_entry, ioapic, pin);
-			set_ioapic_affinity(irq, mask);
+			set_ioapic_affinity_irq(irq, mask);
 		}
 
 	}
@@ -1143,8 +1151,10 @@
 static int __init assign_irq_vector(int irq)
 {
 	static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
-	if (IO_APIC_VECTOR(irq) > 0)
-		return IO_APIC_VECTOR(irq);
+	int vector;
+
+	if ((vector = IO_APIC_VECTOR(irq)) > 0)
+		return vector;
 next:
 	current_vector += 8;
 	if (current_vector == SYSCALL_VECTOR)
@@ -1155,12 +1165,40 @@
 		current_vector = FIRST_DEVICE_VECTOR + offset;
 	}
 
+#ifdef CONFIG_PCI_USE_VECTOR
+	vector_irq[current_vector] = irq;
+#endif
+	
 	IO_APIC_VECTOR(irq) = current_vector;
+
 	return current_vector;
 }
 
-static struct hw_interrupt_type ioapic_level_irq_type;
-static struct hw_interrupt_type ioapic_edge_irq_type;
+static struct hw_interrupt_type ioapic_level_type;
+static struct hw_interrupt_type ioapic_edge_type;
+
+#define IOAPIC_AUTO	-1
+#define IOAPIC_EDGE	0
+#define IOAPIC_LEVEL	1
+
+static inline void ioapic_register_intr(int irq, int vector, unsigned long trigger)
+{
+	if (use_pci_vector() && !platform_legacy_irq(irq)) {
+		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
+				trigger == IOAPIC_LEVEL)
+			irq_desc[vector].handler = &ioapic_level_type;
+		else
+			irq_desc[vector].handler = &ioapic_edge_type;
+		set_intr_gate(vector, interrupt[vector]);
+	} else	{
+		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
+				trigger == IOAPIC_LEVEL)
+			irq_desc[irq].handler = &ioapic_level_type;
+		else
+			irq_desc[irq].handler = &ioapic_edge_type;
+		set_intr_gate(vector, interrupt[irq]);
+	}
+}
 
 void __init setup_IO_APIC_irqs(void)
 {
@@ -1218,13 +1256,7 @@
 		if (IO_APIC_IRQ(irq)) {
 			vector = assign_irq_vector(irq);
 			entry.vector = vector;
-
-			if (IO_APIC_irq_trigger(irq))
-				irq_desc[irq].handler = &ioapic_level_irq_type;
-			else
-				irq_desc[irq].handler = &ioapic_edge_irq_type;
-
-			set_intr_gate(vector, interrupt[irq]);
+			ioapic_register_intr(irq, vector, IOAPIC_AUTO);
 		
 			if (!apic && (irq < 16))
 				disable_8259A_irq(irq);
@@ -1271,7 +1303,7 @@
 	 * The timer IRQ doesn't have to know that behind the
 	 * scene we have a 8259A-master in AEOI mode ...
 	 */
-	irq_desc[0].handler = &ioapic_edge_irq_type;
+	irq_desc[0].handler = &ioapic_edge_type;
 
 	/*
 	 * Add it to the IO-APIC irq-routing table:
@@ -1757,9 +1789,6 @@
  * that was delayed but this is now handled in the device
  * independent code.
  */
-#define enable_edge_ioapic_irq unmask_IO_APIC_irq
-
-static void disable_edge_ioapic_irq (unsigned int irq) { /* nothing */ }
 
 /*
  * Starting up a edge-triggered IO-APIC interrupt is
@@ -1770,7 +1799,6 @@
  * This is not complete - we should be able to fake
  * an edge even if it isn't on the 8259A...
  */
-
 static unsigned int startup_edge_ioapic_irq(unsigned int irq)
 {
 	int was_pending = 0;
@@ -1788,8 +1816,6 @@
 	return was_pending;
 }
 
-#define shutdown_edge_ioapic_irq	disable_edge_ioapic_irq
-
 /*
  * Once we have recorded IRQ_PENDING already, we can mask the
  * interrupt for real. This prevents IRQ storms from unhandled
@@ -1804,9 +1830,6 @@
 	ack_APIC_irq();
 }
 
-static void end_edge_ioapic_irq (unsigned int i) { /* nothing */ }
-
-
 /*
  * Level triggered interrupts can just be masked,
  * and shutting down and starting up the interrupt
@@ -1828,10 +1851,6 @@
 	return 0; /* don't check for pending */
 }
 
-#define shutdown_level_ioapic_irq	mask_IO_APIC_irq
-#define enable_level_ioapic_irq		unmask_IO_APIC_irq
-#define disable_level_ioapic_irq	mask_IO_APIC_irq
-
 static void end_level_ioapic_irq (unsigned int irq)
 {
 	unsigned long v;
@@ -1858,6 +1877,7 @@
  * The idea is from Manfred Spraul.  --macro
  */
 	i = IO_APIC_VECTOR(irq);
+
 	v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
 
 	ack_APIC_irq();
@@ -1892,7 +1912,57 @@
 	}
 }
 
-static void mask_and_ack_level_ioapic_irq (unsigned int irq) { /* nothing */ }
+#ifdef CONFIG_PCI_USE_VECTOR
+static unsigned int startup_edge_ioapic_vector(unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+
+	return startup_edge_ioapic_irq(irq);
+}
+
+static void ack_edge_ioapic_vector(unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+	
+	ack_edge_ioapic_irq(irq);
+}
+
+static unsigned int startup_level_ioapic_vector (unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+
+	return startup_level_ioapic_irq (irq);
+}
+
+static void end_level_ioapic_vector (unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+
+	end_level_ioapic_irq(irq);
+}
+
+static void mask_IO_APIC_vector (unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+
+	mask_IO_APIC_irq(irq);
+}
+
+static void unmask_IO_APIC_vector (unsigned int vector)
+{
+	int irq = vector_to_irq(vector);
+	
+	unmask_IO_APIC_irq(irq);
+}
+
+static void set_ioapic_affinity_vector (unsigned int vector, 
+					unsigned long cpu_mask)
+{
+	int irq = vector_to_irq(vector);
+
+	set_ioapic_affinity_irq(irq, cpu_mask);
+}
+#endif
 
 /*
  * Level and edge triggered IO-APIC interrupts need different handling,
@@ -1902,26 +1972,25 @@
  * edge-triggered handler, without risking IRQ storms and other ugly
  * races.
  */
-
-static struct hw_interrupt_type ioapic_edge_irq_type = {
+static struct hw_interrupt_type ioapic_edge_type = {
 	.typename 	= "IO-APIC-edge",
-	.startup 	= startup_edge_ioapic_irq,
-	.shutdown 	= shutdown_edge_ioapic_irq,
-	.enable 	= enable_edge_ioapic_irq,
-	.disable 	= disable_edge_ioapic_irq,
-	.ack 		= ack_edge_ioapic_irq,
-	.end 		= end_edge_ioapic_irq,
+	.startup 	= startup_edge_ioapic,
+	.shutdown 	= shutdown_edge_ioapic,
+	.enable 	= enable_edge_ioapic,
+	.disable 	= disable_edge_ioapic,
+	.ack 		= ack_edge_ioapic,
+	.end 		= end_edge_ioapic,
 	.set_affinity 	= set_ioapic_affinity,
 };
 
-static struct hw_interrupt_type ioapic_level_irq_type = {
+static struct hw_interrupt_type ioapic_level_type = {
 	.typename 	= "IO-APIC-level",
-	.startup 	= startup_level_ioapic_irq,
-	.shutdown 	= shutdown_level_ioapic_irq,
-	.enable 	= enable_level_ioapic_irq,
-	.disable 	= disable_level_ioapic_irq,
-	.ack 		= mask_and_ack_level_ioapic_irq,
-	.end 		= end_level_ioapic_irq,
+	.startup 	= startup_level_ioapic,
+	.shutdown 	= shutdown_level_ioapic,
+	.enable 	= enable_level_ioapic,
+	.disable 	= disable_level_ioapic,
+	.ack 		= mask_and_ack_level_ioapic,
+	.end 		= end_level_ioapic,
 	.set_affinity 	= set_ioapic_affinity,
 };
 
@@ -1941,7 +2010,13 @@
 	 * 0x80, because int 0x80 is hm, kind of importantish. ;)
 	 */
 	for (irq = 0; irq < NR_IRQS ; irq++) {
-		if (IO_APIC_IRQ(irq) && !IO_APIC_VECTOR(irq)) {
+		int tmp = irq;
+		if (use_pci_vector()) {
+			if (!platform_legacy_irq(tmp))  
+				if ((tmp = vector_to_irq(tmp)) == -1)
+					continue;
+		}
+		if (IO_APIC_IRQ(tmp) && !IO_APIC_VECTOR(tmp)) {
 			/*
 			 * Hmm.. We don't have an entry for this,
 			 * so default to an old-fashioned 8259
@@ -2373,10 +2448,12 @@
 		"IRQ %d Mode:%i Active:%i)\n", ioapic, 
 		mp_ioapics[ioapic].mpc_apicid, pin, entry.vector, irq, edge_level, active_high_low);
 
+ 	if (use_pci_vector() && !platform_legacy_irq(irq))
+		irq = IO_APIC_VECTOR(irq);
 	if (edge_level) {
-	irq_desc[irq].handler = &ioapic_level_irq_type;
+		irq_desc[irq].handler = &ioapic_level_type;
 	} else {
-		irq_desc[irq].handler = &ioapic_edge_irq_type;
+		irq_desc[irq].handler = &ioapic_edge_type;
 	}
 
 	set_intr_gate(entry.vector, interrupt[irq]);
diff -urN linux-2.6.0-test5/arch/i386/kernel/mpparse.c linux-260t5-vectorbase-testbox/arch/i386/kernel/mpparse.c
--- linux-2.6.0-test5/arch/i386/kernel/mpparse.c	2003-09-08 15:50:00.000000000 -0400
+++ linux-260t5-vectorbase-testbox/arch/i386/kernel/mpparse.c	2003-09-25 08:09:21.749936480 -0400
@@ -1131,15 +1131,19 @@
 		if ((1<<bit) & mp_ioapic_routing[ioapic].pin_programmed[idx]) {
 			printk(KERN_DEBUG "Pin %d-%d already programmed\n",
 				mp_ioapic_routing[ioapic].apic_id, ioapic_pin);
-			entry->irq = irq;
+ 			if (use_pci_vector() && !platform_legacy_irq(irq))
+ 				irq = IO_APIC_VECTOR(irq);
+ 			entry->irq = irq;
 			continue;
 		}
 
 		mp_ioapic_routing[ioapic].pin_programmed[idx] |= (1<<bit);
 
-		if (!io_apic_set_pci_routing(ioapic, ioapic_pin, irq, edge_level, active_high_low))
-			entry->irq = irq;
-
+		if (!io_apic_set_pci_routing(ioapic, ioapic_pin, irq, edge_level, active_high_low)) {
+ 			if (use_pci_vector() && !platform_legacy_irq(irq))
+ 				irq = IO_APIC_VECTOR(irq);
+ 			entry->irq = irq;
+ 		}
 		printk(KERN_DEBUG "%02x:%02x:%02x[%c] -> %d-%d -> IRQ %d\n",
 			entry->id.segment, entry->id.bus, 
 			entry->id.device, ('A' + entry->pin), 
diff -urN linux-2.6.0-test5/arch/i386/pci/irq.c linux-260t5-vectorbase-testbox/arch/i386/pci/irq.c
--- linux-2.6.0-test5/arch/i386/pci/irq.c	2003-09-08 15:50:43.000000000 -0400
+++ linux-260t5-vectorbase-testbox/arch/i386/pci/irq.c	2003-09-24 08:11:43.000000000 -0400
@@ -745,6 +745,10 @@
 							bridge->bus->number, PCI_SLOT(bridge->devfn), pin, irq);
 				}
 				if (irq >= 0) {
+					if (use_pci_vector() && 
+						!platform_legacy_irq(irq))
+						irq = IO_APIC_VECTOR(irq);
+
 					printk(KERN_INFO "PCI->APIC IRQ transform: (B%d,I%d,P%d) -> %d\n",
 						dev->bus->number, PCI_SLOT(dev->devfn), pin, irq);
 					dev->irq = irq;
diff -urN linux-2.6.0-test5/include/asm-i386/hw_irq.h linux-260t5-vectorbase-testbox/include/asm-i386/hw_irq.h
--- linux-2.6.0-test5/include/asm-i386/hw_irq.h	2003-09-08 15:50:41.000000000 -0400
+++ linux-260t5-vectorbase-testbox/include/asm-i386/hw_irq.h	2003-09-24 08:11:43.000000000 -0400
@@ -41,6 +41,7 @@
 asmlinkage void error_interrupt(void);
 asmlinkage void spurious_interrupt(void);
 asmlinkage void thermal_interrupt(struct pt_regs);
+#define platform_legacy_irq(irq)	((irq) < 16)
 #endif
 
 void mask_irq(unsigned int irq);
diff -urN linux-2.6.0-test5/include/asm-i386/io_apic.h linux-260t5-vectorbase-testbox/include/asm-i386/io_apic.h
--- linux-2.6.0-test5/include/asm-i386/io_apic.h	2003-09-08 15:50:13.000000000 -0400
+++ linux-260t5-vectorbase-testbox/include/asm-i386/io_apic.h	2003-09-24 10:48:47.000000000 -0400
@@ -13,6 +13,46 @@
 
 #ifdef CONFIG_X86_IO_APIC
 
+#ifdef CONFIG_PCI_USE_VECTOR
+static inline int use_pci_vector(void)	{return 1;}
+static inline void disable_edge_ioapic_vector(unsigned int vector) { }
+static inline void mask_and_ack_level_ioapic_vector(unsigned int vector) { }
+static inline void end_edge_ioapic_vector (unsigned int vector) { }
+#define startup_level_ioapic	startup_level_ioapic_vector
+#define shutdown_level_ioapic	mask_IO_APIC_vector
+#define enable_level_ioapic	unmask_IO_APIC_vector
+#define disable_level_ioapic	mask_IO_APIC_vector
+#define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_vector	
+#define end_level_ioapic	end_level_ioapic_vector
+#define set_ioapic_affinity	set_ioapic_affinity_vector
+
+#define startup_edge_ioapic 	startup_edge_ioapic_vector
+#define shutdown_edge_ioapic 	disable_edge_ioapic_vector
+#define enable_edge_ioapic 	unmask_IO_APIC_vector
+#define disable_edge_ioapic 	disable_edge_ioapic_vector
+#define ack_edge_ioapic 	ack_edge_ioapic_vector
+#define end_edge_ioapic 	end_edge_ioapic_vector
+#else
+static inline int use_pci_vector(void)	{return 0;}
+static inline void disable_edge_ioapic_irq(unsigned int irq) { }
+static inline void mask_and_ack_level_ioapic_irq(unsigned int irq) { }
+static inline void end_edge_ioapic_irq (unsigned int irq) { }
+#define startup_level_ioapic	startup_level_ioapic_irq
+#define shutdown_level_ioapic	mask_IO_APIC_irq
+#define enable_level_ioapic	unmask_IO_APIC_irq
+#define disable_level_ioapic	mask_IO_APIC_irq
+#define mask_and_ack_level_ioapic mask_and_ack_level_ioapic_irq	
+#define end_level_ioapic	end_level_ioapic_irq
+#define set_ioapic_affinity	set_ioapic_affinity_irq
+
+#define startup_edge_ioapic 	startup_edge_ioapic_irq
+#define shutdown_edge_ioapic 	disable_edge_ioapic_irq
+#define enable_edge_ioapic 	unmask_IO_APIC_irq
+#define disable_edge_ioapic 	disable_edge_ioapic_irq
+#define ack_edge_ioapic 	ack_edge_ioapic_irq
+#define end_edge_ioapic 	end_edge_ioapic_irq
+#endif
+
 #define APIC_MISMATCH_DEBUG
 
 #define IO_APIC_BASE(idx) \
diff -urN linux-2.6.0-test5/include/asm-i386/mach-default/irq_vectors.h linux-260t5-vectorbase-testbox/include/asm-i386/mach-default/irq_vectors.h
--- linux-2.6.0-test5/include/asm-i386/mach-default/irq_vectors.h	2003-09-08 15:49:58.000000000 -0400
+++ linux-260t5-vectorbase-testbox/include/asm-i386/mach-default/irq_vectors.h	2003-09-24 08:11:43.000000000 -0400
@@ -76,11 +76,21 @@
  * Since vectors 0x00-0x1f are used/reserved for the CPU,
  * the usable vector space is 0x20-0xff (224 vectors)
  */
+/*
+ * The maximum number of vectors supported by i386 processors
+ * is limited to 256. For processors other than i386, NR_VECTORS
+ * should be changed accordingly. 
+ */
+#define NR_VECTORS 256
+#ifdef CONFIG_PCI_USE_VECTOR
+#define NR_IRQS FIRST_SYSTEM_VECTOR
+#else
 #ifdef CONFIG_X86_IO_APIC
 #define NR_IRQS 224
 #else
 #define NR_IRQS 16
 #endif
+#endif
 
 #define FPU_IRQ			13
 
diff -urN linux-260t5-vectorbase-testbox/arch/i386/Kconfig linux-260t5-msi-testbox/arch/i386/Kconfig
--- linux-260t5-vectorbase-testbox/arch/i386/Kconfig	2003-09-24 08:11:43.000000000 -0400
+++ linux-260t5-msi-testbox/arch/i386/Kconfig	2003-09-22 12:08:56.000000000 -0400
@@ -1088,6 +1088,7 @@
 
 config PCI_USE_VECTOR
 	bool "PCI_USE_VECTOR"
+	depends on X86_LOCAL_APIC
 	default n
 	help
 	   This replaces the current existing IRQ-based index interrupt scheme
@@ -1095,6 +1096,12 @@
 	   1) Support MSI implementation.
 	   2) Support future IOxAPIC hotplug
 
+	   Note that this enables MSI, Message Signaled Interrupt, on all
+	   MSI capable device functions detected if users also install the 
+	   MSI patch. Message Signal Interrupt enables an MSI-capable
+	   hardware device to send an inbound Memory Write on its PCI bus
+	   instead of asserting IRQ signal on device IRQ pin.
+
 	   If you don't know what to do here, say N.
 
 source "drivers/pci/Kconfig"
diff -urN linux-260t5-vectorbase-testbox/arch/i386/kernel/io_apic.c linux-260t5-msi-testbox/arch/i386/kernel/io_apic.c
--- linux-260t5-vectorbase-testbox/arch/i386/kernel/io_apic.c	2003-09-25 07:29:13.000000000 -0400
+++ linux-260t5-msi-testbox/arch/i386/kernel/io_apic.c	2003-09-25 11:51:49.000000000 -0400
@@ -296,7 +296,7 @@
 
 extern cpumask_t irq_affinity[NR_IRQS];
 
-static cpumask_t __cacheline_aligned pending_irq_balance_cpumask[NR_IRQS];
+cpumask_t __cacheline_aligned pending_irq_balance_cpumask[NR_IRQS];
 
 #define IRQBALANCE_CHECK_ARCH -999
 static int irqbalance_disabled = IRQBALANCE_CHECK_ARCH;
@@ -1148,6 +1148,7 @@
 
 int irq_vector[NR_IRQS] = { FIRST_DEVICE_VECTOR , 0 };
 
+#ifndef CONFIG_PCI_USE_VECTOR
 static int __init assign_irq_vector(int irq)
 {
 	static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
@@ -1165,14 +1166,13 @@
 		current_vector = FIRST_DEVICE_VECTOR + offset;
 	}
 
-#ifdef CONFIG_PCI_USE_VECTOR
-	vector_irq[current_vector] = irq;
-#endif
-	
 	IO_APIC_VECTOR(irq) = current_vector;
 
 	return current_vector;
 }
+#else
+extern int assign_irq_vector(int irq);
+#endif
 
 static struct hw_interrupt_type ioapic_level_type;
 static struct hw_interrupt_type ioapic_edge_type;
diff -urN linux-260t5-vectorbase-testbox/arch/i386/pci/irq.c linux-260t5-msi-testbox/arch/i386/pci/irq.c
--- linux-260t5-vectorbase-testbox/arch/i386/pci/irq.c	2003-09-24 08:11:43.000000000 -0400
+++ linux-260t5-msi-testbox/arch/i386/pci/irq.c	2003-09-22 12:08:56.000000000 -0400
@@ -680,8 +680,10 @@
 		    	if ( dev2->irq && dev2->irq != irq && \
 			(!(pci_probe & PCI_USE_PIRQ_MASK) || \
 			((1 << dev2->irq) & mask)) ) {
+#ifndef CONFIG_PCI_USE_VECTOR
 		    		printk(KERN_INFO "IRQ routing conflict for %s, have irq %d, want irq %d\n",
 				       pci_name(dev2), dev2->irq, irq);
+#endif
 		    		continue;
 		    	}
 			dev2->irq = irq;
diff -urN linux-260t5-vectorbase-testbox/Documentation/MSI-HOWTO.txt linux-260t5-msi-testbox/Documentation/MSI-HOWTO.txt
--- linux-260t5-vectorbase-testbox/Documentation/MSI-HOWTO.txt	1969-12-31 19:00:00.000000000 -0500
+++ linux-260t5-msi-testbox/Documentation/MSI-HOWTO.txt	2003-09-22 12:08:56.000000000 -0400
@@ -0,0 +1,275 @@
+		The MSI Driver Guide HOWTO
+	Tom L Nguyen tom.l.nguyen@intel.com
+			09/18/2003
+
+1. About this guide
+
+This guide describes the basics of Message Signaled Interrupts(MSI), the 
+advantages of using MSI over traditional interrupt mechanisms, and how 
+to enable your driver to use MSI or MSI-X. Also included is a Frequently 
+Asked Questions.
+
+2. Copyright 2003 Intel Corporation 
+
+3. What is MSI/MSI-X?
+
+Message Signaled Interrupt (MSI), as described in the PCI Local Bus 
+Specification Revision 2.3 or latest, is an optional feature, and a 
+required feature for PCI Express devices. MSI enables a device function
+to request service by sending an Inbound Memory Write on its PCI bus to
+the FSB as a Message Signal Interrupt transaction. Because MSI is 
+generated in the form of a Memory Write, all transaction conditions, 
+such as a Retry, Master-Abort, Target-Abort or normal completion, are 
+supported.
+
+A PCI device that supports MSI must also support pin IRQ assertion 
+interrupt mechanism to provide backward compatibility for systems that
+do not support MSI. In Systems, which support MSI, the bus driver is
+responsible for initializing the message address and message data of
+the device function's MSI/MSI-X capability structure during device
+initial configuration. 
+
+An MSI capable device function indicates MSI support by implementing 
+the MSI/MSI-X capability structure in its PCI capability list. The 
+device function may implement both the MSI capability structure and 
+the MSI-X capability structure; however, the bus driver should not 
+enable both, but instead enable only the MSI-X capability structure.
+
+The MSI capability structure contains Message Control register,
+Message Address register and Message Data register. These registers
+provide the bus driver control over MSI. The Message Control register
+indicates the MSI capability supported by the device. The Message
+Address register specifies the target address and the Message Data
+register specifies the characteristics of the message. To request
+service, the device function writes the content of the Message Data
+register to the target address. The device and its software driver 
+are prohibited from writing to these registers.
+
+The MSI-X capability structure is an optional extension to MSI. It 
+uses an independent and separate capability structure. There are 
+some key advantages to implementing the MSI-X capability structure 
+over the MSI capability structure as described below.
+
+	- Support a larger maximum number of vectors per function.
+
+	- Provide the ability for system software to configure 
+	each vector with an independent message address and message 
+	data, specified by a table that resides in Memory Space. 
+
+        - MSI and MSI-X both support per-vector masking. Per-vector
+	masking is an optional extension of MSI but a required
+	feature for MSI-X. Per-vector masking provides the kernel
+	the ability to mask/unmask MSI when servicing its software
+	interrupt service routing handler. If per-vector masking is
+	not supported, then the device driver should provide the 
+	hardware/software synchronization to ensure that the device
+	generates MSI when the driver wants it to do so. 
+
+4. Why use MSI? 
+
+As a benefit the simplification of board design, MSI allows board 
+designers to remove out of band interrupt routing. MSI is another 
+step towards a legacy-free environment.
+
+Due to increasing pressure on chipset and processor packages to 
+reduce pin count, the need for interrupt pins is expected to 
+diminish over time. Devices, due to pin constraints, may implement
+messages to increase performance. 
+
+PCI Express endpoints uses INTx emulation (in-band messages) instead
+of IRQ pin assertion. Using INTx emulation requires interrupt 
+sharing among devices connected to the same node (PCI bridge) while
+MSI is unique (non-shared) and does not require BIOS configuration
+support. As a result, the PCI Express technology requires MSI 
+support for better interrupt performance.
+ 
+Using MSI enables the device functions to support two or more 
+vectors, which can be configure to target different CPU's to 
+increase scalability. 
+
+5. Configuring a driver to use MSI/MSI-X
+
+By default, the kernel will not enable MSI/MSI-X on all devices that
+support this capability once the patch is installed. A kernel 
+configuration option must be selected to enable MSI/MSI-X support.
+
+5.1 Including MSI support into the kernel
+
+To include MSI support into the kernel requires users to patch the
+VECTOR-base patch first and then the MSI patch because the MSI 
+support needs VECTOR based scheme. Once these patches are installed,
+setting CONFIG_PCI_USE_VECTOR enables the VECTOR based scheme and 
+the option for MSI-capable device drivers to selectively enable MSI
+(using pci_enable_msi as desribed below).
+
+Since the target of the inbound message is the local APIC, providing
+CONFIG_PCI_USE_VECTOR is dependent on whether CONFIG_X86_LOCAL_APIC
+is enabled or not.
+
+int pci_enable_msi(struct pci_dev *)
+ 
+With this new API, any existing device driver, which like to have 
+MSI enabled on its device function, must call this explicitly. A 
+successful call will initialize the MSI/MSI-X capability structure
+with ONE vector, regardless of whether the device function is 
+capable of supporting multiple messages. This vector replaces the
+pre-assigned dev->irq with a new MSI vector if reclaiming the 
+pre-assigned vector is impossible. To avoid the conflict of new 
+assigned vector with existing pre-assigned vector requires the 
+device driver to call this API before calling request_irq(...).
+
+5.2 Configuring for MSI support
+
+Due to the non-contiguous fashion in vector assignment of the 
+existing Linux kernel, this patch does not support multiple 
+messages regardless of the device function is capable of supporting
+more than one vector. The bus driver initializes only entry 0 of
+this capability if pci_enable_msi(...) is called successfully by 
+the device driver.
+
+5.3 Configuring for MSI-X support
+
+Both the MSI capability structure and the MSI-X capability structure
+share the same above semantics; however, due to the ability of the
+system software to configure each vector of the MSI-X capability
+structure with an independent message address and message data, the
+non-contiguous fashion in vector assignment of the existing Linux 
+kernel has no impact on supporting multiple messages on an MSI-X 
+capable device functions. By default, as mentioned above, ONE vector
+should be always allocated to the MSI-X capability structure at 
+entry 0. The bus driver does not initialize other entries of the 
+MSI-X table.
+ 
+Note that the PCI subsystem should have full control of a MSI-X 
+table that resides in Memory Space. The software device driver 
+should not access this table. 
+
+To request for additional vectors, the device software driver should
+call function msi_alloc_vectors(). It is recommended that the 
+software driver should call this function once during the 
+initialization phase of the device driver.
+
+The function msi_alloc_vectors(), once invoked, enables either 
+all or nothing, depending on the current availability of vector 
+resources. If no vector resources are available, the device function
+still works with ONE vector. If the vector resources are available
+for the number of vectors requested by the driver, this function 
+will reconfigure the MSI-X capability structure of the device with
+additional messages, starting from entry 1. To emphasize this
+reason, for example, the device may be capable for supporting the 
+maximum of 32 vectors while its software driver usually may request
+4 vectors.
+
+For each vector, after this successful call, the device driver is 
+responsible to call other functions like request_irq(), enable_irq(),
+etc. to enable this vector with its corresponding interrupt service
+handler. It is the device driver's choice to have all vectors shared
+the same interrupt service handler or each vector with a unique 
+interrupt service handler. 
+
+In addition to the function msi_alloc_vectors(), another function 
+msi_free_vectors() is provided to allow the software driver to 
+release a number of vectors back to the vector resources. Once 
+invoked, the PCI subsystem disables (masks) each vector released. 
+These vectors are no longer valid for the hardware device and its
+software driver to use.
+
+int msi_alloc_vectors(struct pci_dev *dev, int *vector, int nvec)
+
+This API enables the software driver to request the PCI subsystem 
+for additional messages. Depending on the number of vectors 
+available, the PCI subsystem enables either all or nothing. 
+
+Argument dev points to the device (pci_dev) structure.
+Argument vector is a pointer of integer type. The number of 
+elements is indicated in argument nvec.
+Argument nvec is an integer indicating the number of messages 
+requested.
+A return of zero indicates that the number of allocated vector is 
+successfully allocated. Otherwise, indicate resources not 
+available.
+
+int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec)
+
+This API enables the software driver to inform the PCI subsystem
+that it is willing to release a number of vectors back to the 
+MSI resource pool. Once invoked, the PCI subsystem disables each
+MSI-X entry associated with each vector stored in the argument 2.
+These vectors are no longer valid for the hardware device and 
+its software driver to use.
+
+Argument dev points to the device (pci_dev) structure.
+Argument vector is a pointer of integer type. The number of 
+elements is indicated in argument nvec.
+Argument nvec is an integer indicating the number of messages 
+released. 
+A return of zero indicates that the number of allocated vectors 
+is successfully released. Otherwise, indicates a failure.
+
+5.4 How to tell whether MSI is enabled on device function
+
+At the driver level, a return of zero from pci_enable_msi(...) 
+indicates to the device driver that its device function is 
+initialized successfully and ready to run in MSI mode.
+
+At the user level, users can use command 'cat /proc/interrupts' 
+to display the vector allocated for the device and its interrupt
+mode, as shown below. 
+
+           CPU0       CPU1       
+  0:     324639          0    IO-APIC-edge  timer
+  1:       1186          0    IO-APIC-edge  i8042
+  2:          0          0          XT-PIC  cascade
+ 12:       2797          0    IO-APIC-edge  i8042
+ 14:       6543          0    IO-APIC-edge  ide0
+ 15:          1          0    IO-APIC-edge  ide1
+169:          0          0   IO-APIC-level  uhci-hcd
+185:          0          0   IO-APIC-level  uhci-hcd
+193:        138         10         PCI MSI  aic79xx
+201:         30          0         PCI MSI  aic79xx
+225:         30          0   IO-APIC-level  aic7xxx
+233:         30          0   IO-APIC-level  aic7xxx
+NMI:          0          0 
+LOC:     324553     325068 
+ERR:          0
+MIS:          0
+
+6. FAQ
+
+Q1. Are there any limitations on using the MSI?
+
+A1. If the PCI device supports MSI and conforms to the
+specification and the platform supports the APIC local bus,
+then using MSI should work.
+
+Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
+AMD processors)? In P3 IPI's are transmitted on the APIC local
+bus and in P4 and Xeon they are transmitted on the system
+bus. Are there any implications with this?
+
+A2. MSI support enables a PCI device sending an inbound
+memory write (0xfeexxxxx as target address) on its PCI bus
+directly to the FSB. Since the message address has a
+redirection hint bit cleared, it should work.
+
+Q3. The target address 0xfeexxxxx will be translated by the
+Host Bridge into an interrupt message. Are there any
+limitations on the chipsets such as Intel 8xx, Intel e7xxx,
+or VIA?
+
+A3. If these chipsets support an inbound memory write with
+target address set as 0xfeexxxxx, as conformed to PCI 
+specification 2.3 or latest, then it should work.
+
+Q4. From the driver point of view, if the MSI is lost because
+of the errors occur during inbound memory write, then it may
+wait for ever. Is there a mechanism for it to recover?
+
+A4. Since the target of the transaction is an inbound memory
+write, all transaction termination conditions (Retry,
+Master-Abort, Target-Abort, or normal completion) are
+supported. A device sending an MSI must abide by all the PCI
+rules and conditions regarding that inbound memory write. So,
+if a retry is signaled it must retry, etc... We believe that
+the recommendation for Abort is also a retry (refer to PCI
+specification 2.3 or latest).
diff -urN linux-260t5-vectorbase-testbox/drivers/acpi/pci_irq.c linux-260t5-msi-testbox/drivers/acpi/pci_irq.c
--- linux-260t5-vectorbase-testbox/drivers/acpi/pci_irq.c	2003-09-24 08:38:09.000000000 -0400
+++ linux-260t5-msi-testbox/drivers/acpi/pci_irq.c	2003-09-22 12:08:56.000000000 -0400
@@ -234,7 +234,7 @@
                           PCI Interrupt Routing Support
    -------------------------------------------------------------------------- */
 
-static int
+int
 acpi_pci_irq_lookup (struct pci_bus *bus, int device, int pin)
 {
 	struct acpi_prt_entry	*entry = NULL;
diff -urN linux-260t5-vectorbase-testbox/drivers/pci/Makefile linux-260t5-msi-testbox/drivers/pci/Makefile
--- linux-260t5-vectorbase-testbox/drivers/pci/Makefile	2003-09-08 15:50:27.000000000 -0400
+++ linux-260t5-msi-testbox/drivers/pci/Makefile	2003-09-22 12:08:56.000000000 -0400
@@ -28,6 +28,7 @@
 obj-$(CONFIG_SGI_IP27) += setup-irq.o
 obj-$(CONFIG_SGI_IP32) += setup-irq.o
 obj-$(CONFIG_X86_VISWS) += setup-irq.o
+obj-$(CONFIG_PCI_USE_VECTOR) += msi.o
 
 # Cardbus & CompactPCI use setup-bus
 obj-$(CONFIG_HOTPLUG) += setup-bus.o
diff -urN linux-260t5-vectorbase-testbox/drivers/pci/msi.c linux-260t5-msi-testbox/drivers/pci/msi.c
--- linux-260t5-vectorbase-testbox/drivers/pci/msi.c	1969-12-31 19:00:00.000000000 -0500
+++ linux-260t5-msi-testbox/drivers/pci/msi.c	2003-09-25 13:55:23.101339672 -0400
@@ -0,0 +1,1041 @@
+/*
+ * linux/drivers/pci/msi.c
+ */
+
+#include <linux/mm.h>
+#include <linux/irq.h>		
+#include <linux/interrupt.h>
+#include <linux/init.h>		
+#include <linux/config.h>      	
+#include <linux/ioport.h>       
+#include <linux/smp_lock.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/acpi.h>
+
+#include <asm/errno.h>
+#include <asm/io.h>		
+#include <asm/smp.h>
+#include <asm/desc.h>		
+#include <mach_apic.h>
+
+#include <linux/pci_msi.h>
+
+_DEFINE_DBG_BUFFER
+
+static spinlock_t msi_lock = SPIN_LOCK_UNLOCKED;
+static struct msi_desc* msi_desc[NR_IRQS] = { [0 ... NR_IRQS-1] = NULL };
+static kmem_cache_t* msi_cachep;
+
+static int pci_msi_enable = 1;
+static int nr_alloc_vectors = 0;
+static int nr_released_vectors = 0;
+static int nr_reserved_vectors = NR_HP_RESERVED_VECTORS;
+static int nr_msix_devices = 0;
+
+#ifndef CONFIG_X86_IO_APIC
+int vector_irq[NR_IRQS] = { [0 ... NR_IRQS -1] = -1};
+int irq_vector[NR_IRQS] = { FIRST_DEVICE_VECTOR , 0 };
+#endif
+
+static void msi_cache_ctor(void *p, kmem_cache_t *cache, unsigned long flags)
+{
+	memset(p, 0, NR_IRQS * sizeof(struct msi_desc));
+}
+
+static int msi_cache_init(void)
+{
+	msi_cachep = kmem_cache_create("msi_cache", 
+			NR_IRQS * sizeof(struct msi_desc),
+		       	0, SLAB_HWCACHE_ALIGN, msi_cache_ctor, NULL);
+	if (!msi_cachep)
+		return -ENOMEM;
+	
+	return 0;
+}
+
+static void msi_set_mask_bit(unsigned int vector, int flag)
+{
+	struct msi_desc *entry;
+	
+	entry = (struct msi_desc *)msi_desc[vector];
+	if (!entry || !entry->dev || !entry->mask_base)
+		return;
+	switch (entry->msi_attrib.type) {
+	case PCI_CAP_ID_MSI:
+	{
+		int		pos;
+		unsigned int	mask_bits;
+
+		pos = entry->mask_base;
+	        entry->dev->bus->ops->read(entry->dev->bus, entry->dev->devfn, 
+				pos, 4, &mask_bits);
+		mask_bits &= ~(1);
+		mask_bits |= flag;
+	        entry->dev->bus->ops->write(entry->dev->bus, entry->dev->devfn, 
+				pos, 4, mask_bits);
+		break;
+	}
+	case PCI_CAP_ID_MSIX:
+	{
+		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
+		writel(flag, entry->mask_base + offset);
+		break;
+	}
+	default:
+		break;
+	} 
+}
+
+#ifdef CONFIG_SMP
+static void set_msi_affinity(unsigned int vector, unsigned long cpu_mask)
+{
+	struct msi_desc *entry;
+	struct msg_address address;	
+
+	entry = (struct msi_desc *)msi_desc[vector];
+	if (!entry || !entry->dev) 
+		return;
+	
+	switch (entry->msi_attrib.type) {
+	case PCI_CAP_ID_MSI:
+	{
+		int pos;
+
+   		if (!(pos = pci_find_capability(entry->dev, PCI_CAP_ID_MSI)))
+			return;
+	       	
+	        entry->dev->bus->ops->read(entry->dev->bus, entry->dev->devfn, 
+			msi_lower_address_reg(pos), 4, 
+			&address.lo_address.value);
+		address.lo_address.u.dest_id = cpu_mask_to_apicid(cpu_mask); 
+		entry->msi_attrib.current_cpu = address.lo_address.u.dest_id; 
+		entry->dev->bus->ops->write(entry->dev->bus, entry->dev->devfn, 
+			msi_lower_address_reg(pos), 4, 
+			address.lo_address.value);
+		break;
+	}
+	case PCI_CAP_ID_MSIX:
+	{
+		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET;
+
+		address.lo_address.value = readl(entry->mask_base + offset);
+		address.lo_address.u.dest_id = cpu_mask_to_apicid(cpu_mask); 
+		entry->msi_attrib.current_cpu = address.lo_address.u.dest_id; 
+		writel(address.lo_address.value, entry->mask_base + offset);
+		break;
+	}
+	default:
+		break;
+	}
+}
+
+static inline void move_msi(int vector)
+{
+	if (unlikely(pending_irq_balance_cpumask[vector])) {
+		set_msi_affinity(vector, pending_irq_balance_cpumask[vector]);
+		pending_irq_balance_cpumask[vector] = 0;
+	}
+}
+#endif
+
+static void mask_MSI_irq(unsigned int vector)
+{
+	msi_set_mask_bit(vector, 1);
+}
+
+static void unmask_MSI_irq(unsigned int vector)
+{
+	msi_set_mask_bit(vector, 0);
+}
+
+static unsigned int startup_msi_irq_wo_maskbit(unsigned int vector) 
+{
+	return 0;	/* never anything pending */
+}
+
+static void pci_disable_msi(unsigned int vector);
+static void shutdown_msi_irq(unsigned int vector) 
+{
+	pci_disable_msi(vector);
+}
+
+#define shutdown_msi_irq_wo_maskbit	shutdown_msi_irq 
+static void enable_msi_irq_wo_maskbit(unsigned int vector) {}
+static void disable_msi_irq_wo_maskbit(unsigned int vector) {}
+static void act_msi_irq_wo_maskbit(unsigned int vector) {}
+static void end_msi_irq_wo_maskbit(unsigned int vector) 
+{
+	move_msi(vector);
+	ack_APIC_irq();	
+}
+
+static unsigned int startup_msi_irq_w_maskbit(unsigned int vector)
+{ 
+	unmask_MSI_irq(vector);
+	return 0;	/* never anything pending */
+}
+
+#define shutdown_msi_irq_w_maskbit	shutdown_msi_irq
+#define enable_msi_irq_w_maskbit	unmask_MSI_irq
+#define disable_msi_irq_w_maskbit	mask_MSI_irq
+#define act_msi_irq_w_maskbit		mask_MSI_irq
+
+static void end_msi_irq_w_maskbit(unsigned int vector) 
+{
+	move_msi(vector);
+	unmask_MSI_irq(vector);	
+	ack_APIC_irq();
+}
+
+/*
+ * Interrupt Type for MSI-X PCI/PCI-X/PCI-Express Devices,
+ * which implement the MSI-X Capability Structure. 
+ */
+static struct hw_interrupt_type msix_irq_type = {
+	"PCI MSI-X",
+	startup_msi_irq_w_maskbit,
+	shutdown_msi_irq_w_maskbit,
+	enable_msi_irq_w_maskbit,
+	disable_msi_irq_w_maskbit,
+	act_msi_irq_w_maskbit,		
+	end_msi_irq_w_maskbit,	
+	set_msi_irq_affinity
+};
+
+/*
+ * Interrupt Type for MSI PCI/PCI-X/PCI-Express Devices,
+ * which implement the MSI Capability Structure with 
+ * Mask-and-Pending Bits. 
+ */
+static struct hw_interrupt_type msi_irq_w_maskbit_type = {
+	"PCI MSI",
+	startup_msi_irq_w_maskbit,
+	shutdown_msi_irq_w_maskbit,
+	enable_msi_irq_w_maskbit,
+	disable_msi_irq_w_maskbit,
+	act_msi_irq_w_maskbit,		
+	end_msi_irq_w_maskbit,	
+	set_msi_irq_affinity
+};
+
+/*
+ * Interrupt Type for MSI PCI/PCI-X/PCI-Express Devices,
+ * which implement the MSI Capability Structure without 
+ * Mask-and-Pending Bits. 
+ */
+static struct hw_interrupt_type msi_irq_wo_maskbit_type = {
+	"PCI MSI",
+	startup_msi_irq_wo_maskbit,
+	shutdown_msi_irq_wo_maskbit,
+	enable_msi_irq_wo_maskbit,
+	disable_msi_irq_wo_maskbit,
+	act_msi_irq_wo_maskbit,		
+	end_msi_irq_wo_maskbit,	
+	set_msi_irq_affinity
+};
+
+static void msi_data_init(struct msg_data *msi_data, 
+			  unsigned int vector)
+{
+	memset(msi_data, 0, sizeof(struct msg_data));
+	msi_data->vector = (u8)vector;
+	msi_data->delivery_mode = MSI_DELIVERY_MODE;
+	msi_data->level = MSI_LEVEL_MODE;
+	msi_data->trigger = MSI_TRIGGER_MODE;	
+}
+
+static void msi_address_init(struct msg_address *msi_address)
+{
+	memset(msi_address, 0, sizeof(struct msg_address));
+	msi_address->hi_address = (u32)0;
+	msi_address->lo_address.u.header = MSI_ADDRESS_HEADER;
+	msi_address->lo_address.u.dest_mode = MSI_LOGICAL_MODE;
+	msi_address->lo_address.u.redirection_hint = MSI_REDIRECTION_HINT_MODE;
+	msi_address->lo_address.u.dest_id = MSI_TARGET_CPU; 
+}
+
+static int pci_vector_resources(void) 					
+{	
+	static int res = -EINVAL;
+	int nr_free_vectors;
+	
+	if (res == -EINVAL) {
+		int i, repeat;
+		for (i = NR_REPEATS; i > 0; i--) {				
+			if ((FIRST_DEVICE_VECTOR + i * 8) > FIRST_SYSTEM_VECTOR)
+				continue; 					
+			break; 							
+		} 								
+		i++; 								
+		repeat = (FIRST_SYSTEM_VECTOR - FIRST_DEVICE_VECTOR)/i; 	
+		res = i * repeat - NR_RESERVED_VECTORS + 1; 
+	}
+
+	nr_free_vectors = res + nr_released_vectors - nr_alloc_vectors;	
+
+	return nr_free_vectors;
+}
+
+int assign_irq_vector(int irq)
+{
+	static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
+	
+	if (irq != MSI_AUTO && IO_APIC_VECTOR(irq) > 0)
+		return IO_APIC_VECTOR(irq);
+next:
+	current_vector += 8;
+	if (current_vector == SYSCALL_VECTOR)
+		goto next;
+
+	if (current_vector > FIRST_SYSTEM_VECTOR) {
+		offset++;
+		current_vector = FIRST_DEVICE_VECTOR + offset;
+	}
+
+	if (current_vector == FIRST_SYSTEM_VECTOR) 
+		return -ENOSPC;
+	
+	vector_irq[current_vector] = irq;
+	if (irq != MSI_AUTO) 
+		IO_APIC_VECTOR(irq) = current_vector;
+
+	nr_alloc_vectors++;
+
+	return current_vector;
+}
+
+static int assign_msi_vector(void)
+{
+	static int new_vector_avail = 1; 	
+	int vector;
+	unsigned long flags;
+	
+	/* 
+	 * msi_lock is provided to ensure that successful allocation of MSI
+	 * vector is assigned unique among drivers.
+	 */
+	spin_lock_irqsave(&msi_lock, flags);
+	if (!(pci_vector_resources() > 0)) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EBUSY;
+	}
+
+	if (!new_vector_avail) {
+		/*
+	 	 * vector_irq[] = -1 indicates that this specific vector is:
+	 	 * - assigned for MSI (since MSI have no associated IRQ) or
+	 	 * - assigned for legacy if less than 16, or
+	 	 * - having no corresponding 1:1 vector-to-IOxAPIC IRQ mapping
+	 	 * vector_irq[] = 0 indicates that this vector, previously 
+		 * assigned for MSI, is freed by hotplug removed operations. 
+		 * This vector will be reused for any subsequent hotplug added 
+		 * operations.
+	 	 * vector_irq[] > 0 indicates that this vector is assigned for 
+		 * IOxAPIC IRQs. This vector and its value provides a 1-to-1 
+		 * vector-to-IOxAPIC IRQ mapping.
+	 	 */   
+		for (vector = FIRST_DEVICE_VECTOR; vector < NR_IRQS; vector++) {
+			if (vector_irq[vector] != 0) 
+				continue;
+			vector_irq[vector] = -1;	
+			nr_released_vectors--;
+			spin_unlock_irqrestore(&msi_lock, flags);
+			return vector;
+		}
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EBUSY;
+	}	
+
+	vector = assign_irq_vector(MSI_AUTO);
+	if (vector  == (FIRST_SYSTEM_VECTOR - 8))  
+		new_vector_avail = 0;
+
+	spin_unlock_irqrestore(&msi_lock, flags);
+	return vector;
+}
+
+static int get_new_vector(void)
+{
+	int vector;
+	
+	if ((vector = assign_msi_vector()) > 0)
+		set_intr_gate(vector, interrupt[vector]);
+
+	return vector;
+}
+
+static int msi_init(void)
+{
+	static int status = -ENOMEM;
+	
+	if (!status)
+		return status;
+
+	if ((status = msi_cache_init()) < 0) {
+		pci_msi_enable = 0;
+		printk(KERN_INFO "WARNING: MSI INIT FAILURE\n");
+		return status;
+	}
+	printk(KERN_INFO "MSI INIT SUCCESS\n");
+	
+	return status;
+}
+	
+static int get_msi_vector(struct pci_dev *dev)
+{
+	return get_new_vector();
+}
+
+static struct msi_desc* alloc_msi_entry(void)
+{
+	struct msi_desc *entry;
+
+	entry = (struct msi_desc*) kmem_cache_alloc(msi_cachep, SLAB_KERNEL);
+	if (!entry)
+		return NULL;
+
+	memset(entry, 0, sizeof(struct msi_desc));
+	entry->link.tail = entry->link.head = 0;	/* single message */
+	entry->dev = NULL;
+	
+	return entry;
+}
+
+static void attach_msi_entry(struct msi_desc *entry, int vector)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	msi_desc[vector] = entry;
+	spin_unlock_irqrestore(&msi_lock, flags);
+}
+
+static void irq_handler_init(int cap_id, int pos, int mask)
+{
+	spin_lock(&irq_desc[pos].lock);
+	if (cap_id == PCI_CAP_ID_MSIX)
+		irq_desc[pos].handler = &msix_irq_type;
+	else {
+		if (!mask)
+			irq_desc[pos].handler = &msi_irq_wo_maskbit_type;
+		else
+			irq_desc[pos].handler = &msi_irq_w_maskbit_type;
+	}
+	spin_unlock(&irq_desc[pos].lock);
+}
+
+static void enable_msi_mode(struct pci_dev *dev, int pos, int type)
+{
+	u32 control;
+			
+	dev->bus->ops->read(dev->bus, dev->devfn, 
+		msi_control_reg(pos), 2, &control);
+	if (type == PCI_CAP_ID_MSI) {
+		/* Set enabled bits to single MSI & enable MSI_enable bit */
+		msi_enable(control, 1);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	} else {
+		msix_enable(control);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	}
+    	if (pci_find_capability(dev, PCI_CAP_ID_EXP)) {
+		/* PCI Express Endpoint device detected */
+		u32 cmd;
+	        dev->bus->ops->read(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
+		cmd |= PCI_COMMAND_INTX_DISABLE;
+	        dev->bus->ops->write(dev->bus, dev->devfn, PCI_COMMAND, 2, cmd);
+	}
+}
+
+static void disable_msi_mode(struct pci_dev *dev, int pos, int type)
+{
+	u32 control;
+	
+	dev->bus->ops->read(dev->bus, dev->devfn, 
+		msi_control_reg(pos), 2, &control);
+	if (type == PCI_CAP_ID_MSI) {
+		/* Set enabled bits to single MSI & enable MSI_enable bit */
+		msi_disable(control);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	} else {
+		msix_disable(control);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	}
+    	if (pci_find_capability(dev, PCI_CAP_ID_EXP)) {
+		/* PCI Express Endpoint device detected */
+		u32 cmd;
+	        dev->bus->ops->read(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
+		cmd &= ~PCI_COMMAND_INTX_DISABLE;
+	        dev->bus->ops->write(dev->bus, dev->devfn, PCI_COMMAND, 2, cmd);
+	}
+}
+
+static int msi_lookup_vector(struct pci_dev *dev)
+{
+	int vector;
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	for (vector = FIRST_DEVICE_VECTOR; vector < NR_IRQS; vector++) {
+		if (!msi_desc[vector] || msi_desc[vector]->dev != dev ||
+			msi_desc[vector]->msi_attrib.entry_nr ||
+			msi_desc[vector]->msi_attrib.default_vector != dev->irq)
+			continue;	/* not entry 0, skip */
+		spin_unlock_irqrestore(&msi_lock, flags);
+		/* This pre-assigned entry-0 MSI vector for this device 
+		   already exits. Override dev->irq with this vector */ 
+		dev->irq = vector;	
+		return 0;
+	}
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	return -EACCES;
+}
+
+void pci_scan_msi_device(struct pci_dev *dev)
+{
+	if (!dev)
+		return;
+
+   	if (pci_find_capability(dev, PCI_CAP_ID_MSIX) > 0) {
+		nr_reserved_vectors++;
+		nr_msix_devices++;
+	} else if (pci_find_capability(dev, PCI_CAP_ID_MSI) > 0)
+		nr_reserved_vectors++;
+}
+
+/*
+ * msi_capability_init: configure device's MSI capability structure
+ * argument: dev of struct pci_dev type
+ * 
+ * description: called to configure the device's MSI capability structure with 
+ * a single MSI, regardless of device is capable of handling multiple messages.
+ *
+ * return:
+ * -EINVAL	: Invalid device type
+ * -ENOMEM	: Memory Allocation Error
+ * -EBUSY	: No MSI resource 
+ * 0		: Successfully allocate vector for entry 0
+ **/
+static int msi_capability_init(struct pci_dev *dev) 
+{
+	struct msi_desc *entry;
+	struct msg_address address; 
+	struct msg_data data;
+	int pos, vector;
+	u32 control;
+	
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
+	if (!pos)
+		return -EINVAL; 
+
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 
+		2, &control);
+	if (control & PCI_MSI_FLAGS_ENABLE)
+		return 0;
+	
+	vector = dev->irq;
+	if (vector > 0 && !msi_lookup_vector(dev)) {
+		/* Lookup Sucess */
+		enable_msi_mode(dev, pos, PCI_CAP_ID_MSI);
+		return 0;
+	}
+	/* MSI Entry Initialization */
+	if (!(entry = alloc_msi_entry()))
+		return -ENOMEM;
+	
+	if ((vector = get_msi_vector(dev)) < 0) {
+		kmem_cache_free(msi_cachep, entry);
+		return -EBUSY;
+	}
+	entry->msi_attrib.type = PCI_CAP_ID_MSI;
+	entry->msi_attrib.entry_nr = 0;
+	entry->msi_attrib.maskbit = is_mask_bit_support(control);
+	entry->msi_attrib.default_vector = dev->irq;
+	dev->irq = vector;	/* save default pre-assigned ioapic vector */
+	entry->dev = dev;
+	if (is_mask_bit_support(control)) { 
+		entry->mask_base = msi_mask_bits_reg(pos, 
+				is_64bit_address(control));
+	}
+	/* Replace with MSI handler */
+	irq_handler_init(PCI_CAP_ID_MSI, vector, entry->msi_attrib.maskbit);
+	/* Configure MSI capability structure */
+	msi_address_init(&address);
+	msi_data_init(&data, vector);
+	entry->msi_attrib.current_cpu = address.lo_address.u.dest_id;
+	dev->bus->ops->write(dev->bus, dev->devfn, msi_lower_address_reg(pos), 
+				4, address.lo_address.value);
+	if (is_64bit_address(control)) {
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_upper_address_reg(pos), 4, address.hi_address);
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_data_reg(pos, 1), 2, *((u32*)&data));
+	} else
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_data_reg(pos, 0), 2, *((u32*)&data));
+	if (entry->msi_attrib.maskbit) {
+		unsigned int maskbits, temp;
+		/* All MSIs are unmasked by default, Mask them all */
+	        dev->bus->ops->read(dev->bus, dev->devfn, 
+			msi_mask_bits_reg(pos, is_64bit_address(control)), 4,
+			&maskbits);
+		temp = (1 << multi_msi_capable(control));
+		temp = ((temp - 1) & ~temp);
+		maskbits |= temp;
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_mask_bits_reg(pos, is_64bit_address(control)), 4,
+			maskbits);
+	}
+	attach_msi_entry(entry, vector);
+	/* Set MSI enabled bits	 */
+	enable_msi_mode(dev, pos, PCI_CAP_ID_MSI);
+
+	return 0;
+}
+
+/**
+ * msix_capability_init: configure device's MSI-X capability structure
+ * argument: dev of struct pci_dev type
+ * 
+ * description: called to configure the device's MSI-X capability structure 
+ * with a single MSI. To request for additional MSI vectors, the device drivers
+ * are required to utilize the following supported APIs:
+ * 1) msi_alloc_vectors(...) for requesting one or more MSI
+ * 2) msi_free_vectors(...) for releasing one or more MSI back to PCI subsystem
+ *
+ * return:
+ * -EINVAL	: Invalid device type
+ * -ENOMEM	: Memory Allocation Error
+ * -EBUSY	: No MSI resource 
+ * 0		: Successfully allocate vector for entry 0
+ **/
+static int msix_capability_init(struct pci_dev	*dev) 
+{
+	struct msi_desc *entry;
+	struct msg_address address; 
+	struct msg_data data;
+	int vector, pos, dev_msi_cap;
+	u32 phys_addr, table_offset;
+	u32 control;
+	u8 bir;
+	void *base;
+	
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+	if (!pos)
+		return -EINVAL;  
+
+	/* Request & Map MSI-X table region */
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 2, 
+		&control);
+	if (control & PCI_MSIX_FLAGS_ENABLE)
+		return 0;
+
+	vector = dev->irq;
+	if (vector > 0 && !msi_lookup_vector(dev)) {
+		/* Lookup Sucess */
+		enable_msi_mode(dev, pos, PCI_CAP_ID_MSIX);
+		return 0;
+	}
+
+	dev_msi_cap = multi_msix_capable(control);
+	dev->bus->ops->read(dev->bus, dev->devfn, 
+		msix_table_offset_reg(pos), 4, &table_offset);
+	bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
+	phys_addr = pci_resource_start (dev, bir);
+	phys_addr += (u32)(table_offset & ~PCI_MSIX_FLAGS_BIRMASK);
+	if (!request_mem_region(phys_addr, 
+		dev_msi_cap * PCI_MSIX_ENTRY_SIZE,
+		"MSI-X iomap Failure"))
+		return -ENOMEM;
+	base = ioremap_nocache(phys_addr, dev_msi_cap * PCI_MSIX_ENTRY_SIZE);
+	if (base == NULL) 
+		goto free_region; 
+	/* MSI Entry Initialization */
+	entry = alloc_msi_entry();
+	if (!entry) 
+		goto free_iomap; 
+	if ((vector = get_msi_vector(dev)) < 0)
+		goto free_entry;
+       	
+	entry->msi_attrib.type = PCI_CAP_ID_MSIX;
+	entry->msi_attrib.entry_nr = 0;
+	entry->msi_attrib.maskbit = 1;
+	entry->msi_attrib.default_vector = dev->irq;
+	dev->irq = vector;	/* save default pre-assigned ioapic vector */
+	entry->dev = dev;
+	entry->mask_base = (unsigned long)base; 
+	/* Replace with MSI handler */
+	irq_handler_init(PCI_CAP_ID_MSIX, vector, 1);
+	/* Configure MSI-X capability structure */
+	msi_address_init(&address);
+	msi_data_init(&data, vector);
+	entry->msi_attrib.current_cpu = address.lo_address.u.dest_id;
+	writel(address.lo_address.value, base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+	writel(address.hi_address, base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
+	writel(*(u32*)&data, base + PCI_MSIX_ENTRY_DATA_OFFSET);
+	/* Initialize all entries from 1 up to 0 */
+	for (pos = 1; pos < dev_msi_cap; pos++) {
+		writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+		writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
+		writel(0, base + pos * PCI_MSIX_ENTRY_SIZE +
+			PCI_MSIX_ENTRY_DATA_OFFSET);
+	}
+	attach_msi_entry(entry, vector);
+	/* Set MSI enabled bits	 */
+	enable_msi_mode(dev, pos, PCI_CAP_ID_MSIX);
+
+	return 0;
+
+free_entry:
+	kmem_cache_free(msi_cachep, entry);
+free_iomap:
+	iounmap(base);
+free_region: 
+	release_mem_region(phys_addr, dev_msi_cap * PCI_MSIX_ENTRY_SIZE);
+
+	return ((vector < 0) ? -EBUSY : -ENOMEM);
+}
+
+/** 
+ * pci_enable_msi: configure MSI/MSI-X capability structure
+ * argument: struct pci_dev *dev 
+ * 
+ * description: called by device driver to request MSI mode on its device.
+ * The decision of enabling MSI on this function requires that all 
+ * of the following conditions must be satisfied:
+ * -	Kernel configuration parameter CONFIG_PCI_USE_VECTOR is set to 'Y'. 
+ * -	The function indicates MSI support by implementing MSI/MSI-X 
+ *	capability structure in its capability list.
+ * -	The MSI resource pool is available.
+ *
+ * return:
+ * -EINVAL	: Invalid device type
+ * -ENOMEM	: Memory Allocation Error
+ * -EBUSY	: No MSI resource 
+ * 0		: Sucessfully allocate MSI vector for entry 0
+ **/
+int pci_enable_msi(struct pci_dev* dev)
+{
+	int status = -EINVAL;
+
+	if (!pci_msi_enable || !dev)
+ 		return status;	
+	
+	if (msi_init() < 0)
+		return -ENOMEM;
+		
+	if ((status = msix_capability_init(dev)) == -EINVAL)
+		status = msi_capability_init(dev);
+	if (!status)
+		nr_reserved_vectors--;
+
+	return status;
+}
+
+static int msi_free_vector(struct pci_dev* dev, int vector);
+static void pci_disable_msi(unsigned int vector)
+{
+	int head, tail, type, default_vector;
+	struct msi_desc *entry;
+	struct pci_dev *dev;
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[vector];
+	if (!entry || !entry->dev) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return;
+	}
+	dev = entry->dev;
+	type = entry->msi_attrib.type;
+	head = entry->link.head;
+	tail = entry->link.tail;
+	default_vector = entry->msi_attrib.default_vector;
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	disable_msi_mode(dev, pci_find_capability(dev, type), type);
+	/* Restore dev->irq to its default pin-assertion vector */
+	dev->irq = default_vector;
+	if (type == PCI_CAP_ID_MSIX && head != tail) {
+		/* Bad driver, which do not call msi_free_vectors before exit. 
+		   We must do a cleanup here */
+		while (1) {
+			spin_lock_irqsave(&msi_lock, flags);
+			entry = msi_desc[vector];
+			head = entry->link.head;
+			tail = entry->link.tail;
+			spin_unlock_irqrestore(&msi_lock, flags);
+			if (tail == head)
+				break;
+			if (msi_free_vector(dev, entry->link.tail))
+				break;
+		}
+	}
+}
+
+static int msi_alloc_vector(struct pci_dev* dev, int head)
+{
+	struct msi_desc *entry;
+	struct msg_address address; 
+	struct msg_data data;
+	int i, offset, pos, dev_msi_cap, vector;
+	u32 low_address, control;
+	unsigned long base = 0L;
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[dev->irq];
+	if (!entry) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EINVAL;
+	}
+	base = entry->mask_base; 
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos),	
+		2, &control);
+	dev_msi_cap = multi_msix_capable(control);
+	for (i = 1; i < dev_msi_cap; i++) {
+		if (!(low_address = readl(base + i * PCI_MSIX_ENTRY_SIZE)))
+			 break;
+	}
+	if (i >= dev_msi_cap) 
+		return -EINVAL;
+	
+	/* MSI Entry Initialization */
+	if (!(entry = alloc_msi_entry()))
+		return -ENOMEM;
+		
+	if ((vector = get_new_vector()) < 0) {
+		kmem_cache_free(msi_cachep, entry);
+		return vector;
+	}
+	entry->msi_attrib.type = PCI_CAP_ID_MSIX;
+	entry->msi_attrib.entry_nr = i;
+	entry->msi_attrib.maskbit = 1;
+	entry->dev = dev;
+	entry->link.head = head;
+	entry->mask_base = base; 
+	irq_handler_init(PCI_CAP_ID_MSIX, vector, 1);
+	/* Configure MSI-X capability structure */
+	msi_address_init(&address);
+	msi_data_init(&data, vector);
+	entry->msi_attrib.current_cpu = address.lo_address.u.dest_id;
+	offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
+	writel(address.lo_address.value, base + offset + 
+		PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+	writel(address.hi_address, base + offset +
+		PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
+	writel(*(u32*)&data, base + offset + PCI_MSIX_ENTRY_DATA_OFFSET);
+	writel(1, base + offset + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
+	attach_msi_entry(entry, vector);
+	
+	return vector;
+}
+
+static int msi_free_vector(struct pci_dev* dev, int vector)
+{
+	struct msi_desc *entry;
+	int entry_nr, type;
+	unsigned long base = 0L;
+	unsigned long flags;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[vector];
+	if (!entry || entry->dev != dev) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EINVAL;
+	}
+	type = entry->msi_attrib.type;
+	entry_nr = entry->msi_attrib.entry_nr;
+	base = entry->mask_base;
+	if (entry->link.tail != entry->link.head) {
+		msi_desc[entry->link.head]->link.tail = entry->link.tail;
+		if (entry->link.tail) 
+			msi_desc[entry->link.tail]->link.head = entry->link.head; 
+	}
+	entry->dev = NULL;
+	vector_irq[vector] = 0;	
+	nr_released_vectors++;
+	msi_desc[vector] = NULL;
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	kmem_cache_free(msi_cachep, entry);
+	if (type == PCI_CAP_ID_MSIX) {
+		int offset;
+
+		offset = entry_nr * PCI_MSIX_ENTRY_SIZE;
+		writel(1, base + offset + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
+		writel(0, base + offset + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+	}
+	
+	return 0;
+}
+
+int msi_alloc_vectors(struct pci_dev* dev, int *vector, int nvec)
+{
+	struct msi_desc *entry;
+	int i, head, pos, vec, free_vectors, alloc_vectors;
+	int *vectors = (int *)vector;
+	u32 control;
+	unsigned long flags;
+
+	if (!pci_msi_enable || !dev)
+ 		return -EINVAL;
+
+   	if (!(pos = pci_find_capability(dev, PCI_CAP_ID_MSIX)))
+ 		return -EINVAL;
+
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 			2, &control);
+	if (nvec > multi_msix_capable(control))
+		return -EINVAL;
+
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[dev->irq];
+	if (!entry || entry->dev != dev ||		/* legal call */
+	   entry->msi_attrib.type != PCI_CAP_ID_MSIX || /* must be MSI-X */ 
+	   entry->link.head != entry->link.tail) {	/* already multi */ 
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EINVAL;
+	}
+	/*
+	 * msi_lock is provided to ensure that enough vectors resources are
+	 * available before granting.
+	 */
+	free_vectors = pci_vector_resources(); 
+	/* Ensure that each MSI/MSI-X device has one vector reserved by 
+	   default to avoid any MSI-X driver to take all available 
+ 	   resources */
+	free_vectors -= nr_reserved_vectors;
+	/* Find the average of free vectors among MSI-X devices */
+	if (nr_msix_devices > 0)
+		free_vectors /= nr_msix_devices;
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	if (nvec > free_vectors) 
+		return -EBUSY;
+
+	alloc_vectors = 0;
+	head = dev->irq;
+	for (i = 0; i < nvec; i++) {
+		if ((vec = msi_alloc_vector(dev, head)) < 0) 
+			break;
+		*(vectors + i) = vec;
+		head = vec;
+		alloc_vectors++;
+	}
+	if (alloc_vectors != nvec) {
+		for (i = 0; i < alloc_vectors; i++) {
+			vec = *(vectors + i);
+			msi_free_vector(dev, vec);
+		}
+		spin_lock_irqsave(&msi_lock, flags);
+		msi_desc[dev->irq]->link.tail = msi_desc[dev->irq]->link.head;
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EBUSY;
+	}
+	if (nr_msix_devices > 0)
+		nr_msix_devices--;
+
+	return 0;
+}
+
+int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec)
+{
+	struct msi_desc *entry;
+	int i;
+	unsigned long flags;
+
+	if (!pci_msi_enable)
+ 		return -EINVAL;
+	
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[dev->irq];
+	if (!entry || entry->dev != dev || 
+	   	entry->msi_attrib.type != PCI_CAP_ID_MSIX || 
+		entry->link.head == entry->link.tail) {	/* Nothing to free */
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EINVAL;
+	}
+	spin_unlock_irqrestore(&msi_lock, flags);
+
+	for (i = 0; i < nvec; i++) {
+		if (*(vector + i) == dev->irq)
+			continue;/* Don't free entry 0 if mistaken by driver */
+		msi_free_vector(dev, *(vector + i)); 
+	}
+
+	return 0;
+}
+
+/**
+ * msi_remove_pci_irq_vectors: reclaim all MSI previous assigned to this device
+ * argument: device pointer of type struct pci_dev 
+ *
+ * description: used during hotplug removed, from which device is hot-removed;
+ * PCI subsystem reclaims associated MSI to unused state, which may be used 
+ * later on.
+ **/ 
+void msi_remove_pci_irq_vectors(struct pci_dev* dev)
+{
+	struct msi_desc *entry;
+	int type;
+	unsigned long flags;
+
+	if (!pci_msi_enable)
+ 		return;
+	
+	spin_lock_irqsave(&msi_lock, flags);
+	entry = msi_desc[dev->irq];
+	if (!entry || entry->dev != dev) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return;
+	}
+	type = entry->msi_attrib.type;
+	spin_unlock_irqrestore(&msi_lock, flags);
+		
+	msi_free_vector(dev, dev->irq);
+	if (type == PCI_CAP_ID_MSIX) {
+		int i, pos, dev_msi_cap;
+		u32 phys_addr, table_offset;
+		u32 control;
+		u8 bir;
+
+   		pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+		dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 			2, &control);
+		dev_msi_cap = multi_msix_capable(control);
+		dev->bus->ops->read(dev->bus, dev->devfn, 
+			msix_table_offset_reg(pos), 4, &table_offset);
+		bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
+		phys_addr = pci_resource_start (dev, bir);
+		phys_addr += (u32)(table_offset & ~PCI_MSIX_FLAGS_BIRMASK);
+		for (i = FIRST_DEVICE_VECTOR; i < NR_IRQS; i++) {
+			spin_lock_irqsave(&msi_lock, flags);
+			if (!msi_desc[i] || msi_desc[i]->dev != dev) {
+				spin_unlock_irqrestore(&msi_lock, flags);
+				continue;
+			}
+			spin_unlock_irqrestore(&msi_lock, flags);
+			msi_free_vector(dev, i);
+		}
+		writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
+		iounmap((void*)entry->mask_base);
+		release_mem_region(phys_addr, dev_msi_cap * PCI_MSIX_ENTRY_SIZE);
+	} 
+	nr_reserved_vectors++;
+}
+
+EXPORT_SYMBOL(pci_enable_msi);
+EXPORT_SYMBOL(msi_alloc_vectors);
+EXPORT_SYMBOL(msi_free_vectors);
diff -urN linux-260t5-vectorbase-testbox/drivers/pci/probe.c linux-260t5-msi-testbox/drivers/pci/probe.c
--- linux-260t5-vectorbase-testbox/drivers/pci/probe.c	2003-09-08 15:50:17.000000000 -0400
+++ linux-260t5-msi-testbox/drivers/pci/probe.c	2003-09-22 12:08:56.000000000 -0400
@@ -552,6 +552,7 @@
 		struct pci_dev *dev;
 
 		dev = pci_scan_device(bus, devfn);
+		pci_scan_msi_device(dev);
 		if (func == 0) {
 			if (!dev)
 				break;
@@ -564,6 +565,9 @@
 		/* Fix up broken headers */
 		pci_fixup_device(PCI_FIXUP_HEADER, dev);
 
+		/* Fix up broken MSI hardware */
+		pci_fixup_device(PCI_FIXUP_MSI, dev);
+
 		/*
 		 * Add the device to our list of discovered devices
 		 * and the bus list for fixup functions, etc.
diff -urN linux-260t5-vectorbase-testbox/drivers/pci/remove.c linux-260t5-msi-testbox/drivers/pci/remove.c
--- linux-260t5-vectorbase-testbox/drivers/pci/remove.c	2003-09-08 15:49:57.000000000 -0400
+++ linux-260t5-msi-testbox/drivers/pci/remove.c	2003-09-22 12:08:56.000000000 -0400
@@ -14,6 +14,8 @@
 {
 	int i;
 
+ 	msi_remove_pci_irq_vectors(dev);
+ 
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *res = dev->resource + i;
 		if (res->parent)
diff -urN linux-260t5-vectorbase-testbox/include/linux/pci.h linux-260t5-msi-testbox/include/linux/pci.h
--- linux-260t5-vectorbase-testbox/include/linux/pci.h	2003-09-08 15:50:03.000000000 -0400
+++ linux-260t5-msi-testbox/include/linux/pci.h	2003-09-22 12:08:56.000000000 -0400
@@ -36,6 +36,7 @@
 #define  PCI_COMMAND_WAIT 	0x80	/* Enable address/data stepping */
 #define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
 #define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
+#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
 
 #define PCI_STATUS		0x06	/* 16 bits */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
@@ -198,6 +199,8 @@
 #define  PCI_CAP_ID_MSI		0x05	/* Message Signalled Interrupts */
 #define  PCI_CAP_ID_CHSWP	0x06	/* CompactPCI HotSwap */
 #define  PCI_CAP_ID_PCIX	0x07	/* PCI-X */
+#define  PCI_CAP_ID_EXP 	0x10	/* PCI Express */
+#define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
 #define PCI_CAP_FLAGS		2	/* Capability defined flags (16 bits) */
 #define PCI_CAP_SIZEOF		4
@@ -275,11 +278,13 @@
 #define  PCI_MSI_FLAGS_QSIZE	0x70	/* Message queue size configured */
 #define  PCI_MSI_FLAGS_QMASK	0x0e	/* Maximum queue size available */
 #define  PCI_MSI_FLAGS_ENABLE	0x01	/* MSI feature enabled */
+#define  PCI_MSI_FLAGS_MASKBIT	0x100	/* 64-bit mask bits allowed */
 #define PCI_MSI_RFU		3	/* Rest of capability flags */
 #define PCI_MSI_ADDRESS_LO	4	/* Lower 32 bits */
 #define PCI_MSI_ADDRESS_HI	8	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
 #define PCI_MSI_DATA_32		8	/* 16 bits of data for 32-bit devices */
 #define PCI_MSI_DATA_64		12	/* 16 bits of data for 64-bit devices */
+#define PCI_MSI_MASK_BIT	16	/* Mask bits register */
 
 /* CompactPCI Hotswap Register */
 
@@ -696,6 +701,18 @@
 extern struct pci_dev *isa_bridge;
 #endif
 
+#ifndef CONFIG_PCI_USE_VECTOR
+static inline void pci_scan_msi_device(struct pci_dev *dev) {}
+static inline int pci_enable_msi(struct pci_dev *dev) {return -1;}
+static inline void msi_remove_pci_irq_vectors(struct pci_dev *dev) {}
+#else
+extern void pci_scan_msi_device(struct pci_dev *dev);
+extern int pci_enable_msi(struct pci_dev *dev);
+extern void msi_remove_pci_irq_vectors(struct pci_dev *dev);
+extern int msi_alloc_vectors(struct pci_dev* dev, int *vector, int nvec);
+extern int msi_free_vectors(struct pci_dev* dev, int *vector, int nvec);
+#endif
+
 #endif /* CONFIG_PCI */
 
 /* Include architecture-dependent settings and functions */
@@ -866,6 +883,7 @@
 
 #define PCI_FIXUP_HEADER	1		/* Called immediately after reading configuration header */
 #define PCI_FIXUP_FINAL		2		/* Final phase of device fixups */
+#define PCI_FIXUP_MSI		3		/* Fixup for MSI buggy hardware */
 
 void pci_fixup_device(int pass, struct pci_dev *dev);
 
diff -urN linux-260t5-vectorbase-testbox/include/linux/pci_msi.h linux-260t5-msi-testbox/include/linux/pci_msi.h
--- linux-260t5-vectorbase-testbox/include/linux/pci_msi.h	1969-12-31 19:00:00.000000000 -0500
+++ linux-260t5-msi-testbox/include/linux/pci_msi.h	2003-09-25 10:51:35.000000000 -0400
@@ -0,0 +1,170 @@
+/*
+ *	../include/linux/pci_msi.h
+ *
+ */
+
+#ifndef _ASM_PCI_MSI_H
+#define _ASM_PCI_MSI_H
+
+#include <linux/pci.h>
+
+#define MSI_AUTO -1
+#define NR_REPEATS	23
+#define NR_RESERVED_VECTORS 3 /*FIRST_DEVICE_VECTOR,FIRST_SYSTEM_VECTOR,0x80 */	
+
+/* 
+ * Assume the maximum number of hot plug slots supported by the system is about
+ * ten. The worstcase is that each of these slots is hot-added with a device, 
+ * which has two MSI/MSI-X capable functions. To avoid any MSI-X driver, which
+ * attempts to request all available vectors, NR_HP_RESERVED_VECTORS is defined
+ * as below to ensure at least one message is assigned to each detected MSI/
+ * MSI-X device function.
+ */
+#define NR_HP_RESERVED_VECTORS 	20
+
+extern int vector_irq[NR_IRQS];
+extern cpumask_t pending_irq_balance_cpumask[NR_IRQS];
+extern void (*interrupt[NR_IRQS])(void);
+
+#ifdef CONFIG_SMP
+#define set_msi_irq_affinity	set_msi_affinity
+#else
+#define set_msi_irq_affinity	NULL
+static inline void move_msi(int vector) {}
+#endif
+
+#ifndef CONFIG_X86_IO_APIC
+static inline int get_ioapic_vector(struct pci_dev *dev) { return -1;}
+static inline void restore_ioapic_irq_handler(int irq) {}
+#else
+extern void restore_ioapic_irq_handler(int irq);
+#endif
+
+/*
+ * MSI-X Address Register
+ */
+#define PCI_MSIX_FLAGS_QSIZE		0x7FF
+#define PCI_MSIX_FLAGS_ENABLE		(1 << 15)
+#define PCI_MSIX_FLAGS_BIRMASK		(7 << 0)
+#define PCI_MSIX_FLAGS_BITMASK		(1 << 0)
+
+#define PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET	0
+#define PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET	4
+#define PCI_MSIX_ENTRY_DATA_OFFSET		8
+#define PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET	12
+#define PCI_MSIX_ENTRY_SIZE			16
+
+#define msi_control_reg(base)		(base + PCI_MSI_FLAGS)
+#define msi_lower_address_reg(base)	(base + PCI_MSI_ADDRESS_LO)
+#define msi_upper_address_reg(base)	(base + PCI_MSI_ADDRESS_HI)
+#define msi_data_reg(base, is64bit)	\
+	( (is64bit == 1) ? base+PCI_MSI_DATA_64 : base+PCI_MSI_DATA_32 )
+#define msi_mask_bits_reg(base, is64bit) \
+	( (is64bit == 1) ? base+PCI_MSI_MASK_BIT : base+PCI_MSI_MASK_BIT-4)
+#define msi_disable(control)		control &= ~PCI_MSI_FLAGS_ENABLE
+#define multi_msi_capable(control) \
+	(1 << ((control & PCI_MSI_FLAGS_QMASK) >> 1))
+#define multi_msi_enable(control, num) \
+	control |= (((num >> 1) << 4) & PCI_MSI_FLAGS_QSIZE);
+#define is_64bit_address(control)	(control & PCI_MSI_FLAGS_64BIT)	
+#define is_mask_bit_support(control)	(control & PCI_MSI_FLAGS_MASKBIT)
+#define msi_enable(control, num) multi_msi_enable(control, num); \
+	control |= PCI_MSI_FLAGS_ENABLE
+ 
+#define msix_control_reg		msi_control_reg
+#define msix_table_offset_reg(base)	(base + 0x04)
+#define msix_pba_offset_reg(base)	(base + 0x08)
+#define msix_enable(control)	 	control |= PCI_MSIX_FLAGS_ENABLE
+#define msix_disable(control)	 	control &= ~PCI_MSIX_FLAGS_ENABLE
+#define msix_table_size(control) 	((control & PCI_MSIX_FLAGS_QSIZE)+1)
+#define multi_msix_capable		msix_table_size
+#define msix_unmask(address)	 	(address & ~PCI_MSIX_FLAGS_BITMASK)
+#define msix_mask(address)		(address | PCI_MSIX_FLAGS_BITMASK)  
+#define msix_is_pending(address) 	(address & PCI_MSIX_FLAGS_PENDMASK)
+
+extern char __dbg_str_buf[256];
+#define _DEFINE_DBG_BUFFER	char __dbg_str_buf[256];
+#define _DBG_K_TRACE_ENTRY	((unsigned int)0x00000001)
+#define _DBG_K_TRACE_EXIT	((unsigned int)0x00000002)
+#define _DBG_K_INFO		((unsigned int)0x00000004)
+#define _DBG_K_ERROR		((unsigned int)0x00000008)
+#define _DBG_K_TRACE	(_DBG_K_TRACE_ENTRY | _DBG_K_TRACE_EXIT)
+
+#define _DEBUG_LEVEL	(_DBG_K_INFO | _DBG_K_ERROR | _DBG_K_TRACE)
+#define _DBG_PRINT( dbg_flags, args... )		\
+if ( _DEBUG_LEVEL & (dbg_flags) )			\
+{							\
+	int len;					\
+	len = sprintf(__dbg_str_buf, "%s:%d: %s ", 	\
+		__FILE__, __LINE__, __FUNCTION__ ); 	\
+	sprintf(__dbg_str_buf + len, args);		\
+	printk(KERN_INFO "%s\n", __dbg_str_buf);	\
+}
+
+#define MSI_FUNCTION_TRACE_ENTER	\
+	_DBG_PRINT (_DBG_K_TRACE_ENTRY, "%s", "[Entry]");
+#define MSI_FUNCTION_TRACE_EXIT		\
+	_DBG_PRINT (_DBG_K_TRACE_EXIT, "%s", "[Entry]");
+	
+/*
+ * MSI Defined Data Structures
+ */
+#define MSI_ADDRESS_HEADER		0xfee
+#define MSI_DELIVERY_MODE		0
+#define MSI_LEVEL_MODE			1	/* Edge always assert */
+#define MSI_TRIGGER_MODE		0	/* MSI is edge sensitive */
+#define MSI_LOGICAL_MODE		1
+#define MSI_REDIRECTION_HINT_MODE	0
+#ifdef CONFIG_SMP
+#define MSI_TARGET_CPU			logical_smp_processor_id()
+#else
+#define MSI_TARGET_CPU			TARGET_CPUS
+#endif
+struct msg_data {	
+	__u32	vector	:  8,		
+	delivery_mode	:  3,	/*	000b: FIXED
+			 		001b: lowest priority
+			 	 	111b: ExtINT */	
+	reserved_1	:  3,
+	level		:  1,	/* 0: deassert, 1: assert */	
+	trigger		:  1,	/* 0: edge, 1: level */	
+	reserved_2	: 16;
+} __attribute__ ((packed));
+
+struct msg_address {
+	union {
+		struct { __u32
+			reserved_1		:  2,		
+			dest_mode		:  1,	/* 0: physical, 
+							   1: logical */	
+			redirection_hint        :  1,  	/* 0: dedicated CPU 
+							   1: lowest priority */
+			reserved_2		:  8,   	
+ 			dest_id			:  8,	/* Destination ID */	
+			header			: 12;	/* FEEH */
+      	}u;
+       	__u32  value;
+	}lo_address;
+	__u32 	hi_address;
+} __attribute__ ((packed));
+
+struct msi_desc {
+	struct {
+		__u32	type	: 5, /* {0: unused, 5h:MSI, 11h:MSI-X} 	  */
+			maskbit	: 1, /* mask-pending bit supported ?      */
+			reserved: 2, /* reserved			  */
+			entry_nr: 8, /* specific enabled entry 		  */
+			default_vector: 8, /* default pre-assigned vector */ 
+			current_cpu: 8; /* current destination cpu	  */
+	}msi_attrib;
+
+	struct {
+		__u32	head	: 16, 
+			tail	: 16; 
+	}link;
+
+	unsigned long mask_base;
+	struct pci_dev *dev;
+};
+
+#endif /* _ASM_PCI_MSI_H */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Updated MSI Patches
@ 2003-08-22 18:25 long
  0 siblings, 0 replies; 35+ messages in thread
From: long @ 2003-08-22 18:25 UTC (permalink / raw)
  To: greg; +Cc: jun.nakajima, linux-kernel

Monday, August 11, 2003, Greg wrote:
>> There are two types of MSI capable devices: one supports the MSI 
>> capability structure and other supports the MSI-X capability structure.
>> The patches provide two APIs msix_alloc_vectors/msix_free_vectors for 
>> only devices, which support the MSI-X capability structure, to request
>> for additional messages. By default, each MSI/MSI-X capable device 
>> function is allocated with one vector for below reasons:
>> - To achieve a backward compatibility with existing drivers if possible.
>> - Due to the current implementation of vector assignment, all devices 
>>   that support the MSI-capability structure work with no more than one 
>>   allocated vector.
>> - The hardware devices, which support the MSI-X capability structure, 
>>   may indicate the maximum capable number of vectors supported (32 
>>   vectors as example). However, the device drivers may require only 
>>   four. With provided APIs, the optimization of MSI vector allocation 
>>   is achievable.
>> 
>> The existing driver is required to be modified if and only if its
>> hardware/software synchronization is not provided; that means, the 
>> hardware device may generate multiple messages from the same vector. 
>> These messages may be lost; the results may be unpredictable. To 
>> overcome this issue, the existing driver driver requires to provide 
>> the hardware/software synchronization to ask its hardware device to 
>> generate one message at a time.

>Ok, that makes a little more sense.  I'm guessing that new hardware will
>need to have their drivers modified to support this if it is needed.

Agree. Sorry for responding late to your comments. Also, I rearrange your
other comments below.

>> In summary, there are two options provided:
>> Option 1: Enable MSI on all MSI capable devices by default and use boot
>> parameter "device_nomsi=" to disable specific devices, which have bugs 
>> in MSI.

>Ok, that's almost acceptable.  How about fixing the bugs in MSI so this
>is not needed?  :)

Agree. Boot parameters "device_nomsi=" will be no longer needed finally.
It is IHVs' responsibility to fix the MSI bugs in their hardwares. 

>> Option 2 (default): Enable individual MSI capable devices as validation
>> purpose by using the boot parameter "device_msi=" to select which 
>> devices the kernel supports MSI.

>Ick.  How do you specify which "device" this option pertains to?  Do I
>have to know the pci id?  What about pci hotplug boxes which decide to
>re-arrange the pci ids every other time the box boots (yeah, I've seen
>this, no fun...)

The boot parameter "device_msi=" requires device ID and vendor ID as
mentioned in MSI-HOWTO.txt. I agree that this is no fun... because users
may not know the device ID and vendor ID of the hardware device. Like boot
parameter "device_nomsi=", "device_msi=" will be no longer needed finally.
It is IHVs' responsibility to fix the MSI bugs in their hardwares. 

Based on your comments, we reconstruct the kernel config parameters as
below. Option 1 & 2 as described above are provided only if 
CONFIG_MSI_DEBUG is set to 'Y'. This provides IHVs options to debug their
buggy hardware devices. Depending on how many MSI capable devices 
populated on their system, either option 1 or 2 can be selected to help
them to debug their own hardwares. 
Please let me know what you think...

CONFIG_PCI_USE_VECTOR	==> This implies to vector base enabled or not. 
			    If this is enabled, then MSI is also enabled by 
			    default to all MSI capable devices when MSI 
			    patch is installed along with vector base patch.
	CONFIG_MSI_DEBUG==> This implies when MSI patch is installed. By 
			    default, it is set to 'N'. This provides IHVs' 
			    option to debug their buggy hardwares. If this 
			    option is set, then MSI code handles boot 
			    parameters "device_nomsi="/"device_msi=" 
			    accordingly. Otherwise, the patch ignores
			    "device_nomsi="/"device_msi=". 

>How about (if MSI is so troublesome), we just disable MSI all the time,
>and provide a sysfs file for MSI to be "enabled" for every device if it
>is wanted?  The init scripts could go and enable this as needed.  But
>I'm guessing that you can't switch interrupt modes in a device after it
>has been initialized, right?  Oh well...

Enabling MSI on an MSI capable device is done during PCI bus enumeration.
A sysfs file does not provide any benefits during system boot. Second
reason is as mentioned above it is IHVs' responsibility to fix the MSI 
bugs in their hardwares.
Yes, you can't switch interrupt modes after it has been initialized. 

>> Note that if this option is chosen, 
>> then the use of "device_nomsi" id no longer necessary. This is by 
>> default because users may not know how many MSI capable devices 
>> populated in their systems and how many of these have bugs in MSI.
>> 
>> Either one of these options is chosen, I do not think you have any 
>> issue of hot-adding the 33 MSI capable device.

So then there is no limit on the number of MSI devices if
CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES is not selected?
Yes, there is no limit on the number of MSI devices if 
CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES is not selected. This will also be
removed finally. 

>> > +u32 device_nomsi_list[DRIVER_NOMSI_MAX] = {0, };
>> 
>> >Shouldn't this be static?
>> 
>> > +u32 device_msi_list[DRIVER_NOMSI_MAX] = {0, };
>> 
>> >Same with this one?
>> 
>> In the initial design draft, I am thinking of defining
>> DRIVER_NOMSI_MAX as 32 since it may be impossible to have more than 32
>> MSI capable devices populated in the same system. I will add some
>> comments to it and look forward for your comments.
>>
> Um, I ment the variable is in the global symbol table, and it doesn't
> look like it needs to be.

Agree.

> But your comment about number of devices is also a good one.  Any way
> we can figure that out at run-time too? What happens when I hot-add 
> the 33 MSI capable device?  Or just boot with that many devices (I've 
> seen boxes with 100s of pci devices all attached to one system)?

I agree that processing data from boot parameter "device_nomsi=" can be done
at run-time during system boot, but not in the hot plug environment. Anyway,
this will be removed finally as mentioned above.

>> The defined variable DRIVER_NOMSI_MAX in device_msi_list[] and 
>> device_nomsi_list[] are used to serve mainly as debug purposes since 
>> most IHVs have not yet validated their hardware devices.

> Heh, but what if I have 33 of the same kind of card in a machine (and
> that card has already been "validated")?

The same kind of cards have the same vendor ID and device ID. If that card
has already been validated, leave CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES to its
default 'N'. As mentioned above, the final patches will enable MSI to all
MSI capable devices automatically. All debug parameters are removed. It 
is IHVs' responsibility to fix the MSI bugs in their hardwares. 

>> If one of the system has more than 32 MSI capable devices, users can
>> enable MSI on all devices by default (by setting
>> CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES to 'n') and use boot parameter
>> "device_nomsi=" to disable some devices, which have bugs in MSI.

>Ugh, I'm trying to get across the point that a user does _not_ want to
>have to reconfig their kernel.  Distros do not want to ship a whole
>different kernel just for this one option.  That's not acceptable if we
>can handle everything dynamically with no problems for the small boxes
>as well as the big ones.

Agree. The final patches will not provide any boot parameters. It is 
IHVs' responsibility to fix the MSI bugs in their hardwares.


Thanks,
Long

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-19 15:42 Nguyen, Tom L
  0 siblings, 0 replies; 35+ messages in thread
From: Nguyen, Tom L @ 2003-08-19 15:42 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Linux Kernel, Nakajima, Jun, long

Tuesday, August 12, 2003, Zwane Mwaikambo wrote
>> +#ifdef CONFIG_PCI_MSI
>> +			/* Is this an active MSI? */
>> +			if (msi_desc[j])
>> +				continue;
>> +#endif

>> Since the code determinces whether this entry is NULL or not, I think any  
>> locking for msi_desc may not be required.

>Yes but there is other code which modifies msi_desc members. i think a per 
>msi_desc lock is needed. You could also use a kmem_cache to allocate them, 
>and perhaps utilise HWCACHE_ALIGN.
We will have set_affinity support for MSI in our next update release. The above
code will be deleted. Utilize HWCACHE_ALIGN to allocate msi_desc is a good
suggestion. Thanks!

On Thu, 7 Aug 2003, Zwane Mwaikambo wrote:
>> - Change the interface name from msi_free_vectors to msix_free_vectors since 
>> this interface is used for MSI-X device drivers, which request for releasing 
>> the number of vector back to the PCI subsystem.
>> - Change the function name from remove_hotplug_vectors to 
>> msi_remove_pci_irq_vectors to have a close match with function name 
>> msi_get_pci_irq_vector.

>I think the vector allocator code can all be arch specific generic, 
>there is no particular reason as to why it has to be MSI specific.
Some IHVs suggest changing the interface names from msi_alloc_vectors/
msi_free_vectors to msix_alloc_vectors/msix_free_vectors. You have a good point 
of keeping these generic. Agree.

Thanks,
Long


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-14 23:46 Nguyen, Tom L
  0 siblings, 0 replies; 35+ messages in thread
From: Nguyen, Tom L @ 2003-08-14 23:46 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Greg KH, long, Nakajima, Jun, linux-kernel

On Tue, 12 Aug 2003, Nguyen, Tom L wrote:
> > IMO Multiplexing would be preferred, we can't be allocating that many 
> > vectors for one device/device driver
> All pre-assigned vectors to all enabled IOxAPIC(s) are reserved.
> Allocating additional vectors to MSI-X driver is determined based on the
> available vectors, which must be greater than the number of vectors requested
> by MSI-X driver.

>Hmm can you avoid grabbing all the free vectors? I think the irq 
>controller subsystem should handle allocation of vectors. Letting MSI grab 
>everything might leave us with problems later on.
Good point. Thanks! we are studying the policy, which ensures that no MSI-X
can grab everything from others. Will post this changes after done ...

Thanks,
Long



^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-13 15:19 Nguyen, Tom L
  0 siblings, 0 replies; 35+ messages in thread
From: Nguyen, Tom L @ 2003-08-13 15:19 UTC (permalink / raw)
  To: Zwane Mwaikambo, Nakajima, Jun; +Cc: Jeff Garzik, Linux Kernel, long

>> Salability means many things. I'm not sure which aspect you are talking
>> about, but a good thing with MSI is that it does not depend on I/O
>> APICs. A typical I/O APIC has 24 RTEs, and we need about 10 I/O APICs to
>> consume the vectors. 

> Yes there are already a number of systems which exhibit this problem (see 
> the URL in the next comment)
 
>> You can make this scale on SMP systems. The vector-based approach would
>> be easier to extend it per-CPU because vectors are inherently per-CPU,
>> compared with IRQ-based. Today we have IRQ balancing as you know, and
>> the key part is to bind IRQ to a particular CPU (BTW, with our patch it
>> happens automatically because the balancer does not care if they are IRQ
>> or vector). 

>Sorry, yeah i meant device scalability essentially, the ability to plug in 
>a fair amount of these and other devices into the same system. The per cpu 
>IDT route is also worthwhile following a bit later on. I've already 
>written something along the lines of a per node IDT (you could make the 
>granularity per cpu of course) if you're interested you can have a look 
>here http://www.osdl.org/projects/numaqhwspprt/results/patch-numaq-highirq10 it 
>also has a companion patch to dynamically allocate the irq_desc's which 
>applies on top of it http://function.linuxpower.ca/patches/patch-dynirq-wli

>But back to the subject, your patch looks good, i'll test it for 
>regression (no MSI devices but regression testing is just as important) 
>when you next post.
I already had discussion with Jun about this change. We will post it as soon as
we validate this change on our system.

Thanks,
Long

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-13  4:14 Nakajima, Jun
  0 siblings, 0 replies; 35+ messages in thread
From: Nakajima, Jun @ 2003-08-13  4:14 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Jeff Garzik, Nguyen, Tom L, Linux Kernel, long

> But back to the subject, your patch looks good, i'll test it for
> regression (no MSI devices but regression testing is just as
important)
> when you next post.

Thanks. Long will make another patch that reflects my last comments.

Jun

> -----Original Message-----
> From: Zwane Mwaikambo [mailto:zwane@linuxpower.ca]
> Sent: Tuesday, August 12, 2003 6:37 PM
> To: Nakajima, Jun
> Cc: Jeff Garzik; Nguyen, Tom L; Linux Kernel; long
> Subject: RE: Updated MSI Patches
> 
> On Tue, 12 Aug 2003, Nakajima, Jun wrote:
> 
> > Salability means many things. I'm not sure which aspect you are
talking
> > about, but a good thing with MSI is that it does not depend on I/O
> > APICs. A typical I/O APIC has 24 RTEs, and we need about 10 I/O
APICs to
> > consume the vectors.
> 
> Yes there are already a number of systems which exhibit this problem
(see
> the URL in the next comment)
> 
> > You can make this scale on SMP systems. The vector-based approach
would
> > be easier to extend it per-CPU because vectors are inherently
per-CPU,
> > compared with IRQ-based. Today we have IRQ balancing as you know,
and
> > the key part is to bind IRQ to a particular CPU (BTW, with our patch
it
> > happens automatically because the balancer does not care if they are
IRQ
> > or vector).
> 
> Sorry, yeah i meant device scalability essentially, the ability to
plug in
> a fair amount of these and other devices into the same system. The per
cpu
> IDT route is also worthwhile following a bit later on. I've already
> written something along the lines of a per node IDT (you could make
the
> granularity per cpu of course) if you're interested you can have a
look
> here http://www.osdl.org/projects/numaqhwspprt/results/patch-numaq-
> highirq10 it
> also has a companion patch to dynamically allocate the irq_desc's
which
> applies on top of it
http://function.linuxpower.ca/patches/patch-dynirq-
> wli
> 
> But back to the subject, your patch looks good, i'll test it for
> regression (no MSI devices but regression testing is just as
important)
> when you next post.
> 
> Thanks!
> 	Zwane

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
  2003-08-12 23:59 Nakajima, Jun
@ 2003-08-13  0:48 ` Zwane Mwaikambo
  0 siblings, 0 replies; 35+ messages in thread
From: Zwane Mwaikambo @ 2003-08-13  0:48 UTC (permalink / raw)
  To: Nakajima, Jun; +Cc: Jeff Garzik, Nguyen, Tom L, Linux Kernel, long

On Tue, 12 Aug 2003, Nakajima, Jun wrote:

> static unsigned int startup_edge_ioapic_vector(unsigned int vector)
> {
> 	int was_pending = 0;
> 	unsigned long flags;
> 	int irq = vector_to_irq (vector);
> 
> 	spin_lock_irqsave(&ioapic_lock, flags);
> 	if (irq < 16) {
> 		disable_8259A_irq(irq);
> 		if (i8259A_irq_pending(irq))
> 			was_pending = 1;
> 	}
> 	__unmask_IO_APIC_irq(irq);
> 	spin_unlock_irqrestore(&ioapic_lock, flags);
> 
> 	return was_pending;
> }

Hmm that doesn't look too bad, i like, ok another question, do you think 
this could be made to scale on systems with lots of devices requiring 
vectors?

Thanks,
	Zwane


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-12 23:59 Nakajima, Jun
  2003-08-13  0:48 ` Zwane Mwaikambo
  0 siblings, 1 reply; 35+ messages in thread
From: Nakajima, Jun @ 2003-08-12 23:59 UTC (permalink / raw)
  To: Jeff Garzik, Zwane Mwaikambo; +Cc: Nguyen, Tom L, Linux Kernel, long

If the platform_irq thing is the issue, we can remove it by having like

#ifdef PCI_USE_VECTOR
static struct hw_interrupt_type ioapic_edge_vector_type = {
        .typename       = "IO-APIC-edge",
        .startup        = startup_edge_ioapic_vector,
        .shutdown       = shutdown_edge_ioapic_vector,
        .enable         = enable_edge_ioapic_vector,
        .disable        = disable_edge_ioapic_vector,
        .ack            = ack_edge_ioapic_vector,
        .end            = end_edge_ioapic_vector,
        .set_affinity   = set_ioapic_affinity,
};

static struct hw_interrupt_type ioapic_level_vector_type = {
...

#else
...

They (startup_edge_ioapic_vector, etc.) convert the vector to IRQ when
needed. Note that we need that because interrupt controller(s) still
needs IRQ info for the conventional interrupts when we use vector-based
do_IRQ(). 

For example, startup_edge_ioapic_vector() looks like this:

static unsigned int startup_edge_ioapic_vector(unsigned int vector)
{
	int was_pending = 0;
	unsigned long flags;
	int irq = vector_to_irq (vector);

	spin_lock_irqsave(&ioapic_lock, flags);
	if (irq < 16) {
		disable_8259A_irq(irq);
		if (i8259A_irq_pending(irq))
			was_pending = 1;
	}
	__unmask_IO_APIC_irq(irq);
	spin_unlock_irqrestore(&ioapic_lock, flags);

	return was_pending;
}

Thanks,
Jun

> -----Original Message-----
> From: Jeff Garzik [mailto:jgarzik@pobox.com]
> Sent: Tuesday, August 12, 2003 12:45 PM
> To: Zwane Mwaikambo
> Cc: Nakajima, Jun; Nguyen, Tom L; Linux Kernel; long
> Subject: Re: Updated MSI Patches
> 
> Zwane Mwaikambo wrote:
> > On Tue, 12 Aug 2003, Jeff Garzik wrote:
> >
> >
> >>So, IMO, do_IRQ is one special case where copying code may be
preferred
> >>over common code.
> >>
> >>And I also feel the same way about do_MSI().  However, I have not
looked
> >>at non-ia32 MSI implementations to know what sort of issues exist.
> >
> >
> > The main reason i have a preference for a seperate MSI handling path
is
> so
> > that we don't have to do the platform_irq thing in do_IRQ and we
know
> > what to expect wrt irq or vector. If platform_irq stays we should at
> > least try and pick up on what the IA64 folks have done, But that
would
> be
> > even harder to get done right now.
> 
> 
> Oh, I definitely prefer a separate MSI handling path too.
> 
> In the future we'll be writing drivers that _require_ MSI interrupt
> handling, and we'll be optimizing the various MSI hot paths to reclaim
> even the most minute amount of CPU cycles.  And we want to escape any
> shackles the evil phrase "legacy interrupts" dares to try to lay upon
us.
> 
> But there is a flip side to that:  do_IRQ is not solely hardware
> interrupts.  That area of code is central dispatcher for
> softirq/tasklet/timer delivery as well.  So a separate do_MSI() needs
to
> take that stuff into account.
> 
> Overall, I'm pretty happy with how Tom's MSI patches are going so far,
> and he seems to be responding to feedback.  So, we'll get there.
> 
> 	Jeff
> 
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-12 19:14   ` Zwane Mwaikambo
@ 2003-08-12 19:44     ` Jeff Garzik
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff Garzik @ 2003-08-12 19:44 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Nakajima, Jun, Nguyen, Tom L, Linux Kernel, long

Zwane Mwaikambo wrote:
> On Tue, 12 Aug 2003, Jeff Garzik wrote:
> 
> 
>>So, IMO, do_IRQ is one special case where copying code may be preferred 
>>over common code.
>>
>>And I also feel the same way about do_MSI().  However, I have not looked 
>>at non-ia32 MSI implementations to know what sort of issues exist.
> 
> 
> The main reason i have a preference for a seperate MSI handling path is so 
> that we don't have to do the platform_irq thing in do_IRQ and we know 
> what to expect wrt irq or vector. If platform_irq stays we should at 
> least try and pick up on what the IA64 folks have done, But that would be 
> even harder to get done right now.


Oh, I definitely prefer a separate MSI handling path too.

In the future we'll be writing drivers that _require_ MSI interrupt 
handling, and we'll be optimizing the various MSI hot paths to reclaim 
even the most minute amount of CPU cycles.  And we want to escape any 
shackles the evil phrase "legacy interrupts" dares to try to lay upon us.

But there is a flip side to that:  do_IRQ is not solely hardware 
interrupts.  That area of code is central dispatcher for 
softirq/tasklet/timer delivery as well.  So a separate do_MSI() needs to 
take that stuff into account.

Overall, I'm pretty happy with how Tom's MSI patches are going so far, 
and he seems to be responding to feedback.  So, we'll get there.

	Jeff




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-12 18:48 ` Jeff Garzik
@ 2003-08-12 19:14   ` Zwane Mwaikambo
  2003-08-12 19:44     ` Jeff Garzik
  0 siblings, 1 reply; 35+ messages in thread
From: Zwane Mwaikambo @ 2003-08-12 19:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Nakajima, Jun, Nguyen, Tom L, Linux Kernel, long

On Tue, 12 Aug 2003, Jeff Garzik wrote:

> So, IMO, do_IRQ is one special case where copying code may be preferred 
> over common code.
> 
> And I also feel the same way about do_MSI().  However, I have not looked 
> at non-ia32 MSI implementations to know what sort of issues exist.

The main reason i have a preference for a seperate MSI handling path is so 
that we don't have to do the platform_irq thing in do_IRQ and we know 
what to expect wrt irq or vector. If platform_irq stays we should at 
least try and pick up on what the IA64 folks have done, But that would be 
even harder to get done right now.

	Zwane


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-12 18:36 Nakajima, Jun
@ 2003-08-12 18:48 ` Jeff Garzik
  2003-08-12 19:14   ` Zwane Mwaikambo
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2003-08-12 18:48 UTC (permalink / raw)
  To: Nakajima, Jun; +Cc: Zwane Mwaikambo, Nguyen, Tom L, Linux Kernel, long

Nakajima, Jun wrote:
> The issue with do_MSI() approach is that it's very similar to do_IRQ(),
> and we may have maintenance issues there. However, if we make a common

I agree


> do_MSI() code, that might be worth it, and I would expect much fewer
> architecture-dependent issues there, compared to do_IRQ (the common
> do_IRQ() hasn't happened yet as far as I know).

However, we have maintenance issues in this area as well :)

If you look at each architecture's implementation of do_IRQ, you can see 
each implementation is strikingly similar... except for some subtle 
differences.  So there are arguments both ways:  creating a common 
do_IRQ may add maintenance value...  but also create corner-case 
problems for the arch maintainers.

So, IMO, do_IRQ is one special case where copying code may be preferred 
over common code.

And I also feel the same way about do_MSI().  However, I have not looked 
at non-ia32 MSI implementations to know what sort of issues exist.

	Jeff




^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-12 18:36 Nakajima, Jun
  2003-08-12 18:48 ` Jeff Garzik
  0 siblings, 1 reply; 35+ messages in thread
From: Nakajima, Jun @ 2003-08-12 18:36 UTC (permalink / raw)
  To: Zwane Mwaikambo, Nguyen, Tom L; +Cc: Linux Kernel, long

> For the legacy devices, an irq_vector mapping should be sufficient as
you
> currently maintain. You could also possibly have a seperate do_MSI
which
> gets a vector pushed to it and then uses mesc_desc instead to do the
> handlers. You then install the IDT entry using the seperate MSI
interrupt
> stub; set_intr_gate(vector, msi_interrupt[msi_num]) where
msi_interrupt is
> a NR_MSI sized array. This way you can avoid touching do_IRQ entirely.

The issue with do_MSI() approach is that it's very similar to do_IRQ(),
and we may have maintenance issues there. However, if we make a common
do_MSI() code, that might be worth it, and I would expect much fewer
architecture-dependent issues there, compared to do_IRQ (the common
do_IRQ() hasn't happened yet as far as I know).

Thanks,
Jun

> -----Original Message-----
> From: Zwane Mwaikambo [mailto:zwane@linuxpower.ca]
> Sent: Tuesday, August 12, 2003 11:11 AM
> To: Nguyen, Tom L
> Cc: Linux Kernel; Nakajima, Jun; long
> Subject: RE: Updated MSI Patches
> 
> Your message would be easier to reply to and read if it was wrapped at
72
> characters, please look into fixing that for the sake of your
recipients.
> Thanks.
> 
> On Tue, 12 Aug 2003, Nguyen, Tom L wrote:
> 
> > I understand that mixing up vector and irq is very confusing.
However,
> > to support non-PCI legacy devices with IRQ less than 16, such as
> > keyboard and mouse for example, may be impossilbe to achieve without
> > mixing up. Some existing driver of legacy keyboard/mouse devices,
for
> > example, may use fixed IO ranges and fixed IRQs (as assigned to 1
for
> > keyboard and 12 for mouse). If these device drivers use these fixed
> > legacy IRQs and the interrupt routings for these non-PCI legacy
devices
> > use vectors, then the system may break. As you know, MSI support
> > requires vector allocation instead of IRQ allocation since MSI does
not
> > require a support of BIOS IRQ table. Mixing vector with IRQ to be
> > compatible with non-PCI legacy devices must be achieved. Last time,
> > your suggestion of changing variable name from irq to vector is the
> > good approach. I am looking at restructuring the code of the
> > vector-base patch. I will send you an update when I am done for your
> > feedback.
> 
> For the legacy devices, an irq_vector mapping should be sufficient as
you
> currently maintain. You could also possibly have a seperate do_MSI
which
> gets a vector pushed to it and then uses mesc_desc instead to do the
> handlers. You then install the IDT entry using the seperate MSI
interrupt
> stub; set_intr_gate(vector, msi_interrupt[msi_num]) where
msi_interrupt is
> a NR_MSI sized array. This way you can avoid touching do_IRQ entirely.
> 
> > Since the code determinces whether this entry is NULL or not, I
think
> any
> > locking for msi_desc may not be required.
> 
> Yes but there is other code which modifies msi_desc members. i think a
per
> msi_desc lock is needed. You could also use a kmem_cache to allocate
them,
> and perhaps utilise HWCACHE_ALIGN.
> 
> 	Zwane


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
  2003-08-12 17:32 Nguyen, Tom L
@ 2003-08-12 18:14 ` Zwane Mwaikambo
  0 siblings, 0 replies; 35+ messages in thread
From: Zwane Mwaikambo @ 2003-08-12 18:14 UTC (permalink / raw)
  To: Nguyen, Tom L; +Cc: Greg KH, long, Nakajima, Jun, linux-kernel

On Tue, 12 Aug 2003, Nguyen, Tom L wrote:

> > IMO Multiplexing would be preferred, we can't be allocating that many 
> > vectors for one device/device driver
> All pre-assigned vectors to all enabled IOxAPIC(s) are reserved.
> Allocating additional vectors to MSI-X driver is determined based on the
> available vectors, which must be greater than the number of vectors requested
> by MSI-X driver.

Hmm can you avoid grabbing all the free vectors? I think the irq 
controller subsystem should handle allocation of vectors. Letting MSI grab 
everything might leave us with problems later on.

	Zwane


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
  2003-08-12 17:04 Nguyen, Tom L
@ 2003-08-12 18:11 ` Zwane Mwaikambo
  0 siblings, 0 replies; 35+ messages in thread
From: Zwane Mwaikambo @ 2003-08-12 18:11 UTC (permalink / raw)
  To: Nguyen, Tom L; +Cc: Linux Kernel, Nakajima, Jun, long

Your message would be easier to reply to and read if it was wrapped at 72 
characters, please look into fixing that for the sake of your recipients. 
Thanks.

On Tue, 12 Aug 2003, Nguyen, Tom L wrote:

> I understand that mixing up vector and irq is very confusing. However, 
> to support non-PCI legacy devices with IRQ less than 16, such as 
> keyboard and mouse for example, may be impossilbe to achieve without 
> mixing up. Some existing driver of legacy keyboard/mouse devices, for 
> example, may use fixed IO ranges and fixed IRQs (as assigned to 1 for 
> keyboard and 12 for mouse). If these device drivers use these fixed 
> legacy IRQs and the interrupt routings for these non-PCI legacy devices  
> use vectors, then the system may break. As you know, MSI support 
> requires vector allocation instead of IRQ allocation since MSI does not 
> require a support of BIOS IRQ table. Mixing vector with IRQ to be 
> compatible with non-PCI legacy devices must be achieved. Last time, 
> your suggestion of changing variable name from irq to vector is the 
> good approach. I am looking at restructuring the code of the 
> vector-base patch. I will send you an update when I am done for your 
> feedback.

For the legacy devices, an irq_vector mapping should be sufficient as you 
currently maintain. You could also possibly have a seperate do_MSI which 
gets a vector pushed to it and then uses mesc_desc instead to do the 
handlers. You then install the IDT entry using the seperate MSI interrupt 
stub; set_intr_gate(vector, msi_interrupt[msi_num]) where msi_interrupt is 
a NR_MSI sized array. This way you can avoid touching do_IRQ entirely.

> Since the code determinces whether this entry is NULL or not, I think any  
> locking for msi_desc may not be required.

Yes but there is other code which modifies msi_desc members. i think a per 
msi_desc lock is needed. You could also use a kmem_cache to allocate them, 
and perhaps utilise HWCACHE_ALIGN.

	Zwane


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-12 17:32 Nguyen, Tom L
  2003-08-12 18:14 ` Zwane Mwaikambo
  0 siblings, 1 reply; 35+ messages in thread
From: Nguyen, Tom L @ 2003-08-12 17:32 UTC (permalink / raw)
  To: Zwane Mwaikambo, Greg KH; +Cc: long, Nakajima, Jun, linux-kernel

> > There are two types of MSI capable devices: one supports the MSI 
> > capability structure and other supports the MSI-X capability structure.
> > The patches provide two APIs msix_alloc_vectors/msix_free_vectors for 
> > only devices, which support the MSI-X capability structure, to request
> > for additional messages. By default, each MSI/MSI-X capable device 
> > function is allocated with one vector for below reasons:
> > - To achieve a backward compatibility with existing drivers if possible.
> > - Due to the current implementation of vector assignment, all devices 
> >   that support the MSI-capability structure work with no more than one 
> >   allocated vector.
> > - The hardware devices, which support the MSI-X capability structure, 
> >   may indicate the maximum capable number of vectors supported (32 
> >   vectors as example). However, the device drivers may require only 
> >   four. With provided APIs, the optimization of MSI vector allocation 
> >   is achievable.

> IMO Multiplexing would be preferred, we can't be allocating that many 
> vectors for one device/device driver
All pre-assigned vectors to all enabled IOxAPIC(s) are reserved.
Allocating additional vectors to MSI-X driver is determined based on the
available vectors, which must be greater than the number of vectors requested
by MSI-X driver.

Thanks,
Long

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-12 17:04 Nguyen, Tom L
  2003-08-12 18:11 ` Zwane Mwaikambo
  0 siblings, 1 reply; 35+ messages in thread
From: Nguyen, Tom L @ 2003-08-12 17:04 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Linux Kernel, Nakajima, Jun, long

> diff -X excludes -urN linux-2.6.0-test2/arch/i386/kernel/io_apic.c linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c
> --- linux-2.6.0-test2/arch/i386/kernel/io_apic.c	2003-07-27 13:00:21.000000000 -0400
> +++ linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c	2003-08-05 09:25:54.000000000 -0400
> @@ -76,6 +76,20 @@
>  	int apic, pin, next;
>  } irq_2_pin[PIN_MAP_SIZE];
>  
> +#ifdef CONFIG_PCI_USE_VECTOR
> +int vector_irq[NR_IRQS] = { [0 ... NR_IRQS -1] = -1};
> + 
> +static int platform_irq(int irq) 	
> +{ 	
> +	if (platform_legacy_irq(irq))
> +		return irq;
> +	else
> +		return vector_irq[irq];
> +}
> +#else
> +#define platform_irq(irq)	(irq)
> +#endif 

> I wish you wouldn't mix up vector and irq like this :( This is really 
> beginning to look like IPF. When is an irq an irq and when is a vector a 
> vector? This can get very confusing very quickly. I spent a fair 
> amount of time debugging this last time due to the confusion in the 
> common_interrupt -> do_IRQ path, i'd hate for this to be a permanent 
> fixture. If you're going to change things just change everything 
> (including variable names) instead of making it conditional.
I understand that mixing up vector and irq is very confusing. However, to support non-PCI legacy devices with IRQ less than 16, such as keyboard and mouse for example, may be impossilbe to achieve without mixing up. Some existing driver of legacy keyboard/mouse devices, for example, may use fixed IO ranges and fixed IRQs (as assigned to 1 for keyboard and 12 for mouse). If these device drivers use these fixed legacy IRQs and the interrupt routings for these non-PCI legacy devices use vectors, then the system may break. As you know, MSI support requires vector allocation instead of IRQ allocation since MSI does not require a support of BIOS IRQ table. Mixing vector with IRQ to be compatible with non-PCI legacy devices must be achieved. Last time, your suggestion of changing variable name from irq to vector is the good approach. I am looking at restructuring the code of the vector-base patch. I will send you an update when I am done for your feedback.

> diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c linux-2.6.0-test2-create-msi/arch/i386/kernel/io_apic.c
> --- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c	2003-08-05 09:25:54.000000000 -0400
> +++ linux-2.6.0-test2-create-msi/arch/i386/kernel/io_apic.c	2003-08-05 09:45:25.000000000 -0400
> @@ -427,6 +427,11 @@
>  			/* Is this an active IRQ? */
>  			if (!irq_desc[j].action)
>  				continue;
> +#ifdef CONFIG_PCI_MSI
> +			/* Is this an active MSI? */
> +			if (msi_desc[j])
> +				continue;
> +#endif

> Is there any locking for msi_desc? Or are you relying on something else?
Since the code determinces whether this entry is NULL or not, I think any locking for msi_desc may not be required.


> +	if (current_vector == FIRST_SYSTEM_VECTOR) 
> +		panic("ran out of interrupt sources!");

> Please just fail and return -ENOSPC or something like that.
Good. Thanks!

Thanks,
Long


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
@ 2003-08-12 15:22 long
  0 siblings, 0 replies; 35+ messages in thread
From: long @ 2003-08-12 15:22 UTC (permalink / raw)
  To: greg, zwane
  Cc: jun.nakajima, linux-kernel, pcihpd-discuss, tlnguyen, tom.l.nguyen

> > > I hope there may be a way to determine the number of vectors supported 
> > > by CPU during the run-time. I look at the file ../include/asm-i386/mach-
> > > default/irq_vectors.h, the maximum of vectors (256) is already well 
> > > commented.
> > 
> > Yeah, but that's in reference to APIC interrupt sources, right?  Does
> > that correspond to these "vectors" too?  If so, why not just use the
> > existing NR_IRQS value instead of creating your own?
> 
> There isn't a 1:1 relationship between NR_IRQS and NR_VECTORS we really 
> shouldn't mix them together. NR_IRQS can be much higher than 256 whilst 
> on i386 we're fixed to 256 vectors due to that being the Interrupt 
> Descriptor Table capacity. It is still possible to service more interrupt 
> lines than 256 on SMP however by making IDTs per 'interrupt servicing 
> node'

> Ah, so for processors other than i386, NR_VECTORS can be bigger?

> I think a nice description is still needed for this value for just this
> reason :)

Agree. Will put some comments describing this value. 

Thanks,
Long

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-12  1:41   ` Zwane Mwaikambo
@ 2003-08-12  5:53     ` Greg KH
  0 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2003-08-12  5:53 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: long, jun.nakajima, linux-kernel, pcihpd-discuss, tom.l.nguyen

On Mon, Aug 11, 2003 at 09:41:30PM -0400, Zwane Mwaikambo wrote:
> > > I hope there may be a way to determine the number of vectors supported 
> > > by CPU during the run-time. I look at the file ../include/asm-i386/mach-
> > > default/irq_vectors.h, the maximum of vectors (256) is already well 
> > > commented.
> > 
> > Yeah, but that's in reference to APIC interrupt sources, right?  Does
> > that correspond to these "vectors" too?  If so, why not just use the
> > existing NR_IRQS value instead of creating your own?
> 
> There isn't a 1:1 relationship between NR_IRQS and NR_VECTORS we really 
> shouldn't mix them together. NR_IRQS can be much higher than 256 whilst 
> on i386 we're fixed to 256 vectors due to that being the Interrupt 
> Descriptor Table capacity. It is still possible to service more interrupt 
> lines than 256 on SMP however by making IDTs per 'interrupt servicing 
> node'

Ah, so for processors other than i386, NR_VECTORS can be bigger?

I think a nice description is still needed for this value for just this
reason :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-11 21:16 ` Greg KH
@ 2003-08-12  1:41   ` Zwane Mwaikambo
  2003-08-12  5:53     ` Greg KH
  0 siblings, 1 reply; 35+ messages in thread
From: Zwane Mwaikambo @ 2003-08-12  1:41 UTC (permalink / raw)
  To: Greg KH; +Cc: long, jun.nakajima, linux-kernel, pcihpd-discuss, tom.l.nguyen

On Mon, 11 Aug 2003, Greg KH wrote:

> > There are two types of MSI capable devices: one supports the MSI 
> > capability structure and other supports the MSI-X capability structure.
> > The patches provide two APIs msix_alloc_vectors/msix_free_vectors for 
> > only devices, which support the MSI-X capability structure, to request
> > for additional messages. By default, each MSI/MSI-X capable device 
> > function is allocated with one vector for below reasons:
> > - To achieve a backward compatibility with existing drivers if possible.
> > - Due to the current implementation of vector assignment, all devices 
> >   that support the MSI-capability structure work with no more than one 
> >   allocated vector.
> > - The hardware devices, which support the MSI-X capability structure, 
> >   may indicate the maximum capable number of vectors supported (32 
> >   vectors as example). However, the device drivers may require only 
> >   four. With provided APIs, the optimization of MSI vector allocation 
> >   is achievable.

IMO Multiplexing would be preferred, we can't be allocating that many 
vectors for one device/device driver

> > I hope there may be a way to determine the number of vectors supported 
> > by CPU during the run-time. I look at the file ../include/asm-i386/mach-
> > default/irq_vectors.h, the maximum of vectors (256) is already well 
> > commented.
> 
> Yeah, but that's in reference to APIC interrupt sources, right?  Does
> that correspond to these "vectors" too?  If so, why not just use the
> existing NR_IRQS value instead of creating your own?

There isn't a 1:1 relationship between NR_IRQS and NR_VECTORS we really 
shouldn't mix them together. NR_IRQS can be much higher than 256 whilst 
on i386 we're fixed to 256 vectors due to that being the Interrupt 
Descriptor Table capacity. It is still possible to service more interrupt 
lines than 256 on SMP however by making IDTs per 'interrupt servicing 
node'

	Zwane


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-11 20:51 long
@ 2003-08-11 21:16 ` Greg KH
  2003-08-12  1:41   ` Zwane Mwaikambo
  0 siblings, 1 reply; 35+ messages in thread
From: Greg KH @ 2003-08-11 21:16 UTC (permalink / raw)
  To: long; +Cc: jun.nakajima, linux-kernel, pcihpd-discuss, tom.l.nguyen

On Mon, Aug 11, 2003 at 01:51:09PM -0700, long wrote:
> > Good, so we can always enable this option, so it doesn't even need 
> > to be an option :)
> Agree. Will eliminate CONFIG_PCI_MSI to make this option dynamically 
> happens.

Thank you.

> > A driver has to be modified to use this option, right?  Do you have 
> > any drivers that have been modified?  Without that, I don't think we
> > can test this patch out, right?
> > MSI support may be transparent to some device drivers.
> 
> > Yet you provide an api for drivers to use, right?  Why would a driver
> > want to use this api instead of just having the hardware work properly
> > on its own?  I guess I'm confused as to how to modify drivers to 
> > support this, or if that's even needed to be done.
> There are two types of MSI capable devices: one supports the MSI 
> capability structure and other supports the MSI-X capability structure.
> The patches provide two APIs msix_alloc_vectors/msix_free_vectors for 
> only devices, which support the MSI-X capability structure, to request
> for additional messages. By default, each MSI/MSI-X capable device 
> function is allocated with one vector for below reasons:
> - To achieve a backward compatibility with existing drivers if possible.
> - Due to the current implementation of vector assignment, all devices 
>   that support the MSI-capability structure work with no more than one 
>   allocated vector.
> - The hardware devices, which support the MSI-X capability structure, 
>   may indicate the maximum capable number of vectors supported (32 
>   vectors as example). However, the device drivers may require only 
>   four. With provided APIs, the optimization of MSI vector allocation 
>   is achievable.
> 
> The existing driver is required to be modified if and only if its
> hardware/software synchronization is not provided; that means, the 
> hardware device may generate multiple messages from the same vector. 
> These messages may be lost; the results may be unpredictable. To 
> overcome this issue, the existing driver driver requires to provide 
> the hardware/software synchronization to ask its hardware device to 
> generate one message at a time.

Ok, that makes a little more sense.  I'm guessing that new hardware will
need to have their drivers modified to support this if it is needed.

> > > +++ linux-2.6.0-test2-create-vectorbase/include/asm-i386/mach-default/irq_vectors.h	2003-08-05 09:25:54.000000000 -0400
> > > @@ -76,9 +76,14 @@
> > >   * Since vectors 0x00-0x1f are used/reserved for the CPU,
> > >   * the usable vector space is 0x20-0xff (224 vectors)
> > >   */
> > > +#define NR_VECTORS 256
> > 
> > >Will this _always_ be the value?  Can boxes have bigger numbers (I
> > >haven't seen the spec, so I don't know...)
> > Yes, this will always be 256 since the current CPU architecture
> > supports a maximum of 256 vectors.
> 
> > The "current" one does. What happens in a few months when I go to help
> > bring up the next gen hardware platform?  Will I run into the problem
> > that this number needs to be increased?  Can we determine it at
> > run-time?
> I hope there may be a way to determine the number of vectors supported 
> by CPU during the run-time. I look at the file ../include/asm-i386/mach-
> default/irq_vectors.h, the maximum of vectors (256) is already well 
> commented.

Yeah, but that's in reference to APIC interrupt sources, right?  Does
that correspond to these "vectors" too?  If so, why not just use the
existing NR_IRQS value instead of creating your own?

> > > +u32 device_nomsi_list[DRIVER_NOMSI_MAX] = {0, };
> > 
> > >Shouldn't this be static?
> > 
> > > +u32 device_msi_list[DRIVER_NOMSI_MAX] = {0, };
> > 
> > >Same with this one?
> > 
> > In the initial design draft, I am thinking of defining
> > DRIVER_NOMSI_MAX as 32 since it may be impossible to have more than 32
> > MSI capable devices populated in the same system. I will add some
> > comments to it and look forward for your comments.
> 
> > Um, I ment the variable is in the global symbol table, and it doesn't
> > look like it needs to be.
> 
> > But your comment about number of devices is also a good one.  Any way
> > we can figure that out at run-time too?  What happens when I hot-add 
> > the 33 MSI capable device?  Or just boot with that many devices (I've 
> > seen boxes with 100s of pci devices all attached to one system)?
> The defined variable DRIVER_NOMSI_MAX in device_msi_list[] and 
> device_nomsi_list[] are used to serve mainly as debug purposes since 
> most IHVs have not yet validated their hardware devices.

Heh, but what if I have 33 of the same kind of card in a machine (and
that card has already been "validated")?

> If one of the system has more than 32 MSI capable devices, users can
> enable MSI on all devices by default (by setting
> CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES to 'n') and use boot parameter
> "device_nomsi=" to disable some devices, which have bugs in MSI.

Ugh, I'm trying to get across the point that a user does _not_ want to
have to reconfig their kernel.  Distros do not want to ship a whole
different kernel just for this one option.  That's not acceptable if we
can handle everything dynamically with no problems for the small boxes
as well as the big ones.

> In summary, there are two options provided:
> Option 1: Enable MSI on all MSI capable devices by default and use boot
> parameter "device_nomsi=" to disable specific devices, which have bugs 
> in MSI.

Ok, that's almost acceptable.  How about fixing the bugs in MSI so this
is not needed?  :)


> Option 2 (default): Enable individual MSI capable devices as validation
> purpose by using the boot parameter "device_msi=" to select which 
> devices the kernel supports MSI.

Ick.  How do you specify which "device" this option pertains to?  Do I
have to know the pci id?  What about pci hotplug boxes which decide to
re-arrange the pci ids every other time the box boots (yeah, I've seen
this, no fun...)

How about (if MSI is so troublesome), we just disable MSI all the time,
and provide a sysfs file for MSI to be "enabled" for every device if it
is wanted?  The init scripts could go and enable this as needed.  But
I'm guessing that you can't switch interrupt modes in a device after it
has been initialized, right?  Oh well...

> Note that if this option is chosen, 
> then the use of "device_nomsi" id no longer necessary. This is by 
> default because users may not know how many MSI capable devices 
> populated in their systems and how many of these have bugs in MSI.
> 
> Either one of these options is chosen, I do not think you have any 
> issue of hot-adding the 33 MSI capable device.

So then there is no limit on the number of MSI devices if
CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES is not selected?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
@ 2003-08-11 20:51 long
  2003-08-11 21:16 ` Greg KH
  0 siblings, 1 reply; 35+ messages in thread
From: long @ 2003-08-11 20:51 UTC (permalink / raw)
  To: greg, tlnguyen; +Cc: jun.nakajima, linux-kernel, pcihpd-discuss, tom.l.nguyen

> Is there any way to dynamically detect if hardware can support MSI?
> The MSI patch dynamically detects whether hardware devices can support
> MSI once the CONFIG_PCI_MSI is set during kernel built.

> So why would we ever want CONFIG_PCI_MSI to not be set?  I'm just
> trying to make options dynamically happen without users having to 
> worry about setting a kernel option or not.

> If you enable this option, will boxes that do not support it stop
> working? Once users enable this option, all devices that do not 
> support MSI are still working. These devices are by default using
> IRQ pin assertion as interrupt generated mechanism since all 
> pre-assigned vectors to any enabled IOxAPICs are reserved.

> Good, so we can always enable this option, so it doesn't even need 
> to be an option :)
Agree. Will eliminate CONFIG_PCI_MSI to make this option dynamically 
happens.

> A driver has to be modified to use this option, right?  Do you have 
> any drivers that have been modified?  Without that, I don't think we
> can test this patch out, right?
> MSI support may be transparent to some device drivers.

> Yet you provide an api for drivers to use, right?  Why would a driver
> want to use this api instead of just having the hardware work properly
> on its own?  I guess I'm confused as to how to modify drivers to 
> support this, or if that's even needed to be done.
There are two types of MSI capable devices: one supports the MSI 
capability structure and other supports the MSI-X capability structure.
The patches provide two APIs msix_alloc_vectors/msix_free_vectors for 
only devices, which support the MSI-X capability structure, to request
for additional messages. By default, each MSI/MSI-X capable device 
function is allocated with one vector for below reasons:
- To achieve a backward compatibility with existing drivers if possible.
- Due to the current implementation of vector assignment, all devices 
  that support the MSI-capability structure work with no more than one 
  allocated vector.
- The hardware devices, which support the MSI-X capability structure, 
  may indicate the maximum capable number of vectors supported (32 
  vectors as example). However, the device drivers may require only 
  four. With provided APIs, the optimization of MSI vector allocation 
  is achievable.

The existing driver is required to be modified if and only if its
hardware/software synchronization is not provided; that means, the 
hardware device may generate multiple messages from the same vector. 
These messages may be lost; the results may be unpredictable. To 
overcome this issue, the existing driver driver requires to provide 
the hardware/software synchronization to ask its hardware device to 
generate one message at a time.
    
> > +++ linux-2.6.0-test2-create-vectorbase/include/asm-i386/mach-default/irq_vectors.h	2003-08-05 09:25:54.000000000 -0400
> > @@ -76,9 +76,14 @@
> >   * Since vectors 0x00-0x1f are used/reserved for the CPU,
> >   * the usable vector space is 0x20-0xff (224 vectors)
> >   */
> > +#define NR_VECTORS 256
> 
> >Will this _always_ be the value?  Can boxes have bigger numbers (I
> >haven't seen the spec, so I don't know...)
> Yes, this will always be 256 since the current CPU architecture
> supports a maximum of 256 vectors.

> The "current" one does. What happens in a few months when I go to help
> bring up the next gen hardware platform?  Will I run into the problem
> that this number needs to be increased?  Can we determine it at
> run-time?
I hope there may be a way to determine the number of vectors supported 
by CPU during the run-time. I look at the file ../include/asm-i386/mach-
default/irq_vectors.h, the maximum of vectors (256) is already well 
commented.

> > +u32 device_nomsi_list[DRIVER_NOMSI_MAX] = {0, };
> 
> >Shouldn't this be static?
> 
> > +u32 device_msi_list[DRIVER_NOMSI_MAX] = {0, };
> 
> >Same with this one?
> 
> In the initial design draft, I am thinking of defining
> DRIVER_NOMSI_MAX as 32 since it may be impossible to have more than 32
> MSI capable devices populated in the same system. I will add some
> comments to it and look forward for your comments.

> Um, I ment the variable is in the global symbol table, and it doesn't
> look like it needs to be.

> But your comment about number of devices is also a good one.  Any way
> we can figure that out at run-time too?  What happens when I hot-add 
> the 33 MSI capable device?  Or just boot with that many devices (I've 
> seen boxes with 100s of pci devices all attached to one system)?
The defined variable DRIVER_NOMSI_MAX in device_msi_list[] and 
device_nomsi_list[] are used to serve mainly as debug purposes since 
most IHVs have not yet validated their hardware devices. If one of 
the system has more than 32 MSI capable devices, users can enable MSI 
on all devices by default (by setting CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES
to 'n') and use boot parameter "device_nomsi=" to disable some devices,
which have bugs in MSI. In summary, there are two options provided:
Option 1: Enable MSI on all MSI capable devices by default and use boot
parameter "device_nomsi=" to disable specific devices, which have bugs 
in MSI.
This option is selected by setting CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES to
'n'. Note that if this option is chosen, then the use of "device_msi=" 
is no longer necessary.
Option 2 (default): Enable individual MSI capable devices as validation
purpose by using the boot parameter "device_msi=" to select which 
devices the kernel supports MSI. Note that if this option is chosen, 
then the use of "device_nomsi" id no longer necessary. This is by 
default because users may not know how many MSI capable devices 
populated in their systems and how many of these have bugs in MSI.

Either one of these options is chosen, I do not think you have any 
issue of hot-adding the 33 MSI capable device.

Thanks,
Long

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-08 19:33 long
@ 2003-08-08 20:15 ` Greg KH
  0 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2003-08-08 20:15 UTC (permalink / raw)
  To: long; +Cc: jun.nakajima, linux-kernel, pcihpd-discuss, tom.l.nguyen

On Fri, Aug 08, 2003 at 12:33:01PM -0700, long wrote:
> >pcihpd-discuss?  Don't you mean the linux-pci mailing list instead?
> pcihpd-discuss is my first assumption of what you mean the linux-pci
> mailing list. Please forward me the email address of the linux-pci
> mailing list.

linux-pci@vger.kernel.org

Also, please wrap your emails at a sane size, you have lines here that
are longer than 900 characters long...  I've reformatted them here to be
able to quote them properly.

> > - Is there any way to dynamically detect if hardware can support MSI?
> The MSI patch dynamically detects whether hardware devices can support
> MSI once the CONFIG_PCI_MSI is set during kernel built.

So why would we ever want CONFIG_PCI_MSI to not be set?  I'm just trying
to make options dynamically happen without users having to worry about
setting a kernel option or not.

> > - If you enable this option, will boxes that do not support it stop
> > working?
> Once users enable this option, all devices that do not support MSI are
> still working. These devices are by default using IRQ pin assertion as
> interrupt generated mechanism since all pre-assigned vectors to any
> enabled IOxAPICs are reserved.

Good, so we can always enable this option, so it doesn't even need to be
an option :)

> >A driver has to be modified to use this option, right?  Do you have any
> >drivers that have been modified?  Without that, I don't think we can
> >test this patch out, right?
> MSI support may be transparent to some device drivers.

Yet you provide an api for drivers to use, right?  Why would a driver
want to use this api instead of just having the hardware work properly
on its own?  I guess I'm confused as to how to modify drivers to support
this, or if that's even needed to be done.

> > +++ linux-2.6.0-test2-create-vectorbase/include/asm-i386/mach-default/irq_vectors.h	2003-08-05 09:25:54.000000000 -0400
> > @@ -76,9 +76,14 @@
> >   * Since vectors 0x00-0x1f are used/reserved for the CPU,
> >   * the usable vector space is 0x20-0xff (224 vectors)
> >   */
> > +#define NR_VECTORS 256
> 
> >Will this _always_ be the value?  Can boxes have bigger numbers (I
> >haven't seen the spec, so I don't know...)
> Yes, this will always be 256 since the current CPU architecture
> supports a maximum of 256 vectors.

The "current" one does.  What happens in a few months when I go to help
bring up the next gen hardware platform?  Will I run into the problem
that this number needs to be increased?  Can we determine it at
run-time?

> > +++ linux-2.6.0-test2-create-msi/arch/i386/kernel/i386_ksyms.c	2003-08-05 09:45:25.000000000 -0400
> > @@ -166,6 +166,11 @@
> >  EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
> >  #endif
> >  
> > +#ifdef CONFIG_PCI_MSI
> > +EXPORT_SYMBOL(msix_alloc_vectors);
> > +EXPORT_SYMBOL(msix_free_vectors);
> > +#endif
> > +
> >  #ifdef CONFIG_MCA
> >  EXPORT_SYMBOL(machine_id);
> >  #endif
> 
> >Put the EXPORT_SYMBOL in the files that have the functions.  Don't
> >clutter up the ksyms.c files anymore.
> Agree. Will put these EXPORT_SYMBOL(s) in the file
> ../include/asm-i386/hw_irq.h where these functions are declared as
> extern. Thanks.

No, it needs to go into the .c file where the function is.

> > +u32 device_nomsi_list[DRIVER_NOMSI_MAX] = {0, };
> 
> >Shouldn't this be static?
> 
> > +u32 device_msi_list[DRIVER_NOMSI_MAX] = {0, };
> 
> >Same with this one?
> 
> In the initial design draft, I am thinking of defining
> DRIVER_NOMSI_MAX as 32 since it may be impossible to have more than 32
> MSI capable devices populated in the same system. I will add some
> comments to it and look forward for your comments.

Um, I ment the variable is in the global symbol table, and it doesn't
look like it needs to be.

But your comment about number of devices is also a good one.  Any way we
can figure that out at run-time too?  What happens when I hot-add the 33
MSI capable device?  Or just boot with that many devices (I've seen
boxes with 100s of pci devices all attached to one system)?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
@ 2003-08-08 19:33 long
  2003-08-08 20:15 ` Greg KH
  0 siblings, 1 reply; 35+ messages in thread
From: long @ 2003-08-08 19:33 UTC (permalink / raw)
  To: greg, tlnguyen; +Cc: jun.nakajima, linux-kernel, pcihpd-discuss, tom.l.nguyen

>pcihpd-discuss?  Don't you mean the linux-pci mailing list instead?
pcihpd-discuss is my first assumption of what you mean the linux-pci mailing list. Please forward me the email address of the linux-pci mailing list.

> - Is there any way to dynamically detect if hardware can support MSI?
The MSI patch dynamically detects whether hardware devices can support MSI once the CONFIG_PCI_MSI is set during kernel built. By default, the CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES is also set to provide users an option to enable MSI on specific devices with the use of boot parameter "device_msi=". With both CONFIG_PCI_MSI and CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES set, the PCI subsystem dynamically detects if hardware devices can support MSI. However, enabling MSI on these devices is determined based on whether these devices are explicitly listed in the boot parameter "device_msi=". The reason is that users may not know how many MSI capable hardware devices are populated in their systems. The software device drivers of these devices may break their system once the patches are installed and users want the PCI subsystem to enable MSI automatically on all MSI capable devices. The second reason is that it may be difficult to debug all MSI capable devices at the same time. Users can de!
 bug on individual MSI capable device with its existing software driver. When all are fully validated, users can rebuild the kernel with the CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES cleared; the PCI subsystem will enable MSI on all MSI capable devices automatically and the boot parameter "device_msi=" is no longer necessary.

> - If you enable this option, will boxes that do not support it stop
   working?
Once users enable this option, all devices that do not support MSI are still working. These devices are by default using IRQ pin assertion as interrupt generated mechanism since all pre-assigned vectors to any enabled IOxAPICs are reserved.

>A driver has to be modified to use this option, right?  Do you have any
>drivers that have been modified?  Without that, I don't think we can
>test this patch out, right?
MSI support may be transparent to some device drivers. This transparency is dependent on whether the existing driver can provide the hardware/software synchronization; that means, once the hardware/firmware of the device signals a Vector A, it cannot signal Vector A again until it is explicitly enabled to do so by its device driver. Such example of this transparency is the Intel Pro(R) Ethernet (10000 MBit> driver, which works with these patches without requiring any changes in the existing driver. The second driver we are validating is Adaptec AIC79XX PCI-X SCSI HBA driver (Rev 1.3.9). This existing software driver is tested as a module driver (such as mount/unmount /dev/sda2 /mnt/temp and then read/write to files to this device). However, if its hardware device is a boot device, then the results are unpredictable. We are currently working with the driver engineer to root cause the issues.


> diff -X excludes -urN linux-2.6.0-test2/arch/i386/kernel/io_apic.c linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c
> --- linux-2.6.0-test2/arch/i386/kernel/io_apic.c	2003-07-27 13:00:21.000000000 -0400
> +++ linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c	2003-08-05 09:25:54.000000000 -0400

>You are cluttering up this file with a lot of #ifdefs.  Is there any way
>you can not do this?
Agree. We will make appropriate changes in the next update release.

>Does this break the summit and/or NUMA builds?
We have not tested with these builds, I will add it to my list of builds to test. Thanks.

> diff -X excludes -urN linux-2.6.0-test2/include/asm-i386/mach-default/irq_vectors.h linux-2.6.0-test2-create-vectorbase/include/asm-i386/mach-default/irq_vectors.h
> --- linux-2.6.0-test2/include/asm-i386/mach-default/irq_vectors.h	2003-07-27 12:58:54.000000000 -0400
> +++ linux-2.6.0-test2-create-vectorbase/include/asm-i386/mach-default/irq_vectors.h	2003-08-05 09:25:54.000000000 -0400
> @@ -76,9 +76,14 @@
>   * Since vectors 0x00-0x1f are used/reserved for the CPU,
>   * the usable vector space is 0x20-0xff (224 vectors)
>   */
> +#define NR_VECTORS 256

>Will this _always_ be the value?  Can boxes have bigger numbers (I
>haven't seen the spec, so I don't know...)
Yes, this will always be 256 since the current CPU architecture supports a maximum of 256 vectors.

>Care to add a comment about this value?
Agree. Will add a comment about this value in the next update release.

> diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/i386_ksyms.c linux-2.6.0-test2-create-msi/arch/i386/kernel/i386_ksyms.c
> --- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/i386_ksyms.c	2003-07-27 13:11:42.000000000 -0400
> +++ linux-2.6.0-test2-create-msi/arch/i386/kernel/i386_ksyms.c	2003-08-05 09:45:25.000000000 -0400
> @@ -166,6 +166,11 @@
>  EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
>  #endif
>  
> +#ifdef CONFIG_PCI_MSI
> +EXPORT_SYMBOL(msix_alloc_vectors);
> +EXPORT_SYMBOL(msix_free_vectors);
> +#endif
> +
>  #ifdef CONFIG_MCA
>  EXPORT_SYMBOL(machine_id);
>  #endif

>Put the EXPORT_SYMBOL in the files that have the functions.  Don't
>clutter up the ksyms.c files anymore.
Agree. Will put these EXPORT_SYMBOL(s) in the file ../include/asm-i386/hw_irq.h where these functions are declared as extern. Thanks.



> +u32 device_nomsi_list[DRIVER_NOMSI_MAX] = {0, };

>Shouldn't this be static?

> +u32 device_msi_list[DRIVER_NOMSI_MAX] = {0, };

>Same with this one?

In the initial design draft, I am thinking of defining DRIVER_NOMSI_MAX as 32 since it may be impossible to have more than 32 MSI capable devices populated in the same system. I will add some comments to it and look forward for your comments.

> +	base = (u32*)ioremap_nocache(phys_addr,
> +		dev_msi_cap*PCI_MSIX_ENTRY_SIZE*sizeof(u32));
> +	if (base == NULL) 
> +		goto free_region; 

...

> +	*base = address.lo_address.value;
> +	*(base + 1) = address.hi_address;
> +	*(base + 2) = *((u32*)&data);

>Don't do direct writes to memory, use the proper function calls.  You do
>this in a few other places too.
Thanks. I will replace all direct writes/reads with the proper functions calls such as writel()/readl().


thanks,
Long

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-08 14:41 Nguyen, Tom L
  0 siblings, 0 replies; 35+ messages in thread
From: Nguyen, Tom L @ 2003-08-08 14:41 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Nakajima, Jun, greg, long

> Seems like a lot of this file could go into a new file,
> drivers/pci/msi.c.  We'll want to share as much code as possible across
> all Linux architectures.
Agree. Next update release will incorporate your feedback.

Thanks,
Long


-----Original Message-----
From: Jeff Garzik [mailto:jgarzik@pobox.com]
Sent: Thursday, August 07, 2003 3:45 PM
To: long
Cc: linux-kernel@vger.kernel.org; Nakajima, Jun; Nguyen, Tom L;
greg@kroah.com
Subject: Re: Updated MSI Patches


long wrote:


> diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/pci_msi.c linux-2.6.0-test2-create-msi/arch/i386/kernel/pci_msi.c
> --- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/pci_msi.c	1969-12-31 19:00:00.000000000 -0500
> +++ linux-2.6.0-test2-create-msi/arch/i386/kernel/pci_msi.c	2003-08-06 09:47:02.000000000 -0400

Seems like a lot of this file could go into a new file, 
drivers/pci/msi.c.  We'll want to share as much code as possible across 
all Linux architectures.

	Jeff



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-07 21:25 long
  2003-08-07 22:14 ` Greg KH
  2003-08-07 22:44 ` Jeff Garzik
@ 2003-08-08  2:36 ` Zwane Mwaikambo
  2 siblings, 0 replies; 35+ messages in thread
From: Zwane Mwaikambo @ 2003-08-08  2:36 UTC (permalink / raw)
  To: long; +Cc: Linux Kernel, pcihpd-discuss, Nakajima, Jun, tom.l.nguyen

On Thu, 7 Aug 2003, long wrote:

> - Change the interface name from msi_free_vectors to msix_free_vectors since 
> this interface is used for MSI-X device drivers, which request for releasing 
> the number of vector back to the PCI subsystem.
> - Change the function name from remove_hotplug_vectors to 
> msi_remove_pci_irq_vectors to have a close match with function name 
> msi_get_pci_irq_vector.

I think the vector allocator code can all be arch specific generic, 
there is no particular reason as to why it has to be MSI specific.

> diff -X excludes -urN linux-2.6.0-test2/arch/i386/kernel/io_apic.c linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c
> --- linux-2.6.0-test2/arch/i386/kernel/io_apic.c	2003-07-27 13:00:21.000000000 -0400
> +++ linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c	2003-08-05 09:25:54.000000000 -0400
> @@ -76,6 +76,20 @@
>  	int apic, pin, next;
>  } irq_2_pin[PIN_MAP_SIZE];
>  
> +#ifdef CONFIG_PCI_USE_VECTOR
> +int vector_irq[NR_IRQS] = { [0 ... NR_IRQS -1] = -1};
> + 
> +static int platform_irq(int irq) 	
> +{ 	
> +	if (platform_legacy_irq(irq))
> +		return irq;
> +	else
> +		return vector_irq[irq];
> +}
> +#else
> +#define platform_irq(irq)	(irq)
> +#endif 

I wish you wouldn't mix up vector and irq like this :( This is really 
beginning to look like IPF. When is an irq an irq and when is a vector a 
vector? This can get very confusing very quickly. I spent a fair 
amount of time debugging this last time due to the confusion in the 
common_interrupt -> do_IRQ path, i'd hate for this to be a permanent 
fixture. If you're going to change things just change everything 
(including variable names) instead of making it conditional.

> diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c linux-2.6.0-test2-create-msi/arch/i386/kernel/io_apic.c
> --- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c	2003-08-05 09:25:54.000000000 -0400
> +++ linux-2.6.0-test2-create-msi/arch/i386/kernel/io_apic.c	2003-08-05 09:45:25.000000000 -0400
> @@ -427,6 +427,11 @@
>  			/* Is this an active IRQ? */
>  			if (!irq_desc[j].action)
>  				continue;
> +#ifdef CONFIG_PCI_MSI
> +			/* Is this an active MSI? */
> +			if (msi_desc[j])
> +				continue;
> +#endif

Is there any locking for msi_desc? Or are you relying on something else?

> +static int nr_alloc_vectors = 0;
> +static int nr_released_vectors = 0;

No need to initialise to 0 it'll be done for you when .bss gets zeroed.

> +	if (current_vector == FIRST_SYSTEM_VECTOR) 
> +		panic("ran out of interrupt sources!");

Please just fail and return -ENOSPC or something like that.

Besides my whining thanks for cranking this out.

	Zwane


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Updated MSI Patches
@ 2003-08-07 23:07 Nakajima, Jun
  0 siblings, 0 replies; 35+ messages in thread
From: Nakajima, Jun @ 2003-08-07 23:07 UTC (permalink / raw)
  To: Jeff Garzik, long; +Cc: linux-kernel, Nguyen, Tom L, greg

> Seems like a lot of this file could go into a new file,
> drivers/pci/msi.c.  We'll want to share as much code as possible
across
> all Linux architectures.
Agree. 

At the same time, Long, you need to provide more info so that more
people can play with MSI. 

Jun

> -----Original Message-----
> From: Jeff Garzik [mailto:jgarzik@pobox.com]
> Sent: Thursday, August 07, 2003 3:45 PM
> To: long
> Cc: linux-kernel@vger.kernel.org; Nakajima, Jun; Nguyen, Tom L;
> greg@kroah.com
> Subject: Re: Updated MSI Patches
> 
> long wrote:
> 
> 
> > diff -X excludes -urN linux-2.6.0-test2-create-
> vectorbase/arch/i386/kernel/pci_msi.c linux-2.6.0-test2-create-
> msi/arch/i386/kernel/pci_msi.c
> > --- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/pci_msi.c
> 	1969-12-31 19:00:00.000000000 -0500
> > +++ linux-2.6.0-test2-create-msi/arch/i386/kernel/pci_msi.c	2003-08-
> 06 09:47:02.000000000 -0400
> 
> Seems like a lot of this file could go into a new file,
> drivers/pci/msi.c.  We'll want to share as much code as possible
across
> all Linux architectures.
> 
> 	Jeff
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-07 22:44 ` Jeff Garzik
@ 2003-08-07 23:03   ` Matt Porter
  0 siblings, 0 replies; 35+ messages in thread
From: Matt Porter @ 2003-08-07 23:03 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: long, linux-kernel, jun.nakajima, tom.l.nguyen, greg

On Thu, Aug 07, 2003 at 06:44:46PM -0400, Jeff Garzik wrote:
> long wrote:
> 
> 
> > diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/pci_msi.c linux-2.6.0-test2-create-msi/arch/i386/kernel/pci_msi.c
> > --- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/pci_msi.c	1969-12-31 19:00:00.000000000 -0500
> > +++ linux-2.6.0-test2-create-msi/arch/i386/kernel/pci_msi.c	2003-08-06 09:47:02.000000000 -0400
> 
> Seems like a lot of this file could go into a new file, 
> drivers/pci/msi.c.  We'll want to share as much code as possible across 
> all Linux architectures.

Yes.  The on-chip PCI-X host bridge on PPC44x parts has MSI support, so
I can make use of this if it goes in the right place.
 
--
Matt Porter
mporter@kernel.crashing.org

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-07 21:25 long
  2003-08-07 22:14 ` Greg KH
@ 2003-08-07 22:44 ` Jeff Garzik
  2003-08-07 23:03   ` Matt Porter
  2003-08-08  2:36 ` Zwane Mwaikambo
  2 siblings, 1 reply; 35+ messages in thread
From: Jeff Garzik @ 2003-08-07 22:44 UTC (permalink / raw)
  To: long; +Cc: linux-kernel, jun.nakajima, tom.l.nguyen, greg

long wrote:


> diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/pci_msi.c linux-2.6.0-test2-create-msi/arch/i386/kernel/pci_msi.c
> --- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/pci_msi.c	1969-12-31 19:00:00.000000000 -0500
> +++ linux-2.6.0-test2-create-msi/arch/i386/kernel/pci_msi.c	2003-08-06 09:47:02.000000000 -0400

Seems like a lot of this file could go into a new file, 
drivers/pci/msi.c.  We'll want to share as much code as possible across 
all Linux architectures.

	Jeff



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Updated MSI Patches
  2003-08-07 21:25 long
@ 2003-08-07 22:14 ` Greg KH
  2003-08-07 22:44 ` Jeff Garzik
  2003-08-08  2:36 ` Zwane Mwaikambo
  2 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2003-08-07 22:14 UTC (permalink / raw)
  To: long; +Cc: linux-kernel, pcihpd-discuss, jun.nakajima, tom.l.nguyen

pcihpd-discuss?  Don't you mean the linux-pci mailing list instead?

A few initial comments on your patch:
 - Is there any way to dynamically detect if hardware can support MSI?
 - If you enable this option, will boxes that do not support it stop
   working?

A driver has to be modified to use this option, right?  Do you have any
drivers that have been modified?  Without that, I don't think we can
test this patch out, right?


> diff -X excludes -urN linux-2.6.0-test2/arch/i386/kernel/io_apic.c linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c
> --- linux-2.6.0-test2/arch/i386/kernel/io_apic.c	2003-07-27 13:00:21.000000000 -0400
> +++ linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c	2003-08-05 09:25:54.000000000 -0400

You are cluttering up this file with a lot of #ifdefs.  Is there any way
you can not do this?

Does this break the summit and/or NUMA builds?

> diff -X excludes -urN linux-2.6.0-test2/include/asm-i386/mach-default/irq_vectors.h linux-2.6.0-test2-create-vectorbase/include/asm-i386/mach-default/irq_vectors.h
> --- linux-2.6.0-test2/include/asm-i386/mach-default/irq_vectors.h	2003-07-27 12:58:54.000000000 -0400
> +++ linux-2.6.0-test2-create-vectorbase/include/asm-i386/mach-default/irq_vectors.h	2003-08-05 09:25:54.000000000 -0400
> @@ -76,9 +76,14 @@
>   * Since vectors 0x00-0x1f are used/reserved for the CPU,
>   * the usable vector space is 0x20-0xff (224 vectors)
>   */
> +#define NR_VECTORS 256

Will this _always_ be the value?  Can boxes have bigger numbers (I
haven't seen the spec, so I don't know...)

Care to add a comment about this value?

> diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/i386_ksyms.c linux-2.6.0-test2-create-msi/arch/i386/kernel/i386_ksyms.c
> --- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/i386_ksyms.c	2003-07-27 13:11:42.000000000 -0400
> +++ linux-2.6.0-test2-create-msi/arch/i386/kernel/i386_ksyms.c	2003-08-05 09:45:25.000000000 -0400
> @@ -166,6 +166,11 @@
>  EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
>  #endif
>  
> +#ifdef CONFIG_PCI_MSI
> +EXPORT_SYMBOL(msix_alloc_vectors);
> +EXPORT_SYMBOL(msix_free_vectors);
> +#endif
> +
>  #ifdef CONFIG_MCA
>  EXPORT_SYMBOL(machine_id);
>  #endif

Put the EXPORT_SYMBOL in the files that have the functions.  Don't
clutter up the ksyms.c files anymore.



> +u32 device_nomsi_list[DRIVER_NOMSI_MAX] = {0, };

Shouldn't this be static?

> +u32 device_msi_list[DRIVER_NOMSI_MAX] = {0, };

Same with this one?


> +	base = (u32*)ioremap_nocache(phys_addr,
> +		dev_msi_cap*PCI_MSIX_ENTRY_SIZE*sizeof(u32));
> +	if (base == NULL) 
> +		goto free_region; 

...

> +	*base = address.lo_address.value;
> +	*(base + 1) = address.hi_address;
> +	*(base + 2) = *((u32*)&data);

Don't do direct writes to memory, use the proper function calls.  You do
this in a few other places too.

That's enough to get the conversation started :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Updated MSI Patches
@ 2003-08-07 21:25 long
  2003-08-07 22:14 ` Greg KH
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: long @ 2003-08-07 21:25 UTC (permalink / raw)
  To: linux-kernel, pcihpd-discuss; +Cc: jun.nakajima, tom.l.nguyen

Hi,

Here is an updated version of the MSI patch I posted for comments
several weeks ago. The patches below are based on kernel version 2.6.0-test2.
The changes from the original patch include:

- Separate the original patch into two patches - a vector base patch and MSI patch based on the vector base. The vector base patch replaces irq index scheme with vector-base index scheme. MSI patch requires the vector base patch to allocate new vectors on the fly.
- Change the interface name from msi_alloc_vectors to msix_alloc_vectors since 
this interface is used for MSI-X device drivers, which request for additional 
messages.
- Change the interface name from msi_free_vectors to msix_free_vectors since 
this interface is used for MSI-X device drivers, which request for releasing 
the number of vector back to the PCI subsystem.
- Change the function name from remove_hotplug_vectors to 
msi_remove_pci_irq_vectors to have a close match with function name 
msi_get_pci_irq_vector.
- Fix a bug in function which processes boot parameter "device_nomsi="
- Add MSI support to PCI hotplug by making MSI support transparent to hotplug 
controller drivers. The original patch required the hotplug controller drivers 
to explicitly call msi_get_pci_irq_vector and msi_remove_pci_irq_vectors 
functions during hot-add/hot-remove operations. This approach is not a good 
idea since it requires changes in the hot plug controller drivers. This patch 
makes MSI support transparent to any hotplug controller drivers.
- Add Documentation/MSI-HOWTO.txt to describe the basics of MSI, the 
advantages of using MSI over traditional interrupt mechanism, and how to 
enable the device drivers to use MSI or MSI-X. Also included is a description 
of debugging features available and Frequently Asked Questions.

I'd appreciate any feedback you have on the patch.   


Thanks,
Tom

diff -X excludes -urN linux-2.6.0-test2/arch/i386/Kconfig linux-2.6.0-test2-create-vectorbase/arch/i386/Kconfig
--- linux-2.6.0-test2/arch/i386/Kconfig	2003-07-27 12:57:48.000000000 -0400
+++ linux-2.6.0-test2-create-vectorbase/arch/i386/Kconfig	2003-08-05 09:25:54.000000000 -0400
@@ -1072,6 +1072,17 @@
  	depends on PCI && ((PCI_GODIRECT || PCI_GOANY) || X86_VISWS)
 	default y
 
+config PCI_USE_VECTOR
+	bool "PCI_USE_VECTOR"
+	default n
+	help
+	   This replaces the current existing IRQ-based index interrupt scheme
+	   with the vector-base index scheme. The advantages of vector base over	   IRQ base are listed below:
+	   1) Support MSI implementation.
+	   2) Support future IOxAPIC hotplug
+
+	   If you don't know what to do here, say N.
+
 source "drivers/pci/Kconfig"
 
 config ISA
diff -X excludes -urN linux-2.6.0-test2/arch/i386/kernel/i8259.c linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/i8259.c
--- linux-2.6.0-test2/arch/i386/kernel/i8259.c	2003-07-27 13:09:30.000000000 -0400
+++ linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/i8259.c	2003-08-05 09:25:54.000000000 -0400
@@ -419,8 +419,10 @@
 	 * us. (some of these will be overridden and become
 	 * 'special' SMP interrupts)
 	 */
-	for (i = 0; i < NR_IRQS; i++) {
+	for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
 		int vector = FIRST_EXTERNAL_VECTOR + i;
+		if (i >= NR_IRQS)
+			break;
 		if (vector != SYSCALL_VECTOR) 
 			set_intr_gate(vector, interrupt[i]);
 	}
diff -X excludes -urN linux-2.6.0-test2/arch/i386/kernel/io_apic.c linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c
--- linux-2.6.0-test2/arch/i386/kernel/io_apic.c	2003-07-27 13:00:21.000000000 -0400
+++ linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c	2003-08-05 09:25:54.000000000 -0400
@@ -76,6 +76,20 @@
 	int apic, pin, next;
 } irq_2_pin[PIN_MAP_SIZE];
 
+#ifdef CONFIG_PCI_USE_VECTOR
+int vector_irq[NR_IRQS] = { [0 ... NR_IRQS -1] = -1};
+ 
+static int platform_irq(int irq) 	
+{ 	
+	if (platform_legacy_irq(irq))
+		return irq;
+	else
+		return vector_irq[irq];
+}
+#else
+#define platform_irq(irq)	(irq)
+#endif 
+
 /*
  * The common case is 1:1 IRQ<->pin mappings. Sometimes there are
  * shared ISA-space IRQs, so we have to support them. We are super
@@ -106,7 +120,7 @@
 				      int oldapic, int oldpin,
 				      int newapic, int newpin)
 {
-	struct irq_pin_list *entry = irq_2_pin + irq;
+	struct irq_pin_list *entry = irq_2_pin + platform_irq(irq);
 
 	while (1) {
 		if (entry->apic == oldapic && entry->pin == oldpin) {
@@ -123,7 +137,7 @@
 static void __mask_IO_APIC_irq (unsigned int irq)
 {
 	int pin;
-	struct irq_pin_list *entry = irq_2_pin + irq;
+	struct irq_pin_list *entry = irq_2_pin + platform_irq(irq);
 
 	for (;;) {
 		unsigned int reg;
@@ -143,7 +157,7 @@
 static void __unmask_IO_APIC_irq (unsigned int irq)
 {
 	int pin;
-	struct irq_pin_list *entry = irq_2_pin + irq;
+	struct irq_pin_list *entry = irq_2_pin + platform_irq(irq);
 
 	for (;;) {
 		unsigned int reg;
@@ -162,7 +176,7 @@
 static void __mask_and_edge_IO_APIC_irq (unsigned int irq)
 {
 	int pin;
-	struct irq_pin_list *entry = irq_2_pin + irq;
+	struct irq_pin_list *entry = irq_2_pin + platform_irq(irq);
 
 	for (;;) {
 		unsigned int reg;
@@ -182,7 +196,7 @@
 static void __unmask_and_level_IO_APIC_irq (unsigned int irq)
 {
 	int pin;
-	struct irq_pin_list *entry = irq_2_pin + irq;
+	struct irq_pin_list *entry = irq_2_pin + platform_irq(irq);
 
 	for (;;) {
 		unsigned int reg;
@@ -1140,8 +1154,10 @@
 static int __init assign_irq_vector(int irq)
 {
 	static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
-	if (IO_APIC_VECTOR(irq) > 0)
-		return IO_APIC_VECTOR(irq);
+	int vector;
+
+	if ((vector = IO_APIC_VECTOR(irq)) > 0)
+		return vector;
 next:
 	current_vector += 8;
 	if (current_vector == SYSCALL_VECTOR)
@@ -1152,13 +1168,46 @@
 		current_vector = FIRST_DEVICE_VECTOR + offset;
 	}
 
+#ifdef CONFIG_PCI_USE_VECTOR
+	vector_irq[current_vector] = irq;
+#endif
 	IO_APIC_VECTOR(irq) = current_vector;
+
 	return current_vector;
 }
 
 static struct hw_interrupt_type ioapic_level_irq_type;
 static struct hw_interrupt_type ioapic_edge_irq_type;
 
+#define IOAPIC_AUTO	-1
+#define IOAPIC_EDGE	0
+#define IOAPIC_LEVEL	1
+
+static inline void ioapic_register_intr(int irq, int vector, unsigned long trigger)
+{
+#ifdef CONFIG_PCI_USE_VECTOR
+			if (!platform_legacy_irq(irq)) {
+				if ((trigger == IOAPIC_AUTO && 
+				     IO_APIC_irq_trigger(irq)) ||
+				    trigger == IOAPIC_LEVEL)
+					irq_desc[vector].handler = &ioapic_level_irq_type;
+				else
+					irq_desc[vector].handler = &ioapic_edge_irq_type;
+				set_intr_gate(vector, interrupt[vector]);
+
+			} else 
+#endif
+			{
+				if ((trigger == IOAPIC_AUTO && 
+				     IO_APIC_irq_trigger(irq)) ||
+				    trigger == IOAPIC_LEVEL)
+					irq_desc[irq].handler = &ioapic_level_irq_type;
+				else
+					irq_desc[irq].handler = &ioapic_edge_irq_type;
+				set_intr_gate(vector, interrupt[irq]);
+			}
+}
+
 void __init setup_IO_APIC_irqs(void)
 {
 	struct IO_APIC_route_entry entry;
@@ -1215,13 +1264,7 @@
 		if (IO_APIC_IRQ(irq)) {
 			vector = assign_irq_vector(irq);
 			entry.vector = vector;
-
-			if (IO_APIC_irq_trigger(irq))
-				irq_desc[irq].handler = &ioapic_level_irq_type;
-			else
-				irq_desc[irq].handler = &ioapic_edge_irq_type;
-
-			set_intr_gate(vector, interrupt[irq]);
+			ioapic_register_intr(irq, vector, IOAPIC_AUTO);
 		
 			if (!apic && (irq < 16))
 				disable_8259A_irq(irq);
@@ -1848,7 +1891,13 @@
  * operation to prevent an edge-triggered interrupt escaping meanwhile.
  * The idea is from Manfred Spraul.  --macro
  */
+#ifdef CONFIG_PCI_USE_VECTOR
+	if (!platform_legacy_irq(irq))		/* it's already the vector */
+		i = irq;
+	else
+#endif
 	i = IO_APIC_VECTOR(irq);
+
 	v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
 
 	ack_APIC_irq();
@@ -1932,7 +1981,15 @@
 	 * 0x80, because int 0x80 is hm, kind of importantish. ;)
 	 */
 	for (irq = 0; irq < NR_IRQS ; irq++) {
+#ifdef CONFIG_PCI_USE_VECTOR
+		int tmp;
+		tmp = platform_irq(irq);
+		if (tmp == -1)
+			continue;
+		if (IO_APIC_IRQ(tmp) && !IO_APIC_VECTOR(tmp)) {
+#else
 		if (IO_APIC_IRQ(irq) && !IO_APIC_VECTOR(irq)) {
+#endif
 			/*
 			 * Hmm.. We don't have an entry for this,
 			 * so default to an old-fashioned 8259
@@ -2362,9 +2419,7 @@
 		"IRQ %d)\n", ioapic, 
 		mp_ioapics[ioapic].mpc_apicid, pin, entry.vector, irq);
 
-	irq_desc[irq].handler = &ioapic_level_irq_type;
-
-	set_intr_gate(entry.vector, interrupt[irq]);
+	ioapic_register_intr(irq, entry.vector, IOAPIC_LEVEL);
 
 	if (!ioapic && (irq < 16))
 		disable_8259A_irq(irq);
diff -X excludes -urN linux-2.6.0-test2/arch/i386/kernel/mpparse.c linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/mpparse.c
--- linux-2.6.0-test2/arch/i386/kernel/mpparse.c	2003-07-27 12:59:51.000000000 -0400
+++ linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/mpparse.c	2003-08-05 09:25:54.000000000 -0400
@@ -1124,6 +1124,11 @@
 		if ((1<<bit) & mp_ioapic_routing[ioapic].pin_programmed[idx]) {
 			printk(KERN_DEBUG "Pin %d-%d already programmed\n",
 				mp_ioapic_routing[ioapic].apic_id, ioapic_pin);
+#ifdef CONFIG_PCI_USE_VECTOR
+			if (!platform_legacy_irq(irq))
+				entry->irq = IO_APIC_VECTOR(irq);
+			else
+#endif
 			entry->irq = irq;
 			continue;
 		}
@@ -1131,6 +1136,11 @@
 		mp_ioapic_routing[ioapic].pin_programmed[idx] |= (1<<bit);
 
 		if (!io_apic_set_pci_routing(ioapic, ioapic_pin, irq))
+#ifdef CONFIG_PCI_USE_VECTOR
+			if (!platform_legacy_irq(irq))
+				entry->irq = IO_APIC_VECTOR(irq);
+			else
+#endif
 			entry->irq = irq;
 
 		printk(KERN_DEBUG "%02x:%02x:%02x[%c] -> %d-%d -> IRQ %d\n",
diff -X excludes -urN linux-2.6.0-test2/arch/i386/pci/irq.c linux-2.6.0-test2-create-vectorbase/arch/i386/pci/irq.c
--- linux-2.6.0-test2/arch/i386/pci/irq.c	2003-07-27 13:11:50.000000000 -0400
+++ linux-2.6.0-test2-create-vectorbase/arch/i386/pci/irq.c	2003-08-05 09:25:54.000000000 -0400
@@ -743,6 +743,11 @@
 							bridge->bus->number, PCI_SLOT(bridge->devfn), pin, irq);
 				}
 				if (irq >= 0) {
+#ifdef CONFIG_PCI_USE_VECTOR
+					if (!platform_legacy_irq(irq))
+						irq = IO_APIC_VECTOR(irq);
+#endif
+
 					printk(KERN_INFO "PCI->APIC IRQ transform: (B%d,I%d,P%d) -> %d\n",
 						dev->bus->number, PCI_SLOT(dev->devfn), pin, irq);
 					dev->irq = irq;
diff -X excludes -urN linux-2.6.0-test2/include/asm-i386/hw_irq.h linux-2.6.0-test2-create-vectorbase/include/asm-i386/hw_irq.h
--- linux-2.6.0-test2/include/asm-i386/hw_irq.h	2003-07-27 13:11:11.000000000 -0400
+++ linux-2.6.0-test2-create-vectorbase/include/asm-i386/hw_irq.h	2003-08-05 09:25:54.000000000 -0400
@@ -41,6 +41,7 @@
 extern asmlinkage void error_interrupt(void);
 extern asmlinkage void spurious_interrupt(void);
 extern asmlinkage void thermal_interrupt(struct pt_regs);
+#define platform_legacy_irq(irq)	((irq) < 16)
 #endif
 
 extern void mask_irq(unsigned int irq);
diff -X excludes -urN linux-2.6.0-test2/include/asm-i386/mach-default/irq_vectors.h linux-2.6.0-test2-create-vectorbase/include/asm-i386/mach-default/irq_vectors.h
--- linux-2.6.0-test2/include/asm-i386/mach-default/irq_vectors.h	2003-07-27 12:58:54.000000000 -0400
+++ linux-2.6.0-test2-create-vectorbase/include/asm-i386/mach-default/irq_vectors.h	2003-08-05 09:25:54.000000000 -0400
@@ -76,9 +76,14 @@
  * Since vectors 0x00-0x1f are used/reserved for the CPU,
  * the usable vector space is 0x20-0xff (224 vectors)
  */
+#define NR_VECTORS 256
 #ifdef CONFIG_X86_IO_APIC
+#ifndef CONFIG_PCI_USE_VECTOR
 #define NR_IRQS 224
 #else
+#define NR_IRQS FIRST_SYSTEM_VECTOR
+#endif
+#else
 #define NR_IRQS 16
 #endif
 

diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/Kconfig linux-2.6.0-test2-create-msi/arch/i386/Kconfig
--- linux-2.6.0-test2-create-vectorbase/arch/i386/Kconfig	2003-08-05 09:25:54.000000000 -0400
+++ linux-2.6.0-test2-create-msi/arch/i386/Kconfig	2003-08-05 09:45:25.000000000 -0400
@@ -1083,6 +1083,38 @@
 
 	   If you don't know what to do here, say N.
 
+config PCI_MSI
+	bool "ENABLE PCI MSI"
+	depends on PCI_USE_VECTOR
+	default n
+	help
+	   MSI means Message Signaled Interrupt, which allows any MSI-capable
+	   hardware device to send an inbound Memory Write on its PCI bus
+	   instead of asserting IRQ signal on device IRQ pin.
+
+	   This provides PCI MSI support. Enabling PCI MSI support requires 
+	   users set CONFIG_PCI_USE_VECTOR and this parameter since MSI support
+	   requires vector base indexing support.
+
+	   If you don't know what MSI is or enabling MSI does not serve you 
+	   any goods, say N.
+
+config PCI_MSI_ON_SPECIFIC_DEVICES
+	bool "ENABLE MSI ON SPECIFIC DEVICES (DEBUG)"
+	depends on PCI_MSI
+	default y
+	help
+	   This provides users with an option, which enables MSI on specific 
+	   devices instead of all (the default of the configuration parameter
+	   CONFIG_PCI_MSI). This, if set, prohibits the PCI subsystem from
+	   enabling MSI on all MSI capable devices except ones listed in the 
+	   boot parameter "device_msi=".
+
+	   Since some devices, which may have bugs in MSI, may break once MSI
+	   support is invoked in the kernel, this option serves to enable 
+	   specific MSI capable devices, which are fully validated with the 
+	   corresponding software drivers.   
+
 source "drivers/pci/Kconfig"
 
 config ISA
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/i386_ksyms.c linux-2.6.0-test2-create-msi/arch/i386/kernel/i386_ksyms.c
--- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/i386_ksyms.c	2003-07-27 13:11:42.000000000 -0400
+++ linux-2.6.0-test2-create-msi/arch/i386/kernel/i386_ksyms.c	2003-08-05 09:45:25.000000000 -0400
@@ -166,6 +166,11 @@
 EXPORT_SYMBOL(IO_APIC_get_PCI_irq_vector);
 #endif
 
+#ifdef CONFIG_PCI_MSI
+EXPORT_SYMBOL(msix_alloc_vectors);
+EXPORT_SYMBOL(msix_free_vectors);
+#endif
+
 #ifdef CONFIG_MCA
 EXPORT_SYMBOL(machine_id);
 #endif
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c linux-2.6.0-test2-create-msi/arch/i386/kernel/io_apic.c
--- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/io_apic.c	2003-08-05 09:25:54.000000000 -0400
+++ linux-2.6.0-test2-create-msi/arch/i386/kernel/io_apic.c	2003-08-05 09:45:25.000000000 -0400
@@ -427,6 +427,11 @@
 			/* Is this an active IRQ? */
 			if (!irq_desc[j].action)
 				continue;
+#ifdef CONFIG_PCI_MSI
+			/* Is this an active MSI? */
+			if (msi_desc[j])
+				continue;
+#endif
 			if ( package_index == i )
 				IRQ_DELTA(package_index,j) = 0;
 			/* Determine the total count per processor per IRQ */
@@ -1151,6 +1156,7 @@
 
 int irq_vector[NR_IRQS] = { FIRST_DEVICE_VECTOR , 0 };
 
+#ifndef CONFIG_PCI_MSI
 static int __init assign_irq_vector(int irq)
 {
 	static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
@@ -1175,6 +1181,9 @@
 
 	return current_vector;
 }
+#else
+extern int assign_irq_vector(int irq);
+#endif
 
 static struct hw_interrupt_type ioapic_level_irq_type;
 static struct hw_interrupt_type ioapic_edge_irq_type;
@@ -2267,6 +2276,16 @@
 	print_IO_APIC();
 }
 
+void restore_ioapic_irq_handler(int irq)
+{
+	spin_lock(&irq_desc[irq].lock);
+	if (IO_APIC_irq_trigger(irq))
+		irq_desc[irq].handler = &ioapic_level_irq_type;
+	else
+		irq_desc[irq].handler = &ioapic_edge_irq_type;
+	spin_unlock(&irq_desc[irq].lock);
+}
+
 /*
  *	Called after all the initialization is done. If we didnt find any
  *	APIC bugs then we can allow the modify fast path
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/Makefile linux-2.6.0-test2-create-msi/arch/i386/kernel/Makefile
--- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/Makefile	2003-07-27 12:57:50.000000000 -0400
+++ linux-2.6.0-test2-create-msi/arch/i386/kernel/Makefile	2003-08-05 09:45:25.000000000 -0400
@@ -24,6 +24,7 @@
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
 obj-$(CONFIG_X86_LOCAL_APIC)	+= apic.o nmi.o
 obj-$(CONFIG_X86_IO_APIC)	+= io_apic.o
+obj-$(CONFIG_PCI_MSI)		+= pci_msi.o
 obj-$(CONFIG_SOFTWARE_SUSPEND)	+= suspend.o suspend_asm.o
 obj-$(CONFIG_X86_NUMAQ)		+= numaq.o
 obj-$(CONFIG_X86_SUMMIT)	+= summit.o
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/pci_msi.c linux-2.6.0-test2-create-msi/arch/i386/kernel/pci_msi.c
--- linux-2.6.0-test2-create-vectorbase/arch/i386/kernel/pci_msi.c	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.0-test2-create-msi/arch/i386/kernel/pci_msi.c	2003-08-06 09:47:02.000000000 -0400
@@ -0,0 +1,896 @@
+/*
+ * linux/arch/i386/kernel/pci_msi.c
+ */
+
+#include <linux/mm.h>
+#include <linux/irq.h>		
+#include <linux/interrupt.h>
+#include <linux/init.h>		
+#include <linux/config.h>      	
+#include <linux/ioport.h>       
+#include <linux/smp_lock.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/acpi.h>
+
+#include <asm/errno.h>
+#include <asm/io.h>		
+#include <asm/smp.h>
+#include <asm/desc.h>		
+#include <mach_apic.h>
+
+#include <linux/pci_msi.h>
+
+_DEFINE_DBG_BUFFER
+
+static spinlock_t msi_lock = SPIN_LOCK_UNLOCKED;
+struct msi_desc* msi_desc[NR_IRQS] = { [0 ... NR_IRQS-1] = NULL };
+
+int pci_msi_enable = 1;
+static int nr_alloc_vectors = 0;
+static int nr_released_vectors = 0;
+
+static int __init pci_msi_clear(char *str)
+{
+	pci_msi_enable = 0;
+	return 1;
+}
+__setup("pci_nomsi", pci_msi_clear);
+
+/**
+ * create_device_nomsi_list - create a list of devices with MSI disabled
+ * argument: none
+ *
+ * description: provide a temperary workaround for a PCI/PCI-X/PCI Express 
+ * hardware device, which may require IHV investigation if there is an issue. 
+ * It would be a good idea not to force MSI at device level, unless 
+ * system-wide boot parameter CONFIG_PCI_MSI is disabled. 
+ * A boot parameter device_nomsi=[HW] can be used for this purpose, where
+ * [HW] is a list of devices requesting for MSI disabled at device level
+ * (apply only for MSI-capable devices).
+ **/
+u32 device_nomsi_list[DRIVER_NOMSI_MAX] = {0, };
+static int __init create_device_nomsi_list(char *str)
+{
+	int i;
+	
+	if (!ENABLE_MSI_ON_ALL) {
+		printk(KERN_INFO "WARNING: boot param device_nomsi= invalid. Ignore!\n");
+		return 1;
+	}
+	
+	str = get_options(str,ARRAY_SIZE(device_nomsi_list), device_nomsi_list);
+	if (!device_nomsi_list[0])  /* first item contains number of entries */
+		return 0;
+
+	for (i = 0;i < device_nomsi_list[0]; i++) 
+		printk(KERN_INFO "User requests no MSI for dev (D%04x-V%04x)\n",
+		(device_nomsi_list[i + 1] >> 16), 
+		(device_nomsi_list[i + 1] & 0xffff));
+
+	return 1;
+}
+__setup("device_nomsi=", create_device_nomsi_list);
+
+u32 device_msi_list[DRIVER_NOMSI_MAX] = {0, };
+static int __init create_device_msi_list(char *str)
+{
+	int i;
+	
+	if (ENABLE_MSI_ON_ALL) {
+		printk(KERN_INFO "WARNING: boot param device_msi= invalid. Ignore!\n");
+		return 1;
+	}
+
+	str = get_options(str,ARRAY_SIZE(device_msi_list), device_msi_list);
+	if (!device_msi_list[0])  /* first item contains number of entries */
+		return 0;
+
+	for (i = 0;i < device_msi_list[0]; i++) 
+		printk(KERN_INFO "User requests MSI for dev (D%04x-V%04x)\n",
+		(device_msi_list[i + 1] >> 16), 
+		(device_msi_list[i + 1] & 0xffff));
+
+	return 1;
+}
+__setup("device_msi=", create_device_msi_list);
+
+static int is_msi_support_for_this_device(struct pci_dev *dev)
+{
+	int		i, enabling_msi = 1; 	
+	u32		did_vid, *list;
+
+	if (!ENABLE_MSI_ON_ALL) {
+		enabling_msi = 0;
+		if (!device_msi_list[0]) 
+			return (enabling_msi);
+		list = &device_msi_list[0];
+	} else {
+		if (!device_nomsi_list[0]) 
+			return (enabling_msi);
+		list = &device_nomsi_list[0];
+	} 
+	did_vid = (dev->device << 16) | dev->vendor;
+	for (i = 0; i < *list; i++) {
+		if (*(list + i + 1) == did_vid) {
+			enabling_msi = !(enabling_msi);	
+			break;
+		}
+	}
+
+	return (enabling_msi);
+}
+
+static void msi_set_mask_bit(unsigned int irq, int flag)
+{
+	struct msi_desc *entry;
+	
+	entry = (struct msi_desc *)msi_desc[irq];
+	if (!entry || !entry->dev || !entry->mask_entry_addr)
+		return;
+	switch (entry->msi_attrib.type) {
+	case PCI_CAP_ID_MSI:
+	{
+		int		pos;
+		unsigned int	mask_bits;
+		pos = *(int *)entry->mask_entry_addr;
+	        entry->dev->bus->ops->read(entry->dev->bus, entry->dev->devfn, 
+				pos, 4, &mask_bits);
+		mask_bits &= ~(1);
+		mask_bits |= flag;
+	        entry->dev->bus->ops->write(entry->dev->bus, entry->dev->devfn, 
+				pos, 4, mask_bits);
+		break;
+	}
+	case PCI_CAP_ID_MSIX:
+	{
+		int dw_off = entry->msi_attrib.entry_nr*PCI_MSIX_ENTRY_SIZE - 1;
+		*(u32*)(entry->mask_entry_addr + dw_off) = flag;
+		break;
+	}
+	default:
+		break;
+	} 
+}
+
+static void mask_MSI_irq(unsigned int irq)
+{
+	msi_set_mask_bit(irq, 1);
+}
+
+static void unmask_MSI_irq(unsigned int irq)
+{
+	msi_set_mask_bit(irq, 0);
+}
+
+static unsigned int startup_msi_irq_wo_maskbit(unsigned int irq) 
+{
+	return 0;	/* never anything pending */
+}
+static void shutdown_msi_irq_wo_maskbit(unsigned int irq) {}
+static void enable_msi_irq_wo_maskbit(unsigned int irq) {}
+static void disable_msi_irq_wo_maskbit(unsigned int irq) {}
+static void act_msi_irq_wo_maskbit(unsigned int irq) {}
+static void end_msi_irq_wo_maskbit(unsigned int irq) 
+{
+	ack_APIC_irq();	
+}
+
+static unsigned int startup_msi_irq_w_maskbit(unsigned int irq)
+{ 
+	unmask_MSI_irq(irq);
+	return 0;	/* never anything pending */
+}
+
+#define shutdown_msi_irq_w_maskbit	disable_msi_irq_w_maskbit
+#define enable_msi_irq_w_maskbit	unmask_MSI_irq
+#define disable_msi_irq_w_maskbit	mask_MSI_irq
+#define act_msi_irq_w_maskbit		mask_MSI_irq
+static void end_msi_irq_w_maskbit(unsigned int irq) 
+{
+	unmask_MSI_irq(irq);	
+	ack_APIC_irq();
+}
+
+static struct hw_interrupt_type msi_irq_w_maskbit_type = {
+	"PCI MSI",
+	startup_msi_irq_w_maskbit,
+	shutdown_msi_irq_w_maskbit,
+	enable_msi_irq_w_maskbit,
+	disable_msi_irq_w_maskbit,
+	act_msi_irq_w_maskbit,		
+	end_msi_irq_w_maskbit,	
+	NULL
+};
+
+static struct hw_interrupt_type msi_irq_wo_maskbit_type = {
+	"PCI MSI",
+	startup_msi_irq_wo_maskbit,
+	shutdown_msi_irq_wo_maskbit,
+	enable_msi_irq_wo_maskbit,
+	disable_msi_irq_wo_maskbit,
+	act_msi_irq_wo_maskbit,		
+	end_msi_irq_wo_maskbit,	
+	NULL
+};
+
+static void msi_data_init(struct msg_data *msi_data, 
+			  unsigned int vector)
+{
+	memset(msi_data, 0, sizeof(struct msg_data));
+	msi_data->vector = (u8)vector;
+	msi_data->delivery_mode = INT_DELIVERY_MODE;
+	msi_data->dest_mode = INT_DEST_MODE;
+	msi_data->trigger = 0;	
+	msi_data->delivery_status = 0;	
+}
+
+static void msi_address_init(struct msg_address *msi_address)
+{
+	memset(msi_address, 0, sizeof(struct msg_address));
+	msi_address->hi_address = (u32)0;
+	msi_address->lo_address.value = (u32)mp_lapic_addr;
+	msi_address->lo_address.u.dest_mode = INT_DEST_MODE;
+	msi_address->lo_address.u.redirection_hint = 0; 
+	msi_address->lo_address.u.ext_dest_id = 0;
+	msi_address->lo_address.u.dest_id = TARGET_CPUS;
+}
+
+static int pci_vector_resources(void) 					
+{	
+	static int res = -EINVAL;	
+	
+	if (res == -EINVAL) {
+		int i, repeat;
+		for (i = NR_REPEATS; i > 0; i--) {				
+			if ((FIRST_DEVICE_VECTOR + i*8) > FIRST_SYSTEM_VECTOR)
+				continue; 					
+			break; 							
+		} 								
+		i++; 								
+		repeat = (FIRST_SYSTEM_VECTOR - FIRST_DEVICE_VECTOR)/i; 	
+		res = i*repeat - NR_RESERVED_VECTORS + 1; 
+	}
+
+	return (res + nr_released_vectors - nr_alloc_vectors);	
+}
+
+int assign_irq_vector(int irq)
+{
+	static int current_vector = FIRST_DEVICE_VECTOR, offset = 0;
+	
+	if (irq != MSI_AUTO && IO_APIC_VECTOR(irq) > 0)
+		return IO_APIC_VECTOR(irq);
+next:
+	current_vector += 8;
+	if (current_vector == SYSCALL_VECTOR)
+		goto next;
+
+	if (current_vector > FIRST_SYSTEM_VECTOR) {
+		offset++;
+		current_vector = FIRST_DEVICE_VECTOR + offset;
+	}
+
+	if (current_vector == FIRST_SYSTEM_VECTOR) 
+		panic("ran out of interrupt sources!");
+	
+	vector_irq[current_vector] = irq;
+	if (irq != MSI_AUTO)	
+		IO_APIC_VECTOR(irq) = current_vector;
+
+	nr_alloc_vectors++;
+
+	return current_vector;
+}
+
+static int assign_msi_vector(void)
+{
+	static int new_vector_avail = 1; 	
+	int vector;
+	unsigned long flags;
+	
+	/* 
+	 * msi_lock is provided to ensure that successful allocation of MSI
+	 * vector is assigned unique among drivers.
+	 */
+	spin_lock_irqsave(&msi_lock, flags);
+	if (!(pci_vector_resources() > 0)) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EBUSY;
+	}
+
+	if (!new_vector_avail) {
+		/*
+	 	 * vector_irq[] = -1 indicates that this specific vector is:
+	 	 * - assigned for MSI (since MSI have no associated IRQ) or
+	 	 * - assigned for legacy if less than 16, or
+	 	 * - having no corresponding 1:1 vector-to-IOxAPIC IRQ mapping
+	 	 * vector_irq[] = 0 indicates that this vector, previously 
+		 * assigned for MSI, is freed by hotplug removed operations. 
+		 * This vector will be reused for any subsequent hotplug added 
+		 * operations.
+	 	 * vector_irq[] > 0 indicates that this vector is assigned for 
+		 * IOxAPIC IRQs. This vector and its value provides a 1-to-1 
+		 * vector-to-IOxAPIC IRQ mapping.
+	 	 */   
+		for (vector = 0; vector < NR_IRQS; vector++) { 
+			if (vector_irq[vector] != 0) 
+				continue;
+			vector_irq[vector] = -1;	
+			nr_released_vectors--;
+			spin_unlock_irqrestore(&msi_lock, flags);
+			return vector;
+		}
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EBUSY;
+	}	
+
+	vector = assign_irq_vector(MSI_AUTO);
+	if (vector  == (FIRST_SYSTEM_VECTOR - 8))  
+		new_vector_avail = 0;
+
+	spin_unlock_irqrestore(&msi_lock, flags);
+	return vector;
+}
+
+static int get_new_vector(void)
+{
+	int vector;
+	
+	if ((vector = assign_msi_vector()) > 0)
+		set_intr_gate(vector, interrupt[vector]);
+
+	return vector;
+}
+
+#ifdef CONFIG_ACPI_BOOT
+static int find_prt_ioapic_vector(struct pci_dev *dev)
+{
+	struct list_head *node = NULL;
+	struct acpi_prt_entry *entry = NULL;
+	u32 pin;
+	int shares = 0, vector = 0;
+	
+	dev->bus->ops->read(dev->bus, dev->devfn, PCI_INTERRUPT_PIN, 1, &pin);
+	pin--;
+	vector = acpi_pci_irq_lookup(dev->bus, PCI_SLOT(dev->devfn), pin);
+	if (vector_irq[vector] == -1)
+		return -EBUSY;
+
+	list_for_each(node, &acpi_prt.entries) {
+		entry = list_entry(node, struct acpi_prt_entry, node);
+		if (entry->link.handle)
+			continue;
+		if (entry->id.segment == 0 && entry->irq == vector) 
+			shares++;
+	}
+	if (shares != 1) 
+		return -EBUSY;
+	
+	return vector; 
+}
+#define get_ioapic_vector	find_prt_ioapic_vector
+#else
+static int find_ioapic_vector(struct pci_dev *dev)
+{
+	int 		irq = -1;
+
+	if (io_apic_assign_pci_irqs && IO_APIC_DEFINED) 
+	{
+		u32 		pin = 0;
+		unsigned int 	apic, intin;
+		int 		i = 0, shares = 0;
+		
+		dev->bus->ops->read(dev->bus, dev->devfn, PCI_INTERRUPT_PIN, 1,
+			&pin);
+		if (pin) {
+			pin--; 	
+			irq = IO_APIC_get_PCI_irq_vector(
+				dev->bus->number, PCI_SLOT(dev->devfn), pin);
+			for (apic = 0; apic < nr_ioapics; apic++) {
+				i += nr_ioapic_registers[apic];
+				if (i > irq) break;
+			}
+			intin = irq - (i - nr_ioapic_registers[apic]);	
+		        for (i = 0; i < mp_irq_entries; i++) 
+				if ((mp_ioapics[apic].mpc_apicid == 
+				    mp_irqs[i].mpc_dstapic) && /*MATCH APIC & */
+				   (mp_irqs[i].mpc_dstirq == intin)) /* INTIN */
+					shares++; 
+		}
+		if (shares != 1) 
+			return -EBUSY;
+	}
+	
+	return IO_APIC_VECTOR(irq);
+}
+#define get_ioapic_vector	find_ioapic_vector
+#endif
+
+static int get_msi_vector(struct pci_dev *dev)
+{
+	int vector;
+
+	if ((vector = get_ioapic_vector(dev)) < 0) 
+		vector = get_new_vector();
+	
+	return vector;
+}
+
+static struct msi_desc* alloc_msi_entry(void)
+{
+	struct msi_desc *entry;
+
+	entry=(struct msi_desc*)kmalloc(sizeof(struct msi_desc),GFP_KERNEL);
+	if (!entry)
+		return NULL;
+	entry->mask_entry_addr = NULL;
+	entry->dev = NULL;
+	
+	return entry;
+}
+
+static void irq_handler_init(int pos, int mask)
+{
+	spin_lock(&irq_desc[pos].lock);
+	if (!mask)
+		irq_desc[pos].handler = &msi_irq_wo_maskbit_type;
+	else
+		irq_desc[pos].handler = &msi_irq_w_maskbit_type;
+	spin_unlock(&irq_desc[pos].lock);
+}
+
+static void enable_msi_mode(struct pci_dev *dev, int pos, int type)
+{
+	u32 control;
+			
+	dev->bus->ops->read(dev->bus, dev->devfn, 
+		msi_control_reg(pos), 2, &control);
+	if (type == PCI_CAP_ID_MSI) {
+		/* Set enabled bits to single MSI & enable MSI_enable bit */
+		msi_enable(control, 1);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	} else {
+		msix_enable(control);
+	        dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_control_reg(pos), 2, control);
+	}
+    	if (pci_find_capability(dev, PCI_CAP_ID_PCI_EXPRESS)) {
+		/* PCI Express Endpoint device detected */
+		u32 cmd;
+	        dev->bus->ops->read(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
+		cmd |= PCI_COMMAND_INTX_DISABLE;
+	        dev->bus->ops->write(dev->bus, dev->devfn, PCI_COMMAND, 2, cmd);
+	}
+}
+
+/**
+ * msi_capability_init: configure device's MSI capability structure
+ * argument: dev of struct pci_dev type
+ * 
+ * description: called to configure the device's MSI capability structure with 
+ * a single MSI, regardless of device is capable of handling multiple messages.
+ *
+ * return:
+ * -EINVAL	: Invalid device type
+ * -ENOMEM	: Memory Allocation Error
+ * -EBUSY	: No MSI resource 
+ * > 0		: vector for entry 0
+ **/
+static int msi_capability_init(struct pci_dev *dev) 
+{
+	struct msi_desc *entry;
+	struct msg_address address; 
+	struct msg_data data;
+	int pos, vector, *mask_reg = NULL;
+	u32 control;
+	
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
+	if (!pos)
+		return -EINVAL; 
+
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 
+		2, &control);
+	if (control & PCI_MSI_FLAGS_ENABLE)
+		return dev->irq;
+	
+	if (is_mask_bit_support(control)) {
+		mask_reg = (int*) kmalloc(sizeof(int), GFP_KERNEL);
+		if (!mask_reg)
+			return -ENOMEM;
+		*mask_reg = msi_mask_bits_reg(pos, is_64bit_address(control));
+	}
+	/* MSI Entry Initialization */
+	entry = alloc_msi_entry();
+	if (!entry) {
+		kfree(mask_reg);
+		return -ENOMEM;
+	}
+	if ((vector = get_msi_vector(dev)) < 0) {
+		kfree(entry);
+		kfree(mask_reg);
+		return vector;
+	}
+	entry->msi_attrib.type = PCI_CAP_ID_MSI;
+	entry->msi_attrib.entry_nr = 0;
+	entry->msi_attrib.maskbit = is_mask_bit_support(control);
+	entry->dev = dev;
+	entry->mask_entry_addr = mask_reg; 
+	irq_handler_init(vector, entry->msi_attrib.maskbit);
+	msi_desc[vector] = entry;
+	/* Configure MSI capability structure */
+	msi_address_init(&address);
+	msi_data_init(&data, vector);
+	entry->msi_attrib.current_cpu = address.lo_address.u.dest_id;
+	dev->bus->ops->write(dev->bus, dev->devfn, msi_lower_address_reg(pos), 
+				4, address.lo_address.value);
+	if (is_64bit_address(control)) {
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_upper_address_reg(pos), 4, address.hi_address);
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_data_reg(pos, 1), 2, *((u32*)&data));
+	} else
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_data_reg(pos, 0), 2, *((u32*)&data));
+	if (entry->msi_attrib.maskbit) {
+		unsigned int maskbits, temp;
+		/* All MSIs are unmasked by default, Mask them all */
+	        dev->bus->ops->read(dev->bus, dev->devfn, 
+			msi_mask_bits_reg(pos, is_64bit_address(control)), 4,
+			&maskbits);
+		temp = (1 << multi_msi_capable(control));
+		temp = ((temp - 1) & ~temp);
+		maskbits |= temp;
+		dev->bus->ops->write(dev->bus, dev->devfn, 
+			msi_mask_bits_reg(pos, is_64bit_address(control)), 4,
+			maskbits);
+	}
+	/* Set MSI enabled bits	 */
+	enable_msi_mode(dev, pos, PCI_CAP_ID_MSI);
+	
+	return vector;
+}
+
+/**
+ * msix_capability_init: configure device's MSI-X capability structure
+ * argument: dev of struct pci_dev type
+ * 
+ * description: called to configure the device's MSI-X capability structure 
+ * with a single MSI. To request for additional MSI vectors, the device drivers
+ * are required to utilize the following supported APIs:
+ * 1) msix_alloc_vectors(...) for requesting one or more MSI
+ * 2) msix_free_vectors(...) for releasing one or more MSI back to PCI subsystem
+ *
+ * return:
+ * -EINVAL	: Invalid device type
+ * -ENOMEM	: Memory Allocation Error
+ * -EBUSY	: No MSI resource 
+ * > 0		: vector for entry 0
+ **/
+static int msix_capability_init(struct pci_dev	*dev) 
+{
+	struct msi_desc *entry;
+	struct msg_address address; 
+	struct msg_data data;
+	int vector = 0, pos, dev_msi_cap;
+	u32 phys_addr, table_offset, *base;
+	u32 control;
+	u8 bir;
+	
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+	if (!pos)
+		return -EINVAL;  
+
+	/* Request & Map MSI-X table region */
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 2, 
+		&control);
+	if (control & PCI_MSIX_FLAGS_ENABLE)
+		return dev->irq;
+
+	dev_msi_cap = multi_msix_capable(control);
+	dev->bus->ops->read(dev->bus, dev->devfn, 
+		msix_table_offset_reg(pos), 4, &table_offset);
+	bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
+	phys_addr = pci_resource_start (dev, bir);
+	phys_addr += (u32)(table_offset & ~PCI_MSIX_FLAGS_BIRMASK);
+	if (!request_mem_region(phys_addr, 
+		dev_msi_cap*PCI_MSIX_ENTRY_SIZE*sizeof(u32),
+		"MSI-X iomap Failure"))
+		return -ENOMEM;
+	base = (u32*)ioremap_nocache(phys_addr,
+		dev_msi_cap*PCI_MSIX_ENTRY_SIZE*sizeof(u32));
+	if (base == NULL) 
+		goto free_region; 
+	/* MSI Entry Initialization */
+	entry = alloc_msi_entry();
+	if (!entry) 
+		goto free_iomap; 
+	if ((vector = get_msi_vector(dev)) < 0)
+		goto free_entry;
+       	
+	entry->msi_attrib.type = PCI_CAP_ID_MSIX;
+	entry->msi_attrib.entry_nr = 0;
+	entry->msi_attrib.maskbit = 1;
+	entry->dev = dev;
+	entry->mask_entry_addr = base; 
+	irq_handler_init(vector, 1);
+	msi_desc[vector] = entry;
+	/* Configure MSI-X capability structure */
+	msi_address_init(&address);
+	msi_data_init(&data, vector);
+	entry->msi_attrib.current_cpu = address.lo_address.u.dest_id;
+	*base = address.lo_address.value;
+	*(base + 1) = address.hi_address;
+	*(base + 2) = *((u32*)&data);
+	/* Set MSI enabled bits	 */
+	enable_msi_mode(dev, pos, PCI_CAP_ID_MSIX);
+	return vector;
+
+free_entry:
+	kfree(entry);
+free_iomap:
+	iounmap((void *)base);
+free_region: 
+	release_mem_region(phys_addr, 
+	dev_msi_cap*PCI_MSIX_ENTRY_SIZE*sizeof(u32));
+
+	return ((vector < 0) ? -EBUSY : -ENOMEM);
+}
+
+/** 
+ * msi_get_pci_irq_vector: configure MSI/MSI-X capability structure
+ * argument: unsigned int irq 
+ * 
+ * description: called by PCI bus driver during PCI bus enumeration. The 
+ * decision of enabling MSI vector(s) to the function requires that all of 
+ * the following conditions must be satisfied:
+ * -	The system-wide-level boot parameter CONFIG_PCI_MSI is enabled 
+ *      This boot parameter enables MSI support to all MSI capable functions 
+ *      by default.
+ * -	The function indicates MSI support by implementing MSI/MSI-X capability
+ *	structure in its capability list.
+ * -	The function is not listed in the boot parameter "device_nomsi=". PCI 
+ *	subsystem uses this boot parameter to determine whether users want to 
+ *	have PCI subsystem forcing the function to override its default irq 
+ *	assertion mechanism with PCI MSI mechanism.
+ * -	The MSI resource pool is available.
+ *
+ * return:
+ * -EINVAL	: Invalid device type
+ * -ENOMEM	: Memory Allocation Error
+ * -EBUSY	: No MSI resource 
+ * > 0		: vector for entry 0
+ **/
+int msi_get_pci_irq_vector(struct pci_dev* dev)
+{
+	int vector = -EINVAL;
+
+	if (!pci_msi_enable)
+ 		return vector;	
+	if (!is_msi_support_for_this_device(dev))
+ 		return vector;	
+	if ((vector = msix_capability_init(dev)) == -EINVAL)
+		return (msi_capability_init(dev));
+	
+	return vector;
+}
+
+static int msix_alloc_vector(struct pci_dev* dev)
+{
+	struct msi_desc *entry;
+	struct msg_address address; 
+	struct msg_data data;
+	int i, pos, dev_msi_cap, vector;
+	u32 control;
+	u32 *base = NULL;
+	unsigned long flags;
+
+	vector = get_new_vector();
+	if (vector < 0)
+		return vector;
+
+	entry = msi_desc[dev->irq];
+	base = (u32*)entry->mask_entry_addr;
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos),	
+		2, &control);
+	dev_msi_cap = multi_msix_capable(control);
+	for (i = 1; i < dev_msi_cap; i++)
+		if (*(base + i*PCI_MSIX_ENTRY_SIZE) == 0)
+			 break;
+	if (i >= dev_msi_cap) {
+		spin_lock_irqsave(&msi_lock, flags);
+		vector_irq[vector] = 0;
+		nr_released_vectors++;
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EINVAL;
+	}
+	/* MSI Entry Initialization */
+	entry = alloc_msi_entry();
+	if (!entry) {
+		spin_lock_irqsave(&msi_lock, flags);
+		vector_irq[vector] = 0;
+		nr_released_vectors++;
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -ENOMEM;
+	}	
+	entry->msi_attrib.type = PCI_CAP_ID_MSIX;
+	entry->msi_attrib.entry_nr = i;
+	entry->msi_attrib.maskbit = 1;
+	entry->dev = dev;
+	entry->mask_entry_addr = base; 
+	irq_handler_init(vector, 1);
+	msi_desc[vector] = entry;
+	/* Configure MSI-X capability structure */
+	msi_address_init(&address);
+	msi_data_init(&data, vector);
+	entry->msi_attrib.current_cpu = address.lo_address.u.dest_id;
+	base += i*PCI_MSIX_ENTRY_SIZE;
+	*base = address.lo_address.value;
+	*(base + 1) = address.hi_address;
+	*(base + 2) = *((u32*)&data);
+	*(base + 3) = 1; /* ensure to mask it */
+	
+	return vector;
+}
+
+int msix_alloc_vectors(struct pci_dev* dev, int *vector, int nvec)
+{
+	struct msi_desc *entry;
+	int i, pos, vec, alloc_vectors = 0, *vectors = (int *)vector;
+	u32 control;
+	unsigned long flags;
+
+	if (!pci_msi_enable)
+ 		return -EINVAL;
+	
+	entry = msi_desc[dev->irq];
+	if (!entry || entry->dev != dev ||
+	   entry->msi_attrib.type != PCI_CAP_ID_MSIX)
+		return -EINVAL;
+
+   	pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+	dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 			2, &control);
+	if (nvec > multi_msix_capable(control))
+		return -EINVAL;
+	/*
+	 * msi_lock is provided to ensure that enough vectors resources are
+	 * available before granting.
+	 */ 
+	spin_lock_irqsave(&msi_lock, flags);
+	if (nvec > pci_vector_resources()) {
+		spin_unlock_irqrestore(&msi_lock, flags);
+		return -EBUSY;
+	}
+	spin_unlock_irqrestore(&msi_lock, flags);
+	for (i = 0; i < nvec; i++) {
+		if ((vec = msix_alloc_vector(dev)) < 0)
+			break;
+		*(vectors + i) = vec;
+		alloc_vectors++;
+	}
+	if (alloc_vectors != nvec) {
+		for (i = 0; i < alloc_vectors; i++) {
+			vec = *(vectors + i);
+			spin_lock_irqsave(&msi_lock, flags);
+			vector_irq[vec] = 0;
+			nr_released_vectors++;
+			spin_unlock_irqrestore(&msi_lock, flags);
+			if (msi_desc[vec] != NULL) {
+				msi_desc[vec]->dev = NULL; 
+				msi_desc[vec]->mask_entry_addr = NULL; 
+				kfree(msi_desc[vec]);
+			}
+		}
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+static int msi_free_vector(struct pci_dev* dev, int vector)
+{
+	struct msi_desc *entry;
+	int offset;
+	u32 *tbl_addr;
+	unsigned long flags;
+
+	entry = msi_desc[vector];
+	if (!entry || entry->dev != dev || 
+	   entry->msi_attrib.type != PCI_CAP_ID_MSIX)
+		return -EINVAL;
+
+	tbl_addr = (u32*)entry->mask_entry_addr;
+	offset = entry->msi_attrib.entry_nr*PCI_MSIX_ENTRY_SIZE - 1;
+	*(tbl_addr + offset) = 1;	/* mask this entry */
+	entry->mask_entry_addr = NULL;
+	entry->dev = NULL;
+	kfree(entry);
+	msi_desc[vector] = NULL;
+	spin_lock_irqsave(&msi_lock, flags);
+	vector_irq[vector] = 0;	
+	nr_released_vectors++;
+	spin_unlock_irqrestore(&msi_lock, flags);
+	
+	return 0;
+}
+
+int msix_free_vectors(struct pci_dev* dev, int *vector, int nvec)
+{
+	struct msi_desc *entry;
+	int i;
+
+	if (!pci_msi_enable)
+ 		return -EINVAL;
+	
+	entry = msi_desc[dev->irq];
+	if (!entry || entry->dev != dev || 
+	   entry->msi_attrib.type != PCI_CAP_ID_MSIX)
+		return -EINVAL;
+
+	for (i = 0; i < nvec; i++) {
+		msi_free_vector(dev, *(vector + i)); 
+	}
+
+	return 0;
+}
+
+/**
+ * msi_hp_free_vectors: reclaim all MSI previous assigned to this device
+ * argument: device pointer of type struct pci_dev 
+ *
+ * description: used during hotplug removed, from which device is hot-removed;
+ * PCI subsystem reclaims associated MSI to unused state, which may be used 
+ * later on.
+ **/ 
+void msi_remove_pci_irq_vectors(struct pci_dev* dev)
+{
+	struct msi_desc *entry;
+	int type;
+	void *mask_entry_addr;
+	unsigned long flags;
+
+	if (!pci_msi_enable)
+ 		return;
+	
+	entry = msi_desc[dev->irq];
+	if (!entry || entry->dev != dev)
+		return;
+		
+	if (get_ioapic_vector(dev) > 0)
+		restore_ioapic_irq_handler(dev->irq);
+	type = entry->msi_attrib.type;
+	mask_entry_addr = entry->mask_entry_addr;
+	entry->mask_entry_addr = NULL;
+	entry->dev = NULL;
+	kfree(entry);
+	msi_desc[dev->irq] = NULL;
+	spin_lock_irqsave(&msi_lock, flags);
+	vector_irq[dev->irq] = 0;	
+	nr_released_vectors++;
+	spin_unlock_irqrestore(&msi_lock, flags);
+	if (type == PCI_CAP_ID_MSIX) {
+		int i, pos, dev_msi_cap;
+		u32 phys_addr, table_offset;
+		u32 control;
+		u8 bir;
+
+   		pos = pci_find_capability(dev, PCI_CAP_ID_MSIX);
+		dev->bus->ops->read(dev->bus, dev->devfn, msi_control_reg(pos), 			2, &control);
+		dev_msi_cap = multi_msix_capable(control);
+		dev->bus->ops->read(dev->bus, dev->devfn, 
+			msix_table_offset_reg(pos), 4, &table_offset);
+		bir = (u8)(table_offset & PCI_MSIX_FLAGS_BIRMASK);
+		phys_addr = pci_resource_start (dev, bir);
+		phys_addr += (u32)(table_offset & ~PCI_MSIX_FLAGS_BIRMASK);
+		for (i = 0; i < NR_IRQS; i++) 
+			if (msi_desc[i]->dev == dev)
+				msi_free_vector(dev, i);
+		*(u32*)(mask_entry_addr + 3) = 1;
+		iounmap(mask_entry_addr);
+		release_mem_region(phys_addr, 
+		dev_msi_cap*PCI_MSIX_ENTRY_SIZE*sizeof(u32));
+	} else
+		kfree(mask_entry_addr);
+}
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/arch/i386/pci/irq.c linux-2.6.0-test2-create-msi/arch/i386/pci/irq.c
--- linux-2.6.0-test2-create-vectorbase/arch/i386/pci/irq.c	2003-08-05 09:25:54.000000000 -0400
+++ linux-2.6.0-test2-create-msi/arch/i386/pci/irq.c	2003-08-05 09:45:25.000000000 -0400
@@ -714,6 +714,18 @@
 
 	dev = NULL;
 	while ((dev = pci_find_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
+		int irq;
+
+#ifdef CONFIG_PCI_MSI
+  		if ((irq = msi_get_pci_irq_vector(dev)) > 0) {
+  		    	dev->irq = irq;
+  		    	printk(KERN_INFO "PCI->MSI vector:(B%d,S%d,F%d)->%d\n", 
+  				dev->bus->number, PCI_SLOT(dev->devfn), 
+  				PCI_FUNC(dev->devfn), irq);
+			continue;
+  		} 
+#endif	
+
 		pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &pin);
 #ifdef CONFIG_X86_IO_APIC
 		/*
@@ -721,8 +733,6 @@
 		 */
 		if (io_apic_assign_pci_irqs)
 		{
-			int irq;
-
 			if (pin) {
 				pin--;		/* interrupt pins are numbered starting from 1 */
 				irq = IO_APIC_get_PCI_irq_vector(dev->bus->number, PCI_SLOT(dev->devfn), pin);
@@ -812,6 +822,18 @@
 {
 	u8 pin;
 	extern int interrupt_line_quirk;
+	
+#ifdef CONFIG_PCI_MSI
+	int irq;
+	if ((irq = msi_get_pci_irq_vector(dev)) > 0) {
+	    	dev->irq = irq;
+	    	printk(KERN_INFO "PCI->MSI vector:(B%d,S%d,F%d)->%d\n", 
+			dev->bus->number, PCI_SLOT(dev->devfn), 
+			PCI_FUNC(dev->devfn), irq);
+		return 0;
+	} 
+#endif
+	
 	pci_read_config_byte(dev, PCI_INTERRUPT_PIN, &pin);
 	if (pin && !pcibios_lookup_irq(dev, 1) && !dev->irq) {
 		char *msg;
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/Documentation/kernel-parameters.txt linux-2.6.0-test2-create-msi/Documentation/kernel-parameters.txt
--- linux-2.6.0-test2-create-vectorbase/Documentation/kernel-parameters.txt	2003-07-27 13:12:45.000000000 -0400
+++ linux-2.6.0-test2-create-msi/Documentation/kernel-parameters.txt	2003-08-05 09:45:25.000000000 -0400
@@ -261,6 +261,22 @@
 			Format: <area>[,<node>]
 			See also Documentation/networking/decnet.txt.
 
+	device_nomsi=	[HW,HW,...]
+			Format: HW = DeviceIDVendorID in hex format with prefix
+			0x to select a list of specific MSI-capable functions
+			requesting no MSI support at function-wide. This
+			option is ignored if either a configured parameter
+			CONFIG_PCI_MSI is set to 'N' or a configured parameter 
+			CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES is set to 'Y'.
+
+	device_msi=	[HW,HW,...]
+			Format: HW = DeviceIDVendorID in hex format with prefix
+			0x to select a list of specific MSI-capable functions
+			requesting MSI support at function-wide. This
+			option is ignored if either a configured parameter 
+			CONFIG_PCI_MSI is set to 'N' or a configured parameter
+			CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES is set to 'N'.
+
 	devfs=		[DEVFS]
 			See Documentation/filesystems/devfs/boot-options.
  
@@ -694,6 +710,11 @@
 			See header of drivers/block/paride/pcd.c.
 			See also Documentation/paride.txt.
 
+	pci_nomsi	If user sets a configured parameter CONFIG_PCI_MSI
+			during kernel built, then he/she can use this append 
+			option to disable MSI system-wide support from PCI 
+			subsystem during boot.
+
 	pci=option[,option...]		[PCI] various PCI subsystem options:
 		off			[IA-32] don't probe for the PCI bus
 		bios			[IA-32] force use of PCI BIOS, don't access
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/Documentation/MSI-HOWTO.txt linux-2.6.0-test2-create-msi/Documentation/MSI-HOWTO.txt
--- linux-2.6.0-test2-create-vectorbase/Documentation/MSI-HOWTO.txt	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.0-test2-create-msi/Documentation/MSI-HOWTO.txt	2003-08-05 14:43:33.000000000 -0400
@@ -0,0 +1,330 @@
+		The MSI Driver Guide HOWTO
+	Tom L Nguyen tom.l.nguyen@intel.com
+			07/15/2003
+
+1. About this guide
+
+This guide describes the basics of Message Signaled Interrupts(MSI), the 
+advantages of using MSI over traditional interrupt mechanisms, and how 
+to enable your driver to use MSI or MSI-X. Also included is a 
+description of debugging features available and Frequently Asked 
+Questions.
+
+2. Copyright 2003 Intel Corporation 
+
+3. What is MSI/MSI-X?
+
+Message Signaled Interrupt (MSI), as described in the PCI Local Bus 
+Specification Revision 2.3 or latest, is an optional feature, and a required 
+feature for PCI Express devices. MSI enables a device function to request 
+service by sending an Inbound Memory Write on its PCI bus to the FSB as a 
+Message Signal Interrupt transaction. Because MSI is generated in the
+form of a Memory Write, all transaction conditions, such as a Retry, 
+Master-Abort, Target-Abort or normal completion, are supported.
+
+A PCI device that supports MSI must also support pin IRQ assertion interrupt 
+mechanism to provide backward compatibility for systems that do not
+support MSI. In Systems, which support MSI, the bus driver is
+responsible for initializing the message address and message data of
+the device function's MSI/MSI-X capability structure during device
+initial configuration. 
+
+An MSI capable device function indicates MSI support by implementing 
+the MSI/MSI-X capability structure in its PCI capability list. The 
+device function may implement both the MSI capability structure and the 
+MSI-X capability structure; however, the bus driver should not enable 
+both, but instead enable only the MSI-X capability structure.
+
+The MSI capability structure contains Message Control register,
+Message Address register and Message Data register. These registers
+provide the bus driver control over MSI. The Message Control register
+indicates the MSI capability supported by the device. The Message
+Address register specifies the target address and the Message Data
+register specifies the characteristics of the message. To request
+service, the device function writes the content of the Message Data
+register to the target address. The device and its software driver 
+are prohibited from writing to these registers.
+
+The MSI-X capability structure is an optional extension to MSI. It uses 
+an independent and separate capability structure. There are some key 
+advantages to implementing the MSI-X capability structure over the MSI 
+capability structure as described below.
+
+	- Support a larger maximum number of vectors per function.
+
+	- Provide the ability for system software to configure each
+	  vector with an independent message address and message data,
+	  specified by a table that resides in Memory Space. 
+
+        - MSI and MSI-X both support per-vector masking. Per-vector
+	  masking is an optional extension of MSI but a required
+	  feature for MSI-X. Per-vector masking provides the kernel
+	  the ability to mask/unmask MSI when servicing its software
+	  interrupt service routing handler. If per-vector masking is
+	  not supported, then the device driver should provide the 
+	  hardware/software synchronization to ensure that the device generates
+	  MSI when the driver wants it to do so. 
+
+4. Why use MSI? 
+
+As a benefit the simplification of board design, MSI allows board designers to 
+remove out of band interrupt routing. MSI is another step towards a legacy-free
+environment.
+
+Due to increasing pressure on chipset and processor packages to reduce pin 
+count, the need for interrupt pins is expected to diminish over time. Devices, 
+due to pin constraints, may implement messages to increase performance. 
+
+PCI Express endpoints uses INTx emulation (in-band messages) instead of IRQ pin
+assertion. Using INTx emulation requires interrupt sharing among devices 
+connected to the same node (PCI bridge) while MSI is unique (non-shared) and 
+does not require BIOS configuration support. As a result, the PCI Express 
+technology requires MSI support for better interrupt performance.
+ 
+Using MSI enables the device functions to support two or more vectors,
+which can be configure to target different CPU's to increase scalability. 
+
+5. Configuring a driver to use MSI/MSI-X
+
+By default, the kernel will not enable MSI/MSI-X on all 
+devices that support this capability because some devices. A kernel 
+configuration option must be selected to enable MSI/MSI-X support.
+
+5.1 Including MSI support into the kernel
+
+To include MSI support into the kernel requires users to rebuild 
+the kernel with both the configuration parameters CONFIG_PCI_USE_VECTOR and 
+CONFIG_PCI_MSI set. CONFIG_PCI_USE_VECTOR enables the kernel to
+replace the IRQ-based scheme with VECTOR-based scheme because MSI
+requires a unique vector and no BIOS interrupt-routing
+table. CONFIG_PCI_MSI enables MSI support in the kernel. 
+
+During PCI device enumeration, the bus driver initializes the devices
+MSI/MSI-X capability structure with ONE vector, regardless of whether
+the device function is capable of supporting multiple vectors. 
+
+ONE vector is initially allocated to the device function and the vector is 
+stored in the irq field of the device (pci_dev) structure. This default 
+initialization allows legacy drivers to work without specific modification to 
+support MSI.
+	
+5.2 Configuring for MSI support
+
+Due to the non-contiguous fashion in vector assignment of the 
+existing Linux kernel, this patch does not support multiple 
+messages regardless of the device function is capable of 
+supporting more than one vector. The bus driver initializes only 
+entry 0 of this capability. Existing software drivers of this 
+device function will work without changes if no 
+hardware/software synchronization is required. Otherwise, the 
+device driver should be updated to provide the hardware/software 
+synchronization due to multiple messages generated from the same 
+vector might be lost. In other words, once the device function 
+signals Vector A, it cannot signal Vector A again until it is 
+explicitly enabled to do so by its device driver. It is 
+recommended that IHVs should validate their hardware devices 
+against their existing device drivers once the patch is 
+installed. Please refer section 5.4 Debugging MSI.
+
+5.3 Configuring for MSI-X support
+
+Both the MSI capability structure and the MSI-X capability 
+structure share the same above semantics; however, due to the 
+ability of the system software to configure each vector of the 
+MSI-X capability structure with an independent message address 
+and message data, the non-contiguous fashion in vector assignment 
+of the existing Linux kernel has no impact on supporting multiple 
+messages on an MSI-X capable device functions. By default, as 
+mentioned above, ONE vector should be always allocated to the 
+MSI-X capability structure at entry 0. The bus driver does not 
+initialize other entries of the table during device enumeration. 
+Note that the PCI subsystem should have full control of a MSI-X table that 
+resides in Memory Space. The software device driver should not access this 
+table. 
+
+To request for additional vectors, the device software driver 
+should call function msix_alloc_vectors(). It is recommended that 
+the software driver should call this function once during the 
+initialization phase of the device driver. With this semantics, 
+the existing software device driver may work with one vector if 
+no hardware/software synchronization is required. It is 
+recommended that IHVs should validate their hardware devices 
+against their existing device drivers once the patch is 
+installed. Please refer section 5.4 Debugging MSI.
+
+The function msix_alloc_vectors(), once invoked, enables either 
+all or nothing, depending on the current availability of vector 
+resources. If no vector resources are available, the device 
+function still works with ONE vector. If the vector resources are 
+available for the number of vectors requested by the driver, this 
+function will reconfigure the MSI-X capability structure of the 
+device with additional messages, starting from entry 1. To 
+emphasize this reason, for example, the device may be capable for 
+supporting the maximum of 32 vectors while its software driver 
+usually may request 4 vectors.
+
+For each vector, after this successful call, the device driver is 
+responsible to call other functions like request_irq(), 
+enable_irq(), etc. to enable this vector with its corresponding 
+interrupt service handler. It is the device driver's choice to 
+have all vectors shared the same interrupt service handler or 
+each vector with a unique interrupt service handler. 
+
+In addition to the function msix_alloc_vectors(), another 
+function msix_free_vectors() is provided to allow the software 
+driver to release a number of vectors back to the vector 
+resources. Once invoked, the PCI subsystem disables (masks) each 
+vector released. These vectors are no longer valid for the 
+hardware device and its software driver to use.
+
+int msix_alloc_vectors(struct pci_dev *dev, int *vector, int nvec)
+
+This API enables the software driver to request the PCI
+subsystem for additional messages. Depending on the number of 
+vectors available, the PCI subsystem enables either all or 
+nothing. 
+
+Argument dev points to the device (pci_dev) structure.
+Argument vector is a pointer of integer type. The number of 
+elements is indicated in argument nvec.
+Argument nvec is an integer indicating the number of messages 
+requested.
+A return of zero indicates that the number of allocated vector is 
+successfully allocated. Otherwise, indicate resources not 
+available.
+
+int msix_free_vectors(struct pci_dev* dev, int *vector, int nvec)
+
+This API enables the software driver to inform the PCI subsystem 
+that it is willing to release a number of vectors back to the MSI 
+resource pool.  Once invoked, the PCI subsystem disables each 
+MSI-X entry associated with each vector stored in the argument 2. 
+These vectors are no longer valid for the hardware device and its 
+software driver to use.
+
+Argument dev points to the device (pci_dev) structure.
+Argument vector is a pointer of integer type. The number of 
+elements is indicated in argument nvec.
+Argument nvec is an integer indicating the number of messages 
+released. 
+A return of zero indicates that the number of allocated vectors 
+is successfully released. Otherwise, indicates a failure.
+
+5.4 Debugging MSI
+
+There are some devices that may have some bugs in MSI. These devices may break 
+once MSI support is invoked in the kernel. To debug these devices, the patch 
+provides two configuration parameters, CONFIG_PCI_MSI and 
+CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES. Both of them are not set by default. When 
+users set CONFIG_PCI_MSI, CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES is also set by 
+default. After users rebuild the kernel with this combination, the
+kernel enables MSI on specific devices listed in the boot parameter 
+"device_msi=". Users must explicitly use this boot parameter to
+provide a list of specific devices they would like to have MSI
+support. To emphasize this reason, users can debug on individual MSI
+capable device with its existing software driver until all are fully
+validated since it may be difficult to debug all the same time. The format 
+of "device_msi=" is similar to the format of "device_nomsi=" and will be 
+described in later paragraph. Note that this boot parameter is
+required only if the configuration parameter CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES
+is set. Otherwise, it will be ignored.  
+
+Once users completed validating these devices, they can clear the
+configuration parameter CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES to indicate
+the kernel that MSI should be enabled on all MSI capable devices. The
+boot parameter "device_msi=" is no longer required.
+
+The patch also provides second debug option, which requires users set the 
+configuration parameter CONFIG_PCI_MSI and clear configuration parameter 
+CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES. After users rebuild the kernel with this 
+combination, the kernel enables MSI on all MSI capable devices by default. The 
+boot parameter "device_msi=" will be ignored. To disable MSI on
+specific MSI capable devices, which may show some signs of
+unpredictable behaviors, users must explicitly use the boot parameter 
+"device_nomsi=", which contains a list of specific devices users do
+not want MSI enabled. These devices are default to IRQ pin assertion.
+
+The format of this is "device_nomsi=DWORD1,DWORD2,...". Each 
+DWORD in a list specifies a device function in terms of device ID 
+(higher word) and vendor ID (lower word). DWORD should be in hex 
+format with a prefix 0x.
+
+For example, "device_nomsi=0x80119005,0x10108086" indicates that 
+the bus driver should not enable MSI(X) on two device functions 
+(Device ID = 0x8011 & Vendor ID = 0x9005, and Device ID = 0x1010 
+and Vendor ID = 0x8086). 
+	
+In addition to the boot parameter "device_nomsi=", another boot 
+parameter "pci_nomsi" can be used to prohibit the bus driver from 
+enabling MSI(X) on all MSI capable devices.
+
+At the driver level, the software device driver can tell whether 
+MSI/MSI-X is enabled by reading the MSI enable bit of the 
+MSI/MSI-X capability structure's message control register. If 
+this bit is zero, the device function is default to IRQ pin 
+assertion. If this bit is set, the device function is using MSI 
+as interrupt generated mechanism.
+
+At the user level, users can use command 'cat /proc/interrupts' 
+to display the vector allocated for the device and its interrupt 
+mode, as shown below. 
+
+     CPU0     CPU1    CPU2    CPU3       
+0:   14175    0       17408   0    	IO-APIC-edge  	timer
+1:   123      310     0       37    	IO-APIC-edge  	keyboard
+2:   0        0       0       0     	XT-PIC  	cascade
+8:   1        0       0       0    	IO-APIC-edge  	rtc
+12:  41       0       0       813   	IO-APIC-edge  	PS/2 Mouse
+14:  2744     7017    0       0     	IO-APIC-edge  	ide0
+15:  1515     1       0       418   	IO-APIC-edge  	ide1
+169: 0        0       0       0   	IO-APIC-level  	usb-uhci
+185: 0        0       0       0   	IO-APIC-level  	usb-uhci
+193: 30       0       0       0     	PCI MSI		aic79xx
+201: 30       0       0       0     	PCI MSI		aic79xx
+209: 467      0       0       0   	IO-APIC-level  	eth1
+225: 15       0       0       0   	IO-APIC-level  	aic7xxx
+233: 15       0       0       0   	IO-APIC-level  	aic7xxx
+NMI: 0        0       0       0 
+LOC: 31446    31448   31448   31448 
+ERR: 0
+MIS: 0
+
+6. FAQ
+
+Q1. Are there any limitations on using the MSI?
+
+A1. If the PCI device supports MSI and conforms to the
+specification and the platform supports the APIC local bus,
+then using MSI should work.
+
+Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
+AMD processors)? In P3 IPI's are transmitted on the APIC local
+bus and in P4 and Xeon they are transmitted on the system
+bus. Are there any implications with this?
+
+A2. MSI support enables a PCI device sending an inbound
+memory write (0xfeexxxxx as target address) on its PCI bus
+directly to the FSB. Since the message address has a
+redirection hint bit cleared, it should work.
+
+Q3. The target address 0xfeexxxxx will be translated by the
+Host Bridge into an interrupt message. Are there any
+limitations on the chipsets such as Intel 8xx, Intel e7xxx,
+or VIA?
+
+A3. If these chipsets support an inbound memory write with
+target address set as 0xfeexxxxx, as conformed to PCI specification 2.3 or 
+latest, then it should work.
+
+Q4. From the driver point of view, if the MSI is lost because
+of the errors occur during inbound memory write, then it may
+wait for ever. Is there a mechanism for it to recover?
+
+A4. Since the target of the transaction is an inbound memory
+write, all transaction termination conditions (Retry,
+Master-Abort, Target-Abort, or normal completion) are
+supported. A device sending an MSI must abide by all the PCI
+rules and conditions regarding that inbound memory write. So,
+if a retry is signaled it must retry, etc... We believe that
+the recommendation for Abort is also a retry (refer to PCI
+specification 2.3 or latest).
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/drivers/acpi/pci_irq.c linux-2.6.0-test2-create-msi/drivers/acpi/pci_irq.c
--- linux-2.6.0-test2-create-vectorbase/drivers/acpi/pci_irq.c	2003-07-27 13:01:22.000000000 -0400
+++ linux-2.6.0-test2-create-msi/drivers/acpi/pci_irq.c	2003-08-05 09:45:25.000000000 -0400
@@ -49,6 +49,9 @@
 extern void eisa_set_level_irq(unsigned int irq);
 #endif
 
+#ifdef CONFIG_PCI_MSI
+extern int msi_get_pci_irq_vector(struct pci_dev *dev);
+#endif
 
 /* --------------------------------------------------------------------------
                          PCI IRQ Routing Table (PRT) Support
@@ -229,7 +232,7 @@
                           PCI Interrupt Routing Support
    -------------------------------------------------------------------------- */
 
-static int
+int 
 acpi_pci_irq_lookup (struct pci_bus *bus, int device, int pin)
 {
 	struct acpi_prt_entry	*entry = NULL;
@@ -326,6 +329,15 @@
 		return_VALUE(-ENODEV);
 	}
 
+#ifdef CONFIG_PCI_MSI	
+	if ((irq = msi_get_pci_irq_vector(dev)) > 0) {
+	    	dev->irq = irq;
+	    	printk(KERN_INFO "PCI->MSI VEC transform:(B%d,I%d,P%d)->%d\n", 
+			dev->bus->number, PCI_SLOT(dev->devfn), pin, irq);
+		return_VALUE(dev->irq);
+	}
+#endif
+	
 	/* 
 	 * First we check the PCI IRQ routing table (PRT) for an IRQ.  PRT
 	 * values override any BIOS-assigned IRQs set during boot.
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/drivers/pci/remove.c linux-2.6.0-test2-create-msi/drivers/pci/remove.c
--- linux-2.6.0-test2-create-vectorbase/drivers/pci/remove.c	2003-07-27 12:58:26.000000000 -0400
+++ linux-2.6.0-test2-create-msi/drivers/pci/remove.c	2003-08-05 10:21:57.000000000 -0400
@@ -14,6 +14,10 @@
 {
 	int i;
 
+#ifdef CONFIG_PCI_MSI
+ 	msi_remove_pci_irq_vectors(dev);
+#endif
+ 
 	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
 		struct resource *res = dev->resource + i;
 		if (res->parent)
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/include/asm-i386/hw_irq.h linux-2.6.0-test2-create-msi/include/asm-i386/hw_irq.h
--- linux-2.6.0-test2-create-vectorbase/include/asm-i386/hw_irq.h	2003-08-05 09:25:54.000000000 -0400
+++ linux-2.6.0-test2-create-msi/include/asm-i386/hw_irq.h	2003-08-06 09:43:39.000000000 -0400
@@ -60,6 +60,15 @@
 extern void send_IPI(int dest, int vector);
 extern void setup_ioapic_dest(unsigned long mask);
 
+#ifdef CONFIG_PCI_MSI
+#include <linux/pci_msi.h>
+extern struct msi_desc* msi_desc[NR_IRQS];
+extern int msi_get_pci_irq_vector(struct pci_dev *dev);
+extern int msix_alloc_vectors(struct pci_dev *dev, int *vector, int nvec);
+extern int msix_free_vectors(struct pci_dev *dev, int *vector, int nvec);
+extern void msi_remove_pci_irq_vectors(struct pci_dev *dev);
+#endif
+
 extern unsigned long io_apic_irqs;
 
 extern atomic_t irq_err_count;
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/include/linux/pci.h linux-2.6.0-test2-create-msi/include/linux/pci.h
--- linux-2.6.0-test2-create-vectorbase/include/linux/pci.h	2003-07-27 13:01:50.000000000 -0400
+++ linux-2.6.0-test2-create-msi/include/linux/pci.h	2003-08-05 09:45:25.000000000 -0400
@@ -36,6 +36,7 @@
 #define  PCI_COMMAND_WAIT 	0x80	/* Enable address/data stepping */
 #define  PCI_COMMAND_SERR	0x100	/* Enable SERR */
 #define  PCI_COMMAND_FAST_BACK	0x200	/* Enable back-to-back writes */
+#define  PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */
 
 #define PCI_STATUS		0x06	/* 16 bits */
 #define  PCI_STATUS_CAP_LIST	0x10	/* Support Capability List */
@@ -198,6 +199,8 @@
 #define  PCI_CAP_ID_MSI		0x05	/* Message Signalled Interrupts */
 #define  PCI_CAP_ID_CHSWP	0x06	/* CompactPCI HotSwap */
 #define  PCI_CAP_ID_PCIX	0x07	/* PCI-X */
+#define  PCI_CAP_ID_PCI_EXPRESS 0x10
+#define  PCI_CAP_ID_MSIX	0x11	/* MSI-X */
 #define PCI_CAP_LIST_NEXT	1	/* Next capability in the list */
 #define PCI_CAP_FLAGS		2	/* Capability defined flags (16 bits) */
 #define PCI_CAP_SIZEOF		4
@@ -275,11 +278,13 @@
 #define  PCI_MSI_FLAGS_QSIZE	0x70	/* Message queue size configured */
 #define  PCI_MSI_FLAGS_QMASK	0x0e	/* Maximum queue size available */
 #define  PCI_MSI_FLAGS_ENABLE	0x01	/* MSI feature enabled */
+#define  PCI_MSI_FLAGS_MASKBIT	0x100	/* 64-bit mask bits allowed */
 #define PCI_MSI_RFU		3	/* Rest of capability flags */
 #define PCI_MSI_ADDRESS_LO	4	/* Lower 32 bits */
 #define PCI_MSI_ADDRESS_HI	8	/* Upper 32 bits (if PCI_MSI_FLAGS_64BIT set) */
 #define PCI_MSI_DATA_32		8	/* 16 bits of data for 32-bit devices */
 #define PCI_MSI_DATA_64		12	/* 16 bits of data for 64-bit devices */
+#define PCI_MSI_MASK_BIT	16	/* Mask bits register */
 
 /* CompactPCI Hotswap Register */
 
diff -X excludes -urN linux-2.6.0-test2-create-vectorbase/include/linux/pci_msi.h linux-2.6.0-test2-create-msi/include/linux/pci_msi.h
--- linux-2.6.0-test2-create-vectorbase/include/linux/pci_msi.h	1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.0-test2-create-msi/include/linux/pci_msi.h	2003-08-06 09:44:10.000000000 -0400
@@ -0,0 +1,142 @@
+/*
+ *	linux/include/asm/pci_msi.h
+ *
+ */
+
+#ifndef _ASM_PCI_MSI_H
+#define _ASM_PCI_MSI_H
+
+#include <linux/pci.h>
+
+#define MSI_AUTO -1
+#define NR_REPEATS	23
+#define NR_RESERVED_VECTORS 3 /*FIRST_DEVICE_VECTOR,FIRST_SYSTEM_VECTOR,0x80 */	
+extern int vector_irq[NR_IRQS];
+extern void (*interrupt[NR_IRQS])(void);
+extern void restore_ioapic_irq_handler(int irq);
+extern int acpi_pci_irq_lookup (struct pci_bus *bus, int device, int pin);
+
+#ifndef CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES
+#define ENABLE_MSI_ON_ALL		1
+#else
+#define ENABLE_MSI_ON_ALL		0
+#endif
+
+#ifdef CONFIG_X86_IO_APIC
+#define IO_APIC_DEFINED			1
+#else
+#define IO_APIC_DEFINED			0
+#endif
+
+#define DRIVER_NOMSI_MAX		32
+/*
+ * MSI-X Address Register
+ */
+#define PCI_MSIX_FLAGS_QSIZE		0x7FF
+#define PCI_MSIX_FLAGS_ENABLE		(1 << 15)
+#define PCI_MSIX_FLAGS_BIRMASK		(7 << 0)
+#define PCI_MSIX_FLAGS_BITMASK		(1 << 0)
+
+#define PCI_MSIX_ENTRY_SIZE		4	/* number of dwords */
+
+#define msi_control_reg(base)		(base + PCI_MSI_FLAGS)
+#define msi_lower_address_reg(base)	(base + PCI_MSI_ADDRESS_LO)
+#define msi_upper_address_reg(base)	(base + PCI_MSI_ADDRESS_HI)
+#define msi_data_reg(base, is64bit)	\
+	( (is64bit == 1) ? base+PCI_MSI_DATA_64 : base+PCI_MSI_DATA_32 )
+#define msi_mask_bits_reg(base, is64bit) \
+	( (is64bit == 1) ? base+PCI_MSI_MASK_BIT : base+PCI_MSI_MASK_BIT-4)
+#define msi_disable(control)		(control & ~PCI_MSI_FLAGS_ENABLE)
+#define multi_msi_capable(control) \
+	(1 << ((control & PCI_MSI_FLAGS_QMASK) >> 1))
+#define multi_msi_enable(control, num) \
+	control |= (((num >> 1) << 4) & PCI_MSI_FLAGS_QSIZE);
+#define is_64bit_address(control)	(control & PCI_MSI_FLAGS_64BIT)	
+#define is_mask_bit_support(control)	(control & PCI_MSI_FLAGS_MASKBIT)
+#define msi_enable(control, num) multi_msi_enable(control, num); \
+	control |= PCI_MSI_FLAGS_ENABLE
+ 
+#define msix_control_reg		msi_control_reg
+#define msix_table_offset_reg(base)	(base + 0x04)
+#define msix_pba_offset_reg(base)	(base + 0x08)
+#define msix_enable(control)	 	control |= PCI_MSIX_FLAGS_ENABLE
+#define msix_disable(control)	 	(control & ~PCI_MSIX_FLAGS_ENABLE)
+#define msix_table_size(control) 	((control & PCI_MSIX_FLAGS_QSIZE)+1)
+#define multi_msix_capable		msix_table_size
+#define msix_unmask(address)	 	(address & ~PCI_MSIX_FLAGS_BITMASK)
+#define msix_mask(address)		(address | PCI_MSIX_FLAGS_BITMASK)  
+#define msix_is_pending(address) 	(address & PCI_MSIX_FLAGS_PENDMASK)
+
+#define _DEFINE_DBG_BUFFER	char __dbg_str_buf[256];
+#define _DBG_K_TRACE_ENTRY	((unsigned int)0x00000001)
+#define _DBG_K_TRACE_EXIT	((unsigned int)0x00000002)
+#define _DBG_K_INFO		((unsigned int)0x00000004)
+#define _DBG_K_ERROR		((unsigned int)0x00000008)
+#define _DBG_K_TRACE	(_DBG_K_TRACE_ENTRY | _DBG_K_TRACE_EXIT)
+
+#define _DEBUG_LEVEL	(_DBG_K_INFO | _DBG_K_ERROR | _DBG_K_TRACE)
+#define _DBG_PRINT( dbg_flags, args... )		\
+if ( _DEBUG_LEVEL & (dbg_flags) )			\
+{							\
+	int len;					\
+	len = sprintf(__dbg_str_buf, "%s:%d: %s ", 	\
+		__FILE__, __LINE__, __FUNCTION__ ); 	\
+	sprintf(__dbg_str_buf + len, args);		\
+	printk(KERN_INFO "%s\n", __dbg_str_buf);	\
+}
+
+#define MSI_FUNCTION_TRACE_ENTER	\
+	_DBG_PRINT (_DBG_K_TRACE_ENTRY, "%s", "[Entry]");
+#define MSI_FUNCTION_TRACE_EXIT		\
+	_DBG_PRINT (_DBG_K_TRACE_EXIT, "%s", "[Entry]");
+	
+
+/*
+ * MSI Defined Data Structures
+ */
+
+extern unsigned long mp_lapic_addr;
+extern char __dbg_str_buf[256];
+
+struct msg_data {	
+	__u32	vector	:  8,		
+	delivery_mode	:  3,	/*	000b: FIXED
+			 		001b: lowest priority
+			 	 	111b: ExtINT */	
+	dest_mode	:  1,	/* 0: physical, 1: logical */	
+	reserved_1	:  2,
+	trigger		:  1,	/* 0: edge, 1: level */	
+	delivery_status	:  1,	/* 0: idle, 1: pending */	
+	reserved_2	: 16;
+} __attribute__ ((packed));
+
+struct msg_address {
+	union {
+		struct { __u32
+			reserved_1		:  2,		
+			dest_mode		:  1,	/* 0: physical, 
+							   1: logical */	
+			redirection_hint        :  1,  	/* 0: to FSB, 
+							   1: redirection */ 
+			ext_dest_id		:  8,   /* Extended Dest. ID */	
+ 			dest_id			:  8,	/* Destination ID */	
+			reserved_2		: 12;	/* 0xFEE */
+      	}u;
+       	__u32  value;
+	}lo_address;
+	__u32 	hi_address;
+} __attribute__ ((packed));
+
+struct msi_desc {
+	struct {
+		__u32	type	: 5, /* {0: unused, 5h:MSI, 11h:MSI-X} 	*/
+			maskbit	: 1, /* mask-pending bit supported ?    */
+			reserved: 2, /* reserved			*/
+			entry_nr: 8, /* specific enabled entry 		*/
+			current_cpu: 16;/* current destination cpu	*/
+	}msi_attrib;
+	void *mask_entry_addr;
+	struct pci_dev *dev;
+};
+
+#endif /* _ASM_PCI_MSI_H */

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2003-10-03 22:01 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-08-13  1:34 Updated MSI Patches Nakajima, Jun
2003-08-13  1:37 ` Zwane Mwaikambo
  -- strict thread matches above, loose matches on Subject: below --
2003-10-03 22:15 long
2003-10-01 18:26 Nguyen, Tom L
2003-10-01 15:37 long
2003-10-01 17:09 ` Jeff Garzik
2003-08-22 18:25 long
2003-08-19 15:42 Nguyen, Tom L
2003-08-14 23:46 Nguyen, Tom L
2003-08-13 15:19 Nguyen, Tom L
2003-08-13  4:14 Nakajima, Jun
2003-08-12 23:59 Nakajima, Jun
2003-08-13  0:48 ` Zwane Mwaikambo
2003-08-12 18:36 Nakajima, Jun
2003-08-12 18:48 ` Jeff Garzik
2003-08-12 19:14   ` Zwane Mwaikambo
2003-08-12 19:44     ` Jeff Garzik
2003-08-12 17:32 Nguyen, Tom L
2003-08-12 18:14 ` Zwane Mwaikambo
2003-08-12 17:04 Nguyen, Tom L
2003-08-12 18:11 ` Zwane Mwaikambo
2003-08-12 15:22 long
2003-08-11 20:51 long
2003-08-11 21:16 ` Greg KH
2003-08-12  1:41   ` Zwane Mwaikambo
2003-08-12  5:53     ` Greg KH
2003-08-08 19:33 long
2003-08-08 20:15 ` Greg KH
2003-08-08 14:41 Nguyen, Tom L
2003-08-07 23:07 Nakajima, Jun
2003-08-07 21:25 long
2003-08-07 22:14 ` Greg KH
2003-08-07 22:44 ` Jeff Garzik
2003-08-07 23:03   ` Matt Porter
2003-08-08  2:36 ` Zwane Mwaikambo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).