* [PATCH RFC] Per-CPU vector for Xen.
@ 2009-08-16  8:58 Zhang, Xiantao
  2009-08-18  5:45 ` Zhang, Xiantao
  0 siblings, 1 reply; 3+ messages in thread
From: Zhang, Xiantao @ 2009-08-16  8:58 UTC (permalink / raw)
  To: Keir Fraser
  Cc: xen-devel, Yang, Xiaowei, Jiang, Yunhong, Dong, Eddie, Li, Xin

[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]

Hi, Keir
   To support more interrupt vectors in Xen for more devices, especially SR-IOV devices in large systems, we have implemented per-CPU vectors for Xen, as Linux does. Each SR-IOV VF needs several separate vectors for interrupt delivery, so the ~200 global vectors in Xen are insufficient and easily run out after installing two or three such devices. Because SR-IOV devices are becoming popular, we have to extend the vector resource space to make these devices work. Following Linux, per-CPU vectors scale the vector resource to nr_cpus x ~200 in a system (e.g., roughly 64 x 200 = 12,800 vectors on a 64-CPU host instead of ~200 system-wide). BTW, the core logic of the patches is ported from upstream Linux and then adapted for Xen.
 
Patch 0001:  Rename nr_irqs to nr_irqs_gsi and use it only for GSI interrupts.
Patch 0002:  Convert Xen from a vector-based interrupt infrastructure to an IRQ-based one. The big change is that an irq number is now also allocated for each MSI interrupt source; the idea is the same as Linux's.
Patch 0003:  Implement per-CPU vectors for Xen. Most of the core logic (vector allocation algorithm, IRQ migration logic, ...) is ported from upstream Linux; a brief sketch of the idea follows this list.
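
To sketch the idea in C (hypothetical MY_*/my_* names only, not the actual patch code): each CPU keeps its own vector-to-irq table, so a vector value only needs to be unique per CPU, and total capacity grows to roughly nr_cpus x the number of dynamic vectors per CPU.

    #define MY_NR_CPUS              64
    #define MY_NR_VECTORS           256
    #define MY_FIRST_DYNAMIC_VECTOR 0x20
    #define MY_IRQ_FREE             (-1)

    /* One vector-to-irq map per CPU instead of a single global one. */
    static int my_vector_irq[MY_NR_CPUS][MY_NR_VECTORS];

    static void my_vector_init(void)
    {
        int cpu, vector;

        for (cpu = 0; cpu < MY_NR_CPUS; cpu++)
            for (vector = 0; vector < MY_NR_VECTORS; vector++)
                my_vector_irq[cpu][vector] = MY_IRQ_FREE;
    }

    /* Bind irq to a free vector on one target CPU; return the vector. */
    static int my_assign_vector(int cpu, int irq)
    {
        int vector;

        for (vector = MY_FIRST_DYNAMIC_VECTOR; vector < MY_NR_VECTORS; vector++)
        {
            if (my_vector_irq[cpu][vector] == MY_IRQ_FREE)
            {
                /* The same vector may serve a different irq on another CPU. */
                my_vector_irq[cpu][vector] = irq;
                return vector;
            }
        }
        return -1; /* only this CPU's vector space is exhausted */
    }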
Regarding patch quality, we have done extensive testing against upstream and found no regression after applying this patchset.
Please help to review.  Comments are much appreciated!  Thanks!
 
Signed-off-by: Xiantao Zhang <xiantao.zhang@Intel.com>
 
Xiantao

[-- Attachment #2: 0001-change_nr_irqs_to_nr_irqs_gsi.patch --]
[-- Type: application/octet-stream, Size: 7458 bytes --]

# HG changeset patch
# User Xiantao Zhang <xiantao.zhang@intel.com>
# Date 1248758991 -28800
# Node ID 6a639384fba682924c7bf7f882ee1877b68ffbaa
# Parent  07fe52b0b2e03f4af7ed19c8559e638dd2feef1e
x86: Change nr_irqs to nr_irqs_gsi.

Currently nr_irqs is only used for GSI irqs, so rename it
to make its meaning more precise. This is also the initial
step towards supporting irq allocation for MSI interrupt
sources.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
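
As an illustration of the split this rename prepares for (a hypothetical helper, not part of this patch): GSIs keep the fixed low range of the irq number space, and the later patches allocate MSI irqs above it.

    extern unsigned int nr_irqs_gsi; /* GSIs occupy [0, nr_irqs_gsi) */
    extern unsigned int nr_irqs;     /* introduced by the later patches */

    static inline int my_irq_is_gsi(unsigned int irq)
    {
        /* MSI irqs are later allocated in [nr_irqs_gsi, nr_irqs). */
        return irq < nr_irqs_gsi;
    }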

diff -r 07fe52b0b2e0 -r 6a639384fba6 xen/arch/x86/i8259.c
--- a/xen/arch/x86/i8259.c	Mon Aug 10 18:15:19 2009 +0100
+++ b/xen/arch/x86/i8259.c	Tue Jul 28 13:29:51 2009 +0800
@@ -403,8 +403,8 @@ void __init init_IRQ(void)
             set_intr_gate(i, interrupt[i]);
     }
 
-    irq_vector = xmalloc_array(u8, nr_irqs);
-    memset(irq_vector, 0, nr_irqs * sizeof(*irq_vector));
+    irq_vector = xmalloc_array(u8, nr_irqs_gsi);
+    memset(irq_vector, 0, nr_irqs_gsi * sizeof(*irq_vector));
 
     for ( i = 0; i < 16; i++ )
     {
diff -r 07fe52b0b2e0 -r 6a639384fba6 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c	Mon Aug 10 18:15:19 2009 +0100
+++ b/xen/arch/x86/io_apic.c	Tue Jul 28 13:29:51 2009 +0800
@@ -71,8 +71,8 @@ int disable_timer_pin_1 __initdata;
  * Rough estimation of how many shared IRQs there are, can
  * be changed anytime.
  */
-#define MAX_PLUS_SHARED_IRQS nr_irqs
-#define PIN_MAP_SIZE (MAX_PLUS_SHARED_IRQS + nr_irqs)
+#define MAX_PLUS_SHARED_IRQS nr_irqs_gsi
+#define PIN_MAP_SIZE (MAX_PLUS_SHARED_IRQS + nr_irqs_gsi)
 
 /*
  * This is performance-critical, we want to do it O(1)
@@ -741,7 +741,7 @@ static void __init setup_IO_APIC_irqs(vo
                 vector = assign_irq_vector(irq);
                 entry.vector = vector;
                 ioapic_register_intr(irq, vector, IOAPIC_AUTO);
-		
+
                 if (!apic && (irq < 16))
                     disable_8259A_irq(irq);
             }
@@ -928,7 +928,7 @@ void /*__init*/ __print_IO_APIC(void)
     }
     printk(KERN_INFO "Using vector-based indexing\n");
     printk(KERN_DEBUG "IRQ to pin mappings:\n");
-    for (i = 0; i < nr_irqs; i++) {
+    for (i = 0; i < nr_irqs_gsi; i++) {
         struct irq_pin_list *entry = irq_2_pin + i;
         if (entry->pin < 0)
             continue;
@@ -971,10 +971,10 @@ static void __init enable_IO_APIC(void)
 
     /* Initialise dynamic irq_2_pin free list. */
     irq_2_pin = xmalloc_array(struct irq_pin_list, PIN_MAP_SIZE);
-    memset(irq_2_pin, 0, nr_irqs * sizeof(*irq_2_pin));
+    memset(irq_2_pin, 0, nr_irqs_gsi * sizeof(*irq_2_pin));
     for (i = 0; i < PIN_MAP_SIZE; i++)
         irq_2_pin[i].pin = -1;
-    for (i = irq_2_pin_free_entry = nr_irqs; i < PIN_MAP_SIZE; i++)
+    for (i = irq_2_pin_free_entry = nr_irqs_gsi; i < PIN_MAP_SIZE; i++)
         irq_2_pin[i].next = i + 1;
 
     for(apic = 0; apic < nr_ioapics; apic++) {
@@ -2172,7 +2172,7 @@ void dump_ioapic_irq_info(void)
     unsigned int irq, pin, printed = 0;
     unsigned long flags;
 
-    for ( irq = 0; irq < nr_irqs; irq++ )
+    for ( irq = 0; irq < nr_irqs_gsi; irq++ )
     {
         entry = &irq_2_pin[irq];
         if ( entry->pin == -1 )
@@ -2216,7 +2216,7 @@ void __init init_ioapic_mappings(void)
     union IO_APIC_reg_01 reg_01;
 
     if ( smp_found_config )
-        nr_irqs = 0;
+        nr_irqs_gsi = 0;
     for ( i = 0; i < nr_ioapics; i++ )
     {
         if ( smp_found_config )
@@ -2247,16 +2247,16 @@ void __init init_ioapic_mappings(void)
             /* The number of IO-APIC IRQ registers (== #pins): */
             reg_01.raw = io_apic_read(i, 1);
             nr_ioapic_registers[i] = reg_01.bits.entries + 1;
-            nr_irqs += nr_ioapic_registers[i];
-        }
-    }
-    if ( !smp_found_config || skip_ioapic_setup || nr_irqs < 16 )
-        nr_irqs = 16;
-    else if ( nr_irqs > PAGE_SIZE * 8 )
+            nr_irqs_gsi += nr_ioapic_registers[i];
+        }
+    }
+    if ( !smp_found_config || skip_ioapic_setup || nr_irqs_gsi < 16 )
+        nr_irqs_gsi = 16;
+    else if ( nr_irqs_gsi > PAGE_SIZE * 8 )
     {
         /* for PHYSDEVOP_pirq_eoi_gmfn guest assumptions */
         printk(KERN_WARNING "Limiting number of IRQs found (%u) to %lu\n",
-               nr_irqs, PAGE_SIZE * 8);
-        nr_irqs = PAGE_SIZE * 8;
-    }
-}
+               nr_irqs_gsi, PAGE_SIZE * 8);
+        nr_irqs_gsi = PAGE_SIZE * 8;
+    }
+}
diff -r 07fe52b0b2e0 -r 6a639384fba6 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Mon Aug 10 18:15:19 2009 +0100
+++ b/xen/arch/x86/irq.c	Tue Jul 28 13:29:51 2009 +0800
@@ -26,7 +26,7 @@ int opt_noirqbalance = 0;
 int opt_noirqbalance = 0;
 boolean_param("noirqbalance", opt_noirqbalance);
 
-unsigned int __read_mostly nr_irqs = 16;
+unsigned int __read_mostly nr_irqs_gsi = 16;
 irq_desc_t irq_desc[NR_VECTORS];
 
 static DEFINE_SPINLOCK(vector_lock);
@@ -80,7 +80,7 @@ int assign_irq_vector(int irq)
     static unsigned current_vector = FIRST_DYNAMIC_VECTOR;
     unsigned vector;
 
-    BUG_ON(irq >= nr_irqs && irq != AUTO_ASSIGN_IRQ);
+    BUG_ON(irq >= nr_irqs_gsi && irq != AUTO_ASSIGN_IRQ);
 
     spin_lock(&vector_lock);
 
@@ -886,10 +886,10 @@ int get_free_pirq(struct domain *d, int 
 
     if ( type == MAP_PIRQ_TYPE_GSI )
     {
-        for ( i = 16; i < nr_irqs; i++ )
+        for ( i = 16; i < nr_irqs_gsi; i++ )
             if ( !d->arch.pirq_vector[i] )
                 break;
-        if ( i == nr_irqs )
+        if ( i == nr_irqs_gsi )
             return -ENOSPC;
     }
     else
diff -r 07fe52b0b2e0 -r 6a639384fba6 xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Mon Aug 10 18:15:19 2009 +0100
+++ b/xen/arch/x86/physdev.c	Tue Jul 28 13:29:51 2009 +0800
@@ -55,7 +55,7 @@ static int physdev_map_pirq(struct physd
     switch ( map->type )
     {
         case MAP_PIRQ_TYPE_GSI:
-            if ( map->index < 0 || map->index >= nr_irqs )
+            if ( map->index < 0 || map->index >= nr_irqs_gsi )
             {
                 dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n",
                         d->domain_id, map->index);
@@ -344,7 +344,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
 
         irq = irq_op.irq;
         ret = -EINVAL;
-        if ( (irq < 0) || (irq >= nr_irqs) )
+        if ( (irq < 0) || (irq >= nr_irqs_gsi) )
             break;
 
         irq_op.vector = assign_irq_vector(irq);
diff -r 07fe52b0b2e0 -r 6a639384fba6 xen/common/domain.c
--- a/xen/common/domain.c	Mon Aug 10 18:15:19 2009 +0100
+++ b/xen/common/domain.c	Tue Jul 28 13:29:51 2009 +0800
@@ -253,9 +253,9 @@ struct domain *domain_create(
         d->is_paused_by_controller = 1;
         atomic_inc(&d->pause_count);
 
-        d->nr_pirqs = (nr_irqs +
+        d->nr_pirqs = (nr_irqs_gsi +
                        (domid ? extra_domU_irqs :
-                        extra_dom0_irqs ?: nr_irqs));
+                        extra_dom0_irqs ?: nr_irqs_gsi));
         d->pirq_to_evtchn = xmalloc_array(u16, d->nr_pirqs);
         d->pirq_mask = xmalloc_array(
             unsigned long, BITS_TO_LONGS(d->nr_pirqs));
diff -r 07fe52b0b2e0 -r 6a639384fba6 xen/include/xen/irq.h
--- a/xen/include/xen/irq.h	Mon Aug 10 18:15:19 2009 +0100
+++ b/xen/include/xen/irq.h	Tue Jul 28 13:29:51 2009 +0800
@@ -50,9 +50,9 @@ typedef struct hw_interrupt_type hw_irq_
 #include <asm/irq.h>
 
 #ifdef NR_IRQS
-# define nr_irqs NR_IRQS
+# define nr_irqs_gsi NR_IRQS
 #else
-extern unsigned int nr_irqs;
+extern unsigned int nr_irqs_gsi;
 #endif
 
 struct msi_desc;

[-- Attachment #3: 0002-change_to_IRQ_based_interrupt_infrastructure.patch --]
[-- Type: application/octet-stream, Size: 107728 bytes --]

# HG changeset patch
# User Xiantao Zhang <xiantao.zhang@intel.com>
# Date 1249978047 -28800
# Node ID 8584327c7e701a6a5005ed2cee7daad3b6a39659
# Parent  6a639384fba682924c7bf7f882ee1877b68ffbaa
x86: Change Xen hypervisor's interrupt infrastructure
from vector-based to IRQ-based.

In a per-CPU vector environment the vector space becomes a
multi-dimensional resource, so a vector number is no longer
appropriate for indexing irq_desc, which stands for a unique
interrupt source. As Linux does, the irq number is chosen to
index irq_desc. This patch changes the vector-based interrupt
infrastructure to an irq-based one. It mostly follows upstream
Linux's changes, with some parts adapted for Xen.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
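
The shape of the change, as a minimal sketch (simplified, hypothetical my_* names; see the real do_IRQ() and init_irq_data() in the diff below): the entry vector is translated to an irq once via the vector-to-irq table, and everything else is keyed on that irq.

    #include <stddef.h>

    #define MY_NR_VECTORS 256

    struct my_irq_desc {
        int irq; /* back-pointer set up at init time */
        /* handler, action, status, lock, ... elided */
    };

    extern struct my_irq_desc *my_irq_desc;  /* nr_irqs entries, irq-indexed */
    extern int my_vector_irq[MY_NR_VECTORS]; /* vector -> irq, -1 if unbound */

    static struct my_irq_desc *my_desc_from_vector(unsigned int vector)
    {
        int irq = my_vector_irq[vector];

        if (irq < 0)
            return NULL; /* spurious: no irq bound to this vector */
        return &my_irq_desc[irq];
    }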

diff -r 6a639384fba6 -r 8584327c7e70 xen/arch/x86/domain.c
--- a/xen/arch/x86/domain.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/arch/x86/domain.c	Tue Aug 11 16:07:27 2009 +0800
@@ -474,11 +474,17 @@ int arch_domain_create(struct domain *d,
         share_xen_page_with_guest(
             virt_to_page(d->shared_info), d, XENSHARE_writable);
 
-        d->arch.pirq_vector = xmalloc_array(s16, d->nr_pirqs);
-        if ( !d->arch.pirq_vector )
+        d->arch.pirq_irq = xmalloc_array(int, d->nr_pirqs);
+        if ( !d->arch.pirq_irq )
             goto fail;
-        memset(d->arch.pirq_vector, 0,
-               d->nr_pirqs * sizeof(*d->arch.pirq_vector));
+        memset(d->arch.pirq_irq, 0,
+               d->nr_pirqs * sizeof(*d->arch.pirq_irq));
+
+        d->arch.irq_pirq = xmalloc_array(int, nr_irqs);
+        if ( !d->arch.irq_pirq )
+            goto fail;
+        memset(d->arch.irq_pirq, 0,
+               nr_irqs * sizeof(*d->arch.irq_pirq));
 
         if ( (rc = iommu_domain_init(d)) != 0 )
             goto fail;
@@ -513,7 +519,8 @@ int arch_domain_create(struct domain *d,
 
  fail:
     d->is_dying = DOMDYING_dead;
-    xfree(d->arch.pirq_vector);
+    xfree(d->arch.pirq_irq);
+    xfree(d->arch.irq_pirq);
     free_xenheap_page(d->shared_info);
     if ( paging_initialised )
         paging_final_teardown(d);
@@ -562,7 +569,8 @@ void arch_domain_destroy(struct domain *
 #endif
 
     free_xenheap_page(d->shared_info);
-    xfree(d->arch.pirq_vector);
+    xfree(d->arch.pirq_irq);
+    xfree(d->arch.irq_pirq);
 }
 
 unsigned long pv_guest_cr4_fixup(unsigned long guest_cr4)
diff -r 6a639384fba6 -r 8584327c7e70 xen/arch/x86/hpet.c
--- a/xen/arch/x86/hpet.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/arch/x86/hpet.c	Tue Aug 11 16:07:27 2009 +0800
@@ -38,7 +38,7 @@ struct hpet_event_channel
 
     unsigned int idx;   /* physical channel idx */
     int cpu;            /* msi target */
-    unsigned int vector;/* msi vector */
+    unsigned int irq;/* msi irq */
     unsigned int flags; /* HPET_EVT_x */
 } __cacheline_aligned;
 static struct hpet_event_channel legacy_hpet_event;
@@ -47,13 +47,13 @@ static unsigned int num_hpets_used; /* m
 
 DEFINE_PER_CPU(struct hpet_event_channel *, cpu_bc_channel);
 
-static int vector_channel[NR_VECTORS] = {[0 ... NR_VECTORS-1] = -1};
-
-#define vector_to_channel(vector)   vector_channel[vector]
+static int *irq_channel;
+
+#define irq_to_channel(irq)   irq_channel[irq]
 
 unsigned long hpet_address;
 
-void msi_compose_msg(struct pci_dev *pdev, int vector, struct msi_msg *msg);
+void msi_compose_msg(struct pci_dev *pdev, int irq, struct msi_msg *msg);
 
 /*
  * force_hpet_broadcast: by default legacy hpet broadcast will be stopped
@@ -208,7 +208,7 @@ again:
     spin_unlock_irq(&ch->lock);
 }
 
-static void hpet_interrupt_handler(int vector, void *data,
+static void hpet_interrupt_handler(int irq, void *data,
         struct cpu_user_regs *regs)
 {
     struct hpet_event_channel *ch = (struct hpet_event_channel *)data;
@@ -221,10 +221,10 @@ static void hpet_interrupt_handler(int v
     ch->event_handler(ch);
 }
 
-static void hpet_msi_unmask(unsigned int vector)
+static void hpet_msi_unmask(unsigned int irq)
 {
     unsigned long cfg;
-    int ch_idx = vector_to_channel(vector);
+    int ch_idx = irq_to_channel(irq);
     struct hpet_event_channel *ch;
 
     BUG_ON(ch_idx < 0);
@@ -235,10 +235,10 @@ static void hpet_msi_unmask(unsigned int
     hpet_write32(cfg, HPET_Tn_CFG(ch->idx));
 }
 
-static void hpet_msi_mask(unsigned int vector)
+static void hpet_msi_mask(unsigned int irq)
 {
     unsigned long cfg;
-    int ch_idx = vector_to_channel(vector);
+    int ch_idx = irq_to_channel(irq);
     struct hpet_event_channel *ch;
 
     BUG_ON(ch_idx < 0);
@@ -249,9 +249,9 @@ static void hpet_msi_mask(unsigned int v
     hpet_write32(cfg, HPET_Tn_CFG(ch->idx));
 }
 
-static void hpet_msi_write(unsigned int vector, struct msi_msg *msg)
-{
-    int ch_idx = vector_to_channel(vector);
+static void hpet_msi_write(unsigned int irq, struct msi_msg *msg)
+{
+    int ch_idx = irq_to_channel(irq);
     struct hpet_event_channel *ch;
 
     BUG_ON(ch_idx < 0);
@@ -261,9 +261,9 @@ static void hpet_msi_write(unsigned int 
     hpet_write32(msg->address_lo, HPET_Tn_ROUTE(ch->idx) + 4);
 }
 
-static void hpet_msi_read(unsigned int vector, struct msi_msg *msg)
-{
-    int ch_idx = vector_to_channel(vector);
+static void hpet_msi_read(unsigned int irq, struct msi_msg *msg)
+{
+    int ch_idx = irq_to_channel(irq);
     struct hpet_event_channel *ch;
 
     BUG_ON(ch_idx < 0);
@@ -274,31 +274,32 @@ static void hpet_msi_read(unsigned int v
     msg->address_hi = 0;
 }
 
-static unsigned int hpet_msi_startup(unsigned int vector)
-{
-    hpet_msi_unmask(vector);
+static unsigned int hpet_msi_startup(unsigned int irq)
+{
+    hpet_msi_unmask(irq);
     return 0;
 }
 
-static void hpet_msi_shutdown(unsigned int vector)
-{
-    hpet_msi_mask(vector);
-}
-
-static void hpet_msi_ack(unsigned int vector)
+static void hpet_msi_shutdown(unsigned int irq)
+{
+    hpet_msi_mask(irq);
+}
+
+static void hpet_msi_ack(unsigned int irq)
 {
     ack_APIC_irq();
 }
 
-static void hpet_msi_end(unsigned int vector)
-{
-}
-
-static void hpet_msi_set_affinity(unsigned int vector, cpumask_t mask)
+static void hpet_msi_end(unsigned int irq)
+{
+}
+
+static void hpet_msi_set_affinity(unsigned int irq, cpumask_t mask)
 {
     struct msi_msg msg;
     unsigned int dest;
     cpumask_t tmp;
+    int vector = irq_to_vector(irq);
 
     cpus_and(tmp, mask, cpu_online_map);
     if ( cpus_empty(tmp) )
@@ -314,7 +315,7 @@ static void hpet_msi_set_affinity(unsign
     msg.address_lo |= MSI_ADDR_DEST_ID(dest);
 
     hpet_msi_write(vector, &msg);
-    irq_desc[vector].affinity = mask;
+    irq_desc[irq].affinity = mask;
 }
 
 /*
@@ -331,44 +332,44 @@ static struct hw_interrupt_type hpet_msi
     .set_affinity   = hpet_msi_set_affinity,
 };
 
-static int hpet_setup_msi_irq(unsigned int vector)
+static int hpet_setup_msi_irq(unsigned int irq)
 {
     int ret;
     struct msi_msg msg;
-    struct hpet_event_channel *ch = &hpet_events[vector_to_channel(vector)];
-
-    irq_desc[vector].handler = &hpet_msi_type;
-    ret = request_irq_vector(vector, hpet_interrupt_handler,
+    struct hpet_event_channel *ch = &hpet_events[irq_to_channel(irq)];
+
+    irq_desc[irq].handler = &hpet_msi_type;
+    ret = request_irq(irq, hpet_interrupt_handler,
                       0, "HPET", ch);
     if ( ret < 0 )
         return ret;
 
-    msi_compose_msg(NULL, vector, &msg);
-    hpet_msi_write(vector, &msg);
+    msi_compose_msg(NULL, irq, &msg);
+    hpet_msi_write(irq, &msg);
 
     return 0;
 }
 
 static int hpet_assign_irq(struct hpet_event_channel *ch)
 {
-    int vector;
-
-    if ( ch->vector )
+    int irq;
+
+    if ( ch->irq )
         return 0;
 
-    if ( (vector = assign_irq_vector(AUTO_ASSIGN_IRQ)) < 0 )
-        return vector;
-
-    vector_channel[vector] = ch - &hpet_events[0];
-
-    if ( hpet_setup_msi_irq(vector) )
-    {
-        free_irq_vector(vector);
-        vector_channel[vector] = -1;
+    if ( (irq = create_irq()) < 0 )
+        return irq;
+
+    irq_channel[irq] = ch - &hpet_events[0];
+
+    if ( hpet_setup_msi_irq(irq) )
+    {
+        destroy_irq(irq);
+        irq_channel[irq] = -1;
         return -EINVAL;
     }
 
-    ch->vector = vector;
+    ch->irq = irq;
     return 0;
 }
 
@@ -402,8 +403,8 @@ static int hpet_fsb_cap_lookup(void)
         /* set default irq affinity */
         ch->cpu = num_chs_used;
         per_cpu(cpu_bc_channel, ch->cpu) = ch;
-        irq_desc[ch->vector].handler->
-            set_affinity(ch->vector, cpumask_of_cpu(ch->cpu));
+        irq_desc[ch->irq].handler->
+            set_affinity(ch->irq, cpumask_of_cpu(ch->cpu));
 
         num_chs_used++;
 
@@ -462,8 +463,8 @@ static void hpet_attach_channel_share(in
         return;
 
     /* set irq affinity */
-    irq_desc[ch->vector].handler->
-        set_affinity(ch->vector, cpumask_of_cpu(ch->cpu));
+    irq_desc[ch->irq].handler->
+        set_affinity(ch->irq, cpumask_of_cpu(ch->cpu));
 }
 
 static void hpet_detach_channel_share(int cpu)
@@ -484,8 +485,8 @@ static void hpet_detach_channel_share(in
 
     ch->cpu = first_cpu(ch->cpumask);
     /* set irq affinity */
-    irq_desc[ch->vector].handler->
-        set_affinity(ch->vector, cpumask_of_cpu(ch->cpu));
+    irq_desc[ch->irq].handler->
+        set_affinity(ch->irq, cpumask_of_cpu(ch->cpu));
 }
 
 static void (*hpet_attach_channel)(int cpu, struct hpet_event_channel *ch);
@@ -522,6 +523,11 @@ void hpet_broadcast_init(void)
     u64 hpet_rate;
     u32 hpet_id, cfg;
     int i;
+
+    irq_channel = xmalloc_array(int, nr_irqs);
+    BUG_ON(!irq_channel);
+    for (i = 0; i < nr_irqs ; i++)
+        irq_channel[i] = -1;
 
     hpet_rate = hpet_setup();
     if ( hpet_rate == 0 )
diff -r 6a639384fba6 -r 8584327c7e70 xen/arch/x86/hvm/vmsi.c
--- a/xen/arch/x86/hvm/vmsi.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/arch/x86/hvm/vmsi.c	Tue Aug 11 16:07:27 2009 +0800
@@ -374,7 +374,7 @@ static void del_msixtbl_entry(struct msi
 
 int msixtbl_pt_register(struct domain *d, int pirq, uint64_t gtable)
 {
-    irq_desc_t *irq_desc;
+    struct irq_desc *irq_desc;
     struct msi_desc *msi_desc;
     struct pci_dev *pdev;
     struct msixtbl_entry *entry, *new_entry;
@@ -429,7 +429,7 @@ out:
 
 void msixtbl_pt_unregister(struct domain *d, int pirq)
 {
-    irq_desc_t *irq_desc;
+    struct irq_desc *irq_desc;
     struct msi_desc *msi_desc;
     struct pci_dev *pdev;
     struct msixtbl_entry *entry;
diff -r 6a639384fba6 -r 8584327c7e70 xen/arch/x86/i8259.c
--- a/xen/arch/x86/i8259.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/arch/x86/i8259.c	Tue Aug 11 16:07:27 2009 +0800
@@ -106,38 +106,28 @@ BUILD_SMP_INTERRUPT(cmci_interrupt, CMCI
 
 static DEFINE_SPINLOCK(i8259A_lock);
 
-static void disable_8259A_vector(unsigned int vector)
-{
-    disable_8259A_irq(LEGACY_IRQ_FROM_VECTOR(vector));
-}
-
-static void enable_8259A_vector(unsigned int vector)
-{
-    enable_8259A_irq(LEGACY_IRQ_FROM_VECTOR(vector));
-}
-
-static void mask_and_ack_8259A_vector(unsigned int);
-
-static void end_8259A_vector(unsigned int vector)
-{
-    if (!(irq_desc[vector].status & (IRQ_DISABLED|IRQ_INPROGRESS)))
-        enable_8259A_vector(vector);
-}
-
-static unsigned int startup_8259A_vector(unsigned int vector)
-{ 
-    enable_8259A_vector(vector);
+static void mask_and_ack_8259A_irq(unsigned int irq);
+
+static unsigned int startup_8259A_irq(unsigned int irq)
+{
+    enable_8259A_irq(irq);
     return 0; /* never anything pending */
+}
+
+static void end_8259A_irq(unsigned int irq)
+{
+    if (!(irq_desc[irq].status & (IRQ_DISABLED|IRQ_INPROGRESS)))
+        enable_8259A_irq(irq);
 }
 
 static struct hw_interrupt_type i8259A_irq_type = {
     .typename = "XT-PIC",
-    .startup  = startup_8259A_vector,
-    .shutdown = disable_8259A_vector,
-    .enable   = enable_8259A_vector,
-    .disable  = disable_8259A_vector,
-    .ack      = mask_and_ack_8259A_vector,
-    .end      = end_8259A_vector
+    .startup  = startup_8259A_irq,
+    .shutdown = disable_8259A_irq,
+    .enable   = enable_8259A_irq,
+    .disable  = disable_8259A_irq,
+    .ack      = mask_and_ack_8259A_irq,
+    .end      = end_8259A_irq
 };
 
 /*
@@ -237,9 +227,8 @@ static inline int i8259A_irq_real(unsign
  * first, _then_ send the EOI, and the order of EOI
  * to the two 8259s is important!
  */
-static void mask_and_ack_8259A_vector(unsigned int vector)
-{
-    unsigned int irq = LEGACY_IRQ_FROM_VECTOR(vector);
+static void mask_and_ack_8259A_irq(unsigned int irq)
+{
     unsigned int irqmask = 1 << irq;
     unsigned long flags;
 
@@ -369,9 +358,9 @@ void __devinit init_8259A(int auto_eoi)
          * in AEOI mode we just have to mask the interrupt
          * when acking.
          */
-        i8259A_irq_type.ack = disable_8259A_vector;
-    else
-        i8259A_irq_type.ack = mask_and_ack_8259A_vector;
+        i8259A_irq_type.ack = disable_8259A_irq;
+    else
+        i8259A_irq_type.ack = mask_and_ack_8259A_irq;
 
     udelay(100);            /* wait for 8259A to initialize */
 
@@ -385,31 +374,25 @@ static struct irqaction cascade = { no_a
 
 void __init init_IRQ(void)
 {
-    int i;
+    int i, vector;
 
     init_bsp_APIC();
 
     init_8259A(0);
 
-    for ( i = 0; i < NR_VECTORS; i++ )
+    BUG_ON(init_irq_data() < 0);
+
+    for ( vector = FIRST_DYNAMIC_VECTOR; vector < NR_VECTORS; vector++ )
     {
-        irq_desc[i].status  = IRQ_DISABLED;
-        irq_desc[i].handler = &no_irq_type;
-        irq_desc[i].action  = NULL;
-        irq_desc[i].depth   = 1;
-        spin_lock_init(&irq_desc[i].lock);
-        cpus_setall(irq_desc[i].affinity);
-        if ( i >= 0x20 )
-            set_intr_gate(i, interrupt[i]);
-    }
-
-    irq_vector = xmalloc_array(u8, nr_irqs_gsi);
-    memset(irq_vector, 0, nr_irqs_gsi * sizeof(*irq_vector));
+        if (vector == HYPERCALL_VECTOR || vector == LEGACY_SYSCALL_VECTOR)
+            continue;
+        set_intr_gate(vector, interrupt[vector]);
+    }
 
     for ( i = 0; i < 16; i++ )
     {
         vector_irq[LEGACY_VECTOR(i)] = i;
-        irq_desc[LEGACY_VECTOR(i)].handler = &i8259A_irq_type;
+        irq_desc[i].handler = &i8259A_irq_type;
     }
 
     /* Never allocate the hypercall vector or Linux/BSD fast-trap vector. */
diff -r 6a639384fba6 -r 8584327c7e70 xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/arch/x86/io_apic.c	Tue Aug 11 16:07:27 2009 +0800
@@ -661,9 +661,6 @@ static inline int IO_APIC_irq_trigger(in
     return 0;
 }
 
-/* irq_vectors is indexed by the sum of all RTEs in all I/O APICs. */
-u8 *irq_vector __read_mostly = (u8 *)(1UL << (BITS_PER_LONG - 1));
-
 static struct hw_interrupt_type ioapic_level_type;
 static struct hw_interrupt_type ioapic_edge_type;
 
@@ -671,13 +668,13 @@ static struct hw_interrupt_type ioapic_e
 #define IOAPIC_EDGE	0
 #define IOAPIC_LEVEL	1
 
-static inline void ioapic_register_intr(int irq, int vector, unsigned long trigger)
+static inline void ioapic_register_intr(int irq, unsigned long trigger)
 {
     if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
         trigger == IOAPIC_LEVEL)
-        irq_desc[vector].handler = &ioapic_level_type;
+        irq_desc[irq].handler = &ioapic_level_type;
     else
-        irq_desc[vector].handler = &ioapic_edge_type;
+        irq_desc[irq].handler = &ioapic_edge_type;
 }
 
 static void __init setup_IO_APIC_irqs(void)
@@ -740,7 +737,7 @@ static void __init setup_IO_APIC_irqs(vo
             if (IO_APIC_IRQ(irq)) {
                 vector = assign_irq_vector(irq);
                 entry.vector = vector;
-                ioapic_register_intr(irq, vector, IOAPIC_AUTO);
+                ioapic_register_intr(irq, IOAPIC_AUTO);
 
                 if (!apic && (irq < 16))
                     disable_8259A_irq(irq);
@@ -748,7 +745,7 @@ static void __init setup_IO_APIC_irqs(vo
             spin_lock_irqsave(&ioapic_lock, flags);
             io_apic_write(apic, 0x11+2*pin, *(((int *)&entry)+1));
             io_apic_write(apic, 0x10+2*pin, *(((int *)&entry)+0));
-            set_native_irq_info(entry.vector, TARGET_CPUS);
+            set_native_irq_info(irq, TARGET_CPUS);
             spin_unlock_irqrestore(&ioapic_lock, flags);
 	}
     }
@@ -788,7 +785,7 @@ static void __init setup_ExtINT_IRQ0_pin
      * The timer IRQ doesn't have to know that behind the
      * scene we have a 8259A-master in AEOI mode ...
      */
-    irq_desc[IO_APIC_VECTOR(0)].handler = &ioapic_edge_type;
+    irq_desc[0].handler = &ioapic_edge_type;
 
     /*
      * Add it to the IO-APIC irq-routing table:
@@ -1269,7 +1266,7 @@ static unsigned int startup_edge_ioapic_
  */
 static void ack_edge_ioapic_irq(unsigned int irq)
 {
-    if ((irq_desc[IO_APIC_VECTOR(irq)].status & (IRQ_PENDING | IRQ_DISABLED))
+    if ((irq_desc[irq].status & (IRQ_PENDING | IRQ_DISABLED))
         == (IRQ_PENDING | IRQ_DISABLED))
         mask_IO_APIC_irq(irq);
     ack_APIC_irq();
@@ -1359,7 +1356,7 @@ static void end_level_ioapic_irq (unsign
 
     if ( !ioapic_ack_new )
     {
-        if ( !(irq_desc[IO_APIC_VECTOR(irq)].status & IRQ_DISABLED) )
+        if ( !(irq_desc[irq].status & IRQ_DISABLED) )
             unmask_IO_APIC_irq(irq);
         return;
     }
@@ -1395,70 +1392,19 @@ static void end_level_ioapic_irq (unsign
         __mask_IO_APIC_irq(irq);
         __edge_IO_APIC_irq(irq);
         __level_IO_APIC_irq(irq);
-        if ( !(irq_desc[IO_APIC_VECTOR(irq)].status & IRQ_DISABLED) )
+        if ( !(irq_desc[irq].status & IRQ_DISABLED) )
             __unmask_IO_APIC_irq(irq);
         spin_unlock(&ioapic_lock);
     }
 }
 
-static unsigned int startup_edge_ioapic_vector(unsigned int vector)
-{
-    int irq = vector_to_irq(vector);
-    return startup_edge_ioapic_irq(irq);
-}
-
-static void ack_edge_ioapic_vector(unsigned int vector)
-{
-    int irq = vector_to_irq(vector);
-    ack_edge_ioapic_irq(irq);
-}
-
-static unsigned int startup_level_ioapic_vector(unsigned int vector)
-{
-    int irq = vector_to_irq(vector);
-    return startup_level_ioapic_irq (irq);
-}
-
-static void mask_and_ack_level_ioapic_vector(unsigned int vector)
-{
-    int irq = vector_to_irq(vector);
-    mask_and_ack_level_ioapic_irq(irq);
-}
-
-static void end_level_ioapic_vector(unsigned int vector)
-{
-    int irq = vector_to_irq(vector);
-    end_level_ioapic_irq(irq);
-}
-
-static void mask_IO_APIC_vector(unsigned int vector)
-{
-    int irq = vector_to_irq(vector);
-    mask_IO_APIC_irq(irq);
-}
-
-static void unmask_IO_APIC_vector(unsigned int vector)
-{
-    int irq = vector_to_irq(vector);
-    unmask_IO_APIC_irq(irq);
-}
-
-static void set_ioapic_affinity_vector(
-    unsigned int vector, cpumask_t cpu_mask)
-{
-    int irq = vector_to_irq(vector);
-
-    set_native_irq_info(vector, cpu_mask);
-    set_ioapic_affinity_irq(irq, cpu_mask);
-}
-
-static void disable_edge_ioapic_vector(unsigned int vector)
-{
-}
-
-static void end_edge_ioapic_vector(unsigned int vector)
-{
-}
+static void disable_edge_ioapic_irq(unsigned int irq)
+{
+}
+
+static void end_edge_ioapic_irq(unsigned int irq)
+{
+}
 
 /*
  * Level and edge triggered IO-APIC interrupts need different handling,
@@ -1470,53 +1416,54 @@ static void end_edge_ioapic_vector(unsig
  */
 static struct hw_interrupt_type ioapic_edge_type = {
     .typename 	= "IO-APIC-edge",
-    .startup 	= startup_edge_ioapic_vector,
-    .shutdown 	= disable_edge_ioapic_vector,
-    .enable 	= unmask_IO_APIC_vector,
-    .disable 	= disable_edge_ioapic_vector,
-    .ack 		= ack_edge_ioapic_vector,
-    .end 		= end_edge_ioapic_vector,
-    .set_affinity 	= set_ioapic_affinity_vector,
+    .startup 	= startup_edge_ioapic_irq,
+    .shutdown 	= disable_edge_ioapic_irq,
+    .enable 	= unmask_IO_APIC_irq,
+    .disable 	= disable_edge_ioapic_irq,
+    .ack 		= ack_edge_ioapic_irq,
+    .end 		= end_edge_ioapic_irq,
+    .set_affinity 	= set_ioapic_affinity_irq,
 };
 
 static struct hw_interrupt_type ioapic_level_type = {
     .typename 	= "IO-APIC-level",
-    .startup 	= startup_level_ioapic_vector,
-    .shutdown 	= mask_IO_APIC_vector,
-    .enable 	= unmask_IO_APIC_vector,
-    .disable 	= mask_IO_APIC_vector,
-    .ack 		= mask_and_ack_level_ioapic_vector,
-    .end 		= end_level_ioapic_vector,
-    .set_affinity 	= set_ioapic_affinity_vector,
+    .startup 	= startup_level_ioapic_irq,
+    .shutdown 	= mask_IO_APIC_irq,
+    .enable 	= unmask_IO_APIC_irq,
+    .disable 	= mask_IO_APIC_irq,
+    .ack 		= mask_and_ack_level_ioapic_irq,
+    .end 		= end_level_ioapic_irq,
+    .set_affinity 	= set_ioapic_affinity_irq,
 };
 
-static unsigned int startup_msi_vector(unsigned int vector)
-{
-    unmask_msi_vector(vector);
+static unsigned int startup_msi_irq(unsigned int irq)
+{
+    unmask_msi_irq(irq);
     return 0;
 }
 
-static void ack_msi_vector(unsigned int vector)
-{
-    if ( msi_maskable_irq(irq_desc[vector].msi_desc) )
+static void ack_msi_irq(unsigned int irq)
+{
+    struct irq_desc *desc = irq_to_desc(irq);
+
+    if ( msi_maskable_irq(desc->msi_desc) )
         ack_APIC_irq(); /* ACKTYPE_NONE */
 }
 
-static void end_msi_vector(unsigned int vector)
-{
-    if ( !msi_maskable_irq(irq_desc[vector].msi_desc) )
+static void end_msi_irq(unsigned int irq)
+{
+    if ( !msi_maskable_irq(irq_desc[irq].msi_desc) )
         ack_APIC_irq(); /* ACKTYPE_EOI */
 }
 
-static void shutdown_msi_vector(unsigned int vector)
-{
-    mask_msi_vector(vector);
-}
-
-static void set_msi_affinity_vector(unsigned int vector, cpumask_t cpu_mask)
-{
-    set_native_irq_info(vector, cpu_mask);
-    set_msi_affinity(vector, cpu_mask);
+static void shutdown_msi_irq(unsigned int irq)
+{
+    mask_msi_irq(irq);
+}
+
+static void set_msi_affinity_irq(unsigned int irq, cpumask_t cpu_mask)
+{
+    set_msi_affinity(irq, cpu_mask);
 }
 
 /*
@@ -1525,13 +1472,13 @@ static void set_msi_affinity_vector(unsi
  */
 struct hw_interrupt_type pci_msi_type = {
     .typename   = "PCI-MSI",
-    .startup    = startup_msi_vector,
-    .shutdown   = shutdown_msi_vector,
-    .enable	    = unmask_msi_vector,
-    .disable    = mask_msi_vector,
-    .ack        = ack_msi_vector,
-    .end        = end_msi_vector,
-    .set_affinity   = set_msi_affinity_vector,
+    .startup    = startup_msi_irq,
+    .shutdown   = shutdown_msi_irq,
+    .enable	    = unmask_msi_irq,
+    .disable    = mask_msi_irq,
+    .ack        = ack_msi_irq,
+    .end        = end_msi_irq,
+    .set_affinity   = set_msi_affinity_irq,
 };
 
 static inline void init_IO_APIC_traps(void)
@@ -1543,7 +1490,7 @@ static inline void init_IO_APIC_traps(vo
             make_8259A_irq(irq);
 }
 
-static void enable_lapic_vector(unsigned int vector)
+static void enable_lapic_irq(unsigned int irq)
 {
     unsigned long v;
 
@@ -1551,7 +1498,7 @@ static void enable_lapic_vector(unsigned
     apic_write_around(APIC_LVT0, v & ~APIC_LVT_MASKED);
 }
 
-static void disable_lapic_vector(unsigned int vector)
+static void disable_lapic_irq(unsigned int irq)
 {
     unsigned long v;
 
@@ -1559,21 +1506,21 @@ static void disable_lapic_vector(unsigne
     apic_write_around(APIC_LVT0, v | APIC_LVT_MASKED);
 }
 
-static void ack_lapic_vector(unsigned int vector)
+static void ack_lapic_irq(unsigned int irq)
 {
     ack_APIC_irq();
 }
 
-static void end_lapic_vector(unsigned int vector) { /* nothing */ }
+static void end_lapic_irq(unsigned int irq) { /* nothing */ }
 
 static struct hw_interrupt_type lapic_irq_type = {
     .typename 	= "local-APIC-edge",
     .startup 	= NULL, /* startup_irq() not used for IRQ0 */
     .shutdown 	= NULL, /* shutdown_irq() not used for IRQ0 */
-    .enable 	= enable_lapic_vector,
-    .disable 	= disable_lapic_vector,
-    .ack 		= ack_lapic_vector,
-    .end 		= end_lapic_vector
+    .enable 	= enable_lapic_irq,
+    .disable 	= disable_lapic_irq,
+    .ack 		= ack_lapic_irq,
+    .end 		= end_lapic_irq,
 };
 
 /*
@@ -1661,9 +1608,9 @@ static inline void check_timer(void)
     disable_8259A_irq(0);
     vector = assign_irq_vector(0);
 
-    irq_desc[IO_APIC_VECTOR(0)].action = irq_desc[LEGACY_VECTOR(0)].action;
-    irq_desc[IO_APIC_VECTOR(0)].depth  = 0;
-    irq_desc[IO_APIC_VECTOR(0)].status &= ~IRQ_DISABLED;
+    irq_desc[0].depth  = 0;
+    irq_desc[0].status &= ~IRQ_DISABLED;
+    irq_desc[0].handler = &ioapic_edge_type;
 
     /*
      * Subtle, code in do_timer_interrupt() expects an AEOI
@@ -1736,7 +1683,7 @@ static inline void check_timer(void)
     printk(KERN_INFO "...trying to set up timer as Virtual Wire IRQ...");
 
     disable_8259A_irq(0);
-    irq_desc[vector].handler = &lapic_irq_type;
+    irq_desc[0].handler = &lapic_irq_type;
     apic_write_around(APIC_LVT0, APIC_DM_FIXED | vector);	/* Fixed mode */
     enable_8259A_irq(0);
 
@@ -2002,7 +1949,7 @@ int io_apic_set_pci_routing (int ioapic,
 		mp_ioapics[ioapic].mpc_apicid, pin, entry.vector, irq,
 		edge_level, active_high_low);
 
-    ioapic_register_intr(irq, entry.vector, edge_level);
+    ioapic_register_intr(irq, edge_level);
 
     if (!ioapic && (irq < 16))
         disable_8259A_irq(irq);
@@ -2010,7 +1957,7 @@ int io_apic_set_pci_routing (int ioapic,
     spin_lock_irqsave(&ioapic_lock, flags);
     io_apic_write(ioapic, 0x11+2*pin, *(((int *)&entry)+1));
     io_apic_write(ioapic, 0x10+2*pin, *(((int *)&entry)+0));
-    set_native_irq_info(entry.vector, TARGET_CPUS);
+    set_native_irq_info(irq, TARGET_CPUS);
     spin_unlock_irqrestore(&ioapic_lock, flags);
 
     return 0;
@@ -2114,12 +2061,13 @@ int ioapic_guest_write(unsigned long phy
 
     if ( old_rte.vector >= FIRST_DYNAMIC_VECTOR )
         old_irq = vector_irq[old_rte.vector];
+
     if ( new_rte.vector >= FIRST_DYNAMIC_VECTOR )
         new_irq = vector_irq[new_rte.vector];
 
     if ( (old_irq != new_irq) && (old_irq >= 0) && IO_APIC_IRQ(old_irq) )
     {
-        if ( irq_desc[IO_APIC_VECTOR(old_irq)].action )
+        if ( irq_desc[old_irq].action )
         {
             WARN_BOGUS_WRITE("Attempt to remove IO-APIC pin of in-use IRQ!\n");
             spin_unlock_irqrestore(&ioapic_lock, flags);
@@ -2131,7 +2079,7 @@ int ioapic_guest_write(unsigned long phy
 
     if ( (new_irq >= 0) && IO_APIC_IRQ(new_irq) )
     {
-        if ( irq_desc[IO_APIC_VECTOR(new_irq)].action )
+        if ( irq_desc[new_irq].action )
         {
             WARN_BOGUS_WRITE("Attempt to %s IO-APIC pin for in-use IRQ!\n",
                              (old_irq != new_irq) ? "add" : "modify");
@@ -2140,7 +2088,7 @@ int ioapic_guest_write(unsigned long phy
         }
         
         /* Set the correct irq-handling type. */
-        irq_desc[IO_APIC_VECTOR(new_irq)].handler = new_rte.trigger ? 
+        irq_desc[new_irq].handler = new_rte.trigger ? 
             &ioapic_level_type: &ioapic_edge_type;
         
         if ( old_irq != new_irq )
@@ -2252,11 +2200,17 @@ void __init init_ioapic_mappings(void)
     }
     if ( !smp_found_config || skip_ioapic_setup || nr_irqs_gsi < 16 )
         nr_irqs_gsi = 16;
-    else if ( nr_irqs_gsi > PAGE_SIZE * 8 )
+    else if ( nr_irqs_gsi > MAX_GSI_IRQS)
     {
         /* for PHYSDEVOP_pirq_eoi_gmfn guest assumptions */
-        printk(KERN_WARNING "Limiting number of IRQs found (%u) to %lu\n",
-               nr_irqs_gsi, PAGE_SIZE * 8);
-        nr_irqs_gsi = PAGE_SIZE * 8;
-    }
-}
+        printk(KERN_WARNING "Limiting number of GSI IRQs found (%u) to %lu\n",
+               nr_irqs_gsi, MAX_GSI_IRQS);
+        nr_irqs_gsi = MAX_GSI_IRQS;
+    }
+
+    if (nr_irqs < 2 * nr_irqs_gsi)
+        nr_irqs = 2 * nr_irqs_gsi;
+
+    if (nr_irqs > MAX_NR_IRQS)
+        nr_irqs = MAX_NR_IRQS;
+}
diff -r 6a639384fba6 -r 8584327c7e70 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/arch/x86/irq.c	Tue Aug 11 16:07:27 2009 +0800
@@ -27,13 +27,162 @@ boolean_param("noirqbalance", opt_noirqb
 boolean_param("noirqbalance", opt_noirqbalance);
 
 unsigned int __read_mostly nr_irqs_gsi = 16;
-irq_desc_t irq_desc[NR_VECTORS];
+unsigned int __read_mostly nr_irqs = 1024;
+integer_param("nr_irqs", nr_irqs);
+
+u8 __read_mostly *irq_vector;
+struct irq_desc __read_mostly *irq_desc = NULL;
+
+int __read_mostly *irq_status = NULL;
+#define IRQ_UNUSED      (0)
+#define IRQ_USED        (1)
+#define IRQ_RSVD        (2)
+
+static struct timer *irq_guest_eoi_timer;
 
 static DEFINE_SPINLOCK(vector_lock);
 int vector_irq[NR_VECTORS] __read_mostly = {
     [0 ... NR_VECTORS - 1] = FREE_TO_ASSIGN_IRQ
 };
 
+static inline int find_unassigned_irq(void)
+{
+    int irq;
+
+    for (irq = nr_irqs_gsi; irq < nr_irqs; irq++)
+        if (irq_status[irq] == IRQ_UNUSED)
+            return irq;
+    return -ENOSPC;
+}
+
+/*
+ * Dynamic irq allocation and deallocation for MSI
+ */
+int create_irq(void)
+{
+    unsigned long flags;
+    int irq, ret;
+    irq = -ENOSPC;
+
+    spin_lock_irqsave(&vector_lock, flags);
+
+    irq = find_unassigned_irq();
+    if (irq < 0)
+         goto out;
+    ret = __assign_irq_vector(irq);
+    if (ret < 0)
+        irq = ret;
+out:
+     spin_unlock_irqrestore(&vector_lock, flags);
+
+    return irq;
+}
+
+void dynamic_irq_cleanup(unsigned int irq)
+{
+    struct irq_desc *desc = irq_to_desc(irq);
+    struct irqaction *action;
+    unsigned long flags;
+
+    spin_lock_irqsave(&desc->lock, flags);
+    desc->status  |= IRQ_DISABLED;
+    desc->handler->shutdown(irq);
+    action = desc->action;
+    desc->action  = NULL;
+    desc->depth   = 1;
+    desc->msi_desc = NULL;
+    desc->handler = &no_irq_type;
+    cpus_setall(desc->affinity);
+    spin_unlock_irqrestore(&desc->lock, flags);
+
+    /* Wait to make sure it's not being used on another CPU */
+    do { smp_mb(); } while ( desc->status & IRQ_INPROGRESS );
+
+    if (action)
+        xfree(action);
+}
+
+static void __clear_irq_vector(int irq)
+{
+    int vector = irq_vector[irq];
+    vector_irq[vector] = FREE_TO_ASSIGN_IRQ;
+    irq_vector[irq] = 0;
+    irq_status[irq] = IRQ_UNUSED;
+}
+
+void clear_irq_vector(int irq)
+{
+    unsigned long flags;
+
+    spin_lock_irqsave(&vector_lock, flags);
+    __clear_irq_vector(irq);
+    spin_unlock_irqrestore(&vector_lock, flags);
+}
+
+void destroy_irq(unsigned int irq)
+{
+    dynamic_irq_cleanup(irq);
+    clear_irq_vector(irq);
+}
+
+int irq_to_vector(int irq)
+{
+    int vector = -1;
+
+    BUG_ON(irq >= nr_irqs || irq < 0);
+
+    if (IO_APIC_IRQ(irq) || MSI_IRQ(irq))
+        vector = irq_vector[irq];
+    else
+        vector = LEGACY_VECTOR(irq);
+
+    return vector;
+}
+
+static void init_one_irq_desc(struct irq_desc *desc)
+{
+        desc->status  = IRQ_DISABLED;
+        desc->handler = &no_irq_type;
+        desc->action  = NULL;
+        desc->depth   = 1;
+        desc->msi_desc = NULL;
+        spin_lock_init(&desc->lock);
+        cpus_setall(desc->affinity);
+}
+
+static void init_one_irq_status(int irq)
+{
+    irq_status[irq] = IRQ_UNUSED;
+}
+
+int init_irq_data(void)
+{
+    struct irq_desc *desc;
+    int irq;
+
+    irq_desc = xmalloc_array(struct irq_desc, nr_irqs);
+    irq_status = xmalloc_array(int, nr_irqs);
+    irq_guest_eoi_timer = xmalloc_array(struct timer, nr_irqs);
+    irq_vector = xmalloc_array(u8, nr_irqs);
+    
+    if (!irq_desc || !irq_status ||! irq_vector || !irq_guest_eoi_timer)
+        return -1;
+
+    memset(irq_desc, 0,  nr_irqs * sizeof(*irq_desc));
+    memset(irq_status, 0,  nr_irqs * sizeof(*irq_status));
+    memset(irq_vector, 0, nr_irqs * sizeof(*irq_vector));
+    memset(irq_guest_eoi_timer, 0, nr_irqs * sizeof(*irq_guest_eoi_timer));
+    
+    for (irq = 0; irq < nr_irqs; irq++) {
+        desc = irq_to_desc(irq);
+        desc->irq = irq;
+        init_one_irq_desc(desc);
+        init_one_irq_status(irq);
+    }
+
+    return 0;
+}
+
 static void __do_IRQ_guest(int vector);
 
 void no_action(int cpl, void *dev_id, struct cpu_user_regs *regs) { }
@@ -41,9 +190,9 @@ static void enable_none(unsigned int vec
 static void enable_none(unsigned int vector) { }
 static unsigned int startup_none(unsigned int vector) { return 0; }
 static void disable_none(unsigned int vector) { }
-static void ack_none(unsigned int vector)
-{
-    ack_bad_irq(vector);
+static void ack_none(unsigned int irq)
+{
+    ack_bad_irq(irq);
 }
 
 #define shutdown_none   disable_none
@@ -61,33 +210,15 @@ struct hw_interrupt_type no_irq_type = {
 
 atomic_t irq_err_count;
 
-int free_irq_vector(int vector)
-{
-    int irq;
-
-    BUG_ON((vector > LAST_DYNAMIC_VECTOR) || (vector < FIRST_DYNAMIC_VECTOR));
-
-    spin_lock(&vector_lock);
-    if ((irq = vector_irq[vector]) == AUTO_ASSIGN_IRQ)
-        vector_irq[vector] = FREE_TO_ASSIGN_IRQ;
-    spin_unlock(&vector_lock);
-
-    return (irq == AUTO_ASSIGN_IRQ) ? 0 : -EINVAL;
-}
-
-int assign_irq_vector(int irq)
+int __assign_irq_vector(int irq)
 {
     static unsigned current_vector = FIRST_DYNAMIC_VECTOR;
     unsigned vector;
 
-    BUG_ON(irq >= nr_irqs_gsi && irq != AUTO_ASSIGN_IRQ);
-
-    spin_lock(&vector_lock);
-
-    if ((irq != AUTO_ASSIGN_IRQ) && (irq_to_vector(irq) > 0)) {
-        spin_unlock(&vector_lock);
+    BUG_ON(irq >= nr_irqs || irq < 0);
+
+    if ((irq_to_vector(irq) > 0)) 
         return irq_to_vector(irq);
-    }
 
     vector = current_vector;
     while (vector_irq[vector] != FREE_TO_ASSIGN_IRQ) {
@@ -95,40 +226,59 @@ int assign_irq_vector(int irq)
         if (vector > LAST_DYNAMIC_VECTOR)
             vector = FIRST_DYNAMIC_VECTOR + ((vector + 1) & 7);
 
-        if (vector == current_vector) {
-            spin_unlock(&vector_lock);
+        if (vector == current_vector)
             return -ENOSPC;
-        }
     }
 
     current_vector = vector;
     vector_irq[vector] = irq;
-    if (irq != AUTO_ASSIGN_IRQ)
-        IO_APIC_VECTOR(irq) = vector;
-
-    spin_unlock(&vector_lock);
+    irq_vector[irq] = vector;
+    irq_status[irq] = IRQ_USED;
 
     return vector;
 }
 
+int assign_irq_vector(int irq)
+{
+    int ret;
+    unsigned long flags;
+    
+    spin_lock_irqsave(&vector_lock, flags);
+    ret = __assign_irq_vector(irq);
+    spin_unlock_irqrestore(&vector_lock, flags);
+
+    return ret;
+}
+
+
 asmlinkage void do_IRQ(struct cpu_user_regs *regs)
 {
-    unsigned int      vector = regs->entry_vector;
-    irq_desc_t       *desc = &irq_desc[vector];
     struct irqaction *action;
     uint32_t          tsc_in;
-
+    unsigned int      vector = regs->entry_vector;
+    int irq = vector_irq[vector];
+    struct irq_desc  *desc;
+    
     perfc_incr(irqs);
 
+    if (irq < 0) {
+        ack_APIC_irq();
+        printk("%s: %d.%d No irq handler for vector (irq %d)\n",
+                __func__, smp_processor_id(), vector, irq);
+        return;
+    }
+
+    desc = irq_to_desc(irq);
+
     spin_lock(&desc->lock);
-    desc->handler->ack(vector);
+    desc->handler->ack(irq);
 
     if ( likely(desc->status & IRQ_GUEST) )
     {
         irq_enter();
         tsc_in = tb_init_done ? get_cycles() : 0;
-        __do_IRQ_guest(vector);
-        TRACE_3D(TRC_TRACE_IRQ, vector, tsc_in, get_cycles());
+        __do_IRQ_guest(irq);
+        TRACE_3D(TRC_TRACE_IRQ, irq, tsc_in, get_cycles());
         irq_exit();
         spin_unlock(&desc->lock);
         return;
@@ -153,8 +303,8 @@ asmlinkage void do_IRQ(struct cpu_user_r
         irq_enter();
         spin_unlock_irq(&desc->lock);
         tsc_in = tb_init_done ? get_cycles() : 0;
-        action->handler(vector_to_irq(vector), action->dev_id, regs);
-        TRACE_3D(TRC_TRACE_IRQ, vector, tsc_in, get_cycles());
+        action->handler(irq, action->dev_id, regs);
+        TRACE_3D(TRC_TRACE_IRQ, irq, tsc_in, get_cycles());
         spin_lock_irq(&desc->lock);
         irq_exit();
     }
@@ -162,11 +312,11 @@ asmlinkage void do_IRQ(struct cpu_user_r
     desc->status &= ~IRQ_INPROGRESS;
 
  out:
-    desc->handler->end(vector);
+    desc->handler->end(irq);
     spin_unlock(&desc->lock);
 }
 
-int request_irq_vector(unsigned int vector,
+int request_irq(unsigned int irq,
         void (*handler)(int, void *, struct cpu_user_regs *),
         unsigned long irqflags, const char * devname, void *dev_id)
 {
@@ -179,7 +329,7 @@ int request_irq_vector(unsigned int vect
      * which interrupt is which (messes up the interrupt freeing
      * logic etc).
      */
-    if (vector >= NR_VECTORS)
+    if (irq >= nr_irqs)
         return -EINVAL;
     if (!handler)
         return -EINVAL;
@@ -192,33 +342,42 @@ int request_irq_vector(unsigned int vect
     action->name = devname;
     action->dev_id = dev_id;
 
-    retval = setup_irq_vector(vector, action);
+    retval = setup_irq(irq, action);
     if (retval)
         xfree(action);
 
     return retval;
 }
 
-void release_irq_vector(unsigned int vector)
-{
-    irq_desc_t *desc = &irq_desc[vector];
+void release_irq(unsigned int irq)
+{
+    struct irq_desc *desc;
     unsigned long flags;
+    struct irqaction *action;
+
+    desc = irq_to_desc(irq);
 
     spin_lock_irqsave(&desc->lock,flags);
+    action = desc->action;
     desc->action  = NULL;
     desc->depth   = 1;
     desc->status |= IRQ_DISABLED;
-    desc->handler->shutdown(vector);
+    desc->handler->shutdown(irq);
     spin_unlock_irqrestore(&desc->lock,flags);
 
     /* Wait to make sure it's not being used on another CPU */
     do { smp_mb(); } while ( desc->status & IRQ_INPROGRESS );
-}
-
-int setup_irq_vector(unsigned int vector, struct irqaction *new)
-{
-    irq_desc_t *desc = &irq_desc[vector];
+
+    if (action)
+        xfree(action);
+}
+
+int setup_irq(unsigned int irq, struct irqaction *new)
+{
+    struct irq_desc *desc;
     unsigned long flags;
+
+    desc = irq_to_desc(irq);
  
     spin_lock_irqsave(&desc->lock,flags);
 
@@ -231,7 +390,7 @@ int setup_irq_vector(unsigned int vector
     desc->action  = new;
     desc->depth   = 0;
     desc->status &= ~IRQ_DISABLED;
-    desc->handler->startup(vector);
+    desc->handler->startup(irq);
 
     spin_unlock_irqrestore(&desc->lock,flags);
 
@@ -261,9 +420,10 @@ typedef struct {
  * order, as only the current highest-priority pending irq can be EOIed.
  */
 struct pending_eoi {
-    u8 vector; /* Vector awaiting EOI */
+    u8 vector; /* vector awaiting EOI */
     u8 ready;  /* Ready for EOI now?  */
 };
+
 static DEFINE_PER_CPU(struct pending_eoi, pending_eoi[NR_VECTORS]);
 #define pending_eoi_sp(p) ((p)[NR_VECTORS-1].vector)
 
@@ -279,26 +439,25 @@ static inline void clear_pirq_eoi(struct
         clear_bit(irq, d->arch.pirq_eoi_map);
 }
 
-static void _irq_guest_eoi(irq_desc_t *desc)
+static void _irq_guest_eoi(struct irq_desc *desc)
 {
     irq_guest_action_t *action = (irq_guest_action_t *)desc->action;
-    unsigned int i, vector = desc - irq_desc;
+    unsigned int i, irq = desc - irq_desc;
 
     if ( !(desc->status & IRQ_GUEST_EOI_PENDING) )
         return;
 
     for ( i = 0; i < action->nr_guests; ++i )
         clear_pirq_eoi(action->guest[i],
-                       domain_vector_to_irq(action->guest[i], vector));
+                       domain_irq_to_pirq(action->guest[i], irq));
 
     desc->status &= ~(IRQ_INPROGRESS|IRQ_GUEST_EOI_PENDING);
-    desc->handler->enable(vector);
-}
-
-static struct timer irq_guest_eoi_timer[NR_VECTORS];
+    desc->handler->enable(irq);
+}
+
 static void irq_guest_eoi_timer_fn(void *data)
 {
-    irq_desc_t *desc = data;
+    struct irq_desc *desc = data;
     unsigned long flags;
 
     spin_lock_irqsave(&desc->lock, flags);
@@ -306,20 +465,21 @@ static void irq_guest_eoi_timer_fn(void 
     spin_unlock_irqrestore(&desc->lock, flags);
 }
 
-static void __do_IRQ_guest(int vector)
-{
-    irq_desc_t         *desc = &irq_desc[vector];
+static void __do_IRQ_guest(int irq)
+{
+    struct irq_desc         *desc = irq_to_desc(irq);
     irq_guest_action_t *action = (irq_guest_action_t *)desc->action;
     struct domain      *d;
     int                 i, sp, already_pending = 0;
     struct pending_eoi *peoi = this_cpu(pending_eoi);
+    int vector = irq_to_vector(irq);
 
     if ( unlikely(action->nr_guests == 0) )
     {
         /* An interrupt may slip through while freeing an ACKTYPE_EOI irq. */
         ASSERT(action->ack_type == ACKTYPE_EOI);
         ASSERT(desc->status & IRQ_DISABLED);
-        desc->handler->end(vector);
+        desc->handler->end(irq);
         return;
     }
 
@@ -336,13 +496,13 @@ static void __do_IRQ_guest(int vector)
 
     for ( i = 0; i < action->nr_guests; i++ )
     {
-        unsigned int irq;
+        unsigned int pirq;
         d = action->guest[i];
-        irq = domain_vector_to_irq(d, vector);
+        pirq = domain_irq_to_pirq(d, irq);
         if ( (action->ack_type != ACKTYPE_NONE) &&
-             !test_and_set_bit(irq, d->pirq_mask) )
+             !test_and_set_bit(pirq, d->pirq_mask) )
             action->in_flight++;
-        if ( hvm_do_IRQ_dpci(d, irq) )
+        if ( hvm_do_IRQ_dpci(d, pirq) )
         {
             if ( action->ack_type == ACKTYPE_NONE )
             {
@@ -350,7 +510,7 @@ static void __do_IRQ_guest(int vector)
                 desc->status |= IRQ_INPROGRESS; /* cleared during hvm eoi */
             }
         }
-        else if ( send_guest_pirq(d, irq) &&
+        else if ( send_guest_pirq(d, pirq) &&
                   (action->ack_type == ACKTYPE_NONE) )
         {
             already_pending++;
@@ -359,13 +519,13 @@ static void __do_IRQ_guest(int vector)
 
     if ( already_pending == action->nr_guests )
     {
-        stop_timer(&irq_guest_eoi_timer[vector]);
-        desc->handler->disable(vector);
+        stop_timer(&irq_guest_eoi_timer[irq]);
+        desc->handler->disable(irq);
         desc->status |= IRQ_GUEST_EOI_PENDING;
         for ( i = 0; i < already_pending; ++i )
         {
             d = action->guest[i];
-            set_pirq_eoi(d, domain_vector_to_irq(d, vector));
+            set_pirq_eoi(d, domain_irq_to_pirq(d, irq));
             /*
              * Could check here whether the guest unmasked the event by now
              * (or perhaps just re-issue the send_guest_pirq()), and if it
@@ -375,9 +535,9 @@ static void __do_IRQ_guest(int vector)
              * - skip the timer setup below.
              */
         }
-        init_timer(&irq_guest_eoi_timer[vector],
+        init_timer(&irq_guest_eoi_timer[irq],
                    irq_guest_eoi_timer_fn, desc, smp_processor_id());
-        set_timer(&irq_guest_eoi_timer[vector], NOW() + MILLISECS(1));
+        set_timer(&irq_guest_eoi_timer[irq], NOW() + MILLISECS(1));
     }
 }
 
@@ -386,21 +546,21 @@ static void __do_IRQ_guest(int vector)
  * The descriptor is returned locked. This function is safe against changes
  * to the per-domain irq-to-vector mapping.
  */
-irq_desc_t *domain_spin_lock_irq_desc(
-    struct domain *d, int irq, unsigned long *pflags)
-{
-    unsigned int vector;
+struct irq_desc *domain_spin_lock_irq_desc(
+    struct domain *d, int pirq, unsigned long *pflags)
+{
+    unsigned int irq;
     unsigned long flags;
-    irq_desc_t *desc;
+    struct irq_desc *desc;
 
     for ( ; ; )
     {
-        vector = domain_irq_to_vector(d, irq);
-        if ( vector <= 0 )
+        irq = domain_pirq_to_irq(d, pirq);
+        if ( irq <= 0 )
             return NULL;
-        desc = &irq_desc[vector];
+        desc = irq_to_desc(irq);
         spin_lock_irqsave(&desc->lock, flags);
-        if ( vector == domain_irq_to_vector(d, irq) )
+        if ( irq == domain_pirq_to_irq(d, pirq) )
             break;
         spin_unlock_irqrestore(&desc->lock, flags);
     }
@@ -414,8 +574,8 @@ static void flush_ready_eoi(void)
 static void flush_ready_eoi(void)
 {
     struct pending_eoi *peoi = this_cpu(pending_eoi);
-    irq_desc_t         *desc;
-    int                 vector, sp;
+    struct irq_desc         *desc;
+    int                irq, sp;
 
     ASSERT(!local_irq_is_enabled());
 
@@ -423,23 +583,23 @@ static void flush_ready_eoi(void)
 
     while ( (--sp >= 0) && peoi[sp].ready )
     {
-        vector = peoi[sp].vector;
-        desc = &irq_desc[vector];
+        irq = vector_irq[peoi[sp].vector];
+        desc = irq_to_desc(irq);
         spin_lock(&desc->lock);
-        desc->handler->end(vector);
+        desc->handler->end(irq);
         spin_unlock(&desc->lock);
     }
 
     pending_eoi_sp(peoi) = sp+1;
 }
 
-static void __set_eoi_ready(irq_desc_t *desc)
+static void __set_eoi_ready(struct irq_desc *desc)
 {
     irq_guest_action_t *action = (irq_guest_action_t *)desc->action;
     struct pending_eoi *peoi = this_cpu(pending_eoi);
-    int                 vector, sp;
-
-    vector = desc - irq_desc;
+    int                 irq, sp;
+
+    irq = desc - irq_desc;
 
     if ( !(desc->status & IRQ_GUEST) ||
          (action->in_flight != 0) ||
@@ -449,7 +609,7 @@ static void __set_eoi_ready(irq_desc_t *
     sp = pending_eoi_sp(peoi);
     do {
         ASSERT(sp > 0);
-    } while ( peoi[--sp].vector != vector );
+    } while ( peoi[--sp].vector != irq_to_vector(irq) );
     ASSERT(!peoi[sp].ready);
     peoi[sp].ready = 1;
 }
@@ -457,7 +617,7 @@ static void __set_eoi_ready(irq_desc_t *
 /* Mark specified IRQ as ready-for-EOI (if it really is) and attempt to EOI. */
 static void set_eoi_ready(void *data)
 {
-    irq_desc_t *desc = data;
+    struct irq_desc *desc = data;
 
     ASSERT(!local_irq_is_enabled());
 
@@ -468,29 +628,29 @@ static void set_eoi_ready(void *data)
     flush_ready_eoi();
 }
 
-static void __pirq_guest_eoi(struct domain *d, int irq)
-{
-    irq_desc_t         *desc;
+static void __pirq_guest_eoi(struct domain *d, int pirq)
+{
+    struct irq_desc         *desc;
     irq_guest_action_t *action;
     cpumask_t           cpu_eoi_map;
-    int                 vector;
+    int                 irq;
 
     ASSERT(local_irq_is_enabled());
-    desc = domain_spin_lock_irq_desc(d, irq, NULL);
+    desc = domain_spin_lock_irq_desc(d, pirq, NULL);
     if ( desc == NULL )
         return;
 
     action = (irq_guest_action_t *)desc->action;
-    vector = desc - irq_desc;
+    irq = desc - irq_desc;
 
     if ( action->ack_type == ACKTYPE_NONE )
     {
-        ASSERT(!test_bit(irq, d->pirq_mask));
-        stop_timer(&irq_guest_eoi_timer[vector]);
+        ASSERT(!test_bit(pirq, d->pirq_mask));
+        stop_timer(&irq_guest_eoi_timer[irq]);
         _irq_guest_eoi(desc);
     }
 
-    if ( unlikely(!test_and_clear_bit(irq, d->pirq_mask)) ||
+    if ( unlikely(!test_and_clear_bit(pirq, d->pirq_mask)) ||
          unlikely(--action->in_flight != 0) )
     {
         spin_unlock_irq(&desc->lock);
@@ -500,7 +660,7 @@ static void __pirq_guest_eoi(struct doma
     if ( action->ack_type == ACKTYPE_UNMASK )
     {
         ASSERT(cpus_empty(action->cpu_eoi_map));
-        desc->handler->end(vector);
+        desc->handler->end(irq);
         spin_unlock_irq(&desc->lock);
         return;
     }
@@ -527,7 +687,7 @@ static void __pirq_guest_eoi(struct doma
 
 int pirq_guest_eoi(struct domain *d, int irq)
 {
-    if ( (irq < 0) || (irq >= d->nr_pirqs) )
+    if ( (irq < 0) || (irq > d->nr_pirqs) )
         return -EINVAL;
 
     __pirq_guest_eoi(d, irq);
@@ -551,16 +711,16 @@ int pirq_guest_unmask(struct domain *d)
 }
 
 extern int ioapic_ack_new;
-static int pirq_acktype(struct domain *d, int irq)
-{
-    irq_desc_t  *desc;
-    unsigned int vector;
-
-    vector = domain_irq_to_vector(d, irq);
-    if ( vector <= 0 )
+static int pirq_acktype(struct domain *d, int pirq)
+{
+    struct irq_desc  *desc;
+    unsigned int irq;
+
+    irq = domain_pirq_to_irq(d, pirq);
+    if ( irq <= 0 )
         return ACKTYPE_NONE;
 
-    desc = &irq_desc[vector];
+    desc = irq_to_desc(irq);
 
     if ( desc->handler == &no_irq_type )
         return ACKTYPE_NONE;
@@ -597,14 +757,14 @@ static int pirq_acktype(struct domain *d
     return 0;
 }
 
-int pirq_shared(struct domain *d, int irq)
-{
-    irq_desc_t         *desc;
+int pirq_shared(struct domain *d, int pirq)
+{
+    struct irq_desc         *desc;
     irq_guest_action_t *action;
     unsigned long       flags;
     int                 shared;
 
-    desc = domain_spin_lock_irq_desc(d, irq, &flags);
+    desc = domain_spin_lock_irq_desc(d, pirq, &flags);
     if ( desc == NULL )
         return 0;
 
@@ -616,10 +776,10 @@ int pirq_shared(struct domain *d, int ir
     return shared;
 }
 
-int pirq_guest_bind(struct vcpu *v, int irq, int will_share)
-{
-    unsigned int        vector;
-    irq_desc_t         *desc;
+int pirq_guest_bind(struct vcpu *v, int pirq, int will_share)
+{
+    unsigned int        irq;
+    struct irq_desc         *desc;
     irq_guest_action_t *action, *newaction = NULL;
     int                 rc = 0;
     cpumask_t           cpumask = CPU_MASK_NONE;
@@ -628,7 +788,7 @@ int pirq_guest_bind(struct vcpu *v, int 
     BUG_ON(!local_irq_is_enabled());
 
  retry:
-    desc = domain_spin_lock_irq_desc(v->domain, irq, NULL);
+    desc = domain_spin_lock_irq_desc(v->domain, pirq, NULL);
     if ( desc == NULL )
     {
         rc = -EINVAL;
@@ -636,7 +796,7 @@ int pirq_guest_bind(struct vcpu *v, int 
     }
 
     action = (irq_guest_action_t *)desc->action;
-    vector = desc - irq_desc;
+    irq = desc - irq_desc;
 
     if ( !(desc->status & IRQ_GUEST) )
     {
@@ -644,7 +804,7 @@ int pirq_guest_bind(struct vcpu *v, int 
         {
             gdprintk(XENLOG_INFO,
                     "Cannot bind IRQ %d to guest. In use by '%s'.\n",
-                    irq, desc->action->name);
+                    pirq, desc->action->name);
             rc = -EBUSY;
             goto unlock_out;
         }
@@ -656,7 +816,7 @@ int pirq_guest_bind(struct vcpu *v, int 
                 goto retry;
             gdprintk(XENLOG_INFO,
                      "Cannot bind IRQ %d to guest. Out of memory.\n",
-                     irq);
+                     pirq);
             rc = -ENOMEM;
             goto out;
         }
@@ -668,23 +828,23 @@ int pirq_guest_bind(struct vcpu *v, int 
         action->nr_guests   = 0;
         action->in_flight   = 0;
         action->shareable   = will_share;
-        action->ack_type    = pirq_acktype(v->domain, irq);
+        action->ack_type    = pirq_acktype(v->domain, pirq);
         cpus_clear(action->cpu_eoi_map);
 
         desc->depth = 0;
         desc->status |= IRQ_GUEST;
         desc->status &= ~IRQ_DISABLED;
-        desc->handler->startup(vector);
+        desc->handler->startup(irq);
 
         /* Attempt to bind the interrupt target to the correct CPU. */
         cpu_set(v->processor, cpumask);
         if ( !opt_noirqbalance && (desc->handler->set_affinity != NULL) )
-            desc->handler->set_affinity(vector, cpumask);
+            desc->handler->set_affinity(irq, cpumask);
     }
     else if ( !will_share || !action->shareable )
     {
         gdprintk(XENLOG_INFO, "Cannot bind IRQ %d to guest. %s.\n",
-                 irq,
+                 pirq,
                  will_share ?
                  "Others do not share" :
                  "Will not share with others");
@@ -707,7 +867,7 @@ int pirq_guest_bind(struct vcpu *v, int 
     if ( action->nr_guests == IRQ_MAX_GUESTS )
     {
         gdprintk(XENLOG_INFO, "Cannot bind IRQ %d to guest. "
-               "Already at max share.\n", irq);
+               "Already at max share.\n", pirq);
         rc = -EBUSY;
         goto unlock_out;
     }
@@ -715,9 +875,9 @@ int pirq_guest_bind(struct vcpu *v, int 
     action->guest[action->nr_guests++] = v->domain;
 
     if ( action->ack_type != ACKTYPE_NONE )
-        set_pirq_eoi(v->domain, irq);
+        set_pirq_eoi(v->domain, pirq);
     else
-        clear_pirq_eoi(v->domain, irq);
+        clear_pirq_eoi(v->domain, pirq);
 
  unlock_out:
     spin_unlock_irq(&desc->lock);
@@ -728,9 +888,9 @@ int pirq_guest_bind(struct vcpu *v, int 
 }
 
 static irq_guest_action_t *__pirq_guest_unbind(
-    struct domain *d, int irq, irq_desc_t *desc)
-{
-    unsigned int        vector;
+    struct domain *d, int pirq, struct irq_desc *desc)
+{
+    unsigned int        irq;
     irq_guest_action_t *action;
     cpumask_t           cpu_eoi_map;
     int                 i;
@@ -738,7 +898,7 @@ static irq_guest_action_t *__pirq_guest_
     BUG_ON(!(desc->status & IRQ_GUEST));
 
     action = (irq_guest_action_t *)desc->action;
-    vector = desc - irq_desc;
+    irq = desc - irq_desc;
 
     for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++ )
         continue;
@@ -749,13 +909,13 @@ static irq_guest_action_t *__pirq_guest_
     switch ( action->ack_type )
     {
     case ACKTYPE_UNMASK:
-        if ( test_and_clear_bit(irq, d->pirq_mask) &&
+        if ( test_and_clear_bit(pirq, d->pirq_mask) &&
              (--action->in_flight == 0) )
-            desc->handler->end(vector);
+            desc->handler->end(irq);
         break;
     case ACKTYPE_EOI:
         /* NB. If #guests == 0 then we clear the eoi_map later on. */
-        if ( test_and_clear_bit(irq, d->pirq_mask) &&
+        if ( test_and_clear_bit(pirq, d->pirq_mask) &&
              (--action->in_flight == 0) &&
              (action->nr_guests != 0) )
         {
@@ -766,7 +926,7 @@ static irq_guest_action_t *__pirq_guest_
         }
         break;
     case ACKTYPE_NONE:
-        stop_timer(&irq_guest_eoi_timer[vector]);
+        stop_timer(&irq_guest_eoi_timer[irq]);
         _irq_guest_eoi(desc);
         break;
     }
@@ -775,7 +935,7 @@ static irq_guest_action_t *__pirq_guest_
      * The guest cannot re-bind to this IRQ until this function returns. So,
      * when we have flushed this IRQ from pirq_mask, it should remain flushed.
      */
-    BUG_ON(test_bit(irq, d->pirq_mask));
+    BUG_ON(test_bit(pirq, d->pirq_mask));
 
     if ( action->nr_guests != 0 )
         return NULL;
@@ -785,7 +945,7 @@ static irq_guest_action_t *__pirq_guest_
     /* Disabling IRQ before releasing the desc_lock avoids an IRQ storm. */
     desc->depth   = 1;
     desc->status |= IRQ_DISABLED;
-    desc->handler->disable(vector);
+    desc->handler->disable(irq);
 
     /*
      * Mark any remaining pending EOIs as ready to flush.
@@ -808,35 +968,35 @@ static irq_guest_action_t *__pirq_guest_
     desc->action = NULL;
     desc->status &= ~IRQ_GUEST;
     desc->status &= ~IRQ_INPROGRESS;
-    kill_timer(&irq_guest_eoi_timer[vector]);
-    desc->handler->shutdown(vector);
+    kill_timer(&irq_guest_eoi_timer[irq]);
+    desc->handler->shutdown(irq);
 
     /* Caller frees the old guest descriptor block. */
     return action;
 }
 
-void pirq_guest_unbind(struct domain *d, int irq)
+void pirq_guest_unbind(struct domain *d, int pirq)
 {
     irq_guest_action_t *oldaction = NULL;
-    irq_desc_t *desc;
-    int vector;
+    struct irq_desc *desc;
+    int irq;
 
     WARN_ON(!spin_is_locked(&d->event_lock));
 
     BUG_ON(!local_irq_is_enabled());
-    desc = domain_spin_lock_irq_desc(d, irq, NULL);
+    desc = domain_spin_lock_irq_desc(d, pirq, NULL);
 
     if ( desc == NULL )
     {
-        vector = -domain_irq_to_vector(d, irq);
-        BUG_ON(vector <= 0);
-        desc = &irq_desc[vector];
+        irq = -domain_pirq_to_irq(d, pirq);
+        BUG_ON(irq <= 0);
+        desc = irq_to_desc(irq);
         spin_lock_irq(&desc->lock);
-        d->arch.pirq_vector[irq] = d->arch.vector_pirq[vector] = 0;
+        d->arch.pirq_irq[pirq] = d->arch.irq_pirq[irq] = 0;
     }
     else
     {
-        oldaction = __pirq_guest_unbind(d, irq, desc);
+        oldaction = __pirq_guest_unbind(d, pirq, desc);
     }
 
     spin_unlock_irq(&desc->lock);
@@ -847,7 +1007,7 @@ void pirq_guest_unbind(struct domain *d,
 
 static int pirq_guest_force_unbind(struct domain *d, int irq)
 {
-    irq_desc_t *desc;
+    struct irq_desc *desc;
     irq_guest_action_t *action, *oldaction = NULL;
     int i, bound = 0;
 
@@ -887,7 +1047,7 @@ int get_free_pirq(struct domain *d, int 
     if ( type == MAP_PIRQ_TYPE_GSI )
     {
         for ( i = 16; i < nr_irqs_gsi; i++ )
-            if ( !d->arch.pirq_vector[i] )
+            if ( !d->arch.pirq_irq[i] )
                 break;
         if ( i == nr_irqs_gsi )
             return -ENOSPC;
@@ -895,7 +1055,7 @@ int get_free_pirq(struct domain *d, int 
     else
     {
         for ( i = d->nr_pirqs - 1; i >= 16; i-- )
-            if ( !d->arch.pirq_vector[i] )
+            if ( !d->arch.pirq_irq[i] )
                 break;
         if ( i == 16 )
             return -ENOSPC;
@@ -905,11 +1065,11 @@ int get_free_pirq(struct domain *d, int 
 }
 
 int map_domain_pirq(
-    struct domain *d, int pirq, int vector, int type, void *data)
+    struct domain *d, int pirq, int irq, int type, void *data)
 {
     int ret = 0;
-    int old_vector, old_pirq;
-    irq_desc_t *desc;
+    int old_irq, old_pirq;
+    struct irq_desc *desc;
     unsigned long flags;
     struct msi_desc *msi_desc;
     struct pci_dev *pdev = NULL;
@@ -920,21 +1080,21 @@ int map_domain_pirq(
     if ( !IS_PRIV(current->domain) )
         return -EPERM;
 
-    if ( pirq < 0 || pirq >= d->nr_pirqs || vector < 0 || vector >= NR_VECTORS )
-    {
-        dprintk(XENLOG_G_ERR, "dom%d: invalid pirq %d or vector %d\n",
-                d->domain_id, pirq, vector);
+    if ( pirq < 0 || pirq >= d->nr_pirqs || irq < 0 || irq >= nr_irqs )
+    {
+        dprintk(XENLOG_G_ERR, "dom%d: invalid pirq %d or irq %d\n",
+                d->domain_id, pirq, irq);
         return -EINVAL;
     }
 
-    old_vector = domain_irq_to_vector(d, pirq);
-    old_pirq = domain_vector_to_irq(d, vector);
-
-    if ( (old_vector && (old_vector != vector) ) ||
+    old_irq = domain_pirq_to_irq(d, pirq);
+    old_pirq = domain_irq_to_pirq(d, irq);
+
+    if ( (old_irq && (old_irq != irq) ) ||
          (old_pirq && (old_pirq != pirq)) )
     {
-        dprintk(XENLOG_G_ERR, "dom%d: pirq %d or vector %d already mapped\n",
-                d->domain_id, pirq, vector);
+        dprintk(XENLOG_G_ERR, "dom%d: pirq %d or irq %d already mapped\n",
+                d->domain_id, pirq, irq);
         return -EINVAL;
     }
 
@@ -946,7 +1106,7 @@ int map_domain_pirq(
         return ret;
     }
 
-    desc = &irq_desc[vector];
+    desc = irq_to_desc(irq);
 
     if ( type == MAP_PIRQ_TYPE_MSI )
     {
@@ -964,18 +1124,18 @@ int map_domain_pirq(
         spin_lock_irqsave(&desc->lock, flags);
 
         if ( desc->handler != &no_irq_type )
-            dprintk(XENLOG_G_ERR, "dom%d: vector %d in use\n",
-              d->domain_id, vector);
+            dprintk(XENLOG_G_ERR, "dom%d: irq %d in use\n",
+              d->domain_id, irq);
         desc->handler = &pci_msi_type;
-        d->arch.pirq_vector[pirq] = vector;
-        d->arch.vector_pirq[vector] = pirq;
-        setup_msi_irq(pdev, msi_desc);
+        d->arch.pirq_irq[pirq] = irq;
+        d->arch.irq_pirq[irq] = pirq;
+        setup_msi_irq(pdev, msi_desc, irq);
         spin_unlock_irqrestore(&desc->lock, flags);
     } else
     {
         spin_lock_irqsave(&desc->lock, flags);
-        d->arch.pirq_vector[pirq] = vector;
-        d->arch.vector_pirq[vector] = pirq;
+        d->arch.pirq_irq[pirq] = irq;
+        d->arch.irq_pirq[irq] = pirq;
         spin_unlock_irqrestore(&desc->lock, flags);
     }
 
@@ -987,8 +1147,8 @@ int unmap_domain_pirq(struct domain *d, 
 int unmap_domain_pirq(struct domain *d, int pirq)
 {
     unsigned long flags;
-    irq_desc_t *desc;
-    int vector, ret = 0;
+    struct irq_desc *desc;
+    int irq, ret = 0;
     bool_t forced_unbind;
     struct msi_desc *msi_desc = NULL;
 
@@ -1001,8 +1161,8 @@ int unmap_domain_pirq(struct domain *d, 
     ASSERT(spin_is_locked(&pcidevs_lock));
     ASSERT(spin_is_locked(&d->event_lock));
 
-    vector = domain_irq_to_vector(d, pirq);
-    if ( vector <= 0 )
+    irq = domain_pirq_to_irq(d, pirq);
+    if ( irq <= 0 )
     {
         dprintk(XENLOG_G_ERR, "dom%d: pirq %d not mapped\n",
                 d->domain_id, pirq);
@@ -1015,44 +1175,41 @@ int unmap_domain_pirq(struct domain *d, 
         dprintk(XENLOG_G_WARNING, "dom%d: forcing unbind of pirq %d\n",
                 d->domain_id, pirq);
 
-    desc = &irq_desc[vector];
+    desc = irq_to_desc(irq);
 
     if ( (msi_desc = desc->msi_desc) != NULL )
         pci_disable_msi(msi_desc);
 
     spin_lock_irqsave(&desc->lock, flags);
 
-    BUG_ON(vector != domain_irq_to_vector(d, pirq));
+    BUG_ON(irq != domain_pirq_to_irq(d, pirq));
 
     if ( msi_desc )
-        teardown_msi_vector(vector);
-
-    if ( desc->handler == &pci_msi_type )
-        desc->handler = &no_irq_type;
+        teardown_msi_irq(irq);
 
     if ( !forced_unbind )
     {
-        d->arch.pirq_vector[pirq] = 0;
-        d->arch.vector_pirq[vector] = 0;
+        d->arch.pirq_irq[pirq] = 0;
+        d->arch.irq_pirq[irq] = 0;
     }
     else
     {
-        d->arch.pirq_vector[pirq] = -vector;
-        d->arch.vector_pirq[vector] = -pirq;
+        d->arch.pirq_irq[pirq] = -irq;
+        d->arch.irq_pirq[irq] = -pirq;
     }
 
     spin_unlock_irqrestore(&desc->lock, flags);
     if (msi_desc)
-    {
-        msi_free_vector(msi_desc);
-        free_irq_vector(vector);
-    }
+        msi_free_irq(msi_desc);
 
     ret = irq_deny_access(d, pirq);
     if ( ret )
         dprintk(XENLOG_G_ERR, "dom%d: could not deny access to irq %d\n",
                 d->domain_id, pirq);
 
+    if ( desc->handler == &pci_msi_type )
+        desc->handler = &no_irq_type;
+
  done:
     return ret;
 }
@@ -1065,7 +1222,7 @@ void free_domain_pirqs(struct domain *d)
     spin_lock(&d->event_lock);
 
     for ( i = 0; i < d->nr_pirqs; i++ )
-        if ( d->arch.pirq_vector[i] > 0 )
+        if ( d->arch.pirq_irq[i] > 0 )
             unmap_domain_pirq(d, i);
 
     spin_unlock(&d->event_lock);
@@ -1077,7 +1234,7 @@ static void dump_irqs(unsigned char key)
 static void dump_irqs(unsigned char key)
 {
     int i, glob_irq, irq, vector;
-    irq_desc_t *desc;
+    struct irq_desc *desc;
     irq_guest_action_t *action;
     struct domain *d;
     unsigned long flags;
@@ -1088,8 +1245,10 @@ static void dump_irqs(unsigned char key)
     {
 
         glob_irq = vector_to_irq(vector);
-
-        desc = &irq_desc[vector];
+        if ( glob_irq < 0 )
+            continue;
+
+        desc = irq_to_desc(glob_irq);
         if ( desc == NULL || desc->handler == &no_irq_type )
             continue;
 
@@ -1111,7 +1270,7 @@ static void dump_irqs(unsigned char key)
             for ( i = 0; i < action->nr_guests; i++ )
             {
                 d = action->guest[i];
-                irq = domain_vector_to_irq(d, vector);
+                irq = domain_irq_to_pirq(d, vector_irq[vector]);
                 printk("%u:%3d(%c%c%c%c)",
                        d->domain_id, irq,
                        (test_bit(d->pirq_to_evtchn[glob_irq],
@@ -1172,7 +1331,7 @@ void fixup_irqs(cpumask_t map)
         if ( vector_to_irq(vector) == 2 )
             continue;
 
-        desc = &irq_desc[vector];
+        if ( vector_to_irq(vector) < 0 )
+            continue;
+
+        desc = irq_to_desc(vector_to_irq(vector));
 
         spin_lock_irqsave(&desc->lock, flags);
 
@@ -1200,9 +1359,9 @@ void fixup_irqs(cpumask_t map)
     /* Clean up cpu_eoi_map of every interrupt to exclude this CPU. */
     for ( vector = 0; vector < NR_VECTORS; vector++ )
     {
-        if ( !(irq_desc[vector].status & IRQ_GUEST) )
+        if ( (vector_to_irq(vector) < 0) ||
+             !(irq_desc[vector_to_irq(vector)].status & IRQ_GUEST) )
             continue;
-        action = (irq_guest_action_t *)irq_desc[vector].action;
+        action = (irq_guest_action_t *)irq_desc[vector_to_irq(vector)].action;
         cpu_clear(smp_processor_id(), action->cpu_eoi_map);
     }
 
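
For illustration, the lookup chain becomes pirq -> irq -> vector instead of
pirq -> vector.  A minimal sketch of the new resolution path, using only
helpers introduced by this patch (the wrapper itself is hypothetical, and
error handling is elided):

    /* Resolve a guest pirq to its irq_desc; mirrors pirq_acktype() above. */
    static struct irq_desc *pirq_desc_of(struct domain *d, int pirq)
    {
        int irq = domain_pirq_to_irq(d, pirq);  /* d->arch.pirq_irq[pirq] */

        if ( irq <= 0 )
            return NULL;                        /* pirq not mapped */

        /* The physical vector, when needed, is now irq_to_vector(irq). */
        return irq_to_desc(irq);
    }
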
diff -r 6a639384fba6 -r 8584327c7e70 xen/arch/x86/msi.c
--- a/xen/arch/x86/msi.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/arch/x86/msi.c	Tue Aug 11 16:07:27 2009 +0800
@@ -116,11 +116,12 @@ static void msix_put_fixmap(struct pci_d
 /*
  * MSI message composition
  */
-void msi_compose_msg(struct pci_dev *pdev, int vector,
+void msi_compose_msg(struct pci_dev *pdev, int irq,
                             struct msi_msg *msg)
 {
     unsigned dest;
     cpumask_t tmp;
+    int vector = irq_to_vector(irq);
 
     tmp = TARGET_CPUS;
     if ( vector )
@@ -195,31 +196,31 @@ static void read_msi_msg(struct msi_desc
         iommu_read_msi_from_ire(entry, msg);
 }
 
-static int set_vector_msi(struct msi_desc *entry)
-{
-    if ( entry->vector >= NR_VECTORS )
-    {
-        dprintk(XENLOG_ERR, "Trying to install msi data for Vector %d\n",
-                entry->vector);
+static int set_irq_msi(struct msi_desc *entry)
+{
+    if ( entry->irq >= nr_irqs )
+    {
+        dprintk(XENLOG_ERR, "Trying to install msi data for irq %d\n",
+                entry->irq);
         return -EINVAL;
     }
 
-    irq_desc[entry->vector].msi_desc = entry;
+    irq_desc[entry->irq].msi_desc = entry;
     return 0;
 }
 
-static int unset_vector_msi(int vector)
-{
-    ASSERT(spin_is_locked(&irq_desc[vector].lock));
-
-    if ( vector >= NR_VECTORS )
-    {
-        dprintk(XENLOG_ERR, "Trying to uninstall msi data for Vector %d\n",
-                vector);
+static int unset_irq_msi(int irq)
+{
+    ASSERT(spin_is_locked(&irq_desc[irq].lock));
+
+    if ( irq >= nr_irqs )
+    {
+        dprintk(XENLOG_ERR, "Trying to uninstall msi data for irq %d\n",
+                irq);
         return -EINVAL;
     }
 
-    irq_desc[vector].msi_desc = NULL;
+    irq_desc[irq].msi_desc = NULL;
 
     return 0;
 }
@@ -271,9 +272,9 @@ static void write_msi_msg(struct msi_des
     entry->msg = *msg;
 }
 
-void set_msi_affinity(unsigned int vector, cpumask_t mask)
-{
-    struct msi_desc *desc = irq_desc[vector].msi_desc;
+void set_msi_affinity(unsigned int irq, cpumask_t mask)
+{
+    struct msi_desc *desc = irq_desc[irq].msi_desc;
     struct msi_msg msg;
     unsigned int dest;
 
@@ -286,7 +287,7 @@ void set_msi_affinity(unsigned int vecto
     if ( !desc )
         return;
 
-    ASSERT(spin_is_locked(&irq_desc[vector].lock));
+    ASSERT(spin_is_locked(&irq_desc[irq].lock));
     read_msi_msg(desc, &msg);
 
     msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
@@ -333,9 +334,9 @@ static void msix_set_enable(struct pci_d
     }
 }
 
-static void msix_flush_writes(unsigned int vector)
-{
-    struct msi_desc *entry = irq_desc[vector].msi_desc;
+static void msix_flush_writes(unsigned int irq)
+{
+    struct msi_desc *entry = irq_desc[irq].msi_desc;
 
     BUG_ON(!entry || !entry->dev);
     switch (entry->msi_attrib.type) {
@@ -361,11 +362,11 @@ int msi_maskable_irq(const struct msi_de
            || entry->msi_attrib.maskbit;
 }
 
-static void msi_set_mask_bit(unsigned int vector, int flag)
-{
-    struct msi_desc *entry = irq_desc[vector].msi_desc;
-
-    ASSERT(spin_is_locked(&irq_desc[vector].lock));
+static void msi_set_mask_bit(unsigned int irq, int flag)
+{
+    struct msi_desc *entry = irq_desc[irq].msi_desc;
+
+    ASSERT(spin_is_locked(&irq_desc[irq].lock));
     BUG_ON(!entry || !entry->dev);
     switch (entry->msi_attrib.type) {
     case PCI_CAP_ID_MSI:
@@ -397,16 +398,16 @@ static void msi_set_mask_bit(unsigned in
     entry->msi_attrib.masked = !!flag;
 }
 
-void mask_msi_vector(unsigned int vector)
-{
-    msi_set_mask_bit(vector, 1);
-    msix_flush_writes(vector);
-}
-
-void unmask_msi_vector(unsigned int vector)
-{
-    msi_set_mask_bit(vector, 0);
-    msix_flush_writes(vector);
+void mask_msi_irq(unsigned int irq)
+{
+    msi_set_mask_bit(irq, 1);
+    msix_flush_writes(irq);
+}
+
+void unmask_msi_irq(unsigned int irq)
+{
+    msi_set_mask_bit(irq, 0);
+    msix_flush_writes(irq);
 }
 
 static struct msi_desc* alloc_msi_entry(void)
@@ -424,23 +425,23 @@ static struct msi_desc* alloc_msi_entry(
     return entry;
 }
 
-int setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc)
+int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc, int irq)
 {
     struct msi_msg msg;
 
-    msi_compose_msg(dev, desc->vector, &msg);
-    set_vector_msi(desc);
-    write_msi_msg(irq_desc[desc->vector].msi_desc, &msg);
+    msi_compose_msg(dev, irq, &msg);
+    set_irq_msi(msidesc);
+    write_msi_msg(irq_desc[irq].msi_desc, &msg);
 
     return 0;
 }
 
-void teardown_msi_vector(int vector)
-{
-    unset_vector_msi(vector);
-}
-
-int msi_free_vector(struct msi_desc *entry)
+void teardown_msi_irq(int irq)
+{
+    unset_irq_msi(irq);
+}
+
+int msi_free_irq(struct msi_desc *entry)
 {
     if ( entry->msi_attrib.type == PCI_CAP_ID_MSIX )
     {
@@ -452,19 +453,20 @@ int msi_free_vector(struct msi_desc *ent
         msix_put_fixmap(entry->dev, virt_to_fix(start));
     }
     list_del(&entry->list);
+    destroy_irq(entry->irq);
     xfree(entry);
     return 0;
 }
 
 static struct msi_desc *find_msi_entry(struct pci_dev *dev,
-                                       int vector, int cap_id)
+                                       int irq, int cap_id)
 {
     struct msi_desc *entry;
 
     list_for_each_entry( entry, &dev->msi_list, list )
     {
         if ( entry->msi_attrib.type == cap_id &&
-             (vector == -1 || entry->vector == vector) )
+             (irq == -1 || entry->irq == irq) )
             return entry;
     }
 
@@ -481,7 +483,7 @@ static struct msi_desc *find_msi_entry(s
  * of an entry: zero with the new MSI irq assigned, or non-zero otherwise.
  **/
 static int msi_capability_init(struct pci_dev *dev,
-                               int vector,
+                               int irq,
                                struct msi_desc **desc)
 {
     struct msi_desc *entry;
@@ -507,7 +509,7 @@ static int msi_capability_init(struct pc
     entry->msi_attrib.maskbit = is_mask_bit_support(control);
     entry->msi_attrib.masked = 1;
     entry->msi_attrib.pos = pos;
-    entry->vector = vector;
+    entry->irq = irq;
     if ( is_mask_bit_support(control) )
         entry->mask_base = (void __iomem *)(long)msi_mask_bits_reg(pos,
                                                                    is_64bit_address(control));
@@ -594,7 +596,7 @@ static int msix_capability_init(struct p
     entry->msi_attrib.maskbit = 1;
     entry->msi_attrib.masked = 1;
     entry->msi_attrib.pos = pos;
-    entry->vector = msi->vector;
+    entry->irq = msi->irq;
     entry->dev = dev;
     entry->mask_base = base;
 
@@ -630,15 +632,15 @@ static int __pci_enable_msi(struct msi_i
     if ( !pdev )
         return -ENODEV;
 
-    if ( find_msi_entry(pdev, msi->vector, PCI_CAP_ID_MSI) )
-    {
-        dprintk(XENLOG_WARNING, "vector %d has already mapped to MSI on "
-                "device %02x:%02x.%01x.\n", msi->vector, msi->bus,
+    if ( find_msi_entry(pdev, msi->irq, PCI_CAP_ID_MSI) )
+    {
+        dprintk(XENLOG_WARNING, "irq %d has already mapped to MSI on "
+                "device %02x:%02x.%01x.\n", msi->irq, msi->bus,
                 PCI_SLOT(msi->devfn), PCI_FUNC(msi->devfn));
         return 0;
     }
 
-    status = msi_capability_init(pdev, msi->vector, desc);
+    status = msi_capability_init(pdev, msi->irq, desc);
     return status;
 }
 
@@ -696,10 +698,10 @@ static int __pci_enable_msix(struct msi_
     if (msi->entry_nr >= nr_entries)
         return -EINVAL;
 
-    if ( find_msi_entry(pdev, msi->vector, PCI_CAP_ID_MSIX) )
-    {
-        dprintk(XENLOG_WARNING, "vector %d has already mapped to MSIX on "
-                "device %02x:%02x.%01x.\n", msi->vector, msi->bus,
+    if ( find_msi_entry(pdev, msi->irq, PCI_CAP_ID_MSIX) )
+    {
+        dprintk(XENLOG_WARNING, "irq %d has already mapped to MSIX on "
+                "device %02x:%02x.%01x.\n", msi->irq, msi->bus,
                 PCI_SLOT(msi->devfn), PCI_FUNC(msi->devfn));
         return 0;
     }
@@ -754,21 +756,21 @@ void pci_disable_msi(struct msi_desc *ms
         __pci_disable_msix(msi_desc);
 }
 
-static void msi_free_vectors(struct pci_dev* dev)
+static void msi_free_irqs(struct pci_dev* dev)
 {
     struct msi_desc *entry, *tmp;
-    irq_desc_t *desc;
-    unsigned long flags, vector;
+    struct irq_desc *desc;
+    unsigned long flags, irq;
 
     list_for_each_entry_safe( entry, tmp, &dev->msi_list, list )
     {
-        vector = entry->vector;
-        desc = &irq_desc[vector];
+        irq = entry->irq;
+        desc = &irq_desc[irq];
         pci_disable_msi(entry);
 
         spin_lock_irqsave(&desc->lock, flags);
 
-        teardown_msi_vector(vector);
+        teardown_msi_irq(irq);
 
         if ( desc->handler == &pci_msi_type )
         {
@@ -778,7 +780,7 @@ static void msi_free_vectors(struct pci_
         }
 
         spin_unlock_irqrestore(&desc->lock, flags);
-        msi_free_vector(entry);
+        msi_free_irq(entry);
     }
 }
 
@@ -787,15 +789,15 @@ void pci_cleanup_msi(struct pci_dev *pde
     /* Disable MSI and/or MSI-X */
     msi_set_enable(pdev, 0);
     msix_set_enable(pdev, 0);
-    msi_free_vectors(pdev);
+    msi_free_irqs(pdev);
 }
 
 int pci_restore_msi_state(struct pci_dev *pdev)
 {
     unsigned long flags;
-    int vector;
+    int irq;
     struct msi_desc *entry, *tmp;
-    irq_desc_t *desc;
+    struct irq_desc *desc;
 
     ASSERT(spin_is_locked(&pcidevs_lock));
 
@@ -804,8 +806,8 @@ int pci_restore_msi_state(struct pci_dev
 
     list_for_each_entry_safe( entry, tmp, &pdev->msi_list, list )
     {
-        vector = entry->vector;
-        desc = &irq_desc[vector];
+        irq = entry->irq;
+        desc = &irq_desc[irq];
 
         spin_lock_irqsave(&desc->lock, flags);
 
@@ -826,7 +828,7 @@ int pci_restore_msi_state(struct pci_dev
 
         write_msi_msg(entry, &entry->msg);
 
-        msi_set_mask_bit(vector, entry->msi_attrib.masked);
+        msi_set_mask_bit(irq, entry->msi_attrib.masked);
 
         if ( entry->msi_attrib.type == PCI_CAP_ID_MSI )
             msi_set_enable(pdev, 1);
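
For illustration, the MSI lifecycle under the new naming, as driven by
map_domain_pirq()/unmap_domain_pirq() above (a condensed sketch of the call
sequence, not the verbatim call sites; locking and error handling elided):

    int irq = create_irq();                /* allocates irq + per-cpu vector */
    struct irq_desc *desc = irq_to_desc(irq);

    desc->handler = &pci_msi_type;
    setup_msi_irq(pdev, msidesc, irq);     /* compose and write the MSI msg */

    /* ... interrupt in service ... */

    teardown_msi_irq(irq);                 /* clears irq_desc[irq].msi_desc */
    msi_free_irq(msidesc);                 /* frees entry, destroy_irq(irq) */
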
diff -r 6a639384fba6 -r 8584327c7e70 xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/arch/x86/physdev.c	Tue Aug 11 16:07:27 2009 +0800
@@ -30,7 +30,7 @@ static int physdev_map_pirq(struct physd
 static int physdev_map_pirq(struct physdev_map_pirq *map)
 {
     struct domain *d;
-    int vector, pirq, ret = 0;
+    int pirq, irq, ret = 0;
     struct msi_info _msi;
     void *map_data = NULL;
 
@@ -51,7 +51,7 @@ static int physdev_map_pirq(struct physd
         goto free_domain;
     }
 
-    /* Verify or get vector. */
+    /* Verify or get irq. */
     switch ( map->type )
     {
         case MAP_PIRQ_TYPE_GSI:
@@ -62,25 +62,25 @@ static int physdev_map_pirq(struct physd
                 ret = -EINVAL;
                 goto free_domain;
             }
-            vector = domain_irq_to_vector(current->domain, map->index);
-            if ( !vector )
-            {
-                dprintk(XENLOG_G_ERR, "dom%d: map irq with no vector %d\n",
-                        d->domain_id, vector);
+            irq = domain_pirq_to_irq(current->domain, map->index);
+            if ( !irq )
+            {
+                dprintk(XENLOG_G_ERR, "dom%d: map pirq with incorrect irq!\n",
+                        d->domain_id);
                 ret = -EINVAL;
                 goto free_domain;
             }
             break;
 
         case MAP_PIRQ_TYPE_MSI:
-            vector = map->index;
-            if ( vector == -1 )
-                vector = assign_irq_vector(AUTO_ASSIGN_IRQ);
-
-            if ( vector < 0 || vector >= NR_VECTORS )
-            {
-                dprintk(XENLOG_G_ERR, "dom%d: map irq with wrong vector %d\n",
-                        d->domain_id, vector);
+            irq = map->index;
+            if ( irq == -1 )
+                irq = create_irq();
+
+            if ( irq < 0 || irq >= nr_irqs )
+            {
+                dprintk(XENLOG_G_ERR, "dom%d: can't create irq for msi!\n",
+                        d->domain_id);
                 ret = -EINVAL;
                 goto free_domain;
             }
@@ -89,7 +89,7 @@ static int physdev_map_pirq(struct physd
             _msi.devfn = map->devfn;
             _msi.entry_nr = map->entry_nr;
             _msi.table_base = map->table_base;
-            _msi.vector = vector;
+            _msi.irq = irq;
             map_data = &_msi;
             break;
 
@@ -103,7 +103,7 @@ static int physdev_map_pirq(struct physd
     spin_lock(&pcidevs_lock);
     /* Verify or get pirq. */
     spin_lock(&d->event_lock);
-    pirq = domain_vector_to_irq(d, vector);
+    pirq = domain_irq_to_pirq(d, irq);
     if ( map->pirq < 0 )
     {
         if ( pirq )
@@ -132,7 +132,7 @@ static int physdev_map_pirq(struct physd
     {
         if ( pirq && pirq != map->pirq )
         {
-            dprintk(XENLOG_G_ERR, "dom%d: vector %d conflicts with irq %d\n",
+            dprintk(XENLOG_G_ERR, "dom%d: pirq %d conflicts with irq %d\n",
                     d->domain_id, map->index, map->pirq);
             ret = -EEXIST;
             goto done;
@@ -141,7 +141,7 @@ static int physdev_map_pirq(struct physd
             pirq = map->pirq;
     }
 
-    ret = map_domain_pirq(d, pirq, vector, map->type, map_data);
+    ret = map_domain_pirq(d, pirq, irq, map->type, map_data);
     if ( ret == 0 )
         map->pirq = pirq;
 
@@ -149,7 +149,7 @@ done:
     spin_unlock(&d->event_lock);
     spin_unlock(&pcidevs_lock);
     if ( (ret != 0) && (map->type == MAP_PIRQ_TYPE_MSI) && (map->index == -1) )
-        free_irq_vector(vector);
+        destroy_irq(irq);
 free_domain:
     rcu_unlock_domain(d);
     return ret;
@@ -344,14 +344,12 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
 
         irq = irq_op.irq;
         ret = -EINVAL;
         if ( (irq < 0) || (irq >= nr_irqs_gsi) )
             break;
 
         irq_op.vector = assign_irq_vector(irq);
 
         spin_lock(&pcidevs_lock);
         spin_lock(&dom0->event_lock);
-        ret = map_domain_pirq(dom0, irq_op.irq, irq_op.vector,
+        ret = map_domain_pirq(dom0, irq_op.irq, irq,
                               MAP_PIRQ_TYPE_GSI, NULL);
         spin_unlock(&dom0->event_lock);
         spin_unlock(&pcidevs_lock);
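
For the GSI path, the irq handed to map_domain_pirq() is now the GSI number
itself; the vector is merely reported back to the caller.  A condensed sketch
of the PHYSDEVOP_alloc_irq_vector flow after this change (taken from the hunk
above, error handling elided):

    irq = irq_op.irq;                        /* GSI number == irq number */
    irq_op.vector = assign_irq_vector(irq);  /* vector now keyed by irq */
    ret = map_domain_pirq(dom0, irq_op.irq, irq,
                          MAP_PIRQ_TYPE_GSI, NULL);
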
diff -r 6a639384fba6 -r 8584327c7e70 xen/drivers/passthrough/amd/iommu_init.c
--- a/xen/drivers/passthrough/amd/iommu_init.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/drivers/passthrough/amd/iommu_init.c	Tue Aug 11 16:07:27 2009 +0800
@@ -27,7 +27,7 @@
 #include <asm/hvm/svm/amd-iommu-proto.h>
 #include <asm-x86/fixmap.h>
 
-static struct amd_iommu *vector_to_iommu[NR_VECTORS];
+static struct amd_iommu **irq_to_iommu;
 static int nr_amd_iommus;
 static long amd_iommu_cmd_buffer_entries = IOMMU_CMD_BUFFER_DEFAULT_ENTRIES;
 static long amd_iommu_event_log_entries = IOMMU_EVENT_LOG_DEFAULT_ENTRIES;
@@ -309,7 +309,7 @@ static void amd_iommu_msi_data_init(stru
     u8 bus = (iommu->bdf >> 8) & 0xff;
     u8 dev = PCI_SLOT(iommu->bdf & 0xff);
     u8 func = PCI_FUNC(iommu->bdf & 0xff);
-    int vector = iommu->vector;
+    int vector = irq_to_vector(iommu->irq);
 
     msi_data = MSI_DATA_TRIGGER_EDGE |
         MSI_DATA_LEVEL_ASSERT |
@@ -355,10 +355,10 @@ static void amd_iommu_msi_enable(struct 
         iommu->msi_cap + PCI_MSI_FLAGS, control);
 }
 
-static void iommu_msi_unmask(unsigned int vector)
+static void iommu_msi_unmask(unsigned int irq)
 {
     unsigned long flags;
-    struct amd_iommu *iommu = vector_to_iommu[vector];
+    struct amd_iommu *iommu = irq_to_iommu[irq];
 
     /* FIXME: do not support mask bits at the moment */
     if ( iommu->maskbit )
@@ -369,10 +369,10 @@ static void iommu_msi_unmask(unsigned in
     spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
-static void iommu_msi_mask(unsigned int vector)
+static void iommu_msi_mask(unsigned int irq)
 {
     unsigned long flags;
-    struct amd_iommu *iommu = vector_to_iommu[vector];
+    struct amd_iommu *iommu = irq_to_iommu[irq];
 
     /* FIXME: do not support mask bits at the moment */
     if ( iommu->maskbit )
@@ -383,21 +383,21 @@ static void iommu_msi_mask(unsigned int 
     spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
-static unsigned int iommu_msi_startup(unsigned int vector)
-{
-    iommu_msi_unmask(vector);
+static unsigned int iommu_msi_startup(unsigned int irq)
+{
+    iommu_msi_unmask(irq);
     return 0;
 }
 
-static void iommu_msi_end(unsigned int vector)
-{
-    iommu_msi_unmask(vector);
+static void iommu_msi_end(unsigned int irq)
+{
+    iommu_msi_unmask(irq);
     ack_APIC_irq();
 }
 
-static void iommu_msi_set_affinity(unsigned int vector, cpumask_t dest)
-{
-    struct amd_iommu *iommu = vector_to_iommu[vector];
+static void iommu_msi_set_affinity(unsigned int irq, cpumask_t dest)
+{
+    struct amd_iommu *iommu = irq_to_iommu[irq];
     amd_iommu_msi_addr_init(iommu, cpu_physical_id(first_cpu(dest)));
 }
 
@@ -451,7 +451,7 @@ static void parse_event_log_entry(u32 en
     }
 }
 
-static void amd_iommu_page_fault(int vector, void *dev_id,
+static void amd_iommu_page_fault(int irq, void *dev_id,
                              struct cpu_user_regs *regs)
 {
     u32 event[4];
@@ -477,32 +477,30 @@ static void amd_iommu_page_fault(int vec
 
 static int set_iommu_interrupt_handler(struct amd_iommu *iommu)
 {
-    int vector, ret;
-
-    vector = assign_irq_vector(AUTO_ASSIGN_IRQ);
-    if ( vector <= 0 )
-    {
-        gdprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no vectors\n");
+    int irq, ret;
+
+    irq = create_irq();
+    if ( irq <= 0 )
+    {
+        gdprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no irqs\n");
         return 0;
     }
 
-    irq_desc[vector].handler = &iommu_msi_type;
-    vector_to_iommu[vector] = iommu;
-    ret = request_irq_vector(vector, amd_iommu_page_fault, 0,
+    irq_desc[irq].handler = &iommu_msi_type;
+    irq_to_iommu[irq] = iommu;
+    ret = request_irq(irq, amd_iommu_page_fault, 0,
                              "amd_iommu", iommu);
     if ( ret )
     {
-        irq_desc[vector].handler = &no_irq_type;
-        vector_to_iommu[vector] = NULL;
-        free_irq_vector(vector);
+        irq_desc[irq].handler = &no_irq_type;
+        irq_to_iommu[irq] = NULL;
+        destroy_irq(irq);
         amd_iov_error("can't request irq\n");
         return 0;
     }
 
-    /* Make sure that vector is never re-used. */
-    vector_irq[vector] = NEVER_ASSIGN_IRQ;
-    iommu->vector = vector;
-    return vector;
+    iommu->irq = irq;
+    return irq;
 }
 
 void enable_iommu(struct amd_iommu *iommu)
@@ -510,6 +508,10 @@ void enable_iommu(struct amd_iommu *iomm
     unsigned long flags;
 
     spin_lock_irqsave(&iommu->lock, flags);
+
+    if ( !irq_to_iommu )
+    {
+        irq_to_iommu = xmalloc_array(struct amd_iommu *, nr_irqs);
+        BUG_ON(!irq_to_iommu);
+        memset(irq_to_iommu, 0, nr_irqs * sizeof(*irq_to_iommu));
+    }
 
     if ( iommu->enabled )
     {
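
Both IOMMU drivers now share one bring-up pattern: create_irq(), record the
unit in an irq-indexed table, then request_irq().  A condensed sketch of the
pattern used by set_iommu_interrupt_handler() above and iommu_set_interrupt()
in the VT-d code below (error paths trimmed):

    int irq = create_irq();
    if ( irq <= 0 )
        return 0;                              /* no free irq */

    irq_desc[irq].handler = &iommu_msi_type;
    irq_to_iommu[irq] = iommu;
    if ( request_irq(irq, amd_iommu_page_fault, 0, "amd_iommu", iommu) )
    {
        /* Roll back on failure, as the real code above does. */
        irq_desc[irq].handler = &no_irq_type;
        irq_to_iommu[irq] = NULL;
        destroy_irq(irq);
        return 0;
    }
    iommu->irq = irq;
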
diff -r 6a639384fba6 -r 8584327c7e70 xen/drivers/passthrough/io.c
--- a/xen/drivers/passthrough/io.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/drivers/passthrough/io.c	Tue Aug 11 16:07:27 2009 +0800
@@ -35,7 +35,6 @@ static void pt_irq_time_out(void *data)
 {
     struct hvm_mirq_dpci_mapping *irq_map = data;
     unsigned int guest_gsi, machine_gsi = 0;
-    int vector;
     struct hvm_irq_dpci *dpci = NULL;
     struct dev_intx_gsi_link *digl;
     struct hvm_girq_dpci_mapping *girq;
@@ -68,7 +67,6 @@ static void pt_irq_time_out(void *data)
                                       machine_gsi + 1) )
     {
         clear_bit(machine_gsi, dpci->dirq_mask);
-        vector = domain_irq_to_vector(irq_map->dom, machine_gsi);
         dpci->mirq[machine_gsi].pending = 0;
     }
 
@@ -88,6 +86,7 @@ void free_hvm_irq_dpci(struct hvm_irq_dp
     xfree(dpci->mirq);
     xfree(dpci->dirq_mask);
     xfree(dpci->mapping);
+    xfree(dpci->hvm_timer);
     xfree(dpci);
 }
 
@@ -124,9 +123,11 @@ int pt_irq_create_bind_vtd(
                                                 BITS_TO_LONGS(d->nr_pirqs));
         hvm_irq_dpci->mapping = xmalloc_array(unsigned long,
                                               BITS_TO_LONGS(d->nr_pirqs));
+        hvm_irq_dpci->hvm_timer = xmalloc_array(struct timer, nr_irqs);
         if ( !hvm_irq_dpci->mirq ||
              !hvm_irq_dpci->dirq_mask ||
-             !hvm_irq_dpci->mapping )
+             !hvm_irq_dpci->mapping ||
+             !hvm_irq_dpci->hvm_timer )
         {
             spin_unlock(&d->event_lock);
             free_hvm_irq_dpci(hvm_irq_dpci);
@@ -136,6 +137,8 @@ int pt_irq_create_bind_vtd(
                d->nr_pirqs * sizeof(*hvm_irq_dpci->mirq));
         bitmap_zero(hvm_irq_dpci->dirq_mask, d->nr_pirqs);
         bitmap_zero(hvm_irq_dpci->mapping, d->nr_pirqs);
+        memset(hvm_irq_dpci->hvm_timer, 0,
+                nr_irqs * sizeof(*hvm_irq_dpci->hvm_timer));
         for ( int i = 0; i < d->nr_pirqs; i++ )
             INIT_LIST_HEAD(&hvm_irq_dpci->mirq[i].digl_list);
         for ( int i = 0; i < NR_HVM_IRQS; i++ )
@@ -236,7 +239,7 @@ int pt_irq_create_bind_vtd(
         /* Bind the same mirq once in the same domain */
         if ( !test_and_set_bit(machine_gsi, hvm_irq_dpci->mapping))
         {
-            unsigned int vector = domain_irq_to_vector(d, machine_gsi);
+            unsigned int irq = domain_pirq_to_irq(d, machine_gsi);
             unsigned int share;
 
             hvm_irq_dpci->mirq[machine_gsi].dom = d;
@@ -256,14 +259,14 @@ int pt_irq_create_bind_vtd(
 
             /* Init timer before binding */
             if ( pt_irq_need_timer(hvm_irq_dpci->mirq[machine_gsi].flags) )
-                init_timer(&hvm_irq_dpci->hvm_timer[vector],
+                init_timer(&hvm_irq_dpci->hvm_timer[irq],
                            pt_irq_time_out, &hvm_irq_dpci->mirq[machine_gsi], 0);
             /* Deal with gsi for legacy devices */
             rc = pirq_guest_bind(d->vcpu[0], machine_gsi, share);
             if ( unlikely(rc) )
             {
                 if ( pt_irq_need_timer(hvm_irq_dpci->mirq[machine_gsi].flags) )
-                    kill_timer(&hvm_irq_dpci->hvm_timer[vector]);
+                    kill_timer(&hvm_irq_dpci->hvm_timer[irq]);
                 hvm_irq_dpci->mirq[machine_gsi].dom = NULL;
                 clear_bit(machine_gsi, hvm_irq_dpci->mapping);
                 list_del(&girq->list);
@@ -349,7 +352,7 @@ int pt_irq_destroy_bind_vtd(
             pirq_guest_unbind(d, machine_gsi);
             msixtbl_pt_unregister(d, machine_gsi);
             if ( pt_irq_need_timer(hvm_irq_dpci->mirq[machine_gsi].flags) )
-                kill_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, machine_gsi)]);
+                kill_timer(&hvm_irq_dpci->hvm_timer[domain_pirq_to_irq(d, machine_gsi)]);
             hvm_irq_dpci->mirq[machine_gsi].dom   = NULL;
             hvm_irq_dpci->mirq[machine_gsi].flags = 0;
             clear_bit(machine_gsi, hvm_irq_dpci->mapping);
@@ -357,7 +360,7 @@ int pt_irq_destroy_bind_vtd(
     }
     spin_unlock(&d->event_lock);
     gdprintk(XENLOG_INFO,
-             "XEN_DOMCTL_irq_unmapping: m_irq = %x device = %x intx = %x\n",
+             "XEN_DOMCTL_irq_unmapping: m_irq = 0x%x device = 0x%x intx = 0x%x\n",
              machine_gsi, device, intx);
 
     return 0;
@@ -367,7 +370,7 @@ int hvm_do_IRQ_dpci(struct domain *d, un
 {
     struct hvm_irq_dpci *dpci = domain_get_irq_dpci(d);
 
-    ASSERT(spin_is_locked(&irq_desc[domain_irq_to_vector(d, mirq)].lock));
+    ASSERT(spin_is_locked(&irq_desc[domain_pirq_to_irq(d, mirq)].lock));
     if ( !iommu_enabled || (d == dom0) || !dpci ||
          !test_bit(mirq, dpci->mapping))
         return 0;
@@ -425,7 +428,7 @@ static int hvm_pci_msi_assert(struct dom
 
 static void hvm_dirq_assist(unsigned long _d)
 {
-    unsigned int irq;
+    unsigned int pirq;
     uint32_t device, intx;
     struct domain *d = (struct domain *)_d;
     struct hvm_irq_dpci *hvm_irq_dpci = d->arch.hvm_domain.irq.dpci;
@@ -433,34 +436,34 @@ static void hvm_dirq_assist(unsigned lon
 
     ASSERT(hvm_irq_dpci);
 
-    for ( irq = find_first_bit(hvm_irq_dpci->dirq_mask, d->nr_pirqs);
-          irq < d->nr_pirqs;
-          irq = find_next_bit(hvm_irq_dpci->dirq_mask, d->nr_pirqs, irq + 1) )
-    {
-        if ( !test_and_clear_bit(irq, hvm_irq_dpci->dirq_mask) )
+    for ( pirq = find_first_bit(hvm_irq_dpci->dirq_mask, d->nr_pirqs);
+          pirq < d->nr_pirqs;
+          pirq = find_next_bit(hvm_irq_dpci->dirq_mask, d->nr_pirqs, pirq + 1) )
+    {
+        if ( !test_and_clear_bit(pirq, hvm_irq_dpci->dirq_mask) )
             continue;
 
         spin_lock(&d->event_lock);
 #ifdef SUPPORT_MSI_REMAPPING
-        if ( hvm_irq_dpci->mirq[irq].flags & HVM_IRQ_DPCI_GUEST_MSI )
-        {
-            hvm_pci_msi_assert(d, irq);
+        if ( hvm_irq_dpci->mirq[pirq].flags & HVM_IRQ_DPCI_GUEST_MSI )
+        {
+            hvm_pci_msi_assert(d, pirq);
             spin_unlock(&d->event_lock);
             continue;
         }
 #endif
-        list_for_each_entry ( digl, &hvm_irq_dpci->mirq[irq].digl_list, list )
+        list_for_each_entry ( digl, &hvm_irq_dpci->mirq[pirq].digl_list, list )
         {
             device = digl->device;
             intx = digl->intx;
             hvm_pci_intx_assert(d, device, intx);
-            hvm_irq_dpci->mirq[irq].pending++;
+            hvm_irq_dpci->mirq[pirq].pending++;
 
 #ifdef SUPPORT_MSI_REMAPPING
-            if ( hvm_irq_dpci->mirq[irq].flags & HVM_IRQ_DPCI_TRANSLATE )
+            if ( hvm_irq_dpci->mirq[pirq].flags & HVM_IRQ_DPCI_TRANSLATE )
             {
                 /* for translated MSI to INTx interrupt, eoi as early as possible */
-                __msi_pirq_eoi(d, irq);
+                __msi_pirq_eoi(d, pirq);
             }
 #endif
         }
@@ -472,8 +475,8 @@ static void hvm_dirq_assist(unsigned lon
          * guest will never deal with the irq, then the physical interrupt line
          * will never be deasserted.
          */
-        if ( pt_irq_need_timer(hvm_irq_dpci->mirq[irq].flags) )
-            set_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, irq)],
+        if ( pt_irq_need_timer(hvm_irq_dpci->mirq[pirq].flags) )
+            set_timer(&hvm_irq_dpci->hvm_timer[domain_pirq_to_irq(d, pirq)],
                       NOW() + PT_IRQ_TIME_OUT);
         spin_unlock(&d->event_lock);
     }
@@ -501,7 +504,7 @@ static void __hvm_dpci_eoi(struct domain
          ! pt_irq_need_timer(hvm_irq_dpci->mirq[machine_gsi].flags) )
         return;
 
-    stop_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, machine_gsi)]);
+    stop_timer(&hvm_irq_dpci->hvm_timer[domain_pirq_to_irq(d, machine_gsi)]);
     pirq_guest_eoi(d, machine_gsi);
 }
 
diff -r 6a639384fba6 -r 8584327c7e70 xen/drivers/passthrough/pci.c
--- a/xen/drivers/passthrough/pci.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/drivers/passthrough/pci.c	Tue Aug 11 16:07:27 2009 +0800
@@ -216,7 +216,7 @@ static void pci_clean_dpci_irqs(struct d
               i = find_next_bit(hvm_irq_dpci->mapping, d->nr_pirqs, i + 1) )
         {
             pirq_guest_unbind(d, i);
-            kill_timer(&hvm_irq_dpci->hvm_timer[domain_irq_to_vector(d, i)]);
+            kill_timer(&hvm_irq_dpci->hvm_timer[domain_pirq_to_irq(d, i)]);
 
             list_for_each_safe ( digl_list, tmp,
                                  &hvm_irq_dpci->mirq[i].digl_list )
@@ -408,7 +408,7 @@ static void dump_pci_devices(unsigned ch
                pdev->bus, PCI_SLOT(pdev->devfn), PCI_FUNC(pdev->devfn),
                pdev->domain ? pdev->domain->domain_id : -1);
         list_for_each_entry ( msi, &pdev->msi_list, list )
-               printk("%d ", msi->vector);
+               printk("%d ", msi->irq);
         printk(">\n");
     }
 
diff -r 6a639384fba6 -r 8584327c7e70 xen/drivers/passthrough/vtd/iommu.c
--- a/xen/drivers/passthrough/vtd/iommu.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/drivers/passthrough/vtd/iommu.c	Tue Aug 11 16:07:27 2009 +0800
@@ -31,6 +31,8 @@
 #include <xen/pci_regs.h>
 #include <xen/keyhandler.h>
 #include <asm/msi.h>
+#include <asm/irq.h>
+#include <mach_apic.h>
 #include "iommu.h"
 #include "dmar.h"
 #include "extern.h"
@@ -659,7 +661,7 @@ static void iommu_disable_translation(st
     spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
 
-static struct iommu *vector_to_iommu[NR_VECTORS];
+static struct iommu **irq_to_iommu;
 static int iommu_page_fault_do_one(struct iommu *iommu, int type,
                                    u8 fault_reason, u16 source_id, u64 addr)
 {
@@ -705,7 +707,7 @@ static void iommu_fault_status(u32 fault
 }
 
 #define PRIMARY_FAULT_REG_LEN (16)
-static void iommu_page_fault(int vector, void *dev_id,
+static void iommu_page_fault(int irq, void *dev_id,
                              struct cpu_user_regs *regs)
 {
     struct iommu *iommu = dev_id;
@@ -777,9 +779,9 @@ clear_overflow:
     }
 }
 
-static void dma_msi_unmask(unsigned int vector)
-{
-    struct iommu *iommu = vector_to_iommu[vector];
+static void dma_msi_unmask(unsigned int irq)
+{
+    struct iommu *iommu = irq_to_iommu[irq];
     unsigned long flags;
 
     /* unmask it */
@@ -788,10 +790,10 @@ static void dma_msi_unmask(unsigned int 
     spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
 
-static void dma_msi_mask(unsigned int vector)
+static void dma_msi_mask(unsigned int irq)
 {
     unsigned long flags;
-    struct iommu *iommu = vector_to_iommu[vector];
+    struct iommu *iommu = irq_to_iommu[irq];
 
     /* mask it */
     spin_lock_irqsave(&iommu->register_lock, flags);
@@ -799,22 +801,23 @@ static void dma_msi_mask(unsigned int ve
     spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
 
-static unsigned int dma_msi_startup(unsigned int vector)
-{
-    dma_msi_unmask(vector);
+static unsigned int dma_msi_startup(unsigned int irq)
+{
+    dma_msi_unmask(irq);
     return 0;
 }
 
-static void dma_msi_end(unsigned int vector)
-{
-    dma_msi_unmask(vector);
+static void dma_msi_end(unsigned int irq)
+{
+    dma_msi_unmask(irq);
     ack_APIC_irq();
 }
 
-static void dma_msi_data_init(struct iommu *iommu, int vector)
+static void dma_msi_data_init(struct iommu *iommu, int irq)
 {
     u32 msi_data = 0;
     unsigned long flags;
+    int vector = irq_to_vector(irq);
 
     /* Fixed, edge, assert mode. Follow MSI setting */
     msi_data |= vector & 0xff;
@@ -842,9 +845,9 @@ static void dma_msi_addr_init(struct iom
     spin_unlock_irqrestore(&iommu->register_lock, flags);
 }
 
-static void dma_msi_set_affinity(unsigned int vector, cpumask_t dest)
-{
-    struct iommu *iommu = vector_to_iommu[vector];
+static void dma_msi_set_affinity(unsigned int irq, cpumask_t dest)
+{
+    struct iommu *iommu = irq_to_iommu[irq];
     dma_msi_addr_init(iommu, cpu_physical_id(first_cpu(dest)));
 }
 
@@ -861,31 +864,28 @@ static struct hw_interrupt_type dma_msi_
 
 static int iommu_set_interrupt(struct iommu *iommu)
 {
-    int vector, ret;
-
-    vector = assign_irq_vector(AUTO_ASSIGN_IRQ);
-    if ( vector <= 0 )
-    {
-        gdprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no vectors\n");
+    int irq, ret;
+
+    irq = create_irq();
+    if ( irq <= 0 )
+    {
+        gdprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no irq available!\n");
         return -EINVAL;
     }
 
-    irq_desc[vector].handler = &dma_msi_type;
-    vector_to_iommu[vector] = iommu;
-    ret = request_irq_vector(vector, iommu_page_fault, 0, "dmar", iommu);
+    irq_desc[irq].handler = &dma_msi_type;
+    irq_to_iommu[irq] = iommu;
+    ret = request_irq(irq, iommu_page_fault, 0, "dmar", iommu);
     if ( ret )
     {
-        irq_desc[vector].handler = &no_irq_type;
-        vector_to_iommu[vector] = NULL;
-        free_irq_vector(vector);
+        irq_desc[irq].handler = &no_irq_type;
+        irq_to_iommu[irq] = NULL;
+        destroy_irq(irq);
         gdprintk(XENLOG_ERR VTDPREFIX, "IOMMU: can't request irq\n");
         return ret;
     }
 
-    /* Make sure that vector is never re-used. */
-    vector_irq[vector] = NEVER_ASSIGN_IRQ;
-
-    return vector;
+    return irq;
 }
 
 static int iommu_alloc(struct acpi_drhd_unit *drhd)
@@ -906,7 +906,7 @@ static int iommu_alloc(struct acpi_drhd_
         return -ENOMEM;
     memset(iommu, 0, sizeof(struct iommu));
 
-    iommu->vector = -1; /* No vector assigned yet. */
+    iommu->irq = -1; /* No irq assigned yet. */
 
     iommu->intel = alloc_intel_iommu();
     if ( iommu->intel == NULL )
@@ -966,7 +966,7 @@ static void iommu_free(struct acpi_drhd_
         iounmap(iommu->reg);
 
     free_intel_iommu(iommu->intel);
-    release_irq_vector(iommu->vector);
+    destroy_irq(iommu->irq);
     xfree(iommu);
 
     drhd->iommu = NULL;
@@ -1581,24 +1581,24 @@ static int init_vtd_hw(void)
     struct acpi_drhd_unit *drhd;
     struct iommu *iommu;
     struct iommu_flush *flush = NULL;
-    int vector;
+    int irq = -1;
     int ret;
     unsigned long flags;
 
     for_each_drhd_unit ( drhd )
     {
         iommu = drhd->iommu;
-        if ( iommu->vector < 0 )
-        {
-            vector = iommu_set_interrupt(iommu);
-            if ( vector < 0 )
+        if ( iommu->irq < 0 )
+        {
+            irq = iommu_set_interrupt(iommu);
+            if ( irq < 0 )
             {
                 gdprintk(XENLOG_ERR VTDPREFIX, "IOMMU: interrupt setup failed\n");
-                return vector;
+                return irq;
             }
-            iommu->vector = vector;
-        }
-        dma_msi_data_init(iommu, iommu->vector);
+            iommu->irq = irq;
+        }
+        dma_msi_data_init(iommu, iommu->irq);
         dma_msi_addr_init(iommu, cpu_physical_id(first_cpu(cpu_online_map)));
         clear_fault_bits(iommu);
 
@@ -1702,6 +1702,13 @@ int intel_vtd_setup(void)
 
     spin_lock_init(&domid_bitmap_lock);
     clflush_size = get_cache_line_size();
+
+    irq_to_iommu = xmalloc_array(struct iommu *, nr_irqs);
+    if ( !irq_to_iommu )
+        return -ENOMEM;
+    memset(irq_to_iommu, 0, nr_irqs * sizeof(*irq_to_iommu));
 
     /* We enable the following features only if they are supported by all VT-d
      * engines: Snoop Control, DMA passthrough, Queued Invalidation and
diff -r 6a639384fba6 -r 8584327c7e70 xen/drivers/passthrough/vtd/x86/vtd.c
--- a/xen/drivers/passthrough/vtd/x86/vtd.c	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c	Tue Aug 11 16:07:27 2009 +0800
@@ -121,7 +121,7 @@ void hvm_dpci_isairq_eoi(struct domain *
                 hvm_pci_intx_deassert(d, digl->device, digl->intx);
                 if ( --dpci->mirq[i].pending == 0 )
                 {
-                    stop_timer(&dpci->hvm_timer[domain_irq_to_vector(d, i)]);
+                    stop_timer(&dpci->hvm_timer[domain_pirq_to_irq(d, i)]);
                     pirq_guest_eoi(d, i);
                 }
             }
diff -r 6a639384fba6 -r 8584327c7e70 xen/include/asm-x86/amd-iommu.h
--- a/xen/include/asm-x86/amd-iommu.h	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/include/asm-x86/amd-iommu.h	Tue Aug 11 16:07:27 2009 +0800
@@ -79,7 +79,7 @@ struct amd_iommu {
     int maskbit;
 
     int enabled;
-    int vector;
+    int irq;
 };
 
 struct ivrs_mappings {
diff -r 6a639384fba6 -r 8584327c7e70 xen/include/asm-x86/domain.h
--- a/xen/include/asm-x86/domain.h	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/include/asm-x86/domain.h	Tue Aug 11 16:07:27 2009 +0800
@@ -262,9 +262,9 @@ struct arch_domain
     /* Shadow translated domain: P2M mapping */
     pagetable_t phys_table;
 
-    /* NB. protected by d->event_lock and by irq_desc[vector].lock */
-    int vector_pirq[NR_VECTORS];
-    s16 *pirq_vector;
+    /* NB. protected by d->event_lock and by irq_desc[irq].lock */
+    int *irq_pirq;
+    int *pirq_irq;
 
     /* Shared page for notifying that explicit PIRQ EOI is required. */
     unsigned long *pirq_eoi_map;
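
These per-domain tables are no longer fixed NR_VECTORS-sized arrays, so they
must be sized at domain creation.  The actual allocation site lives elsewhere
in the patchset and is not visible in this hunk, so the sketch below is an
assumption about its shape only:

    /* Assumed shape of the per-domain table setup (hypothetical site). */
    d->arch.irq_pirq = xmalloc_array(int, nr_irqs);
    d->arch.pirq_irq = xmalloc_array(int, d->nr_pirqs);
    if ( !d->arch.irq_pirq || !d->arch.pirq_irq )
        return -ENOMEM;
    memset(d->arch.irq_pirq, 0, nr_irqs * sizeof(int));
    memset(d->arch.pirq_irq, 0, d->nr_pirqs * sizeof(int));
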
diff -r 6a639384fba6 -r 8584327c7e70 xen/include/asm-x86/irq.h
--- a/xen/include/asm-x86/irq.h	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/include/asm-x86/irq.h	Tue Aug 11 16:07:27 2009 +0800
@@ -7,19 +7,25 @@
 #include <asm/atomic.h>
 #include <irq_vectors.h>
 
-#define IO_APIC_IRQ(irq)    (((irq) >= 16) || ((1<<(irq)) & io_apic_irqs))
+#define IO_APIC_IRQ(irq)    ((((irq) >= 16) && ((irq) < nr_irqs_gsi)) \
+        || (((irq) < 16) && ((1 << (irq)) & io_apic_irqs)))
 #define IO_APIC_VECTOR(irq) (irq_vector[irq])
+
+#define MSI_IRQ(irq)       ((irq) >= nr_irqs_gsi && (irq) < nr_irqs)
 
 #define LEGACY_VECTOR(irq)          ((irq) + FIRST_LEGACY_VECTOR)
 #define LEGACY_IRQ_FROM_VECTOR(vec) ((vec) - FIRST_LEGACY_VECTOR)
 
-#define irq_to_vector(irq)  \
-    (IO_APIC_IRQ(irq) ? IO_APIC_VECTOR(irq) : LEGACY_VECTOR(irq))
 #define vector_to_irq(vec)  (vector_irq[vec])
+#define irq_to_desc(irq)    (&irq_desc[irq])
+
+#define MAX_GSI_IRQS (PAGE_SIZE * 8)
+#define MAX_NR_IRQS (2 * MAX_GSI_IRQS)
 
 extern int vector_irq[NR_VECTORS];
 extern u8 *irq_vector;
 
+extern int irq_to_vector(int irq);
 #define platform_legacy_irq(irq)	((irq) < 16)
 
 fastcall void event_check_interrupt(void);
@@ -51,17 +57,21 @@ extern atomic_t irq_mis_count;
 
 int pirq_shared(struct domain *d , int irq);
 
-int map_domain_pirq(struct domain *d, int pirq, int vector, int type,
+int map_domain_pirq(struct domain *d, int pirq, int irq, int type,
                            void *data);
 int unmap_domain_pirq(struct domain *d, int pirq);
 int get_free_pirq(struct domain *d, int type, int index);
 void free_domain_pirqs(struct domain *d);
 
-#define domain_irq_to_vector(d, irq) ((d)->arch.pirq_vector[irq] ?: \
-                                      IO_APIC_IRQ(irq) ? 0 : LEGACY_VECTOR(irq))
-#define domain_vector_to_irq(d, vec) ((d)->arch.vector_pirq[vec] ?: \
-                                      ((vec) < FIRST_LEGACY_VECTOR || \
-                                       (vec) > LAST_LEGACY_VECTOR) ? \
-                                      0 : LEGACY_IRQ_FROM_VECTOR(vec))
+int  init_irq_data(void);
+
+void clear_irq_vector(int irq);
+int __assign_irq_vector(int irq);
+
+int create_irq(void);
+void destroy_irq(unsigned int irq);
+
+#define domain_pirq_to_irq(d, pirq) ((d)->arch.pirq_irq[pirq])
+#define domain_irq_to_pirq(d, irq) ((d)->arch.irq_pirq[irq])
 
 #endif /* _ASM_HW_IRQ_H */
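
With these macros the irq number space splits into two ranges: [0,
nr_irqs_gsi) for GSIs (the first 16 being legacy PIC irqs) and [nr_irqs_gsi,
nr_irqs) for MSI sources.  A small classifier built from the macros above,
for illustration only (the helper itself is hypothetical):

    static const char *irq_kind(int irq)
    {
        if ( irq < 0 )
            return "invalid";
        if ( platform_legacy_irq(irq) )
            return "legacy";                /* irq < 16 */
        if ( irq < nr_irqs_gsi )
            return "GSI";                   /* IO-APIC routed */
        if ( MSI_IRQ(irq) )
            return "MSI";                   /* nr_irqs_gsi <= irq < nr_irqs */
        return "out of range";
    }
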
diff -r 6a639384fba6 -r 8584327c7e70 xen/include/asm-x86/mach-default/irq_vectors.h
--- a/xen/include/asm-x86/mach-default/irq_vectors.h	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/include/asm-x86/mach-default/irq_vectors.h	Tue Aug 11 16:07:27 2009 +0800
@@ -23,6 +23,7 @@
 #define LAST_LEGACY_VECTOR      0xef
 
 #define HYPERCALL_VECTOR	0x82
+#define LEGACY_SYSCALL_VECTOR   0x80
 
 /* Dynamically-allocated vectors available to any driver. */
 #define FIRST_DYNAMIC_VECTOR	0x20
diff -r 6a639384fba6 -r 8584327c7e70 xen/include/asm-x86/msi.h
--- a/xen/include/asm-x86/msi.h	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/include/asm-x86/msi.h	Tue Aug 11 16:07:27 2009 +0800
@@ -2,7 +2,6 @@
 #define __ASM_MSI_H
 
 #include <xen/cpumask.h>
-#include <asm/irq.h>
 /*
  * Constants for Intel APIC based MSI messages.
  */
@@ -57,7 +56,7 @@ struct msi_info {
 struct msi_info {
     int bus;
     int devfn;
-    int vector;
+    int irq;
     int entry_nr;
     uint64_t table_base;
 };
@@ -70,14 +69,14 @@ struct msi_msg {
 
 struct msi_desc;
 /* Helper functions */
-extern void mask_msi_vector(unsigned int vector);
-extern void unmask_msi_vector(unsigned int vector);
+extern void mask_msi_irq(unsigned int irq);
+extern void unmask_msi_irq(unsigned int irq);
 extern void set_msi_affinity(unsigned int vector, cpumask_t mask);
 extern int pci_enable_msi(struct msi_info *msi, struct msi_desc **desc);
 extern void pci_disable_msi(struct msi_desc *desc);
 extern void pci_cleanup_msi(struct pci_dev *pdev);
-extern int setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc);
-extern void teardown_msi_vector(int vector);
+extern int setup_msi_irq(struct pci_dev *dev, struct msi_desc *desc, int irq);
+extern void teardown_msi_irq(int irq);
-extern int msi_free_vector(struct msi_desc *entry);
 extern int pci_restore_msi_state(struct pci_dev *pdev);
 
@@ -97,7 +96,7 @@ struct msi_desc {
 
 	void __iomem *mask_base;        /* va for the entry in mask table */
 	struct pci_dev *dev;
-	int vector;
+	int irq;
 
 	struct msi_msg msg;		/* Last set MSI message */
 
@@ -105,6 +104,7 @@ struct msi_desc {
 };
 
 int msi_maskable_irq(const struct msi_desc *);
+int msi_free_irq(struct msi_desc *entry);
 
 /*
  * Assume the maximum number of hot plug slots supported by the system is about
diff -r 6a639384fba6 -r 8584327c7e70 xen/include/xen/hvm/irq.h
--- a/xen/include/xen/hvm/irq.h	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/include/xen/hvm/irq.h	Tue Aug 11 16:07:27 2009 +0800
@@ -88,7 +88,7 @@ struct hvm_irq_dpci {
     DECLARE_BITMAP(isairq_map, NR_ISAIRQS);
     /* Record of mapped Links */
     uint8_t link_cnt[NR_LINK];
-    struct timer hvm_timer[NR_VECTORS];
+    struct timer *hvm_timer;
     struct tasklet dirq_tasklet;
 };
 
diff -r 6a639384fba6 -r 8584327c7e70 xen/include/xen/iommu.h
--- a/xen/include/xen/iommu.h	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/include/xen/iommu.h	Tue Aug 11 16:07:27 2009 +0800
@@ -53,7 +53,7 @@ struct iommu {
     spinlock_t lock; /* protect context, domain ids */
     spinlock_t register_lock; /* protect iommu register handling */
     u64 root_maddr; /* root entry machine address */
-    int vector;
+    int irq;
     struct intel_iommu *intel;
 };
 
diff -r 6a639384fba6 -r 8584327c7e70 xen/include/xen/irq.h
--- a/xen/include/xen/irq.h	Tue Jul 28 13:29:51 2009 +0800
+++ b/xen/include/xen/irq.h	Tue Aug 11 16:07:27 2009 +0800
@@ -53,6 +53,7 @@ typedef struct hw_interrupt_type hw_irq_
 # define nr_irqs_gsi NR_IRQS
 #else
 extern unsigned int nr_irqs_gsi;
+extern unsigned int nr_irqs;
 #endif
 
 struct msi_desc;
@@ -63,23 +64,19 @@ struct msi_desc;
  *
  * Pad this out to 32 bytes for cache and indexing reasons.
  */
-typedef struct {
+typedef struct irq_desc {
     unsigned int status;		/* IRQ status */
     hw_irq_controller *handler;
     struct msi_desc   *msi_desc;
     struct irqaction *action;	/* IRQ action list */
     unsigned int depth;		/* nested irq disables */
+    int irq;
     spinlock_t lock;
     cpumask_t affinity;
 } __cacheline_aligned irq_desc_t;
 
+#ifndef CONFIG_X86
 extern irq_desc_t irq_desc[NR_VECTORS];
-
-extern int setup_irq_vector(unsigned int, struct irqaction *);
-extern void release_irq_vector(unsigned int);
-extern int request_irq_vector(unsigned int vector,
-               void (*handler)(int, void *, struct cpu_user_regs *),
-               unsigned long irqflags, const char * devname, void *dev_id);
 
 #define setup_irq(irq, action) \
     setup_irq_vector(irq_to_vector(irq), action)
@@ -89,6 +86,16 @@ extern int request_irq_vector(unsigned i
 
 #define request_irq(irq, handler, irqflags, devname, devid) \
     request_irq_vector(irq_to_vector(irq), handler, irqflags, devname, devid)
+
+#else
+extern struct irq_desc *irq_desc;
+
+extern int setup_irq(unsigned int irq, struct irqaction *);
+extern void release_irq(unsigned int irq);
+extern int request_irq(unsigned int irq,
+               void (*handler)(int, void *, struct cpu_user_regs *),
+               unsigned long irqflags, const char * devname, void *dev_id);
+#endif
 
 extern hw_irq_controller no_irq_type;
 extern void no_action(int cpl, void *dev_id, struct cpu_user_regs *regs);
@@ -102,16 +109,18 @@ extern irq_desc_t *domain_spin_lock_irq_
 extern irq_desc_t *domain_spin_lock_irq_desc(
     struct domain *d, int irq, unsigned long *pflags);
 
-static inline void set_native_irq_info(unsigned int vector, cpumask_t mask)
+static inline void set_native_irq_info(unsigned int irq, cpumask_t mask)
 {
-    irq_desc[vector].affinity = mask;
+    irq_desc[irq].affinity = mask;
 }
 
-#ifdef irq_to_vector
 static inline void set_irq_info(int irq, cpumask_t mask)
 {
+#ifdef CONFIG_X86
+    set_native_irq_info(irq, mask);
+#else
     set_native_irq_info(irq_to_vector(irq), mask);
+#endif
 }
-#endif
 
 #endif /* __XEN_IRQ_H__ */
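
A minimal usage sketch of the reworked IRQ-based x86 API (hypothetical
driver names and irq number; only the request_irq() prototype declared
in the hunk above is taken from the patch):

    /* Hypothetical driver-side use of the new x86 request_irq(). */
    static void foo_interrupt(int irq, void *dev_id,
                              struct cpu_user_regs *regs)
    {
        /* Handle the device interrupt. Under the reworked API the
         * caller deals in IRQ numbers, never raw vectors. */
    }

    static int foo_init(int irq)
    {
        return request_irq(irq, foo_interrupt, 0, "foo", NULL);
    }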

[-- Attachment #4: 0003-per_cpu_vector_implementation.patch --]
[-- Type: application/octet-stream, Size: 73519 bytes --]

# HG changeset patch
# User Xiantao Zhang <xiantao.zhang@intel.com>
# Date 1250131777 -28800
# Node ID 4009a583e41c45239c502c7296882b72ef22b52e
# Parent  8584327c7e701a6a5005ed2cee7daad3b6a39659
x86: Implement per-cpu vector for Xen hypervisor

Since Xen and Linux have big differences in their code
bases, it is very hard to port Linux's patch and apply it
to Xen directly, so this patch only adopts the core logic
of Linux and makes it work for Xen.

Key changes:
1. Vector allocation algorithm.
2. All IRQ chips' set_affinity logic.
3. IRQ migration on CPU hot-remove.
4. Breaking assumptions that depend on the global vector policy.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
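
To illustrate the idea (a sketch only, built from the per_cpu(vector_irq)
table and the vector/domain fields of struct irq_cfg introduced below;
the helper name is hypothetical):

    /* Sketch: per-cpu vector demultiplexing. The same vector number
     * can map to different irqs on different CPUs, so the usable
     * vector space scales to roughly nr_cpus x ~200 instead of a
     * single global ~200. */
    static int vector_to_irq_on_this_cpu(unsigned int vector)
    {
        /* Each CPU resolves vectors through its own table. */
        return __get_cpu_var(vector_irq)[vector];
    }

The allocator below (__assign_irq_vector) walks candidate vectors in
strides of 8 so that consecutive allocations land in different interrupt
priority levels, and installs the chosen vector into the vector_irq
table of every CPU in the irq's target domain.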

diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/apic.c
--- a/xen/arch/x86/apic.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/apic.c	Thu Aug 13 10:49:37 2009 +0800
@@ -70,7 +70,7 @@ int modern_apic(void)
  */
 void ack_bad_irq(unsigned int irq)
 {
-    printk("unexpected IRQ trap at vector %02x\n", irq);
+    printk("unexpected IRQ trap at irq %02x\n", irq);
     /*
      * Currently unexpected vectors happen only on SMP and APIC.
      * We _must_ ack these because every local APIC has only N
@@ -1197,9 +1197,11 @@ int reprogram_timer(s_time_t timeout)
 
 fastcall void smp_apic_timer_interrupt(struct cpu_user_regs * regs)
 {
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
     ack_APIC_irq();
     perfc_incr(apic_timer);
     raise_softirq(TIMER_SOFTIRQ);
+    set_irq_regs(old_regs);
 }
 
 /*
@@ -1208,6 +1210,7 @@ fastcall void smp_spurious_interrupt(str
 fastcall void smp_spurious_interrupt(struct cpu_user_regs *regs)
 {
     unsigned long v;
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
 
     irq_enter();
     /*
@@ -1223,6 +1226,7 @@ fastcall void smp_spurious_interrupt(str
     printk(KERN_INFO "spurious APIC interrupt on CPU#%d, should never happen.\n",
            smp_processor_id());
     irq_exit();
+    set_irq_regs(old_regs);
 }
 
 /*
@@ -1232,6 +1236,7 @@ fastcall void smp_error_interrupt(struct
 fastcall void smp_error_interrupt(struct cpu_user_regs *regs)
 {
     unsigned long v, v1;
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
 
     irq_enter();
     /* First tickle the hardware, only then report what went on. -- REW */
@@ -1254,6 +1259,7 @@ fastcall void smp_error_interrupt(struct
     printk (KERN_DEBUG "APIC error on CPU%d: %02lx(%02lx)\n",
             smp_processor_id(), v , v1);
     irq_exit();
+    set_irq_regs(old_regs);
 }
 
 /*
@@ -1262,8 +1268,10 @@ fastcall void smp_error_interrupt(struct
 
 fastcall void smp_pmu_apic_interrupt(struct cpu_user_regs *regs)
 {
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
     ack_APIC_irq();
     hvm_do_pmu_interrupt(regs);
+    set_irq_regs(old_regs);
 }
 
 /*
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/cpu/mcheck/mce_intel.c
--- a/xen/arch/x86/cpu/mcheck/mce_intel.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/cpu/mcheck/mce_intel.c	Thu Aug 13 10:49:37 2009 +0800
@@ -84,9 +84,11 @@ static void (*vendor_thermal_interrupt)(
 
 fastcall void smp_thermal_interrupt(struct cpu_user_regs *regs)
 {
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
     irq_enter();
     vendor_thermal_interrupt(regs);
     irq_exit();
+    set_irq_regs(old_regs);
 }
 
 /* P4/Xeon Thermal regulation detect and init */
@@ -964,6 +966,7 @@ fastcall void smp_cmci_interrupt(struct 
 {
     mctelem_cookie_t mctc;
     struct mca_summary bs;
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
 
     ack_APIC_irq();
     irq_enter();
@@ -984,6 +987,7 @@ fastcall void smp_cmci_interrupt(struct 
         mctelem_dismiss(mctc);
 
     irq_exit();
+    set_irq_regs(old_regs);
 }
 
 void mce_intel_feature_init(struct cpuinfo_x86 *c)
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/genapic/delivery.c
--- a/xen/arch/x86/genapic/delivery.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/genapic/delivery.c	Thu Aug 13 10:49:37 2009 +0800
@@ -29,13 +29,17 @@ cpumask_t target_cpus_flat(void)
 cpumask_t target_cpus_flat(void)
 {
 	return cpu_online_map;
+}
+
+cpumask_t vector_allocation_domain_flat(int cpu)
+{
+	return cpu_online_map;
 } 
 
 unsigned int cpu_mask_to_apicid_flat(cpumask_t cpumask)
 {
-	return cpus_addr(cpumask)[0];
+	return cpus_addr(cpumask)[0] & 0xFF;
 }
-
 
 /*
  * PHYSICAL DELIVERY MODE (unicast to physical APIC IDs).
@@ -57,8 +61,12 @@ void clustered_apic_check_phys(void)
 
 cpumask_t target_cpus_phys(void)
 {
-	/* IRQs will get bound more accurately later. */
-	return cpumask_of_cpu(0);
+	return cpu_online_map;
+}
+
+cpumask_t vector_allocation_domain_phys(int cpu)
+{
+	return cpumask_of_cpu(cpu);
 }
 
 unsigned int cpu_mask_to_apicid_phys(cpumask_t cpumask)
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/genapic/x2apic.c
--- a/xen/arch/x86/genapic/x2apic.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/genapic/x2apic.c	Thu Aug 13 10:49:37 2009 +0800
@@ -47,8 +47,12 @@ void clustered_apic_check_x2apic(void)
 
 cpumask_t target_cpus_x2apic(void)
 {
-    /* Deliver interrupts only to CPU0 for now */
-    return cpumask_of_cpu(0);
+    return cpu_online_map;
+}
+
+cpumask_t vector_allocation_domain_x2apic(int cpu)
+{
+	return cpumask_of_cpu(cpu);
 }
 
 unsigned int cpu_mask_to_apicid_x2apic(cpumask_t cpumask)
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/hpet.c
--- a/xen/arch/x86/hpet.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/hpet.c	Thu Aug 13 10:49:37 2009 +0800
@@ -287,6 +287,9 @@ static void hpet_msi_shutdown(unsigned i
 
 static void hpet_msi_ack(unsigned int irq)
 {
+    struct irq_desc *desc = irq_to_desc(irq);
+
+    irq_complete_move(&desc);
     ack_APIC_irq();
 }
 
@@ -298,24 +301,19 @@ static void hpet_msi_set_affinity(unsign
 {
     struct msi_msg msg;
     unsigned int dest;
-    cpumask_t tmp;
-    int vector = irq_to_vector(irq);
-
-    cpus_and(tmp, mask, cpu_online_map);
-    if ( cpus_empty(tmp) )
-        mask = TARGET_CPUS;
-
-    dest = cpu_mask_to_apicid(mask);
-
-    hpet_msi_read(vector, &msg);
-
+    struct irq_desc * desc = irq_to_desc(irq);
+    struct irq_cfg *cfg= desc->chip_data;
+
+    dest = set_desc_affinity(desc, mask);
+    if (dest == BAD_APICID)
+        return;
+
+    hpet_msi_read(irq, &msg);
     msg.data &= ~MSI_DATA_VECTOR_MASK;
-    msg.data |= MSI_DATA_VECTOR(vector);
+    msg.data |= MSI_DATA_VECTOR(cfg->vector);
     msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
     msg.address_lo |= MSI_ADDR_DEST_ID(dest);
-
-    hpet_msi_write(vector, &msg);
-    irq_desc[irq].affinity = mask;
+    hpet_msi_write(irq, &msg);
 }
 
 /*
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/hvm/vmx/vmx.c
--- a/xen/arch/x86/hvm/vmx/vmx.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/hvm/vmx/vmx.c	Thu Aug 13 10:49:37 2009 +0800
@@ -2062,13 +2062,14 @@ static void vmx_do_extint(struct cpu_use
 
     asmlinkage void do_IRQ(struct cpu_user_regs *);
     fastcall void smp_apic_timer_interrupt(struct cpu_user_regs *);
-    fastcall void smp_event_check_interrupt(void);
+    fastcall void smp_event_check_interrupt(struct cpu_user_regs *regs);
     fastcall void smp_invalidate_interrupt(void);
-    fastcall void smp_call_function_interrupt(void);
+    fastcall void smp_call_function_interrupt(struct cpu_user_regs *regs);
     fastcall void smp_spurious_interrupt(struct cpu_user_regs *regs);
     fastcall void smp_error_interrupt(struct cpu_user_regs *regs);
     fastcall void smp_pmu_apic_interrupt(struct cpu_user_regs *regs);
     fastcall void smp_cmci_interrupt(struct cpu_user_regs *regs);
+    fastcall void smp_irq_move_cleanup_interrupt(struct cpu_user_regs *regs);
 #ifdef CONFIG_X86_MCE_THERMAL
     fastcall void smp_thermal_interrupt(struct cpu_user_regs *regs);
 #endif
@@ -2081,17 +2082,20 @@ static void vmx_do_extint(struct cpu_use
 
     switch ( vector )
     {
+    case IRQ_MOVE_CLEANUP_VECTOR:
+        smp_irq_move_cleanup_interrupt(regs);
+        break;
     case LOCAL_TIMER_VECTOR:
         smp_apic_timer_interrupt(regs);
         break;
     case EVENT_CHECK_VECTOR:
-        smp_event_check_interrupt();
+        smp_event_check_interrupt(regs);
         break;
     case INVALIDATE_TLB_VECTOR:
         smp_invalidate_interrupt();
         break;
     case CALL_FUNCTION_VECTOR:
-        smp_call_function_interrupt();
+        smp_call_function_interrupt(regs);
         break;
     case SPURIOUS_APIC_VECTOR:
         smp_spurious_interrupt(regs);
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/i8259.c
--- a/xen/arch/x86/i8259.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/i8259.c	Thu Aug 13 10:49:37 2009 +0800
@@ -58,6 +58,7 @@ BUILD_16_IRQS(0xc) BUILD_16_IRQS(0xd) BU
  * is no hardware IRQ pin equivalent for them, they are triggered
  * through the ICC by us (IPIs)
  */
+BUILD_SMP_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR)
 BUILD_SMP_INTERRUPT(event_check_interrupt,EVENT_CHECK_VECTOR)
 BUILD_SMP_INTERRUPT(invalidate_interrupt,INVALIDATE_TLB_VECTOR)
 BUILD_SMP_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
@@ -374,7 +375,7 @@ static struct irqaction cascade = { no_a
 
 void __init init_IRQ(void)
 {
-    int i, vector;
+    int vector, irq, cpu = smp_processor_id();
 
     init_bsp_APIC();
 
@@ -389,15 +390,17 @@ void __init init_IRQ(void)
         set_intr_gate(vector, interrupt[vector]);
     }
 
-    for ( i = 0; i < 16; i++ )
-    {
-        vector_irq[LEGACY_VECTOR(i)] = i;
-        irq_desc[i].handler = &i8259A_irq_type;
-    }
-
-    /* Never allocate the hypercall vector or Linux/BSD fast-trap vector. */
-    vector_irq[HYPERCALL_VECTOR] = NEVER_ASSIGN_IRQ;
-    vector_irq[0x80] = NEVER_ASSIGN_IRQ;
+    for (irq = 0; irq < 16; irq++) {
+        struct irq_desc *desc = irq_to_desc(irq);
+        struct irq_cfg *cfg = desc->chip_data;
+        
+        desc->handler = &i8259A_irq_type;
+        per_cpu(vector_irq, cpu)[FIRST_LEGACY_VECTOR + irq] = irq;
+        cfg->domain = cpumask_of_cpu(cpu);
+        cfg->vector = FIRST_LEGACY_VECTOR + irq;
+    }
+    
+    per_cpu(vector_irq, cpu)[FIRST_HIPRIORITY_VECTOR] = 0;
 
     apic_intr_init();
 
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/io_apic.c
--- a/xen/arch/x86/io_apic.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/io_apic.c	Thu Aug 13 10:49:37 2009 +0800
@@ -30,7 +30,6 @@
 #include <xen/pci.h>
 #include <xen/pci_regs.h>
 #include <xen/keyhandler.h>
-#include <asm/io.h>
 #include <asm/mc146818rtc.h>
 #include <asm/smp.h>
 #include <asm/desc.h>
@@ -85,7 +84,36 @@ static struct irq_pin_list {
     int apic, pin;
     unsigned int next;
 } *irq_2_pin;
+
+static int *pin_irq_map;
+
 static unsigned int irq_2_pin_free_entry;
+
+/* Use an array to record the pin-to-irq mapping. */
+static int get_irq_from_apic_pin(int apic, int pin)
+{
+    int i, pin_base = 0;
+
+    ASSERT(apic < nr_ioapics);
+    
+    for (i = 0; i < apic; i++)
+        pin_base += nr_ioapic_registers[i];
+
+    return pin_irq_map[pin_base + pin];
+}
+
+static void set_irq_to_apic_pin(int apic, int pin, int irq)
+{
+    
+    int i, pin_base = 0;
+
+    ASSERT(apic < nr_ioapics);
+    
+    for (i = 0; i < apic; i++)
+        pin_base += nr_ioapic_registers[i];
+
+    pin_irq_map[pin_base + pin] = irq;
+}
 
 /*
  * The common case is 1:1 IRQ<->pin mappings. Sometimes there are
@@ -100,7 +128,7 @@ static void add_pin_to_irq(unsigned int 
         BUG_ON((entry->apic == apic) && (entry->pin == pin));
         entry = irq_2_pin + entry->next;
     }
-
+    
     BUG_ON((entry->apic == apic) && (entry->pin == pin));
 
     if (entry->pin != -1) {
@@ -113,6 +141,8 @@ static void add_pin_to_irq(unsigned int 
     }
     entry->apic = apic;
     entry->pin = pin;
+
+    set_irq_to_apic_pin(apic, pin, irq);
 }
 
 static void remove_pin_at_irq(unsigned int irq, int apic, int pin)
@@ -145,14 +175,16 @@ static void remove_pin_at_irq(unsigned i
         entry->next = irq_2_pin_free_entry;
         irq_2_pin_free_entry = entry - irq_2_pin;
     }
+
+    set_irq_to_apic_pin(apic, pin, -1);
 }
 
 /*
  * Reroute an IRQ to a different pin.
  */
 static void __init replace_pin_at_irq(unsigned int irq,
-				      int oldapic, int oldpin,
-				      int newapic, int newpin)
+                      int oldapic, int oldpin,
+                      int newapic, int newpin)
 {
     struct irq_pin_list *entry = irq_2_pin + irq;
 
@@ -232,7 +264,7 @@ static void clear_IO_APIC_pin(unsigned i
 {
     struct IO_APIC_route_entry entry;
     unsigned long flags;
-	
+    
     /* Check delivery_mode to be sure we're not clearing an SMI pin */
     spin_lock_irqsave(&ioapic_lock, flags);
     *(((int*)&entry) + 0) = io_apic_read(apic, 0x10 + 2 * pin);
@@ -262,32 +294,160 @@ static void clear_IO_APIC (void)
 }
 
 #ifdef CONFIG_SMP
-static void set_ioapic_affinity_irq(unsigned int irq, cpumask_t cpumask)
+fastcall void smp_irq_move_cleanup_interrupt(struct cpu_user_regs *regs)
+{
+    unsigned vector, me;
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
+
+    ack_APIC_irq();
+    irq_enter();
+
+    me = smp_processor_id();
+    for (vector = FIRST_DYNAMIC_VECTOR; vector < NR_VECTORS; vector++) {
+        unsigned int irq;
+        unsigned int irr;
+        struct irq_desc *desc;
+        struct irq_cfg *cfg;
+        irq = __get_cpu_var(vector_irq)[vector];
+
+        if (irq == -1)
+            continue;
+
+        desc = irq_to_desc(irq);
+        if (!desc)
+            continue;
+
+        cfg = desc->chip_data;
+        spin_lock(&desc->lock);
+        if (!cfg->move_cleanup_count)
+            goto unlock;
+
+        if (vector == cfg->vector && cpu_isset(me, cfg->domain))
+            goto unlock;
+
+        irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
+        /*
+         * Check if the vector that needs to be cleaned up is
+         * registered in the cpu's IRR. If so, then this is not
+         * the best time to clean it up. Let's clean it up in the
+         * next attempt by sending another IRQ_MOVE_CLEANUP_VECTOR
+         * to myself.
+         */
+        if (irr  & (1 << (vector % 32))) {
+            genapic->send_IPI_self(IRQ_MOVE_CLEANUP_VECTOR);
+            goto unlock;
+        }
+        __get_cpu_var(vector_irq)[vector] = -1;
+        cfg->move_cleanup_count--;
+unlock:
+        spin_unlock(&desc->lock);
+    }
+
+    irq_exit();
+    set_irq_regs(old_regs);
+}
+
+static void send_cleanup_vector(struct irq_cfg *cfg)
+{
+    cpumask_t cleanup_mask;
+
+    cpus_and(cleanup_mask, cfg->old_domain, cpu_online_map);
+    cfg->move_cleanup_count = cpus_weight(cleanup_mask);
+    genapic->send_IPI_mask(&cleanup_mask, IRQ_MOVE_CLEANUP_VECTOR);
+
+    cfg->move_in_progress = 0;
+}
+
+void irq_complete_move(struct irq_desc **descp)
+{
+    struct irq_desc *desc = *descp;
+    struct irq_cfg *cfg = desc->chip_data;
+    unsigned vector, me;
+
+    if (likely(!cfg->move_in_progress))
+        return;
+
+    vector = get_irq_regs()->entry_vector;
+    me = smp_processor_id();
+
+    if (vector == cfg->vector && cpumask_test_cpu(me, cfg->domain))
+        send_cleanup_vector(cfg);
+}
+
+unsigned int set_desc_affinity(struct irq_desc *desc, cpumask_t mask)
+{
+    struct irq_cfg *cfg;
+    unsigned int irq;
+    int ret;
+    cpumask_t dest_mask;
+
+    if (!cpus_intersects(mask, cpu_online_map))
+        return BAD_APICID;
+
+    irq = desc->irq;
+    cfg = desc->chip_data;
+    
+    lock_vector_lock();   
+    ret = __assign_irq_vector(irq, cfg, mask);
+    unlock_vector_lock();
+    
+    if (ret < 0)
+        return BAD_APICID;
+
+    cpus_copy(desc->affinity, mask);
+    cpus_and(dest_mask, desc->affinity, cfg->domain);
+
+    return cpu_mask_to_apicid(dest_mask);
+}
+
+static void
+set_ioapic_affinity_irq_desc(struct irq_desc *desc,
+                                        const struct cpumask mask)
 {
     unsigned long flags;
-    int pin;
-    struct irq_pin_list *entry = irq_2_pin + irq;
-    unsigned int apicid_value;
-
-    cpus_and(cpumask, cpumask, cpu_online_map);
-    if (cpus_empty(cpumask))
-        cpumask = TARGET_CPUS;
-
-    apicid_value = cpu_mask_to_apicid(cpumask);
-    /* Prepare to do the io_apic_write */
-    apicid_value = apicid_value << 24;
+    unsigned int dest;
+    int pin, irq;
+    struct irq_cfg *cfg;
+    struct irq_pin_list *entry;
+
+    irq = desc->irq;
+    cfg = desc->chip_data;
+
     spin_lock_irqsave(&ioapic_lock, flags);
-    for (;;) {
-        pin = entry->pin;
-        if (pin == -1)
-            break;
-        io_apic_write(entry->apic, 0x10 + 1 + pin*2, apicid_value);
-        if (!entry->next)
-            break;
-        entry = irq_2_pin + entry->next;
-    }
-    set_irq_info(irq, cpumask);
+    dest = set_desc_affinity(desc, mask);
+    if (dest != BAD_APICID) {
+        /* Only the high 8 bits are valid. */
+        dest = SET_APIC_LOGICAL_ID(dest);
+        entry = irq_2_pin + irq;
+        for (;;) {
+            unsigned int data;
+            pin = entry->pin;
+            if (pin == -1)
+                break;
+
+            io_apic_write(entry->apic, 0x10 + 1 + pin*2, dest);
+            data = io_apic_read(entry->apic, 0x10 + pin*2);
+            data &= ~IO_APIC_REDIR_VECTOR_MASK;
+            data |= cfg->vector & 0xFF;
+            io_apic_modify(entry->apic, 0x10 + pin*2, data);
+
+            if (!entry->next)
+                break;
+            entry = irq_2_pin + entry->next;
+        }
+    }
     spin_unlock_irqrestore(&ioapic_lock, flags);
+
+}
+
+static void
+set_ioapic_affinity_irq(unsigned int irq, const struct cpumask mask)
+{
+    struct irq_desc *desc;
+
+    desc = irq_to_desc(irq);
+
+    set_ioapic_affinity_irq_desc(desc, mask);
 }
 #endif /* CONFIG_SMP */
 
@@ -373,6 +533,7 @@ void /*__init*/ setup_ioapic_dest(void)
 void /*__init*/ setup_ioapic_dest(void)
 {
     int pin, ioapic, irq, irq_entry;
+    struct irq_cfg *cfg;
 
     if (skip_ioapic_setup == 1)
         return;
@@ -383,7 +544,9 @@ void /*__init*/ setup_ioapic_dest(void)
             if (irq_entry == -1)
                 continue;
             irq = pin_2_irq(irq_entry, ioapic, pin);
-            set_ioapic_affinity_irq(irq, TARGET_CPUS);
+            cfg = irq_cfg(irq);
+            BUG_ON(cpus_empty(cfg->domain));
+            set_ioapic_affinity_irq(irq, cfg->domain);
         }
 
     }
@@ -409,7 +572,7 @@ static int EISA_ELCR(unsigned int irq)
  * EISA conforming in the MP table, that means its trigger type must
  * be read in from the ELCR */
 
-#define default_EISA_trigger(idx)	(EISA_ELCR(mp_irqs[idx].mpc_srcbusirq))
+#define default_EISA_trigger(idx)    (EISA_ELCR(mp_irqs[idx].mpc_srcbusirq))
 #define default_EISA_polarity(idx)	(0)
 
 /* ISA interrupts are always polarity zero edge triggered,
@@ -682,11 +845,12 @@ static void __init setup_IO_APIC_irqs(vo
     struct IO_APIC_route_entry entry;
     int apic, pin, idx, irq, first_notcon = 1, vector;
     unsigned long flags;
+    struct irq_cfg *cfg;
 
     apic_printk(APIC_VERBOSE, KERN_DEBUG "init IO_APIC IRQs\n");
 
     for (apic = 0; apic < nr_ioapics; apic++) {
-	for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
+        for (pin = 0; pin < nr_ioapic_registers[apic]; pin++) {
 
             /*
              * add it to the IO-APIC irq-routing table:
@@ -695,9 +859,7 @@ static void __init setup_IO_APIC_irqs(vo
 
             entry.delivery_mode = INT_DELIVERY_MODE;
             entry.dest_mode = INT_DEST_MODE;
-            entry.mask = 0;				/* enable IRQ */
-            entry.dest.logical.logical_dest = 
-                cpu_mask_to_apicid(TARGET_CPUS);
+            entry.mask = 0;                /* enable IRQ */
 
             idx = find_irq_entry(apic,pin,mp_INT);
             if (idx == -1) {
@@ -736,12 +898,16 @@ static void __init setup_IO_APIC_irqs(vo
 
             if (IO_APIC_IRQ(irq)) {
                 vector = assign_irq_vector(irq);
+                BUG_ON(vector < 0);
                 entry.vector = vector;
                 ioapic_register_intr(irq, IOAPIC_AUTO);
 
                 if (!apic && (irq < 16))
                     disable_8259A_irq(irq);
             }
+            cfg = irq_cfg(irq);
+            entry.dest.logical.logical_dest = 
+                cpu_mask_to_apicid(cfg->domain);
             spin_lock_irqsave(&ioapic_lock, flags);
             io_apic_write(apic, 0x11+2*pin, *(((int *)&entry)+1));
             io_apic_write(apic, 0x10+2*pin, *(((int *)&entry)+0));
@@ -968,11 +1134,16 @@ static void __init enable_IO_APIC(void)
 
     /* Initialise dynamic irq_2_pin free list. */
     irq_2_pin = xmalloc_array(struct irq_pin_list, PIN_MAP_SIZE);
-    memset(irq_2_pin, 0, nr_irqs_gsi * sizeof(*irq_2_pin));
+    memset(irq_2_pin, 0, PIN_MAP_SIZE * sizeof(*irq_2_pin));
+    pin_irq_map = xmalloc_array(int, nr_irqs_gsi);
+    memset(pin_irq_map, 0, nr_irqs_gsi * sizeof(int));
+        
     for (i = 0; i < PIN_MAP_SIZE; i++)
         irq_2_pin[i].pin = -1;
     for (i = irq_2_pin_free_entry = nr_irqs_gsi; i < PIN_MAP_SIZE; i++)
         irq_2_pin[i].next = i + 1;
+    for (i = 0; i < nr_irqs_gsi; i++)
+        pin_irq_map[i] = -1;
 
     for(apic = 0; apic < nr_ioapics; apic++) {
         int pin;
@@ -1266,7 +1437,11 @@ static unsigned int startup_edge_ioapic_
  */
 static void ack_edge_ioapic_irq(unsigned int irq)
 {
-    if ((irq_desc[irq].status & (IRQ_PENDING | IRQ_DISABLED))
+    struct irq_desc *desc = irq_to_desc(irq);
+    
+    irq_complete_move(&desc);
+
+    if ((desc->status & (IRQ_PENDING | IRQ_DISABLED))
         == (IRQ_PENDING | IRQ_DISABLED))
         mask_IO_APIC_irq(irq);
     ack_APIC_irq();
@@ -1309,6 +1484,9 @@ static void mask_and_ack_level_ioapic_ir
 {
     unsigned long v;
     int i;
+    struct irq_desc *desc = irq_to_desc(irq);
+
+    irq_complete_move(&desc);
 
     if ( ioapic_ack_new )
         return;
@@ -1446,6 +1624,8 @@ static void ack_msi_irq(unsigned int irq
 {
     struct irq_desc *desc = irq_to_desc(irq);
 
+    irq_complete_move(&desc);
+
     if ( msi_maskable_irq(desc->msi_desc) )
         ack_APIC_irq(); /* ACKTYPE_NONE */
 }
@@ -1597,7 +1777,7 @@ static inline void check_timer(void)
 static inline void check_timer(void)
 {
     int apic1, pin1, apic2, pin2;
-    int vector;
+    int vector, ret;
     unsigned long flags;
 
     local_irq_save(flags);
@@ -1606,8 +1786,12 @@ static inline void check_timer(void)
      * get/set the timer IRQ vector:
      */
     disable_8259A_irq(0);
-    vector = assign_irq_vector(0);
-
+    vector = FIRST_HIPRIORITY_VECTOR;
+    clear_irq_vector(0);
+
+    if ((ret = bind_irq_vector(0, vector, (cpumask_t)CPU_MASK_ALL)))
+        printk(KERN_ERR "..IRQ0 is not set correctly with the ioapic, err: %d\n", ret);
+    
     irq_desc[0].depth  = 0;
     irq_desc[0].status &= ~IRQ_DISABLED;
     irq_desc[0].handler = &ioapic_edge_type;
@@ -1914,6 +2098,7 @@ int io_apic_set_pci_routing (int ioapic,
 {
     struct IO_APIC_route_entry entry;
     unsigned long flags;
+    int vector;
 
     if (!IO_APIC_IRQ(irq)) {
         printk(KERN_ERR "IOAPIC[%d]: Invalid reference to IRQ 0\n",
@@ -1942,7 +2127,10 @@ int io_apic_set_pci_routing (int ioapic,
     if (irq >= 16)
         add_pin_to_irq(irq, ioapic, pin);
 
-    entry.vector = assign_irq_vector(irq);
+    vector = assign_irq_vector(irq);
+    if (vector < 0)
+        return vector;
+    entry.vector = vector;
 
     apic_printk(APIC_DEBUG, KERN_DEBUG "IOAPIC[%d]: Set PCI routing entry "
 		"(%d-%d -> 0x%x -> IRQ %d Mode:%i Active:%i)\n", ioapic,
@@ -2014,7 +2202,6 @@ int ioapic_guest_write(unsigned long phy
 
     /* Write first half from guest; second half is target info. */
     *(u32 *)&new_rte = val;
-    new_rte.dest.logical.logical_dest = cpu_mask_to_apicid(TARGET_CPUS);
 
     /*
      * What about weird destination types?
@@ -2060,10 +2247,10 @@ int ioapic_guest_write(unsigned long phy
     }
 
     if ( old_rte.vector >= FIRST_DYNAMIC_VECTOR )
-        old_irq = vector_irq[old_rte.vector];
-
-    if ( new_rte.vector >= FIRST_DYNAMIC_VECTOR )
-        new_irq = vector_irq[new_rte.vector];
+        old_irq = get_irq_from_apic_pin(apic, pin);
+
+    /* FIXME: dirty hack to support per-cpu vector. */
+    new_irq = new_rte.vector;
 
     if ( (old_irq != new_irq) && (old_irq >= 0) && IO_APIC_IRQ(old_irq) )
     {
@@ -2096,6 +2283,8 @@ int ioapic_guest_write(unsigned long phy
 
         /* Mask iff level triggered. */
         new_rte.mask = new_rte.trigger;
+        /* Set the vector field to the real vector! */
+        new_rte.vector = irq_cfg[new_irq].vector;
     }
     else if ( !new_rte.mask )
     {
@@ -2104,6 +2293,8 @@ int ioapic_guest_write(unsigned long phy
         new_rte.mask = 1;
     }
 
+    new_rte.dest.logical.logical_dest =
+        cpu_mask_to_apicid(irq_cfg[new_irq].domain);
 
     io_apic_write(apic, 0x10 + 2 * pin, *(((int *)&new_rte) + 0));
     io_apic_write(apic, 0x11 + 2 * pin, *(((int *)&new_rte) + 1));
@@ -2144,11 +2335,12 @@ void dump_ioapic_irq_info(void)
 
             printk("vector=%u, delivery_mode=%u, dest_mode=%s, "
                    "delivery_status=%d, polarity=%d, irr=%d, "
-                   "trigger=%s, mask=%d\n",
+                   "trigger=%s, mask=%d, dest_id:%d\n",
                    rte.vector, rte.delivery_mode,
                    rte.dest_mode ? "logical" : "physical",
                    rte.delivery_status, rte.polarity, rte.irr,
-                   rte.trigger ? "level" : "edge", rte.mask);
+                   rte.trigger ? "level" : "edge", rte.mask,
+                   rte.dest.logical.logical_dest);
 
             if ( entry->next == 0 )
                 break;
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/irq.c	Thu Aug 13 10:49:37 2009 +0800
@@ -20,6 +20,7 @@
 #include <asm/msi.h>
 #include <asm/current.h>
 #include <asm/flushtlb.h>
+#include <asm/mach-generic/mach_apic.h>
 #include <public/physdev.h>
 
 /* opt_noirqbalance: If true, software IRQ balancing/affinity is disabled. */
@@ -38,12 +39,71 @@ int __read_mostly *irq_status = NULL;
 #define IRQ_USED        (1)
 #define IRQ_RSVD        (2)
 
+#define IRQ_VECTOR_UNASSIGNED (0)
+
+DECLARE_BITMAP(used_vectors, NR_VECTORS);
+
+struct irq_cfg __read_mostly *irq_cfg = NULL;
+
 static struct timer *irq_guest_eoi_timer;
 
 static DEFINE_SPINLOCK(vector_lock);
-int vector_irq[NR_VECTORS] __read_mostly = {
-    [0 ... NR_VECTORS - 1] = FREE_TO_ASSIGN_IRQ
+
+DEFINE_PER_CPU(vector_irq_t, vector_irq) = {
+    [0 ... NR_VECTORS - 1] = -1
 };
+
+DEFINE_PER_CPU(struct cpu_user_regs *, __irq_regs);
+
+void lock_vector_lock(void)
+{
+    /* Used so that the online set of cpus does not change
+     * during assign_irq_vector.
+     */
+    spin_lock(&vector_lock);
+}
+
+void unlock_vector_lock(void)
+{
+    spin_unlock(&vector_lock);
+}
+
+static int __bind_irq_vector(int irq, int vector, cpumask_t domain)
+{
+    cpumask_t mask;
+    int cpu;
+    struct irq_cfg *cfg = irq_cfg(irq);
+
+    BUG_ON((unsigned)irq >= nr_irqs);
+    BUG_ON((unsigned)vector >= NR_VECTORS);
+
+    cpus_and(mask, domain, cpu_online_map);
+    if (cpus_empty(mask))
+        return -EINVAL;
+    if ((cfg->vector == vector) && cpus_equal(cfg->domain, domain))
+        return 0;
+    if (cfg->vector != IRQ_VECTOR_UNASSIGNED) 
+        return -EBUSY;
+    for_each_cpu_mask(cpu, mask)
+        per_cpu(vector_irq, cpu)[vector] = irq;
+    cfg->vector = vector;
+    cfg->domain = domain;
+    irq_status[irq] = IRQ_USED;
+    if (IO_APIC_IRQ(irq))
+        irq_vector[irq] = vector;
+    return 0;
+}
+
+int bind_irq_vector(int irq, int vector, cpumask_t domain)
+{
+    unsigned long flags;
+    int ret;
+
+    spin_lock_irqsave(&vector_lock, flags);
+    ret = __bind_irq_vector(irq, vector, domain);
+    spin_unlock_irqrestore(&vector_lock, flags);
+    return ret;
+}
 
 static inline int find_unassigned_irq(void)
 {
@@ -69,7 +129,7 @@ int create_irq(void)
     irq = find_unassigned_irq();
     if (irq < 0)
          goto out;
-    ret = __assign_irq_vector(irq);
+    ret = __assign_irq_vector(irq, irq_cfg(irq), TARGET_CPUS);
     if (ret < 0)
         irq = ret;
 out:
@@ -81,8 +141,8 @@ void dynamic_irq_cleanup(unsigned int ir
 void dynamic_irq_cleanup(unsigned int irq)
 {
     struct irq_desc *desc = irq_to_desc(irq);
+    unsigned long flags;
     struct irqaction *action;
-    unsigned long flags;
 
     spin_lock_irqsave(&desc->lock, flags);
     desc->status  |= IRQ_DISABLED;
@@ -102,12 +162,39 @@ void dynamic_irq_cleanup(unsigned int ir
         xfree(action);
 }
 
+static void init_one_irq_status(int irq);
+
 static void __clear_irq_vector(int irq)
 {
-    int vector = irq_vector[irq];
-    vector_irq[vector] = FREE_TO_ASSIGN_IRQ;
-    irq_vector[irq] = 0;
-    irq_status[irq] = IRQ_UNUSED;
+    int cpu, vector;
+    cpumask_t tmp_mask;
+    struct irq_cfg *cfg = irq_cfg(irq);
+
+    BUG_ON(!cfg->vector);
+
+    vector = cfg->vector;
+    cpus_and(tmp_mask, cfg->domain, cpu_online_map);
+
+    for_each_cpu_mask(cpu, tmp_mask)
+        per_cpu(vector_irq, cpu)[vector] = -1;
+
+    cfg->vector = IRQ_VECTOR_UNASSIGNED;
+    cpus_clear(cfg->domain);
+    init_one_irq_status(irq);
+
+    if (likely(!cfg->move_in_progress))
+        return;
+    for_each_cpu_mask(cpu, tmp_mask) {
+        for (vector = FIRST_DYNAMIC_VECTOR; vector <= LAST_DYNAMIC_VECTOR;
+                                vector++) {
+            if (per_cpu(vector_irq, cpu)[vector] != irq)
+                continue;
+            per_cpu(vector_irq, cpu)[vector] = -1;
+            break;
+        }
+    }
+
+    cfg->move_in_progress = 0;
 }
 
 void clear_irq_vector(int irq)
@@ -121,6 +208,7 @@ void clear_irq_vector(int irq)
 
 void destroy_irq(unsigned int irq)
 {
+    BUG_ON(!MSI_IRQ(irq));
     dynamic_irq_cleanup(irq);
     clear_irq_vector(irq);
 }
@@ -128,12 +216,16 @@ int irq_to_vector(int irq)
 int irq_to_vector(int irq)
 {
     int vector = -1;
+    struct irq_cfg *cfg;
 
     BUG_ON(irq >= nr_irqs || irq < 0);
 
-    if (IO_APIC_IRQ(irq) || MSI_IRQ(irq))
+    if (IO_APIC_IRQ(irq))
         vector = irq_vector[irq];
-    else
+    else if (MSI_IRQ(irq)) {
+        cfg = irq_cfg(irq);
+        vector = cfg->vector;
+    } else
         vector = LEGACY_VECTOR(irq);
 
     return vector;
@@ -141,13 +233,13 @@ int irq_to_vector(int irq)
 
 static void init_one_irq_desc(struct irq_desc *desc)
 {
-        desc->status  = IRQ_DISABLED;
-        desc->handler = &no_irq_type;
-        desc->action  = NULL;
-        desc->depth   = 1;
-        desc->msi_desc = NULL;
-        spin_lock_init(&desc->lock);
-        cpus_setall(desc->affinity);
+    desc->status  = IRQ_DISABLED;
+    desc->handler = &no_irq_type;
+    desc->action  = NULL;
+    desc->depth   = 1;
+    desc->msi_desc = NULL;
+    spin_lock_init(&desc->lock);
+    cpus_setall(desc->affinity);
 }
 
 static void init_one_irq_status(int irq)
@@ -155,30 +247,51 @@ static void init_one_irq_status(int irq)
     irq_status[irq] = IRQ_UNUSED;
 }
 
+static void init_one_irq_cfg(struct irq_cfg *cfg)
+{
+    cfg->vector = IRQ_VECTOR_UNASSIGNED;
+    cpus_clear(cfg->domain);
+    cpus_clear(cfg->old_domain);
+}
+
 int init_irq_data(void)
 {
     struct irq_desc *desc;
+    struct irq_cfg *cfg;
     int irq;
 
     irq_desc = xmalloc_array(struct irq_desc, nr_irqs);
+    irq_cfg = xmalloc_array(struct irq_cfg, nr_irqs);
     irq_status = xmalloc_array(int, nr_irqs);
     irq_guest_eoi_timer = xmalloc_array(struct timer, nr_irqs);
-    irq_vector = xmalloc_array(u8, nr_irqs);
+    irq_vector = xmalloc_array(u8, nr_irqs_gsi);
     
-    if (!irq_desc || !irq_status ||! irq_vector || !irq_guest_eoi_timer)
-        return -1;
+    if (!irq_desc || !irq_cfg || !irq_status || !irq_vector ||
+        !irq_guest_eoi_timer)
+        return -ENOMEM;
 
     memset(irq_desc, 0,  nr_irqs * sizeof(*irq_desc));
+    memset(irq_cfg, 0,  nr_irqs * sizeof(*irq_cfg));
     memset(irq_status, 0,  nr_irqs * sizeof(*irq_status));
-    memset(irq_vector, 0, nr_irqs * sizeof(*irq_vector));
+    memset(irq_vector, 0, nr_irqs_gsi * sizeof(*irq_vector));
     memset(irq_guest_eoi_timer, 0, nr_irqs * sizeof(*irq_guest_eoi_timer));
     
     for (irq = 0; irq < nr_irqs; irq++) {
         desc = irq_to_desc(irq);
+        cfg = irq_cfg(irq);
         desc->irq = irq;
+        desc->chip_data = cfg;
         init_one_irq_desc(desc);
+        init_one_irq_cfg(cfg);
         init_one_irq_status(irq);
     }
+
+    /* Never allocate the hypercall vector or Linux/BSD fast-trap vector. */
+    set_bit(LEGACY_SYSCALL_VECTOR, used_vectors);
+    set_bit(HYPERCALL_VECTOR, used_vectors);
+    
+    /* IRQ_MOVE_CLEANUP_VECTOR is used for cleaning up vectors. */
+    set_bit(IRQ_MOVE_CLEANUP_VECTOR, used_vectors);
 
     return 0;
 }
@@ -210,54 +323,133 @@ struct hw_interrupt_type no_irq_type = {
 
 atomic_t irq_err_count;
 
-int __assign_irq_vector(int irq)
-{
-    static unsigned current_vector = FIRST_DYNAMIC_VECTOR;
-    unsigned vector;
-
-    BUG_ON(irq >= nr_irqs || irq < 0);
-
-    if ((irq_to_vector(irq) > 0)) 
-        return irq_to_vector(irq);
-
-    vector = current_vector;
-    while (vector_irq[vector] != FREE_TO_ASSIGN_IRQ) {
+int __assign_irq_vector(int irq, struct irq_cfg *cfg, cpumask_t mask)
+{
+    /*
+     * NOTE! The local APIC isn't very good at handling
+     * multiple interrupts at the same interrupt level.
+     * As the interrupt level is determined by taking the
+     * vector number and shifting that right by 4, we
+     * want to spread these out a bit so that they don't
+     * all fall in the same interrupt level.
+     *
+     * Also, we've got to be careful not to trash gate
+     * 0x80, because int 0x80 is hm, kind of importantish. ;)
+     */
+    static int current_vector = FIRST_DYNAMIC_VECTOR, current_offset = 0;
+    unsigned int old_vector;
+    int cpu, err;
+    cpumask_t tmp_mask;
+
+    if ((cfg->move_in_progress) || cfg->move_cleanup_count)
+        return -EBUSY;
+
+    old_vector = irq_to_vector(irq);
+    if (old_vector) {
+        cpus_and(tmp_mask, mask, cpu_online_map);
+        cpus_and(tmp_mask, cfg->domain, tmp_mask);
+        if (!cpus_empty(tmp_mask)) {
+            cfg->vector = old_vector;
+            return 0;
+        }
+    }
+
+    /* Only try and allocate irqs on cpus that are present */
+    cpus_and(mask, mask, cpu_online_map);
+
+    err = -ENOSPC;
+    for_each_cpu_mask(cpu, mask) {
+        int new_cpu;
+        int vector, offset;
+
+        tmp_mask = vector_allocation_domain(cpu);
+        cpus_and(tmp_mask, tmp_mask, cpu_online_map);
+
+        vector = current_vector;
+        offset = current_offset;
+next:
         vector += 8;
-        if (vector > LAST_DYNAMIC_VECTOR)
-            vector = FIRST_DYNAMIC_VECTOR + ((vector + 1) & 7);
-
-        if (vector == current_vector)
-            return -ENOSPC;
-    }
-
-    current_vector = vector;
-    vector_irq[vector] = irq;
-    irq_vector[irq] = vector;
-    irq_status[irq] = IRQ_USED;
-
-    return vector;
+        if (vector > LAST_DYNAMIC_VECTOR) {
+            /* If out of vectors on large boxen, must share them. */
+            offset = (offset + 1) % 8;
+            vector = FIRST_DYNAMIC_VECTOR + offset;
+        }
+        if (unlikely(current_vector == vector))
+            continue;
+
+        if (test_bit(vector, used_vectors))
+            goto next;
+
+        for_each_cpu_mask(new_cpu, tmp_mask)
+            if (per_cpu(vector_irq, new_cpu)[vector] != -1)
+                goto next;
+        /* Found one! */
+        current_vector = vector;
+        current_offset = offset;
+        if (old_vector) {
+            cfg->move_in_progress = 1;
+            cpus_copy(cfg->old_domain, cfg->domain);
+        }
+        for_each_cpu_mask(new_cpu, tmp_mask)
+            per_cpu(vector_irq, new_cpu)[vector] = irq;
+        cfg->vector = vector;
+        cpus_copy(cfg->domain, tmp_mask);
+
+        irq_status[irq] = IRQ_USED;
+        if (IO_APIC_IRQ(irq))
+            irq_vector[irq] = vector;
+        err = 0;
+        break;
+    }
+    return err;
 }
 
 int assign_irq_vector(int irq)
 {
     int ret;
     unsigned long flags;
+    struct irq_cfg *cfg = &irq_cfg[irq];
     
+    BUG_ON(irq >= nr_irqs || irq < 0);
+
     spin_lock_irqsave(&vector_lock, flags);
-    ret = __assign_irq_vector(irq);
+    ret = __assign_irq_vector(irq, cfg, TARGET_CPUS);
+    if (!ret)
+        ret = cfg->vector;
     spin_unlock_irqrestore(&vector_lock, flags);
-
     return ret;
 }
 
+/*
+ * Initialize vector_irq on a new cpu. This function must be called
+ * with vector_lock held.
+ */
+void __setup_vector_irq(int cpu)
+{
+    int irq, vector;
+    struct irq_cfg *cfg;
+
+    /* Clear vector_irq */
+    for (vector = 0; vector < NR_VECTORS; ++vector)
+        per_cpu(vector_irq, cpu)[vector] = -1;
+    /* Mark the inuse vectors */
+    for (irq = 0; irq < nr_irqs; ++irq) {
+        cfg = irq_cfg(irq);
+        if (!cpu_isset(cpu, cfg->domain))
+            continue;
+        vector = irq_to_vector(irq);
+        per_cpu(vector_irq, cpu)[vector] = irq;
+    }
+}
 
 asmlinkage void do_IRQ(struct cpu_user_regs *regs)
 {
     struct irqaction *action;
     uint32_t          tsc_in;
+    struct irq_desc  *desc;
     unsigned int      vector = regs->entry_vector;
-    int irq = vector_irq[vector];
-    struct irq_desc  *desc;
+    int irq = __get_cpu_var(vector_irq[vector]);
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
     
     perfc_incr(irqs);
 
@@ -265,6 +457,7 @@ asmlinkage void do_IRQ(struct cpu_user_r
         ack_APIC_irq();
         printk("%s: %d.%d No irq handler for vector (irq %d)\n",
                 __func__, smp_processor_id(), vector, irq);
+        set_irq_regs(old_regs);
         return;
     }
 
@@ -281,6 +474,7 @@ asmlinkage void do_IRQ(struct cpu_user_r
         TRACE_3D(TRC_TRACE_IRQ, irq, tsc_in, get_cycles());
         irq_exit();
         spin_unlock(&desc->lock);
+        set_irq_regs(old_regs);
         return;
     }
 
@@ -314,6 +508,7 @@ asmlinkage void do_IRQ(struct cpu_user_r
  out:
     desc->handler->end(irq);
     spin_unlock(&desc->lock);
+    set_irq_regs(old_regs);
 }
 
 int request_irq(unsigned int irq,
@@ -412,6 +607,7 @@ typedef struct {
 #define ACKTYPE_UNMASK 1     /* Unmask PIC hardware (from any CPU)   */
 #define ACKTYPE_EOI    2     /* EOI on the CPU that was interrupted  */
     cpumask_t cpu_eoi_map;   /* CPUs that need to EOI this interrupt */
+    u8 eoi_vector;           /* vector awaiting the EOI */
     struct domain *guest[IRQ_MAX_GUESTS];
 } irq_guest_action_t;
 
@@ -472,7 +668,7 @@ static void __do_IRQ_guest(int irq)
     struct domain      *d;
     int                 i, sp, already_pending = 0;
     struct pending_eoi *peoi = this_cpu(pending_eoi);
-    int vector = irq_to_vector(irq);
+    int vector = get_irq_regs()->entry_vector;
 
     if ( unlikely(action->nr_guests == 0) )
     {
@@ -492,6 +688,7 @@ static void __do_IRQ_guest(int irq)
         peoi[sp].ready = 0;
         pending_eoi_sp(peoi) = sp+1;
         cpu_set(smp_processor_id(), action->cpu_eoi_map);
+        action->eoi_vector = vector;
     }
 
     for ( i = 0; i < action->nr_guests; i++ )
@@ -583,7 +780,8 @@ static void flush_ready_eoi(void)
 
     while ( (--sp >= 0) && peoi[sp].ready )
     {
-        irq = vector_irq[peoi[sp].vector];
+        irq = __get_cpu_var(vector_irq[peoi[sp].vector]);
+        ASSERT(irq > 0);
         desc = irq_to_desc(irq);
         spin_lock(&desc->lock);
         desc->handler->end(irq);
@@ -607,9 +805,10 @@ static void __set_eoi_ready(struct irq_d
         return;
 
     sp = pending_eoi_sp(peoi);
+
     do {
         ASSERT(sp > 0);
-    } while ( peoi[--sp].vector != irq_to_vector(irq) );
+    } while ( peoi[--sp].vector != action->eoi_vector );
     ASSERT(!peoi[sp].ready);
     peoi[sp].ready = 1;
 }
@@ -1233,57 +1432,58 @@ extern void dump_ioapic_irq_info(void);
 
 static void dump_irqs(unsigned char key)
 {
-    int i, glob_irq, irq, vector;
+    int i, irq, pirq;
     struct irq_desc *desc;
+    struct irq_cfg *cfg;
     irq_guest_action_t *action;
     struct domain *d;
     unsigned long flags;
 
     printk("Guest interrupt information:\n");
 
-    for ( vector = 0; vector < NR_VECTORS; vector++ )
-    {
-
-        glob_irq = vector_to_irq(vector);
-        if (glob_irq < 0)
+    for ( irq = 0; irq < nr_irqs; irq++ )
+    {
+
+        desc = irq_to_desc(irq);
+        cfg = desc->chip_data;
+
+        if ( !desc->handler || desc->handler == &no_irq_type )
             continue;
 
-        desc = irq_to_desc(glob_irq);
-        if ( desc == NULL || desc->handler == &no_irq_type )
-            continue;
-
         spin_lock_irqsave(&desc->lock, flags);
 
         if ( !(desc->status & IRQ_GUEST) )
-            printk("   Vec%3d IRQ%3d: type=%-15s status=%08x "
-                   "mapped, unbound\n",
-                   vector, glob_irq, desc->handler->typename, desc->status);
+            /* Only show CPU0 - CPU31's affinity info. */
+            printk("   IRQ:%4d, IRQ affinity:0x%08x, Vec:%3d type=%-15s"
+                    " status=%08x mapped, unbound\n",
+                   irq, *(int*)cfg->domain.bits, cfg->vector,
+                    desc->handler->typename, desc->status);
         else
         {
             action = (irq_guest_action_t *)desc->action;
 
-            printk("   Vec%3d IRQ%3d: type=%-15s status=%08x "
-                   "in-flight=%d domain-list=",
-                   vector, glob_irq, desc->handler->typename,
-                   desc->status, action->in_flight);
+            printk("   IRQ:%4d, IRQ affinity:0x%08x, Vec:%3d type=%-15s "
+                    "status=%08x in-flight=%d domain-list=",
+                   irq, *(int*)cfg->domain.bits, cfg->vector,
+                   desc->handler->typename, desc->status, action->in_flight);
 
             for ( i = 0; i < action->nr_guests; i++ )
             {
                 d = action->guest[i];
-                irq = domain_irq_to_pirq(d, vector_irq[vector]);
+                pirq = domain_irq_to_pirq(d, irq);
                 printk("%u:%3d(%c%c%c%c)",
-                       d->domain_id, irq,
-                       (test_bit(d->pirq_to_evtchn[glob_irq],
+                       d->domain_id, pirq,
+                       (test_bit(d->pirq_to_evtchn[pirq],
                                  &shared_info(d, evtchn_pending)) ?
                         'P' : '-'),
-                       (test_bit(d->pirq_to_evtchn[glob_irq] /
+                       (test_bit(d->pirq_to_evtchn[pirq] /
                                  BITS_PER_EVTCHN_WORD(d),
                                  &vcpu_info(d->vcpu[0], evtchn_pending_sel)) ?
                         'S' : '-'),
-                       (test_bit(d->pirq_to_evtchn[glob_irq],
+                       (test_bit(d->pirq_to_evtchn[pirq],
                                  &shared_info(d, evtchn_mask)) ?
                         'M' : '-'),
-                       (test_bit(glob_irq, d->pirq_mask) ?
+                       (test_bit(pirq, d->pirq_mask) ?
                         'M' : '-'));
                 if ( i != action->nr_guests )
                     printk(",");
@@ -1315,53 +1515,69 @@ __initcall(setup_dump_irqs);
 #include <asm/mach-generic/mach_apic.h>
 #include <xen/delay.h>
 
-void fixup_irqs(cpumask_t map)
-{
-    unsigned int vector, sp;
+/* A cpu has been removed from cpu_online_mask.  Re-set irq affinities. */
+void fixup_irqs(void)
+{
+    unsigned int irq, sp;
     static int warned;
+    struct irq_desc *desc;
     irq_guest_action_t *action;
     struct pending_eoi *peoi;
-    irq_desc_t         *desc;
-    unsigned long       flags;
-
-    /* Direct all future interrupts away from this CPU. */
-    for ( vector = 0; vector < NR_VECTORS; vector++ )
-    {
-        cpumask_t mask;
-        if ( vector_to_irq(vector) == 2 )
+    for ( irq = 0; irq < nr_irqs; irq++ ) {
+        int break_affinity = 0;
+        int set_affinity = 1;
+        cpumask_t affinity;
+        if (irq == 2)
             continue;
-
-        desc = irq_to_desc(vector_to_irq(vector));
-
-        spin_lock_irqsave(&desc->lock, flags);
-
-        cpus_and(mask, desc->affinity, map);
-        if ( any_online_cpu(mask) == NR_CPUS )
+        desc = irq_to_desc(irq);
+        /* interrupts are disabled at this point */
+        spin_lock(&desc->lock);
+
+        affinity = desc->affinity;
+        if (!desc->action ||
+            cpus_equal(affinity, cpu_online_map)) {
+            spin_unlock(&desc->lock);
+            continue;
+        }
+
+        cpus_and(affinity, affinity, cpu_online_map);
+        if ( any_online_cpu(affinity) == NR_CPUS )
         {
-            printk("Breaking affinity for vector %u (irq %i)\n",
-                   vector, vector_to_irq(vector));
-            mask = map;
+            break_affinity = 1;
+            affinity = cpu_online_map;
         }
-        if ( desc->handler->set_affinity )
-            desc->handler->set_affinity(vector, mask);
-        else if ( desc->action && !(warned++) )
-            printk("Cannot set affinity for vector %u (irq %i)\n",
-                   vector, vector_to_irq(vector));
-
-        spin_unlock_irqrestore(&desc->lock, flags);
-    }
-
-    /* Service any interrupts that beat us in the re-direction race. */
+
+        if (desc->handler->disable)
+            desc->handler->disable(irq);
+
+        if (desc->handler->set_affinity)
+            desc->handler->set_affinity(irq, affinity);
+        else if (!(warned++))
+            set_affinity = 0;
+
+        if (desc->handler->enable)
+            desc->handler->enable(irq);
+
+        spin_unlock(&desc->lock);
+
+        if (break_affinity && set_affinity)
+            printk("Broke affinity for irq %i\n", irq);
+        else if (!set_affinity)
+            printk("Cannot set affinity for irq %i\n", irq);
+    }
+
+    /* That doesn't seem sufficient.  Give it 1ms. */
     local_irq_enable();
     mdelay(1);
     local_irq_disable();
 
     /* Clean up cpu_eoi_map of every interrupt to exclude this CPU. */
-    for ( vector = 0; vector < NR_VECTORS; vector++ )
-    {
-        if ( !(irq_desc[vector_to_irq(vector)].status & IRQ_GUEST) )
+    for ( irq = 0; irq < nr_irqs; irq++ )
+    {
+        desc = irq_to_desc(irq);
+        if ( !(desc->status & IRQ_GUEST) )
             continue;
-        action = (irq_guest_action_t *)irq_desc[vector_to_irq(vector)].action;
+        action = (irq_guest_action_t *)desc->action;
         cpu_clear(smp_processor_id(), action->cpu_eoi_map);
     }
 
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/msi.c
--- a/xen/arch/x86/msi.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/msi.c	Thu Aug 13 10:49:37 2009 +0800
@@ -120,13 +120,19 @@ void msi_compose_msg(struct pci_dev *pde
                             struct msi_msg *msg)
 {
     unsigned dest;
-    cpumask_t tmp;
-    int vector = irq_to_vector(irq);
-
-    tmp = TARGET_CPUS;
-    if ( vector )
-    {
-        dest = cpu_mask_to_apicid(tmp);
+    cpumask_t domain;
+    struct irq_cfg *cfg = irq_cfg(irq);
+    int vector = cfg->vector;
+    domain = cfg->domain;
+
+    if ( cpus_empty( domain ) ) {
+        dprintk(XENLOG_ERR, "%s: failed to compose MSI message\n", __func__);
+        return;
+    }
+
+    if ( vector ) {
+
+        dest = cpu_mask_to_apicid(domain);
 
         msg->address_hi = MSI_ADDR_BASE_HI;
         msg->address_lo =
@@ -274,11 +280,23 @@ static void write_msi_msg(struct msi_des
 
 void set_msi_affinity(unsigned int irq, cpumask_t mask)
 {
-    struct msi_desc *desc = irq_desc[irq].msi_desc;
     struct msi_msg msg;
     unsigned int dest;
+    struct irq_desc *desc = irq_to_desc(irq);
+    struct msi_desc *msi_desc = desc->msi_desc;
+    struct irq_cfg *cfg = desc->chip_data;
+
+    dest = set_desc_affinity(desc, mask);
+    if (dest == BAD_APICID || !msi_desc)
+        return;
+
+    ASSERT(spin_is_locked(&desc->lock));
 
     memset(&msg, 0, sizeof(msg));
+    read_msi_msg(msi_desc, &msg);
+
+    msg.data &= ~MSI_DATA_VECTOR_MASK;
+    msg.data |= MSI_DATA_VECTOR(cfg->vector);
     cpus_and(mask, mask, cpu_online_map);
     if ( cpus_empty(mask) )
         mask = TARGET_CPUS;
@@ -287,13 +305,16 @@ void set_msi_affinity(unsigned int irq, 
     if ( !desc )
         return;
 
-    ASSERT(spin_is_locked(&irq_desc[irq].lock));
-    read_msi_msg(desc, &msg);
+    ASSERT(spin_is_locked(&desc->lock));
+    read_msi_msg(msi_desc, &msg);
+
+    msg.data &= ~MSI_DATA_VECTOR_MASK;
+    msg.data |= MSI_DATA_VECTOR(cfg->vector);
 
     msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
     msg.address_lo |= MSI_ADDR_DEST_ID(dest);
 
-    write_msi_msg(desc, &msg);
+    write_msi_msg(msi_desc, &msg);
 }
 
 static void msi_set_enable(struct pci_dev *dev, int enable)
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/physdev.c
--- a/xen/arch/x86/physdev.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/physdev.c	Thu Aug 13 10:49:37 2009 +0800
@@ -329,6 +329,7 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
 
     case PHYSDEVOP_alloc_irq_vector: {
         struct physdev_irq irq_op;
+        int vector;
 
         ret = -EFAULT;
         if ( copy_from_guest(&irq_op, arg, 1) != 0 )
@@ -344,8 +345,16 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_H
 
         irq = irq_op.irq;
         ret = -EINVAL;
-
-        irq_op.vector = assign_irq_vector(irq);
+        
+        /* FIXME: once dom0 exceeds the GSI IRQ limit, this
+           limit must be removed. */
+        BUG_ON(irq >= 256);
+        
+        vector = assign_irq_vector(irq);
+        if (vector >= FIRST_DYNAMIC_VECTOR)
+            irq_op.vector = irq;
+        else
+            irq_op.vector = -ENOSPC;
 
         spin_lock(&pcidevs_lock);
         spin_lock(&dom0->event_lock);
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/setup.c
--- a/xen/arch/x86/setup.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/setup.c	Thu Aug 13 10:49:37 2009 +0800
@@ -921,9 +921,9 @@ void __init __start_xen(unsigned long mb
 
     init_apic_mappings();
 
+    percpu_init_areas();
+
     init_IRQ();
-    
-    percpu_init_areas();
 
     xsm_init(&initrdidx, mbi, initial_images_start);
 
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/smp.c
--- a/xen/arch/x86/smp.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/smp.c	Thu Aug 13 10:49:37 2009 +0800
@@ -26,7 +26,11 @@
  * send_IPI_mask(cpumask, vector): sends @vector IPI to CPUs in @cpumask,
  * excluding the local CPU. @cpumask may be empty.
  */
-#define send_IPI_mask (genapic->send_IPI_mask)
+
+void send_IPI_mask(const cpumask_t *mask, int vector)
+{
+    genapic->send_IPI_mask(mask, vector);
+}
 
 /*
  *	Some notes on x86 processor bugs affecting SMP operation:
@@ -89,6 +93,41 @@ void apic_wait_icr_idle(void)
         cpu_relax();
 }
 
+static void __default_send_IPI_shortcut(unsigned int shortcut, int vector,
+                                    unsigned int dest)
+{
+    unsigned int cfg;
+
+    /*
+     * Wait for idle.
+     */
+    apic_wait_icr_idle();
+
+    /*
+     * prepare target chip field
+     */
+    cfg = __prepare_ICR(shortcut, vector) | dest;
+    /*
+     * Send the IPI. The write to APIC_ICR fires this off.
+     */
+    apic_write_around(APIC_ICR, cfg);
+}
+
+void send_IPI_self_flat(int vector)
+{
+    __default_send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
+}
+
+void send_IPI_self_phys(int vector)
+{
+    __default_send_IPI_shortcut(APIC_DEST_SELF, vector, APIC_DEST_PHYSICAL);
+}
+
+void send_IPI_self_x2apic(int vector)
+{
+    apic_write(APIC_SELF_IPI, vector);    
+}
+
 void send_IPI_mask_flat(const cpumask_t *cpumask, int vector)
 {
     unsigned long mask = cpus_addr(*cpumask)[0];
@@ -337,8 +376,10 @@ void smp_send_nmi_allbutself(void)
 
 fastcall void smp_event_check_interrupt(struct cpu_user_regs *regs)
 {
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
     ack_APIC_irq();
     perfc_incr(ipis);
+    set_irq_regs(old_regs);
 }
 
 static void __smp_call_function_interrupt(void)
@@ -369,7 +410,10 @@ static void __smp_call_function_interrup
 
 fastcall void smp_call_function_interrupt(struct cpu_user_regs *regs)
 {
+    struct cpu_user_regs *old_regs = set_irq_regs(regs);
+
     ack_APIC_irq();
     perfc_incr(ipis);
     __smp_call_function_interrupt();
-}
+    set_irq_regs(old_regs);
+}
diff -r 8584327c7e70 -r 4009a583e41c xen/arch/x86/smpboot.c
--- a/xen/arch/x86/smpboot.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/arch/x86/smpboot.c	Thu Aug 13 10:49:37 2009 +0800
@@ -512,7 +512,12 @@ void __devinit start_secondary(void *unu
 	set_cpu_sibling_map(raw_smp_processor_id());
 	wmb();
 
+    /* Initialize vector_irq for this secondary CPU (AP). */
+    lock_vector_lock();
+    __setup_vector_irq(smp_processor_id());
 	cpu_set(smp_processor_id(), cpu_online_map);
+    unlock_vector_lock();
+
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 
 	init_percpu_time();
@@ -1232,10 +1237,9 @@ remove_siblinginfo(int cpu)
 	cpu_clear(cpu, cpu_sibling_setup_map);
 }
 
-extern void fixup_irqs(cpumask_t map);
+extern void fixup_irqs(void);
 int __cpu_disable(void)
 {
-	cpumask_t map = cpu_online_map;
 	int cpu = smp_processor_id();
 
 	/*
@@ -1262,8 +1266,8 @@ int __cpu_disable(void)
 
 	remove_siblinginfo(cpu);
 
-	cpu_clear(cpu, map);
-	fixup_irqs(map);
+	cpu_clear(cpu, cpu_online_map);
+	fixup_irqs();
 	/* It's now safe to remove this processor from the online map */
 	cpu_clear(cpu, cpu_online_map);
 
@@ -1477,14 +1481,13 @@ void __init smp_cpus_done(unsigned int m
 
 void __init smp_intr_init(void)
 {
-	int irq, seridx;
+	int irq, seridx, cpu = smp_processor_id();
 
 	/*
 	 * IRQ0 must be given a fixed assignment and initialized,
 	 * because it's used before the IO-APIC is set up.
 	 */
 	irq_vector[0] = FIRST_HIPRIORITY_VECTOR;
-	vector_irq[FIRST_HIPRIORITY_VECTOR] = 0;
 
 	/*
 	 * Also ensure serial interrupts are high priority. We do not
@@ -1493,9 +1496,14 @@ void __init smp_intr_init(void)
 	for (seridx = 0; seridx < 2; seridx++) {
 		if ((irq = serial_irq(seridx)) < 0)
 			continue;
-		irq_vector[irq] = FIRST_HIPRIORITY_VECTOR + seridx + 1;
-		vector_irq[FIRST_HIPRIORITY_VECTOR + seridx + 1] = irq;
-	}
+		irq_vector[irq] = FIRST_HIPRIORITY_VECTOR + seridx + 1;
+		per_cpu(vector_irq, cpu)[FIRST_HIPRIORITY_VECTOR + seridx + 1] = irq;
+		irq_cfg[irq].vector = FIRST_HIPRIORITY_VECTOR + seridx + 1;
+		irq_cfg[irq].domain = (cpumask_t)CPU_MASK_ALL;
+	}
+
+	/* IPI for cleaning up vectors after an irq move. */
+	set_intr_gate(IRQ_MOVE_CLEANUP_VECTOR, irq_move_cleanup_interrupt);
 
 	/* IPI for event checking. */
 	set_intr_gate(EVENT_CHECK_VECTOR, event_check_interrupt);
diff -r 8584327c7e70 -r 4009a583e41c xen/drivers/passthrough/amd/iommu_init.c
--- a/xen/drivers/passthrough/amd/iommu_init.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/drivers/passthrough/amd/iommu_init.c	Thu Aug 13 10:49:37 2009 +0800
@@ -26,6 +26,7 @@
 #include <asm/msi.h>
 #include <asm/hvm/svm/amd-iommu-proto.h>
 #include <asm-x86/fixmap.h>
+#include <mach_apic.h>
 
 static struct amd_iommu **irq_to_iommu;
 static int nr_amd_iommus;
@@ -303,40 +304,45 @@ static int amd_iommu_read_event_log(stru
     return -EFAULT;
 }
 
-static void amd_iommu_msi_data_init(struct amd_iommu *iommu)
-{
-    u32 msi_data;
+static void iommu_msi_set_affinity(unsigned int irq, cpumask_t mask)
+{
+    struct msi_msg msg;
+    unsigned int dest;
+    struct amd_iommu *iommu = irq_to_iommu[irq];
+    struct irq_desc *desc = irq_to_desc(irq);
+    struct irq_cfg *cfg = desc->chip_data;
     u8 bus = (iommu->bdf >> 8) & 0xff;
     u8 dev = PCI_SLOT(iommu->bdf & 0xff);
     u8 func = PCI_FUNC(iommu->bdf & 0xff);
-    int vector = irq_to_vector(iommu->irq);
-
-    msi_data = MSI_DATA_TRIGGER_EDGE |
-        MSI_DATA_LEVEL_ASSERT |
-        MSI_DATA_DELIVERY_FIXED |
-        MSI_DATA_VECTOR(vector);
+
+    dest = set_desc_affinity(desc, mask);
+    if ( dest == BAD_APICID ) {
+        gdprintk(XENLOG_ERR, "Error setting IOMMU interrupt affinity!\n");
+        return;
+    }
+
+    memset(&msg, 0, sizeof(msg));
+    msg.data = MSI_DATA_VECTOR(cfg->vector) & 0xff;
+    msg.data |= 1 << 14;
+    msg.data |= (INT_DELIVERY_MODE != dest_LowestPrio) ?
+        MSI_DATA_DELIVERY_FIXED :
+        MSI_DATA_DELIVERY_LOWPRI;
+
+    msg.address_hi = 0;
+    msg.address_lo = (MSI_ADDRESS_HEADER << (MSI_ADDRESS_HEADER_SHIFT + 8));
+    msg.address_lo |= INT_DEST_MODE ? MSI_ADDR_DESTMODE_LOGIC :
+                      MSI_ADDR_DESTMODE_PHYS;
+    msg.address_lo |= (INT_DELIVERY_MODE != dest_LowestPrio) ?
+                      MSI_ADDR_REDIRECTION_CPU :
+                      MSI_ADDR_REDIRECTION_LOWPRI;
+    msg.address_lo |= MSI_ADDR_DEST_ID(dest & 0xff);
 
     pci_conf_write32(bus, dev, func,
-        iommu->msi_cap + PCI_MSI_DATA_64, msi_data);
-}
-
-static void amd_iommu_msi_addr_init(struct amd_iommu *iommu, int phy_cpu)
-{
-
-    int bus = (iommu->bdf >> 8) & 0xff;
-    int dev = PCI_SLOT(iommu->bdf & 0xff);
-    int func = PCI_FUNC(iommu->bdf & 0xff);
-
-    u32 address_hi = 0;
-    u32 address_lo = MSI_ADDR_HEADER |
-            MSI_ADDR_DESTMODE_PHYS |
-            MSI_ADDR_REDIRECTION_CPU |
-            MSI_ADDR_DEST_ID(phy_cpu);
-
+        iommu->msi_cap + PCI_MSI_DATA_64, msg.data);
     pci_conf_write32(bus, dev, func,
-        iommu->msi_cap + PCI_MSI_ADDRESS_LO, address_lo);
+        iommu->msi_cap + PCI_MSI_ADDRESS_LO, msg.address_lo);
     pci_conf_write32(bus, dev, func,
-        iommu->msi_cap + PCI_MSI_ADDRESS_HI, address_hi);
+        iommu->msi_cap + PCI_MSI_ADDRESS_HI, msg.address_hi);
 }
 
 static void amd_iommu_msi_enable(struct amd_iommu *iommu, int flag)
@@ -373,6 +380,9 @@ static void iommu_msi_mask(unsigned int 
 {
     unsigned long flags;
     struct amd_iommu *iommu = irq_to_iommu[irq];
+    struct irq_desc *desc = irq_to_desc(irq);
+
+    irq_complete_move(&desc);
 
     /* FIXME: do not support mask bits at the moment */
     if ( iommu->maskbit )
@@ -395,11 +405,6 @@ static void iommu_msi_end(unsigned int i
     ack_APIC_irq();
 }
 
-static void iommu_msi_set_affinity(unsigned int irq, cpumask_t dest)
-{
-    struct amd_iommu *iommu = irq_to_iommu[irq];
-    amd_iommu_msi_addr_init(iommu, cpu_physical_id(first_cpu(dest)));
-}
 
 static struct hw_interrupt_type iommu_msi_type = {
     .typename = "AMD_IOV_MSI",
@@ -485,7 +490,7 @@ static int set_iommu_interrupt_handler(s
         gdprintk(XENLOG_ERR VTDPREFIX, "IOMMU: no irqs\n");
         return 0;
     }
 
     irq_desc[irq].handler = &iommu_msi_type;
     irq_to_iommu[irq] = iommu;
     ret = request_irq(irq, amd_iommu_page_fault, 0,
@@ -524,8 +529,7 @@ void enable_iommu(struct amd_iommu *iomm
     register_iommu_event_log_in_mmio_space(iommu);
     register_iommu_exclusion_range(iommu);
 
-    amd_iommu_msi_data_init (iommu);
-    amd_iommu_msi_addr_init(iommu, cpu_physical_id(first_cpu(cpu_online_map)));
+    iommu_msi_set_affinity(iommu->irq, cpu_online_map);
     amd_iommu_msi_enable(iommu, IOMMU_CONTROL_ENABLED);
 
     set_iommu_command_buffer_control(iommu, IOMMU_CONTROL_ENABLED);
diff -r 8584327c7e70 -r 4009a583e41c xen/drivers/passthrough/vtd/iommu.c
--- a/xen/drivers/passthrough/vtd/iommu.c	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/drivers/passthrough/vtd/iommu.c	Thu Aug 13 10:49:37 2009 +0800
@@ -794,6 +794,9 @@ static void dma_msi_mask(unsigned int ir
 {
     unsigned long flags;
     struct iommu *iommu = irq_to_iommu[irq];
+    struct irq_desc *desc = irq_to_desc(irq);
+
+    irq_complete_move(&desc);
 
     /* mask it */
     spin_lock_irqsave(&iommu->register_lock, flags);
@@ -813,42 +816,46 @@ static void dma_msi_end(unsigned int irq
     ack_APIC_irq();
 }
 
-static void dma_msi_data_init(struct iommu *iommu, int irq)
-{
-    u32 msi_data = 0;
+static void dma_msi_set_affinity(unsigned int irq, cpumask_t mask)
+{
+    struct msi_msg msg;
+    unsigned int dest;
     unsigned long flags;
-    int vector = irq_to_vector(irq);
-
-    /* Fixed, edge, assert mode. Follow MSI setting */
-    msi_data |= vector & 0xff;
-    msi_data |= 1 << 14;
+
+    struct iommu *iommu = irq_to_iommu[irq];
+    struct irq_desc *desc = irq_to_desc(irq);
+    struct irq_cfg *cfg = desc->chip_data;
 
     spin_lock_irqsave(&iommu->register_lock, flags);
-    dmar_writel(iommu->reg, DMAR_FEDATA_REG, msi_data);
+    dest = set_desc_affinity(desc, mask);
+    if ( dest == BAD_APICID ) {
+        gdprintk(XENLOG_ERR VTDPREFIX, "Error setting IOMMU interrupt affinity!\n");
+        spin_unlock_irqrestore(&iommu->register_lock, flags);
+        return;
+    }
+
+    memset(&msg, 0, sizeof(msg));
+    msg.data = MSI_DATA_VECTOR(cfg->vector) & 0xff;
+    msg.data |= 1 << 14;
+    msg.data |= (INT_DELIVERY_MODE != dest_LowestPrio) ?
+        MSI_DATA_DELIVERY_FIXED :
+        MSI_DATA_DELIVERY_LOWPRI;
+
+    /* Follow MSI setting */
+    if ( x2apic_enabled )
+        msg.address_hi = dest & 0xFFFFFF00;
+    msg.address_lo = (MSI_ADDRESS_HEADER << (MSI_ADDRESS_HEADER_SHIFT + 8));
+    msg.address_lo |= INT_DEST_MODE ? MSI_ADDR_DESTMODE_LOGIC :
+                      MSI_ADDR_DESTMODE_PHYS;
+    msg.address_lo |= (INT_DELIVERY_MODE != dest_LowestPrio) ?
+                      MSI_ADDR_REDIRECTION_CPU :
+                      MSI_ADDR_REDIRECTION_LOWPRI;
+    msg.address_lo |= MSI_ADDR_DEST_ID(dest & 0xff);
+
+    dmar_writel(iommu->reg, DMAR_FEDATA_REG, msg.data);
+    dmar_writel(iommu->reg, DMAR_FEADDR_REG, msg.address_lo);
+    dmar_writel(iommu->reg, DMAR_FEUADDR_REG, msg.address_hi);
     spin_unlock_irqrestore(&iommu->register_lock, flags);
-}
-
-static void dma_msi_addr_init(struct iommu *iommu, int phy_cpu)
-{
-    u64 msi_address;
-    unsigned long flags;
-
-    /* Physical, dedicated cpu. Follow MSI setting */
-    msi_address = (MSI_ADDRESS_HEADER << (MSI_ADDRESS_HEADER_SHIFT + 8));
-    msi_address |= MSI_PHYSICAL_MODE << 2;
-    msi_address |= MSI_REDIRECTION_HINT_MODE << 3;
-    msi_address |= phy_cpu << MSI_TARGET_CPU_SHIFT;
-
-    spin_lock_irqsave(&iommu->register_lock, flags);
-    dmar_writel(iommu->reg, DMAR_FEADDR_REG, (u32)msi_address);
-    dmar_writel(iommu->reg, DMAR_FEUADDR_REG, (u32)(msi_address >> 32));
-    spin_unlock_irqrestore(&iommu->register_lock, flags);
-}
-
-static void dma_msi_set_affinity(unsigned int irq, cpumask_t dest)
-{
-    struct iommu *iommu = irq_to_iommu[irq];
-    dma_msi_addr_init(iommu, cpu_physical_id(first_cpu(dest)));
 }
 
 static struct hw_interrupt_type dma_msi_type = {
@@ -1584,6 +1590,7 @@ static int init_vtd_hw(void)
     int irq = -1;
     int ret;
     unsigned long flags;
+    struct irq_cfg *cfg;
 
     for_each_drhd_unit ( drhd )
     {
@@ -1598,8 +1605,10 @@ static int init_vtd_hw(void)
             }
             iommu->irq = irq;
         }
-        dma_msi_data_init(iommu, iommu->irq);
-        dma_msi_addr_init(iommu, cpu_physical_id(first_cpu(cpu_online_map)));
+
+        cfg = irq_cfg(irq);
+        dma_msi_set_affinity(irq, cfg->domain);
+
         clear_fault_bits(iommu);
 
         spin_lock_irqsave(&iommu->register_lock, flags);
diff -r 8584327c7e70 -r 4009a583e41c xen/include/asm-x86/apic.h
--- a/xen/include/asm-x86/apic.h	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/include/asm-x86/apic.h	Thu Aug 13 10:49:37 2009 +0800
@@ -14,6 +14,12 @@
 #define APIC_QUIET   0
 #define APIC_VERBOSE 1
 #define APIC_DEBUG   2
+
+#define	SET_APIC_LOGICAL_ID(x)	(((x)<<24))
+
+#define IO_APIC_REDIR_VECTOR_MASK	0x000FF
+#define IO_APIC_REDIR_DEST_LOGICAL	0x00800
+#define IO_APIC_REDIR_DEST_PHYSICAL	0x00000
 
 extern int apic_verbosity;
 extern int x2apic_enabled;
diff -r 8584327c7e70 -r 4009a583e41c xen/include/asm-x86/apicdef.h
--- a/xen/include/asm-x86/apicdef.h	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/include/asm-x86/apicdef.h	Thu Aug 13 10:49:37 2009 +0800
@@ -107,7 +107,7 @@
 #define		APIC_TDCR	0x3E0
 
 /* Only available in x2APIC mode */
-#define		APIC_SELF_IPI	0x400
+#define		APIC_SELF_IPI	0x3F0
 
 #define			APIC_TDR_DIV_TMBASE	(1<<2)
 #define			APIC_TDR_DIV_1		0xB
diff -r 8584327c7e70 -r 4009a583e41c xen/include/asm-x86/genapic.h
--- a/xen/include/asm-x86/genapic.h	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/include/asm-x86/genapic.h	Thu Aug 13 10:49:37 2009 +0800
@@ -34,8 +34,10 @@ struct genapic {
 	void (*init_apic_ldr)(void);
 	void (*clustered_apic_check)(void);
 	cpumask_t (*target_cpus)(void);
+	cpumask_t (*vector_allocation_domain)(int cpu);
 	unsigned int (*cpu_mask_to_apicid)(cpumask_t cpumask);
 	void (*send_IPI_mask)(const cpumask_t *mask, int vector);
+	void (*send_IPI_self)(int vector);
 };
 
 #define APICFUNC(x) .x = x
@@ -53,41 +55,53 @@ cpumask_t target_cpus_flat(void);
 cpumask_t target_cpus_flat(void);
 unsigned int cpu_mask_to_apicid_flat(cpumask_t cpumask);
 void send_IPI_mask_flat(const cpumask_t *mask, int vector);
+void send_IPI_self_flat(int vector);
+cpumask_t vector_allocation_domain_flat(int cpu);
 #define GENAPIC_FLAT \
 	.int_delivery_mode = dest_LowestPrio, \
 	.int_dest_mode = 1 /* logical delivery */, \
 	.init_apic_ldr = init_apic_ldr_flat, \
 	.clustered_apic_check = clustered_apic_check_flat, \
 	.target_cpus = target_cpus_flat, \
+	.vector_allocation_domain = vector_allocation_domain_flat, \
 	.cpu_mask_to_apicid = cpu_mask_to_apicid_flat, \
-	.send_IPI_mask = send_IPI_mask_flat
+	.send_IPI_mask = send_IPI_mask_flat, \
+	.send_IPI_self = send_IPI_self_flat
 
 void init_apic_ldr_x2apic(void);
 void clustered_apic_check_x2apic(void);
 cpumask_t target_cpus_x2apic(void);
 unsigned int cpu_mask_to_apicid_x2apic(cpumask_t cpumask);
 void send_IPI_mask_x2apic(const cpumask_t *mask, int vector);
+void send_IPI_self_x2apic(int vector);
+cpumask_t vector_allocation_domain_x2apic(int cpu);
 #define GENAPIC_X2APIC \
 	.int_delivery_mode = dest_Fixed, \
 	.int_dest_mode = 0 /* physical delivery */, \
 	.init_apic_ldr = init_apic_ldr_x2apic, \
 	.clustered_apic_check = clustered_apic_check_x2apic, \
 	.target_cpus = target_cpus_x2apic, \
+	.vector_allocation_domain = vector_allocation_domain_x2apic, \
 	.cpu_mask_to_apicid = cpu_mask_to_apicid_x2apic, \
-	.send_IPI_mask = send_IPI_mask_x2apic
+	.send_IPI_mask = send_IPI_mask_x2apic, \
+	.send_IPI_self = send_IPI_self_x2apic
 
 void init_apic_ldr_phys(void);
 void clustered_apic_check_phys(void);
 cpumask_t target_cpus_phys(void);
 unsigned int cpu_mask_to_apicid_phys(cpumask_t cpumask);
 void send_IPI_mask_phys(const cpumask_t *mask, int vector);
+void send_IPI_self_phys(int vector);
+cpumask_t vector_allocation_domain_phys(int cpu);
 #define GENAPIC_PHYS \
 	.int_delivery_mode = dest_Fixed, \
 	.int_dest_mode = 0 /* physical delivery */, \
 	.init_apic_ldr = init_apic_ldr_phys, \
 	.clustered_apic_check = clustered_apic_check_phys, \
 	.target_cpus = target_cpus_phys, \
+	.vector_allocation_domain = vector_allocation_domain_phys, \
 	.cpu_mask_to_apicid = cpu_mask_to_apicid_phys, \
-	.send_IPI_mask = send_IPI_mask_phys
+	.send_IPI_mask = send_IPI_mask_phys, \
+	.send_IPI_self = send_IPI_self_phys
 
 #endif
diff -r 8584327c7e70 -r 4009a583e41c xen/include/asm-x86/irq.h
--- a/xen/include/asm-x86/irq.h	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/include/asm-x86/irq.h	Thu Aug 13 10:49:37 2009 +0800
@@ -5,7 +5,10 @@
 
 #include <xen/config.h>
 #include <asm/atomic.h>
+#include <xen/cpumask.h>
+#include <xen/smp.h>
 #include <irq_vectors.h>
+#include <asm/percpu.h>
 
 #define IO_APIC_IRQ(irq)    (((irq) >= 16 && (irq) < nr_irqs_gsi) \
         || (((irq) < 16) && (1<<(irq)) & io_apic_irqs))
@@ -22,10 +25,44 @@
 #define MAX_GSI_IRQS PAGE_SIZE * 8
 #define MAX_NR_IRQS (2 * MAX_GSI_IRQS)
 
-extern int vector_irq[NR_VECTORS];
+#define irq_cfg(irq)        (&irq_cfg[(irq)])
+
+struct irq_cfg {
+        int  vector;
+        cpumask_t domain;
+        cpumask_t old_domain;
+        unsigned move_cleanup_count;
+        u8 move_in_progress : 1;
+};
+
+extern struct irq_cfg *irq_cfg;
+
+typedef int vector_irq_t[NR_VECTORS];
+DECLARE_PER_CPU(vector_irq_t, vector_irq);
+
 extern u8 *irq_vector;
 
-extern int irq_to_vector(int irq);
+/*
+ * Per-cpu current frame pointer - the location of the last exception frame on
+ * the stack
+ */
+DECLARE_PER_CPU(struct cpu_user_regs *, __irq_regs);
+
+static inline struct cpu_user_regs *get_irq_regs(void)
+{
+	return __get_cpu_var(__irq_regs);
+}
+
+static inline struct cpu_user_regs *set_irq_regs(struct cpu_user_regs *new_regs)
+{
+	struct cpu_user_regs *old_regs, **pp_regs = &__get_cpu_var(__irq_regs);
+
+	old_regs = *pp_regs;
+	*pp_regs = new_regs;
+	return old_regs;
+}
+
+
 #define platform_legacy_irq(irq)	((irq) < 16)
 
 fastcall void event_check_interrupt(void);
@@ -37,6 +74,7 @@ fastcall void spurious_interrupt(void);
 fastcall void spurious_interrupt(void);
 fastcall void thermal_interrupt(void);
 fastcall void cmci_interrupt(void);
+fastcall void irq_move_cleanup_interrupt(void);
 
 void disable_8259A_irq(unsigned int irq);
 void enable_8259A_irq(unsigned int irq);
@@ -66,10 +104,24 @@ int  init_irq_data(void);
 int  init_irq_data(void);
 
 void clear_irq_vector(int irq);
-int __assign_irq_vector(int irq);
 
+int irq_to_vector(int irq);
 int create_irq(void);
 void destroy_irq(unsigned int irq);
+
+struct irq_desc;
+extern void irq_complete_move(struct irq_desc **descp);
+
+void lock_vector_lock(void);
+void unlock_vector_lock(void);
+
+void __setup_vector_irq(int cpu);
+
+void move_native_irq(int irq);
+
+int __assign_irq_vector(int irq, struct irq_cfg *cfg, cpumask_t mask);
+
+int bind_irq_vector(int irq, int vector, cpumask_t domain);
 
 #define domain_pirq_to_irq(d, pirq) ((d)->arch.pirq_irq[pirq])
 #define domain_irq_to_pirq(d, irq) ((d)->arch.irq_pirq[irq])
diff -r 8584327c7e70 -r 4009a583e41c xen/include/asm-x86/mach-default/irq_vectors.h
--- a/xen/include/asm-x86/mach-default/irq_vectors.h	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/include/asm-x86/mach-default/irq_vectors.h	Thu Aug 13 10:49:37 2009 +0800
@@ -28,6 +28,9 @@
 /* Dynamically-allocated vectors available to any driver. */
 #define FIRST_DYNAMIC_VECTOR	0x20
 #define LAST_DYNAMIC_VECTOR	0xdf
+#define NR_DYNAMIC_VECTORS	(LAST_DYNAMIC_VECTOR - FIRST_DYNAMIC_VECTOR + 1)
+
+#define IRQ_MOVE_CLEANUP_VECTOR FIRST_DYNAMIC_VECTOR
 
 #define NR_VECTORS 256
 
diff -r 8584327c7e70 -r 4009a583e41c xen/include/asm-x86/mach-generic/mach_apic.h
--- a/xen/include/asm-x86/mach-generic/mach_apic.h	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/include/asm-x86/mach-generic/mach_apic.h	Thu Aug 13 10:49:37 2009 +0800
@@ -14,6 +14,7 @@
 #define init_apic_ldr (genapic->init_apic_ldr)
 #define clustered_apic_check (genapic->clustered_apic_check) 
 #define cpu_mask_to_apicid (genapic->cpu_mask_to_apicid)
+#define vector_allocation_domain(cpu) (genapic->vector_allocation_domain(cpu))
 
 static inline void enable_apic_mode(void)
 {
diff -r 8584327c7e70 -r 4009a583e41c xen/include/asm-x86/smp.h
--- a/xen/include/asm-x86/smp.h	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/include/asm-x86/smp.h	Thu Aug 13 10:49:37 2009 +0800
@@ -36,6 +36,8 @@ DECLARE_PER_CPU(cpumask_t, cpu_core_map)
 DECLARE_PER_CPU(cpumask_t, cpu_core_map);
 
 void smp_send_nmi_allbutself(void);
+
+void send_IPI_mask(const cpumask_t *mask, int vector);
 
 extern void (*mtrr_hook) (void);
 
diff -r 8584327c7e70 -r 4009a583e41c xen/include/xen/cpumask.h
--- a/xen/include/xen/cpumask.h	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/include/xen/cpumask.h	Thu Aug 13 10:49:37 2009 +0800
@@ -79,7 +79,7 @@
 #include <xen/bitmap.h>
 #include <xen/kernel.h>
 
-typedef struct { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
+typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
 
 #define cpu_set(cpu, dst) __cpu_set((cpu), &(dst))
 static inline void __cpu_set(int cpu, volatile cpumask_t *dstp)
@@ -112,6 +112,16 @@ static inline int __cpu_test_and_set(int
 static inline int __cpu_test_and_set(int cpu, cpumask_t *addr)
 {
 	return test_and_set_bit(cpu, addr->bits);
+}
+
+/**
+ * cpumask_test_cpu - test for a cpu in a cpumask
+ */
+#define cpumask_test_cpu(cpu, cpumask) __cpu_test((cpu), &(cpumask))
+
+static inline int __cpu_test(int cpu, cpumask_t *addr)
+{
+	return test_bit(cpu, addr->bits);
 }
 
 #define cpu_test_and_clear(cpu, cpumask) __cpu_test_and_clear((cpu), &(cpumask))
@@ -193,6 +203,12 @@ static inline int __cpus_weight(const cp
 static inline int __cpus_weight(const cpumask_t *srcp, int nbits)
 {
 	return bitmap_weight(srcp->bits, nbits);
+}
+
+#define cpus_copy(dest, src) __cpus_copy(&(dest), &(src))
+static inline void __cpus_copy(cpumask_t *dstp, cpumask_t *srcp)
+{
+	bitmap_copy(dstp->bits, srcp->bits, NR_CPUS);
 }
 
 #define cpus_shift_right(dst, src, n) \
diff -r 8584327c7e70 -r 4009a583e41c xen/include/xen/irq.h
--- a/xen/include/xen/irq.h	Tue Aug 11 16:07:27 2009 +0800
+++ b/xen/include/xen/irq.h	Thu Aug 13 10:49:37 2009 +0800
@@ -70,12 +70,15 @@ typedef struct irq_desc{
     struct msi_desc   *msi_desc;
     struct irqaction *action;	/* IRQ action list */
     unsigned int depth;		/* nested irq disables */
+#if defined(__i386__) || defined(__x86_64__)
+    struct irq_cfg *chip_data;
+#endif
     int irq;
     spinlock_t lock;
     cpumask_t affinity;
 } __cacheline_aligned irq_desc_t;
 
-#ifndef CONFIG_X86
+#if defined(__ia64__)
 extern irq_desc_t irq_desc[NR_VECTORS];
 
 #define setup_irq(irq, action) \
@@ -116,11 +119,13 @@ static inline void set_native_irq_info(u
 
 static inline void set_irq_info(int irq, cpumask_t mask)
 {
-#ifdef CONFIG_X86
+#if defined(__i386__) || defined(__x86_64__)
     set_native_irq_info(irq, mask);
 #else
     set_native_irq_info(irq_to_vector(irq), mask);
 #endif
 }
 
+unsigned int set_desc_affinity(struct irq_desc *desc, cpumask_t mask);
+
 #endif /* __XEN_IRQ_H__ */

[-- Attachment #5: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

* RE: [PATCH RFC] Per-CPU vector for Xen.
  2009-08-16  8:58 [PATCH RFC] Per-CPU vector for Xen Zhang, Xiantao
@ 2009-08-18  5:45 ` Zhang, Xiantao
  2009-08-18  6:42   ` Keir Fraser
  0 siblings, 1 reply; 3+ messages in thread
From: Zhang, Xiantao @ 2009-08-18  5:45 UTC (permalink / raw)
  To: Zhang, Xiantao, Keir Fraser
  Cc: xen-devel, Kay, Allen M, Jiang, Yunhong, Dong, Eddie, Yang,
	Xiaowei, Li, Xin

Hi, Keir
   Another issue, or limitation, turned up when we tested the per-CPU vector patch on our side. Currently, Xen uses the fixmap to access MSI-X resources (e.g. MSI-X tables): 32 and 512 pages are reserved in the fixmap section for 32-bit PAE and x86_64 respectively, but each MSI-X capable device (whether a physical device or a virtual function) needs at least one page to map its table. These pages can therefore easily run out once a few such devices are installed, especially on 32-bit platforms; for instance, an SR-IOV NIC exposing 64 VFs would by itself need at least 64 pages, more than the whole 32-page pool. For 64-bit platforms we can simply reserve more pages, but on 32-bit platforms it is not always safe to grow the reservation because of the limited virtual address space. One optional solution is to dynamically map/unmap the MSI-X tables on each access (a rough sketch follows below), but the concern is that this may cost too much given how frequently the tables are accessed. Any suggestions or good ideas to address this issue? Thanks!
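
A minimal sketch of the map-on-access idea, for concreteness. FIX_MSIX_SCRATCH,
msix_table_paddr() and MSIX_ENTRY_SIZE below are hypothetical names introduced
only for illustration, not existing Xen symbols, and this is not the actual
patch:

/* Sketch only: transiently map the MSI-X table page around one write.
 * FIX_MSIX_SCRATCH is an assumed dedicated fixmap slot; msix_table_paddr()
 * and MSIX_ENTRY_SIZE are hypothetical helpers naming the table location. */
static DEFINE_SPINLOCK(msix_scratch_lock);

static void msix_write_entry(struct msi_desc *entry, int idx,
                             unsigned int offset, u32 val)
{
    unsigned long flags;
    paddr_t pa = msix_table_paddr(entry) + idx * MSIX_ENTRY_SIZE + offset;
    void *va;

    spin_lock_irqsave(&msix_scratch_lock, flags);
    /* Map only the page containing the target entry... */
    set_fixmap_nocache(FIX_MSIX_SCRATCH, pa & PAGE_MASK);
    va = (void *)fix_to_virt(FIX_MSIX_SCRATCH) + (pa & ~PAGE_MASK);
    writel(val, va);
    /* ...and drop the mapping again straight away. */
    set_fixmap_nocache(FIX_MSIX_SCRATCH, 0);
    spin_unlock_irqrestore(&msix_scratch_lock, flags);
}

If the per-access cost proves too high, batching several table writes under a
single transient mapping would amortize most of it.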
Xiantao

Zhang, Xiantao wrote:
> Hi, Keir
>    To support more interrupt vectors in Xen for more devices, 
> especially for SR-IOV devices in a large system, we implemented
> per-cpu vector for Xen like Linux does. For SR-IOV devices, since
> each VF needs several separate vectors for interrupt delivery and
> global ~200 vectors in Xen is insufficient and easily run out after
> installing two or three such devices. Becaue SR-IOV devices are
> becoming popular now,  and from this point of view, we have to extend
> vector resource space to make these devices work. As linux does, we
> implemented per-cpu vector for Xen to extend and scale vector
> resource to nr_cpus x ~200 in a system.   BTW, the core logic of the
> patches is ported from upstream linux and then adapted for Xen.      
> 
> Patch 0001:  Change nr_irqs to nr_irqs_gsi and make nr_irqs_gsi only
> used for GSI interrupts. 
> Patch 0002:  Modify Xen from vector-based interrupt infrastructure to
> IRQ-based one, and the big change is that one irq number is also
> allocated for MSI interrupt source, and the idea is same as Linux's.  
> Patch 0003:  Implement per-cpu vector for xen. Most core logic(such
> as vector allocation algorithm, IRQ migration logic...) is ported
> from upstream Linux.  
> About the patch quality, we have done enough testings against
> upstream, and no any regression found after applying this patchset. 
> Please help to review.  Comments are very appreicated!  Thanks!
> 
> Signed-off-by : Xiantao Zhang <xiantao.zhang@Intel.com>
> 
> Xiantao

* Re: [PATCH RFC] Per-CPU vector for Xen.
  2009-08-18  5:45 ` Zhang, Xiantao
@ 2009-08-18  6:42   ` Keir Fraser
  0 siblings, 0 replies; 3+ messages in thread
From: Keir Fraser @ 2009-08-18  6:42 UTC (permalink / raw)
  To: Zhang, Xiantao
  Cc: xen-devel, Kay, Allen M, Jiang, Yunhong, Dong, Eddie, Yang,
	Xiaowei, Li, Xin

On 18/08/2009 06:45, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

> Hi, Keir
>    Another issue or limitation is also found when we tested Per-CPU vector
> patch at our side. Currenlty, Xen uses fixmap to access MSI-X resouce, (e.g.
> msi-x tables), 32 pages  512 pages are reserved for 32-pae and x86_64
> separately in fixmap section, but  each MSI-X capable device(regardless of
> real device or virtual function) at least needs one page to map its resource,
> so these pages may easily run out with these devices, espeically on 32-bit
> platforms. For 64-bit platforms, we can reserve more pages to fix the issue,
> but for 32-bit platforms, it is not always safe to increase the number of
> pages due to limited virtual address space.  We have one optional solution to
> fix the issue through dynamically map/unmap MSI-X tables when access it, but
> the concern is that it may cost much due to frequent access for the resource.
> What's suggestion or good idea to address the issue ?  Thanks!
> Xiantao

A lower-performance option for 32-bit Xen would be fine by me. Very few
people should have reason not to run 64-bit Xen now. Those who do may well
have old systems with few or no MSI-X devices.

 -- Keir
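
A rough sketch of such a split, keeping the table permanently mapped through
the fixmap on 64-bit and paying the map/unmap cost only on 32-bit.
msix_write_entry() is the hypothetical transient-mapping helper sketched
earlier in the thread, MSIX_ENTRY_SIZE is likewise hypothetical, and using
entry->mask_base as the statically mapped table base is an assumption:

/* Sketch only: choose the MSI-X table access path at build time. */
static void msix_entry_write(struct msi_desc *entry, int idx,
                             unsigned int offset, u32 val)
{
#ifdef __x86_64__
    /* Plenty of fixmap space: the table stays permanently mapped. */
    writel(val, entry->mask_base + idx * MSIX_ENTRY_SIZE + offset);
#else
    /* 32-bit: transiently map the page on every access instead. */
    msix_write_entry(entry, idx, offset, val);
#endif
}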
