All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array
@ 2010-08-13 13:37 Jan Beulich
  2010-08-14  6:32 ` Keir Fraser
  2010-08-26  6:24 ` Jiang, Yunhong
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Beulich @ 2010-08-13 13:37 UTC (permalink / raw)
  To: xen-devel; +Cc: Konrad Rzeszutek Wilk

[-- Attachment #1: Type: text/plain, Size: 18947 bytes --]

Below/attached is an untested and possibly not yet complete patch
attempting to address the problem originally described on this thread
(can't really test this myself as I don't have, with one exception,
any MSI-X capable devices around, and the exceptional one doesn't have
a driver making use of it). I had sent it once before, it only got
refreshed since then to build with xen-unstable as of c/s 21952, and
I'm re-sending it mainly because there was no feedback so far, despite
the original problem representing a security issue.

It tracks MMIO MFNs required to only have read-only guest access
(determined when the first MSI-X interrupt gets enabled on a device)
in a global range set, and enforces the write protection as
translations get established.

The changes are made under the assumption that p2m_mmio_direct will
only ever be used for order 0 pages.

An open question is whether dealing with pv guests (including the
IOMMU-less case) is necessary, as handling mappings a domain may
already have in place at the time the first interrupt gets set up
would require scanning all of the guest's page table pages.

An alternative would be to determine and insert the address ranges
earlier into mmio_ro_ranges, but that would require a hook in the
PCI config space writes, which is particularly problematic in case
MMCONFIG accesses are being used.

A second alternative would be to require Dom0 to report all devices
(or at least all MSI-X capable ones) regardless of whether they would
be used by that domain, and do so after resources got determined/
assigned for them (i.e. a second notification later than the one
currently happening from the PCI bus scan would be needed).

(Attached is also a trivial prerequisite patch.)

Jan

--- 2010-08-12.orig/xen/arch/x86/mm.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/mm.c	2010-08-12 17:16:32.000000000 +0200
@@ -824,7 +824,13 @@ get_page_from_l1e(
             return 0;
         }
 
-        return 1;
+        if ( !(l1f & _PAGE_RW) || IS_PRIV(pg_owner) ||
+             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+            return 1;
+        dprintk(XENLOG_G_WARNING,
+                "d%d: Forcing read-only access to MFN %lx\n",
+                l1e_owner->domain_id, mfn);
+        return -1;
     }
 
     if ( unlikely(real_pg_owner != pg_owner) )
@@ -1180,9 +1186,15 @@ static int alloc_l1_table(struct page_in
 
     for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
     {
-        if ( is_guest_l1_slot(i) &&
-             unlikely(!get_page_from_l1e(pl1e[i], d, d)) )
-            goto fail;
+        if ( is_guest_l1_slot(i) )
+            switch ( get_page_from_l1e(pl1e[i], d, d) )
+            {
+            case 0:
+                goto fail;
+            case -1:
+                l1e_remove_flags(pl1e[i], _PAGE_RW);
+                break;
+            }
 
         adjust_guest_l1e(pl1e[i], d);
     }
@@ -1766,8 +1778,14 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         }
 
-        if ( unlikely(!get_page_from_l1e(nl1e, pt_dom, pg_dom)) )
+        switch ( get_page_from_l1e(nl1e, pt_dom, pg_dom) )
+        {
+        case 0:
             return 0;
+        case -1:
+            l1e_remove_flags(nl1e, _PAGE_RW);
+            break;
+        }
         
         adjust_guest_l1e(nl1e, pt_dom);
         if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -4992,8 +5010,9 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    if ( unlikely(!get_page_from_l1e(nl1e, d, d)) )
+    switch ( get_page_from_l1e(nl1e, d, d) )
     {
+    case 0:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
              !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
         {
@@ -5012,6 +5031,10 @@ static int ptwr_emulated_update(
             MEM_LOG("ptwr_emulate: could not get_page_from_l1e()");
             return X86EMUL_UNHANDLEABLE;
         }
+        break;
+    case -1:
+        l1e_remove_flags(nl1e, _PAGE_RW);
+        break;
     }
 
     adjust_guest_l1e(nl1e, d);
--- 2010-08-12.orig/xen/arch/x86/mm/hap/p2m-ept.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/mm/hap/p2m-ept.c	2010-08-12 17:16:32.000000000 +0200
@@ -72,9 +72,13 @@ static void ept_p2m_type_to_flags(ept_en
             entry->r = entry->w = entry->x = 0;
             return;
         case p2m_ram_rw:
-        case p2m_mmio_direct:
             entry->r = entry->w = entry->x = 1;
             return;
+        case p2m_mmio_direct:
+            entry->r = entry->x = 1;
+            entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
+                                                    entry->mfn);
+            return;
         case p2m_ram_logdirty:
         case p2m_ram_ro:
         case p2m_ram_shared:
@@ -716,6 +720,9 @@ static void ept_change_entry_type_global
     if ( ept_get_asr(d) == 0 )
         return;
 
+    BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
+    BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
+
     ept_change_entry_type_page(_mfn(ept_get_asr(d)), ept_get_wl(d), ot, nt);
 
     ept_sync_domain(d);
--- 2010-08-12.orig/xen/arch/x86/mm/p2m.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/mm/p2m.c	2010-08-12 17:16:32.000000000 +0200
@@ -72,7 +72,7 @@ boolean_param("hap_1gb", opt_hap_1gb);
 #define SUPERPAGE_PAGES (1UL << 9)
 #define superpage_aligned(_x)  (((_x)&(SUPERPAGE_PAGES-1))==0)
 
-static unsigned long p2m_type_to_flags(p2m_type_t t) 
+static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
 {
     unsigned long flags;
 #ifdef __x86_64__
@@ -101,7 +101,9 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_dm:
         return flags;
     case p2m_mmio_direct:
-        return flags | P2M_BASE_FLAGS | _PAGE_RW | _PAGE_PCD;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
+            flags |= _PAGE_RW;
+        return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     case p2m_populate_on_demand:
         return flags;
     }
@@ -1299,8 +1301,10 @@ p2m_set_entry(struct p2m_domain *p2m, un
             domain_crash(p2m->domain);
             goto out;
         }
+        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         l3e_content = mfn_valid(mfn) 
-            ? l3e_from_pfn(mfn_x(mfn), p2m_type_to_flags(p2mt) | _PAGE_PSE)
+            ? l3e_from_pfn(mfn_x(mfn),
+                           p2m_type_to_flags(p2mt, mfn) | _PAGE_PSE)
             : l3e_empty();
         entry_content.l1 = l3e_content.l3;
         paging_write_p2m_entry(p2m->domain, gfn, p2m_entry,
@@ -1334,7 +1338,8 @@ p2m_set_entry(struct p2m_domain *p2m, un
         ASSERT(p2m_entry);
         
         if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) )
-            entry_content = l1e_from_pfn(mfn_x(mfn), p2m_type_to_flags(p2mt));
+            entry_content = l1e_from_pfn(mfn_x(mfn),
+                                         p2m_type_to_flags(p2mt, mfn));
         else
             entry_content = l1e_empty();
         
@@ -1358,9 +1363,11 @@ p2m_set_entry(struct p2m_domain *p2m, un
             goto out;
         }
         
+        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         if ( mfn_valid(mfn) || p2m_is_magic(p2mt) )
             l2e_content = l2e_from_pfn(mfn_x(mfn),
-                                       p2m_type_to_flags(p2mt) | _PAGE_PSE);
+                                       p2m_type_to_flags(p2mt, mfn) |
+                                       _PAGE_PSE);
         else
             l2e_content = l2e_empty();
         
@@ -2437,6 +2444,7 @@ void p2m_change_type_global(struct p2m_d
 #endif /* CONFIG_PAGING_LEVELS == 4 */
 
     BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
+    BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
 
     if ( !paging_mode_translate(p2m->domain) )
         return;
@@ -2478,7 +2486,7 @@ void p2m_change_type_global(struct p2m_d
                     continue;
                 mfn = l3e_get_pfn(l3e[i3]);
                 gfn = get_gpfn_from_mfn(mfn);
-                flags = p2m_type_to_flags(nt);
+                flags = p2m_type_to_flags(nt, _mfn(mfn));
                 l1e_content = l1e_from_pfn(mfn, flags | _PAGE_PSE);
                 paging_write_p2m_entry(p2m->domain, gfn,
                                        (l1_pgentry_t *)&l3e[i3],
@@ -2509,7 +2517,7 @@ void p2m_change_type_global(struct p2m_d
 #endif
 				)
                            * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES; 
-                    flags = p2m_type_to_flags(nt);
+                    flags = p2m_type_to_flags(nt, _mfn(mfn));
                     l1e_content = l1e_from_pfn(mfn, flags | _PAGE_PSE);
                     paging_write_p2m_entry(p2m->domain, gfn,
                                            (l1_pgentry_t *)&l2e[i2],
@@ -2533,7 +2541,7 @@ void p2m_change_type_global(struct p2m_d
 				     )
                            * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES; 
                     /* create a new 1le entry with the new type */
-                    flags = p2m_type_to_flags(nt);
+                    flags = p2m_type_to_flags(nt, _mfn(mfn));
                     l1e_content = l1e_from_pfn(mfn, flags);
                     paging_write_p2m_entry(p2m->domain, gfn, &l1e[i1],
                                            l1mfn, l1e_content, 1);
--- 2010-08-12.orig/xen/arch/x86/mm/shadow/multi.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/mm/shadow/multi.c	2010-08-12 17:16:32.000000000 +0200
@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v, 
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) )
+    if ( p2m_is_readonly(p2mt) ||
+         (p2mt == p2m_mmio_direct &&
+          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
         sflags &= ~_PAGE_RW;
     
     // protect guest page tables
@@ -1204,15 +1206,19 @@ static int shadow_set_l1e(struct vcpu *v
         /* About to install a new reference */        
         if ( shadow_mode_refcounts(d) ) {
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            if ( shadow_get_page_from_l1e(new_sl1e, d, new_type) == 0 ) 
+            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
+            case 0:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
-            }
-            else
-            {
+                break;
+            case -1:
+                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+                /* fall through */
+            default:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
+                break;
             }
         }
     } 
--- 2010-08-12.orig/xen/arch/x86/msi.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/msi.c	2010-08-12 18:09:43.000000000 +0200
@@ -16,12 +16,14 @@
 #include <xen/errno.h>
 #include <xen/pci.h>
 #include <xen/pci_regs.h>
+#include <xen/iocap.h>
 #include <xen/keyhandler.h>
 #include <asm/io.h>
 #include <asm/smp.h>
 #include <asm/desc.h>
 #include <asm/msi.h>
 #include <asm/fixmap.h>
+#include <asm/p2m.h>
 #include <mach_apic.h>
 #include <io_ports.h>
 #include <public/physdev.h>
@@ -520,6 +522,43 @@ static int msi_capability_init(struct pc
     return 0;
 }
 
+static u64 read_pci_mem_bar(u8 bus, u8 slot, u8 func, u8 bir)
+{
+    u8 limit;
+    u32 addr;
+
+    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) )
+    {
+    case PCI_HEADER_TYPE_NORMAL:
+        limit = 6;
+        break;
+    case PCI_HEADER_TYPE_BRIDGE:
+        limit = 2;
+        break;
+    case PCI_HEADER_TYPE_CARDBUS:
+        limit = 1;
+        break;
+    default:
+        return 0;
+    }
+
+    if ( bir >= limit )
+        return 0;
+    addr = pci_conf_read32(bus, slot, func, PCI_BASE_ADDRESS_0 + bir * 4);
+    if ( (addr & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
+        return 0;
+    if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
+    {
+        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
+        if ( ++bir >= limit )
+            return 0;
+        return addr |
+               ((u64)pci_conf_read32(bus, slot, func,
+                                     PCI_BASE_ADDRESS_0 + bir * 4) << 32);
+    }
+    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
+}
+
 /**
  * msix_capability_init - configure device's MSI-X capability
  * @dev: pointer to the pci_dev data structure of MSI-X device function
@@ -532,7 +571,8 @@ static int msi_capability_init(struct pc
  **/
 static int msix_capability_init(struct pci_dev *dev,
                                 struct msi_info *msi,
-                                struct msi_desc **desc)
+                                struct msi_desc **desc,
+                                unsigned int nr_entries)
 {
     struct msi_desc *entry;
     int pos;
@@ -587,6 +627,69 @@ static int msix_capability_init(struct p
 
     list_add_tail(&entry->list, &dev->msi_list);
 
+    if ( !dev->msix_nr_entries )
+    {
+        u64 pba_paddr;
+        u32 pba_offset;
+
+        ASSERT(!dev->msix_used_entries);
+        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
+
+        dev->msix_nr_entries = nr_entries;
+        dev->msix_table.first = PFN_DOWN(table_paddr);
+        dev->msix_table.last = PFN_DOWN(table_paddr +
+                                        nr_entries * PCI_MSIX_ENTRY_SIZE - 1);
+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_table.first,
+                                        dev->msix_table.last));
+
+        pba_offset = pci_conf_read32(bus, slot, func,
+                                     msix_pba_offset_reg(pos));
+        bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
+        pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
+        WARN_ON(!pba_paddr);
+        pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
+
+        dev->msix_pba.first = PFN_DOWN(pba_paddr);
+        dev->msix_pba.last = PFN_DOWN(pba_paddr +
+                                      BITS_TO_LONGS(nr_entries) - 1);
+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
+                                        dev->msix_pba.last));
+
+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
+                                dev->msix_table.last) )
+            WARN();
+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
+                                dev->msix_pba.last) )
+            WARN();
+printk("MSIX%02x:%02x.%x: table@(%lx,%lx), pba@(%lx,%lx)\n", bus, slot, func,
+       dev->msix_table.first, dev->msix_table.last,
+       dev->msix_pba.first, dev->msix_pba.last);//temp
+
+        if ( dev->domain )
+            p2m_change_entry_type_global(p2m_get_hostp2m(dev->domain),
+                                         p2m_mmio_direct, p2m_mmio_direct);
+        if ( !dev->domain || !paging_mode_translate(dev->domain) )
+        {
+            struct domain *d = dev->domain;
+
+            if ( !d )
+                for_each_domain(d)
+                    if ( !paging_mode_translate(d) &&
+                         (iomem_access_permitted(d, dev->msix_table.first,
+                                                 dev->msix_table.last) ||
+                          iomem_access_permitted(d, dev->msix_pba.first,
+                                                 dev->msix_pba.last)) )
+                        break;
+            if ( d )
+            {
+                /* XXX How to deal with existing mappings? */
+            }
+        }
+    }
+    WARN_ON(dev->msix_nr_entries != nr_entries);
+    WARN_ON(dev->msix_table.first != (table_paddr >> PAGE_SHIFT));
+    ++dev->msix_used_entries;
+
     /* Mask interrupt here */
     writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 
@@ -707,7 +810,7 @@ static int __pci_enable_msix(struct msi_
         return 0;
     }
 
-    status = msix_capability_init(pdev, msi, desc);
+    status = msix_capability_init(pdev, msi, desc, nr_entries);
     return status;
 }
 
@@ -732,6 +835,16 @@ static void __pci_disable_msix(struct ms
     writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 
     pci_conf_write16(bus, slot, func, msix_control_reg(pos), control);
+
+    if ( !--dev->msix_used_entries )
+    {
+        if ( rangeset_remove_range(mmio_ro_ranges, dev->msix_table.first,
+                                  dev->msix_table.last) )
+            WARN();
+        if ( rangeset_remove_range(mmio_ro_ranges, dev->msix_pba.first,
+                                   dev->msix_pba.last) )
+            WARN();
+    }
 }
 
 /*
--- 2010-08-12.orig/xen/drivers/passthrough/io.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/drivers/passthrough/io.c	2010-08-12 17:16:32.000000000 +0200
@@ -26,6 +26,8 @@
 #include <xen/hvm/irq.h>
 #include <xen/tasklet.h>
 
+struct rangeset *__read_mostly mmio_ro_ranges;
+
 static void hvm_dirq_assist(unsigned long _d);
 
 bool_t pt_irq_need_timer(uint32_t flags)
@@ -565,3 +567,11 @@ void hvm_dpci_eoi(struct domain *d, unsi
 unlock:
     spin_unlock(&d->event_lock);
 }
+
+static int __init setup_mmio_ro_ranges(void)
+{
+    mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges",
+                                  RANGESETF_prettyprint_hex);
+    return 0;
+}
+__initcall(setup_mmio_ro_ranges);
--- 2010-08-12.orig/xen/include/xen/iommu.h	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/include/xen/iommu.h	2010-08-12 17:16:32.000000000 +0200
@@ -31,6 +31,8 @@ extern bool_t force_iommu, iommu_verbose
 extern bool_t iommu_workaround_bios_bug, iommu_passthrough;
 extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
 
+extern struct rangeset *mmio_ro_ranges;
+
 #define domain_hvm_iommu(d)     (&d->arch.hvm_domain.hvm_iommu)
 
 #define MAX_IOMMUS 32
--- 2010-08-12.orig/xen/include/xen/pci.h	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/include/xen/pci.h	2010-08-12 17:16:32.000000000 +0200
@@ -45,6 +45,10 @@ struct pci_dev {
     struct list_head domain_list;
 
     struct list_head msi_list;
+    unsigned int msix_nr_entries, msix_used_entries;
+    struct {
+        unsigned long first, last;
+    } msix_table, msix_pba;
     int msix_table_refcnt[MAX_MSIX_TABLE_PAGES];
     int msix_table_idx[MAX_MSIX_TABLE_PAGES];
     spinlock_t msix_table_lock;


[-- Attachment #2: x86-msix-protect-table-and-pba.patch --]
[-- Type: text/plain, Size: 18886 bytes --]

Below/attached is an untested and possibly not yet complete patch
attempting to address the problem originally described on this thread
(can't really test this myself as I don't have, with one exception,
any MSI-X capable devices around, and the exceptional one doesn't have
a driver making use of it). I had sent it once before, it only got
refreshed since then to build with xen-unstable as of c/s 21952, and
I'm re-sending it mainly because there was no feedback so far, despite
the original problem representing a security issue.

It tracks MMIO MFNs required to only have read-only guest access
(determined when the first MSI-X interrupt gets enabled on a device)
in a global range set, and enforces the write protection as
translations get established.

The changes are made under the assumption that p2m_mmio_direct will
only ever be used for order 0 pages.

An open question is whether dealing with pv guests (including the
IOMMU-less case) is necessary, as handling mappings a domain may
already have in place at the time the first interrupt gets set up
would require scanning all of the guest's page table pages.

An alternative would be to determine and insert the address ranges
earlier into mmio_ro_ranges, but that would require a hook in the
PCI config space writes, which is particularly problematic in case
MMCONFIG accesses are being used.

A second alternative would be to require Dom0 to report all devices
(or at least all MSI-X capable ones) regardless of whether they would
be used by that domain, and do so after resources got determined/
assigned for them (i.e. a second notification later than the one
currently happening from the PCI bus scan would be needed).

--- 2010-08-12.orig/xen/arch/x86/mm.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/mm.c	2010-08-12 17:16:32.000000000 +0200
@@ -824,7 +824,13 @@ get_page_from_l1e(
             return 0;
         }
 
-        return 1;
+        if ( !(l1f & _PAGE_RW) || IS_PRIV(pg_owner) ||
+             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+            return 1;
+        dprintk(XENLOG_G_WARNING,
+                "d%d: Forcing read-only access to MFN %lx\n",
+                l1e_owner->domain_id, mfn);
+        return -1;
     }
 
     if ( unlikely(real_pg_owner != pg_owner) )
@@ -1180,9 +1186,15 @@ static int alloc_l1_table(struct page_in
 
     for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
     {
-        if ( is_guest_l1_slot(i) &&
-             unlikely(!get_page_from_l1e(pl1e[i], d, d)) )
-            goto fail;
+        if ( is_guest_l1_slot(i) )
+            switch ( get_page_from_l1e(pl1e[i], d, d) )
+            {
+            case 0:
+                goto fail;
+            case -1:
+                l1e_remove_flags(pl1e[i], _PAGE_RW);
+                break;
+            }
 
         adjust_guest_l1e(pl1e[i], d);
     }
@@ -1766,8 +1778,14 @@ static int mod_l1_entry(l1_pgentry_t *pl
             return rc;
         }
 
-        if ( unlikely(!get_page_from_l1e(nl1e, pt_dom, pg_dom)) )
+        switch ( get_page_from_l1e(nl1e, pt_dom, pg_dom) )
+        {
+        case 0:
             return 0;
+        case -1:
+            l1e_remove_flags(nl1e, _PAGE_RW);
+            break;
+        }
         
         adjust_guest_l1e(nl1e, pt_dom);
         if ( unlikely(!UPDATE_ENTRY(l1, pl1e, ol1e, nl1e, gl1mfn, pt_vcpu,
@@ -4992,8 +5010,9 @@ static int ptwr_emulated_update(
 
     /* Check the new PTE. */
     nl1e = l1e_from_intpte(val);
-    if ( unlikely(!get_page_from_l1e(nl1e, d, d)) )
+    switch ( get_page_from_l1e(nl1e, d, d) )
     {
+    case 0:
         if ( is_pv_32bit_domain(d) && (bytes == 4) && (unaligned_addr & 4) &&
              !do_cmpxchg && (l1e_get_flags(nl1e) & _PAGE_PRESENT) )
         {
@@ -5012,6 +5031,10 @@ static int ptwr_emulated_update(
             MEM_LOG("ptwr_emulate: could not get_page_from_l1e()");
             return X86EMUL_UNHANDLEABLE;
         }
+        break;
+    case -1:
+        l1e_remove_flags(nl1e, _PAGE_RW);
+        break;
     }
 
     adjust_guest_l1e(nl1e, d);
--- 2010-08-12.orig/xen/arch/x86/mm/hap/p2m-ept.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/mm/hap/p2m-ept.c	2010-08-12 17:16:32.000000000 +0200
@@ -72,9 +72,13 @@ static void ept_p2m_type_to_flags(ept_en
             entry->r = entry->w = entry->x = 0;
             return;
         case p2m_ram_rw:
-        case p2m_mmio_direct:
             entry->r = entry->w = entry->x = 1;
             return;
+        case p2m_mmio_direct:
+            entry->r = entry->x = 1;
+            entry->w = !rangeset_contains_singleton(mmio_ro_ranges,
+                                                    entry->mfn);
+            return;
         case p2m_ram_logdirty:
         case p2m_ram_ro:
         case p2m_ram_shared:
@@ -716,6 +720,9 @@ static void ept_change_entry_type_global
     if ( ept_get_asr(d) == 0 )
         return;
 
+    BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
+    BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
+
     ept_change_entry_type_page(_mfn(ept_get_asr(d)), ept_get_wl(d), ot, nt);
 
     ept_sync_domain(d);
--- 2010-08-12.orig/xen/arch/x86/mm/p2m.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/mm/p2m.c	2010-08-12 17:16:32.000000000 +0200
@@ -72,7 +72,7 @@ boolean_param("hap_1gb", opt_hap_1gb);
 #define SUPERPAGE_PAGES (1UL << 9)
 #define superpage_aligned(_x)  (((_x)&(SUPERPAGE_PAGES-1))==0)
 
-static unsigned long p2m_type_to_flags(p2m_type_t t) 
+static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn)
 {
     unsigned long flags;
 #ifdef __x86_64__
@@ -101,7 +101,9 @@ static unsigned long p2m_type_to_flags(p
     case p2m_mmio_dm:
         return flags;
     case p2m_mmio_direct:
-        return flags | P2M_BASE_FLAGS | _PAGE_RW | _PAGE_PCD;
+        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn_x(mfn)) )
+            flags |= _PAGE_RW;
+        return flags | P2M_BASE_FLAGS | _PAGE_PCD;
     case p2m_populate_on_demand:
         return flags;
     }
@@ -1299,8 +1301,10 @@ p2m_set_entry(struct p2m_domain *p2m, un
             domain_crash(p2m->domain);
             goto out;
         }
+        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         l3e_content = mfn_valid(mfn) 
-            ? l3e_from_pfn(mfn_x(mfn), p2m_type_to_flags(p2mt) | _PAGE_PSE)
+            ? l3e_from_pfn(mfn_x(mfn),
+                           p2m_type_to_flags(p2mt, mfn) | _PAGE_PSE)
             : l3e_empty();
         entry_content.l1 = l3e_content.l3;
         paging_write_p2m_entry(p2m->domain, gfn, p2m_entry,
@@ -1334,7 +1338,8 @@ p2m_set_entry(struct p2m_domain *p2m, un
         ASSERT(p2m_entry);
         
         if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) )
-            entry_content = l1e_from_pfn(mfn_x(mfn), p2m_type_to_flags(p2mt));
+            entry_content = l1e_from_pfn(mfn_x(mfn),
+                                         p2m_type_to_flags(p2mt, mfn));
         else
             entry_content = l1e_empty();
         
@@ -1358,9 +1363,11 @@ p2m_set_entry(struct p2m_domain *p2m, un
             goto out;
         }
         
+        ASSERT(!mfn_valid(mfn) || p2mt != p2m_mmio_direct);
         if ( mfn_valid(mfn) || p2m_is_magic(p2mt) )
             l2e_content = l2e_from_pfn(mfn_x(mfn),
-                                       p2m_type_to_flags(p2mt) | _PAGE_PSE);
+                                       p2m_type_to_flags(p2mt, mfn) |
+                                       _PAGE_PSE);
         else
             l2e_content = l2e_empty();
         
@@ -2437,6 +2444,7 @@ void p2m_change_type_global(struct p2m_d
 #endif /* CONFIG_PAGING_LEVELS == 4 */
 
     BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
+    BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
 
     if ( !paging_mode_translate(p2m->domain) )
         return;
@@ -2478,7 +2486,7 @@ void p2m_change_type_global(struct p2m_d
                     continue;
                 mfn = l3e_get_pfn(l3e[i3]);
                 gfn = get_gpfn_from_mfn(mfn);
-                flags = p2m_type_to_flags(nt);
+                flags = p2m_type_to_flags(nt, _mfn(mfn));
                 l1e_content = l1e_from_pfn(mfn, flags | _PAGE_PSE);
                 paging_write_p2m_entry(p2m->domain, gfn,
                                        (l1_pgentry_t *)&l3e[i3],
@@ -2509,7 +2517,7 @@ void p2m_change_type_global(struct p2m_d
 #endif
 				)
                            * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES; 
-                    flags = p2m_type_to_flags(nt);
+                    flags = p2m_type_to_flags(nt, _mfn(mfn));
                     l1e_content = l1e_from_pfn(mfn, flags | _PAGE_PSE);
                     paging_write_p2m_entry(p2m->domain, gfn,
                                            (l1_pgentry_t *)&l2e[i2],
@@ -2533,7 +2541,7 @@ void p2m_change_type_global(struct p2m_d
 				     )
                            * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES; 
                     /* create a new 1le entry with the new type */
-                    flags = p2m_type_to_flags(nt);
+                    flags = p2m_type_to_flags(nt, _mfn(mfn));
                     l1e_content = l1e_from_pfn(mfn, flags);
                     paging_write_p2m_entry(p2m->domain, gfn, &l1e[i1],
                                            l1mfn, l1e_content, 1);
--- 2010-08-12.orig/xen/arch/x86/mm/shadow/multi.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/mm/shadow/multi.c	2010-08-12 17:16:32.000000000 +0200
@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v, 
     }
 
     /* Read-only memory */
-    if ( p2m_is_readonly(p2mt) )
+    if ( p2m_is_readonly(p2mt) ||
+         (p2mt == p2m_mmio_direct &&
+          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
         sflags &= ~_PAGE_RW;
     
     // protect guest page tables
@@ -1204,15 +1206,19 @@ static int shadow_set_l1e(struct vcpu *v
         /* About to install a new reference */        
         if ( shadow_mode_refcounts(d) ) {
             TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
-            if ( shadow_get_page_from_l1e(new_sl1e, d, new_type) == 0 ) 
+            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
             {
+            case 0:
                 /* Doesn't look like a pagetable. */
                 flags |= SHADOW_SET_ERROR;
                 new_sl1e = shadow_l1e_empty();
-            }
-            else
-            {
+                break;
+            case -1:
+                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
+                /* fall through */
+            default:
                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
+                break;
             }
         }
     } 
--- 2010-08-12.orig/xen/arch/x86/msi.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/arch/x86/msi.c	2010-08-12 18:09:43.000000000 +0200
@@ -16,12 +16,14 @@
 #include <xen/errno.h>
 #include <xen/pci.h>
 #include <xen/pci_regs.h>
+#include <xen/iocap.h>
 #include <xen/keyhandler.h>
 #include <asm/io.h>
 #include <asm/smp.h>
 #include <asm/desc.h>
 #include <asm/msi.h>
 #include <asm/fixmap.h>
+#include <asm/p2m.h>
 #include <mach_apic.h>
 #include <io_ports.h>
 #include <public/physdev.h>
@@ -520,6 +522,43 @@ static int msi_capability_init(struct pc
     return 0;
 }
 
+static u64 read_pci_mem_bar(u8 bus, u8 slot, u8 func, u8 bir)
+{
+    u8 limit;
+    u32 addr;
+
+    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) )
+    {
+    case PCI_HEADER_TYPE_NORMAL:
+        limit = 6;
+        break;
+    case PCI_HEADER_TYPE_BRIDGE:
+        limit = 2;
+        break;
+    case PCI_HEADER_TYPE_CARDBUS:
+        limit = 1;
+        break;
+    default:
+        return 0;
+    }
+
+    if ( bir >= limit )
+        return 0;
+    addr = pci_conf_read32(bus, slot, func, PCI_BASE_ADDRESS_0 + bir * 4);
+    if ( (addr & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
+        return 0;
+    if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64 )
+    {
+        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
+        if ( ++bir >= limit )
+            return 0;
+        return addr |
+               ((u64)pci_conf_read32(bus, slot, func,
+                                     PCI_BASE_ADDRESS_0 + bir * 4) << 32);
+    }
+    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
+}
+
 /**
  * msix_capability_init - configure device's MSI-X capability
  * @dev: pointer to the pci_dev data structure of MSI-X device function
@@ -532,7 +571,8 @@ static int msi_capability_init(struct pc
  **/
 static int msix_capability_init(struct pci_dev *dev,
                                 struct msi_info *msi,
-                                struct msi_desc **desc)
+                                struct msi_desc **desc,
+                                unsigned int nr_entries)
 {
     struct msi_desc *entry;
     int pos;
@@ -587,6 +627,69 @@ static int msix_capability_init(struct p
 
     list_add_tail(&entry->list, &dev->msi_list);
 
+    if ( !dev->msix_nr_entries )
+    {
+        u64 pba_paddr;
+        u32 pba_offset;
+
+        ASSERT(!dev->msix_used_entries);
+        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
+
+        dev->msix_nr_entries = nr_entries;
+        dev->msix_table.first = PFN_DOWN(table_paddr);
+        dev->msix_table.last = PFN_DOWN(table_paddr +
+                                        nr_entries * PCI_MSIX_ENTRY_SIZE - 1);
+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_table.first,
+                                        dev->msix_table.last));
+
+        pba_offset = pci_conf_read32(bus, slot, func,
+                                     msix_pba_offset_reg(pos));
+        bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
+        pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
+        WARN_ON(!pba_paddr);
+        pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
+
+        dev->msix_pba.first = PFN_DOWN(pba_paddr);
+        dev->msix_pba.last = PFN_DOWN(pba_paddr +
+                                      BITS_TO_LONGS(nr_entries) - 1);
+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges, dev->msix_pba.first,
+                                        dev->msix_pba.last));
+
+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
+                                dev->msix_table.last) )
+            WARN();
+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
+                                dev->msix_pba.last) )
+            WARN();
+printk("MSIX%02x:%02x.%x: table@(%lx,%lx), pba@(%lx,%lx)\n", bus, slot, func,
+       dev->msix_table.first, dev->msix_table.last,
+       dev->msix_pba.first, dev->msix_pba.last);//temp
+
+        if ( dev->domain )
+            p2m_change_entry_type_global(p2m_get_hostp2m(dev->domain),
+                                         p2m_mmio_direct, p2m_mmio_direct);
+        if ( !dev->domain || !paging_mode_translate(dev->domain) )
+        {
+            struct domain *d = dev->domain;
+
+            if ( !d )
+                for_each_domain(d)
+                    if ( !paging_mode_translate(d) &&
+                         (iomem_access_permitted(d, dev->msix_table.first,
+                                                 dev->msix_table.last) ||
+                          iomem_access_permitted(d, dev->msix_pba.first,
+                                                 dev->msix_pba.last)) )
+                        break;
+            if ( d )
+            {
+                /* XXX How to deal with existing mappings? */
+            }
+        }
+    }
+    WARN_ON(dev->msix_nr_entries != nr_entries);
+    WARN_ON(dev->msix_table.first != (table_paddr >> PAGE_SHIFT));
+    ++dev->msix_used_entries;
+
     /* Mask interrupt here */
     writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 
@@ -707,7 +810,7 @@ static int __pci_enable_msix(struct msi_
         return 0;
     }
 
-    status = msix_capability_init(pdev, msi, desc);
+    status = msix_capability_init(pdev, msi, desc, nr_entries);
     return status;
 }
 
@@ -732,6 +835,16 @@ static void __pci_disable_msix(struct ms
     writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
 
     pci_conf_write16(bus, slot, func, msix_control_reg(pos), control);
+
+    if ( !--dev->msix_used_entries )
+    {
+        if ( rangeset_remove_range(mmio_ro_ranges, dev->msix_table.first,
+                                  dev->msix_table.last) )
+            WARN();
+        if ( rangeset_remove_range(mmio_ro_ranges, dev->msix_pba.first,
+                                   dev->msix_pba.last) )
+            WARN();
+    }
 }
 
 /*
--- 2010-08-12.orig/xen/drivers/passthrough/io.c	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/drivers/passthrough/io.c	2010-08-12 17:16:32.000000000 +0200
@@ -26,6 +26,8 @@
 #include <xen/hvm/irq.h>
 #include <xen/tasklet.h>
 
+struct rangeset *__read_mostly mmio_ro_ranges;
+
 static void hvm_dirq_assist(unsigned long _d);
 
 bool_t pt_irq_need_timer(uint32_t flags)
@@ -565,3 +567,11 @@ void hvm_dpci_eoi(struct domain *d, unsi
 unlock:
     spin_unlock(&d->event_lock);
 }
+
+static int __init setup_mmio_ro_ranges(void)
+{
+    mmio_ro_ranges = rangeset_new(NULL, "r/o mmio ranges",
+                                  RANGESETF_prettyprint_hex);
+    return 0;
+}
+__initcall(setup_mmio_ro_ranges);
--- 2010-08-12.orig/xen/include/xen/iommu.h	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/include/xen/iommu.h	2010-08-12 17:16:32.000000000 +0200
@@ -31,6 +31,8 @@ extern bool_t force_iommu, iommu_verbose
 extern bool_t iommu_workaround_bios_bug, iommu_passthrough;
 extern bool_t iommu_snoop, iommu_qinval, iommu_intremap;
 
+extern struct rangeset *mmio_ro_ranges;
+
 #define domain_hvm_iommu(d)     (&d->arch.hvm_domain.hvm_iommu)
 
 #define MAX_IOMMUS 32
--- 2010-08-12.orig/xen/include/xen/pci.h	2010-08-12 17:36:43.000000000 +0200
+++ 2010-08-12/xen/include/xen/pci.h	2010-08-12 17:16:32.000000000 +0200
@@ -45,6 +45,10 @@ struct pci_dev {
     struct list_head domain_list;
 
     struct list_head msi_list;
+    unsigned int msix_nr_entries, msix_used_entries;
+    struct {
+        unsigned long first, last;
+    } msix_table, msix_pba;
     int msix_table_refcnt[MAX_MSIX_TABLE_PAGES];
     int msix_table_idx[MAX_MSIX_TABLE_PAGES];
     spinlock_t msix_table_lock;

[-- Attachment #3: rangeset-overlaps.patch --]
[-- Type: text/plain, Size: 1370 bytes --]

--- 2010-06-15.orig/xen/common/rangeset.c	2009-11-05 10:13:22.000000000 +0100
+++ 2010-06-15/xen/common/rangeset.c	2010-07-12 12:08:16.000000000 +0200
@@ -251,6 +251,22 @@ int rangeset_contains_range(
     return contains;
 }
 
+int rangeset_overlaps_range(
+    struct rangeset *r, unsigned long s, unsigned long e)
+{
+    struct range *x;
+    int overlaps;
+
+    ASSERT(s <= e);
+
+    spin_lock(&r->lock);
+    x = find_range(r, e);
+    overlaps = (x && (s <= x->e));
+    spin_unlock(&r->lock);
+
+    return overlaps;
+}
+
 int rangeset_report_ranges(
     struct rangeset *r, unsigned long s, unsigned long e,
     int (*cb)(unsigned long s, unsigned long e, void *), void *ctxt)
--- 2010-06-15.orig/xen/include/xen/rangeset.h	2009-11-05 10:13:22.000000000 +0100
+++ 2010-06-15/xen/include/xen/rangeset.h	2010-07-12 12:09:55.000000000 +0200
@@ -53,6 +53,8 @@ int __must_check rangeset_remove_range(
     struct rangeset *r, unsigned long s, unsigned long e);
 int __must_check rangeset_contains_range(
     struct rangeset *r, unsigned long s, unsigned long e);
+int __must_check rangeset_overlaps_range(
+    struct rangeset *r, unsigned long s, unsigned long e);
 int rangeset_report_ranges(
     struct rangeset *r, unsigned long s, unsigned long e,
     int (*cb)(unsigned long s, unsigned long e, void *), void *ctxt);

[-- Attachment #4: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array
  2010-08-13 13:37 [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array Jan Beulich
@ 2010-08-14  6:32 ` Keir Fraser
  2010-08-16  7:55   ` Jan Beulich
  2010-08-26  6:24 ` Jiang, Yunhong
  1 sibling, 1 reply; 14+ messages in thread
From: Keir Fraser @ 2010-08-14  6:32 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Konrad Rzeszutek Wilk

On 13/08/2010 14:37, "Jan Beulich" <JBeulich@novell.com> wrote:

> Below/attached is an untested and possibly not yet complete patch
> attempting to address the problem originally described on this thread
> (can't really test this myself as I don't have, with one exception,
> any MSI-X capable devices around, and the exceptional one doesn't have
> a driver making use of it). I had sent it once before, it only got
> refreshed since then to build with xen-unstable as of c/s 21952, and
> I'm re-sending it mainly because there was no feedback so far, despite
> the original problem representing a security issue.

I'm happy to check it in and see if anyone complains about breakage.

 -- Keir

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array
  2010-08-14  6:32 ` Keir Fraser
@ 2010-08-16  7:55   ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2010-08-16  7:55 UTC (permalink / raw)
  To: Keir Fraser; +Cc: xen-devel, Konrad Rzeszutek Wilk

>>> On 14.08.10 at 08:32, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 13/08/2010 14:37, "Jan Beulich" <JBeulich@novell.com> wrote:
> 
>> Below/attached is an untested and possibly not yet complete patch
>> attempting to address the problem originally described on this thread
>> (can't really test this myself as I don't have, with one exception,
>> any MSI-X capable devices around, and the exceptional one doesn't have
>> a driver making use of it). I had sent it once before, it only got
>> refreshed since then to build with xen-unstable as of c/s 21952, and
>> I'm re-sending it mainly because there was no feedback so far, despite
>> the original problem representing a security issue.
> 
> I'm happy to check it in and see if anyone complains about breakage.

It's not really only a matter of doing a trial checkin, but also one of
finding a solution to the remaining pv-guest issue described in the
submission mail as well as the removal the remaining hole left due to
qemu temporarily mapping this space read/write for no apparent
reason other than keeping the code simple.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array
  2010-08-13 13:37 [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array Jan Beulich
  2010-08-14  6:32 ` Keir Fraser
@ 2010-08-26  6:24 ` Jiang, Yunhong
  2010-08-26  7:06   ` Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Jiang, Yunhong @ 2010-08-26  6:24 UTC (permalink / raw)
  To: Jan Beulich, xen-devel; +Cc: Konrad Rzeszutek Wilk



>-----Original Message-----
>From: xen-devel-bounces@lists.xensource.com
>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jan Beulich
>Sent: Friday, August 13, 2010 9:37 PM
>To: xen-devel@lists.xensource.com
>Cc: Konrad Rzeszutek Wilk
>Subject: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and
>pending bit array
>
>Below/attached is an untested and possibly not yet complete patch
>attempting to address the problem originally described on this thread
>(can't really test this myself as I don't have, with one exception,
>any MSI-X capable devices around, and the exceptional one doesn't have
>a driver making use of it). I had sent it once before, it only got
>refreshed since then to build with xen-unstable as of c/s 21952, and
>I'm re-sending it mainly because there was no feedback so far, despite
>the original problem representing a security issue.

I setup a MSI-x NIC and can test the patch after it is finalized.

>
>It tracks MMIO MFNs required to only have read-only guest access
>(determined when the first MSI-X interrupt gets enabled on a device)
>in a global range set, and enforces the write protection as
>translations get established.
>
>The changes are made under the assumption that p2m_mmio_direct will
>only ever be used for order 0 pages.
>
>An open question is whether dealing with pv guests (including the
>IOMMU-less case) is necessary, as handling mappings a domain may
>already have in place at the time the first interrupt gets set up
>would require scanning all of the guest's page table pages.


>
>An alternative would be to determine and insert the address ranges
>earlier into mmio_ro_ranges, but that would require a hook in the
>PCI config space writes, which is particularly problematic in case
>MMCONFIG accesses are being used.

I noticed you stated in your previous mail that this should be done in hypervisor, not tools. Is it because tools is not trusted by xen hypervisor? 
If tools can be trusted, is it possible to achieve this in tools: Tools tell xen hypervisor the MMIO range that is read-only to this guest after the guest is created, but before the domain is unpaused.


>
>A second alternative would be to require Dom0 to report all devices
>(or at least all MSI-X capable ones) regardless of whether they would
>be used by that domain, and do so after resources got determined/
>assigned for them (i.e. a second notification later than the one
>currently happening from the PCI bus scan would be needed).

Currently Xen knows about the PCI device's resource assignment already when system boot, since Xen have PCI information. The only situations that Xen may have no idea includes: a) Dom0 re-assign the device resource, may because of resource balance; b) VF device for SR-IOV.

I think for situation a, IIRC, xen hypervisor can't handle it, because that means all shadow need be rebuild, the MSI information need be updated etc. For situation b, we can do it when the VF device is assigned to guest.

Still, I think tools can achive this more easily if it is trusted.

>
>(Attached is also a trivial prerequisite patch.)
>
>Jan
>

......

>--- 2010-08-12.orig/xen/arch/x86/mm/shadow/multi.c	2010-08-12
>17:36:43.000000000 +0200
>+++ 2010-08-12/xen/arch/x86/mm/shadow/multi.c	2010-08-12
>17:16:32.000000000 +0200
>@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v,
>     }
>
>     /* Read-only memory */
>-    if ( p2m_is_readonly(p2mt) )
>+    if ( p2m_is_readonly(p2mt) ||
>+         (p2mt == p2m_mmio_direct &&
>+          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
>         sflags &= ~_PAGE_RW;

Would it have performance impact if too much mmio rangeset and we need search it for each l1 entry update? Or you assume this range will not be updated so frequently? 
I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand the p2m_is_readonly to cover this also? PAE xen may have trouble for it, but at least it works for x86_64, and some wrap function with #ifdef X86_64 can handle the difference.

BTW, I remember it has been discussed for a long time that PAE xen will be removed, but seems it still exist, are there any special reason for PAE Xen?

>
>     // protect guest page tables
>@@ -1204,15 +1206,19 @@ static int shadow_set_l1e(struct vcpu *v
>         /* About to install a new reference */
>         if ( shadow_mode_refcounts(d) ) {
>
>TRACE_SHADOW_PATH_FLAG(TRCE_SFLAG_SHADOW_L1_GET_REF);
>-            if ( shadow_get_page_from_l1e(new_sl1e, d, new_type) == 0 )
>+            switch ( shadow_get_page_from_l1e(new_sl1e, d, new_type) )
>             {
>+            case 0:
>                 /* Doesn't look like a pagetable. */
>                 flags |= SHADOW_SET_ERROR;
>                 new_sl1e = shadow_l1e_empty();
>-            }
>-            else
>-            {
>+                break;
>+            case -1:
>+                shadow_l1e_remove_flags(new_sl1e, _PAGE_RW);
>+                /* fall through */
>+            default:
>                 shadow_vram_get_l1e(new_sl1e, sl1e, sl1mfn, d);
>+                break;
>             }
>         }
>     }
>--- 2010-08-12.orig/xen/arch/x86/msi.c	2010-08-12 17:36:43.000000000 +0200
>+++ 2010-08-12/xen/arch/x86/msi.c	2010-08-12 18:09:43.000000000 +0200
>@@ -16,12 +16,14 @@
> #include <xen/errno.h>
> #include <xen/pci.h>
> #include <xen/pci_regs.h>
>+#include <xen/iocap.h>
> #include <xen/keyhandler.h>
> #include <asm/io.h>
> #include <asm/smp.h>
> #include <asm/desc.h>
> #include <asm/msi.h>
> #include <asm/fixmap.h>
>+#include <asm/p2m.h>
> #include <mach_apic.h>
> #include <io_ports.h>
> #include <public/physdev.h>
>@@ -520,6 +522,43 @@ static int msi_capability_init(struct pc
>     return 0;
> }
>
>+static u64 read_pci_mem_bar(u8 bus, u8 slot, u8 func, u8 bir)
>+{
>+    u8 limit;
>+    u32 addr;
>+
>+    switch ( pci_conf_read8(bus, slot, func, PCI_HEADER_TYPE) )
>+    {
>+    case PCI_HEADER_TYPE_NORMAL:
>+        limit = 6;
>+        break;
>+    case PCI_HEADER_TYPE_BRIDGE:
>+        limit = 2;
>+        break;
>+    case PCI_HEADER_TYPE_CARDBUS:
>+        limit = 1;
>+        break;
>+    default:
>+        return 0;
>+    }
>+
>+    if ( bir >= limit )
>+        return 0;
>+    addr = pci_conf_read32(bus, slot, func, PCI_BASE_ADDRESS_0 + bir * 4);
>+    if ( (addr & PCI_BASE_ADDRESS_SPACE) == PCI_BASE_ADDRESS_SPACE_IO )
>+        return 0;
>+    if ( (addr & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
>PCI_BASE_ADDRESS_MEM_TYPE_64 )
>+    {
>+        addr &= ~PCI_BASE_ADDRESS_MEM_MASK;
>+        if ( ++bir >= limit )
>+            return 0;
>+        return addr |
>+               ((u64)pci_conf_read32(bus, slot, func,
>+                                     PCI_BASE_ADDRESS_0 + bir * 4) <<
>32);
>+    }
>+    return addr & ~PCI_BASE_ADDRESS_MEM_MASK;
>+}
>+
> /**
>  * msix_capability_init - configure device's MSI-X capability
>  * @dev: pointer to the pci_dev data structure of MSI-X device function
>@@ -532,7 +571,8 @@ static int msi_capability_init(struct pc
>  **/
> static int msix_capability_init(struct pci_dev *dev,
>                                 struct msi_info *msi,
>-                                struct msi_desc **desc)
>+                                struct msi_desc **desc,
>+                                unsigned int nr_entries)
> {
>     struct msi_desc *entry;
>     int pos;
>@@ -587,6 +627,69 @@ static int msix_capability_init(struct p
>
>     list_add_tail(&entry->list, &dev->msi_list);
>
>+    if ( !dev->msix_nr_entries )
>+    {
>+        u64 pba_paddr;
>+        u32 pba_offset;
>+
>+        ASSERT(!dev->msix_used_entries);
>+        WARN_ON(msi->table_base != read_pci_mem_bar(bus, slot, func, bir));
>+
>+        dev->msix_nr_entries = nr_entries;
>+        dev->msix_table.first = PFN_DOWN(table_paddr);
>+        dev->msix_table.last = PFN_DOWN(table_paddr +
>+                                        nr_entries *
>PCI_MSIX_ENTRY_SIZE - 1);
>+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges,
>dev->msix_table.first,
>+                                        dev->msix_table.last));
>+
>+        pba_offset = pci_conf_read32(bus, slot, func,
>+                                     msix_pba_offset_reg(pos));
>+        bir = (u8)(pba_offset & PCI_MSIX_BIRMASK);
>+        pba_paddr = read_pci_mem_bar(bus, slot, func, bir);
>+        WARN_ON(!pba_paddr);
>+        pba_paddr += pba_offset & ~PCI_MSIX_BIRMASK;
>+
>+        dev->msix_pba.first = PFN_DOWN(pba_paddr);
>+        dev->msix_pba.last = PFN_DOWN(pba_paddr +
>+                                      BITS_TO_LONGS(nr_entries) - 1);
>+        WARN_ON(rangeset_overlaps_range(mmio_ro_ranges,
>dev->msix_pba.first,
>+                                        dev->msix_pba.last));
>+
>+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_table.first,
>+                                dev->msix_table.last) )
>+            WARN();
>+        if ( rangeset_add_range(mmio_ro_ranges, dev->msix_pba.first,
>+                                dev->msix_pba.last) )
>+            WARN();
>+printk("MSIX%02x:%02x.%x: table@(%lx,%lx), pba@(%lx,%lx)\n", bus, slot, func,
>+       dev->msix_table.first, dev->msix_table.last,
>+       dev->msix_pba.first, dev->msix_pba.last);//temp
>+
>+        if ( dev->domain )
>+            p2m_change_entry_type_global(p2m_get_hostp2m(dev->domain),
>+                                         p2m_mmio_direct,
>p2m_mmio_direct);
>+        if ( !dev->domain || !paging_mode_translate(dev->domain) )
>+        {
>+            struct domain *d = dev->domain;
>+
>+            if ( !d )
>+                for_each_domain(d)
>+                    if ( !paging_mode_translate(d) &&
>+                         (iomem_access_permitted(d, dev->msix_table.first,
>+                                                 dev->msix_table.last)
>||
>+                          iomem_access_permitted(d, dev->msix_pba.first,
>+                                                 dev->msix_pba.last)) )
>+                        break;
>+            if ( d )
>+            {
>+                /* XXX How to deal with existing mappings? */
>+            }
>+        }
>+    }
>+    WARN_ON(dev->msix_nr_entries != nr_entries);
>+    WARN_ON(dev->msix_table.first != (table_paddr >> PAGE_SHIFT));
>+    ++dev->msix_used_entries;
>+
>     /* Mask interrupt here */
>     writel(1, entry->mask_base + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
>
>@@ -707,7 +810,7 @@ static int __pci_enable_msix(struct msi_
>         return 0;
>     }
>
>-    status = msix_capability_init(pdev, msi, desc);
>+    status = msix_capability_init(pdev, msi, desc, nr_entries);
>     return status;
> }

As stated above, would it be possible to achive this in tools?
Also if it is possible to place the mmio_ro_ranges to be per domain structure?


Thanks
--jyh

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array
  2010-08-26  6:24 ` Jiang, Yunhong
@ 2010-08-26  7:06   ` Jan Beulich
  2010-08-26  8:41     ` Jiang, Yunhong
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2010-08-26  7:06 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: xen-devel, Konrad Rzeszutek Wilk

>>> On 26.08.10 at 08:24, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>From: xen-devel-bounces@lists.xensource.com 
>>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jan Beulich
>>An alternative would be to determine and insert the address ranges
>>earlier into mmio_ro_ranges, but that would require a hook in the
>>PCI config space writes, which is particularly problematic in case
>>MMCONFIG accesses are being used.
> 
> I noticed you stated in your previous mail that this should be done in 
> hypervisor, not tools. Is it because tools is not trusted by xen hypervisor? 
> If tools can be trusted, is it possible to achieve this in tools: Tools tell 
> xen hypervisor the MMIO range that is read-only to this guest after the guest 
> is created, but before the domain is unpaused.

Doing this from the tools would have the unfortunate side effect that
Dom0 (or the associated stubdom) would continue to need special
casing, which really it shouldn't for this purpose (and all the special
casing in the patch is really just because qemu wants to map writably
the ranges in question, and this can only be removed after the
hypervisor handling changed).

Another aspect is that of necessity of denying write access to these
ranges: The guest must be prevented writing to them only once at
least one MSI-X interrupt got enabled. Plus (while not the case with
the current Linux implementation) a (non-Linux or future Linux)
version may choose to (re-)assign device resources only when the
device gets enabled, which would be after the guest was already
launched.

>>A second alternative would be to require Dom0 to report all devices
>>(or at least all MSI-X capable ones) regardless of whether they would
>>be used by that domain, and do so after resources got determined/
>>assigned for them (i.e. a second notification later than the one
>>currently happening from the PCI bus scan would be needed).
> 
> Currently Xen knows about the PCI device's resource assignment already when 
> system boot, since Xen have PCI information. The only situations that Xen may 
> have no idea includes: a) Dom0 re-assign the device resource, may because of 
> resource balance; b) VF device for SR-IOV.
> 
> I think for situation a, IIRC, xen hypervisor can't handle it, because that 
> means all shadow need be rebuild, the MSI information need be updated etc. 

Shadows and p2m table need updating in this case, but MSI information
doesn't afaict (it gets proagated only when the first interrupt is being
set up).
>>@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v,
>>     }
>>
>>     /* Read-only memory */
>>-    if ( p2m_is_readonly(p2mt) )
>>+    if ( p2m_is_readonly(p2mt) ||
>>+         (p2mt == p2m_mmio_direct &&
>>+          rangeset_contains_singleton(mmio_ro_ranges, mfn_x(target_mfn))) )
>>         sflags &= ~_PAGE_RW;
> 
> Would it have performance impact if too much mmio rangeset and we need 
> search it for each l1 entry update? Or you assume this range will not be 
> updated so frequently? 

Yes, performance wise this isn't nice.

> I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand 
> the p2m_is_readonly to cover this also? PAE xen may have trouble for it, but 
> at least it works for x86_64, and some wrap function with #ifdef X86_64 can 
> handle the difference.

That would be a possible alternative, but I'm afraid I wouldn't dare to
make such a change.

>>@@ -707,7 +810,7 @@ static int __pci_enable_msix(struct msi_
>>         return 0;
>>     }
>>
>>-    status = msix_capability_init(pdev, msi, desc);
>>+    status = msix_capability_init(pdev, msi, desc, nr_entries);
>>     return status;
>> }
> 
> As stated above, would it be possible to achive this in tools?
> Also if it is possible to place the mmio_ro_ranges to be per domain 
> structure?

As said above, doing it in the tools has down sides, but if done
there and if excluding/special-casing Dom0 (or the servicing
stubdom) it should be possible to make the ranges per-domain.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to  MSI-X table and pending bit array
  2010-08-26  7:06   ` Jan Beulich
@ 2010-08-26  8:41     ` Jiang, Yunhong
  2010-08-26 11:22       ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Jiang, Yunhong @ 2010-08-26  8:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Rzeszutek Wilk, Konrad



>-----Original Message-----
>From: Jan Beulich [mailto:JBeulich@novell.com]
>Sent: Thursday, August 26, 2010 3:07 PM
>To: Jiang, Yunhong
>Cc: xen-devel@lists.xensource.com; Konrad Rzeszutek Wilk
>Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and
>pending bit array
>
>>>> On 26.08.10 at 08:24, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>>From: xen-devel-bounces@lists.xensource.com
>>>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jan Beulich
>>>An alternative would be to determine and insert the address ranges
>>>earlier into mmio_ro_ranges, but that would require a hook in the
>>>PCI config space writes, which is particularly problematic in case
>>>MMCONFIG accesses are being used.
>>
>> I noticed you stated in your previous mail that this should be done in
>> hypervisor, not tools. Is it because tools is not trusted by xen hypervisor?
>> If tools can be trusted, is it possible to achieve this in tools: Tools tell
>> xen hypervisor the MMIO range that is read-only to this guest after the guest
>> is created, but before the domain is unpaused.
>
>Doing this from the tools would have the unfortunate side effect that
>Dom0 (or the associated stubdom) would continue to need special
>casing, which really it shouldn't for this purpose (and all the special
>casing in the patch is really just because qemu wants to map writably
>the ranges in question, and this can only be removed after the
>hypervisor handling changed).

Agree that doing this from tools can't remove dom0's write access.
Maybe we can combine this together. Tools for normal guest, while your change to msix_capability_init() is for dom0, the flow is followed:
1) When xen boot, xen will scan all PCI hierarchy, and get the MSI-X address. Mark that range to dom0's mmio_ro_range. (Maybe a seperetaed list is needed to track which MMIO range to which device).
2) When a MSI-x interrupt is started, we check the corresponding BAR to see if the range is changed by dom0. If yes, update dom0's mmio_ro_range.  We can also check if the assigned domain's mmio_ro_range cover this.
3) When tools create domain, tools will remove the mmio range for the guest.

A bit over-complicated?

>
>Another aspect is that of necessity of denying write access to these
>ranges: The guest must be prevented writing to them only once at

Although preventing writing access is only needed when MSI-x interrupt enabled, but as your patch stated, we need consider how to handle mapping setup already before the MSI-x enabled (In fact, we are working on removing setup-already mapping for MCA purpose, although not sure if it is accepetable by upstream). Removing the access from beginning will avoid such clean-already-mapped requirement.

>least one MSI-X interrupt got enabled. Plus (while not the case with
>the current Linux implementation) a (non-Linux or future Linux)
>version may choose to (re-)assign device resources only when the
>device gets enabled, which would be after the guest was already
>launched.

I'm a bit confused. Are you talking about guest re-assign device resources or dom0?
If you are talking about guest, I think that's easy to handle , and we should anyway do that. Especially it should not impact physical resource.
If you are talking aboud dom0, I can't think out while guest enable device will cause re-assignment in dom0.

>
>>>A second alternative would be to require Dom0 to report all devices
>>>(or at least all MSI-X capable ones) regardless of whether they would
>>>be used by that domain, and do so after resources got determined/
>>>assigned for them (i.e. a second notification later than the one
>>>currently happening from the PCI bus scan would be needed).
>>
>> Currently Xen knows about the PCI device's resource assignment already when
>> system boot, since Xen have PCI information. The only situations that Xen may
>> have no idea includes: a) Dom0 re-assign the device resource, may because of
>> resource balance; b) VF device for SR-IOV.
>>
>> I think for situation a, IIRC, xen hypervisor can't handle it, because that
>> means all shadow need be rebuild, the MSI information need be updated etc.
>
>Shadows and p2m table need updating in this case, but MSI information
>doesn't afaict (it gets proagated only when the first interrupt is being
>set up).

Maybe I didn't state my point clearly. What I mean is, since xen hypervisor knows about PCI hierarchy, they can setup the mmio_ro_range when xen boot. The only concerns for this method is situation a and situation b, which will update the PCI device's resource assignment, while xen hypervisor have no idea and can't update mmio_ro_range.

>>>@@ -653,7 +653,9 @@ _sh_propagate(struct vcpu *v,
>>>     }
>>>
>>>     /* Read-only memory */
>>>-    if ( p2m_is_readonly(p2mt) )
>>>+    if ( p2m_is_readonly(p2mt) ||
>>>+         (p2mt == p2m_mmio_direct &&
>>>+          rangeset_contains_singleton(mmio_ro_ranges,
>mfn_x(target_mfn))) )
>>>         sflags &= ~_PAGE_RW;
>>
>> Would it have performance impact if too much mmio rangeset and we need
>> search it for each l1 entry update? Or you assume this range will not be
>> updated so frequently?
>
>Yes, performance wise this isn't nice.
>
>> I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand
>> the p2m_is_readonly to cover this also? PAE xen may have trouble for it, but
>> at least it works for x86_64, and some wrap function with #ifdef X86_64 can
>> handle the difference.
>
>That would be a possible alternative, but I'm afraid I wouldn't dare to
>make such a change.

Which change? Would following code works without much issue?

 if (p2m_is_readonly(p2mt) || p2m_is_ro_mmio(p2mt, mfn_t(target_mfn)))
	sflags &= ~_PAGE_RW;

#ifdef __x86_64
#define P2M_RO_TYPES  (p2m_to_mask(p2m_ram_logdirty) |....|p2m_to_mask(p2m_ram_shared)|p2m_to_mask(p2m_mmio_ro)
#define p2m_is_ro_mmio() 0
#else
#define P2M_RO_TYPES  (p2m_to_mask(p2m_ram_logdirty) |....|p2m_to_mask(p2m_ram_shared)
#define p2m_is_ro_mmio(_t, mfn)  (p2mt == p2m_mmio_direct  && (rangeset_contains_singleton(mmio_ro_ranges,>mfn_x(target_mfn)))
#endif

#define p2m_is_readonly(_t)  (p2m_to_mask(_t) & P2M_RO_TYPE))


>
>>>@@ -707,7 +810,7 @@ static int __pci_enable_msix(struct msi_
>>>         return 0;
>>>     }
>>>
>>>-    status = msix_capability_init(pdev, msi, desc);
>>>+    status = msix_capability_init(pdev, msi, desc, nr_entries);
>>>     return status;
>>> }
>>
>> As stated above, would it be possible to achive this in tools?
>> Also if it is possible to place the mmio_ro_ranges to be per domain
>> structure?
>
>As said above, doing it in the tools has down sides, but if done
>there and if excluding/special-casing Dom0 (or the servicing
>stubdom) it should be possible to make the ranges per-domain.

I agree that dom0's writing access should also be avoided . The concern for global mmio_ro_ranges is, the ranges maybe a bit big, if several SR-IOV card populated, each support several VF. But I have no data how bit impact would it be.

Thanks
--jyh

>
>Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array
  2010-08-26  8:41     ` Jiang, Yunhong
@ 2010-08-26 11:22       ` Jan Beulich
  2010-08-27  1:48         ` Jiang, Yunhong
  0 siblings, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2010-08-26 11:22 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: xen-devel, KonradRzeszutek Wilk

>>> On 26.08.10 at 10:41, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>From: Jan Beulich [mailto:JBeulich@novell.com] 
>>>>> On 26.08.10 at 08:24, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>>>From: xen-devel-bounces@lists.xensource.com 
>>>>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jan Beulich
>>>>An alternative would be to determine and insert the address ranges
>>>>earlier into mmio_ro_ranges, but that would require a hook in the
>>>>PCI config space writes, which is particularly problematic in case
>>>>MMCONFIG accesses are being used.
>>>
>>> I noticed you stated in your previous mail that this should be done in
>>> hypervisor, not tools. Is it because tools is not trusted by xen hypervisor?
>>> If tools can be trusted, is it possible to achieve this in tools: Tools tell
>>> xen hypervisor the MMIO range that is read-only to this guest after the guest
>>> is created, but before the domain is unpaused.
>>
>>Doing this from the tools would have the unfortunate side effect that
>>Dom0 (or the associated stubdom) would continue to need special
>>casing, which really it shouldn't for this purpose (and all the special
>>casing in the patch is really just because qemu wants to map writably
>>the ranges in question, and this can only be removed after the
>>hypervisor handling changed).
> 
> Agree that doing this from tools can't remove dom0's write access.
> Maybe we can combine this together. Tools for normal guest, while your 
> change to msix_capability_init() is for dom0, the flow is followed:
> 1) When xen boot, xen will scan all PCI hierarchy, and get the MSI-X address. 
> Mark that range to dom0's mmio_ro_range. (Maybe a seperetaed list is needed 
> to track which MMIO range to which device).

That won't work for any devices that BIOS didn't assign resources to.

> 2) When a MSI-x interrupt is started, we check the corresponding BAR to see 
> if the range is changed by dom0. If yes, update dom0's mmio_ro_range.  We can 
> also check if the assigned domain's mmio_ro_range cover this.
> 3) When tools create domain, tools will remove the mmio range for the guest.

This (and therefore 2 above) won't work: You must not disallow the
guest access to this space altogether - it may validly want to read the
PBA. Remember that the code to remove pv guests' access to these
two ranges altogether got reverted due to causing problems with
real world devices/drivers.

Also, if the check would be done only when the interrupt is being
started, we would still have the problem of potentially needing to
change existing mappings.

>>least one MSI-X interrupt got enabled. Plus (while not the case with
>>the current Linux implementation) a (non-Linux or future Linux)
>>version may choose to (re-)assign device resources only when the
>>device gets enabled, which would be after the guest was already
>>launched.
> 
> I'm a bit confused. Are you talking about guest re-assign device resources or 
> dom0?

Dom0.

> If you are talking about guest, I think that's easy to handle , and we 
> should anyway do that. Especially it should not impact physical resource.
> If you are talking aboud dom0, I can't think out while guest enable device 
> will cause re-assignment in dom0.

Because pciback does the actual enabling on behalf of the guest.
Any resource adjustments done when memory decoding gets
enabled won't be known at the time the guest starts.

>>Shadows and p2m table need updating in this case, but MSI information
>>doesn't afaict (it gets proagated only when the first interrupt is being
>>set up).
> 
> Maybe I didn't state my point clearly. What I mean is, since xen hypervisor 
> knows about PCI hierarchy, they can setup the mmio_ro_range when xen boot. 
> The only concerns for this method is situation a and situation b, which will 
> update the PCI device's resource assignment, while xen hypervisor have no 
> idea and can't update mmio_ro_range.

Again - knowing about the PCI hierarchy doesn't mean knowing about
all resources. One option clearly is to require a (new) callout from Dom0
when any resources get adjusted. What I don't like about this is that
all existing Dom0 kernels would need fixing, i.e. I'd prefer a solution
that is contained to hypervisor+tools.

>>> I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand
>>> the p2m_is_readonly to cover this also? PAE xen may have trouble for it, but
>>> at least it works for x86_64, and some wrap function with #ifdef X86_64 can
>>> handle the difference.
>>
>>That would be a possible alternative, but I'm afraid I wouldn't dare to
>>make such a change.
> 
> Which change? Would following code works without much issue?

I don't know. I would just be afraid that there are other places in
the code checking explicitly (or, worse, implicitly) for p2m_mmio
would need updating. And not knowing well both shadow and p2m
code, that's not something I would want to do on my own.

> I agree that dom0's writing access should also be avoided . The concern for 
> global mmio_ro_ranges is, the ranges maybe a bit big, if several SR-IOV card 
> populated, each support several VF. But I have no data how bit impact would 
> it be.

A potential later optimization for this would be to make Dom0 try
co-locate all these regions.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to  MSI-X table and pending bit array
  2010-08-26 11:22       ` Jan Beulich
@ 2010-08-27  1:48         ` Jiang, Yunhong
  2010-08-27  9:10           ` Jan Beulich
  0 siblings, 1 reply; 14+ messages in thread
From: Jiang, Yunhong @ 2010-08-27  1:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, KonradRzeszutek Wilk



>-----Original Message-----
>From: Jan Beulich [mailto:JBeulich@novell.com]
>Sent: Thursday, August 26, 2010 7:23 PM
>To: Jiang, Yunhong
>Cc: xen-devel@lists.xensource.com; KonradRzeszutek Wilk
>Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and
>pending bit array
>
>>>> On 26.08.10 at 10:41, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>>From: Jan Beulich [mailto:JBeulich@novell.com]
>>>>>> On 26.08.10 at 08:24, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>>>>From: xen-devel-bounces@lists.xensource.com
>>>>>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Jan Beulich
>>>>>An alternative would be to determine and insert the address ranges
>>>>>earlier into mmio_ro_ranges, but that would require a hook in the
>>>>>PCI config space writes, which is particularly problematic in case
>>>>>MMCONFIG accesses are being used.
>>>>
>>>> I noticed you stated in your previous mail that this should be done in
>>>> hypervisor, not tools. Is it because tools is not trusted by xen hypervisor?
>>>> If tools can be trusted, is it possible to achieve this in tools: Tools tell
>>>> xen hypervisor the MMIO range that is read-only to this guest after the guest
>>>> is created, but before the domain is unpaused.
>>>
>>>Doing this from the tools would have the unfortunate side effect that
>>>Dom0 (or the associated stubdom) would continue to need special
>>>casing, which really it shouldn't for this purpose (and all the special
>>>casing in the patch is really just because qemu wants to map writably
>>>the ranges in question, and this can only be removed after the
>>>hypervisor handling changed).
>>
>> Agree that doing this from tools can't remove dom0's write access.
>> Maybe we can combine this together. Tools for normal guest, while your
>> change to msix_capability_init() is for dom0, the flow is followed:
>> 1) When xen boot, xen will scan all PCI hierarchy, and get the MSI-X address.
>> Mark that range to dom0's mmio_ro_range. (Maybe a seperetaed list is needed
>> to track which MMIO range to which device).
>
>That won't work for any devices that BIOS didn't assign resources to.
>
>> 2) When a MSI-x interrupt is started, we check the corresponding BAR to see
>> if the range is changed by dom0. If yes, update dom0's mmio_ro_range.  We can
>> also check if the assigned domain's mmio_ro_range cover this.
>> 3) When tools create domain, tools will remove the mmio range for the guest.
>
>This (and therefore 2 above) won't work: You must not disallow the
>guest access to this space altogether - it may validly want to read the
>PBA. Remember that the code to remove pv guests' access to these
>two ranges altogether got reverted due to causing problems with
>real world devices/drivers.

Oops, clearly I wanted to say "add the mmio_ro_range".

>
>Also, if the check would be done only when the interrupt is being
>started, we would still have the problem of potentially needing to
>change existing mappings.
>
>>>least one MSI-X interrupt got enabled. Plus (while not the case with
>>>the current Linux implementation) a (non-Linux or future Linux)
>>>version may choose to (re-)assign device resources only when the
>>>device gets enabled, which would be after the guest was already
>>>launched.
>>
>> I'm a bit confused. Are you talking about guest re-assign device resources or
>> dom0?
>
>Dom0.
>
>> If you are talking about guest, I think that's easy to handle , and we
>> should anyway do that. Especially it should not impact physical resource.
>> If you are talking aboud dom0, I can't think out while guest enable device
>> will cause re-assignment in dom0.
>
>Because pciback does the actual enabling on behalf of the guest.
>Any resource adjustments done when memory decoding gets
>enabled won't be known at the time the guest starts.

Does this resource adjustment work in current pciback/hypervisor? At least it means we need remove the old iomem permission/mapping and add the new iomem permission, although not sure if any other impact. And if we want to add this support to pciback/hypervisor, we can of course cover the MSI-x read-only situation.

>
>>>Shadows and p2m table need updating in this case, but MSI information
>>>doesn't afaict (it gets proagated only when the first interrupt is being
>>>set up).
>>
>> Maybe I didn't state my point clearly. What I mean is, since xen hypervisor
>> knows about PCI hierarchy, they can setup the mmio_ro_range when xen boot.
>> The only concerns for this method is situation a and situation b, which will
>> update the PCI device's resource assignment, while xen hypervisor have no
>> idea and can't update mmio_ro_range.
>
>Again - knowing about the PCI hierarchy doesn't mean knowing about
>all resources. One option clearly is to require a (new) callout from Dom0
>when any resources get adjusted. What I don't like about this is that
>all existing Dom0 kernels would need fixing, i.e. I'd prefer a solution
>that is contained to hypervisor+tools.

If we trust dom0, we don't need care dom0's writing access.
If we don't trust dom0, and if hypervisor does not protect pci configuration space access, this new callout, comparing with your patch, seems make no difference IMHO. 

Anyway, your patch do make great improvement, these issues (global list, guest's existing mapping and checking in _sh_propgate) is not important, or can be enhanced later, I'm glad to verify your patch on my system. Do you have any updated patch, or I can simply use the one you sent out on Aug, 13?

>
>>>> I'm not sure if we can add one more p2m type like p2m_mmio_ro? And expand
>>>> the p2m_is_readonly to cover this also? PAE xen may have trouble for it, but
>>>> at least it works for x86_64, and some wrap function with #ifdef X86_64 can
>>>> handle the difference.
>>>
>>>That would be a possible alternative, but I'm afraid I wouldn't dare to
>>>make such a change.
>>
>> Which change? Would following code works without much issue?
>
>I don't know. I would just be afraid that there are other places in
>the code checking explicitly (or, worse, implicitly) for p2m_mmio
>would need updating. And not knowing well both shadow and p2m
>code, that's not something I would want to do on my own.

A bit surprise to me. I thought you knows well every bit on xen.

--jyh

>
>> I agree that dom0's writing access should also be avoided . The concern for
>> global mmio_ro_ranges is, the ranges maybe a bit big, if several SR-IOV card
>> populated, each support several VF. But I have no data how bit impact would
>> it be.
>
>A potential later optimization for this would be to make Dom0 try
>co-locate all these regions.
>
>Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array
  2010-08-27  1:48         ` Jiang, Yunhong
@ 2010-08-27  9:10           ` Jan Beulich
  2010-08-27  9:38             ` Jiang, Yunhong
  2010-09-01  7:39             ` Jiang, Yunhong
  0 siblings, 2 replies; 14+ messages in thread
From: Jan Beulich @ 2010-08-27  9:10 UTC (permalink / raw)
  To: Yunhong Jiang; +Cc: xen-devel, KonradRzeszutek Wilk

>>> On 27.08.10 at 03:48, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>>From: Jan Beulich [mailto:JBeulich@novell.com] 
>>Because pciback does the actual enabling on behalf of the guest.
>>Any resource adjustments done when memory decoding gets
>>enabled won't be known at the time the guest starts.
> 
> Does this resource adjustment work in current pciback/hypervisor? At least 

pciback and hypervisor on its own presumably have no problem, but
the tools won't cope with it (as they read the resource settings when
assigning the device, before it gets enabled). I never liked that solution
anyway, as it basically takes away control from the hypervisor (which
could monitor config space changes if there wasn't the problem of
how to handle MMCONFIG accesses). Instead of passing resource
ownership information to Xen, all the tools should do is pass PCI
device ownership information. Any dynamic change to the device's
resources should be handled by the hypervisor. Individual resources
should be claimed for a guest only for non-PCI devices one wants
the guest to have access to.

> it means we need remove the old iomem permission/mapping and add the new 
> iomem permission, although not sure if any other impact. And if we want to 
> add this support to pciback/hypervisor, we can of course cover the MSI-x 
> read-only situation.
> 
>>Again - knowing about the PCI hierarchy doesn't mean knowing about
>>all resources. One option clearly is to require a (new) callout from Dom0
>>when any resources get adjusted. What I don't like about this is that
>>all existing Dom0 kernels would need fixing, i.e. I'd prefer a solution
>>that is contained to hypervisor+tools.
> 
> If we trust dom0, we don't need care dom0's writing access.

This is not just about Dom0's write access - I'm in particular talking
about resource adjustments done to passed through devices.

> If we don't trust dom0, and if hypervisor does not protect pci configuration 
> space access, this new callout, comparing with your patch, seems make no 
> difference IMHO. 
> 
> Anyway, your patch do make great improvement, these issues (global list, 
> guest's existing mapping and checking in _sh_propgate) is not important, or 
> can be enhanced later, I'm glad to verify your patch on my system. Do you 
> have any updated patch, or I can simply use the one you sent out on Aug, 13?

Yes, that one should be fine and up-to-date.

Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to  MSI-X table and pending bit array
  2010-08-27  9:10           ` Jan Beulich
@ 2010-08-27  9:38             ` Jiang, Yunhong
  2010-09-01  7:39             ` Jiang, Yunhong
  1 sibling, 0 replies; 14+ messages in thread
From: Jiang, Yunhong @ 2010-08-27  9:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, KonradRzeszutek Wilk

>> Anyway, your patch do make great improvement, these issues (global list,
>> guest's existing mapping and checking in _sh_propgate) is not important, or
>> can be enhanced later, I'm glad to verify your patch on my system. Do you
>> have any updated patch, or I can simply use the one you sent out on Aug, 13?
>
>Yes, that one should be fine and up-to-date.

Sure, but I will do that on next monday since I will be out in weekend and cant' access the system.

Thanks
--jyh

>
>Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to  MSI-X table and pending bit array
  2010-08-27  9:10           ` Jan Beulich
  2010-08-27  9:38             ` Jiang, Yunhong
@ 2010-09-01  7:39             ` Jiang, Yunhong
  2010-09-15  1:32               ` Jiang, Yunhong
  1 sibling, 1 reply; 14+ messages in thread
From: Jiang, Yunhong @ 2010-09-01  7:39 UTC (permalink / raw)
  To: Jiang, Yunhong, Jan Beulich; +Cc: xen-devel, KonradRzeszutek Wilk

Sorry for slow response because I met some trouble on environment setup.

This patch works well. Without this patch, the PV DomU can change the vector and cause spurios interrupt to xen hypervisor. With this patch, PV DomU can't update the vector anymore. 

Ack.

Thanks
--jyh

>-----Original Message-----
>From: Jiang, Yunhong
>Sent: Friday, August 27, 2010 5:38 PM
>To: Jan Beulich
>Cc: xen-devel@lists.xensource.com; KonradRzeszutek Wilk
>Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and
>pending bit array
>
>>> Anyway, your patch do make great improvement, these issues (global list,
>>> guest's existing mapping and checking in _sh_propgate) is not important, or
>>> can be enhanced later, I'm glad to verify your patch on my system. Do you
>>> have any updated patch, or I can simply use the one you sent out on Aug, 13?
>>
>>Yes, that one should be fine and up-to-date.
>
>Sure, but I will do that on next monday since I will be out in weekend and cant'
>access the system.
>
>Thanks
>--jyh
>
>>
>>Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to  MSI-X table and pending bit array
  2010-09-01  7:39             ` Jiang, Yunhong
@ 2010-09-15  1:32               ` Jiang, Yunhong
  2010-09-15  6:52                 ` Keir Fraser
  0 siblings, 1 reply; 14+ messages in thread
From: Jiang, Yunhong @ 2010-09-15  1:32 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Jan, xen-devel, Beulich

Keir, this patch works well on my test, is it possible to pick this up? 

Thanks
--jyh

>-----Original Message-----
>From: Jiang, Yunhong
>Sent: Wednesday, September 01, 2010 3:39 PM
>To: Jiang, Yunhong; Jan Beulich
>Cc: xen-devel@lists.xensource.com; KonradRzeszutek Wilk
>Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and
>pending bit array
>
>Sorry for slow response because I met some trouble on environment setup.
>
>This patch works well. Without this patch, the PV DomU can change the vector and
>cause spurios interrupt to xen hypervisor. With this patch, PV DomU can't update
>the vector anymore.
>
>Ack.
>
>Thanks
>--jyh
>
>>-----Original Message-----
>>From: Jiang, Yunhong
>>Sent: Friday, August 27, 2010 5:38 PM
>>To: Jan Beulich
>>Cc: xen-devel@lists.xensource.com; KonradRzeszutek Wilk
>>Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table
>and
>>pending bit array
>>
>>>> Anyway, your patch do make great improvement, these issues (global list,
>>>> guest's existing mapping and checking in _sh_propgate) is not important, or
>>>> can be enhanced later, I'm glad to verify your patch on my system. Do you
>>>> have any updated patch, or I can simply use the one you sent out on Aug, 13?
>>>
>>>Yes, that one should be fine and up-to-date.
>>
>>Sure, but I will do that on next monday since I will be out in weekend and cant'
>>access the system.
>>
>>Thanks
>>--jyh
>>
>>>
>>>Jan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH, RFC, resend] Re: granting access to  MSI-X table and pending bit array
  2010-09-15  1:32               ` Jiang, Yunhong
@ 2010-09-15  6:52                 ` Keir Fraser
  2010-09-15  7:02                   ` Jiang, Yunhong
  0 siblings, 1 reply; 14+ messages in thread
From: Keir Fraser @ 2010-09-15  6:52 UTC (permalink / raw)
  To: Jiang, Yunhong; +Cc: xen-devel, Jan Beulich

Jan hasn't submitted the patches for inclusion. They haven't been
'signed-off-by'.

 K.

On 15/09/2010 02:32, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Keir, this patch works well on my test, is it possible to pick this up?
> 
> Thanks
> --jyh
> 
>> -----Original Message-----
>> From: Jiang, Yunhong
>> Sent: Wednesday, September 01, 2010 3:39 PM
>> To: Jiang, Yunhong; Jan Beulich
>> Cc: xen-devel@lists.xensource.com; KonradRzeszutek Wilk
>> Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X
>> table and
>> pending bit array
>> 
>> Sorry for slow response because I met some trouble on environment setup.
>> 
>> This patch works well. Without this patch, the PV DomU can change the vector
>> and
>> cause spurios interrupt to xen hypervisor. With this patch, PV DomU can't
>> update
>> the vector anymore.
>> 
>> Ack.
>> 
>> Thanks
>> --jyh
>> 
>>> -----Original Message-----
>>> From: Jiang, Yunhong
>>> Sent: Friday, August 27, 2010 5:38 PM
>>> To: Jan Beulich
>>> Cc: xen-devel@lists.xensource.com; KonradRzeszutek Wilk
>>> Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X
>>> table
>> and
>>> pending bit array
>>> 
>>>>> Anyway, your patch do make great improvement, these issues (global list,
>>>>> guest's existing mapping and checking in _sh_propgate) is not important,
>>>>> or
>>>>> can be enhanced later, I'm glad to verify your patch on my system. Do you
>>>>> have any updated patch, or I can simply use the one you sent out on Aug,
>>>>> 13?
>>>> 
>>>> Yes, that one should be fine and up-to-date.
>>> 
>>> Sure, but I will do that on next monday since I will be out in weekend and
>>> cant'
>>> access the system.
>>> 
>>> Thanks
>>> --jyh
>>> 
>>>> 
>>>> Jan
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* RE: [PATCH, RFC, resend] Re: granting access to  MSI-X table and pending bit array
  2010-09-15  6:52                 ` Keir Fraser
@ 2010-09-15  7:02                   ` Jiang, Yunhong
  0 siblings, 0 replies; 14+ messages in thread
From: Jiang, Yunhong @ 2010-09-15  7:02 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Jan, xen-devel, Beulich

Got it, thanks!.

--jyh

>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Wednesday, September 15, 2010 2:52 PM
>To: Jiang, Yunhong
>Cc: xen-devel@lists.xensource.com; Jan Beulich
>Subject: Re: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X table and
>pending bit array
>
>Jan hasn't submitted the patches for inclusion. They haven't been
>'signed-off-by'.
>
> K.
>
>On 15/09/2010 02:32, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:
>
>> Keir, this patch works well on my test, is it possible to pick this up?
>>
>> Thanks
>> --jyh
>>
>>> -----Original Message-----
>>> From: Jiang, Yunhong
>>> Sent: Wednesday, September 01, 2010 3:39 PM
>>> To: Jiang, Yunhong; Jan Beulich
>>> Cc: xen-devel@lists.xensource.com; KonradRzeszutek Wilk
>>> Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X
>>> table and
>>> pending bit array
>>>
>>> Sorry for slow response because I met some trouble on environment setup.
>>>
>>> This patch works well. Without this patch, the PV DomU can change the vector
>>> and
>>> cause spurios interrupt to xen hypervisor. With this patch, PV DomU can't
>>> update
>>> the vector anymore.
>>>
>>> Ack.
>>>
>>> Thanks
>>> --jyh
>>>
>>>> -----Original Message-----
>>>> From: Jiang, Yunhong
>>>> Sent: Friday, August 27, 2010 5:38 PM
>>>> To: Jan Beulich
>>>> Cc: xen-devel@lists.xensource.com; KonradRzeszutek Wilk
>>>> Subject: RE: [PATCH, RFC, resend] Re: [Xen-devel] granting access to MSI-X
>>>> table
>>> and
>>>> pending bit array
>>>>
>>>>>> Anyway, your patch do make great improvement, these issues (global list,
>>>>>> guest's existing mapping and checking in _sh_propgate) is not important,
>>>>>> or
>>>>>> can be enhanced later, I'm glad to verify your patch on my system. Do you
>>>>>> have any updated patch, or I can simply use the one you sent out on Aug,
>>>>>> 13?
>>>>>
>>>>> Yes, that one should be fine and up-to-date.
>>>>
>>>> Sure, but I will do that on next monday since I will be out in weekend and
>>>> cant'
>>>> access the system.
>>>>
>>>> Thanks
>>>> --jyh
>>>>
>>>>>
>>>>> Jan
>>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-09-15  7:02 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-13 13:37 [PATCH, RFC, resend] Re: granting access to MSI-X table and pending bit array Jan Beulich
2010-08-14  6:32 ` Keir Fraser
2010-08-16  7:55   ` Jan Beulich
2010-08-26  6:24 ` Jiang, Yunhong
2010-08-26  7:06   ` Jan Beulich
2010-08-26  8:41     ` Jiang, Yunhong
2010-08-26 11:22       ` Jan Beulich
2010-08-27  1:48         ` Jiang, Yunhong
2010-08-27  9:10           ` Jan Beulich
2010-08-27  9:38             ` Jiang, Yunhong
2010-09-01  7:39             ` Jiang, Yunhong
2010-09-15  1:32               ` Jiang, Yunhong
2010-09-15  6:52                 ` Keir Fraser
2010-09-15  7:02                   ` Jiang, Yunhong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.