* [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management
@ 2011-07-12  5:24 Haitao Shan
  2011-07-12  7:05 ` Jan Beulich
  2011-08-25 11:06 ` Ian Jackson
  0 siblings, 2 replies; 7+ messages in thread
From: Haitao Shan @ 2011-07-12  5:24 UTC (permalink / raw)
  To: Keir Fraser, Ian Jackson, Jan Beulich, George Dunlap, Tim Deegan

Hi,

As reported by Jan, the current Qemu does not handle MSI-X table mapping
properly.

Details:

The MSI-X table resides in one of the physical BARs. When Qemu handles a
guest's change to a BAR register (one within which the MSI-X table
resides), it first grants access to the whole BAR MMIO range and then
removes the MSI-X table's range. This leaves a small window: on an SMP
guest, one vcpu could access the physical MSI-X configuration while
another vcpu is writing the BAR register.

The patch fixes this issue by first computing the valid MMIO ranges,
i.e. the whole BAR MMIO range minus the MSI-X table's range, and only
then passing these ranges to Xen.
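
To illustrate the range computation, here is a minimal standalone sketch
(it is not the patch itself). It assumes 4 KiB pages and 16-byte MSI-X
table entries; the names split_bar_around_msix and struct pfn_range are
hypothetical and do not exist in Qemu or libxc. Each non-empty range
would then be mapped or unmapped with its own
xc_domain_memory_mapping() call.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12  /* assumption: 4 KiB pages */
#define MSIX_ENTRY_SIZE 16  /* bytes per MSI-X table entry */

/* Hypothetical helper type: a run of guest page frames. */
struct pfn_range {
    uint64_t first_pfn;  /* first frame of the range */
    uint64_t nr_pfns;    /* number of frames; 0 means the range is empty */
};

/*
 * Split a BAR into the page ranges before and after the pages that
 * hold the MSI-X table. The table's own pages belong to neither
 * range, so the guest never gets a direct mapping of them.
 */
static void split_bar_around_msix(uint64_t bar_base, uint64_t bar_size,
                                  uint64_t table_off, uint32_t entries,
                                  struct pfn_range *before,
                                  struct pfn_range *after)
{
    uint64_t table_bytes   = (uint64_t)entries * MSIX_ENTRY_SIZE;
    uint64_t bar_first_pfn = bar_base >> PAGE_SHIFT;
    uint64_t bar_last_pfn  = (bar_base + bar_size - 1) >> PAGE_SHIFT;
    uint64_t tbl_first_pfn = (bar_base + table_off) >> PAGE_SHIFT;
    uint64_t tbl_last_pfn  =
        (bar_base + table_off + table_bytes - 1) >> PAGE_SHIFT;

    /* The table must lie entirely inside the BAR. */
    assert(tbl_last_pfn <= bar_last_pfn);

    before->first_pfn = bar_first_pfn;
    before->nr_pfns   = tbl_first_pfn - bar_first_pfn;

    after->first_pfn  = tbl_last_pfn + 1;
    after->nr_pfns    = bar_last_pfn - tbl_last_pfn;
}
```

For a 16 KiB BAR at 0xf0000000 with an 8-entry table at offset 0x1000,
this gives one page before the table (pfn 0xf0000) and two pages after
it (starting at pfn 0xf0002); the table's own page is in neither range.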

Please review, thanks!

Signed-off-by:    Shan Haitao <haitao.shan@intel.com>

diff --git a/hw/pass-through.c b/hw/pass-through.c
index 9c5620d..b9c2f32 100644
--- a/hw/pass-through.c
+++ b/hw/pass-through.c
@@ -92,6 +92,7 @@

 #include <unistd.h>
 #include <sys/ioctl.h>
+#include <assert.h>

 extern int gfx_passthru;
 int igd_passthru = 0;
@@ -1103,6 +1104,7 @@ static void pt_iomem_map(PCIDevice *d, int i, uint32_t e_phys, uint32_t e_size,
 {
     struct pt_dev *assigned_device  = (struct pt_dev *)d;
     uint32_t old_ebase = assigned_device->bases[i].e_physbase;
+    uint32_t msix_last_pfn = 0, bar_last_pfn = 0;
     int first_map = ( assigned_device->bases[i].e_size == 0 );
     int ret = 0;

@@ -1118,39 +1120,124 @@ static void pt_iomem_map(PCIDevice *d, int i, uint32_t e_phys, uint32_t e_size,

     if ( !first_map && old_ebase != -1 )
     {
-        add_msix_mapping(assigned_device, i);
-        /* Remove old mapping */
-        ret = xc_domain_memory_mapping(xc_handle, domid,
+        if ( has_msix_mapping(assigned_device, i) )
+        {
+            msix_last_pfn = (assigned_device->msix->mmio_base_addr - 1 +
+                  assigned_device->msix->total_entries * 16) >>  XC_PAGE_SHIFT;
+            bar_last_pfn = (old_ebase + e_size - 1) >> XC_PAGE_SHIFT;
+
+            if ( assigned_device->msix->table_off )
+            {
+                ret = xc_domain_memory_mapping(xc_handle, domid,
+                    old_ebase >> XC_PAGE_SHIFT,
+                    assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
+                    (assigned_device->msix->mmio_base_addr >> XC_PAGE_SHIFT)
+                    - (old_ebase >> XC_PAGE_SHIFT),
+                    DPCI_REMOVE_MAPPING);
+                if ( ret != 0 )
+                {
+                    PT_LOG("Error: remove old mapping failed!\n");
+                    return;
+                }
+            }
+            if ( msix_last_pfn != bar_last_pfn )
+            {
+                assert(msix_last_pfn < bar_last_pfn);
+                ret = xc_domain_memory_mapping(xc_handle, domid,
+                    msix_last_pfn + 1,
+                    (assigned_device->bases[i].access.maddr +
+                     assigned_device->msix->table_off +
+                     assigned_device->msix->total_entries * 16 +
+                     XC_PAGE_SIZE -1) >>  XC_PAGE_SHIFT,
+                    bar_last_pfn - msix_last_pfn,
+                    DPCI_REMOVE_MAPPING);
+                if ( ret != 0 )
+                {
+                    PT_LOG("Error: remove old mapping failed!\n");
+                    return;
+                }
+            }
+        }
+        else
+        {
+            /* Remove old mapping */
+            ret = xc_domain_memory_mapping(xc_handle, domid,
                 old_ebase >> XC_PAGE_SHIFT,
                 assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
                 (e_size+XC_PAGE_SIZE-1) >> XC_PAGE_SHIFT,
                 DPCI_REMOVE_MAPPING);
-        if ( ret != 0 )
-        {
-            PT_LOG("Error: remove old mapping failed!\n");
-            return;
+            if ( ret != 0 )
+            {
+                PT_LOG("Error: remove old mapping failed!\n");
+                return;
+            }
         }
     }

     /* map only valid guest address */
     if (e_phys != -1)
     {
-        /* Create new mapping */
-        ret = xc_domain_memory_mapping(xc_handle, domid,
+        if ( has_msix_mapping(assigned_device, i) )
+        {
+            assigned_device->msix->mmio_base_addr =
+                assigned_device->bases[i].e_physbase
+                + assigned_device->msix->table_off;
+
+            msix_last_pfn = (assigned_device->msix->mmio_base_addr - 1 +
+                  assigned_device->msix->total_entries * 16) >>  XC_PAGE_SHIFT;
+            bar_last_pfn = (e_phys + e_size - 1) >> XC_PAGE_SHIFT;
+
+            cpu_register_physical_memory(assigned_device->msix->mmio_base_addr,
+                                 assigned_device->msix->total_entries * 16,
+                                 assigned_device->msix->mmio_index);
+
+            if ( assigned_device->msix->table_off )
+            {
+                ret = xc_domain_memory_mapping(xc_handle, domid,
+                    assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT,
+                    assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
+                    (assigned_device->msix->mmio_base_addr >> XC_PAGE_SHIFT)
+                    - (assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT),
+                    DPCI_ADD_MAPPING);
+                if ( ret != 0 )
+                {
+                    PT_LOG("Error: create new mapping failed!\n");
+                    return;
+                }
+            }
+            if ( msix_last_pfn != bar_last_pfn )
+            {
+                assert(msix_last_pfn < bar_last_pfn);
+                ret = xc_domain_memory_mapping(xc_handle, domid,
+                    msix_last_pfn + 1,
+                    (assigned_device->bases[i].access.maddr +
+                     assigned_device->msix->table_off +
+                     assigned_device->msix->total_entries * 16 +
+                     XC_PAGE_SIZE -1) >>  XC_PAGE_SHIFT,
+                    bar_last_pfn - msix_last_pfn,
+                    DPCI_ADD_MAPPING);
+                if ( ret != 0 )
+                {
+                    PT_LOG("Error: create new mapping failed!\n");
+                    return;
+                }
+            }
+        }
+        else
+        {
+            /* Create new mapping */
+            ret = xc_domain_memory_mapping(xc_handle, domid,
                 assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT,
                 assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
                 (e_size+XC_PAGE_SIZE-1) >> XC_PAGE_SHIFT,
                 DPCI_ADD_MAPPING);

-        if ( ret != 0 )
-        {
-            PT_LOG("Error: create new mapping failed!\n");
+            if ( ret != 0 )
+            {
+                PT_LOG("Error: create new mapping failed!\n");
+            }
         }

-        ret = remove_msix_mapping(assigned_device, i);
-        if ( ret != 0 )
-            PT_LOG("Error: remove MSI-X mmio mapping failed!\n");
-
         if ( old_ebase != e_phys && old_ebase != -1 )
             pt_msix_update_remap(assigned_device, i);
     }
diff --git a/hw/pt-msi.c b/hw/pt-msi.c
index 71fa6f0..1fbebd4 100644
--- a/hw/pt-msi.c
+++ b/hw/pt-msi.c
@@ -528,39 +528,12 @@ static CPUReadMemoryFunc *pci_msix_read[] = {
     pci_msix_readl
 };

-int add_msix_mapping(struct pt_dev *dev, int bar_index)
+int has_msix_mapping(struct pt_dev *dev, int bar_index)
 {
     if ( !(dev->msix && dev->msix->bar_index == bar_index) )
         return 0;

-    return xc_domain_memory_mapping(xc_handle, domid,
-                dev->msix->mmio_base_addr >> XC_PAGE_SHIFT,
-                (dev->bases[bar_index].access.maddr
-                + dev->msix->table_off) >> XC_PAGE_SHIFT,
-                (dev->msix->total_entries * 16
-                + XC_PAGE_SIZE -1) >> XC_PAGE_SHIFT,
-                DPCI_ADD_MAPPING);
-}
-
-int remove_msix_mapping(struct pt_dev *dev, int bar_index)
-{
-    if ( !(dev->msix && dev->msix->bar_index == bar_index) )
-        return 0;
-
-    dev->msix->mmio_base_addr = dev->bases[bar_index].e_physbase
-                                + dev->msix->table_off;
-
-    cpu_register_physical_memory(dev->msix->mmio_base_addr,
-                                 dev->msix->total_entries * 16,
-                                 dev->msix->mmio_index);
-
-    return xc_domain_memory_mapping(xc_handle, domid,
-                dev->msix->mmio_base_addr >> XC_PAGE_SHIFT,
-                (dev->bases[bar_index].access.maddr
-                + dev->msix->table_off) >> XC_PAGE_SHIFT,
-                (dev->msix->total_entries * 16
-                + XC_PAGE_SIZE -1) >> XC_PAGE_SHIFT,
-                DPCI_REMOVE_MAPPING);
+    return 1;
 }

 int pt_msix_init(struct pt_dev *dev, int pos)
diff --git a/hw/pt-msi.h b/hw/pt-msi.h
index 9664f89..2dc1720 100644
--- a/hw/pt-msi.h
+++ b/hw/pt-msi.h
@@ -107,10 +107,7 @@ void
 pt_msix_disable(struct pt_dev *dev);

 int
-remove_msix_mapping(struct pt_dev *dev, int bar_index);
-
-int
-add_msix_mapping(struct pt_dev *dev, int bar_index);
+has_msix_mapping(struct pt_dev *dev, int bar_index);

 int
 pt_msix_init(struct pt_dev *dev, int pos);



* Re: [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management
  2011-07-12  5:24 [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management Haitao Shan
@ 2011-07-12  7:05 ` Jan Beulich
  2011-07-12  9:30   ` Haitao Shan
  2011-08-25 11:06 ` Ian Jackson
  1 sibling, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2011-07-12  7:05 UTC (permalink / raw)
  To: Haitao Shan
  Cc: George Dunlap, xen-devel, Ian Jackson, Tim Deegan, Keir Fraser

>>> On 12.07.11 at 07:24, Haitao Shan <maillists.shan@gmail.com> wrote:
> Hi,
> 
> As reported by Jan, current Qemu does not handle MSIX table mapping 
> properly.
> 
> Details:
> 
> MSI-X table resides in one of the physical BARs. When Qemu handles
> guest's changes to BAR register (within which, MSI-X table resides),
> Qemu first allows access of the whole BAR MMIO ranges and then removes
> those of MSI-X. There is a small window here. It is possible that on a
> SMP guests one vcpu could have access to the physical MSI-X
> configurations when another vcpu is writing BAR registers.
> 
> The patch fixes this issue by first producing the valid MMIO ranges by
> removing MSI-X table's range from the whole BAR mmio range and later
> passing these ranges to Xen.

That's only half of it - something similar would need to be done for the
pending bit array.

Further I'm having the impression that while you avoid assigning the
questionable MMIO range to the guest (which isn't a security concern
as long as the BAR determination for the device in the hypervisor is
correct), your patch doesn't prevent qemu actually mapping these
ranges writably and allow pci_msix_writel() to access it (which is the
actual open security problem).

Further, I don't think it's correct to remove guest access to either of
the two ranges altogether - either qemu needs to emulate access to
these, or the guest ought to be able to access the ranges directly,
but read-only.

Jan



* Re: [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management
  2011-07-12  7:05 ` Jan Beulich
@ 2011-07-12  9:30   ` Haitao Shan
  2011-07-12  9:48     ` Jan Beulich
  0 siblings, 1 reply; 7+ messages in thread
From: Haitao Shan @ 2011-07-12  9:30 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Keir Fraser, xen-devel, Ian Jackson, Tim Deegan

Thanks for your comments. However, my impression is that what I see in
this email thread does not match what I was told when this issue was
first reported.
I am not aware of the context of any larger-scope MSI-X cleanup, if you
are planning one. As a result, I might be missing some important
points, so please just go ahead and submit your patches.
My comments are embedded below.


2011/7/12 Jan Beulich <JBeulich@novell.com>:
>>>> On 12.07.11 at 07:24, Haitao Shan <maillists.shan@gmail.com> wrote:
>> Hi,
>>
>> As reported by Jan, current Qemu does not handle MSIX table mapping
>> properly.
>>
>> Details:
>>
>> MSI-X table resides in one of the physical BARs. When Qemu handles
>> guest's changes to BAR register (within which, MSI-X table resides),
>> Qemu first allows access of the whole BAR MMIO ranges and then removes
>> those of MSI-X. There is a small window here. It is possible that on a
>> SMP guests one vcpu could have access to the physical MSI-X
>> configurations when another vcpu is writing BAR registers.
>>
>> The patch fixes this issue by first producing the valid MMIO ranges by
>> removing MSI-X table's range from the whole BAR mmio range and later
>> passing these ranges to Xen.
>
> That's only half of it - something similar would need to be done for the
> pending bit array.
Please justify why the read-only PBA should also be removed from guest
access. Note that we actually mask the physical MSI via the MSI-X table
when the guest masks it via the virtualized MSI-X table.

>
> Further I'm having the impression that while you avoid assigning the
> questionable MMIO range to the guest (which isn't a security concern
> as long as the BAR determination for the device in the hypervisor is
> correct), your patch doesn't prevent qemu actually mapping these
> ranges writably and allow pci_msix_writel() to access it (which is the
> actual open security problem).
I totally disagree. I think Dom0, together with its management software
such as Qemu and libxc/libxl, is to be trusted, although it is arguable
to what extent Xen can trust them.
The table is mapped into Qemu mainly so that it can write the MASK bit
to the physical MSI-X table directly. Handling a guest's masking of an
MSI is already too long a code path.
If Qemu is not trusted, then perhaps the MSI virtualization part should
be moved from Qemu into Xen itself.

>
> Further, I don't think it's correct to remove guest access to either of
> the two ranges altogether - either qemu needs to emulate access to
> these, or the guest ought to be able to access the ranges directly,
> but read-only.
PBA is exposed to guests, unless it happens to be located on the same
page as the MSI-X table (in which case it has to be removed), per my
understanding.
The MSI-X table cannot be exposed to guests, even read-only.
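
To make the page arithmetic behind the patch description concrete, here
is a minimal sketch of carving the MSI-X table pages out of a BAR before
anything is handed to Xen. It assumes 4 KiB pages and 16-byte table
entries; struct pfn_range and split_bar_around_msix() are hypothetical
names for illustration and are not part of the patch or of Qemu.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages, like XC_PAGE_SHIFT */
#define MSIX_ENTRY_SIZE   16   /* bytes per MSI-X table entry */

/* Hypothetical helper type: page frames [first_pfn, first_pfn + nr). */
struct pfn_range {
    uint64_t first_pfn;
    uint64_t nr;
};

/*
 * Compute the page ranges of a BAR that do not contain any part of the
 * MSI-X table. Writes at most two ranges (below and above the table)
 * into out[] and returns how many were written.
 */
static inline int split_bar_around_msix(uint64_t bar_base, uint64_t bar_size,
                                        uint64_t table_off, uint32_t entries,
                                        struct pfn_range out[2])
{
    uint64_t table_bytes = (uint64_t)entries * MSIX_ENTRY_SIZE;
    uint64_t table_base = bar_base + table_off;
    uint64_t bar_first_pfn, bar_last_pfn, msix_first_pfn, msix_last_pfn;
    int n = 0;

    /* The table must be non-empty and lie entirely inside the BAR. */
    assert(entries > 0 && table_off + table_bytes <= bar_size);

    bar_first_pfn = bar_base >> SKETCH_PAGE_SHIFT;
    bar_last_pfn = (bar_base + bar_size - 1) >> SKETCH_PAGE_SHIFT;
    msix_first_pfn = table_base >> SKETCH_PAGE_SHIFT;
    msix_last_pfn = (table_base + table_bytes - 1) >> SKETCH_PAGE_SHIFT;

    if (msix_first_pfn > bar_first_pfn) {      /* pages below the table */
        out[n].first_pfn = bar_first_pfn;
        out[n].nr = msix_first_pfn - bar_first_pfn;
        n++;
    }
    if (msix_last_pfn < bar_last_pfn) {        /* pages above the table */
        out[n].first_pfn = msix_last_pfn + 1;
        out[n].nr = bar_last_pfn - msix_last_pfn;
        n++;
    }
    return n;
}
```

Because only the resulting ranges would ever be mapped, there is no
moment at which the table pages are assigned to the guest. Anything
else sharing a page with the table (such as a PBA) is excluded as a
side effect.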

Shan Haitao

>
> Jan
>
>> Please have a review, thanks!
>>
>> Signed-off-by:    Shan Haitao <haitao.shan@intel.com>
>>
>> diff --git a/hw/pass-through.c b/hw/pass-through.c
>> index 9c5620d..b9c2f32 100644
>> --- a/hw/pass-through.c
>> +++ b/hw/pass-through.c
>> @@ -92,6 +92,7 @@
>>
>>  #include <unistd.h>
>>  #include <sys/ioctl.h>
>> +#include <assert.h>
>>
>>  extern int gfx_passthru;
>>  int igd_passthru = 0;
>> @@ -1103,6 +1104,7 @@ static void pt_iomem_map(PCIDevice *d, int i,
>> uint32_t e_phys, uint32_t e_size,
>>  {
>>      struct pt_dev *assigned_device  = (struct pt_dev *)d;
>>      uint32_t old_ebase = assigned_device->bases[i].e_physbase;
>> +    uint32_t msix_last_pfn = 0, bar_last_pfn = 0;
>>      int first_map = ( assigned_device->bases[i].e_size == 0 );
>>      int ret = 0;
>>
>> @@ -1118,39 +1120,124 @@ static void pt_iomem_map(PCIDevice *d, int i,
>> uint32_t e_phys, uint32_t e_size,
>>
>>      if ( !first_map && old_ebase != -1 )
>>      {
>> -        add_msix_mapping(assigned_device, i);
>> -        /* Remove old mapping */
>> -        ret = xc_domain_memory_mapping(xc_handle, domid,
>> +        if ( has_msix_mapping(assigned_device, i) )
>> +        {
>> +            msix_last_pfn = (assigned_device->msix->mmio_base_addr - 1 +
>> +                  assigned_device->msix->total_entries * 16) >>  XC_PAGE_SHIFT;
>> +            bar_last_pfn = (old_ebase + e_size - 1) >> XC_PAGE_SHIFT;
>> +
>> +            if ( assigned_device->msix->table_off )
>> +            {
>> +                     ret = xc_domain_memory_mapping(xc_handle, domid,
>> +                    old_ebase >> XC_PAGE_SHIFT,
>> +                    assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
>> +                    (assigned_device->msix->mmio_base_addr >> XC_PAGE_SHIFT)
>> +                    - (old_ebase >> XC_PAGE_SHIFT),
>> +                    DPCI_REMOVE_MAPPING);
>> +                if ( ret != 0 )
>> +                {
>> +                    PT_LOG("Error: remove old mapping failed!\n");
>> +                    return;
>> +                }
>> +            }
>> +            if ( msix_last_pfn != bar_last_pfn )
>> +            {
>> +                assert(msix_last_pfn < bar_last_pfn);
>> +                     ret = xc_domain_memory_mapping(xc_handle, domid,
>> +                    msix_last_pfn + 1,
>> +                    (assigned_device->bases[i].access.maddr +
>> +                     assigned_device->msix->table_off +
>> +                     assigned_device->msix->total_entries * 16 +
>> +                     XC_PAGE_SIZE -1) >>  XC_PAGE_SHIFT,
>> +                    bar_last_pfn - msix_last_pfn,
>> +                    DPCI_REMOVE_MAPPING);
>> +                if ( ret != 0 )
>> +                {
>> +                    PT_LOG("Error: remove old mapping failed!\n");
>> +                    return;
>> +                }
>> +            }
>> +        }
>> +        else
>> +        {
>> +                 /* Remove old mapping */
>> +                 ret = xc_domain_memory_mapping(xc_handle, domid,
>>                  old_ebase >> XC_PAGE_SHIFT,
>>                  assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
>>                  (e_size+XC_PAGE_SIZE-1) >> XC_PAGE_SHIFT,
>>                  DPCI_REMOVE_MAPPING);
>> -        if ( ret != 0 )
>> -        {
>> -            PT_LOG("Error: remove old mapping failed!\n");
>> -            return;
>> +            if ( ret != 0 )
>> +            {
>> +                PT_LOG("Error: remove old mapping failed!\n");
>> +                return;
>> +            }
>>          }
>>      }
>>
>>      /* map only valid guest address */
>>      if (e_phys != -1)
>>      {
>> -        /* Create new mapping */
>> -        ret = xc_domain_memory_mapping(xc_handle, domid,
>> +        if ( has_msix_mapping(assigned_device, i) )
>> +             {
>> +            assigned_device->msix->mmio_base_addr =
>> +                assigned_device->bases[i].e_physbase
>> +                + assigned_device->msix->table_off;
>> +
>> +            msix_last_pfn = (assigned_device->msix->mmio_base_addr - 1 +
>> +                  assigned_device->msix->total_entries * 16) >>  XC_PAGE_SHIFT;
>> +            bar_last_pfn = (e_phys + e_size - 1) >> XC_PAGE_SHIFT;
>> +
>> +            cpu_register_physical_memory(assigned_device->msix->mmio_base_addr,
>> +                                 assigned_device->msix->total_entries * 16,
>> +                                 assigned_device->msix->mmio_index);
>> +
>> +            if ( assigned_device->msix->table_off )
>> +            {
>> +                     ret = xc_domain_memory_mapping(xc_handle, domid,
>> +                    assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT,
>> +                    assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
>> +                    (assigned_device->msix->mmio_base_addr >> XC_PAGE_SHIFT)
>> +                    - (assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT),
>> +                    DPCI_ADD_MAPPING);
>> +                if ( ret != 0 )
>> +                {
>> +                    PT_LOG("Error: remove old mapping failed!\n");
>> +                    return;
>> +                }
>> +            }
>> +            if ( msix_last_pfn != bar_last_pfn )
>> +            {
>> +                assert(msix_last_pfn < bar_last_pfn);
>> +                     ret = xc_domain_memory_mapping(xc_handle, domid,
>> +                    msix_last_pfn + 1,
>> +                    (assigned_device->bases[i].access.maddr +
>> +                     assigned_device->msix->table_off +
>> +                     assigned_device->msix->total_entries * 16 +
>> +                     XC_PAGE_SIZE -1) >>  XC_PAGE_SHIFT,
>> +                    bar_last_pfn - msix_last_pfn,
>> +                    DPCI_ADD_MAPPING);
>> +                if ( ret != 0 )
>> +                {
>> +                    PT_LOG("Error: remove old mapping failed!\n");
>> +                    return;
>> +                }
>> +            }
>> +             }
>> +             else
>> +        {
>> +                     /* Create new mapping */
>> +                     ret = xc_domain_memory_mapping(xc_handle, domid,
>>                  assigned_device->bases[i].e_physbase >> XC_PAGE_SHIFT,
>>                  assigned_device->bases[i].access.maddr >> XC_PAGE_SHIFT,
>>                  (e_size+XC_PAGE_SIZE-1) >> XC_PAGE_SHIFT,
>>                  DPCI_ADD_MAPPING);
>>
>> -        if ( ret != 0 )
>> -        {
>> -            PT_LOG("Error: create new mapping failed!\n");
>> +            if ( ret != 0 )
>> +            {
>> +                PT_LOG("Error: create new mapping failed!\n");
>> +            }
>>          }
>>
>> -        ret = remove_msix_mapping(assigned_device, i);
>> -        if ( ret != 0 )
>> -            PT_LOG("Error: remove MSI-X mmio mapping failed!\n");
>> -
>>          if ( old_ebase != e_phys && old_ebase != -1 )
>>              pt_msix_update_remap(assigned_device, i);
>>      }
>> diff --git a/hw/pt-msi.c b/hw/pt-msi.c
>> index 71fa6f0..1fbebd4 100644
>> --- a/hw/pt-msi.c
>> +++ b/hw/pt-msi.c
>> @@ -528,39 +528,12 @@ static CPUReadMemoryFunc *pci_msix_read[] = {
>>      pci_msix_readl
>>  };
>>
>> -int add_msix_mapping(struct pt_dev *dev, int bar_index)
>> +int has_msix_mapping(struct pt_dev *dev, int bar_index)
>>  {
>>      if ( !(dev->msix && dev->msix->bar_index == bar_index) )
>>          return 0;
>>
>> -    return xc_domain_memory_mapping(xc_handle, domid,
>> -                dev->msix->mmio_base_addr >> XC_PAGE_SHIFT,
>> -                (dev->bases[bar_index].access.maddr
>> -                + dev->msix->table_off) >> XC_PAGE_SHIFT,
>> -                (dev->msix->total_entries * 16
>> -                + XC_PAGE_SIZE -1) >> XC_PAGE_SHIFT,
>> -                DPCI_ADD_MAPPING);
>> -}
>> -
>> -int remove_msix_mapping(struct pt_dev *dev, int bar_index)
>> -{
>> -    if ( !(dev->msix && dev->msix->bar_index == bar_index) )
>> -        return 0;
>> -
>> -    dev->msix->mmio_base_addr = dev->bases[bar_index].e_physbase
>> -                                + dev->msix->table_off;
>> -
>> -    cpu_register_physical_memory(dev->msix->mmio_base_addr,
>> -                                 dev->msix->total_entries * 16,
>> -                                 dev->msix->mmio_index);
>> -
>> -    return xc_domain_memory_mapping(xc_handle, domid,
>> -                dev->msix->mmio_base_addr >> XC_PAGE_SHIFT,
>> -                (dev->bases[bar_index].access.maddr
>> -                + dev->msix->table_off) >> XC_PAGE_SHIFT,
>> -                (dev->msix->total_entries * 16
>> -                + XC_PAGE_SIZE -1) >> XC_PAGE_SHIFT,
>> -                DPCI_REMOVE_MAPPING);
>> +     return 1;
>>  }
>>
>>  int pt_msix_init(struct pt_dev *dev, int pos)
>> diff --git a/hw/pt-msi.h b/hw/pt-msi.h
>> index 9664f89..2dc1720 100644
>> --- a/hw/pt-msi.h
>> +++ b/hw/pt-msi.h
>> @@ -107,10 +107,7 @@ void
>>  pt_msix_disable(struct pt_dev *dev);
>>
>>  int
>> -remove_msix_mapping(struct pt_dev *dev, int bar_index);
>> -
>> -int
>> -add_msix_mapping(struct pt_dev *dev, int bar_index);
>> +has_msix_mapping(struct pt_dev *dev, int bar_index);
>>
>>  int
>>  pt_msix_init(struct pt_dev *dev, int pos);
>
>
>


* Re: [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management
  2011-07-12  9:30   ` Haitao Shan
@ 2011-07-12  9:48     ` Jan Beulich
  2011-07-12 13:32       ` Haitao Shan
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Beulich @ 2011-07-12  9:48 UTC (permalink / raw)
  To: Haitao Shan
  Cc: George Dunlap, Keir Fraser, xen-devel, Ian Jackson, Tim Deegan

>>> On 12.07.11 at 11:30, Haitao Shan <maillists.shan@gmail.com> wrote:
> I am not aware of any context of larger scope of MSI-X cleaning ups if
> you are planning to do so. As a result, I might be missing some
> important points. So please just go ahead and submit your patches.

No, I didn't have any plans besides dealing with the proper
determination of virtual functions' MSI-X table and PBA addresses.

> 2011/7/12 Jan Beulich <JBeulich@novell.com>:
>>>>> On 12.07.11 at 07:24, Haitao Shan <maillists.shan@gmail.com> wrote:
>>> Hi,
>>>
>>> As reported by Jan, current Qemu does not handle MSIX table mapping
>>> properly.
>>>
>>> Details:
>>>
>>> MSI-X table resides in one of the physical BARs. When Qemu handles
>>> guest's changes to BAR register (within which, MSI-X table resides),
>>> Qemu first allows access of the whole BAR MMIO ranges and then removes
>>> those of MSI-X. There is a small window here. It is possible that on a
>>> SMP guests one vcpu could have access to the physical MSI-X
>>> configurations when another vcpu is writing BAR registers.
>>>
>>> The patch fixes this issue by first producing the valid MMIO ranges by
>>> removing MSI-X table's range from the whole BAR mmio range and later
>>> passing these ranges to Xen.
>>
>> That's only half of it - something similar would need to be done for the
>> pending bit array.
> Please justify why this read-only PBA should also be removed from
> guest access.

Because its being specified as read-only doesn't mean that all
devices implement only read accesses (and discard writes).

> Note that we actually mask physical MSI via MSI-X table
> when guests mask it via virtualized MSI-X table.

As long as this happens from Xen, that's fine, but see below.

>> Further I'm having the impression that while you avoid assigning the
>> questionable MMIO range to the guest (which isn't a security concern
>> as long as the BAR determination for the device in the hypervisor is
>> correct), your patch doesn't prevent qemu actually mapping these
>> ranges writably and allow pci_msix_writel() to access it (which is the
>> actual open security problem).
> I totally disagree. I think Dom0 together with its management SW
> entities such as Qemu and libxc/libxl are to be trusted. It can be
> arguable to what extent Xen can trust them.

It's not a matter of trust, but one of correctness.

> Mapping that to Qemu is mainly for writing MASK bit to physical MSI-X
> table directly. Handling guests' masking MSI is already too long a
> code path.

Qemu getting this wrong (e.g. doing an unmask when the guest
requests so, but Xen wants that interrupt to be masked) can
confuse Xen significantly, up to the point where the other
interrupts (and hence the whole system) can be affected.

> If Qemu is not trusted, I would say perhaps you can move MSI
> virtualization part from Qemu to Xen itself.
> 
>>
>> Further, I don't think it's correct to remove guest access to either of
>> the two ranges altogether - either qemu needs to emulate access to
>> these, or the guest ought to be able to access the ranges directly,
>> but read-only.
> PBA is exposed to guests, unless it happens to be located on the same
> page of MSI-X table (in this case, it have to be removed), per my
> understanding.

Besides above described reason for not exposing the PBA writably,
having to handle two cases (PBA exposed and PBA invisible) would
just needlessly complicate the code. So making it consistent with
the MSI-X table is going to be both more secure and simpler to
implement.

> MSI-X table cannot be exposed to guests even read-only.

Why not? The guest (or qemu on its behalf) reading the table
doesn't do any harm. And with there being 31 currently
undefined bits in each entry, future extensions could be severely
restricted from use in guests if we're too restrictive here.
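
For reference, a minimal sketch of the 16-byte MSI-X table entry as laid
out in the PCI 3.0 specification, which is where the "* 16" in the patch
and the 31 undefined bits come from. The struct, macro and function
names are hypothetical and are not taken from Qemu or Xen.

```c
#include <stdint.h>

/* One MSI-X table entry: four 32-bit words, 16 bytes in total. */
struct msix_entry {
    uint32_t msg_addr_lo;   /* message address, lower 32 bits */
    uint32_t msg_addr_hi;   /* message address, upper 32 bits */
    uint32_t msg_data;      /* message data */
    uint32_t vector_ctrl;   /* bit 0: mask bit; bits 31:1 reserved */
};

#define MSIX_VECTOR_CTRL_MASKBIT 0x1u   /* hypothetical name */

_Static_assert(sizeof(struct msix_entry) == 16,
               "MSI-X table entries are 16 bytes");

/* Non-zero if the per-vector mask bit is set. */
static inline int msix_entry_is_masked(const struct msix_entry *e)
{
    return (e->vector_ctrl & MSIX_VECTOR_CTRL_MASKBIT) != 0;
}

/* The 31 vector-control bits that are not the mask bit. */
static inline uint32_t msix_entry_reserved_bits(const struct msix_entry *e)
{
    return e->vector_ctrl & ~MSIX_VECTOR_CTRL_MASKBIT;
}
```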

Jan


* Re: [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management
  2011-07-12  9:48     ` Jan Beulich
@ 2011-07-12 13:32       ` Haitao Shan
  0 siblings, 0 replies; 7+ messages in thread
From: Haitao Shan @ 2011-07-12 13:32 UTC (permalink / raw)
  To: Jan Beulich
  Cc: George Dunlap, Keir Fraser, xen-devel, Ian Jackson, Tim Deegan

2011/7/12 Jan Beulich <JBeulich@novell.com>:
>>>> On 12.07.11 at 11:30, Haitao Shan <maillists.shan@gmail.com> wrote:
>> I am not aware of any context of larger scope of MSI-X cleaning ups if
>> you are planning to do so. As a result, I might be missing some
>> important points. So please just go ahead and submit your patches.
>
> No, I didn't have any plans besides the dealing with the proper
> determination of virtual functions' MSI-X table and PBA addresses.
>
>> 2011/7/12 Jan Beulich <JBeulich@novell.com>:
>>>>>> On 12.07.11 at 07:24, Haitao Shan <maillists.shan@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> As reported by Jan, current Qemu does not handle MSIX table mapping
>>>> properly.
>>>>
>>>> Details:
>>>>
>>>> MSI-X table resides in one of the physical BARs. When Qemu handles
>>>> guest's changes to BAR register (within which, MSI-X table resides),
>>>> Qemu first allows access of the whole BAR MMIO ranges and then removes
>>>> those of MSI-X. There is a small window here. It is possible that on a
>>>> SMP guests one vcpu could have access to the physical MSI-X
>>>> configurations when another vcpu is writing BAR registers.
>>>>
>>>> The patch fixes this issue by first producing the valid MMIO ranges by
>>>> removing MSI-X table's range from the whole BAR mmio range and later
>>>> passing these ranges to Xen.
>>>
>>> That's only half of it - something similar would need to be done for the
>>> pending bit array.
>> Please justify why this read-only PBA should also be removed from
>> guest access.
>
> Because it being stated to be read-only accessible only doesn't mean
> that all devices implement only read accesses (and discard writes).
I agree with the statement above. The spec does say that writes have
undefined behavior. But unless there is a proven case where this
undefined behavior harms the system, I don't see the need for Xen to
defend further against it.
For example, MMIO ranges are assigned to guests with an IOMMU. Typically
there is reserved space within those ranges. Writes to this reserved
space also have undefined behavior. I don't believe this is an area
that Xen can defend.

>
>> Note that we actually mask physical MSI via MSI-X table
>> when guests mask it via virtualized MSI-X table.
>
> As long as this happens from Xen, that's fine, but see below.
>
>>> Further I'm having the impression that while you avoid assigning the
>>> questionable MMIO range to the guest (which isn't a security concern
>>> as long as the BAR determination for the device in the hypervisor is
>>> correct), your patch doesn't prevent qemu actually mapping these
>>> ranges writably and allow pci_msix_writel() to access it (which is the
>>> actual open security problem).
>> I totally disagree. I think Dom0 together with its management SW
>> entities such as Qemu and libxc/libxl are to be trusted. It can be
>> arguable to what extent Xen can trust them.
>
> It's not a matter of trust, but one of correctness.
>
>> Mapping that to Qemu is mainly for writing MASK bit to physical MSI-X
>> table directly. Handling guests' masking MSI is already too long a
>> code path.
>
> Qemu getting this wrong (e.g. doing an unmask when the guest
> requests so, but Xen wants that interrupt to be masked) can
> confuse Xen significantly, up to the point where the other
> interrupts (and hence the whole system) can be affected.
Whether or not to do a mask/unmask on a guest's request is a policy
decision. To me, neither option is perfect.
Not allowing Qemu to do the mask/unmask also makes things complex.
If guests' settings only affect virtual masking, Qemu has to pass the
virtual masking information down to Xen, as Qemu has no chance to take
part in interrupt delivery (and hence to apply its virtual mask
decision). What's more, masking MSI-X only virtually cannot prevent
physical interrupt storms.
I would like to keep the current policy in Qemu, unless someone can move
the whole MSI/MSI-X logic into Xen.

>
>> If Qemu is not trusted, I would say perhaps you can move MSI
>> virtualization part from Qemu to Xen itself.
>>
>>>
>>> Further, I don't think it's correct to remove guest access to either of
>>> the two ranges altogether - either qemu needs to emulate access to
>>> these, or the guest ought to be able to access the ranges directly,
>>> but read-only.
>> PBA is exposed to guests, unless it happens to be located on the same
>> page of MSI-X table (in this case, it have to be removed), per my
>> understanding.
>
> Besides above described reason for not exposing the PBA writably,
> having to handle two cases (PBA exposed and PBA invisible) would
> just needlessly complicate the code. So making it consistent with
> the MSI-X table is going to be both more secure and simpler to
> implement.
>
>> MSI-X table cannot be exposed to guests even read-only.
>
> Why not? The guest (or qemu on its behalf) reading the table
> doesn't do any harm. And with there being 31 currently
> undefined bits in each entry, future extensions could be severely
> restricted from using in guests if we're too restrictive here.
Why should guests be able to see the host MSI-X table, including the
vector information it contains?
If this is OK, why not simply expose all CPU features via CPUID and
just block use of the unwanted ones? For example, as long as
guests cannot enable XSAVE support via CR4 (if Xen implements this), do
you agree it is safe to expose XSAVE via CPUID to guests?

>
> Jan
>
>


* Re: [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management
  2011-07-12  5:24 [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management Haitao Shan
  2011-07-12  7:05 ` Jan Beulich
@ 2011-08-25 11:06 ` Ian Jackson
  2011-08-25 11:55   ` Jan Beulich
  1 sibling, 1 reply; 7+ messages in thread
From: Ian Jackson @ 2011-08-25 11:06 UTC (permalink / raw)
  To: Haitao Shan; +Cc: xen-devel, Tim Deegan, George Dunlap

Haitao Shan writes ("[Xen-devel] [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management"):
> As reported by Jan, current Qemu does not handle MSIX table mapping properly.
> 
> Details:
> 
> MSI-X table resides in one of the physical BARs. When Qemu handles
> guest's changes to BAR register (within which, MSI-X table resides),
> Qemu first allows access of the whole BAR MMIO ranges and then removes
> those of MSI-X. There is a small window here. It is possible that on a
> SMP guests one vcpu could have access to the physical MSI-X
> configurations when another vcpu is writing BAR registers.
> 
> The patch fixes this issue by first producing the valid MMIO ranges by
> removing MSI-X table's range from the whole BAR mmio range and later
> passing these ranges to Xen.

I'm afraid it wasn't clear to me what the consensus was on the status
of the attached patch, and I'm not very familiar with the code.

Also, if this is a security problem we should really issue an advisory...

Ian.


* Re: [PATCH] tools/ioemu: Fixing Security Hole in Qemu MSIX table access management
  2011-08-25 11:06 ` Ian Jackson
@ 2011-08-25 11:55   ` Jan Beulich
  0 siblings, 0 replies; 7+ messages in thread
From: Jan Beulich @ 2011-08-25 11:55 UTC (permalink / raw)
  To: Ian Jackson
  Cc: xen-devel, Tim Deegan, George Dunlap, Donald D Dugger,
	Keir Fraser, Haitao Shan

>>> On 25.08.11 at 13:06, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> Haitao Shan writes ("[Xen-devel] [PATCH] tools/ioemu: Fixing Security Hole in 
> Qemu MSIX table access management"):
>> As reported by Jan, current Qemu does not handle MSIX table mapping 
> properly.
>> 
>> Details:
>> 
>> MSI-X table resides in one of the physical BARs. When Qemu handles
>> guest's changes to BAR register (within which, MSI-X table resides),
>> Qemu first allows access of the whole BAR MMIO ranges and then removes
>> those of MSI-X. There is a small window here. It is possible that on a
>> SMP guests one vcpu could have access to the physical MSI-X
>> configurations when another vcpu is writing BAR registers.
>> 
>> The patch fixes this issue by first producing the valid MMIO ranges by
>> removing MSI-X table's range from the whole BAR mmio range and later
>> passing these ranges to Xen.
> 
> I'm afraid it wasn't clear to me what the consensus was on the status
> of the attached patch, and I'm not very familiar with the code.

Afaict, the change here removes only the smallest part of the problem:
Xen already forces any mapping attempts of the MSI-X table by DomU-s to
be read-only once the respective MSI gets set up, so the window during
which the guest has full access exists only before any MSI gets set up.

> Also, if this is a security problem we should really issue an advisory...

In a larger round at one of the BoFs at the summit we agreed that there
is an issue with the way it currently works: qemu's direct writing to
the mask bit (on behalf of, and exclusively driven by, the guest)
represents a security problem, since Xen itself needs to be able to
force interrupts masked during certain operations (move_native_irq(),
IRQ rate limiting, 2nd instance of already pending guest IRQ, and
fixup_irq()), and failure here would potentially affect the whole
system.
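
One way to picture the requirement is a sketch in which the physical
mask bit is the OR of what the guest asked for and what the hypervisor
currently needs. This is purely illustrative: struct vector_mask_state
and both functions are hypothetical names and do not describe how Xen or
qemu actually implement masking.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vector bookkeeping kept by whoever owns the table. */
struct vector_mask_state {
    bool guest_masked;  /* last value the guest wrote to its virtual mask bit */
    bool host_masked;   /* set while the hypervisor needs the vector quiet */
};

/* The vector must stay physically masked if either side wants it masked. */
static inline bool physical_mask(const struct vector_mask_state *s)
{
    return s->guest_masked || s->host_masked;
}

/* Return vector_ctrl with bit 0 (the mask bit) set to the combined state. */
static inline uint32_t apply_mask(uint32_t vector_ctrl,
                                  const struct vector_mask_state *s)
{
    return physical_mask(s) ? (vector_ctrl | 1u) : (vector_ctrl & ~1u);
}
```

With this shape a guest-driven unmask cannot clear the physical bit
while host_masked is set, which is the property that a direct write from
qemu does not give.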

My (limited) understanding of qemu-kvm's dealing with that is that they
hide the physical mask bit from the guest altogether, which works
presumably because during normal operation the bit never gets fiddled
with (but in polling mode some network drivers do make use of it, and
I'd expect that not to work under KVM, unless my reading of their
sources was wrong).

Jan


