* [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
@ 2015-05-04  2:16 Tiejun Chen
  2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
                   ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Tiejun Chen @ 2015-05-04  2:16 UTC (permalink / raw)
  To: yang.z.zhang, kevin.tian, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

clflush is a lightweight but non-serializing instruction, hence a
memory fence is necessary to make sure all loads/stores are visible
before the cache line is flushed.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/x86/vtd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index 109234e..fd2ff04 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
 
 void cacheline_flush(char * addr)
 {
+    mb();
     clflush(addr);
+    mb();
 }
 
 void flush_all_cache()
-- 
1.9.1


* [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
@ 2015-05-04  2:16 ` Tiejun Chen
  2015-05-04  4:07   ` Zhang, Yang Z
  2015-05-04  2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Tiejun Chen @ 2015-05-04  2:16 UTC (permalink / raw)
  To: yang.z.zhang, kevin.tian, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

While initializing VT-d we should mask interrupt message generation
to avoid receiving any interrupt left pending before DMA translation
is enabled, and we should also mask it before disabling the DMA engine.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c7bda73..72cd854 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1000,15 +1000,21 @@ static void dma_msi_unmask(struct irq_desc *desc)
     iommu->msi.msi_attrib.masked = 0;
 }
 
-static void dma_msi_mask(struct irq_desc *desc)
+static void mask_dma_interrupt(struct iommu *iommu)
 {
     unsigned long flags;
-    struct iommu *iommu = desc->action->dev_id;
 
-    /* mask it */
     spin_lock_irqsave(&iommu->register_lock, flags);
     dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
     spin_unlock_irqrestore(&iommu->register_lock, flags);
+}
+
+static void dma_msi_mask(struct irq_desc *desc)
+{
+    struct iommu *iommu = desc->action->dev_id;
+
+    /* mask it */
+    mask_dma_interrupt(iommu);
     iommu->msi.msi_attrib.masked = 1;
 }
 
@@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
     struct iommu *iommu;
     struct iommu_flush *flush = NULL;
     int ret;
-    unsigned long flags;
 
     /*
      * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.  
@@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
 
         iommu = drhd->iommu;
 
-        clear_fault_bits(iommu);
+        /*
+         * We shouldn't receive any VT-d interrupt while initializing
+         * VT-d so just mask interrupt message generation.
+         */
+        mask_dma_interrupt(iommu);
 
-        spin_lock_irqsave(&iommu->register_lock, flags);
-        dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
-        spin_unlock_irqrestore(&iommu->register_lock, flags);
+        /*
+         * And then clear all previous faults.
+         */
+        clear_fault_bits(iommu);
     }
 
     /*
@@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
         if ( force_iommu )
             continue;
 
+        /*
+         * Mask interrupt message generation.
+         */
+        mask_dma_interrupt(iommu);
+
         iommu_disable_translation(iommu);
 
         /* If interrupt remapping is enabled, queued invalidation
-- 
1.9.1


* [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5
  2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
  2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
@ 2015-05-04  2:16 ` Tiejun Chen
  2015-05-04  9:00   ` Jan Beulich
  2015-05-04  4:07 ` [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Zhang, Yang Z
  2015-05-04  8:52 ` Jan Beulich
  3 siblings, 1 reply; 33+ messages in thread
From: Tiejun Chen @ 2015-05-04  2:16 UTC (permalink / raw)
  To: yang.z.zhang, kevin.tian, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

S5 means the IOMMU engine is shut down as well, so we should
disable the IOMMU engine completely before entering S5; otherwise
we may still receive some interrupts during this period. But
currently the VT-d path depends on force_iommu and iommu_intremap
to determine whether to disable translation and queued
invalidation. This unexpected behaviour can be fixed by clearing
force_iommu and iommu_intremap. Additionally, some cleanups to
code comments.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/acpi/power.c           |  2 ++
 xen/arch/x86/crash.c                | 12 ++++++++----
 xen/drivers/passthrough/vtd/iommu.c |  5 +++--
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index f41f0de..345470d 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -183,6 +183,8 @@ static int enter_state(u32 state)
         break;
     case ACPI_STATE_S5:
         acpi_enter_sleep_state(ACPI_STATE_S5);
+        /* Make sure we really disable VT-d engine completely. */
+        force_iommu = iommu_intremap = 0;
         break;
     default:
         error = -EINVAL;
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index eb7be9c..123fb12 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -171,15 +171,19 @@ static void nmi_shootdown_cpus(void)
         printk("Failed to shoot down CPUs {%s}\n", keyhandler_scratch);
     }
 
-    /* Crash shutdown any IOMMU functionality as the crashdump kernel is not
-     * happy when booting if interrupt/dma remapping is still enabled */
+    /*
+     * Crash shutdown any IOMMU functionality as the crashdump kernel is not
+     * happy when booting if interrupt/dma remapping is still enabled.
+     */
     iommu_crash_shutdown();
 
     __stop_this_cpu();
 
-    /* This is a bit of a hack due to the problems with the x2apic_enabled
+    /*
+     * This is a bit of a hack due to the problems with the x2apic_enabled
      * variable, but we can't do any better without a significant refactoring
-     * of the APIC code */
+     * of the APIC code.
+     */
     x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
 
     disable_IO_APIC();
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 72cd854..28244ec 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2367,9 +2367,10 @@ static void vtd_suspend(void)
 
         iommu_disable_translation(iommu);
 
-        /* If interrupt remapping is enabled, queued invalidation
+        /*
+         * If interrupt remapping is enabled, queued invalidation
          * will be disabled following interupt remapping disabling
-         * in local apic suspend
+         * in local apic suspend.
          */
         if ( !iommu_intremap && iommu_qinval )
             disable_qinval(iommu);
-- 
1.9.1


* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
@ 2015-05-04  4:07   ` Zhang, Yang Z
  2015-05-04  5:08     ` Chen, Tiejun
  0 siblings, 1 reply; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-04  4:07 UTC (permalink / raw)
  To: Chen, Tiejun, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

Chen, Tiejun wrote on 2015-05-04:
> While initializing VT-D we should mask interrupt message generation
> to avoid receiving any interrupt as pending before enable DMA
> translation, and also mask that before disable DMA engine.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 8 deletions(-)
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c index c7bda73..72cd854 100644 ---
> a/xen/drivers/passthrough/vtd/iommu.c +++
> b/xen/drivers/passthrough/vtd/iommu.c @@ -1000,15 +1000,21 @@ static
> void dma_msi_unmask(struct irq_desc *desc)
>      iommu->msi.msi_attrib.masked = 0;
>  }
> -static void dma_msi_mask(struct irq_desc *desc)
> +static void mask_dma_interrupt(struct iommu *iommu)
>  {
>      unsigned long flags;
> -    struct iommu *iommu = desc->action->dev_id;
> 
> -    /* mask it */
>      spin_lock_irqsave(&iommu->register_lock, flags);
>      dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>      spin_unlock_irqrestore(&iommu->register_lock, flags);
> +}
> +
> +static void dma_msi_mask(struct irq_desc *desc)
> +{
> +    struct iommu *iommu = desc->action->dev_id;
> +
> +    /* mask it */
> +    mask_dma_interrupt(iommu);
>      iommu->msi.msi_attrib.masked = 1;
>  }
> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>      struct iommu *iommu;
>      struct iommu_flush *flush = NULL;
>      int ret;
> -    unsigned long flags;
> 
>      /*
>       * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
> 
>          iommu = drhd->iommu;
> -        clear_fault_bits(iommu);
> +        /*
> +         * We shouldn't receive any VT-d interrupt while initializing
> +         * VT-d so just mask interrupt message generation.
> +         */
> +        mask_dma_interrupt(iommu);
> 
> -        spin_lock_irqsave(&iommu->register_lock, flags);
> -        dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
> -        spin_unlock_irqrestore(&iommu->register_lock, flags);
> +        /*
> +         * And then clear all previous faults.
> +         */
> +        clear_fault_bits(iommu);
>      }
>      
>      /*
> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>          if ( force_iommu )
>              continue;
> +        /*
> +         * Mask interrupt message generation.
> +         */
> +        mask_dma_interrupt(iommu);
> +
>          iommu_disable_translation(iommu);
>          
>          /* If interrupt remapping is enabled, queued invalidation

Just curious: did you observe a real issue, or did you just catch this by reading the code?

Best regards,
Yang


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
  2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
  2015-05-04  2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
@ 2015-05-04  4:07 ` Zhang, Yang Z
  2015-05-04  8:52 ` Jan Beulich
  3 siblings, 0 replies; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-04  4:07 UTC (permalink / raw)
  To: Chen, Tiejun, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

Chen, Tiejun wrote on 2015-05-04:
> clflush is a light weight but not serializing instruction, so hence
> memory fence is necessary to make sure all load/store visible before flush cache line.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Yang Zhang <yang.z.zhang@intel.com>

> ---
>  xen/drivers/passthrough/vtd/x86/vtd.c | 2 ++
>  1 file changed, 2 insertions(+)
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c
> b/xen/drivers/passthrough/vtd/x86/vtd.c index 109234e..fd2ff04 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c +++
> b/xen/drivers/passthrough/vtd/x86/vtd.c @@ -56,7 +56,9 @@ unsigned int
> get_cache_line_size(void)
> 
>  void cacheline_flush(char * addr)
>  {
> +    mb();
>      clflush(addr);
> +    mb();
>  }
>  
>  void flush_all_cache()


Best regards,
Yang


* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  4:07   ` Zhang, Yang Z
@ 2015-05-04  5:08     ` Chen, Tiejun
  2015-05-04  6:41       ` Zhang, Yang Z
  2015-05-04  8:57       ` Jan Beulich
  0 siblings, 2 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04  5:08 UTC (permalink / raw)
  To: Zhang, Yang Z, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

Yang,

Thanks for your review.

On 2015/5/4 12:07, Zhang, Yang Z wrote:
> Chen, Tiejun wrote on 2015-05-04:
>> While initializing VT-D we should mask interrupt message generation
>> to avoid receiving any interrupt as pending before enable DMA
>> translation, and also mask that before disable DMA engine.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>   xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
>>   1 file changed, 23 insertions(+), 8 deletions(-)
>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>> b/xen/drivers/passthrough/vtd/iommu.c index c7bda73..72cd854 100644 ---
>> a/xen/drivers/passthrough/vtd/iommu.c +++
>> b/xen/drivers/passthrough/vtd/iommu.c @@ -1000,15 +1000,21 @@ static
>> void dma_msi_unmask(struct irq_desc *desc)
>>       iommu->msi.msi_attrib.masked = 0;
>>   }
>> -static void dma_msi_mask(struct irq_desc *desc)
>> +static void mask_dma_interrupt(struct iommu *iommu)
>>   {
>>       unsigned long flags;
>> -    struct iommu *iommu = desc->action->dev_id;
>>
>> -    /* mask it */
>>       spin_lock_irqsave(&iommu->register_lock, flags);
>>       dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>>       spin_unlock_irqrestore(&iommu->register_lock, flags);
>> +}
>> +
>> +static void dma_msi_mask(struct irq_desc *desc)
>> +{
>> +    struct iommu *iommu = desc->action->dev_id;
>> +
>> +    /* mask it */
>> +    mask_dma_interrupt(iommu);
>>       iommu->msi.msi_attrib.masked = 1;
>>   }
>> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>>       struct iommu *iommu;
>>       struct iommu_flush *flush = NULL;
>>       int ret;
>> -    unsigned long flags;
>>
>>       /*
>>        * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
>> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
>>
>>           iommu = drhd->iommu;
>> -        clear_fault_bits(iommu);
>> +        /*
>> +         * We shouldn't receive any VT-d interrupt while initializing
>> +         * VT-d so just mask interrupt message generation.
>> +         */
>> +        mask_dma_interrupt(iommu);
>>
>> -        spin_lock_irqsave(&iommu->register_lock, flags);
>> -        dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
>> -        spin_unlock_irqrestore(&iommu->register_lock, flags);
>> +        /*
>> +         * And then clear all previous faults.
>> +         */
>> +        clear_fault_bits(iommu);
>>       }
>>
>>       /*
>> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>>           if ( force_iommu )
>>               continue;
>> +        /*
>> +         * Mask interrupt message generation.
>> +         */
>> +        mask_dma_interrupt(iommu);
>> +
>>           iommu_disable_translation(iommu);
>>
>>           /* If interrupt remapping is enabled, queued invalidation
>
> Just curious that do you observe a real issue or just catch it through reading code?
>

Yes, we observed this problem on BDW, and actually one HP customer hit
the same issue as well.

Once we enable IOMMU translation, an IOMMU fault, 'Present bit in root 
entry is clear', is triggered like this,

(XEN) [VT-D]iommu.c:731: iommu_enable_translation: iommu->reg = 
ffff82c000201000
(XEN) [VT-D]iommu.c:875: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:877: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:02.0] fault addr 0, 
iommu reg = ffff82c000201000
(XEN) [VT-D]d32767v0 DMAR: reason 01 - Present bit in root entry is clear
(XEN) print_vtd_entries: iommu ffff83012aa88600 dev 0000:00:02.0 gmfn 0
(XEN)     root_entry = ffff83012aa85000
(XEN)     root_entry[0] = 80f5001
(XEN)     context = ffff8300080f5000
(XEN)     context[10] = 2_8b75001
(XEN)     l4 = ffff830008b75000
(XEN)     l4_index = 0
(XEN)     l4[0] = 8b74003
(XEN)     l3 = ffff830008b74000
(XEN)     l3_index = 0
(XEN)     l3[0] = 8b73003
(XEN)     l2 = ffff830008b73000
(XEN)     l2_index = 0
(XEN)     l2[0] = 8b72003
(XEN)     l1 = ffff830008b72000
(XEN)     l1_index = 0
(XEN)     l1[0] = 3

At first I suspected this was caused by some improper cache behaviour,
because, as you see, "root_entry[0] = 80f5001" indicates we had already
set that present bit. But the Caching Mode bit is zero on BDW, which
means the remapping hardware doesn't cache not-present entries. I also
checked that Xen always calls clflush to write back memory on the CPU
side (I even tried wbinvd in place of clflush). Plus, as you can see,
the guest fault address is an odd zero, so I started to suspect this is
just an unexpected fault left pending by some BIOS cleanup, or by
undefined hardware behaviour. What I mean is that clear_fault_bits() is
not enough to clear this kind of pending interrupt before we complete
all the VT-d initialization.

Anyway, I also think we should disable interrupt generation during
VT-d setup, until the VT-d code explicitly enables the interrupt.

Thanks
Tiejun


* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  5:08     ` Chen, Tiejun
@ 2015-05-04  6:41       ` Zhang, Yang Z
  2015-05-04  8:57       ` Jan Beulich
  1 sibling, 0 replies; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-04  6:41 UTC (permalink / raw)
  To: Chen, Tiejun, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

Chen, Tiejun wrote on 2015-05-04:
> Yang,
> 
> Thanks for your review.
> 
> On 2015/5/4 12:07, Zhang, Yang Z wrote:
>> Chen, Tiejun wrote on 2015-05-04:
>>> While initializing VT-D we should mask interrupt message generation
>>> to avoid receiving any interrupt as pending before enable DMA
>>> translation, and also mask that before disable DMA engine.
>>> 
>>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>>> ---
>>>   xen/drivers/passthrough/vtd/iommu.c | 31
>>>   +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8
>>>   deletions(-)
>>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>>> b/xen/drivers/passthrough/vtd/iommu.c index c7bda73..72cd854 100644 ---
>>> a/xen/drivers/passthrough/vtd/iommu.c +++
>>> b/xen/drivers/passthrough/vtd/iommu.c @@ -1000,15 +1000,21 @@ static
>>> void dma_msi_unmask(struct irq_desc *desc)
>>>       iommu->msi.msi_attrib.masked = 0;
>>>   }
>>> -static void dma_msi_mask(struct irq_desc *desc)
>>> +static void mask_dma_interrupt(struct iommu *iommu)
>>>   {
>>>       unsigned long flags;
>>> -    struct iommu *iommu = desc->action->dev_id;
>>> 
>>> -    /* mask it */
>>>       spin_lock_irqsave(&iommu->register_lock, flags);
>>>       dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>>>       spin_unlock_irqrestore(&iommu->register_lock, flags);
>>> +}
>>> +
>>> +static void dma_msi_mask(struct irq_desc *desc)
>>> +{
>>> +    struct iommu *iommu = desc->action->dev_id;
>>> +
>>> +    /* mask it */
>>> +    mask_dma_interrupt(iommu);
>>>       iommu->msi.msi_attrib.masked = 1;
>>>   }
>>> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>>>       struct iommu *iommu;
>>>       struct iommu_flush *flush = NULL;
>>>       int ret;
>>> -    unsigned long flags;
>>> 
>>>       /*
>>>        * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
>>> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
>>> 
>>>           iommu = drhd->iommu;
>>> -        clear_fault_bits(iommu);
>>> +        /*
>>> +         * We shouldn't receive any VT-d interrupt while initializing
>>> +         * VT-d so just mask interrupt message generation.
>>> +         */
>>> +        mask_dma_interrupt(iommu);
>>> 
>>> -        spin_lock_irqsave(&iommu->register_lock, flags);
>>> -        dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
>>> -        spin_unlock_irqrestore(&iommu->register_lock, flags);
>>> +        /*
>>> +         * And then clear all previous faults.
>>> +         */
>>> +        clear_fault_bits(iommu);
>>>       }
>>>       
>>>       /*
>>> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>>>           if ( force_iommu )
>>>               continue;
>>> +        /*
>>> +         * Mask interrupt message generation.
>>> +         */
>>> +        mask_dma_interrupt(iommu);
>>> +
>>>           iommu_disable_translation(iommu);
>>>           
>>>           /* If interrupt remapping is enabled, queued invalidation
>> 
>> Just curious that do you observe a real issue or just catch it through
>> reading code?
>> 
> 
> Yes, we observed this problem on BDW. And actually one HP customer also
> had this same issue.
> 
> Once we enable IOMMU translation, an IOMMU fault, 'Present bit in root
> entry is clear', is triggered like this,
> 
> (XEN) [VT-D]iommu.c:731: iommu_enable_translation: iommu->reg = ffff82c000201000
> (XEN) [VT-D]iommu.c:875: iommu_fault_status: Fault Overflow
> (XEN) [VT-D]iommu.c:877: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:02.0] fault addr 0, iommu reg = ffff82c000201000
> (XEN) [VT-D]d32767v0 DMAR: reason 01 - Present bit in root entry is clear
> (XEN) print_vtd_entries: iommu ffff83012aa88600 dev 0000:00:02.0 gmfn 0
> (XEN)     root_entry = ffff83012aa85000
> (XEN)     root_entry[0] = 80f5001
> (XEN)     context = ffff8300080f5000
> (XEN)     context[10] = 2_8b75001
> (XEN)     l4 = ffff830008b75000
> (XEN)     l4_index = 0
> (XEN)     l4[0] = 8b74003
> (XEN)     l3 = ffff830008b74000
> (XEN)     l3_index = 0
> (XEN)     l3[0] = 8b73003
> (XEN)     l2 = ffff830008b73000
> (XEN)     l2_index = 0
> (XEN)     l2[0] = 8b72003
> (XEN)     l1 = ffff830008b72000
> (XEN)     l1_index = 0
> (XEN)     l1[0] = 3
> 
> At first I doubted this is issued by some improper cache behaviors.
> Because as you see, "root_entry[0] = 80f5001" indicates we already set
> that present bit. But Caching Mode bit is zero in BDW so this means
> remapping hardware doesn't own contest cache. And I also check xen
> always calls clflush to write back memory on CPU side. (Even I also
> tried to use wbinvd to replace clflush). And plus, you can see this
> guest fault address is a weird zero, so I turned to concern if this is
> just an unexpected fault which is pending because of some BIOS cleanup,
> or undefined hardware behaviors. Here I mean clear_fault_bits() is
> already not enough to clear this kind of pending interrupt before we
> complete all VT-D initializations.

So this patch doesn't actually help in this case?

> 
> And anyway, I also think we should disable interrupt generation during
> VT-D setup until VT-D device enable interrupt explicitly.

The question is: why would a fault interrupt be generated during VT-d setup at all?

> 
> Thanks
> Tiejun


Best regards,
Yang


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
                   ` (2 preceding siblings ...)
  2015-05-04  4:07 ` [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Zhang, Yang Z
@ 2015-05-04  8:52 ` Jan Beulich
  2015-05-04  9:14   ` Andrew Cooper
  2015-05-04 10:39   ` Chen, Tiejun
  3 siblings, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-04  8:52 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang

>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>  
>  void cacheline_flush(char * addr)
>  {
> +    mb();
>      clflush(addr);
> +    mb();
>  }

I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I
think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.
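
For concreteness, a minimal sketch of what such a comment could look like
(illustration only, not the committed change; mb() and clflush() are the
helpers already used by this file, and the CLFLUSH ordering rule is the SDM
wording quoted later in this thread):

void cacheline_flush(char * addr)
{
    /*
     * CLFLUSH is only ordered by MFENCE: it is not guaranteed to be
     * ordered by any other fencing or serializing instruction, nor by
     * another CLFLUSH.  A plain wmb() (store fence) is therefore not
     * enough here.
     */
    mb();
    clflush(addr);
    mb();
}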

Jan


* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  5:08     ` Chen, Tiejun
  2015-05-04  6:41       ` Zhang, Yang Z
@ 2015-05-04  8:57       ` Jan Beulich
  2015-05-04 11:21         ` Chen, Tiejun
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-04  8:57 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, keir, jinsong.liu, xen-devel, andrew.cooper3, Yang Z Zhang

>>> On 04.05.15 at 07:08, <tiejun.chen@intel.com> wrote:
> At first I doubted this is issued by some improper cache behaviors.
> Because as you see, "root_entry[0] = 80f5001" indicates we already set 
> that present bit. But Caching Mode bit is zero in BDW so this means 
> remapping hardware doesn't own contest cache. And I also check xen 
> always calls clflush to write back memory on CPU side. (Even I also 
> tried to use wbinvd to replace clflush). And plus, you can see this 
> guest fault address is a weird zero, so I turned to concern if this is 
> just an unexpected fault which is pending because of some BIOS cleanup, 
> or undefined hardware behaviors. Here I mean clear_fault_bits() is 
> already not enough to clear this kind of pending interrupt before we 
> complete all VT-D initializations.

This last aspect in particular needs more explanation, but I'm
wary of the change in general (see Yang's response).

Jan


* Re: [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5
  2015-05-04  2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
@ 2015-05-04  9:00   ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-04  9:00 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang

>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> S5 menas we also shut down IOMMU engine as well, so we should
> disable IOMMU engine completely before enter S5, otherwise we
> still probably receive some interrupts in this period. But
> currently VT-D path depends on force_iommu and iommu_intremap
> to determine if we should disable translation and queued
> invalidation. So this unexpected behaviour can be fixed by
> clearing force_iommu and iommu_intremap.

Maybe it can, but is this the right way to do it? Fiddling with IOMMU
internal variables in xen/arch/x86/acpi/power.c seems bogus (not
to say hacky) at least. And why would this be S5-specific?

> Additionally, some cleanups to code comments.

Which doesn't belong in this patch.

Jan


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  8:52 ` Jan Beulich
@ 2015-05-04  9:14   ` Andrew Cooper
  2015-05-04  9:24     ` Jan Beulich
  2015-05-05 15:46     ` Boris Ostrovsky
  2015-05-04 10:39   ` Chen, Tiejun
  1 sibling, 2 replies; 33+ messages in thread
From: Andrew Cooper @ 2015-05-04  9:14 UTC (permalink / raw)
  To: Jan Beulich, Tiejun Chen
  Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel

On 04/05/2015 09:52, Jan Beulich wrote:
>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>  
>>  void cacheline_flush(char * addr)
>>  {
>> +    mb();
>>      clflush(addr);
>> +    mb();
>>  }
> I think the purpose of the flush is to force write back, not to evict
> the cache line, and if so wmb() would appear to be sufficient. As
> the SDM says that's not the case, a comment explaining why wmb()
> is not sufficient would seem necessary. Plus in the description I
> think "serializing" needs to be changed to "fencing", as serialization
> is not what we really care about here. If you and the maintainers
> agree, I could certainly fix up both aspects while committing.

On the subject of writebacks, we should get around to alternating-up the
use of clflushopt and clwb, either of which would be better than a
clflush in this case (avoiding the need for the leading mfence).

However, the ISA extension document does not indicate which processors
will have support for these new instructions.

~Andrew


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  9:14   ` Andrew Cooper
@ 2015-05-04  9:24     ` Jan Beulich
  2015-05-05  1:13       ` Zhang, Yang Z
  2015-05-05 15:46     ` Boris Ostrovsky
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-04  9:24 UTC (permalink / raw)
  To: Andrew Cooper, Tiejun Chen
  Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel

>>> On 04.05.15 at 11:14, <andrew.cooper3@citrix.com> wrote:
> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>  
>>>  void cacheline_flush(char * addr)
>>>  {
>>> +    mb();
>>>      clflush(addr);
>>> +    mb();
>>>  }
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>> think "serializing" needs to be changed to "fencing", as serialization
>> is not what we really care about here. If you and the maintainers
>> agree, I could certainly fix up both aspects while committing.
> 
> On the subject of writebacks, we should get around to alternating-up the
> use of clflushopt and clwb, either of which would be better than a
> clflush in this case (avoiding the need for the leading mfence).

Plus the barrier would perhaps rather sit around the loop
invoking cacheline_flush() in __iommu_flush_cache(), and I
wonder whether VT-d code shouldn't use available flushing
code elsewhere in the system, and whether that code then
wouldn't need barriers added (or use clflushopt/clwb as you
suggest) instead.

Jan


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  8:52 ` Jan Beulich
  2015-05-04  9:14   ` Andrew Cooper
@ 2015-05-04 10:39   ` Chen, Tiejun
  2015-05-04 10:43     ` Jan Beulich
  2015-05-04 15:22     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04 10:39 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang

On 2015/5/4 16:52, Jan Beulich wrote:
>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>
>>   void cacheline_flush(char * addr)
>>   {
>> +    mb();
>>       clflush(addr);
>> +    mb();
>>   }
>
> I think the purpose of the flush is to force write back, not to evict
> the cache line, and if so wmb() would appear to be sufficient. As
> the SDM says that's not the case, a comment explaining why wmb()
> is not sufficient would seem necessary. Plus in the description I

Seems wmb() is not sufficient here.

"CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed 
to be ordered by any other fencing, serializing or other CLFLUSH 
instruction."

Thanks
Tiejun

> think "serializing" needs to be changed to "fencing", as serialization
> is not what we really care about here. If you and the maintainers
> agree, I could certainly fix up both aspects while committing.
>
> Jan
>
>
>


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04 10:39   ` Chen, Tiejun
@ 2015-05-04 10:43     ` Jan Beulich
  2015-05-04 10:52       ` Tian, Kevin
  2015-05-05  2:45       ` Chen, Tiejun
  2015-05-04 15:22     ` Konrad Rzeszutek Wilk
  1 sibling, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-04 10:43 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang

>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
> On 2015/5/4 16:52, Jan Beulich wrote:
>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>>   void cacheline_flush(char * addr)
>>>   {
>>> +    mb();
>>>       clflush(addr);
>>> +    mb();
>>>   }
>>
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
> 
> Seems wmb() is not sufficient here.
> 
> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed 
> to be ordered by any other fencing, serializing or other CLFLUSH 
> instruction."

Right - that's what I said in the second sentence.

Jan


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04 10:43     ` Jan Beulich
@ 2015-05-04 10:52       ` Tian, Kevin
  2015-05-04 11:26         ` Chen, Tiejun
  2015-05-05  2:45       ` Chen, Tiejun
  1 sibling, 1 reply; 33+ messages in thread
From: Tian, Kevin @ 2015-05-04 10:52 UTC (permalink / raw)
  To: Jan Beulich, Chen, Tiejun
  Cc: Zhang, Yang Z, jinsong.liu, xen-devel, keir, andrew.cooper3

> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 04, 2015 6:44 PM
> 
> >>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
> > On 2015/5/4 16:52, Jan Beulich wrote:
> >>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> >>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >>>
> >>>   void cacheline_flush(char * addr)
> >>>   {
> >>> +    mb();
> >>>       clflush(addr);
> >>> +    mb();
> >>>   }
> >>
> >> I think the purpose of the flush is to force write back, not to evict
> >> the cache line, and if so wmb() would appear to be sufficient. As
> >> the SDM says that's not the case, a comment explaining why wmb()
> >> is not sufficient would seem necessary. Plus in the description I
> >
> > Seems wmb() is not sufficient here.
> >
> > "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
> > to be ordered by any other fencing, serializing or other CLFLUSH
> > instruction."
> 
> Right - that's what I said in the second sentence.
> 

btw why do we need two fences here? Suppose we just care about
writes before the flush point...

Thanks
Kevin


* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  8:57       ` Jan Beulich
@ 2015-05-04 11:21         ` Chen, Tiejun
  0 siblings, 0 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04 11:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, keir, jinsong.liu, xen-devel, andrew.cooper3, Yang Z Zhang

On 2015/5/4 16:57, Jan Beulich wrote:
>>>> On 04.05.15 at 07:08, <tiejun.chen@intel.com> wrote:
>> At first I doubted this is issued by some improper cache behaviors.
>> Because as you see, "root_entry[0] = 80f5001" indicates we already set
>> that present bit. But Caching Mode bit is zero in BDW so this means
>> remapping hardware doesn't own contest cache. And I also check xen
>> always calls clflush to write back memory on CPU side. (Even I also
>> tried to use wbinvd to replace clflush). And plus, you can see this
>> guest fault address is a weird zero, so I turned to concern if this is
>> just an unexpected fault which is pending because of some BIOS cleanup,
>> or undefined hardware behaviors. Here I mean clear_fault_bits() is
>> already not enough to clear this kind of pending interrupt before we
>> complete all VT-D initializations.
>
> This last aspect in particular needs more explanation, but I'm
> wary of the change in general (see Yang's response).
>

Sure.

As we discussed offline, Yang thought this is at most a way to work
around our problem. So I will ping the BIOS team to double-check
whether this is an issue specific to certain BIOSes or platforms,
since it doesn't match the current specification.

Thanks
Tiejun


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04 10:52       ` Tian, Kevin
@ 2015-05-04 11:26         ` Chen, Tiejun
  2015-05-05  1:13           ` Tian, Kevin
  0 siblings, 1 reply; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04 11:26 UTC (permalink / raw)
  To: Tian, Kevin, Jan Beulich
  Cc: Zhang, Yang Z, jinsong.liu, xen-devel, keir, andrew.cooper3

On 2015/5/4 18:52, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, May 04, 2015 6:44 PM
>>
>>>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
>>> On 2015/5/4 16:52, Jan Beulich wrote:
>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>
>>>>>    void cacheline_flush(char * addr)
>>>>>    {
>>>>> +    mb();
>>>>>        clflush(addr);
>>>>> +    mb();
>>>>>    }
>>>>
>>>> I think the purpose of the flush is to force write back, not to evict
>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>> is not sufficient would seem necessary. Plus in the description I
>>>
>>> Seems wmb() is not sufficient here.
>>>
>>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
>>> to be ordered by any other fencing, serializing or other CLFLUSH
>>> instruction."
>>
>> Right - that's what I said in the second sentence.
>>
>
> btw why do we need two fences here? Suppose we just care about
> writes before the flush point...
>

The first MFENCE guarantees all loads/stores are visible before the
cache line is flushed. The second MFENCE just makes sure CLFLUSH is
not reordered with the ensuing loads/stores, right?
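
As a minimal sketch of that ordering (hypothetical caller for illustration
only: update_pte_and_flush() and send_iotlb_flush_command() are made-up
names, while mb() and clflush() are the Xen helpers from the patch):

/* Hypothetical illustration -- not actual Xen code. */
static void update_pte_and_flush(uint64_t *pte, uint64_t new_val)
{
    *pte = new_val;              /* (1) CPU store updating an IOMMU page-table entry */
    mb();                        /* make (1) globally visible before ...             */
    clflush((char *)pte);        /* (2) ... the cache line is written back           */
    mb();                        /* keep (2) ordered before ...                      */
    send_iotlb_flush_command();  /* (3) ... the MMIO command telling the IOMMU to
                                  *     re-read the entry; (3) must not pass (2).    */
}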

Thanks
Tiejun


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04 10:39   ` Chen, Tiejun
  2015-05-04 10:43     ` Jan Beulich
@ 2015-05-04 15:22     ` Konrad Rzeszutek Wilk
  2015-05-04 15:33       ` Andrew Cooper
  1 sibling, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-05-04 15:22 UTC (permalink / raw)
  To: Chen, Tiejun, boris.ostrovsky
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Jan Beulich,
	andrew.cooper3, yang.z.zhang

On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
> On 2015/5/4 16:52, Jan Beulich wrote:
> >>>>On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> >>--- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >>+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >>@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >>
> >>  void cacheline_flush(char * addr)
> >>  {
> >>+    mb();
> >>      clflush(addr);
> >>+    mb();
> >>  }
> >
> >I think the purpose of the flush is to force write back, not to evict
> >the cache line, and if so wmb() would appear to be sufficient. As
> >the SDM says that's not the case, a comment explaining why wmb()
> >is not sufficient would seem necessary. Plus in the description I
> 
> Seems wmb() is not sufficient here.
> 
> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
> be ordered by any other fencing, serializing or other CLFLUSH instruction."

That is incorrect. We have observed that CLFLUSH instructions do serialize each
other; that is, if on a core you issue a bunch of CLFLUSHes, it stalls the pipeline.

Cc-ing Boris who discovered this.
> 
> Thanks
> Tiejun
> 
> >think "serializing" needs to be changed to "fencing", as serialization
> >is not what we really care about here. If you and the maintainers
> >agree, I could certainly fix up both aspects while committing.
> >
> >Jan
> >
> >
> >
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04 15:22     ` Konrad Rzeszutek Wilk
@ 2015-05-04 15:33       ` Andrew Cooper
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2015-05-04 15:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Chen, Tiejun, boris.ostrovsky
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Jan Beulich, yang.z.zhang

On 04/05/2015 16:22, Konrad Rzeszutek Wilk wrote:
> On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr)
>>>>  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
>> be ordered by any other fencing, serializing or other CLFLUSH instruction."
> That is incorrect. We have observed that CLFLUSH instruction do serialize each
> other. That is if on a core you send a bunch of CLFLUSH it stalls the pipeline.
>
> Cc-ing Boris who discovered this.

Simply stalling the pipeline says nothing about its ordering with
respect to other instructions.

It is plausible that certain processor pipelines employ stricter
restrictions on CLFLUSH, but Xen must not assume that this is the case
in general, especially as it is in direct contradiction to both the
Intel and AMD instruction manuals.

~Andrew


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04 11:26         ` Chen, Tiejun
@ 2015-05-05  1:13           ` Tian, Kevin
  0 siblings, 0 replies; 33+ messages in thread
From: Tian, Kevin @ 2015-05-05  1:13 UTC (permalink / raw)
  To: Chen, Tiejun, Jan Beulich
  Cc: Zhang, Yang Z, jinsong.liu, xen-devel, keir, andrew.cooper3

> From: Chen, Tiejun
> Sent: Monday, May 04, 2015 7:26 PM
> 
> On 2015/5/4 18:52, Tian, Kevin wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Monday, May 04, 2015 6:44 PM
> >>
> >>>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
> >>> On 2015/5/4 16:52, Jan Beulich wrote:
> >>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> >>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >>>>>
> >>>>>    void cacheline_flush(char * addr)
> >>>>>    {
> >>>>> +    mb();
> >>>>>        clflush(addr);
> >>>>> +    mb();
> >>>>>    }
> >>>>
> >>>> I think the purpose of the flush is to force write back, not to evict
> >>>> the cache line, and if so wmb() would appear to be sufficient. As
> >>>> the SDM says that's not the case, a comment explaining why wmb()
> >>>> is not sufficient would seem necessary. Plus in the description I
> >>>
> >>> Seems wmb() is not sufficient here.
> >>>
> >>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
> >>> to be ordered by any other fencing, serializing or other CLFLUSH
> >>> instruction."
> >>
> >> Right - that's what I said in the second sentence.
> >>
> >
> > btw why do we need two fences here? Suppose we just care about
> > writes before the flush point...
> >
> 
> The first MFENCE guarantees all load/store visible before flush cache
> line. But the second MFENCE just makes sure CLFLUSH is not ordered by
> that ensuing load/store, right?
> 

It's not a usual case to have another store to the same destination address
right after CLFLUSH. Usually there's some protocol preventing another update
until the device completes its access, and a store to a different address is
not a concern. But yes, looking at this interface alone, we'd better not make
assumptions about the caller, so the two mb()s look necessary...

btw as Jan commented in another thread, if there's no other caller of
this function, it might be more reasonable to expand it into the bigger
loop in __iommu_flush_cache so you don't need to fence for each
cache line flush.

Thanks
Kevin


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  9:24     ` Jan Beulich
@ 2015-05-05  1:13       ` Zhang, Yang Z
  0 siblings, 0 replies; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-05  1:13 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Chen, Tiejun
  Cc: jinsong.liu, Tian, Kevin, keir, xen-devel

Jan Beulich wrote on 2015-05-04:
>>>> On 04.05.15 at 11:14, <andrew.cooper3@citrix.com> wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>> 
>>>>  void cacheline_flush(char * addr)  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to
>>> evict the cache line, and if so wmb() would appear to be
>>> sufficient. As the SDM says that's not the case, a comment
>>> explaining why wmb() is not sufficient would seem necessary. Plus
>>> in the description I think "serializing" needs to be changed to
>>> "fencing", as serialization is not what we really care about here.
>>> If you and the maintainers agree, I could certainly fix up both aspects while committing.
>> 
>> On the subject of writebacks, we should get around to alternating-up
>> the use of clflushopt and clwb, either of which would be better than
>> a clflush in this case (avoiding the need for the leading mfence).
> 
> Plus the barrier would perhaps rather sit around the loop invoking
> cacheline_flush() in __iommu_flush_cache(), and I wonder whether VT-d

Agree. It's better to batch the flush operations, like:

@@ -167,11 +167,15 @@ static void __iommu_flush_cache(void *addr, unsigned int size)
     if ( !iommus_incoherent )
         return;
 
+    mb();
+
     if ( clflush_size == 0 )
         clflush_size = get_cache_line_size();
 
     for ( i = 0; i < size; i += clflush_size )
         cacheline_flush((char *)addr + i);
+
+    mb();
 }

> code shouldn't use available flushing code elsewhere in the system,
> and whether that code then wouldn't need barriers added (or use
> clflushopt/clwb as you
> suggest) instead.


> 
> Jan


Best regards,
Yang


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04 10:43     ` Jan Beulich
  2015-05-04 10:52       ` Tian, Kevin
@ 2015-05-05  2:45       ` Chen, Tiejun
  2015-05-05  9:24         ` Jan Beulich
  1 sibling, 1 reply; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-05  2:45 UTC (permalink / raw)
  To: Jan Beulich, andrew.cooper3, kevin.tian, yang.z.zhang,
	Konrad Rzeszutek Wilk, boris.ostrovsky
  Cc: jinsong.liu, keir, xen-devel

On 2015/5/4 18:43, Jan Beulich wrote:
>>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>    void cacheline_flush(char * addr)
>>>>    {
>>>> +    mb();
>>>>        clflush(addr);
>>>> +    mb();
>>>>    }
>>>
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
>> to be ordered by any other fencing, serializing or other CLFLUSH
>> instruction."
>
> Right - that's what I said in the second sentence.
>

Thanks for all guys' comments.

Does this work for everyone?

     xen/vt-d: need barriers to workaround CLFLUSH

     clflush is a lightweight but non-fencing instruction, hence a
     memory fence is necessary to make sure all loads/stores are
     visible before the cache line is flushed.

     Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c7bda73..1248a17 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -170,8 +170,15 @@ static void __iommu_flush_cache(void *addr, unsigned int size)
      if ( clflush_size == 0 )
          clflush_size = get_cache_line_size();

+    /*
+     * CLFLUSH is only ordered by the MFENCE instruction.
+     * It is not guaranteed to be ordered by any other fencing,
+     * serializing or other CLFLUSH instruction.
+     */
+    mb();
      for ( i = 0; i < size; i += clflush_size )
          cacheline_flush((char *)addr + i);
+    mb();
  }

  void iommu_flush_cache_entry(void *addr, unsigned int size)

Thanks
Tiejun


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-05  2:45       ` Chen, Tiejun
@ 2015-05-05  9:24         ` Jan Beulich
  2015-05-06  6:47           ` Chen, Tiejun
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-05  9:24 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3,
	yang.z.zhang, boris.ostrovsky

>>> On 05.05.15 at 04:45, <tiejun.chen@intel.com> wrote:
> Does this work for everyone?

Please first of all explain why the interfaces in asm/flushtlb.h can't
be used here (at least when flushing entire pages). Because - as
said before - for a complete fix you'd need to deal with the CLFLUSH
use(s) elsewhere in the system too.

Btw, isn't the !iommus_incoherent check in that function inverted?
I.e. why would we _not_ need to flush caches when IOMMUs
are not coherent (and why would flushing be needed when they're
coherent anyway)?

Jan


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  9:14   ` Andrew Cooper
  2015-05-04  9:24     ` Jan Beulich
@ 2015-05-05 15:46     ` Boris Ostrovsky
  2015-05-05 15:58       ` Jan Beulich
  2015-05-06  7:09       ` Chen, Tiejun
  1 sibling, 2 replies; 33+ messages in thread
From: Boris Ostrovsky @ 2015-05-05 15:46 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich, Tiejun Chen
  Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel

On 05/04/2015 05:14 AM, Andrew Cooper wrote:
> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>   
>>>   void cacheline_flush(char * addr)
>>>   {
>>> +    mb();
>>>       clflush(addr);
>>> +    mb();
>>>   }
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>> think "serializing" needs to be changed to "fencing", as serialization
>> is not what we really care about here. If you and the maintainers
>> agree, I could certainly fix up both aspects while committing.
> On the subject of writebacks, we should get around to alternating-up the
> use of clflushopt and clwb, either of which would be better than a
> clflush in this case (avoiding the need for the leading mfence).
>
> However, the ISA extension document does not indicate which processors
> will have support for these new instructions.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

We really should add support for this. On shutting down a very large
guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
This was due to the serializing nature of CLFLUSH.
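
As a rough sketch of what such support could look like (illustration only,
not an actual or proposed patch; flush_range_opt() is a made-up name, and
CLFLUSHOPT is emitted via its 0x66-prefixed CLFLUSH encoding in case the
assembler doesn't know the mnemonic):

static inline void clflushopt(const void *addr)
{
    /* CLFLUSHOPT is CLFLUSH with a 0x66 operand-size prefix. */
    asm volatile ( ".byte 0x66; clflush %0" :: "m" (*(const char *)addr) );
}

static void flush_range_opt(const void *addr, unsigned int size,
                            unsigned int clflush_size)
{
    unsigned int i;

    /*
     * CLFLUSHOPT executions are not ordered against each other or against
     * writes to other cache lines, so the per-line flushes can overlap;
     * a single trailing SFENCE then orders them ahead of subsequent
     * stores.  CLFLUSHOPT is already ordered with respect to older writes
     * to the line being flushed, so no leading fence is needed.
     */
    for ( i = 0; i < size; i += clflush_size )
        clflushopt((const char *)addr + i);
    asm volatile ( "sfence" ::: "memory" );
}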

-boris


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-05 15:46     ` Boris Ostrovsky
@ 2015-05-05 15:58       ` Jan Beulich
  2015-05-05 16:11         ` Boris Ostrovsky
  2015-05-06  7:09       ` Chen, Tiejun
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-05 15:58 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
	yang.z.zhang, Tiejun Chen

>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>   
>>>>   void cacheline_flush(char * addr)
>>>>   {
>>>> +    mb();
>>>>       clflush(addr);
>>>> +    mb();
>>>>   }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>> think "serializing" needs to be changed to "fencing", as serialization
>>> is not what we really care about here. If you and the maintainers
>>> agree, I could certainly fix up both aspects while committing.
>> On the subject of writebacks, we should get around to alternating-up the
>> use of clflushopt and clwb, either of which would be better than a
>> clflush in this case (avoiding the need for the leading mfence).
>>
>> However, the ISA extension document does not indicate which processors
>> will have support for these new instructions.
> 
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf 
> 
> We really should add support for this. On shutting down a very large 
> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. 
> This was due to serializing nature of CLFLUSH.

But flushing the IOMMU isn't being done via CPU instructions, but
rather via commands sent to the IOMMU. I.e. I'm somewhat
confused by your reply.

Jan


* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-05 15:58       ` Jan Beulich
@ 2015-05-05 16:11         ` Boris Ostrovsky
  2015-05-06  7:12           ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Boris Ostrovsky @ 2015-05-05 16:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
	yang.z.zhang, Tiejun Chen

On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>    
>>>>>    void cacheline_flush(char * addr)
>>>>>    {
>>>>> +    mb();
>>>>>        clflush(addr);
>>>>> +    mb();
>>>>>    }
>>>> I think the purpose of the flush is to force write back, not to evict
>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>> is not sufficient would seem necessary. Plus in the description I
>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>> is not what we really care about here. If you and the maintainers
>>>> agree, I could certainly fix up both aspects while committing.
>>> On the subject of writebacks, we should get around to alternating-up the
>>> use of clflushopt and clwb, either of which would be better than a
>>> clflush in this case (avoiding the need for the leading mfence).
>>>
>>> However, the ISA extension document does not indicate which processors
>>> will have support for these new instructions.
>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>
>> We really should add support for this. On shutting down a very large
>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>> This was due to serializing nature of CLFLUSH.
> But flushing the IOMMU isn't being done via CPU instructions, but
> rather via commands sent to the IOMMU. I.e. I'm somewhat
> confused by your reply.

I didn't mean flushing the IOMMU itself, sorry. I meant 
__iommu_flush_cache() (or whatever its equivalent was in the 
product, which was 4.1-based).

-boris

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-05  9:24         ` Jan Beulich
@ 2015-05-06  6:47           ` Chen, Tiejun
  2015-05-06  7:14             ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-06  6:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3,
	yang.z.zhang, boris.ostrovsky

On 2015/5/5 17:24, Jan Beulich wrote:
>>>> On 05.05.15 at 04:45, <tiejun.chen@intel.com> wrote:
>> Does this work for everyone?
>
> Please first of all explain why the interfaces in asm/flushtlb.h can't
> be used here (at least when flushing entire pages). Because - as

I also don't see any reason why we didn't use this on the IOMMU side 
previously...

> said before - for a complete fix you'd need to deal with the CLFLUSH
> use(s) elsewhere in the system too.

Looks like we need to do this in xen/arch/x86/flushtlb.c:flush_area_local() 
as well.
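
E.g. fencing the CLFLUSH loop in the same way; a sketch only, with
made-up names rather than the actual flush_area_local() code:

    /* Sketch: flush a buffer line by line, with fences on both sides. */
    static void flush_lines(const void *va, unsigned long size,
                            unsigned int clflush_size)
    {
        unsigned long i;

        asm volatile ( "mfence" ::: "memory" );   /* older stores first */
        for ( i = 0; i < size; i += clflush_size )
            asm volatile ( "clflush %0"
                           : : "m" (((const char *)va)[i]) );
        asm volatile ( "mfence" ::: "memory" );   /* flushes complete before
                                                     anything that follows */
    }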

>
> Btw, isn't the !iommus_incoherent check in that function inverted?
> I.e. why would we _not_ need to flush caches when IOMMUs
> are not coherent (and why would flushing be needed when they're
> coherent anyway)?
>

I guess you're misunderstanding this

     if ( !ecap_coherent(iommu->ecap) )
         iommus_incoherent = 1;

So here !iommus_incoherent means the IOMMUs are coherent, and in that 
case we don't need to flush the cache.
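
A minimal sketch of that short-circuit (not the exact Xen code; the
cache line size is hard-coded here for brevity):

    static int iommus_incoherent;       /* set when any IOMMU lacks ECAP
                                           coherency */
    void cacheline_flush(char *addr);   /* as in patch 1/3 */

    static void __iommu_flush_cache(void *addr, unsigned int size)
    {
        unsigned int i;

        if ( !iommus_incoherent )
            return;                     /* all IOMMUs snoop: nothing to do */

        for ( i = 0; i < size; i += 64 )   /* 64 = assumed line size */
            cacheline_flush((char *)addr + i);
    }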

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-05 15:46     ` Boris Ostrovsky
  2015-05-05 15:58       ` Jan Beulich
@ 2015-05-06  7:09       ` Chen, Tiejun
  1 sibling, 0 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-06  7:09 UTC (permalink / raw)
  To: Boris Ostrovsky, Andrew Cooper, Jan Beulich
  Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel

On 2015/5/5 23:46, Boris Ostrovsky wrote:
> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>   void cacheline_flush(char * addr)
>>>>   {
>>>> +    mb();
>>>>       clflush(addr);
>>>> +    mb();
>>>>   }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>> think "serializing" needs to be changed to "fencing", as serialization
>>> is not what we really care about here. If you and the maintainers
>>> agree, I could certainly fix up both aspects while committing.
>> On the subject of writebacks, we should get around to alternating-up the
>> use of clflushopt and clwb, either of which would be better than a
>> clflush in this case (avoiding the need for the leading mfence).
>>
>> However, the ISA extension document does not indicate which processors
>> will have support for these new instructions.
>
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>
> We really should add support for this. On shutting down a very large
> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
> This was due to serializing nature of CLFLUSH.
>

CLFLUSHOPT/CLWB instructions are ordered only by store-fencing 
operations, so at least we still need SFENCE.
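
So a CLFLUSHOPT-based variant would look roughly like the sketch below
(assuming an assembler that knows the mnemonic; older ones need the
66-prefixed CLFLUSH encoding):

    /* Sketch of a cacheline_flush() built on CLFLUSHOPT.  CLFLUSHOPT is
     * ordered after older stores to the line it flushes, so the leading
     * MFENCE of the CLFLUSH version can go away; the trailing SFENCE
     * keeps the flush ordered ahead of later stores, e.g. the write that
     * kicks off an IOMMU invalidation. */
    void cacheline_flush_opt(char *addr)
    {
        asm volatile ( "clflushopt %0" : : "m" (*addr) );
        asm volatile ( "sfence" ::: "memory" );
    }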

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-05 16:11         ` Boris Ostrovsky
@ 2015-05-06  7:12           ` Jan Beulich
  2015-05-06  7:26             ` Chen, Tiejun
  2015-05-06 15:11             ` Boris Ostrovsky
  0 siblings, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-06  7:12 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
	yang.z.zhang, Tiejun Chen

>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote:
> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>    
>>>>>>    void cacheline_flush(char * addr)
>>>>>>    {
>>>>>> +    mb();
>>>>>>        clflush(addr);
>>>>>> +    mb();
>>>>>>    }
>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>> is not what we really care about here. If you and the maintainers
>>>>> agree, I could certainly fix up both aspects while committing.
>>>> On the subject of writebacks, we should get around to alternating-up the
>>>> use of clflushopt and clwb, either of which would be better than a
>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>
>>>> However, the ISA extension document does not indicate which processors
>>>> will have support for these new instructions.
>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf 
>>>
>>> We really should add support for this. On shutting down a very large
>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>> This was due to serializing nature of CLFLUSH.
>> But flushing the IOMMU isn't being done via CPU instructions, but
>> rather via commands sent to the IOMMU. I.e. I'm somewhat
>> confused by your reply.
> 
> I didn't mean flushing IOMMU itself, sorry. I meant 
> __iommu_flush_cache() (or whatever it's equivalent we had in the 
> product, which was 4.1-based).

In that case I wonder how much of that flushing is really necessary
during IOMMU teardown. VT-d maintainers?

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-06  6:47           ` Chen, Tiejun
@ 2015-05-06  7:14             ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-06  7:14 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3,
	yang.z.zhang, boris.ostrovsky

>>> On 06.05.15 at 08:47, <tiejun.chen@intel.com> wrote:
> On 2015/5/5 17:24, Jan Beulich wrote:
>> Btw, isn't the !iommus_incoherent check in that function inverted?
>> I.e. why would we _not_ need to flush caches when IOMMUs
>> are not coherent (and why would flushing be needed when they're
>> coherent anyway)?
>>
> 
> I guess you're misunderstanding this
> 
>      if ( !ecap_coherent(iommu->ecap) )
>          iommus_incoherent = 1;
> 
> So here !iommus_incoherent means IOMMU is coherent and then we don't 
> need to flush cache in this case.

Ah, indeed - I overlooked the "in" in the variable name (together with
the resulting double negation in the expression). Sorry for the noise.

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-06  7:12           ` Jan Beulich
@ 2015-05-06  7:26             ` Chen, Tiejun
  2015-05-06  7:33               ` Jan Beulich
  2015-05-06 15:11             ` Boris Ostrovsky
  1 sibling, 1 reply; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-06  7:26 UTC (permalink / raw)
  To: Jan Beulich, Boris Ostrovsky
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper, yang.z.zhang

On 2015/5/6 15:12, Jan Beulich wrote:
>>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote:
>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>>
>>>>>>>     void cacheline_flush(char * addr)
>>>>>>>     {
>>>>>>> +    mb();
>>>>>>>         clflush(addr);
>>>>>>> +    mb();
>>>>>>>     }
>>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>>> is not what we really care about here. If you and the maintainers
>>>>>> agree, I could certainly fix up both aspects while committing.
>>>>> On the subject of writebacks, we should get around to alternating-up the
>>>>> use of clflushopt and clwb, either of which would be better than a
>>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>>
>>>>> However, the ISA extension document does not indicate which processors
>>>>> will have support for these new instructions.
>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>>
>>>> We really should add support for this. On shutting down a very large
>>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>>> This was due to serializing nature of CLFLUSH.
>>> But flushing the IOMMU isn't being done via CPU instructions, but
>>> rather via commands sent to the IOMMU. I.e. I'm somewhat
>>> confused by your reply.
>>
>> I didn't mean flushing IOMMU itself, sorry. I meant
>> __iommu_flush_cache() (or whatever it's equivalent we had in the
>> product, which was 4.1-based).
>
> In that case I wonder how much of that flushing is really necessary

Sorry, what is that case?

> during IOMMU teardown. VT-d maintainers?
>

In most cases __iommu_flush_cache() is used to flush the remapping 
structures out to memory so that the IOMMU sees correct data.
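
I.e. the usual pattern is along these lines (illustrative names and
types only, not the real VT-d code):

    #include <stdint.h>                     /* uint64_t, for this sketch */

    struct dma_pte { uint64_t val; };       /* placeholder remapping entry */

    void __iommu_flush_cache(void *addr, unsigned int size);  /* as above */

    static void update_pte(struct dma_pte *pte, uint64_t new_val)
    {
        pte->val = new_val;                      /* CPU updates the entry */
        __iommu_flush_cache(pte, sizeof(*pte));  /* push it to memory so a
                                                    non-coherent IOMMU reads
                                                    the new value */
    }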

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-06  7:26             ` Chen, Tiejun
@ 2015-05-06  7:33               ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-06  7:33 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
	yang.z.zhang, Boris Ostrovsky

>>> On 06.05.15 at 09:26, <tiejun.chen@intel.com> wrote:
> On 2015/5/6 15:12, Jan Beulich wrote:
>>>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote:
>>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>>>
>>>>>>>>     void cacheline_flush(char * addr)
>>>>>>>>     {
>>>>>>>> +    mb();
>>>>>>>>         clflush(addr);
>>>>>>>> +    mb();
>>>>>>>>     }
>>>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>>>> is not what we really care about here. If you and the maintainers
>>>>>>> agree, I could certainly fix up both aspects while committing.
>>>>>> On the subject of writebacks, we should get around to alternating-up the
>>>>>> use of clflushopt and clwb, either of which would be better than a
>>>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>>>
>>>>>> However, the ISA extension document does not indicate which processors
>>>>>> will have support for these new instructions.
>>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf 
>>>>>
>>>>> We really should add support for this. On shutting down a very large
>>>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>>>> This was due to serializing nature of CLFLUSH.
>>>> But flushing the IOMMU isn't being done via CPU instructions, but
>>>> rather via commands sent to the IOMMU. I.e. I'm somewhat
>>>> confused by your reply.
>>>
>>> I didn't mean flushing IOMMU itself, sorry. I meant
>>> __iommu_flush_cache() (or whatever it's equivalent we had in the
>>> product, which was 4.1-based).
>>
>> In that case I wonder how much of that flushing is really necessary
> 
> Sorry, what is that case?

The flushing of CPU-side caches during IOMMU teardown for a guest.

>> during IOMMU teardown. VT-d maintainers?
>>
> 
> In most cases __iommu_flush_cache() is used to flush any remapping 
> structures into memory then IOMMU can get proper data.

Right, but here we're talking about teardown. Since the IOMMU
isn't going to use any of the dying guest's mappings anymore anyway,
there's little point in flushing the respective changes from the
caches (or at least that flushing could be limited to the ripping out
of the top-level structures, provided these get zapped before anything
hanging off of them).
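
A rough illustration of that idea (all names below are made up; this is
not the actual VT-d teardown code):

    #include <stdint.h>

    struct dma_pte { uint64_t val; };                    /* placeholder type */

    void __iommu_flush_cache(void *addr, unsigned int size);   /* as above */
    void free_domain_page_tables(struct dma_pte *tables);      /* hypothetical */

    /* Zap the top-level entry, flush only that cache line, and free the
     * rest without flushing: the dying domain's mappings are never walked
     * again by the IOMMU. */
    static void teardown_sketch(struct dma_pte *top_level,
                                struct dma_pte *lower_tables)
    {
        top_level->val = 0;
        __iommu_flush_cache(top_level, sizeof(*top_level));
        free_domain_page_tables(lower_tables);           /* no per-entry flush */
    }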

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-06  7:12           ` Jan Beulich
  2015-05-06  7:26             ` Chen, Tiejun
@ 2015-05-06 15:11             ` Boris Ostrovsky
  1 sibling, 0 replies; 33+ messages in thread
From: Boris Ostrovsky @ 2015-05-06 15:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
	yang.z.zhang, Tiejun Chen

On 05/06/2015 03:12 AM, Jan Beulich wrote:
>>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote:
>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>>     
>>>>>>>     void cacheline_flush(char * addr)
>>>>>>>     {
>>>>>>> +    mb();
>>>>>>>         clflush(addr);
>>>>>>> +    mb();
>>>>>>>     }
>>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>>> is not what we really care about here. If you and the maintainers
>>>>>> agree, I could certainly fix up both aspects while committing.
>>>>> On the subject of writebacks, we should get around to alternating-up the
>>>>> use of clflushopt and clwb, either of which would be better than a
>>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>>
>>>>> However, the ISA extension document does not indicate which processors
>>>>> will have support for these new instructions.
>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>>
>>>> We really should add support for this. On shutting down a very large
>>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>>> This was due to serializing nature of CLFLUSH.
>>> But flushing the IOMMU isn't being done via CPU instructions, but
>>> rather via commands sent to the IOMMU. I.e. I'm somewhat
>>> confused by your reply.
>> I didn't mean flushing IOMMU itself, sorry. I meant
>> __iommu_flush_cache() (or whatever it's equivalent we had in the
>> product, which was 4.1-based).
> In that case I wonder how much of that flushing is really necessary
> during IOMMU teardown. VT-d maintainers?

I should mention that we saw this on 4.1, where IOMMU teardown is done 
in the domain destruction code path. After we backported the code that 
moves this into a tasklet, things got much better.

Of course we are still doing the flushing, even if in the background, and 
therefore system resources are still being consumed (and the memory is not 
available until the flushing is done). So if this is unnecessary, there 
are good reasons not to do it.

-boris

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2015-05-06 15:11 UTC | newest]

Thread overview: 33+ messages
-- links below jump to the message on this page --
2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
2015-05-04  4:07   ` Zhang, Yang Z
2015-05-04  5:08     ` Chen, Tiejun
2015-05-04  6:41       ` Zhang, Yang Z
2015-05-04  8:57       ` Jan Beulich
2015-05-04 11:21         ` Chen, Tiejun
2015-05-04  2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
2015-05-04  9:00   ` Jan Beulich
2015-05-04  4:07 ` [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Zhang, Yang Z
2015-05-04  8:52 ` Jan Beulich
2015-05-04  9:14   ` Andrew Cooper
2015-05-04  9:24     ` Jan Beulich
2015-05-05  1:13       ` Zhang, Yang Z
2015-05-05 15:46     ` Boris Ostrovsky
2015-05-05 15:58       ` Jan Beulich
2015-05-05 16:11         ` Boris Ostrovsky
2015-05-06  7:12           ` Jan Beulich
2015-05-06  7:26             ` Chen, Tiejun
2015-05-06  7:33               ` Jan Beulich
2015-05-06 15:11             ` Boris Ostrovsky
2015-05-06  7:09       ` Chen, Tiejun
2015-05-04 10:39   ` Chen, Tiejun
2015-05-04 10:43     ` Jan Beulich
2015-05-04 10:52       ` Tian, Kevin
2015-05-04 11:26         ` Chen, Tiejun
2015-05-05  1:13           ` Tian, Kevin
2015-05-05  2:45       ` Chen, Tiejun
2015-05-05  9:24         ` Jan Beulich
2015-05-06  6:47           ` Chen, Tiejun
2015-05-06  7:14             ` Jan Beulich
2015-05-04 15:22     ` Konrad Rzeszutek Wilk
2015-05-04 15:33       ` Andrew Cooper
