* [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
@ 2015-05-04 2:16 Tiejun Chen
2015-05-04 2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
` (3 more replies)
0 siblings, 4 replies; 33+ messages in thread
From: Tiejun Chen @ 2015-05-04 2:16 UTC (permalink / raw)
To: yang.z.zhang, kevin.tian, jbeulich, jinsong.liu, keir, andrew.cooper3
Cc: xen-devel
clflush is a lightweight but not serializing instruction, so a
memory fence is necessary to make sure all loads/stores are
visible before flushing the cache line.
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
xen/drivers/passthrough/vtd/x86/vtd.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index 109234e..fd2ff04 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
void cacheline_flush(char * addr)
{
+ mb();
clflush(addr);
+ mb();
}
void flush_all_cache()
--
1.9.1
^ permalink raw reply related [flat|nested] 33+ messages in thread
* [PATCH 2/3] xen/vt-d: mask interrupt message generation
2015-05-04 2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
@ 2015-05-04 2:16 ` Tiejun Chen
2015-05-04 4:07 ` Zhang, Yang Z
2015-05-04 2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
` (2 subsequent siblings)
3 siblings, 1 reply; 33+ messages in thread
From: Tiejun Chen @ 2015-05-04 2:16 UTC (permalink / raw)
To: yang.z.zhang, kevin.tian, jbeulich, jinsong.liu, keir, andrew.cooper3
Cc: xen-devel
While initializing VT-d we should mask interrupt message generation
to avoid receiving any interrupt left pending before enabling DMA
translation, and we should also mask it before disabling the DMA engine.
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c7bda73..72cd854 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1000,15 +1000,21 @@ static void dma_msi_unmask(struct irq_desc *desc)
iommu->msi.msi_attrib.masked = 0;
}
-static void dma_msi_mask(struct irq_desc *desc)
+static void mask_dma_interrupt(struct iommu *iommu)
{
unsigned long flags;
- struct iommu *iommu = desc->action->dev_id;
- /* mask it */
spin_lock_irqsave(&iommu->register_lock, flags);
dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
spin_unlock_irqrestore(&iommu->register_lock, flags);
+}
+
+static void dma_msi_mask(struct irq_desc *desc)
+{
+ struct iommu *iommu = desc->action->dev_id;
+
+ /* mask it */
+ mask_dma_interrupt(iommu);
iommu->msi.msi_attrib.masked = 1;
}
@@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
struct iommu *iommu;
struct iommu_flush *flush = NULL;
int ret;
- unsigned long flags;
/*
* Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
@@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
iommu = drhd->iommu;
- clear_fault_bits(iommu);
+ /*
+ * We shouldn't receive any VT-d interrupt while initializing
+ * VT-d so just mask interrupt message generation.
+ */
+ mask_dma_interrupt(iommu);
- spin_lock_irqsave(&iommu->register_lock, flags);
- dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
- spin_unlock_irqrestore(&iommu->register_lock, flags);
+ /*
+ * And then clear all previous faults.
+ */
+ clear_fault_bits(iommu);
}
/*
@@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
if ( force_iommu )
continue;
+ /*
+ * Mask interrupt message generation.
+ */
+ mask_dma_interrupt(iommu);
+
iommu_disable_translation(iommu);
/* If interrupt remapping is enabled, queued invalidation
--
1.9.1
* [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5
2015-05-04 2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
2015-05-04 2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
@ 2015-05-04 2:16 ` Tiejun Chen
2015-05-04 9:00 ` Jan Beulich
2015-05-04 4:07 ` [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Zhang, Yang Z
2015-05-04 8:52 ` Jan Beulich
3 siblings, 1 reply; 33+ messages in thread
From: Tiejun Chen @ 2015-05-04 2:16 UTC (permalink / raw)
To: yang.z.zhang, kevin.tian, jbeulich, jinsong.liu, keir, andrew.cooper3
Cc: xen-devel
S5 means we shut down the IOMMU engine as well, so we should
disable the IOMMU engine completely before entering S5; otherwise
we may still receive some interrupts in this period. But
currently the VT-d path depends on force_iommu and iommu_intremap
to determine whether we should disable translation and queued
invalidation. So this unexpected behaviour can be fixed by
clearing force_iommu and iommu_intremap. Additionally, some
cleanups to code comments.
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
xen/arch/x86/acpi/power.c | 2 ++
xen/arch/x86/crash.c | 12 ++++++++----
xen/drivers/passthrough/vtd/iommu.c | 5 +++--
3 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index f41f0de..345470d 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -183,6 +183,8 @@ static int enter_state(u32 state)
break;
case ACPI_STATE_S5:
acpi_enter_sleep_state(ACPI_STATE_S5);
+ /* Make sure we really disable VT-d engine completely. */
+ force_iommu = iommu_intremap = 0;
break;
default:
error = -EINVAL;
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index eb7be9c..123fb12 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -171,15 +171,19 @@ static void nmi_shootdown_cpus(void)
printk("Failed to shoot down CPUs {%s}\n", keyhandler_scratch);
}
- /* Crash shutdown any IOMMU functionality as the crashdump kernel is not
- * happy when booting if interrupt/dma remapping is still enabled */
+ /*
+ * Crash shutdown any IOMMU functionality as the crashdump kernel is not
+ * happy when booting if interrupt/dma remapping is still enabled.
+ */
iommu_crash_shutdown();
__stop_this_cpu();
- /* This is a bit of a hack due to the problems with the x2apic_enabled
+ /*
+ * This is a bit of a hack due to the problems with the x2apic_enabled
* variable, but we can't do any better without a significant refactoring
- * of the APIC code */
+ * of the APIC code.
+ */
x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
disable_IO_APIC();
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 72cd854..28244ec 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2367,9 +2367,10 @@ static void vtd_suspend(void)
iommu_disable_translation(iommu);
- /* If interrupt remapping is enabled, queued invalidation
+ /*
+ * If interrupt remapping is enabled, queued invalidation
* will be disabled following interupt remapping disabling
- * in local apic suspend
+ * in local apic suspend.
*/
if ( !iommu_intremap && iommu_qinval )
disable_qinval(iommu);
--
1.9.1
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
2015-05-04 2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
@ 2015-05-04 4:07 ` Zhang, Yang Z
2015-05-04 5:08 ` Chen, Tiejun
0 siblings, 1 reply; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-04 4:07 UTC (permalink / raw)
To: Chen, Tiejun, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
Cc: xen-devel
Chen, Tiejun wrote on 2015-05-04:
> While initializing VT-D we should mask interrupt message generation
> to avoid receiving any interrupt as pending before enable DMA
> translation, and also mask that before disable DMA engine.
>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
> xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
> 1 file changed, 23 insertions(+), 8 deletions(-)
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c index c7bda73..72cd854 100644 ---
> a/xen/drivers/passthrough/vtd/iommu.c +++
> b/xen/drivers/passthrough/vtd/iommu.c @@ -1000,15 +1000,21 @@ static
> void dma_msi_unmask(struct irq_desc *desc)
> iommu->msi.msi_attrib.masked = 0;
> }
> -static void dma_msi_mask(struct irq_desc *desc)
> +static void mask_dma_interrupt(struct iommu *iommu)
> {
> unsigned long flags;
> - struct iommu *iommu = desc->action->dev_id;
>
> - /* mask it */
> spin_lock_irqsave(&iommu->register_lock, flags);
> dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
> spin_unlock_irqrestore(&iommu->register_lock, flags);
> +}
> +
> +static void dma_msi_mask(struct irq_desc *desc)
> +{
> + struct iommu *iommu = desc->action->dev_id;
> +
> + /* mask it */
> + mask_dma_interrupt(iommu);
> iommu->msi.msi_attrib.masked = 1;
> }
> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
> struct iommu *iommu;
> struct iommu_flush *flush = NULL;
> int ret;
> - unsigned long flags;
>
> /*
> * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
>
> iommu = drhd->iommu;
> - clear_fault_bits(iommu);
> + /*
> + * We shouldn't receive any VT-d interrupt while initializing
> + * VT-d so just mask interrupt message generation.
> + */
> + mask_dma_interrupt(iommu);
>
> - spin_lock_irqsave(&iommu->register_lock, flags);
> - dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
> - spin_unlock_irqrestore(&iommu->register_lock, flags);
> + /*
> + * And then clear all previous faults.
> + */
> + clear_fault_bits(iommu);
> }
>
> /*
> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
> if ( force_iommu )
> continue;
> + /*
> + * Mask interrupt message generation.
> + */
> + mask_dma_interrupt(iommu);
> +
> iommu_disable_translation(iommu);
>
> /* If interrupt remapping is enabled, queued invalidation
Just curious that do you observe a real issue or just catch it through reading code?
Best regards,
Yang
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
2015-05-04 2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
2015-05-04 2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
@ 2015-05-04 4:07 ` Zhang, Yang Z
2015-05-04 8:52 ` Jan Beulich
3 siblings, 0 replies; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-04 4:07 UTC (permalink / raw)
To: Chen, Tiejun, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
Cc: xen-devel
Chen, Tiejun wrote on 2015-05-04:
> clflush is a light weight but not serializing instruction, so hence
> memory fence is necessary to make sure all load/store visible before flush cache line.
>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
> ---
> xen/drivers/passthrough/vtd/x86/vtd.c | 2 ++
> 1 file changed, 2 insertions(+)
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c
> b/xen/drivers/passthrough/vtd/x86/vtd.c index 109234e..fd2ff04 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c +++
> b/xen/drivers/passthrough/vtd/x86/vtd.c @@ -56,7 +56,9 @@ unsigned int
> get_cache_line_size(void)
>
> void cacheline_flush(char * addr)
> {
> + mb();
> clflush(addr);
> + mb();
> }
>
> void flush_all_cache()
Best regards,
Yang
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
2015-05-04 4:07 ` Zhang, Yang Z
@ 2015-05-04 5:08 ` Chen, Tiejun
2015-05-04 6:41 ` Zhang, Yang Z
2015-05-04 8:57 ` Jan Beulich
0 siblings, 2 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04 5:08 UTC (permalink / raw)
To: Zhang, Yang Z, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
Cc: xen-devel
Yang,
Thanks for your review.
On 2015/5/4 12:07, Zhang, Yang Z wrote:
> Chen, Tiejun wrote on 2015-05-04:
>> While initializing VT-D we should mask interrupt message generation
>> to avoid receiving any interrupt as pending before enable DMA
>> translation, and also mask that before disable DMA engine.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>> xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
>> 1 file changed, 23 insertions(+), 8 deletions(-)
>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>> b/xen/drivers/passthrough/vtd/iommu.c index c7bda73..72cd854 100644 ---
>> a/xen/drivers/passthrough/vtd/iommu.c +++
>> b/xen/drivers/passthrough/vtd/iommu.c @@ -1000,15 +1000,21 @@ static
>> void dma_msi_unmask(struct irq_desc *desc)
>> iommu->msi.msi_attrib.masked = 0;
>> }
>> -static void dma_msi_mask(struct irq_desc *desc)
>> +static void mask_dma_interrupt(struct iommu *iommu)
>> {
>> unsigned long flags;
>> - struct iommu *iommu = desc->action->dev_id;
>>
>> - /* mask it */
>> spin_lock_irqsave(&iommu->register_lock, flags);
>> dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>> spin_unlock_irqrestore(&iommu->register_lock, flags);
>> +}
>> +
>> +static void dma_msi_mask(struct irq_desc *desc)
>> +{
>> + struct iommu *iommu = desc->action->dev_id;
>> +
>> + /* mask it */
>> + mask_dma_interrupt(iommu);
>> iommu->msi.msi_attrib.masked = 1;
>> }
>> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>> struct iommu *iommu;
>> struct iommu_flush *flush = NULL;
>> int ret;
>> - unsigned long flags;
>>
>> /*
>> * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
>> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
>>
>> iommu = drhd->iommu;
>> - clear_fault_bits(iommu);
>> + /*
>> + * We shouldn't receive any VT-d interrupt while initializing
>> + * VT-d so just mask interrupt message generation.
>> + */
>> + mask_dma_interrupt(iommu);
>>
>> - spin_lock_irqsave(&iommu->register_lock, flags);
>> - dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
>> - spin_unlock_irqrestore(&iommu->register_lock, flags);
>> + /*
>> + * And then clear all previous faults.
>> + */
>> + clear_fault_bits(iommu);
>> }
>>
>> /*
>> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>> if ( force_iommu )
>> continue;
>> + /*
>> + * Mask interrupt message generation.
>> + */
>> + mask_dma_interrupt(iommu);
>> +
>> iommu_disable_translation(iommu);
>>
>> /* If interrupt remapping is enabled, queued invalidation
>
> Just curious that do you observe a real issue or just catch it through reading code?
>
Yes, we observed this problem on BDW, and one HP customer also
hit the same issue.
Once we enable IOMMU translation, an IOMMU fault, 'Present bit in root
entry is clear', is triggered like this,
(XEN) [VT-D]iommu.c:731: iommu_enable_translation: iommu->reg = ffff82c000201000
(XEN) [VT-D]iommu.c:875: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:877: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:02.0] fault addr 0, iommu reg = ffff82c000201000
(XEN) [VT-D]d32767v0 DMAR: reason 01 - Present bit in root entry is clear
(XEN) print_vtd_entries: iommu ffff83012aa88600 dev 0000:00:02.0 gmfn 0
(XEN) root_entry = ffff83012aa85000
(XEN) root_entry[0] = 80f5001
(XEN) context = ffff8300080f5000
(XEN) context[10] = 2_8b75001
(XEN) l4 = ffff830008b75000
(XEN) l4_index = 0
(XEN) l4[0] = 8b74003
(XEN) l3 = ffff830008b74000
(XEN) l3_index = 0
(XEN) l3[0] = 8b73003
(XEN) l2 = ffff830008b73000
(XEN) l2_index = 0
(XEN) l2[0] = 8b72003
(XEN) l1 = ffff830008b72000
(XEN) l1_index = 0
(XEN) l1[0] = 3
At first I suspected this was caused by some improper cache
behaviour. As you can see, "root_entry[0] = 80f5001" indicates we
already set the present bit. But the Caching Mode bit is zero on BDW,
which means the remapping hardware doesn't own a context cache. I also
checked that Xen always calls clflush to write back memory on the CPU
side (I even tried to use wbinvd in place of clflush). Plus, you can
see this guest fault address is a weird zero, so I began to suspect
this is just an unexpected fault left pending by some BIOS cleanup,
or undefined hardware behaviour. What I mean is that
clear_fault_bits() alone is not enough to clear this kind of pending
interrupt before we complete all VT-d initialization.
And anyway, I also think we should disable interrupt generation during
VT-d setup until the VT-d driver enables its interrupt explicitly.
Thanks
Tiejun
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
2015-05-04 5:08 ` Chen, Tiejun
@ 2015-05-04 6:41 ` Zhang, Yang Z
2015-05-04 8:57 ` Jan Beulich
1 sibling, 0 replies; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-04 6:41 UTC (permalink / raw)
To: Chen, Tiejun, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
Cc: xen-devel
Chen, Tiejun wrote on 2015-05-04:
> Yang,
>
> Thanks for your review.
>
> On 2015/5/4 12:07, Zhang, Yang Z wrote:
>> Chen, Tiejun wrote on 2015-05-04:
>>> While initializing VT-D we should mask interrupt message generation
>>> to avoid receiving any interrupt as pending before enable DMA
>>> translation, and also mask that before disable DMA engine.
>>>
>>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>>> ---
>>> xen/drivers/passthrough/vtd/iommu.c | 31
>>> +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8
>>> deletions(-)
>>> diff --git a/xen/drivers/passthrough/vtd/iommu.c
>>> b/xen/drivers/passthrough/vtd/iommu.c index c7bda73..72cd854 100644 ---
>>> a/xen/drivers/passthrough/vtd/iommu.c +++
>>> b/xen/drivers/passthrough/vtd/iommu.c @@ -1000,15 +1000,21 @@ static
>>> void dma_msi_unmask(struct irq_desc *desc)
>>> iommu->msi.msi_attrib.masked = 0;
>>> }
>>> -static void dma_msi_mask(struct irq_desc *desc)
>>> +static void mask_dma_interrupt(struct iommu *iommu)
>>> {
>>> unsigned long flags;
>>> - struct iommu *iommu = desc->action->dev_id;
>>>
>>> - /* mask it */
>>> spin_lock_irqsave(&iommu->register_lock, flags);
>>> dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>>> spin_unlock_irqrestore(&iommu->register_lock, flags);
>>> +}
>>> +
>>> +static void dma_msi_mask(struct irq_desc *desc)
>>> +{
>>> + struct iommu *iommu = desc->action->dev_id;
>>> +
>>> + /* mask it */
>>> + mask_dma_interrupt(iommu);
>>> iommu->msi.msi_attrib.masked = 1;
>>> }
>>> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>>> struct iommu *iommu;
>>> struct iommu_flush *flush = NULL;
>>> int ret;
>>> - unsigned long flags;
>>>
>>> /*
>>> * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
>>> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
>>>
>>> iommu = drhd->iommu;
>>> - clear_fault_bits(iommu);
>>> + /*
>>> + * We shouldn't receive any VT-d interrupt while initializing
>>> + * VT-d so just mask interrupt message generation.
>>> + */
>>> + mask_dma_interrupt(iommu);
>>>
>>> - spin_lock_irqsave(&iommu->register_lock, flags);
>>> - dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
>>> - spin_unlock_irqrestore(&iommu->register_lock, flags);
>>> + /*
>>> + * And then clear all previous faults.
>>> + */
>>> + clear_fault_bits(iommu);
>>> }
>>>
>>> /*
>>> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>>> if ( force_iommu )
>>> continue;
>>> + /*
>>> + * Mask interrupt message generation.
>>> + */
>>> + mask_dma_interrupt(iommu);
>>> +
>>> iommu_disable_translation(iommu);
>>>
>>> /* If interrupt remapping is enabled, queued invalidation
>>
>> Just curious that do you observe a real issue or just catch it through
>> reading code?
>>
>
> Yes, we observed this problem on BDW. And actually one HP customer also
> had this same issue.
>
> Once we enable IOMMU translation, an IOMMU fault, 'Present bit in root
> entry is clear', is triggered like this,
>
> (XEN) [VT-D]iommu.c:731: iommu_enable_translation: iommu->reg = ffff82c000201000
> (XEN) [VT-D]iommu.c:875: iommu_fault_status: Fault Overflow
> (XEN) [VT-D]iommu.c:877: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:02.0] fault addr 0, iommu reg = ffff82c000201000
> (XEN) [VT-D]d32767v0 DMAR: reason 01 - Present bit in root entry is clear
> (XEN) print_vtd_entries: iommu ffff83012aa88600 dev 0000:00:02.0 gmfn 0
> (XEN) root_entry = ffff83012aa85000
> (XEN) root_entry[0] = 80f5001
> (XEN) context = ffff8300080f5000
> (XEN) context[10] = 2_8b75001
> (XEN) l4 = ffff830008b75000
> (XEN) l4_index = 0
> (XEN) l4[0] = 8b74003
> (XEN) l3 = ffff830008b74000
> (XEN) l3_index = 0
> (XEN) l3[0] = 8b73003
> (XEN) l2 = ffff830008b73000
> (XEN) l2_index = 0
> (XEN) l2[0] = 8b72003
> (XEN) l1 = ffff830008b72000
> (XEN) l1_index = 0
> (XEN) l1[0] = 3
>
> At first I doubted this is issued by some improper cache behaviors.
> Because as you see, "root_entry[0] = 80f5001" indicates we already set
> that present bit. But Caching Mode bit is zero in BDW so this means
> remapping hardware doesn't own contest cache. And I also check xen
> always calls clflush to write back memory on CPU side. (Even I also
> tried to use wbinvd to replace clflush). And plus, you can see this
> guest fault address is a weird zero, so I turned to concern if this is
> just an unexpected fault which is pending because of some BIOS cleanup,
> or undefined hardware behaviors. Here I mean clear_fault_bits() is
> already not enough to clear this kind of pending interrupt before we
> complete all VT-D initializations.
So this patch helps nothing in this case?
>
> And anyway, I also think we should disable interrupt generation during
> VT-D setup until VT-D device enable interrupt explicitly.
The question is why a fault interrupt would be generated during VT-d setup at all?
>
> Thanks
> Tiejun
Best regards,
Yang
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
` (2 preceding siblings ...)
2015-05-04 4:07 ` [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Zhang, Yang Z
@ 2015-05-04 8:52 ` Jan Beulich
2015-05-04 9:14 ` Andrew Cooper
2015-05-04 10:39 ` Chen, Tiejun
3 siblings, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-04 8:52 UTC (permalink / raw)
To: Tiejun Chen
Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang
>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>
> void cacheline_flush(char * addr)
> {
> + mb();
> clflush(addr);
> + mb();
> }
I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As
the SDM says that's not the case, a comment explaining why wmb()
is not sufficient would seem necessary. Plus in the description I
think "serializing" needs to be changed to "fencing", as serialization
is not what we really care about here. If you and the maintainers
agree, I could certainly fix up both aspects while committing.
Jan
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
2015-05-04 5:08 ` Chen, Tiejun
2015-05-04 6:41 ` Zhang, Yang Z
@ 2015-05-04 8:57 ` Jan Beulich
2015-05-04 11:21 ` Chen, Tiejun
1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-04 8:57 UTC (permalink / raw)
To: Tiejun Chen
Cc: Kevin Tian, keir, jinsong.liu, xen-devel, andrew.cooper3, Yang Z Zhang
>>> On 04.05.15 at 07:08, <tiejun.chen@intel.com> wrote:
> At first I doubted this is issued by some improper cache behaviors.
> Because as you see, "root_entry[0] = 80f5001" indicates we already set
> that present bit. But Caching Mode bit is zero in BDW so this means
> remapping hardware doesn't own contest cache. And I also check xen
> always calls clflush to write back memory on CPU side. (Even I also
> tried to use wbinvd to replace clflush). And plus, you can see this
> guest fault address is a weird zero, so I turned to concern if this is
> just an unexpected fault which is pending because of some BIOS cleanup,
> or undefined hardware behaviors. Here I mean clear_fault_bits() is
> already not enough to clear this kind of pending interrupt before we
> complete all VT-D initializations.
This last aspect in particular needs more explanation, but I'm
wary of the change in general (see Yang's response).
Jan
* Re: [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5
2015-05-04 2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
@ 2015-05-04 9:00 ` Jan Beulich
0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-04 9:00 UTC (permalink / raw)
To: Tiejun Chen
Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang
>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> S5 menas we also shut down IOMMU engine as well, so we should
> disable IOMMU engine completely before enter S5, otherwise we
> still probably receive some interrupts in this period. But
> currently VT-D path depends on force_iommu and iommu_intremap
> to determine if we should disable translation and queued
> invalidation. So this unexpected behaviour can be fixed by
> clearing force_iommu and iommu_intremap.
Maybe it can, but is this the right way to do it? Fiddling with IOMMU
internal variables in xen/arch/x86/acpi/power.c seems bogus (not
to say hacky) at least. And why would this be S5-specific?
> Additionally, some cleanups to code comments.
Which has nothing to do in this patch.
Jan
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 8:52 ` Jan Beulich
@ 2015-05-04 9:14 ` Andrew Cooper
2015-05-04 9:24 ` Jan Beulich
2015-05-05 15:46 ` Boris Ostrovsky
2015-05-04 10:39 ` Chen, Tiejun
1 sibling, 2 replies; 33+ messages in thread
From: Andrew Cooper @ 2015-05-04 9:14 UTC (permalink / raw)
To: Jan Beulich, Tiejun Chen
Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel
On 04/05/2015 09:52, Jan Beulich wrote:
>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>
>> void cacheline_flush(char * addr)
>> {
>> + mb();
>> clflush(addr);
>> + mb();
>> }
> I think the purpose of the flush is to force write back, not to evict
> the cache line, and if so wmb() would appear to be sufficient. As
> the SDM says that's not the case, a comment explaining why wmb()
> is not sufficient would seem necessary. Plus in the description I
> think "serializing" needs to be changed to "fencing", as serialization
> is not what we really care about here. If you and the maintainers
> agree, I could certainly fix up both aspects while committing.
On the subject of writebacks, we should get around to alternating-up the
use of clflushopt and clwb, either of which would be better than a
clflush in this case (avoiding the need for the leading mfence).
However, the ISA extension document does not indicate which processors
will have support for these new instructions.
~Andrew
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 9:14 ` Andrew Cooper
@ 2015-05-04 9:24 ` Jan Beulich
2015-05-05 1:13 ` Zhang, Yang Z
2015-05-05 15:46 ` Boris Ostrovsky
1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-04 9:24 UTC (permalink / raw)
To: Andrew Cooper, Tiejun Chen
Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel
>>> On 04.05.15 at 11:14, <andrew.cooper3@citrix.com> wrote:
> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>> void cacheline_flush(char * addr)
>>> {
>>> + mb();
>>> clflush(addr);
>>> + mb();
>>> }
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>> think "serializing" needs to be changed to "fencing", as serialization
>> is not what we really care about here. If you and the maintainers
>> agree, I could certainly fix up both aspects while committing.
>
> On the subject of writebacks, we should get around to alternating-up the
> use of clflushopt and clwb, either of which would be better than a
> clflush in this case (avoiding the need for the leading mfence).
Plus the barrier would perhaps rather sit around the loop
invoking cacheline_flush() in __iommu_flush_cache(), and I
wonder whether VT-d code shouldn't use available flushing
code elsewhere in the system, and whether that code then
wouldn't need barriers added (or use clflushopt/clwb as you
suggest) instead.
Jan
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 8:52 ` Jan Beulich
2015-05-04 9:14 ` Andrew Cooper
@ 2015-05-04 10:39 ` Chen, Tiejun
2015-05-04 10:43 ` Jan Beulich
2015-05-04 15:22 ` Konrad Rzeszutek Wilk
1 sibling, 2 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04 10:39 UTC (permalink / raw)
To: Jan Beulich
Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang
On 2015/5/4 16:52, Jan Beulich wrote:
>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>
>> void cacheline_flush(char * addr)
>> {
>> + mb();
>> clflush(addr);
>> + mb();
>> }
>
> I think the purpose of the flush is to force write back, not to evict
> the cache line, and if so wmb() would appear to be sufficient. As
> the SDM says that's not the case, a comment explaining why wmb()
> is not sufficient would seem necessary. Plus in the description I
Seems wmb() is not sufficient here.
"CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
to be ordered by any other fencing, serializing or other CLFLUSH
instruction."
Thanks
Tiejun
> think "serializing" needs to be changed to "fencing", as serialization
> is not what we really care about here. If you and the maintainers
> agree, I could certainly fix up both aspects while committing.
>
> Jan
>
>
>
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 10:39 ` Chen, Tiejun
@ 2015-05-04 10:43 ` Jan Beulich
2015-05-04 10:52 ` Tian, Kevin
2015-05-05 2:45 ` Chen, Tiejun
2015-05-04 15:22 ` Konrad Rzeszutek Wilk
1 sibling, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-04 10:43 UTC (permalink / raw)
To: Tiejun Chen
Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang
>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
> On 2015/5/4 16:52, Jan Beulich wrote:
>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>> void cacheline_flush(char * addr)
>>> {
>>> + mb();
>>> clflush(addr);
>>> + mb();
>>> }
>>
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>
> Seems wmb() is not sufficient here.
>
> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
> to be ordered by any other fencing, serializing or other CLFLUSH
> instruction."
Right - that's what I said in the second sentence.
Jan
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 10:43 ` Jan Beulich
@ 2015-05-04 10:52 ` Tian, Kevin
2015-05-04 11:26 ` Chen, Tiejun
2015-05-05 2:45 ` Chen, Tiejun
1 sibling, 1 reply; 33+ messages in thread
From: Tian, Kevin @ 2015-05-04 10:52 UTC (permalink / raw)
To: Jan Beulich, Chen, Tiejun
Cc: Zhang, Yang Z, jinsong.liu, xen-devel, keir, andrew.cooper3
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, May 04, 2015 6:44 PM
>
> >>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
> > On 2015/5/4 16:52, Jan Beulich wrote:
> >>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> >>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >>>
> >>> void cacheline_flush(char * addr)
> >>> {
> >>> + mb();
> >>> clflush(addr);
> >>> + mb();
> >>> }
> >>
> >> I think the purpose of the flush is to force write back, not to evict
> >> the cache line, and if so wmb() would appear to be sufficient. As
> >> the SDM says that's not the case, a comment explaining why wmb()
> >> is not sufficient would seem necessary. Plus in the description I
> >
> > Seems wmb() is not sufficient here.
> >
> > "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
> > to be ordered by any other fencing, serializing or other CLFLUSH
> > instruction."
>
> Right - that's what I said in the second sentence.
>
btw why do we need two fences here? Suppose we just care about
writes before the flush point...
Thanks
Kevin
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
2015-05-04 8:57 ` Jan Beulich
@ 2015-05-04 11:21 ` Chen, Tiejun
0 siblings, 0 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04 11:21 UTC (permalink / raw)
To: Jan Beulich
Cc: Kevin Tian, keir, jinsong.liu, xen-devel, andrew.cooper3, Yang Z Zhang
On 2015/5/4 16:57, Jan Beulich wrote:
>>>> On 04.05.15 at 07:08, <tiejun.chen@intel.com> wrote:
>> At first I doubted this is issued by some improper cache behaviors.
>> Because as you see, "root_entry[0] = 80f5001" indicates we already set
>> that present bit. But Caching Mode bit is zero in BDW so this means
>> remapping hardware doesn't own contest cache. And I also check xen
>> always calls clflush to write back memory on CPU side. (Even I also
>> tried to use wbinvd to replace clflush). And plus, you can see this
>> guest fault address is a weird zero, so I turned to concern if this is
>> just an unexpected fault which is pending because of some BIOS cleanup,
>> or undefined hardware behaviors. Here I mean clear_fault_bits() is
>> already not enough to clear this kind of pending interrupt before we
>> complete all VT-D initializations.
>
> This last aspect in particular needs more explanation, but I'm
> wary of the change in general (see Yang's response).
>
Sure.
As we discussed offline, Yang thought this is at most a way to work around
our problem. So I will ping the BIOS team to double-check whether this
issue is specific to certain BIOSes or platforms, since it doesn't match
the current specification.
Thanks
Tiejun
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 10:52 ` Tian, Kevin
@ 2015-05-04 11:26 ` Chen, Tiejun
2015-05-05 1:13 ` Tian, Kevin
0 siblings, 1 reply; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04 11:26 UTC (permalink / raw)
To: Tian, Kevin, Jan Beulich
Cc: Zhang, Yang Z, jinsong.liu, xen-devel, keir, andrew.cooper3
On 2015/5/4 18:52, Tian, Kevin wrote:
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, May 04, 2015 6:44 PM
>>
>>>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
>>> On 2015/5/4 16:52, Jan Beulich wrote:
>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>
>>>>> void cacheline_flush(char * addr)
>>>>> {
>>>>> + mb();
>>>>> clflush(addr);
>>>>> + mb();
>>>>> }
>>>>
>>>> I think the purpose of the flush is to force write back, not to evict
>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>> is not sufficient would seem necessary. Plus in the description I
>>>
>>> Seems wmb() is not sufficient here.
>>>
>>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
>>> to be ordered by any other fencing, serializing or other CLFLUSH
>>> instruction."
>>
>> Right - that's what I said in the second sentence.
>>
>
> btw why do we need two fences here? Suppose we just care about
> writes before the flush point...
>
The first MFENCE guarantees all loads/stores are visible before the cache
line is flushed. But the second MFENCE just makes sure CLFLUSH is not
reordered with the ensuing loads/stores, right?
Thanks
Tiejun
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 10:39 ` Chen, Tiejun
2015-05-04 10:43 ` Jan Beulich
@ 2015-05-04 15:22 ` Konrad Rzeszutek Wilk
2015-05-04 15:33 ` Andrew Cooper
1 sibling, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-05-04 15:22 UTC (permalink / raw)
To: Chen, Tiejun, boris.ostrovsky
Cc: kevin.tian, keir, jinsong.liu, xen-devel, Jan Beulich,
andrew.cooper3, yang.z.zhang
On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
> On 2015/5/4 16:52, Jan Beulich wrote:
> >>>>On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> >>--- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >>+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >>@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >>
> >> void cacheline_flush(char * addr)
> >> {
> >>+ mb();
> >> clflush(addr);
> >>+ mb();
> >> }
> >
> >I think the purpose of the flush is to force write back, not to evict
> >the cache line, and if so wmb() would appear to be sufficient. As
> >the SDM says that's not the case, a comment explaining why wmb()
> >is not sufficient would seem necessary. Plus in the description I
>
> Seems wmb() is not sufficient here.
>
> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
> be ordered by any other fencing, serializing or other CLFLUSH instruction."
That is incorrect. We have observed that CLFLUSH instructions do serialize each
other. That is, if on a core you issue a bunch of CLFLUSHes, it stalls the pipeline.
Cc-ing Boris who discovered this.
>
> Thanks
> Tiejun
>
> >think "serializing" needs to be changed to "fencing", as serialization
> >is not what we really care about here. If you and the maintainers
> >agree, I could certainly fix up both aspects while committing.
> >
> >Jan
> >
> >
> >
>
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 15:22 ` Konrad Rzeszutek Wilk
@ 2015-05-04 15:33 ` Andrew Cooper
0 siblings, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2015-05-04 15:33 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk, Chen, Tiejun, boris.ostrovsky
Cc: kevin.tian, keir, jinsong.liu, xen-devel, Jan Beulich, yang.z.zhang
On 04/05/2015 16:22, Konrad Rzeszutek Wilk wrote:
> On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>> void cacheline_flush(char * addr)
>>>> {
>>>> + mb();
>>>> clflush(addr);
>>>> + mb();
>>>> }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
>> be ordered by any other fencing, serializing or other CLFLUSH instruction."
> That is incorrect. We have observed that CLFLUSH instruction do serialize each
> other. That is if on a core you send a bunch of CLFLUSH it stalls the pipeline.
>
> Cc-ing Boris who discovered this.
Simply stalling the pipeline says nothing about its ordering with
respect to other instructions.
It is plausible that certain processor pipelines employ stricter
restrictions on CLFLUSH, but Xen must not assume that this is the case
in general, especially as it is in direct contradiction to both the
Intel and AMD instruction manuals.
~Andrew
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 11:26 ` Chen, Tiejun
@ 2015-05-05 1:13 ` Tian, Kevin
0 siblings, 0 replies; 33+ messages in thread
From: Tian, Kevin @ 2015-05-05 1:13 UTC (permalink / raw)
To: Chen, Tiejun, Jan Beulich
Cc: Zhang, Yang Z, jinsong.liu, xen-devel, keir, andrew.cooper3
> From: Chen, Tiejun
> Sent: Monday, May 04, 2015 7:26 PM
>
> On 2015/5/4 18:52, Tian, Kevin wrote:
> >> From: Jan Beulich [mailto:JBeulich@suse.com]
> >> Sent: Monday, May 04, 2015 6:44 PM
> >>
> >>>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
> >>> On 2015/5/4 16:52, Jan Beulich wrote:
> >>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> >>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >>>>>
> >>>>> void cacheline_flush(char * addr)
> >>>>> {
> >>>>> + mb();
> >>>>> clflush(addr);
> >>>>> + mb();
> >>>>> }
> >>>>
> >>>> I think the purpose of the flush is to force write back, not to evict
> >>>> the cache line, and if so wmb() would appear to be sufficient. As
> >>>> the SDM says that's not the case, a comment explaining why wmb()
> >>>> is not sufficient would seem necessary. Plus in the description I
> >>>
> >>> Seems wmb() is not sufficient here.
> >>>
> >>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
> >>> to be ordered by any other fencing, serializing or other CLFLUSH
> >>> instruction."
> >>
> >> Right - that's what I said in the second sentence.
> >>
> >
> > btw why do we need two fences here? Suppose we just care about
> > writes before the flush point...
> >
>
> The first MFENCE guarantees all load/store visible before flush cache
> line. But the second MFENCE just makes sure CLFLUSH is not ordered by
> that ensuing load/store, right?
>
It's not a usual case to have another store to the same destination address
right after CLFLUSH. Usually there's some protocol that defers the next
update until the device completes its access. A store to a different
address is not a concern.
But yes, looking at this interface alone, we'd better not make
assumptions about the caller, so two mb()s look necessary...
btw, as Jan commented in another thread, if there's no other caller of
this function, it might be more reasonable to fold it into the bigger
loop in __iommu_flush_cache so you don't need a fence for each
cache-line flush.
Thanks
Kevin
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 9:24 ` Jan Beulich
@ 2015-05-05 1:13 ` Zhang, Yang Z
0 siblings, 0 replies; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-05 1:13 UTC (permalink / raw)
To: Jan Beulich, Andrew Cooper, Chen, Tiejun
Cc: jinsong.liu, Tian, Kevin, keir, xen-devel
Jan Beulich wrote on 2015-05-04:
>>>> On 04.05.15 at 11:14, <andrew.cooper3@citrix.com> wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>> void cacheline_flush(char * addr) {
>>>> + mb();
>>>> clflush(addr);
>>>> + mb();
>>>> }
>>> I think the purpose of the flush is to force write back, not to
>>> evict the cache line, and if so wmb() would appear to be
>>> sufficient. As the SDM says that's not the case, a comment
>>> explaining why wmb() is not sufficient would seem necessary. Plus
>>> in the description I think "serializing" needs to be changed to
>>> "fencing", as serialization is not what we really care about here.
>>> If you and the maintainers agree, I could certainly fix up both aspects while committing.
>>
>> On the subject of writebacks, we should get around to alternating-up
>> the use of clflushopt and clwb, either of which would be better than
>> a clflush in this case (avoiding the need for the leading mfence).
>
> Plus the barrier would perhaps rather sit around the loop invoking
> cacheline_flush() in __iommu_flush_cache(), and I wonder whether VT-d
Agree. It's better to batch the flush operations, like:
@@ -167,11 +167,15 @@ static void __iommu_flush_cache(void *addr, unsigned int size)
     if ( !iommus_incoherent )
         return;
+    mb();
+
     if ( clflush_size == 0 )
         clflush_size = get_cache_line_size();
     for ( i = 0; i < size; i += clflush_size )
         cacheline_flush((char *)addr + i);
+
+    mb();
 }
> code shouldn't use available flushing code elsewhere in the system,
> and whether that code then wouldn't need barriers added (or use
> clflushopt/clwb as you
> suggest) instead.
>
> Jan
Best regards,
Yang
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 10:43 ` Jan Beulich
2015-05-04 10:52 ` Tian, Kevin
@ 2015-05-05 2:45 ` Chen, Tiejun
2015-05-05 9:24 ` Jan Beulich
1 sibling, 1 reply; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-05 2:45 UTC (permalink / raw)
To: Jan Beulich, andrew.cooper3, kevin.tian, yang.z.zhang,
Konrad Rzeszutek Wilk, boris.ostrovsky
Cc: jinsong.liu, keir, xen-devel
On 2015/5/4 18:43, Jan Beulich wrote:
>>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>> void cacheline_flush(char * addr)
>>>> {
>>>> + mb();
>>>> clflush(addr);
>>>> + mb();
>>>> }
>>>
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
>> to be ordered by any other fencing, serializing or other CLFLUSH
>> instruction."
>
> Right - that's what I said in the second sentence.
>
Thanks for everyone's comments.
Does this work for everyone?
xen/vt-d: need barriers to workaround CLFLUSH
clflush is a lightweight but non-fencing instruction, hence a memory
fence is necessary to make sure all loads/stores are visible before the
cache line is flushed.
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c7bda73..1248a17 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -170,8 +170,15 @@ static void __iommu_flush_cache(void *addr, unsigned int size)
     if ( clflush_size == 0 )
         clflush_size = get_cache_line_size();
+    /*
+     * CLFLUSH is only ordered by the MFENCE instruction.
+     * It is not guaranteed to be ordered by any other fencing,
+     * serializing or other CLFLUSH instruction.
+     */
+    mb();
     for ( i = 0; i < size; i += clflush_size )
         cacheline_flush((char *)addr + i);
+    mb();
 }
void iommu_flush_cache_entry(void *addr, unsigned int size)
Thanks
Tiejun
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-05 2:45 ` Chen, Tiejun
@ 2015-05-05 9:24 ` Jan Beulich
2015-05-06 6:47 ` Chen, Tiejun
0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-05 9:24 UTC (permalink / raw)
To: Tiejun Chen
Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3,
yang.z.zhang, boris.ostrovsky
>>> On 05.05.15 at 04:45, <tiejun.chen@intel.com> wrote:
> Does this work for everyone?
Please first of all explain why the interfaces in asm/flushtlb.h can't
be used here (at least when flushing entire pages). Because - as
said before - for a complete fix you'd need to deal with the CLFLUSH
use(s) elsewhere in the system too.
Btw, isn't the !iommus_incoherent check in that function inverted?
I.e. why would we _not_ need to flush caches when IOMMUs
are not coherent (and why would flushing be needed when they're
coherent anyway)?
Jan
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-04 9:14 ` Andrew Cooper
2015-05-04 9:24 ` Jan Beulich
@ 2015-05-05 15:46 ` Boris Ostrovsky
2015-05-05 15:58 ` Jan Beulich
2015-05-06 7:09 ` Chen, Tiejun
1 sibling, 2 replies; 33+ messages in thread
From: Boris Ostrovsky @ 2015-05-05 15:46 UTC (permalink / raw)
To: Andrew Cooper, Jan Beulich, Tiejun Chen
Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel
On 05/04/2015 05:14 AM, Andrew Cooper wrote:
> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>> void cacheline_flush(char * addr)
>>> {
>>> + mb();
>>> clflush(addr);
>>> + mb();
>>> }
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>> think "serializing" needs to be changed to "fencing", as serialization
>> is not what we really care about here. If you and the maintainers
>> agree, I could certainly fix up both aspects while committing.
> On the subject of writebacks, we should get around to alternating-up the
> use of clflushopt and clwb, either of which would be better than a
> clflush in this case (avoiding the need for the leading mfence).
>
> However, the ISA extension document does not indicate which processors
> will have support for these new instructions.
https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
We really should add support for this. On shutting down a very large
guest (hundreds of GB) we observed *minutes* spent flushing the IOMMU
mappings. This was due to the serializing nature of CLFLUSH.
-boris
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-05 15:46 ` Boris Ostrovsky
@ 2015-05-05 15:58 ` Jan Beulich
2015-05-05 16:11 ` Boris Ostrovsky
2015-05-06 7:09 ` Chen, Tiejun
1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-05 15:58 UTC (permalink / raw)
To: Boris Ostrovsky
Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
yang.z.zhang, Tiejun Chen
>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>> void cacheline_flush(char * addr)
>>>> {
>>>> + mb();
>>>> clflush(addr);
>>>> + mb();
>>>> }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>> think "serializing" needs to be changed to "fencing", as serialization
>>> is not what we really care about here. If you and the maintainers
>>> agree, I could certainly fix up both aspects while committing.
>> On the subject of writebacks, we should get around to alternating-up the
>> use of clflushopt and clwb, either of which would be better than a
>> clflush in this case (avoiding the need for the leading mfence).
>>
>> However, the ISA extension document does not indicate which processors
>> will have support for these new instructions.
>
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>
> We really should add support for this. On shutting down a very large
> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
> This was due to serializing nature of CLFLUSH.
But flushing the IOMMU isn't being done via CPU instructions, but
rather via commands sent to the IOMMU. I.e. I'm somewhat
confused by your reply.
Jan
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-05 15:58 ` Jan Beulich
@ 2015-05-05 16:11 ` Boris Ostrovsky
2015-05-06 7:12 ` Jan Beulich
0 siblings, 1 reply; 33+ messages in thread
From: Boris Ostrovsky @ 2015-05-05 16:11 UTC (permalink / raw)
To: Jan Beulich
Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
yang.z.zhang, Tiejun Chen
On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>
>>>>> void cacheline_flush(char * addr)
>>>>> {
>>>>> + mb();
>>>>> clflush(addr);
>>>>> + mb();
>>>>> }
>>>> I think the purpose of the flush is to force write back, not to evict
>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>> is not sufficient would seem necessary. Plus in the description I
>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>> is not what we really care about here. If you and the maintainers
>>>> agree, I could certainly fix up both aspects while committing.
>>> On the subject of writebacks, we should get around to alternating-up the
>>> use of clflushopt and clwb, either of which would be better than a
>>> clflush in this case (avoiding the need for the leading mfence).
>>>
>>> However, the ISA extension document does not indicate which processors
>>> will have support for these new instructions.
>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>
>> We really should add support for this. On shutting down a very large
>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>> This was due to serializing nature of CLFLUSH.
> But flushing the IOMMU isn't being done via CPU instructions, but
> rather via commands sent to the IOMMU. I.e. I'm somewhat
> confused by your reply.
I didn't mean flushing the IOMMU itself, sorry. I meant
__iommu_flush_cache() (or whatever its equivalent was in the
product, which was 4.1-based).
-boris
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-05 9:24 ` Jan Beulich
@ 2015-05-06 6:47 ` Chen, Tiejun
2015-05-06 7:14 ` Jan Beulich
0 siblings, 1 reply; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-06 6:47 UTC (permalink / raw)
To: Jan Beulich
Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3,
yang.z.zhang, boris.ostrovsky
On 2015/5/5 17:24, Jan Beulich wrote:
>>>> On 05.05.15 at 04:45, <tiejun.chen@intel.com> wrote:
>> Does this work for everyone?
>
> Please first of all explain why the interfaces in asm/flushtlb.h can't
> be used here (at least when flushing entire pages). Because - as
I also don't see why we didn't use this previously on the
IOMMU side...
> said before - for a complete fix you'd need to deal with the CLFLUSH
> use(s) elsewhere in the system too.
Looks like we need to do this in xen/arch/x86/flushtlb.c:flush_area_local().
>
> Btw, isn't the !iommus_incoherent check in that function inverted?
> I.e. why would we _not_ need to flush caches when IOMMUs
> are not coherent (and why would flushing be needed when they're
> coherent anyway)?
>
I guess you're misreading this:

if ( !ecap_coherent(iommu->ecap) )
    iommus_incoherent = 1;

So here !iommus_incoherent means the IOMMU is coherent, and then we don't
need to flush the cache in that case.
Thanks
Tiejun
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-05 15:46 ` Boris Ostrovsky
2015-05-05 15:58 ` Jan Beulich
@ 2015-05-06 7:09 ` Chen, Tiejun
1 sibling, 0 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-06 7:09 UTC (permalink / raw)
To: Boris Ostrovsky, Andrew Cooper, Jan Beulich
Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel
On 2015/5/5 23:46, Boris Ostrovsky wrote:
> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>> void cacheline_flush(char * addr)
>>>> {
>>>> + mb();
>>>> clflush(addr);
>>>> + mb();
>>>> }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>>> think "serializing" needs to be changed to "fencing", as serialization
>>> is not what we really care about here. If you and the maintainers
>>> agree, I could certainly fix up both aspects while committing.
>> On the subject of writebacks, we should get around to alternating-up the
>> use of clflushopt and clwb, either of which would be better than a
>> clflush in this case (avoiding the need for the leading mfence).
>>
>> However, the ISA extension document does not indicate which processors
>> will have support for these new instructions.
>
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>
> We really should add support for this. On shutting down a very large
> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
> This was due to serializing nature of CLFLUSH.
>
CLFLUSHOPT/CLWB instructions are ordered only by store-fencing
operations, so at the very least we would still need an SFENCE.
Thanks
Tiejun
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-05 16:11 ` Boris Ostrovsky
@ 2015-05-06 7:12 ` Jan Beulich
2015-05-06 7:26 ` Chen, Tiejun
2015-05-06 15:11 ` Boris Ostrovsky
0 siblings, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-06 7:12 UTC (permalink / raw)
To: Boris Ostrovsky
Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
yang.z.zhang, Tiejun Chen
>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote:
> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>
>>>>>> void cacheline_flush(char * addr)
>>>>>> {
>>>>>> + mb();
>>>>>> clflush(addr);
>>>>>> + mb();
>>>>>> }
>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>> is not what we really care about here. If you and the maintainers
>>>>> agree, I could certainly fix up both aspects while committing.
>>>> On the subject of writebacks, we should get around to alternating-up the
>>>> use of clflushopt and clwb, either of which would be better than a
>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>
>>>> However, the ISA extension document does not indicate which processors
>>>> will have support for these new instructions.
>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>
>>> We really should add support for this. On shutting down a very large
>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU.
>>> This was due to serializing nature of CLFLUSH.
>> But flushing the IOMMU isn't being done via CPU instructions, but
>> rather via commands sent to the IOMMU. I.e. I'm somewhat
>> confused by your reply.
>
> I didn't mean flushing IOMMU itself, sorry. I meant
> __iommu_flush_cache() (or whatever it's equivalent we had in the
> product, which was 4.1-based).
In that case I wonder how much of that flushing is really necessary
during IOMMU teardown. VT-d maintainers?
Jan
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-06 6:47 ` Chen, Tiejun
@ 2015-05-06 7:14 ` Jan Beulich
0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-06 7:14 UTC (permalink / raw)
To: Tiejun Chen
Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3,
yang.z.zhang, boris.ostrovsky
>>> On 06.05.15 at 08:47, <tiejun.chen@intel.com> wrote:
> On 2015/5/5 17:24, Jan Beulich wrote:
>> Btw, isn't the !iommus_incoherent check in that function inverted?
>> I.e. why would we _not_ need to flush caches when IOMMUs
>> are not coherent (and why would flushing be needed when they're
>> coherent anyway)?
>>
>
> I guess you're misunderstanding this
>
> if ( !ecap_coherent(iommu->ecap) )
> iommus_incoherent = 1;
>
> So here !iommus_incoherent means IOMMU is coherent and then we don't
> need to flush cache in this case.
Ah, indeed - I overlooked the "in" in the variable name (together with
the thus resulting double negation in the expression). Sorry for the
noise.
Jan
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-06 7:12 ` Jan Beulich
@ 2015-05-06 7:26 ` Chen, Tiejun
2015-05-06 7:33 ` Jan Beulich
2015-05-06 15:11 ` Boris Ostrovsky
1 sibling, 1 reply; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-06 7:26 UTC (permalink / raw)
To: Jan Beulich, Boris Ostrovsky
Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper, yang.z.zhang
On 2015/5/6 15:12, Jan Beulich wrote:
>>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote:
>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>>
>>>>>>> void cacheline_flush(char * addr)
>>>>>>> {
>>>>>>> + mb();
>>>>>>> clflush(addr);
>>>>>>> + mb();
>>>>>>> }
>>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>>> is not what we really care about here. If you and the maintainers
>>>>>> agree, I could certainly fix up both aspects while committing.
>>>>> On the subject of writebacks, we should get around to alternating-up the
>>>>> use of clflushopt and clwb, either of which would be better than a
>>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>>
>>>>> However, the ISA extension document does not indicate which processors
>>>>> will have support for these new instructions.
>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>>
>>>> We really should add support for this. On shutting down a very large
>>>> guest (hundreds of GB) we observed *minutes* spent flushing the IOMMU.
>>>> This was due to the serializing nature of CLFLUSH.
>>> But flushing the IOMMU isn't being done via CPU instructions, but
>>> rather via commands sent to the IOMMU. I.e. I'm somewhat
>>> confused by your reply.
>>
>> I didn't mean flushing the IOMMU itself, sorry. I meant
>> __iommu_flush_cache() (or whatever its equivalent we had in the
>> product, which was 4.1-based).
>
> In that case I wonder how much of that flushing is really necessary
Sorry, what is that case?
> during IOMMU teardown. VT-d maintainers?
>
In most cases __iommu_flush_cache() is used to flush the remapping
structures out to memory so that the IOMMU can see up-to-date data.
Thanks
Tiejun
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-06 7:26 ` Chen, Tiejun
@ 2015-05-06 7:33 ` Jan Beulich
0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-06 7:33 UTC (permalink / raw)
To: Tiejun Chen
Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
yang.z.zhang, Boris Ostrovsky
>>> On 06.05.15 at 09:26, <tiejun.chen@intel.com> wrote:
> On 2015/5/6 15:12, Jan Beulich wrote:
>>>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote:
>>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>>>
>>>>>>>> void cacheline_flush(char * addr)
>>>>>>>> {
>>>>>>>> + mb();
>>>>>>>> clflush(addr);
>>>>>>>> + mb();
>>>>>>>> }
>>>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>>>> is not what we really care about here. If you and the maintainers
>>>>>>> agree, I could certainly fix up both aspects while committing.
>>>>>> On the subject of writebacks, we should get around to alternating-up the
>>>>>> use of clflushopt and clwb, either of which would be better than a
>>>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>>>
>>>>>> However, the ISA extension document does not indicate which processors
>>>>>> will have support for these new instructions.
>>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>>>
>>>>> We really should add support for this. On shutting down a very large
>>>>> guest (hundreds of GB) we observed *minutes* spent flushing the IOMMU.
>>>>> This was due to the serializing nature of CLFLUSH.
>>>> But flushing the IOMMU isn't being done via CPU instructions, but
>>>> rather via commands sent to the IOMMU. I.e. I'm somewhat
>>>> confused by your reply.
>>>
>>> I didn't mean flushing the IOMMU itself, sorry. I meant
>>> __iommu_flush_cache() (or whatever its equivalent we had in the
>>> product, which was 4.1-based).
>>
>> In that case I wonder how much of that flushing is really necessary
>
> Sorry, what is that case?
The flushing of CPU-side caches during IOMMU teardown for a guest.
>> during IOMMU teardown. VT-d maintainers?
>>
>
> In most cases __iommu_flush_cache() is used to flush the remapping
> structures out to memory so that the IOMMU can see up-to-date data.
Right, but here we're talking about teardown. Since the IOMMU isn't
going to use any of the dying guest's mappings anymore, there's little
point in flushing the respective changes from caches (or at least the
flushing could be limited to the ripping out of the top-level
structures, provided these get zapped before anything hanging off of
them).
Jan
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
2015-05-06 7:12 ` Jan Beulich
2015-05-06 7:26 ` Chen, Tiejun
@ 2015-05-06 15:11 ` Boris Ostrovsky
1 sibling, 0 replies; 33+ messages in thread
From: Boris Ostrovsky @ 2015-05-06 15:11 UTC (permalink / raw)
To: Jan Beulich
Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper,
yang.z.zhang, Tiejun Chen
On 05/06/2015 03:12 AM, Jan Beulich wrote:
>>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote:
>> On 05/05/2015 11:58 AM, Jan Beulich wrote:
>>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote:
>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote:
>>>>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>>>>
>>>>>>> void cacheline_flush(char * addr)
>>>>>>> {
>>>>>>> + mb();
>>>>>>> clflush(addr);
>>>>>>> + mb();
>>>>>>> }
>>>>>> I think the purpose of the flush is to force write back, not to evict
>>>>>> the cache line, and if so wmb() would appear to be sufficient. As
>>>>>> the SDM says that's not the case, a comment explaining why wmb()
>>>>>> is not sufficient would seem necessary. Plus in the description I
>>>>>> think "serializing" needs to be changed to "fencing", as serialization
>>>>>> is not what we really care about here. If you and the maintainers
>>>>>> agree, I could certainly fix up both aspects while committing.
>>>>> On the subject of writebacks, we should get around to alternating-up the
>>>>> use of clflushopt and clwb, either of which would be better than a
>>>>> clflush in this case (avoiding the need for the leading mfence).
>>>>>
>>>>> However, the ISA extension document does not indicate which processors
>>>>> will have support for these new instructions.
>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
>>>>
>>>> We really should add support for this. On shutting down a very large
>>>> guest (hundreds of GB) we observed *minutes* spent flushing the IOMMU.
>>>> This was due to the serializing nature of CLFLUSH.
>>> But flushing the IOMMU isn't being done via CPU instructions, but
>>> rather via commands sent to the IOMMU. I.e. I'm somewhat
>>> confused by your reply.
>> I didn't mean flushing the IOMMU itself, sorry. I meant
>> __iommu_flush_cache() (or whatever its equivalent we had in the
>> product, which was 4.1-based).
> In that case I wonder how much of that flushing is really necessary
> during IOMMU teardown. VT-d maintainers?
I should mention that we saw this on 4.1, where IOMMU teardown is done
in the domain destruction code path. After we backported the code that
moves this into a tasklet, things got much better.
Of course we are still doing the flushing, even if in the background,
and therefore system resources are still being consumed (and the memory
is not available until the flushing is done). So if this is unnecessary,
there are good reasons not to do it.
-boris
^ permalink raw reply [flat|nested] 33+ messages in thread
end of thread, other threads:[~2015-05-06 15:11 UTC | newest]
Thread overview: 33+ messages
2015-05-04 2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
2015-05-04 2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
2015-05-04 4:07 ` Zhang, Yang Z
2015-05-04 5:08 ` Chen, Tiejun
2015-05-04 6:41 ` Zhang, Yang Z
2015-05-04 8:57 ` Jan Beulich
2015-05-04 11:21 ` Chen, Tiejun
2015-05-04 2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
2015-05-04 9:00 ` Jan Beulich
2015-05-04 4:07 ` [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Zhang, Yang Z
2015-05-04 8:52 ` Jan Beulich
2015-05-04 9:14 ` Andrew Cooper
2015-05-04 9:24 ` Jan Beulich
2015-05-05 1:13 ` Zhang, Yang Z
2015-05-05 15:46 ` Boris Ostrovsky
2015-05-05 15:58 ` Jan Beulich
2015-05-05 16:11 ` Boris Ostrovsky
2015-05-06 7:12 ` Jan Beulich
2015-05-06 7:26 ` Chen, Tiejun
2015-05-06 7:33 ` Jan Beulich
2015-05-06 15:11 ` Boris Ostrovsky
2015-05-06 7:09 ` Chen, Tiejun
2015-05-04 10:39 ` Chen, Tiejun
2015-05-04 10:43 ` Jan Beulich
2015-05-04 10:52 ` Tian, Kevin
2015-05-04 11:26 ` Chen, Tiejun
2015-05-05 1:13 ` Tian, Kevin
2015-05-05 2:45 ` Chen, Tiejun
2015-05-05 9:24 ` Jan Beulich
2015-05-06 6:47 ` Chen, Tiejun
2015-05-06 7:14 ` Jan Beulich
2015-05-04 15:22 ` Konrad Rzeszutek Wilk
2015-05-04 15:33 ` Andrew Cooper