* [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
@ 2015-05-04  2:16 Tiejun Chen
  2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
  ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Tiejun Chen @ 2015-05-04  2:16 UTC (permalink / raw)
  To: yang.z.zhang, kevin.tian, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

clflush is a lightweight but non-serializing instruction; hence a
memory fence is necessary to make sure all loads/stores are visible
before flushing the cache line.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/x86/vtd.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
index 109234e..fd2ff04 100644
--- a/xen/drivers/passthrough/vtd/x86/vtd.c
+++ b/xen/drivers/passthrough/vtd/x86/vtd.c
@@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
 
 void cacheline_flush(char * addr)
 {
+    mb();
     clflush(addr);
+    mb();
 }
 
 void flush_all_cache()
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread
* [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
@ 2015-05-04  2:16 ` Tiejun Chen
  2015-05-04  4:07   ` Zhang, Yang Z
  2015-05-04  2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Tiejun Chen @ 2015-05-04  2:16 UTC (permalink / raw)
  To: yang.z.zhang, kevin.tian, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

While initializing VT-d we should mask interrupt message generation
to avoid receiving any interrupt as pending before enabling DMA
translation, and also mask it before disabling the DMA engine.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index c7bda73..72cd854 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1000,15 +1000,21 @@ static void dma_msi_unmask(struct irq_desc *desc)
     iommu->msi.msi_attrib.masked = 0;
 }
 
-static void dma_msi_mask(struct irq_desc *desc)
+static void mask_dma_interrupt(struct iommu *iommu)
 {
     unsigned long flags;
-    struct iommu *iommu = desc->action->dev_id;
 
-    /* mask it */
     spin_lock_irqsave(&iommu->register_lock, flags);
     dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
     spin_unlock_irqrestore(&iommu->register_lock, flags);
+}
+
+static void dma_msi_mask(struct irq_desc *desc)
+{
+    struct iommu *iommu = desc->action->dev_id;
+
+    /* mask it */
+    mask_dma_interrupt(iommu);
     iommu->msi.msi_attrib.masked = 1;
 }
 
@@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
     struct iommu *iommu;
     struct iommu_flush *flush = NULL;
     int ret;
-    unsigned long flags;
 
     /*
      * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
@@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
 
         iommu = drhd->iommu;
 
-        clear_fault_bits(iommu);
+        /*
+         * We shouldn't receive any VT-d interrupt while initializing
+         * VT-d so just mask interrupt message generation.
+         */
+        mask_dma_interrupt(iommu);
 
-        spin_lock_irqsave(&iommu->register_lock, flags);
-        dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
-        spin_unlock_irqrestore(&iommu->register_lock, flags);
+        /*
+         * And then clear all previous faults.
+         */
+        clear_fault_bits(iommu);
     }
 
     /*
@@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
         if ( force_iommu )
             continue;
 
+        /*
+         * Mask interrupt message generation.
+         */
+        mask_dma_interrupt(iommu);
+
        iommu_disable_translation(iommu);
 
        /* If interrupt remapping is enabled, queued invalidation
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
@ 2015-05-04  4:07   ` Zhang, Yang Z
  2015-05-04  5:08     ` Chen, Tiejun
  0 siblings, 1 reply; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-04  4:07 UTC (permalink / raw)
  To: Chen, Tiejun, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

Chen, Tiejun wrote on 2015-05-04:
> While initializing VT-d we should mask interrupt message generation
> to avoid receiving any interrupt as pending before enabling DMA
> translation, and also mask it before disabling the DMA engine.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
> ---
>  xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
> index c7bda73..72cd854 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1000,15 +1000,21 @@ static void dma_msi_unmask(struct irq_desc *desc)
>      iommu->msi.msi_attrib.masked = 0;
>  }
> 
> -static void dma_msi_mask(struct irq_desc *desc)
> +static void mask_dma_interrupt(struct iommu *iommu)
>  {
>      unsigned long flags;
> -    struct iommu *iommu = desc->action->dev_id;
> 
> -    /* mask it */
>      spin_lock_irqsave(&iommu->register_lock, flags);
>      dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>      spin_unlock_irqrestore(&iommu->register_lock, flags);
> +}
> +
> +static void dma_msi_mask(struct irq_desc *desc)
> +{
> +    struct iommu *iommu = desc->action->dev_id;
> +
> +    /* mask it */
> +    mask_dma_interrupt(iommu);
>      iommu->msi.msi_attrib.masked = 1;
>  }
> 
> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>      struct iommu *iommu;
>      struct iommu_flush *flush = NULL;
>      int ret;
> -    unsigned long flags;
> 
>      /*
>       * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
> 
>          iommu = drhd->iommu;
> 
> -        clear_fault_bits(iommu);
> +        /*
> +         * We shouldn't receive any VT-d interrupt while initializing
> +         * VT-d so just mask interrupt message generation.
> +         */
> +        mask_dma_interrupt(iommu);
> 
> -        spin_lock_irqsave(&iommu->register_lock, flags);
> -        dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
> -        spin_unlock_irqrestore(&iommu->register_lock, flags);
> +        /*
> +         * And then clear all previous faults.
> +         */
> +        clear_fault_bits(iommu);
>      }
> 
>      /*
> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>          if ( force_iommu )
>              continue;
> 
> +        /*
> +         * Mask interrupt message generation.
> +         */
> +        mask_dma_interrupt(iommu);
> +
>          iommu_disable_translation(iommu);
> 
>          /* If interrupt remapping is enabled, queued invalidation

Just curious: do you observe a real issue, or did you just catch this
through reading the code?

Best regards,
Yang

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  4:07   ` Zhang, Yang Z
@ 2015-05-04  5:08     ` Chen, Tiejun
  2015-05-04  6:41       ` Zhang, Yang Z
  2015-05-04  8:57       ` Jan Beulich
  0 siblings, 2 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04  5:08 UTC (permalink / raw)
  To: Zhang, Yang Z, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

Yang,

Thanks for your review.

On 2015/5/4 12:07, Zhang, Yang Z wrote:
> Chen, Tiejun wrote on 2015-05-04:
>> While initializing VT-d we should mask interrupt message generation
>> to avoid receiving any interrupt as pending before enabling DMA
>> translation, and also mask it before disabling the DMA engine.
>>
>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>> ---
>>  xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
>>  1 file changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
>> index c7bda73..72cd854 100644
>> --- a/xen/drivers/passthrough/vtd/iommu.c
>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>> @@ -1000,15 +1000,21 @@ static void dma_msi_unmask(struct irq_desc *desc)
>>      iommu->msi.msi_attrib.masked = 0;
>>  }
>>
>> -static void dma_msi_mask(struct irq_desc *desc)
>> +static void mask_dma_interrupt(struct iommu *iommu)
>>  {
>>      unsigned long flags;
>> -    struct iommu *iommu = desc->action->dev_id;
>>
>> -    /* mask it */
>>      spin_lock_irqsave(&iommu->register_lock, flags);
>>      dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>>      spin_unlock_irqrestore(&iommu->register_lock, flags);
>> +}
>> +
>> +static void dma_msi_mask(struct irq_desc *desc)
>> +{
>> +    struct iommu *iommu = desc->action->dev_id;
>> +
>> +    /* mask it */
>> +    mask_dma_interrupt(iommu);
>>      iommu->msi.msi_attrib.masked = 1;
>>  }
>>
>> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>>      struct iommu *iommu;
>>      struct iommu_flush *flush = NULL;
>>      int ret;
>> -    unsigned long flags;
>>
>>      /*
>>       * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
>> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
>>
>>          iommu = drhd->iommu;
>>
>> -        clear_fault_bits(iommu);
>> +        /*
>> +         * We shouldn't receive any VT-d interrupt while initializing
>> +         * VT-d so just mask interrupt message generation.
>> +         */
>> +        mask_dma_interrupt(iommu);
>>
>> -        spin_lock_irqsave(&iommu->register_lock, flags);
>> -        dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
>> -        spin_unlock_irqrestore(&iommu->register_lock, flags);
>> +        /*
>> +         * And then clear all previous faults.
>> +         */
>> +        clear_fault_bits(iommu);
>>      }
>>
>>      /*
>> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>>          if ( force_iommu )
>>              continue;
>>
>> +        /*
>> +         * Mask interrupt message generation.
>> +         */
>> +        mask_dma_interrupt(iommu);
>> +
>>          iommu_disable_translation(iommu);
>>
>>          /* If interrupt remapping is enabled, queued invalidation
>
> Just curious: do you observe a real issue, or did you just catch this
> through reading the code?
>

Yes, we observed this problem on BDW. And actually one HP customer also
had this same issue.
Once we enable IOMMU translation, an IOMMU fault, 'Present bit in root
entry is clear', is triggered like this:

(XEN) [VT-D]iommu.c:731: iommu_enable_translation: iommu->reg = ffff82c000201000
(XEN) [VT-D]iommu.c:875: iommu_fault_status: Fault Overflow
(XEN) [VT-D]iommu.c:877: iommu_fault_status: Primary Pending Fault
(XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:02.0] fault addr 0, iommu reg = ffff82c000201000
(XEN) [VT-D]d32767v0 DMAR: reason 01 - Present bit in root entry is clear
(XEN) print_vtd_entries: iommu ffff83012aa88600 dev 0000:00:02.0 gmfn 0
(XEN)     root_entry = ffff83012aa85000
(XEN)     root_entry[0] = 80f5001
(XEN)     context = ffff8300080f5000
(XEN)     context[10] = 2_8b75001
(XEN)     l4 = ffff830008b75000
(XEN)     l4_index = 0
(XEN)     l4[0] = 8b74003
(XEN)     l3 = ffff830008b74000
(XEN)     l3_index = 0
(XEN)     l3[0] = 8b73003
(XEN)     l2 = ffff830008b73000
(XEN)     l2_index = 0
(XEN)     l2[0] = 8b72003
(XEN)     l1 = ffff830008b72000
(XEN)     l1_index = 0
(XEN)     l1[0] = 3

At first I suspected this was caused by some improper cache behavior,
because as you see, "root_entry[0] = 80f5001" indicates we have already
set the present bit. But the Caching Mode bit is zero on BDW, which
means the remapping hardware doesn't cache not-present context entries.
And I also checked that Xen always calls clflush to write back memory
on the CPU side (I even tried wbinvd in place of clflush). Plus, you
can see the guest fault address is a suspicious zero, so I began to
suspect this is just an unexpected fault left pending by some BIOS
cleanup, or by undefined hardware behavior. What I mean is that
clear_fault_bits() alone is not enough to clear this kind of pending
interrupt before we complete all VT-d initialization.

And anyway, I also think we should disable interrupt generation during
VT-d setup until the VT-d driver enables interrupts explicitly.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  5:08     ` Chen, Tiejun
@ 2015-05-04  6:41       ` Zhang, Yang Z
  0 siblings, 0 replies; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-04  6:41 UTC (permalink / raw)
  To: Chen, Tiejun, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

Chen, Tiejun wrote on 2015-05-04:
> Yang,
>
> Thanks for your review.
>
> On 2015/5/4 12:07, Zhang, Yang Z wrote:
>> Chen, Tiejun wrote on 2015-05-04:
>>> While initializing VT-d we should mask interrupt message generation
>>> to avoid receiving any interrupt as pending before enabling DMA
>>> translation, and also mask it before disabling the DMA engine.
>>>
>>> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
>>> ---
>>>  xen/drivers/passthrough/vtd/iommu.c | 31 +++++++++++++++++++++++--------
>>>  1 file changed, 23 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
>>> index c7bda73..72cd854 100644
>>> --- a/xen/drivers/passthrough/vtd/iommu.c
>>> +++ b/xen/drivers/passthrough/vtd/iommu.c
>>> @@ -1000,15 +1000,21 @@ static void dma_msi_unmask(struct irq_desc *desc)
>>>      iommu->msi.msi_attrib.masked = 0;
>>>  }
>>>
>>> -static void dma_msi_mask(struct irq_desc *desc)
>>> +static void mask_dma_interrupt(struct iommu *iommu)
>>>  {
>>>      unsigned long flags;
>>> -    struct iommu *iommu = desc->action->dev_id;
>>>
>>> -    /* mask it */
>>>      spin_lock_irqsave(&iommu->register_lock, flags);
>>>      dmar_writel(iommu->reg, DMAR_FECTL_REG, DMA_FECTL_IM);
>>>      spin_unlock_irqrestore(&iommu->register_lock, flags);
>>> +}
>>> +
>>> +static void dma_msi_mask(struct irq_desc *desc)
>>> +{
>>> +    struct iommu *iommu = desc->action->dev_id;
>>> +
>>> +    /* mask it */
>>> +    mask_dma_interrupt(iommu);
>>>      iommu->msi.msi_attrib.masked = 1;
>>>  }
>>>
>>> @@ -1997,7 +2003,6 @@ static int init_vtd_hw(void)
>>>      struct iommu *iommu;
>>>      struct iommu_flush *flush = NULL;
>>>      int ret;
>>> -    unsigned long flags;
>>>
>>>      /*
>>>       * Basic VT-d HW init: set VT-d interrupt, clear VT-d faults.
>>> @@ -2008,11 +2013,16 @@ static int init_vtd_hw(void)
>>>
>>>          iommu = drhd->iommu;
>>>
>>> -        clear_fault_bits(iommu);
>>> +        /*
>>> +         * We shouldn't receive any VT-d interrupt while initializing
>>> +         * VT-d so just mask interrupt message generation.
>>> +         */
>>> +        mask_dma_interrupt(iommu);
>>>
>>> -        spin_lock_irqsave(&iommu->register_lock, flags);
>>> -        dmar_writel(iommu->reg, DMAR_FECTL_REG, 0);
>>> -        spin_unlock_irqrestore(&iommu->register_lock, flags);
>>> +        /*
>>> +         * And then clear all previous faults.
>>> +         */
>>> +        clear_fault_bits(iommu);
>>>      }
>>>
>>>      /*
>>> @@ -2350,6 +2360,11 @@ static void vtd_suspend(void)
>>>          if ( force_iommu )
>>>              continue;
>>>
>>> +        /*
>>> +         * Mask interrupt message generation.
>>> +         */
>>> +        mask_dma_interrupt(iommu);
>>> +
>>>          iommu_disable_translation(iommu);
>>>
>>>          /* If interrupt remapping is enabled, queued invalidation
>>
>> Just curious: do you observe a real issue, or did you just catch this
>> through reading the code?
>>
>
> Yes, we observed this problem on BDW. And actually one HP customer also
> had this same issue.
>
> Once we enable IOMMU translation, an IOMMU fault, 'Present bit in root
> entry is clear', is triggered like this:
>
> (XEN) [VT-D]iommu.c:731: iommu_enable_translation: iommu->reg = ffff82c000201000
> (XEN) [VT-D]iommu.c:875: iommu_fault_status: Fault Overflow
> (XEN) [VT-D]iommu.c:877: iommu_fault_status: Primary Pending Fault
> (XEN) [VT-D]DMAR:[DMA Write] Request device [0000:00:02.0] fault addr 0, iommu reg = ffff82c000201000
> (XEN) [VT-D]d32767v0 DMAR: reason 01 - Present bit in root entry is clear
> (XEN) print_vtd_entries: iommu ffff83012aa88600 dev 0000:00:02.0 gmfn 0
> (XEN)     root_entry = ffff83012aa85000
> (XEN)     root_entry[0] = 80f5001
> (XEN)     context = ffff8300080f5000
> (XEN)     context[10] = 2_8b75001
> (XEN)     l4 = ffff830008b75000
> (XEN)     l4_index = 0
> (XEN)     l4[0] = 8b74003
> (XEN)     l3 = ffff830008b74000
> (XEN)     l3_index = 0
> (XEN)     l3[0] = 8b73003
> (XEN)     l2 = ffff830008b73000
> (XEN)     l2_index = 0
> (XEN)     l2[0] = 8b72003
> (XEN)     l1 = ffff830008b72000
> (XEN)     l1_index = 0
> (XEN)     l1[0] = 3
>
> At first I suspected this was caused by some improper cache behavior,
> because as you see, "root_entry[0] = 80f5001" indicates we have already
> set the present bit. But the Caching Mode bit is zero on BDW, which
> means the remapping hardware doesn't cache not-present context entries.
> And I also checked that Xen always calls clflush to write back memory
> on the CPU side (I even tried wbinvd in place of clflush). Plus, you
> can see the guest fault address is a suspicious zero, so I began to
> suspect this is just an unexpected fault left pending by some BIOS
> cleanup, or by undefined hardware behavior. What I mean is that
> clear_fault_bits() alone is not enough to clear this kind of pending
> interrupt before we complete all VT-d initialization.

So this patch helps nothing in this case?

>
> And anyway, I also think we should disable interrupt generation during
> VT-d setup until the VT-d driver enables interrupts explicitly.
The question is why a fault interrupt would be generated during VT-d
setup at all.

>
> Thanks
> Tiejun

Best regards,
Yang

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  5:08     ` Chen, Tiejun
  2015-05-04  6:41       ` Zhang, Yang Z
@ 2015-05-04  8:57       ` Jan Beulich
  2015-05-04 11:21         ` Chen, Tiejun
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-04  8:57 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: Kevin Tian, keir, jinsong.liu, xen-devel, andrew.cooper3, Yang Z Zhang

>>> On 04.05.15 at 07:08, <tiejun.chen@intel.com> wrote:
> At first I suspected this was caused by some improper cache behavior,
> because as you see, "root_entry[0] = 80f5001" indicates we have already
> set the present bit. But the Caching Mode bit is zero on BDW, which
> means the remapping hardware doesn't cache not-present context entries.
> And I also checked that Xen always calls clflush to write back memory
> on the CPU side (I even tried wbinvd in place of clflush). Plus, you
> can see the guest fault address is a suspicious zero, so I began to
> suspect this is just an unexpected fault left pending by some BIOS
> cleanup, or by undefined hardware behavior. What I mean is that
> clear_fault_bits() alone is not enough to clear this kind of pending
> interrupt before we complete all VT-d initialization.

This last aspect in particular needs more explanation, but I'm wary of
the change in general (see Yang's response).

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 2/3] xen/vt-d: mask interrupt message generation
  2015-05-04  8:57       ` Jan Beulich
@ 2015-05-04 11:21         ` Chen, Tiejun
  0 siblings, 0 replies; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-04 11:21 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Kevin Tian, keir, jinsong.liu, xen-devel, andrew.cooper3, Yang Z Zhang

On 2015/5/4 16:57, Jan Beulich wrote:
>>>> On 04.05.15 at 07:08, <tiejun.chen@intel.com> wrote:
>> At first I suspected this was caused by some improper cache behavior,
>> because as you see, "root_entry[0] = 80f5001" indicates we have already
>> set the present bit. But the Caching Mode bit is zero on BDW, which
>> means the remapping hardware doesn't cache not-present context entries.
>> And I also checked that Xen always calls clflush to write back memory
>> on the CPU side (I even tried wbinvd in place of clflush). Plus, you
>> can see the guest fault address is a suspicious zero, so I began to
>> suspect this is just an unexpected fault left pending by some BIOS
>> cleanup, or by undefined hardware behavior. What I mean is that
>> clear_fault_bits() alone is not enough to clear this kind of pending
>> interrupt before we complete all VT-d initialization.
>
> This last aspect in particular needs more explanation, but I'm wary of
> the change in general (see Yang's response).

Sure. As we discussed offline, Yang thought this is at most a way to
work around our problem, so I will ping the BIOS team to double-check
whether this is an issue specific to certain BIOSes or platforms, since
it doesn't match the current specification.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 33+ messages in thread
* [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5
  2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
  2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
@ 2015-05-04  2:16 ` Tiejun Chen
  2015-05-04  9:00   ` Jan Beulich
  2015-05-04  4:07 ` [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Zhang, Yang Z
  2015-05-04  8:52 ` Jan Beulich
  3 siblings, 1 reply; 33+ messages in thread
From: Tiejun Chen @ 2015-05-04  2:16 UTC (permalink / raw)
  To: yang.z.zhang, kevin.tian, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

S5 means we shut down the IOMMU engine as well, so we should disable
the IOMMU engine completely before entering S5; otherwise we may still
receive some interrupts in this period. But currently the VT-d path
depends on force_iommu and iommu_intremap to determine whether we
should disable translation and queued invalidation. So this unexpected
behaviour can be fixed by clearing force_iommu and iommu_intremap.

Additionally, some cleanups to code comments.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
---
 xen/arch/x86/acpi/power.c           |  2 ++
 xen/arch/x86/crash.c                | 12 ++++++++----
 xen/drivers/passthrough/vtd/iommu.c |  5 +++--
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index f41f0de..345470d 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -183,6 +183,8 @@ static int enter_state(u32 state)
         break;
     case ACPI_STATE_S5:
         acpi_enter_sleep_state(ACPI_STATE_S5);
+        /* Make sure we really disable VT-d engine completely. */
+        force_iommu = iommu_intremap = 0;
         break;
     default:
         error = -EINVAL;
diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index eb7be9c..123fb12 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -171,15 +171,19 @@ static void nmi_shootdown_cpus(void)
         printk("Failed to shoot down CPUs {%s}\n", keyhandler_scratch);
     }
 
-    /* Crash shutdown any IOMMU functionality as the crashdump kernel is not
-     * happy when booting if interrupt/dma remapping is still enabled */
+    /*
+     * Crash shutdown any IOMMU functionality as the crashdump kernel is not
+     * happy when booting if interrupt/dma remapping is still enabled.
+     */
     iommu_crash_shutdown();
 
     __stop_this_cpu();
 
-    /* This is a bit of a hack due to the problems with the x2apic_enabled
+    /*
+     * This is a bit of a hack due to the problems with the x2apic_enabled
      * variable, but we can't do any better without a significant refactoring
-     * of the APIC code */
+     * of the APIC code.
+     */
     x2apic_enabled = (current_local_apic_mode() == APIC_MODE_X2APIC);
 
     disable_IO_APIC();
diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c
index 72cd854..28244ec 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2367,9 +2367,10 @@ static void vtd_suspend(void)
 
         iommu_disable_translation(iommu);
 
-        /* If interrupt remapping is enabled, queued invalidation
+        /*
+         * If interrupt remapping is enabled, queued invalidation
          * will be disabled following interupt remapping disabling
-         * in local apic suspend
+         * in local apic suspend.
          */
         if ( !iommu_intremap && iommu_qinval )
             disable_qinval(iommu);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 33+ messages in thread
* Re: [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5
  2015-05-04  2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
@ 2015-05-04  9:00   ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-04  9:00 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang

>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> S5 means we shut down the IOMMU engine as well, so we should disable
> the IOMMU engine completely before entering S5; otherwise we may still
> receive some interrupts in this period. But currently the VT-d path
> depends on force_iommu and iommu_intremap to determine whether we
> should disable translation and queued invalidation. So this unexpected
> behaviour can be fixed by clearing force_iommu and iommu_intremap.

Maybe it can, but is this the right way to do it? Fiddling with IOMMU
internal variables in xen/arch/x86/acpi/power.c seems bogus (not to say
hacky) at least. And why would this be S5-specific?

> Additionally, some cleanups to code comments.

Which doesn't belong in this patch.

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
  2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
  2015-05-04  2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
@ 2015-05-04  4:07 ` Zhang, Yang Z
  2015-05-04  8:52 ` Jan Beulich
  3 siblings, 0 replies; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-04  4:07 UTC (permalink / raw)
  To: Chen, Tiejun, Tian, Kevin, jbeulich, jinsong.liu, keir, andrew.cooper3
  Cc: xen-devel

Chen, Tiejun wrote on 2015-05-04:
> clflush is a lightweight but non-serializing instruction; hence a
> memory fence is necessary to make sure all loads/stores are visible
> before flushing the cache line.
>
> Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>

Acked-by: Yang Zhang <yang.z.zhang@intel.com>

> ---
>  xen/drivers/passthrough/vtd/x86/vtd.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/xen/drivers/passthrough/vtd/x86/vtd.c b/xen/drivers/passthrough/vtd/x86/vtd.c
> index 109234e..fd2ff04 100644
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>
>  void cacheline_flush(char * addr)
>  {
> +    mb();
>      clflush(addr);
> +    mb();
>  }
>
>  void flush_all_cache()

Best regards,
Yang

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
  ` (2 preceding siblings ...)
  2015-05-04  4:07 ` [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Zhang, Yang Z
@ 2015-05-04  8:52 ` Jan Beulich
  2015-05-04  9:14   ` Andrew Cooper
  2015-05-04 10:39   ` Chen, Tiejun
  3 siblings, 2 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-04  8:52 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang

>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>
>  void cacheline_flush(char * addr)
>  {
> +    mb();
>      clflush(addr);
> +    mb();
>  }

I think the purpose of the flush is to force write back, not to evict
the cache line, and if so wmb() would appear to be sufficient. As the
SDM says that's not the case, a comment explaining why wmb() is not
sufficient would seem necessary. Plus in the description I think
"serializing" needs to be changed to "fencing", as serialization is not
what we really care about here. If you and the maintainers agree, I
could certainly fix up both aspects while committing.

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  8:52 ` Jan Beulich
@ 2015-05-04  9:14   ` Andrew Cooper
  2015-05-04  9:24     ` Jan Beulich
  2015-05-05 15:46     ` Boris Ostrovsky
  1 sibling, 2 replies; 33+ messages in thread
From: Andrew Cooper @ 2015-05-04  9:14 UTC (permalink / raw)
  To: Jan Beulich, Tiejun Chen
  Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel

On 04/05/2015 09:52, Jan Beulich wrote:
>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>
>>  void cacheline_flush(char * addr)
>>  {
>> +    mb();
>>      clflush(addr);
>> +    mb();
>>  }
> I think the purpose of the flush is to force write back, not to evict
> the cache line, and if so wmb() would appear to be sufficient. As the
> SDM says that's not the case, a comment explaining why wmb() is not
> sufficient would seem necessary. Plus in the description I think
> "serializing" needs to be changed to "fencing", as serialization is
> not what we really care about here. If you and the maintainers agree,
> I could certainly fix up both aspects while committing.

On the subject of writebacks, we should get around to wiring up (via
alternatives) the use of clflushopt and clwb, either of which would be
better than a clflush in this case (avoiding the need for the leading
mfence).

However, the ISA extension document does not indicate which processors
will have support for these new instructions.

~Andrew

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  9:14   ` Andrew Cooper
@ 2015-05-04  9:24     ` Jan Beulich
  2015-05-05  1:13       ` Zhang, Yang Z
  1 sibling, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-04  9:24 UTC (permalink / raw)
  To: Andrew Cooper, Tiejun Chen
  Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel

>>> On 04.05.15 at 11:14, <andrew.cooper3@citrix.com> wrote:
> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>
>>>  void cacheline_flush(char * addr)
>>>  {
>>> +    mb();
>>>      clflush(addr);
>>> +    mb();
>>>  }
>> I think the purpose of the flush is to force write back, not to evict
>> the cache line, and if so wmb() would appear to be sufficient. As
>> the SDM says that's not the case, a comment explaining why wmb()
>> is not sufficient would seem necessary. Plus in the description I
>> think "serializing" needs to be changed to "fencing", as serialization
>> is not what we really care about here. If you and the maintainers
>> agree, I could certainly fix up both aspects while committing.
>
> On the subject of writebacks, we should get around to wiring up (via
> alternatives) the use of clflushopt and clwb, either of which would be
> better than a clflush in this case (avoiding the need for the leading
> mfence).

Plus the barrier would perhaps rather sit around the loop invoking
cacheline_flush() in __iommu_flush_cache(), and I wonder whether VT-d
code shouldn't use available flushing code elsewhere in the system, and
whether that code then wouldn't need barriers added (or use
clflushopt/clwb as you suggest) instead.

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04  9:24     ` Jan Beulich
@ 2015-05-05  1:13       ` Zhang, Yang Z
  0 siblings, 0 replies; 33+ messages in thread
From: Zhang, Yang Z @ 2015-05-05  1:13 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper, Chen, Tiejun
  Cc: jinsong.liu, Tian, Kevin, keir, xen-devel

Jan Beulich wrote on 2015-05-04:
>>>> On 04.05.15 at 11:14, <andrew.cooper3@citrix.com> wrote:
>> On 04/05/2015 09:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr) {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to
>>> evict the cache line, and if so wmb() would appear to be
>>> sufficient. As the SDM says that's not the case, a comment
>>> explaining why wmb() is not sufficient would seem necessary. Plus
>>> in the description I think "serializing" needs to be changed to
>>> "fencing", as serialization is not what we really care about here.
>>> If you and the maintainers agree, I could certainly fix up both aspects while committing.
>>
>> On the subject of writebacks, we should get around to wiring up (via
>> alternatives) the use of clflushopt and clwb, either of which would
>> be better than a clflush in this case (avoiding the need for the
>> leading mfence).
>
> Plus the barrier would perhaps rather sit around the loop invoking
> cacheline_flush() in __iommu_flush_cache(), and I wonder whether VT-d

Agree. It's better to batch the flush operations, like:

@@ -167,11 +167,15 @@ static void __iommu_flush_cache(void *addr, unsigned int size)
     if ( !iommus_incoherent )
         return;
 
+    mb();
+
     if ( clflush_size == 0 )
         clflush_size = get_cache_line_size();
 
     for ( i = 0; i < size; i += clflush_size )
         cacheline_flush((char *)addr + i);
+
+    mb();
 }

> code shouldn't use available flushing code elsewhere in the system,
> and whether that code then wouldn't need barriers added (or use
> clflushopt/clwb as you suggest) instead.
>
> Jan

Best regards,
Yang

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-04 9:14 ` Andrew Cooper 2015-05-04 9:24 ` Jan Beulich @ 2015-05-05 15:46 ` Boris Ostrovsky 2015-05-05 15:58 ` Jan Beulich 2015-05-06 7:09 ` Chen, Tiejun 1 sibling, 2 replies; 33+ messages in thread From: Boris Ostrovsky @ 2015-05-05 15:46 UTC (permalink / raw) To: Andrew Cooper, Jan Beulich, Tiejun Chen Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel On 05/04/2015 05:14 AM, Andrew Cooper wrote: > On 04/05/2015 09:52, Jan Beulich wrote: >>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>> >>> void cacheline_flush(char * addr) >>> { >>> + mb(); >>> clflush(addr); >>> + mb(); >>> } >> I think the purpose of the flush is to force write back, not to evict >> the cache line, and if so wmb() would appear to be sufficient. As >> the SDM says that's not the case, a comment explaining why wmb() >> is not sufficient would seem necessary. Plus in the description I >> think "serializing" needs to be changed to "fencing", as serialization >> is not what we really care about here. If you and the maintainers >> agree, I could certainly fix up both aspects while committing. > On the subject of writebacks, we should get around to alternating-up the > use of clflushopt and clwb, either of which would be better than a > clflush in this case (avoiding the need for the leading mfence). > > However, the ISA extension document does not indicate which processors > will have support for these new instructions. https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf We really should add support for this. On shutting down a very large guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. This was due to serializing nature of CLFLUSH. -boris ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-05 15:46 ` Boris Ostrovsky @ 2015-05-05 15:58 ` Jan Beulich 2015-05-05 16:11 ` Boris Ostrovsky 2015-05-06 7:09 ` Chen, Tiejun 1 sibling, 1 reply; 33+ messages in thread From: Jan Beulich @ 2015-05-05 15:58 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper, yang.z.zhang, Tiejun Chen >>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote: > On 05/04/2015 05:14 AM, Andrew Cooper wrote: >> On 04/05/2015 09:52, Jan Beulich wrote: >>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>>> >>>> void cacheline_flush(char * addr) >>>> { >>>> + mb(); >>>> clflush(addr); >>>> + mb(); >>>> } >>> I think the purpose of the flush is to force write back, not to evict >>> the cache line, and if so wmb() would appear to be sufficient. As >>> the SDM says that's not the case, a comment explaining why wmb() >>> is not sufficient would seem necessary. Plus in the description I >>> think "serializing" needs to be changed to "fencing", as serialization >>> is not what we really care about here. If you and the maintainers >>> agree, I could certainly fix up both aspects while committing. >> On the subject of writebacks, we should get around to alternating-up the >> use of clflushopt and clwb, either of which would be better than a >> clflush in this case (avoiding the need for the leading mfence). >> >> However, the ISA extension document does not indicate which processors >> will have support for these new instructions. > > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf > > We really should add support for this. On shutting down a very large > guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. > This was due to serializing nature of CLFLUSH. 
But flushing the IOMMU isn't being done via CPU instructions, but rather via commands sent to the IOMMU. I.e. I'm somewhat confused by your reply. Jan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-05 15:58 ` Jan Beulich @ 2015-05-05 16:11 ` Boris Ostrovsky 2015-05-06 7:12 ` Jan Beulich 0 siblings, 1 reply; 33+ messages in thread From: Boris Ostrovsky @ 2015-05-05 16:11 UTC (permalink / raw) To: Jan Beulich Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper, yang.z.zhang, Tiejun Chen On 05/05/2015 11:58 AM, Jan Beulich wrote: >>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote: >> On 05/04/2015 05:14 AM, Andrew Cooper wrote: >>> On 04/05/2015 09:52, Jan Beulich wrote: >>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>>>> >>>>> void cacheline_flush(char * addr) >>>>> { >>>>> + mb(); >>>>> clflush(addr); >>>>> + mb(); >>>>> } >>>> I think the purpose of the flush is to force write back, not to evict >>>> the cache line, and if so wmb() would appear to be sufficient. As >>>> the SDM says that's not the case, a comment explaining why wmb() >>>> is not sufficient would seem necessary. Plus in the description I >>>> think "serializing" needs to be changed to "fencing", as serialization >>>> is not what we really care about here. If you and the maintainers >>>> agree, I could certainly fix up both aspects while committing. >>> On the subject of writebacks, we should get around to alternating-up the >>> use of clflushopt and clwb, either of which would be better than a >>> clflush in this case (avoiding the need for the leading mfence). >>> >>> However, the ISA extension document does not indicate which processors >>> will have support for these new instructions. >> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf >> >> We really should add support for this. On shutting down a very large >> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. 
>> This was due to serializing nature of CLFLUSH. > But flushing the IOMMU isn't being done via CPU instructions, but > rather via commands sent to the IOMMU. I.e. I'm somewhat > confused by your reply. I didn't mean flushing IOMMU itself, sorry. I meant __iommu_flush_cache() (or whatever it's equivalent we had in the product, which was 4.1-based). -boris ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-05 16:11 ` Boris Ostrovsky @ 2015-05-06 7:12 ` Jan Beulich 2015-05-06 7:26 ` Chen, Tiejun 2015-05-06 15:11 ` Boris Ostrovsky 0 siblings, 2 replies; 33+ messages in thread From: Jan Beulich @ 2015-05-06 7:12 UTC (permalink / raw) To: Boris Ostrovsky Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper, yang.z.zhang, Tiejun Chen >>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote: > On 05/05/2015 11:58 AM, Jan Beulich wrote: >>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote: >>> On 05/04/2015 05:14 AM, Andrew Cooper wrote: >>>> On 04/05/2015 09:52, Jan Beulich wrote: >>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>>>>> >>>>>> void cacheline_flush(char * addr) >>>>>> { >>>>>> + mb(); >>>>>> clflush(addr); >>>>>> + mb(); >>>>>> } >>>>> I think the purpose of the flush is to force write back, not to evict >>>>> the cache line, and if so wmb() would appear to be sufficient. As >>>>> the SDM says that's not the case, a comment explaining why wmb() >>>>> is not sufficient would seem necessary. Plus in the description I >>>>> think "serializing" needs to be changed to "fencing", as serialization >>>>> is not what we really care about here. If you and the maintainers >>>>> agree, I could certainly fix up both aspects while committing. >>>> On the subject of writebacks, we should get around to alternating-up the >>>> use of clflushopt and clwb, either of which would be better than a >>>> clflush in this case (avoiding the need for the leading mfence). >>>> >>>> However, the ISA extension document does not indicate which processors >>>> will have support for these new instructions. 
>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf >>> >>> We really should add support for this. On shutting down a very large >>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. >>> This was due to serializing nature of CLFLUSH. >> But flushing the IOMMU isn't being done via CPU instructions, but >> rather via commands sent to the IOMMU. I.e. I'm somewhat >> confused by your reply. > > I didn't mean flushing IOMMU itself, sorry. I meant > __iommu_flush_cache() (or whatever it's equivalent we had in the > product, which was 4.1-based). In that case I wonder how much of that flushing is really necessary during IOMMU teardown. VT-d maintainers? Jan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-06 7:12 ` Jan Beulich @ 2015-05-06 7:26 ` Chen, Tiejun 2015-05-06 7:33 ` Jan Beulich 2015-05-06 15:11 ` Boris Ostrovsky 1 sibling, 1 reply; 33+ messages in thread From: Chen, Tiejun @ 2015-05-06 7:26 UTC (permalink / raw) To: Jan Beulich, Boris Ostrovsky Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper, yang.z.zhang On 2015/5/6 15:12, Jan Beulich wrote: >>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote: >> On 05/05/2015 11:58 AM, Jan Beulich wrote: >>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote: >>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote: >>>>> On 04/05/2015 09:52, Jan Beulich wrote: >>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>>>>>> >>>>>>> void cacheline_flush(char * addr) >>>>>>> { >>>>>>> + mb(); >>>>>>> clflush(addr); >>>>>>> + mb(); >>>>>>> } >>>>>> I think the purpose of the flush is to force write back, not to evict >>>>>> the cache line, and if so wmb() would appear to be sufficient. As >>>>>> the SDM says that's not the case, a comment explaining why wmb() >>>>>> is not sufficient would seem necessary. Plus in the description I >>>>>> think "serializing" needs to be changed to "fencing", as serialization >>>>>> is not what we really care about here. If you and the maintainers >>>>>> agree, I could certainly fix up both aspects while committing. >>>>> On the subject of writebacks, we should get around to alternating-up the >>>>> use of clflushopt and clwb, either of which would be better than a >>>>> clflush in this case (avoiding the need for the leading mfence). >>>>> >>>>> However, the ISA extension document does not indicate which processors >>>>> will have support for these new instructions. 
>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf >>>> >>>> We really should add support for this. On shutting down a very large >>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. >>>> This was due to serializing nature of CLFLUSH. >>> But flushing the IOMMU isn't being done via CPU instructions, but >>> rather via commands sent to the IOMMU. I.e. I'm somewhat >>> confused by your reply. >> >> I didn't mean flushing IOMMU itself, sorry. I meant >> __iommu_flush_cache() (or whatever it's equivalent we had in the >> product, which was 4.1-based). > > In that case I wonder how much of that flushing is really necessary Sorry, what is that case? > during IOMMU teardown. VT-d maintainers? > In most cases __iommu_flush_cache() is used to flush remapping structures out to memory so that the IOMMU can read correct data. Thanks Tiejun ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-06 7:26 ` Chen, Tiejun @ 2015-05-06 7:33 ` Jan Beulich 0 siblings, 0 replies; 33+ messages in thread From: Jan Beulich @ 2015-05-06 7:33 UTC (permalink / raw) To: Tiejun Chen Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper, yang.z.zhang, Boris Ostrovsky >>> On 06.05.15 at 09:26, <tiejun.chen@intel.com> wrote: > On 2015/5/6 15:12, Jan Beulich wrote: >>>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote: >>> On 05/05/2015 11:58 AM, Jan Beulich wrote: >>>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote: >>>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote: >>>>>> On 04/05/2015 09:52, Jan Beulich wrote: >>>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>>>>>>> >>>>>>>> void cacheline_flush(char * addr) >>>>>>>> { >>>>>>>> + mb(); >>>>>>>> clflush(addr); >>>>>>>> + mb(); >>>>>>>> } >>>>>>> I think the purpose of the flush is to force write back, not to evict >>>>>>> the cache line, and if so wmb() would appear to be sufficient. As >>>>>>> the SDM says that's not the case, a comment explaining why wmb() >>>>>>> is not sufficient would seem necessary. Plus in the description I >>>>>>> think "serializing" needs to be changed to "fencing", as serialization >>>>>>> is not what we really care about here. If you and the maintainers >>>>>>> agree, I could certainly fix up both aspects while committing. >>>>>> On the subject of writebacks, we should get around to alternating-up the >>>>>> use of clflushopt and clwb, either of which would be better than a >>>>>> clflush in this case (avoiding the need for the leading mfence). >>>>>> >>>>>> However, the ISA extension document does not indicate which processors >>>>>> will have support for these new instructions. 
>>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf >>>>> >>>>> We really should add support for this. On shutting down a very large >>>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. >>>>> This was due to serializing nature of CLFLUSH. >>>> But flushing the IOMMU isn't being done via CPU instructions, but >>>> rather via commands sent to the IOMMU. I.e. I'm somewhat >>>> confused by your reply. >>> >>> I didn't mean flushing IOMMU itself, sorry. I meant >>> __iommu_flush_cache() (or whatever it's equivalent we had in the >>> product, which was 4.1-based). >> >> In that case I wonder how much of that flushing is really necessary > > Sorry, what is that case? The flushing CPU of side caches during IOMMU teardown for a guest. >> during IOMMU teardown. VT-d maintainers? >> > > In most cases __iommu_flush_cache() is used to flush any remapping > structures into memory then IOMMU can get proper data. Right, but here we're talking about teardown. Since the IOMMU isn't to use any mappings anymore anyway for the dying guest, there's little point in flushing respective changes from caches (or at least that flushing could be limited to the ripping out of top level structures, provided these get zapped before anything hanging off of them). Jan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-06 7:12 ` Jan Beulich 2015-05-06 7:26 ` Chen, Tiejun @ 2015-05-06 15:11 ` Boris Ostrovsky 1 sibling, 0 replies; 33+ messages in thread From: Boris Ostrovsky @ 2015-05-06 15:11 UTC (permalink / raw) To: Jan Beulich Cc: kevin.tian, keir, jinsong.liu, xen-devel, Andrew Cooper, yang.z.zhang, Tiejun Chen On 05/06/2015 03:12 AM, Jan Beulich wrote: >>>> On 05.05.15 at 18:11, <boris.ostrovsky@oracle.com> wrote: >> On 05/05/2015 11:58 AM, Jan Beulich wrote: >>>>>> On 05.05.15 at 17:46, <boris.ostrovsky@oracle.com> wrote: >>>> On 05/04/2015 05:14 AM, Andrew Cooper wrote: >>>>> On 04/05/2015 09:52, Jan Beulich wrote: >>>>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>>>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>>>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>>>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>>>>>> >>>>>>> void cacheline_flush(char * addr) >>>>>>> { >>>>>>> + mb(); >>>>>>> clflush(addr); >>>>>>> + mb(); >>>>>>> } >>>>>> I think the purpose of the flush is to force write back, not to evict >>>>>> the cache line, and if so wmb() would appear to be sufficient. As >>>>>> the SDM says that's not the case, a comment explaining why wmb() >>>>>> is not sufficient would seem necessary. Plus in the description I >>>>>> think "serializing" needs to be changed to "fencing", as serialization >>>>>> is not what we really care about here. If you and the maintainers >>>>>> agree, I could certainly fix up both aspects while committing. >>>>> On the subject of writebacks, we should get around to alternating-up the >>>>> use of clflushopt and clwb, either of which would be better than a >>>>> clflush in this case (avoiding the need for the leading mfence). >>>>> >>>>> However, the ISA extension document does not indicate which processors >>>>> will have support for these new instructions. 
>>>> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf >>>> >>>> We really should add support for this. On shutting down a very large >>>> guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. >>>> This was due to serializing nature of CLFLUSH. >>> But flushing the IOMMU isn't being done via CPU instructions, but >>> rather via commands sent to the IOMMU. I.e. I'm somewhat >>> confused by your reply. >> I didn't mean flushing IOMMU itself, sorry. I meant >> __iommu_flush_cache() (or whatever it's equivalent we had in the >> product, which was 4.1-based). > In that case I wonder how much of that flushing is really necessary > during IOMMU teardown. VT-d maintainers? I should mention that we saw this on 4.1, where iommu teardown is done in the domain destruction code path. After we backported code that moves this into a tasklet things got much better. Of course we still are doing flushing, even if in the background, and therefore system resources are still being consumed (and memory is not available until flushing is done). So if this is unnecessary then there are good reasons not to do it. -boris ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-05 15:46 ` Boris Ostrovsky 2015-05-05 15:58 ` Jan Beulich @ 2015-05-06 7:09 ` Chen, Tiejun 1 sibling, 0 replies; 33+ messages in thread From: Chen, Tiejun @ 2015-05-06 7:09 UTC (permalink / raw) To: Boris Ostrovsky, Andrew Cooper, Jan Beulich Cc: yang.z.zhang, jinsong.liu, kevin.tian, keir, xen-devel On 2015/5/5 23:46, Boris Ostrovsky wrote: > On 05/04/2015 05:14 AM, Andrew Cooper wrote: >> On 04/05/2015 09:52, Jan Beulich wrote: >>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>>> void cacheline_flush(char * addr) >>>> { >>>> + mb(); >>>> clflush(addr); >>>> + mb(); >>>> } >>> I think the purpose of the flush is to force write back, not to evict >>> the cache line, and if so wmb() would appear to be sufficient. As >>> the SDM says that's not the case, a comment explaining why wmb() >>> is not sufficient would seem necessary. Plus in the description I >>> think "serializing" needs to be changed to "fencing", as serialization >>> is not what we really care about here. If you and the maintainers >>> agree, I could certainly fix up both aspects while committing. >> On the subject of writebacks, we should get around to alternating-up the >> use of clflushopt and clwb, either of which would be better than a >> clflush in this case (avoiding the need for the leading mfence). >> >> However, the ISA extension document does not indicate which processors >> will have support for these new instructions. > > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf > > We really should add support for this. On shutting down a very large > guest (hundreds of GB) we observed *minutes* spent in flushing IOMMU. > This was due to serializing nature of CLFLUSH. 
CLFLUSHOPT/CLWB instructions are ordered only by store-fencing operations, so at the very least we would still need an SFENCE. Thanks Tiejun ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-04 8:52 ` Jan Beulich 2015-05-04 9:14 ` Andrew Cooper @ 2015-05-04 10:39 ` Chen, Tiejun 2015-05-04 10:43 ` Jan Beulich 2015-05-04 15:22 ` Konrad Rzeszutek Wilk 1 sibling, 2 replies; 33+ messages in thread From: Chen, Tiejun @ 2015-05-04 10:39 UTC (permalink / raw) To: Jan Beulich Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang On 2015/5/4 16:52, Jan Beulich wrote: >>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >> >> void cacheline_flush(char * addr) >> { >> + mb(); >> clflush(addr); >> + mb(); >> } > > I think the purpose of the flush is to force write back, not to evict > the cache line, and if so wmb() would appear to be sufficient. As > the SDM says that's not the case, a comment explaining why wmb() > is not sufficient would seem necessary. Plus in the description I Seems wmb() is not sufficient here. "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to be ordered by any other fencing, serializing or other CLFLUSH instruction." Thanks Tiejun > think "serializing" needs to be changed to "fencing", as serialization > is not what we really care about here. If you and the maintainers > agree, I could certainly fix up both aspects while committing. > > Jan > > > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-04 10:39 ` Chen, Tiejun @ 2015-05-04 10:43 ` Jan Beulich 2015-05-04 10:52 ` Tian, Kevin 2015-05-05 2:45 ` Chen, Tiejun 2015-05-04 15:22 ` Konrad Rzeszutek Wilk 1 sibling, 2 replies; 33+ messages in thread From: Jan Beulich @ 2015-05-04 10:43 UTC (permalink / raw) To: Tiejun Chen Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3, yang.z.zhang >>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote: > On 2015/5/4 16:52, Jan Beulich wrote: >>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>> >>> void cacheline_flush(char * addr) >>> { >>> + mb(); >>> clflush(addr); >>> + mb(); >>> } >> >> I think the purpose of the flush is to force write back, not to evict >> the cache line, and if so wmb() would appear to be sufficient. As >> the SDM says that's not the case, a comment explaining why wmb() >> is not sufficient would seem necessary. Plus in the description I > > Seems wmb() is not sufficient here. > > "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed > to be ordered by any other fencing, serializing or other CLFLUSH > instruction." Right - that's what I said in the second sentence. Jan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-04 10:43 ` Jan Beulich @ 2015-05-04 10:52 ` Tian, Kevin 2015-05-04 11:26 ` Chen, Tiejun 2015-05-05 2:45 ` Chen, Tiejun 1 sibling, 1 reply; 33+ messages in thread From: Tian, Kevin @ 2015-05-04 10:52 UTC (permalink / raw) To: Jan Beulich, Chen, Tiejun Cc: Zhang, Yang Z, jinsong.liu, xen-devel, keir, andrew.cooper3 > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Monday, May 04, 2015 6:44 PM > > >>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote: > > On 2015/5/4 16:52, Jan Beulich wrote: > >>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: > >>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c > >>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c > >>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) > >>> > >>> void cacheline_flush(char * addr) > >>> { > >>> + mb(); > >>> clflush(addr); > >>> + mb(); > >>> } > >> > >> I think the purpose of the flush is to force write back, not to evict > >> the cache line, and if so wmb() would appear to be sufficient. As > >> the SDM says that's not the case, a comment explaining why wmb() > >> is not sufficient would seem necessary. Plus in the description I > > > > Seems wmb() is not sufficient here. > > > > "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed > > to be ordered by any other fencing, serializing or other CLFLUSH > > instruction." > > Right - that's what I said in the second sentence. > btw why do we need two fences here? Suppose we just care about writes before the flush point... Thanks Kevin ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-04 10:52 ` Tian, Kevin @ 2015-05-04 11:26 ` Chen, Tiejun 2015-05-05 1:13 ` Tian, Kevin 0 siblings, 1 reply; 33+ messages in thread From: Chen, Tiejun @ 2015-05-04 11:26 UTC (permalink / raw) To: Tian, Kevin, Jan Beulich Cc: Zhang, Yang Z, jinsong.liu, xen-devel, keir, andrew.cooper3 On 2015/5/4 18:52, Tian, Kevin wrote: >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: Monday, May 04, 2015 6:44 PM >> >>>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote: >>> On 2015/5/4 16:52, Jan Beulich wrote: >>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>>>> >>>>> void cacheline_flush(char * addr) >>>>> { >>>>> + mb(); >>>>> clflush(addr); >>>>> + mb(); >>>>> } >>>> >>>> I think the purpose of the flush is to force write back, not to evict >>>> the cache line, and if so wmb() would appear to be sufficient. As >>>> the SDM says that's not the case, a comment explaining why wmb() >>>> is not sufficient would seem necessary. Plus in the description I >>> >>> Seems wmb() is not sufficient here. >>> >>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed >>> to be ordered by any other fencing, serializing or other CLFLUSH >>> instruction." >> >> Right - that's what I said in the second sentence. >> > > btw why do we need two fences here? Suppose we just care about > writes before the flush point... > The first MFENCE guarantees all loads/stores become visible before the cache line is flushed. The second MFENCE just makes sure the CLFLUSH is not reordered with ensuing loads/stores, right? Thanks Tiejun ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-04 11:26 ` Chen, Tiejun @ 2015-05-05 1:13 ` Tian, Kevin 0 siblings, 0 replies; 33+ messages in thread From: Tian, Kevin @ 2015-05-05 1:13 UTC (permalink / raw) To: Chen, Tiejun, Jan Beulich Cc: Zhang, Yang Z, jinsong.liu, xen-devel, keir, andrew.cooper3 > From: Chen, Tiejun > Sent: Monday, May 04, 2015 7:26 PM > > On 2015/5/4 18:52, Tian, Kevin wrote: > >> From: Jan Beulich [mailto:JBeulich@suse.com] > >> Sent: Monday, May 04, 2015 6:44 PM > >> > >>>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote: > >>> On 2015/5/4 16:52, Jan Beulich wrote: > >>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: > >>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c > >>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c > >>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) > >>>>> > >>>>> void cacheline_flush(char * addr) > >>>>> { > >>>>> + mb(); > >>>>> clflush(addr); > >>>>> + mb(); > >>>>> } > >>>> > >>>> I think the purpose of the flush is to force write back, not to evict > >>>> the cache line, and if so wmb() would appear to be sufficient. As > >>>> the SDM says that's not the case, a comment explaining why wmb() > >>>> is not sufficient would seem necessary. Plus in the description I > >>> > >>> Seems wmb() is not sufficient here. > >>> > >>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed > >>> to be ordered by any other fencing, serializing or other CLFLUSH > >>> instruction." > >> > >> Right - that's what I said in the second sentence. > >> > > > > btw why do we need two fences here? Suppose we just care about > > writes before the flush point... > > > > The first MFENCE guarantees all load/store visible before flush cache > line. But the second MFENCE just makes sure CLFLUSH is not ordered by > that ensuing load/store, right? > It's not an usual case to have another store to same destination address right after CLFLUSH. 
Usually there's some protocol that defers any further update until the device completes its access. A store to a different address is not a concern. But yes, looking at this interface alone, we'd better not make assumptions about the caller, so two mb() do look necessary... btw, as Jan commented in another thread, if there's no other caller of this function, it might be more reasonable to expand it into the bigger loop in __iommu_flush_cache() so you don't need to fence for each cache line flush. Thanks Kevin ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH 2015-05-04 10:43 ` Jan Beulich 2015-05-04 10:52 ` Tian, Kevin @ 2015-05-05 2:45 ` Chen, Tiejun 2015-05-05 9:24 ` Jan Beulich 1 sibling, 1 reply; 33+ messages in thread From: Chen, Tiejun @ 2015-05-05 2:45 UTC (permalink / raw) To: Jan Beulich, andrew.cooper3, kevin.tian, yang.z.zhang, Konrad Rzeszutek Wilk, boris.ostrovsky Cc: jinsong.liu, keir, xen-devel On 2015/5/4 18:43, Jan Beulich wrote: >>>> On 04.05.15 at 12:39, <tiejun.chen@intel.com> wrote: >> On 2015/5/4 16:52, Jan Beulich wrote: >>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote: >>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c >>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c >>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void) >>>> >>>> void cacheline_flush(char * addr) >>>> { >>>> + mb(); >>>> clflush(addr); >>>> + mb(); >>>> } >>> >>> I think the purpose of the flush is to force write back, not to evict >>> the cache line, and if so wmb() would appear to be sufficient. As >>> the SDM says that's not the case, a comment explaining why wmb() >>> is not sufficient would seem necessary. Plus in the description I >> >> Seems wmb() is not sufficient here. >> >> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed >> to be ordered by any other fencing, serializing or other CLFLUSH >> instruction." > > Right - that's what I said in the second sentence. > Thanks for everyone's comments. Does this work for everyone? xen/vt-d: need barriers to workaround CLFLUSH clflush is a lightweight but non-fencing instruction, hence a memory fence is necessary to make sure all loads/stores are visible before the cache line is flushed. 
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> diff --git a/xen/drivers/passthrough/vtd/iommu.c b/xen/drivers/passthrough/vtd/iommu.c index c7bda73..1248a17 100644 --- a/xen/drivers/passthrough/vtd/iommu.c +++ b/xen/drivers/passthrough/vtd/iommu.c @@ -170,8 +170,15 @@ static void __iommu_flush_cache(void *addr, unsigned int size) if ( clflush_size == 0 ) clflush_size = get_cache_line_size(); + /* + * CLFLUSH is only ordered by the MFENCE instruction. + * It is not guaranteed to be ordered by any other fencing, + * serializing or other CLFLUSH instruction. + */ + mb(); for ( i = 0; i < size; i += clflush_size ) cacheline_flush((char *)addr + i); + mb(); } void iommu_flush_cache_entry(void *addr, unsigned int size) Thanks Tiejun ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-05  2:45   ` Chen, Tiejun
@ 2015-05-05  9:24     ` Jan Beulich
  2015-05-06  6:47       ` Chen, Tiejun
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Beulich @ 2015-05-05  9:24 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3,
	yang.z.zhang, boris.ostrovsky

>>> On 05.05.15 at 04:45, <tiejun.chen@intel.com> wrote:
> Does this work for everyone?

Please first of all explain why the interfaces in asm/flushtlb.h can't
be used here (at least when flushing entire pages). Because - as said
before - for a complete fix you'd need to deal with the CLFLUSH use(s)
elsewhere in the system too.

Btw, isn't the !iommus_incoherent check in that function inverted?
I.e. why would we _not_ need to flush caches when IOMMUs are not
coherent (and why would flushing be needed when they're coherent
anyway)?

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-05  9:24     ` Jan Beulich
@ 2015-05-06  6:47       ` Chen, Tiejun
  2015-05-06  7:14         ` Jan Beulich
  0 siblings, 1 reply; 33+ messages in thread
From: Chen, Tiejun @ 2015-05-06  6:47 UTC (permalink / raw)
  To: Jan Beulich
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3,
	yang.z.zhang, boris.ostrovsky

On 2015/5/5 17:24, Jan Beulich wrote:
>>>> On 05.05.15 at 04:45, <tiejun.chen@intel.com> wrote:
>> Does this work for everyone?
>
> Please first of all explain why the interfaces in asm/flushtlb.h can't
> be used here (at least when flushing entire pages). Because - as

I also don't understand why we didn't use those previously on the
IOMMU side...

> said before - for a complete fix you'd need to deal with the CLFLUSH
> use(s) elsewhere in the system too.

Looks like we need to do this in
xen/arch/x86/flushtlb.c:flush_area_local().

>
> Btw, isn't the !iommus_incoherent check in that function inverted?
> I.e. why would we _not_ need to flush caches when IOMMUs
> are not coherent (and why would flushing be needed when they're
> coherent anyway)?
>

I guess you're misunderstanding this:

    if ( !ecap_coherent(iommu->ecap) )
        iommus_incoherent = 1;

So here !iommus_incoherent means the IOMMU is coherent, and then we
don't need to flush the cache in this case.

Thanks
Tiejun

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-06  6:47       ` Chen, Tiejun
@ 2015-05-06  7:14         ` Jan Beulich
  0 siblings, 0 replies; 33+ messages in thread
From: Jan Beulich @ 2015-05-06  7:14 UTC (permalink / raw)
  To: Tiejun Chen
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, andrew.cooper3,
	yang.z.zhang, boris.ostrovsky

>>> On 06.05.15 at 08:47, <tiejun.chen@intel.com> wrote:
> On 2015/5/5 17:24, Jan Beulich wrote:
>> Btw, isn't the !iommus_incoherent check in that function inverted?
>> I.e. why would we _not_ need to flush caches when IOMMUs
>> are not coherent (and why would flushing be needed when they're
>> coherent anyway)?
>>
>
> I guess you're misunderstanding this
>
>     if ( !ecap_coherent(iommu->ecap) )
>         iommus_incoherent = 1;
>
> So here !iommus_incoherent means IOMMU is coherent and then we don't
> need to flush cache in this case.

Ah, indeed - I overlooked the "in" in the variable name (together with
the thus resulting double negation in the expression). Sorry for the
noise.

Jan

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04 10:39 ` Chen, Tiejun
  2015-05-04 10:43   ` Jan Beulich
@ 2015-05-04 15:22   ` Konrad Rzeszutek Wilk
  2015-05-04 15:33     ` Andrew Cooper
  1 sibling, 1 reply; 33+ messages in thread
From: Konrad Rzeszutek Wilk @ 2015-05-04 15:22 UTC (permalink / raw)
  To: Chen, Tiejun, boris.ostrovsky
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Jan Beulich,
	andrew.cooper3, yang.z.zhang

On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
> On 2015/5/4 16:52, Jan Beulich wrote:
> >>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
> >> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
> >> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
> >> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
> >>
> >>  void cacheline_flush(char * addr)
> >>  {
> >> +    mb();
> >>      clflush(addr);
> >> +    mb();
> >>  }
> >
> > I think the purpose of the flush is to force write back, not to evict
> > the cache line, and if so wmb() would appear to be sufficient. As
> > the SDM says that's not the case, a comment explaining why wmb()
> > is not sufficient would seem necessary. Plus in the description I
>
> Seems wmb() is not sufficient here.
>
> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
> be ordered by any other fencing, serializing or other CLFLUSH instruction."

That is incorrect. We have observed that CLFLUSH instructions do serialize
each other. That is, if on a core you send a bunch of CLFLUSHes, it stalls
the pipeline.

Cc-ing Boris, who discovered this.

>
> Thanks
> Tiejun
>
> > think "serializing" needs to be changed to "fencing", as serialization
> > is not what we really care about here. If you and the maintainers
> > agree, I could certainly fix up both aspects while committing.
> >
> > Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 33+ messages in thread
* Re: [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH
  2015-05-04 15:22   ` Konrad Rzeszutek Wilk
@ 2015-05-04 15:33     ` Andrew Cooper
  0 siblings, 0 replies; 33+ messages in thread
From: Andrew Cooper @ 2015-05-04 15:33 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Chen, Tiejun, boris.ostrovsky
  Cc: kevin.tian, keir, jinsong.liu, xen-devel, Jan Beulich,
	yang.z.zhang

On 04/05/2015 16:22, Konrad Rzeszutek Wilk wrote:
> On Mon, May 04, 2015 at 06:39:56PM +0800, Chen, Tiejun wrote:
>> On 2015/5/4 16:52, Jan Beulich wrote:
>>>>>> On 04.05.15 at 04:16, <tiejun.chen@intel.com> wrote:
>>>> --- a/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> +++ b/xen/drivers/passthrough/vtd/x86/vtd.c
>>>> @@ -56,7 +56,9 @@ unsigned int get_cache_line_size(void)
>>>>
>>>>  void cacheline_flush(char * addr)
>>>>  {
>>>> +    mb();
>>>>      clflush(addr);
>>>> +    mb();
>>>>  }
>>> I think the purpose of the flush is to force write back, not to evict
>>> the cache line, and if so wmb() would appear to be sufficient. As
>>> the SDM says that's not the case, a comment explaining why wmb()
>>> is not sufficient would seem necessary. Plus in the description I
>> Seems wmb() is not sufficient here.
>>
>> "CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed to
>> be ordered by any other fencing, serializing or other CLFLUSH instruction."
> That is incorrect. We have observed that CLFLUSH instruction do serialize each
> other. That is if on a core you send a bunch of CLFLUSH it stalls the pipeline.
>
> Cc-ing Boris who discovered this.

Simply stalling the pipeline says nothing about its ordering with
respect to other instructions.

It is plausible that certain processor pipelines employ stricter
restrictions on CLFLUSH, but Xen must not assume that this is the case
in general, especially as it is in direct contradiction to both the
Intel and AMD instruction manuals.

~Andrew

^ permalink raw reply	[flat|nested] 33+ messages in thread
end of thread, other threads:[~2015-05-06 15:11 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-04  2:16 [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Tiejun Chen
2015-05-04  2:16 ` [PATCH 2/3] xen/vt-d: mask interrupt message generation Tiejun Chen
2015-05-04  4:07   ` Zhang, Yang Z
2015-05-04  5:08     ` Chen, Tiejun
2015-05-04  6:41       ` Zhang, Yang Z
2015-05-04  8:57         ` Jan Beulich
2015-05-04 11:21           ` Chen, Tiejun
2015-05-04  2:16 ` [PATCH 3/3] xen/iommu: disable IOMMU engine completely before enter S5 Tiejun Chen
2015-05-04  9:00   ` Jan Beulich
2015-05-04  4:07 ` [PATCH 1/3] xen/vt-d: need barriers to workaround CLFLUSH Zhang, Yang Z
2015-05-04  8:52 ` Jan Beulich
2015-05-04  9:14   ` Andrew Cooper
2015-05-04  9:24     ` Jan Beulich
2015-05-05  1:13   ` Zhang, Yang Z
2015-05-05 15:46     ` Boris Ostrovsky
2015-05-05 15:58       ` Jan Beulich
2015-05-05 16:11         ` Boris Ostrovsky
2015-05-06  7:12           ` Jan Beulich
2015-05-06  7:26             ` Chen, Tiejun
2015-05-06  7:33               ` Jan Beulich
2015-05-06 15:11             ` Boris Ostrovsky
2015-05-06  7:09           ` Chen, Tiejun
2015-05-04 10:39 ` Chen, Tiejun
2015-05-04 10:43   ` Jan Beulich
2015-05-04 10:52     ` Tian, Kevin
2015-05-04 11:26       ` Chen, Tiejun
2015-05-05  1:13     ` Tian, Kevin
2015-05-05  2:45   ` Chen, Tiejun
2015-05-05  9:24     ` Jan Beulich
2015-05-06  6:47       ` Chen, Tiejun
2015-05-06  7:14         ` Jan Beulich
2015-05-04 15:22 ` Konrad Rzeszutek Wilk
2015-05-04 15:33   ` Andrew Cooper