* [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
@ 2018-04-18  4:51 Peter Xu
  2018-04-18  5:28 ` Peter Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Peter Xu @ 2018-04-18  4:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jintack Lim, Eric Auger, peterx, Jason Wang, Michael S . Tsirkin,
	Alex Williamson, Alexander Witte

During IOVA page table walk, there is a special case when:

- notify_unmap is set, meanwhile
- entry is invalid

Previously, we always skipped the entry.  This is not correct: we should
send an UNMAP notification to registered notifiers in this case.  Otherwise
some stale pages will remain mapped in the host even after the L1 guest
has already unmapped them.

Without this patch, nested device assignment to L2 guests might dump
some errors like:

qemu-system-x86_64: VFIO_MAP_DMA: -17
qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
                    0x7f89a920d000) = -17 (File exists)

To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
affected by this problem).
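
The decision this patch changes can be condensed as follows.  This is an
editorial sketch with stand-in names (vtd_walk_decide and the enum are
illustrative, not code from intel_iommu.c):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { WALK_SKIP, WALK_NOTIFY } walk_action;

/* Per-entry decision during the IOVA page walk, after this patch:
 * an invalid entry is skipped only when no one asked for UNMAP
 * notifications; otherwise it is reported, so that stale host
 * mappings can be torn down. */
static walk_action vtd_walk_decide(bool entry_valid, bool notify_unmap)
{
    if (!entry_valid && !notify_unmap) {
        return WALK_SKIP;    /* nothing mapped, nobody listening */
    }
    return WALK_NOTIFY;      /* MAP if valid, UNMAP if invalid */
}
```

Before the patch an invalid entry was skipped unconditionally, which is
exactly the case that left stale mappings behind in the host.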

Signed-off-by: Peter Xu <peterx@redhat.com>
---

To test nested assignment, one also needs to apply below patchset:
https://lkml.org/lkml/2018/4/18/5
---
 hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index fb31de9416..b359efd6f9 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
 
 typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
 
+static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
+                             vtd_page_walk_hook hook_fn, void *private)
+{
+    assert(hook_fn);
+    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
+                            entry->addr_mask, entry->perm);
+    return hook_fn(entry, private);
+}
+
 /**
  * vtd_page_walk_level - walk over specific level for IOVA range
  *
@@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
          */
         entry_valid = read_cur | write_cur;
 
+        entry.target_as = &address_space_memory;
+        entry.iova = iova & subpage_mask;
+        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+        entry.addr_mask = ~subpage_mask;
+
         if (vtd_is_last_slpte(slpte, level)) {
-            entry.target_as = &address_space_memory;
-            entry.iova = iova & subpage_mask;
             /* NOTE: this is only meaningful if entry_valid == true */
             entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
-            entry.addr_mask = ~subpage_mask;
-            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
             if (!entry_valid && !notify_unmap) {
                 trace_vtd_page_walk_skip_perm(iova, iova_next);
                 goto next;
             }
-            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
-                                    entry.addr_mask, entry.perm);
-            if (hook_fn) {
-                ret = hook_fn(&entry, private);
-                if (ret < 0) {
-                    return ret;
-                }
+            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+            if (ret < 0) {
+                return ret;
             }
         } else {
             if (!entry_valid) {
-                trace_vtd_page_walk_skip_perm(iova, iova_next);
+                if (notify_unmap) {
+                    /*
+                     * The whole entry is invalid; unmap it all.
+                     * Translated address is meaningless, zero it.
+                     */
+                    entry.translated_addr = 0x0;
+                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
+                    if (ret < 0) {
+                        return ret;
+                    }
+                } else {
+                    trace_vtd_page_walk_skip_perm(iova, iova_next);
+                }
                 goto next;
             }
             ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
  2018-04-18  4:51 [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set Peter Xu
@ 2018-04-18  5:28 ` Peter Xu
  2018-04-18  5:29 ` Liu, Yi L
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2018-04-18  5:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: Michael S . Tsirkin, Jason Wang, Eric Auger, Alex Williamson,
	Alexander Witte, Jintack Lim

On Wed, Apr 18, 2018 at 12:51:21PM +0800, Peter Xu wrote:
> During IOVA page table walk, there is a special case when:
> 
> - notify_unmap is set, meanwhile
> - entry is invalid
> 
> In the past, we skip the entry always.  This is not correct.  We should
> send UNMAP notification to registered notifiers in this case.  Otherwise
> some stall pages will still be mapped in the host even if L1 guest
> unmapped them already.
> 
> Without this patch, nested device assignment to L2 guests might dump
> some errors like:
> 
> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>                     0x7f89a920d000) = -17 (File exists)
> 
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
> affected by this problem).
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>

This should really be 2.12 material since it fixes a real bug, but I'm not
sure whether it's already too late.  Michael, what do you think?

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
  2018-04-18  4:51 [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set Peter Xu
  2018-04-18  5:28 ` Peter Xu
@ 2018-04-18  5:29 ` Liu, Yi L
  2018-04-18  6:26   ` Peter Xu
  2018-04-20  4:57 ` Jason Wang
  2018-04-20  8:30 ` Jason Wang
  3 siblings, 1 reply; 8+ messages in thread
From: Liu, Yi L @ 2018-04-18  5:29 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Michael S . Tsirkin, Jason Wang, Eric Auger, Alex Williamson,
	Alexander Witte, Jintack Lim

> Sent: Wednesday, April 18, 2018 12:51 PM
> Subject: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap
> set
> 
> During IOVA page table walk, there is a special case when:
> 
> - notify_unmap is set, meanwhile
> - entry is invalid

This is a very brief description; would you mind elaborating a little more?
 
> In the past, we skip the entry always.  This is not correct.  We should send UNMAP
> notification to registered notifiers in this case.  Otherwise some stall pages will still
> be mapped in the host even if L1 guest unmapped them already.
>
> Without this patch, nested device assignment to L2 guests might dump some errors
> like:

Should this be a physical device assigned from the L0 host, or could emulated
devices also trigger this problem?

> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>                     0x7f89a920d000) = -17 (File exists)
> 
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not affected by this
> problem).

Does this fix also apply to L0 QEMU?
 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> 
> To test nested assignment, one also needs to apply below patchset:
> https://lkml.org/lkml/2018/4/18/5
> ---
>  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
>  1 file changed, 30 insertions(+), 12 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> fb31de9416..b359efd6f9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t
> iova, bool is_write,
> 
>  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> 
> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> +                             vtd_page_walk_hook hook_fn, void *private)
> +{
> +    assert(hook_fn);
> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> +                            entry->addr_mask, entry->perm);
> +    return hook_fn(entry, private);
> +}
> +
>  /**
>   * vtd_page_walk_level - walk over specific level for IOVA range
>   *
> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t
> start,
>           */
>          entry_valid = read_cur | write_cur;
> 
> +        entry.target_as = &address_space_memory;
> +        entry.iova = iova & subpage_mask;
> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +        entry.addr_mask = ~subpage_mask;
> +
>          if (vtd_is_last_slpte(slpte, level)) {
> -            entry.target_as = &address_space_memory;
> -            entry.iova = iova & subpage_mask;
>              /* NOTE: this is only meaningful if entry_valid == true */
>              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> -            entry.addr_mask = ~subpage_mask;
> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>              if (!entry_valid && !notify_unmap) {
>                  trace_vtd_page_walk_skip_perm(iova, iova_next);
>                  goto next;
>              }
> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> -                                    entry.addr_mask, entry.perm);
> -            if (hook_fn) {
> -                ret = hook_fn(&entry, private);
> -                if (ret < 0) {
> -                    return ret;
> -                }
> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +            if (ret < 0) {
> +                return ret;
>              }
>          } else {
>              if (!entry_valid) {
> -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                if (notify_unmap) {
> +                    /*
> +                     * The whole entry is invalid; unmap it all.
> +                     * Translated address is meaningless, zero it.
> +                     */
> +                    entry.translated_addr = 0x0;
> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +                    if (ret < 0) {
> +                        return ret;
> +                    }
> +                } else {
> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                }
>                  goto next;
>              }
>              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
> --
> 2.14.3
> 

Thanks,
Yi Liu


* Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
  2018-04-18  5:29 ` Liu, Yi L
@ 2018-04-18  6:26   ` Peter Xu
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Xu @ 2018-04-18  6:26 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: qemu-devel, Michael S . Tsirkin, Jason Wang, Eric Auger,
	Alex Williamson, Alexander Witte, Jintack Lim

On Wed, Apr 18, 2018 at 05:29:56AM +0000, Liu, Yi L wrote:
> > Sent: Wednesday, April 18, 2018 12:51 PM
> > Subject: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap
> > set
> > 
> > During IOVA page table walk, there is a special case when:
> > 
> > - notify_unmap is set, meanwhile
> > - entry is invalid
> 
> This is a very brief description; would you mind elaborating a little more?

It means the case where the program reaches [1] below.

>  
> > In the past, we skip the entry always.  This is not correct.  We should send UNMAP
> > notification to registered notifiers in this case.  Otherwise some stall pages will still
> > be mapped in the host even if L1 guest unmapped them already.
> >
> > Without this patch, nested device assignment to L2 guests might dump some errors
> > like:
> 
> Should this be a physical device assigned from the L0 host, or could emulated
> devices also trigger this problem?

With emulated devices we would need three levels, so I think this warning
can also appear there: if you assign an emulated device from the L1 guest
to L2 and then to L3, you should see it dumped from the QEMU that runs L2.

> 
> > qemu-system-x86_64: VFIO_MAP_DMA: -17
> > qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
> >                     0x7f89a920d000) = -17 (File exists)
> > 
> > To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not affected by this
> > problem).
> 
> Does this fix also apply to L0 QEMU?

Sorry, I wasn't clear.  By "L1 QEMU" I mean the QEMU that runs as L1,
which I believe is what you call "L0 QEMU" here.

And yes, this fix is also valid without nesting; however, the problem is
very hard to trigger there (which is why I only found it recently, when
people reported the nested breakage), though AFAIU it is still possible.

>  
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > 
> > To test nested assignment, one also needs to apply below patchset:
> > https://lkml.org/lkml/2018/4/18/5
> > ---
> >  hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
> >  1 file changed, 30 insertions(+), 12 deletions(-)
> > 
> > diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index
> > fb31de9416..b359efd6f9 100644
> > --- a/hw/i386/intel_iommu.c
> > +++ b/hw/i386/intel_iommu.c
> > @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t
> > iova, bool is_write,
> > 
> >  typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
> > 
> > +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> > +                             vtd_page_walk_hook hook_fn, void *private)
> > +{
> > +    assert(hook_fn);
> > +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> > +                            entry->addr_mask, entry->perm);
> > +    return hook_fn(entry, private);
> > +}
> > +
> >  /**
> >   * vtd_page_walk_level - walk over specific level for IOVA range
> >   *
> > @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t
> > start,
> >           */
> >          entry_valid = read_cur | write_cur;
> > 
> > +        entry.target_as = &address_space_memory;
> > +        entry.iova = iova & subpage_mask;
> > +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> > +        entry.addr_mask = ~subpage_mask;
> > +
> >          if (vtd_is_last_slpte(slpte, level)) {
> > -            entry.target_as = &address_space_memory;
> > -            entry.iova = iova & subpage_mask;
> >              /* NOTE: this is only meaningful if entry_valid == true */
> >              entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> > -            entry.addr_mask = ~subpage_mask;
> > -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> >              if (!entry_valid && !notify_unmap) {
> >                  trace_vtd_page_walk_skip_perm(iova, iova_next);
> >                  goto next;
> >              }
> > -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> > -                                    entry.addr_mask, entry.perm);
> > -            if (hook_fn) {
> > -                ret = hook_fn(&entry, private);
> > -                if (ret < 0) {
> > -                    return ret;
> > -                }
> > +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> > +            if (ret < 0) {
> > +                return ret;
> >              }
> >          } else {
> >              if (!entry_valid) {

[1]

> > -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> > +                if (notify_unmap) {
> > +                    /*
> > +                     * The whole entry is invalid; unmap it all.
> > +                     * Translated address is meaningless, zero it.
> > +                     */
> > +                    entry.translated_addr = 0x0;
> > +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> > +                    if (ret < 0) {
> > +                        return ret;
> > +                    }
> > +                } else {
> > +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> > +                }
> >                  goto next;
> >              }
> >              ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,
> > --
> > 2.14.3
> > 
> 
> Thanks,
> Yi Liu

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
  2018-04-18  4:51 [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set Peter Xu
  2018-04-18  5:28 ` Peter Xu
  2018-04-18  5:29 ` Liu, Yi L
@ 2018-04-20  4:57 ` Jason Wang
  2018-04-20  5:11   ` Peter Xu
  2018-04-20  8:30 ` Jason Wang
  3 siblings, 1 reply; 8+ messages in thread
From: Jason Wang @ 2018-04-20  4:57 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Michael S . Tsirkin, Eric Auger, Alex Williamson,
	Alexander Witte, Jintack Lim



On 2018年04月18日 12:51, Peter Xu wrote:
> During IOVA page table walk, there is a special case when:
>
> - notify_unmap is set, meanwhile
> - entry is invalid
>
> In the past, we skip the entry always.  This is not correct.  We should
> send UNMAP notification to registered notifiers in this case.  Otherwise
> some stall pages will still be mapped in the host even if L1 guest

You mean 'stale' here?

> unmapped them already.

It looks like some IOTLB invalidations were missed? If not, the pages
should be unmapped during invalidation.

>
> Without this patch, nested device assignment to L2 guests might dump
> some errors like:
>
> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>                      0x7f89a920d000) = -17 (File exists)
>
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
> affected by this problem).
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>
> To test nested assignment, one also needs to apply below patchset:
> https://lkml.org/lkml/2018/4/18/5

Having a quick glance at it, it looks like there's no need for this patch
if we had that kernel patch applied.

Thanks

> ---
>   hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
>   1 file changed, 30 insertions(+), 12 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index fb31de9416..b359efd6f9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
>   
>   typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
>   
> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> +                             vtd_page_walk_hook hook_fn, void *private)
> +{
> +    assert(hook_fn);
> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> +                            entry->addr_mask, entry->perm);
> +    return hook_fn(entry, private);
> +}
> +
>   /**
>    * vtd_page_walk_level - walk over specific level for IOVA range
>    *
> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
>            */
>           entry_valid = read_cur | write_cur;
>   
> +        entry.target_as = &address_space_memory;
> +        entry.iova = iova & subpage_mask;
> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +        entry.addr_mask = ~subpage_mask;
> +
>           if (vtd_is_last_slpte(slpte, level)) {
> -            entry.target_as = &address_space_memory;
> -            entry.iova = iova & subpage_mask;
>               /* NOTE: this is only meaningful if entry_valid == true */
>               entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> -            entry.addr_mask = ~subpage_mask;
> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>               if (!entry_valid && !notify_unmap) {
>                   trace_vtd_page_walk_skip_perm(iova, iova_next);
>                   goto next;
>               }
> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> -                                    entry.addr_mask, entry.perm);
> -            if (hook_fn) {
> -                ret = hook_fn(&entry, private);
> -                if (ret < 0) {
> -                    return ret;
> -                }
> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +            if (ret < 0) {
> +                return ret;
>               }
>           } else {
>               if (!entry_valid) {
> -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                if (notify_unmap) {
> +                    /*
> +                     * The whole entry is invalid; unmap it all.
> +                     * Translated address is meaningless, zero it.
> +                     */
> +                    entry.translated_addr = 0x0;
> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +                    if (ret < 0) {
> +                        return ret;
> +                    }
> +                } else {
> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                }
>                   goto next;
>               }
>               ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,


* Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
  2018-04-20  4:57 ` Jason Wang
@ 2018-04-20  5:11   ` Peter Xu
  2018-04-20  8:29     ` Jason Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Xu @ 2018-04-20  5:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: qemu-devel, Michael S . Tsirkin, Eric Auger, Alex Williamson,
	Alexander Witte, Jintack Lim

On Fri, Apr 20, 2018 at 12:57:13PM +0800, Jason Wang wrote:
> 
> 
> On 2018年04月18日 12:51, Peter Xu wrote:
> > During IOVA page table walk, there is a special case when:
> > 
> > - notify_unmap is set, meanwhile
> > - entry is invalid
> > 
> > In the past, we skip the entry always.  This is not correct.  We should
> > send UNMAP notification to registered notifiers in this case.  Otherwise
> > some stall pages will still be mapped in the host even if L1 guest
> 
> You mean 'stale' here?

Ah, yes.

> 
> > unmapped them already.
> 
> It looks like some IOTLB invalidations were missed? If not, the pages should
> be unmapped during invalidation.

I'm afraid it's not the same problem, and that's why I think we need
two fixes.

Even if the guest sends the correct invalidations, it's still possible
that QEMU skips the PSI with current code.

> 
> > 
> > Without this patch, nested device assignment to L2 guests might dump
> > some errors like:
> > 
> > qemu-system-x86_64: VFIO_MAP_DMA: -17
> > qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
> >                      0x7f89a920d000) = -17 (File exists)
> > 
> > To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
> > affected by this problem).
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> > 
> > To test nested assignment, one also needs to apply below patchset:
> > https://lkml.org/lkml/2018/4/18/5
> 
> Having a quick glance at it, it looks like there's no need for this patch if we
> had that kernel patch applied.

Even if the guest kernel applies the above fix, QEMU might still drop some
of the PSIs.  The above error messages were captured with the kernel patches
already applied (otherwise we would see no error messages at all; we would
simply miss them from the guest).

Thanks,

-- 
Peter Xu


* Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
  2018-04-20  5:11   ` Peter Xu
@ 2018-04-20  8:29     ` Jason Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Wang @ 2018-04-20  8:29 UTC (permalink / raw)
  To: Peter Xu
  Cc: qemu-devel, Michael S . Tsirkin, Eric Auger, Alex Williamson,
	Alexander Witte, Jintack Lim



On 2018年04月20日 13:11, Peter Xu wrote:
> On Fri, Apr 20, 2018 at 12:57:13PM +0800, Jason Wang wrote:
>> On 2018年04月18日 12:51, Peter Xu wrote:
>>> During IOVA page table walk, there is a special case when:
>>>
>>> - notify_unmap is set, meanwhile
>>> - entry is invalid
>>>
>>> In the past, we skip the entry always.  This is not correct.  We should
>>> send UNMAP notification to registered notifiers in this case.  Otherwise
>>> some stall pages will still be mapped in the host even if L1 guest
>> You mean 'stale' here?
> Ah, yes.
>
>>> unmapped them already.
>> It looks like some IOTLB invalidations were missed? If not, the pages should
>> be unmapped during invalidation.
> I'm afraid it's not the same problem, and that's why I think we need
> two fixes.
>
> Even if the guest sends the correct invalidations, it's still possible
> that QEMU skips the PSI with current code.
>
>>> Without this patch, nested device assignment to L2 guests might dump
>>> some errors like:
>>>
>>> qemu-system-x86_64: VFIO_MAP_DMA: -17
>>> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>>>                       0x7f89a920d000) = -17 (File exists)
>>>
>>> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
>>> affected by this problem).
>>>
>>> Signed-off-by: Peter Xu<peterx@redhat.com>
>>> ---
>>>
>>> To test nested assignment, one also needs to apply below patchset:
>>> https://lkml.org/lkml/2018/4/18/5
>> Having a quick glance at it, it looks like there's no need for this patch if we
>> had that kernel patch applied.
> Even if the guest kernel applies the above fix, QEMU might still drop some of
> the PSIs.  The above error messages were captured with the kernel patches
> already applied (otherwise we would see no error messages at all; we would
> simply miss them from the guest).
>
> Thanks,

I see, so this actually tries to trigger the unmap notifier for a
non-last-level slpte.
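
For context, a single UNMAP at a non-last level covers the whole subtree
under the invalid entry; the covered size per level can be sketched as
below (illustrative, assuming the VT-d 4KiB-page, 9-bits-per-level
second-level layout; the names are not from intel_iommu.c):

```c
#include <assert.h>
#include <stdint.h>

/* VT-d second-level paging resolves 9 bits of IOVA per level on top
 * of a 12-bit (4KiB) page offset, so one entry at `level` covers a
 * naturally aligned 2^(12 + 9*(level-1))-byte region.  The mask
 * below is what entry.addr_mask would carry for that subtree. */
static uint64_t level_shift(int level)
{
    return 12u + (uint64_t)(level - 1) * 9u;
}

static uint64_t subtree_addr_mask(int level)
{
    return (1ULL << level_shift(level)) - 1;
}
```

So an UNMAP for an invalid level-2 entry tears down a 2MiB region with
one notification instead of 512 page-sized ones.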

Thanks


* Re: [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set
  2018-04-18  4:51 [Qemu-devel] [PATCH] intel-iommu: send PSI always when notify_unmap set Peter Xu
                   ` (2 preceding siblings ...)
  2018-04-20  4:57 ` Jason Wang
@ 2018-04-20  8:30 ` Jason Wang
  3 siblings, 0 replies; 8+ messages in thread
From: Jason Wang @ 2018-04-20  8:30 UTC (permalink / raw)
  To: Peter Xu, qemu-devel
  Cc: Michael S . Tsirkin, Eric Auger, Alex Williamson,
	Alexander Witte, Jintack Lim



On 2018年04月18日 12:51, Peter Xu wrote:
> During IOVA page table walk, there is a special case when:
>
> - notify_unmap is set, meanwhile
> - entry is invalid
>
> In the past, we skip the entry always.  This is not correct.  We should
> send UNMAP notification to registered notifiers in this case.  Otherwise
> some stall pages will still be mapped in the host even if L1 guest
> unmapped them already.
>
> Without this patch, nested device assignment to L2 guests might dump
> some errors like:
>
> qemu-system-x86_64: VFIO_MAP_DMA: -17
> qemu-system-x86_64: vfio_dma_map(0x557305420c30, 0xad000, 0x1000,
>                      0x7f89a920d000) = -17 (File exists)
>
> To fix this, we need to apply this patch to L1 QEMU (L2 QEMU is not
> affected by this problem).
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>
> To test nested assignment, one also needs to apply below patchset:
> https://lkml.org/lkml/2018/4/18/5
> ---
>   hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++------------
>   1 file changed, 30 insertions(+), 12 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index fb31de9416..b359efd6f9 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -722,6 +722,15 @@ static int vtd_iova_to_slpte(VTDContextEntry *ce, uint64_t iova, bool is_write,
>   
>   typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
>   
> +static int vtd_page_walk_one(IOMMUTLBEntry *entry, int level,
> +                             vtd_page_walk_hook hook_fn, void *private)
> +{
> +    assert(hook_fn);
> +    trace_vtd_page_walk_one(level, entry->iova, entry->translated_addr,
> +                            entry->addr_mask, entry->perm);
> +    return hook_fn(entry, private);
> +}
> +
>   /**
>    * vtd_page_walk_level - walk over specific level for IOVA range
>    *
> @@ -781,28 +790,37 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
>            */
>           entry_valid = read_cur | write_cur;
>   
> +        entry.target_as = &address_space_memory;
> +        entry.iova = iova & subpage_mask;
> +        entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> +        entry.addr_mask = ~subpage_mask;
> +
>           if (vtd_is_last_slpte(slpte, level)) {
> -            entry.target_as = &address_space_memory;
> -            entry.iova = iova & subpage_mask;
>               /* NOTE: this is only meaningful if entry_valid == true */
>               entry.translated_addr = vtd_get_slpte_addr(slpte, aw);
> -            entry.addr_mask = ~subpage_mask;
> -            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
>               if (!entry_valid && !notify_unmap) {
>                   trace_vtd_page_walk_skip_perm(iova, iova_next);
>                   goto next;
>               }
> -            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> -                                    entry.addr_mask, entry.perm);
> -            if (hook_fn) {
> -                ret = hook_fn(&entry, private);
> -                if (ret < 0) {
> -                    return ret;
> -                }
> +            ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +            if (ret < 0) {
> +                return ret;
>               }
>           } else {
>               if (!entry_valid) {
> -                trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                if (notify_unmap) {
> +                    /*
> +                     * The whole entry is invalid; unmap it all.
> +                     * Translated address is meaningless, zero it.
> +                     */
> +                    entry.translated_addr = 0x0;
> +                    ret = vtd_page_walk_one(&entry, level, hook_fn, private);
> +                    if (ret < 0) {
> +                        return ret;
> +                    }
> +                } else {
> +                    trace_vtd_page_walk_skip_perm(iova, iova_next);
> +                }
>                   goto next;
>               }
>               ret = vtd_page_walk_level(vtd_get_slpte_addr(slpte, aw), iova,

Acked-by: Jason Wang <jasowang@redhat.com>

