From: Paul Durrant <Paul.Durrant@citrix.com>
To: 'Jan Beulich' <JBeulich@suse.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Wei Liu <wei.liu2@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	Andrew Cooper <Andrew.Cooper3@citrix.com>,
	"Tim (Xen.org)" <tim@xen.org>,
	George Dunlap <George.Dunlap@citrix.com>,
	Julien Grall <julien.grall@arm.com>,
	Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>,
	xen-devel <xen-devel@lists.xenproject.org>,
	Ian Jackson <Ian.Jackson@citrix.com>,
	Brian Woods <brian.woods@amd.com>,
	Roger Pau Monne <roger.pau@citrix.com>
Subject: Re: [PATCH v2 3/4] iommu: elide flushing for higher order map/unmap operations
Date: Tue, 4 Dec 2018 15:36:12 +0000	[thread overview]
Message-ID: <0d6a78307bb5438582dd7e8e55a449db@AMSPEX02CL03.citrite.net> (raw)
In-Reply-To: <5C069A650200007800202BFD@prv1-mh.provo.novell.com>

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 04 December 2018 15:17
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Brian Woods <brian.woods@amd.com>; Suravee Suthikulpanit
> <suravee.suthikulpanit@amd.com>; Julien Grall <julien.grall@arm.com>;
> Andrew Cooper <Andrew.Cooper3@citrix.com>; Roger Pau Monne
> <roger.pau@citrix.com>; Wei Liu <wei.liu2@citrix.com>; George Dunlap
> <George.Dunlap@citrix.com>; Ian Jackson <Ian.Jackson@citrix.com>; Kevin
> Tian <kevin.tian@intel.com>; Stefano Stabellini <sstabellini@kernel.org>;
> xen-devel <xen-devel@lists.xenproject.org>; Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>; Tim (Xen.org) <tim@xen.org>
> Subject: Re: [PATCH v2 3/4] iommu: elide flushing for higher order
> map/unmap operations
> 
> >>> On 03.12.18 at 18:40, <paul.durrant@citrix.com> wrote:
> > This patch removes any implicit flushing that occurs in the
> > implementation of map and unmap operations and adds new iommu_map/unmap()
> > wrapper functions. To maintain the semantics of the
> > iommu_legacy_map/unmap() wrapper functions, these are modified to call
> > the new wrapper functions and then perform an explicit flush operation.
> >
> > Because VT-d currently performs two different types of flush dependent
> > upon whether a PTE is being modified versus merely added (i.e. replacing
> > a non-present PTE), 'iommu flush flags' are defined by this patch and
> > the iommu_ops map_page() and unmap_page() methods are modified to OR the
> > type of flush necessary for the PTE that has been populated or
> > depopulated into an accumulated flags value. The accumulated value can
> > then be passed into the explicit flush operation.
> >
> > The ARM SMMU implementations of map_page() and unmap_page() currently
> > perform no implicit flushing and therefore the modified methods do not
> > adjust the flush flags.
> 
> Which, however, likely is wrong. If we mean the flushing to be initiated
> by the arch- and vendor-independent wrappers, then all map/unmap
> backends should indicate the needed kind of flush. Granted this can be
> done later, if things are otherwise correct on Arm right now.
> 
> > --- a/xen/arch/arm/p2m.c
> > +++ b/xen/arch/arm/p2m.c
> > @@ -971,8 +971,17 @@ static int __p2m_set_entry(struct p2m_domain *p2m,
> >
> >      if ( need_iommu_pt_sync(p2m->domain) &&
> >           (lpae_is_valid(orig_pte) || lpae_is_valid(*entry)) )
> > +    {
> > +        unsigned int flush_flags = 0;
> > +
> > +        if ( lpae_is_valid(orig_pte) )
> > +            flush_flags |= IOMMU_FLUSHF_modified;
> > +        if ( lpae_is_valid(*entry) )
> > +            flush_flags |= IOMMU_FLUSHF_added;
> 
> Shouldn't this be "else if" with the meaning assigned to both
> types? From an abstract perspective I'd also expect that for
> a single mapping no more than one of the flags can come
> back set (through the iommu_ops interface).

That's not how I see it. My rationale is:

- present PTE made non-present, or modified -> IOMMU_FLUSHF_modified
- new PTE value is present -> IOMMU_FLUSHF_added

So a single op can set any combination of bits, which is why the code above does not use 'else if'.
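
To illustrate (the table is my own summary; the code simply mirrors the
Arm hunk quoted above):

/*
 * PTE transition        Accumulated flush_flags
 * --------------------  -----------------------------------------
 * invalid -> valid      IOMMU_FLUSHF_added
 * valid   -> invalid    IOMMU_FLUSHF_modified
 * valid   -> valid      IOMMU_FLUSHF_added | IOMMU_FLUSHF_modified
 */
unsigned int flush_flags = 0;

if ( lpae_is_valid(orig_pte) )  /* an existing PTE was changed or removed */
    flush_flags |= IOMMU_FLUSHF_modified;
if ( lpae_is_valid(*entry) )    /* the new PTE value is present */
    flush_flags |= IOMMU_FLUSHF_added;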

> 
> > @@ -84,7 +86,7 @@ static bool set_iommu_pde_present(uint32_t *pde, unsigned long next_mfn,
> >
> >          if ( maddr_old != maddr_next || iw != old_w || ir != old_r ||
> >               old_level != next_level )
> > -            need_flush = true;
> > +            flush_flags = IOMMU_FLUSHF_modified;
> 
> Why uniformly "modified"?

Because, AFAICT, the AMD IOMMU requires a flush even for a non-present -> present transition; the old code certainly implies this.

> 
> > @@ -645,11 +648,13 @@ static unsigned long flush_count(unsigned long dfn, unsigned int page_count,
> >  }
> >
> >  int amd_iommu_flush_iotlb_pages(struct domain *d, dfn_t dfn,
> > -                                unsigned int page_count)
> > +                                unsigned int page_count,
> > +                                unsigned int flush_flags)
> >  {
> >      unsigned long dfn_l = dfn_x(dfn);
> >
> >      ASSERT(page_count && !dfn_eq(dfn, INVALID_DFN));
> > +    ASSERT(flush_flags & IOMMU_FLUSHF_modified);
> 
> Is this valid? What if a map operation solely re-established what
> was already there? Aiui in that case set_iommu_pde_present()
> would always return zero. Or take this (seeing that the generic
> wrapper has a zero check for the flush flags):

Yes. The ASSERT is there because this should never be called unless flush_flags != 0 (which the wrapper ensures), and the AMD map code should only ever set IOMMU_FLUSHF_modified.
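
For reference, a rough sketch of the shape of the map wrapper I have in
mind (illustrative only, not verbatim patch code; the zero check could
equally live inside iommu_flush() itself):

int iommu_legacy_map(struct domain *d, dfn_t dfn, mfn_t mfn,
                     unsigned int page_order, unsigned int flags)
{
    unsigned int flush_flags = 0;
    int rc = iommu_map(d, dfn, mfn, page_order, flags, &flush_flags);

    /* flush_flags == 0 means nothing changed, so no flush is needed. */
    if ( !rc && flush_flags )
        rc = iommu_flush(d, dfn, 1u << page_order, flush_flags);

    return rc;
}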

> 
> > @@ -692,6 +697,7 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
> >      unsigned long npages, i;
> >      unsigned long gfn;
> >      unsigned int flags = !!ir;
> > +    unsigned int flush_flags = 0;
> >      int rt = 0;
> >
> >      if ( iw )
> > @@ -703,11 +709,21 @@ int amd_iommu_reserve_domain_unity_map(struct domain *domain,
> >      {
> >          unsigned long frame = gfn + i;
> >
> > -        rt = amd_iommu_map_page(domain, _dfn(frame), _mfn(frame), flags);
> > +        rt = amd_iommu_map_page(domain, _dfn(frame), _mfn(frame), flags,
> > +                                &flush_flags);
> >          if ( rt != 0 )
> > -            return rt;
> > +            break;
> >      }
> > -    return 0;
> > +
> > +    /*
> > +     * The underlying implementation is void so the return value is
> > +     * meaningless and can hence be ignored.
> > +     */
> > +    while ( amd_iommu_flush_iotlb_pages(domain, _dfn(gfn), npages,
> > +                                        flush_flags) )
> > +        break;
> 
> Nothing here guarantees flush_flags to be non-zero.

Good point. I'll add a check.
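
I.e. something like:

    if ( flush_flags )
        while ( amd_iommu_flush_iotlb_pages(domain, _dfn(gfn), npages,
                                            flush_flags) )
            break;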

> 
> > @@ -235,6 +236,9 @@ void __hwdom_init iommu_hwdom_init(struct domain *d)
> >                  process_pending_softirqs();
> >          }
> >
> > +        while ( !flush_flags && iommu_flush_all(d) )
> > +            break;
> 
> Is there a comment missing here explaining the seemingly odd
> loop?

I'm merely using the construct you suggested, but I can add a comment.
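
Something along these lines, perhaps (assuming the point of the
construct is to consume the __must_check return value):

    /*
     * iommu_flush_all() is __must_check but there is nothing useful
     * to do on failure here, so use while/break to consume the return
     * value without triggering a compiler warning.
     */
    while ( !flush_flags && iommu_flush_all(d) )
        break;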

> 
> > @@ -381,6 +402,17 @@ int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
> >      return rc;
> >  }
> >
> > +int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
> > +{
> > +    unsigned int flush_flags = 0;
> > +    int rc = iommu_unmap(d, dfn, page_order, &flush_flags);
> > +
> > +    if ( !rc )
> > +        rc = iommu_flush(d, dfn, (1u << page_order), flush_flags);
> 
> No iommu_dont_flush_iotlb check needed here?

I thought the old VT-d unmap code ignored it, but I see it didn't, so yes, I do need to add the check.
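
I.e. something like (sketch only):

int iommu_legacy_unmap(struct domain *d, dfn_t dfn, unsigned int page_order)
{
    unsigned int flush_flags = 0;
    int rc = iommu_unmap(d, dfn, page_order, &flush_flags);

    /* Preserve legacy semantics: the caller may have deferred flushing. */
    if ( !rc && !this_cpu(iommu_dont_flush_iotlb) )
        rc = iommu_flush(d, dfn, 1u << page_order, flush_flags);

    return rc;
}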

> 
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -633,11 +633,14 @@ static int __must_check iommu_flush_iotlb(struct domain *d, dfn_t dfn,
> >
> >  static int __must_check iommu_flush_iotlb_pages(struct domain *d,
> >                                                  dfn_t dfn,
> > -                                                unsigned int page_count)
> > +                                                unsigned int page_count,
> > +                                                unsigned int flush_flags)
> >  {
> >      ASSERT(page_count && !dfn_eq(dfn, INVALID_DFN));
> > +    ASSERT(flush_flags);
> >
> > -    return iommu_flush_iotlb(d, dfn, 1, page_count);
> > +    return iommu_flush_iotlb(d, dfn, flush_flags & IOMMU_FLUSHF_modified,
> > +                             page_count);
> 
> Why the restriction to "modified"?

The parameter is a bool which should be true if an existing PTE was modified and false otherwise. I can make this !!(flush_flags & IOMMU_FLUSHF_modified) if you prefer.
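
I.e.:

    return iommu_flush_iotlb(d, dfn,
                             !!(flush_flags & IOMMU_FLUSHF_modified),
                             page_count);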

> 
> > @@ -1825,15 +1825,18 @@ static int __must_check intel_iommu_map_page(struct domain *d,
> >      spin_unlock(&hd->arch.mapping_lock);
> >      unmap_vtd_domain_page(page);
> >
> > -    if ( !this_cpu(iommu_dont_flush_iotlb) )
> > -        rc = iommu_flush_iotlb(d, dfn, dma_pte_present(old), 1);
> > +    *flush_flags |= IOMMU_FLUSHF_added;
> > +    if ( dma_pte_present(old) )
> > +        *flush_flags |= IOMMU_FLUSHF_modified;
> 
> See my earlier comment as to only one of them to get set for an
> individual mapping.
> 
> > @@ -62,14 +61,15 @@ int arch_iommu_populate_page_table(struct domain *d)
> >          {
> >              unsigned long mfn = mfn_x(page_to_mfn(page));
> >              unsigned long gfn = mfn_to_gmfn(d, mfn);
> > +            unsigned int flush_flags = 0;
> >
> >              if ( gfn != gfn_x(INVALID_GFN) )
> >              {
> >                  ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
> >                  BUG_ON(SHARED_M2P(gfn));
> > -                rc = iommu_legacy_map(d, _dfn(gfn), _mfn(mfn),
> > -                                      PAGE_ORDER_4K, IOMMUF_readable |
> > -                                      IOMMUF_writable);
> > +                rc = iommu_map(d, _dfn(gfn), _mfn(mfn),
> > +                               PAGE_ORDER_4K, IOMMUF_readable |
> > +                               IOMMUF_writable, &flush_flags);
> >              }
> >              if ( rc )
> >              {
> > @@ -103,7 +103,6 @@ int arch_iommu_populate_page_table(struct domain *d)
> >      }
> >
> >      spin_unlock(&d->page_alloc_lock);
> > -    this_cpu(iommu_dont_flush_iotlb) = 0;
> >
> >      if ( !rc )
> >          rc = iommu_flush_all(d);
> 
> Would be nice to have a comment here clarifying why flush_flags
> doesn't get used.

Ok.
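
For instance (comment wording is mine):

    /*
     * flush_flags is deliberately not consulted here: the loop above
     * populates the page table without flushing, and the unconditional
     * iommu_flush_all() below covers everything that was mapped.
     */
    if ( !rc )
        rc = iommu_flush_all(d);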

> 
> > @@ -249,6 +251,9 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >          if (!(i & 0xfffff))
> >              process_pending_softirqs();
> >      }
> > +
> > +    if ( !flush_flags && iommu_flush_all(d) )
> > +        return;
> >  }
> 
> Again please attach a brief comment explaining the seemingly
> strange construct.
> 

Ok.

> > --- a/xen/include/xen/iommu.h
> > +++ b/xen/include/xen/iommu.h
> > @@ -93,6 +93,22 @@ void iommu_teardown(struct domain *d);
> >  #define _IOMMUF_writable 1
> >  #define IOMMUF_writable  (1u<<_IOMMUF_writable)
> >
> > +enum
> > +{
> 
> Brace on the same line as "enum" please, just like for struct/union. When
> they're named this helps finding the place where a certain type gets
> (fully) declared.

Ok.
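
I.e. (member names inferred from the flag names used elsewhere in the
patch):

enum {
    _IOMMU_FLUSHF_added,
    _IOMMU_FLUSHF_modified,
};

#define IOMMU_FLUSHF_added    (1u << _IOMMU_FLUSHF_added)
#define IOMMU_FLUSHF_modified (1u << _IOMMU_FLUSHF_modified)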

  Paul

> 
> Jan

