From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jan Beulich <jbeulich@suse.com>, Paul Durrant <paul@xen.org>
Cc: Paul Durrant <pdurrant@amazon.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: RE: [PATCH v10 7/7] vtd: use a bit field for dma_pte
Date: Mon, 30 Nov 2020 05:29:04 +0000	[thread overview]
Message-ID: <MWHPR11MB16456F6B57ED225135D2EC218CF50@MWHPR11MB1645.namprd11.prod.outlook.com> (raw)
In-Reply-To: <24774b4e-3ae8-2941-24ee-722acea69657@suse.com>

> From: Jan Beulich <jbeulich@suse.com>
> Sent: Saturday, November 28, 2020 12:02 AM
> 
> On 20.11.2020 14:24, Paul Durrant wrote:
> > @@ -709,20 +709,23 @@ static void dma_pte_clear_one(struct domain *domain, uint64_t addr,
> >      page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
> >      pte = page + address_level_offset(addr, 1);
> >
> > -    if ( !dma_pte_present(*pte) )
> > +    if ( !pte->r && !pte->w )
> 
> I think dma_pte_present() wants to stay, so we would have to touch
> only one place when adding support for x.
> 
> >      {
> >          spin_unlock(&hd->arch.mapping_lock);
> >          unmap_vtd_domain_page(page);
> >          return;
> >      }
> >
> > -    dma_clear_pte(*pte);
> > -    *flush_flags |= IOMMU_FLUSHF_modified;
> > +    pte->r = pte->w = false;
> > +    smp_wmb();
> > +    pte->val = 0;
> >
> >      spin_unlock(&hd->arch.mapping_lock);
> >      iommu_sync_cache(pte, sizeof(struct dma_pte));
> 
> Just as an observation - in an earlier patch I think there was a
> code sequence having these the other way around. I think we want
> to settle on one way of doing this (flush then unlock, or unlock
> then flush). Kevin?
> 

Agree. Generally speaking, 'unlock then flush' is preferred, since the
spinlock doesn't protect the IOMMU anyway.

> > @@ -1775,15 +1778,12 @@ static int __must_check intel_iommu_map_page(struct domain *d, dfn_t dfn,
> >      page = (struct dma_pte *)map_vtd_domain_page(pg_maddr);
> >      pte = &page[dfn_x(dfn) & LEVEL_MASK];
> >      old = *pte;
> > -
> > -    dma_set_pte_addr(new, mfn_to_maddr(mfn));
> > -    dma_set_pte_prot(new,
> > -                     ((flags & IOMMUF_readable) ? DMA_PTE_READ  : 0) |
> > -                     ((flags & IOMMUF_writable) ? DMA_PTE_WRITE : 0));
> > -
> > -    /* Set the SNP on leaf page table if Snoop Control available */
> > -    if ( iommu_snoop )
> > -        dma_set_pte_snp(new);
> > +    new = (struct dma_pte){
> > +        .r = flags & IOMMUF_readable,
> > +        .w = flags & IOMMUF_writable,
> > +        .snp = iommu_snoop,
> > +        .addr = mfn_x(mfn),
> > +    };
> 
> We still haven't settled on a newer gcc baseline, so this kind of
> initializer is still not allowed (as in: will break the build) for
> struct-s with unnamed sub-struct-s / sub-union-s.
> 
> > @@ -2611,18 +2611,18 @@ static void vtd_dump_page_table_level(paddr_t pt_maddr, int level, paddr_t gpa,
> >              process_pending_softirqs();
> >
> >          pte = &pt_vaddr[i];
> > -        if ( !dma_pte_present(*pte) )
> > +        if ( !pte->r && !pte->w )
> >              continue;
> >
> >          address = gpa + offset_level_address(i, level);
> >          if ( next_level >= 1 )
> > -            vtd_dump_page_table_level(dma_pte_addr(*pte), next_level,
> > +            vtd_dump_page_table_level(pfn_to_paddr(pte->addr), next_level,
> >                                        address, indent + 1);
> >          else
> >              printk("%*sdfn: %08lx mfn: %08lx\n",
> >                     indent, "",
> >                     (unsigned long)(address >> PAGE_SHIFT_4K),
> > -                   (unsigned long)(dma_pte_addr(*pte) >> PAGE_SHIFT_4K));
> > +                   (unsigned long)(pte->addr));
> 
> Could you also drop the no longer needed pair of parentheses. I
> further suspect the cast isn't needed (anymore?). (Otoh I think
> I recall oddities with gcc's printf()-style format checking and
> direct passing of bitfields. But if that's a problem, I think
> one of the earlier ones already introduced such an issue. So
> perhaps we can wait until someone actually confirms there is an
> issue - quite likely this someone would be me anyway.)
> 
> > --- a/xen/drivers/passthrough/vtd/iommu.h
> > +++ b/xen/drivers/passthrough/vtd/iommu.h
> > @@ -244,38 +244,21 @@ struct context_entry {
> >  #define level_size(l) (1 << level_to_offset_bits(l))
> >  #define align_to_level(addr, l) ((addr + level_size(l) - 1) & level_mask(l))
> >
> > -/*
> > - * 0: readable
> > - * 1: writable
> > - * 2-6: reserved
> > - * 7: super page
> > - * 8-11: available
> > - * 12-63: Host physcial address
> > - */
> >  struct dma_pte {
> > -    u64 val;
> > +    union {
> > +        uint64_t val;
> > +        struct {
> > +            bool r:1;
> > +            bool w:1;
> > +            unsigned int reserved0:1;

The 'X' bit is ignored, rather than reserved, when execute requests are
not reported or are disabled.

Thanks
Kevin

> > +            unsigned int ignored0:4;
> > +            bool ps:1;
> > +            unsigned int ignored1:3;
> > +            bool snp:1;
> > +            uint64_t addr:52;
> 
> As per the doc I look at, this extends only to bit 51 at most.
> Above are 11 ignored bits and (in leaf entries) the TM one.
> 
> Considering the differences between leaf and intermediate
> entries, perhaps leaf-only fields could gain a brief comment?
> 
> Jan



Thread overview: 19+ messages
2020-11-20 13:24 [PATCH v10 0/7] IOMMU cleanup Paul Durrant
2020-11-20 13:24 ` [PATCH v10 1/7] remove remaining uses of iommu_legacy_map/unmap Paul Durrant
2020-11-27 14:39   ` Jan Beulich
2020-11-20 13:24 ` [PATCH v10 2/7] common/grant_table: batch flush I/O TLB Paul Durrant
2020-11-20 13:24 ` [PATCH v10 3/7] iommu: remove the share_p2m operation Paul Durrant
2020-11-20 13:24 ` [PATCH v10 4/7] iommu: stop calling IOMMU page tables 'p2m tables' Paul Durrant
2020-11-20 13:24 ` [PATCH v10 5/7] vtd: use a bit field for root_entry Paul Durrant
2020-11-27 15:11   ` Jan Beulich
2020-11-30  3:06   ` Tian, Kevin
2020-11-30  9:45     ` Jan Beulich
2020-12-01  5:14       ` Tian, Kevin
2020-11-20 13:24 ` [PATCH v10 6/7] vtd: use a bit field for context_entry Paul Durrant
2020-11-27 15:32   ` Jan Beulich
2020-11-30  3:10   ` Tian, Kevin
2020-11-20 13:24 ` [PATCH v10 7/7] vtd: use a bit field for dma_pte Paul Durrant
2020-11-27 16:02   ` Jan Beulich
2020-11-30  5:29     ` Tian, Kevin [this message]
2020-11-27 16:21 ` [PATCH v10 0/7] IOMMU cleanup Jan Beulich
2020-11-27 16:32   ` Durrant, Paul
