From: "Xia, Hongyan" <hongyxia@amazon.com>
To: "wl@xen.org" <wl@xen.org>
Cc: "andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
	"Grall, Julien" <jgrall@amazon.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"jbeulich@suse.com" <jbeulich@suse.com>,
	"roger.pau@citrix.com" <roger.pau@citrix.com>
Subject: Re: [Xen-devel] [PATCH v2] x86/domain_page: implement pure per-vCPU mapping infrastructure
Date: Fri, 21 Feb 2020 12:52:07 +0000
Message-ID: <2326943257e6f04bc9a858ef5667295651395c85.camel@amazon.com>
In-Reply-To: <20200221115017.tuo2i5db5mhd5yyt@debian>

On Fri, 2020-02-21 at 11:50 +0000, Wei Liu wrote:
> On Thu, Feb 06, 2020 at 06:58:23PM +0000, Hongyan Xia wrote:
> > ...
> > 
> > Benchmarks (baseline uses the direct map instead of the mapcache
> > in map_domain_page; old is the existing mapcache, new is after
> > this patch):
> > 
> > Geekbench on a 32-vCPU guest,
> > 
> > performance drop     old        new
> > single core         0.04%      0.18%
> 
> Do you know why the new mapcache has more overhead than the old one
> in the single core case?

To be honest, I think this is within noise territory. The benchmarks
were run 5 times; I can average over more runs to confirm.

> [...]
> 
> >  void unmap_domain_page(const void *ptr)
> >  {
> > -    unsigned int idx;
> > +    unsigned int idx, glb_idx;
> >      struct vcpu *v;
> > -    struct mapcache_domain *dcache;
> > -    unsigned long va = (unsigned long)ptr, mfn, flags;
> > -    struct vcpu_maphash_entry *hashent;
> > +    struct mapcache_vcpu *vcache;
> > +    unsigned long va = (unsigned long)ptr, mfn, hashmfn, flags;
> >  
> >      if ( va >= DIRECTMAP_VIRT_START )
> >          return;
> > @@ -189,114 +144,78 @@ void unmap_domain_page(const void *ptr)
> >      v = mapcache_current_vcpu();
> >      ASSERT(v && is_pv_vcpu(v));
> >  
> > -    dcache = &v->domain->arch.pv.mapcache;
> > -    ASSERT(dcache->inuse);
> > +    vcache = v->arch.pv.mapcache;
> > +    ASSERT(vcache);
> >  
> > -    idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
> > -    mfn = l1e_get_pfn(MAPCACHE_L1ENT(idx));
> > -    hashent = &v->arch.pv.mapcache.hash[MAPHASH_HASHFN(mfn)];
> 
> Please also assert the va in question is within the range allocated
> to this vcpu.

Sure.
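
Something along these lines, I think (untested sketch against the
per-vCPU slot layout in this patch; exact form to be settled in v3):

    ASSERT(va >= MAPCACHE_VIRT_START +
                 v->vcpu_id * MAPCACHE_VCPU_ENTRIES * PAGE_SIZE &&
           va <  MAPCACHE_VIRT_START +
                 (v->vcpu_id + 1) * MAPCACHE_VCPU_ENTRIES * PAGE_SIZE);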

> > +    glb_idx = PFN_DOWN(va - MAPCACHE_VIRT_START);
> > +    idx = glb_idx - v->vcpu_id * MAPCACHE_VCPU_ENTRIES;
> >  
> >      local_irq_save(flags);
> >  
> > -    if ( hashent->idx == idx )
> > -    {
> > -        ASSERT(hashent->mfn == mfn);
> > -        ASSERT(hashent->refcnt);
> > -        hashent->refcnt--;
> > -    }
> > -    else if ( !hashent->refcnt )
> > -    {
> > -        if ( hashent->idx != MAPHASHENT_NOTINUSE )
> > -        {
> > -            /* /First/, zap the PTE. */
> > -            ASSERT(l1e_get_pfn(MAPCACHE_L1ENT(hashent->idx)) ==
> > -                   hashent->mfn);
> > -            l1e_write(&MAPCACHE_L1ENT(hashent->idx), l1e_empty());
> > -            /* /Second/, mark as garbage. */
> > -            set_bit(hashent->idx, dcache->garbage);
> > -        }
> > -
> > -        /* Add newly-freed mapping to the maphash. */
> > -        hashent->mfn = mfn;
> > -        hashent->idx = idx;
> > -    }
> > -    else
> > -    {
> > -        /* /First/, zap the PTE. */
> > -        l1e_write(&MAPCACHE_L1ENT(idx), l1e_empty());
> > -        /* /Second/, mark as garbage. */
> > -        set_bit(idx, dcache->garbage);
> > -    }
> > +    mfn = l1e_get_pfn(MAPCACHE_L1ENT(glb_idx));
> > +    hashmfn = vcache->hash_mfn[MAPHASH_HASHFN(mfn)];
> > +
> > +    vcache->refcnt[idx]--;
> > +    /* If refcnt drops to 0, we only clear inuse when it's not in the maphash. */
> 
> It would be clearer to say "when it has been evicted from the
> maphash", to match the terminology in the map routine.

Will reword in next revision.
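
Maybe something like this (exact wording TBD):

    /*
     * If refcnt drops to 0, we only clear inuse once the entry has
     * been evicted from the maphash.
     */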

> > +    if ( hashmfn != mfn && !vcache->refcnt[idx] )
> > +        __clear_bit(idx, vcache->inuse);
> 
> Also, please flush the linear address here and at the other
> __clear_bit location.

I flush when a new entry takes a slot. Yes, it's probably better to
flush earlier, as soon as a slot is no longer in use.
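
I.e., something like this at the __clear_bit site (untested):

    if ( hashmfn != mfn && !vcache->refcnt[idx] )
    {
        __clear_bit(idx, vcache->inuse);
        flush_tlb_one_local(va);
    }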

> >  
> >      local_irq_restore(flags);
> >  }
> >  
> > -int mapcache_domain_init(struct domain *d)
> > +int mapcache_vcpu_init(struct vcpu *v)
> >  {
> > +    struct domain *d = v->domain;
> >      struct mapcache_domain *dcache = &d->arch.pv.mapcache;
> > -    unsigned int bitmap_pages;
> > -
> > -    ASSERT(is_pv_domain(d));
> > +    unsigned long i;
> 
> Since you're changing this anyway -- we normally use unsigned int as
> the index type.
> 
> > +    unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
> >  
> >  #ifdef NDEBUG
> >      if ( !mem_hotplug && max_page <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
> >          return 0;
> >  #endif
> >  
> > -    BUILD_BUG_ON(MAPCACHE_VIRT_END + PAGE_SIZE * (3 +
> > -                 2 * PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long))) >
> > -                 MAPCACHE_VIRT_START + (PERDOMAIN_SLOT_MBYTES << 20));
> > -    bitmap_pages = PFN_UP(BITS_TO_LONGS(MAPCACHE_ENTRIES) * sizeof(long));
> > -    dcache->inuse = (void *)MAPCACHE_VIRT_END + PAGE_SIZE;
> > -    dcache->garbage = dcache->inuse +
> > -                      (bitmap_pages + 1) * PAGE_SIZE / sizeof(long);
> > -
> > -    spin_lock_init(&dcache->lock);
> > -
> > -    return create_perdomain_mapping(d, (unsigned long)dcache->inuse,
> > -                                    2 * bitmap_pages + 1,
> > -                                    NIL(l1_pgentry_t *), NULL);
> > -}
> > -
> > -int mapcache_vcpu_init(struct vcpu *v)
> > -{
> > -    struct domain *d = v->domain;
> > -    struct mapcache_domain *dcache = &d->arch.pv.mapcache;
> > -    unsigned long i;
> > -    unsigned int ents = d->max_vcpus * MAPCACHE_VCPU_ENTRIES;
> > -    unsigned int nr = PFN_UP(BITS_TO_LONGS(ents) * sizeof(long));
> > -
> > -    if ( !is_pv_vcpu(v) || !dcache->inuse )
> > +    if ( is_idle_vcpu(v) || is_hvm_vcpu(v) )
> >          return 0;
> >  
> > +    BUILD_BUG_ON(MAPCACHE_VIRT_END > ARG_XLAT_VIRT_START);
> > +
> >      if ( ents > dcache->entries )
> >      {
> > +        int rc;
> > +
> > +        ASSERT(ents * PAGE_SIZE <= (PERDOMAIN_SLOT_MBYTES << 20));
> > +
> >          /* Populate page tables. */
> > -        int rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
> > +        rc = create_perdomain_mapping(d, MAPCACHE_VIRT_START, ents,
> >                                           NIL(l1_pgentry_t *), NULL);
> >  
> > -        /* Populate bit maps. */
> > -        if ( !rc )
> > -            rc = create_perdomain_mapping(d, (unsigned long)dcache->inuse,
> > -                                          nr, NULL, NIL(struct page_info *));
> > -        if ( !rc )
> > -            rc = create_perdomain_mapping(d, (unsigned long)dcache->garbage,
> > -                                          nr, NULL, NIL(struct page_info *));
> > -
> >          if ( rc )
> >              return rc;
> >  
> >          dcache->entries = ents;
> 
> Given that:
> 
> 1. mapcache_domain is now a structure with only one member.
> 2. ents is a constant throughout domain's lifecycle.
> 
> You can replace mapcache_domain with a boolean --
> mapcache_mapping_populated (?) in arch.pv.
> 
> If I'm not mistaken, the size of the mapping is derived from the vcpu
> being initialised, so a further improvement is to lift the mapping
> creation out of mapcache_vcpu_init.

But you can issue XEN_DOMCTL_max_vcpus on a running domain to increase
its max_vcpus count, so ents is not actually constant, is it?

> >      }
> >  
> > -    /* Mark all maphash entries as not in use. */
> >      BUILD_BUG_ON(MAPHASHENT_NOTINUSE < MAPCACHE_ENTRIES);
> > +    /* MAPHASH_ENTRIES has to be power-of-two to make hashing work. */
> > +    BUILD_BUG_ON(MAPHASH_ENTRIES & (MAPHASH_ENTRIES - 1));
> > +    /*
> > +     * Since entries in the maphash also occupy inuse slots, we have to make
> > +     * sure MAPCACHE_VCPU_ENTRIES is large enough to accommodate both the
> > +     * maphash and a nested page table walk.
> > +     */
> > +    BUILD_BUG_ON(MAPCACHE_VCPU_ENTRIES - MAPHASH_ENTRIES <
> > +                 CONFIG_PAGING_LEVELS * CONFIG_PAGING_LEVELS);
> > +
> > +    v->arch.pv.mapcache = xzalloc(struct mapcache_vcpu);
> > +    if ( !v->arch.pv.mapcache )
> > +        return -ENOMEM;
> > +
> > +    /* Mark all maphash entries as not in use. */
> >      for ( i = 0; i < MAPHASH_ENTRIES; i++ )
> >      {
> > -        struct vcpu_maphash_entry *hashent = &v->arch.pv.mapcache.hash[i];
> > -
> > -        hashent->mfn = ~0UL; /* never valid to map */
> > -        hashent->idx = MAPHASHENT_NOTINUSE;
> > +        v->arch.pv.mapcache->hash_mfn[i] = ~0UL;
> 
> mfn_x(INVALID_MFN) here. Or you can change the type of the array to
> mfn_t.

mfn_t is a better idea. Thanks.
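
Roughly, for the next revision (untested sketch, with hash_mfn[]
switched to mfn_t and the comparison going through mfn_eq()):

    mfn_t hash_mfn[MAPHASH_ENTRIES];
    ...
    for ( i = 0; i < MAPHASH_ENTRIES; i++ )
        v->arch.pv.mapcache->hash_mfn[i] = INVALID_MFN;
    ...
    if ( !mfn_eq(hashmfn, mfn) && !vcache->refcnt[idx] )
        __clear_bit(idx, vcache->inuse);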

Hongyan