* [PATCHv8 0/3] gnttab: Improve scalability
From: David Vrabel @ 2015-05-12 14:15 UTC
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Tim Deegan, David Vrabel, Jan Beulich,
	Malcolm Crossley

The series makes the grant table locking more fine-grained and adds
per-VCPU maptrack free lists, which greatly improves scalability.

The series builds on the original series by Matt Wilson and Christoph
Egger from Amazon.

Performance results for aggregate intrahost network throughput
(between 20 VM pairs, with 16 dom0 VCPUs) show substantial
improvements.

                                       Throughput (Gbit/s)

Base                                            9.7
Split locks                                    25.8
Split locks + per-VCPU maptrack lists          42.9

v8:
  * Misc locking fixes.
  * Use read/write_atomic() in per-VCPU maptrack free list.
  * Improve locking docs.

v7:
  * Re-add grant table rwlock (locking was broken without it and the
    gains are minimal anyway).
  * Remove unneeded locks from grant_table_destroy().
  * Fix get_maptrack_handle() locking.
v6:
  * Remove most uses of the grant table lock.
  * Make the grant table lock a spin lock again (there were only
    writers left after the above).
  * Add per-VCPU maptrack free lists.
v5:
  * Addressed locking issue pointed out by Jan Beulich
  * Fixed git rebase merge issue introduced in v4
    (acquiring locking twice)
  * Changed the for()-loop in grant_map_exists()
  * Coding style fixes
v4:
  * Coding style nits from Jan Beulich
  * Fixup read locks pointed out by Jan Beulich
  * renamed double_gt_(un)lock to double_maptrack_(un)lock
    per request from Jan Beulich
  * Addressed ASSERT()'s from Jan Beulich
  * Addressed locking issues in unmap_common pointed out
    by Jan Beulich
v3:
  * Addressed gnttab_swap_grant_ref() comment from Andrew Cooper
v2:
  * Add arm part per request from Julien Grall

David


* [PATCHv8 1/3] gnttab: Introduce rwlock to protect updates to grant table state
From: David Vrabel @ 2015-05-12 14:15 UTC
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	David Vrabel, Jan Beulich, Malcolm Crossley, Matt Wilson

From: Christoph Egger <chegger@amazon.de>

Split the grant table lock into two separate locks: a new spinlock
protecting the maptrack state, while the existing lock becomes a
rwlock.

The rwlock is used to prevent readers from accessing inconsistent
grant table state such as current version, partially initialized
active table pages, etc.
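
As a rough sketch (not part of the patch), the intended usage pattern
for the two locks, using the field names introduced below, is:

    /* Read side: most grant table operations. */
    read_lock(&gt->lock);
    /* ... read gt_version, shared entries, active frames ... */
    read_unlock(&gt->lock);

    /* Write side: anything changing the table version or size. */
    write_lock(&gt->lock);
    /* ... update gt_version, nr_grant_frames, etc. ... */
    write_unlock(&gt->lock);

    /* The maptrack spinlock nests inside either side of the rwlock. */
    read_lock(&gt->lock);
    spin_lock(&gt->maptrack_lock);
    /* ... update maptrack_head, maptrack_limit ... */
    spin_unlock(&gt->maptrack_lock);
    read_unlock(&gt->lock);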

Signed-off-by: Matt Wilson <msw@amazon.com>
[chegger: ported to xen-staging, split into multiple commits]
Signed-off-by: Christoph Egger <chegger@amazon.de>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 docs/misc/grant-tables.txt    |   31 ++++++++-
 xen/arch/arm/mm.c             |    4 +-
 xen/arch/x86/mm.c             |    4 +-
 xen/common/grant_table.c      |  138 +++++++++++++++++++++++------------------
 xen/include/xen/grant_table.h |   11 +++-
 5 files changed, 119 insertions(+), 69 deletions(-)

diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 19db4ec..a01f4bf 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -74,7 +74,36 @@ is complete.
  matching map track entry is then removed, as if unmap had been invoked.
  These are not used by the transfer mechanism.
   map->domid         : owner of the mapped frame
-  map->ref_and_flags : grant reference, ro/rw, mapped for host or device access
+  map->ref           : grant reference
+  map->flags         : ro/rw, mapped for host or device access
+
+********************************************************************************
+ Locking
+ ~~~~~~~
+ Xen uses several locks to serialize access to the internal grant table state.
+
+  grant_table->lock          : rwlock used to prevent readers from accessing
+                               inconsistent grant table state such as current
+                               version, partially initialized active table pages,
+                               etc.
+  grant_table->maptrack_lock : spinlock used to protect the maptrack state
+
+ The primary lock for the grant table is a read/write spinlock. All
+ functions that access members of struct grant_table must acquire a
+ read lock around critical sections. Any modification to the members
+ of struct grant_table (e.g., nr_status_frames, nr_grant_frames,
+ active frames, etc.) must only be made if the write lock is
+ held. These elements are read-mostly, and read critical sections can
+ be large, which makes a rwlock a good choice.
+
+ The maptrack state is protected by its own spinlock. Any access (read
+ or write) of struct grant_table members that have a "maptrack_"
+ prefix must be made while holding the maptrack lock. The maptrack
+ state can be rapidly modified under some workloads, and the critical
+ sections are very small, thus we use a spinlock to protect them.
+
+ The maptrack lock may be locked while holding the read or write grant
+ table lock.
 
 ********************************************************************************
 
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index a91ea77..b305041 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1048,7 +1048,7 @@ int xenmem_add_to_physmap_one(
     switch ( space )
     {
     case XENMAPSPACE_grant_table:
-        spin_lock(&d->grant_table->lock);
+        write_lock(&d->grant_table->lock);
 
         if ( d->grant_table->gt_version == 0 )
             d->grant_table->gt_version = 1;
@@ -1078,7 +1078,7 @@ int xenmem_add_to_physmap_one(
 
         t = p2m_ram_rw;
 
-        spin_unlock(&d->grant_table->lock);
+        write_unlock(&d->grant_table->lock);
         break;
     case XENMAPSPACE_shared_info:
         if ( idx != 0 )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 034de22..aa568a4 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4593,7 +4593,7 @@ int xenmem_add_to_physmap_one(
                 mfn = virt_to_mfn(d->shared_info);
             break;
         case XENMAPSPACE_grant_table:
-            spin_lock(&d->grant_table->lock);
+            write_lock(&d->grant_table->lock);
 
             if ( d->grant_table->gt_version == 0 )
                 d->grant_table->gt_version = 1;
@@ -4615,7 +4615,7 @@ int xenmem_add_to_physmap_one(
                     mfn = virt_to_mfn(d->grant_table->shared_raw[idx]);
             }
 
-            spin_unlock(&d->grant_table->lock);
+            write_unlock(&d->grant_table->lock);
             break;
         case XENMAPSPACE_gmfn_range:
         case XENMAPSPACE_gmfn:
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index dfb45f8..54e4411 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -229,27 +229,27 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
 }
 
 static inline void
-double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
+double_maptrack_lock(struct grant_table *lgt, struct grant_table *rgt)
 {
     if ( lgt < rgt )
     {
-        spin_lock(&lgt->lock);
-        spin_lock(&rgt->lock);
+        spin_lock(&lgt->maptrack_lock);
+        spin_lock(&rgt->maptrack_lock);
     }
     else
     {
         if ( lgt != rgt )
-            spin_lock(&rgt->lock);
-        spin_lock(&lgt->lock);
+            spin_lock(&rgt->maptrack_lock);
+        spin_lock(&lgt->maptrack_lock);
     }
 }
 
 static inline void
-double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
+double_maptrack_unlock(struct grant_table *lgt, struct grant_table *rgt)
 {
-    spin_unlock(&lgt->lock);
+    spin_unlock(&lgt->maptrack_lock);
     if ( lgt != rgt )
-        spin_unlock(&rgt->lock);
+        spin_unlock(&rgt->maptrack_lock);
 }
 
 static inline int
@@ -267,10 +267,10 @@ static inline void
 put_maptrack_handle(
     struct grant_table *t, int handle)
 {
-    spin_lock(&t->lock);
+    spin_lock(&t->maptrack_lock);
     maptrack_entry(t, handle).ref = t->maptrack_head;
     t->maptrack_head = handle;
-    spin_unlock(&t->lock);
+    spin_unlock(&t->maptrack_lock);
 }
 
 static inline int
@@ -282,7 +282,7 @@ get_maptrack_handle(
     struct grant_mapping *new_mt;
     unsigned int          new_mt_limit, nr_frames;
 
-    spin_lock(&lgt->lock);
+    spin_lock(&lgt->maptrack_lock);
 
     while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
     {
@@ -311,7 +311,7 @@ get_maptrack_handle(
                  nr_frames + 1);
     }
 
-    spin_unlock(&lgt->lock);
+    spin_unlock(&lgt->maptrack_lock);
 
     return handle;
 }
@@ -508,7 +508,7 @@ static int grant_map_exists(const struct domain *ld,
     const struct active_grant_entry *act;
     unsigned int ref, max_iter;
     
-    ASSERT(spin_is_locked(&rgt->lock));
+    ASSERT(rw_is_locked(&rgt->lock));
 
     max_iter = min(*ref_count + (1 << GNTTABOP_CONTINUATION_ARG_SHIFT),
                    nr_grant_entries(rgt));
@@ -629,7 +629,7 @@ __gnttab_map_grant_ref(
     }
 
     rgt = rd->grant_table;
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
         PIN_FAIL(unlock_out, GNTST_general_error,
@@ -702,7 +702,7 @@ __gnttab_map_grant_ref(
 
     cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
 
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
 
     /* pg may be set, with a refcount included, from __get_paged_frame */
     if ( !pg )
@@ -778,7 +778,7 @@ __gnttab_map_grant_ref(
         goto undo_out;
     }
 
-    double_gt_lock(lgt, rgt);
+    double_maptrack_lock(lgt, rgt);
 
     if ( gnttab_need_iommu_mapping(ld) )
     {
@@ -801,7 +801,7 @@ __gnttab_map_grant_ref(
         }
         if ( err )
         {
-            double_gt_unlock(lgt, rgt);
+            double_maptrack_unlock(lgt, rgt);
             rc = GNTST_general_error;
             goto undo_out;
         }
@@ -814,7 +814,8 @@ __gnttab_map_grant_ref(
     mt->ref   = op->ref;
     mt->flags = op->flags;
 
-    double_gt_unlock(lgt, rgt);
+    double_maptrack_unlock(lgt, rgt);
+    read_unlock(&rgt->lock);
 
     op->dev_bus_addr = (u64)frame << PAGE_SHIFT;
     op->handle       = handle;
@@ -837,7 +838,7 @@ __gnttab_map_grant_ref(
         put_page(pg);
     }
 
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     act = &active_entry(rgt, op->ref);
 
@@ -857,7 +858,7 @@ __gnttab_map_grant_ref(
         gnttab_clear_flag(_GTF_reading, status);
 
  unlock_out:
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
     op->status = rc;
     put_maptrack_handle(lgt, handle);
     rcu_unlock_domain(rd);
@@ -907,18 +908,19 @@ __gnttab_unmap_common(
     }
 
     op->map = &maptrack_entry(lgt, op->handle);
-    spin_lock(&lgt->lock);
+
+    read_lock(&lgt->lock);
 
     if ( unlikely(!op->map->flags) )
     {
-        spin_unlock(&lgt->lock);
+        read_unlock(&lgt->lock);
         gdprintk(XENLOG_INFO, "Zero flags for handle (%d).\n", op->handle);
         op->status = GNTST_bad_handle;
         return;
     }
 
     dom = op->map->domid;
-    spin_unlock(&lgt->lock);
+    read_unlock(&lgt->lock);
 
     if ( unlikely((rd = rcu_lock_domain_by_id(dom)) == NULL) )
     {
@@ -939,7 +941,9 @@ __gnttab_unmap_common(
     TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
 
     rgt = rd->grant_table;
-    double_gt_lock(lgt, rgt);
+
+    read_lock(&rgt->lock);
+    double_maptrack_lock(lgt, rgt);
 
     op->flags = op->map->flags;
     if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
@@ -1009,7 +1013,9 @@ __gnttab_unmap_common(
          gnttab_mark_dirty(rd, op->frame);
 
  unmap_out:
-    double_gt_unlock(lgt, rgt);
+    double_maptrack_unlock(lgt, rgt);
+    read_unlock(&rgt->lock);
+
     op->status = rc;
     rcu_unlock_domain(rd);
 }
@@ -1039,8 +1045,8 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
 
     rcu_lock_domain(rd);
     rgt = rd->grant_table;
-    spin_lock(&rgt->lock);
 
+    read_lock(&rgt->lock);
     if ( rgt->gt_version == 0 )
         goto unmap_out;
 
@@ -1104,7 +1110,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
         gnttab_clear_flag(_GTF_reading, status);
 
  unmap_out:
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
     if ( put_handle )
     {
         op->map->flags = 0;
@@ -1290,11 +1296,13 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
     gt->nr_status_frames = 0;
 }
 
+/* 
+ * Grow the grant table. The caller must hold the grant table's
+ * write lock before calling this function.
+ */
 int
 gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
 {
-    /* d's grant table lock must be held by the caller */
-
     struct grant_table *gt = d->grant_table;
     unsigned int i;
 
@@ -1398,7 +1406,7 @@ gnttab_setup_table(
     }
 
     gt = d->grant_table;
-    spin_lock(&gt->lock);
+    write_lock(&gt->lock);
 
     if ( gt->gt_version == 0 )
         gt->gt_version = 1;
@@ -1426,7 +1434,7 @@ gnttab_setup_table(
     }
 
  out3:
-    spin_unlock(&gt->lock);
+    write_unlock(&gt->lock);
  out2:
     rcu_unlock_domain(d);
  out1:
@@ -1468,13 +1476,13 @@ gnttab_query_size(
         goto query_out_unlock;
     }
 
-    spin_lock(&d->grant_table->lock);
+    read_lock(&d->grant_table->lock);
 
     op.nr_frames     = nr_grant_frames(d->grant_table);
     op.max_nr_frames = max_grant_frames;
     op.status        = GNTST_okay;
 
-    spin_unlock(&d->grant_table->lock);
+    read_unlock(&d->grant_table->lock);
 
  
  query_out_unlock:
@@ -1500,7 +1508,7 @@ gnttab_prepare_for_transfer(
     union grant_combo   scombo, prev_scombo, new_scombo;
     int                 retries = 0;
 
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
     {
@@ -1551,11 +1559,11 @@ gnttab_prepare_for_transfer(
         scombo = prev_scombo;
     }
 
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
     return 1;
 
  fail:
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
     return 0;
 }
 
@@ -1748,7 +1756,7 @@ gnttab_transfer(
         TRACE_1D(TRC_MEM_PAGE_GRANT_TRANSFER, e->domain_id);
 
         /* Tell the guest about its new page frame. */
-        spin_lock(&e->grant_table->lock);
+        read_lock(&e->grant_table->lock);
 
         if ( e->grant_table->gt_version == 1 )
         {
@@ -1766,7 +1774,7 @@ gnttab_transfer(
         shared_entry_header(e->grant_table, gop.ref)->flags |=
             GTF_transfer_completed;
 
-        spin_unlock(&e->grant_table->lock);
+        read_unlock(&e->grant_table->lock);
 
         rcu_unlock_domain(e);
 
@@ -1804,7 +1812,7 @@ __release_grant_for_copy(
     released_read = 0;
     released_write = 0;
 
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     act = &active_entry(rgt, gref);
     sha = shared_entry_header(rgt, gref);
@@ -1845,7 +1853,7 @@ __release_grant_for_copy(
         released_read = 1;
     }
 
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
 
     if ( td != rd )
     {
@@ -1903,7 +1911,7 @@ __acquire_grant_for_copy(
 
     *page = NULL;
 
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
         PIN_FAIL(unlock_out, GNTST_general_error,
@@ -1972,17 +1980,23 @@ __acquire_grant_for_copy(
                 PIN_FAIL(unlock_out_clear, GNTST_general_error,
                          "transitive grant referenced bad domain %d\n",
                          trans_domid);
-            spin_unlock(&rgt->lock);
+
+            /* 
+             * __acquire_grant_for_copy() could take the read lock on
+             * the right table (if rd == td), so we have to drop the
+             * lock here and reacquire
+             */
+            read_unlock(&rgt->lock);
 
             rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
                                           readonly, &grant_frame, page,
                                           &trans_page_off, &trans_length, 0);
 
-            spin_lock(&rgt->lock);
+            read_lock(&rgt->lock);
             if ( rc != GNTST_okay ) {
                 __fixup_status_for_copy_pin(act, status);
+                read_unlock(&rgt->lock);
                 rcu_unlock_domain(td);
-                spin_unlock(&rgt->lock);
                 return rc;
             }
 
@@ -1994,7 +2008,7 @@ __acquire_grant_for_copy(
             {
                 __fixup_status_for_copy_pin(act, status);
                 rcu_unlock_domain(td);
-                spin_unlock(&rgt->lock);
+                read_unlock(&rgt->lock);
                 put_page(*page);
                 return __acquire_grant_for_copy(rd, gref, ldom, readonly,
                                                 frame, page, page_off, length,
@@ -2062,7 +2076,7 @@ __acquire_grant_for_copy(
     *length = act->length;
     *frame = act->frame;
 
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
     return rc;
  
  unlock_out_clear:
@@ -2074,7 +2088,8 @@ __acquire_grant_for_copy(
         gnttab_clear_flag(_GTF_reading, status);
 
  unlock_out:
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
+
     return rc;
 }
 
@@ -2390,7 +2405,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     if ( gt->gt_version == op.version )
         goto out;
 
-    spin_lock(&gt->lock);
+    write_lock(&gt->lock);
     /* Make sure that the grant table isn't currently in use when we
        change the version number, except for the first 8 entries which
        are allowed to be in use (xenstore/xenconsole keeps them mapped).
@@ -2476,7 +2491,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     gt->gt_version = op.version;
 
 out_unlock:
-    spin_unlock(&gt->lock);
+    write_unlock(&gt->lock);
 
 out:
     op.version = gt->gt_version;
@@ -2532,7 +2547,7 @@ gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
 
     op.status = GNTST_okay;
 
-    spin_lock(&gt->lock);
+    read_lock(&gt->lock);
 
     for ( i = 0; i < op.nr_frames; i++ )
     {
@@ -2541,7 +2556,7 @@ gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
             op.status = GNTST_bad_virt_addr;
     }
 
-    spin_unlock(&gt->lock);
+    read_unlock(&gt->lock);
 out2:
     rcu_unlock_domain(d);
 out1:
@@ -2590,7 +2605,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     struct active_grant_entry *act;
     s16 rc = GNTST_okay;
 
-    spin_lock(&gt->lock);
+    write_lock(&gt->lock);
 
     /* Bounds check on the grant refs */
     if ( unlikely(ref_a >= nr_grant_entries(d->grant_table)))
@@ -2630,7 +2645,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     }
 
 out:
-    spin_unlock(&gt->lock);
+    write_unlock(&gt->lock);
 
     rcu_unlock_domain(d);
 
@@ -2701,12 +2716,12 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
 
     if ( d != owner )
     {
-        spin_lock(&owner->grant_table->lock);
+        read_lock(&owner->grant_table->lock);
 
         ret = grant_map_exists(d, owner->grant_table, mfn, ref_count);
         if ( ret != 0 )
         {
-            spin_unlock(&owner->grant_table->lock);
+            read_unlock(&owner->grant_table->lock);
             rcu_unlock_domain(d);
             put_page(page);
             return ret;
@@ -2726,7 +2741,7 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
         ret = 0;
 
     if ( d != owner )
-        spin_unlock(&owner->grant_table->lock);
+        read_unlock(&owner->grant_table->lock);
     unmap_domain_page(v);
     put_page(page);
 
@@ -2945,7 +2960,8 @@ grant_table_create(
         goto no_mem_0;
 
     /* Simple stuff. */
-    spin_lock_init(&t->lock);
+    rwlock_init(&t->lock);
+    spin_lock_init(&t->maptrack_lock);
     t->nr_grant_frames = INITIAL_NR_GRANT_FRAMES;
 
     /* Active grant table. */
@@ -3052,7 +3068,7 @@ gnttab_release_mappings(
         }
 
         rgt = rd->grant_table;
-        spin_lock(&rgt->lock);
+        read_lock(&rgt->lock);
 
         act = &active_entry(rgt, ref);
         sha = shared_entry_header(rgt, ref);
@@ -3112,7 +3128,7 @@ gnttab_release_mappings(
         if ( act->pin == 0 )
             gnttab_clear_flag(_GTF_reading, status);
 
-        spin_unlock(&rgt->lock);
+        read_unlock(&rgt->lock);
 
         rcu_unlock_domain(rd);
 
@@ -3160,7 +3176,7 @@ static void gnttab_usage_print(struct domain *rd)
     printk("      -------- active --------       -------- shared --------\n");
     printk("[ref] localdom mfn      pin          localdom gmfn     flags\n");
 
-    spin_lock(&gt->lock);
+    read_lock(&gt->lock);
 
     if ( gt->gt_version == 0 )
         goto out;
@@ -3209,7 +3225,7 @@ static void gnttab_usage_print(struct domain *rd)
     }
 
  out:
-    spin_unlock(&gt->lock);
+    read_unlock(&gt->lock);
 
     if ( first )
         printk("grant-table for remote domain:%5d ... "
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 32f5786..98cc94a 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -64,6 +64,11 @@ struct grant_mapping {
 
 /* Per-domain grant information. */
 struct grant_table {
+    /* 
+     * Lock protecting updates to grant table state (version, active
+     * entry list, etc.)
+     */
+    rwlock_t              lock;
     /* Table size. Number of frames shared with guest */
     unsigned int          nr_grant_frames;
     /* Shared grant table (see include/public/grant_table.h). */
@@ -82,8 +87,8 @@ struct grant_table {
     struct grant_mapping **maptrack;
     unsigned int          maptrack_head;
     unsigned int          maptrack_limit;
-    /* Lock protecting updates to active and shared grant tables. */
-    spinlock_t            lock;
+    /* Lock protecting the maptrack page list, head, and limit */
+    spinlock_t            maptrack_lock;
     /* The defined versions are 1 and 2.  Set to 0 if we don't know
        what version to use yet. */
     unsigned              gt_version;
@@ -101,7 +106,7 @@ gnttab_release_mappings(
     struct domain *d);
 
 /* Increase the size of a domain's grant table.
- * Caller must hold d's grant table lock.
+ * Caller must hold d's grant table write lock.
  */
 int
 gnttab_grow_table(struct domain *d, unsigned int req_nr_frames);
-- 
1.7.10.4


* [PATCHv8 2/3] gnttab: per-active entry locking
From: David Vrabel @ 2015-05-12 14:15 UTC
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	David Vrabel, Jan Beulich, Malcolm Crossley, Matt Wilson

From: Matt Wilson <msw@amazon.com>

Introduce a per-active entry spin lock to protect active entry state
instead of requiring the double maptrack lock.  The grant table read
lock must be held before acquiring (locking) an active entry.

This reduces lock contention on grant map/unmap to the maptrack lock
during maptrack handle get/put, significantly improving performance
with many concurrent map/unmap operations.
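
As a sketch of the resulting lock ordering (illustration only; the
real code is in the hunks below): a single active entry is
manipulated under the grant table read lock, while anything touching
two entries, such as __gnttab_swap_grant_ref(), must hold the write
lock and never acquire the same entry twice:

    /* Single entry: grant table read lock, then the per-entry lock. */
    read_lock(&rgt->lock);
    act = active_entry_acquire(rgt, ref);
    /* ... read/update act->pin, act->frame, ... */
    active_entry_release(act);
    read_unlock(&rgt->lock);

    /* Two entries: the write lock excludes all other acquirers. */
    if ( ref_a == ref_b )
        return GNTST_okay;  /* avoid self-deadlock on act->lock */
    write_lock(&gt->lock);
    act_a = active_entry_acquire(gt, ref_a);
    act_b = active_entry_acquire(gt, ref_b);
    /* ... swap the two entries ... */
    active_entry_release(act_b);
    active_entry_release(act_a);
    write_unlock(&gt->lock);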

Signed-off-by: Matt Wilson <msw@amazon.com>
[chegger: ported to xen-staging, split into multiple commits]
Signed-off-by: Christoph Egger <chegger@amazon.de>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 docs/misc/grant-tables.txt |   25 ++++++
 xen/common/grant_table.c   |  212 +++++++++++++++++++++++++++-----------------
 2 files changed, 154 insertions(+), 83 deletions(-)

diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index a01f4bf..c30906a 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -63,6 +63,7 @@ is complete.
   act->domid : remote domain being granted rights
   act->frame : machine frame being granted
   act->pin   : used to hold reference counts
+  act->lock  : spinlock used to serialize access to active entry state
 
  Map tracking
  ~~~~~~~~~~~~
@@ -87,6 +88,8 @@ is complete.
                                version, partially initialized active table pages,
                                etc.
   grant_table->maptrack_lock : spinlock used to protect the maptrack state
+  active_grant_entry->lock   : spinlock used to serialize modifications to
+                               active entries
 
  The primary lock for the grant table is a read/write spinlock. All
  functions that access members of struct grant_table must acquire a
@@ -105,6 +108,28 @@ is complete.
  The maptrack lock may be locked while holding the read or write grant
  table lock.
 
+ Active entries are obtained by calling active_entry_acquire(gt, ref).
+ This function returns a pointer to the active entry after locking its
+ spinlock. The caller must hold the rwlock for the gt in question
+ before calling active_entry_acquire(). This is because the grant
+ table can be dynamically extended via gnttab_grow_table() while a
+ domain is running and must be fully initialized. Once all access to
+ the active entry is complete, release the lock by calling
+ active_entry_release(act).
+
+ Summary of rules for locking:
+  active_entry_acquire() and active_entry_release() can only be
+  called when holding the relevant grant table's lock. I.e.:
+    read_lock(&gt->lock);
+    act = active_entry_acquire(gt, ref);
+    ...
+    active_entry_release(act);
+    read_unlock(&gt->lock);
+
+ Active entries cannot be acquired while holding the maptrack lock.
+ Multiple active entries can only be acquired while holding the grant
+ table _write_ lock.
+
 ********************************************************************************
 
  Granting a foreign domain access to frames
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 54e4411..76de00e 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -157,10 +157,13 @@ struct active_grant_entry {
                                in the page.                           */
     unsigned      length:16; /* For sub-page grants, the length of the
                                 grant.                                */
+    spinlock_t    lock;      /* lock to protect access of this entry.
+                                see docs/misc/grant-tables.txt for
+                                locking protocol                      */
 };
 
 #define ACGNT_PER_PAGE (PAGE_SIZE / sizeof(struct active_grant_entry))
-#define active_entry(t, e) \
+#define _active_entry(t, e) \
     ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
 
 static inline void gnttab_flush_tlb(const struct domain *d)
@@ -188,6 +191,28 @@ nr_active_grant_frames(struct grant_table *gt)
     return num_act_frames_from_sha_frames(nr_grant_frames(gt));
 }
 
+static inline struct active_grant_entry *
+active_entry_acquire(struct grant_table *t, grant_ref_t e)
+{
+    struct active_grant_entry *act;
+
+    /* 
+     * not perfect, but better than nothing for a debug build
+     * sanity check
+     */
+    ASSERT(rw_is_locked(&t->lock));
+
+    act = &_active_entry(t, e);
+    spin_lock(&act->lock);
+
+    return act;
+}
+
+static inline void active_entry_release(struct active_grant_entry *act)
+{
+    spin_unlock(&act->lock);
+}
+    
 /* Check if the page has been paged out, or needs unsharing. 
    If rc == GNTST_okay, *page contains the page struct with a ref taken.
    Caller must do put_page(*page).
@@ -228,30 +253,6 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
     return rc;
 }
 
-static inline void
-double_maptrack_lock(struct grant_table *lgt, struct grant_table *rgt)
-{
-    if ( lgt < rgt )
-    {
-        spin_lock(&lgt->maptrack_lock);
-        spin_lock(&rgt->maptrack_lock);
-    }
-    else
-    {
-        if ( lgt != rgt )
-            spin_lock(&rgt->maptrack_lock);
-        spin_lock(&lgt->maptrack_lock);
-    }
-}
-
-static inline void
-double_maptrack_unlock(struct grant_table *lgt, struct grant_table *rgt)
-{
-    spin_unlock(&lgt->maptrack_lock);
-    if ( lgt != rgt )
-        spin_unlock(&rgt->maptrack_lock);
-}
-
 static inline int
 __get_maptrack_handle(
     struct grant_table *t)
@@ -316,7 +317,10 @@ get_maptrack_handle(
     return handle;
 }
 
-/* Number of grant table entries. Caller must hold d's grant table lock. */
+/* 
+ * Number of grant table entries. Caller must hold d's grant table
+ * read lock.
+ */
 static unsigned int nr_grant_entries(struct grant_table *gt)
 {
     ASSERT(gt->gt_version != 0);
@@ -505,26 +509,23 @@ static int grant_map_exists(const struct domain *ld,
                             unsigned long mfn,
                             unsigned int *ref_count)
 {
-    const struct active_grant_entry *act;
+    struct active_grant_entry *act;
     unsigned int ref, max_iter;
     
     ASSERT(rw_is_locked(&rgt->lock));
 
     max_iter = min(*ref_count + (1 << GNTTABOP_CONTINUATION_ARG_SHIFT),
                    nr_grant_entries(rgt));
-    for ( ref = *ref_count; ref < max_iter; ref++ )
+    for ( ref = *ref_count; ref < max_iter; active_entry_release(act), ref++ )
     {
-        act = &active_entry(rgt, ref);
-
-        if ( !act->pin )
-            continue;
-
-        if ( act->domid != ld->domain_id )
-            continue;
+        act = active_entry_acquire(rgt, ref);
 
-        if ( act->frame != mfn )
+        if ( !act->pin ||
+             act->domid != ld->domain_id ||
+             act->frame != mfn )
             continue;
 
+        active_entry_release(act);
         return 0;
     }
 
@@ -537,6 +538,13 @@ static int grant_map_exists(const struct domain *ld,
     return -EINVAL;
 }
 
+/* 
+ * Count the number of mapped ro or rw entries tracked in the local
+ * grant table given an mfn provided by a foreign domain.
+ *
+ * This function takes the maptrack lock from the local grant table
+ * and must be called with the remote grant table's rwlock held.
+ */
 static void mapcount(
     struct grant_table *lgt, struct domain *rd, unsigned long mfn,
     unsigned int *wrc, unsigned int *rdc)
@@ -546,15 +554,28 @@ static void mapcount(
 
     *wrc = *rdc = 0;
 
+    /* 
+     * N.B.: while taking the local maptrack spinlock prevents any
+     * mapping changes, the remote active entries could be changing
+     * while we are counting. The caller has to hold the grant table
+     * write lock, or some other mechanism should be used to prevent
+     * concurrent changes during this operation.  This is tricky
+     * because we can't promote a read lock into a write lock.
+     */
+    ASSERT(rw_is_locked(&rd->grant_table->lock));
+    spin_lock(&lgt->maptrack_lock);
+
     for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
     {
         map = &maptrack_entry(lgt, handle);
         if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) ||
              map->domid != rd->domain_id )
             continue;
-        if ( active_entry(rd->grant_table, map->ref).frame == mfn )
+        if ( _active_entry(rd->grant_table, map->ref).frame == mfn )
             (map->flags & GNTMAP_readonly) ? (*rdc)++ : (*wrc)++;
     }
+
+    spin_unlock(&lgt->maptrack_lock);
 }
 
 /*
@@ -576,7 +597,6 @@ __gnttab_map_grant_ref(
     struct page_info *pg = NULL;
     int            rc = GNTST_okay;
     u32            old_pin;
-    u32            act_pin;
     unsigned int   cache_flags;
     struct active_grant_entry *act = NULL;
     struct grant_mapping *mt;
@@ -639,7 +659,7 @@ __gnttab_map_grant_ref(
     if ( unlikely(op->ref >= nr_grant_entries(rgt)))
         PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref (%d).\n", op->ref);
 
-    act = &active_entry(rgt, op->ref);
+    act = active_entry_acquire(rgt, op->ref);
     shah = shared_entry_header(rgt, op->ref);
     if (rgt->gt_version == 1) {
         sha1 = &shared_entry_v1(rgt, op->ref);
@@ -656,7 +676,7 @@ __gnttab_map_grant_ref(
          ((act->domid != ld->domain_id) ||
           (act->pin & 0x80808080U) != 0 ||
           (act->is_sub_page)) )
-        PIN_FAIL(unlock_out, GNTST_general_error,
+        PIN_FAIL(act_release_out, GNTST_general_error,
                  "Bad domain (%d != %d), or risk of counter overflow %08x, or subpage %d\n",
                  act->domid, ld->domain_id, act->pin, act->is_sub_page);
 
@@ -667,7 +687,7 @@ __gnttab_map_grant_ref(
         if ( (rc = _set_status(rgt->gt_version, ld->domain_id,
                                op->flags & GNTMAP_readonly,
                                1, shah, act, status) ) != GNTST_okay )
-             goto unlock_out;
+            goto act_release_out;
 
         if ( !act->pin )
         {
@@ -698,12 +718,9 @@ __gnttab_map_grant_ref(
             GNTPIN_hstr_inc : GNTPIN_hstw_inc;
 
     frame = act->frame;
-    act_pin = act->pin;
 
     cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
 
-    read_unlock(&rgt->lock);
-
     /* pg may be set, with a refcount included, from __get_paged_frame */
     if ( !pg )
     {
@@ -778,8 +795,6 @@ __gnttab_map_grant_ref(
         goto undo_out;
     }
 
-    double_maptrack_lock(lgt, rgt);
-
     if ( gnttab_need_iommu_mapping(ld) )
     {
         unsigned int wrc, rdc;
@@ -787,21 +802,20 @@ __gnttab_map_grant_ref(
         /* We're not translated, so we know that gmfns and mfns are
            the same things, so the IOMMU entry is always 1-to-1. */
         mapcount(lgt, rd, frame, &wrc, &rdc);
-        if ( (act_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
+        if ( (act->pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
              !(old_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
         {
             if ( wrc == 0 )
                 err = iommu_map_page(ld, frame, frame,
                                      IOMMUF_readable|IOMMUF_writable);
         }
-        else if ( act_pin && !old_pin )
+        else if ( act->pin && !old_pin )
         {
             if ( (wrc + rdc) == 0 )
                 err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
         }
         if ( err )
         {
-            double_maptrack_unlock(lgt, rgt);
             rc = GNTST_general_error;
             goto undo_out;
         }
@@ -814,7 +828,7 @@ __gnttab_map_grant_ref(
     mt->ref   = op->ref;
     mt->flags = op->flags;
 
-    double_maptrack_unlock(lgt, rgt);
+    active_entry_release(act);
     read_unlock(&rgt->lock);
 
     op->dev_bus_addr = (u64)frame << PAGE_SHIFT;
@@ -838,10 +852,6 @@ __gnttab_map_grant_ref(
         put_page(pg);
     }
 
-    read_lock(&rgt->lock);
-
-    act = &active_entry(rgt, op->ref);
-
     if ( op->flags & GNTMAP_device_map )
         act->pin -= (op->flags & GNTMAP_readonly) ?
             GNTPIN_devr_inc : GNTPIN_devw_inc;
@@ -857,6 +867,9 @@ __gnttab_map_grant_ref(
     if ( !act->pin )
         gnttab_clear_flag(_GTF_reading, status);
 
+ act_release_out:
+    active_entry_release(act);
+
  unlock_out:
     read_unlock(&rgt->lock);
     op->status = rc;
@@ -943,18 +956,17 @@ __gnttab_unmap_common(
     rgt = rd->grant_table;
 
     read_lock(&rgt->lock);
-    double_maptrack_lock(lgt, rgt);
 
     op->flags = op->map->flags;
     if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
     {
         gdprintk(XENLOG_WARNING, "Unstable handle %u\n", op->handle);
         rc = GNTST_bad_handle;
-        goto unmap_out;
+        goto read_unlock_out;
     }
 
     op->rd = rd;
-    act = &active_entry(rgt, op->map->ref);
+    act = active_entry_acquire(rgt, op->map->ref);
 
     if ( op->frame == 0 )
     {
@@ -963,7 +975,7 @@ __gnttab_unmap_common(
     else
     {
         if ( unlikely(op->frame != act->frame) )
-            PIN_FAIL(unmap_out, GNTST_general_error,
+            PIN_FAIL(act_release_out, GNTST_general_error,
                      "Bad frame number doesn't match gntref. (%lx != %lx)\n",
                      op->frame, act->frame);
         if ( op->flags & GNTMAP_device_map )
@@ -982,7 +994,7 @@ __gnttab_unmap_common(
         if ( (rc = replace_grant_host_mapping(op->host_addr,
                                               op->frame, op->new_addr, 
                                               op->flags)) < 0 )
-            goto unmap_out;
+            goto act_release_out;
 
         ASSERT(act->pin & (GNTPIN_hstw_mask | GNTPIN_hstr_mask));
         op->map->flags &= ~GNTMAP_host_map;
@@ -1004,7 +1016,7 @@ __gnttab_unmap_common(
         if ( err )
         {
             rc = GNTST_general_error;
-            goto unmap_out;
+            goto act_release_out;
         }
     }
 
@@ -1012,8 +1024,9 @@ __gnttab_unmap_common(
     if ( !(op->flags & GNTMAP_readonly) )
          gnttab_mark_dirty(rd, op->frame);
 
- unmap_out:
-    double_maptrack_unlock(lgt, rgt);
+ act_release_out:
+    active_entry_release(act);
+ read_unlock_out:
     read_unlock(&rgt->lock);
 
     op->status = rc;
@@ -1048,9 +1061,9 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
 
     read_lock(&rgt->lock);
     if ( rgt->gt_version == 0 )
-        goto unmap_out;
+        goto unlock_out;
 
-    act = &active_entry(rgt, op->map->ref);
+    act = active_entry_acquire(rgt, op->map->ref);
     sha = shared_entry_header(rgt, op->map->ref);
 
     if ( rgt->gt_version == 1 )
@@ -1064,7 +1077,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
          * Suggests that __gntab_unmap_common failed early and so
          * nothing further to do
          */
-        goto unmap_out;
+        goto act_release_out;
     }
 
     pg = mfn_to_page(op->frame);
@@ -1088,7 +1101,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
              * Suggests that __gntab_unmap_common failed in
              * replace_grant_host_mapping() so nothing further to do
              */
-            goto unmap_out;
+            goto act_release_out;
         }
 
         if ( !is_iomem_page(op->frame) ) 
@@ -1109,8 +1122,12 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
     if ( act->pin == 0 )
         gnttab_clear_flag(_GTF_reading, status);
 
- unmap_out:
+ act_release_out:
+    active_entry_release(act);
+    
+ unlock_out:
     read_unlock(&rgt->lock);
+
     if ( put_handle )
     {
         op->map->flags = 0;
@@ -1304,7 +1321,7 @@ int
 gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
 {
     struct grant_table *gt = d->grant_table;
-    unsigned int i;
+    unsigned int i, j;
 
     ASSERT(req_nr_frames <= max_grant_frames);
 
@@ -1319,6 +1336,8 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
         if ( (gt->active[i] = alloc_xenheap_page()) == NULL )
             goto active_alloc_failed;
         clear_page(gt->active[i]);
+        for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+            spin_lock_init(&gt->active[i][j].lock);
     }
 
     /* Shared */
@@ -1814,7 +1833,7 @@ __release_grant_for_copy(
 
     read_lock(&rgt->lock);
 
-    act = &active_entry(rgt, gref);
+    act = active_entry_acquire(rgt, gref);
     sha = shared_entry_header(rgt, gref);
     r_frame = act->frame;
 
@@ -1853,6 +1872,7 @@ __release_grant_for_copy(
         released_read = 1;
     }
 
+    active_entry_release(act);
     read_unlock(&rgt->lock);
 
     if ( td != rd )
@@ -1914,14 +1934,14 @@ __acquire_grant_for_copy(
     read_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
-        PIN_FAIL(unlock_out, GNTST_general_error,
+        PIN_FAIL(gnt_unlock_out, GNTST_general_error,
                  "remote grant table not ready\n");
 
     if ( unlikely(gref >= nr_grant_entries(rgt)) )
-        PIN_FAIL(unlock_out, GNTST_bad_gntref,
+        PIN_FAIL(gnt_unlock_out, GNTST_bad_gntref,
                  "Bad grant reference %ld\n", gref);
 
-    act = &active_entry(rgt, gref);
+    act = active_entry_acquire(rgt, gref);
     shah = shared_entry_header(rgt, gref);
     if ( rgt->gt_version == 1 )
     {
@@ -1986,6 +2006,7 @@ __acquire_grant_for_copy(
              * the right table (if rd == td), so we have to drop the
              * lock here and reacquire
              */
+            active_entry_release(act);
             read_unlock(&rgt->lock);
 
             rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
@@ -1993,8 +2014,11 @@ __acquire_grant_for_copy(
                                           &trans_page_off, &trans_length, 0);
 
             read_lock(&rgt->lock);
+            act = active_entry_acquire(rgt, gref);
+
             if ( rc != GNTST_okay ) {
                 __fixup_status_for_copy_pin(act, status);
+                active_entry_release(act);
                 read_unlock(&rgt->lock);
                 rcu_unlock_domain(td);
                 return rc;
@@ -2008,6 +2032,7 @@ __acquire_grant_for_copy(
             {
                 __fixup_status_for_copy_pin(act, status);
                 rcu_unlock_domain(td);
+                active_entry_release(act);
                 read_unlock(&rgt->lock);
                 put_page(*page);
                 return __acquire_grant_for_copy(rd, gref, ldom, readonly,
@@ -2076,6 +2101,7 @@ __acquire_grant_for_copy(
     *length = act->length;
     *frame = act->frame;
 
+    active_entry_release(act);
     read_unlock(&rgt->lock);
     return rc;
  
@@ -2088,6 +2114,9 @@ __acquire_grant_for_copy(
         gnttab_clear_flag(_GTF_reading, status);
 
  unlock_out:
+    active_entry_release(act);
+
+ gnt_unlock_out:
     read_unlock(&rgt->lock);
 
     return rc;
@@ -2389,7 +2418,6 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     gnttab_set_version_t op;
     struct domain *d = current->domain;
     struct grant_table *gt = d->grant_table;
-    struct active_grant_entry *act;
     grant_entry_v1_t reserved_entries[GNTTAB_NR_RESERVED_ENTRIES];
     long res;
     int i;
@@ -2414,8 +2442,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     {
         for ( i = GNTTAB_NR_RESERVED_ENTRIES; i < nr_grant_entries(gt); i++ )
         {
-            act = &active_entry(gt, i);
-            if ( act->pin != 0 )
+            if ( read_atomic(&_active_entry(gt, i).pin) != 0 )
             {
                 gdprintk(XENLOG_WARNING,
                          "tried to change grant table version from %d to %d, but some grant entries still in use\n",
@@ -2602,9 +2629,17 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
 {
     struct domain *d = rcu_lock_current_domain();
     struct grant_table *gt = d->grant_table;
-    struct active_grant_entry *act;
+    struct active_grant_entry *act_a = NULL;
+    struct active_grant_entry *act_b = NULL;
     s16 rc = GNTST_okay;
 
+    if ( ref_a == ref_b )
+        /*
+         * noop, so avoid acquiring the same active entry
+         * twice which would cause a deadlock.
+         */
+        return rc;
+
     write_lock(&gt->lock);
 
     /* Bounds check on the grant refs */
@@ -2613,12 +2648,12 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     if ( unlikely(ref_b >= nr_grant_entries(d->grant_table)))
         PIN_FAIL(out, GNTST_bad_gntref, "Bad ref-b (%d).\n", ref_b);
 
-    act = &active_entry(gt, ref_a);
-    if ( act->pin )
+    act_a = active_entry_acquire(gt, ref_a);
+    if ( act_a->pin )
         PIN_FAIL(out, GNTST_eagain, "ref a %ld busy\n", (long)ref_a);
 
-    act = &active_entry(gt, ref_b);
-    if ( act->pin )
+    act_b = active_entry_acquire(gt, ref_b);
+    if ( act_b->pin )
         PIN_FAIL(out, GNTST_eagain, "ref b %ld busy\n", (long)ref_b);
 
     if ( gt->gt_version == 1 )
@@ -2645,6 +2680,10 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     }
 
 out:
+    if ( act_b != NULL )
+        active_entry_release(act_b);
+    if ( act_a != NULL )
+        active_entry_release(act_a);
     write_unlock(&gt->lock);
 
     rcu_unlock_domain(d);
@@ -2954,7 +2993,7 @@ grant_table_create(
     struct domain *d)
 {
     struct grant_table *t;
-    int                 i;
+    unsigned int i, j;
 
     if ( (t = xzalloc(struct grant_table)) == NULL )
         goto no_mem_0;
@@ -2974,6 +3013,8 @@ grant_table_create(
         if ( (t->active[i] = alloc_xenheap_page()) == NULL )
             goto no_mem_2;
         clear_page(t->active[i]);
+        for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+            spin_lock_init(&t->active[i][j].lock);
     }
 
     /* Tracking of mapped foreign frames table */
@@ -3070,7 +3111,7 @@ gnttab_release_mappings(
         rgt = rd->grant_table;
         read_lock(&rgt->lock);
 
-        act = &active_entry(rgt, ref);
+        act = active_entry_acquire(rgt, ref);
         sha = shared_entry_header(rgt, ref);
         if (rgt->gt_version == 1)
             status = &sha->flags;
@@ -3128,6 +3169,7 @@ gnttab_release_mappings(
         if ( act->pin == 0 )
             gnttab_clear_flag(_GTF_reading, status);
 
+        active_entry_release(act);
         read_unlock(&rgt->lock);
 
         rcu_unlock_domain(rd);
@@ -3190,9 +3232,12 @@ static void gnttab_usage_print(struct domain *rd)
         uint16_t status;
         uint64_t frame;
 
-        act = &active_entry(gt, ref);
+        act = active_entry_acquire(gt, ref);
         if ( !act->pin )
+        {
+            active_entry_release(act);
             continue;
+        }
 
         sha = shared_entry_header(gt, ref);
 
@@ -3222,6 +3267,7 @@ static void gnttab_usage_print(struct domain *rd)
         printk("[%3d]    %5d 0x%06lx 0x%08x      %5d 0x%06"PRIx64" 0x%02x\n",
                ref, act->domid, act->frame, act->pin,
                sha->domid, frame, status);
+        active_entry_release(act);
     }
 
  out:
-- 
1.7.10.4


* [PATCHv8 3/3] gnttab: use per-VCPU maptrack free lists
From: David Vrabel @ 2015-05-12 14:15 UTC
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Tim Deegan, David Vrabel, Jan Beulich,
	Malcolm Crossley

From: Malcolm Crossley <malcolm.crossley@citrix.com>

Performance analysis of aggregate network throughput with many VMs
shows that performance is significantly limited by contention on the
maptrack lock when obtaining/releasing maptrack handles from the free
list.

Instead of a single free list, use a per-VCPU list. This avoids any
contention when obtaining a handle.  Handles must be released back to
their original list and since this may occur on a different VCPU there
is some contention on the destination VCPU's free list tail pointer
(but this is much better than a per-domain lock).

A domain's maptrack limit is multiplied by the number of VCPUs.  This
ensures that a worst-case domain that only performs grant table
operations via one VCPU will not see a lower maptrack limit.
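
In outline (a sketch of the scheme implemented below; see the hunks
for the real code), getting a handle is lock-free on the owning VCPU,
and releasing one appends to the owner's free list with a cmpxchg()
on the tail pointer:

    /* Get: pop from this VCPU's list, always leaving one entry
       behind so there is a stable tail to append to. */
    head = v->maptrack_head;
    if ( head == MAPTRACK_TAIL )
        return -1;                     /* no pages allocated yet */
    next = read_atomic(&maptrack_entry(t, head).ref);
    if ( next == MAPTRACK_TAIL )
        return -1;                     /* keep the last entry as tail */
    v->maptrack_head = next;
    return head;

    /* Put: mark the entry as the new tail, atomically swing the
       owning VCPU's tail pointer to it, then link in the old tail. */
    maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
    cur_tail = read_atomic(&v->maptrack_tail);
    do {
        prev_tail = cur_tail;
        cur_tail = cmpxchg(&v->maptrack_tail, prev_tail, handle);
    } while ( cur_tail != prev_tail );
    write_atomic(&maptrack_entry(t, prev_tail).ref, handle);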

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/common/grant_table.c      |  139 ++++++++++++++++++++++++++---------------
 xen/include/xen/grant_table.h |    8 ++-
 xen/include/xen/sched.h       |    5 ++
 3 files changed, 97 insertions(+), 55 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 76de00e..92d2740 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -118,9 +118,9 @@ struct gnttab_unmap_common {
     ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
 
 static inline unsigned int
-nr_maptrack_frames(struct grant_table *t)
+nr_vcpu_maptrack_frames(struct vcpu *v)
 {
-    return t->maptrack_limit / MAPTRACK_PER_PAGE;
+    return v->maptrack_limit / MAPTRACK_PER_PAGE;
 }
 
 #define MAPTRACK_TAIL (~0u)
@@ -255,66 +255,101 @@ static int __get_paged_frame(unsigned long gfn, unsigned long *frame, struct pag
 
 static inline int
 __get_maptrack_handle(
-    struct grant_table *t)
+    struct grant_table *t,
+    struct vcpu *v)
 {
-    unsigned int h;
-    if ( unlikely((h = t->maptrack_head) == MAPTRACK_TAIL) )
+    unsigned int head, next;
+
+    /* No maptrack pages allocated for this VCPU yet? */
+    head = v->maptrack_head;
+    if ( unlikely(head == MAPTRACK_TAIL) )
+        return -1;
+
+    /*
+     * Always keep one entry in the free list to make it easier to add
+     * free entries to the tail.
+     */
+    next = read_atomic(&maptrack_entry(t, head).ref);
+    if ( unlikely(next == MAPTRACK_TAIL) )
         return -1;
-    t->maptrack_head = maptrack_entry(t, h).ref;
-    return h;
+
+    v->maptrack_head = next;
+
+    return head;
 }
 
 static inline void
 put_maptrack_handle(
     struct grant_table *t, int handle)
 {
-    spin_lock(&t->maptrack_lock);
-    maptrack_entry(t, handle).ref = t->maptrack_head;
-    t->maptrack_head = handle;
-    spin_unlock(&t->maptrack_lock);
+    struct domain *d = current->domain;
+    struct vcpu *v;
+    unsigned int prev_tail, cur_tail;
+
+    /* 1. Set entry to be a tail. */
+    maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
+
+    /* 2. Add entry to the tail of the list on the original VCPU. */
+    v = d->vcpu[maptrack_entry(t,handle).vcpu];
+
+    cur_tail = read_atomic(&v->maptrack_tail);
+    do {
+        prev_tail = cur_tail;
+        cur_tail = cmpxchg(&v->maptrack_tail, prev_tail, handle);
+    } while ( cur_tail != prev_tail );
+
+    /* 3. Update the old tail entry to point to the new entry. */
+    write_atomic(&maptrack_entry(t, prev_tail).ref, handle);
 }
 
 static inline int
 get_maptrack_handle(
     struct grant_table *lgt)
 {
+    struct vcpu          *v = current;
     int                   i;
     grant_handle_t        handle;
     struct grant_mapping *new_mt;
-    unsigned int          new_mt_limit, nr_frames;
+    unsigned int          nr_frames;
 
-    spin_lock(&lgt->maptrack_lock);
+    handle = __get_maptrack_handle(lgt, v);
+    if ( likely(handle != -1) )
+        return handle;
 
-    while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
-    {
-        nr_frames = nr_maptrack_frames(lgt);
-        if ( nr_frames >= max_maptrack_frames )
-            break;
+    nr_frames = nr_vcpu_maptrack_frames(v);
+    if ( nr_frames >= max_maptrack_frames )
+        return -1;
 
-        new_mt = alloc_xenheap_page();
-        if ( !new_mt )
-            break;
+    new_mt = alloc_xenheap_page();
+    if ( !new_mt )
+        return -1;
+    clear_page(new_mt);
 
-        clear_page(new_mt);
+    spin_lock(&lgt->maptrack_lock);
 
-        new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE;
+    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
+    {
+        new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i;
+        new_mt[i - 1].vcpu = v->vcpu_id;
+    }
+    /* Set last entry vcpu and ref */
+    new_mt[i - 1].ref = v->maptrack_head;
+    new_mt[i - 1].vcpu = v->vcpu_id;
 
-        for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
-            new_mt[i - 1].ref = lgt->maptrack_limit + i;
-        new_mt[i - 1].ref = lgt->maptrack_head;
-        lgt->maptrack_head = lgt->maptrack_limit;
+    v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE;
 
-        lgt->maptrack[nr_frames] = new_mt;
-        smp_wmb();
-        lgt->maptrack_limit      = new_mt_limit;
+    /* Set tail directly if this is the first page for this VCPU. */
+    if ( v->maptrack_tail == MAPTRACK_TAIL )
+        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
+            + MAPTRACK_PER_PAGE - 1;
 
-        gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
-                 nr_frames + 1);
-    }
+    lgt->maptrack[lgt->maptrack_pages++] = new_mt;
 
     spin_unlock(&lgt->maptrack_lock);
 
-    return handle;
+    v->maptrack_limit += MAPTRACK_PER_PAGE;
+
+    return __get_maptrack_handle(lgt, v);
 }
 
 /* 
@@ -565,7 +600,8 @@ static void mapcount(
     ASSERT(rw_is_locked(&rd->grant_table->lock));
     spin_lock(&lgt->maptrack_lock);
 
-    for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
+    for ( handle = 0; handle < lgt->maptrack_pages * MAPTRACK_PER_PAGE; 
+          handle++ )
     {
         map = &maptrack_entry(lgt, handle);
         if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) ||
@@ -913,7 +949,7 @@ __gnttab_unmap_common(
 
     op->frame = (unsigned long)(op->dev_bus_addr >> PAGE_SHIFT);
 
-    if ( unlikely(op->handle >= lgt->maptrack_limit) )
+    if ( unlikely(op->handle >= lgt->maptrack_pages * MAPTRACK_PER_PAGE) )
     {
         gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle);
         op->status = GNTST_bad_handle;
@@ -1386,6 +1422,7 @@ gnttab_setup_table(
     struct gnttab_setup_table op;
     struct domain *d;
     struct grant_table *gt;
+    struct vcpu *v;
     int            i;
     xen_pfn_t  gmfn;
 
@@ -1427,6 +1464,17 @@ gnttab_setup_table(
     gt = d->grant_table;
     write_lock(&gt->lock);
 
+    /* Tracking of mapped foreign frames table */
+    if ( (gt->maptrack = xzalloc_array(struct grant_mapping *,
+                                       max_maptrack_frames * d->max_vcpus)) == NULL )
+        goto out3;
+    for_each_vcpu( d, v )
+    {
+        v->maptrack_head = MAPTRACK_TAIL;
+        v->maptrack_tail = MAPTRACK_TAIL;
+    }
+    gt->maptrack_pages = 0;
+
     if ( gt->gt_version == 0 )
         gt->gt_version = 1;
 
@@ -3017,18 +3065,6 @@ grant_table_create(
             spin_lock_init(&t->active[i][j].lock);
     }
 
-    /* Tracking of mapped foreign frames table */
-    if ( (t->maptrack = xzalloc_array(struct grant_mapping *,
-                                      max_maptrack_frames)) == NULL )
-        goto no_mem_2;
-    if ( (t->maptrack[0] = alloc_xenheap_page()) == NULL )
-        goto no_mem_3;
-    clear_page(t->maptrack[0]);
-    t->maptrack_limit = MAPTRACK_PER_PAGE;
-    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
-        t->maptrack[0][i - 1].ref = i;
-    t->maptrack[0][i - 1].ref = MAPTRACK_TAIL;
-
     /* Shared grant table. */
     if ( (t->shared_raw = xzalloc_array(void *, max_grant_frames)) == NULL )
         goto no_mem_3;
@@ -3059,8 +3095,6 @@ grant_table_create(
         free_xenheap_page(t->shared_raw[i]);
     xfree(t->shared_raw);
  no_mem_3:
-    free_xenheap_page(t->maptrack[0]);
-    xfree(t->maptrack);
  no_mem_2:
     for ( i = 0;
           i < num_act_frames_from_sha_frames(INITIAL_NR_GRANT_FRAMES); i++ )
@@ -3088,7 +3122,8 @@ gnttab_release_mappings(
 
     BUG_ON(!d->is_dying);
 
-    for ( handle = 0; handle < gt->maptrack_limit; handle++ )
+    for ( handle = 0; handle < gt->maptrack_pages * MAPTRACK_PER_PAGE;
+          handle++ )
     {
         map = &maptrack_entry(gt, handle);
         if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) )
@@ -3193,7 +3228,7 @@ grant_table_destroy(
         free_xenheap_page(t->shared_raw[i]);
     xfree(t->shared_raw);
 
-    for ( i = 0; i < nr_maptrack_frames(t); i++ )
+    for ( i = 0; i < t->maptrack_pages; i++ )
         free_xenheap_page(t->maptrack[i]);
     xfree(t->maptrack);
 
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 98cc94a..c10464c 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -60,6 +60,8 @@ struct grant_mapping {
     u32      ref;           /* grant ref */
     u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
     domid_t  domid;         /* granting domain */
+    u32      vcpu;          /* vcpu which created the grant mapping */
+    u32      pad;           /* round size to a power of 2 */
 };
 
 /* Per-domain grant information. */
@@ -83,10 +85,10 @@ struct grant_table {
     grant_status_t       **status;
     /* Active grant table. */
     struct active_grant_entry **active;
-    /* Mapping tracking table. */
+    /* Mapping tracking table per vcpu. */
     struct grant_mapping **maptrack;
-    unsigned int          maptrack_head;
-    unsigned int          maptrack_limit;
+    /* Total pages used for mapping tracking table */
+    unsigned int          maptrack_pages;
     /* Lock protecting the maptrack page list, head, and limit */
     spinlock_t            maptrack_lock;
     /* The defined versions are 1 and 2.  Set to 0 if we don't know
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 80c6f62..7f8422e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -219,6 +219,11 @@ struct vcpu
     /* VCPU paused by system controller. */
     int              controller_pause_count;
 
+    /* Maptrack */
+    unsigned int     maptrack_head;
+    unsigned int     maptrack_tail;
+    unsigned int     maptrack_limit;
+
     /* IRQ-safe virq_lock protects against delivering VIRQ to stale evtchn. */
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread
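
As context for the per-VCPU maptrack fields the diff above adds to
struct vcpu: a minimal sketch of how handle allocation from such a
free list can avoid any global lock on the fast path. This is an
illustration only - the read_atomic()/write_atomic() accessors and the
MAPTRACK_TAIL sentinel follow the series, but the real
get_maptrack_handle() must also grow the table when a VCPU's list is
empty.

    /* Sketch (simplified): pop a free maptrack handle off the current
     * VCPU's list.  Only the owning VCPU ever pops, so no lock is
     * needed; the atomic accessors keep the head word consistent for
     * any remote observers. */
    static grant_handle_t get_maptrack_handle(struct grant_table *lgt)
    {
        struct vcpu *curr = current;
        unsigned int head = read_atomic(&curr->maptrack_head);

        if ( head == MAPTRACK_TAIL )
            return -1;          /* list empty: caller allocates a page */

        /* Each free entry stores the next free index in its .ref field. */
        write_atomic(&curr->maptrack_head, maptrack_entry(lgt, head).ref);

        return head;
    }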

* Re: [PATCHv8 0/3] gnttab: Improve scaleability
  2015-05-12 14:15 [PATCHv8 0/3] gnttab: Improve scaleability David Vrabel
                   ` (2 preceding siblings ...)
  2015-05-12 14:15 ` [PATCHv8 3/3] gnttab: use per-VCPU maptrack free lists David Vrabel
@ 2015-05-19  7:36 ` Jan Beulich
  3 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2015-05-19  7:36 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell, Tim Deegan

>>> On 12.05.15 at 16:15, <david.vrabel@citrix.com> wrote:
> The series makes the grant table locking more fine-grained and adds
> per-VCPU maptrack free lists, which greatly improves scalability.
> 
> The series builds on the original series by Matt Wilson and Christoph
> Egger from Amazon.
> 
> Performance results for aggregate intrahost network throughput
> (between 20 VM pairs, with 16 dom0 VCPUs) show substantial
> improvements.
> 
>                              Throughput/Gbit/s
> 
> Base                          9.7
> Split locks                  25.8
> Split locks                  42.9
>  + per-VCPU maptrack lists
> 
> v8:
>   * Misc locking fixes.

This is of course rather unhelpful a comment, especially without
the patches themselves also having an updates-in-v8 section.
May I please ask that going forward you include at least the
most recent revision's changes in each patch?

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv8 1/3] gnttab: Introduce rwlock to protect updates to grant table state
  2015-05-12 14:15 ` [PATCHv8 1/3] gnttab: Introduce rwlock to protect updates to grant table state David Vrabel
@ 2015-05-19  8:02   ` Jan Beulich
  2015-05-19 13:20     ` David Vrabel
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2015-05-19  8:02 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel, Malcolm Crossley

>>> On 12.05.15 at 16:15, <david.vrabel@citrix.com> wrote:
> @@ -629,7 +629,7 @@ __gnttab_map_grant_ref(
>      }
>  
>      rgt = rd->grant_table;
> -    spin_lock(&rgt->lock);
> +    read_lock(&rgt->lock);
>  
>      if ( rgt->gt_version == 0 )
>          PIN_FAIL(unlock_out, GNTST_general_error,
> @@ -702,7 +702,7 @@ __gnttab_map_grant_ref(
>  
>      cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
>  
> -    spin_unlock(&rgt->lock);
> +    read_unlock(&rgt->lock);

Lock dropped once, and ...

> @@ -778,7 +778,7 @@ __gnttab_map_grant_ref(
>          goto undo_out;
>      }
>  
> -    double_gt_lock(lgt, rgt);
> +    double_maptrack_lock(lgt, rgt);
>  
>      if ( gnttab_need_iommu_mapping(ld) )
>      {
> @@ -801,7 +801,7 @@ __gnttab_map_grant_ref(
>          }
>          if ( err )
>          {
> -            double_gt_unlock(lgt, rgt);
> +            double_maptrack_unlock(lgt, rgt);
>              rc = GNTST_general_error;
>              goto undo_out;
>          }
> @@ -814,7 +814,8 @@ __gnttab_map_grant_ref(
>      mt->ref   = op->ref;
>      mt->flags = op->flags;
>  
> -    double_gt_unlock(lgt, rgt);
> +    double_maptrack_unlock(lgt, rgt);
> +    read_unlock(&rgt->lock);

... and dropped again here without having been re-acquired in between.
Since this appears to be a recurring theme in this first patch - did
this patch alone ever get tested? (I'm sure I'll again find one of the
following patches fixing this issue as a "side effect".)

> @@ -939,7 +941,9 @@ __gnttab_unmap_common(
>      TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
>  
>      rgt = rd->grant_table;
> -    double_gt_lock(lgt, rgt);
> +
> +    read_lock(&rgt->lock);
> +    double_maptrack_lock(lgt, rgt);
>  
>      op->flags = op->map->flags;
>      if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
> @@ -1009,7 +1013,9 @@ __gnttab_unmap_common(
>           gnttab_mark_dirty(rd, op->frame);
>  
>   unmap_out:
> -    double_gt_unlock(lgt, rgt);
> +    double_maptrack_unlock(lgt, rgt);
> +    read_unlock(&rgt->lock);

Am I right to conclude from this that the fix for the issue above is
to add a read_lock(), not to drop the unbalanced read_unlock()?

Which then of course raises the question - is taking the read lock
here and in several other places really sufficient? The thing that the
original spin lock appears to protect here is not only the grant table
structure itself, but also the active entry. I.e. relaxing to a read
lock would seem possible only after having put per-active-entry
locking in place.
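
To make the intended end state concrete, a sketch only (using the
active_entry_acquire() helper patch 2 introduces; active_entry_release()
is assumed here to be its counterpart):

    /* Sketch: a typical reader once per-active-entry locks exist. */
    static void sketch_reader(struct grant_table *rgt, grant_ref_t ref)
    {
        struct active_grant_entry *act;

        read_lock(&rgt->lock);       /* version/geometry cannot change */
        act = active_entry_acquire(rgt, ref); /* serializes this entry */
        /* ... examine or update this one active entry ... */
        active_entry_release(act);
        read_unlock(&rgt->lock);
    }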

> @@ -64,6 +64,11 @@ struct grant_mapping {
>  
>  /* Per-domain grant information. */
>  struct grant_table {
> +    /* 
> +     * Lock protecting updates to grant table state (version, active
> +     * entry list, etc.)
> +     */
> +    rwlock_t              lock;
>      /* Table size. Number of frames shared with guest */
>      unsigned int          nr_grant_frames;
>      /* Shared grant table (see include/public/grant_table.h). */
> @@ -82,8 +87,8 @@ struct grant_table {
>      struct grant_mapping **maptrack;
>      unsigned int          maptrack_head;
>      unsigned int          maptrack_limit;
> -    /* Lock protecting updates to active and shared grant tables. */
> -    spinlock_t            lock;
> +    /* Lock protecting the maptrack page list, head, and limit */
> +    spinlock_t            maptrack_lock;
>      /* The defined versions are 1 and 2.  Set to 0 if we don't know
>         what version to use yet. */
>      unsigned              gt_version;

So in order for fields protected by one of the locks to be as likely
as possible on the same cache line as the lock itself, I think
gt_version should also be moved up in the structure. We may
even want/need to add some separator between basic and
maptrack data to ensure they end up on different cache lines.
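
For illustration, one possible arrangement (field names from this
patch; the __cacheline_aligned placement is a suggestion, not something
the series currently does):

    struct grant_table {
        /* Fields protected by the rwlock, kept adjacent to it. */
        rwlock_t               lock;
        unsigned               gt_version;
        /* ... other read-mostly fields elided ... */

        /* Maptrack data, pushed onto its own cache line. */
        spinlock_t             maptrack_lock __cacheline_aligned;
        struct grant_mapping **maptrack;
        unsigned int           maptrack_head;
        unsigned int           maptrack_limit;
    };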

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv8 2/3] gnttab: per-active entry locking
  2015-05-12 14:15 ` [PATCHv8 2/3] gnttab: per-active entry locking David Vrabel
@ 2015-05-19  8:27   ` Jan Beulich
  2015-05-19 13:46     ` David Vrabel
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2015-05-19  8:27 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel, Malcolm Crossley

>>> On 12.05.15 at 16:15, <david.vrabel@citrix.com> wrote:
> +static inline struct active_grant_entry *
> +active_entry_acquire(struct grant_table *t, grant_ref_t e)
> +{
> +    struct active_grant_entry *act;
> +
> +    /* 
> +     * not perfect, but better than nothing for a debug build
> +     * sanity check
> +     */

Comment style.

> @@ -546,15 +554,28 @@ static void mapcount(
>  
>      *wrc = *rdc = 0;
>  
> +    /* 
> +     * N.B.: while taking the local maptrack spinlock prevents any
> +     * mapping changes, the remote active entries could be changing
> +     * while we are counting. The caller has to hold the grant table
> +     * write lock, or some other mechanism should be used to prevent
> +     * concurrent changes during this operation.  This is tricky
> +     * because we can't promote a read lock into a write lock.
> +     */
> +    ASSERT(rw_is_locked(&rd->grant_table->lock));

So both callers only hold the read lock - what is this "other mechanism"
that they employ?

> @@ -2602,9 +2629,17 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
>  {
>      struct domain *d = rcu_lock_current_domain();
>      struct grant_table *gt = d->grant_table;
> -    struct active_grant_entry *act;
> +    struct active_grant_entry *act_a = NULL;
> +    struct active_grant_entry *act_b = NULL;
>      s16 rc = GNTST_okay;
>  
> +    if ( ref_a == ref_b )
> +        /*
> +         * noop, so avoid acquiring the same active entry
> +         * twice which would case a deadlock.

cause

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread
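
On the ref_a == ref_b point above: the quoted early exit handles the
self-swap case; the usual discipline for then taking two per-entry
locks is to order the acquisitions. A sketch only - the ordering shown
is one common option, not necessarily what the patch itself does:

    /* Sketch: acquire the two active entries in ascending ref order
     * so that concurrent swaps cannot deadlock against each other. */
    if ( ref_a == ref_b )
        goto out;                              /* nothing to swap */
    act_a = active_entry_acquire(gt, min(ref_a, ref_b));
    act_b = active_entry_acquire(gt, max(ref_a, ref_b));
    /* ... swap the shared/active state of the two entries ... */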

* Re: [PATCHv8 1/3] gnttab: Introduce rwlock to protect updates to grant table state
  2015-05-19  8:02   ` Jan Beulich
@ 2015-05-19 13:20     ` David Vrabel
  2015-05-19 13:31       ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: David Vrabel @ 2015-05-19 13:20 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel, Malcolm Crossley

On 19/05/15 09:02, Jan Beulich wrote:
> 
> Which then of course raises the question - is taking the read lock
> here and in several other places really sufficient? The thing that the
> original spin lock appears to protect here is not only the grant table
> structure itself, but also the active entry. I.e. relaxing to a read
> lock would seem possible only after having put per-active-entry
> locking in place.

Yes, this series was split the wrong way round.  I could:

a) squash the first two patches back together;
b) refactor the first two patches into
   - introduce active entry locks
   - introduce maptrack lock
   - convert grant table lock to rw lock.

Which approach is preferred (obviously (a) is a lot less work)?

>> @@ -64,6 +64,11 @@ struct grant_mapping {
>>  
>>  /* Per-domain grant information. */
>>  struct grant_table {
>> +    /* 
>> +     * Lock protecting updates to grant table state (version, active
>> +     * entry list, etc.)
>> +     */
>> +    rwlock_t              lock;
>>      /* Table size. Number of frames shared with guest */
>>      unsigned int          nr_grant_frames;
>>      /* Shared grant table (see include/public/grant_table.h). */
>> @@ -82,8 +87,8 @@ struct grant_table {
>>      struct grant_mapping **maptrack;
>>      unsigned int          maptrack_head;
>>      unsigned int          maptrack_limit;
>> -    /* Lock protecting updates to active and shared grant tables. */
>> -    spinlock_t            lock;
>> +    /* Lock protecting the maptrack page list, head, and limit */
>> +    spinlock_t            maptrack_lock;
>>      /* The defined versions are 1 and 2.  Set to 0 if we don't know
>>         what version to use yet. */
>>      unsigned              gt_version;
> 
> So in order for fields protected by one of the locks to be as likely
> as possible on the same cache line as the lock itself, I think
> gt_version should also be moved up in the structure. We may
> even want/need to add some separator between basic and
> maptrack data to ensure they end up on different cache lines.

I think this sort of structure field shuffling should be a follow-on patch.

David

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv8 1/3] gnttab: Introduce rwlock to protect updates to grant table state
  2015-05-19 13:20     ` David Vrabel
@ 2015-05-19 13:31       ` Jan Beulich
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2015-05-19 13:31 UTC (permalink / raw)
  To: David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel, Malcolm Crossley

>>> On 19.05.15 at 15:20, <david.vrabel@citrix.com> wrote:
> On 19/05/15 09:02, Jan Beulich wrote:
>> 
>> Which then of course raises the question - is taking the read lock
>> here and in several other places really sufficient? The thing that the
>> original spin lock appears to protect here is not only the grant table
>> structure itself, but also the active entry. I.e. relaxing to a read
>> lock would seem possible only after having put per-active-entry
>> locking in place.
> 
> Yes, this series was split the wrong way round.  I could:
> 
> a) squash the first two patches back together;
> b) refactor the first two patches into
>    - introduce active entry locks
>    - introduce maptrack lock
>    - convert grant table lock to rw lock.
> 
> Which approach is preferred (obviously (a) is a lot less work)?

Unless it results in a very hard to review patch, I guess I'd be fine
with (a). Another alternative I would see is to have the current
first patch use write_lock() in most places, and convert them to
read_lock() as the active entry locks get introduced (which I'd
expect to still be less work than (b)).

>>> @@ -64,6 +64,11 @@ struct grant_mapping {
>>>  
>>>  /* Per-domain grant information. */
>>>  struct grant_table {
>>> +    /* 
>>> +     * Lock protecting updates to grant table state (version, active
>>> +     * entry list, etc.)
>>> +     */
>>> +    rwlock_t              lock;
>>>      /* Table size. Number of frames shared with guest */
>>>      unsigned int          nr_grant_frames;
>>>      /* Shared grant table (see include/public/grant_table.h). */
>>> @@ -82,8 +87,8 @@ struct grant_table {
>>>      struct grant_mapping **maptrack;
>>>      unsigned int          maptrack_head;
>>>      unsigned int          maptrack_limit;
>>> -    /* Lock protecting updates to active and shared grant tables. */
>>> -    spinlock_t            lock;
>>> +    /* Lock protecting the maptrack page list, head, and limit */
>>> +    spinlock_t            maptrack_lock;
>>>      /* The defined versions are 1 and 2.  Set to 0 if we don't know
>>>         what version to use yet. */
>>>      unsigned              gt_version;
>> 
>> So in order for fields protected by one of the locks to be as likely
>> as possible on the same cache line as the lock itself, I think
>> gt_version should also be moved up in the structure. We may
>> even want/need to add some separator between basic and
>> maptrack data to ensure they end up on different cache lines.
> 
> I think this sort of structure field shuffling should be a follow-on patch.

Sure, if you so prefer (with "lock" being moved I personally
wouldn't mind gt_version moving as well).

Jan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv8 2/3] gnttab: per-active entry locking
  2015-05-19  8:27   ` Jan Beulich
@ 2015-05-19 13:46     ` David Vrabel
  0 siblings, 0 replies; 10+ messages in thread
From: David Vrabel @ 2015-05-19 13:46 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: Keir Fraser, Ian Campbell, Christoph Egger, Tim Deegan,
	Matt Wilson, xen-devel, Malcolm Crossley

On 19/05/15 09:27, Jan Beulich wrote:
>>>> On 12.05.15 at 16:15, <david.vrabel@citrix.com> wrote:
>> @@ -546,15 +554,28 @@ static void mapcount(
>>  
>>      *wrc = *rdc = 0;
>>  
>> +    /* 
>> +     * N.B.: while taking the local maptrack spinlock prevents any
>> +     * mapping changes, the remote active entries could be changing
>> +     * while we are counting. The caller has to hold the grant table
>> +     * write lock, or some other mechanism should be used to prevent
>> +     * concurrent changes during this operation.  This is tricky
>> +     * because we can't promote a read lock into a write lock.
>> +     */
>> +    ASSERT(rw_is_locked(&rd->grant_table->lock));
> 
> So both callers only hold the read lock - what is this "other mechanism"
> that they employ?

This comment is just nonsense.  We must hold the grant table write lock
for mapcount() to be safe -- the maptrack lock doesn't help and, as you
say, this "other mechanism" just doesn't exist.

I think the callers of mapcount() need to be refactored so that the
IOMMU updates and the mapcount() call are done with the write lock held
(neither the maptrack lock nor the active entry locks are relevant
here).
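
Roughly this shape (a sketch only; argument lists abbreviated and error
handling elided):

    /* In __gnttab_map_grant_ref(), once refactored: */
    if ( gnttab_need_iommu_mapping(ld) )
    {
        /* The remote write lock freezes all of rd's active entries,
         * which is what makes the counting in mapcount() safe. */
        write_lock(&rgt->lock);
        mapcount(lgt, rd, frame, &wrc, &rdc);
        /* ... choose and apply the IOMMU mapping for this frame while
         * the write lock is still held ... */
        write_unlock(&rgt->lock);
    }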

David

^ permalink raw reply	[flat|nested] 10+ messages in thread
