* [PATCHv9 0/4] gnttab: Improve scalability
@ 2015-05-20 15:54 David Vrabel
  2015-05-20 15:54 ` [PATCHv9 1/4] gnttab: per-active entry locking David Vrabel
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: David Vrabel @ 2015-05-20 15:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Tim Deegan, David Vrabel, Jan Beulich, Ian Campbell

The series builds on the original series by Matt Wilson and Christoph
Egger from Amazon.

Performance results for aggregate intrahost network throughput
(between 20 VM pairs, with 16 dom0 VCPUs) show substantial
improvements.

                                         Throughput (Gbit/s)

Base                                       9.7
Split locks                               25.8
Split locks + per-VCPU maptrack lists     42.9

v9:
  * Refactor the locking patches into three commits:
      1. Add per-active entry locking.
      2. Add maptrack_lock.
      3. Make the grant table lock a read-write lock.
  * Keep the double lock around IOMMU updates, but narrow its scope to
    cover only the IOMMU updates themselves.

v8:
  * Misc locking fixes.
  * Use read/write_atomic() in per-VCPU maptrack free list.
  * Improve locking docs.

v7:
  * Re-add the grant table rwlock (locking was broken without it and
    the gains were minimal anyway).
  * Remove unneeded locks from grant_table_destroy().
  * Fix get_maptrack_handle() locking.

v6:
  * Remove most uses of the grant table lock.
  * Make the grant table lock a spin lock again (only writers were
    left after the above).
  * Add per-VCPU maptrack free lists.

v5:
  * Address a locking issue pointed out by Jan Beulich.
  * Fix a git rebase merge issue introduced in v4 (acquiring a lock
    twice).
  * Change the for loop in grant_map_exists().
  * Coding style fixes.

v4:
  * Coding style nits from Jan Beulich.
  * Fix up read locks pointed out by Jan Beulich.
  * Rename double_gt_(un)lock to double_maptrack_(un)lock per request
    from Jan Beulich.
  * Address ASSERT()s from Jan Beulich.
  * Address locking issues in unmap_common pointed out by Jan Beulich.

v3:
  * Address gnttab_swap_grant_ref() comment from Andrew Cooper.

v2:
  * Add ARM part per request from Julien Grall.

David


* [PATCHv9 1/4] gnttab: per-active entry locking
  2015-05-20 15:54 [PATCHv9 0/4] gnttab: Improve scalability David Vrabel
@ 2015-05-20 15:54 ` David Vrabel
  2015-05-21  7:46   ` Jan Beulich
  2015-05-20 15:54 ` [PATCHv9 2/4] gnttab: introduce maptrack lock David Vrabel
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: David Vrabel @ 2015-05-20 15:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Tim Deegan, David Vrabel, Jan Beulich, Ian Campbell

Introduce a per-active entry spin lock to protect active entry state.
The grant table lock must be locked before acquiring (locking) an
active entry.

This is a step in reducing contention on the grant table lock, but
will only do so once the grant table lock is turned into a read-write
lock.
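
As a sketch of the protocol (rgt and ref stand for an arbitrary remote
grant table and grant reference; the helper names are the ones this
patch introduces), a caller nests the per-entry lock inside the grant
table lock:

    spin_lock(&rgt->lock);                /* grant table lock first */
    act = active_entry_acquire(rgt, ref); /* then act->lock */

    /* ... read or update act->pin, act->frame, act->domid ... */

    active_entry_release(act);
    spin_unlock(&rgt->lock);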

Based on a patch originally by Matt Wilson <msw@amazon.com>.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 docs/misc/grant-tables.txt |   42 ++++++++++++-
 xen/common/grant_table.c   |  146 ++++++++++++++++++++++++++++++++------------
 2 files changed, 147 insertions(+), 41 deletions(-)

diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 19db4ec..2a3f591 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -63,6 +63,7 @@ is complete.
   act->domid : remote domain being granted rights
   act->frame : machine frame being granted
   act->pin   : used to hold reference counts
+  act->lock  : spinlock used to serialize access to active entry state
 
  Map tracking
  ~~~~~~~~~~~~
@@ -74,7 +75,46 @@ is complete.
  matching map track entry is then removed, as if unmap had been invoked.
  These are not used by the transfer mechanism.
   map->domid         : owner of the mapped frame
-  map->ref_and_flags : grant reference, ro/rw, mapped for host or device access
+  map->ref           : grant reference
+  map->flags         : ro/rw, mapped for host or device access
+
+********************************************************************************
+ Locking
+ ~~~~~~~
+ Xen uses several locks to serialize access to the internal grant table state.
+
+  grant_table->lock          : lock used to prevent readers from accessing
+                               inconsistent grant table state such as current
+                               version, partially initialized active table pages,
+                               etc.
+  active_grant_entry->lock   : spinlock used to serialize modifications to
+                               active entries
+
+ The primary lock for the grant table is a spinlock. All functions
+ that access members of struct grant_table must acquire the lock
+ around critical sections.
+
+ Active entries are obtained by calling active_entry_acquire(gt, ref).
+ This function returns a pointer to the active entry after locking its
+ spinlock. The caller must hold the grant table lock for the gt in
+ question before calling active_entry_acquire(). This is because the
+ grant table can be dynamically extended via gnttab_grow_table() while
+ a domain is running and must be fully initialized. Once all access to
+ the active entry is complete, release the lock by calling
+ active_entry_release(act).
+
+ Summary of rules for locking:
+  active_entry_acquire() and active_entry_release() can only be
+  called when holding the relevant grant table's lock. I.e.:
+    spin_lock(&gt->lock);
+    act = active_entry_acquire(gt, ref);
+    ...
+    active_entry_release(act);
+    spin_unlock(&gt->lock);
+
+ Active entries cannot be acquired while holding the maptrack lock.
+ Multiple active entries can be acquired while holding the grant table
+ lock.
 
 ********************************************************************************
 
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index dfb45f8..bb90627 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -157,10 +157,13 @@ struct active_grant_entry {
                                in the page.                           */
     unsigned      length:16; /* For sub-page grants, the length of the
                                 grant.                                */
+    spinlock_t    lock;      /* lock to protect access of this entry.
+                                see docs/misc/grant-tables.txt for
+                                locking protocol                      */
 };
 
 #define ACGNT_PER_PAGE (PAGE_SIZE / sizeof(struct active_grant_entry))
-#define active_entry(t, e) \
+#define _active_entry(t, e) \
     ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
 
 static inline void gnttab_flush_tlb(const struct domain *d)
@@ -188,6 +191,24 @@ nr_active_grant_frames(struct grant_table *gt)
     return num_act_frames_from_sha_frames(nr_grant_frames(gt));
 }
 
+static inline struct active_grant_entry *
+active_entry_acquire(struct grant_table *t, grant_ref_t e)
+{
+    struct active_grant_entry *act;
+
+    ASSERT(spin_is_locked(&t->lock));
+
+    act = &_active_entry(t, e);
+    spin_lock(&act->lock);
+
+    return act;
+}
+
+static inline void active_entry_release(struct active_grant_entry *act)
+{
+    spin_unlock(&act->lock);
+}
+
 /* Check if the page has been paged out, or needs unsharing. 
    If rc == GNTST_okay, *page contains the page struct with a ref taken.
    Caller must do put_page(*page).
@@ -505,7 +526,7 @@ static int grant_map_exists(const struct domain *ld,
                             unsigned long mfn,
                             unsigned int *ref_count)
 {
-    const struct active_grant_entry *act;
+    struct active_grant_entry *act;
     unsigned int ref, max_iter;
     
     ASSERT(spin_is_locked(&rgt->lock));
@@ -514,18 +535,18 @@ static int grant_map_exists(const struct domain *ld,
                    nr_grant_entries(rgt));
     for ( ref = *ref_count; ref < max_iter; ref++ )
     {
-        act = &active_entry(rgt, ref);
+        bool_t exists;
 
-        if ( !act->pin )
-            continue;
+        act = active_entry_acquire(rgt, ref);
 
-        if ( act->domid != ld->domain_id )
-            continue;
+        exists = act->pin
+            && act->domid == ld->domain_id
+            && act->frame == mfn;
 
-        if ( act->frame != mfn )
-            continue;
+        active_entry_release(act);
 
-        return 0;
+        if ( exists )
+            return 0;
     }
 
     if ( ref < nr_grant_entries(rgt) )
@@ -546,13 +567,19 @@ static void mapcount(
 
     *wrc = *rdc = 0;
 
+    /*
+     * Must have the remote domain's grant table lock while counting
+     * its active entries.
+     */
+    ASSERT(spin_is_locked(&rd->grant_table->lock));
+
     for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
     {
         map = &maptrack_entry(lgt, handle);
         if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) ||
              map->domid != rd->domain_id )
             continue;
-        if ( active_entry(rd->grant_table, map->ref).frame == mfn )
+        if ( _active_entry(rd->grant_table, map->ref).frame == mfn )
             (map->flags & GNTMAP_readonly) ? (*rdc)++ : (*wrc)++;
     }
 }
@@ -639,7 +666,7 @@ __gnttab_map_grant_ref(
     if ( unlikely(op->ref >= nr_grant_entries(rgt)))
         PIN_FAIL(unlock_out, GNTST_bad_gntref, "Bad ref (%d).\n", op->ref);
 
-    act = &active_entry(rgt, op->ref);
+    act = active_entry_acquire(rgt, op->ref);
     shah = shared_entry_header(rgt, op->ref);
     if (rgt->gt_version == 1) {
         sha1 = &shared_entry_v1(rgt, op->ref);
@@ -656,7 +683,7 @@ __gnttab_map_grant_ref(
          ((act->domid != ld->domain_id) ||
           (act->pin & 0x80808080U) != 0 ||
           (act->is_sub_page)) )
-        PIN_FAIL(unlock_out, GNTST_general_error,
+        PIN_FAIL(act_release_out, GNTST_general_error,
                  "Bad domain (%d != %d), or risk of counter overflow %08x, or subpage %d\n",
                  act->domid, ld->domain_id, act->pin, act->is_sub_page);
 
@@ -667,7 +694,7 @@ __gnttab_map_grant_ref(
         if ( (rc = _set_status(rgt->gt_version, ld->domain_id,
                                op->flags & GNTMAP_readonly,
                                1, shah, act, status) ) != GNTST_okay )
-             goto unlock_out;
+            goto act_release_out;
 
         if ( !act->pin )
         {
@@ -702,6 +729,7 @@ __gnttab_map_grant_ref(
 
     cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
 
+    active_entry_release(act);
     spin_unlock(&rgt->lock);
 
     /* pg may be set, with a refcount included, from __get_paged_frame */
@@ -839,7 +867,7 @@ __gnttab_map_grant_ref(
 
     spin_lock(&rgt->lock);
 
-    act = &active_entry(rgt, op->ref);
+    act = active_entry_acquire(rgt, op->ref);
 
     if ( op->flags & GNTMAP_device_map )
         act->pin -= (op->flags & GNTMAP_readonly) ?
@@ -856,6 +884,9 @@ __gnttab_map_grant_ref(
     if ( !act->pin )
         gnttab_clear_flag(_GTF_reading, status);
 
+ act_release_out:
+    active_entry_release(act);
+
  unlock_out:
     spin_unlock(&rgt->lock);
     op->status = rc;
@@ -950,7 +981,7 @@ __gnttab_unmap_common(
     }
 
     op->rd = rd;
-    act = &active_entry(rgt, op->map->ref);
+    act = active_entry_acquire(rgt, op->map->ref);
 
     if ( op->frame == 0 )
     {
@@ -959,7 +990,7 @@ __gnttab_unmap_common(
     else
     {
         if ( unlikely(op->frame != act->frame) )
-            PIN_FAIL(unmap_out, GNTST_general_error,
+            PIN_FAIL(act_release_out, GNTST_general_error,
                      "Bad frame number doesn't match gntref. (%lx != %lx)\n",
                      op->frame, act->frame);
         if ( op->flags & GNTMAP_device_map )
@@ -978,7 +1009,7 @@ __gnttab_unmap_common(
         if ( (rc = replace_grant_host_mapping(op->host_addr,
                                               op->frame, op->new_addr, 
                                               op->flags)) < 0 )
-            goto unmap_out;
+            goto act_release_out;
 
         ASSERT(act->pin & (GNTPIN_hstw_mask | GNTPIN_hstr_mask));
         op->map->flags &= ~GNTMAP_host_map;
@@ -1000,7 +1031,7 @@ __gnttab_unmap_common(
         if ( err )
         {
             rc = GNTST_general_error;
-            goto unmap_out;
+            goto act_release_out;
         }
     }
 
@@ -1008,8 +1039,11 @@ __gnttab_unmap_common(
     if ( !(op->flags & GNTMAP_readonly) )
          gnttab_mark_dirty(rd, op->frame);
 
+ act_release_out:
+    active_entry_release(act);
  unmap_out:
     double_gt_unlock(lgt, rgt);
+
     op->status = rc;
     rcu_unlock_domain(rd);
 }
@@ -1042,9 +1076,9 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
     spin_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
-        goto unmap_out;
+        goto unlock_out;
 
-    act = &active_entry(rgt, op->map->ref);
+    act = active_entry_acquire(rgt, op->map->ref);
     sha = shared_entry_header(rgt, op->map->ref);
 
     if ( rgt->gt_version == 1 )
@@ -1058,7 +1092,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
          * Suggests that __gntab_unmap_common failed early and so
          * nothing further to do
          */
-        goto unmap_out;
+        goto act_release_out;
     }
 
     pg = mfn_to_page(op->frame);
@@ -1082,7 +1116,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
              * Suggests that __gntab_unmap_common failed in
              * replace_grant_host_mapping() so nothing further to do
              */
-            goto unmap_out;
+            goto act_release_out;
         }
 
         if ( !is_iomem_page(op->frame) ) 
@@ -1103,8 +1137,11 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
     if ( act->pin == 0 )
         gnttab_clear_flag(_GTF_reading, status);
 
- unmap_out:
+ act_release_out:
+    active_entry_release(act);
+ unlock_out:
     spin_unlock(&rgt->lock);
+
     if ( put_handle )
     {
         op->map->flags = 0;
@@ -1296,7 +1333,7 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
     /* d's grant table lock must be held by the caller */
 
     struct grant_table *gt = d->grant_table;
-    unsigned int i;
+    unsigned int i, j;
 
     ASSERT(req_nr_frames <= max_grant_frames);
 
@@ -1311,6 +1348,8 @@ gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
         if ( (gt->active[i] = alloc_xenheap_page()) == NULL )
             goto active_alloc_failed;
         clear_page(gt->active[i]);
+        for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+            spin_lock_init(&gt->active[i][j].lock);
     }
 
     /* Shared */
@@ -1806,7 +1845,7 @@ __release_grant_for_copy(
 
     spin_lock(&rgt->lock);
 
-    act = &active_entry(rgt, gref);
+    act = active_entry_acquire(rgt, gref);
     sha = shared_entry_header(rgt, gref);
     r_frame = act->frame;
 
@@ -1845,6 +1884,7 @@ __release_grant_for_copy(
         released_read = 1;
     }
 
+    active_entry_release(act);
     spin_unlock(&rgt->lock);
 
     if ( td != rd )
@@ -1906,14 +1946,14 @@ __acquire_grant_for_copy(
     spin_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
-        PIN_FAIL(unlock_out, GNTST_general_error,
+        PIN_FAIL(gnt_unlock_out, GNTST_general_error,
                  "remote grant table not ready\n");
 
     if ( unlikely(gref >= nr_grant_entries(rgt)) )
-        PIN_FAIL(unlock_out, GNTST_bad_gntref,
+        PIN_FAIL(gnt_unlock_out, GNTST_bad_gntref,
                  "Bad grant reference %ld\n", gref);
 
-    act = &active_entry(rgt, gref);
+    act = active_entry_acquire(rgt, gref);
     shah = shared_entry_header(rgt, gref);
     if ( rgt->gt_version == 1 )
     {
@@ -1972,6 +2012,13 @@ __acquire_grant_for_copy(
                 PIN_FAIL(unlock_out_clear, GNTST_general_error,
                          "transitive grant referenced bad domain %d\n",
                          trans_domid);
+
+            /*
+             * __acquire_grant_for_copy() could take the lock on the
+             * remote table (if rd == td), so we have to drop the lock
+             * here and reacquire
+             */
+            active_entry_release(act);
             spin_unlock(&rgt->lock);
 
             rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
@@ -1979,8 +2026,11 @@ __acquire_grant_for_copy(
                                           &trans_page_off, &trans_length, 0);
 
             spin_lock(&rgt->lock);
+            act = active_entry_acquire(rgt, gref);
+
             if ( rc != GNTST_okay ) {
                 __fixup_status_for_copy_pin(act, status);
+                active_entry_release(act);
                 rcu_unlock_domain(td);
                 spin_unlock(&rgt->lock);
                 return rc;
@@ -1994,6 +2044,7 @@ __acquire_grant_for_copy(
             {
                 __fixup_status_for_copy_pin(act, status);
                 rcu_unlock_domain(td);
+                active_entry_release(act);
                 spin_unlock(&rgt->lock);
                 put_page(*page);
                 return __acquire_grant_for_copy(rd, gref, ldom, readonly,
@@ -2062,6 +2113,7 @@ __acquire_grant_for_copy(
     *length = act->length;
     *frame = act->frame;
 
+    active_entry_release(act);
     spin_unlock(&rgt->lock);
     return rc;
  
@@ -2074,7 +2126,11 @@ __acquire_grant_for_copy(
         gnttab_clear_flag(_GTF_reading, status);
 
  unlock_out:
+    active_entry_release(act);
+
+ gnt_unlock_out:
     spin_unlock(&rgt->lock);
+
     return rc;
 }
 
@@ -2374,7 +2430,6 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     gnttab_set_version_t op;
     struct domain *d = current->domain;
     struct grant_table *gt = d->grant_table;
-    struct active_grant_entry *act;
     grant_entry_v1_t reserved_entries[GNTTAB_NR_RESERVED_ENTRIES];
     long res;
     int i;
@@ -2399,8 +2454,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     {
         for ( i = GNTTAB_NR_RESERVED_ENTRIES; i < nr_grant_entries(gt); i++ )
         {
-            act = &active_entry(gt, i);
-            if ( act->pin != 0 )
+            if ( read_atomic(&_active_entry(gt, i).pin) != 0 )
             {
                 gdprintk(XENLOG_WARNING,
                          "tried to change grant table version from %d to %d, but some grant entries still in use\n",
@@ -2587,7 +2641,8 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
 {
     struct domain *d = rcu_lock_current_domain();
     struct grant_table *gt = d->grant_table;
-    struct active_grant_entry *act;
+    struct active_grant_entry *act_a = NULL;
+    struct active_grant_entry *act_b = NULL;
     s16 rc = GNTST_okay;
 
     spin_lock(&gt->lock);
@@ -2598,12 +2653,12 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     if ( unlikely(ref_b >= nr_grant_entries(d->grant_table)))
         PIN_FAIL(out, GNTST_bad_gntref, "Bad ref-b (%d).\n", ref_b);
 
-    act = &active_entry(gt, ref_a);
-    if ( act->pin )
+    act_a = active_entry_acquire(gt, ref_a);
+    if ( act_a->pin )
         PIN_FAIL(out, GNTST_eagain, "ref a %ld busy\n", (long)ref_a);
 
-    act = &active_entry(gt, ref_b);
-    if ( act->pin )
+    act_b = active_entry_acquire(gt, ref_b);
+    if ( act_b->pin )
         PIN_FAIL(out, GNTST_eagain, "ref b %ld busy\n", (long)ref_b);
 
     if ( gt->gt_version == 1 )
@@ -2630,6 +2685,10 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     }
 
 out:
+    if ( act_b != NULL )
+        active_entry_release(act_b);
+    if ( act_a != NULL )
+        active_entry_release(act_a);
     spin_unlock(&gt->lock);
 
     rcu_unlock_domain(d);
@@ -2939,7 +2998,7 @@ grant_table_create(
     struct domain *d)
 {
     struct grant_table *t;
-    int                 i;
+    unsigned int i, j;
 
     if ( (t = xzalloc(struct grant_table)) == NULL )
         goto no_mem_0;
@@ -2958,6 +3017,8 @@ grant_table_create(
         if ( (t->active[i] = alloc_xenheap_page()) == NULL )
             goto no_mem_2;
         clear_page(t->active[i]);
+        for ( j = 0; j < ACGNT_PER_PAGE; j++ )
+            spin_lock_init(&t->active[i][j].lock);
     }
 
     /* Tracking of mapped foreign frames table */
@@ -3054,7 +3115,7 @@ gnttab_release_mappings(
         rgt = rd->grant_table;
         spin_lock(&rgt->lock);
 
-        act = &active_entry(rgt, ref);
+        act = active_entry_acquire(rgt, ref);
         sha = shared_entry_header(rgt, ref);
         if (rgt->gt_version == 1)
             status = &sha->flags;
@@ -3112,6 +3173,7 @@ gnttab_release_mappings(
         if ( act->pin == 0 )
             gnttab_clear_flag(_GTF_reading, status);
 
+        active_entry_release(act);
         spin_unlock(&rgt->lock);
 
         rcu_unlock_domain(rd);
@@ -3174,9 +3236,12 @@ static void gnttab_usage_print(struct domain *rd)
         uint16_t status;
         uint64_t frame;
 
-        act = &active_entry(gt, ref);
+        act = active_entry_acquire(gt, ref);
         if ( !act->pin )
+        {
+            active_entry_release(act);
             continue;
+        }
 
         sha = shared_entry_header(gt, ref);
 
@@ -3206,6 +3271,7 @@ static void gnttab_usage_print(struct domain *rd)
         printk("[%3d]    %5d 0x%06lx 0x%08x      %5d 0x%06"PRIx64" 0x%02x\n",
                ref, act->domid, act->frame, act->pin,
                sha->domid, frame, status);
+        active_entry_release(act);
     }
 
  out:
-- 
1.7.10.4


* [PATCHv9 2/4] gnttab: introduce maptrack lock
  2015-05-20 15:54 [PATCHv9 0/4] gnttab: Improve scalability David Vrabel
  2015-05-20 15:54 ` [PATCHv9 1/4] gnttab: per-active entry locking David Vrabel
@ 2015-05-20 15:54 ` David Vrabel
  2015-05-20 15:54 ` [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock David Vrabel
  2015-05-20 15:54 ` [PATCHv9 4/4] gnttab: use per-VCPU maptrack free lists David Vrabel
  3 siblings, 0 replies; 15+ messages in thread
From: David Vrabel @ 2015-05-20 15:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Tim Deegan, David Vrabel, Jan Beulich, Ian Campbell

Split the grant table lock into two separate locks: one to protect
maptrack state (maptrack_lock) and one for everything else (lock).
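
In outline (a sketch only, with unrelated fields omitted; see the
header change below), this leaves struct grant_table with two locks of
disjoint scope:

    struct grant_table {
        ...
        /* Lock protecting the maptrack page list, head, and limit. */
        spinlock_t maptrack_lock;
        /* Lock protecting updates to active and shared grant tables. */
        spinlock_t lock;
        ...
    };

get_maptrack_handle() and put_maptrack_handle() then take only
maptrack_lock.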

Based on a patch originally by Matt Wilson <msw@amazon.com>.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 docs/misc/grant-tables.txt    |    9 +++++++++
 xen/common/grant_table.c      |    9 +++++----
 xen/include/xen/grant_table.h |    2 ++
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 2a3f591..83b3454 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -87,6 +87,7 @@ is complete.
                                inconsistent grant table state such as current
                                version, partially initialized active table pages,
                                etc.
+  grant_table->maptrack_lock : spinlock used to protect the maptrack state
   active_grant_entry->lock   : spinlock used to serialize modifications to
                                active entries
 
@@ -94,6 +95,14 @@ is complete.
  that access members of struct grant_table must acquire the lock
  around critical sections.
 
+ The maptrack state is protected by its own spinlock. Any access (read
+ or write) of struct grant_table members that have a "maptrack_"
+ prefix must be made while holding the maptrack lock. The maptrack
+ state can be rapidly modified under some workloads, and the critical
+ sections are very small, thus we use a spinlock to protect them.
+
+ The maptrack lock may be locked while holding the grant table lock.
+
  Active entries are obtained by calling active_entry_acquire(gt, ref).
  This function returns a pointer to the active entry after locking its
  spinlock. The caller must hold the grant table lock for the gt in
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index bb90627..fee34ee 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -288,10 +288,10 @@ static inline void
 put_maptrack_handle(
     struct grant_table *t, int handle)
 {
-    spin_lock(&t->lock);
+    spin_lock(&t->maptrack_lock);
     maptrack_entry(t, handle).ref = t->maptrack_head;
     t->maptrack_head = handle;
-    spin_unlock(&t->lock);
+    spin_unlock(&t->maptrack_lock);
 }
 
 static inline int
@@ -303,7 +303,7 @@ get_maptrack_handle(
     struct grant_mapping *new_mt;
     unsigned int          new_mt_limit, nr_frames;
 
-    spin_lock(&lgt->lock);
+    spin_lock(&lgt->maptrack_lock);
 
     while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
     {
@@ -332,7 +332,7 @@ get_maptrack_handle(
                  nr_frames + 1);
     }
 
-    spin_unlock(&lgt->lock);
+    spin_unlock(&lgt->maptrack_lock);
 
     return handle;
 }
@@ -3005,6 +3005,7 @@ grant_table_create(
 
     /* Simple stuff. */
     spin_lock_init(&t->lock);
+    spin_lock_init(&t->maptrack_lock);
     t->nr_grant_frames = INITIAL_NR_GRANT_FRAMES;
 
     /* Active grant table. */
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 32f5786..0b35a5e 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -82,6 +82,8 @@ struct grant_table {
     struct grant_mapping **maptrack;
     unsigned int          maptrack_head;
     unsigned int          maptrack_limit;
+    /* Lock protecting the maptrack page list, head, and limit */
+    spinlock_t            maptrack_lock;
     /* Lock protecting updates to active and shared grant tables. */
     spinlock_t            lock;
     /* The defined versions are 1 and 2.  Set to 0 if we don't know
-- 
1.7.10.4


* [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock
  2015-05-20 15:54 [PATCHv9 0/4] gnttab: Improve scalability David Vrabel
  2015-05-20 15:54 ` [PATCHv9 1/4] gnttab: per-active entry locking David Vrabel
  2015-05-20 15:54 ` [PATCHv9 2/4] gnttab: introduce maptrack lock David Vrabel
@ 2015-05-20 15:54 ` David Vrabel
  2015-05-21 10:32   ` Jan Beulich
  2015-05-20 15:54 ` [PATCHv9 4/4] gnttab: use per-VCPU maptrack free lists David Vrabel
  3 siblings, 1 reply; 15+ messages in thread
From: David Vrabel @ 2015-05-20 15:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Tim Deegan, David Vrabel, Jan Beulich, Ian Campbell

In combination with the per-active entry locks, the grant table lock
can be made a read-write lock, since in the majority of cases only the
read lock is required. The grant table read lock protects against
changes to the table version or size (which are done with the write
lock held).

The write lock is also required when two active entries must be
acquired.

The double lock is still required when updating IOMMU page tables.

With lock contention now limited to the maptrack lock (unless IOMMU
updates are required), performance and scalability are improved.
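
As a sketch of typical usage after this patch (rgt, gt, d, ref and
req_nr_frames stand for arbitrary values; the helpers are those added
by the earlier patches):

    /* Common case: grant map/unmap paths only read table metadata. */
    read_lock(&rgt->lock);
    act = active_entry_acquire(rgt, ref); /* entry state: act->lock */
    /* ... */
    active_entry_release(act);
    read_unlock(&rgt->lock);

    /* Rare case: changing the table version or size. */
    write_lock(&gt->lock);
    rc = gnttab_grow_table(d, req_nr_frames);
    write_unlock(&gt->lock);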

Based on a patch originally by Matt Wilson <msw@amazon.com>.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 docs/misc/grant-tables.txt    |   30 +++++----
 xen/arch/arm/mm.c             |    4 +-
 xen/arch/x86/mm.c             |    4 +-
 xen/common/grant_table.c      |  148 ++++++++++++++++++++++-------------------
 xen/include/xen/grant_table.h |    9 ++-
 5 files changed, 105 insertions(+), 90 deletions(-)

diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 83b3454..9d9d01f 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -83,7 +83,7 @@ is complete.
  ~~~~~~~
  Xen uses several locks to serialize access to the internal grant table state.
 
-  grant_table->lock          : lock used to prevent readers from accessing
+  grant_table->lock          : rwlock used to prevent readers from accessing
                                inconsistent grant table state such as current
                                version, partially initialized active table pages,
                                etc.
@@ -91,9 +91,13 @@ is complete.
   active_grant_entry->lock   : spinlock used to serialize modifications to
                                active entries
 
- The primary lock for the grant table is a spinlock. All functions
- that access members of struct grant_table must acquire the lock
- around critical sections.
+ The primary lock for the grant table is a read/write spinlock. All
+ functions that access members of struct grant_table must acquire a
+ read lock around critical sections. Any modification to the members
+ of struct grant_table (e.g., nr_status_frames, nr_grant_frames,
+ active frames, etc.) must only be made if the write lock is
+ held. These elements are read-mostly, and read critical sections can
+ be large, which makes a rwlock a good choice.
 
  The maptrack state is protected by its own spinlock. Any access (read
  or write) of struct grant_table members that have a "maptrack_"
@@ -105,25 +109,25 @@ is complete.
 
  Active entries are obtained by calling active_entry_acquire(gt, ref).
  This function returns a pointer to the active entry after locking its
- spinlock. The caller must hold the grant table lock for the gt in
- question before calling active_entry_acquire(). This is because the
- grant table can be dynamically extended via gnttab_grow_table() while
- a domain is running and must be fully initialized. Once all access to
- the active entry is complete, release the lock by calling
+ spinlock. The caller must hold the grant table read lock before
+ calling active_entry_acquire(). This is because the grant table can
+ be dynamically extended via gnttab_grow_table() while a domain is
+ running and must be fully initialized. Once all access to the active
+ entry is complete, release the lock by calling
  active_entry_release(act).
 
  Summary of rules for locking:
   active_entry_acquire() and active_entry_release() can only be
-  called when holding the relevant grant table's lock. I.e.:
-    spin_lock(&gt->lock);
+  called when holding the relevant grant table's read lock. I.e.:
+    read_lock(&gt->lock);
     act = active_entry_acquire(gt, ref);
     ...
     active_entry_release(act);
-    spin_unlock(&gt->lock);
+    read_unlock(&gt->lock);
 
  Active entries cannot be acquired while holding the maptrack lock.
  Multiple active entries can be acquired while holding the grant table
- lock.
+ _write_ lock.
 
 ********************************************************************************
 
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index a91ea77..b305041 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1048,7 +1048,7 @@ int xenmem_add_to_physmap_one(
     switch ( space )
     {
     case XENMAPSPACE_grant_table:
-        spin_lock(&d->grant_table->lock);
+        write_lock(&d->grant_table->lock);
 
         if ( d->grant_table->gt_version == 0 )
             d->grant_table->gt_version = 1;
@@ -1078,7 +1078,7 @@ int xenmem_add_to_physmap_one(
 
         t = p2m_ram_rw;
 
-        spin_unlock(&d->grant_table->lock);
+        write_unlock(&d->grant_table->lock);
         break;
     case XENMAPSPACE_shared_info:
         if ( idx != 0 )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 5fe08df..8ca38b0 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4593,7 +4593,7 @@ int xenmem_add_to_physmap_one(
                 mfn = virt_to_mfn(d->shared_info);
             break;
         case XENMAPSPACE_grant_table:
-            spin_lock(&d->grant_table->lock);
+            write_lock(&d->grant_table->lock);
 
             if ( d->grant_table->gt_version == 0 )
                 d->grant_table->gt_version = 1;
@@ -4615,7 +4615,7 @@ int xenmem_add_to_physmap_one(
                     mfn = virt_to_mfn(d->grant_table->shared_raw[idx]);
             }
 
-            spin_unlock(&d->grant_table->lock);
+            write_unlock(&d->grant_table->lock);
             break;
         case XENMAPSPACE_gmfn_range:
         case XENMAPSPACE_gmfn:
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index fee34ee..7d9f64d 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -196,7 +196,7 @@ active_entry_acquire(struct grant_table *t, grant_ref_t e)
 {
     struct active_grant_entry *act;
 
-    ASSERT(spin_is_locked(&t->lock));
+    ASSERT(rw_is_locked(&t->lock));
 
     act = &_active_entry(t, e);
     spin_lock(&act->lock);
@@ -254,23 +254,23 @@ double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
 {
     if ( lgt < rgt )
     {
-        spin_lock(&lgt->lock);
-        spin_lock(&rgt->lock);
+        write_lock(&lgt->lock);
+        write_lock(&rgt->lock);
     }
     else
     {
         if ( lgt != rgt )
-            spin_lock(&rgt->lock);
-        spin_lock(&lgt->lock);
+            write_lock(&rgt->lock);
+        write_lock(&lgt->lock);
     }
 }
 
 static inline void
 double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
 {
-    spin_unlock(&lgt->lock);
+    write_unlock(&lgt->lock);
     if ( lgt != rgt )
-        spin_unlock(&rgt->lock);
+        write_unlock(&rgt->lock);
 }
 
 static inline int
@@ -529,7 +529,7 @@ static int grant_map_exists(const struct domain *ld,
     struct active_grant_entry *act;
     unsigned int ref, max_iter;
     
-    ASSERT(spin_is_locked(&rgt->lock));
+    ASSERT(rw_is_locked(&rgt->lock));
 
     max_iter = min(*ref_count + (1 << GNTTABOP_CONTINUATION_ARG_SHIFT),
                    nr_grant_entries(rgt));
@@ -568,10 +568,10 @@ static void mapcount(
     *wrc = *rdc = 0;
 
     /*
-     * Must have the remote domain's grant table lock while counting
-     * its active entries.
+     * Must have the remote domain's grant table write lock while
+     * counting its active entries.
      */
-    ASSERT(spin_is_locked(&rd->grant_table->lock));
+    ASSERT(rw_is_write_locked(&rd->grant_table->lock));
 
     for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
     {
@@ -656,7 +656,7 @@ __gnttab_map_grant_ref(
     }
 
     rgt = rd->grant_table;
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
         PIN_FAIL(unlock_out, GNTST_general_error,
@@ -730,7 +730,7 @@ __gnttab_map_grant_ref(
     cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
 
     active_entry_release(act);
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
 
     /* pg may be set, with a refcount included, from __get_paged_frame */
     if ( !pg )
@@ -806,12 +806,13 @@ __gnttab_map_grant_ref(
         goto undo_out;
     }
 
-    double_gt_lock(lgt, rgt);
-
     if ( gnttab_need_iommu_mapping(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
+
+        double_gt_lock(lgt, rgt);
+
         /* We're not translated, so we know that gmfns and mfns are
            the same things, so the IOMMU entry is always 1-to-1. */
         mapcount(lgt, rd, frame, &wrc, &rdc);
@@ -827,9 +828,11 @@ __gnttab_map_grant_ref(
             if ( (wrc + rdc) == 0 )
                 err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
         }
+
+        double_gt_unlock(lgt, rgt);
+
         if ( err )
         {
-            double_gt_unlock(lgt, rgt);
             rc = GNTST_general_error;
             goto undo_out;
         }
@@ -842,8 +845,6 @@ __gnttab_map_grant_ref(
     mt->ref   = op->ref;
     mt->flags = op->flags;
 
-    double_gt_unlock(lgt, rgt);
-
     op->dev_bus_addr = (u64)frame << PAGE_SHIFT;
     op->handle       = handle;
     op->status       = GNTST_okay;
@@ -865,7 +866,7 @@ __gnttab_map_grant_ref(
         put_page(pg);
     }
 
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     act = active_entry_acquire(rgt, op->ref);
 
@@ -888,7 +889,7 @@ __gnttab_map_grant_ref(
     active_entry_release(act);
 
  unlock_out:
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
     op->status = rc;
     put_maptrack_handle(lgt, handle);
     rcu_unlock_domain(rd);
@@ -938,18 +939,19 @@ __gnttab_unmap_common(
     }
 
     op->map = &maptrack_entry(lgt, op->handle);
-    spin_lock(&lgt->lock);
+
+    read_lock(&lgt->lock);
 
     if ( unlikely(!op->map->flags) )
     {
-        spin_unlock(&lgt->lock);
+        read_unlock(&lgt->lock);
         gdprintk(XENLOG_INFO, "Zero flags for handle (%d).\n", op->handle);
         op->status = GNTST_bad_handle;
         return;
     }
 
     dom = op->map->domid;
-    spin_unlock(&lgt->lock);
+    read_unlock(&lgt->lock);
 
     if ( unlikely((rd = rcu_lock_domain_by_id(dom)) == NULL) )
     {
@@ -970,7 +972,8 @@ __gnttab_unmap_common(
     TRACE_1D(TRC_MEM_PAGE_GRANT_UNMAP, dom);
 
     rgt = rd->grant_table;
-    double_gt_lock(lgt, rgt);
+
+    read_lock(&rgt->lock);
 
     op->flags = op->map->flags;
     if ( unlikely(!op->flags) || unlikely(op->map->domid != dom) )
@@ -1019,31 +1022,34 @@ __gnttab_unmap_common(
             act->pin -= GNTPIN_hstw_inc;
     }
 
-    if ( gnttab_need_iommu_mapping(ld) )
+ act_release_out:
+    active_entry_release(act);
+ unmap_out:
+    read_unlock(&rgt->lock);
+
+    if ( rc == GNTST_okay && gnttab_need_iommu_mapping(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
+
+        double_gt_lock(lgt, rgt);
+
         mapcount(lgt, rd, op->frame, &wrc, &rdc);
         if ( (wrc + rdc) == 0 )
             err = iommu_unmap_page(ld, op->frame);
         else if ( wrc == 0 )
             err = iommu_map_page(ld, op->frame, op->frame, IOMMUF_readable);
+
+        double_gt_unlock(lgt, rgt);
+
         if ( err )
-        {
             rc = GNTST_general_error;
-            goto act_release_out;
-        }
     }
 
     /* If just unmapped a writable mapping, mark as dirtied */
-    if ( !(op->flags & GNTMAP_readonly) )
+    if ( rc == GNTST_okay && !(op->flags & GNTMAP_readonly) )
          gnttab_mark_dirty(rd, op->frame);
 
- act_release_out:
-    active_entry_release(act);
- unmap_out:
-    double_gt_unlock(lgt, rgt);
-
     op->status = rc;
     rcu_unlock_domain(rd);
 }
@@ -1073,8 +1079,8 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
 
     rcu_lock_domain(rd);
     rgt = rd->grant_table;
-    spin_lock(&rgt->lock);
 
+    read_lock(&rgt->lock);
     if ( rgt->gt_version == 0 )
         goto unlock_out;
 
@@ -1140,7 +1146,7 @@ __gnttab_unmap_common_complete(struct gnttab_unmap_common *op)
  act_release_out:
     active_entry_release(act);
  unlock_out:
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
 
     if ( put_handle )
     {
@@ -1327,11 +1333,13 @@ gnttab_unpopulate_status_frames(struct domain *d, struct grant_table *gt)
     gt->nr_status_frames = 0;
 }
 
+/*
+ * Grow the grant table. The caller must hold the grant table's
+ * write lock before calling this function.
+ */
 int
 gnttab_grow_table(struct domain *d, unsigned int req_nr_frames)
 {
-    /* d's grant table lock must be held by the caller */
-
     struct grant_table *gt = d->grant_table;
     unsigned int i, j;
 
@@ -1437,7 +1445,7 @@ gnttab_setup_table(
     }
 
     gt = d->grant_table;
-    spin_lock(&gt->lock);
+    write_lock(&gt->lock);
 
     if ( gt->gt_version == 0 )
         gt->gt_version = 1;
@@ -1465,7 +1473,7 @@ gnttab_setup_table(
     }
 
  out3:
-    spin_unlock(&gt->lock);
+    write_unlock(&gt->lock);
  out2:
     rcu_unlock_domain(d);
  out1:
@@ -1507,13 +1515,13 @@ gnttab_query_size(
         goto query_out_unlock;
     }
 
-    spin_lock(&d->grant_table->lock);
+    read_lock(&d->grant_table->lock);
 
     op.nr_frames     = nr_grant_frames(d->grant_table);
     op.max_nr_frames = max_grant_frames;
     op.status        = GNTST_okay;
 
-    spin_unlock(&d->grant_table->lock);
+    read_unlock(&d->grant_table->lock);
 
  
  query_out_unlock:
@@ -1539,7 +1547,7 @@ gnttab_prepare_for_transfer(
     union grant_combo   scombo, prev_scombo, new_scombo;
     int                 retries = 0;
 
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
     {
@@ -1590,11 +1598,11 @@ gnttab_prepare_for_transfer(
         scombo = prev_scombo;
     }
 
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
     return 1;
 
  fail:
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
     return 0;
 }
 
@@ -1787,7 +1795,7 @@ gnttab_transfer(
         TRACE_1D(TRC_MEM_PAGE_GRANT_TRANSFER, e->domain_id);
 
         /* Tell the guest about its new page frame. */
-        spin_lock(&e->grant_table->lock);
+        read_lock(&e->grant_table->lock);
 
         if ( e->grant_table->gt_version == 1 )
         {
@@ -1805,7 +1813,7 @@ gnttab_transfer(
         shared_entry_header(e->grant_table, gop.ref)->flags |=
             GTF_transfer_completed;
 
-        spin_unlock(&e->grant_table->lock);
+        read_unlock(&e->grant_table->lock);
 
         rcu_unlock_domain(e);
 
@@ -1843,7 +1851,7 @@ __release_grant_for_copy(
     released_read = 0;
     released_write = 0;
 
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     act = active_entry_acquire(rgt, gref);
     sha = shared_entry_header(rgt, gref);
@@ -1885,7 +1893,7 @@ __release_grant_for_copy(
     }
 
     active_entry_release(act);
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
 
     if ( td != rd )
     {
@@ -1943,7 +1951,7 @@ __acquire_grant_for_copy(
 
     *page = NULL;
 
-    spin_lock(&rgt->lock);
+    read_lock(&rgt->lock);
 
     if ( rgt->gt_version == 0 )
         PIN_FAIL(gnt_unlock_out, GNTST_general_error,
@@ -2019,20 +2027,20 @@ __acquire_grant_for_copy(
              * here and reacquire
              */
             active_entry_release(act);
-            spin_unlock(&rgt->lock);
+            read_unlock(&rgt->lock);
 
             rc = __acquire_grant_for_copy(td, trans_gref, rd->domain_id,
                                           readonly, &grant_frame, page,
                                           &trans_page_off, &trans_length, 0);
 
-            spin_lock(&rgt->lock);
+            read_lock(&rgt->lock);
             act = active_entry_acquire(rgt, gref);
 
             if ( rc != GNTST_okay ) {
                 __fixup_status_for_copy_pin(act, status);
                 active_entry_release(act);
+                read_unlock(&rgt->lock);
                 rcu_unlock_domain(td);
-                spin_unlock(&rgt->lock);
                 return rc;
             }
 
@@ -2045,7 +2053,7 @@ __acquire_grant_for_copy(
                 __fixup_status_for_copy_pin(act, status);
                 rcu_unlock_domain(td);
                 active_entry_release(act);
-                spin_unlock(&rgt->lock);
+                read_unlock(&rgt->lock);
                 put_page(*page);
                 return __acquire_grant_for_copy(rd, gref, ldom, readonly,
                                                 frame, page, page_off, length,
@@ -2114,7 +2122,7 @@ __acquire_grant_for_copy(
     *frame = act->frame;
 
     active_entry_release(act);
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
     return rc;
  
  unlock_out_clear:
@@ -2129,7 +2137,7 @@ __acquire_grant_for_copy(
     active_entry_release(act);
 
  gnt_unlock_out:
-    spin_unlock(&rgt->lock);
+    read_unlock(&rgt->lock);
 
     return rc;
 }
@@ -2445,7 +2453,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     if ( gt->gt_version == op.version )
         goto out;
 
-    spin_lock(&gt->lock);
+    write_lock(&gt->lock);
     /* Make sure that the grant table isn't currently in use when we
        change the version number, except for the first 8 entries which
        are allowed to be in use (xenstore/xenconsole keeps them mapped).
@@ -2530,7 +2538,7 @@ gnttab_set_version(XEN_GUEST_HANDLE_PARAM(gnttab_set_version_t) uop)
     gt->gt_version = op.version;
 
 out_unlock:
-    spin_unlock(&gt->lock);
+    write_unlock(&gt->lock);
 
 out:
     op.version = gt->gt_version;
@@ -2586,7 +2594,7 @@ gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
 
     op.status = GNTST_okay;
 
-    spin_lock(&gt->lock);
+    read_lock(&gt->lock);
 
     for ( i = 0; i < op.nr_frames; i++ )
     {
@@ -2595,7 +2603,7 @@ gnttab_get_status_frames(XEN_GUEST_HANDLE_PARAM(gnttab_get_status_frames_t) uop,
             op.status = GNTST_bad_virt_addr;
     }
 
-    spin_unlock(&gt->lock);
+    read_unlock(&gt->lock);
 out2:
     rcu_unlock_domain(d);
 out1:
@@ -2645,7 +2653,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
     struct active_grant_entry *act_b = NULL;
     s16 rc = GNTST_okay;
 
-    spin_lock(&gt->lock);
+    write_lock(&gt->lock);
 
     /* Bounds check on the grant refs */
     if ( unlikely(ref_a >= nr_grant_entries(d->grant_table)))
@@ -2689,7 +2697,7 @@ out:
         active_entry_release(act_b);
     if ( act_a != NULL )
         active_entry_release(act_a);
-    spin_unlock(&gt->lock);
+    write_unlock(&gt->lock);
 
     rcu_unlock_domain(d);
 
@@ -2760,12 +2768,12 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
 
     if ( d != owner )
     {
-        spin_lock(&owner->grant_table->lock);
+        read_lock(&owner->grant_table->lock);
 
         ret = grant_map_exists(d, owner->grant_table, mfn, ref_count);
         if ( ret != 0 )
         {
-            spin_unlock(&owner->grant_table->lock);
+            read_unlock(&owner->grant_table->lock);
             rcu_unlock_domain(d);
             put_page(page);
             return ret;
@@ -2785,7 +2793,7 @@ static int __gnttab_cache_flush(gnttab_cache_flush_t *cflush,
         ret = 0;
 
     if ( d != owner )
-        spin_unlock(&owner->grant_table->lock);
+        read_unlock(&owner->grant_table->lock);
     unmap_domain_page(v);
     put_page(page);
 
@@ -3004,7 +3012,7 @@ grant_table_create(
         goto no_mem_0;
 
     /* Simple stuff. */
-    spin_lock_init(&t->lock);
+    rwlock_init(&t->lock);
     spin_lock_init(&t->maptrack_lock);
     t->nr_grant_frames = INITIAL_NR_GRANT_FRAMES;
 
@@ -3114,7 +3122,7 @@ gnttab_release_mappings(
         }
 
         rgt = rd->grant_table;
-        spin_lock(&rgt->lock);
+        read_lock(&rgt->lock);
 
         act = active_entry_acquire(rgt, ref);
         sha = shared_entry_header(rgt, ref);
@@ -3175,7 +3183,7 @@ gnttab_release_mappings(
             gnttab_clear_flag(_GTF_reading, status);
 
         active_entry_release(act);
-        spin_unlock(&rgt->lock);
+        read_unlock(&rgt->lock);
 
         rcu_unlock_domain(rd);
 
@@ -3223,7 +3231,7 @@ static void gnttab_usage_print(struct domain *rd)
     printk("      -------- active --------       -------- shared --------\n");
     printk("[ref] localdom mfn      pin          localdom gmfn     flags\n");
 
-    spin_lock(&gt->lock);
+    read_lock(&gt->lock);
 
     if ( gt->gt_version == 0 )
         goto out;
@@ -3276,7 +3284,7 @@ static void gnttab_usage_print(struct domain *rd)
     }
 
  out:
-    spin_unlock(&gt->lock);
+    read_unlock(&gt->lock);
 
     if ( first )
         printk("grant-table for remote domain:%5d ... "
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index 0b35a5e..f22ebd0 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -64,6 +64,11 @@ struct grant_mapping {
 
 /* Per-domain grant information. */
 struct grant_table {
+    /*
+     * Lock protecting updates to grant table state (version, active
+     * entry list, etc.)
+     */
+    rwlock_t              lock;
     /* Table size. Number of frames shared with guest */
     unsigned int          nr_grant_frames;
     /* Shared grant table (see include/public/grant_table.h). */
@@ -84,8 +89,6 @@ struct grant_table {
     unsigned int          maptrack_limit;
     /* Lock protecting the maptrack page list, head, and limit */
     spinlock_t            maptrack_lock;
-    /* Lock protecting updates to active and shared grant tables. */
-    spinlock_t            lock;
     /* The defined versions are 1 and 2.  Set to 0 if we don't know
        what version to use yet. */
     unsigned              gt_version;
@@ -103,7 +106,7 @@ gnttab_release_mappings(
     struct domain *d);
 
 /* Increase the size of a domain's grant table.
- * Caller must hold d's grant table lock.
+ * Caller must hold d's grant table write lock.
  */
 int
 gnttab_grow_table(struct domain *d, unsigned int req_nr_frames);
-- 
1.7.10.4


* [PATCHv9 4/4] gnttab: use per-VCPU maptrack free lists
  2015-05-20 15:54 [PATCHv9 0/4] gnttab: Improve scalability David Vrabel
                   ` (2 preceding siblings ...)
  2015-05-20 15:54 ` [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock David Vrabel
@ 2015-05-20 15:54 ` David Vrabel
  2015-05-21 10:42   ` Jan Beulich
  3 siblings, 1 reply; 15+ messages in thread
From: David Vrabel @ 2015-05-20 15:54 UTC (permalink / raw)
  To: xen-devel
  Cc: Keir Fraser, Ian Campbell, Tim Deegan, David Vrabel, Jan Beulich,
	Malcolm Crossley

From: Malcolm Crossley <malcolm.crossley@citrix.com>

Performance analysis of aggregate network throughput with many VMs
shows that performance is significantly limited by contention on the
maptrack lock when obtaining/releasing maptrack handles from the free
list.

Instead of a single free list use a per-VCPU list. This avoids any
contention when obtaining a handle.  Handles must be released back to
their original list and since this may occur on a different VCPU there
is some contention on the destination VCPU's free list tail pointer
(but this is much better than a per-domain lock).
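
The only cross-VCPU step is the tail append; in sketch form (this is
the core of put_maptrack_handle() as reworked below, with t the local
grant table and v the handle's owning VCPU):

    /* Mark the released entry as the new list tail. */
    maptrack_entry(t, handle).ref = MAPTRACK_TAIL;

    /* Atomically swing the owning VCPU's tail pointer to it. */
    cur_tail = read_atomic(&v->maptrack_tail);
    do {
        prev_tail = cur_tail;
        cur_tail = cmpxchg(&v->maptrack_tail, prev_tail, handle);
    } while ( cur_tail != prev_tail );

    /* Link the old tail entry to the new one. */
    write_atomic(&maptrack_entry(t, prev_tail).ref, handle);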

A domain's maptrack limit is multiplied by the number of VCPUs. This
ensures that a worst-case domain that only performs grant table
operations via one VCPU will not see a lower maptrack limit.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/common/grant_table.c      |  139 ++++++++++++++++++++++++++---------------
 xen/include/xen/grant_table.h |    8 ++-
 xen/include/xen/sched.h       |    5 ++
 3 files changed, 97 insertions(+), 55 deletions(-)

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 7d9f64d..cdfd0c4 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -118,9 +118,9 @@ struct gnttab_unmap_common {
     ((t)->maptrack[(e)/MAPTRACK_PER_PAGE][(e)%MAPTRACK_PER_PAGE])
 
 static inline unsigned int
-nr_maptrack_frames(struct grant_table *t)
+nr_vcpu_maptrack_frames(struct vcpu *v)
 {
-    return t->maptrack_limit / MAPTRACK_PER_PAGE;
+    return v->maptrack_limit / MAPTRACK_PER_PAGE;
 }
 
 #define MAPTRACK_TAIL (~0u)
@@ -275,66 +275,101 @@ double_gt_unlock(struct grant_table *lgt, struct grant_table *rgt)
 
 static inline int
 __get_maptrack_handle(
-    struct grant_table *t)
+    struct grant_table *t,
+    struct vcpu *v)
 {
-    unsigned int h;
-    if ( unlikely((h = t->maptrack_head) == MAPTRACK_TAIL) )
+    unsigned int head, next;
+
+    /* No maptrack pages allocated for this VCPU yet? */
+    head = v->maptrack_head;
+    if ( unlikely(head == MAPTRACK_TAIL) )
+        return -1;
+
+    /*
+     * Always keep one entry in the free list to make it easier to add
+     * free entries to the tail.
+     */
+    next = read_atomic(&maptrack_entry(t, head).ref);
+    if ( unlikely(next == MAPTRACK_TAIL) )
         return -1;
-    t->maptrack_head = maptrack_entry(t, h).ref;
-    return h;
+
+    v->maptrack_head = next;
+
+    return head;
 }
 
 static inline void
 put_maptrack_handle(
     struct grant_table *t, int handle)
 {
-    spin_lock(&t->maptrack_lock);
-    maptrack_entry(t, handle).ref = t->maptrack_head;
-    t->maptrack_head = handle;
-    spin_unlock(&t->maptrack_lock);
+    struct domain *d = current->domain;
+    struct vcpu *v;
+    unsigned int prev_tail, cur_tail;
+
+    /* 1. Set entry to be a tail. */
+    maptrack_entry(t, handle).ref = MAPTRACK_TAIL;
+
+    /* 2. Add entry to the tail of the list on the original VCPU. */
+    v = d->vcpu[maptrack_entry(t,handle).vcpu];
+
+    cur_tail = read_atomic(&v->maptrack_tail);
+    do {
+        prev_tail = cur_tail;
+        cur_tail = cmpxchg(&v->maptrack_tail, prev_tail, handle);
+    } while ( cur_tail != prev_tail );
+
+    /* 3. Update the old tail entry to point to the new entry. */
+    write_atomic(&maptrack_entry(t, prev_tail).ref, handle);
 }
 
 static inline int
 get_maptrack_handle(
     struct grant_table *lgt)
 {
+    struct vcpu          *v = current;
     int                   i;
     grant_handle_t        handle;
     struct grant_mapping *new_mt;
-    unsigned int          new_mt_limit, nr_frames;
+    unsigned int          nr_frames;
 
-    spin_lock(&lgt->maptrack_lock);
+    handle = __get_maptrack_handle(lgt, v);
+    if ( likely(handle != -1) )
+        return handle;
 
-    while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
-    {
-        nr_frames = nr_maptrack_frames(lgt);
-        if ( nr_frames >= max_maptrack_frames )
-            break;
+    nr_frames = nr_vcpu_maptrack_frames(v);
+    if ( nr_frames >= max_maptrack_frames )
+        return -1;
 
-        new_mt = alloc_xenheap_page();
-        if ( !new_mt )
-            break;
+    new_mt = alloc_xenheap_page();
+    if ( !new_mt )
+        return -1;
+    clear_page(new_mt);
 
-        clear_page(new_mt);
+    spin_lock(&lgt->maptrack_lock);
 
-        new_mt_limit = lgt->maptrack_limit + MAPTRACK_PER_PAGE;
+    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
+    {
+        new_mt[i - 1].ref = (lgt->maptrack_pages * MAPTRACK_PER_PAGE) + i;
+        new_mt[i - 1].vcpu = v->vcpu_id;
+    }
+    /* Set last entry vcpu and ref */
+    new_mt[i - 1].ref = v->maptrack_head;
+    new_mt[i - 1].vcpu = v->vcpu_id;
 
-        for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
-            new_mt[i - 1].ref = lgt->maptrack_limit + i;
-        new_mt[i - 1].ref = lgt->maptrack_head;
-        lgt->maptrack_head = lgt->maptrack_limit;
+    v->maptrack_head = lgt->maptrack_pages * MAPTRACK_PER_PAGE;
 
-        lgt->maptrack[nr_frames] = new_mt;
-        smp_wmb();
-        lgt->maptrack_limit      = new_mt_limit;
+    /* Set tail directly if this is the first page for this VCPU. */
+    if ( v->maptrack_tail == MAPTRACK_TAIL )
+        v->maptrack_tail = (lgt->maptrack_pages * MAPTRACK_PER_PAGE)
+            + MAPTRACK_PER_PAGE - 1;
 
-        gdprintk(XENLOG_INFO, "Increased maptrack size to %u frames\n",
-                 nr_frames + 1);
-    }
+    lgt->maptrack[lgt->maptrack_pages++] = new_mt;
 
     spin_unlock(&lgt->maptrack_lock);
 
-    return handle;
+    v->maptrack_limit += MAPTRACK_PER_PAGE;
+
+    return __get_maptrack_handle(lgt, v);
 }
 
 /* Number of grant table entries. Caller must hold d's grant table lock. */
@@ -573,7 +608,8 @@ static void mapcount(
      */
     ASSERT(rw_is_write_locked(&rd->grant_table->lock));
 
-    for ( handle = 0; handle < lgt->maptrack_limit; handle++ )
+    for ( handle = 0; handle < lgt->maptrack_pages * MAPTRACK_PER_PAGE; 
+          handle++ )
     {
         map = &maptrack_entry(lgt, handle);
         if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) ||
@@ -931,7 +967,7 @@ __gnttab_unmap_common(
 
     op->frame = (unsigned long)(op->dev_bus_addr >> PAGE_SHIFT);
 
-    if ( unlikely(op->handle >= lgt->maptrack_limit) )
+    if ( unlikely(op->handle >= lgt->maptrack_pages * MAPTRACK_PER_PAGE) )
     {
         gdprintk(XENLOG_INFO, "Bad handle (%d).\n", op->handle);
         op->status = GNTST_bad_handle;
@@ -1406,6 +1442,7 @@ gnttab_setup_table(
     struct gnttab_setup_table op;
     struct domain *d;
     struct grant_table *gt;
+    struct vcpu *v;
     int            i;
     xen_pfn_t  gmfn;
 
@@ -1447,6 +1484,17 @@ gnttab_setup_table(
     gt = d->grant_table;
     write_lock(&gt->lock);
 
+    /* Tracking of mapped foreign frames table */
+    if ( (gt->maptrack = xzalloc_array(struct grant_mapping *,
+                                       max_maptrack_frames * d->max_vcpus)) == NULL )
+        goto out3;
+    for_each_vcpu( d, v )
+    {
+        v->maptrack_head = MAPTRACK_TAIL;
+        v->maptrack_tail = MAPTRACK_TAIL;
+    }
+    gt->maptrack_pages = 0;
+
     if ( gt->gt_version == 0 )
         gt->gt_version = 1;
 
@@ -3030,18 +3078,6 @@ grant_table_create(
             spin_lock_init(&t->active[i][j].lock);
     }
 
-    /* Tracking of mapped foreign frames table */
-    if ( (t->maptrack = xzalloc_array(struct grant_mapping *,
-                                      max_maptrack_frames)) == NULL )
-        goto no_mem_2;
-    if ( (t->maptrack[0] = alloc_xenheap_page()) == NULL )
-        goto no_mem_3;
-    clear_page(t->maptrack[0]);
-    t->maptrack_limit = MAPTRACK_PER_PAGE;
-    for ( i = 1; i < MAPTRACK_PER_PAGE; i++ )
-        t->maptrack[0][i - 1].ref = i;
-    t->maptrack[0][i - 1].ref = MAPTRACK_TAIL;
-
     /* Shared grant table. */
     if ( (t->shared_raw = xzalloc_array(void *, max_grant_frames)) == NULL )
         goto no_mem_3;
@@ -3072,8 +3108,6 @@ grant_table_create(
         free_xenheap_page(t->shared_raw[i]);
     xfree(t->shared_raw);
  no_mem_3:
-    free_xenheap_page(t->maptrack[0]);
-    xfree(t->maptrack);
  no_mem_2:
     for ( i = 0;
           i < num_act_frames_from_sha_frames(INITIAL_NR_GRANT_FRAMES); i++ )
@@ -3101,7 +3135,8 @@ gnttab_release_mappings(
 
     BUG_ON(!d->is_dying);
 
-    for ( handle = 0; handle < gt->maptrack_limit; handle++ )
+    for ( handle = 0; handle < gt->maptrack_pages * MAPTRACK_PER_PAGE;
+          handle++ )
     {
         map = &maptrack_entry(gt, handle);
         if ( !(map->flags & (GNTMAP_device_map|GNTMAP_host_map)) )
@@ -3206,7 +3241,7 @@ grant_table_destroy(
         free_xenheap_page(t->shared_raw[i]);
     xfree(t->shared_raw);
 
-    for ( i = 0; i < nr_maptrack_frames(t); i++ )
+    for ( i = 0; i < t->maptrack_pages; i++ )
         free_xenheap_page(t->maptrack[i]);
     xfree(t->maptrack);
 
diff --git a/xen/include/xen/grant_table.h b/xen/include/xen/grant_table.h
index f22ebd0..c6e4ebf 100644
--- a/xen/include/xen/grant_table.h
+++ b/xen/include/xen/grant_table.h
@@ -60,6 +60,8 @@ struct grant_mapping {
     u32      ref;           /* grant ref */
     u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
     domid_t  domid;         /* granting domain */
+    u32      vcpu;          /* vcpu which created the grant mapping */
+    u32      pad;           /* round size to a power of 2 */
 };
 
 /* Per-domain grant information. */
@@ -83,10 +85,10 @@ struct grant_table {
     grant_status_t       **status;
     /* Active grant table. */
     struct active_grant_entry **active;
-    /* Mapping tracking table. */
+    /* Mapping tracking table per vcpu. */
     struct grant_mapping **maptrack;
-    unsigned int          maptrack_head;
-    unsigned int          maptrack_limit;
+    /* Total pages used for mapping tracking table */
+    unsigned int          maptrack_pages;
     /* Lock protecting the maptrack page list, head, and limit */
     spinlock_t            maptrack_lock;
     /* The defined versions are 1 and 2.  Set to 0 if we don't know
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 80c6f62..7f8422e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -219,6 +219,11 @@ struct vcpu
     /* VCPU paused by system controller. */
     int              controller_pause_count;
 
+    /* Maptrack */
+    unsigned int     maptrack_head;
+    unsigned int     maptrack_tail;
+    unsigned int     maptrack_limit;
+
     /* IRQ-safe virq_lock protects against delivering VIRQ to stale evtchn. */
     evtchn_port_t    virq_to_evtchn[NR_VIRQS];
     spinlock_t       virq_lock;
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 1/4] gnttab: per-active entry locking
  2015-05-20 15:54 ` [PATCHv9 1/4] gnttab: per-active entry locking David Vrabel
@ 2015-05-21  7:46   ` Jan Beulich
  2015-05-21 14:11     ` David Vrabel
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2015-05-21  7:46 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Ian Campbell, Tim Deegan

>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
> @@ -505,7 +526,7 @@ static int grant_map_exists(const struct domain *ld,
>                              unsigned long mfn,
>                              unsigned int *ref_count)
>  {
> -    const struct active_grant_entry *act;
> +    struct active_grant_entry *act;
>      unsigned int ref, max_iter;
>      
>      ASSERT(spin_is_locked(&rgt->lock));
> @@ -514,18 +535,18 @@ static int grant_map_exists(const struct domain *ld,
>                     nr_grant_entries(rgt));
>      for ( ref = *ref_count; ref < max_iter; ref++ )
>      {
> -        act = &active_entry(rgt, ref);
> +        bool_t exists;

With this I think act's declaration should move into this narrower
scope too, especially as you need to drop its const anyway.
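
I.e. roughly (sketch only, not compile tested):

    for ( ref = *ref_count; ref < max_iter; ref++ )
    {
        struct active_grant_entry *act = active_entry_acquire(rgt, ref);
        bool_t exists;
        ...
        active_entry_release(act);
    }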

> @@ -702,6 +729,7 @@ __gnttab_map_grant_ref(
>  
>      cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
>  
> +    active_entry_release(act);
>      spin_unlock(&rgt->lock);

Just for my understanding: The lock isn't meant to also cover *shah?
I.e. it could be dropped ahead of the cache_flags assignment?
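
I.e. (sketch of what I have in mind):

    active_entry_release(act);
    spin_unlock(&rgt->lock);

    cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );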

> @@ -978,7 +1009,7 @@ __gnttab_unmap_common(
>          if ( (rc = replace_grant_host_mapping(op->host_addr,
>                                                op->frame, op->new_addr, 
>                                                op->flags)) < 0 )
> -            goto unmap_out;
> +            goto act_release_out;
>  
>          ASSERT(act->pin & (GNTPIN_hstw_mask | GNTPIN_hstr_mask));
>          op->map->flags &= ~GNTMAP_host_map;

act doesn't look to be accessed anymore after the if() this is
contained in - why don't you drop the lock as early as possible?
Or wait - do we need act->pin to remain stable until after that
subsequent if() (in which case dropping the lock before the
final if() in the function would mainly make error handling more
cumbersome without buying us much)?

> @@ -1979,8 +2026,11 @@ __acquire_grant_for_copy(
>                                            &trans_page_off, &trans_length, 0);
>  
>              spin_lock(&rgt->lock);
> +            act = active_entry_acquire(rgt, gref);
> +
>              if ( rc != GNTST_okay ) {
>                  __fixup_status_for_copy_pin(act, status);
> +                active_entry_release(act);
>                  rcu_unlock_domain(td);
>                  spin_unlock(&rgt->lock);
>                  return rc;
> @@ -1994,6 +2044,7 @@ __acquire_grant_for_copy(
>              {
>                  __fixup_status_for_copy_pin(act, status);
>                  rcu_unlock_domain(td);
> +                active_entry_release(act);
>                  spin_unlock(&rgt->lock);

Please make the two sequences of unlocks the same.
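
E.g. (sketch) in both paths:

    __fixup_status_for_copy_pin(act, status);
    active_entry_release(act);
    rcu_unlock_domain(td);
    spin_unlock(&rgt->lock);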

> @@ -2074,7 +2126,11 @@ __acquire_grant_for_copy(
>          gnttab_clear_flag(_GTF_reading, status);
>  
>   unlock_out:
> +    active_entry_release(act);
> +
> + gnt_unlock_out:

I think this would be better as gt_unlock_out or gnttab_unlock_out.

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock
  2015-05-20 15:54 ` [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock David Vrabel
@ 2015-05-21 10:32   ` Jan Beulich
  2015-05-21 13:36     ` David Vrabel
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2015-05-21 10:32 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Ian Campbell, Tim Deegan

>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
> @@ -254,23 +254,23 @@ double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
>  {
>      if ( lgt < rgt )
>      {
> -        spin_lock(&lgt->lock);
> -        spin_lock(&rgt->lock);
> +        write_lock(&lgt->lock);
> +        write_lock(&rgt->lock);
>      }
>      else
>      {
>          if ( lgt != rgt )
> -            spin_lock(&rgt->lock);
> -        spin_lock(&lgt->lock);
> +            write_lock(&rgt->lock);
> +        write_lock(&lgt->lock);
>      }
>  }

Do both of them need to be write locks now?

> @@ -806,12 +806,13 @@ __gnttab_map_grant_ref(
>          goto undo_out;
>      }
>  
> -    double_gt_lock(lgt, rgt);
> -
>      if ( gnttab_need_iommu_mapping(ld) )
>      {
>          unsigned int wrc, rdc;
>          int err = 0;
> +
> +        double_gt_lock(lgt, rgt);
> +
>          /* We're not translated, so we know that gmfns and mfns are
>             the same things, so the IOMMU entry is always 1-to-1. */
>          mapcount(lgt, rd, frame, &wrc, &rdc);
> @@ -827,9 +828,11 @@ __gnttab_map_grant_ref(
>              if ( (wrc + rdc) == 0 )
>                  err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
>          }
> +
> +        double_gt_lock(lgt, rgt);

unlock. And with this code path actually used (due to the bug it's
pretty clear it didn't get exercised in your testing), what does
performance look like?

> @@ -842,8 +845,6 @@ __gnttab_map_grant_ref(
>      mt->ref   = op->ref;
>      mt->flags = op->flags;
>  
> -    double_gt_unlock(lgt, rgt);

Don't the mt-> updates above need some kind of protection?

> @@ -2645,7 +2653,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
>      struct active_grant_entry *act_b = NULL;
>      s16 rc = GNTST_okay;
>  
> -    spin_lock(&gt->lock);
> +    write_lock(&gt->lock);
>  
>      /* Bounds check on the grant refs */
>      if ( unlikely(ref_a >= nr_grant_entries(d->grant_table)))
> @@ -2689,7 +2697,7 @@ out:
>          active_entry_release(act_b);
>      if ( act_a != NULL )
>          active_entry_release(act_a);
> -    spin_unlock(&gt->lock);
> +    write_unlock(&gt->lock);

With the per-entry locks, does this still need to be a write lock?

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 4/4] gnttab: use per-VCPU maptrack free lists
  2015-05-20 15:54 ` [PATCHv9 4/4] gnttab: use per-VCPU maptrack free lists David Vrabel
@ 2015-05-21 10:42   ` Jan Beulich
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Beulich @ 2015-05-21 10:42 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Malcolm Crossley, Keir Fraser, Ian Campbell, Tim Deegan

>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
>  static inline int
>  get_maptrack_handle(
>      struct grant_table *lgt)
>  {
> +    struct vcpu          *v = current;
>      int                   i;
>      grant_handle_t        handle;
>      struct grant_mapping *new_mt;
> -    unsigned int          new_mt_limit, nr_frames;
> +    unsigned int          nr_frames;
>  
> -    spin_lock(&lgt->maptrack_lock);
> +    handle = __get_maptrack_handle(lgt, v);
> +    if ( likely(handle != -1) )
> +        return handle;
>  
> -    while ( unlikely((handle = __get_maptrack_handle(lgt)) == -1) )
> -    {
> -        nr_frames = nr_maptrack_frames(lgt);
> -        if ( nr_frames >= max_maptrack_frames )
> -            break;
> +    nr_frames = nr_vcpu_maptrack_frames(v);
> +    if ( nr_frames >= max_maptrack_frames )

I think you want to check lgt->maptrack_pages here, to make sure
the global limit remains a per-domain one. Or else we'd need a dual
(Dom0/DomU) or even per-domain bound to avoid many-vCPU
guests being able to force Xen into using significant amounts of
memory for their maptrack.
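
I.e. something like (sketch only)

    if ( lgt->maptrack_pages >= max_maptrack_frames )
        return -1;

so the bound stays on the domain's total rather than on an
individual vCPU's share.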

> @@ -1447,6 +1484,17 @@ gnttab_setup_table(
>      gt = d->grant_table;
>      write_lock(&gt->lock);
>  
> +    /* Tracking of mapped foreign frames table */
> +    if ( (gt->maptrack = xzalloc_array(struct grant_mapping *,
> +                                       max_maptrack_frames * d->max_vcpus)) == NULL )
> +        goto out3;

Note that vzalloc() meanwhile exists. But depending on the resolution
of the aspect above you may not even urgently need it here.
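
I.e. (sketch, if the allocation keeps its current size)

    gt->maptrack = vzalloc(max_maptrack_frames * d->max_vcpus *
                           sizeof(*gt->maptrack));

with a matching vfree() on the teardown path.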

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock
  2015-05-21 10:32   ` Jan Beulich
@ 2015-05-21 13:36     ` David Vrabel
  2015-05-21 14:20       ` David Vrabel
  2015-05-21 14:53       ` Jan Beulich
  0 siblings, 2 replies; 15+ messages in thread
From: David Vrabel @ 2015-05-21 13:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Keir Fraser, Ian Campbell, Tim Deegan

On 21/05/15 11:32, Jan Beulich wrote:
>>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
>> @@ -254,23 +254,23 @@ double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
>>  {
>>      if ( lgt < rgt )
>>      {
>> -        spin_lock(&lgt->lock);
>> -        spin_lock(&rgt->lock);
>> +        write_lock(&lgt->lock);
>> +        write_lock(&rgt->lock);
>>      }
>>      else
>>      {
>>          if ( lgt != rgt )
>> -            spin_lock(&rgt->lock);
>> -        spin_lock(&lgt->lock);
>> +            write_lock(&rgt->lock);
>> +        write_lock(&lgt->lock);
>>      }
>>  }
> 
> Do both of them need to be write locks now?

I don't know.  This isn't used in any path I care about, so I didn't
spend any time analysing it and made the obviously correct change.

Someone who cares about the performance of this path will have to do
the work to improve it further.

>> @@ -806,12 +806,13 @@ __gnttab_map_grant_ref(
>>          goto undo_out;
>>      }
>>  
>> -    double_gt_lock(lgt, rgt);
>> -
>>      if ( gnttab_need_iommu_mapping(ld) )
>>      {
>>          unsigned int wrc, rdc;
>>          int err = 0;
>> +
>> +        double_gt_lock(lgt, rgt);
>> +
>>          /* We're not translated, so we know that gmfns and mfns are
>>             the same things, so the IOMMU entry is always 1-to-1. */
>>          mapcount(lgt, rd, frame, &wrc, &rdc);
>> @@ -827,9 +828,11 @@ __gnttab_map_grant_ref(
>>              if ( (wrc + rdc) == 0 )
>>                  err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
>>          }
>> +
>> +        double_gt_lock(lgt, rgt);
> 
> unlock. And with this code path actually used (due to the bug it's
> pretty clear it didn't get exercised in your testing), what does
> performance look like?

I think it will be no worse than what it was before -- this path already
really sucks (mapcount() loops over 1000s of entries).  I don't care
about this path at all.

>> @@ -842,8 +845,6 @@ __gnttab_map_grant_ref(
>>      mt->ref   = op->ref;
>>      mt->flags = op->flags;
>>  
>> -    double_gt_unlock(lgt, rgt);
> 
> Don't the mt-> updates above need some kind of protection?

It depends:

If not gnttab_need_iommu_mapping(), then we only need a write barrier
before the mt->flags store.

If gnttab_need_iommu_mapping() then a lock is required to prevent racing
with concurrent mapcount() calls.  This is not a path I care about so
it's easiest to just keep the double lock around this.

>> @@ -2645,7 +2653,7 @@ __gnttab_swap_grant_ref(grant_ref_t ref_a, grant_ref_t ref_b)
>>      struct active_grant_entry *act_b = NULL;
>>      s16 rc = GNTST_okay;
>>  
>> -    spin_lock(&gt->lock);
>> +    write_lock(&gt->lock);
>>  
>>      /* Bounds check on the grant refs */
>>      if ( unlikely(ref_a >= nr_grant_entries(d->grant_table)))
>> @@ -2689,7 +2697,7 @@ out:
>>          active_entry_release(act_b);
>>      if ( act_a != NULL )
>>          active_entry_release(act_a);
>> -    spin_unlock(&gt->lock);
>> +    write_unlock(&gt->lock);
> 
> With the per-entry locks, does this still need to be a write lock?

Yes, because we take two active entry locks and requiring the write lock
is easier than ordering the active_entry_acquire()s.
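
Ordered acquisition would look something like (sketch):

    if ( ref_a <= ref_b )
    {
        act_a = active_entry_acquire(gt, ref_a);
        if ( ref_b != ref_a )
            act_b = active_entry_acquire(gt, ref_b);
    }
    else
    {
        act_b = active_entry_acquire(gt, ref_b);
        act_a = active_entry_acquire(gt, ref_a);
    }

which is more code for no real gain on this path.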

Again, this isn't a path I care about.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 1/4] gnttab: per-active entry locking
  2015-05-21  7:46   ` Jan Beulich
@ 2015-05-21 14:11     ` David Vrabel
  0 siblings, 0 replies; 15+ messages in thread
From: David Vrabel @ 2015-05-21 14:11 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: xen-devel, Keir Fraser, Ian Campbell, Tim Deegan

On 21/05/15 08:46, Jan Beulich wrote:
>>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
>> @@ -702,6 +729,7 @@ __gnttab_map_grant_ref(
>>  
>>      cache_flags = (shah->flags & (GTF_PAT | GTF_PWT | GTF_PCD) );
>>  
>> +    active_entry_release(act);
>>      spin_unlock(&rgt->lock);
> 
> Just for my understanding: The lock isn't meant to also cover *shah?
> I.e. it could be dropped ahead of the cache_flags assignment?

I think so, but it's preferable (for now) to keep the locked
regions as-is where possible.

There are a few other places where it looks like the locked regions
could be reduced.  It won't really make much difference, though, since
the read lock and active entry locks are not contended.

>> @@ -978,7 +1009,7 @@ __gnttab_unmap_common(
>>          if ( (rc = replace_grant_host_mapping(op->host_addr,
>>                                                op->frame, op->new_addr, 
>>                                                op->flags)) < 0 )
>> -            goto unmap_out;
>> +            goto act_release_out;
>>  
>>          ASSERT(act->pin & (GNTPIN_hstw_mask | GNTPIN_hstr_mask));
>>          op->map->flags &= ~GNTMAP_host_map;
> 
> act doesn't look to be accessed anymore after the if() this is
> contained in - why don't you drop the lock as early as possible?
> Or wait - do we need act->pin to remain stable until after that
> subsequent if() (in which case dropping the lock before the
> final if() in the function would mainly make error handling more
> cumbersome without buying us much)?

See comment above.

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock
  2015-05-21 13:36     ` David Vrabel
@ 2015-05-21 14:20       ` David Vrabel
  2015-05-21 14:53       ` Jan Beulich
  1 sibling, 0 replies; 15+ messages in thread
From: David Vrabel @ 2015-05-21 14:20 UTC (permalink / raw)
  To: David Vrabel, Jan Beulich
  Cc: xen-devel, Keir Fraser, Ian Campbell, Tim Deegan

On 21/05/15 14:36, David Vrabel wrote:
> On 21/05/15 11:32, Jan Beulich wrote:
>>>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
>>> @@ -842,8 +845,6 @@ __gnttab_map_grant_ref(
>>>      mt->ref   = op->ref;
>>>      mt->flags = op->flags;
>>>  
>>> -    double_gt_unlock(lgt, rgt);
>>
>> Don't the mt-> updates above need some kind of protection?
> 
> It depends:
> 
> If not gnttab_need_iommu_mapping(), then we only need a write barrier
> before the mt->flags store.
> 
> If gnttab_need_iommu_mapping() then a lock is required to prevent racing
> with concurrent mapcount() calls.  This is not a path I care about so
> it's easiest to just keep the double lock around this.

Like so:

@@ -806,12 +806,13 @@ __gnttab_map_grant_ref(
         goto undo_out;
     }

-    double_gt_lock(lgt, rgt);
-
     if ( gnttab_need_iommu_mapping(ld) )
     {
         unsigned int wrc, rdc;
         int err = 0;
+
+        double_gt_lock(lgt, rgt);
+
         /* We're not translated, so we know that gmfns and mfns are
            the same things, so the IOMMU entry is always 1-to-1. */
         mapcount(lgt, rd, frame, &wrc, &rdc);
@@ -837,12 +838,22 @@ __gnttab_map_grant_ref(

     TRACE_1D(TRC_MEM_PAGE_GRANT_MAP, op->dom);

+    /*
+     * All maptrack entry users check mt->flags first before using the
+     * other fields so just ensure the flags field is stored last.
+     *
+     * However, if gnttab_need_iommu_mapping() then this would race
+     * with a concurrent mapcount() call (on an unmap, for example)
+     * and a lock is required.
+     */
     mt = &maptrack_entry(lgt, handle);
     mt->domid = op->dom;
     mt->ref   = op->ref;
-    mt->flags = op->flags;
+    wmb();
+    write_atomic(&mt->flags, op->flags);

-    double_gt_unlock(lgt, rgt);
+    if ( gnttab_need_iommu_mapping(ld) )
+        double_gt_unlock(lgt, rgt);

     op->dev_bus_addr = (u64)frame << PAGE_SHIFT;
     op->handle       = handle;

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock
  2015-05-21 13:36     ` David Vrabel
  2015-05-21 14:20       ` David Vrabel
@ 2015-05-21 14:53       ` Jan Beulich
  2015-05-21 15:16         ` David Vrabel
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2015-05-21 14:53 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Ian Campbell, Tim Deegan

>>> On 21.05.15 at 15:36, <david.vrabel@citrix.com> wrote:
> On 21/05/15 11:32, Jan Beulich wrote:
>>>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
>>> @@ -827,9 +828,11 @@ __gnttab_map_grant_ref(
>>>              if ( (wrc + rdc) == 0 )
>>>                  err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
>>>          }
>>> +
>>> +        double_gt_lock(lgt, rgt);
>> 
>> unlock. And with this code path actually used (due to the bug it's
>> pretty clear it didn't get exercised in your testing), what does
>> performance look like?
> 
> I think it will be no worse than what it was before -- this path already
> really sucks (mapcount() loops over 1000s of entries).  I don't care
> about this path at all.

It's kind of strange that you don't care about this case - afaict we're
not getting there by default solely because dom0-strict mode is not
currently the default (albeit arguably it should be).

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock
  2015-05-21 14:53       ` Jan Beulich
@ 2015-05-21 15:16         ` David Vrabel
  2015-05-22  6:37           ` Jan Beulich
  0 siblings, 1 reply; 15+ messages in thread
From: David Vrabel @ 2015-05-21 15:16 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: xen-devel, Keir Fraser, Ian Campbell, Tim Deegan

On 21/05/15 15:53, Jan Beulich wrote:
>>>> On 21.05.15 at 15:36, <david.vrabel@citrix.com> wrote:
>> On 21/05/15 11:32, Jan Beulich wrote:
>>>>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
>>>> @@ -827,9 +828,11 @@ __gnttab_map_grant_ref(
>>>>              if ( (wrc + rdc) == 0 )
>>>>                  err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
>>>>          }
>>>> +
>>>> +        double_gt_lock(lgt, rgt);
>>>
>>> unlock. And with this code path actually used (due to the bug it's
>>> pretty clear it didn't get exercised in your testing), what does
>>> performance look like?
>>
>> I think it will be no worse than what it was before -- this path already
>> really sucks (mapcount() loops over 1000s of entries).  I don't care
>> about this path at all.
> 
> It's kind of strange that you don't care about this case - afaict we're
> not getting there by default solely because dom0-strict mode is not
> currently the default (albeit arguably it should be).

What's your point?  Are you going to refuse this series unless it also
optimizes dom0-strict mode?

David

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock
  2015-05-21 15:16         ` David Vrabel
@ 2015-05-22  6:37           ` Jan Beulich
  2015-05-22  8:54             ` David Vrabel
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Beulich @ 2015-05-22  6:37 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, Keir Fraser, Ian Campbell, Tim Deegan

>>> On 21.05.15 at 17:16, <david.vrabel@citrix.com> wrote:
> On 21/05/15 15:53, Jan Beulich wrote:
>>>>> On 21.05.15 at 15:36, <david.vrabel@citrix.com> wrote:
>>> On 21/05/15 11:32, Jan Beulich wrote:
>>>>>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
>>>>> @@ -827,9 +828,11 @@ __gnttab_map_grant_ref(
>>>>>              if ( (wrc + rdc) == 0 )
>>>>>                  err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
>>>>>          }
>>>>> +
>>>>> +        double_gt_lock(lgt, rgt);
>>>>
>>>> unlock. And with this code path actually used (due to the bug it's
>>>> pretty clear it didn't get exercised in your testing), what does
>>>> performance look like?
>>>
>>> I think it will be no worse than what it was before -- this path already
>>> really sucks (mapcount() loops over 1000s of entries).  I don't care
>>> about this path at all.
>> 
>> It's kind of strange that you don't care about this case - afaict we're
>> not getting there by default solely because dom0-strict mode is not
>> currently the default (albeit arguably it should be).
> 
> What's your point?  Are you going to refuse this series unless it also
> optimizes dom0-strict mode?

No, I'm certainly not going to. But this series of "I don't care"
responses worries me to a certain degree (apart from making me wonder
why, since so far I thought an as-secure-as-possible environment is
key for XenServer, and "iommu=dom0-strict" certainly helps
with that, which is also why I said that this should arguably
become the default).

Jan

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv9 3/4] gnttab: make the grant table lock a read-write lock
  2015-05-22  6:37           ` Jan Beulich
@ 2015-05-22  8:54             ` David Vrabel
  0 siblings, 0 replies; 15+ messages in thread
From: David Vrabel @ 2015-05-22  8:54 UTC (permalink / raw)
  To: Jan Beulich, David Vrabel
  Cc: xen-devel, Keir Fraser, Ian Campbell, Tim Deegan

On 22/05/15 07:37, Jan Beulich wrote:
>>>> On 21.05.15 at 17:16, <david.vrabel@citrix.com> wrote:
>> On 21/05/15 15:53, Jan Beulich wrote:
>>>>>> On 21.05.15 at 15:36, <david.vrabel@citrix.com> wrote:
>>>> On 21/05/15 11:32, Jan Beulich wrote:
>>>>>>>> On 20.05.15 at 17:54, <david.vrabel@citrix.com> wrote:
>>>>>> @@ -827,9 +828,11 @@ __gnttab_map_grant_ref(
>>>>>>              if ( (wrc + rdc) == 0 )
>>>>>>                  err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
>>>>>>          }
>>>>>> +
>>>>>> +        double_gt_lock(lgt, rgt);
>>>>>
>>>>> unlock. And with this code path actually used (due to the bug it's
>>>>> pretty clear it didn't get exercised in your testing), what does
>>>>> performance look like?
>>>>
>>>> I think it will be no worse than what it was before -- this path already
>>>> really sucks (mapcount() loops over 1000s of entries).  I don't care
>>>> about this path at all.
>>>
>>> It's kind of strange that you don't care about this case - afaict we're
>>> not getting there by default solely because dom0-strict mode is not
>>> currently the default (albeit arguably it should be).
>>
>> What's your point?  Are you going to refuse this series unless it also
>> optimizes dom0-strict mode?
> 
> No, I'm certainly not going to. But this series of "I don't care"
> responses worries me to a certain degree (apart from making me wonder
> why, since so far I thought an as-secure-as-possible environment is
> key for XenServer, and "iommu=dom0-strict" certainly helps
> with that, which is also why I said that this should arguably
> become the default).

dom0-strict mode doesn't align with our PV-IOMMU plans to make BFN ==
PFN (including for grants).

David

^ permalink raw reply	[flat|nested] 15+ messages in thread
