* [PATCH v7 0/9] Memory scrubbing from idle loop
@ 2017-08-08 21:44 Boris Ostrovsky
  2017-08-08 21:44 ` [PATCH v7 1/9] mm: Clean up free_heap_pages() Boris Ostrovsky
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:44 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, julien.grall, jbeulich, Boris Ostrovsky


Changes in v7:
* Split free_heap_pages() buddy merge changes into a separate patch (patch 1)
* Changed type for page_info.u.free.need_tlbflush to bool:1
* Added BUILD_BUG_ON
* Adjusted datatype of temp variable in check_and_stop_scrub()
* Formatting changes

(see per-patch changes)

When a domain is destroyed the hypervisor must scrub the domain's pages before
giving them to another guest in order to prevent leaking the deceased
guest's data. Currently this is done during guest destruction, which can make
the cleanup process very lengthy.

This series adds support for scrubbing released pages from the idle loop,
making guest destruction significantly faster. For example, destroying a
1TB guest can now be completed in 40+ seconds, as opposed to about 9 minutes
with the existing scrubbing algorithm.

Briefly, the new algorithm places dirty pages at the end of the heap's page
list for each node/zone/order so that the full list does not have to be
scanned when searching for dirty pages. One processor from each node checks
whether the node has any dirty pages and, if such pages are found, scrubs
them. Scrubbing itself happens without holding the heap lock, so other users
may access the heap in the meantime. If a chunk of pages being scrubbed from
the idle loop is requested by the heap allocator, scrubbing of that chunk is
immediately stopped.
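
As a rough illustration of the list discipline, here is a small standalone C
sketch (not Xen code; a buddy is reduced to a single dirty flag):

    /* Toy model: clean buddies at the front of a free list, dirty ones at
     * the back, so the scrubber only ever walks the tail of the list. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NBUDDIES 8

    static bool dirty[NBUDDIES] =
        { false, false, false, true, true, true, true, true };

    int main(void)
    {
        int scrubbed = 0;

        /* Walk from the tail; the first clean buddy ends the walk, so the
         * clean part of the list is never traversed. */
        for ( int i = NBUDDIES - 1; i >= 0 && dirty[i]; i-- )
        {
            dirty[i] = false;          /* stands in for scrub_one_page() */
            scrubbed++;
        }
        printf("scrubbed %d buddies\n", scrubbed);
        return 0;
    }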

On the allocation side, alloc_heap_pages() first tries to satisfy the
allocation request using only clean pages. If this is not possible, the
search is repeated allowing dirty pages, which are then scrubbed by the
allocator.
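
A minimal sketch of this two-pass idea (standalone C; the pool and
find_buddy() are made up and merely stand in for the real heap structures and
get_free_buddy()):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct buddy { bool free, dirty; };

    static struct buddy pool[] = {
        { .free = false }, { .free = true, .dirty = true },
        { .free = false }, { .free = true, .dirty = true },
    };

    static struct buddy *find_buddy(bool allow_dirty)
    {
        for ( size_t i = 0; i < sizeof(pool) / sizeof(pool[0]); i++ )
            if ( pool[i].free && (allow_dirty || !pool[i].dirty) )
                return &pool[i];
        return NULL;
    }

    static struct buddy *alloc(void)
    {
        struct buddy *b = find_buddy(false);          /* pass 1: clean only   */

        if ( !b && (b = find_buddy(true)) != NULL )   /* pass 2: accept dirty */
            b->dirty = false;                         /* ... and scrub it     */
        if ( b )
            b->free = false;
        return b;
    }

    int main(void)
    {
        printf("allocation %s\n", alloc() ? "succeeded" : "failed");
        return 0;
    }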

This series is somewhat based on earlier work by Bob Liu.

V1:
* Only set PGC_need_scrub bit for the buddy head, thus making it unnecessary
  to scan whole buddy
* Fix spin_lock_cb()
* Scrub CPU-less nodes
* ARM support. Note that I have not been able to test this, only built the
  binary
* Added scrub test patch (last one). Not sure whether it should be considered
  for committing but I have been running with it.

V2:
* merge_chunks() returns new buddy head
* scrub_free_pages() returns softirq pending status in addition to (factored out)
  status of unscrubbed memory
* spin_lock uses inlined spin_lock_cb()
* scrub debugging code checks whole page, not just the first word.

V3:
* Keep dirty bit per page
* Simplify merge_chunks() (now merge_and_free_buddy())
* When scrubbing memory-only nodes try to find the closest node.

V4:
* Keep track of dirty pages in a buddy with page_info.u.free.first_dirty.
* Drop patch 1 (factoring out merge_and_free_buddy()) since there is only
  one caller now
* Drop patch 5 (from V3) since we are not breaking a partially-scrubbed
  buddy anymore
* Extract search loop in alloc_heap_pages() into get_free_buddy() (patch 2)
* Add MEMF_no_scrub flag

V5:
* Make page_info.u.free a union and use bitfields there.
* Bug fixes

V6:
* Changed first_dirty tracking from pointer-based to index-based (patch 1)
* Added/modified a few ASSERT()s
* Moved/modified a couple of comments
* Adjusted width of INVALID_DIRTY_IDX


Deferred:
* Per-node heap locks. In addition to (presumably) improving performance in
  general, once they are available we can parallelize scrubbing further by
  allowing more than one core per node to do idle loop scrubbing.
* AVX-based scrubbing
* Use idle loop scrubbing during boot.



Boris Ostrovsky (9):
  mm: Clean up free_heap_pages()
  mm: Place unscrubbed pages at the end of pagelist
  mm: Extract allocation loop from alloc_heap_pages()
  mm: Scrub pages in alloc_heap_pages() if needed
  mm: Scrub memory from idle loop
  spinlock: Introduce spin_lock_cb()
  mm: Keep heap accessible to others while scrubbing
  mm: Print number of unscrubbed pages in 'H' debug handler
  mm: Make sure pages are scrubbed

 xen/Kconfig.debug          |   7 +
 xen/arch/arm/domain.c      |   8 +-
 xen/arch/x86/domain.c      |   8 +-
 xen/arch/x86/domain_page.c |   6 +-
 xen/common/page_alloc.c    | 607 ++++++++++++++++++++++++++++++++++++++-------
 xen/common/spinlock.c      |   9 +-
 xen/include/asm-arm/mm.h   |  30 ++-
 xen/include/asm-x86/mm.h   |  31 ++-
 xen/include/xen/mm.h       |   5 +-
 xen/include/xen/spinlock.h |   8 +
 10 files changed, 612 insertions(+), 107 deletions(-)

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v7 1/9] mm: Clean up free_heap_pages()
  2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
@ 2017-08-08 21:44 ` Boris Ostrovsky
  2017-08-10 12:21   ` Wei Liu
  2017-08-14 10:30   ` Jan Beulich
  2017-08-08 21:45 ` [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:44 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, julien.grall, jbeulich, Boris Ostrovsky

Make the buddy-merging part of free_heap_pages() a bit more readable.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v7:
* New in this version (this used to be part of patch 2 and has been split
  into a separate patch)

 xen/common/page_alloc.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 8bcef6a..330f8ed 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -977,24 +977,31 @@ static void free_heap_pages(
 
         if ( (page_to_mfn(pg) & mask) )
         {
+            struct page_info *predecessor = pg - mask;
+
             /* Merge with predecessor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
-                 !page_state_is(pg-mask, free) ||
-                 (PFN_ORDER(pg-mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
+                 !page_state_is(predecessor, free) ||
+                 (PFN_ORDER(predecessor) != order) ||
+                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
-            pg -= mask;
-            page_list_del(pg, &heap(node, zone, order));
+
+            page_list_del(predecessor, &heap(node, zone, order));
+
+            pg = predecessor;
         }
         else
         {
+            struct page_info *successor = pg + mask;
+
             /* Merge with successor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
-                 !page_state_is(pg+mask, free) ||
-                 (PFN_ORDER(pg+mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(successor))) ||
+                 !page_state_is(successor, free) ||
+                 (PFN_ORDER(successor) != order) ||
+                 (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
-            page_list_del(pg + mask, &heap(node, zone, order));
+
+            page_list_del(successor, &heap(node, zone, order));
         }
 
         order++;
-- 
1.8.3.1


* [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
  2017-08-08 21:44 ` [PATCH v7 1/9] mm: Clean up free_heap_pages() Boris Ostrovsky
@ 2017-08-08 21:45 ` Boris Ostrovsky
  2017-08-14 10:37   ` Jan Beulich
  2017-08-14 11:16   ` Julien Grall
  2017-08-08 21:45 ` [PATCH v7 3/9] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:45 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, julien.grall, jbeulich, Boris Ostrovsky

... so that it's easy to find pages that need to be scrubbed (such pages are
now marked with the _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the
first_dirty field. For now it can have only two values, 0 (the whole buddy
needs scrubbing) or INVALID_DIRTY_IDX (the buddy does not need to be
scrubbed). Subsequent patches will allow scrubbing to be interrupted,
resulting in first_dirty taking any value.
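
For illustration, a standalone sketch (not Xen code) of how first_dirty is
carried across the buddy splits done in alloc_heap_pages(); the starting
value of 5 is of the kind that only becomes possible once later patches allow
scrubbing to be interrupted:

    #include <stdio.h>

    #define INVALID_DIRTY_IDX (~0U)

    int main(void)
    {
        /* Allocate an order-1 chunk from an order-3 buddy whose pages 5..7
         * are (possibly) dirty, i.e. first_dirty == 5. */
        unsigned int order = 3, first_dirty = 5;

        while ( order != 1 )
        {
            unsigned int half = 1U << --order;

            /* The lower half goes back on the free list; it is dirty only
             * if first_dirty falls inside it. */
            printf("freed order-%u half: first_dirty=%#x\n", order,
                   half > first_dirty ? first_dirty : INVALID_DIRTY_IDX);

            /* Keep the upper half and rebase first_dirty into it. */
            first_dirty = first_dirty >= half ? first_dirty - half : 0;
        }
        printf("allocated order-%u chunk: first_dirty=%u\n", order, first_dirty);
        return 0;
    }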

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v7
* Simplified tracking dirty pages in reserve_offlined_page() to match how
  it is done in alloc_heap_pages()
* Changed page_info.u.free.need_tlbflush definition to bool:1
* Added BUILD_BUG_ON() to make sure page_info.u stays within unsigned long

 xen/common/page_alloc.c  | 157 ++++++++++++++++++++++++++++++++++++++++-------
 xen/include/asm-arm/mm.h |  18 +++++-
 xen/include/asm-x86/mm.h |  17 ++++-
 3 files changed, 167 insertions(+), 25 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 330f8ed..b857a44 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,13 +680,30 @@ static void check_low_mem_virq(void)
     }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+                                unsigned int zone, unsigned int order,
+                                unsigned int first_dirty)
+{
+    PFN_ORDER(pg) = order;
+    pg->u.free.first_dirty = first_dirty;
+
+    if ( first_dirty != INVALID_DIRTY_IDX )
+    {
+        ASSERT(first_dirty < (1U << order));
+        page_list_add_tail(pg, &heap(node, zone, order));
+    }
+    else
+        page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
     unsigned int order, unsigned int memflags,
     struct domain *d)
 {
-    unsigned int i, j, zone = 0, nodemask_retry = 0;
+    unsigned int i, j, zone = 0, nodemask_retry = 0, first_dirty;
     nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
     unsigned long request = 1UL << order;
     struct page_info *pg;
@@ -798,12 +817,26 @@ static struct page_info *alloc_heap_pages(
     return NULL;
 
  found: 
+
+    first_dirty = pg->u.free.first_dirty;
+
     /* We may have to halve the chunk a number of times. */
     while ( j != order )
     {
-        PFN_ORDER(pg) = --j;
-        page_list_add_tail(pg, &heap(node, zone, j));
-        pg += 1 << j;
+        j--;
+        page_list_add_scrub(pg, node, zone, j,
+                            (1U << j) > first_dirty ?
+                            first_dirty : INVALID_DIRTY_IDX);
+        pg += 1U << j;
+
+        if ( first_dirty != INVALID_DIRTY_IDX )
+        {
+            /* Adjust first_dirty */
+            if ( first_dirty >= 1U << j )
+                first_dirty -= 1U << j;
+            else
+                first_dirty = 0; /* We've moved past original first_dirty */
+        }
     }
 
     ASSERT(avail[node][zone] >= request);
@@ -850,12 +883,20 @@ static int reserve_offlined_page(struct page_info *head)
     unsigned int node = phys_to_nid(page_to_maddr(head));
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
-    int cur_order;
+    unsigned int cur_order, first_dirty;
 
     ASSERT(spin_is_locked(&heap_lock));
 
     cur_head = head;
 
+    /*
+     * We may break the buddy so let's mark the head as clean. Then, when
+     * merging chunks back into the heap, we will see whether the chunk has
+     * unscrubbed pages and set its first_dirty properly.
+     */
+    first_dirty = head->u.free.first_dirty;
+    head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -866,6 +907,8 @@ static int reserve_offlined_page(struct page_info *head)
         if ( page_state_is(cur_head, offlined) )
         {
             cur_head++;
+            if ( first_dirty != INVALID_DIRTY_IDX && first_dirty )
+                first_dirty--;
             continue;
         }
 
@@ -892,9 +935,20 @@ static int reserve_offlined_page(struct page_info *head)
             {
             merge:
                 /* We don't consider merging outside the head_order. */
-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
-                PFN_ORDER(cur_head) = cur_order;
+                page_list_add_scrub(cur_head, node, zone, cur_order,
+                                    (1U << cur_order) > first_dirty ?
+                                    first_dirty : INVALID_DIRTY_IDX);
                 cur_head += (1 << cur_order);
+
+                /* Adjust first_dirty if needed. */
+                if ( first_dirty != INVALID_DIRTY_IDX )
+                {
+                    if ( first_dirty >=  1U << cur_order )
+                        first_dirty -= 1U << cur_order;
+                    else
+                        first_dirty = 0;
+                }
+
                 break;
             }
         }
@@ -919,9 +973,53 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
+static void scrub_free_pages(unsigned int node)
+{
+    struct page_info *pg;
+    unsigned int zone;
+
+    ASSERT(spin_is_locked(&heap_lock));
+
+    if ( !node_need_scrub[node] )
+        return;
+
+    for ( zone = 0; zone < NR_ZONES; zone++ )
+    {
+        unsigned int order = MAX_ORDER;
+
+        do {
+            while ( !page_list_empty(&heap(node, zone, order)) )
+            {
+                unsigned int i;
+
+                /* Unscrubbed pages are always at the end of the list. */
+                pg = page_list_last(&heap(node, zone, order));
+                if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                    break;
+
+                for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
+                {
+                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    {
+                        scrub_one_page(&pg[i]);
+                        pg[i].count_info &= ~PGC_need_scrub;
+                        node_need_scrub[node]--;
+                    }
+                }
+
+                page_list_del(pg, &heap(node, zone, order));
+                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+
+                if ( node_need_scrub[node] == 0 )
+                    return;
+            }
+        } while ( order-- != 0 );
+    }
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
-    struct page_info *pg, unsigned int order)
+    struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mask, mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
@@ -961,10 +1059,20 @@ static void free_heap_pages(
         /* This page is not a guest frame any more. */
         page_set_owner(&pg[i], NULL); /* set_gpfn_from_mfn snoops pg owner */
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
+
+        if ( need_scrub )
+            pg[i].count_info |= PGC_need_scrub;
     }
 
     avail[node][zone] += 1 << order;
     total_avail_pages += 1 << order;
+    if ( need_scrub )
+    {
+        node_need_scrub[node] += 1 << order;
+        pg->u.free.first_dirty = 0;
+    }
+    else
+        pg->u.free.first_dirty = INVALID_DIRTY_IDX;
 
     if ( tmem_enabled() )
         midsize_alloc_zone_pages = max(
@@ -988,6 +1096,12 @@ static void free_heap_pages(
 
             page_list_del(predecessor, &heap(node, zone, order));
 
+            /* Keep predecessor's first_dirty if it is already set. */
+            if ( predecessor->u.free.first_dirty == INVALID_DIRTY_IDX &&
+                 pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+                predecessor->u.free.first_dirty = (1U << order) +
+                                                  pg->u.free.first_dirty;
+
             pg = predecessor;
         }
         else
@@ -1007,12 +1121,14 @@ static void free_heap_pages(
         order++;
     }
 
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    page_list_add_scrub(pg, node, zone, order, pg->u.free.first_dirty);
 
     if ( tainted )
         reserve_offlined_page(pg);
 
+    if ( need_scrub )
+        scrub_free_pages(node);
+
     spin_unlock(&heap_lock);
 }
 
@@ -1233,7 +1349,7 @@ unsigned int online_page(unsigned long mfn, uint32_t *status)
     spin_unlock(&heap_lock);
 
     if ( (y & PGC_state) == PGC_state_offlined )
-        free_heap_pages(pg, 0);
+        free_heap_pages(pg, 0, false);
 
     return ret;
 }
@@ -1276,6 +1392,8 @@ static void init_heap_pages(
 {
     unsigned long i;
 
+    BUILD_BUG_ON(sizeof(pg->u) != sizeof(unsigned long));
+
     for ( i = 0; i < nr_pages; i++ )
     {
         unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
@@ -1302,7 +1420,7 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg+i, 0);
+        free_heap_pages(pg + i, 0, false);
     }
 }
 
@@ -1629,7 +1747,7 @@ void free_xenheap_pages(void *v, unsigned int order)
 
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order);
+    free_heap_pages(virt_to_page(v), order, false);
 }
 
 #else
@@ -1683,12 +1801,9 @@ void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
-    {
-        scrub_one_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
-    }
 
-    free_heap_pages(pg, order);
+    free_heap_pages(pg, order, true);
 }
 
 #endif
@@ -1797,7 +1912,7 @@ struct page_info *alloc_domheap_pages(
     if ( d && !(memflags & MEMF_no_owner) &&
          assign_pages(d, pg, order, memflags) )
     {
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, false);
         return NULL;
     }
     
@@ -1865,11 +1980,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
-
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index ef84b72..572337c 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -44,7 +44,16 @@ struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            bool need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more bit than maximum possible order to accommodate
+             * INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
+            unsigned long first_dirty:MAX_ORDER + 1;
+
         } free;
 
     } u;
@@ -107,6 +116,13 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 extern mfn_t xenheap_mfn_start, xenheap_mfn_end;
 extern vaddr_t xenheap_virt_end;
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2bf3f33..07dc963 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -88,7 +88,15 @@ struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            bool need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more bit than maximum possible order to accommodate
+             * INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
+            unsigned long first_dirty:MAX_ORDER + 1;
         } free;
 
     } u;
@@ -233,6 +241,13 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 #define is_xen_heap_page(page) ((page)->count_info & PGC_xen_heap)
 #define is_xen_heap_mfn(mfn) \
     (__mfn_valid(mfn) && is_xen_heap_page(__mfn_to_page(mfn)))
-- 
1.8.3.1


* [PATCH v7 3/9] mm: Extract allocation loop from alloc_heap_pages()
  2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
  2017-08-08 21:44 ` [PATCH v7 1/9] mm: Clean up free_heap_pages() Boris Ostrovsky
  2017-08-08 21:45 ` [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-08-08 21:45 ` Boris Ostrovsky
  2017-08-08 21:45 ` [PATCH v7 4/9] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:45 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, julien.grall, jbeulich, Boris Ostrovsky

This will make the code a bit more readable, especially with the changes that
will be introduced in subsequent patches.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/common/page_alloc.c | 139 +++++++++++++++++++++++++++---------------------
 1 file changed, 77 insertions(+), 62 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index b857a44..5920e87 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -697,22 +697,15 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
-/* Allocate 2^@order contiguous pages. */
-static struct page_info *alloc_heap_pages(
-    unsigned int zone_lo, unsigned int zone_hi,
-    unsigned int order, unsigned int memflags,
-    struct domain *d)
+static struct page_info *get_free_buddy(unsigned int zone_lo,
+                                        unsigned int zone_hi,
+                                        unsigned int order, unsigned int memflags,
+                                        const struct domain *d)
 {
-    unsigned int i, j, zone = 0, nodemask_retry = 0, first_dirty;
     nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
-    unsigned long request = 1UL << order;
+    nodemask_t nodemask = d ? d->node_affinity : node_online_map;
+    unsigned int j, zone, nodemask_retry = 0;
     struct page_info *pg;
-    nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-    bool_t need_tlbflush = 0;
-    uint32_t tlbflush_timestamp = 0;
-
-    /* Make sure there are enough bits in memflags for nodeID. */
-    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
 
     if ( node == NUMA_NO_NODE )
     {
@@ -728,34 +721,6 @@ static struct page_info *alloc_heap_pages(
     first_node = node;
 
     ASSERT(node < MAX_NUMNODES);
-    ASSERT(zone_lo <= zone_hi);
-    ASSERT(zone_hi < NR_ZONES);
-
-    if ( unlikely(order > MAX_ORDER) )
-        return NULL;
-
-    spin_lock(&heap_lock);
-
-    /*
-     * Claimed memory is considered unavailable unless the request
-     * is made by a domain with sufficient unclaimed pages.
-     */
-    if ( (outstanding_claims + request >
-          total_avail_pages + tmem_freeable_pages()) &&
-          ((memflags & MEMF_no_refcount) ||
-           !d || d->outstanding_pages < request) )
-        goto not_found;
-
-    /*
-     * TMEM: When available memory is scarce due to tmem absorbing it, allow
-     * only mid-size allocations to avoid worst of fragmentation issues.
-     * Others try tmem pools then fail.  This is a workaround until all
-     * post-dom0-creation-multi-page allocations can be eliminated.
-     */
-    if ( ((order == 0) || (order >= 9)) &&
-         (total_avail_pages <= midsize_alloc_zone_pages) &&
-         tmem_freeable_pages() )
-        goto try_tmem;
 
     /*
      * Start with requested node, but exhaust all node memory in requested 
@@ -767,17 +732,17 @@ static struct page_info *alloc_heap_pages(
         zone = zone_hi;
         do {
             /* Check if target node can support the allocation. */
-            if ( !avail[node] || (avail[node][zone] < request) )
+            if ( !avail[node] || (avail[node][zone] < (1UL << order)) )
                 continue;
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    goto found;
+                    return pg;
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
-            goto not_found;
+            return NULL;
 
         /* Pick next node. */
         if ( !node_isset(node, nodemask) )
@@ -794,46 +759,96 @@ static struct page_info *alloc_heap_pages(
         {
             /* When we have tried all in nodemask, we fall back to others. */
             if ( (memflags & MEMF_exact_node) || nodemask_retry++ )
-                goto not_found;
+                return NULL;
             nodes_andnot(nodemask, node_online_map, nodemask);
             first_node = node = first_node(nodemask);
             if ( node >= MAX_NUMNODES )
-                goto not_found;
+                return NULL;
         }
     }
+}
+
+/* Allocate 2^@order contiguous pages. */
+static struct page_info *alloc_heap_pages(
+    unsigned int zone_lo, unsigned int zone_hi,
+    unsigned int order, unsigned int memflags,
+    struct domain *d)
+{
+    nodeid_t node;
+    unsigned int i, buddy_order, zone, first_dirty;
+    unsigned long request = 1UL << order;
+    struct page_info *pg;
+    bool need_tlbflush = false;
+    uint32_t tlbflush_timestamp = 0;
+
+    /* Make sure there are enough bits in memflags for nodeID. */
+    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
+
+    ASSERT(zone_lo <= zone_hi);
+    ASSERT(zone_hi < NR_ZONES);
+
+    if ( unlikely(order > MAX_ORDER) )
+        return NULL;
+
+    spin_lock(&heap_lock);
+
+    /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+          ((memflags & MEMF_no_refcount) ||
+           !d || d->outstanding_pages < request) )
+    {
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
 
- try_tmem:
-    /* Try to free memory from tmem */
-    if ( (pg = tmem_relinquish_pages(order, memflags)) != NULL )
+    /*
+     * TMEM: When available memory is scarce due to tmem absorbing it, allow
+     * only mid-size allocations to avoid worst of fragmentation issues.
+     * Others try tmem pools then fail.  This is a workaround until all
+     * post-dom0-creation-multi-page allocations can be eliminated.
+     */
+    if ( ((order == 0) || (order >= 9)) &&
+         (total_avail_pages <= midsize_alloc_zone_pages) &&
+         tmem_freeable_pages() )
     {
-        /* reassigning an already allocated anonymous heap page */
+        /* Try to free memory from tmem. */
+        pg = tmem_relinquish_pages(order, memflags);
         spin_unlock(&heap_lock);
         return pg;
     }
 
- not_found:
-    /* No suitable memory blocks. Fail the request. */
-    spin_unlock(&heap_lock);
-    return NULL;
+    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+    if ( !pg )
+    {
+        /* No suitable memory blocks. Fail the request. */
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
 
- found: 
+    node = phys_to_nid(page_to_maddr(pg));
+    zone = page_to_zone(pg);
+    buddy_order = PFN_ORDER(pg);
 
     first_dirty = pg->u.free.first_dirty;
 
     /* We may have to halve the chunk a number of times. */
-    while ( j != order )
+    while ( buddy_order != order )
     {
-        j--;
-        page_list_add_scrub(pg, node, zone, j,
-                            (1U << j) > first_dirty ?
+        buddy_order--;
+        page_list_add_scrub(pg, node, zone, buddy_order,
+                            (1U << buddy_order) > first_dirty ?
                             first_dirty : INVALID_DIRTY_IDX);
-        pg += 1U << j;
+        pg += 1U << buddy_order;
 
         if ( first_dirty != INVALID_DIRTY_IDX )
         {
             /* Adjust first_dirty */
-            if ( first_dirty >= 1U << j )
-                first_dirty -= 1U << j;
+            if ( first_dirty >= 1U << buddy_order )
+                first_dirty -= 1U << buddy_order;
             else
                 first_dirty = 0; /* We've moved past original first_dirty */
         }
-- 
1.8.3.1


* [PATCH v7 4/9] mm: Scrub pages in alloc_heap_pages() if needed
  2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2017-08-08 21:45 ` [PATCH v7 3/9] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
@ 2017-08-08 21:45 ` Boris Ostrovsky
  2017-08-08 21:45 ` [PATCH v7 5/9] mm: Scrub memory from idle loop Boris Ostrovsky
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:45 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, julien.grall, jbeulich, Boris Ostrovsky

When allocating pages in alloc_heap_pages(), first look for clean pages. If
none are found, retry the search, taking pages marked as unscrubbed and
scrubbing them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet.
However, this will become possible once we stop scrubbing from
free_heap_pages() and instead do it from the idle loop.

Since not all allocations require clean pages (such as xenheap allocations),
introduce a MEMF_no_scrub flag that callers can set if they are willing to
consume unscrubbed pages.
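
As a toy illustration (standalone C; only the flag's bit position mirrors the
_MEMF_no_scrub definition below, everything else is made up) of who ends up
doing the scrubbing:

    #include <stdbool.h>
    #include <stdio.h>

    #define MEMF_no_scrub (1U << 8)    /* mirrors _MEMF_no_scrub below */

    static void hand_out_page(unsigned int memflags, bool page_dirty)
    {
        if ( page_dirty && !(memflags & MEMF_no_scrub) )
            puts("allocator scrubs the page before returning it");
        else
            puts("page returned as-is; caller will overwrite it anyway");
    }

    int main(void)
    {
        hand_out_page(0, true);              /* e.g. page going to a guest */
        hand_out_page(MEMF_no_scrub, true);  /* e.g. a xenheap allocation  */
        return 0;
    }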

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/common/page_alloc.c | 33 +++++++++++++++++++++++++++++----
 xen/include/xen/mm.h    |  4 +++-
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 5920e87..7fa8896 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -706,6 +706,7 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
     nodemask_t nodemask = d ? d->node_affinity : node_online_map;
     unsigned int j, zone, nodemask_retry = 0;
     struct page_info *pg;
+    bool use_unscrubbed = (memflags & MEMF_no_scrub);
 
     if ( node == NUMA_NO_NODE )
     {
@@ -737,8 +738,20 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
+            {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    return pg;
+                {
+                    /*
+                     * We grab single pages (order=0) even if they are
+                     * unscrubbed. Given that scrubbing one page is fairly quick
+                     * it is not worth breaking higher orders.
+                     */
+                    if ( (order == 0) || use_unscrubbed ||
+                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                        return pg;
+                    page_list_add_tail(pg, &heap(node, zone, j));
+                }
+            }
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
@@ -822,6 +835,10 @@ static struct page_info *alloc_heap_pages(
     }
 
     pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+    /* Try getting a dirty buddy if we couldn't get a clean one. */
+    if ( !pg && !(memflags & MEMF_no_scrub) )
+        pg = get_free_buddy(zone_lo, zone_hi, order,
+                            memflags | MEMF_no_scrub, d);
     if ( !pg )
     {
         /* No suitable memory blocks. Fail the request. */
@@ -867,7 +884,15 @@ static struct page_info *alloc_heap_pages(
     for ( i = 0; i < (1 << order); i++ )
     {
         /* Reference count must continuously be zero for free pages. */
-        BUG_ON(pg[i].count_info != PGC_state_free);
+        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
+
+        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+        {
+            if ( !(memflags & MEMF_no_scrub) )
+                scrub_one_page(&pg[i]);
+            node_need_scrub[node]--;
+        }
+
         pg[i].count_info = PGC_state_inuse;
 
         if ( !(memflags & MEMF_no_tlbflush) )
@@ -1743,7 +1768,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     ASSERT(!in_irq());
 
     pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
-                          order, memflags, NULL);
+                          order, memflags | MEMF_no_scrub, NULL);
     if ( unlikely(pg == NULL) )
         return NULL;
 
@@ -1793,7 +1818,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( !(memflags >> _MEMF_bits) )
         memflags |= MEMF_bits(xenheap_bits);
 
-    pg = alloc_domheap_pages(NULL, order, memflags);
+    pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
     if ( unlikely(pg == NULL) )
         return NULL;
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 503b92e..e1f9c42 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -248,7 +248,9 @@ struct npfec {
 #define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
 #define _MEMF_no_icache_flush 7
 #define  MEMF_no_icache_flush (1U<<_MEMF_no_icache_flush)
-#define _MEMF_node        8
+#define _MEMF_no_scrub    8
+#define  MEMF_no_scrub    (1U<<_MEMF_no_scrub)
+#define _MEMF_node        16
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
 #define  MEMF_get_node(f) ((((f) >> _MEMF_node) - 1) & MEMF_node_mask)
-- 
1.8.3.1


* [PATCH v7 5/9] mm: Scrub memory from idle loop
  2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2017-08-08 21:45 ` [PATCH v7 4/9] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
@ 2017-08-08 21:45 ` Boris Ostrovsky
  2017-08-08 21:45 ` [PATCH v7 6/9] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:45 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	Dario Faggioli, ian.jackson, tim, julien.grall, jbeulich,
	Boris Ostrovsky

Instead of scrubbing pages during guest destruction (from
free_heap_pages()), do this opportunistically, from the idle loop.

We might come to scrub_free_pages() from the idle loop while another CPU
uses the mapcache override, resulting in a fault while trying to do
__map_domain_page() in scrub_one_page(). To avoid this, make the mapcache
vcpu override a per-cpu variable.
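
The idle-loop ordering relied on below can be modelled with a tiny standalone
sketch (not the real idle loop; event_pending stands in for
softirq_pending(cpu)):

    #include <stdbool.h>
    #include <stdio.h>

    static bool event_pending;            /* stands in for softirq_pending(cpu) */

    static bool scrub_some_pages(void)    /* stands in for scrub_free_pages()   */
    {
        event_pending = true;             /* pretend a softirq arrived meanwhile */
        return false;                     /* false: no scrub work left anywhere  */
    }

    int main(void)
    {
        /* Only go idle if nothing was pending before scrubbing and nothing
         * became pending while scrubbing. */
        if ( !event_pending && !scrub_some_pages() && !event_pending )
            puts("halting the CPU");
        else
            puts("processing pending work instead of halting");
        return 0;
    }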

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
---
CC: Dario Faggioli <dario.faggioli@citrix.com>
---
 xen/arch/arm/domain.c      |   8 ++-
 xen/arch/x86/domain.c      |   8 ++-
 xen/arch/x86/domain_page.c |   6 +--
 xen/common/page_alloc.c    | 119 ++++++++++++++++++++++++++++++++++++++++-----
 xen/include/xen/mm.h       |   1 +
 5 files changed, 124 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2dc8b0a..d7961bb 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -51,7 +51,13 @@ void idle_loop(void)
         /* Are we here for running vcpu context tasklets, or for idling? */
         if ( unlikely(tasklet_work_to_do(cpu)) )
             do_tasklet();
-        else
+        /*
+         * Test softirqs twice --- first to see if should even try scrubbing
+         * and then, after it is done, whether softirqs became pending
+         * while we were scrubbing.
+         */
+        else if ( !softirq_pending(cpu) && !scrub_free_pages() &&
+                    !softirq_pending(cpu) )
         {
             local_irq_disable();
             if ( cpu_is_haltable(cpu) )
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index baaf815..9b4b959 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -122,7 +122,13 @@ static void idle_loop(void)
         /* Are we here for running vcpu context tasklets, or for idling? */
         if ( unlikely(tasklet_work_to_do(cpu)) )
             do_tasklet();
-        else
+        /*
+         * Test softirqs twice --- first to see if should even try scrubbing
+         * and then, after it is done, whether softirqs became pending
+         * while we were scrubbing.
+         */
+        else if ( !softirq_pending(cpu) && !scrub_free_pages()  &&
+                    !softirq_pending(cpu) )
             pm_idle();
         do_softirq();
         /*
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 71baede..0783c1e 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -18,12 +18,12 @@
 #include <asm/hardirq.h>
 #include <asm/setup.h>
 
-static struct vcpu *__read_mostly override;
+static DEFINE_PER_CPU(struct vcpu *, override);
 
 static inline struct vcpu *mapcache_current_vcpu(void)
 {
     /* In the common case we use the mapcache of the running VCPU. */
-    struct vcpu *v = override ?: current;
+    struct vcpu *v = this_cpu(override) ?: current;
 
     /*
      * When current isn't properly set up yet, this is equivalent to
@@ -59,7 +59,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
 
 void __init mapcache_override_current(struct vcpu *v)
 {
-    override = v;
+    this_cpu(override) = v;
 }
 
 #define mapcache_l2_entry(e) ((e) >> PAGETABLE_ORDER)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 7fa8896..b886983 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1013,15 +1013,86 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
-static void scrub_free_pages(unsigned int node)
+static nodemask_t node_scrubbing;
+
+/*
+ * If get_node is true this will return closest node that needs to be scrubbed,
+ * with appropriate bit in node_scrubbing set.
+ * If get_node is not set, this will return *a* node that needs to be scrubbed.
+ * node_scrubbing bitmask will no be updated.
+ * If no node needs scrubbing then NUMA_NO_NODE is returned.
+ */
+static unsigned int node_to_scrub(bool get_node)
 {
-    struct page_info *pg;
-    unsigned int zone;
+    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
+    nodeid_t closest = NUMA_NO_NODE;
+    u8 dist, shortest = 0xff;
 
-    ASSERT(spin_is_locked(&heap_lock));
+    if ( node == NUMA_NO_NODE )
+        node = 0;
 
-    if ( !node_need_scrub[node] )
-        return;
+    if ( node_need_scrub[node] &&
+         (!get_node || !node_test_and_set(node, node_scrubbing)) )
+        return node;
+
+    /*
+     * See if there are memory-only nodes that need scrubbing and choose
+     * the closest one.
+     */
+    local_node = node;
+    for ( ; ; )
+    {
+        do {
+            node = cycle_node(node, node_online_map);
+        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
+                  (node != local_node) );
+
+        if ( node == local_node )
+            break;
+
+        if ( node_need_scrub[node] )
+        {
+            if ( !get_node )
+                return node;
+
+            dist = __node_distance(local_node, node);
+
+            /*
+             * Grab the node right away. If we find a closer node later we will
+             * release this one. While there is a chance that another CPU will
+             * not be able to scrub that node when it is searching for scrub work
+             * at the same time it will be able to do so next time it wakes up.
+             * The alternative would be to perform this search under a lock but
+             * then we'd need to take this lock every time we come in here.
+             */
+            if ( (dist < shortest || closest == NUMA_NO_NODE) &&
+                 !node_test_and_set(node, node_scrubbing) )
+            {
+                if ( closest != NUMA_NO_NODE )
+                    node_clear(closest, node_scrubbing);
+                shortest = dist;
+                closest = node;
+            }
+        }
+    }
+
+    return closest;
+}
+
+bool scrub_free_pages(void)
+{
+    struct page_info *pg;
+    unsigned int zone;
+    unsigned int cpu = smp_processor_id();
+    bool preempt = false;
+    nodeid_t node;
+    unsigned int cnt = 0;
+  
+    node = node_to_scrub(true);
+    if ( node == NUMA_NO_NODE )
+        return false;
+ 
+    spin_lock(&heap_lock);
 
     for ( zone = 0; zone < NR_ZONES; zone++ )
     {
@@ -1044,17 +1115,42 @@ static void scrub_free_pages(unsigned int node)
                         scrub_one_page(&pg[i]);
                         pg[i].count_info &= ~PGC_need_scrub;
                         node_need_scrub[node]--;
+                        cnt += 100; /* scrubbed pages add heavier weight. */
+                    }
+                    else
+                        cnt++;
+
+                    /*
+                     * Scrub a few (8) pages before becoming eligible for
+                     * preemption. But also count non-scrubbing loop iterations
+                     * so that we don't get stuck here with an almost clean
+                     * heap.
+                     */
+                    if ( cnt > 800 && softirq_pending(cpu) )
+                    {
+                        preempt = true;
+                        break;
                     }
                 }
 
-                page_list_del(pg, &heap(node, zone, order));
-                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+                if ( i >= (1U << order) - 1 )
+                {
+                    page_list_del(pg, &heap(node, zone, order));
+                    page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+                }
+                else
+                    pg->u.free.first_dirty = i + 1;
 
-                if ( node_need_scrub[node] == 0 )
-                    return;
+                if ( preempt || (node_need_scrub[node] == 0) )
+                    goto out;
             }
         } while ( order-- != 0 );
     }
+
+ out:
+    spin_unlock(&heap_lock);
+    node_clear(node, node_scrubbing);
+    return node_to_scrub(false) != NUMA_NO_NODE;
 }
 
 /* Free 2^@order set of pages. */
@@ -1166,9 +1262,6 @@ static void free_heap_pages(
     if ( tainted )
         reserve_offlined_page(pg);
 
-    if ( need_scrub )
-        scrub_free_pages(node);
-
     spin_unlock(&heap_lock);
 }
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index e1f9c42..ddc3fb3 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -160,6 +160,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe);
 void xenheap_max_mfn(unsigned long mfn);
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags);
 void free_xenheap_pages(void *v, unsigned int order);
+bool scrub_free_pages(void);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 /* Map machine page range in Xen virtual address space. */
-- 
1.8.3.1


* [PATCH v7 6/9] spinlock: Introduce spin_lock_cb()
  2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2017-08-08 21:45 ` [PATCH v7 5/9] mm: Scrub memory from idle loop Boris Ostrovsky
@ 2017-08-08 21:45 ` Boris Ostrovsky
  2017-08-14 11:22   ` Julien Grall
  2017-08-08 21:45 ` [PATCH v7 7/9] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:45 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, julien.grall, jbeulich, Boris Ostrovsky

While waiting for a lock we may want to periodically run some code. This
code may, for example, allow the caller to release resources it holds that
are no longer needed in the critical section protected by the lock.

Specifically, this feature will be needed by the scrubbing code, where the
scrubber, while waiting for the heap lock in order to merge back clean
pages, may be asked by the page allocator (which currently holds the lock)
to abort merging and release the buddy page head that the allocator wants.

We could use spin_trylock(), but since it doesn't take a lock ticket it may
take a long time until the lock is acquired. Instead we add spin_lock_cb(),
which allows us to grab the ticket and execute a callback while waiting.
The callback is executed on every iteration of the spinlock waiting loop.

Since we may be sleeping on the lock until it is released, we need a
mechanism to make sure that the callback has a chance to run. We add
spin_lock_kick(), which will wake up the waiter.
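
A standalone toy (pthreads and C11 atomics, not the Xen spinlock code) of the
pattern these primitives enable; the "resource" is whatever the waiter should
hand back to the current lock holder:

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int lock_held = 1;     /* someone currently holds the lock */
    static atomic_int abort_requested;   /* set by the holder ("kick")       */
    static atomic_int resource_released; /* set by the waiter's callback     */

    /* Callback run on every spin iteration: release our resource if asked. */
    static void cb(void *data)
    {
        (void)data;
        if ( atomic_load(&abort_requested) )
            atomic_store(&resource_released, 1);   /* e.g. give a buddy back */
    }

    /* Waiter: morally spin_lock_cb(&lock, cb, NULL). */
    static void *waiter(void *arg)
    {
        while ( atomic_load(&lock_held) )
            cb(NULL);
        puts("waiter: lock acquired");
        return arg;
    }

    int main(void)
    {
        pthread_t t;

        pthread_create(&t, NULL, waiter, NULL);
        /* Lock holder: request the resource (morally spin_lock_kick()) ... */
        atomic_store(&abort_requested, 1);
        while ( !atomic_load(&resource_released) )
            ;                                /* ... wait for the callback    */
        atomic_store(&lock_held, 0);         /* ... then drop the lock       */
        pthread_join(t, NULL);
        return 0;
    }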

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
 xen/common/spinlock.c      | 9 ++++++++-
 xen/include/xen/spinlock.h | 8 ++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 2a06406..3c1caae 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -129,7 +129,7 @@ static always_inline u16 observe_head(spinlock_tickets_t *t)
     return read_atomic(&t->head);
 }
 
-void _spin_lock(spinlock_t *lock)
+void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
 {
     spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
     LOCK_PROFILE_VAR;
@@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock)
     while ( tickets.tail != observe_head(&lock->tickets) )
     {
         LOCK_PROFILE_BLOCK;
+        if ( unlikely(cb) )
+            cb(data);
         arch_lock_relax();
     }
     LOCK_PROFILE_GOT;
@@ -147,6 +149,11 @@ void _spin_lock(spinlock_t *lock)
     arch_lock_acquire_barrier();
 }
 
+void _spin_lock(spinlock_t *lock)
+{
+     _spin_lock_cb(lock, NULL, NULL);
+}
+
 void _spin_lock_irq(spinlock_t *lock)
 {
     ASSERT(local_irq_is_enabled());
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index c1883bd..91bfb95 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -153,6 +153,7 @@ typedef struct spinlock {
 #define spin_lock_init(l) (*(l) = (spinlock_t)SPIN_LOCK_UNLOCKED)
 
 void _spin_lock(spinlock_t *lock);
+void _spin_lock_cb(spinlock_t *lock, void (*cond)(void *), void *data);
 void _spin_lock_irq(spinlock_t *lock);
 unsigned long _spin_lock_irqsave(spinlock_t *lock);
 
@@ -169,6 +170,7 @@ void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
 #define spin_lock(l)                  _spin_lock(l)
+#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
 #define spin_lock_irq(l)              _spin_lock_irq(l)
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
@@ -190,6 +192,12 @@ void _spin_unlock_recursive(spinlock_t *lock);
     1 : ({ local_irq_restore(flags); 0; });     \
 })
 
+#define spin_lock_kick(l)                       \
+({                                              \
+    smp_mb();                                   \
+    arch_lock_signal();                         \
+})
+
 /* Ensure a lock is quiescent between two critical operations. */
 #define spin_barrier(l)               _spin_barrier(l)
 
-- 
1.8.3.1


* [PATCH v7 7/9] mm: Keep heap accessible to others while scrubbing
  2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2017-08-08 21:45 ` [PATCH v7 6/9] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
@ 2017-08-08 21:45 ` Boris Ostrovsky
  2017-08-14 10:38   ` Jan Beulich
  2017-08-08 21:45 ` [PATCH v7 8/9] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
  2017-08-08 21:45 ` [PATCH v7 9/9] mm: Make sure pages are scrubbed Boris Ostrovsky
  8 siblings, 1 reply; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:45 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, julien.grall, jbeulich, Boris Ostrovsky

Instead of scrubbing pages while holding the heap lock, we can mark the
buddy's head as being scrubbed and drop the lock temporarily. If someone
(most likely alloc_heap_pages()) tries to access this chunk, it signals the
scrubber to abort the scrub by setting the head's BUDDY_SCRUB_ABORT state.
The scrubber checks this state after processing each page and stops its work
as soon as it sees it.
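
To make the handshake concrete, here is a standalone toy (pthreads and C11
atomics; only the state names echo the BUDDY_* values introduced below,
everything else is made up):

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    enum { NOT_SCRUBBING, SCRUBBING, SCRUB_ABORT };

    static atomic_int scrub_state = SCRUBBING;  /* buddy is being scrubbed */

    /* Scrubber: works on the buddy outside the heap lock, checking for an
     * abort request after every "page". */
    static void *scrubber(void *arg)
    {
        for ( int page = 0; page < 1 << 20; page++ )
        {
            /* ("scrubbing" of the page elided) */
            if ( atomic_load(&scrub_state) == SCRUB_ABORT )
            {
                printf("scrubber: abort requested at page %d\n", page);
                break;
            }
        }
        atomic_store(&scrub_state, NOT_SCRUBBING);  /* release the buddy */
        return arg;
    }

    int main(void)
    {
        pthread_t t;
        int expected = SCRUBBING;

        pthread_create(&t, NULL, scrubber, NULL);

        /* Allocator: if the buddy is being scrubbed, request an abort and
         * wait until the scrubber acknowledges by leaving the ABORT state. */
        if ( atomic_compare_exchange_strong(&scrub_state, &expected, SCRUB_ABORT) )
            while ( atomic_load(&scrub_state) != NOT_SCRUBBING )
                ;
        puts("allocator: buddy handed over");

        pthread_join(t, NULL);
        return 0;
    }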

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v7:
* Replaced page_info with typeof(head->u.free) in check_and_stop_scrub()
* Replaced 1UL with 1U in scrub_free_pages()
* Fixed formatting in asm-*/mm.h

 xen/common/page_alloc.c  | 110 +++++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/mm.h |  28 +++++++-----
 xen/include/asm-x86/mm.h |  30 ++++++++-----
 3 files changed, 143 insertions(+), 25 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index b886983..726f857 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -687,6 +687,7 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
 {
     PFN_ORDER(pg) = order;
     pg->u.free.first_dirty = first_dirty;
+    pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
 
     if ( first_dirty != INVALID_DIRTY_IDX )
     {
@@ -697,6 +698,25 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+static void check_and_stop_scrub(struct page_info *head)
+{
+    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
+    {
+        typeof(head->u.free) pgfree;
+
+        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
+        spin_lock_kick();
+        for ( ; ; )
+        {
+            /* Can't ACCESS_ONCE() a bitfield. */
+            pgfree.val = ACCESS_ONCE(head->u.free.val);
+            if ( pgfree.scrub_state != BUDDY_SCRUB_ABORT )
+                break;
+            cpu_relax();
+        }
+    }
+}
+
 static struct page_info *get_free_buddy(unsigned int zone_lo,
                                         unsigned int zone_hi,
                                         unsigned int order, unsigned int memflags,
@@ -741,14 +761,19 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
             {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
                 {
+                    if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                        return pg;
                     /*
                      * We grab single pages (order=0) even if they are
                      * unscrubbed. Given that scrubbing one page is fairly quick
                      * it is not worth breaking higher orders.
                      */
-                    if ( (order == 0) || use_unscrubbed ||
-                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                    if ( (order == 0) || use_unscrubbed )
+                    {
+                        check_and_stop_scrub(pg);
                         return pg;
+                    }
+
                     page_list_add_tail(pg, &heap(node, zone, j));
                 }
             }
@@ -929,6 +954,7 @@ static int reserve_offlined_page(struct page_info *head)
 
     cur_head = head;
 
+    check_and_stop_scrub(head);
     /*
      * We may break the buddy so let's mark the head as clean. Then, when
      * merging chunks back into the heap, we will see whether the chunk has
@@ -1079,6 +1105,29 @@ static unsigned int node_to_scrub(bool get_node)
     return closest;
 }
 
+struct scrub_wait_state {
+    struct page_info *pg;
+    unsigned int first_dirty;
+    bool drop;
+};
+
+static void scrub_continue(void *data)
+{
+    struct scrub_wait_state *st = data;
+
+    if ( st->drop )
+        return;
+
+    if ( st->pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
+    {
+        /* There is a waiter for this buddy. Release it. */
+        st->drop = true;
+        st->pg->u.free.first_dirty = st->first_dirty;
+        smp_wmb();
+        st->pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+    }
+}
+
 bool scrub_free_pages(void)
 {
     struct page_info *pg;
@@ -1101,25 +1150,53 @@ bool scrub_free_pages(void)
         do {
             while ( !page_list_empty(&heap(node, zone, order)) )
             {
-                unsigned int i;
+                unsigned int i, dirty_cnt;
+                struct scrub_wait_state st;
 
                 /* Unscrubbed pages are always at the end of the list. */
                 pg = page_list_last(&heap(node, zone, order));
                 if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
                     break;
 
+                ASSERT(pg->u.free.scrub_state == BUDDY_NOT_SCRUBBING);
+                pg->u.free.scrub_state = BUDDY_SCRUBBING;
+
+                spin_unlock(&heap_lock);
+
+                dirty_cnt = 0;
+
                 for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
                 {
                     if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                     {
                         scrub_one_page(&pg[i]);
+                        /*
+                         * We can modify count_info without holding heap
+                         * lock since we effectively locked this buddy by
+                         * setting its scrub_state.
+                         */
                         pg[i].count_info &= ~PGC_need_scrub;
-                        node_need_scrub[node]--;
+                        dirty_cnt++;
                         cnt += 100; /* scrubbed pages add heavier weight. */
                     }
                     else
                         cnt++;
 
+                    if ( pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
+                    {
+                        /* Someone wants this chunk. Drop everything. */
+
+                        pg->u.free.first_dirty = (i == (1U << order) - 1) ?
+                            INVALID_DIRTY_IDX : i + 1; 
+                        smp_wmb();
+                        pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+
+                        spin_lock(&heap_lock);
+                        node_need_scrub[node] -= dirty_cnt;
+                        spin_unlock(&heap_lock);
+                        goto out_nolock;
+                    }
+
                     /*
                      * Scrub a few (8) pages before becoming eligible for
                      * preemption. But also count non-scrubbing loop iterations
@@ -1133,6 +1210,23 @@ bool scrub_free_pages(void)
                     }
                 }
 
+                st.pg = pg;
+                /*
+                 * get_free_buddy() grabs a buddy with first_dirty set to
+                 * INVALID_DIRTY_IDX so we can't set pg's first_dirty here.
+                 * It will be set either below or in the lock callback (in
+                 * scrub_continue()).
+                 */
+                st.first_dirty = (i >= (1U << order) - 1) ?
+                    INVALID_DIRTY_IDX : i + 1;
+                st.drop = false;
+                spin_lock_cb(&heap_lock, scrub_continue, &st);
+
+                node_need_scrub[node] -= dirty_cnt;
+
+                if ( st.drop )
+                    goto out;
+
                 if ( i >= (1U << order) - 1 )
                 {
                     page_list_del(pg, &heap(node, zone, order));
@@ -1141,6 +1235,8 @@ bool scrub_free_pages(void)
                 else
                     pg->u.free.first_dirty = i + 1;
 
+                pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+
                 if ( preempt || (node_need_scrub[node] == 0) )
                     goto out;
             }
@@ -1149,6 +1245,8 @@ bool scrub_free_pages(void)
 
  out:
     spin_unlock(&heap_lock);
+
+ out_nolock:
     node_clear(node, node_scrubbing);
     return node_to_scrub(false) != NUMA_NO_NODE;
 }
@@ -1230,6 +1328,8 @@ static void free_heap_pages(
                  (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
 
+            check_and_stop_scrub(predecessor);
+
             page_list_del(predecessor, &heap(node, zone, order));
 
             /* Keep predecessor's first_dirty if it is already set. */
@@ -1251,6 +1351,8 @@ static void free_heap_pages(
                  (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
 
+            check_and_stop_scrub(successor);
+
             page_list_del(successor, &heap(node, zone, order));
         }
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 572337c..d42b070 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -42,18 +42,26 @@ struct page_info
             unsigned long type_info;
         } inuse;
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
-        struct {
-            /* Do TLBs need flushing for safety before next page use? */
-            bool need_tlbflush:1;
-
-            /*
-             * Index of the first *possibly* unscrubbed page in the buddy.
-             * One more bit than maximum possible order to accommodate
-             * INVALID_DIRTY_IDX.
-             */
+        union {
+            struct {
+                /* Do TLBs need flushing for safety before next page use? */
+                bool need_tlbflush:1;
+
+                /*
+                 * Index of the first *possibly* unscrubbed page in the buddy.
+                 * One more bit than maximum possible order to accommodate
+                 * INVALID_DIRTY_IDX.
+                 */
 #define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
-            unsigned long first_dirty:MAX_ORDER + 1;
+                unsigned long first_dirty:MAX_ORDER + 1;
+
+#define BUDDY_NOT_SCRUBBING    0
+#define BUDDY_SCRUBBING        1
+#define BUDDY_SCRUB_ABORT      2
+                unsigned long scrub_state:2;
+            };
 
+            unsigned long val;
         } free;
 
     } u;
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 07dc963..c9cf6c3 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -86,19 +86,27 @@ struct page_info
         } sh;
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
-        struct {
-            /* Do TLBs need flushing for safety before next page use? */
-            bool need_tlbflush:1;
-
-            /*
-             * Index of the first *possibly* unscrubbed page in the buddy.
-             * One more bit than maximum possible order to accommodate
-             * INVALID_DIRTY_IDX.
-             */
+        union {
+            struct {
+                /* Do TLBs need flushing for safety before next page use? */
+                bool need_tlbflush:1;
+
+                /*
+                 * Index of the first *possibly* unscrubbed page in the buddy.
+                 * One more bit than maximum possible order to accommodate
+                 * INVALID_DIRTY_IDX.
+                 */
 #define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
-            unsigned long first_dirty:MAX_ORDER + 1;
-        } free;
+                unsigned long first_dirty:MAX_ORDER + 1;
 
+#define BUDDY_NOT_SCRUBBING    0
+#define BUDDY_SCRUBBING        1
+#define BUDDY_SCRUB_ABORT      2
+                unsigned long scrub_state:2;
+            };
+
+            unsigned long val;
+        } free;
     } u;
 
     union {
-- 
1.8.3.1
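
The allocator/scrubber handshake used above condenses to the sketch
below. This is illustrative only: list handling, the node_need_scrub
accounting, the PGC_need_scrub test and the preemption logic are
omitted, next_dirty_index() stands in for the index arithmetic, and
scrub_state_of() stands in for the ACCESS_ONCE() read through
u.free.val done in check_and_stop_scrub().

/* Scrubber side (idle loop); heap_lock is dropped while scrubbing. */
    pg->u.free.scrub_state = BUDDY_SCRUBBING;   /* set under heap_lock */
    spin_unlock(&heap_lock);
    for ( i = pg->u.free.first_dirty; i < (1U << order); i++ )
    {
        scrub_one_page(&pg[i]);
        if ( pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
        {
            /* The allocator wants this buddy: publish progress, then ack. */
            pg->u.free.first_dirty = next_dirty_index(i, order);
            smp_wmb();
            pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
            goto out_nolock;
        }
    }

/* Allocator side (check_and_stop_scrub()); heap_lock is held. */
    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
    {
        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
        spin_lock_kick();          /* wake a scrubber waiting on heap_lock */
        while ( scrub_state_of(head) == BUDDY_SCRUB_ABORT )
            cpu_relax();           /* wait for the scrubber's ack above */
    }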



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v7 8/9] mm: Print number of unscrubbed pages in 'H' debug handler
  2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2017-08-08 21:45 ` [PATCH v7 7/9] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
@ 2017-08-08 21:45 ` Boris Ostrovsky
  2017-08-08 21:45 ` [PATCH v7 9/9] mm: Make sure pages are scrubbed Boris Ostrovsky
  8 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:45 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, julien.grall, jbeulich, Boris Ostrovsky

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 726f857..675ac0d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2314,6 +2314,13 @@ static void dump_heap(unsigned char key)
             printk("heap[node=%d][zone=%d] -> %lu pages\n",
                    i, j, avail[i][j]);
     }
+
+    for ( i = 0; i < MAX_NUMNODES; i++ )
+    {
+        if ( !node_need_scrub[i] )
+            continue;
+        printk("Node %d has %lu unscrubbed pages\n", i, node_need_scrub[i]);
+    }
 }
 
 static __init int register_heap_trigger(void)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v7 9/9] mm: Make sure pages are scrubbed
  2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2017-08-08 21:45 ` [PATCH v7 8/9] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
@ 2017-08-08 21:45 ` Boris Ostrovsky
  8 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-08 21:45 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, julien.grall, jbeulich, Boris Ostrovsky

Add a debug Kconfig option that will make the page allocator verify
that pages that were supposed to be scrubbed are, in fact, clean.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/Kconfig.debug       |  7 ++++++
 xen/common/page_alloc.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index 689f297..195d504 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
 	  logged in the Xen ring buffer.
 	  If unsure, say N here.
 
+config SCRUB_DEBUG
+	bool "Page scrubbing test"
+	default DEBUG
+	---help---
+	  Verify that pages that need to be scrubbed before being allocated to
+	  a guest are indeed scrubbed.
+
 endif # DEBUG || EXPERT
 
 endmenu
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 675ac0d..83eaffb 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -170,6 +170,10 @@ boolean_param("bootscrub", opt_bootscrub);
 static unsigned long __initdata opt_bootscrub_chunk = MB(128);
 size_param("bootscrub_chunk", opt_bootscrub_chunk);
 
+#ifdef CONFIG_SCRUB_DEBUG
+static bool __read_mostly boot_scrub_done;
+#endif
+
 /*
  * Bit width of the DMA heap -- used to override NUMA-node-first.
  * allocation strategy, which can otherwise exhaust low memory.
@@ -698,6 +702,43 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+/* SCRUB_PATTERN needs to be a repeating series of bytes. */
+#ifndef NDEBUG
+#define SCRUB_PATTERN        0xc2c2c2c2c2c2c2c2ULL
+#else
+#define SCRUB_PATTERN        0ULL
+#endif
+#define SCRUB_BYTE_PATTERN   (SCRUB_PATTERN & 0xff)
+
+static void poison_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    uint64_t *ptr;
+
+    ptr = map_domain_page(mfn);
+    *ptr = ~SCRUB_PATTERN;
+    unmap_domain_page(ptr);
+#endif
+}
+
+static void check_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    const uint64_t *ptr;
+    unsigned int i;
+
+    if ( !boot_scrub_done )
+        return;
+
+    ptr = map_domain_page(mfn);
+    for ( i = 0; i < PAGE_SIZE / sizeof (*ptr); i++ )
+        ASSERT(ptr[i] == SCRUB_PATTERN);
+    unmap_domain_page(ptr);
+#endif
+}
+
 static void check_and_stop_scrub(struct page_info *head)
 {
     if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
@@ -932,6 +973,9 @@ static struct page_info *alloc_heap_pages(
          * guest can control its own visibility of/through the cache.
          */
         flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush));
+
+        if ( !(memflags & MEMF_no_scrub) )
+            check_one_page(&pg[i]);
     }
 
     spin_unlock(&heap_lock);
@@ -1295,7 +1339,10 @@ static void free_heap_pages(
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
 
         if ( need_scrub )
+        {
             pg[i].count_info |= PGC_need_scrub;
+            poison_one_page(&pg[i]);
+        }
     }
 
     avail[node][zone] += 1 << order;
@@ -1655,7 +1702,12 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
+#ifndef CONFIG_SCRUB_DEBUG
         free_heap_pages(pg + i, 0, false);
+#else
+        free_heap_pages(pg + i, 0, boot_scrub_done);
+#endif
+	
     }
 }
 
@@ -1921,6 +1973,10 @@ void __init scrub_heap_pages(void)
 
     printk("done.\n");
 
+#ifdef CONFIG_SCRUB_DEBUG
+    boot_scrub_done = true;
+#endif
+
     /* Now that the heap is initialized, run checks and set bounds
      * for the low mem virq algorithm. */
     setup_low_mem_virq();
@@ -2194,12 +2250,16 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
 
             spin_unlock_recursive(&d->page_alloc_lock);
 
+#ifndef CONFIG_SCRUB_DEBUG
             /*
              * Normally we expect a domain to clear pages before freeing them,
              * if it cares about the secrecy of their contents. However, after
              * a domain has died we assume responsibility for erasure.
              */
             scrub = !!d->is_dying;
+#else
+            scrub = true;
+#endif
         }
         else
         {
@@ -2291,7 +2351,8 @@ void scrub_one_page(struct page_info *pg)
 
 #ifndef NDEBUG
     /* Avoid callers relying on allocations returning zeroed pages. */
-    unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE));
+    unmap_domain_page(memset(__map_domain_page(pg),
+                             SCRUB_BYTE_PATTERN, PAGE_SIZE));
 #else
     /* For a production build, clear_page() is the fastest way to scrub. */
     clear_domain_page(_mfn(page_to_mfn(pg)));
-- 
1.8.3.1
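
The new option lives in xen/Kconfig.debug, is only visible when DEBUG
or EXPERT is enabled, and defaults to the value of DEBUG. Enabling the
check explicitly therefore amounts to a .config fragment along these
lines (illustrative):

CONFIG_DEBUG=y
CONFIG_SCRUB_DEBUG=y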



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 1/9] mm: Clean up free_heap_pages()
  2017-08-08 21:44 ` [PATCH v7 1/9] mm: Clean up free_heap_pages() Boris Ostrovsky
@ 2017-08-10 12:21   ` Wei Liu
  2017-08-14 10:30   ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Wei Liu @ 2017-08-10 12:21 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, julien.grall, jbeulich

On Tue, Aug 08, 2017 at 05:44:59PM -0400, Boris Ostrovsky wrote:
> Make buddy merging part of free_heap_pages() a bit more readable.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 1/9] mm: Clean up free_heap_pages()
  2017-08-08 21:44 ` [PATCH v7 1/9] mm: Clean up free_heap_pages() Boris Ostrovsky
  2017-08-10 12:21   ` Wei Liu
@ 2017-08-14 10:30   ` Jan Beulich
  1 sibling, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2017-08-14 10:30 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, julien.grall

>>> On 08.08.17 at 23:44, <boris.ostrovsky@oracle.com> wrote:
> Make buddy merging part of free_heap_pages() a bit more readable.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-08 21:45 ` [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-08-14 10:37   ` Jan Beulich
  2017-08-14 14:29     ` Boris Ostrovsky
  2017-08-14 11:16   ` Julien Grall
  1 sibling, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2017-08-14 10:37 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, julien.grall

>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
> .. so that it's easy to find pages that need to be scrubbed (those pages are
> now marked with _PGC_need_scrub bit).
> 
> We keep track of the first unscrubbed page in a page buddy using first_dirty
> field. For now it can have two values, 0 (whole buddy needs scrubbing) or
> INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent 
> patches
> will allow scrubbing to be interrupted, resulting in first_dirty taking any
> value.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one remark:

> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -88,7 +88,15 @@ struct page_info
>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>          struct {
>              /* Do TLBs need flushing for safety before next page use? */
> -            bool_t need_tlbflush;
> +            bool need_tlbflush:1;
> +
> +            /*
> +             * Index of the first *possibly* unscrubbed page in the buddy.
> +             * One more bit than maximum possible order to accommodate
> +             * INVALID_DIRTY_IDX.
> +             */
> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
> +            unsigned long first_dirty:MAX_ORDER + 1;
>          } free;

I think generated code will be better with the two fields swapped:
That way reading first_dirty won't involve a shift, and accessing a
single bit doesn't require shifts at all on many architectures.

Jan



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 7/9] mm: Keep heap accessible to others while scrubbing
  2017-08-08 21:45 ` [PATCH v7 7/9] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
@ 2017-08-14 10:38   ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2017-08-14 10:38 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, julien.grall

>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
> Instead of scrubbing pages while holding heap lock we can mark
> buddy's head as being scrubbed and drop the lock temporarily.
> If someone (most likely alloc_heap_pages()) tries to access
> this chunk it will signal the scrubber to abort scrub by setting
> head's BUDDY_SCRUB_ABORT bit. The scrubber checks this bit after
> processing each page and stops its work as soon as it sees it.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-08 21:45 ` [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
  2017-08-14 10:37   ` Jan Beulich
@ 2017-08-14 11:16   ` Julien Grall
  1 sibling, 0 replies; 26+ messages in thread
From: Julien Grall @ 2017-08-14 11:16 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

Hi Boris,

On 08/08/17 22:45, Boris Ostrovsky wrote:
> .. so that it's easy to find pages that need to be scrubbed (those pages are
> now marked with _PGC_need_scrub bit).
>
> We keep track of the first unscrubbed page in a page buddy using first_dirty
> field. For now it can have two values, 0 (whole buddy needs scrubbing) or
> INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches
> will allow scrubbing to be interrupted, resulting in first_dirty taking any
> value.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

For the ARM bits and with Jan suggestion:

Acked-by: Julien Grall <julien.grall@arm.com>

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 6/9] spinlock: Introduce spin_lock_cb()
  2017-08-08 21:45 ` [PATCH v7 6/9] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
@ 2017-08-14 11:22   ` Julien Grall
  2017-08-14 14:39     ` Boris Ostrovsky
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2017-08-14 11:22 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

Hi Boris,

On 08/08/17 22:45, Boris Ostrovsky wrote:
> While waiting for a lock we may want to periodically run some
> code. This code may, for example, allow the caller to release
> resources held by it that are no longer needed in the critical
> section protected by the lock.
>
> Specifically, this feature will be needed by scrubbing code where
> the scrubber, while waiting for heap lock to merge back clean
> pages, may be requested by page allocator (which is currently
> holding the lock) to abort merging and release the buddy page head
> that the allocator wants.
>
> We could use spin_trylock() but since it doesn't take a lock ticket
> it may take a long time until the lock is taken. Instead we add
> spin_lock_cb() that allows us to grab the ticket and execute a
> callback while waiting. This callback is executed on every iteration
> of the spinlock waiting loop.
>
> Since we may be sleeping in the lock until it is released we need a
> mechanism that will make sure that the callback has a chance to run.
> We add spin_lock_kick() that will wake up the waiter.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Acked-by: Jan Beulich <jbeulich@suse.com>
> ---
>  xen/common/spinlock.c      | 9 ++++++++-
>  xen/include/xen/spinlock.h | 8 ++++++++
>  2 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
> index 2a06406..3c1caae 100644
> --- a/xen/common/spinlock.c
> +++ b/xen/common/spinlock.c
> @@ -129,7 +129,7 @@ static always_inline u16 observe_head(spinlock_tickets_t *t)
>      return read_atomic(&t->head);
>  }
>
> -void _spin_lock(spinlock_t *lock)
> +void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
>  {
>      spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
>      LOCK_PROFILE_VAR;
> @@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock)
>      while ( tickets.tail != observe_head(&lock->tickets) )
>      {
>          LOCK_PROFILE_BLOCK;
> +        if ( unlikely(cb) )
> +            cb(data);
>          arch_lock_relax();
>      }
>      LOCK_PROFILE_GOT;
> @@ -147,6 +149,11 @@ void _spin_lock(spinlock_t *lock)
>      arch_lock_acquire_barrier();
>  }
>
> +void _spin_lock(spinlock_t *lock)
> +{
> +     _spin_lock_cb(lock, NULL, NULL);
> +}
> +
>  void _spin_lock_irq(spinlock_t *lock)
>  {
>      ASSERT(local_irq_is_enabled());
> diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
> index c1883bd..91bfb95 100644
> --- a/xen/include/xen/spinlock.h
> +++ b/xen/include/xen/spinlock.h
> @@ -153,6 +153,7 @@ typedef struct spinlock {
>  #define spin_lock_init(l) (*(l) = (spinlock_t)SPIN_LOCK_UNLOCKED)
>
>  void _spin_lock(spinlock_t *lock);
> +void _spin_lock_cb(spinlock_t *lock, void (*cond)(void *), void *data);
>  void _spin_lock_irq(spinlock_t *lock);
>  unsigned long _spin_lock_irqsave(spinlock_t *lock);
>
> @@ -169,6 +170,7 @@ void _spin_lock_recursive(spinlock_t *lock);
>  void _spin_unlock_recursive(spinlock_t *lock);
>
>  #define spin_lock(l)                  _spin_lock(l)
> +#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
>  #define spin_lock_irq(l)              _spin_lock_irq(l)
>  #define spin_lock_irqsave(l, f)                                 \
>      ({                                                          \
> @@ -190,6 +192,12 @@ void _spin_unlock_recursive(spinlock_t *lock);
>      1 : ({ local_irq_restore(flags); 0; });     \
>  })
>
> +#define spin_lock_kick(l)                       \
> +({                                              \
> +    smp_mb();                                   \

arch_lock_signal() has already a barrier for ARM. So we have a double 
barrier now.

However, the barrier is slightly weaker (smp_wmb()). I am not sure why 
you need to use a stronger barrier here. What you care about is that the
write is done before signaling; the read does not matter much. Did I miss
anything?

Cheers,

> +    arch_lock_signal();                         \
> +})
> +
>  /* Ensure a lock is quiescent between two critical operations. */
>  #define spin_barrier(l)               _spin_barrier(l)
>
>

-- 
Julien Grall
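
For readers new to the interface, a minimal usage sketch follows. It is
illustrative only: some_lock, my_state, please_release and
release_resources() are made-up names, and the real user of the API is
scrub_free_pages() in patch 7.

/* Waiter: takes a ticket as usual, but runs the callback on every
 * iteration of the wait loop. */
static void my_release_cb(void *data)
{
    struct my_state *st = data;

    if ( st->please_release )       /* request published by the lock holder */
        release_resources(st);
}

static void waiter(struct my_state *st)
{
    spin_lock_cb(&some_lock, my_release_cb, st);
    /* critical section */
    spin_unlock(&some_lock);
}

/* Lock holder: publish the request, then make sure a waiter sleeping in
 * arch_lock_relax() wakes up and runs its callback. */
static void holder(struct my_state *st)
{
    st->please_release = true;
    spin_lock_kick();               /* contains the barrier discussed above */
}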


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-14 10:37   ` Jan Beulich
@ 2017-08-14 14:29     ` Boris Ostrovsky
  2017-08-15  8:18       ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-14 14:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, julien.grall

On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>> .. so that it's easy to find pages that need to be scrubbed (those pages are
>> now marked with _PGC_need_scrub bit).
>>
>> We keep track of the first unscrubbed page in a page buddy using first_dirty
>> field. For now it can have two values, 0 (whole buddy needs scrubbing) or
>> INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent 
>> patches
>> will allow scrubbing to be interrupted, resulting in first_dirty taking any
>> value.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with one remark:
>
>> --- a/xen/include/asm-x86/mm.h
>> +++ b/xen/include/asm-x86/mm.h
>> @@ -88,7 +88,15 @@ struct page_info
>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>          struct {
>>              /* Do TLBs need flushing for safety before next page use? */
>> -            bool_t need_tlbflush;
>> +            bool need_tlbflush:1;
>> +
>> +            /*
>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>> +             * One more bit than maximum possible order to accommodate
>> +             * INVALID_DIRTY_IDX.
>> +             */
>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>          } free;
> I think generated code will be better with the two fields swapped:
> That way reading first_dirty won't involve a shift, and accessing a
> single bit doesn't require shifts at all on many architectures.

Ok, I will then keep need_tlbflush as the last field so the final struct
(as defined in patch 7) will look like

struct {
        unsigned long first_dirty:MAX_ORDER + 1;
        unsigned long scrub_state:2;
        bool need_tlbflush:1;
};


-boris



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 6/9] spinlock: Introduce spin_lock_cb()
  2017-08-14 11:22   ` Julien Grall
@ 2017-08-14 14:39     ` Boris Ostrovsky
  2017-08-14 14:42       ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-14 14:39 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich


>>
>> +#define spin_lock_kick(l)                       \
>> +({                                              \
>> +    smp_mb();                                   \
>
> arch_lock_signal() has already a barrier for ARM. So we have a double
> barrier now.
>
> However, the barrier is slightly weaker (smp_wmb()). I am not sure why
>> you need to use a stronger barrier here. What you care about is that the
>> write is done before signaling; the read does not matter much. Did I miss
>> anything?

Yes, smp_wmb() should be sufficient.

Should I then add arch_lock_signal_wmb() --- defined as
arch_lock_signal() for ARM and smp_wmb() for x86?


-boris

>
> Cheers,
>
>> +    arch_lock_signal();                         \
>> +})
>> +
>>  /* Ensure a lock is quiescent between two critical operations. */
>>  #define spin_barrier(l)               _spin_barrier(l)
>>
>>
>
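
For illustration, the arch_lock_signal_wmb() idea mentioned above could
take roughly the following shape (a sketch only, not part of the posted
series):

/* asm-arm: arch_lock_signal() already provides the (write) barrier. */
#define arch_lock_signal_wmb()   arch_lock_signal()

/* asm-x86: per the suggestion above, a write barrier before signaling
 * is all that is needed. */
#define arch_lock_signal_wmb()   smp_wmb()

/* common code: */
#define spin_lock_kick(l)        arch_lock_signal_wmb()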



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 6/9] spinlock: Introduce spin_lock_cb()
  2017-08-14 14:39     ` Boris Ostrovsky
@ 2017-08-14 14:42       ` Julien Grall
  2017-08-14 14:57         ` Boris Ostrovsky
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2017-08-14 14:42 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich



On 14/08/17 15:39, Boris Ostrovsky wrote:
>
>>>
>>> +#define spin_lock_kick(l)                       \
>>> +({                                              \
>>> +    smp_mb();                                   \
>>
>> arch_lock_signal() has already a barrier for ARM. So we have a double
>> barrier now.
>>
>> However, the barrier is slightly weaker (smp_wmb()). I am not sure why
>> you need to use a stronger barrier here. What you care about is that the
>> write is done before signaling; the read does not matter much. Did I miss
>> anything?
>
> Yes, smp_wmb() should be sufficient.
>
> Should I then add arch_lock_signal_wmb() --- defined as
> arch_lock_signal() for ARM and smp_wmb() for x86?

I am not an x86 expert. Do you know why the barrier is not in 
arch_lock_signal() today?

>
>
> -boris
>
>>
>> Cheers,
>>
>>> +    arch_lock_signal();                         \
>>> +})
>>> +
>>>  /* Ensure a lock is quiescent between two critical operations. */
>>>  #define spin_barrier(l)               _spin_barrier(l)
>>>
>>>
>>
>

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 6/9] spinlock: Introduce spin_lock_cb()
  2017-08-14 14:42       ` Julien Grall
@ 2017-08-14 14:57         ` Boris Ostrovsky
  0 siblings, 0 replies; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-14 14:57 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

On 08/14/2017 10:42 AM, Julien Grall wrote:
>
>
> On 14/08/17 15:39, Boris Ostrovsky wrote:
>>
>>>>
>>>> +#define spin_lock_kick(l)                       \
>>>> +({                                              \
>>>> +    smp_mb();                                   \
>>>
>>> arch_lock_signal() has already a barrier for ARM. So we have a double
>>> barrier now.
>>>
>>> However, the barrier is slightly weaker (smp_wmb()). I am not sure why
>>> you need to use a stronger barrier here. What you care about is that
>>> the write is done before signaling; the read does not matter much. Did
>>> I miss anything?
>>
>> Yes, smp_wmb() should be sufficient.
>>
>> Should I then add arch_lock_signal_wmb() --- defined as
>> arch_lock_signal() for ARM and smp_wmb() for x86?
>
> I am not an x86 expert. Do you know why the barrier is not in
> arch_lock_signal() today?

Possibly because _spin_unlock(), which is the only place where
arch_lock_signal() is used, already has arch_lock_release_barrier()
(and preempt_enable() has one too). This guarantees that the
incremented ticket head will be seen only after all previous accesses
have completed.



OTOH,

>
>>
>>
>> -boris
>>
>>>
>>> Cheers,
>>>
>>>> +    arch_lock_signal();                         \
>>>> +})
>>>> +
>>>>  /* Ensure a lock is quiescent between two critical operations. */
>>>>  #define spin_barrier(l)               _spin_barrier(l)
>>>>
>>>>
>>>
>>
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-14 14:29     ` Boris Ostrovsky
@ 2017-08-15  8:18       ` Jan Beulich
  2017-08-15 14:41         ` Boris Ostrovsky
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2017-08-15  8:18 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, julien.grall

>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/include/asm-x86/mm.h
>>> +++ b/xen/include/asm-x86/mm.h
>>> @@ -88,7 +88,15 @@ struct page_info
>>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>          struct {
>>>              /* Do TLBs need flushing for safety before next page use? */
>>> -            bool_t need_tlbflush;
>>> +            bool need_tlbflush:1;
>>> +
>>> +            /*
>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>> +             * One more bit than maximum possible order to accommodate
>>> +             * INVALID_DIRTY_IDX.
>>> +             */
>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>          } free;
>> I think generated code will be better with the two fields swapped:
>> That way reading first_dirty won't involve a shift, and accessing a
>> single bit doesn't require shifts at all on many architectures.
> 
> Ok, I will then keep need_tlbflush as the last field so the final struct
> (as defined in patch 7) will look like
> 
> struct {
>         unsigned long first_dirty:MAX_ORDER + 1;
>         unsigned long scrub_state:2;
>         bool need_tlbflush:1;
> };

Hmm, actually - why do you need bitfields on the x86 side at all?
They're needed for 32-bit architectures only, 64-bit ones ought
to be fine with

struct {
        unsigned int first_dirty;
        bool need_tlbflush;
        uint8_t scrub_state;
};

(plus a suitable BUILD_BUG_ON() to make sure first_dirty has
at least MAX_ORDER + 1 bits). The ARM maintainers will know
whether they would want to also differentiate ARM32 and
ARM64 here.

Jan
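
A minimal sketch of the suggested check (illustrative only; exact
placement and spelling are of course up to the patch):

/* first_dirty must be wide enough to hold INVALID_DIRTY_IDX, i.e. it
 * needs at least MAX_ORDER + 1 bits: */
BUILD_BUG_ON(MAX_ORDER + 1 >
             8 * sizeof(((struct page_info *)0)->u.free.first_dirty));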



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-15  8:18       ` Jan Beulich
@ 2017-08-15 14:41         ` Boris Ostrovsky
  2017-08-15 14:51           ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-15 14:41 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, julien.grall

On 08/15/2017 04:18 AM, Jan Beulich wrote:
>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
>> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>>> --- a/xen/include/asm-x86/mm.h
>>>> +++ b/xen/include/asm-x86/mm.h
>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>          struct {
>>>>              /* Do TLBs need flushing for safety before next page use? */
>>>> -            bool_t need_tlbflush;
>>>> +            bool need_tlbflush:1;
>>>> +
>>>> +            /*
>>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>>> +             * One more bit than maximum possible order to accommodate
>>>> +             * INVALID_DIRTY_IDX.
>>>> +             */
>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>>          } free;
>>> I think generated code will be better with the two fields swapped:
>>> That way reading first_dirty won't involve a shift, and accessing a
>>> single bit doesn't require shifts at all on many architectures.
>> Ok, I will then keep need_tlbflush as the last field so the final struct
>> (as defined in patch 7) will look like
>>
>> struct {
>>         unsigned long first_dirty:MAX_ORDER + 1;
>>         unsigned long scrub_state:2;
>>         bool need_tlbflush:1;
>> };
> Hmm, actually - why do you need bitfields on the x86 side at all?
> They're needed for 32-bit architectures only, 64-bit ones ought
> to be fine with
>
> struct {
>         unsigned int first_dirty;
>         bool need_tlbflush;
>         uint8_t scrub_state;
> };

IIRC it was exactly because of ARM32 and at some point you suggested to
switch both x86 and ARM to bitfields.


>
> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has
> at least MAX_ORDER + 1 bits). The ARM maintainers will know
> whether they would want to also differentiate ARM32 and
> ARM64 here.

Isn't using bitfields the only possibility for 32-bit? We can't fit
first_dirty into 2 bytes.

-boris




^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-15 14:41         ` Boris Ostrovsky
@ 2017-08-15 14:51           ` Jan Beulich
  2017-08-15 14:52             ` Julien Grall
  0 siblings, 1 reply; 26+ messages in thread
From: Jan Beulich @ 2017-08-15 14:51 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, julien.grall

>>> On 15.08.17 at 16:41, <boris.ostrovsky@oracle.com> wrote:
> On 08/15/2017 04:18 AM, Jan Beulich wrote:
>>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
>>> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>>>> --- a/xen/include/asm-x86/mm.h
>>>>> +++ b/xen/include/asm-x86/mm.h
>>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>>          struct {
>>>>>              /* Do TLBs need flushing for safety before next page use? */
>>>>> -            bool_t need_tlbflush;
>>>>> +            bool need_tlbflush:1;
>>>>> +
>>>>> +            /*
>>>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>>>> +             * One more bit than maximum possible order to accommodate
>>>>> +             * INVALID_DIRTY_IDX.
>>>>> +             */
>>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>>>          } free;
>>>> I think generated code will be better with the two fields swapped:
>>>> That way reading first_dirty won't involve a shift, and accessing a
>>>> single bit doesn't require shifts at all on many architectures.
>>> Ok, I will then keep need_tlbflush as the last field so the final struct
>>> (as defined in patch 7) will look like
>>>
>>> struct {
>>>         unsigned long first_dirty:MAX_ORDER + 1;
>>>         unsigned long scrub_state:2;
>>>         bool need_tlbflush:1;
>>> };
>> Hmm, actually - why do you need bitfields on the x86 side at all?
>> They're needed for 32-bit architectures only, 64-bit ones ought
>> to be fine with
>>
>> struct {
>>         unsigned int first_dirty;
>>         bool need_tlbflush;
>>         uint8_t scrub_state;
>> };
> 
> IIRC it was exactly because of ARM32 and at some point you suggested to
> switch both x86 and ARM to bitfields.

I don't recall for sure whether I had asked for the change to be done
uniformly; it was certainly ARM32 that made me notice the
structure size change in your original version.

>> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has
>> at least MAX_ORDER + 1 bits). The ARM maintainers will know
>> whether they would want to also differentiate ARM32 and
>> ARM64 here.
> 
> Isn't using bitfields the only possibility for 32-bit? We can't fit
> first_dirty into 2 bytes.

Yes, hence the question whether to stay with bitfields uniformly
or make ARM64 follow x86 while ARM32 keeps using bitfields.

Jan



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-15 14:51           ` Jan Beulich
@ 2017-08-15 14:52             ` Julien Grall
  2017-08-15 15:03               ` Boris Ostrovsky
  0 siblings, 1 reply; 26+ messages in thread
From: Julien Grall @ 2017-08-15 14:52 UTC (permalink / raw)
  To: Jan Beulich, Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

Hi Jan,

On 15/08/17 15:51, Jan Beulich wrote:
>>>> On 15.08.17 at 16:41, <boris.ostrovsky@oracle.com> wrote:
>> On 08/15/2017 04:18 AM, Jan Beulich wrote:
>>>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
>>>> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>>>>> --- a/xen/include/asm-x86/mm.h
>>>>>> +++ b/xen/include/asm-x86/mm.h
>>>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>>>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>>>          struct {
>>>>>>              /* Do TLBs need flushing for safety before next page use? */
>>>>>> -            bool_t need_tlbflush;
>>>>>> +            bool need_tlbflush:1;
>>>>>> +
>>>>>> +            /*
>>>>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>>>>> +             * One more bit than maximum possible order to accommodate
>>>>>> +             * INVALID_DIRTY_IDX.
>>>>>> +             */
>>>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>>>>          } free;
>>>>> I think generated code will be better with the two fields swapped:
>>>>> That way reading first_dirty won't involve a shift, and accessing a
>>>>> single bit doesn't require shifts at all on many architectures.
>>>> Ok, I will then keep need_tlbflush as the last field so the final struct
>>>> (as defined in patch 7) will look like
>>>>
>>>> struct {
>>>>         unsigned long first_dirty:MAX_ORDER + 1;
>>>>         unsigned long scrub_state:2;
>>>>         bool need_tlbflush:1;
>>>> };
>>> Hmm, actually - why do you need bitfields on the x86 side at all?
>>> They're needed for 32-bit architectures only, 64-bit ones ought
>>> to be fine with
>>>
>>> struct {
>>>         unsigned int first_dirty;
>>>         bool need_tlbflush;
>>>         uint8_t scrub_state;
>>> };
>>
>> IIRC it was exactly because of ARM32 and at some point you suggested to
>> switch both x86 and ARM to bitfields.
>
> I don't recall for sure whether I had asked for the change to be done
> uniformly; it was certainly ARM32 that made me notice the
> structure size change in your original version.
>
>>> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has
>>> at least MAX_ORDER + 1 bits). The ARM maintainers will know
>>> whether they would want to also differentiate ARM32 and
>>> ARM64 here.
>>
>> Isn't using bitfields the only possibility for 32-bit? We can't fit
>> first_dirty into 2 bytes.
>
> Yes, hence the question whether to stay with bitfields uniformly
>> or make ARM64 follow x86 while ARM32 keeps using bitfields.

I would prefer to avoid differentiation between Arm32 and Arm64.

Cheers,

-- 
Julien Grall


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-15 14:52             ` Julien Grall
@ 2017-08-15 15:03               ` Boris Ostrovsky
  2017-08-15 15:08                 ` Jan Beulich
  0 siblings, 1 reply; 26+ messages in thread
From: Boris Ostrovsky @ 2017-08-15 15:03 UTC (permalink / raw)
  To: Julien Grall, Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 08/15/2017 10:52 AM, Julien Grall wrote:
> Hi Jan,
>
> On 15/08/17 15:51, Jan Beulich wrote:
>>>>> On 15.08.17 at 16:41, <boris.ostrovsky@oracle.com> wrote:
>>> On 08/15/2017 04:18 AM, Jan Beulich wrote:
>>>>>>> On 14.08.17 at 16:29, <boris.ostrovsky@oracle.com> wrote:
>>>>> On 08/14/2017 06:37 AM, Jan Beulich wrote:
>>>>>>>>> On 08.08.17 at 23:45, <boris.ostrovsky@oracle.com> wrote:
>>>>>>> --- a/xen/include/asm-x86/mm.h
>>>>>>> +++ b/xen/include/asm-x86/mm.h
>>>>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>>>>          /* Page is on a free list: ((count_info &
>>>>>>> PGC_count_mask) == 0). */
>>>>>>>          struct {
>>>>>>>              /* Do TLBs need flushing for safety before next
>>>>>>> page use? */
>>>>>>> -            bool_t need_tlbflush;
>>>>>>> +            bool need_tlbflush:1;
>>>>>>> +
>>>>>>> +            /*
>>>>>>> +             * Index of the first *possibly* unscrubbed page in
>>>>>>> the buddy.
>>>>>>> +             * One more bit than maximum possible order to
>>>>>>> accommodate
>>>>>>> +             * INVALID_DIRTY_IDX.
>>>>>>> +             */
>>>>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>>>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>>>>>          } free;
>>>>>> I think generated code will be better with the two fields swapped:
>>>>>> That way reading first_dirty won't involve a shift, and accessing a
>>>>>> single bit doesn't require shifts at all on many architectures.
>>>>> Ok, I will then keep need_tlbflush as the last field so the final
>>>>> struct
>>>>> (as defined in patch 7) will look like
>>>>>
>>>>> struct {
>>>>>         unsigned long first_dirty:MAX_ORDER + 1;
>>>>>         unsigned long scrub_state:2;
>>>>>         bool need_tlbflush:1;
>>>>> };
>>>> Hmm, actually - why do you need bitfields on the x86 side at all?
>>>> They're needed for 32-bit architectures only, 64-bit ones ought
>>>> to be fine with
>>>>
>>>> struct {
>>>>         unsigned int first_dirty;
>>>>         bool need_tlbflush;
>>>>         uint8_t scrub_state;
>>>> };
>>>
>>> IIRC it was exactly because of ARM32 and at some point you suggested to
>>> switch both x86 and ARM to bitfields.
>>
>> I don't recall for sure whether I had asked for the change to be done
>> uniformly; it was certainly ARM32 that made me notice the
>> structure size change in your original version.
>>
>>>> (plus a suitable BUILD_BUG_ON() to make sure first_dirty has
>>>> at least MAX_ORDER + 1 bits). The ARM maintainers will know
>>>> whether they would want to also differentiate ARM32 and
>>>> ARM64 here.
>>>
>>> Isn't using bitfields the only possibility for 32-bit? We can't fit
>>> first_dirty into 2 bytes.
>>
>> Yes, hence the question whether to stay with bitfields uniformly
>> or make ARM64 follow x86 while ARM32 keeps using bitfields.
>
> I would prefer to avoid differentiation between Arm32 and Arm64.

I can switch x86 only back to "normal" types but then we also have this
in patch 7:

static void check_and_stop_scrub(struct page_info *head)
{
    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
    {
        typeof(head->u.free) pgfree;

        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
        spin_lock_kick();
        for ( ; ; )
        {
            /* Can't ACCESS_ONCE() a bitfield. */
            pgfree.val = ACCESS_ONCE(head->u.free.val);
            if ( pgfree.scrub_state != BUDDY_SCRUB_ABORT )
                break;
            cpu_relax();
        }
    }
}

I'd rather avoid having an '#ifdef <arch>', meaning that

union {
            struct {
                unsigned int first_dirty;
                bool need_tlbflush;
                uint8_t scrub_state;
            };

            unsigned long val;
} free;

is still kept for x86.

-boris


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-08-15 15:03               ` Boris Ostrovsky
@ 2017-08-15 15:08                 ` Jan Beulich
  0 siblings, 0 replies; 26+ messages in thread
From: Jan Beulich @ 2017-08-15 15:08 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, Julien Grall

>>> On 15.08.17 at 17:03, <boris.ostrovsky@oracle.com> wrote:
> I can switch x86 only back to "normal" types but then we also have this
> in patch 7:
> 
> static void check_and_stop_scrub(struct page_info *head)
> {
>     if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
>     {
>         typeof(head->u.free) pgfree;
> 
>         head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
>         spin_lock_kick();
>         for ( ; ; )
>         {
>             /* Can't ACCESS_ONCE() a bitfield. */
>             pgfree.val = ACCESS_ONCE(head->u.free.val);
>             if ( pgfree.scrub_state != BUDDY_SCRUB_ABORT )
>                 break;
>             cpu_relax();
>         }
>     }
> }
> 
> I'd rather avoid having '#ifdef <arch>' meaning that
> 
> union {
>             struct {
>                 unsigned int first_dirty;
>                 bool need_tlbflush;
>                 uint8_t scrub_state;
>             };
> 
>             unsigned long val;
> } free;
> 
> is still kept for x86.

That's fine I think.

Jan



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-08-15 15:08 UTC | newest]

Thread overview: 26+ messages
2017-08-08 21:44 [PATCH v7 0/9] Memory scrubbing from idle loop Boris Ostrovsky
2017-08-08 21:44 ` [PATCH v7 1/9] mm: Clean up free_heap_pages() Boris Ostrovsky
2017-08-10 12:21   ` Wei Liu
2017-08-14 10:30   ` Jan Beulich
2017-08-08 21:45 ` [PATCH v7 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
2017-08-14 10:37   ` Jan Beulich
2017-08-14 14:29     ` Boris Ostrovsky
2017-08-15  8:18       ` Jan Beulich
2017-08-15 14:41         ` Boris Ostrovsky
2017-08-15 14:51           ` Jan Beulich
2017-08-15 14:52             ` Julien Grall
2017-08-15 15:03               ` Boris Ostrovsky
2017-08-15 15:08                 ` Jan Beulich
2017-08-14 11:16   ` Julien Grall
2017-08-08 21:45 ` [PATCH v7 3/9] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
2017-08-08 21:45 ` [PATCH v7 4/9] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
2017-08-08 21:45 ` [PATCH v7 5/9] mm: Scrub memory from idle loop Boris Ostrovsky
2017-08-08 21:45 ` [PATCH v7 6/9] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
2017-08-14 11:22   ` Julien Grall
2017-08-14 14:39     ` Boris Ostrovsky
2017-08-14 14:42       ` Julien Grall
2017-08-14 14:57         ` Boris Ostrovsky
2017-08-08 21:45 ` [PATCH v7 7/9] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
2017-08-14 10:38   ` Jan Beulich
2017-08-08 21:45 ` [PATCH v7 8/9] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
2017-08-08 21:45 ` [PATCH v7 9/9] mm: Make sure pages are scrubbed Boris Ostrovsky
