* [PATCH v4 0/8] Memory scrubbing from idle loop
@ 2017-05-19 15:50 Boris Ostrovsky
  2017-05-19 15:50 ` [PATCH v4 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
                   ` (7 more replies)
  0 siblings, 8 replies; 31+ messages in thread
From: Boris Ostrovsky @ 2017-05-19 15:50 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Changes in V4:
* Keep track of dirty pages in a buddy with page_info.u.free.first_dirty.
* Drop patch 1 (factoring out merge_and_free_buddy()) since there is only
  one caller now
* Drop patch 5 (from V3) since we are no longer breaking a partially-scrubbed
  buddy
* Extract search loop in alloc_heap_pages() into get_free_buddy() (patch 2)
* Add MEMF_no_scrub flag


When a domain is destroyed the hypervisor must scrub the domain's pages
before giving them to another guest in order to prevent leaking the deceased
guest's data. Currently this is done during guest destruction, possibly
causing a very lengthy cleanup process.

This series adds support for scrubbing released pages from the idle loop,
making guest destruction significantly faster. For example, destroying a
1TB guest can now be completed in 40+ seconds, as opposed to about 9 minutes
with the existing scrubbing algorithm.

Briefly, the new algorithm places dirty pages at the end of the heap's page
list for each node/zone/order, so that the full list does not have to be
scanned when searching for dirty pages. One processor from each node checks
whether the node has any dirty pages and, if such pages are found, scrubs
them. Scrubbing itself happens without holding the heap lock, so other users
may access the heap in the meantime. If, while the idle loop is scrubbing a
particular chunk of pages, that chunk is requested by the heap allocator,
scrubbing is immediately stopped.
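
For reference, the idle-loop hook looks roughly like this (a simplified
sketch of the x86 change in patch 4; the ARM side is analogous).
scrub_free_pages() returns true when more scrubbing work or a softirq is
pending, in which case the CPU does not go to sleep:

    static void idle_loop(void)
    {
        for ( ; ; )
        {
            if ( cpu_is_offline(smp_processor_id()) )
                play_dead();
            do_tasklet();
            /* Only halt when there is nothing left to scrub. */
            if ( cpu_is_haltable(smp_processor_id()) && !scrub_free_pages() )
                (*pm_idle)();
            do_softirq();
        }
    }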

On the allocation side, alloc_heap_pages() first tries to satisfy the
allocation request using only clean pages. If this is not possible, the
search is repeated and dirty pages are scrubbed by the allocator.
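
A condensed sketch of that fallback (get_free_buddy() and MEMF_no_scrub are
introduced in patches 2 and 3 respectively):

    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
    if ( !pg && !(memflags & MEMF_no_scrub) )
        /* No clean buddy found: accept a dirty one and scrub it below. */
        pg = get_free_buddy(zone_lo, zone_hi, order,
                            memflags | MEMF_no_scrub, d);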

This series is somewhat based on earlier work by Bob Liu.

V1:
* Only set PGC_need_scrub bit for the buddy head, thus making it unnecessary
  to scan whole buddy
* Fix spin_lock_cb()
* Scrub CPU-less nodes
* ARM support. Note that I have not been able to test this, only built the
  binary
* Added scrub test patch (last one). Not sure whether it should be considered
  for committing but I have been running with it.

V2:
* merge_chunks() returns new buddy head
* scrub_free_pages() returns softirq pending status in addition to (factored out)
  status of unscrubbed memory
* spin_lock uses inlined spin_lock_cb()
* scrub debugging code checks whole page, not just the first word.

V3:
* Keep dirty bit per page
* Simplify merge_chunks() (now merge_and_free_buddy())
* When scrubbing memory-only nodes try to find the closest node.

Deferred:
* Per-node heap locks. In addition to (presumably) improving performance in
  general, once they are available we can parallelize scrubbing further by
  allowing more than one core per node to do idle loop scrubbing.
* AVX-based scrubbing
* Use idle loop scrubbing during boot.

Boris Ostrovsky (8):
  mm: Place unscrubbed pages at the end of pagelist
  mm: Extract allocation loop from alloc_heap_pages()
  mm: Scrub pages in alloc_heap_pages() if needed
  mm: Scrub memory from idle loop
  spinlock: Introduce spin_lock_cb()
  mm: Keep heap accessible to others while scrubbing
  mm: Print number of unscrubbed pages in 'H' debug handler
  mm: Make sure pages are scrubbed

 xen/Kconfig.debug          |   7 +
 xen/arch/arm/domain.c      |  16 +-
 xen/arch/x86/domain.c      |   3 +-
 xen/arch/x86/domain_page.c |   8 +-
 xen/common/page_alloc.c    | 579 ++++++++++++++++++++++++++++++++++++++-------
 xen/common/spinlock.c      |   9 +-
 xen/include/asm-arm/mm.h   |  14 ++
 xen/include/asm-x86/mm.h   |  14 ++
 xen/include/xen/mm.h       |   3 +
 xen/include/xen/spinlock.h |   8 +
 10 files changed, 558 insertions(+), 103 deletions(-)

-- 
1.8.3.1


* [PATCH v4 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-05-19 15:50 [PATCH v4 0/8] Memory scrubbing from idle loop Boris Ostrovsky
@ 2017-05-19 15:50 ` Boris Ostrovsky
  2017-06-09 14:50   ` Jan Beulich
  2017-05-19 15:50 ` [PATCH v4 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-05-19 15:50 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

. so that it's easy to find pages that need to be scrubbed (those pages are
now marked with the _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the
first_dirty field. For now it can have two values, 0 (the whole buddy needs
scrubbing) or INVALID_DIRTY_IDX (the buddy does not need to be scrubbed).
Subsequent patches will allow scrubbing to be interrupted, resulting in
first_dirty taking any value.
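
In essence, the helper added below uses first_dirty to decide where in the
free list a buddy goes:

    /* Buddies with (possibly) dirty pages are queued at the tail. */
    if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
        page_list_add_tail(pg, &heap(node, zone, order));
    else
        page_list_add(pg, &heap(node, zone, order));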

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v4:
* Instead of using a bool dirty_head in page_info use int first_dirty.
  - Keep track of first_dirty in free_heap_pages()
* Alias PGC_need_scrub flag to PGC_allocated


 xen/common/page_alloc.c  | 175 ++++++++++++++++++++++++++++++++++++++---------
 xen/include/asm-arm/mm.h |  10 +++
 xen/include/asm-x86/mm.h |  10 +++
 3 files changed, 163 insertions(+), 32 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e41fb4..c65d214 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,6 +680,20 @@ static void check_low_mem_virq(void)
     }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+                                unsigned int zone, unsigned int order,
+                                unsigned int first_dirty)
+{
+    PFN_ORDER(pg) = order;
+    pg->u.free.first_dirty = first_dirty;
+
+    if ( first_dirty != INVALID_DIRTY_IDX )
+        page_list_add_tail(pg, &heap(node, zone, order));
+    else
+        page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
@@ -689,7 +705,7 @@ static struct page_info *alloc_heap_pages(
     unsigned long request = 1UL << order;
     struct page_info *pg;
     nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-    bool_t need_tlbflush = 0;
+    bool need_scrub, need_tlbflush = 0;
     uint32_t tlbflush_timestamp = 0;
 
     /* Make sure there are enough bits in memflags for nodeID. */
@@ -798,11 +814,18 @@ static struct page_info *alloc_heap_pages(
     return NULL;
 
  found: 
+    need_scrub = (pg->u.free.first_dirty != INVALID_DIRTY_IDX);
+
     /* We may have to halve the chunk a number of times. */
     while ( j != order )
     {
-        PFN_ORDER(pg) = --j;
-        page_list_add_tail(pg, &heap(node, zone, j));
+        /*
+         * Some of the sub-chunks may be clean but we will mark them
+         * as dirty (if need_scrub is set) to avoid traversing the
+         * list here.
+         */
+        page_list_add_scrub(pg, node, zone, --j,
+                            need_scrub ? 0 : INVALID_DIRTY_IDX);
         pg += 1 << j;
     }
 
@@ -851,11 +874,20 @@ static int reserve_offlined_page(struct page_info *head)
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
     int cur_order;
+    bool need_scrub;
 
     ASSERT(spin_is_locked(&heap_lock));
 
     cur_head = head;
 
+    /*
+     * We may break the buddy so let's mark the head as clean. Then, when
+     * merging chunks back into the heap, we will see whether the chunk has
+     * unscrubbed pages and set its first_dirty properly.
+     */
+    need_scrub = (head->u.free.first_dirty != INVALID_DIRTY_IDX);
+    head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -873,6 +905,8 @@ static int reserve_offlined_page(struct page_info *head)
 
         while ( cur_order < head_order )
         {
+            unsigned int first_dirty = INVALID_DIRTY_IDX;
+
             next_order = cur_order + 1;
 
             if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
@@ -892,8 +926,20 @@ static int reserve_offlined_page(struct page_info *head)
             {
             merge:
                 /* We don't consider merging outside the head_order. */
-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
-                PFN_ORDER(cur_head) = cur_order;
+
+                /* See if any of the pages indeed need scrubbing. */
+                if ( need_scrub )
+                {
+                    for ( i = 0; i < (1 << cur_order); i++ )
+                        if ( test_bit(_PGC_need_scrub,
+                                      &cur_head[i].count_info) )
+                        {
+                            first_dirty = i;
+                            break;
+                        }
+                }
+                page_list_add_scrub(cur_head, node, zone,
+                                    cur_order, first_dirty);
                 cur_head += (1 << cur_order);
                 break;
             }
@@ -919,9 +965,52 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
+static void scrub_free_pages(unsigned int node)
+{
+    struct page_info *pg;
+    unsigned int zone;
+
+    ASSERT(spin_is_locked(&heap_lock));
+
+    if ( !node_need_scrub[node] )
+        return;
+
+    for ( zone = 0; zone < NR_ZONES; zone++ )
+    {
+        unsigned int order = MAX_ORDER;
+        do {
+            while ( !page_list_empty(&heap(node, zone, order)) )
+            {
+                unsigned int i;
+
+                /* Unscrubbed pages are always at the end of the list. */
+                pg = page_list_last(&heap(node, zone, order));
+                if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                    break;
+
+                for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
+                {
+                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    {
+                        scrub_one_page(&pg[i]);
+                        pg[i].count_info &= ~PGC_need_scrub;
+                        node_need_scrub[node]--;
+                    }
+                }
+
+                page_list_del(pg, &heap(node, zone, order));
+                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+
+                if ( node_need_scrub[node] == 0 )
+                    return;
+            }
+        } while ( order-- != 0 );
+    }
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
-    struct page_info *pg, unsigned int order)
+    struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mask, mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
@@ -961,10 +1050,20 @@ static void free_heap_pages(
         /* This page is not a guest frame any more. */
         page_set_owner(&pg[i], NULL); /* set_gpfn_from_mfn snoops pg owner */
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
+
+        if ( need_scrub )
+            pg[i].count_info |= PGC_need_scrub;
     }
 
     avail[node][zone] += 1 << order;
     total_avail_pages += 1 << order;
+    if ( need_scrub )
+    {
+        node_need_scrub[node] += 1 << order;
+        pg->u.free.first_dirty = 0;
+    }
+    else
+        pg->u.free.first_dirty = INVALID_DIRTY_IDX;
 
     if ( tmem_enabled() )
         midsize_alloc_zone_pages = max(
@@ -977,35 +1076,54 @@ static void free_heap_pages(
 
         if ( (page_to_mfn(pg) & mask) )
         {
+            struct page_info *predecessor = pg - mask;
+
             /* Merge with predecessor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
-                 !page_state_is(pg-mask, free) ||
-                 (PFN_ORDER(pg-mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
+                 !page_state_is(predecessor, free) ||
+                 (PFN_ORDER(predecessor) != order) ||
+                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
-            pg -= mask;
-            page_list_del(pg, &heap(node, zone, order));
+
+            page_list_del(predecessor, &heap(node, zone, order));
+
+            if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
+                need_scrub = true;
+                /* ... and keep predecessor's first_dirty. */
+            else if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+                predecessor->u.free.first_dirty = (1U << order) +
+                                                  pg->u.free.first_dirty;
+
+            pg->u.free.first_dirty = INVALID_DIRTY_IDX;
+            pg = predecessor;
         }
         else
         {
+            struct page_info *successor = pg + mask;
+
             /* Merge with successor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
-                 !page_state_is(pg+mask, free) ||
-                 (PFN_ORDER(pg+mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(successor))) ||
+                 !page_state_is(successor, free) ||
+                 (PFN_ORDER(successor) != order) ||
+                 (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
-            page_list_del(pg + mask, &heap(node, zone, order));
+            page_list_del(successor, &heap(node, zone, order));
+
+            need_scrub |= (successor->u.free.first_dirty != INVALID_DIRTY_IDX);
+            successor->u.free.first_dirty = INVALID_DIRTY_IDX;
         }
 
         order++;
     }
 
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    page_list_add_scrub(pg, node, zone, order, pg->u.free.first_dirty);
 
     if ( tainted )
         reserve_offlined_page(pg);
 
+    if ( need_scrub )
+        scrub_free_pages(node);
+
     spin_unlock(&heap_lock);
 }
 
@@ -1226,7 +1344,7 @@ unsigned int online_page(unsigned long mfn, uint32_t *status)
     spin_unlock(&heap_lock);
 
     if ( (y & PGC_state) == PGC_state_offlined )
-        free_heap_pages(pg, 0);
+        free_heap_pages(pg, 0, false);
 
     return ret;
 }
@@ -1295,7 +1413,7 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg+i, 0);
+        free_heap_pages(pg + i, 0, false);
     }
 }
 
@@ -1622,7 +1740,7 @@ void free_xenheap_pages(void *v, unsigned int order)
 
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order);
+    free_heap_pages(virt_to_page(v), order, false);
 }
 
 #else
@@ -1676,12 +1794,9 @@ void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
-    {
-        scrub_one_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
-    }
 
-    free_heap_pages(pg, order);
+    free_heap_pages(pg, order, true);
 }
 
 #endif
@@ -1790,7 +1905,7 @@ struct page_info *alloc_domheap_pages(
     if ( d && !(memflags & MEMF_no_owner) &&
          assign_pages(d, pg, order, memflags) )
     {
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, false);
         return NULL;
     }
     
@@ -1858,11 +1973,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
-
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index f6915ad..38d4fba 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -43,6 +43,9 @@ struct page_info
         } inuse;
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+            /* Index of the first *possibly* unscrubbed page in the buddy. */
+#define INVALID_DIRTY_IDX -1U
+            unsigned int first_dirty;
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
         } free;
@@ -115,6 +118,13 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 extern unsigned long xenheap_mfn_start, xenheap_mfn_end;
 extern vaddr_t xenheap_virt_end;
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 119d7de..e20f161 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -87,6 +87,9 @@ struct page_info
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+            /* Index of the first *possibly* unscrubbed page in the buddy. */
+#define INVALID_DIRTY_IDX -1U
+            unsigned int first_dirty;
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
         } free;
@@ -233,6 +236,13 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 struct spage_info
 {
        unsigned long type_info;
-- 
1.8.3.1


* [PATCH v4 2/8] mm: Extract allocation loop from alloc_heap_pages()
  2017-05-19 15:50 [PATCH v4 0/8] Memory scrubbing from idle loop Boris Ostrovsky
  2017-05-19 15:50 ` [PATCH v4 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-05-19 15:50 ` Boris Ostrovsky
  2017-06-09 15:08   ` Jan Beulich
  2017-05-19 15:50 ` [PATCH v4 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-05-19 15:50 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

This will make the code a bit more readable, especially with the changes
that will be introduced in subsequent patches.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Change in v4:
* New patch

 xen/common/page_alloc.c | 132 +++++++++++++++++++++++++++---------------------
 1 file changed, 74 insertions(+), 58 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index c65d214..1e57885 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -694,22 +694,15 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
-/* Allocate 2^@order contiguous pages. */
-static struct page_info *alloc_heap_pages(
-    unsigned int zone_lo, unsigned int zone_hi,
-    unsigned int order, unsigned int memflags,
-    struct domain *d)
+static struct page_info *get_free_buddy(unsigned int zone_lo,
+                                        unsigned int zone_hi,
+                                        unsigned int order, unsigned int memflags,
+                                        struct domain *d)
 {
-    unsigned int i, j, zone = 0, nodemask_retry = 0;
     nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
-    unsigned long request = 1UL << order;
+    nodemask_t nodemask = d ? d->node_affinity : node_online_map;
+    unsigned int j, zone, nodemask_retry = 0, request = 1UL << order;
     struct page_info *pg;
-    nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-    bool need_scrub, need_tlbflush = 0;
-    uint32_t tlbflush_timestamp = 0;
-
-    /* Make sure there are enough bits in memflags for nodeID. */
-    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
 
     if ( node == NUMA_NO_NODE )
     {
@@ -725,36 +718,8 @@ static struct page_info *alloc_heap_pages(
     first_node = node;
 
     ASSERT(node < MAX_NUMNODES);
-    ASSERT(zone_lo <= zone_hi);
-    ASSERT(zone_hi < NR_ZONES);
-
-    if ( unlikely(order > MAX_ORDER) )
-        return NULL;
-
-    spin_lock(&heap_lock);
-
-    /*
-     * Claimed memory is considered unavailable unless the request
-     * is made by a domain with sufficient unclaimed pages.
-     */
-    if ( (outstanding_claims + request >
-          total_avail_pages + tmem_freeable_pages()) &&
-          ((memflags & MEMF_no_refcount) ||
-           !d || d->outstanding_pages < request) )
-        goto not_found;
-
-    /*
-     * TMEM: When available memory is scarce due to tmem absorbing it, allow
-     * only mid-size allocations to avoid worst of fragmentation issues.
-     * Others try tmem pools then fail.  This is a workaround until all
-     * post-dom0-creation-multi-page allocations can be eliminated.
-     */
-    if ( ((order == 0) || (order >= 9)) &&
-         (total_avail_pages <= midsize_alloc_zone_pages) &&
-         tmem_freeable_pages() )
-        goto try_tmem;
 
-    /*
+   /*
      * Start with requested node, but exhaust all node memory in requested 
      * zone before failing, only calc new node value if we fail to find memory 
      * in target node, this avoids needless computation on fast-path.
@@ -770,11 +735,11 @@ static struct page_info *alloc_heap_pages(
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    goto found;
+                    return pg;
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
-            goto not_found;
+            return NULL;
 
         /* Pick next node. */
         if ( !node_isset(node, nodemask) )
@@ -791,42 +756,93 @@ static struct page_info *alloc_heap_pages(
         {
             /* When we have tried all in nodemask, we fall back to others. */
             if ( (memflags & MEMF_exact_node) || nodemask_retry++ )
-                goto not_found;
+                return NULL;
             nodes_andnot(nodemask, node_online_map, nodemask);
             first_node = node = first_node(nodemask);
             if ( node >= MAX_NUMNODES )
-                goto not_found;
+                return NULL;
         }
     }
+}
+
+/* Allocate 2^@order contiguous pages. */
+static struct page_info *alloc_heap_pages(
+    unsigned int zone_lo, unsigned int zone_hi,
+    unsigned int order, unsigned int memflags,
+    struct domain *d)
+{
+    nodeid_t node;
+    unsigned int i, buddy_order, zone;
+    unsigned long request = 1UL << order;
+    struct page_info *pg;
+    bool need_scrub, need_tlbflush = false;
+    uint32_t tlbflush_timestamp = 0;
+
+    /* Make sure there are enough bits in memflags for nodeID. */
+    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
+
+    ASSERT(zone_lo <= zone_hi);
+    ASSERT(zone_hi < NR_ZONES);
+
+    if ( unlikely(order > MAX_ORDER) )
+        return NULL;
+
+    spin_lock(&heap_lock);
+
+    /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+          ((memflags & MEMF_no_refcount) ||
+           !d || d->outstanding_pages < request) )
+    {
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
 
- try_tmem:
-    /* Try to free memory from tmem */
-    if ( (pg = tmem_relinquish_pages(order, memflags)) != NULL )
+    /*
+     * TMEM: When available memory is scarce due to tmem absorbing it, allow
+     * only mid-size allocations to avoid worst of fragmentation issues.
+     * Others try tmem pools then fail.  This is a workaround until all
+     * post-dom0-creation-multi-page allocations can be eliminated.
+     */
+    if ( ((order == 0) || (order >= 9)) &&
+         (total_avail_pages <= midsize_alloc_zone_pages) &&
+         tmem_freeable_pages() )
     {
-        /* reassigning an already allocated anonymous heap page */
+        /* Try to free memory from tmem. */
+        pg = tmem_relinquish_pages(order, memflags);
         spin_unlock(&heap_lock);
         return pg;
     }
 
- not_found:
-    /* No suitable memory blocks. Fail the request. */
-    spin_unlock(&heap_lock);
-    return NULL;
+    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+    if ( !pg )
+    {
+        /* No suitable memory blocks. Fail the request. */
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
+
+    node = phys_to_nid(page_to_maddr(pg));
+    zone = page_to_zone(pg);
+    buddy_order = PFN_ORDER(pg);
 
- found: 
     need_scrub = (pg->u.free.first_dirty != INVALID_DIRTY_IDX);
 
     /* We may have to halve the chunk a number of times. */
-    while ( j != order )
+    while ( buddy_order != order )
     {
         /*
          * Some of the sub-chunks may be clean but we will mark them
          * as dirty (if need_scrub is set) to avoid traversing the
          * list here.
          */
-        page_list_add_scrub(pg, node, zone, --j,
+        page_list_add_scrub(pg, node, zone, --buddy_order,
                             need_scrub ? 0 : INVALID_DIRTY_IDX);
-        pg += 1 << j;
+        pg += 1 << buddy_order;
     }
 
     ASSERT(avail[node][zone] >= request);
-- 
1.8.3.1


* [PATCH v4 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-05-19 15:50 [PATCH v4 0/8] Memory scrubbing from idle loop Boris Ostrovsky
  2017-05-19 15:50 ` [PATCH v4 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
  2017-05-19 15:50 ` [PATCH v4 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
@ 2017-05-19 15:50 ` Boris Ostrovsky
  2017-06-09 15:22   ` Jan Beulich
  2017-05-19 15:50 ` [PATCH v4 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-05-19 15:50 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

When allocating pages in alloc_heap_pages(), first look for clean pages. If
none are found, then retry, taking pages marked as unscrubbed and scrubbing
them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet.
However, this will become possible when we stop scrubbing from
free_heap_pages() and instead do it from the idle loop.

Since not all allocations require clean pages (such as xenheap allocations),
introduce a MEMF_no_scrub flag that callers can set if they are willing to
consume unscrubbed pages.
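
For example, xenheap allocations, which don't need pre-scrubbed memory, can
simply pass the new flag along, as this patch does later on:

    pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);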

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v4:
* Add MEMF_no_scrub flag

 xen/common/page_alloc.c | 43 ++++++++++++++++++++++++++++++++++++-------
 xen/include/xen/mm.h    |  2 ++
 2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 1e57885..b7c7426 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -703,6 +703,7 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
     nodemask_t nodemask = d ? d->node_affinity : node_online_map;
     unsigned int j, zone, nodemask_retry = 0, request = 1UL << order;
     struct page_info *pg;
+    bool use_unscrubbed = (memflags & MEMF_no_scrub);
 
     if ( node == NUMA_NO_NODE )
     {
@@ -734,8 +735,15 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
+            {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    return pg;
+                {
+                    if ( (order == 0) || use_unscrubbed ||
+                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                        return pg;
+                    page_list_add_tail(pg, &heap(node, zone, j));
+                }
+            }
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
@@ -821,9 +829,16 @@ static struct page_info *alloc_heap_pages(
     pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
     if ( !pg )
     {
-        /* No suitable memory blocks. Fail the request. */
-        spin_unlock(&heap_lock);
-        return NULL;
+        /* Try now getting a dirty buddy. */
+        if ( !(memflags & MEMF_no_scrub) )
+            pg = get_free_buddy(zone_lo, zone_hi, order,
+                                memflags | MEMF_no_scrub, d);
+        if ( !pg )
+        {
+            /* No suitable memory blocks. Fail the request. */
+            spin_unlock(&heap_lock);
+            return NULL;
+        }
     }
 
     node = phys_to_nid(page_to_maddr(pg));
@@ -855,10 +870,24 @@ static struct page_info *alloc_heap_pages(
     if ( d != NULL )
         d->last_alloc_node = node;
 
+    need_scrub &= !(memflags & MEMF_no_scrub);
     for ( i = 0; i < (1 << order); i++ )
     {
         /* Reference count must continuously be zero for free pages. */
-        BUG_ON(pg[i].count_info != PGC_state_free);
+        BUG_ON((pg[i].count_info & ~PGC_need_scrub ) != PGC_state_free);
+
+        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+        {
+            if ( need_scrub )
+                scrub_one_page(&pg[i]);
+            node_need_scrub[node]--;
+            /*
+             * Technically, we need to set first_dirty to INVALID_DIRTY_IDX
+             * on buddy's head. However, since we assign pg[i].count_info
+             * below, we can skip this.
+             */
+        }
+
         pg[i].count_info = PGC_state_inuse;
 
         if ( !(memflags & MEMF_no_tlbflush) )
@@ -1737,7 +1766,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     ASSERT(!in_irq());
 
     pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
-                          order, memflags, NULL);
+                          order, memflags | MEMF_no_scrub, NULL);
     if ( unlikely(pg == NULL) )
         return NULL;
 
@@ -1787,7 +1816,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( !(memflags >> _MEMF_bits) )
         memflags |= MEMF_bits(xenheap_bits);
 
-    pg = alloc_domheap_pages(NULL, order, memflags);
+    pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
     if ( unlikely(pg == NULL) )
         return NULL;
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 88de3c1..0d4b7c2 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -224,6 +224,8 @@ struct npfec {
 #define  MEMF_no_owner    (1U<<_MEMF_no_owner)
 #define _MEMF_no_tlbflush 6
 #define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
+#define _MEMF_no_scrub    7
+#define  MEMF_no_scrub    (1U<<_MEMF_no_scrub)
 #define _MEMF_node        8
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
-- 
1.8.3.1


* [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-05-19 15:50 [PATCH v4 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2017-05-19 15:50 ` [PATCH v4 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
@ 2017-05-19 15:50 ` Boris Ostrovsky
  2017-06-12  8:08   ` Jan Beulich
  2017-05-19 15:50 ` [PATCH v4 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-05-19 15:50 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Instead of scrubbing pages during guest destruction (from
free_heap_pages()) do this opportunistically, from the idle loop.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v4:
* Be careful with tasklets in idle_loop()
* Use per-cpu mapcache override
* Update node_to_scrub() algorithm to select closest node (and add comment
   explaining what it does)
* Put buddy back in the heap directly (as opposed to using merge_and_free_buddy()
  which is dropped anyway)
* Don't stop scrubbing immediately when softirq is pending, try to scrub at least
  a few (8) pages.

 xen/arch/arm/domain.c      |  16 ++++---
 xen/arch/x86/domain.c      |   3 +-
 xen/arch/x86/domain_page.c |   8 ++--
 xen/common/page_alloc.c    | 113 +++++++++++++++++++++++++++++++++++++++------
 xen/include/xen/mm.h       |   1 +
 5 files changed, 117 insertions(+), 24 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 76310ed..9931ca2 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -46,15 +46,19 @@ void idle_loop(void)
         if ( cpu_is_offline(smp_processor_id()) )
             stop_cpu();
 
-        local_irq_disable();
-        if ( cpu_is_haltable(smp_processor_id()) )
+        do_tasklet();
+
+        if ( cpu_is_haltable(smp_processor_id()) && !scrub_free_pages() )
         {
-            dsb(sy);
-            wfi();
+            local_irq_disable();
+            if ( cpu_is_haltable(smp_processor_id()) )
+            {
+                dsb(sy);
+                wfi();
+            }
+            local_irq_enable();
         }
-        local_irq_enable();
 
-        do_tasklet();
         do_softirq();
         /*
          * We MUST be last (or before dsb, wfi). Otherwise after we get the
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 13cdc50..229711f 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -118,8 +118,9 @@ static void idle_loop(void)
     {
         if ( cpu_is_offline(smp_processor_id()) )
             play_dead();
-        (*pm_idle)();
         do_tasklet();
+        if ( cpu_is_haltable(smp_processor_id()) && !scrub_free_pages() )
+            (*pm_idle)();
         do_softirq();
         /*
          * We MUST be last (or before pm_idle). Otherwise after we get the
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 71baede..cfe7cc1 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -18,12 +18,14 @@
 #include <asm/hardirq.h>
 #include <asm/setup.h>
 
-static struct vcpu *__read_mostly override;
+static DEFINE_PER_CPU(struct vcpu *, override);
 
 static inline struct vcpu *mapcache_current_vcpu(void)
 {
+    struct vcpu *v, *this_vcpu = this_cpu(override);
+
     /* In the common case we use the mapcache of the running VCPU. */
-    struct vcpu *v = override ?: current;
+    v = this_vcpu ?: current;
 
     /*
      * When current isn't properly set up yet, this is equivalent to
@@ -59,7 +61,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
 
 void __init mapcache_override_current(struct vcpu *v)
 {
-    override = v;
+    this_cpu(override) = v;
 }
 
 #define mapcache_l2_entry(e) ((e) >> PAGETABLE_ORDER)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index b7c7426..6e505b1 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1010,15 +1010,79 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
-static void scrub_free_pages(unsigned int node)
+static nodemask_t node_scrubbing;
+
+/*
+ * If get_node is true this will return the closest node that needs to be
+ * scrubbed, with the appropriate bit in node_scrubbing set.
+ * If get_node is not set, this will return *a* node that needs to be scrubbed.
+ * The node_scrubbing bitmask will not be updated.
+ * If no node needs scrubbing then NUMA_NO_NODE is returned.
+ */
+static unsigned int node_to_scrub(bool get_node)
 {
-    struct page_info *pg;
-    unsigned int zone;
+    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
+    nodeid_t closest = NUMA_NO_NODE;
+    u8 dist, shortest = 0xff;
 
-    ASSERT(spin_is_locked(&heap_lock));
+    if ( node == NUMA_NO_NODE )
+        node = 0;
 
-    if ( !node_need_scrub[node] )
-        return;
+    if ( node_need_scrub[node] &&
+         (!get_node || !node_test_and_set(node, node_scrubbing)) )
+        return node;
+
+    /*
+     * See if there are memory-only nodes that need scrubbing and choose
+     * the closest one.
+     */
+    local_node = node;
+    while ( 1 )
+    {
+        do {
+            node = cycle_node(node, node_online_map);
+        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
+                  (node != local_node) );
+
+        if ( node == local_node )
+            break;
+
+        if ( node_need_scrub[node] )
+        {
+            if ( !get_node )
+                return node;
+
+            dist = __node_distance(local_node, node);
+            if ( dist < shortest || closest == NUMA_NO_NODE )
+            {
+                if ( !node_test_and_set(node, node_scrubbing) )
+                {
+                    if ( closest != NUMA_NO_NODE )
+                        node_clear(closest, node_scrubbing);
+                    shortest = dist;
+                    closest = node;
+                }
+            }
+        }
+    }
+
+    return closest;
+}
+
+bool scrub_free_pages(void)
+{
+    struct page_info *pg;
+    unsigned int zone;
+    unsigned int cpu = smp_processor_id();
+    bool preempt = false;
+    nodeid_t node;
+    unsigned int cnt = 0;
+  
+    node = node_to_scrub(true);
+    if ( node == NUMA_NO_NODE )
+        return false;
+ 
+    spin_lock(&heap_lock);
 
     for ( zone = 0; zone < NR_ZONES; zone++ )
     {
@@ -1035,22 +1099,46 @@ static void scrub_free_pages(unsigned int node)
 
                 for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
                 {
+                    cnt++;
                     if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                     {
                         scrub_one_page(&pg[i]);
                         pg[i].count_info &= ~PGC_need_scrub;
                         node_need_scrub[node]--;
+                        cnt += 100; /* scrubbed pages add heavier weight. */
                     }
-                }
 
-                page_list_del(pg, &heap(node, zone, order));
-                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+                    /*
+                     * Scrub a few (8) pages before becoming eligible for
+                     * preemption. But also count non-scrubbing loop iterations
+                     * so that we don't get stuck here with an almost clean
+                     * heap.
+                     */
+                    if ( softirq_pending(cpu) && cnt > 800 )
+                    {
+                        preempt = true;
+                        break;
+                    }
+                }
 
-                if ( node_need_scrub[node] == 0 )
-                    return;
+                if ( i == (1U << order) )
+                {
+                    page_list_del(pg, &heap(node, zone, order));
+                    page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+                }
+                else
+                    pg->u.free.first_dirty = i + 1;
+ 
+                if ( preempt || (node_need_scrub[node] == 0) )
+                    goto out;
             }
         } while ( order-- != 0 );
     }
+
+ out:
+    spin_unlock(&heap_lock);
+    node_clear(node, node_scrubbing);
+    return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
 }
 
 /* Free 2^@order set of pages. */
@@ -1166,9 +1254,6 @@ static void free_heap_pages(
     if ( tainted )
         reserve_offlined_page(pg);
 
-    if ( need_scrub )
-        scrub_free_pages(node);
-
     spin_unlock(&heap_lock);
 }
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 0d4b7c2..ed90a61 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -138,6 +138,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe);
 void xenheap_max_mfn(unsigned long mfn);
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags);
 void free_xenheap_pages(void *v, unsigned int order);
+bool scrub_free_pages(void);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 /* Map machine page range in Xen virtual address space. */
-- 
1.8.3.1


* [PATCH v4 5/8] spinlock: Introduce spin_lock_cb()
  2017-05-19 15:50 [PATCH v4 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2017-05-19 15:50 ` [PATCH v4 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
@ 2017-05-19 15:50 ` Boris Ostrovsky
  2017-06-12  8:23   ` Jan Beulich
  2017-05-19 15:50 ` [PATCH v4 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-05-19 15:50 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

While waiting for a lock we may want to periodically run some
code. This code may, for example, allow the caller to release
resources held by it that are no longer needed in the critical
section protected by the lock.

Specifically, this feature will be needed by the scrubbing code: the
scrubber, while waiting for the heap lock in order to merge back clean
pages, may be asked by the page allocator (which is currently holding the
lock) to abort the merge and release the buddy page head that the allocator
wants.

We could use spin_trylock() but, since it doesn't take a lock ticket, it
may take a long time until the lock is acquired. Instead we add
spin_lock_cb(), which allows us to grab the ticket and execute a callback
while waiting.

Since we may be sleeping on the lock until it is released, we need a
mechanism that makes sure the callback has a chance to run. We add
spin_lock_kick(), which will wake up the waiter.
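
A minimal usage sketch (names below are made up for illustration; patch 6 is
the real consumer):

    /* Invoked on every iteration of the wait loop until the lock is taken. */
    static void while_waiting(void *data)
    {
        unsigned long *spins = data;

        (*spins)++;
    }

    static void demo(spinlock_t *lock)
    {
        unsigned long spins = 0;

        spin_lock_cb(lock, while_waiting, &spins);
        /* ... critical section ... */
        spin_unlock(lock);
    }

The current lock holder, after updating whatever state the callback inspects,
calls spin_lock_kick() so that a possibly sleeping waiter re-runs its
callback.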

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/common/spinlock.c      | 9 ++++++++-
 xen/include/xen/spinlock.h | 8 ++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 2a06406..3c1caae 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -129,7 +129,7 @@ static always_inline u16 observe_head(spinlock_tickets_t *t)
     return read_atomic(&t->head);
 }
 
-void _spin_lock(spinlock_t *lock)
+void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
 {
     spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
     LOCK_PROFILE_VAR;
@@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock)
     while ( tickets.tail != observe_head(&lock->tickets) )
     {
         LOCK_PROFILE_BLOCK;
+        if ( unlikely(cb) )
+            cb(data);
         arch_lock_relax();
     }
     LOCK_PROFILE_GOT;
@@ -147,6 +149,11 @@ void _spin_lock(spinlock_t *lock)
     arch_lock_acquire_barrier();
 }
 
+void _spin_lock(spinlock_t *lock)
+{
+     _spin_lock_cb(lock, NULL, NULL);
+}
+
 void _spin_lock_irq(spinlock_t *lock)
 {
     ASSERT(local_irq_is_enabled());
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index c1883bd..91bfb95 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -153,6 +153,7 @@ typedef struct spinlock {
 #define spin_lock_init(l) (*(l) = (spinlock_t)SPIN_LOCK_UNLOCKED)
 
 void _spin_lock(spinlock_t *lock);
+void _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data);
 void _spin_lock_irq(spinlock_t *lock);
 unsigned long _spin_lock_irqsave(spinlock_t *lock);
 
@@ -169,6 +170,7 @@ void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
 #define spin_lock(l)                  _spin_lock(l)
+#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
 #define spin_lock_irq(l)              _spin_lock_irq(l)
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
@@ -190,6 +192,12 @@ void _spin_unlock_recursive(spinlock_t *lock);
     1 : ({ local_irq_restore(flags); 0; });     \
 })
 
+#define spin_lock_kick(l)                       \
+({                                              \
+    smp_mb();                                   \
+    arch_lock_signal();                         \
+})
+
 /* Ensure a lock is quiescent between two critical operations. */
 #define spin_barrier(l)               _spin_barrier(l)
 
-- 
1.8.3.1


* [PATCH v4 6/8] mm: Keep heap accessible to others while scrubbing
  2017-05-19 15:50 [PATCH v4 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2017-05-19 15:50 ` [PATCH v4 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
@ 2017-05-19 15:50 ` Boris Ostrovsky
  2017-06-12  8:30   ` Jan Beulich
  2017-05-19 15:50 ` [PATCH v4 7/8] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
  2017-05-19 15:50 ` [PATCH v4 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
  7 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-05-19 15:50 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Instead of scrubbing pages while holding the heap lock we can mark the
buddy's head as being scrubbed and drop the lock temporarily. If someone
(most likely alloc_heap_pages()) tries to access this chunk, it will signal
the scrubber to abort the scrub by setting the head's PAGE_SCRUB_ABORT bit.
The scrubber checks this bit after processing each page and stops its work
as soon as it sees it.
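
The resulting handshake, condensed from the code below:

    /* Allocator (check_and_stop_scrub()): ask the scrubber to let go. */
    head->u.free.scrub_state |= PAGE_SCRUB_ABORT;
    spin_lock_kick();
    while ( ACCESS_ONCE(head->u.free.scrub_state) & PAGE_SCRUB_ABORT )
        cpu_relax();

    /* Scrubber (scrub_free_pages()), checked after every page: */
    if ( pg->u.free.scrub_state & PAGE_SCRUB_ABORT )
    {
        pg->u.free.first_dirty = i + 1;  /* record progress (simplified) */
        smp_wmb();
        pg->u.free.scrub_state = 0;      /* acknowledge and release */
    }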

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v4:
* Drop unnecessary ACCESS_ONCE in scrub_free_pages, add smp_wmb()
  in scrub_continue()
* Keep track of first_dirty in scrub_wait_state

 xen/common/page_alloc.c  | 100 ++++++++++++++++++++++++++++++++++++++++++++---
 xen/include/asm-arm/mm.h |   4 ++
 xen/include/asm-x86/mm.h |   4 ++
 3 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 6e505b1..b365305 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -694,6 +694,17 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+static void check_and_stop_scrub(struct page_info *head)
+{
+    if ( head->u.free.scrub_state & PAGE_SCRUBBING )
+    {
+        head->u.free.scrub_state |= PAGE_SCRUB_ABORT;
+        spin_lock_kick();
+        while ( ACCESS_ONCE(head->u.free.scrub_state) & PAGE_SCRUB_ABORT )
+            cpu_relax();
+    }
+}
+
 static struct page_info *get_free_buddy(unsigned int zone_lo,
                                         unsigned int zone_hi,
                                         unsigned int order, unsigned int memflags,
@@ -738,9 +749,15 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
             {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
                 {
-                    if ( (order == 0) || use_unscrubbed ||
-                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                    if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
                         return pg;
+
+                    if ( (order == 0) || use_unscrubbed )
+                    {
+                        check_and_stop_scrub(pg);
+                        return pg;
+                    }
+
                     page_list_add_tail(pg, &heap(node, zone, j));
                 }
             }
@@ -925,6 +942,7 @@ static int reserve_offlined_page(struct page_info *head)
 
     cur_head = head;
 
+    check_and_stop_scrub(head);
     /*
      * We may break the buddy so let's mark the head as clean. Then, when
      * merging chunks back into the heap, we will see whether the chunk has
@@ -1069,6 +1087,29 @@ static unsigned int node_to_scrub(bool get_node)
     return closest;
 }
 
+struct scrub_wait_state {
+    struct page_info *pg;
+    unsigned int first_dirty;
+    bool drop;
+};
+
+static void scrub_continue(void *data)
+{
+    struct scrub_wait_state *st = data;
+
+    if ( st->drop )
+        return;
+
+    if ( st->pg->u.free.scrub_state & PAGE_SCRUB_ABORT )
+    {
+        /* There is a waiter for this buddy. Release it. */
+        st->drop = true;
+        st->pg->u.free.first_dirty = st->first_dirty;
+        smp_wmb();
+        st->pg->u.free.scrub_state = 0;
+    }
+}
+
 bool scrub_free_pages(void)
 {
     struct page_info *pg;
@@ -1090,24 +1131,51 @@ bool scrub_free_pages(void)
         do {
             while ( !page_list_empty(&heap(node, zone, order)) )
             {
-                unsigned int i;
+                unsigned int i, dirty_cnt;
+                struct scrub_wait_state st;
 
                 /* Unscrubbed pages are always at the end of the list. */
                 pg = page_list_last(&heap(node, zone, order));
                 if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
                     break;
 
+                ASSERT(!pg->u.free.scrub_state);
+                pg->u.free.scrub_state = PAGE_SCRUBBING;
+
+                spin_unlock(&heap_lock);
+
+                dirty_cnt = 0;
                 for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
                 {
                     cnt++;
                     if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                     {
                         scrub_one_page(&pg[i]);
+                        /*
+                         * We can modify count_info without holding heap
+                         * lock since we effectively locked this buddy by
+                         * setting its scrub_state.
+                         */
                         pg[i].count_info &= ~PGC_need_scrub;
-                        node_need_scrub[node]--;
+                        dirty_cnt++;
                         cnt += 100; /* scrubbed pages add heavier weight. */
                     }
 
+                    if ( pg->u.free.scrub_state & PAGE_SCRUB_ABORT )
+                    {
+                        /* Someone wants this chunk. Drop everything. */
+
+                        pg->u.free.first_dirty = (i == (1U << order)) ?
+                            INVALID_DIRTY_IDX : i + 1; 
+                        smp_wmb();
+                        pg->u.free.scrub_state = 0;
+
+                        spin_lock(&heap_lock);
+                        node_need_scrub[node] -= dirty_cnt;
+                        spin_unlock(&heap_lock);
+                        goto out_nolock;
+                    }
+
                     /*
                      * Scrub a few (8) pages before becoming eligible for
                      * preemption. But also count non-scrubbing loop iterations
@@ -1121,6 +1189,17 @@ bool scrub_free_pages(void)
                     }
                 }
 
+                st.pg = pg;
+                st.first_dirty = (i == (1UL << order)) ?
+                    INVALID_DIRTY_IDX : i + 1;
+                st.drop = false;
+                spin_lock_cb(&heap_lock, scrub_continue, &st);
+
+                node_need_scrub[node] -= dirty_cnt;
+
+                if ( st.drop )
+                    goto out;
+
                 if ( i == (1U << order) )
                 {
                     page_list_del(pg, &heap(node, zone, order));
@@ -1128,7 +1207,9 @@ bool scrub_free_pages(void)
                 }
                 else
                     pg->u.free.first_dirty = i + 1;
- 
+
+                pg->u.free.scrub_state = 0;
+
                 if ( preempt || (node_need_scrub[node] == 0) )
                     goto out;
             }
@@ -1137,6 +1218,8 @@ bool scrub_free_pages(void)
 
  out:
     spin_unlock(&heap_lock);
+
+ out_nolock:
     node_clear(node, node_scrubbing);
     return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
 }
@@ -1175,6 +1258,8 @@ static void free_heap_pages(
         if ( page_state_is(&pg[i], offlined) )
             tainted = 1;
 
+        pg[i].u.free.scrub_state = 0;
+
         /* If a page has no owner it will need no safety TLB flush. */
         pg[i].u.free.need_tlbflush = (page_get_owner(&pg[i]) != NULL);
         if ( pg[i].u.free.need_tlbflush )
@@ -1218,6 +1303,8 @@ static void free_heap_pages(
                  (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
 
+            check_and_stop_scrub(predecessor);
+
             page_list_del(predecessor, &heap(node, zone, order));
 
             if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
@@ -1240,6 +1327,9 @@ static void free_heap_pages(
                  (PFN_ORDER(successor) != order) ||
                  (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
+
+            check_and_stop_scrub(successor);
+
             page_list_del(successor, &heap(node, zone, order));
 
             need_scrub |= (successor->u.free.first_dirty != INVALID_DIRTY_IDX);
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 38d4fba..148d60b 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -46,6 +46,10 @@ struct page_info
             /* Index of the first *possibly* unscrubbed page in the buddy. */
 #define INVALID_DIRTY_IDX -1U
             unsigned int first_dirty;
+#define PAGE_SCRUBBING    (1<<0)
+#define PAGE_SCRUB_ABORT  (1<<1)
+            unsigned char scrub_state;
+
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
         } free;
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index e20f161..f46b242 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -90,6 +90,10 @@ struct page_info
             /* Index of the first *possibly* unscrubbed page in the buddy. */
 #define INVALID_DIRTY_IDX -1U
             unsigned int first_dirty;
+#define PAGE_SCRUBBING    (1<<0)
+#define PAGE_SCRUB_ABORT  (1<<1)
+            unsigned char scrub_state;
+
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
         } free;
-- 
1.8.3.1


* [PATCH v4 7/8] mm: Print number of unscrubbed pages in 'H' debug handler
  2017-05-19 15:50 [PATCH v4 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2017-05-19 15:50 ` [PATCH v4 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
@ 2017-05-19 15:50 ` Boris Ostrovsky
  2017-05-19 15:50 ` [PATCH v4 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
  7 siblings, 0 replies; 31+ messages in thread
From: Boris Ostrovsky @ 2017-05-19 15:50 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index b365305..e744d81 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2292,6 +2292,13 @@ static void dump_heap(unsigned char key)
             printk("heap[node=%d][zone=%d] -> %lu pages\n",
                    i, j, avail[i][j]);
     }
+
+    for ( i = 0; i < MAX_NUMNODES; i++ )
+    {
+        if ( !node_need_scrub[i] )
+            continue;
+        printk("Node %d has %lu unscrubbed pages\n", i, node_need_scrub[i]);
+    }
 }
 
 static __init int register_heap_trigger(void)
-- 
1.8.3.1



* [PATCH v4 8/8] mm: Make sure pages are scrubbed
  2017-05-19 15:50 [PATCH v4 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2017-05-19 15:50 ` [PATCH v4 7/8] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
@ 2017-05-19 15:50 ` Boris Ostrovsky
  2017-06-12  8:43   ` Jan Beulich
  7 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-05-19 15:50 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Add a debug Kconfig option that will make page allocator verify
that pages that were supposed to be scrubbed are, in fact, clean.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v4:
* Don't (debug-)scrub (and don't check for poison) before bootscrub completes
* Adjust scrub pattern

 xen/Kconfig.debug       |  7 ++++++
 xen/common/page_alloc.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index 689f297..adc4162 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
 	  logged in the Xen ring buffer.
 	  If unsure, say N here.
 
+config SCRUB_DEBUG
+    bool "Page scrubbing test"
+    default DEBUG
+    ---help---
+	  Verify that pages that need to be scrubbed before being allocated to
+	  a guest are indeed scrubbed.
+
 endif # DEBUG || EXPERT
 
 endmenu
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index e744d81..c1ac26d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -170,6 +170,10 @@ boolean_param("bootscrub", opt_bootscrub);
 static unsigned long __initdata opt_bootscrub_chunk = MB(128);
 size_param("bootscrub_chunk", opt_bootscrub_chunk);
 
+#ifdef CONFIG_SCRUB_DEBUG
+static bool boot_scrub_done;
+#endif
+
 /*
  * Bit width of the DMA heap -- used to override NUMA-node-first.
  * allocation strategy, which can otherwise exhaust low memory.
@@ -694,6 +698,39 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+/* SCRUB_PATTERN needs to be a repeating series of bytes. */
+#define SCRUB_PATTERN        0xc2c2c2c2c2c2c2c2ULL
+#define SCRUB_BYTE_PATTERN   (SCRUB_PATTERN & 0xff)
+
+static void poison_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    uint64_t *ptr;
+
+    ptr = map_domain_page(mfn);
+    *ptr = ~SCRUB_PATTERN;
+    unmap_domain_page(ptr);
+#endif
+}
+
+static void check_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    uint64_t *ptr;
+    unsigned i;
+
+    if ( !boot_scrub_done )
+        return;
+
+    ptr = map_domain_page(mfn);
+    for ( i = 0; i < PAGE_SIZE / sizeof (*ptr); i++ )
+        ASSERT(ptr[i] == SCRUB_PATTERN);
+    unmap_domain_page(ptr);
+#endif
+}
+
 static void check_and_stop_scrub(struct page_info *head)
 {
     if ( head->u.free.scrub_state & PAGE_SCRUBBING )
@@ -919,6 +956,9 @@ static struct page_info *alloc_heap_pages(
          * guest can control its own visibility of/through the cache.
          */
         flush_page_to_ram(page_to_mfn(&pg[i]));
+
+        if ( !(memflags & MEMF_no_scrub) )
+            check_one_page(&pg[i]); 
     }
 
     spin_unlock(&heap_lock);
@@ -1270,7 +1310,10 @@ static void free_heap_pages(
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
 
         if ( need_scrub )
+        {
             pg[i].count_info |= PGC_need_scrub;
+            poison_one_page(&pg[i]);
+        }
     }
 
     avail[node][zone] += 1 << order;
@@ -1633,7 +1676,12 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
+#ifndef CONFIG_SCRUB_DEBUG
         free_heap_pages(pg + i, 0, false);
+#else
+        free_heap_pages(pg + i, 0, boot_scrub_done);
+#endif
+	
     }
 }
 
@@ -1899,6 +1947,10 @@ void __init scrub_heap_pages(void)
 
     printk("done.\n");
 
+#ifdef CONFIG_SCRUB_DEBUG
+    boot_scrub_done = true;
+#endif
+
     /* Now that the heap is initialized, run checks and set bounds
      * for the low mem virq algorithm. */
     setup_low_mem_virq();
@@ -2172,12 +2224,16 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
 
             spin_unlock_recursive(&d->page_alloc_lock);
 
+#ifndef CONFIG_SCRUB_DEBUG
             /*
              * Normally we expect a domain to clear pages before freeing them,
              * if it cares about the secrecy of their contents. However, after
              * a domain has died we assume responsibility for erasure.
              */
             scrub = !!d->is_dying;
+#else
+            scrub = true;
+#endif
         }
         else
         {
@@ -2269,7 +2325,8 @@ void scrub_one_page(struct page_info *pg)
 
 #ifndef NDEBUG
     /* Avoid callers relying on allocations returning zeroed pages. */
-    unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE));
+    unmap_domain_page(memset(__map_domain_page(pg),
+                             SCRUB_BYTE_PATTERN, PAGE_SIZE));
 #else
     /* For a production build, clear_page() is the fastest way to scrub. */
     clear_domain_page(_mfn(page_to_mfn(pg)));
-- 
1.8.3.1



* Re: [PATCH v4 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-05-19 15:50 ` [PATCH v4 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-06-09 14:50   ` Jan Beulich
  2017-06-09 20:07     ` Boris Ostrovsky
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2017-06-09 14:50 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
>  static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
>  #define heap(node, zone, order) ((*_heap[node])[zone][order])
>  
> +static unsigned long node_need_scrub[MAX_NUMNODES];

Just as a remark - I think it is inefficient to store per-node data
this way (equally applies to _heap[]); this would better be put
into per-node memory. Since we don't have (and likely also don't
want to have) this_node() / per_node(), perhaps we could
(ab)use per-CPU data for this.
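
For illustration only, one possible (untested) shape of this, keeping the
counter in per-CPU -- hence node-local -- storage; the per_node_need_scrub()
wrapper is made up here, and CPU-less (memory-only) nodes would still need
separate handling:

    /* Hypothetical sketch, not part of the series. */
    static DEFINE_PER_CPU(unsigned long, _node_need_scrub);

    /* Use the copy of the node's first CPU as "the" per-node counter. */
    #define per_node_need_scrub(node) \
        per_cpu(_node_need_scrub, cpumask_first(&node_to_cpumask(node)))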

> @@ -798,11 +814,18 @@ static struct page_info *alloc_heap_pages(
>      return NULL;
>  
>   found: 
> +    need_scrub = (pg->u.free.first_dirty != INVALID_DIRTY_IDX);
> +
>      /* We may have to halve the chunk a number of times. */
>      while ( j != order )
>      {
> -        PFN_ORDER(pg) = --j;
> -        page_list_add_tail(pg, &heap(node, zone, j));
> +        /*
> +         * Some of the sub-chunks may be clean but we will mark them
> +         * as dirty (if need_scrub is set) to avoid traversing the
> +         * list here.
> +         */
> +        page_list_add_scrub(pg, node, zone, --j,
> +                            need_scrub ? 0 : INVALID_DIRTY_IDX);

I suppose this is going to be improved in subsequent patches,
and hence the comment will be gone by the end of the series?

> @@ -851,11 +874,20 @@ static int reserve_offlined_page(struct page_info *head)
>      int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
>      struct page_info *cur_head;
>      int cur_order;
> +    bool need_scrub;
>  
>      ASSERT(spin_is_locked(&heap_lock));
>  
>      cur_head = head;
>  
> +    /*
> +     * We may break the buddy so let's mark the head as clean. Then, when
> +     * merging chunks back into the heap, we will see whether the chunk has
> +     * unscrubbed pages and set its first_dirty properly.
> +     */
> +    need_scrub = (head->u.free.first_dirty != INVALID_DIRTY_IDX);
> +    head->u.free.first_dirty = INVALID_DIRTY_IDX;
> +
>      page_list_del(head, &heap(node, zone, head_order));
>  
>      while ( cur_head < (head + (1 << head_order)) )
> @@ -873,6 +905,8 @@ static int reserve_offlined_page(struct page_info *head)
>  
>          while ( cur_order < head_order )
>          {
> +            unsigned int first_dirty = INVALID_DIRTY_IDX;
> +
>              next_order = cur_order + 1;
>  
>              if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
> @@ -892,8 +926,20 @@ static int reserve_offlined_page(struct page_info *head)
>              {
>              merge:
>                  /* We don't consider merging outside the head_order. */
> -                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
> -                PFN_ORDER(cur_head) = cur_order;
> +
> +                /* See if any of the pages indeed need scrubbing. */
> +                if ( need_scrub )
> +                {
> +                    for ( i = 0; i < (1 << cur_order); i++ )
> +                        if ( test_bit(_PGC_need_scrub,
> +                                      &cur_head[i].count_info) )
> +                        {
> +                            first_dirty = i;
> +                            break;
> +                        }
> +                }

This is inefficient - at the point you set need_scrub you could
instead latch the first page needing to be scrubbed, and start
the loop only there (and in some cases you would get away
without entering the loop altogether).
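
For illustration, something along these lines (untested sketch;
head_first_dirty is a made-up local, latched where need_scrub is computed
today):

    unsigned int head_first_dirty = head->u.free.first_dirty;
    ...
            merge:
                /* See if any of the pages indeed need scrubbing. */
                if ( head_first_dirty != INVALID_DIRTY_IDX &&
                     head_first_dirty < (cur_head - head) + (1 << cur_order) )
                {
                    /* Start scanning at the latched index ... */
                    unsigned int start = (head_first_dirty > cur_head - head)
                                         ? head_first_dirty - (cur_head - head)
                                         : 0;

                    for ( i = start; i < (1 << cur_order); i++ )
                        if ( test_bit(_PGC_need_scrub,
                                      &cur_head[i].count_info) )
                        {
                            first_dirty = i;
                            break;
                        }
                }
                /* ... and skip the loop when it lies beyond this sub-chunk. */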

> @@ -919,9 +965,52 @@ static int reserve_offlined_page(struct page_info *head)
>      return count;
>  }
>  
> +static void scrub_free_pages(unsigned int node)
> +{
> +    struct page_info *pg;
> +    unsigned int zone;
> +
> +    ASSERT(spin_is_locked(&heap_lock));
> +
> +    if ( !node_need_scrub[node] )
> +        return;
> +
> +    for ( zone = 0; zone < NR_ZONES; zone++ )
> +    {
> +        unsigned int order = MAX_ORDER;
> +        do {

Blank line between declaration(s) and statement(s) please.

Also, considering that this entire function runs with the heap lock
held, I think we really need to split that one into per-node ones.
But of course, as with the NUMA locality related remark, this isn't
a request to necessarily do this in the series here.

> @@ -977,35 +1076,54 @@ static void free_heap_pages(
>  
>          if ( (page_to_mfn(pg) & mask) )
>          {
> +            struct page_info *predecessor = pg - mask;
> +
>              /* Merge with predecessor block? */
> -            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
> -                 !page_state_is(pg-mask, free) ||
> -                 (PFN_ORDER(pg-mask) != order) ||
> -                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
> +            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
> +                 !page_state_is(predecessor, free) ||
> +                 (PFN_ORDER(predecessor) != order) ||
> +                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
>                  break;
> -            pg -= mask;
> -            page_list_del(pg, &heap(node, zone, order));
> +
> +            page_list_del(predecessor, &heap(node, zone, order));
> +
> +            if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
> +                need_scrub = true;
> +                /* ... and keep predecessor's first_dirty. */
> +            else if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
> +                predecessor->u.free.first_dirty = (1U << order) +
> +                                                  pg->u.free.first_dirty;
> +
> +            pg->u.free.first_dirty = INVALID_DIRTY_IDX;

Is this really needed?

> +            pg = predecessor;
>          }
>          else
>          {
> +            struct page_info *successor = pg + mask;
> +
>              /* Merge with successor block? */
> -            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
> -                 !page_state_is(pg+mask, free) ||
> -                 (PFN_ORDER(pg+mask) != order) ||
> -                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
> +            if ( !mfn_valid(_mfn(page_to_mfn(successor))) ||
> +                 !page_state_is(successor, free) ||
> +                 (PFN_ORDER(successor) != order) ||
> +                 (phys_to_nid(page_to_maddr(successor)) != node) )
>                  break;
> -            page_list_del(pg + mask, &heap(node, zone, order));
> +            page_list_del(successor, &heap(node, zone, order));
> +
> +            need_scrub |= (successor->u.free.first_dirty != INVALID_DIRTY_IDX);

|= on a boolean is slightly odd - perhaps better us if() just like you
do on the other path?

> +            successor->u.free.first_dirty = INVALID_DIRTY_IDX;

Same here - is this really needed?

> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -87,6 +87,9 @@ struct page_info
>  
>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>          struct {
> +            /* Index of the first *possibly* unscrubbed page in the buddy. */
> +#define INVALID_DIRTY_IDX -1U

This needs parentheses and would better be either ~0U or UINT_MAX.

> +            unsigned int first_dirty;

On x86 this change is fine at present, albeit not optimal. Its ARM
equivalent, however, grows struct page_info in the 32-bit case,
which I don't think is wanted or needed. You really only need
MAX_ORDER+1 bits here, so I'd suggest making this a bit field
(also on x86, to at once make obvious how many bits remain
unused), and perhaps making need_tlbflush a single bit at once
(unless its address is being taken somewhere).
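
For illustration, roughly (untested; the field width and the matching
INVALID_DIRTY_IDX value are only one possible choice and would need to be
kept in sync):

        /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
        struct {
            /*
             * Index of the first *possibly* unscrubbed page in the buddy.
             * MAX_ORDER + 1 bits leave room for an out-of-range "invalid"
             * marker.
             */
    #define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
            unsigned long first_dirty:MAX_ORDER + 1;

            /* Do TLBs need flushing for safety before next page use? */
            bool need_tlbflush:1;
        } free;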

Jan


* Re: [PATCH v4 2/8] mm: Extract allocation loop from alloc_heap_pages()
  2017-05-19 15:50 ` [PATCH v4 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
@ 2017-06-09 15:08   ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2017-06-09 15:08 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -694,22 +694,15 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
>          page_list_add(pg, &heap(node, zone, order));
>  }
>  
> -/* Allocate 2^@order contiguous pages. */
> -static struct page_info *alloc_heap_pages(
> -    unsigned int zone_lo, unsigned int zone_hi,
> -    unsigned int order, unsigned int memflags,
> -    struct domain *d)
> +static struct page_info *get_free_buddy(unsigned int zone_lo,
> +                                        unsigned int zone_hi,
> +                                        unsigned int order, unsigned int memflags,
> +                                        struct domain *d)

Did you check whether this can be const here?

>  {
> -    unsigned int i, j, zone = 0, nodemask_retry = 0;
>      nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
> -    unsigned long request = 1UL << order;
> +    nodemask_t nodemask = d ? d->node_affinity : node_online_map;
> +    unsigned int j, zone, nodemask_retry = 0, request = 1UL << order;

There's a type mismatch here - either you mean 1U or request's
type should be unsigned long (following the old code it should be
the latter). Otoh it's not immediately visible from the patch
whether in both functions this really still is a useful local variable
at all.

With these two taken care of
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan



* Re: [PATCH v4 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-05-19 15:50 ` [PATCH v4 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
@ 2017-06-09 15:22   ` Jan Beulich
  2017-06-09 20:55     ` Boris Ostrovsky
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2017-06-09 15:22 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> @@ -734,8 +735,15 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
>  
>              /* Find smallest order which can satisfy the request. */
>              for ( j = order; j <= MAX_ORDER; j++ )
> +            {
>                  if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
> -                    return pg;
> +                {
> +                    if ( (order == 0) || use_unscrubbed ||

Why is order 0 being special cased here? If this really is intended, a
comment should be added.

> @@ -821,9 +829,16 @@ static struct page_info *alloc_heap_pages(
>      pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
>      if ( !pg )
>      {
> -        /* No suitable memory blocks. Fail the request. */
> -        spin_unlock(&heap_lock);
> -        return NULL;
> +        /* Try now getting a dirty buddy. */
> +        if ( !(memflags & MEMF_no_scrub) )
> +            pg = get_free_buddy(zone_lo, zone_hi, order,
> +                                memflags | MEMF_no_scrub, d);
> +        if ( !pg )
> +        {
> +            /* No suitable memory blocks. Fail the request. */
> +            spin_unlock(&heap_lock);
> +            return NULL;
> +        }
>      }

I'd appreciate if you avoided the re-indentation by simply
prefixing another if() to the one that's already there.
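
For example (untested):

    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
    /* Try again, this time also accepting a dirty buddy. */
    if ( !pg && !(memflags & MEMF_no_scrub) )
        pg = get_free_buddy(zone_lo, zone_hi, order,
                            memflags | MEMF_no_scrub, d);
    if ( !pg )
    {
        /* No suitable memory blocks. Fail the request. */
        spin_unlock(&heap_lock);
        return NULL;
    }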

> @@ -855,10 +870,24 @@ static struct page_info *alloc_heap_pages(
>      if ( d != NULL )
>          d->last_alloc_node = node;
>  
> +    need_scrub &= !(memflags & MEMF_no_scrub);

Can't this be done right away when need_scrub is being set?

>      for ( i = 0; i < (1 << order); i++ )
>      {
>          /* Reference count must continuously be zero for free pages. */
> -        BUG_ON(pg[i].count_info != PGC_state_free);
> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub ) != PGC_state_free);

Isn't this change needed in one of the earlier patches already?
There also is a stray blank ahead of the first closing paren here.

> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
> +        {
> +            if ( need_scrub )
> +                scrub_one_page(&pg[i]);
> +            node_need_scrub[node]--;
> +            /*
> +             * Technically, we need to set first_dirty to INVALID_DIRTY_IDX
> +             * on buddy's head. However, since we assign pg[i].count_info
> +             * below, we can skip this.
> +             */

This comment is correct only with the current way struct page_info's
fields are unionized. In fact I think the comment is unneeded - the
buddy is being transitioned from free to allocated here, so the field
loses its meaning.

Jan



* Re: [PATCH v4 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-06-09 14:50   ` Jan Beulich
@ 2017-06-09 20:07     ` Boris Ostrovsky
  2017-06-12  6:50       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-06-09 20:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 06/09/2017 10:50 AM, Jan Beulich wrote:
>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
>>  static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
>>  #define heap(node, zone, order) ((*_heap[node])[zone][order])
>>  
>> +static unsigned long node_need_scrub[MAX_NUMNODES];
> Just as a remark - I think it is inefficient to store per-node data
> this way (equally applies to _heap[]); this would better be put
> into per-node memory. Since we don't have (and likely also don't
> want to have) this_node() / per_node(), perhaps we could
> (ab)use per-CPU data for this.

I did think about doing this but then decided against it since I wasn't
sure it's worth the extra space.


>
>> @@ -798,11 +814,18 @@ static struct page_info *alloc_heap_pages(
>>      return NULL;
>>  
>>   found: 
>> +    need_scrub = (pg->u.free.first_dirty != INVALID_DIRTY_IDX);
>> +
>>      /* We may have to halve the chunk a number of times. */
>>      while ( j != order )
>>      {
>> -        PFN_ORDER(pg) = --j;
>> -        page_list_add_tail(pg, &heap(node, zone, j));
>> +        /*
>> +         * Some of the sub-chunks may be clean but we will mark them
>> +         * as dirty (if need_scrub is set) to avoid traversing the
>> +         * list here.
>> +         */
>> +        page_list_add_scrub(pg, node, zone, --j,
>> +                            need_scrub ? 0 : INVALID_DIRTY_IDX);
> I suppose this is going to be improved in subsequent patches,
> and hence the comment will be gone by the end of the series?

No, it stays the same throughout the series. I suppose I could improve
this by tracking the original buddy's first_dirty and calling
page_list_add_scrub(.., false) until we reach that value.
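
Roughly (untested sketch):

    unsigned int first_dirty = pg->u.free.first_dirty;

    /* We may have to halve the chunk a number of times. */
    while ( j != order )
    {
        unsigned int sub = 1U << --j;

        /* The lower half is clean if the first dirty page lies beyond it. */
        page_list_add_scrub(pg, node, zone, j,
                            (first_dirty >= sub) ? INVALID_DIRTY_IDX
                                                 : first_dirty);
        pg += sub;

        /* Keep the index relative to the remaining (upper) part. */
        if ( first_dirty != INVALID_DIRTY_IDX )
            first_dirty = (first_dirty >= sub) ? first_dirty - sub : 0;
    }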


>
>> @@ -919,9 +965,52 @@ static int reserve_offlined_page(struct page_info *head)
>>      return count;
>>  }
>>  
>> +static void scrub_free_pages(unsigned int node)
>> +{
>> +    struct page_info *pg;
>> +    unsigned int zone;
>> +
>> +    ASSERT(spin_is_locked(&heap_lock));
>> +
>> +    if ( !node_need_scrub[node] )
>> +        return;
>> +
>> +    for ( zone = 0; zone < NR_ZONES; zone++ )
>> +    {
>> +        unsigned int order = MAX_ORDER;
>> +        do {
> Blank line between declaration(s) and statement(s) please.
>
> Also, considering that this entire function runs with the heap lock
> held, I think we really need to split that one into per-node ones.
> But of course, as with the NUMA locality related remark, this isn't
> a request to necessarily do this in the series here.

Keep in mind that the last patch drops the lock when doing the actual
scrubbing, so it will get better.

>> +            unsigned int first_dirty;
> On x86 this change is fine at present, albeit not optimal. Its ARM
> equivalent, however, grows struct page_info in the 32-bit case,

It does? I am looking at include/asm-arm/mm.h and I don't see this.

> which I don't think is wanted or needed. You really only need
> MAX_ORDER+1 bits here, so I'd suggest making this a bit field

Just to make sure --- when you say "bit field" you mean masking various
bits in a word, not C language bit fields?

> (also on x86, to at once make obvious how many bits remain
> unused), and perhaps making need_tlbflush a single bit at once
> (unless its address is being taken somewhere).


accumulate_tlbflush() wants the address but I think it can be
modified to handle a single bit.


-boris


* Re: [PATCH v4 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-06-09 15:22   ` Jan Beulich
@ 2017-06-09 20:55     ` Boris Ostrovsky
  2017-06-12  6:54       ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-06-09 20:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 06/09/2017 11:22 AM, Jan Beulich wrote:
>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>> @@ -734,8 +735,15 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
>>  
>>              /* Find smallest order which can satisfy the request. */
>>              for ( j = order; j <= MAX_ORDER; j++ )
>> +            {
>>                  if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
>> -                    return pg;
>> +                {
>> +                    if ( (order == 0) || use_unscrubbed ||
> Why is order 0 being special cased here? If this really is intended, a
> comment should be added.

That's because for a single page it's not worth skipping a dirty buddy.
(It is a pretty arbitrary number, could be <=1 or even <=2, presumably)

I'll add a comment.
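
Something along these lines, perhaps (comment wording is only a suggestion;
the elided part of the condition is as in the patch):

                if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
                {
                    /*
                     * A single dirty page is cheap to scrub on the spot,
                     * so don't pass over an otherwise suitable buddy just
                     * because it still needs scrubbing.  (The order 0
                     * cutoff is arbitrary; <= 1 or <= 2 would do as well.)
                     */
                    if ( (order == 0) || use_unscrubbed || ...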


>> @@ -855,10 +870,24 @@ static struct page_info *alloc_heap_pages(
>>      if ( d != NULL )
>>          d->last_alloc_node = node;
>>  
>> +    need_scrub &= !(memflags & MEMF_no_scrub);
> Can't this be done right away when need_scrub is being set?

No, because we use the earlier assignment to decide how we put
"sub-buddies" back to the heap (dirty or not). Here we use need_scrub to
decide whether to scrub the buddy.

This may change though with the changes that you suggested in the
comments to the first patch.

>
>>      for ( i = 0; i < (1 << order); i++ )
>>      {
>>          /* Reference count must continuously be zero for free pages. */
>> -        BUG_ON(pg[i].count_info != PGC_state_free);
>> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub ) != PGC_state_free);
> Isn't this change needed in one of the earlier patches already?

At this patch level we are still scrubbing in free_heap_pages() so there
is never an unscrubbed page in the allocator. The next patch will switch
to scrubbing from idle loop.

> There also is a stray blank ahead of the first closing paren here.
>
>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>> +        {
>> +            if ( need_scrub )
>> +                scrub_one_page(&pg[i]);
>> +            node_need_scrub[node]--;
>> +            /*
>> +             * Technically, we need to set first_dirty to INVALID_DIRTY_IDX
>> +             * on buddy's head. However, since we assign pg[i].count_info
>> +             * below, we can skip this.
>> +             */
> This comment is correct only with the current way struct page_info's
> fields are unionized. In fact I think the comment is unneeded - the
> buddy is being transitioned from free to allocated here, so the field
> loses its meaning.

That, actually, is exactly what I was trying to say. I can drop the
comment if you feel it is obvious why we don't need to set first_dirty.

-boris



* Re: [PATCH v4 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-06-09 20:07     ` Boris Ostrovsky
@ 2017-06-12  6:50       ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2017-06-12  6:50 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 09.06.17 at 22:07, <boris.ostrovsky@oracle.com> wrote:
> On 06/09/2017 10:50 AM, Jan Beulich wrote:
>>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>>> +            unsigned int first_dirty;
>> On x86 this change is fine at present, albeit not optimal. Its ARM
>> equivalent, however, grows struct page_info in the 32-bit case,
> 
> It does? I am looking at include/asm-arm/mm.h and I don't see this.

The union with field name u is 32 bits on 32-bit ARM right now.
Your change grows it to 64 bits.

>> which I don't think is wanted or needed. You really only need
>> MAX_ORDER+1 bits here, so I'd suggest making this a bit field
> 
> Just to make sure --- when you say "bit field" you mean masking various
> bits in a word, not C language bit fields?

No, actually the latter.

Jan



* Re: [PATCH v4 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-06-09 20:55     ` Boris Ostrovsky
@ 2017-06-12  6:54       ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2017-06-12  6:54 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 09.06.17 at 22:55, <boris.ostrovsky@oracle.com> wrote:
> On 06/09/2017 11:22 AM, Jan Beulich wrote:
>>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>>> @@ -734,8 +735,15 @@ static struct page_info *get_free_buddy(unsigned int 
>>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>> +        {
>>> +            if ( need_scrub )
>>> +                scrub_one_page(&pg[i]);
>>> +            node_need_scrub[node]--;
>>> +            /*
>>> +             * Technically, we need to set first_dirty to INVALID_DIRTY_IDX
>>> +             * on buddy's head. However, since we assign pg[i].count_info
>>> +             * below, we can skip this.
>>> +             */
>> This comment is correct only with the current way struct page_info's
>> fields are unionized. In fact I think the comment is unneeded - the
>> buddy is being transitioned from free to allocated here, so the field
>> loses its meaning.
> 
> That, actually, is exactly what I was trying to say. I can drop the
> comment if you feel it is obvious why we don't need to set first_dirty.

Well, my personal order of preference would be to (a) drop
the comment or then (b) re-word it to express the free ->
allocated transition as the reason explicitly. Others may prefer
a corrected comment over no comment at all ...

Jan



* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-05-19 15:50 ` [PATCH v4 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
@ 2017-06-12  8:08   ` Jan Beulich
  2017-06-12 17:01     ` Boris Ostrovsky
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2017-06-12  8:08 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> Instead of scrubbing pages during guest destruction (from
> free_heap_pages()) do this opportunistically, from the idle loop.

This is too brief for my taste. In particular the re-ordering ...

> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -118,8 +118,9 @@ static void idle_loop(void)
>      {
>          if ( cpu_is_offline(smp_processor_id()) )
>              play_dead();
> -        (*pm_idle)();
>          do_tasklet();
> +        if ( cpu_is_haltable(smp_processor_id()) && !scrub_free_pages() )
> +            (*pm_idle)();
>          do_softirq();

... here (and its correctness / safety) needs an explanation. Not
processing tasklets right after idle wakeup is a not obviously
correct change. Nor is it immediately clear why this needs to be
pulled ahead for your purposes.

> --- a/xen/arch/x86/domain_page.c
> +++ b/xen/arch/x86/domain_page.c
> @@ -18,12 +18,14 @@
>  #include <asm/hardirq.h>
>  #include <asm/setup.h>
>  
> -static struct vcpu *__read_mostly override;
> +static DEFINE_PER_CPU(struct vcpu *, override);
>  
>  static inline struct vcpu *mapcache_current_vcpu(void)
>  {
> +    struct vcpu *v, *this_vcpu = this_cpu(override);
> +
>      /* In the common case we use the mapcache of the running VCPU. */
> -    struct vcpu *v = override ?: current;
> +    v = this_vcpu ?: current;

What's wrong with

    struct vcpu *v = this_cpu(override) ?: current;

? And this, btw, is another change which should have an explanation
in the commit message.

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1010,15 +1010,79 @@ static int reserve_offlined_page(struct page_info *head)
>      return count;
>  }
>  
> -static void scrub_free_pages(unsigned int node)
> +static nodemask_t node_scrubbing;
> +
> +/*
> + * If get_node is true this will return closest node that needs to be scrubbed,
> + * with appropriate bit in node_scrubbing set.
> + * If get_node is not set, this will return *a* node that needs to be scrubbed.
> + * node_scrubbing bitmask will not be updated.
> + * If no node needs scrubbing then NUMA_NO_NODE is returned.
> + */
> +static unsigned int node_to_scrub(bool get_node)
>  {
> -    struct page_info *pg;
> -    unsigned int zone;
> +    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
> +    nodeid_t closest = NUMA_NO_NODE;
> +    u8 dist, shortest = 0xff;
>  
> -    ASSERT(spin_is_locked(&heap_lock));
> +    if ( node == NUMA_NO_NODE )
> +        node = 0;
>  
> -    if ( !node_need_scrub[node] )
> -        return;
> +    if ( node_need_scrub[node] &&
> +         (!get_node || !node_test_and_set(node, node_scrubbing)) )
> +        return node;
> +
> +    /*
> +     * See if there are memory-only nodes that need scrubbing and choose
> +     * the closest one.
> +     */
> +    local_node = node;
> +    while ( 1 )

As some compilers / compiler versions warn about such constructs
("condition is always true"), I generally recommend using "for ( ; ; )"
instead.

> +    {
> +        do {
> +            node = cycle_node(node, node_online_map);
> +        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
> +                  (node != local_node) );
> +
> +        if ( node == local_node )
> +            break;
> +
> +        if ( node_need_scrub[node] )
> +        {
> +            if ( !get_node )
> +                return node;
> +
> +            dist = __node_distance(local_node, node);
> +            if ( dist < shortest || closest == NUMA_NO_NODE )
> +            {
> +                if ( !node_test_and_set(node, node_scrubbing) )
> +                {
> +                    if ( closest != NUMA_NO_NODE )
> +                        node_clear(closest, node_scrubbing);

You call this function with no locks held, yet you temporarily set a
bit in node_scrubbing which you then may clear again here. That'll
prevent another idle CPU to do scrubbing on this node afaict,
which, while not a bug, is not optimal. Therefore I think for this
second part of the function you actually want to acquire the heap
lock.

> +                    shortest = dist;
> +                    closest = node;
> +                }
> +            }

Also note that two if()s like the above, when the inner one also
has no else, can and should be combined into one.
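
For example, combined (untested):

            if ( (dist < shortest || closest == NUMA_NO_NODE) &&
                 !node_test_and_set(node, node_scrubbing) )
            {
                if ( closest != NUMA_NO_NODE )
                    node_clear(closest, node_scrubbing);
                shortest = dist;
                closest = node;
            }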

> @@ -1035,22 +1099,46 @@ static void scrub_free_pages(unsigned int node)
>  
>                  for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
>                  {
> +                    cnt++;
>                      if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>                      {
>                          scrub_one_page(&pg[i]);
>                          pg[i].count_info &= ~PGC_need_scrub;
>                          node_need_scrub[node]--;
> +                        cnt += 100; /* scrubbed pages add heavier weight. */
>                      }

While it doesn't matter much, I would guess that your intention
really was to either do the "cnt++" in an "else" to this "if()", or
use 99 instead of 100 above.
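
For illustration, the first variant would be:

                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                    {
                        scrub_one_page(&pg[i]);
                        pg[i].count_info &= ~PGC_need_scrub;
                        node_need_scrub[node]--;
                        cnt += 100; /* Scrubbed pages add heavier weight. */
                    }
                    else
                        cnt++;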

> -                }
>  
> -                page_list_del(pg, &heap(node, zone, order));
> -                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> +                    /*
> +                     * Scrub a few (8) pages before becoming eligible for
> +                     * preemtion. But also count non-scrubbing loop iteration

"preemption" and "iterations"

> +                     * so that we don't get stuck here with an almost clean
> +                     * heap.
> +                     */
> +                    if ( softirq_pending(cpu) && cnt > 800 )

Please switch both sides of the &&.

> +                    {
> +                        preempt = true;
> +                        break;
> +                    }
> +                }
>  
> -                if ( node_need_scrub[node] == 0 )
> -                    return;
> +                if ( i == (1U << order) )
> +                {
> +                    page_list_del(pg, &heap(node, zone, order));
> +                    page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> +                }
> +                else
> +                    pg->u.free.first_dirty = i + 1;

This seems wrong if you set "preempt" after scrubbing the last page
of a buddy - in that case the increment of i in the loop header is being
skipped yet the entire buddy is now clean, so the if() condition here
wants to be "i >= (1U << order) - 1" afaict. In any event it needs to
be impossible for first_dirty to obtain a value of 1U << order here.
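
For illustration (untested):

                /*
                 * When preemption hits right after handling the last page,
                 * "i" never gets incremented past (1U << order) - 1 even
                 * though the whole buddy is clean by now.
                 */
                if ( i >= (1U << order) - 1 )
                {
                    page_list_del(pg, &heap(node, zone, order));
                    page_list_add_scrub(pg, node, zone, order,
                                        INVALID_DIRTY_IDX);
                }
                else
                    pg->u.free.first_dirty = i + 1;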

> +                if ( preempt || (node_need_scrub[node] == 0) )
> +                    goto out;
>              }
>          } while ( order-- != 0 );
>      }
> +
> + out:
> +    spin_unlock(&heap_lock);
> +    node_clear(node, node_scrubbing);

With the memory-less node comment above it may be necessary
to switch these two around.

Jan


* Re: [PATCH v4 5/8] spinlock: Introduce spin_lock_cb()
  2017-05-19 15:50 ` [PATCH v4 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
@ 2017-06-12  8:23   ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2017-06-12  8:23 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> @@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock)
>      while ( tickets.tail != observe_head(&lock->tickets) )
>      {
>          LOCK_PROFILE_BLOCK;
> +        if ( unlikely(cb) )
> +            cb(data);

The description says "periodically", which to me implies every once
in a while, not on every iteration. Not a big problem, bit I wanted to
mention it anyway.
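
For illustration, a rate-limited variant might look like this (untested;
"spins" would be a new local, and the interval of 16 is arbitrary):

    while ( tickets.tail != observe_head(&lock->tickets) )
    {
        LOCK_PROFILE_BLOCK;
        /* Invoke the callback only every 16 iterations or so. */
        if ( unlikely(cb) && !(spins++ & 0xf) )
            cb(data);
        arch_lock_relax();
    }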

Jan



* Re: [PATCH v4 6/8] mm: Keep heap accessible to others while scrubbing
  2017-05-19 15:50 ` [PATCH v4 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
@ 2017-06-12  8:30   ` Jan Beulich
  2017-06-12 17:11     ` Boris Ostrovsky
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2017-06-12  8:30 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> @@ -1090,24 +1131,51 @@ bool scrub_free_pages(void)
>          do {
>              while ( !page_list_empty(&heap(node, zone, order)) )
>              {
> -                unsigned int i;
> +                unsigned int i, dirty_cnt;
> +                struct scrub_wait_state st;
>  
>                  /* Unscrubbed pages are always at the end of the list. */
>                  pg = page_list_last(&heap(node, zone, order));
>                  if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
>                      break;
>  
> +                ASSERT(!pg->u.free.scrub_state);
> +                pg->u.free.scrub_state = PAGE_SCRUBBING;
> +
> +                spin_unlock(&heap_lock);
> +
> +                dirty_cnt = 0;
>                  for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
>                  {
>                      cnt++;
>                      if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>                      {
>                          scrub_one_page(&pg[i]);
> +                        /*
> +                         * We can modify count_info without holding heap
> +                         * lock since we effectively locked this buddy by
> +                         * setting its scrub_state.
> +                         */
>                          pg[i].count_info &= ~PGC_need_scrub;
> -                        node_need_scrub[node]--;
> +                        dirty_cnt++;
>                          cnt += 100; /* scrubbed pages add heavier weight. */
>                      }
>  
> +                    if ( pg->u.free.scrub_state & PAGE_SCRUB_ABORT )
> +                    {
> +                        /* Someone wants this chunk. Drop everything. */
> +
> +                        pg->u.free.first_dirty = (i == (1U << order)) ?

Similarly to the earlier patch, this condition is always false
(due to the condition in the loop header) afaict, and ...

> +                            INVALID_DIRTY_IDX : i + 1; 

... you'd again risk storing 1U << order into first_dirty here.

> @@ -1121,6 +1189,17 @@ bool scrub_free_pages(void)
>                      }
>                  }
>  
> +                st.pg = pg;
> +                st.first_dirty = (i == (1UL << order)) ?
> +                    INVALID_DIRTY_IDX : i + 1;

Same here then.

> +                st.drop = false;
> +                spin_lock_cb(&heap_lock, scrub_continue, &st);
> +
> +                node_need_scrub[node] -= dirty_cnt;
> +
> +                if ( st.drop )
> +                    goto out;
> +
>                  if ( i == (1U << order) )
>                  {
>                      page_list_del(pg, &heap(node, zone, order));
> @@ -1128,7 +1207,9 @@ bool scrub_free_pages(void)
>                  }
>                  else
>                      pg->u.free.first_dirty = i + 1;
> - 
> +

Please avoid adding the trailing blank in the first place (in the
earlier patch).

> @@ -1175,6 +1258,8 @@ static void free_heap_pages(
>          if ( page_state_is(&pg[i], offlined) )
>              tainted = 1;
>  
> +        pg[i].u.free.scrub_state = 0;

Is this really needed for every page in the buddy?

Jan



* Re: [PATCH v4 8/8] mm: Make sure pages are scrubbed
  2017-05-19 15:50 ` [PATCH v4 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
@ 2017-06-12  8:43   ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2017-06-12  8:43 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/Kconfig.debug
> +++ b/xen/Kconfig.debug
> @@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
>  	  logged in the Xen ring buffer.
>  	  If unsure, say N here.
>  
> +config SCRUB_DEBUG
> +    bool "Page scrubbing test"
> +    default DEBUG
> +    ---help---

Indentation.

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -170,6 +170,10 @@ boolean_param("bootscrub", opt_bootscrub);
>  static unsigned long __initdata opt_bootscrub_chunk = MB(128);
>  size_param("bootscrub_chunk", opt_bootscrub_chunk);
>  
> +#ifdef CONFIG_SCRUB_DEBUG
> +static bool boot_scrub_done;

It's not all that important as it's debugging code only, but -
__read_mostly?

> +static void check_one_page(struct page_info *pg)
> +{
> +#ifdef CONFIG_SCRUB_DEBUG
> +    mfn_t mfn = _mfn(page_to_mfn(pg));
> +    uint64_t *ptr;

const

> +    unsigned i;

unsigned int

> @@ -2269,7 +2325,8 @@ void scrub_one_page(struct page_info *pg)
>  
>  #ifndef NDEBUG
>      /* Avoid callers relying on allocations returning zeroed pages. */
> -    unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE));
> +    unmap_domain_page(memset(__map_domain_page(pg),
> +                             SCRUB_BYTE_PATTERN, PAGE_SIZE));
>  #else
>      /* For a production build, clear_page() is the fastest way to scrub. */
>      clear_domain_page(_mfn(page_to_mfn(pg)));

With EXPERT=y SCRUB_DEBUG can also be selected for non-debug
builds, in which case they will be zeroed but check_one_page() will
nevertheless check for the non-zero pattern. IOW I think the
pattern needs to be zero for non-debug builds, at which point the
#if here would probably better check whether the pattern is non-
zero.
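
For illustration, one possible (untested) arrangement:

    #ifndef NDEBUG
    /* SCRUB_PATTERN needs to be a repeating series of bytes. */
    #define SCRUB_PATTERN        0xc2c2c2c2c2c2c2c2ULL
    #else
    #define SCRUB_PATTERN        0ULL
    #endif
    #define SCRUB_BYTE_PATTERN   (SCRUB_PATTERN & 0xff)

    /* ... and in scrub_one_page(): */
    #if SCRUB_BYTE_PATTERN
        /* Avoid callers relying on allocations returning zeroed pages. */
        unmap_domain_page(memset(__map_domain_page(pg),
                                 SCRUB_BYTE_PATTERN, PAGE_SIZE));
    #else
        /* For a production build, clear_page() is the fastest way to scrub. */
        clear_domain_page(_mfn(page_to_mfn(pg)));
    #endif

That way check_one_page(), which compares against SCRUB_PATTERN, stays
consistent with what scrub_one_page() actually writes in either kind of
build.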

Jan



* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-12  8:08   ` Jan Beulich
@ 2017-06-12 17:01     ` Boris Ostrovsky
  2017-06-12 21:28       ` Dario Faggioli
  2017-06-13  8:12       ` Jan Beulich
  0 siblings, 2 replies; 31+ messages in thread
From: Boris Ostrovsky @ 2017-06-12 17:01 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson,
	Dario Faggioli, tim, xen-devel, andrew.cooper3

On 06/12/2017 04:08 AM, Jan Beulich wrote:
>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>> Instead of scrubbing pages during guest destruction (from
>> free_heap_pages()) do this opportunistically, from the idle loop.
> This is too brief for my taste. In particular the re-ordering ...
>
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -118,8 +118,9 @@ static void idle_loop(void)
>>      {
>>          if ( cpu_is_offline(smp_processor_id()) )
>>              play_dead();
>> -        (*pm_idle)();
>>          do_tasklet();
>> +        if ( cpu_is_haltable(smp_processor_id()) && !scrub_free_pages() )
>> +            (*pm_idle)();
>>          do_softirq();
> ... here (and its correctness / safety) needs an explanation. Not
> processing tasklets right after idle wakeup is a not obviously
> correct change. Nor is it immediately clear why this needs to be
> pulled ahead for your purposes.

Are you asking for a comment here (i.e. the explanation given by Dario
(added)  in
https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg01215.html)
or are you saying something is wrong?


>
>> +        if ( node_need_scrub[node] )
>> +        {
>> +            if ( !get_node )
>> +                return node;
>> +
>> +            dist = __node_distance(local_node, node);
>> +            if ( dist < shortest || closest == NUMA_NO_NODE )
>> +            {
>> +                if ( !node_test_and_set(node, node_scrubbing) )
>> +                {
>> +                    if ( closest != NUMA_NO_NODE )
>> +                        node_clear(closest, node_scrubbing);
> You call this function with no locks held, yet you temporarily set a
> bit in node_scrubbing which you then may clear again here. That'll
> prevent another idle CPU from doing scrubbing on this node afaict,
> which, while not a bug, is not optimal. Therefore I think for this
> second part of the function you actually want to acquire the heap
> lock.

I actually specifically didn't want to take the heap lock here since we
will be calling this routine quite often and in most cases no scrubbing
will be needed.


-boris


* Re: [PATCH v4 6/8] mm: Keep heap accessible to others while scrubbing
  2017-06-12  8:30   ` Jan Beulich
@ 2017-06-12 17:11     ` Boris Ostrovsky
  0 siblings, 0 replies; 31+ messages in thread
From: Boris Ostrovsky @ 2017-06-12 17:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel


>> @@ -1175,6 +1258,8 @@ static void free_heap_pages(
>>          if ( page_state_is(&pg[i], offlined) )
>>              tainted = 1;
>>  
>> +        pg[i].u.free.scrub_state = 0;
> Is this really needed for every page in the buddy?
>

The concern here is that if we break the buddy (in alloc_heap_pages(),
for example) this may still be set. But perhaps I should just be careful
about clearing this on the head when a (sub-)buddy is added to the heap.
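
E.g. (untested; the function body below is paraphrased from the patch):

    static void page_list_add_scrub(struct page_info *pg, unsigned int node,
                                    unsigned int zone, unsigned int order,
                                    unsigned int first_dirty)
    {
        PFN_ORDER(pg) = order;
        pg->u.free.first_dirty = first_dirty;
        pg->u.free.scrub_state = 0;  /* only the head's state is ever used */

        if ( first_dirty != INVALID_DIRTY_IDX )
            page_list_add_tail(pg, &heap(node, zone, order));
        else
            page_list_add(pg, &heap(node, zone, order));
    }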

-boris


* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-12 17:01     ` Boris Ostrovsky
@ 2017-06-12 21:28       ` Dario Faggioli
  2017-06-13  8:19         ` Jan Beulich
  2017-06-13  8:12       ` Jan Beulich
  1 sibling, 1 reply; 31+ messages in thread
From: Dario Faggioli @ 2017-06-12 21:28 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson, tim,
	xen-devel, andrew.cooper3


On Mon, 2017-06-12 at 13:01 -0400, Boris Ostrovsky wrote:
> On 06/12/2017 04:08 AM, Jan Beulich wrote:
> > > > > On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
> > > 
> > > Instead of scrubbing pages during guest destruction (from
> > > free_heap_pages()) do this opportunistically, from the idle loop.
> > 
> > This is too brief for my taste. In particular the re-ordering ...
> > 
> > > --- a/xen/arch/x86/domain.c
> > > +++ b/xen/arch/x86/domain.c
> > > @@ -118,8 +118,9 @@ static void idle_loop(void)
> > >      {
> > >          if ( cpu_is_offline(smp_processor_id()) )
> > >              play_dead();
> > > -        (*pm_idle)();
> > >          do_tasklet();
> > > +        if ( cpu_is_haltable(smp_processor_id()) &&
> > > !scrub_free_pages() )
> > > +            (*pm_idle)();
> > >          do_softirq();
> > 
> > ... here (and its correctness / safety) needs an explanation. Not
> > processing tasklets right after idle wakeup is a not obviously
> > correct change. 
>
Well, one can also see things the other way round, though. I.e.:
considering that do_tasklet() is here for when we force the idle vcpu
into execution precisely because we want to process tasklets, doing that
only after having tried to sleep is not obviously correct.

And in fact, there's an unwritten (AFAICT) requirement that every
implementation of pm_idle should not actually sleep if there are
tasklets pending.

Truth is, IMO, that we may be here for two reasons: 1) going to sleep
or 2) running tasklets, and the only thing we can do is guessing (and
ordering the call according to such guess) and checking whether we
guessed right or wrong. That is:
 - guess it's 1. Check whether it's really 1. If it is, go ahead with  
    it; if not, go for 2;
 - guess it's 2. Check whether it's really 2. If it is, go ahead with
   it, if not, go for 1;

Now scrubbing is kind of a third reason why we may be here, and doing
as done in the code above (although I'm not super happy with the final
look of the result either), should make all the use cases happy.

Also, what's the scenario where you think this may be problematic?
AFAICT, tasklets are vcpu context, or softirq context. If some softirq
context tasklet work is scheduled for a CPU while it is sleeping,
TASKLET_SOFTIRQ is raised, and the call to do_softirq() --which still
happens right after the wakeup-- will take care of it.

If some vcpu context work is scheduled, SCHEDULE_SOFTIRQ is raised.
do_softirq() will call the scheduler, which will see that there is vcpu
tasklet work to do, and hence confirm in execution the idle vcpu, which
will get to execute do_tasklet().

Anyway...

> > Nor is it immediately clear why this needs to be
> > pulled ahead for your purposes.
> 
> Are you asking for a comment here (i.e. the explanation given by
> Dario
> (added)  in
> https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg01215
> .html)
> or are you saying something is wrong?
> 
...If it's more commenting that's being asked, either in the code or in
the changelog, that would indeed improve things, I agree.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-12 17:01     ` Boris Ostrovsky
  2017-06-12 21:28       ` Dario Faggioli
@ 2017-06-13  8:12       ` Jan Beulich
  2017-06-13 18:20         ` Boris Ostrovsky
  1 sibling, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2017-06-13  8:12 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	Dario Faggioli, ian.jackson, xen-devel

>>> On 12.06.17 at 19:01, <boris.ostrovsky@oracle.com> wrote:
> On 06/12/2017 04:08 AM, Jan Beulich wrote:
>>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>>> Instead of scrubbing pages during guest destruction (from
>>> free_heap_pages()) do this opportunistically, from the idle loop.
>> This is too brief for my taste. In particular the re-ordering ...
>>
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -118,8 +118,9 @@ static void idle_loop(void)
>>>      {
>>>          if ( cpu_is_offline(smp_processor_id()) )
>>>              play_dead();
>>> -        (*pm_idle)();
>>>          do_tasklet();
>>> +        if ( cpu_is_haltable(smp_processor_id()) && !scrub_free_pages() )
>>> +            (*pm_idle)();
>>>          do_softirq();
>> ... here (and its correctness / safety) needs an explanation. Not
>> processing tasklets right after idle wakeup is a not obviously
>> correct change. Nor is it immediately clear why this needs to be
>> pulled ahead for your purposes.
> 
> Are you asking for a comment here (i.e. the explanation given by Dario
> (added)  in
> https://lists.xenproject.org/archives/html/xen-devel/2017-05/msg01215.html)
> or are you saying something is wrong?

To judge whether the change is correct I'd like to have an
explanation in the commit message.

>>> +        if ( node_need_scrub[node] )
>>> +        {
>>> +            if ( !get_node )
>>> +                return node;
>>> +
>>> +            dist = __node_distance(local_node, node);
>>> +            if ( dist < shortest || closest == NUMA_NO_NODE )
>>> +            {
>>> +                if ( !node_test_and_set(node, node_scrubbing) )
>>> +                {
>>> +                    if ( closest != NUMA_NO_NODE )
>>> +                        node_clear(closest, node_scrubbing);
>> You call this function with no locks held, yet you temporarily set a
>> bit in node_scrubbing which you then may clear again here. That'll
>> prevent another idle CPU from doing scrubbing on this node afaict,
>> which, while not a bug, is not optimal. Therefore I think for this
>> second part of the function you actually want to acquire the heap
>> lock.
> 
> I actually specifically didn't want to take the heap lock here since we
> will be calling this routine quite often and in most cases no scrubbing
> will be needed.

I'm not convinced - memory-only nodes shouldn't be that common,
so in a presumably large share of cases we don't even need to
enter this second part of the function. And if a debatable choice
is being made, giving the reason in a short comment would help
reviewers judge for themselves whether indeed the chosen
approach is the lesser of two (or more) possible evils.
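
For illustration only -- a rough sketch of that suggestion, assuming the
loop header and variable names of the hunk quoted above, and assuming all
other updates of node_scrubbing would then move under the same lock:

    /*
     * Sketch of the "second part" with the heap lock held: pick the
     * closest node needing scrubbing first, then set its node_scrubbing
     * bit exactly once, so no transient set-then-clear is ever visible
     * to other idle CPUs.
     */
    spin_lock(&heap_lock);
    for_each_node_mask ( node, node_online_map )
    {
        if ( !node_need_scrub[node] )
            continue;
        dist = __node_distance(local_node, node);
        if ( closest == NUMA_NO_NODE || dist < shortest )
        {
            shortest = dist;
            closest = node;
        }
    }
    if ( closest != NUMA_NO_NODE )
        node_set(closest, node_scrubbing);
    spin_unlock(&heap_lock);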

Jan



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-12 21:28       ` Dario Faggioli
@ 2017-06-13  8:19         ` Jan Beulich
  2017-06-13 18:39           ` Boris Ostrovsky
  0 siblings, 1 reply; 31+ messages in thread
From: Jan Beulich @ 2017-06-13  8:19 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, Boris Ostrovsky

>>> On 12.06.17 at 23:28, <dario.faggioli@citrix.com> wrote:
> On Mon, 2017-06-12 at 13:01 -0400, Boris Ostrovsky wrote:
>> On 06/12/2017 04:08 AM, Jan Beulich wrote:
>> > > > > On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>> > > 
>> > > Instead of scrubbing pages during guest destruction (from
>> > > free_heap_pages()) do this opportunistically, from the idle loop.
>> > 
>> > This is too brief for my taste. In particular the re-ordering ...
>> > 
>> > > --- a/xen/arch/x86/domain.c
>> > > +++ b/xen/arch/x86/domain.c
>> > > @@ -118,8 +118,9 @@ static void idle_loop(void)
>> > >      {
>> > >          if ( cpu_is_offline(smp_processor_id()) )
>> > >              play_dead();
>> > > -        (*pm_idle)();
>> > >          do_tasklet();
>> > > +        if ( cpu_is_haltable(smp_processor_id()) &&
>> > > !scrub_free_pages() )
>> > > +            (*pm_idle)();
>> > >          do_softirq();
>> > 
>> > ... here (and its correctness / safety) needs an explanation. Not
>> > processing tasklets right after idle wakeup is a not obviously
>> > correct change. 
>>
> Well, one can also see things the other way round, though. I.e.:
> considering that do_tasklet() is here for when we force the idle vcpu
> into execution right because we want to process tasklets, doing that
> only after having tried to sleep is not obviously correct.

That's a valid point, but would then call for the re-ordering to
be done in a separate commit with proper explanation.

> And in fact, there's an unwritten (AFAICT) requirement that every
> implementation of pm_idle should not actually sleep if there are
> tasklets pending.

Unwritten or not - that check before actually going to sleep is
quite obviously required, as it needs to be done with interrupts
already disabled (i.e. can't be done _only_ here).
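
(For context, the check in question is the one done inside the idle
handlers themselves, with interrupts off -- roughly along the lines of
x86's default_idle(), sketched here from memory rather than quoted
verbatim:)

    static void default_idle(void)
    {
        local_irq_disable();
        /*
         * cpu_is_haltable() fails if softirq or tasklet work is pending
         * for this CPU, so we never halt with work outstanding.
         */
        if ( cpu_is_haltable(smp_processor_id()) )
            safe_halt();        /* sti; hlt -- wakes on the next interrupt */
        else
            local_irq_enable();
    }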

> Truth is, IMO, that we may be here for two reasons: 1) going to sleep
> or 2) running tasklets, and the only thing we can do is guessing (and
> ordering the call according to such guess) and checking whether we
> guessed right or wrong. That is:
>  - guess it's 1. Check whether it's really 1. If it is, go ahead with  
>     it; if not, go for 2;
>  - guess it's 2. Check whether it's really 2. If it is, go ahead with
>    it, if not, go for 1;
> 
> Now scrubbing is kind of a third reason why we may be here, and doing
> as done in the code above (although I'm not super happy of the final
> look of the result either), should make all the use cases happy.
> 
> Also, what's the scenario where you think this may be problematic?

First of all I'm not sure there's anything problematic here. But with
no explanation given at all, the change also isn't obviously fine, as
it does alter behavior. If there's indeed nothing that can affect
what do_tasklet() would do and that might happen while we're in
the low level idle handler, then fine. But this needs to be proven.

> AFAICT, tasklets are vcpu context, or softirq context. If some softirq
> context tasklet work is scheduled for a CPU while it is sleeping,
> TASKLET_SOFTIRQ is raised, and the call to do_softirq() --which still
> happens right after the wakeup-- will take care of it.
> 
> If some vcpu context work is scheduled, SCHEDULE_SOFTIRQ is raised.
> do_softirq() will call the scheduler, which will see that there is vcpu
> tasklet work to do, and hence keep the idle vcpu in execution, which
> will get to execute do_tasklet().

Right, so something along these lines will need to go into the commit
message.

Jan



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-13  8:12       ` Jan Beulich
@ 2017-06-13 18:20         ` Boris Ostrovsky
  2017-06-14  9:17           ` Jan Beulich
  0 siblings, 1 reply; 31+ messages in thread
From: Boris Ostrovsky @ 2017-06-13 18:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	Dario Faggioli, ian.jackson, xen-devel


>>>> +        if ( node_need_scrub[node] )
>>>> +        {
>>>> +            if ( !get_node )
>>>> +                return node;
>>>> +
>>>> +            dist = __node_distance(local_node, node);
>>>> +            if ( dist < shortest || closest == NUMA_NO_NODE )
>>>> +            {
>>>> +                if ( !node_test_and_set(node, node_scrubbing) )
>>>> +                {
>>>> +                    if ( closest != NUMA_NO_NODE )
>>>> +                        node_clear(closest, node_scrubbing);
>>> You call this function with no locks held, yet you temporarily set a
>>> bit in node_scrubbing which you then may clear again here. That'll
>>> prevent another idle CPU from doing scrubbing on this node afaict,
>>> which, while not a bug, is not optimal. Therefore I think for this
>>> second part of the function you actually want to acquire the heap
>>> lock.
>> I actually specifically didn't want to take the heap lock here since we
>> will be calling this routine quite often and in most cases no scrubbing
>> will be needed.
> I'm not convinced - memory-only nodes shouldn't be that common,
> so in a presumably large share of cases we don't even need to
> enter this second part of the function. And if a debatable choice
> is being made, giving the reason in a short comment would help
> reviewers judge for themselves whether indeed the chosen
> approach is the lesser of two (or more) possible evils.

I realize that CPU-less nodes are rare, but if we are on such a system we
will always be adding pressure on the heap lock, especially on systems
with many CPUs.

Even if a CPU unnecessarily loses its turn to scrub because another
processor held the node and decided not to scrub it, the first CPU can
come back the next time it wakes up from sleep.

Another alternative could be adding a local lock, just for this routine.
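
A sketch of that alternative, with a hypothetical lock name (everything
inside would stay as in the hunk quoted above, minus the
test-and-set/clear dance):

    /* Hypothetical: serialises only the closest-node selection in this
     * routine, so the heap lock itself stays uncontended on this path. */
    static DEFINE_SPINLOCK(node_scrub_lock);

    spin_lock(&node_scrub_lock);
    /* ... existing search over node_need_scrub[], picking the closest
     * node, then a single node_set(closest, node_scrubbing) ... */
    spin_unlock(&node_scrub_lock);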

-boris


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-13  8:19         ` Jan Beulich
@ 2017-06-13 18:39           ` Boris Ostrovsky
  2017-06-13 20:36             ` Dario Faggioli
  2017-06-14  9:18             ` Jan Beulich
  0 siblings, 2 replies; 31+ messages in thread
From: Boris Ostrovsky @ 2017-06-13 18:39 UTC (permalink / raw)
  To: Jan Beulich, Dario Faggioli
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 06/13/2017 04:19 AM, Jan Beulich wrote:
>>>> On 12.06.17 at 23:28, <dario.faggioli@citrix.com> wrote:
>> On Mon, 2017-06-12 at 13:01 -0400, Boris Ostrovsky wrote:
>>> On 06/12/2017 04:08 AM, Jan Beulich wrote:
>>>>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>>>>> Instead of scrubbing pages during guest destruction (from
>>>>> free_heap_pages()) do this opportunistically, from the idle loop.
>>>> This is too brief for my taste. In particular the re-ordering ...
>>>>
>>>>> --- a/xen/arch/x86/domain.c
>>>>> +++ b/xen/arch/x86/domain.c
>>>>> @@ -118,8 +118,9 @@ static void idle_loop(void)
>>>>>      {
>>>>>          if ( cpu_is_offline(smp_processor_id()) )
>>>>>              play_dead();
>>>>> -        (*pm_idle)();
>>>>>          do_tasklet();
>>>>> +        if ( cpu_is_haltable(smp_processor_id()) &&
>>>>> !scrub_free_pages() )
>>>>> +            (*pm_idle)();
>>>>>          do_softirq();
>>>> ... here (and its correctness / safety) needs an explanation. Not
>>>> processing tasklets right after idle wakeup is a not obviously
>>>> correct change. 
>> Well, one can also see things the other way round, though. I.e.:
>> considering that do_tasklet() is here for when we force the idle vcpu
>> into execution right because we want to process tasklets, doing that
>> only after having tried to sleep is not obviously correct.
> That's a valid point, but would then call for the re-ordering to
> be done in a separate commit with proper explanation.
>
>> And in fact, there's an unwritten (AFAICT) requirement that every
>> implementation of pm_idle should not actually sleep if there are
>> tasklets pending.
> Unwritten or not - that check before actually going to sleep is
> quite obviously required, as it needs to be done with interrupts
> already disabled (i.e. can't be done _only_ here).
>
>> Truth is, IMO, that we may be here for two reasons: 1) going to sleep
>> or 2) running tasklets, and the only thing we can do is guessing (and
>> ordering the call according to such guess) and checking whether we
>> guessed right or wrong. That is:
>>  - guess it's 1. Check whether it's really 1. If it is, go ahead with  
>>     it; if not, go for 2;
>>  - guess it's 2. Check whether it's really 2. If it is, go ahead with
>>    it, if not, go for 1;
>>
>> Now scrubbing is kind of a third reason why we may be here, and doing
>> as done in the code above (although I'm not super happy of the final
>> look of the result either), should make all the use cases happy.
>>
>> Also, what's the scenario where you think this may be problematic?
> First of all I'm not sure there's anything problematic here. But with
> no explanation given at all, the change also isn't obviously fine, as
> it does alter behavior. If there's indeed nothing that can affect
> what do_tasklet() would do and that might happen while we're in
> the low level idle handler, then fine. But this needs to be proven.
>
>> AFAICT, tasklets are vcpu context, or softirq context. If some softirq
>> context tasklet work is scheduled for a CPU while it is sleeping,
>> TASKLET_SOFTIRQ is raised, and the call to do_softirq() --which still
>> happens right after the wakeup-- will take care of it.
>>
>> If some vcpu context work is scheduled, SCHEDULE_SOFTIRQ is raised.
>> do_softirq() will call the scheduler, which will see that there is vcpu
>> tasklet work to do, and hence keep the idle vcpu in execution, which
>> will get to execute do_tasklet().
> Right, so something along these lines will need to go into the commit
> message.


So would you then prefer to separate this into two patches, with the
first just moving do_tasklet() above sleeping?

-boris



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-13 18:39           ` Boris Ostrovsky
@ 2017-06-13 20:36             ` Dario Faggioli
  2017-06-13 21:54               ` Boris Ostrovsky
  2017-06-14  9:18             ` Jan Beulich
  1 sibling, 1 reply; 31+ messages in thread
From: Dario Faggioli @ 2017-06-13 20:36 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel



On Tue, 2017-06-13 at 14:39 -0400, Boris Ostrovsky wrote:
> On 06/13/2017 04:19 AM, Jan Beulich wrote:
> > > > > On 12.06.17 at 23:28, <dario.faggioli@citrix.com> wrote:
> > > If some vcpu context work is scheduled, SCHEDULE_SOFTIRQ is
> > > raised.
> > > do_softirq() will call the scheduler, which will see that there
> > > is vcpu
> > > tasklet work to do, and hence keep the idle vcpu in execution,
> > > which
> > > will get to execute do_tasklet().
> > 
> > Right, so something along these lines will need to go into the
> > commit
> > message.
> 
> 
> So would you then prefer to separate this into two patches, with the
> first just moving do_tasklet() above sleeping?
> 
If you want, I can send a patch to this effect (which can then either
just go in, or Boris can carry it in this series).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-13 20:36             ` Dario Faggioli
@ 2017-06-13 21:54               ` Boris Ostrovsky
  0 siblings, 0 replies; 31+ messages in thread
From: Boris Ostrovsky @ 2017-06-13 21:54 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel



On 06/13/2017 04:36 PM, Dario Faggioli wrote:
> On Tue, 2017-06-13 at 14:39 -0400, Boris Ostrovsky wrote:
>> On 06/13/2017 04:19 AM, Jan Beulich wrote:
>>>>>> On 12.06.17 at 23:28, <dario.faggioli@citrix.com> wrote:
>>>> If some vcpu context work is scheduled, SCHEDULE_SOFTIRQ is
>>>> raised.
>>>> do_softirq() will call the scheduler, which will see that there
>>>> is vcpu
>>>> tasklet work to do, and hence keep the idle vcpu in execution,
>>>> which
>>>> will get to execute do_tasklet().
>>> Right, so something along these lines will need to go into the
>>> commit
>>> message.
>>
>> So would you then prefer to separate this into two patches, with the
>> first just moving do_tasklet() above sleeping?
>>
> If you want, I can send a patch to this effect (which can then either
> just go in, or Boris can carry it in this series).

Thanks for the offer, Dario.

We can do it either way, but FYI I am leaving for ~3 weeks next Friday, so
I am not sure I will be able to have this series ready and tested by then.

-boris




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-13 18:20         ` Boris Ostrovsky
@ 2017-06-14  9:17           ` Jan Beulich
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2017-06-14  9:17 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	Dario Faggioli, ian.jackson, xen-devel

>>> On 13.06.17 at 20:20, <boris.ostrovsky@oracle.com> wrote:

>>>>> +        if ( node_need_scrub[node] )
>>>>> +        {
>>>>> +            if ( !get_node )
>>>>> +                return node;
>>>>> +
>>>>> +            dist = __node_distance(local_node, node);
>>>>> +            if ( dist < shortest || closest == NUMA_NO_NODE )
>>>>> +            {
>>>>> +                if ( !node_test_and_set(node, node_scrubbing) )
>>>>> +                {
>>>>> +                    if ( closest != NUMA_NO_NODE )
>>>>> +                        node_clear(closest, node_scrubbing);
>>>> You call this function with no locks held, yet you temporarily set a
>>>> bit in node_scrubbing which you then may clear again here. That'll
>>>> prevent another idle CPU from doing scrubbing on this node afaict,
>>>> which, while not a bug, is not optimal. Therefore I think for this
>>>> second part of the function you actually want to acquire the heap
>>>> lock.
>>> I actually specifically didn't want to take the heap lock here since we
>>> will be calling this routine quite often and in most cases no scrubbing
>>> will be needed.
>> I'm not convinced - memory-only nodes shouldn't be that common,
>> so in a presumably large share of cases we don't even need to
>> enter this second part of the function. And if a debatable choice
>> is being made, giving the reason in a short comment would help
>> reviewers judge for themselves whether indeed the chosen
>> approach is the lesser of two (or more) possible evils.
> 
> I realize that CPU-less nodes are rare, but if we are on such a system we
> will always be adding pressure on the heap lock, especially on systems
> with many CPUs.
> 
> Even if a CPU unnecessarily loses its turn to scrub because another
> processor held the node and decided not to scrub it, the first CPU can
> come back the next time it wakes up from sleep.
> 
> Another alternative could be adding a local lock, just for this routine.

If no accesses elsewhere are affected, that's certainly an option.
But no matter what route you go, please reason about it in the
commit message.

Jan



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v4 4/8] mm: Scrub memory from idle loop
  2017-06-13 18:39           ` Boris Ostrovsky
  2017-06-13 20:36             ` Dario Faggioli
@ 2017-06-14  9:18             ` Jan Beulich
  1 sibling, 0 replies; 31+ messages in thread
From: Jan Beulich @ 2017-06-14  9:18 UTC (permalink / raw)
  To: Dario Faggioli, Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 13.06.17 at 20:39, <boris.ostrovsky@oracle.com> wrote:
> On 06/13/2017 04:19 AM, Jan Beulich wrote:
>>>>> On 12.06.17 at 23:28, <dario.faggioli@citrix.com> wrote:
>>> On Mon, 2017-06-12 at 13:01 -0400, Boris Ostrovsky wrote:
>>>> On 06/12/2017 04:08 AM, Jan Beulich wrote:
>>>>>>>> On 19.05.17 at 17:50, <boris.ostrovsky@oracle.com> wrote:
>>>>>> Instead of scrubbing pages during guest destruction (from
>>>>>> free_heap_pages()) do this opportunistically, from the idle loop.
>>>>> This is too brief for my taste. In particular the re-ordering ...
>>>>>
>>>>>> --- a/xen/arch/x86/domain.c
>>>>>> +++ b/xen/arch/x86/domain.c
>>>>>> @@ -118,8 +118,9 @@ static void idle_loop(void)
>>>>>>      {
>>>>>>          if ( cpu_is_offline(smp_processor_id()) )
>>>>>>              play_dead();
>>>>>> -        (*pm_idle)();
>>>>>>          do_tasklet();
>>>>>> +        if ( cpu_is_haltable(smp_processor_id()) &&
>>>>>> !scrub_free_pages() )
>>>>>> +            (*pm_idle)();
>>>>>>          do_softirq();
>>>>> ... here (and its correctness / safety) needs an explanation. Not
>>>>> processing tasklets right after idle wakeup is a not obviously
>>>>> correct change. 
>>> Well, one can also see things the other way round, though. I.e.:
>>> considering that do_tasklet() is here for when we force the idle vcpu
>>> into execution right because we want to process tasklets, doing that
>>> only after having tried to sleep is not obviously correct.
>> That's a valid point, but would then call for the re-ordering to
>> be done in a separate commit with proper explanation.
>>
>>> And in fact, there's an unwritten (AFAICT) requirement that every
>>> implementation of pm_idle should not actually sleep if there are
>>> tasklets pending.
>> Unwritten or not - that check before actually going to sleep is
>> quite obviously required, as it needs to be done with interrupts
>> already disabled (i.e. can't be done _only_ here).
>>
>>> Truth is, IMO, that we may be here for two reasons: 1) going to sleep
>>> or 2) running tasklets, and the only thing we can do is guessing (and
>>> ordering the call according to such guess) and checking whether we
>>> guessed right or wrong. That is:
>>>  - guess it's 1. Check whether it's really 1. If it is, go ahead with  
>>>     it; if not, go for 2;
>>>  - guess it's 2. Check whether it's really 2. If it is, go ahead with
>>>    it, if not, go for 1;
>>>
>>> Now scrubbing is kind of a third reason why we may be here, and doing
>>> as done in the code above (although I'm not super happy of the final
>>> look of the result either), should make all the use cases happy.
>>>
>>> Also, what's the scenario where you think this may be problematic?
>> First of all I'm not sure there's anything problematic here. But with
>> no explanation given at all, the change also isn't obviously fine, as
>> it does alter behavior. If there's indeed nothing that can affect
>> what do_tasklet() would do and that might happen while we're in
>> the low level idle handler, then fine. But this needs to be proven.
>>
>>> AFAICT, tasklets are vcpu context, or softirq context. If some softirq
>>> context tasklet work is scheduled for a CPU while it is sleeping,
>>> TASKLET_SOFTIRQ is raised, and the call to do_softirq() --which still
>>> happens right after the wakeup-- will take care of it.
>>>
>>> If some vcpu context work is scheduled, SCHEDULE_SOFTIRQ is raised.
>>> do_softirq() will call the scheduler, which will see that there is vcpu
>>> tasklet work to do, and hence keep the idle vcpu in execution, which
>>> will get to execute do_tasklet().
>> Right, so something along these lines will need to go into the commit
>> message.
> 
> 
> So would you then prefer to separate this into two patches, with the
> first just moving do_tasklet() above sleeping?

Yes, please (no matter who of you two does so).

Jan



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2017-06-14  9:18 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-19 15:50 [PATCH v4 0/8] Memory scrubbing from idle loop Boris Ostrovsky
2017-05-19 15:50 ` [PATCH v4 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
2017-06-09 14:50   ` Jan Beulich
2017-06-09 20:07     ` Boris Ostrovsky
2017-06-12  6:50       ` Jan Beulich
2017-05-19 15:50 ` [PATCH v4 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
2017-06-09 15:08   ` Jan Beulich
2017-05-19 15:50 ` [PATCH v4 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
2017-06-09 15:22   ` Jan Beulich
2017-06-09 20:55     ` Boris Ostrovsky
2017-06-12  6:54       ` Jan Beulich
2017-05-19 15:50 ` [PATCH v4 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
2017-06-12  8:08   ` Jan Beulich
2017-06-12 17:01     ` Boris Ostrovsky
2017-06-12 21:28       ` Dario Faggioli
2017-06-13  8:19         ` Jan Beulich
2017-06-13 18:39           ` Boris Ostrovsky
2017-06-13 20:36             ` Dario Faggioli
2017-06-13 21:54               ` Boris Ostrovsky
2017-06-14  9:18             ` Jan Beulich
2017-06-13  8:12       ` Jan Beulich
2017-06-13 18:20         ` Boris Ostrovsky
2017-06-14  9:17           ` Jan Beulich
2017-05-19 15:50 ` [PATCH v4 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
2017-06-12  8:23   ` Jan Beulich
2017-05-19 15:50 ` [PATCH v4 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
2017-06-12  8:30   ` Jan Beulich
2017-06-12 17:11     ` Boris Ostrovsky
2017-05-19 15:50 ` [PATCH v4 7/8] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
2017-05-19 15:50 ` [PATCH v4 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
2017-06-12  8:43   ` Jan Beulich
