* [PATCH v3 0/9] Memory scrubbing from idle loop
@ 2017-04-14 15:37 Boris Ostrovsky
  2017-04-14 15:37 ` [PATCH v3 1/9] mm: Separate free page chunk merging into its own routine Boris Ostrovsky
                   ` (9 more replies)
  0 siblings, 10 replies; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

When a domain is destroyed the hypervisor must scrub the domain's pages before
giving them to another guest in order to prevent leaking the deceased
guest's data. Currently this is done during guest destruction, which can
make the cleanup process very lengthy.

This series adds support for scrubbing released pages from the idle loop,
making guest destruction significantly faster. For example, destroying a
1TB guest can now be completed in 40+ seconds, as opposed to about 9 minutes
with the existing scrubbing algorithm.

Briefly, the new algorithm places dirty pages at the end of the heap's page
list for each node/zone/order so that the full list does not have to be
scanned while searching for dirty pages. One processor from each node checks
whether the node has any dirty pages and, if such pages are found, scrubs
them. Scrubbing itself happens without holding the heap lock, so other users
may access the heap in the meantime. If, while the idle loop is scrubbing a
particular chunk of pages, that chunk is requested by the heap allocator,
scrubbing is stopped immediately.

On the allocation side, alloc_heap_pages() first tries to satisfy the
allocation request using only clean pages. If this is not possible, the
search is repeated and dirty pages are scrubbed by the allocator.
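
In outline the idea looks like the toy model below. This is a stand-alone
illustration only; none of the names in it (add_buddy(), scrub_pass(),
buddy_dirty[]) exist in the Xen source, and the real code of course operates
on the per-node/zone/order buddy lists under the heap lock.

/* Toy model: clean buddies are queued at the head of the free list,
 * dirty ones at the tail, so a scrubber only needs to look at the tail. */
#include <stdbool.h>
#include <stdio.h>

#define NBUDDIES 8

static bool buddy_dirty[NBUDDIES];
static int nr_buddies;

static void add_buddy(bool dirty)
{
    if ( dirty )
        buddy_dirty[nr_buddies++] = true;             /* tail */
    else
    {
        for ( int i = nr_buddies; i > 0; i-- )
            buddy_dirty[i] = buddy_dirty[i - 1];
        buddy_dirty[0] = false;                       /* head */
        nr_buddies++;
    }
}

/* Idle-loop pass: walk backwards from the tail and stop at the first
 * clean buddy, i.e. never scan the whole list. */
static void scrub_pass(void)
{
    for ( int i = nr_buddies - 1; i >= 0 && buddy_dirty[i]; i-- )
    {
        buddy_dirty[i] = false;                       /* "scrub" */
        printf("scrubbed buddy at index %d\n", i);
    }
}

int main(void)
{
    add_buddy(false);
    add_buddy(true);
    add_buddy(true);
    scrub_pass();
    return 0;
}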

This series is somewhat based on earlier work by Bob Liu.

V1:
* Only set PGC_need_scrub bit for the buddy head, thus making it unnecessary
  to scan whole buddy
* Fix spin_lock_cb()
* Scrub CPU-less nodes
* ARM support. Note that I have not been able to test this, only built the
  binary
* Added scrub test patch (last one). Not sure whether it should be considered
  for committing but I have been running with it.

V2:
* merge_chunks() returns new buddy head
* scrub_free_pages() returns softirq pending status in addition to (factored out)
  status of unscrubbed memory
* spin_lock uses inlined spin_lock_cb()
* scrub debugging code checks whole page, not just the first word.

V3:
* Keep dirty bit per page
* Simplify merge_chunks() (now merge_and_free_buddy())
* When scrubbing memory-only nodes, try to find the closest node.

Deferred:
* Per-node heap locks. In addition to (presumably) improving performance in
  general, once they are available we can parallelize scrubbing further by
  allowing more than one core per node to do idle loop scrubbing.
* AVX-based scrubbing
* Use idle loop scrubbing during boot.


Boris Ostrovsky (9):
  mm: Separate free page chunk merging into its own routine
  mm: Place unscrubbed pages at the end of pagelist
  mm: Scrub pages in alloc_heap_pages() if needed
  mm: Scrub memory from idle loop
  mm: Do not discard already-scrubbed pages if softirqs are pending
  spinlock: Introduce spin_lock_cb()
  mm: Keep pages available for allocation while scrubbing
  mm: Print number of unscrubbed pages in 'H' debug handler
  mm: Make sure pages are scrubbed

 xen/Kconfig.debug          |    7 +
 xen/arch/arm/domain.c      |   13 +-
 xen/arch/x86/domain.c      |    3 +-
 xen/common/page_alloc.c    |  529 ++++++++++++++++++++++++++++++++++++++------
 xen/common/spinlock.c      |    9 +-
 xen/include/asm-arm/mm.h   |   10 +
 xen/include/asm-x86/mm.h   |   10 +
 xen/include/xen/mm.h       |    1 +
 xen/include/xen/spinlock.h |    8 +
 9 files changed, 513 insertions(+), 77 deletions(-)



* [PATCH v3 1/9] mm: Separate free page chunk merging into its own routine
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
@ 2017-04-14 15:37 ` Boris Ostrovsky
  2017-05-04  9:45   ` Jan Beulich
  2017-04-14 15:37 ` [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

This is needed for subsequent changes to memory scrubbing.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---

Changes in v3:
* Simplify merge_and_free_buddy() (and drop can_merge())

 xen/common/page_alloc.c |   74 ++++++++++++++++++++++++++---------------------
 1 files changed, 41 insertions(+), 33 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e41fb4..6fe55ee 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -919,11 +919,50 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
+/* Returns new buddy head. */
+static struct page_info *
+merge_and_free_buddy(struct page_info *pg, unsigned int node,
+                     unsigned int zone, unsigned int order)
+{
+    ASSERT(spin_is_locked(&heap_lock));
+
+    /* Merge chunks as far as possible. */
+    while ( order < MAX_ORDER )
+    {
+        unsigned long mask = 1UL << order;
+        struct page_info *buddy;
+
+        if ( (page_to_mfn(pg) & mask) )
+            buddy = pg - mask; /* Merge with predecessor block. */
+        else
+            buddy = pg + mask; /* Merge with successor block. */
+
+        if ( !mfn_valid(_mfn(page_to_mfn(buddy))) ||
+             !page_state_is(buddy, free) ||
+             (PFN_ORDER(buddy) != order) ||
+             (phys_to_nid(page_to_maddr(buddy)) != node) )
+            break;
+
+        page_list_del(buddy, &heap(node, zone, order));
+
+        /* Adjust current buddy head if we merged backwards. */
+        if ( buddy < pg )
+            pg = buddy;
+
+        order++;
+    }
+
+    PFN_ORDER(pg) = order;
+    page_list_add_tail(pg, &heap(node, zone, order));
+
+    return pg;
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
     struct page_info *pg, unsigned int order)
 {
-    unsigned long mask, mfn = page_to_mfn(pg);
+    unsigned long mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
     unsigned int zone = page_to_zone(pg);
 
@@ -970,38 +1009,7 @@ static void free_heap_pages(
         midsize_alloc_zone_pages = max(
             midsize_alloc_zone_pages, total_avail_pages / MIDSIZE_ALLOC_FRAC);
 
-    /* Merge chunks as far as possible. */
-    while ( order < MAX_ORDER )
-    {
-        mask = 1UL << order;
-
-        if ( (page_to_mfn(pg) & mask) )
-        {
-            /* Merge with predecessor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
-                 !page_state_is(pg-mask, free) ||
-                 (PFN_ORDER(pg-mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
-                break;
-            pg -= mask;
-            page_list_del(pg, &heap(node, zone, order));
-        }
-        else
-        {
-            /* Merge with successor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
-                 !page_state_is(pg+mask, free) ||
-                 (PFN_ORDER(pg+mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
-                break;
-            page_list_del(pg + mask, &heap(node, zone, order));
-        }
-
-        order++;
-    }
-
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    pg = merge_and_free_buddy(pg, node, zone, order);
 
     if ( tainted )
         reserve_offlined_page(pg);
-- 
1.7.1



* [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
  2017-04-14 15:37 ` [PATCH v3 1/9] mm: Separate free page chunk merging into its own routine Boris Ostrovsky
@ 2017-04-14 15:37 ` Boris Ostrovsky
  2017-05-04 10:17   ` Jan Beulich
  2017-05-08 16:41   ` George Dunlap
  2017-04-14 15:37 ` [PATCH v3 3/9] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

... so that it's easy to find pages that need to be scrubbed (such pages are
now marked with the _PGC_need_scrub bit).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v3:
* Keep dirty bit per page, add dirty_head to page_info that indicates whether
  the buddy has dirty pages.
* Make page_list_add_scrub() set buddy's page order
* Data type adjustments (int -> unsigned)

 xen/common/page_alloc.c  |  119 +++++++++++++++++++++++++++++++++++++--------
 xen/include/asm-arm/mm.h |    6 ++
 xen/include/asm-x86/mm.h |    6 ++
 3 files changed, 110 insertions(+), 21 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 6fe55ee..9dcf6ee 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,6 +680,20 @@ static void check_low_mem_virq(void)
     }
 }
 
+/* Pages that need scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+                                unsigned int zone, unsigned int order,
+                                bool need_scrub)
+{
+    PFN_ORDER(pg) = order;
+    pg->u.free.dirty_head = need_scrub;
+
+    if ( need_scrub )
+        page_list_add_tail(pg, &heap(node, zone, order));
+    else
+        page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
@@ -802,7 +818,7 @@ static struct page_info *alloc_heap_pages(
     while ( j != order )
     {
         PFN_ORDER(pg) = --j;
-        page_list_add_tail(pg, &heap(node, zone, j));
+        page_list_add(pg, &heap(node, zone, j));
         pg += 1 << j;
     }
 
@@ -851,11 +867,14 @@ static int reserve_offlined_page(struct page_info *head)
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
     int cur_order;
+    bool need_scrub;
 
     ASSERT(spin_is_locked(&heap_lock));
 
     cur_head = head;
 
+    head->u.free.dirty_head = false;
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -892,8 +911,16 @@ static int reserve_offlined_page(struct page_info *head)
             {
             merge:
                 /* We don't consider merging outside the head_order. */
-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
-                PFN_ORDER(cur_head) = cur_order;
+
+                /* See if any of the pages need scrubbing. */
+                need_scrub = false;
+                for ( i = 0; i < (1 << cur_order); i++ )
+                    if ( test_bit(_PGC_need_scrub, &cur_head[i].count_info) )
+                    {
+                        need_scrub = true;
+                        break;
+                    }
+                page_list_add_scrub(cur_head, node, zone, cur_order, need_scrub);
                 cur_head += (1 << cur_order);
                 break;
             }
@@ -922,10 +949,13 @@ static int reserve_offlined_page(struct page_info *head)
 /* Returns new buddy head. */
 static struct page_info *
 merge_and_free_buddy(struct page_info *pg, unsigned int node,
-                     unsigned int zone, unsigned int order)
+                     unsigned int zone, unsigned int order,
+                     bool need_scrub)
 {
     ASSERT(spin_is_locked(&heap_lock));
 
+    pg->u.free.dirty_head = false;
+
     /* Merge chunks as far as possible. */
     while ( order < MAX_ORDER )
     {
@@ -944,6 +974,8 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
             break;
 
         page_list_del(buddy, &heap(node, zone, order));
+        need_scrub |= buddy->u.free.dirty_head;
+        buddy->u.free.dirty_head = false;
 
         /* Adjust current buddy head if we merged backwards. */
         if ( buddy < pg )
@@ -952,15 +984,56 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
         order++;
     }
 
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    page_list_add_scrub(pg, node, zone, order, need_scrub);
 
     return pg;
 }
 
+static void scrub_free_pages(unsigned int node)
+{
+    struct page_info *pg;
+    unsigned int zone, order;
+    unsigned long i;
+
+    ASSERT(spin_is_locked(&heap_lock));
+
+    if ( !node_need_scrub[node] )
+        return;
+
+    for ( zone = 0; zone < NR_ZONES; zone++ )
+    {
+        order = MAX_ORDER;
+        do {
+            while ( !page_list_empty(&heap(node, zone, order)) )
+            {
+                /* Unscrubbed pages are always at the end of the list. */
+                pg = page_list_last(&heap(node, zone, order));
+                if ( !pg->u.free.dirty_head )
+                    break;
+
+                for ( i = 0; i < (1UL << order); i++)
+                {
+                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    {
+                        scrub_one_page(&pg[i]);
+                        pg[i].count_info &= ~PGC_need_scrub;
+                        node_need_scrub[node]--;
+                    }
+                }
+
+                page_list_del(pg, &heap(node, zone, order));
+                merge_and_free_buddy(pg, node, zone, order, false);
+
+                if ( node_need_scrub[node] == 0 )
+                    return;
+            }
+        } while ( order-- != 0 );
+    }
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
-    struct page_info *pg, unsigned int order)
+    struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
@@ -1009,11 +1082,22 @@ static void free_heap_pages(
         midsize_alloc_zone_pages = max(
             midsize_alloc_zone_pages, total_avail_pages / MIDSIZE_ALLOC_FRAC);
 
-    pg = merge_and_free_buddy(pg, node, zone, order);
+    if ( need_scrub )
+    {
+        for ( i = 0; i < (1 << order); i++ )
+            pg[i].count_info |= PGC_need_scrub;
+
+        node_need_scrub[node] += (1UL << order);
+    }
+
+    pg = merge_and_free_buddy(pg, node, zone, order, need_scrub);
 
     if ( tainted )
         reserve_offlined_page(pg);
 
+    if ( need_scrub )
+        scrub_free_pages(node);
+
     spin_unlock(&heap_lock);
 }
 
@@ -1234,7 +1318,7 @@ unsigned int online_page(unsigned long mfn, uint32_t *status)
     spin_unlock(&heap_lock);
 
     if ( (y & PGC_state) == PGC_state_offlined )
-        free_heap_pages(pg, 0);
+        free_heap_pages(pg, 0, false);
 
     return ret;
 }
@@ -1303,7 +1387,7 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg+i, 0);
+        free_heap_pages(pg + i, 0, false);
     }
 }
 
@@ -1630,7 +1714,7 @@ void free_xenheap_pages(void *v, unsigned int order)
 
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order);
+    free_heap_pages(virt_to_page(v), order, false);
 }
 
 #else
@@ -1684,12 +1768,9 @@ void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
-    {
-        scrub_one_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
-    }
 
-    free_heap_pages(pg, order);
+    free_heap_pages(pg, order, true);
 }
 
 #endif
@@ -1798,7 +1879,7 @@ struct page_info *alloc_domheap_pages(
     if ( d && !(memflags & MEMF_no_owner) &&
          assign_pages(d, pg, order, memflags) )
     {
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, false);
         return NULL;
     }
     
@@ -1866,11 +1947,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
-
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 0fef612..abc3f6b 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -45,6 +45,8 @@ struct page_info
         struct {
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
+            /* Set on a buddy head if the buddy has unscrubbed pages. */
+            bool dirty_head;
         } free;
 
     } u;
@@ -115,6 +117,10 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/* Page needs to be scrubbed */
+#define _PGC_need_scrub   PG_shift(10)
+#define PGC_need_scrub    PG_mask(1, 10)
+
 extern unsigned long xenheap_mfn_start, xenheap_mfn_end;
 extern vaddr_t xenheap_virt_end;
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 119d7de..5cf528a 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -89,6 +89,8 @@ struct page_info
         struct {
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
+            /* Set on a buddy head if the buddy has unscrubbed pages. */
+            bool dirty_head;
         } free;
 
     } u;
@@ -233,6 +235,10 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/* Page needs to be scrubbed */
+#define _PGC_need_scrub   PG_shift(10)
+#define PGC_need_scrub    PG_mask(1, 10)
+
 struct spage_info
 {
        unsigned long type_info;
-- 
1.7.1



* [PATCH v3 3/9] mm: Scrub pages in alloc_heap_pages() if needed
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
  2017-04-14 15:37 ` [PATCH v3 1/9] mm: Separate free page chunk merging into its own routine Boris Ostrovsky
  2017-04-14 15:37 ` [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-04-14 15:37 ` Boris Ostrovsky
  2017-05-04 14:44   ` Jan Beulich
  2017-04-14 15:37 ` [PATCH v3 4/9] mm: Scrub memory from idle loop Boris Ostrovsky
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

When allocating pages in alloc_heap_pages(), first look for clean pages. If none
are found then retry, taking pages marked as unscrubbed and scrubbing them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet. However,
this will become possible once we stop scrubbing from free_heap_pages() and
instead do it from the idle loop.
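
As a rough illustration of the resulting allocation flow (a stand-alone toy,
not the Xen code; alloc_page(), page_free[] and page_dirty[] are made up for
this sketch):

#include <stdbool.h>
#include <stdio.h>

#define NPAGES 8

static bool page_free[NPAGES]  = { true, true, true, true,
                                   true, true, true, true };
static bool page_dirty[NPAGES] = { false, false, true, true,
                                   true, true, true, true };

/* First pass accepts only clean pages; if that fails, a second pass
 * accepts dirty pages and scrubs them on the spot. */
static int alloc_page(void)
{
    for ( bool use_unscrubbed = false; ; use_unscrubbed = true )
    {
        for ( int i = 0; i < NPAGES; i++ )
        {
            if ( !page_free[i] )
                continue;
            if ( page_dirty[i] && !use_unscrubbed )
                continue;
            if ( page_dirty[i] )
                page_dirty[i] = false;   /* scrub_one_page() in Xen terms */
            page_free[i] = false;
            return i;
        }
        if ( use_unscrubbed )
            return -1;                   /* nothing left at all */
    }
}

int main(void)
{
    for ( int n = 0; n < 4; n++ )
        printf("allocated page %d\n", alloc_page());
    return 0;
}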

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/common/page_alloc.c |   96 ++++++++++++++++++++++++++++++++++------------
 1 files changed, 71 insertions(+), 25 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9dcf6ee..055654d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -700,34 +700,17 @@ static struct page_info *alloc_heap_pages(
     unsigned int order, unsigned int memflags,
     struct domain *d)
 {
-    unsigned int i, j, zone = 0, nodemask_retry = 0;
-    nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
+    unsigned int i, j, zone, nodemask_retry;
+    nodeid_t first_node, node, req_node;
     unsigned long request = 1UL << order;
     struct page_info *pg;
-    nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-    bool_t need_tlbflush = 0;
+    nodemask_t nodemask;
+    bool need_scrub, need_tlbflush = false, use_unscrubbed = false;
     uint32_t tlbflush_timestamp = 0;
 
     /* Make sure there are enough bits in memflags for nodeID. */
     BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
 
-    if ( node == NUMA_NO_NODE )
-    {
-        if ( d != NULL )
-        {
-            node = next_node(d->last_alloc_node, nodemask);
-            if ( node >= MAX_NUMNODES )
-                node = first_node(nodemask);
-        }
-        if ( node >= MAX_NUMNODES )
-            node = cpu_to_node(smp_processor_id());
-    }
-    first_node = node;
-
-    ASSERT(node < MAX_NUMNODES);
-    ASSERT(zone_lo <= zone_hi);
-    ASSERT(zone_hi < NR_ZONES);
-
     if ( unlikely(order > MAX_ORDER) )
         return NULL;
 
@@ -741,7 +724,10 @@ static struct page_info *alloc_heap_pages(
           total_avail_pages + tmem_freeable_pages()) &&
           ((memflags & MEMF_no_refcount) ||
            !d || d->outstanding_pages < request) )
-        goto not_found;
+    {
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
 
     /*
      * TMEM: When available memory is scarce due to tmem absorbing it, allow
@@ -754,6 +740,28 @@ static struct page_info *alloc_heap_pages(
          tmem_freeable_pages() )
         goto try_tmem;
 
+ again:
+
+    nodemask_retry = 0;
+    nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
+    node = req_node = MEMF_get_node(memflags);
+    if ( node == NUMA_NO_NODE )
+    {
+        if ( d != NULL )
+        {
+            node = next_node(d->last_alloc_node, nodemask);
+            if ( node >= MAX_NUMNODES )
+                node = first_node(nodemask);
+        }
+        if ( node >= MAX_NUMNODES )
+            node = cpu_to_node(smp_processor_id());
+    }
+    first_node = node;
+
+    ASSERT(node < MAX_NUMNODES);
+    ASSERT(zone_lo <= zone_hi);
+    ASSERT(zone_hi < NR_ZONES);
+
     /*
      * Start with requested node, but exhaust all node memory in requested 
      * zone before failing, only calc new node value if we fail to find memory 
@@ -769,8 +777,16 @@ static struct page_info *alloc_heap_pages(
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
+            {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    goto found;
+                {
+                    if ( (order == 0) || use_unscrubbed ||
+                         !pg->u.free.dirty_head )
+                        goto found;
+
+                    page_list_add_tail(pg, &heap(node, zone, j));
+                }
+            }
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
@@ -809,16 +825,32 @@ static struct page_info *alloc_heap_pages(
     }
 
  not_found:
+    /*
+     * If we couldn't find clean page let's search again and this time
+     * take unscrubbed pages if available.
+     */
+    if ( !use_unscrubbed )
+    {
+        use_unscrubbed = true;
+        goto again;
+    }
+
     /* No suitable memory blocks. Fail the request. */
     spin_unlock(&heap_lock);
     return NULL;
 
  found: 
+    need_scrub = pg->u.free.dirty_head;
+
     /* We may have to halve the chunk a number of times. */
     while ( j != order )
     {
-        PFN_ORDER(pg) = --j;
-        page_list_add(pg, &heap(node, zone, j));
+        /*
+         * Some of the sub-chunks may be clean but we will mark them
+         * as dirty (if need_scrub is set) to avoid traversing the
+         * array here.
+         */
+        page_list_add_scrub(pg, node, zone, --j, need_scrub);
         pg += 1 << j;
     }
 
@@ -832,6 +864,20 @@ static struct page_info *alloc_heap_pages(
     if ( d != NULL )
         d->last_alloc_node = node;
 
+    if ( need_scrub )
+    {
+        for ( i = 0; i < (1 << order); i++ )
+        {
+            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+            {
+                scrub_one_page(&pg[i]);
+                pg[i].count_info &= ~PGC_need_scrub;
+                node_need_scrub[node]--;
+            }
+        }
+        pg->u.free.dirty_head = false;
+    }
+
     for ( i = 0; i < (1 << order); i++ )
     {
         /* Reference count must continuously be zero for free pages. */
-- 
1.7.1



* [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2017-04-14 15:37 ` [PATCH v3 3/9] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
@ 2017-04-14 15:37 ` Boris Ostrovsky
  2017-05-04 15:31   ` Jan Beulich
  2017-05-11 10:26   ` Dario Faggioli
  2017-04-14 15:37 ` [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending Boris Ostrovsky
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Instead of scrubbing pages during guest destruction (from
free_heap_pages()), do this opportunistically from the idle loop.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v3:
* If memory-only nodes exist, select the closest one for scrubbing
* Don't scrub from idle loop until we reach SYS_STATE_active.

 xen/arch/arm/domain.c   |   13 ++++--
 xen/arch/x86/domain.c   |    3 +-
 xen/common/page_alloc.c |   98 +++++++++++++++++++++++++++++++++++++++++-----
 xen/include/xen/mm.h    |    1 +
 4 files changed, 98 insertions(+), 17 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 76310ed..38d6331 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -46,13 +46,16 @@ void idle_loop(void)
         if ( cpu_is_offline(smp_processor_id()) )
             stop_cpu();
 
-        local_irq_disable();
-        if ( cpu_is_haltable(smp_processor_id()) )
+        if ( !scrub_free_pages() )
         {
-            dsb(sy);
-            wfi();
+            local_irq_disable();
+            if ( cpu_is_haltable(smp_processor_id()) )
+            {
+                dsb(sy);
+                wfi();
+            }
+            local_irq_enable();
         }
-        local_irq_enable();
 
         do_tasklet();
         do_softirq();
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 90e2b1f..a5f62b5 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -118,7 +118,8 @@ static void idle_loop(void)
     {
         if ( cpu_is_offline(smp_processor_id()) )
             play_dead();
-        (*pm_idle)();
+        if ( !scrub_free_pages() )
+            (*pm_idle)();
         do_tasklet();
         do_softirq();
         /*
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 055654d..fcd7308 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1035,16 +1035,82 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
     return pg;
 }
 
-static void scrub_free_pages(unsigned int node)
+static nodemask_t node_scrubbing;
+
+static unsigned int node_to_scrub(bool get_node)
+{
+    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
+    nodeid_t closest = NUMA_NO_NODE;
+    u8 dist, shortest = 0xff;
+
+    if ( node == NUMA_NO_NODE )
+        node = 0;
+
+    if ( node_need_scrub[node] &&
+         (!get_node || !node_test_and_set(node, node_scrubbing)) )
+        return node;
+
+    /*
+     * See if there are memory-only nodes that need scrubbing and choose
+     * the closest one.
+     */
+    local_node = node;
+    while ( 1 )
+    {
+        do {
+            node = cycle_node(node, node_online_map);
+        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
+                  (node != local_node) );
+
+        if ( node == local_node )
+            break;
+
+        if ( node_need_scrub[node] )
+        {
+            if ( !get_node )
+                return node;
+
+            if ( !node_test_and_set(node, node_scrubbing) )
+            {
+                dist = __node_distance(local_node, node);
+                if ( (dist < shortest) || (dist == NUMA_NO_DISTANCE) )
+                {
+                    /* Release previous node. */
+                    if ( closest != NUMA_NO_NODE )
+                        node_clear(closest, node_scrubbing);
+                    shortest = dist;
+                    closest = node;
+                }
+                else
+                    node_clear(node, node_scrubbing);
+            }
+        }
+    }
+
+    return closest;
+}
+
+bool scrub_free_pages(void)
 {
     struct page_info *pg;
     unsigned int zone, order;
     unsigned long i;
+    unsigned int cpu = smp_processor_id();
+    bool preempt = false;
+    nodeid_t node;
 
-    ASSERT(spin_is_locked(&heap_lock));
+    /*
+     * Don't scrub while dom0 is being constructed since we may
+     * fail trying to call map_domain_page() from scrub_one_page().
+     */
+    if ( system_state < SYS_STATE_active )
+        return false;
+ 
+    node = node_to_scrub(true);
+    if ( node == NUMA_NO_NODE )
+        return false;
 
-    if ( !node_need_scrub[node] )
-        return;
+    spin_lock(&heap_lock);
 
     for ( zone = 0; zone < NR_ZONES; zone++ )
     {
@@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node)
                         pg[i].count_info &= ~PGC_need_scrub;
                         node_need_scrub[node]--;
                     }
+                    if ( softirq_pending(cpu) )
+                    {
+                        preempt = true;
+                        break;
+                    }
                 }
 
-                page_list_del(pg, &heap(node, zone, order));
-                merge_and_free_buddy(pg, node, zone, order, false);
+                if ( i == (1UL << order) )
+                {
+                    page_list_del(pg, &heap(node, zone, order));
+                    merge_and_free_buddy(pg, node, zone, order, false);
+                }
 
-                if ( node_need_scrub[node] == 0 )
-                    return;
+                if ( preempt || (node_need_scrub[node] == 0) )
+                    goto out;
             }
         } while ( order-- != 0 );
     }
+
+ out:
+    spin_unlock(&heap_lock);
+    node_clear(node, node_scrubbing);
+    return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
 }
 
 /* Free 2^@order set of pages. */
@@ -1141,9 +1220,6 @@ static void free_heap_pages(
     if ( tainted )
         reserve_offlined_page(pg);
 
-    if ( need_scrub )
-        scrub_free_pages(node);
-
     spin_unlock(&heap_lock);
 }
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 88de3c1..b66dbbe 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -138,6 +138,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe);
 void xenheap_max_mfn(unsigned long mfn);
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags);
 void free_xenheap_pages(void *v, unsigned int order);
+bool scrub_free_pages(void);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 /* Map machine page range in Xen virtual address space. */
-- 
1.7.1



* [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2017-04-14 15:37 ` [PATCH v3 4/9] mm: Scrub memory from idle loop Boris Ostrovsky
@ 2017-04-14 15:37 ` Boris Ostrovsky
  2017-05-04 15:43   ` Jan Beulich
  2017-04-14 15:37 ` [PATCH v3 6/9] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

While scrubbing from the idle loop, check for softirqs every 256 pages.
If a softirq is pending, don't scrub any further and merge the
partially-scrubbed buddy back into the heap by breaking the clean portion
into smaller power-of-2 chunks. Then repeat the same process for the
dirty part.
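
For example, if an order-10 buddy (1024 pages) is preempted after its first
768 pages have been scrubbed, the clean front portion goes back as an order-9
plus an order-8 chunk and the remaining 256 dirty pages go back as a single
order-8 chunk. A stand-alone sketch of that splitting logic (illustrative
only; my_flsl()/my_ffsl() merely mimic the flsl()/ffsl() helpers used in the
patch):

#include <stdio.h>

static int my_flsl(unsigned long x) { return x ? 64 - __builtin_clzl(x) : 0; }
static int my_ffsl(unsigned long x) { return __builtin_ffsl(x); }

int main(void)
{
    unsigned long order = 10, scrubbed = 768; /* preempted after 768 pages */
    unsigned long start = 0, end = scrubbed;

    /* Clean front part: largest power-of-2 chunk starting at @start,
     * not extending past @end. */
    while ( start < end )
    {
        unsigned int chunk_order = my_flsl(end - start) - 1;

        printf("clean chunk: offset %lu, order %u\n", start, chunk_order);
        start += 1UL << chunk_order;
    }

    /* Dirty tail: largest chunk starting at @end that doesn't cross the
     * next power-of-2 boundary. */
    while ( end < (1UL << order) )
    {
        unsigned int chunk_order = my_ffsl(end) - 1;

        printf("dirty chunk: offset %lu, order %u\n", end, chunk_order);
        end += 1UL << chunk_order;
    }
    return 0;
}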

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/common/page_alloc.c |   78 ++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 64 insertions(+), 14 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index fcd7308..0b2dff1 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -996,7 +996,7 @@ static int reserve_offlined_page(struct page_info *head)
 static struct page_info *
 merge_and_free_buddy(struct page_info *pg, unsigned int node,
                      unsigned int zone, unsigned int order,
-                     bool need_scrub)
+                     bool need_scrub, bool is_frag)
 {
     ASSERT(spin_is_locked(&heap_lock));
 
@@ -1011,7 +1011,15 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
         if ( (page_to_mfn(pg) & mask) )
             buddy = pg - mask; /* Merge with predecessor block. */
         else
-            buddy = pg + mask; /* Merge with successor block. */
+        {
+            /*
+             * Merge with successor block.
+             * Only un-fragmented buddy can be merged forward.
+             */
+            if ( is_frag )
+                break;
+            buddy = pg + mask;
+        }
 
         if ( !mfn_valid(_mfn(page_to_mfn(buddy))) ||
              !page_state_is(buddy, free) ||
@@ -1093,12 +1101,15 @@ static unsigned int node_to_scrub(bool get_node)
 bool scrub_free_pages(void)
 {
     struct page_info *pg;
-    unsigned int zone, order;
-    unsigned long i;
+    unsigned int zone, order, scrub_order;
+    unsigned long i, num_processed, start, end;
     unsigned int cpu = smp_processor_id();
-    bool preempt = false;
+    bool preempt = false, is_frag;
     nodeid_t node;
 
+    /* Scrubbing granularity. */
+#define SCRUB_CHUNK_ORDER  8
+
     /*
      * Don't scrub while dom0 is being constructed since we may
      * fail trying to call map_domain_page() from scrub_one_page().
@@ -1123,25 +1134,64 @@ bool scrub_free_pages(void)
                 if ( !pg->u.free.dirty_head )
                     break;
 
-                for ( i = 0; i < (1UL << order); i++)
+                scrub_order = MIN(order, SCRUB_CHUNK_ORDER);
+                num_processed = 0;
+                is_frag = false;
+                while ( num_processed < (1UL << order) )
                 {
-                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    for ( i = num_processed;
+                          i < num_processed + (1UL << scrub_order); i++ )
                     {
-                        scrub_one_page(&pg[i]);
-                        pg[i].count_info &= ~PGC_need_scrub;
-                        node_need_scrub[node]--;
+                        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                        {
+                            scrub_one_page(&pg[i]);
+                            pg[i].count_info &= ~PGC_need_scrub;
+                            node_need_scrub[node]--;
+                        }
                     }
+
+                    num_processed += (1UL << scrub_order);
                     if ( softirq_pending(cpu) )
                     {
                         preempt = true;
+                        is_frag = (num_processed < (1UL << order));
                         break;
                     }
                 }
 
-                if ( i == (1UL << order) )
+                start = 0;
+                end = num_processed;
+
+                page_list_del(pg, &heap(node, zone, order));
+
+                /* Merge clean pages */
+                while ( start < end )
+                {
+                    /*
+                     * Largest power-of-two chunk starting @start,
+                     * not greater than @end
+                     */
+                    unsigned chunk_order = flsl(end - start) - 1;
+                    struct page_info *ppg = &pg[start];
+
+                    PFN_ORDER(ppg) = chunk_order;
+                    merge_and_free_buddy(ppg, node, zone, chunk_order, false, is_frag);
+                    start += (1UL << chunk_order);
+                }
+
+                /* Merge unscrubbed pages */
+                while ( end < (1UL << order) )
                 {
-                    page_list_del(pg, &heap(node, zone, order));
-                    merge_and_free_buddy(pg, node, zone, order, false);
+                    /*
+                     * Largest power-of-two chunk starting @end, not crossing
+                     * next power-of-two boundary
+                     */
+                    unsigned chunk_order = ffsl(end) - 1;
+                    struct page_info *ppg = &pg[end];
+
+                    PFN_ORDER(ppg) = chunk_order;
+                    merge_and_free_buddy(ppg, node, zone, chunk_order, true, true);
+                    end += (1UL << chunk_order);
                 }
 
                 if ( preempt || (node_need_scrub[node] == 0) )
@@ -1215,7 +1265,7 @@ static void free_heap_pages(
         node_need_scrub[node] += (1UL << order);
     }
 
-    pg = merge_and_free_buddy(pg, node, zone, order, need_scrub);
+    pg = merge_and_free_buddy(pg, node, zone, order, need_scrub, false);
 
     if ( tainted )
         reserve_offlined_page(pg);
-- 
1.7.1



* [PATCH v3 6/9] spinlock: Introduce spin_lock_cb()
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2017-04-14 15:37 ` [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending Boris Ostrovsky
@ 2017-04-14 15:37 ` Boris Ostrovsky
  2017-04-14 15:37 ` [PATCH v3 7/9] mm: Keep pages available for allocation while scrubbing Boris Ostrovsky
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

While waiting for a lock we may want to periodically run some
code. This code may, for example, allow the caller to release
resources that it holds and that are no longer needed in the critical
section protected by the lock.

Specifically, this feature will be needed by the scrubbing code where
the scrubber, while waiting for the heap lock in order to merge back clean
pages, may be asked by the page allocator (which is currently
holding the lock) to abort merging and release the buddy page head
that the allocator wants.

We could use spin_trylock() but since it doesn't take a lock ticket
it may take a long time until the lock is acquired. Instead we add
spin_lock_cb(), which allows us to grab the ticket and execute a
callback while waiting.

Since we may be sleeping on the lock until it is released, we need a
mechanism to make sure that the callback has a chance to run.
We add spin_lock_kick(), which will wake up the waiter.
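
Conceptually this is a ticket lock whose acquire path polls a callback while
spinning, roughly as in the stand-alone model below (plain C11 atomics;
toy_lock, toy_lock_cb() etc. are made-up names, not the Xen spinlock
implementation):

#include <stdatomic.h>

struct toy_lock {
    atomic_uint head, tail;   /* now-serving ticket / next free ticket */
};

/* Take a ticket, then run @cb(data) on every spin iteration until the
 * ticket comes up.  This is the shape of spin_lock_cb(). */
static void toy_lock_cb(struct toy_lock *l, void (*cb)(void *), void *data)
{
    unsigned int ticket = atomic_fetch_add(&l->tail, 1);

    while ( atomic_load(&l->head) != ticket )
        if ( cb )
            cb(data);
}

static void toy_unlock(struct toy_lock *l)
{
    atomic_fetch_add(&l->head, 1);   /* serve the next waiter */
}

static void count_spins(void *data)
{
    ++*(unsigned long *)data;
}

int main(void)
{
    struct toy_lock l = { 0, 0 };
    unsigned long spins = 0;

    toy_lock_cb(&l, count_spins, &spins);  /* uncontended: cb never runs */
    toy_unlock(&l);
    return 0;
}

In the scrubbing case the callback is where the scrubber notices that the
lock holder has asked it to abort and drop the buddy it was about to merge
back (see the next patch).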

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v3:
* Integrate smp_mb into spin_lock_kick()

 xen/common/spinlock.c      |    9 ++++++++-
 xen/include/xen/spinlock.h |    8 ++++++++
 2 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 2a06406..3c1caae 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -129,7 +129,7 @@ static always_inline u16 observe_head(spinlock_tickets_t *t)
     return read_atomic(&t->head);
 }
 
-void _spin_lock(spinlock_t *lock)
+void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
 {
     spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
     LOCK_PROFILE_VAR;
@@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock)
     while ( tickets.tail != observe_head(&lock->tickets) )
     {
         LOCK_PROFILE_BLOCK;
+        if ( unlikely(cb) )
+            cb(data);
         arch_lock_relax();
     }
     LOCK_PROFILE_GOT;
@@ -147,6 +149,11 @@ void _spin_lock(spinlock_t *lock)
     arch_lock_acquire_barrier();
 }
 
+void _spin_lock(spinlock_t *lock)
+{
+     _spin_lock_cb(lock, NULL, NULL);
+}
+
 void _spin_lock_irq(spinlock_t *lock)
 {
     ASSERT(local_irq_is_enabled());
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index c1883bd..91bfb95 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -153,6 +153,7 @@ typedef struct spinlock {
 #define spin_lock_init(l) (*(l) = (spinlock_t)SPIN_LOCK_UNLOCKED)
 
 void _spin_lock(spinlock_t *lock);
+void _spin_lock_cb(spinlock_t *lock, void (*cond)(void *), void *data);
 void _spin_lock_irq(spinlock_t *lock);
 unsigned long _spin_lock_irqsave(spinlock_t *lock);
 
@@ -169,6 +170,7 @@ void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
 #define spin_lock(l)                  _spin_lock(l)
+#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
 #define spin_lock_irq(l)              _spin_lock_irq(l)
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
@@ -190,6 +192,12 @@ void _spin_unlock_recursive(spinlock_t *lock);
     1 : ({ local_irq_restore(flags); 0; });     \
 })
 
+#define spin_lock_kick(l)                       \
+({                                              \
+    smp_mb();                                   \
+    arch_lock_signal();                         \
+})
+
 /* Ensure a lock is quiescent between two critical operations. */
 #define spin_barrier(l)               _spin_barrier(l)
 
-- 
1.7.1



* [PATCH v3 7/9] mm: Keep pages available for allocation while scrubbing
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2017-04-14 15:37 ` [PATCH v3 6/9] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
@ 2017-04-14 15:37 ` Boris Ostrovsky
  2017-05-04 16:03   ` Jan Beulich
  2017-04-14 15:37 ` [PATCH v3 8/9] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Instead of scrubbing pages while holding the heap lock we can mark the
buddy's head as being scrubbed and drop the lock temporarily.
If someone (most likely alloc_heap_pages()) tries to access
this chunk, it will signal the scrubber to abort the scrub by setting the
head's PAGE_SCRUB_ABORT bit. The scrubber checks this bit after
processing each page and stops its work as soon as it sees it set.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v3:
* Adjusted page_info's scrub_state definitions but kept them as binary
  flags since I think having both PAGE_SCRUBBING and PAGE_SCRUB_ABORT
  bits set makes sense.

 xen/common/page_alloc.c  |   92 ++++++++++++++++++++++++++++++++++++++++++---
 xen/include/asm-arm/mm.h |    4 ++
 xen/include/asm-x86/mm.h |    4 ++
 3 files changed, 93 insertions(+), 7 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 0b2dff1..514a4a1 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -694,6 +694,17 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+static void check_and_stop_scrub(struct page_info *head)
+{
+    if ( head->u.free.scrub_state & PAGE_SCRUBBING )
+    {
+        head->u.free.scrub_state |= PAGE_SCRUB_ABORT;
+        spin_lock_kick();
+        while ( ACCESS_ONCE(head->u.free.scrub_state) & PAGE_SCRUB_ABORT )
+            cpu_relax();
+    }
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
@@ -780,10 +791,15 @@ static struct page_info *alloc_heap_pages(
             {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
                 {
-                    if ( (order == 0) || use_unscrubbed ||
-                         !pg->u.free.dirty_head )
+                    if ( !pg->u.free.dirty_head )
                         goto found;
 
+                    if ( (order == 0) || use_unscrubbed )
+                    {
+                        check_and_stop_scrub(pg);
+                        goto found;
+                    }
+
                     page_list_add_tail(pg, &heap(node, zone, j));
                 }
             }
@@ -921,6 +937,8 @@ static int reserve_offlined_page(struct page_info *head)
 
     head->u.free.dirty_head = false;
 
+    check_and_stop_scrub(head);
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -1027,6 +1045,9 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
              (phys_to_nid(page_to_maddr(buddy)) != node) )
             break;
 
+        if ( buddy->u.free.scrub_state & PAGE_SCRUBBING )
+            break;
+
         page_list_del(buddy, &heap(node, zone, order));
         need_scrub |= buddy->u.free.dirty_head;
         buddy->u.free.dirty_head = false;
@@ -1098,14 +1119,35 @@ static unsigned int node_to_scrub(bool get_node)
     return closest;
 }
 
+struct scrub_wait_state {
+    struct page_info *pg;
+    bool drop;
+};
+
+static void scrub_continue(void *data)
+{
+    struct scrub_wait_state *st = data;
+
+    if ( st->drop )
+        return;
+
+    if ( st->pg->u.free.scrub_state & PAGE_SCRUB_ABORT )
+    {
+        /* There is a waiter for this buddy. Release it. */
+        st->drop = true;
+        st->pg->u.free.scrub_state = 0;
+    }
+}
+
 bool scrub_free_pages(void)
 {
     struct page_info *pg;
     unsigned int zone, order, scrub_order;
-    unsigned long i, num_processed, start, end;
+    unsigned long i, num_processed, start, end, dirty_cnt;
     unsigned int cpu = smp_processor_id();
     bool preempt = false, is_frag;
     nodeid_t node;
+    struct scrub_wait_state st;
 
     /* Scrubbing granularity. */
 #define SCRUB_CHUNK_ORDER  8
@@ -1134,8 +1176,13 @@ bool scrub_free_pages(void)
                 if ( !pg->u.free.dirty_head )
                     break;
 
+                ASSERT(!pg->u.free.scrub_state);
+                pg->u.free.scrub_state = PAGE_SCRUBBING;
+
+                spin_unlock(&heap_lock);
+
                 scrub_order = MIN(order, SCRUB_CHUNK_ORDER);
-                num_processed = 0;
+                num_processed = dirty_cnt = 0;
                 is_frag = false;
                 while ( num_processed < (1UL << order) )
                 {
@@ -1145,8 +1192,24 @@ bool scrub_free_pages(void)
                         if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                         {
                             scrub_one_page(&pg[i]);
+                            /*
+                             * We can modify count_info without holding heap
+                             * lock since we effectively locked this buddy by
+                             * setting its scrub_state.
+                             */
                             pg[i].count_info &= ~PGC_need_scrub;
-                            node_need_scrub[node]--;
+                            dirty_cnt++;
+                        }
+
+                        if ( ACCESS_ONCE(pg->u.free.scrub_state) &
+                             PAGE_SCRUB_ABORT )
+                        {
+                            /* Someone wants this chunk. Drop everything. */
+                            pg->u.free.scrub_state = 0;
+                            spin_lock(&heap_lock);
+                            node_need_scrub[node] -= dirty_cnt;
+                            spin_unlock(&heap_lock);
+                            goto out_nolock;
                         }
                     }
 
@@ -1159,11 +1222,20 @@ bool scrub_free_pages(void)
                     }
                 }
 
-                start = 0;
-                end = num_processed;
+                st.pg = pg;
+                st.drop = false;
+                spin_lock_cb(&heap_lock, scrub_continue, &st);
+
+                node_need_scrub[node] -= dirty_cnt;
+
+                if ( st.drop )
+                    goto out;
 
                 page_list_del(pg, &heap(node, zone, order));
 
+                start = 0;
+                end = num_processed;
+
                 /* Merge clean pages */
                 while ( start < end )
                 {
@@ -1194,6 +1266,8 @@ bool scrub_free_pages(void)
                     end += (1UL << chunk_order);
                 }
 
+                pg->u.free.scrub_state = 0;
+
                 if ( preempt || (node_need_scrub[node] == 0) )
                     goto out;
             }
@@ -1202,6 +1276,8 @@ bool scrub_free_pages(void)
 
  out:
     spin_unlock(&heap_lock);
+
+ out_nolock:
     node_clear(node, node_scrubbing);
     return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
 }
@@ -1240,6 +1316,8 @@ static void free_heap_pages(
         if ( page_state_is(&pg[i], offlined) )
             tainted = 1;
 
+        pg[i].u.free.scrub_state = 0;
+
         /* If a page has no owner it will need no safety TLB flush. */
         pg[i].u.free.need_tlbflush = (page_get_owner(&pg[i]) != NULL);
         if ( pg[i].u.free.need_tlbflush )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index abc3f6b..b333b16 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -43,6 +43,10 @@ struct page_info
         } inuse;
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+#define PAGE_SCRUBBING      (1<<0)
+#define PAGE_SCRUB_ABORT    (1<<1)
+            unsigned char scrub_state;
+
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
             /* Set on a buddy head if the buddy has unscrubbed pages. */
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 5cf528a..d00c4a1 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -87,6 +87,10 @@ struct page_info
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
+#define PAGE_SCRUBBING      (1<<0)
+#define PAGE_SCRUB_ABORT    (1<<1)
+            unsigned char scrub_state;
+
             /* Do TLBs need flushing for safety before next page use? */
             bool_t need_tlbflush;
             /* Set on a buddy head if the buddy has unscrubbed pages. */
-- 
1.7.1



* [PATCH v3 8/9] mm: Print number of unscrubbed pages in 'H' debug handler
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2017-04-14 15:37 ` [PATCH v3 7/9] mm: Keep pages available for allocation while scrubbing Boris Ostrovsky
@ 2017-04-14 15:37 ` Boris Ostrovsky
  2017-04-14 15:37 ` [PATCH v3 9/9] mm: Make sure pages are scrubbed Boris Ostrovsky
  2017-05-02 14:46 ` [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
  9 siblings, 0 replies; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/page_alloc.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 514a4a1..dd6f248 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2296,6 +2296,13 @@ static void dump_heap(unsigned char key)
             printk("heap[node=%d][zone=%d] -> %lu pages\n",
                    i, j, avail[i][j]);
     }
+
+    for ( i = 0; i < MAX_NUMNODES; i++ )
+    {
+        if ( !node_need_scrub[i] )
+            continue;
+        printk("Node %d has %lu unscrubbed pages\n", i, node_need_scrub[i]);
+    }
 }
 
 static __init int register_heap_trigger(void)
-- 
1.7.1



* [PATCH v3 9/9] mm: Make sure pages are scrubbed
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2017-04-14 15:37 ` [PATCH v3 8/9] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
@ 2017-04-14 15:37 ` Boris Ostrovsky
  2017-05-05 15:05   ` Jan Beulich
  2017-05-02 14:46 ` [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
  9 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-04-14 15:37 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Add a debug Kconfig option that will make the page allocator verify
that pages that were supposed to be scrubbed are, in fact, clean.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/Kconfig.debug       |    7 ++++++
 xen/common/page_alloc.c |   49 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 55 insertions(+), 1 deletions(-)

diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index 689f297..f3bf9a9 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
 	  logged in the Xen ring buffer.
 	  If unsure, say N here.
 
+config SCRUB_DEBUG
+    bool "Page scrubbing test"
+    default DEBUG
+    ---help---
+      Verify that pages that need to be scrubbed before being allocated to
+      a guest are indeed scrubbed.
+
 endif # DEBUG || EXPERT
 
 endmenu
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index dd6f248..18c80d7 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -694,6 +694,31 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+#define SCRUB_BYTE_PATTERN 0xc2c2c2c2c2c2c2c2
+#ifdef CONFIG_SCRUB_DEBUG
+static void poison_one_page(struct page_info *pg)
+{
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    uint64_t *ptr;
+
+    ptr = map_domain_page(mfn);
+    *ptr = ~SCRUB_BYTE_PATTERN;
+    unmap_domain_page(ptr);
+}
+
+static void check_one_page(struct page_info *pg)
+{
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    uint64_t *ptr;
+    unsigned i;
+
+    ptr = map_domain_page(mfn);
+    for ( i = 0; i < PAGE_SIZE / sizeof (*ptr); i++ )
+        ASSERT(ptr[i] == SCRUB_BYTE_PATTERN);
+    unmap_domain_page(ptr);
+}
+#endif /* CONFIG_SCRUB_DEBUG */
+
 static void check_and_stop_scrub(struct page_info *head)
 {
     if ( head->u.free.scrub_state & PAGE_SCRUBBING )
@@ -912,6 +937,11 @@ static struct page_info *alloc_heap_pages(
          * guest can control its own visibility of/through the cache.
          */
         flush_page_to_ram(page_to_mfn(&pg[i]));
+
+#ifdef CONFIG_SCRUB_DEBUG
+        if ( d && !is_idle_domain(d) )
+            check_one_page(&pg[i]);
+#endif
     }
 
     spin_unlock(&heap_lock);
@@ -1341,6 +1371,11 @@ static void free_heap_pages(
             pg[i].count_info |= PGC_need_scrub;
 
         node_need_scrub[node] += (1UL << order);
+
+#ifdef CONFIG_SCRUB_DEBUG
+        for ( i = 0; i < (1 << order); i++ )
+            poison_one_page(&pg[i]);
+#endif
     }
 
     pg = merge_and_free_buddy(pg, node, zone, order, need_scrub, false);
@@ -1637,6 +1672,14 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
+#ifdef CONFIG_SCRUB_DEBUG
+        /*
+         * These pages get into heap and are allocated to dom0 before
+         * boot scrub happens.
+         * Not scrubbing them here will cause failure in check_one_page().
+         */
+        scrub_one_page(pg + i);
+#endif
         free_heap_pages(pg + i, 0, false);
     }
 }
@@ -2170,6 +2213,9 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             {
                 BUG_ON((pg[i].u.inuse.type_info & PGT_count_mask) != 0);
                 arch_free_heap_page(d, &pg[i]);
+#ifdef CONFIG_SCRUB_DEBUG
+                scrub_one_page(&pg[i]);
+#endif
             }
 
             drop_dom_ref = !domain_adjust_tot_pages(d, -(1 << order));
@@ -2273,7 +2319,8 @@ void scrub_one_page(struct page_info *pg)
 
 #ifndef NDEBUG
     /* Avoid callers relying on allocations returning zeroed pages. */
-    unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE));
+    unmap_domain_page(memset(__map_domain_page(pg),
+                             SCRUB_BYTE_PATTERN & 0xff, PAGE_SIZE));
 #else
     /* For a production build, clear_page() is the fastest way to scrub. */
     clear_domain_page(_mfn(page_to_mfn(pg)));
-- 
1.7.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 0/9] Memory scrubbing from idle loop
  2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (8 preceding siblings ...)
  2017-04-14 15:37 ` [PATCH v3 9/9] mm: Make sure pages are scrubbed Boris Ostrovsky
@ 2017-05-02 14:46 ` Boris Ostrovsky
  2017-05-02 14:58   ` Jan Beulich
  9 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-02 14:46 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

Ping?

On 04/14/2017 11:37 AM, Boris Ostrovsky wrote:
> When a domain is destroyed the hypervisor must scrub domain's pages before
> giving them to another guest in order to prevent leaking the deceased
> guest's data. Currently this is done during guest's destruction, possibly
> causing very lengthy cleanup process.
>
> This series adds support for scrubbing released pages from idle loop,
> making guest destruction significantly faster. For example, destroying a
> 1TB guest can now be completed in 40+ seconds as opposed to about 9 minutes
> using existing scrubbing algorithm.
>
> Briefly, the new algorithm places dirty pages at the end of heap's page list
> for each node/zone/order to avoid having to scan full list while searching
> for dirty pages. One processor form each node checks whether the node has any
> dirty pages and, if such pages are found, scrubs them. Scrubbing itself
> happens without holding heap lock so other users may access heap in the
> meantime. If while idle loop is scrubbing a particular chunk of pages this
> chunk is requested by the heap allocator, scrubbing is immediately stopped.
>
> On the allocation side, alloc_heap_pages() first tries to satisfy allocation
> request using only clean pages. If this is not possible, the search is
> repeated and dirty pages are scrubbed by the allocator.
>
> This series is somewhat based on earlier work by Bob Liu.
>
> V1:
> * Only set PGC_need_scrub bit for the buddy head, thus making it unnecessary
>   to scan whole buddy
> * Fix spin_lock_cb()
> * Scrub CPU-less nodes
> * ARM support. Note that I have not been able to test this, only built the
>   binary
> * Added scrub test patch (last one). Not sure whether it should be considered
>   for committing but I have been running with it.
>
> V2:
> * merge_chunks() returns new buddy head
> * scrub_free_pages() returns softirq pending status in addition to (factored out)
>   status of unscrubbed memory
> * spin_lock uses inlined spin_lock_cb()
> * scrub debugging code checks whole page, not just the first word.
>
> V3:
> * Keep dirty bit per page
> * Simplify merge_chunks() (now merge_and_free_buddy())
> * When scrubbing memmory-only nodes try to find the closest node.
>
> Deferred:
> * Per-node heap locks. In addition to (presumably) improving performance in
>   general, once they are available we can parallelize scrubbing further by
>   allowing more than one core per node to do idle loop scrubbing.
> * AVX-based scrubbing
> * Use idle loop scrubbing during boot.
>
>
> Boris Ostrovsky (9):
>   mm: Separate free page chunk merging into its own routine
>   mm: Place unscrubbed pages at the end of pagelist
>   mm: Scrub pages in alloc_heap_pages() if needed
>   mm: Scrub memory from idle loop
>   mm: Do not discard already-scrubbed pages if softirqs are pending
>   spinlock: Introduce spin_lock_cb()
>   mm: Keep pages available for allocation while scrubbing
>   mm: Print number of unscrubbed pages in 'H' debug handler
>   mm: Make sure pages are scrubbed
>
>  xen/Kconfig.debug          |    7 +
>  xen/arch/arm/domain.c      |   13 +-
>  xen/arch/x86/domain.c      |    3 +-
>  xen/common/page_alloc.c    |  529 ++++++++++++++++++++++++++++++++++++++------
>  xen/common/spinlock.c      |    9 +-
>  xen/include/asm-arm/mm.h   |   10 +
>  xen/include/asm-x86/mm.h   |   10 +
>  xen/include/xen/mm.h       |    1 +
>  xen/include/xen/spinlock.h |    8 +
>  9 files changed, 513 insertions(+), 77 deletions(-)
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 0/9] Memory scrubbing from idle loop
  2017-05-02 14:46 ` [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
@ 2017-05-02 14:58   ` Jan Beulich
  2017-05-02 15:07     ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-02 14:58 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 02.05.17 at 16:46, <boris.ostrovsky@oracle.com> wrote:
> Ping?

It has made it to near the top of my to-be-reviewed list, so hopefully
soon. It's post-4.9 stuff only anyway, so not really needing quick
turnaround.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 0/9] Memory scrubbing from idle loop
  2017-05-02 14:58   ` Jan Beulich
@ 2017-05-02 15:07     ` Boris Ostrovsky
  0 siblings, 0 replies; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-02 15:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 05/02/2017 10:58 AM, Jan Beulich wrote:
>>>> On 02.05.17 at 16:46, <boris.ostrovsky@oracle.com> wrote:
>> Ping?
> It has made it to near the top of my to-be-reviewed list, so hopefully
> soon. It's post-4.9 stuff only anyway, so not really needing quick
> turnaround.


Thanks.

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 1/9] mm: Separate free page chunk merging into its own routine
  2017-04-14 15:37 ` [PATCH v3 1/9] mm: Separate free page chunk merging into its own routine Boris Ostrovsky
@ 2017-05-04  9:45   ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2017-05-04  9:45 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -919,11 +919,50 @@ static int reserve_offlined_page(struct page_info *head)
>      return count;
>  }
>  
> +/* Returns new buddy head. */
> +static struct page_info *
> +merge_and_free_buddy(struct page_info *pg, unsigned int node,
> +                     unsigned int zone, unsigned int order)
> +{
> +    ASSERT(spin_is_locked(&heap_lock));
> +
> +    /* Merge chunks as far as possible. */
> +    while ( order < MAX_ORDER )
> +    {
> +        unsigned long mask = 1UL << order;
> +        struct page_info *buddy;
> +
> +        if ( (page_to_mfn(pg) & mask) )

Stray inner parentheses.

> +            buddy = pg - mask; /* Merge with predecessor block. */
> +        else
> +            buddy = pg + mask; /* Merge with successor block. */
> +
> +        if ( !mfn_valid(_mfn(page_to_mfn(buddy))) ||
> +             !page_state_is(buddy, free) ||
> +             (PFN_ORDER(buddy) != order) ||
> +             (phys_to_nid(page_to_maddr(buddy)) != node) )

I think generated code would benefit from pulling out the page->mfn
translation, using the result also in place of page_to_maddr().
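
Something along these lines, I would think (just a sketch; the local
variable name is made up):

    unsigned long buddy_mfn = page_to_mfn(buddy);

    if ( !mfn_valid(_mfn(buddy_mfn)) ||
         !page_state_is(buddy, free) ||
         (PFN_ORDER(buddy) != order) ||
         (phys_to_nid(pfn_to_paddr(buddy_mfn)) != node) )
        break;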

With at least the cosmetic issue above taken care of,
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-04-14 15:37 ` [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-05-04 10:17   ` Jan Beulich
  2017-05-04 14:53     ` Boris Ostrovsky
  2017-05-08 16:41   ` George Dunlap
  1 sibling, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-04 10:17 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
> @@ -678,6 +680,20 @@ static void check_low_mem_virq(void)
>      }
>  }
>  
> +/* Pages that need scrub are added to tail, otherwise to head. */
> +static void page_list_add_scrub(struct page_info *pg, unsigned int node,
> +                                unsigned int zone, unsigned int order,
> +                                bool need_scrub)
> +{
> +    PFN_ORDER(pg) = order;
> +    pg->u.free.dirty_head = need_scrub;
> +
> +    if ( need_scrub )
> +        page_list_add_tail(pg, &heap(node, zone, order));
> +    else
> +        page_list_add(pg, &heap(node, zone, order));
> +}
> +
>  /* Allocate 2^@order contiguous pages. */
>  static struct page_info *alloc_heap_pages(
>      unsigned int zone_lo, unsigned int zone_hi,
> @@ -802,7 +818,7 @@ static struct page_info *alloc_heap_pages(
>      while ( j != order )
>      {
>          PFN_ORDER(pg) = --j;
> -        page_list_add_tail(pg, &heap(node, zone, j));
> +        page_list_add(pg, &heap(node, zone, j));

Don't you need to replicate pg->u.free.dirty_head (and hence use
page_list_add_scrub()) here too?

> @@ -851,11 +867,14 @@ static int reserve_offlined_page(struct page_info *head)
>      int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
>      struct page_info *cur_head;
>      int cur_order;
> +    bool need_scrub;

Please put this in the most narrow scope it's needed in.

>      ASSERT(spin_is_locked(&heap_lock));
>  
>      cur_head = head;
>  
> +    head->u.free.dirty_head = false;
> +
>      page_list_del(head, &heap(node, zone, head_order));
>  
>      while ( cur_head < (head + (1 << head_order)) )
> @@ -892,8 +911,16 @@ static int reserve_offlined_page(struct page_info *head)
>              {
>              merge:
>                  /* We don't consider merging outside the head_order. */
> -                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
> -                PFN_ORDER(cur_head) = cur_order;
> +
> +                /* See if any of the pages need scrubbing. */
> +                need_scrub = false;
> +                for ( i = 0; i < (1 << cur_order); i++ )
> +                    if ( test_bit(_PGC_need_scrub, &cur_head[i].count_info) )
> +                    {
> +                        need_scrub = true;
> +                        break;
> +                    }

Can't you skip this loop when the incoming chunk has
->u.free.dirty_head clear?
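
E.g. by latching the flag before it gets cleared further up (a sketch; the
local variable name is made up):

    bool buddy_dirty = head->u.free.dirty_head;

    /* ... existing code dropping the head from its list ... */

            /* See if any of the pages need scrubbing. */
            need_scrub = false;
            if ( buddy_dirty )
                for ( i = 0; i < (1 << cur_order); i++ )
                    if ( test_bit(_PGC_need_scrub, &cur_head[i].count_info) )
                    {
                        need_scrub = true;
                        break;
                    }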

> +static void scrub_free_pages(unsigned int node)
> +{
> +    struct page_info *pg;
> +    unsigned int zone, order;
> +    unsigned long i;

Here I would similarly appreciate if the local variables were moved
into the scopes they're actually needed in.

> +    ASSERT(spin_is_locked(&heap_lock));
> +
> +    if ( !node_need_scrub[node] )
> +        return;
> +
> +    for ( zone = 0; zone < NR_ZONES; zone++ )
> +    {
> +        order = MAX_ORDER;
> +        do {
> +            while ( !page_list_empty(&heap(node, zone, order)) )
> +            {
> +                /* Unscrubbed pages are always at the end of the list. */
> +                pg = page_list_last(&heap(node, zone, order));
> +                if ( !pg->u.free.dirty_head )
> +                    break;
> +
> +                for ( i = 0; i < (1UL << order); i++)
> +                {
> +                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
> +                    {
> +                        scrub_one_page(&pg[i]);
> +                        pg[i].count_info &= ~PGC_need_scrub;
> +                        node_need_scrub[node]--;
> +                    }
> +                }
> +
> +                page_list_del(pg, &heap(node, zone, order));
> +                merge_and_free_buddy(pg, node, zone, order, false);

Is there actually any merging involved here, i.e. can't you
simply put back the buddy at the list head?

> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -45,6 +45,8 @@ struct page_info
>          struct {
>              /* Do TLBs need flushing for safety before next page use? */
>              bool_t need_tlbflush;
> +            /* Set on a buddy head if the buddy has unscrubbed pages. */
> +            bool dirty_head;
>          } free;
>  
>      } u;
> @@ -115,6 +117,10 @@ struct page_info
>  #define PGC_count_width   PG_shift(9)
>  #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
>  
> +/* Page needs to be scrubbed */
> +#define _PGC_need_scrub   PG_shift(10)
> +#define PGC_need_scrub    PG_mask(1, 10)

This is at least dangerous: You're borrowing a bit from
PGC_count_mask. That's presumably okay because you only
ever set the bit on free pages (and it is being zapped by
alloc_heap_pages() setting ->count_info to PGC_state_inuse),
but I think it would be more explicit that you borrow another bit
if you re-used one of the existing ones (like PGC_allocated), and
then did so by using a straight #define to that other value.
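
I.e. something like (purely illustrative):

    /*
     * Page needs to be scrubbed. Only ever set on free pages, so we can
     * safely borrow a bit which has no meaning for them.
     */
    #define _PGC_need_scrub   _PGC_allocated
    #define PGC_need_scrub    PGC_allocated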

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 3/9] mm: Scrub pages in alloc_heap_pages() if needed
  2017-04-14 15:37 ` [PATCH v3 3/9] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
@ 2017-05-04 14:44   ` Jan Beulich
  2017-05-04 15:04     ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-04 14:44 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
> When allocating pages in alloc_heap_pages() first look for clean pages.

As expressed before, there are cases when we don't really need
scrubbed pages. Hence the local variable "use_unscrubbed" below
should really be some form of input to alloc_heap_pages().

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -700,34 +700,17 @@ static struct page_info *alloc_heap_pages(
>      unsigned int order, unsigned int memflags,
>      struct domain *d)
>  {
> -    unsigned int i, j, zone = 0, nodemask_retry = 0;
> -    nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
> +    unsigned int i, j, zone, nodemask_retry;
> +    nodeid_t first_node, node, req_node;
>      unsigned long request = 1UL << order;
>      struct page_info *pg;
> -    nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
> -    bool_t need_tlbflush = 0;
> +    nodemask_t nodemask;
> +    bool need_scrub, need_tlbflush = false, use_unscrubbed = false;
>      uint32_t tlbflush_timestamp = 0;
>  
>      /* Make sure there are enough bits in memflags for nodeID. */
>      BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
>  
> -    if ( node == NUMA_NO_NODE )
> -    {
> -        if ( d != NULL )
> -        {
> -            node = next_node(d->last_alloc_node, nodemask);
> -            if ( node >= MAX_NUMNODES )
> -                node = first_node(nodemask);
> -        }
> -        if ( node >= MAX_NUMNODES )
> -            node = cpu_to_node(smp_processor_id());
> -    }
> -    first_node = node;
> -
> -    ASSERT(node < MAX_NUMNODES);
> -    ASSERT(zone_lo <= zone_hi);
> -    ASSERT(zone_hi < NR_ZONES);

The last two can remain where they are (but see also below).

> @@ -754,6 +740,28 @@ static struct page_info *alloc_heap_pages(
>           tmem_freeable_pages() )
>          goto try_tmem;
>  
> + again:

Is there any hope to get away without such an ugly pseudo loop?
E.g. by making this function a helper function of a relatively thin
wrapper named alloc_heap_pages(), invoking this helper twice
unless use_unscrubbed is true (as said, this ought to be an input)?
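
I.e. something of this shape (a sketch only; the helper name is invented
and would simply hold the current function body):

    static struct page_info *alloc_heap_pages(
        unsigned int zone_lo, unsigned int zone_hi,
        unsigned int order, unsigned int memflags,
        struct domain *d)
    {
        struct page_info *pg;

        /* First pass: consider clean buddies only. */
        pg = try_alloc_heap_pages(zone_lo, zone_hi, order, memflags, d, false);
        if ( !pg )
            /* Second pass: accept dirty buddies, scrubbing on allocation. */
            pg = try_alloc_heap_pages(zone_lo, zone_hi, order, memflags, d,
                                      true);

        return pg;
    }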

> +    nodemask_retry = 0;
> +    nodemask = (d != NULL ) ? d->node_affinity : node_online_map;

Stray blank before closing paren; you may want to consider dropping
the != NULL altogether, at which point the parens won't be needed
anymore (unless, of course, all the code churn here can be undone
anyway).

> @@ -769,8 +777,16 @@ static struct page_info *alloc_heap_pages(
>  
>              /* Find smallest order which can satisfy the request. */
>              for ( j = order; j <= MAX_ORDER; j++ )
> +            {
>                  if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
> -                    goto found;
> +                {
> +                    if ( (order == 0) || use_unscrubbed ||

Why is order-0 a special case here?


> @@ -832,6 +864,20 @@ static struct page_info *alloc_heap_pages(
>      if ( d != NULL )
>          d->last_alloc_node = node;
>  
> +    if ( need_scrub )
> +    {
> +        for ( i = 0; i < (1 << order); i++ )
> +        {
> +            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
> +            {
> +                scrub_one_page(&pg[i]);
> +                pg[i].count_info &= ~PGC_need_scrub;

This is pointless, which will become more obvious once you move
this loop body into ...

> +                node_need_scrub[node]--;
> +            }
> +        }
> +        pg->u.free.dirty_head = false;
> +    }
> +
>      for ( i = 0; i < (1 << order); i++ )
>      {
>          /* Reference count must continuously be zero for free pages. */

... the already existing loop here. The "pointless" part is because
of

        pg[i].count_info = PGC_state_inuse;

but of course you'd need to adjust the immediately preceding
BUG_ON().
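
Roughly (a sketch, folding the scrub into that loop):

        for ( i = 0; i < (1 << order); i++ )
        {
            /* Reference count must continuously be zero for free pages. */
            BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);

            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
            {
                scrub_one_page(&pg[i]);
                node_need_scrub[node]--;
            }

            pg[i].count_info = PGC_state_inuse;
            /* ... rest of the existing per-page handling ... */
        }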

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-05-04 10:17   ` Jan Beulich
@ 2017-05-04 14:53     ` Boris Ostrovsky
  2017-05-04 15:00       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-04 14:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 05/04/2017 06:17 AM, Jan Beulich wrote:
>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>> @@ -678,6 +680,20 @@ static void check_low_mem_virq(void)
>>      }
>>  }
>>  
>> +/* Pages that need scrub are added to tail, otherwise to head. */
>> +static void page_list_add_scrub(struct page_info *pg, unsigned int node,
>> +                                unsigned int zone, unsigned int order,
>> +                                bool need_scrub)
>> +{
>> +    PFN_ORDER(pg) = order;
>> +    pg->u.free.dirty_head = need_scrub;
>> +
>> +    if ( need_scrub )
>> +        page_list_add_tail(pg, &heap(node, zone, order));
>> +    else
>> +        page_list_add(pg, &heap(node, zone, order));
>> +}
>> +
>>  /* Allocate 2^@order contiguous pages. */
>>  static struct page_info *alloc_heap_pages(
>>      unsigned int zone_lo, unsigned int zone_hi,
>> @@ -802,7 +818,7 @@ static struct page_info *alloc_heap_pages(
>>      while ( j != order )
>>      {
>>          PFN_ORDER(pg) = --j;
>> -        page_list_add_tail(pg, &heap(node, zone, j));
>> +        page_list_add(pg, &heap(node, zone, j));
> Don't you need to replicate pg->u.free.dirty_head (and hence use
> page_list_add_scrub()) here too?

I don't need to because we are still scrubbing from free_heap_pages()
and not from the idle loop, so we never have dirty pages in the heap. The
next patch will, in fact, start using page_list_add_scrub() here.

However, I can switch to it here for consistency.
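
I.e. (a quick sketch of the splitting loop using the helper):

    while ( j != order )
    {
        page_list_add_scrub(pg, node, zone, --j, false);
        pg += 1UL << j;
    }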


>
>> +static void scrub_free_pages(unsigned int node)
>> +{
>> +    struct page_info *pg;
>> +    unsigned int zone, order;
>> +    unsigned long i;
> Here I would similarly appreciate if the local variables were moved
> into the scopes they're actually needed in.
>
>> +    ASSERT(spin_is_locked(&heap_lock));
>> +
>> +    if ( !node_need_scrub[node] )
>> +        return;
>> +
>> +    for ( zone = 0; zone < NR_ZONES; zone++ )
>> +    {
>> +        order = MAX_ORDER;
>> +        do {
>> +            while ( !page_list_empty(&heap(node, zone, order)) )
>> +            {
>> +                /* Unscrubbed pages are always at the end of the list. */
>> +                pg = page_list_last(&heap(node, zone, order));
>> +                if ( !pg->u.free.dirty_head )
>> +                    break;
>> +
>> +                for ( i = 0; i < (1UL << order); i++)
>> +                {
>> +                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>> +                    {
>> +                        scrub_one_page(&pg[i]);
>> +                        pg[i].count_info &= ~PGC_need_scrub;
>> +                        node_need_scrub[node]--;
>> +                    }
>> +                }
>> +
>> +                page_list_del(pg, &heap(node, zone, order));
>> +                merge_and_free_buddy(pg, node, zone, order, false);
> Is there actually any merging involved here, i.e. can't you
> simply put back the buddy at the list head?


Yes, I can. I just wanted pages to enter heap only via
merge_and_free_buddy().

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-05-04 14:53     ` Boris Ostrovsky
@ 2017-05-04 15:00       ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2017-05-04 15:00 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 04.05.17 at 16:53, <boris.ostrovsky@oracle.com> wrote:
> On 05/04/2017 06:17 AM, Jan Beulich wrote:
>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>> @@ -678,6 +680,20 @@ static void check_low_mem_virq(void)
>>>      }
>>>  }
>>>  
>>> +/* Pages that need scrub are added to tail, otherwise to head. */
>>> +static void page_list_add_scrub(struct page_info *pg, unsigned int node,
>>> +                                unsigned int zone, unsigned int order,
>>> +                                bool need_scrub)
>>> +{
>>> +    PFN_ORDER(pg) = order;
>>> +    pg->u.free.dirty_head = need_scrub;
>>> +
>>> +    if ( need_scrub )
>>> +        page_list_add_tail(pg, &heap(node, zone, order));
>>> +    else
>>> +        page_list_add(pg, &heap(node, zone, order));
>>> +}
>>> +
>>>  /* Allocate 2^@order contiguous pages. */
>>>  static struct page_info *alloc_heap_pages(
>>>      unsigned int zone_lo, unsigned int zone_hi,
>>> @@ -802,7 +818,7 @@ static struct page_info *alloc_heap_pages(
>>>      while ( j != order )
>>>      {
>>>          PFN_ORDER(pg) = --j;
>>> -        page_list_add_tail(pg, &heap(node, zone, j));
>>> +        page_list_add(pg, &heap(node, zone, j));
>> Don't you need to replicate pg->u.free.dirty_head (and hence use
>> page_list_add_scrub()) here too?
> 
> I don't need to because we are still scrubbing from free_heap_pages()
> and not from the idle loop, so we never have dirty pages in the heap. The
> next patch will, in fact, start using page_list_add_scrub() here.
> 
> However, I can switch to it here for consistency.

Please do.

>>> +static void scrub_free_pages(unsigned int node)
>>> +{
>>> +    struct page_info *pg;
>>> +    unsigned int zone, order;
>>> +    unsigned long i;
>> Here I would similarly appreciate if the local variables were moved
>> into the scopes they're actually needed in.
>>
>>> +    ASSERT(spin_is_locked(&heap_lock));
>>> +
>>> +    if ( !node_need_scrub[node] )
>>> +        return;
>>> +
>>> +    for ( zone = 0; zone < NR_ZONES; zone++ )
>>> +    {
>>> +        order = MAX_ORDER;
>>> +        do {
>>> +            while ( !page_list_empty(&heap(node, zone, order)) )
>>> +            {
>>> +                /* Unscrubbed pages are always at the end of the list. */
>>> +                pg = page_list_last(&heap(node, zone, order));
>>> +                if ( !pg->u.free.dirty_head )
>>> +                    break;
>>> +
>>> +                for ( i = 0; i < (1UL << order); i++)
>>> +                {
>>> +                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>> +                    {
>>> +                        scrub_one_page(&pg[i]);
>>> +                        pg[i].count_info &= ~PGC_need_scrub;
>>> +                        node_need_scrub[node]--;
>>> +                    }
>>> +                }
>>> +
>>> +                page_list_del(pg, &heap(node, zone, order));
>>> +                merge_and_free_buddy(pg, node, zone, order, false);
>> Is there actually any merging involved here, i.e. can't you
>> simply put back the buddy at the list head?
> 
> 
> Yes, I can. I just wanted pages to enter heap only via
> merge_and_free_buddy().

I don't see why a page_list_del() couldn't be paired with a
page_list_add() here. Otherwise, just like it happened to me,
readers may wonder why this more involved function is being
called here.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 3/9] mm: Scrub pages in alloc_heap_pages() if needed
  2017-05-04 14:44   ` Jan Beulich
@ 2017-05-04 15:04     ` Boris Ostrovsky
  2017-05-04 15:36       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-04 15:04 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 05/04/2017 10:44 AM, Jan Beulich wrote:
>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>> When allocating pages in alloc_heap_pages() first look for clean pages.
> As expressed before, there are cases when we don't really need
> scrubbed pages. Hence the local variable "use_unscrubbed" below
> should really be some form of input to alloc_heap_pages().

That would be alloc_xenheap_pages() only, in which case can I just
initialize the (still local) use_unscrubbed as

    use_unscrubbed = (zone_lo == MEMZONE_XEN);

Or do you prefer this to be explicit?

>
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -700,34 +700,17 @@ static struct page_info *alloc_heap_pages(
>>      unsigned int order, unsigned int memflags,
>>      struct domain *d)
>>  {
>> -    unsigned int i, j, zone = 0, nodemask_retry = 0;
>> -    nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
>> +    unsigned int i, j, zone, nodemask_retry;
>> +    nodeid_t first_node, node, req_node;
>>      unsigned long request = 1UL << order;
>>      struct page_info *pg;
>> -    nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
>> -    bool_t need_tlbflush = 0;
>> +    nodemask_t nodemask;
>> +    bool need_scrub, need_tlbflush = false, use_unscrubbed = false;
>>      uint32_t tlbflush_timestamp = 0;
>>  
>>      /* Make sure there are enough bits in memflags for nodeID. */
>>      BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
>>  
>> -    if ( node == NUMA_NO_NODE )
>> -    {
>> -        if ( d != NULL )
>> -        {
>> -            node = next_node(d->last_alloc_node, nodemask);
>> -            if ( node >= MAX_NUMNODES )
>> -                node = first_node(nodemask);
>> -        }
>> -        if ( node >= MAX_NUMNODES )
>> -            node = cpu_to_node(smp_processor_id());
>> -    }
>> -    first_node = node;
>> -
>> -    ASSERT(node < MAX_NUMNODES);
>> -    ASSERT(zone_lo <= zone_hi);
>> -    ASSERT(zone_hi < NR_ZONES);
> The last two can remain where they are (but see also below).
>
>> @@ -754,6 +740,28 @@ static struct page_info *alloc_heap_pages(
>>           tmem_freeable_pages() )
>>          goto try_tmem;
>>  
>> + again:
> Is there any hope to get away without such an ugly pseudo loop?
> E.g. by making this function a helper function of a relatively thin
> wrapper named alloc_heap_pages(), invoking this helper twice
> unless use_unscrubbed is true (as said, this ought to be an input)?


OK, let me see if I can beautify this.


-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-04-14 15:37 ` [PATCH v3 4/9] mm: Scrub memory from idle loop Boris Ostrovsky
@ 2017-05-04 15:31   ` Jan Beulich
  2017-05-04 17:09     ` Boris Ostrovsky
  2017-05-11 10:26   ` Dario Faggioli
  1 sibling, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-04 15:31 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1035,16 +1035,82 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
>      return pg;
>  }
>  
> -static void scrub_free_pages(unsigned int node)
> +static nodemask_t node_scrubbing;
> +
> +static unsigned int node_to_scrub(bool get_node)
> +{
> +    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
> +    nodeid_t closest = NUMA_NO_NODE;
> +    u8 dist, shortest = 0xff;
> +
> +    if ( node == NUMA_NO_NODE )
> +        node = 0;
> +
> +    if ( node_need_scrub[node] &&
> +         (!get_node || !node_test_and_set(node, node_scrubbing)) )
> +        return node;
> +
> +    /*
> +     * See if there are memory-only nodes that need scrubbing and choose
> +     * the closest one.
> +     */
> +    local_node = node;
> +    while ( 1 )
> +    {
> +        do {
> +            node = cycle_node(node, node_online_map);
> +        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
> +                  (node != local_node) );
> +
> +        if ( node == local_node )
> +            break;
> +
> +        if ( node_need_scrub[node] )
> +        {
> +            if ( !get_node )
> +                return node;

I think the function parameter name is not / no longer suitable. The
caller wants to get _some_ node in either case. The difference is
whether it wants to just know whether there's _any_ needing scrub
work done, or whether it wants _the one_ to actually scrub on. So
how about "get_any" or "get_any_node" or just "any"?

> +            if ( !node_test_and_set(node, node_scrubbing) )
> +            {
> +                dist = __node_distance(local_node, node);
> +                if ( (dist < shortest) || (dist == NUMA_NO_DISTANCE) )
> +                {
> +                    /* Release previous node. */
> +                    if ( closest != NUMA_NO_NODE )
> +                        node_clear(closest, node_scrubbing);
> +                    shortest = dist;
> +                    closest = node;
> +                }
> +                else
> +                    node_clear(node, node_scrubbing);
> +            }

Wouldn't it be better to check distance before setting the bit in
node_scrubbing, avoiding the possible need to clear it again later
(potentially misguiding other CPUs)?

And why would NUMA_NO_DISTANCE lead to a switch of nodes?
I can see that being needed when closest == NUMA_NO_NODE,
but once you've picked one I think you should switch only when
you've found another that's truly closer.
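
I.e. something along these lines (a sketch only):

            dist = __node_distance(local_node, node);
            if ( ((closest == NUMA_NO_NODE) || (dist < shortest)) &&
                 !node_test_and_set(node, node_scrubbing) )
            {
                /* Release the previously chosen node, if any. */
                if ( closest != NUMA_NO_NODE )
                    node_clear(closest, node_scrubbing);
                shortest = dist;
                closest = node;
            }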

> +        }
> +    }
> +
> +    return closest;
> +}
> +
> +bool scrub_free_pages(void)
>  {
>      struct page_info *pg;
>      unsigned int zone, order;
>      unsigned long i;
> +    unsigned int cpu = smp_processor_id();
> +    bool preempt = false;
> +    nodeid_t node;
>  
> -    ASSERT(spin_is_locked(&heap_lock));
> +    /*
> +     * Don't scrub while dom0 is being constructed since we may
> +     * fail trying to call map_domain_page() from scrub_one_page().
> +     */
> +    if ( system_state < SYS_STATE_active )
> +        return false;

I assume that's because of the mapcache vcpu override? That's x86
specific though, so the restriction here ought to be arch specific.
Even better would be to find a way to avoid this restriction
altogether, as on bigger systems only one CPU is actually busy
while building Dom0, so all others could be happily scrubbing. Could
that override become a per-CPU one perhaps?

Otoh there's not much to scrub yet until Dom0 had all its memory
allocated, and we know which pages truly remain free (wanting
what is currently the boot time scrubbing done on them). But that
point in time may still be earlier than when we switch to
SYS_STATE_active.

> @@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node)
>                          pg[i].count_info &= ~PGC_need_scrub;
>                          node_need_scrub[node]--;
>                      }
> +                    if ( softirq_pending(cpu) )
> +                    {
> +                        preempt = true;
> +                        break;
> +                    }

Isn't this a little too eager, especially if you didn't have to scrub
the page on this iteration?

> @@ -1141,9 +1220,6 @@ static void free_heap_pages(
>      if ( tainted )
>          reserve_offlined_page(pg);
>  
> -    if ( need_scrub )
> -        scrub_free_pages(node);

I'd expect this eliminates the need for the need_scrub variable.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 3/9] mm: Scrub pages in alloc_heap_pages() if needed
  2017-05-04 15:04     ` Boris Ostrovsky
@ 2017-05-04 15:36       ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2017-05-04 15:36 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 04.05.17 at 17:04, <boris.ostrovsky@oracle.com> wrote:
> On 05/04/2017 10:44 AM, Jan Beulich wrote:
>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>> When allocating pages in alloc_heap_pages() first look for clean pages.
>> As expressed before, there are cases when we don't really need
>> scrubbed pages. Hence the local variable "use_unscrubbed" below
>> should really be some form of input to alloc_heap_pages().
> 
> That would be alloc_xenheap_pages() only, in which case can I just
> initialize the (still local) use_unscrubbed as
> 
>     use_unscrubbed = (zone_lo == MEMZONE_XEN);
> 
> Or do you prefer this to be explicit?

I'd be fine with this implicit variant if it was correct, but I think it's
correct only for the CONFIG_SEPARATE_XENHEAP case.
Furthermore alloc_domheap_pages(NULL, ...) also doesn't
require scrubbed pages to be returned.
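
One way to make it a proper input would be a new memflag, e.g. (name and
bit position below are placeholders only; a genuinely free bit would need
to be picked):

    #define _MEMF_no_scrub    7
    #define MEMF_no_scrub     (1U<<_MEMF_no_scrub)

which alloc_xenheap_pages() and alloc_domheap_pages(NULL, ...) could then
pass down to alloc_heap_pages().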

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending
  2017-04-14 15:37 ` [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending Boris Ostrovsky
@ 2017-05-04 15:43   ` Jan Beulich
  2017-05-04 17:18     ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-04 15:43 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
> While scrubbing from idle loop, check for softirqs every 256 pages.
> If softirq is pending, don't scrub any further and merge the
> partially-scrubbed buddy back into heap by breaking the clean portion
> into smaller power-of-2 chunks. Then repeat the same process for the
> dirty part.

This is ugly, as it gets us back into the state where full merge
opportunities aren't being realized, just that the time window
may be smaller now. As hinted at before, is there no way to
flag the first page needing scrubbing alongside the head
indicating that _some_ page needs scrubbing? The pages are
all free, so if there's no other suitable storage, the head page
itself could serve as such. But instead of a flag in struct
page_info, perhaps you could store a (relatively small) integer?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 7/9] mm: Keep pages available for allocation while scrubbing
  2017-04-14 15:37 ` [PATCH v3 7/9] mm: Keep pages available for allocation while scrubbing Boris Ostrovsky
@ 2017-05-04 16:03   ` Jan Beulich
  2017-05-04 17:26     ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-04 16:03 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
> Instead of scrubbing pages while holding heap lock we can mark
> buddy's head as being scrubbed and drop the lock temporarily.
> If someone (most likely alloc_heap_pages()) tries to access
> this chunk it will signal the scrubber to abort scrub by setting
> head's PAGE_SCRUB_ABORT bit. The scrubber checks this bit after
> processing each page and stops its work as soon as it sees it.

Explanation sounds fine, but didn't you earlier indicate you think
yourself that the title is not really suitable (anymore)? I'm of
that opinion, at least, as pages are always available now, it's just
that the latency to get hold of the heap lock is higher before this
change than what we want it to be.

Looking at the patch itself is not overly useful as it seems without
the disposition of patch 5 being clear.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-04 15:31   ` Jan Beulich
@ 2017-05-04 17:09     ` Boris Ostrovsky
  2017-05-05 10:21       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-04 17:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 05/04/2017 11:31 AM, Jan Beulich wrote:
>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1035,16 +1035,82 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
>>      return pg;
>>  }
>>  
>> -static void scrub_free_pages(unsigned int node)
>> +static nodemask_t node_scrubbing;
>> +
>> +static unsigned int node_to_scrub(bool get_node)
>> +{
>> +    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
>> +    nodeid_t closest = NUMA_NO_NODE;
>> +    u8 dist, shortest = 0xff;
>> +
>> +    if ( node == NUMA_NO_NODE )
>> +        node = 0;
>> +
>> +    if ( node_need_scrub[node] &&
>> +         (!get_node || !node_test_and_set(node, node_scrubbing)) )
>> +        return node;
>> +
>> +    /*
>> +     * See if there are memory-only nodes that need scrubbing and choose
>> +     * the closest one.
>> +     */
>> +    local_node = node;
>> +    while ( 1 )
>> +    {
>> +        do {
>> +            node = cycle_node(node, node_online_map);
>> +        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>> +                  (node != local_node) );
>> +
>> +        if ( node == local_node )
>> +            break;
>> +
>> +        if ( node_need_scrub[node] )
>> +        {
>> +            if ( !get_node )
>> +                return node;
> I think the function parameter name is not / no longer suitable. The
> caller wants to get _some_ node in either case. The difference is
> whether it wants to just know whether there's _any_ needing scrub
> work done, or whether it wants _the one_ to actually scrub on. So
> how about "get_any" or "get_any_node" or just "any"?


Not only to find out whether there is anything to scrub but, if get_node
is true, to actually "get" it, i.e. set the bit in the node_scrubbing
mask. Thus the name.

>
>> +            if ( !node_test_and_set(node, node_scrubbing) )
>> +            {
>> +                dist = __node_distance(local_node, node);
>> +                if ( (dist < shortest) || (dist == NUMA_NO_DISTANCE) )
>> +                {
>> +                    /* Release previous node. */
>> +                    if ( closest != NUMA_NO_NODE )
>> +                        node_clear(closest, node_scrubbing);
>> +                    shortest = dist;
>> +                    closest = node;
>> +                }
>> +                else
>> +                    node_clear(node, node_scrubbing);
>> +            }
> Wouldn't it be better to check distance before setting the bit in
> node_scrubbing, avoiding the possible need to clear it again later
> (potentially misguiding other CPUs)?

Yes.

>
> And why would NUMA_NO_DISTANCE lead to a switch of nodes?
> I can see that being needed when closest == NUMA_NO_NODE,
> but once you've picked one I think you should switch only when
> you've found another that's truly closer.

OK --- yes, this was indeed done to "get started" (i.e. to get the first
valid value for 'closest').

>
>> +        }
>> +    }
>> +
>> +    return closest;
>> +}
>> +
>> +bool scrub_free_pages(void)
>>  {
>>      struct page_info *pg;
>>      unsigned int zone, order;
>>      unsigned long i;
>> +    unsigned int cpu = smp_processor_id();
>> +    bool preempt = false;
>> +    nodeid_t node;
>>  
>> -    ASSERT(spin_is_locked(&heap_lock));
>> +    /*
>> +     * Don't scrub while dom0 is being constructed since we may
>> +     * fail trying to call map_domain_page() from scrub_one_page().
>> +     */
>> +    if ( system_state < SYS_STATE_active )
>> +        return false;
> I assume that's because of the mapcache vcpu override? That's x86
> specific though, so the restriction here ought to be arch specific.
> Even better would be to find a way to avoid this restriction
> altogether, as on bigger systems only one CPU is actually busy
> while building Dom0, so all others could be happily scrubbing. Could
> that override become a per-CPU one perhaps?

Is it worth doing though? What you are saying below is exactly why I
simply return here --- there were very few dirty pages. This may change
if we decide to use idle-loop scrubbing for boot scrubbing as well (as
Andrew suggested earlier) but there is little reason to do it now IMO.

> Otoh there's not much to scrub yet until Dom0 had all its memory
> allocated, and we know which pages truly remain free (wanting
> what is currently the boot time scrubbing done on them). But that
> point in time may still be earlier than when we switch to
> SYS_STATE_active.
>
>> @@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node)
>>                          pg[i].count_info &= ~PGC_need_scrub;
>>                          node_need_scrub[node]--;
>>                      }
>> +                    if ( softirq_pending(cpu) )
>> +                    {
>> +                        preempt = true;
>> +                        break;
>> +                    }
> Isn't this a little too eager, especially if you didn't have to scrub
> the page on this iteration?

What would be a good place then? Count how many pages were actually
scrubbed and check for pending softirqs every so many? Even if we don't
scrub at all, walking the whole heap can take a while.


>
>> @@ -1141,9 +1220,6 @@ static void free_heap_pages(
>>      if ( tainted )
>>          reserve_offlined_page(pg);
>>  
>> -    if ( need_scrub )
>> -        scrub_free_pages(node);
> I'd expect this eliminates the need for the need_scrub variable.

We still need it to decide whether to set PGC_need_scrub on pages.

-boris


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending
  2017-05-04 15:43   ` Jan Beulich
@ 2017-05-04 17:18     ` Boris Ostrovsky
  2017-05-05 10:27       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-04 17:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 05/04/2017 11:43 AM, Jan Beulich wrote:
>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>> While scrubbing from idle loop, check for softirqs every 256 pages.
>> If softirq is pending, don't scrub any further and merge the
>> partially-scrubbed buddy back into heap by breaking the clean portion
>> into smaller power-of-2 chunks. Then repeat the same process for the
>> dirty part.
> This is ugly, as it gets us back into the state where full merge
> opportunities aren't being realized, just that the time window
> may be smaller now. As hinted at before, is there no way to
> flag the first page needing scrubbing alongside the head
> indicating that _some_ page needs scrubbing? The pages are
> all free, so if there's no other suitable storage, the head page
> itself could serve as such. But instead of a flag in struct
> page_info, perhaps you could store a (relatively small) integer?

How will it help? Even if we know what the first dirty page is we still
may have to stop scrubbing if a softirq is pending. We would simply avoid
having to scan the buddy up to the first dirty page.

Or perhaps I don't understand what you are suggesting.


-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 7/9] mm: Keep pages available for allocation while scrubbing
  2017-05-04 16:03   ` Jan Beulich
@ 2017-05-04 17:26     ` Boris Ostrovsky
  2017-05-05 10:28       ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-04 17:26 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 05/04/2017 12:03 PM, Jan Beulich wrote:
>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>> Instead of scrubbing pages while holding heap lock we can mark
>> buddy's head as being scrubbed and drop the lock temporarily.
>> If someone (most likely alloc_heap_pages()) tries to access
>> this chunk it will signal the scrubber to abort scrub by setting
>> head's PAGE_SCRUB_ABORT bit. The scrubber checks this bit after
>> processing each page and stops its work as soon as it sees it.
> Explanation sounds fine, but didn't you earlier indicate you think
> yourself that the title is not really suitable (anymore)? I'm of
> that opinion, at least, as pages are always available now, it's just
> that the latency to get hold of the heap lock is higher before this
> change than what we want it to be.

Uhm... Yes, that's what it meant to say ;-)

Should have been "Keep *heap* available to others while scrubbing"

-boris

>
> Looking at the patch itself is not overly useful as it seems without
> the disposition of patch 5 being clear.
>
> Jan
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-04 17:09     ` Boris Ostrovsky
@ 2017-05-05 10:21       ` Jan Beulich
  2017-05-05 13:42         ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-05 10:21 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 04.05.17 at 19:09, <boris.ostrovsky@oracle.com> wrote:
> On 05/04/2017 11:31 AM, Jan Beulich wrote:
>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -1035,16 +1035,82 @@ merge_and_free_buddy(struct page_info *pg, unsigned int node,
>>>      return pg;
>>>  }
>>>  
>>> -static void scrub_free_pages(unsigned int node)
>>> +static nodemask_t node_scrubbing;
>>> +
>>> +static unsigned int node_to_scrub(bool get_node)
>>> +{
>>> +    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
>>> +    nodeid_t closest = NUMA_NO_NODE;
>>> +    u8 dist, shortest = 0xff;
>>> +
>>> +    if ( node == NUMA_NO_NODE )
>>> +        node = 0;
>>> +
>>> +    if ( node_need_scrub[node] &&
>>> +         (!get_node || !node_test_and_set(node, node_scrubbing)) )
>>> +        return node;
>>> +
>>> +    /*
>>> +     * See if there are memory-only nodes that need scrubbing and choose
>>> +     * the closest one.
>>> +     */
>>> +    local_node = node;
>>> +    while ( 1 )
>>> +    {
>>> +        do {
>>> +            node = cycle_node(node, node_online_map);
>>> +        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
>>> +                  (node != local_node) );
>>> +
>>> +        if ( node == local_node )
>>> +            break;
>>> +
>>> +        if ( node_need_scrub[node] )
>>> +        {
>>> +            if ( !get_node )
>>> +                return node;
>> I think the function parameter name is not / no longer suitable. The
>> caller wants to get _some_ node in either case. The difference is
>> whether it wants to just know whether there's _any_ needing scrub
>> work done, or whether it wants _the one_ to actually scrub on. So
>> how about "get_any" or "get_any_node" or just "any"?
> 
> Not only to find out whether there is anything to scrub but, if get_node
> is true, to actually "get" it, i.e. set the bit in the node_scrubbing
> mask. Thus the name.

Hmm, okay, in that case at least an explanatory comment should be
added.

>>> +bool scrub_free_pages(void)
>>>  {
>>>      struct page_info *pg;
>>>      unsigned int zone, order;
>>>      unsigned long i;
>>> +    unsigned int cpu = smp_processor_id();
>>> +    bool preempt = false;
>>> +    nodeid_t node;
>>>  
>>> -    ASSERT(spin_is_locked(&heap_lock));
>>> +    /*
>>> +     * Don't scrub while dom0 is being constructed since we may
>>> +     * fail trying to call map_domain_page() from scrub_one_page().
>>> +     */
>>> +    if ( system_state < SYS_STATE_active )
>>> +        return false;
>> I assume that's because of the mapcache vcpu override? That's x86
>> specific though, so the restriction here ought to be arch specific.
>> Even better would be to find a way to avoid this restriction
>> altogether, as on bigger systems only one CPU is actually busy
>> while building Dom0, so all others could be happily scrubbing. Could
>> that override become a per-CPU one perhaps?
> 
> Is it worth doing though? What you are saying below is exactly why I
> simply return here --- there were very few dirty pages.

Well, in that case the comment should cover this second reason as
well, at the very least.

> This may change
> if we decide to use idle-loop scrubbing for boot scrubbing as well (as
> Andrew suggested earlier) but there is little reason to do it now IMO.

Why not? In fact I had meant to ask why your series doesn't
include that?

>> Otoh there's not much to scrub yet until Dom0 had all its memory
>> allocated, and we know which pages truly remain free (wanting
>> what is currently the boot time scrubbing done on them). But that
>> point in time may still be earlier than when we switch to
>> SYS_STATE_active.

IOW I think boot scrubbing could be kicked off as soon as Dom0
had the bulk of its memory allocated.

>>> @@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node)
>>>                          pg[i].count_info &= ~PGC_need_scrub;
>>>                          node_need_scrub[node]--;
>>>                      }
>>> +                    if ( softirq_pending(cpu) )
>>> +                    {
>>> +                        preempt = true;
>>> +                        break;
>>> +                    }
>> Isn't this a little too eager, especially if you didn't have to scrub
>> the page on this iteration?
> 
> What would be a good place then? Count how many pages were actually
> scrubbed and check for pending softirqs every so many?

Yes.

> Even if we don't scrub at all, walking the whole heap can take a while.

Correct - you can't skip this check altogether even if no page
requires actual scrubbing.
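
E.g. (a sketch; pages_checked would be a new local counter and the batch
size is arbitrary):

                    if ( !(++pages_checked & 0xff) && softirq_pending(cpu) )
                    {
                        preempt = true;
                        break;
                    }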

>>> @@ -1141,9 +1220,6 @@ static void free_heap_pages(
>>>      if ( tainted )
>>>          reserve_offlined_page(pg);
>>>  
>>> -    if ( need_scrub )
>>> -        scrub_free_pages(node);
>> I'd expect this eliminates the need for the need_scrub variable.
> 
> We still need it to decide whether to set PGC_need_scrub on pages.

Oh, of course.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending
  2017-05-04 17:18     ` Boris Ostrovsky
@ 2017-05-05 10:27       ` Jan Beulich
  2017-05-05 13:51         ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-05 10:27 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 04.05.17 at 19:18, <boris.ostrovsky@oracle.com> wrote:
> On 05/04/2017 11:43 AM, Jan Beulich wrote:
>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>> While scrubbing from idle loop, check for softirqs every 256 pages.
>>> If softirq is pending, don't scrub any further and merge the
>>> partially-scrubbed buddy back into heap by breaking the clean portion
>>> into smaller power-of-2 chunks. Then repeat the same process for the
>>> dirty part.
>> This is ugly, as it gets us back into the state where full merge
>> opportunities aren't being realized, just that the time window
>> may be smaller now. As hinted at before, is there no way to
>> flag the first page needing scrubbing alongside the head
>> indicating that _some_ page needs scrubbing? The pages are
>> all free, so if there's no other suitable storage, the head page
>> itself could serve as such. But instead of a flag in struct
>> page_info, perhaps you could store a (relatively small) integer?
> 
> How will it help? Even if we know what the first dirty page is we still
> may have to drop scrubbing if an irq is pending. We simply will not have
> to scan the buddy until the first dirty page is found.
> 
> Or perhaps I don't understand what you are suggesting.

If you get a request to stop scrubbing, you split the current
buddy into a clean and a (possibly) dirty part (if I understood
the code in this patch correctly). It's that splitting which I'd like
to see avoided, and by keeping a "first dirty" index (which
would be updated when you stop scrubbing) you could do so.
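
For illustration, a minimal sketch of such a hint (the struct and
function names here are invented for the example, not taken from the
series):

struct buddy_head_hint {
    unsigned int order;        /* the buddy covers 2^order pages */
    unsigned int first_dirty;  /* index of first page that may need scrubbing */
};

/* When the scrubber has to stop after cleaning 'done' pages, record the
 * progress in the head instead of splitting the buddy. */
static void record_scrub_progress(struct buddy_head_hint *head,
                                  unsigned int done)
{
    head->first_dirty = done;  /* the next pass resumes from here */
}

/* When the scrubber revisits the buddy, it skips the clean prefix. */
static unsigned int scrub_resume_index(const struct buddy_head_hint *head)
{
    return head->first_dirty;
}

The allocator would then simply ignore (or reset) the hint when it takes
the buddy, so no splitting is needed on preemption.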

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 7/9] mm: Keep pages available for allocation while scrubbing
  2017-05-04 17:26     ` Boris Ostrovsky
@ 2017-05-05 10:28       ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2017-05-05 10:28 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 04.05.17 at 19:26, <boris.ostrovsky@oracle.com> wrote:
> On 05/04/2017 12:03 PM, Jan Beulich wrote:
>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>> Instead of scrubbing pages while holding heap lock we can mark
>>> buddy's head as being scrubbed and drop the lock temporarily.
>>> If someone (most likely alloc_heap_pages()) tries to access
>>> this chunk it will signal the scrubber to abort scrub by setting
>>> head's PAGE_SCRUB_ABORT bit. The scrubber checks this bit after
>>> processing each page and stops its work as soon as it sees it.
>> Explanation sounds fine, but didn't you earlier indicate you think
>> yourself that the title is not really suitable (anymore)? I'm of
>> that opinion, at least, as pages are always available now, it's just
>> that the latency to get hold of the heap lock is higher before this
>> change than what we want it to be.
> 
> Uhm... Yes, that's what it meant to say ;-)
> 
> Should have been "Keep *heap* available to others while scrubbing"

And perhaps s/available/accessible/ ?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-05 10:21       ` Jan Beulich
@ 2017-05-05 13:42         ` Boris Ostrovsky
  2017-05-05 14:10           ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-05 13:42 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel


>>>> +bool scrub_free_pages(void)
>>>>  {
>>>>      struct page_info *pg;
>>>>      unsigned int zone, order;
>>>>      unsigned long i;
>>>> +    unsigned int cpu = smp_processor_id();
>>>> +    bool preempt = false;
>>>> +    nodeid_t node;
>>>>  
>>>> -    ASSERT(spin_is_locked(&heap_lock));
>>>> +    /*
>>>> +     * Don't scrub while dom0 is being constructed since we may
>>>> +     * fail trying to call map_domain_page() from scrub_one_page().
>>>> +     */
>>>> +    if ( system_state < SYS_STATE_active )
>>>> +        return false;
>>> I assume that's because of the mapcache vcpu override? That's x86
>>> specific though, so the restriction here ought to be arch specific.
>>> Even better would be to find a way to avoid this restriction
>>> altogether, as on bigger systems only one CPU is actually busy
>>> while building Dom0, so all others could be happily scrubbing. Could
>>> that override become a per-CPU one perhaps?
>> Is it worth doing though? What you are saying below is exactly why I
>> simply return here --- there were very few dirty pages.
> Well, in that case the comment should cover this second reason as
> well, at the very least.
>
>> This may change
>> if we decide to use idle-loop scrubbing for boot scrubbing as well (as
>> Andrew suggested earlier) but there is little reason to do it now IMO.
> Why not? In fact I had meant to ask why your series doesn't
> include that?

This felt to me like a separate feature.


>
>>> Otoh there's not much to scrub yet until Dom0 had all its memory
>>> allocated, and we know which pages truly remain free (wanting
>>> what is currently the boot time scrubbing done on them). But that
>>> point in time may still be earlier than when we switch to
>>> SYS_STATE_active.
> IOW I think boot scrubbing could be kicked off as soon as Dom0
> had the bulk of its memory allocated.

Since we only are trying to avoid mapcache vcpu override can't we just
scrub whenever override is NULL (per-cpu or not)?

>
>>>> @@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node)
>>>>                          pg[i].count_info &= ~PGC_need_scrub;
>>>>                          node_need_scrub[node]--;
>>>>                      }
>>>> +                    if ( softirq_pending(cpu) )
>>>> +                    {
>>>> +                        preempt = true;
>>>> +                        break;
>>>> +                    }
>>> Isn't this a little too eager, especially if you didn't have to scrub
>>> the page on this iteration?
>> What would be a good place then? Count how many pages were actually
>> scrubbed and check for pending interrupts every so many?
> Yes.
>
>> Even if we don't scrub at all, walking the whole heap can take a while.
> Correct - you can't skip this check altogether even if no page
> requires actual scrubbing.

But then how will counting help --- we want to periodically check even
when the count is zero? Once per zone? That still may be too infrequent
since we may have, for example, one dirty page per order and so we are
scanning the whole order but bumping the count by only one.

-boris


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending
  2017-05-05 10:27       ` Jan Beulich
@ 2017-05-05 13:51         ` Boris Ostrovsky
  2017-05-05 14:13           ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-05 13:51 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 05/05/2017 06:27 AM, Jan Beulich wrote:
>>>> On 04.05.17 at 19:18, <boris.ostrovsky@oracle.com> wrote:
>> On 05/04/2017 11:43 AM, Jan Beulich wrote:
>>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>>> While scrubbing from idle loop, check for softirqs every 256 pages.
>>>> If softirq is pending, don't scrub any further and merge the
>>>> partially-scrubbed buddy back into heap by breaking the clean portion
>>>> into smaller power-of-2 chunks. Then repeat the same process for the
>>>> dirty part.
>>> This is ugly, as it gets us back into the state where full merge
>>> opportunities aren't being realized, just that the time window
>>> may be smaller now. As hinted at before, is there no way to
>>> flag the first page needing scrubbing alongside the head
>>> indicating that _some_ page needs scrubbing? The pages are
>>> all free, so if there's no other suitable storage, the head page
>>> itself could serve as such. But instead of a flag in struct
>>> page_info, perhaps you could store a (relatively small) integer?
>> How will it help? Even if we know what the first dirty page is we still
>> may have to drop scrubbing if an irq is pending. We simply will not have
>> to scan the buddy until the first dirty page is found.
>>
>> Or perhaps I don't understand what you are suggesting.
> If you get a request to stop scrubbing, you split the current
> buddy into a clean and a (possibly) dirty part (if I understood
> the code in this patch correctly). 

Right.

> It's that splitting which I'd like
> to see avoided, and by keeping a "first dirty" index (which
> would be updated when you stop scrubbing) you could do so.


If we are to avoid splitting then we might just keep the buddy on the
"dirty end" (i.e. the tail) of the order if we are interrupted in the middle.

Having "first_dirty" (which is really "maybe_first_dirty") would only
help us start next scan faster but it won't really change the algorithm.

(Or maybe that's exactly what you are proposing.)

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-05 13:42         ` Boris Ostrovsky
@ 2017-05-05 14:10           ` Jan Beulich
  2017-05-05 14:14             ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-05 14:10 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 05.05.17 at 15:42, <boris.ostrovsky@oracle.com> wrote:
>>>> Otoh there's not much to scrub yet until Dom0 had all its memory
>>>> allocated, and we know which pages truly remain free (wanting
>>>> what is currently the boot time scrubbing done on them). But that
>>>> point in time may still be earlier than when we switch to
>>>> SYS_STATE_active.
>> IOW I think boot scrubbing could be kicked off as soon as Dom0
>> had the bulk of its memory allocated.
> 
> Since we only are trying to avoid mapcache vcpu override can't we just
> scrub whenever override is NULL (per-cpu or not)?

But how do you know? The variable should remain static in
domain_page.c, so I think we'd instead need a notification to
the scrubber when it gets set back to NULL.

>>>>> @@ -1065,16 +1131,29 @@ static void scrub_free_pages(unsigned int node)
>>>>>                          pg[i].count_info &= ~PGC_need_scrub;
>>>>>                          node_need_scrub[node]--;
>>>>>                      }
>>>>> +                    if ( softirq_pending(cpu) )
>>>>> +                    {
>>>>> +                        preempt = true;
>>>>> +                        break;
>>>>> +                    }
>>>> Isn't this a little too eager, especially if you didn't have to scrub
>>>> the page on this iteration?
>>> What would be a good place then? Count how many pages were actually
>>> scrubbed and check for pending interrupts every so many?
>> Yes.
>>
>>> Even if we don't scrub at all, walking the whole heap can take a while.
>> Correct - you can't skip this check altogether even if no page
>> requires actual scrubbing.
> 
> But then how will counting help --- we want to periodically check even
> when the count is zero? Once per zone? That still may be too infrequent
> since we may have, for example, one dirty page per order and so we are
> scanning the whole order but bumping the count by only one.

Oh, I didn't read your proposal properly. Of course you'd need
to count loop iterations, not pages scrubbed. Presumably you'd
want to increment the counter by one when not scrubbing the
page, and by a larger value when scrubbing is needed.
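
Roughly along these lines (a sketch only - the threshold and weights are
arbitrary, the node_need_scrub bookkeeping is omitted, and it assumes
the helpers already visible in the quoted hunks):

#define SCRUB_CHECK_PERIOD 256  /* arbitrary for this example */
#define SCRUB_PAGE_COST     16  /* a scrubbed page "costs" more than a clean one */

static bool scrub_chunk_preemptibly(struct page_info *pg, unsigned long nr,
                                    unsigned int cpu)
{
    unsigned long i, cost = 0;

    for ( i = 0; i < nr; i++ )
    {
        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
        {
            scrub_one_page(&pg[i]);
            pg[i].count_info &= ~PGC_need_scrub;
            cost += SCRUB_PAGE_COST;   /* scrubbing dominates the cost */
        }
        else
            cost++;                    /* merely walking a clean page is cheap */

        if ( cost >= SCRUB_CHECK_PERIOD )
        {
            if ( softirq_pending(cpu) )
                return true;           /* tell the caller to preempt */
            cost = 0;
        }
    }

    return false;
}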

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending
  2017-05-05 13:51         ` Boris Ostrovsky
@ 2017-05-05 14:13           ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2017-05-05 14:13 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 05.05.17 at 15:51, <boris.ostrovsky@oracle.com> wrote:
> On 05/05/2017 06:27 AM, Jan Beulich wrote:
>>>>> On 04.05.17 at 19:18, <boris.ostrovsky@oracle.com> wrote:
>>> On 05/04/2017 11:43 AM, Jan Beulich wrote:
>>>>>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
>>>>> While scrubbing from idle loop, check for softirqs every 256 pages.
>>>>> If softirq is pending, don't scrub any further and merge the
>>>>> partially-scrubbed buddy back into heap by breaking the clean portion
>>>>> into smaller power-of-2 chunks. Then repeat the same process for the
>>>>> dirty part.
>>>> This is ugly, as it gets us back into the state where full merge
>>>> opportunities aren't being realized, just that the time window
>>>> may be smaller now. As hinted at before, is there no way to
>>>> flag the first page needing scrubbing alongside the head
>>>> indicating that _some_ page needs scrubbing? The pages are
>>>> all free, so if there's no other suitable storage, the head page
>>>> itself could serve as such. But instead of a flag in struct
>>>> page_info, perhaps you could store a (relatively small) integer?
>>> How will it help? Even if we know what the first dirty page is we still
>>> may have to drop scrubbing if an irq is pending. We simply will not have
>>> to scan the buddy until the first dirty page is found.
>>>
>>> Or perhaps I don't understand what you are suggesting.
>> If you get a request to stop scrubbing, you split the current
>> buddy into a clean and a (possibly) dirty part (if I understood
>> the code in this patch correctly). 
> 
> Right.
> 
>> It's that splitting which I'd like
>> to see avoided, and by keeping a "first dirty" index (which
>> would be updated when you stop scrubbing) you could do so.
> 
> 
> If we are to avoid splitting then we might just keep the buddy on the
> "dirty end" (i.e. the tail) of the order if we are interrupted in the middle.
> 
> Having "first_dirty" (which is really "maybe_first_dirty") would only
> help us start next scan faster but it won't really change the algorithm.
> 
> (Or maybe that's exactly what you are proposing.)

Yes - using such a hint means better chances of actually making
overall progress despite not splitting buddies.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-05 14:10           ` Jan Beulich
@ 2017-05-05 14:14             ` Jan Beulich
  2017-05-05 14:27               ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-05 14:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 05.05.17 at 16:10, <JBeulich@suse.com> wrote:
>>>> On 05.05.17 at 15:42, <boris.ostrovsky@oracle.com> wrote:
>>>>> Otoh there's not much to scrub yet until Dom0 had all its memory
>>>>> allocated, and we know which pages truly remain free (wanting
>>>>> what is currently the boot time scrubbing done on them). But that
>>>>> point in time may still be earlier than when we switch to
>>>>> SYS_STATE_active.
>>> IOW I think boot scrubbing could be kicked off as soon as Dom0
>>> had the bulk of its memory allocated.
>> 
>> Since we only are trying to avoid mapcache vcpu override can't we just
>> scrub whenever override is NULL (per-cpu or not)?
> 
> But how do you know? The variable should remain static in
> domain_page.c, so I think we'd instead need a notification to
> the scrubber when it gets set back to NULL.

And of course if the override variable was per-CPU, you wouldn't
have a need to know, as you wouldn't try scrubbing on the CPU
doing the Dom0 setup.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-05 14:14             ` Jan Beulich
@ 2017-05-05 14:27               ` Boris Ostrovsky
  2017-05-05 14:51                 ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-05 14:27 UTC (permalink / raw)
  To: Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson, tim,
	xen-devel, andrew.cooper3

On 05/05/2017 10:14 AM, Jan Beulich wrote:
>>>> On 05.05.17 at 16:10, <JBeulich@suse.com> wrote:
>>>>> On 05.05.17 at 15:42, <boris.ostrovsky@oracle.com> wrote:
>>>>>> Otoh there's not much to scrub yet until Dom0 had all its memory
>>>>>> allocated, and we know which pages truly remain free (wanting
>>>>>> what is currently the boot time scrubbing done on them). But that
>>>>>> point in time may still be earlier than when we switch to
>>>>>> SYS_STATE_active.
>>>> IOW I think boot scrubbing could be kicked off as soon as Dom0
>>>> had the bulk of its memory allocated.
>>> Since we only are trying to avoid mapcache vcpu override can't we just
>>> scrub whenever override is NULL (per-cpu or not)?
>> But how do you know? The variable should remain static in
>> domain_page.c, so I think we'd instead need a notification to
>> the scrubber when it gets set back to NULL.

Why not just have in domain_page.c

bool mapcache_override() {return override != NULL;}

(or appropriate per-cpu variant)?

> And of course if the override variable was per-CPU, you wouldn't
> have a need to know, as you wouldn't try scrubbing on the CPU
> doing the Dom0 setup.

Actually, won't (override != current) do the trick?

-boris



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-05 14:27               ` Boris Ostrovsky
@ 2017-05-05 14:51                 ` Jan Beulich
  2017-05-05 15:23                   ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-05 14:51 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 05.05.17 at 16:27, <boris.ostrovsky@oracle.com> wrote:
> On 05/05/2017 10:14 AM, Jan Beulich wrote:
>>>>> On 05.05.17 at 16:10, <JBeulich@suse.com> wrote:
>>>>>> On 05.05.17 at 15:42, <boris.ostrovsky@oracle.com> wrote:
>>>>>>> Otoh there's not much to scrub yet until Dom0 had all its memory
>>>>>>> allocated, and we know which pages truly remain free (wanting
>>>>>>> what is currently the boot time scrubbing done on them). But that
>>>>>>> point in time may still be earlier than when we switch to
>>>>>>> SYS_STATE_active.
>>>>> IOW I think boot scrubbing could be kicked off as soon as Dom0
>>>>> had the bulk of its memory allocated.
>>>> Since we only are trying to avoid mapcache vcpu override can't we just
>>>> scrub whenever override is NULL (per-cpu or not)?
>>> But how do you know? The variable should remain static in
>>> domain_page.c, so I think we'd instead need a notification to
>>> the scrubber when it gets set back to NULL.
> 
> Why not just have in domain_page.c
> 
> bool mapcache_override() {return override != NULL;}
> 
> (or appropriate per-cpu variant)?

And you would mean to permanently poll this?

>> And of course if the override variable was per-CPU, you wouldn't
>> have a need to know, as you wouldn't try scrubbing on the CPU
>> doing the Dom0 setup.
> 
> Actually, won't (override != current) do the trick?

This is like saying "If I don't get a car, it should be a red one." If
you don't need to know, how can it matter what expression to use?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 9/9] mm: Make sure pages are scrubbed
  2017-04-14 15:37 ` [PATCH v3 9/9] mm: Make sure pages are scrubbed Boris Ostrovsky
@ 2017-05-05 15:05   ` Jan Beulich
  2017-05-08 15:48     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-05 15:05 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
> --- a/xen/Kconfig.debug
> +++ b/xen/Kconfig.debug
> @@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
>  	  logged in the Xen ring buffer.
>  	  If unsure, say N here.
>  
> +config SCRUB_DEBUG
> +    bool "Page scrubbing test"
> +    default DEBUG
> +    ---help---
> +      Verify that pages that need to be scrubbed before being allocated to
> +      a guest are indeed scrubbed.

Indentation.

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -694,6 +694,31 @@ static void page_list_add_scrub(struct page_info *pg, 
> unsigned int node,
>          page_list_add(pg, &heap(node, zone, order));
>  }
>  
> +#define SCRUB_BYTE_PATTERN 0xc2c2c2c2c2c2c2c2

This likely needs a ULL suffix at least for the ARM32 build.

> @@ -912,6 +937,11 @@ static struct page_info *alloc_heap_pages(
>           * guest can control its own visibility of/through the cache.
>           */
>          flush_page_to_ram(page_to_mfn(&pg[i]));
> +
> +#ifdef CONFIG_SCRUB_DEBUG
> +        if ( d && !is_idle_domain(d) )
> +            check_one_page(&pg[i]);
> +#endif

I don't see the need for the is_idle_domain() check.

Also it would be nice for readability if the #ifdef-s were limited to
the function definitions - if their bodies are empty, I'd expect the
compiler to eliminate altogether both the if() here and ...

> @@ -1341,6 +1371,11 @@ static void free_heap_pages(
>              pg[i].count_info |= PGC_need_scrub;
>  
>          node_need_scrub[node] += (1UL << order);
> +
> +#ifdef CONFIG_SCRUB_DEBUG
> +        for ( i = 0; i < (1 << order); i++ )
> +            poison_one_page(&pg[i]);
> +#endif

... the loop here.

> @@ -1637,6 +1672,14 @@ static void init_heap_pages(
>              nr_pages -= n;
>          }
>  
> +#ifdef CONFIG_SCRUB_DEBUG
> +        /*
> +         * These pages get into heap and are allocated to dom0 before
> +         * boot scrub happens.
> +         * Not scrubbing them here will cause failure in check_one_page().
> +         */
> +        scrub_one_page(pg + i);
> +#endif

Isn't this introducing unacceptable overhead on bigger systems,
especially since it is then redundant with boot time scrubbing? To me,
such overhead would discourage ever enabling this config option...

> @@ -2170,6 +2213,9 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>              {
>                  BUG_ON((pg[i].u.inuse.type_info & PGT_count_mask) != 0);
>                  arch_free_heap_page(d, &pg[i]);
> +#ifdef CONFIG_SCRUB_DEBUG
> +                scrub_one_page(&pg[i]);
> +#endif

This too looks to be redundant in the important case of reclaiming
memory from a dying domain (i.e. when scrubbing is to happen
anyway).

> @@ -2273,7 +2319,8 @@ void scrub_one_page(struct page_info *pg)
>  
>  #ifndef NDEBUG
>      /* Avoid callers relying on allocations returning zeroed pages. */
> -    unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE));
> +    unmap_domain_page(memset(__map_domain_page(pg),
> +                             SCRUB_BYTE_PATTERN & 0xff, PAGE_SIZE));

This makes it a requirement for SCRUB_BYTE_PATTERN to consist of
all identical bytes. There should at least be a respective comment
next to its definition; even better would be if that #define made
sure this requirement is fulfilled by producing a suitably wide value
from just an initial 1-byte one.
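
One way to do that (a sketch, not the committed code) is to define the
pattern from a single byte and derive the wide value from it, so the
memset() path and any word-sized checks cannot disagree:

#define SCRUB_PATTERN_BYTE 0xc2
#define SCRUB_BYTE_PATTERN ((uint64_t)SCRUB_PATTERN_BYTE * 0x0101010101010101ULL)

/* scrub_one_page() then memset()s with SCRUB_PATTERN_BYTE, which equals
 * SCRUB_BYTE_PATTERN & 0xff by construction. */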

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-05 14:51                 ` Jan Beulich
@ 2017-05-05 15:23                   ` Boris Ostrovsky
  2017-05-05 16:05                     ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-05 15:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 05/05/2017 10:51 AM, Jan Beulich wrote:
>>>> On 05.05.17 at 16:27, <boris.ostrovsky@oracle.com> wrote:
>> On 05/05/2017 10:14 AM, Jan Beulich wrote:
>>>>>> On 05.05.17 at 16:10, <JBeulich@suse.com> wrote:
>>>>>>> On 05.05.17 at 15:42, <boris.ostrovsky@oracle.com> wrote:
>>>>>>>> Otoh there's not much to scrub yet until Dom0 had all its memory
>>>>>>>> allocated, and we know which pages truly remain free (wanting
>>>>>>>> what is currently the boot time scrubbing done on them). But that
>>>>>>>> point in time may still be earlier than when we switch to
>>>>>>>> SYS_STATE_active.
>>>>>> IOW I think boot scrubbing could be kicked off as soon as Dom0
>>>>>> had the bulk of its memory allocated.
>>>>> Since we only are trying to avoid mapcache vcpu override can't we just
>>>>> scrub whenever override is NULL (per-cpu or not)?
>>>> But how do you know? The variable should remain static in
>>>> domain_page.c, so I think we'd instead need a notification to
>>>> the scrubber when it gets set back to NULL.
>> Why not just have in domain_page.c
>>
>> bool mapcache_override() {return override != NULL;}
>>
>> (or appropriate per-cpu variant)?
> And you would mean to permanently poll this?

We have to permanently poll on/check something. Either it will be local
to page_alloc.c or to domain_page.c.

-boris


>
>>> And of course if the override variable was per-CPU, you wouldn't
>>> have a need to know, as you wouldn't try scrubbing on the CPU
>>> doing the Dom0 setup.
>> Actually, won't (override != current) do the trick?
> This is like saying "If I don't get a car, it should be a red one." If
> you don't need to know, how can it matter what expression to use?
>
> Jan
>


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-05 15:23                   ` Boris Ostrovsky
@ 2017-05-05 16:05                     ` Jan Beulich
  2017-05-05 16:49                       ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2017-05-05 16:05 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 05.05.17 at 17:23, <boris.ostrovsky@oracle.com> wrote:
> On 05/05/2017 10:51 AM, Jan Beulich wrote:
>>>>> On 05.05.17 at 16:27, <boris.ostrovsky@oracle.com> wrote:
>>> On 05/05/2017 10:14 AM, Jan Beulich wrote:
>>>>>>> On 05.05.17 at 16:10, <JBeulich@suse.com> wrote:
>>>>>>>> On 05.05.17 at 15:42, <boris.ostrovsky@oracle.com> wrote:
>>>>>>>>> Otoh there's not much to scrub yet until Dom0 had all its memory
>>>>>>>>> allocated, and we know which pages truly remain free (wanting
>>>>>>>>> what is currently the boot time scrubbing done on them). But that
>>>>>>>>> point in time may still be earlier than when we switch to
>>>>>>>>> SYS_STATE_active.
>>>>>>> IOW I think boot scrubbing could be kicked off as soon as Dom0
>>>>>>> had the bulk of its memory allocated.
>>>>>> Since we only are trying to avoid mapcache vcpu override can't we just
>>>>>> scrub whenever override is NULL (per-cpu or not)?
>>>>> But how do you know? The variable should remain static in
>>>>> domain_page.c, so I think we'd instead need a notification to
>>>>> the scrubber when it gets set back to NULL.
>>> Why not just have in domain_page.c
>>>
>>> bool mapcache_override() {return override != NULL;}
>>>
>>> (or appropriate per-cpu variant)?
>> And you would mean to permanently poll this?
> 
> We have to permanently poll on/check something. Either it will be local
> to page_alloc.c or to domain_page.c.

Why can't this be kicked off right after zapping the override (if we
go that route in the first place; I think the per-cpu override would
be the more seamless approach)?

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-05 16:05                     ` Jan Beulich
@ 2017-05-05 16:49                       ` Boris Ostrovsky
  2017-05-08  7:14                         ` Jan Beulich
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-05 16:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 05/05/2017 12:05 PM, Jan Beulich wrote:
>>>> On 05.05.17 at 17:23, <boris.ostrovsky@oracle.com> wrote:
>> On 05/05/2017 10:51 AM, Jan Beulich wrote:
>>>>>> On 05.05.17 at 16:27, <boris.ostrovsky@oracle.com> wrote:
>>>> On 05/05/2017 10:14 AM, Jan Beulich wrote:
>>>>>>>> On 05.05.17 at 16:10, <JBeulich@suse.com> wrote:
>>>>>>>>> On 05.05.17 at 15:42, <boris.ostrovsky@oracle.com> wrote:
>>>>>>>>>> Otoh there's not much to scrub yet until Dom0 had all its memory
>>>>>>>>>> allocated, and we know which pages truly remain free (wanting
>>>>>>>>>> what is currently the boot time scrubbing done on them). But that
>>>>>>>>>> point in time may still be earlier than when we switch to
>>>>>>>>>> SYS_STATE_active.
>>>>>>>> IOW I think boot scrubbing could be kicked off as soon as Dom0
>>>>>>>> had the bulk of its memory allocated.
>>>>>>> Since we only are trying to avoid mapcache vcpu override can't we just
>>>>>>> scrub whenever override is NULL (per-cpu or not)?
>>>>>> But how do you know? The variable should remain static in
>>>>>> domain_page.c, so I think we'd instead need a notification to
>>>>>> the scrubber when it gets set back to NULL.
>>>> Why not just have in domain_page.c
>>>>
>>>> bool mapcache_override() {return override != NULL;}
>>>>
>>>> (or appropriate per-cpu variant)?
>>> And you would mean to permanently poll this?
>> We have to permanently poll on/check something. Either it will be local
>> to page_alloc.c or to domain_page.c.
> Why can't this be kicked off right after zapping the override (if we
> go that route in the first place; I think the per-cpu override would
> be the more seamless approach)?

Either global or per-cpu override will need to be checked by the
scrubber when it is called from the idle loop. Or I don't understand
what you mean by "kicking off".

-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-05 16:49                       ` Boris Ostrovsky
@ 2017-05-08  7:14                         ` Jan Beulich
  0 siblings, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2017-05-08  7:14 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 05.05.17 at 18:49, <boris.ostrovsky@oracle.com> wrote:
> On 05/05/2017 12:05 PM, Jan Beulich wrote:
>>>>> On 05.05.17 at 17:23, <boris.ostrovsky@oracle.com> wrote:
>>> On 05/05/2017 10:51 AM, Jan Beulich wrote:
>>>>>>> On 05.05.17 at 16:27, <boris.ostrovsky@oracle.com> wrote:
>>>>> On 05/05/2017 10:14 AM, Jan Beulich wrote:
>>>>>>>>> On 05.05.17 at 16:10, <JBeulich@suse.com> wrote:
>>>>>>>>>> On 05.05.17 at 15:42, <boris.ostrovsky@oracle.com> wrote:
>>>>>>>>>>> Otoh there's not much to scrub yet until Dom0 had all its memory
>>>>>>>>>>> allocated, and we know which pages truly remain free (wanting
>>>>>>>>>>> what is currently the boot time scrubbing done on them). But that
>>>>>>>>>>> point in time may still be earlier than when we switch to
>>>>>>>>>>> SYS_STATE_active.
>>>>>>>>> IOW I think boot scrubbing could be kicked off as soon as Dom0
>>>>>>>>> had the bulk of its memory allocated.
>>>>>>>> Since we only are trying to avoid mapcache vcpu override can't we just
>>>>>>>> scrub whenever override is NULL (per-cpu or not)?
>>>>>>> But how do you know? The variable should remain static in
>>>>>>> domain_page.c, so I think we'd instead need a notification to
>>>>>>> the scrubber when it gets set back to NULL.
>>>>> Why not just have in domain_page.c
>>>>>
>>>>> bool mapcache_override() {return override != NULL;}
>>>>>
>>>>> (or appropriate per-cpu variant)?
>>>> And you would mean to permanently poll this?
>>> We have to permanently poll on/check something. Either it will be local
>>> to page_alloc.c or to domain_page.c.
>> Why can't this be kicked off right after zapping the override (if we
>> go that route in the first place; I think the per-cpu override would
>> be the more seamless approach)?
> 
> Either global or per-cpu override will need to be checked by the
> scrubber when it is called from the idle loop. Or I don't understand
> what you mean by "kicking off".

I think some confusion arises from there being two different
proposals (of which I'd prefer the first):

The first is to make the override per-cpu. No kicking off is needed
in that case afaict; scrubbing can start right away if desired (albeit,
as was said, it may not make much sense prior to Dom0 having had
the bulk of its memory allocated, at least if its share isn't negligible
compared to the overall amount of memory).

The second is to leave the override as is, instead adding a call to
tell the scrubber to start doing its work. No need for any override
check in this case, as the override won't be used anymore from
the time of the mapcache_override_current(NULL) call (it is for
this reason that the kickoff call is to happen after this one).
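
For the first option, a rough sketch of the direction (purely
illustrative - the real code in domain_page.c has more to it):

/* domain_page.c (sketch): make the override per-CPU so that only the
 * CPU constructing Dom0 is affected by it. */
static DEFINE_PER_CPU(struct vcpu *, override);

void mapcache_override_current(struct vcpu *v)
{
    this_cpu(override) = v;
}

static inline struct vcpu *mapcache_current_vcpu(void)
{
    /* Other CPUs see a NULL override and can map (and hence scrub) as usual. */
    return this_cpu(override) ?: current;
}

With that, scrub_free_pages() wouldn't need the system_state check on
CPUs other than the one building Dom0.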

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 9/9] mm: Make sure pages are scrubbed
  2017-05-05 15:05   ` Jan Beulich
@ 2017-05-08 15:48     ` Konrad Rzeszutek Wilk
  2017-05-08 16:23       ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Konrad Rzeszutek Wilk @ 2017-05-08 15:48 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, Boris Ostrovsky

On Fri, May 05, 2017 at 09:05:54AM -0600, Jan Beulich wrote:
> >>> On 14.04.17 at 17:37, <boris.ostrovsky@oracle.com> wrote:
> > --- a/xen/Kconfig.debug
> > +++ b/xen/Kconfig.debug
> > @@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
> >  	  logged in the Xen ring buffer.
> >  	  If unsure, say N here.
> >  
> > +config SCRUB_DEBUG
> > +    bool "Page scrubbing test"
> > +    default DEBUG
> > +    ---help---
> > +      Verify that pages that need to be scrubbed before being allocated to
> > +      a guest are indeed scrubbed.
> 
> Indentation.
> 
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -694,6 +694,31 @@ static void page_list_add_scrub(struct page_info *pg, 
> > unsigned int node,
> >          page_list_add(pg, &heap(node, zone, order));
> >  }
> >  
> > +#define SCRUB_BYTE_PATTERN 0xc2c2c2c2c2c2c2c2
> 
> This likely needs a ULL suffix at least for the ARM32 build.

See 3ac7ca1

So 0xe7f000f0 for ARM32 and AARCH64_BREAK_FAULT for ARM64.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 9/9] mm: Make sure pages are scrubbed
  2017-05-08 15:48     ` Konrad Rzeszutek Wilk
@ 2017-05-08 16:23       ` Boris Ostrovsky
  0 siblings, 0 replies; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-08 16:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel


>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -694,6 +694,31 @@ static void page_list_add_scrub(struct page_info *pg, 
>>> unsigned int node,
>>>          page_list_add(pg, &heap(node, zone, order));
>>>  }
>>>  
>>> +#define SCRUB_BYTE_PATTERN 0xc2c2c2c2c2c2c2c2
>> This likely needs a ULL suffix at least for the ARM32 build.
> See 3ac7ca1
>
> So 0xe7f000f0 for ARM32 and AARCH64_BREAK_FAULT for ARM64.


That's slightly different though --- unlike free_init_memory() this
pattern is only used when !NDEBUG and it is byte-sized (passed into
memset()).


-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-04-14 15:37 ` [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
  2017-05-04 10:17   ` Jan Beulich
@ 2017-05-08 16:41   ` George Dunlap
  2017-05-08 16:59     ` Boris Ostrovsky
  1 sibling, 1 reply; 51+ messages in thread
From: George Dunlap @ 2017-05-08 16:41 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

On 14/04/17 16:37, Boris Ostrovsky wrote:
> . so that it's easy to find pages that need to be scrubbed (those pages are
> now marked with _PGC_need_scrub bit).
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
> Changes in v3:
> * Keep dirty bit per page, add dirty_head to page_info that indicates whether
>   the buddy has dirty pages.
> * Make page_list_add_scrub() set buddy's page order
> * Data type adjustments (int -> unsigned)
> 
>  xen/common/page_alloc.c  |  119 +++++++++++++++++++++++++++++++++++++--------
>  xen/include/asm-arm/mm.h |    6 ++
>  xen/include/asm-x86/mm.h |    6 ++
>  3 files changed, 110 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 6fe55ee..9dcf6ee 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
>  static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
>  #define heap(node, zone, order) ((*_heap[node])[zone][order])
>  
> +static unsigned long node_need_scrub[MAX_NUMNODES];
> +
>  static unsigned long *avail[MAX_NUMNODES];
>  static long total_avail_pages;
>  
> @@ -678,6 +680,20 @@ static void check_low_mem_virq(void)
>      }
>  }
>  
> +/* Pages that need scrub are added to tail, otherwise to head. */
> +static void page_list_add_scrub(struct page_info *pg, unsigned int node,
> +                                unsigned int zone, unsigned int order,
> +                                bool need_scrub)
> +{
> +    PFN_ORDER(pg) = order;
> +    pg->u.free.dirty_head = need_scrub;
> +
> +    if ( need_scrub )
> +        page_list_add_tail(pg, &heap(node, zone, order));
> +    else
> +        page_list_add(pg, &heap(node, zone, order));
> +}
> +
>  /* Allocate 2^@order contiguous pages. */
>  static struct page_info *alloc_heap_pages(
>      unsigned int zone_lo, unsigned int zone_hi,
> @@ -802,7 +818,7 @@ static struct page_info *alloc_heap_pages(
>      while ( j != order )
>      {
>          PFN_ORDER(pg) = --j;
> -        page_list_add_tail(pg, &heap(node, zone, j));
> +        page_list_add(pg, &heap(node, zone, j));
>          pg += 1 << j;
>      }
>  
> @@ -851,11 +867,14 @@ static int reserve_offlined_page(struct page_info *head)
>      int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
>      struct page_info *cur_head;
>      int cur_order;
> +    bool need_scrub;
>  
>      ASSERT(spin_is_locked(&heap_lock));
>  
>      cur_head = head;
>  
> +    head->u.free.dirty_head = false;
> +
>      page_list_del(head, &heap(node, zone, head_order));
>  
>      while ( cur_head < (head + (1 << head_order)) )
> @@ -892,8 +911,16 @@ static int reserve_offlined_page(struct page_info *head)
>              {
>              merge:
>                  /* We don't consider merging outside the head_order. */
> -                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
> -                PFN_ORDER(cur_head) = cur_order;
> +
> +                /* See if any of the pages need scrubbing. */
> +                need_scrub = false;
> +                for ( i = 0; i < (1 << cur_order); i++ )
> +                    if ( test_bit(_PGC_need_scrub, &cur_head[i].count_info) )
> +                    {
> +                        need_scrub = true;
> +                        break;
> +                    }
> +                page_list_add_scrub(cur_head, node, zone, cur_order, need_scrub);

This thing with clearing dirty_head, then setting it again in
page_list_add_scrub() could use some explanation -- either near one of
these loops, or in mm.h preferably.


>                  cur_head += (1 << cur_order);
>                  break;
>              }
> @@ -922,10 +949,13 @@ static int reserve_offlined_page(struct page_info *head)
>  /* Returns new buddy head. */
>  static struct page_info *
>  merge_and_free_buddy(struct page_info *pg, unsigned int node,
> -                     unsigned int zone, unsigned int order)
> +                     unsigned int zone, unsigned int order,
> +                     bool need_scrub)

What is the meaning of "need_scrub" here?  Does this mean that pg needs
to be scrubbed?

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist
  2017-05-08 16:41   ` George Dunlap
@ 2017-05-08 16:59     ` Boris Ostrovsky
  0 siblings, 0 replies; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-08 16:59 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich


>> @@ -851,11 +867,14 @@ static int reserve_offlined_page(struct page_info *head)
>>      int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
>>      struct page_info *cur_head;
>>      int cur_order;
>> +    bool need_scrub;
>>  
>>      ASSERT(spin_is_locked(&heap_lock));
>>  
>>      cur_head = head;
>>  
>> +    head->u.free.dirty_head = false;
>> +
>>      page_list_del(head, &heap(node, zone, head_order));
>>  
>>      while ( cur_head < (head + (1 << head_order)) )
>> @@ -892,8 +911,16 @@ static int reserve_offlined_page(struct page_info *head)
>>              {
>>              merge:
>>                  /* We don't consider merging outside the head_order. */
>> -                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
>> -                PFN_ORDER(cur_head) = cur_order;
>> +
>> +                /* See if any of the pages need scrubbing. */
>> +                need_scrub = false;
>> +                for ( i = 0; i < (1 << cur_order); i++ )
>> +                    if ( test_bit(_PGC_need_scrub, &cur_head[i].count_info) )
>> +                    {
>> +                        need_scrub = true;
>> +                        break;
>> +                    }
>> +                page_list_add_scrub(cur_head, node, zone, cur_order, need_scrub);
> This thing with clearing dirty_head, then setting it again in
> page_list_add_scrub() could use some explanation -- either near one of
> these loops, or in mm.h preferably.

This is done because we are merging/splitting buddies and dirty_head
should only be set on the buddy's head. So we clear it before any such
operation and then, as we merge/split things, we see if there are any
dirty pages in the buddy. If there are, we set dirty_head.

I will add something.

(Note that at Jan's suggestion I will replace boolean dirty_head with an
int that hints where the first dirty page in the buddy is)
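
As a sketch of where this is heading (the field and the helper name are
hypothetical at this point; the rest follows the quoted patch):

/* On (re)insertion after a merge/split, recompute the hint: only the
 * buddy head carries the dirtiness information. */
static void page_list_add_scrub_hint(struct page_info *head, unsigned int node,
                                     unsigned int zone, unsigned int order)
{
    unsigned int i, first_dirty = 1u << order;   /* "no dirty page" sentinel */

    for ( i = 0; i < (1u << order); i++ )
        if ( test_bit(_PGC_need_scrub, &head[i].count_info) )
        {
            first_dirty = i;
            break;
        }

    PFN_ORDER(head) = order;
    head->u.free.first_dirty = first_dirty;      /* hypothetical field */

    if ( first_dirty < (1u << order) )
        page_list_add_tail(head, &heap(node, zone, order));
    else
        page_list_add(head, &heap(node, zone, order));
}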

>
>
>>                  cur_head += (1 << cur_order);
>>                  break;
>>              }
>> @@ -922,10 +949,13 @@ static int reserve_offlined_page(struct page_info *head)
>>  /* Returns new buddy head. */
>>  static struct page_info *
>>  merge_and_free_buddy(struct page_info *pg, unsigned int node,
>> -                     unsigned int zone, unsigned int order)
>> +                     unsigned int zone, unsigned int order,
>> +                     bool need_scrub)
> What is the meaning of "need_scrub" here?  Does this mean that pg needs
> to be scrubbed?

It means that the buddy needs (or, more precisely, may need) scrubbing.


-boris

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-04-14 15:37 ` [PATCH v3 4/9] mm: Scrub memory from idle loop Boris Ostrovsky
  2017-05-04 15:31   ` Jan Beulich
@ 2017-05-11 10:26   ` Dario Faggioli
  2017-05-11 14:19     ` Boris Ostrovsky
  1 sibling, 1 reply; 51+ messages in thread
From: Dario Faggioli @ 2017-05-11 10:26 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich


[-- Attachment #1.1: Type: text/plain, Size: 3820 bytes --]

On Fri, 2017-04-14 at 11:37 -0400, Boris Ostrovsky wrote:
> Instead of scrubbing pages during guest destruction (from
> free_heap_pages()) do this opportunistically, from the idle loop.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
> Changes in v3:
> * If memory-only nodes exist, select the closest one for scrubbing
> * Don't scrub from idle loop until we reach SYS_STATE_active.
> 
>  xen/arch/arm/domain.c   |   13 ++++--
>  xen/arch/x86/domain.c   |    3 +-
>  xen/common/page_alloc.c |   98
> +++++++++++++++++++++++++++++++++++++++++-----
>  xen/include/xen/mm.h    |    1 +
>  4 files changed, 98 insertions(+), 17 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 76310ed..38d6331 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -46,13 +46,16 @@ void idle_loop(void)
>          if ( cpu_is_offline(smp_processor_id()) )
>              stop_cpu();
>  
> -        local_irq_disable();
> -        if ( cpu_is_haltable(smp_processor_id()) )
> +        if ( !scrub_free_pages() )
>          {
> -            dsb(sy);
> -            wfi();
> +            local_irq_disable();
> +            if ( cpu_is_haltable(smp_processor_id()) )
> +            {
> +                dsb(sy);
> +                wfi();
> +            }
> +            local_irq_enable();
>          }
> -        local_irq_enable();
>  
>          do_tasklet();
>          do_softirq();
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 90e2b1f..a5f62b5 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -118,7 +118,8 @@ static void idle_loop(void)
>      {
>          if ( cpu_is_offline(smp_processor_id()) )
>              play_dead();
> -        (*pm_idle)();
> +        if ( !scrub_free_pages() )
> +            (*pm_idle)();
>          do_tasklet();
>
This means that, if we got here to run a tasklet (as in, if the idle
vCPU has been forced into execution, because there was a vCPU context
tasklet wanting to run), we will (potentially) do some scrubbing first.

Is this on purpose, and, in any case, ideal? vCPU context tasklets are
not terribly common, but I still don't think it is (ideal).

Not sure how to address this, though. What (the variants of) pm_idle()
uses for deciding whether or not to actually go to sleep is
cpu_is_haltable(), which checks per_cpu(tasklet_work_to_do, cpu):

/*
 * Used by idle loop to decide whether there is work to do:
 *  (1) Run softirqs; or (2) Play dead; or (3) Run tasklets.
 */
#define cpu_is_haltable(cpu)                    \
    (!softirq_pending(cpu) &&                   \
     cpu_online(cpu) &&                         \
     !per_cpu(tasklet_work_to_do, cpu))

Pulling it out/adding a call to it (cpu_is_haltable()) is ugly, and
probably not what we want (e.g., it's always called with IRQs disabled,
while they're on here).

Maybe we can test tasklet_work_to_do, before calling scrub_free_pages()
(also ugly, IMO).
Or, if scrub_free_pages() is, and always will be, called only from
here, within the idle loop, test tasklet_work_to_do inside, similarly
to what it does already for pending softirqs...

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-11 10:26   ` Dario Faggioli
@ 2017-05-11 14:19     ` Boris Ostrovsky
  2017-05-11 15:48       ` Dario Faggioli
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-11 14:19 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich


[-- Attachment #1.1.1: Type: text/plain, Size: 2146 bytes --]


>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>> index 90e2b1f..a5f62b5 100644
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -118,7 +118,8 @@ static void idle_loop(void)
>>      {
>>          if ( cpu_is_offline(smp_processor_id()) )
>>              play_dead();
>> -        (*pm_idle)();
>> +        if ( !scrub_free_pages() )
>> +            (*pm_idle)();
>>          do_tasklet();
>>
> This means that, if we got here to run a tasklet (as in, if the idle
> vCPU has been forced into execution, because there was a vCPU context
> tasklet wanting to run), we will (potentially) do some scrubbing first.
>
> Is this on purpose, and, in any case, ideal? vCPU context tasklets are
> not terribly common, but I still don't think it is (ideal).
>
> Not sure how to address this, though. What (the variants of) pm_idle()
> uses for deciding whether or not to actually go to sleep is
> cpu_is_haltable(), which checks per_cpu(tasklet_work_to_do, cpu):
>
> /*
>  * Used by idle loop to decide whether there is work to do:
>  *  (1) Run softirqs; or (2) Play dead; or (3) Run tasklets.
>  */
> #define cpu_is_haltable(cpu)                    \
>     (!softirq_pending(cpu) &&                   \
>      cpu_online(cpu) &&                         \
>      !per_cpu(tasklet_work_to_do, cpu))
>
> Pulling it out/adding a call to it (cpu_is_haltable()) is ugly, and
> probably not what we want (e.g., it's always called with IRQs disabled,
> while they're on here).
>
> Maybe we can test tasklet_work_to_do, before calling scrub_free_pages()
> (also ugly, IMO).
> Or, if scrub_free_pages() is, and always will be, called only from
> here, within the idle loop, test tasklet_work_to_do inside, similarly
> to what it does already for pending softirqs...

We can move do_tasklet() above scrub_free_pages(). And a new tasklet
after that would result in a softirq being set, so we'd do an early exit
from scrub_free_pages().
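
That is, roughly (a sketch of the proposed ordering, based on the hunk
quoted above):

static void idle_loop(void)
{
    for ( ; ; )
    {
        if ( cpu_is_offline(smp_processor_id()) )
            play_dead();
        do_tasklet();                /* run pending tasklet work first */
        if ( !scrub_free_pages() )   /* only idle if nothing preempted us */
            (*pm_idle)();
        do_softirq();
    }
}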

OTOH, since, as you say, we only get to idle_loop() if no tasklet is
pending (the cpu_is_haltable() test), would even that be needed?


-boris



[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 127 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-11 14:19     ` Boris Ostrovsky
@ 2017-05-11 15:48       ` Dario Faggioli
  2017-05-11 17:05         ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Dario Faggioli @ 2017-05-11 15:48 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich


[-- Attachment #1.1: Type: text/plain, Size: 4674 bytes --]

On Thu, 2017-05-11 at 10:19 -0400, Boris Ostrovsky wrote:
> > > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > > index 90e2b1f..a5f62b5 100644
> > > --- a/xen/arch/x86/domain.c
> > > +++ b/xen/arch/x86/domain.c
> > > @@ -118,7 +118,8 @@ static void idle_loop(void)
> > >      {
> > >          if ( cpu_is_offline(smp_processor_id()) )
> > >              play_dead();
> > > -        (*pm_idle)();
> > > +        if ( !scrub_free_pages() )
> > > +            (*pm_idle)();
> > >          do_tasklet();
> > > 
> > 
> > This means that, if we got here to run a tasklet (as in, if the idle
> > vCPU has been forced into execution, because there was a vCPU context
> > tasklet wanting to run), we will (potentially) do some scrubbing first.
> > 
> We can move do_tasklet() above scrub_free_pages(). And a new tasklet
> after that would result in a softirq being set, so we'd do an early exit
> from scrub_free_pages().
> 
How early?

In fact, right now, if there is one tasklet queued, this is what
happens:

 tasklet_schedule(t)
   tasklet_enqueue(t)
     test_and_set_bit(_TASKLET_enqueued, tasklet_work_to_do);
     raise_softirq(SCHEDULE_SOFTIRQ);
 schedule()
   set_bit(_TASKLET_scheduled, tasklet_work_to_do)
   tasklet_work_scheduled = 1;
   do_schedule(tasklet_work_scheduled)
     csched_schedule(tasklet_work_to_do)
       snext = CSCHED_VCPU(idle_vcpu[cpu]);
 idle_loop()
   (*pm_idle)()
     if ( !cpu_is_haltable() ) return;
   do_tasklet() /* runs tasklet t */
     clear_bit(_TASKLET_enqueued, work_to_do); /* list_empty(list) == true */
     raise_softirq(SCHEDULE_SOFTIRQ);
   do_softirq()
 schedule()
   clear_bit(_TASKLET_scheduled, tasklet_work);
   tasklet_work_scheduled = 0;
   do_schedule(tasklet_work_scheduled)
     csched_schedule(tasklet_work_to_do)
       snext = CSCHED_VCPU(idle_vcpu[cpu]);
 idle_loop()
   (*pm_idle)()
     if ( !cpu_is_haltable() )
 ...

If we move do_tasklet up, as you suggest, this is what happens:

 tasklet_schedule(t)
   tasklet_enqueue(t)
     test_and_set_bit(_TASKLET_enqueued, tasklet_work_to_do);
     raise_softirq(SCHEDULE_SOFTIRQ);
 schedule()
   set_bit(_TASKLET_scheduled, tasklet_work_to_do)
   tasklet_work_scheduled = 1;
   do_schedule(tasklet_work_scheduled)
     csched_schedule(tasklet_work_to_do)
       snext = CSCHED_VCPU(idle_vcpu[cpu]);
 idle_loop()
   do_tasklet() /* runs tasklet t */
     clear_bit(_TASKLET_enqueued, work_to_do); /* list_empty(list) == true */
     raise_softirq(SCHEDULE_SOFTIRQ);
   if ( !scrub_free_pages() )
     //do some scrubbing, but softirq_pending() is true, so return 1
   do_softirq()
 schedule()
   clear_bit(_TASKLET_scheduled, tasklet_work);
   tasklet_work_scheduled = 0;
   do_schedule(tasklet_work_scheduled)
     csched_schedule(tasklet_work_to_do)
       snext = CSCHED_VCPU(idle_vcpu[cpu]);
 idle_loop()
   if ( !scrub_free_pages() )
     //do the scrubbing, returns 0, so we enter the if
     (*pm_idle)()
       if ( !cpu_is_haltable() )
 ...

IOW (provided I'm understanding your code right, of course), I still
see us switching to idle *not* because the system is idle, but in order
to run a tasklet, and yet ending up doing at least some scrubbing (as if
the system were idle).

Which still feels wrong to me.

If there's more than one tasklet queued (or one or more are queued
before tasklet t is processed), it's even worse, because we go through
the whole schedule()->idle_loop()->do_tasklet() sequence again and
again, and at each step we do a bit of scrubbing before going back to
schedule().

It probably would be at least a bit better, if scrub_free_pages() would
check for softirqs() _before_ starting any scrubbing (which I don't
think it does, right now, am I right?).
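
To make this concrete, here is a minimal sketch of the kind of check I
mean (illustration only, not the actual patch; I'm assuming the return
value keeps meaning "don't go idle yet", and the per-node scrubbing
loop itself is elided):

    bool scrub_free_pages(void)
    {
        unsigned int cpu = smp_processor_id();

        /* Bail out before touching any page if other work is pending. */
        if ( softirq_pending(cpu) )
            return true;

        /* ... existing per-node scan and scrubbing loop ... */

        return false;
    }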

> OTOH since, as you say, we only get to idle loop() if no tasklet is
> pending (cpu_is_haltable() test) then would even that be needed?
> 
Err... sorry, not getting. It's the other way round. One of the reasons
why we end up executing idle_loop(), is that there is at least a
tasklet pending.

The only place we get to when there's nothing pending is the call to
(*pm_idle)().

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-11 15:48       ` Dario Faggioli
@ 2017-05-11 17:05         ` Boris Ostrovsky
  2017-05-12  8:17           ` Dario Faggioli
  0 siblings, 1 reply; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-11 17:05 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich



On 05/11/2017 11:48 AM, Dario Faggioli wrote:
> On Thu, 2017-05-11 at 10:19 -0400, Boris Ostrovsky wrote:
>>>> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
>>>> index 90e2b1f..a5f62b5 100644
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -118,7 +118,8 @@ static void idle_loop(void)
>>>>      {
>>>>          if ( cpu_is_offline(smp_processor_id()) )
>>>>              play_dead();
>>>> -        (*pm_idle)();
>>>> +        if ( !scrub_free_pages() )
>>>> +            (*pm_idle)();
>>>>          do_tasklet();
>>>>
>>> This means that, if we got here to run a tasklet (as in, if the
>>> idle
>>> vCPU has been forced into execution, because there were a vCPU
>>> context
>>> tasklet wanting to run), we will (potentially) do some scrubbing
>>> first.
>>>
>> We can move do_tasklet() above scrub_free_pages(). Any new tasklet
>> queued after that would result in a softirq being set, so we'd do an
>> early exit from scrub_free_pages().
>>
> How early?
>
> In fact, right now, if there is one tasklet queued, this is what
> happens:
>
>  tasklet_schedule(t)
>    tasklet_enqueue(t)
>      test_and_set_bit(_TASKLET_enqueued, tasklet_work_to_do);
>      raise_softirq(SCHEDULE_SOFTIRQ);
>  schedule()
>    set_bit(_TASKLET_scheduled, tasklet_work_to_do)
>    tasklet_work_scheduled = 1;
>    do_schedule(tasklet_work_scheduled)
>      csched_schedule(tasklet_work_to_do)
>        snext = CSCHED_VCPU(idle_vcpu[cpu]);
>  idle_loop()
>    (*pm_idle)()
>      if ( !cpu_is_haltable() ) return;
>    do_tasklet() /* runs tasklet t */
>      clear_bit(_TASKLET_enqueued, work_to_do); /* list_empty(list) == true */
>      raise_softirq(SCHEDULE_SOFTIRQ);
>    do_softirq()
>  schedule()
>    clear_bit(_TASKLET_scheduled, tasklet_work);
>    tasklet_work_scheduled = 0;
>    do_schedule(tasklet_work_scheduled)
>      csched_schedule(tasklet_work_to_do)
>        snext = CSCHED_VCPU(idle_vcpu[cpu]);
>  idle_loop()
>    (*pm_idle)()
>      if ( !cpu_is_haltable() )
>  ...
>
> If we move do_tasklet up, as you suggest, this is what happens:
>
>  tasklet_schedule(t)
>    tasklet_enqueue(t)
>      test_and_set_bit(_TASKLET_enqueued, tasklet_work_to_do);
>      raise_softirq(SCHEDULE_SOFTIRQ);
>  schedule()
>    set_bit(_TASKLET_scheduled, tasklet_work_to_do)
>    tasklet_work_scheduled = 1;
>    do_schedule(tasklet_work_scheduled)
>      csched_schedule(tasklet_work_to_do)
>        snext = CSCHED_VCPU(idle_vcpu[cpu]);
>  idle_loop()
>    do_tasklet() /* runs tasklet t */
>      clear_bit(_TASKLET_enqueued, work_to_do); /* list_empty(list) == true */
>      raise_softirq(SCHEDULE_SOFTIRQ);
>    if ( !scrub_free_pages() )
>      //do some scrubbing, but softirq_pending() is true, so return 1
>    do_softirq()
>  schedule()
>    clear_bit(_TASKLET_scheduled, tasklet_work);
>    tasklet_work_scheduled = 0;
>    do_schedule(tasklet_work_scheduled)
>      csched_schedule(tasklet_work_to_do)
>        snext = CSCHED_VCPU(idle_vcpu[cpu]);
>  idle_loop()
>    if ( !scrub_free_pages() )
>      //do the scrubbing, returns 0, so we enter the if
>      (*pm_idle)()
>        if ( !cpu_is_haltable() )
>  ...
>
> IOW (provided I'm understanding your code right, of course), I still
> see it happening that we switched to idle *not* because the system was
> idle, but for running a tasklet, and yet we end up doing at least some
> scrubbing (as if the system were idle).
>
> Which still feels wrong to me.
>
> If there's more than one tasklet queued (or another one, or more,
> is/are queued before the one t is processed), it's even worse, because
> we go through the whole schedule()->idle_loop()->do_tasklet() again and
> again, and at each step we do a bit of scrubbing, before going back to
> schedule().
>
> It probably would be at least a bit better, if scrub_free_pages() would
> check for softirqs() _before_ starting any scrubbing (which I don't
> think it does, right now, am I right?).

Right.

I didn't realize that do_tasklet() also schedules softirq. So you are
suggesting something along the lines of

        do_tasklet();

        if ( !softirq_pending(smp_processor_id()) && !scrub_free_pages() )
            (*pm_idle)();

        do_softirq();


>
>> OTOH since, as you say, we only get to idle loop() if no tasklet is
>> pending (cpu_is_haltable() test) then would even that be needed?
>>
> Err... sorry, not getting. It's the other way round. One of the reasons
> why we end up executing idle_loop(), is that there is at least a
> tasklet pending.

Nevermind that. I was thinking we enter idle_loop() based on
cpu_is_haltable().

-boris

>
> The only place we get to when there's nothing pending is the call to
> (*pm_idle)().
>
> Regards,
> Dario




* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-11 17:05         ` Boris Ostrovsky
@ 2017-05-12  8:17           ` Dario Faggioli
  2017-05-12 14:42             ` Boris Ostrovsky
  0 siblings, 1 reply; 51+ messages in thread
From: Dario Faggioli @ 2017-05-12  8:17 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich



On Thu, 2017-05-11 at 13:05 -0400, Boris Ostrovsky wrote:
> On 05/11/2017 11:48 AM, Dario Faggioli wrote:
>
> > It probably would be at least a bit better, if scrub_free_pages()
> > would
> > check for softirqs() _before_ starting any scrubbing (which I don't
> > think it does, right now, am I right?).
> 
> Right.
> 
> I didn't realize that do_tasklet() also schedules softirq. So you are
> suggesting something along the lines of
> 
>         do_tasklet();
> 
>         if ( !softirq_pending(smp_processor_id()) &&
> !scrub_free_pages() )
>             (*pm_idle)();
> 
>         do_softirq();
> 
I was indeed suggesting something like that, and in fact I was about to
say that, yes, what you wrote above should work.

*But* (and sorry for overlooking that), if there is more than one
tasklet queued SCHEDULE_SOFTIRQ is not raised (look at the 'if
(list_empty())' in do_tasklet()), and hence the softirq_pending() check
would fail, and we'll still try to scrub.
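
For reference, the tail of do_tasklet() goes roughly like this (quoting
from memory, so not verbatim; the point is only the list_empty() gate):

    /* After running one tasklet off the per-cpu list: */
    if ( list_empty(list) )
    {
        /* The flag is cleared and the scheduler kicked only once the
         * list is fully drained... */
        clear_bit(_TASKLET_enqueued, work_to_do);
        raise_softirq(SCHEDULE_SOFTIRQ);
    }
    /* ...so with a second tasklet still queued nothing is raised,
     * and a softirq_pending()-only gate would happily let us scrub. */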

Basically, we'd have some scrubbing happening in between the processing
of two tasklets (and this will repeat, as a pattern, if there are more
than two), and unless someone else raises an (unrelated) softirq at
some point, we may even scrub a significant amount of memory.

So, it looks to me that the gating should consider both,
softirq_pending() and tasklet_work_to_do (making it look really really
similar to cpu_is_haltable()...).

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH v3 4/9] mm: Scrub memory from idle loop
  2017-05-12  8:17           ` Dario Faggioli
@ 2017-05-12 14:42             ` Boris Ostrovsky
  0 siblings, 0 replies; 51+ messages in thread
From: Boris Ostrovsky @ 2017-05-12 14:42 UTC (permalink / raw)
  To: Dario Faggioli, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich



On 05/12/2017 04:17 AM, Dario Faggioli wrote:
> On Thu, 2017-05-11 at 13:05 -0400, Boris Ostrovsky wrote:
>> On 05/11/2017 11:48 AM, Dario Faggioli wrote:
>>
>>> It probably would be at least a bit better, if scrub_free_pages()
>>> would
>>> check for softirqs() _before_ starting any scrubbing (which I don't
>>> think it does, right now, am I right?).
>> Right.
>>
>> I didn't realize that do_tasklet() also schedules softirq. So you are
>> suggesting something along the lines of
>>
>>         do_tasklet();
>>
>>         if ( !softirq_pending(smp_processor_id()) &&
>> !scrub_free_pages() )
>>             (*pm_idle)();
>>
>>         do_softirq();
>>
> I was indeed suggesting something like that, and in fact I was about to
> say that, yes, what you wrote above should work.
>
> *But* (and sorry for overlooking that), if there is more than one
> tasklet queued SCHEDULE_SOFTIRQ is not raised (look at the 'if
> (list_empty())' in do_tasklet()), and hence the softirq_pending() check
> would fail, and we'll still try to scrub.
>
> Basically, we'd have some scrubbing happening in between the processing
> of two tasklets (and this will repeat, as a pattern, if there are more
> than two), and unless someone else raises an (unrelated) softirq at
> some point, we may even scrub a significant amount of memory.
>
> So, it looks to me that the gating should consider both,
> softirq_pending() and tasklet_work_to_do (making it look really really
> similar to cpu_is_haltable()...).

So I will then just use cpu_is_haltable() --- the possibly unnecessary
cpu_online() check here is simple/quick enough to justify reusing the
existing macro.
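
I.e., something along these lines (only a sketch of what I mean, not
the final patch; the local cpu variable is just there to avoid
repeating smp_processor_id()):

    static void idle_loop(void)
    {
        unsigned int cpu = smp_processor_id();

        for ( ; ; )
        {
            if ( cpu_is_offline(cpu) )
                play_dead();

            /* May raise SCHEDULE_SOFTIRQ when the tasklet list drains. */
            do_tasklet();

            /*
             * Scrub, and then halt, only if nothing else wants the CPU:
             * no pending softirq and no queued tasklet work.
             */
            if ( cpu_is_haltable(cpu) && !scrub_free_pages() )
                (*pm_idle)();

            do_softirq();
        }
    }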


-boris




end of thread, other threads:[~2017-05-12 14:42 UTC | newest]

Thread overview: 51+ messages
2017-04-14 15:37 [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
2017-04-14 15:37 ` [PATCH v3 1/9] mm: Separate free page chunk merging into its own routine Boris Ostrovsky
2017-05-04  9:45   ` Jan Beulich
2017-04-14 15:37 ` [PATCH v3 2/9] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
2017-05-04 10:17   ` Jan Beulich
2017-05-04 14:53     ` Boris Ostrovsky
2017-05-04 15:00       ` Jan Beulich
2017-05-08 16:41   ` George Dunlap
2017-05-08 16:59     ` Boris Ostrovsky
2017-04-14 15:37 ` [PATCH v3 3/9] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
2017-05-04 14:44   ` Jan Beulich
2017-05-04 15:04     ` Boris Ostrovsky
2017-05-04 15:36       ` Jan Beulich
2017-04-14 15:37 ` [PATCH v3 4/9] mm: Scrub memory from idle loop Boris Ostrovsky
2017-05-04 15:31   ` Jan Beulich
2017-05-04 17:09     ` Boris Ostrovsky
2017-05-05 10:21       ` Jan Beulich
2017-05-05 13:42         ` Boris Ostrovsky
2017-05-05 14:10           ` Jan Beulich
2017-05-05 14:14             ` Jan Beulich
2017-05-05 14:27               ` Boris Ostrovsky
2017-05-05 14:51                 ` Jan Beulich
2017-05-05 15:23                   ` Boris Ostrovsky
2017-05-05 16:05                     ` Jan Beulich
2017-05-05 16:49                       ` Boris Ostrovsky
2017-05-08  7:14                         ` Jan Beulich
2017-05-11 10:26   ` Dario Faggioli
2017-05-11 14:19     ` Boris Ostrovsky
2017-05-11 15:48       ` Dario Faggioli
2017-05-11 17:05         ` Boris Ostrovsky
2017-05-12  8:17           ` Dario Faggioli
2017-05-12 14:42             ` Boris Ostrovsky
2017-04-14 15:37 ` [PATCH v3 5/9] mm: Do not discard already-scrubbed pages if softirqs are pending Boris Ostrovsky
2017-05-04 15:43   ` Jan Beulich
2017-05-04 17:18     ` Boris Ostrovsky
2017-05-05 10:27       ` Jan Beulich
2017-05-05 13:51         ` Boris Ostrovsky
2017-05-05 14:13           ` Jan Beulich
2017-04-14 15:37 ` [PATCH v3 6/9] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
2017-04-14 15:37 ` [PATCH v3 7/9] mm: Keep pages available for allocation while scrubbing Boris Ostrovsky
2017-05-04 16:03   ` Jan Beulich
2017-05-04 17:26     ` Boris Ostrovsky
2017-05-05 10:28       ` Jan Beulich
2017-04-14 15:37 ` [PATCH v3 8/9] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
2017-04-14 15:37 ` [PATCH v3 9/9] mm: Make sure pages are scrubbed Boris Ostrovsky
2017-05-05 15:05   ` Jan Beulich
2017-05-08 15:48     ` Konrad Rzeszutek Wilk
2017-05-08 16:23       ` Boris Ostrovsky
2017-05-02 14:46 ` [PATCH v3 0/9] Memory scrubbing from idle loop Boris Ostrovsky
2017-05-02 14:58   ` Jan Beulich
2017-05-02 15:07     ` Boris Ostrovsky
