* [PATCH v5 0/8] Memory scrubbing from idle loop
@ 2017-06-22 18:57 Boris Ostrovsky
  2017-06-22 18:57 ` [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
                   ` (8 more replies)
  0 siblings, 9 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-22 18:57 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Changes in v5:
* Make page_info.u.free a union and use bitfields there.
* Bug fixes

(see per-patch notes)

When a domain is destroyed the hypervisor must scrub the domain's pages before
giving them to another guest in order to prevent leaking the deceased
guest's data. Currently this is done during guest destruction, possibly
resulting in a very lengthy cleanup process.

This series adds support for scrubbing released pages from the idle loop,
making guest destruction significantly faster. For example, destroying a
1TB guest can now be completed in 40+ seconds as opposed to about 9 minutes
with the existing scrubbing algorithm.

Briefly, the new algorithm places dirty pages at the end of the heap's page list
for each node/zone/order to avoid having to scan the full list while searching
for dirty pages. One processor from each node checks whether the node has any
dirty pages and, if such pages are found, scrubs them. Scrubbing itself
happens without holding the heap lock, so other users may access the heap in the
meantime. If, while the idle loop is scrubbing a particular chunk of pages, that
chunk is requested by the heap allocator, scrubbing is immediately stopped.

On the allocation side, alloc_heap_pages() first tries to satisfy the allocation
request using only clean pages. If this is not possible, the search is
repeated and dirty pages are scrubbed by the allocator.
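
To illustrate the two ideas above, here is a minimal stand-alone sketch in
plain C (simplified names and structures, not the actual page_alloc.c code):
dirty buddies are queued at the tail so a clean-only search can stop at the
first dirty entry it meets, and the allocator retries with dirty buddies
allowed and scrubs them itself.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a free buddy; the real code tracks first_dirty instead
 * of a single bool. */
struct buddy {
    struct buddy *next;
    bool dirty;
    unsigned char data[16];
};

static struct buddy *free_head;

/* Dirty buddies go to the tail, clean ones to the head, so a clean-only
 * search can stop as soon as it meets a dirty buddy. */
static void free_list_add(struct buddy *b)
{
    b->next = NULL;
    if ( !b->dirty || !free_head )
    {
        b->next = free_head;
        free_head = b;
    }
    else
    {
        struct buddy *t = free_head;

        while ( t->next )
            t = t->next;
        t->next = b;
    }
}

/* First pass asks for clean memory only; the retry (allow_dirty) scrubs
 * on demand, mirroring the second search in alloc_heap_pages(). */
static struct buddy *alloc_buddy(bool allow_dirty)
{
    struct buddy *b = free_head;

    if ( !b || (b->dirty && !allow_dirty) )
        return NULL;

    free_head = b->next;
    if ( b->dirty )
        memset(b->data, 0, sizeof(b->data)); /* scrub in the allocator */
    return b;
}

int main(void)
{
    static struct buddy pool[4];
    struct buddy *b;
    unsigned int i;

    for ( i = 0; i < 4; i++ )
    {
        pool[i].dirty = i & 1;  /* pretend odd entries hold old guest data */
        free_list_add(&pool[i]);
    }

    if ( (b = alloc_buddy(false)) == NULL ) /* prefer clean pages ... */
        b = alloc_buddy(true);              /* ... scrub only if we must */
    printf("allocated a %s buddy\n", b->dirty ? "scrubbed-on-demand" : "clean");
    return 0;
}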

This series is somewhat based on earlier work by Bob Liu.

V1:
* Only set the PGC_need_scrub bit for the buddy head, thus making it unnecessary
  to scan the whole buddy
* Fix spin_lock_cb()
* Scrub CPU-less nodes
* ARM support. Note that I have not been able to test this, only built the
  binary
* Added scrub test patch (last one). Not sure whether it should be considered
  for committing but I have been running with it.

V2:
* merge_chunks() returns the new buddy head
* scrub_free_pages() returns the softirq pending status in addition to the
  (factored out) status of unscrubbed memory
* spin_lock uses inlined spin_lock_cb()
* scrub debugging code checks the whole page, not just the first word.

V3:
* Keep dirty bit per page
* Simplify merge_chunks() (now merge_and_free_buddy())
* When scrubbing memory-only nodes, try to find the closest node.

V4:
* Keep track of dirty pages in a buddy with page_info.u.free.first_dirty.
* Drop patch 1 (factoring out merge_and_free_buddy()) since there is only
  one caller now
* Drop patch 5 (from V3) since we are no longer breaking a partially-scrubbed
  buddy
* Extract search loop in alloc_heap_pages() into get_free_buddy() (patch 2)
* Add MEMF_no_scrub flag


Deferred:
* Per-node heap locks. In addition to (presumably) improving performance in
  general, once they are available we can parallelize scrubbing further by
  allowing more than one core per node to do idle loop scrubbing.
* AVX-based scrubbing
* Use idle loop scrubbing during boot.


Boris Ostrovsky (8):
  mm: Place unscrubbed pages at the end of pagelist
  mm: Extract allocation loop from alloc_heap_pages()
  mm: Scrub pages in alloc_heap_pages() if needed
  mm: Scrub memory from idle loop
  spinlock: Introduce spin_lock_cb()
  mm: Keep heap accessible to others while scrubbing
  mm: Print number of unscrubbed pages in 'H' debug handler
  mm: Make sure pages are scrubbed

 xen/Kconfig.debug          |   7 +
 xen/arch/arm/domain.c      |   2 +-
 xen/arch/x86/domain.c      |   2 +-
 xen/arch/x86/domain_page.c |   6 +-
 xen/common/page_alloc.c    | 612 ++++++++++++++++++++++++++++++++++++++-------
 xen/common/spinlock.c      |   9 +-
 xen/include/asm-arm/mm.h   |  30 ++-
 xen/include/asm-x86/mm.h   |  30 ++-
 xen/include/xen/mm.h       |   5 +-
 xen/include/xen/spinlock.h |   8 +
 10 files changed, 603 insertions(+), 108 deletions(-)

-- 
1.8.3.1



* [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
@ 2017-06-22 18:57 ` Boris Ostrovsky
  2017-06-27 17:06   ` Jan Beulich
  2017-06-22 18:57 ` [PATCH v5 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-22 18:57 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

... so that it's easy to find pages that need to be scrubbed (those pages are
now marked with the _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the first_dirty
field. For now it can have two values, 0 (the whole buddy needs scrubbing) or
INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches
will allow scrubbing to be interrupted, resulting in first_dirty taking any
value.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Be more careful when returning the unused portion of a buddy to the heap in
  alloc_heap_pages() and don't set first_dirty if we know that the sub-buddy
  is clean
* In reserve_offlined_page(), don't try to find dirty pages in sub-buddies if we
  can figure out that there are none.
* Drop unnecessary setting of first_dirty in free_heap_pages()
* Switch to using bitfields in page_info.u.free

I kept node_need_scrub[] as a global array rather than making it per-node. I think splitting
it should be part of making heap_lock a per-node lock, together with increasing
scrub concurrency by having more than one CPU scrub a node.
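
As an aside, below is a small self-contained sketch of the first_dirty
encoding used in this patch. The macro and the field width are copied from
the hunks that follow; the MAX_ORDER value is an assumption for illustration
only.

#include <assert.h>
#include <stdbool.h>

#define MAX_ORDER 18                 /* illustrative value, not Xen's */
#define INVALID_DIRTY_IDX (-1UL & (((1UL << MAX_ORDER) + 2) - 1))

struct free_info {
    unsigned long need_tlbflush:1;
    /* Wide enough for indices 0 .. 2^MAX_ORDER - 1 plus INVALID_DIRTY_IDX. */
    unsigned long first_dirty:MAX_ORDER + 2;
};

static bool buddy_needs_scrub(const struct free_info *f)
{
    return f->first_dirty != INVALID_DIRTY_IDX;
}

int main(void)
{
    struct free_info f = { .first_dirty = INVALID_DIRTY_IDX };

    assert(!buddy_needs_scrub(&f));  /* clean buddy */
    f.first_dirty = 0;               /* whole buddy needs scrubbing */
    assert(buddy_needs_scrub(&f));
    return 0;
}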



 xen/common/page_alloc.c  | 190 +++++++++++++++++++++++++++++++++++++++--------
 xen/include/asm-arm/mm.h |  18 ++++-
 xen/include/asm-x86/mm.h |  17 ++++-
 3 files changed, 190 insertions(+), 35 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 8bcef6a..570d1f7 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,6 +680,20 @@ static void check_low_mem_virq(void)
     }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+                                unsigned int zone, unsigned int order,
+                                unsigned int first_dirty)
+{
+    PFN_ORDER(pg) = order;
+    pg->u.free.first_dirty = first_dirty;
+
+    if ( first_dirty != INVALID_DIRTY_IDX )
+        page_list_add_tail(pg, &heap(node, zone, order));
+    else
+        page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
@@ -687,7 +703,7 @@ static struct page_info *alloc_heap_pages(
     unsigned int i, j, zone = 0, nodemask_retry = 0;
     nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
     unsigned long request = 1UL << order;
-    struct page_info *pg;
+    struct page_info *pg, *first_dirty_pg = NULL;
     nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
     bool_t need_tlbflush = 0;
     uint32_t tlbflush_timestamp = 0;
@@ -798,11 +814,26 @@ static struct page_info *alloc_heap_pages(
     return NULL;
 
  found: 
+
+    if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+        first_dirty_pg = pg + pg->u.free.first_dirty;
+
     /* We may have to halve the chunk a number of times. */
     while ( j != order )
     {
-        PFN_ORDER(pg) = --j;
-        page_list_add_tail(pg, &heap(node, zone, j));
+        unsigned int first_dirty;
+
+        if ( first_dirty_pg && ((pg + (1 << j)) > first_dirty_pg) )
+        {
+            if ( pg < first_dirty_pg )
+                first_dirty = (first_dirty_pg - pg) / sizeof(*pg);
+            else
+                first_dirty = 0;
+        }
+        else
+            first_dirty = INVALID_DIRTY_IDX;
+
+        page_list_add_scrub(pg, node, zone, --j, first_dirty);
         pg += 1 << j;
     }
 
@@ -849,13 +880,22 @@ static int reserve_offlined_page(struct page_info *head)
 {
     unsigned int node = phys_to_nid(page_to_maddr(head));
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
-    struct page_info *cur_head;
+    struct page_info *cur_head, *first_dirty_pg = NULL;
     int cur_order;
 
     ASSERT(spin_is_locked(&heap_lock));
 
     cur_head = head;
 
+    /*
+     * We may break the buddy so let's mark the head as clean. Then, when
+     * merging chunks back into the heap, we will see whether the chunk has
+     * unscrubbed pages and set its first_dirty properly.
+     */
+    if (head->u.free.first_dirty != INVALID_DIRTY_IDX)
+        first_dirty_pg = head + head->u.free.first_dirty;
+    head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -873,6 +913,8 @@ static int reserve_offlined_page(struct page_info *head)
 
         while ( cur_order < head_order )
         {
+            unsigned int first_dirty = INVALID_DIRTY_IDX;
+
             next_order = cur_order + 1;
 
             if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
@@ -892,8 +934,25 @@ static int reserve_offlined_page(struct page_info *head)
             {
             merge:
                 /* We don't consider merging outside the head_order. */
-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
-                PFN_ORDER(cur_head) = cur_order;
+
+                /* See if any of the pages indeed need scrubbing. */
+                if ( first_dirty_pg && (cur_head + (1 << cur_order) > first_dirty_pg) )
+                {
+                    if ( cur_head < first_dirty_pg )
+                        i = (first_dirty_pg - cur_head) / sizeof(*cur_head);
+                    else
+                        i = 0;
+
+                    for ( ; i < (1 << cur_order); i++ )
+                        if ( test_bit(_PGC_need_scrub,
+                                      &cur_head[i].count_info) )
+                        {
+                            first_dirty = i;
+                            break;
+                        }
+                }
+                page_list_add_scrub(cur_head, node, zone,
+                                    cur_order, first_dirty);
                 cur_head += (1 << cur_order);
                 break;
             }
@@ -919,9 +978,53 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
+static void scrub_free_pages(unsigned int node)
+{
+    struct page_info *pg;
+    unsigned int zone;
+
+    ASSERT(spin_is_locked(&heap_lock));
+
+    if ( !node_need_scrub[node] )
+        return;
+
+    for ( zone = 0; zone < NR_ZONES; zone++ )
+    {
+        unsigned int order = MAX_ORDER;
+
+        do {
+            while ( !page_list_empty(&heap(node, zone, order)) )
+            {
+                unsigned int i;
+
+                /* Unscrubbed pages are always at the end of the list. */
+                pg = page_list_last(&heap(node, zone, order));
+                if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                    break;
+
+                for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
+                {
+                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    {
+                        scrub_one_page(&pg[i]);
+                        pg[i].count_info &= ~PGC_need_scrub;
+                        node_need_scrub[node]--;
+                    }
+                }
+
+                page_list_del(pg, &heap(node, zone, order));
+                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+
+                if ( node_need_scrub[node] == 0 )
+                    return;
+            }
+        } while ( order-- != 0 );
+    }
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
-    struct page_info *pg, unsigned int order)
+    struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mask, mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
@@ -961,10 +1064,20 @@ static void free_heap_pages(
         /* This page is not a guest frame any more. */
         page_set_owner(&pg[i], NULL); /* set_gpfn_from_mfn snoops pg owner */
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
+
+        if ( need_scrub )
+            pg[i].count_info |= PGC_need_scrub;
     }
 
     avail[node][zone] += 1 << order;
     total_avail_pages += 1 << order;
+    if ( need_scrub )
+    {
+        node_need_scrub[node] += 1 << order;
+        pg->u.free.first_dirty = 0;
+    }
+    else
+        pg->u.free.first_dirty = INVALID_DIRTY_IDX;
 
     if ( tmem_enabled() )
         midsize_alloc_zone_pages = max(
@@ -977,35 +1090,53 @@ static void free_heap_pages(
 
         if ( (page_to_mfn(pg) & mask) )
         {
+            struct page_info *predecessor = pg - mask;
+
             /* Merge with predecessor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
-                 !page_state_is(pg-mask, free) ||
-                 (PFN_ORDER(pg-mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
+                 !page_state_is(predecessor, free) ||
+                 (PFN_ORDER(predecessor) != order) ||
+                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
-            pg -= mask;
-            page_list_del(pg, &heap(node, zone, order));
+
+            page_list_del(predecessor, &heap(node, zone, order));
+
+            if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
+                need_scrub = true;
+                /* ... and keep predecessor's first_dirty. */
+            else if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+                predecessor->u.free.first_dirty = (1U << order) +
+                                                  pg->u.free.first_dirty;
+
+            pg = predecessor;
         }
         else
         {
+            struct page_info *successor = pg + mask;
+
             /* Merge with successor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
-                 !page_state_is(pg+mask, free) ||
-                 (PFN_ORDER(pg+mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(successor))) ||
+                 !page_state_is(successor, free) ||
+                 (PFN_ORDER(successor) != order) ||
+                 (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
-            page_list_del(pg + mask, &heap(node, zone, order));
+            page_list_del(successor, &heap(node, zone, order));
+
+            if ( successor->u.free.first_dirty != INVALID_DIRTY_IDX )
+                need_scrub = true;
         }
 
         order++;
     }
 
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    page_list_add_scrub(pg, node, zone, order, pg->u.free.first_dirty);
 
     if ( tainted )
         reserve_offlined_page(pg);
 
+    if ( need_scrub )
+        scrub_free_pages(node);
+
     spin_unlock(&heap_lock);
 }
 
@@ -1226,7 +1357,7 @@ unsigned int online_page(unsigned long mfn, uint32_t *status)
     spin_unlock(&heap_lock);
 
     if ( (y & PGC_state) == PGC_state_offlined )
-        free_heap_pages(pg, 0);
+        free_heap_pages(pg, 0, false);
 
     return ret;
 }
@@ -1295,7 +1426,7 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg+i, 0);
+        free_heap_pages(pg + i, 0, false);
     }
 }
 
@@ -1622,7 +1753,7 @@ void free_xenheap_pages(void *v, unsigned int order)
 
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order);
+    free_heap_pages(virt_to_page(v), order, false);
 }
 
 #else
@@ -1676,12 +1807,9 @@ void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
-    {
-        scrub_one_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
-    }
 
-    free_heap_pages(pg, order);
+    free_heap_pages(pg, order, true);
 }
 
 #endif
@@ -1790,7 +1918,7 @@ struct page_info *alloc_domheap_pages(
     if ( d && !(memflags & MEMF_no_owner) &&
          assign_pages(d, pg, order, memflags) )
     {
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, false);
         return NULL;
     }
     
@@ -1858,11 +1986,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
-
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 13c673a..889a85e 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -44,7 +44,16 @@ struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            unsigned long need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more than maximum possible order (MAX_ORDER+1) to
+             * accommodate INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
+            unsigned long first_dirty:MAX_ORDER + 2;
+
         } free;
 
     } u;
@@ -107,6 +116,13 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 extern unsigned long xenheap_mfn_start, xenheap_mfn_end;
 extern vaddr_t xenheap_virt_end;
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index d33599d..cd00bef 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -88,7 +88,15 @@ struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            unsigned long need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more than maximum possible order (MAX_ORDER+1) to
+             * accommodate INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
+            unsigned long first_dirty:MAX_ORDER + 2;
         } free;
 
     } u;
@@ -233,6 +241,13 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 struct spage_info
 {
        unsigned long type_info;
-- 
1.8.3.1



* [PATCH v5 2/8] mm: Extract allocation loop from alloc_heap_pages()
  2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
  2017-06-22 18:57 ` [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-06-22 18:57 ` Boris Ostrovsky
  2017-06-27 17:59   ` Jan Beulich
  2017-06-22 18:57 ` [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-22 18:57 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

This will make the code a bit more readable, especially with the changes that
will be introduced in subsequent patches.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Constified get_free_buddy()'s struct domain argument
* Dropped request local variable in get_free_buddy().

Because of rebasing there were a few more changes in this patch, so I decided not
to keep Jan's ACK.


 xen/common/page_alloc.c | 143 ++++++++++++++++++++++++++----------------------
 1 file changed, 79 insertions(+), 64 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 570d1f7..89fe3ce 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -694,22 +694,15 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
-/* Allocate 2^@order contiguous pages. */
-static struct page_info *alloc_heap_pages(
-    unsigned int zone_lo, unsigned int zone_hi,
-    unsigned int order, unsigned int memflags,
-    struct domain *d)
+static struct page_info *get_free_buddy(unsigned int zone_lo,
+                                        unsigned int zone_hi,
+                                        unsigned int order, unsigned int memflags,
+                                        const struct domain *d)
 {
-    unsigned int i, j, zone = 0, nodemask_retry = 0;
     nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
-    unsigned long request = 1UL << order;
-    struct page_info *pg, *first_dirty_pg = NULL;
-    nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-    bool_t need_tlbflush = 0;
-    uint32_t tlbflush_timestamp = 0;
-
-    /* Make sure there are enough bits in memflags for nodeID. */
-    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
+    nodemask_t nodemask = d ? d->node_affinity : node_online_map;
+    unsigned int j, zone, nodemask_retry = 0;
+    struct page_info *pg;
 
     if ( node == NUMA_NO_NODE )
     {
@@ -725,34 +718,6 @@ static struct page_info *alloc_heap_pages(
     first_node = node;
 
     ASSERT(node < MAX_NUMNODES);
-    ASSERT(zone_lo <= zone_hi);
-    ASSERT(zone_hi < NR_ZONES);
-
-    if ( unlikely(order > MAX_ORDER) )
-        return NULL;
-
-    spin_lock(&heap_lock);
-
-    /*
-     * Claimed memory is considered unavailable unless the request
-     * is made by a domain with sufficient unclaimed pages.
-     */
-    if ( (outstanding_claims + request >
-          total_avail_pages + tmem_freeable_pages()) &&
-          ((memflags & MEMF_no_refcount) ||
-           !d || d->outstanding_pages < request) )
-        goto not_found;
-
-    /*
-     * TMEM: When available memory is scarce due to tmem absorbing it, allow
-     * only mid-size allocations to avoid worst of fragmentation issues.
-     * Others try tmem pools then fail.  This is a workaround until all
-     * post-dom0-creation-multi-page allocations can be eliminated.
-     */
-    if ( ((order == 0) || (order >= 9)) &&
-         (total_avail_pages <= midsize_alloc_zone_pages) &&
-         tmem_freeable_pages() )
-        goto try_tmem;
 
     /*
      * Start with requested node, but exhaust all node memory in requested 
@@ -764,17 +729,17 @@ static struct page_info *alloc_heap_pages(
         zone = zone_hi;
         do {
             /* Check if target node can support the allocation. */
-            if ( !avail[node] || (avail[node][zone] < request) )
+            if ( !avail[node] || (avail[node][zone] < (1UL << order)) )
                 continue;
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    goto found;
+                    return pg;
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
-            goto not_found;
+            return NULL;
 
         /* Pick next node. */
         if ( !node_isset(node, nodemask) )
@@ -791,39 +756,89 @@ static struct page_info *alloc_heap_pages(
         {
             /* When we have tried all in nodemask, we fall back to others. */
             if ( (memflags & MEMF_exact_node) || nodemask_retry++ )
-                goto not_found;
+                return NULL;
             nodes_andnot(nodemask, node_online_map, nodemask);
             first_node = node = first_node(nodemask);
             if ( node >= MAX_NUMNODES )
-                goto not_found;
+                return NULL;
         }
     }
+}
 
- try_tmem:
-    /* Try to free memory from tmem */
-    if ( (pg = tmem_relinquish_pages(order, memflags)) != NULL )
+/* Allocate 2^@order contiguous pages. */
+static struct page_info *alloc_heap_pages(
+    unsigned int zone_lo, unsigned int zone_hi,
+    unsigned int order, unsigned int memflags,
+    struct domain *d)
+{
+    nodeid_t node;
+    unsigned int i, buddy_order, zone;
+    unsigned long request = 1UL << order;
+    struct page_info *pg, *first_dirty_pg = NULL;
+    bool_t need_tlbflush = 0;
+    uint32_t tlbflush_timestamp = 0;
+
+    /* Make sure there are enough bits in memflags for nodeID. */
+    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
+
+    ASSERT(zone_lo <= zone_hi);
+    ASSERT(zone_hi < NR_ZONES);
+
+    if ( unlikely(order > MAX_ORDER) )
+        return NULL;
+
+    spin_lock(&heap_lock);
+
+    /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+          ((memflags & MEMF_no_refcount) ||
+           !d || d->outstanding_pages < request) )
     {
-        /* reassigning an already allocated anonymous heap page */
         spin_unlock(&heap_lock);
-        return pg;
+        return NULL;
+    }
+ 
+    /*
+     * TMEM: When available memory is scarce due to tmem absorbing it, allow
+     * only mid-size allocations to avoid worst of fragmentation issues.
+     * Others try tmem pools then fail.  This is a workaround until all
+     * post-dom0-creation-multi-page allocations can be eliminated.
+     */
+    if ( ((order == 0) || (order >= 9)) &&
+         (total_avail_pages <= midsize_alloc_zone_pages) &&
+         tmem_freeable_pages() )
+     {
+        /* Try to free memory from tmem. */
+         pg = tmem_relinquish_pages(order, memflags);
+         spin_unlock(&heap_lock);
+         return pg;
+     }
+ 
+    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+    if ( !pg )
+    {
+        /* No suitable memory blocks. Fail the request. */
+        spin_unlock(&heap_lock);
+        return NULL;
     }
 
- not_found:
-    /* No suitable memory blocks. Fail the request. */
-    spin_unlock(&heap_lock);
-    return NULL;
-
- found: 
+    node = phys_to_nid(page_to_maddr(pg));
+    zone = page_to_zone(pg);
+    buddy_order = PFN_ORDER(pg);
 
     if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
         first_dirty_pg = pg + pg->u.free.first_dirty;
-
-    /* We may have to halve the chunk a number of times. */
-    while ( j != order )
+ 
+     /* We may have to halve the chunk a number of times. */
+    while ( buddy_order != order )
     {
         unsigned int first_dirty;
 
-        if ( first_dirty_pg && ((pg + (1 << j)) > first_dirty_pg) )
+        if ( first_dirty_pg && ((pg + (1 << buddy_order)) > first_dirty_pg) )
         {
             if ( pg < first_dirty_pg )
                 first_dirty = (first_dirty_pg - pg) / sizeof(*pg);
@@ -833,8 +848,8 @@ static struct page_info *alloc_heap_pages(
         else
             first_dirty = INVALID_DIRTY_IDX;
 
-        page_list_add_scrub(pg, node, zone, --j, first_dirty);
-        pg += 1 << j;
+        page_list_add_scrub(pg, node, zone, --buddy_order, first_dirty);
+        pg += 1 << buddy_order;
     }
 
     ASSERT(avail[node][zone] >= request);
-- 
1.8.3.1



* [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
  2017-06-22 18:57 ` [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
  2017-06-22 18:57 ` [PATCH v5 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
@ 2017-06-22 18:57 ` Boris Ostrovsky
  2017-06-27 18:00   ` Jan Beulich
  2017-06-22 18:57 ` [PATCH v5 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-22 18:57 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

When allocating pages in alloc_heap_pages(), first look for clean pages. If none
are found, retry, this time taking pages marked as unscrubbed and scrubbing them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet. However,
this will become possible when we stop scrubbing from free_heap_pages() and
instead do it from the idle loop.

Since not all allocations require clean pages (xenheap allocations, for example),
introduce a MEMF_no_scrub flag that callers can set if they are willing to
consume unscrubbed pages.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Added a comment explaining why we always grab order 0 pages in alloc_heap_pages()
* Dropped the somewhat confusing comment about not needing to set first_dirty
  in alloc_heap_pages().
* Moved the starting bit of _MEMF_node up by 8 to accommodate MEMF_no_scrub (bit 7
  is no longer available)
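
For reference, a stand-alone sanity check of the resulting memflags layout.
The constants are copied from the xen/include/xen/mm.h hunk below, with
nodeid_t modelled as a plain byte here: the node ID now starts at bit 16,
leaving bit 8 free for MEMF_no_scrub.

#include <assert.h>
#include <stdint.h>

typedef uint8_t nodeid_t;

#define _MEMF_no_scrub    8
#define  MEMF_no_scrub    (1U << _MEMF_no_scrub)
#define _MEMF_node        16
#define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
#define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
#define  MEMF_get_node(f) ((((f) >> _MEMF_node) - 1) & MEMF_node_mask)

int main(void)
{
    unsigned int memflags = MEMF_node(3) | MEMF_no_scrub;

    assert(MEMF_get_node(memflags) == 3);  /* node ID round-trips ... */
    assert(memflags & MEMF_no_scrub);      /* ... with the new flag set */
    return 0;
}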


 xen/common/page_alloc.c | 36 +++++++++++++++++++++++++++++++-----
 xen/include/xen/mm.h    |  4 +++-
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 89fe3ce..9aac196 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -703,6 +703,7 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
     nodemask_t nodemask = d ? d->node_affinity : node_online_map;
     unsigned int j, zone, nodemask_retry = 0;
     struct page_info *pg;
+    bool use_unscrubbed = (memflags & MEMF_no_scrub);
 
     if ( node == NUMA_NO_NODE )
     {
@@ -734,8 +735,20 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
+            {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    return pg;
+                {
+                    /*
+                     * We grab single pages (order=0) even if they are
+                     * unscrubbed. Given that scrubbing one page is fairly quick
+                     * it is not worth breaking higher orders.
+                     */
+                    if ( (order == 0) || use_unscrubbed ||
+                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                        return pg;
+                    page_list_add_tail(pg, &heap(node, zone, j));
+                }
+            }
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
@@ -775,7 +788,7 @@ static struct page_info *alloc_heap_pages(
     unsigned int i, buddy_order, zone;
     unsigned long request = 1UL << order;
     struct page_info *pg, *first_dirty_pg = NULL;
-    bool_t need_tlbflush = 0;
+    bool need_scrub, need_tlbflush = false;
     uint32_t tlbflush_timestamp = 0;
 
     /* Make sure there are enough bits in memflags for nodeID. */
@@ -819,6 +832,10 @@ static struct page_info *alloc_heap_pages(
      }
  
     pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+    /* Try getting a dirty buddy if we couldn't get a clean one. */
+    if ( !pg && !(memflags & MEMF_no_scrub) )
+        pg = get_free_buddy(zone_lo, zone_hi, order,
+                            memflags | MEMF_no_scrub, d);
     if ( !pg )
     {
         /* No suitable memory blocks. Fail the request. */
@@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
     if ( d != NULL )
         d->last_alloc_node = node;
 
+    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);
     for ( i = 0; i < (1 << order); i++ )
     {
         /* Reference count must continuously be zero for free pages. */
-        BUG_ON(pg[i].count_info != PGC_state_free);
+        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
+
+        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+        {
+            if ( need_scrub )
+                scrub_one_page(&pg[i]);
+            node_need_scrub[node]--;
+        }
+
         pg[i].count_info = PGC_state_inuse;
 
         if ( !(memflags & MEMF_no_tlbflush) )
@@ -1749,7 +1775,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     ASSERT(!in_irq());
 
     pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
-                          order, memflags, NULL);
+                          order, memflags | MEMF_no_scrub, NULL);
     if ( unlikely(pg == NULL) )
         return NULL;
 
@@ -1799,7 +1825,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( !(memflags >> _MEMF_bits) )
         memflags |= MEMF_bits(xenheap_bits);
 
-    pg = alloc_domheap_pages(NULL, order, memflags);
+    pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
     if ( unlikely(pg == NULL) )
         return NULL;
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 3d3f31b..5f3d84a 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -238,7 +238,9 @@ struct npfec {
 #define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
 #define _MEMF_no_icache_flush 7
 #define  MEMF_no_icache_flush (1U<<_MEMF_no_icache_flush)
-#define _MEMF_node        8
+#define _MEMF_no_scrub    8
+#define  MEMF_no_scrub    (1U<<_MEMF_no_scrub)
+#define _MEMF_node        16
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
 #define  MEMF_get_node(f) ((((f) >> _MEMF_node) - 1) & MEMF_node_mask)
-- 
1.8.3.1



* [PATCH v5 4/8] mm: Scrub memory from idle loop
  2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2017-06-22 18:57 ` [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
@ 2017-06-22 18:57 ` Boris Ostrovsky
  2017-06-23  8:36   ` Dario Faggioli
  2017-06-27 18:01   ` Jan Beulich
  2017-06-22 18:57 ` [PATCH v5 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-22 18:57 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	Dario Faggioli, ian.jackson, tim, jbeulich, Boris Ostrovsky

Instead of scrubbing pages during guest destruction (from
free_heap_pages()) do this opportunistically, from the idle loop.

We might come to scrub_free_pages() from the idle loop while another CPU
uses a mapcache override, resulting in a fault while trying to do
__map_domain_page() in scrub_one_page(). To avoid this, make the mapcache
vcpu override a per-cpu variable.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
CC: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes in v5:
* Added an explanation in the commit message for making the mapcache override
  vCPU a per-cpu variable
* Fixed loop counting in scrub_free_pages()
* Fixed the off-by-one error in setting first_dirty in scrub_free_pages().
* Various style fixes
* Added a comment in node_to_scrub() explaining why it should be OK to
  prevent another CPU from scrubbing a node that the current CPU temporarily
  claimed. (I decided against using locks there)
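
To make the preemption accounting in scrub_free_pages() easier to follow,
here is a simplified stand-alone model (softirq_pending() replaced by a stub;
not the Xen code itself): scrubbed pages are weighted 100 and already-clean
pages 1, so the scrubber checks for pending work roughly every 8 scrubbed
pages, yet cannot loop indefinitely over an almost-clean heap.

#include <stdbool.h>
#include <stdio.h>

static bool softirq_pending_stub(void)
{
    return true;                /* pretend other work is always waiting */
}

static unsigned int scrub_run(const bool *page_dirty, unsigned int nr_pages)
{
    unsigned int i, cnt = 0, scrubbed = 0;

    for ( i = 0; i < nr_pages; i++ )
    {
        if ( page_dirty[i] )
        {
            /* scrub_one_page(&pg[i]) would go here. */
            scrubbed++;
            cnt += 100;         /* scrubbed pages add heavier weight */
        }
        else
            cnt++;              /* still count clean pages visited */

        if ( cnt > 800 && softirq_pending_stub() )
            break;              /* become eligible for preemption */
    }
    return scrubbed;
}

int main(void)
{
    bool dirty[32];
    unsigned int i;

    for ( i = 0; i < 32; i++ )
        dirty[i] = true;
    printf("scrubbed %u pages before yielding\n", scrub_run(dirty, 32));
    return 0;
}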


 xen/arch/arm/domain.c      |   2 +-
 xen/arch/x86/domain.c      |   2 +-
 xen/arch/x86/domain_page.c |   6 +--
 xen/common/page_alloc.c    | 118 ++++++++++++++++++++++++++++++++++++++++-----
 xen/include/xen/mm.h       |   1 +
 5 files changed, 111 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2dc8b0a..d282cd8 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -51,7 +51,7 @@ void idle_loop(void)
         /* Are we here for running vcpu context tasklets, or for idling? */
         if ( unlikely(tasklet_work_to_do(cpu)) )
             do_tasklet();
-        else
+        else if ( !softirq_pending(cpu) && !scrub_free_pages() )
         {
             local_irq_disable();
             if ( cpu_is_haltable(cpu) )
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index f7873da..71f1ef4 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -122,7 +122,7 @@ static void idle_loop(void)
         /* Are we here for running vcpu context tasklets, or for idling? */
         if ( unlikely(tasklet_work_to_do(cpu)) )
             do_tasklet();
-        else
+        else if ( !softirq_pending(cpu) && !scrub_free_pages() )
             pm_idle();
         do_softirq();
         /*
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 71baede..0783c1e 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -18,12 +18,12 @@
 #include <asm/hardirq.h>
 #include <asm/setup.h>
 
-static struct vcpu *__read_mostly override;
+static DEFINE_PER_CPU(struct vcpu *, override);
 
 static inline struct vcpu *mapcache_current_vcpu(void)
 {
     /* In the common case we use the mapcache of the running VCPU. */
-    struct vcpu *v = override ?: current;
+    struct vcpu *v = this_cpu(override) ?: current;
 
     /*
      * When current isn't properly set up yet, this is equivalent to
@@ -59,7 +59,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
 
 void __init mapcache_override_current(struct vcpu *v)
 {
-    override = v;
+    this_cpu(override) = v;
 }
 
 #define mapcache_l2_entry(e) ((e) >> PAGETABLE_ORDER)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9aac196..4e2775f 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1019,15 +1019,85 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
-static void scrub_free_pages(unsigned int node)
+static nodemask_t node_scrubbing;
+
+/*
+ * If get_node is true this will return the closest node that needs to be scrubbed,
+ * with the appropriate bit in node_scrubbing set.
+ * If get_node is not set, this will return *a* node that needs to be scrubbed.
+ * The node_scrubbing bitmask will not be updated.
+ * If no node needs scrubbing then NUMA_NO_NODE is returned.
+ */
+static unsigned int node_to_scrub(bool get_node)
 {
-    struct page_info *pg;
-    unsigned int zone;
+    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
+    nodeid_t closest = NUMA_NO_NODE;
+    u8 dist, shortest = 0xff;
 
-    ASSERT(spin_is_locked(&heap_lock));
+    if ( node == NUMA_NO_NODE )
+        node = 0;
 
-    if ( !node_need_scrub[node] )
-        return;
+    if ( node_need_scrub[node] &&
+         (!get_node || !node_test_and_set(node, node_scrubbing)) )
+        return node;
+
+    /*
+     * See if there are memory-only nodes that need scrubbing and choose
+     * the closest one.
+     */
+    local_node = node;
+    for ( ; ; )
+    {
+        do {
+            node = cycle_node(node, node_online_map);
+        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
+                  (node != local_node) );
+
+        if ( node == local_node )
+            break;
+
+        /*
+         * Grab the node right away. If we find a closer node later we will
+         * release this one. While there is a chance that another CPU will
+         * not be able to scrub that node when it is searching for scrub work
+         * at the same time it will be able to do so next time it wakes up.
+         * The alternative would be to perform this search under a lock but
+         * then we'd need to take this lock every time we come in here.
+         */
+        if ( node_need_scrub[node] )
+        {
+            if ( !get_node )
+                return node;
+
+            dist = __node_distance(local_node, node);
+            if ( (dist < shortest || closest == NUMA_NO_NODE) &&
+                 !node_test_and_set(node, node_scrubbing) )
+            {
+                if ( closest != NUMA_NO_NODE )
+                    node_clear(closest, node_scrubbing);
+                shortest = dist;
+                closest = node;
+            }
+        }
+    }
+
+    return closest;
+}
+
+bool scrub_free_pages(void)
+{
+    struct page_info *pg;
+    unsigned int zone;
+    unsigned int cpu = smp_processor_id();
+    bool preempt = false;
+    nodeid_t node;
+    unsigned int cnt = 0;
+  
+    node = node_to_scrub(true);
+    if ( node == NUMA_NO_NODE )
+        return false;
+ 
+    spin_lock(&heap_lock);
 
     for ( zone = 0; zone < NR_ZONES; zone++ )
     {
@@ -1050,17 +1120,42 @@ static void scrub_free_pages(unsigned int node)
                         scrub_one_page(&pg[i]);
                         pg[i].count_info &= ~PGC_need_scrub;
                         node_need_scrub[node]--;
+                        cnt += 100; /* scrubbed pages add heavier weight. */
+                    }
+                    else
+                        cnt++;
+
+                    /*
+                     * Scrub a few (8) pages before becoming eligible for
+                     * preemption. But also count non-scrubbing loop iterations
+                     * so that we don't get stuck here with an almost clean
+                     * heap.
+                     */
+                    if ( cnt > 800 && softirq_pending(cpu) )
+                    {
+                        preempt = true;
+                        break;
                     }
                 }
 
-                page_list_del(pg, &heap(node, zone, order));
-                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+                if ( i >= (1U << order) - 1 )
+                {
+                    page_list_del(pg, &heap(node, zone, order));
+                    page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+                }
+                else
+                    pg->u.free.first_dirty = i + 1;
 
-                if ( node_need_scrub[node] == 0 )
-                    return;
+                if ( preempt || (node_need_scrub[node] == 0) )
+                    goto out;
             }
         } while ( order-- != 0 );
     }
+
+ out:
+    spin_unlock(&heap_lock);
+    node_clear(node, node_scrubbing);
+    return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
 }
 
 /* Free 2^@order set of pages. */
@@ -1175,9 +1270,6 @@ static void free_heap_pages(
     if ( tainted )
         reserve_offlined_page(pg);
 
-    if ( need_scrub )
-        scrub_free_pages(node);
-
     spin_unlock(&heap_lock);
 }
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 5f3d84a..a9829c2 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -150,6 +150,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe);
 void xenheap_max_mfn(unsigned long mfn);
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags);
 void free_xenheap_pages(void *v, unsigned int order);
+bool scrub_free_pages(void);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 /* Map machine page range in Xen virtual address space. */
-- 
1.8.3.1



* [PATCH v5 5/8] spinlock: Introduce spin_lock_cb()
  2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2017-06-22 18:57 ` [PATCH v5 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
@ 2017-06-22 18:57 ` Boris Ostrovsky
  2017-06-22 18:57 ` [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-22 18:57 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

While waiting for a lock we may want to periodically run some
code. This code may, for example, allow the caller to release
resources held by it that are no longer needed in the critical
section protected by the lock.

Specifically, this feature will be needed by the scrubbing code where
the scrubber, while waiting for the heap lock to merge back clean
pages, may be asked by the page allocator (which is currently
holding the lock) to abort merging and release the buddy page head
that the allocator wants.

We could use spin_trylock() but since it doesn't take a lock ticket
it may take a long time until the lock is taken. Instead we add
spin_lock_cb(), which allows us to grab the ticket and execute a
callback while waiting. This callback is executed on every iteration
of the spinlock waiting loop.

Since we may be sleeping on the lock until it is released we need a
mechanism that makes sure the callback has a chance to run. We add
spin_lock_kick(), which will wake up the waiter.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Added a sentence in the commit message to note that the callback function is
  called on every iteration of the spin loop.
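
To show the idea in isolation, here is a stand-alone sketch using C11 atomics
(not the Xen spinlock internals): a ticket lock whose wait loop invokes a
caller-supplied callback on every iteration, which is what _spin_lock_cb()
adds on top of _spin_lock().

#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_uint head, tail;
} ticketlock_t;

/* Take a ticket, then spin until it comes up, running the callback on
 * every iteration of the wait loop. */
static void ticket_lock_cb(ticketlock_t *l, void (*cb)(void *), void *data)
{
    unsigned int ticket = atomic_fetch_add(&l->tail, 1);

    while ( atomic_load(&l->head) != ticket )
        if ( cb )
            cb(data);   /* e.g. release a resource the lock holder asked for */
}

static void ticket_lock(ticketlock_t *l)
{
    ticket_lock_cb(l, NULL, NULL);
}

static void ticket_unlock(ticketlock_t *l)
{
    atomic_fetch_add(&l->head, 1);   /* pass the lock to the next ticket */
}

int main(void)
{
    ticketlock_t lock = { 0, 0 };

    ticket_lock(&lock);   /* uncontended, so the callback never runs */
    ticket_unlock(&lock);
    return 0;
}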

 xen/common/spinlock.c      | 9 ++++++++-
 xen/include/xen/spinlock.h | 8 ++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 2a06406..3c1caae 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -129,7 +129,7 @@ static always_inline u16 observe_head(spinlock_tickets_t *t)
     return read_atomic(&t->head);
 }
 
-void _spin_lock(spinlock_t *lock)
+void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
 {
     spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
     LOCK_PROFILE_VAR;
@@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock)
     while ( tickets.tail != observe_head(&lock->tickets) )
     {
         LOCK_PROFILE_BLOCK;
+        if ( unlikely(cb) )
+            cb(data);
         arch_lock_relax();
     }
     LOCK_PROFILE_GOT;
@@ -147,6 +149,11 @@ void _spin_lock(spinlock_t *lock)
     arch_lock_acquire_barrier();
 }
 
+void _spin_lock(spinlock_t *lock)
+{
+     _spin_lock_cb(lock, NULL, NULL);
+}
+
 void _spin_lock_irq(spinlock_t *lock)
 {
     ASSERT(local_irq_is_enabled());
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index c1883bd..91bfb95 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -153,6 +153,7 @@ typedef struct spinlock {
 #define spin_lock_init(l) (*(l) = (spinlock_t)SPIN_LOCK_UNLOCKED)
 
 void _spin_lock(spinlock_t *lock);
+void _spin_lock_cb(spinlock_t *lock, void (*cond)(void *), void *data);
 void _spin_lock_irq(spinlock_t *lock);
 unsigned long _spin_lock_irqsave(spinlock_t *lock);
 
@@ -169,6 +170,7 @@ void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
 #define spin_lock(l)                  _spin_lock(l)
+#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
 #define spin_lock_irq(l)              _spin_lock_irq(l)
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
@@ -190,6 +192,12 @@ void _spin_unlock_recursive(spinlock_t *lock);
     1 : ({ local_irq_restore(flags); 0; });     \
 })
 
+#define spin_lock_kick(l)                       \
+({                                              \
+    smp_mb();                                   \
+    arch_lock_signal();                         \
+})
+
 /* Ensure a lock is quiescent between two critical operations. */
 #define spin_barrier(l)               _spin_barrier(l)
 
-- 
1.8.3.1



* [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing
  2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2017-06-22 18:57 ` [PATCH v5 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
@ 2017-06-22 18:57 ` Boris Ostrovsky
  2017-06-27 19:28   ` Jan Beulich
  2017-06-22 18:57 ` [PATCH v5 7/8] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-22 18:57 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Instead of scrubbing pages while holding the heap lock we can mark the
buddy's head as being scrubbed and drop the lock temporarily.
If someone (most likely alloc_heap_pages()) tries to access
this chunk it will signal the scrubber to abort the scrub by setting the
head's BUDDY_SCRUB_ABORT bit. The scrubber checks this bit after
processing each page and stops its work as soon as it sees the bit set.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Fixed off-by-one error in setting first_dirty
* Changed struct page_info.u.free to a union to permit use of ACCESS_ONCE in
  check_and_stop_scrub()
* Renamed the PAGE_SCRUBBING etc. macros to BUDDY_SCRUBBING etc.
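
Below is a condensed stand-alone model of the abort handshake described
above, using C11 atomics and pthreads instead of the page_info bitfield and
the heap lock. check_and_stop_scrub() mirrors the allocator side from the
hunks; scrub_should_abort() is a helper name invented for this sketch.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

enum { BUDDY_NOT_SCRUBBING, BUDDY_SCRUBBING, BUDDY_SCRUB_ABORT };

static _Atomic int scrub_state = BUDDY_SCRUBBING;  /* scrubber owns the buddy */

/* Allocator side (called under the heap lock in the real code). */
static void check_and_stop_scrub(void)
{
    int expected = BUDDY_SCRUBBING;

    if ( atomic_compare_exchange_strong(&scrub_state, &expected,
                                        BUDDY_SCRUB_ABORT) )
        while ( atomic_load(&scrub_state) == BUDDY_SCRUB_ABORT )
            ;   /* wait for the scrubber to notice and let go */
}

/* Scrubber side: polled after every page. */
static int scrub_should_abort(void)
{
    if ( atomic_load(&scrub_state) == BUDDY_SCRUB_ABORT )
    {
        atomic_store(&scrub_state, BUDDY_NOT_SCRUBBING);  /* release the buddy */
        return 1;
    }
    return 0;
}

static void *scrubber(void *arg)
{
    (void)arg;
    while ( !scrub_should_abort() )
        ;   /* "scrub" another page of the buddy */
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, scrubber, NULL);
    check_and_stop_scrub();   /* the allocator wants this buddy now */
    pthread_join(&t, NULL);
    printf("buddy reclaimed from the scrubber\n");
    return 0;
}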

 xen/common/page_alloc.c  | 105 +++++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/mm.h |  28 ++++++++-----
 xen/include/asm-x86/mm.h |  29 ++++++++-----
 3 files changed, 138 insertions(+), 24 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 4e2775f..f0e5399 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -687,6 +687,7 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
 {
     PFN_ORDER(pg) = order;
     pg->u.free.first_dirty = first_dirty;
+    pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
 
     if ( first_dirty != INVALID_DIRTY_IDX )
         page_list_add_tail(pg, &heap(node, zone, order));
@@ -694,6 +695,25 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+static void check_and_stop_scrub(struct page_info *head)
+{
+    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
+    {
+        struct page_info pg;
+
+        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
+        spin_lock_kick();
+        for ( ; ; )
+        {
+            /* Can't ACCESS_ONCE() a bitfield. */
+            pg.u.free.val = ACCESS_ONCE(head->u.free.val);
+            if ( pg.u.free.scrub_state != BUDDY_SCRUB_ABORT )
+                break;
+            cpu_relax();
+        }
+    }
+}
+
 static struct page_info *get_free_buddy(unsigned int zone_lo,
                                         unsigned int zone_hi,
                                         unsigned int order, unsigned int memflags,
@@ -738,14 +758,19 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
             {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
                 {
+                    if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                        return pg;
                     /*
                      * We grab single pages (order=0) even if they are
                      * unscrubbed. Given that scrubbing one page is fairly quick
                      * it is not worth breaking higher orders.
                      */
-                    if ( (order == 0) || use_unscrubbed ||
-                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                    if ( (order == 0) || use_unscrubbed )
+                    {
+                        check_and_stop_scrub(pg);
                         return pg;
+                    }
+
                     page_list_add_tail(pg, &heap(node, zone, j));
                 }
             }
@@ -928,6 +953,7 @@ static int reserve_offlined_page(struct page_info *head)
 
     cur_head = head;
 
+    check_and_stop_scrub(head);
     /*
      * We may break the buddy so let's mark the head as clean. Then, when
      * merging chunks back into the heap, we will see whether the chunk has
@@ -1084,6 +1110,29 @@ static unsigned int node_to_scrub(bool get_node)
     return closest;
 }
 
+struct scrub_wait_state {
+    struct page_info *pg;
+    unsigned int first_dirty;
+    bool drop;
+};
+
+static void scrub_continue(void *data)
+{
+    struct scrub_wait_state *st = data;
+
+    if ( st->drop )
+        return;
+
+    if ( st->pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
+    {
+        /* There is a waiter for this buddy. Release it. */
+        st->drop = true;
+        st->pg->u.free.first_dirty = st->first_dirty;
+        smp_wmb();
+        st->pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+    }
+}
+
 bool scrub_free_pages(void)
 {
     struct page_info *pg;
@@ -1106,25 +1155,53 @@ bool scrub_free_pages(void)
         do {
             while ( !page_list_empty(&heap(node, zone, order)) )
             {
-                unsigned int i;
+                unsigned int i, dirty_cnt;
+                struct scrub_wait_state st;
 
                 /* Unscrubbed pages are always at the end of the list. */
                 pg = page_list_last(&heap(node, zone, order));
                 if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
                     break;
 
+                ASSERT(!pg->u.free.scrub_state);
+                pg->u.free.scrub_state = BUDDY_SCRUBBING;
+
+                spin_unlock(&heap_lock);
+
+                dirty_cnt = 0;
+
                 for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
                 {
                     if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                     {
                         scrub_one_page(&pg[i]);
+                        /*
+                         * We can modify count_info without holding heap
+                         * lock since we effectively locked this buddy by
+                         * setting its scrub_state.
+                         */
                         pg[i].count_info &= ~PGC_need_scrub;
-                        node_need_scrub[node]--;
+                        dirty_cnt++;
                         cnt += 100; /* scrubbed pages add heavier weight. */
                     }
                     else
                         cnt++;
 
+                    if ( pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
+                    {
+                        /* Someone wants this chunk. Drop everything. */
+
+                        pg->u.free.first_dirty = (i == (1U << order) - 1) ?
+                            INVALID_DIRTY_IDX : i + 1; 
+                        smp_wmb();
+                        pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+
+                        spin_lock(&heap_lock);
+                        node_need_scrub[node] -= dirty_cnt;
+                        spin_unlock(&heap_lock);
+                        goto out_nolock;
+                    }
+
                     /*
                      * Scrub a few (8) pages before becoming eligible for
                      * preemption. But also count non-scrubbing loop iterations
@@ -1138,6 +1215,17 @@ bool scrub_free_pages(void)
                     }
                 }
 
+                st.pg = pg;
+                st.first_dirty = (i >= (1UL << order) - 1) ?
+                    INVALID_DIRTY_IDX : i + 1;
+                st.drop = false;
+                spin_lock_cb(&heap_lock, scrub_continue, &st);
+
+                node_need_scrub[node] -= dirty_cnt;
+
+                if ( st.drop )
+                    goto out;
+
                 if ( i >= (1U << order) - 1 )
                 {
                     page_list_del(pg, &heap(node, zone, order));
@@ -1146,6 +1234,8 @@ bool scrub_free_pages(void)
                 else
                     pg->u.free.first_dirty = i + 1;
 
+                pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+
                 if ( preempt || (node_need_scrub[node] == 0) )
                     goto out;
             }
@@ -1154,6 +1244,8 @@ bool scrub_free_pages(void)
 
  out:
     spin_unlock(&heap_lock);
+
+ out_nolock:
     node_clear(node, node_scrubbing);
     return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
 }
@@ -1235,6 +1327,8 @@ static void free_heap_pages(
                  (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
 
+            check_and_stop_scrub(predecessor);
+
             page_list_del(predecessor, &heap(node, zone, order));
 
             if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
@@ -1256,6 +1350,9 @@ static void free_heap_pages(
                  (PFN_ORDER(successor) != order) ||
                  (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
+
+            check_and_stop_scrub(successor);
+
             page_list_del(successor, &heap(node, zone, order));
 
             if ( successor->u.free.first_dirty != INVALID_DIRTY_IDX )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 889a85e..625aa16 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -42,18 +42,26 @@ struct page_info
             unsigned long type_info;
         } inuse;
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
-        struct {
-            /* Do TLBs need flushing for safety before next page use? */
-            unsigned long need_tlbflush:1;
-
-            /*
-             * Index of the first *possibly* unscrubbed page in the buddy.
-             * One more than maximum possible order (MAX_ORDER+1) to
-             * accommodate INVALID_DIRTY_IDX.
-             */
+        union {
+            struct {
+                /* Do TLBs need flushing for safety before next page use? */
+                unsigned long need_tlbflush:1;
+
+                /*
+                 * Index of the first *possibly* unscrubbed page in the buddy.
+                 * One more than maximum possible order (MAX_ORDER+1) to
+                 * accommodate INVALID_DIRTY_IDX.
+                 */
 #define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
-            unsigned long first_dirty:MAX_ORDER + 2;
+                unsigned long first_dirty:MAX_ORDER + 2;
+
+#define BUDDY_NOT_SCRUBBING   0
+#define BUDDY_SCRUBBING       1
+#define BUDDY_SCRUB_ABORT     2
+                unsigned long scrub_state:2;
+            };
 
+           unsigned long val;
         } free;
 
     } u;
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index cd00bef..db6f3a5 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -86,17 +86,26 @@ struct page_info
         } sh;
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
-        struct {
-            /* Do TLBs need flushing for safety before next page use? */
-            unsigned long need_tlbflush:1;
-
-            /*
-             * Index of the first *possibly* unscrubbed page in the buddy.
-             * One more than maximum possible order (MAX_ORDER+1) to
-             * accommodate INVALID_DIRTY_IDX.
-             */
+        union {
+	    struct {
+		/* Do TLBs need flushing for safety before next page use? */
+		unsigned long need_tlbflush:1;
+
+		/*
+		 * Index of the first *possibly* unscrubbed page in the buddy.
+		 * One more than maximum possible order (MAX_ORDER+1) to
+		 * accommodate INVALID_DIRTY_IDX.
+		 */
 #define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
-            unsigned long first_dirty:MAX_ORDER + 2;
+		unsigned long first_dirty:MAX_ORDER + 2;
+
+#define BUDDY_NOT_SCRUBBING   0
+#define BUDDY_SCRUBBING       1
+#define BUDDY_SCRUB_ABORT     2
+		unsigned long scrub_state:2;
+	    };
+
+	    unsigned long val;
         } free;
 
     } u;
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 7/8] mm: Print number of unscrubbed pages in 'H' debug handler
  2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2017-06-22 18:57 ` [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
@ 2017-06-22 18:57 ` Boris Ostrovsky
  2017-06-22 18:57 ` [PATCH v5 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
  2017-06-23  9:36 ` [PATCH v5 0/8] Memory scrubbing from idle loop Jan Beulich
  8 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-22 18:57 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index f0e5399..da5ffc2 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2315,6 +2315,13 @@ static void dump_heap(unsigned char key)
             printk("heap[node=%d][zone=%d] -> %lu pages\n",
                    i, j, avail[i][j]);
     }
+
+    for ( i = 0; i < MAX_NUMNODES; i++ )
+    {
+        if ( !node_need_scrub[i] )
+            continue;
+        printk("Node %d has %lu unscrubbed pages\n", i, node_need_scrub[i]);
+    }
 }
 
 static __init int register_heap_trigger(void)
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 8/8] mm: Make sure pages are scrubbed
  2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2017-06-22 18:57 ` [PATCH v5 7/8] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
@ 2017-06-22 18:57 ` Boris Ostrovsky
  2017-06-27 19:29   ` Jan Beulich
  2017-06-23  9:36 ` [PATCH v5 0/8] Memory scrubbing from idle loop Jan Beulich
  8 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-22 18:57 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Add a debug Kconfig option that will make page allocator verify
that pages that were supposed to be scrubbed are, in fact, clean.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v5:
* Defined SCRUB_PATTERN for NDEBUG
* Style changes


 xen/Kconfig.debug       |  7 ++++++
 xen/common/page_alloc.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index 689f297..195d504 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
 	  logged in the Xen ring buffer.
 	  If unsure, say N here.
 
+config SCRUB_DEBUG
+	bool "Page scrubbing test"
+	default DEBUG
+	---help---
+	  Verify that pages that need to be scrubbed before being allocated to
+	  a guest are indeed scrubbed.
+
 endif # DEBUG || EXPERT
 
 endmenu
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index da5ffc2..5d50c2a 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -170,6 +170,10 @@ boolean_param("bootscrub", opt_bootscrub);
 static unsigned long __initdata opt_bootscrub_chunk = MB(128);
 size_param("bootscrub_chunk", opt_bootscrub_chunk);
 
+#ifdef CONFIG_SCRUB_DEBUG
+static bool __read_mostly boot_scrub_done;
+#endif
+
 /*
  * Bit width of the DMA heap -- used to override NUMA-node-first.
  * allocation strategy, which can otherwise exhaust low memory.
@@ -695,6 +699,43 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+/* SCRUB_PATTERN needs to be a repeating series of bytes. */
+#ifndef NDEBUG
+#define SCRUB_PATTERN        0xc2c2c2c2c2c2c2c2ULL
+#else
+#define SCRUB_PATTERN        0ULL
+#endif
+#define SCRUB_BYTE_PATTERN   (SCRUB_PATTERN & 0xff)
+
+static void poison_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    uint64_t *ptr;
+
+    ptr = map_domain_page(mfn);
+    *ptr = ~SCRUB_PATTERN;
+    unmap_domain_page(ptr);
+#endif
+}
+
+static void check_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    const uint64_t *ptr;
+    unsigned int i;
+
+    if ( !boot_scrub_done )
+        return;
+
+    ptr = map_domain_page(mfn);
+    for ( i = 0; i < PAGE_SIZE / sizeof (*ptr); i++ )
+        ASSERT(ptr[i] == SCRUB_PATTERN);
+    unmap_domain_page(ptr);
+#endif
+}
+
 static void check_and_stop_scrub(struct page_info *head)
 {
     if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
@@ -931,6 +972,9 @@ static struct page_info *alloc_heap_pages(
          * guest can control its own visibility of/through the cache.
          */
         flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush));
+
+        if ( !(memflags & MEMF_no_scrub) )
+            check_one_page(&pg[i]);
     }
 
     spin_unlock(&heap_lock);
@@ -1294,7 +1338,10 @@ static void free_heap_pages(
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
 
         if ( need_scrub )
+        {
             pg[i].count_info |= PGC_need_scrub;
+            poison_one_page(&pg[i]);
+        }
     }
 
     avail[node][zone] += 1 << order;
@@ -1656,7 +1703,12 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
+#ifndef CONFIG_SCRUB_DEBUG
         free_heap_pages(pg + i, 0, false);
+#else
+        free_heap_pages(pg + i, 0, boot_scrub_done);
+#endif
+	
     }
 }
 
@@ -1922,6 +1974,10 @@ void __init scrub_heap_pages(void)
 
     printk("done.\n");
 
+#ifdef CONFIG_SCRUB_DEBUG
+    boot_scrub_done = true;
+#endif
+
     /* Now that the heap is initialized, run checks and set bounds
      * for the low mem virq algorithm. */
     setup_low_mem_virq();
@@ -2195,12 +2251,16 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
 
             spin_unlock_recursive(&d->page_alloc_lock);
 
+#ifndef CONFIG_SCRUB_DEBUG
             /*
              * Normally we expect a domain to clear pages before freeing them,
              * if it cares about the secrecy of their contents. However, after
              * a domain has died we assume responsibility for erasure.
              */
             scrub = !!d->is_dying;
+#else
+            scrub = true;
+#endif
         }
         else
         {
@@ -2292,7 +2352,8 @@ void scrub_one_page(struct page_info *pg)
 
 #ifndef NDEBUG
     /* Avoid callers relying on allocations returning zeroed pages. */
-    unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE));
+    unmap_domain_page(memset(__map_domain_page(pg),
+                             SCRUB_BYTE_PATTERN, PAGE_SIZE));
 #else
     /* For a production build, clear_page() is the fastest way to scrub. */
     clear_domain_page(_mfn(page_to_mfn(pg)));
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/8] mm: Scrub memory from idle loop
  2017-06-22 18:57 ` [PATCH v5 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
@ 2017-06-23  8:36   ` Dario Faggioli
  2017-06-27 18:01   ` Jan Beulich
  1 sibling, 0 replies; 35+ messages in thread
From: Dario Faggioli @ 2017-06-23  8:36 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich



On Thu, 2017-06-22 at 14:57 -0400, Boris Ostrovsky wrote:
> Instead of scrubbing pages during guest destruction (from
> free_heap_pages()) do this opportunistically, from the idle loop.
> 
> We might come to scrub_free_pages()from idle loop while another CPU
> uses mapcache override, resulting in a fault while trying to do 
> __map_domain_page() in scrub_one_page(). To avoid this, make mapcache
> vcpu override a per-cpu variable.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/8] Memory scrubbing from idle loop
  2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (7 preceding siblings ...)
  2017-06-22 18:57 ` [PATCH v5 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
@ 2017-06-23  9:36 ` Jan Beulich
  2017-06-23 13:11   ` Boris Ostrovsky
  8 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2017-06-23  9:36 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 22.06.17 at 20:57, <boris.ostrovsky@oracle.com> wrote:
> Deferred:
> * Per-node heap locks. In addition to (presumably) improving performance in
>   general, once they are available we can parallelize scrubbing further by
>   allowing more than one core per node to do idle loop scrubbing.

I don't understand: A per-node lock still calls for just one CPU
doing the scrubbing on that node, in order to not congest the
lock.

Jan



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/8] Memory scrubbing from idle loop
  2017-06-23  9:36 ` [PATCH v5 0/8] Memory scrubbing from idle loop Jan Beulich
@ 2017-06-23 13:11   ` Boris Ostrovsky
  2017-06-23 13:22     ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-23 13:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel



On 06/23/2017 05:36 AM, Jan Beulich wrote:
>>>> On 22.06.17 at 20:57, <boris.ostrovsky@oracle.com> wrote:
>> Deferred:
>> * Per-node heap locks. In addition to (presumably) improving performance in
>>    general, once they are available we can parallelize scrubbing further by
>>    allowing more than one core per node to do idle loop scrubbing.
> 
> I don't understand: A per-node lock still calls for just one CPU
> doing the scrubbing on that node, in order to not congest the
> lock.


Is this necessarily true? Maybe not allow all cores on a node to scrub 
but I'd think having more than one core do the work may be beneficial. 
Don't forget that actual scrubbing is performed without holding locks. 
We only grab the lock to find dirty buddies in the heap.


-boris




^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/8] Memory scrubbing from idle loop
  2017-06-23 13:11   ` Boris Ostrovsky
@ 2017-06-23 13:22     ` Jan Beulich
  2017-06-23 13:29       ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2017-06-23 13:22 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> On 23.06.17 at 15:11, <boris.ostrovsky@oracle.com> wrote:
> On 06/23/2017 05:36 AM, Jan Beulich wrote:
>>>>> On 22.06.17 at 20:57, <boris.ostrovsky@oracle.com> wrote:
>>> Deferred:
>>> * Per-node heap locks. In addition to (presumably) improving performance in
>>>    general, once they are available we can parallelize scrubbing further by
>>>    allowing more than one core per node to do idle loop scrubbing.
>> 
>> I don't understand: A per-node lock still calls for just one CPU
>> doing the scrubbing on that node, in order to not congest the
>> lock.
> 
> 
> Is this necessarily true? Maybe not allow all cores on a node to scrub 
> but I'd think having more than one core do the work may be beneficial. 
> Don't forget that actual scrubbing is performed without holding locks. 
> We only grab the lock to find dirty buddies in the heap.

Hmm, true, but then I still don't see the connection between
breaking up the lock and parallelizing scrubbing.

Jan



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/8] Memory scrubbing from idle loop
  2017-06-23 13:22     ` Jan Beulich
@ 2017-06-23 13:29       ` Boris Ostrovsky
  0 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2017-06-23 13:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel



On 06/23/2017 09:22 AM, Jan Beulich wrote:
>>>> On 23.06.17 at 15:11, <boris.ostrovsky@oracle.com> wrote:
>> On 06/23/2017 05:36 AM, Jan Beulich wrote:
>>>>>> On 22.06.17 at 20:57, <boris.ostrovsky@oracle.com> wrote:
>>>> Deferred:
>>>> * Per-node heap locks. In addition to (presumably) improving performance in
>>>>     general, once they are available we can parallelize scrubbing further by
>>>>     allowing more than one core per node to do idle loop scrubbing.
>>>
>>> I don't understand: A per-node lock still calls for just one CPU
>>> doing the scrubbing on that node, in order to not congest the
>>> lock.
>>
>>
>> Is this necessarily true? Maybe not allow all cores on a node to scrub
>> but I'd think having more than one core do the work may be beneficial.
>> Don't forget that actual scrubbing is performed without holding locks.
>> We only grab the lock to find dirty buddies in the heap.
> 
> Hmm, true, but then I still don't see the connection between
> breaking up the lock and parallelizing scrubbing.

With a single heap lock we might indeed start seeing lock contention 
when multiple CPUs from many nodes are scrubbing. Making it per-node 
should lessen the pressure.

-boris


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-06-22 18:57 ` [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-06-27 17:06   ` Jan Beulich
  2017-07-23  2:00     ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2017-06-27 17:06 UTC (permalink / raw)
  To: xen-devel, boris.ostrovsky
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3, ian.jackson, tim

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
> I kept node_need_scrub[] as a global array and not a "per-node". I think splitting
> it should be part of making heap_lock a per-node lock, together with increasing
> scrub concurrency by having more than one CPU scrub a node.

Agreed - I hadn't meant my earlier comment to necessarily affect this series.

> @@ -798,11 +814,26 @@ static struct page_info *alloc_heap_pages(
>      return NULL;
>  
>   found: 
> +
> +    if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
> +        first_dirty_pg = pg + pg->u.free.first_dirty;
> +
>      /* We may have to halve the chunk a number of times. */
>      while ( j != order )
>      {
> -        PFN_ORDER(pg) = --j;
> -        page_list_add_tail(pg, &heap(node, zone, j));
> +        unsigned int first_dirty;
> +
> +        if ( first_dirty_pg && ((pg + (1 << j)) > first_dirty_pg) )

Despite the various examples of doing it this way, please at least use 1u.

> +        {
> +            if ( pg < first_dirty_pg )
> +                first_dirty = (first_dirty_pg - pg) / sizeof(*pg);

Pointer subtraction already includes the involved division. Otoh I wonder
if you couldn't get away without pointer comparison/subtraction here
altogether.

> @@ -849,13 +880,22 @@ static int reserve_offlined_page(struct page_info *head)
>  {
>      unsigned int node = phys_to_nid(page_to_maddr(head));
>      int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
> -    struct page_info *cur_head;
> +    struct page_info *cur_head, *first_dirty_pg = NULL;
>      int cur_order;
>  
>      ASSERT(spin_is_locked(&heap_lock));
>  
>      cur_head = head;
>  
> +    /*
> +     * We may break the buddy so let's mark the head as clean. Then, when
> +     * merging chunks back into the heap, we will see whether the chunk has
> +     * unscrubbed pages and set its first_dirty properly.
> +     */
> +    if (head->u.free.first_dirty != INVALID_DIRTY_IDX)

Coding style.

> @@ -892,8 +934,25 @@ static int reserve_offlined_page(struct page_info *head)
>              {
>              merge:
>                  /* We don't consider merging outside the head_order. */
> -                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
> -                PFN_ORDER(cur_head) = cur_order;
> +
> +                /* See if any of the pages indeed need scrubbing. */
> +                if ( first_dirty_pg && (cur_head + (1 << cur_order) > first_dirty_pg) )
> +                {
> +                    if ( cur_head < first_dirty_pg )
> +                        i = (first_dirty_pg - cur_head) / sizeof(*cur_head);
> +                    else
> +                        i = 0;
> +
> +                    for ( ; i < (1 << cur_order); i++ )
> +                        if ( test_bit(_PGC_need_scrub,
> +                                      &cur_head[i].count_info) )
> +                        {
> +                            first_dirty = i;
> +                            break;
> +                        }

Perhaps worth having ASSERT(first_dirty != INVALID_DIRTY_IDX) here? Or are
there cases where ->u.free.first_dirty of a page may be wrong?

> @@ -977,35 +1090,53 @@ static void free_heap_pages(
>  
>          if ( (page_to_mfn(pg) & mask) )
>          {
> +            struct page_info *predecessor = pg - mask;
> +
>              /* Merge with predecessor block? */
> -            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
> -                 !page_state_is(pg-mask, free) ||
> -                 (PFN_ORDER(pg-mask) != order) ||
> -                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
> +            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
> +                 !page_state_is(predecessor, free) ||
> +                 (PFN_ORDER(predecessor) != order) ||
> +                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
>                  break;
> -            pg -= mask;
> -            page_list_del(pg, &heap(node, zone, order));
> +
> +            page_list_del(predecessor, &heap(node, zone, order));
> +
> +            if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
> +                need_scrub = true;

I'm afraid I continue to be confused by this: Why does need_scrub depend on
the state of pages not being the subject of the current free operation? I
realize that at this point in the series the path can't be taken yet, but
won't later patches need to rip it out or change it anyway, in which case it
would be better to introduce the then correct check (if any) only there?

> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -88,7 +88,15 @@ struct page_info
>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>          struct {
>              /* Do TLBs need flushing for safety before next page use? */
> -            bool_t need_tlbflush;
> +            unsigned long need_tlbflush:1;
> +
> +            /*
> +             * Index of the first *possibly* unscrubbed page in the buddy.
> +             * One more than maximum possible order (MAX_ORDER+1) to

Why +1 here and hence ...

> +             * accommodate INVALID_DIRTY_IDX.
> +             */
> +#define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
> +            unsigned long first_dirty:MAX_ORDER + 2;

... why +2 instead of +1? And isn't the expression INVALID_DIRTY_IDX wrongly
parenthesized (apart from lacking blanks around the shift operator)? I'd
expect you want a value with MAX_ORDER+1 set bits, i.e.
(1UL << (MAX_ORDER + 1)) - 1. ANDing with -1UL seems quite pointless too.

Jan


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 2/8] mm: Extract allocation loop from alloc_heap_pages()
  2017-06-22 18:57 ` [PATCH v5 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
@ 2017-06-27 17:59   ` Jan Beulich
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2017-06-27 17:59 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:56 PM >>>
> This will make code a bit more readable, especially with changes that
> will be introduced in subsequent patches.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Acked-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-06-22 18:57 ` [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
@ 2017-06-27 18:00   ` Jan Beulich
  2017-07-23  2:07     ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2017-06-27 18:00 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
> @@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
>      if ( d != NULL )
>          d->last_alloc_node = node;
>  
> +    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);

No need for !! here. But I wonder whether that part of the check is really
useful anyway, considering the sole use ...

>      for ( i = 0; i < (1 << order); i++ )
>      {
>          /* Reference count must continuously be zero for free pages. */
> -        BUG_ON(pg[i].count_info != PGC_state_free);
> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
> +
> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
> +        {
> +            if ( need_scrub )
> +                scrub_one_page(&pg[i]);

... here. If it isn't, I think the local variable isn't warranted either.
If you agree, the thus adjusted patch can have
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(otherwise I'll wait with it to understand the reason first).

Jan


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/8] mm: Scrub memory from idle loop
  2017-06-22 18:57 ` [PATCH v5 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
  2017-06-23  8:36   ` Dario Faggioli
@ 2017-06-27 18:01   ` Jan Beulich
  2017-07-23  2:14     ` Boris Ostrovsky
  1 sibling, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2017-06-27 18:01 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:56 PM >>>
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1019,15 +1019,85 @@ static int reserve_offlined_page(struct page_info *head)
>      return count;
>  }
>  
> -static void scrub_free_pages(unsigned int node)
> +static nodemask_t node_scrubbing;
> +
> +/*
> + * If get_node is true this will return closest node that needs to be scrubbed,
> + * with appropriate bit in node_scrubbing set.
> + * If get_node is not set, this will return *a* node that needs to be scrubbed.
> + * node_scrubbing bitmask will not be updated.
> + * If no node needs scrubbing then NUMA_NO_NODE is returned.
> + */
> +static unsigned int node_to_scrub(bool get_node)
>  {
> -    struct page_info *pg;
> -    unsigned int zone;
> +    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
> +    nodeid_t closest = NUMA_NO_NODE;
> +    u8 dist, shortest = 0xff;
>  
> -    ASSERT(spin_is_locked(&heap_lock));
> +    if ( node == NUMA_NO_NODE )
> +        node = 0;
>  
> -    if ( !node_need_scrub[node] )
> -        return;
> +    if ( node_need_scrub[node] &&
> +         (!get_node || !node_test_and_set(node, node_scrubbing)) )
> +        return node;
> +
> +    /*
> +     * See if there are memory-only nodes that need scrubbing and choose
> +     * the closest one.
> +     */
> +    local_node = node;
> +    for ( ; ; )
> +    {
> +        do {
> +            node = cycle_node(node, node_online_map);
> +        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
> +                  (node != local_node) );
> +
> +        if ( node == local_node )
> +            break;
> +
> +        /*
> +         * Grab the node right away. If we find a closer node later we will
> +         * release this one. While there is a chance that another CPU will
> +         * not be able to scrub that node when it is searching for scrub work
> +         * at the same time it will be able to do so next time it wakes up.
> +         * The alternative would be to perform this search under a lock but
> +         * then we'd need to take this lock every time we come in here.
> +         */

I'm fine with your choice of simply explaining your decision, but ...

> +        if ( node_need_scrub[node] )
> +        {
> +            if ( !get_node )
> +                return node;
> +
> +            dist = __node_distance(local_node, node);
> +            if ( (dist < shortest || closest == NUMA_NO_NODE) &&
> +                 !node_test_and_set(node, node_scrubbing) )

... wouldn't the comment better be placed ahead of this if()?

> @@ -1050,17 +1120,42 @@ static void scrub_free_pages(unsigned int node)
>                          scrub_one_page(&pg[i]);
>                          pg[i].count_info &= ~PGC_need_scrub;
>                          node_need_scrub[node]--;
> +                        cnt += 100; /* scrubbed pages add heavier weight. */
> +                    }
> +                    else
> +                        cnt++;
> +
> +                    /*
> +                     * Scrub a few (8) pages before becoming eligible for
> +                     * preemption. But also count non-scrubbing loop iterations
> +                     * so that we don't get stuck here with an almost clean
> +                     * heap.
> +                     */
> +                    if ( cnt > 800 && softirq_pending(cpu) )
> +                    {
> +                        preempt = true;
> +                        break;
>                      }
>                  }
>  
> -                page_list_del(pg, &heap(node, zone, order));
> -                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> +                if ( i >= (1U << order) - 1 )
> +                {
> +                    page_list_del(pg, &heap(node, zone, order));
> +                    page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
> +                }
> +                else
> +                    pg->u.free.first_dirty = i + 1;
>  
> -                if ( node_need_scrub[node] == 0 )
> -                    return;
> +                if ( preempt || (node_need_scrub[node] == 0) )
> +                    goto out;
>              }
>          } while ( order-- != 0 );
>      }
> +
> + out:
> +    spin_unlock(&heap_lock);
> +    node_clear(node, node_scrubbing);
> +    return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);

While I can see why you use it here, the softirq_pending() looks sort of
misplaced: even though invoking it twice in the caller would look a little
odd, I still think that's where the check belongs.

Jan


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing
  2017-06-22 18:57 ` [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
@ 2017-06-27 19:28   ` Jan Beulich
  2017-06-27 19:31     ` Jan Beulich
  2017-07-23  2:28     ` Boris Ostrovsky
  0 siblings, 2 replies; 35+ messages in thread
From: Jan Beulich @ 2017-06-27 19:28 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:56 PM >>>
> Changes in v5:
> * Fixed off-by-one error in setting first_dirty
> * Changed struct page_info.u.free to a union to permit use of ACCESS_ONCE in
>   check_and_stop_scrub()

I don't see the need for this:

> +static void check_and_stop_scrub(struct page_info *head)
> +{
> +    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
> +    {
> +        struct page_info pg;
> +
> +        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
> +        spin_lock_kick();
> +        for ( ; ; )
> +        {
> +            /* Can't ACCESS_ONCE() a bitfield. */
> +            pg.u.free.val = ACCESS_ONCE(head->u.free.val);

Something like ACCESS_ONCE(head->u.free).val ought to work (or read_atomic(),
due to the questionable scalar type check in ACCESS_ONCE()).

> @@ -1106,25 +1155,53 @@ bool scrub_free_pages(void)
>          do {
>              while ( !page_list_empty(&heap(node, zone, order)) )
>              {
> -                unsigned int i;
> +                unsigned int i, dirty_cnt;
> +                struct scrub_wait_state st;
>  
>                  /* Unscrubbed pages are always at the end of the list. */
>                  pg = page_list_last(&heap(node, zone, order));
>                  if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
>                      break;
>  
> +                ASSERT(!pg->u.free.scrub_state);

Please use BUDDY_NOT_SCRUBBING here.

> @@ -1138,6 +1215,17 @@ bool scrub_free_pages(void)
>                      }
>                  }
>  
> +                st.pg = pg;
> +                st.first_dirty = (i >= (1UL << order) - 1) ?
> +                    INVALID_DIRTY_IDX : i + 1;

Would you mind explaining to me (again?) why you can't set pg's first_dirty
directly here? In case I'm not mistaken and this has been asked before, maybe
this is a hint that a comment might be warranted.

Jan


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 8/8] mm: Make sure pages are scrubbed
  2017-06-22 18:57 ` [PATCH v5 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
@ 2017-06-27 19:29   ` Jan Beulich
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2017-06-27 19:29 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:56 PM >>>
> Add a debug Kconfig option that will make page allocator verify
> that pages that were supposed to be scrubbed are, in fact, clean.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing
  2017-06-27 19:28   ` Jan Beulich
@ 2017-06-27 19:31     ` Jan Beulich
  2017-07-23  2:28     ` Boris Ostrovsky
  1 sibling, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2017-06-27 19:31 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> "Jan Beulich" <jbeulich@suse.com> 06/27/17 9:29 PM >>>
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:56 PM >>>
>> +            /* Can't ACCESS_ONCE() a bitfield. */
>> +            pg.u.free.val = ACCESS_ONCE(head->u.free.val);
>
>Something like ACCESS_ONCE(head->u.free).val ought to work (or read_atomic(),
>due to the questionable scalar type check in ACCESS_ONCE()).

And (of course) I did mean .scrub_state here instead of .val.

Jan



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-06-27 17:06   ` Jan Beulich
@ 2017-07-23  2:00     ` Boris Ostrovsky
  2017-07-31 14:45       ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-07-23  2:00 UTC (permalink / raw)
  To: Jan Beulich, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3, ian.jackson, tim



On 06/27/2017 01:06 PM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>> I kept node_need_scrub[] as a global array and not a "per-node". I think splitting
>> it should be part of making heap_lock a per-node lock, together with increasing
>> scrub concurrency by having more than one CPU scrub a node.
> 
> Agreed - I hadn't meant my earlier comment to necessarily affect this series.
> 
>> @@ -798,11 +814,26 @@ static struct page_info *alloc_heap_pages(
>>       return NULL;
>>   
>>    found:
>> +
>> +    if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
>> +        first_dirty_pg = pg + pg->u.free.first_dirty;
>> +
>>       /* We may have to halve the chunk a number of times. */
>>       while ( j != order )
>>       {
>> -        PFN_ORDER(pg) = --j;
>> -        page_list_add_tail(pg, &heap(node, zone, j));
>> +        unsigned int first_dirty;
>> +
>> +        if ( first_dirty_pg && ((pg + (1 << j)) > first_dirty_pg) )
> 
> Despite the various examples of doing it this way, please at least use 1u.
> 
>> +        {
>> +            if ( pg < first_dirty_pg )
>> +                first_dirty = (first_dirty_pg - pg) / sizeof(*pg);
> 
> Pointer subtraction already includes the involved division. 


Yes, this was a mistake.

> Otoh I wonder
> if you couldn't get away without pointer comparison/subtraction here
> altogether.


Without comparison I can only assume that first_dirty is zero (i.e. the 
whole buddy is potentially dirty). Is there something else I could do?


> 
>> @@ -849,13 +880,22 @@ static int reserve_offlined_page(struct page_info *head)
>>   {
>>       unsigned int node = phys_to_nid(page_to_maddr(head));
>>       int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
>> -    struct page_info *cur_head;
>> +    struct page_info *cur_head, *first_dirty_pg = NULL;
>>       int cur_order;
>>   
>>       ASSERT(spin_is_locked(&heap_lock));
>>   
>>       cur_head = head;
>>   
>> +    /*
>> +     * We may break the buddy so let's mark the head as clean. Then, when
>> +     * merging chunks back into the heap, we will see whether the chunk has
>> +     * unscrubbed pages and set its first_dirty properly.
>> +     */
>> +    if (head->u.free.first_dirty != INVALID_DIRTY_IDX)
> 
> Coding style.
> 
>> @@ -892,8 +934,25 @@ static int reserve_offlined_page(struct page_info *head)
>>               {
>>               merge:
>>                   /* We don't consider merging outside the head_order. */
>> -                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
>> -                PFN_ORDER(cur_head) = cur_order;
>> +
>> +                /* See if any of the pages indeed need scrubbing. */
>> +                if ( first_dirty_pg && (cur_head + (1 << cur_order) > first_dirty_pg) )
>> +                {
>> +                    if ( cur_head < first_dirty_pg )
>> +                        i = (first_dirty_pg - cur_head) / sizeof(*cur_head);

I assume the same comment as above applies here.

>> +                    else
>> +                        i = 0;
>> +
>> +                    for ( ; i < (1 << cur_order); i++ )
>> +                        if ( test_bit(_PGC_need_scrub,
>> +                                      &cur_head[i].count_info) )
>> +                        {
>> +                            first_dirty = i;
>> +                            break;
>> +                        }
> 
> Perhaps worth having ASSERT(first_dirty != INVALID_DIRTY_IDX) here? Or are
> there cases where ->u.free.first_dirty of a page may be wrong?


When we merge in free_heap_pages we don't clear first_dirty of the 
successor buddy (at some point I did have this done but you questioned 
whether it was needed and I dropped it).


> 
>> @@ -977,35 +1090,53 @@ static void free_heap_pages(
>>   
>>           if ( (page_to_mfn(pg) & mask) )
>>           {
>> +            struct page_info *predecessor = pg - mask;
>> +
>>               /* Merge with predecessor block? */
>> -            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
>> -                 !page_state_is(pg-mask, free) ||
>> -                 (PFN_ORDER(pg-mask) != order) ||
>> -                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
>> +            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
>> +                 !page_state_is(predecessor, free) ||
>> +                 (PFN_ORDER(predecessor) != order) ||
>> +                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
>>                   break;
>> -            pg -= mask;
>> -            page_list_del(pg, &heap(node, zone, order));
>> +
>> +            page_list_del(predecessor, &heap(node, zone, order));
>> +
>> +            if ( predecessor->u.free.first_dirty != INVALID_DIRTY_IDX )
>> +                need_scrub = true;
> 
> I'm afraid I continue to be confused by this: Why does need_scrub depend on
> the state of pages not being the subject of the current free operation? I
> realize that at this point in the series the path can't be taken yet, but
> won't later patches need to rip it out or change it anyway, in which case it
> would be better to introduce the then correct check (if any) only there?


Right, at this point the 'if' will indeed never evaluate to true since the
heap is always clean. And once we switch to scrubbing from the idle loop,
need_scrub is never looked at.

I suspect this check will not be needed in the intermediate patches either.

> 
>> --- a/xen/include/asm-x86/mm.h
>> +++ b/xen/include/asm-x86/mm.h
>> @@ -88,7 +88,15 @@ struct page_info
>>           /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>           struct {
>>               /* Do TLBs need flushing for safety before next page use? */
>> -            bool_t need_tlbflush;
>> +            unsigned long need_tlbflush:1;
>> +
>> +            /*
>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>> +             * One more than maximum possible order (MAX_ORDER+1) to
> 
> Why +1 here and hence ...

Don't we have MAX_ORDER+1 orders?

> 
>> +             * accommodate INVALID_DIRTY_IDX.
>> +             */
>> +#define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
>> +            unsigned long first_dirty:MAX_ORDER + 2;
> 
> ... why +2 instead of +1? And isn't the expression INVALID_DIRTY_IDX wrongly
> parenthesized (apart from lacking blanks around the shift operator)? I'd
> expect you want a value with MAX_ORDER+1 set bits, i.e.
> (1UL << (MAX_ORDER + 1)) - 1. ANDing with -1UL seems quite pointless too.

Yes to parentheses and AND. Should be (1UL << (MAX_ORDER + 2)) - 1

-boris


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-06-27 18:00   ` Jan Beulich
@ 2017-07-23  2:07     ` Boris Ostrovsky
  2017-07-31 15:16       ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-07-23  2:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel



On 06/27/2017 02:00 PM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>> @@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
>>       if ( d != NULL )
>>           d->last_alloc_node = node;
>>   
>> +    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);
> 
> No need for !! here. But I wonder whether that part of the check is really
> useful anyway, considering the sole use ...
> 
>>       for ( i = 0; i < (1 << order); i++ )
>>       {
>>           /* Reference count must continuously be zero for free pages. */
>> -        BUG_ON(pg[i].count_info != PGC_state_free);
>> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
>> +
>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>> +        {
>> +            if ( need_scrub )
>> +                scrub_one_page(&pg[i]);
> 
> ... here. If it isn't, I think the local variable isn't warranted either.
> If you agree, the thus adjusted patch can have
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> (otherwise I'll wait with it to understand the reason first).



first_dirty_pg is indeed unnecessary, but I think the local variable is
useful to avoid ANDing memflags inside the loop on each iteration
(unless you think the compiler is smart enough to realize that memflags
does not change).
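
In other words, something along these lines (only a sketch of how I'd
adjust the hunk; the rest of the loop stays as it is):

    /* Hoisted out of the loop so memflags is only tested once. */
    need_scrub = !(memflags & MEMF_no_scrub);

    for ( i = 0; i < (1 << order); i++ )
    {
        /* Reference count must continuously be zero for free pages. */
        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);

        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
        {
            if ( need_scrub )
                scrub_one_page(&pg[i]);
            /* ... rest of the dirty-page handling unchanged ... */
        }
    }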


-boris


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/8] mm: Scrub memory from idle loop
  2017-06-27 18:01   ` Jan Beulich
@ 2017-07-23  2:14     ` Boris Ostrovsky
  2017-07-31 15:20       ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-07-23  2:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel


> 
>> @@ -1050,17 +1120,42 @@ static void scrub_free_pages(unsigned int node)
>>                           scrub_one_page(&pg[i]);
>>                           pg[i].count_info &= ~PGC_need_scrub;
>>                           node_need_scrub[node]--;
>> +                        cnt += 100; /* scrubbed pages add heavier weight. */
>> +                    }
>> +                    else
>> +                        cnt++;
>> +
>> +                    /*
>> +                     * Scrub a few (8) pages before becoming eligible for
>> +                     * preemption. But also count non-scrubbing loop iterations
>> +                     * so that we don't get stuck here with an almost clean
>> +                     * heap.
>> +                     */
>> +                    if ( cnt > 800 && softirq_pending(cpu) )
>> +                    {
>> +                        preempt = true;
>> +                        break;
>>                       }
>>                   }
>>   
>> -                page_list_del(pg, &heap(node, zone, order));
>> -                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
>> +                if ( i >= (1U << order) - 1 )
>> +                {
>> +                    page_list_del(pg, &heap(node, zone, order));
>> +                    page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
>> +                }
>> +                else
>> +                    pg->u.free.first_dirty = i + 1;
>>   
>> -                if ( node_need_scrub[node] == 0 )
>> -                    return;
>> +                if ( preempt || (node_need_scrub[node] == 0) )
>> +                    goto out;
>>               }
>>           } while ( order-- != 0 );
>>       }
>> +
>> + out:
>> +    spin_unlock(&heap_lock);
>> +    node_clear(node, node_scrubbing);
>> +    return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
> 
> While I can see why you use it here, the softirq_pending() looks sort of
> misplaced: While invoking it twice in the caller will look a little odd too,
> I still think that's where the check belongs.


scrub_free_pages() is called from the idle loop as

	else if ( !softirq_pending(cpu) && !scrub_free_pages() )
             pm_idle();

so softirq_pending() is unnecessary here.

(Not sure why you are saying it would be invoked twice)
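
If the idea is to re-check in the caller after scrub_free_pages() returns,
that would look something like this (a sketch only), with scrub_free_pages()
then returning just (node_to_scrub(false) != NUMA_NO_NODE):

	else if ( !softirq_pending(cpu) &&
	          !scrub_free_pages() &&
	          !softirq_pending(cpu) )
	    pm_idle();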

-boris


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing
  2017-06-27 19:28   ` Jan Beulich
  2017-06-27 19:31     ` Jan Beulich
@ 2017-07-23  2:28     ` Boris Ostrovsky
  2017-08-02  8:34       ` Jan Beulich
  1 sibling, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-07-23  2:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel



On 06/27/2017 03:28 PM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:56 PM >>>
>> Changes in v5:
>> * Fixed off-by-one error in setting first_dirty
>> * Changed struct page_info.u.free to a union to permit use of ACCESS_ONCE in
>>    check_and_stop_scrub()
> 
> I don't see the need for this:
> 
>> +static void check_and_stop_scrub(struct page_info *head)
>> +{
>> +    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
>> +    {
>> +        struct page_info pg;
>> +
>> +        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
>> +        spin_lock_kick();
>> +        for ( ; ; )
>> +        {
>> +            /* Can't ACCESS_ONCE() a bitfield. */
>> +            pg.u.free.val = ACCESS_ONCE(head->u.free.val);
> 
> Something like ACCESS_ONCE(head->u.free).val ought to work (or read_atomic(),
> due to the questionable scalar type check in ACCESS_ONCE()).

Hmm... I couldn't get this to work with either suggestion.

page_alloc.c:751:13: error: conversion to non-scalar type requested
              pg.u.free = read_atomic(&head->u.free);

page_alloc.c:753:6: error: conversion to non-scalar type requested
       if ( ACCESS_ONCE(head->u.free).scrub_state != BUDDY_SCRUB_ABORT )
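
What does compile is going through the scalar 'val' member added in this
version, i.e. (a sketch of the access pattern only):

            for ( ; ; )
            {
                /* Snapshot the whole free-list word atomically... */
                pg.u.free.val = ACCESS_ONCE(head->u.free.val);

                /* ...and read the bitfield from the local copy. */
                if ( pg.u.free.scrub_state != BUDDY_SCRUB_ABORT )
                    break;

                cpu_relax();
            }

read_atomic(&head->u.free.val) should compile equally well, since val is a
scalar.
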
> 
>> @@ -1106,25 +1155,53 @@ bool scrub_free_pages(void)
>>           do {
>>               while ( !page_list_empty(&heap(node, zone, order)) )
>>               {
>> -                unsigned int i;
>> +                unsigned int i, dirty_cnt;
>> +                struct scrub_wait_state st;
>>   
>>                   /* Unscrubbed pages are always at the end of the list. */
>>                   pg = page_list_last(&heap(node, zone, order));
>>                   if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
>>                       break;
>>   
>> +                ASSERT(!pg->u.free.scrub_state);
> 
> Please use BUDDY_NOT_SCRUBBING here.
> 
>> @@ -1138,6 +1215,17 @@ bool scrub_free_pages(void)
>>                       }
>>                   }
>>   
>> +                st.pg = pg;
>> +                st.first_dirty = (i >= (1UL << order) - 1) ?
>> +                    INVALID_DIRTY_IDX : i + 1;
> 
> Would you mind explaining to me (again?) why you can't set pg's first_dirty
> directly here? In case I'm not mistaken and this has been asked before, maybe
> this is a hint that a comment might be warranted.


In get_free_buddy() (formerly part of alloc_heap_pages()) I have

            /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
             {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
                 {
                     if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
                         return pg;
                     /*
                      * We grab single pages (order=0) even if they are
                      * unscrubbed. Given that scrubbing one page is fairly quick
                      * it is not worth breaking higher orders.
                      */
                     if ( (order == 0) || use_unscrubbed )
                     {
                         check_and_stop_scrub(pg);
                         return pg;
                     }


If first_dirty gets assigned INVALID_DIRTY_IDX then get_free_buddy() 
will return pg right away without telling the scrubber that the buddy 
has been taken for use. The scrubber will then put the buddy back on the 
heap.
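
If a comment helps, something along these lines next to where st.first_dirty
is computed (just a sketch):

                /*
                 * Don't write the new index into pg->u.free.first_dirty yet:
                 * get_free_buddy() calls check_and_stop_scrub() only for
                 * buddies it sees as dirty, so publishing INVALID_DIRTY_IDX
                 * before re-taking the heap lock would let the allocator take
                 * this buddy without waiting for us, and we would afterwards
                 * re-insert an already allocated buddy into the heap. The
                 * index is therefore published either from scrub_continue()
                 * (when releasing an abort waiter) or only after the heap
                 * lock has been re-acquired.
                 */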


-boris


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-07-23  2:00     ` Boris Ostrovsky
@ 2017-07-31 14:45       ` Jan Beulich
  2017-07-31 16:03         ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2017-07-31 14:45 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:01 AM >>>
>On 06/27/2017 01:06 PM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>>> +        {
>>> +            if ( pg < first_dirty_pg )
>>> +                first_dirty = (first_dirty_pg - pg) / sizeof(*pg);
>> 
>> Pointer subtraction already includes the involved division. 
>
>
>Yes, this was a mistake.
>
>> Otoh I wonder
>> if you couldn't get away without pointer comparison/subtraction here
>> altogether.
>
>
>Without comparison I can only assume that first_dirty is zero (i.e. the 
>whole buddy is potentially dirty). Is there something else I could do?

I was thinking of tracking indexes instead of pointers. But maybe that
would hamper the readability of the overall result more than help it.
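
Just to illustrate what I had in mind (a rough sketch only, not meant as the
final form): the splitting loop could carry an index relative to the current
head instead of a pointer, e.g.

    first_dirty = pg->u.free.first_dirty;

    /* We may have to halve the chunk a number of times. */
    while ( j != order )
    {
        j--;

        /*
         * The lower half goes back to the heap; it is dirty only if
         * first_dirty falls inside it.
         */
        page_list_add_scrub(pg, node, zone, j,
                            (first_dirty < (1U << j)) ? first_dirty
                                                      : INVALID_DIRTY_IDX);
        pg += 1U << j;

        /* Keep the index relative to the half we retain. */
        if ( first_dirty != INVALID_DIRTY_IDX )
            first_dirty = (first_dirty >= (1U << j)) ? first_dirty - (1U << j)
                                                     : 0;
    }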
 
>>> @@ -892,8 +934,25 @@ static int reserve_offlined_page(struct page_info *head)
>>>               {
>>>               merge:
>>>                   /* We don't consider merging outside the head_order. */
>>> -                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
>>> -                PFN_ORDER(cur_head) = cur_order;
>>> +
>>> +                /* See if any of the pages indeed need scrubbing. */
>>> +                if ( first_dirty_pg && (cur_head + (1 << cur_order) > first_dirty_pg) )
>>> +                {
>>> +                    if ( cur_head < first_dirty_pg )
>>> +                        i = (first_dirty_pg - cur_head) / sizeof(*cur_head);
>
>I assume the same comment as above applies here.

Of course. I usually avoid repeating the same comment, except maybe
when reviewing patches from first-time contributors.

>>> +                    else
>>> +                        i = 0;
>>> +
>>> +                    for ( ; i < (1 << cur_order); i++ )
>>> +                        if ( test_bit(_PGC_need_scrub,
>>> +                                      &cur_head[i].count_info) )
>>> +                        {
>>> +                            first_dirty = i;
>>> +                            break;
>>> +                        }
>> 
>> Perhaps worth having ASSERT(first_dirty != INVALID_DIRTY_IDX) here? Or are
>> there cases where ->u.free.first_dirty of a page may be wrong?
>
>
>When we merge in free_heap_pages we don't clear first_dirty of the 
>successor buddy (at some point I did have this done but you questioned 
>whether it was needed and I dropped it).

Hmm, this indeed answers my question, but doesn't help (me) understand
whether the suggested ASSERT() could be wrong.

>>> --- a/xen/include/asm-x86/mm.h
>>> +++ b/xen/include/asm-x86/mm.h
>>> @@ -88,7 +88,15 @@ struct page_info
>>>           /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>           struct {
>>>               /* Do TLBs need flushing for safety before next page use? */
>>> -            bool_t need_tlbflush;
>>> +            unsigned long need_tlbflush:1;
>>> +
>>> +            /*
>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>> +             * One more than maximum possible order (MAX_ORDER+1) to
>> 
>> Why +1 here and hence ...
>
>Don't we have MAX_ORDER+1 orders?

So here there might be a simple misunderstanding: I understand the
parenthesized MAX_ORDER+1 to represent "maximum possible
order", i.e. excluding the "one more than", not the least because of
the ...

>> +             * accommodate INVALID_DIRTY_IDX.
>> +             */
>> +#define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
>> +            unsigned long first_dirty:MAX_ORDER + 2;

+2 here.

>> ... why +2 instead of +1? And isn't the expression INVALID_DIRTY_IDX wrongly
>> parenthesized (apart from lacking blanks around the shift operator)? I'd
>> expect you want a value with MAX_ORDER+1 set bits, i.e.
>> (1UL << (MAX_ORDER + 1)) - 1. ANDing with -1UL seems quite pointless too.
>
>Yes to parentheses and AND. Should be (1UL << (MAX_ORDER + 2)) - 1

I.e. I would still expect it to be (1UL << (MAX_ORDER + 1)) - 1
here.

Jan



* Re: [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-07-23  2:07     ` Boris Ostrovsky
@ 2017-07-31 15:16       ` Jan Beulich
  2017-07-31 16:07         ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2017-07-31 15:16 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:07 AM >>>
>On 06/27/2017 02:00 PM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>>> @@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
>>>       if ( d != NULL )
>>>           d->last_alloc_node = node;
>>>   
>>> +    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);
>> 
>> No need for !! here. But I wonder whether that part of the check is really
>> useful anyway, considering the sole use ...
>> 
>>>       for ( i = 0; i < (1 << order); i++ )
>>>       {
>>>           /* Reference count must continuously be zero for free pages. */
>>> -        BUG_ON(pg[i].count_info != PGC_state_free);
>>> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
>>> +
>>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>> +        {
>>> +            if ( need_scrub )
>>> +                scrub_one_page(&pg[i]);
>> 
>> ... here. If it isn't, I think the local variable isn't warranted either.
>> If you agree, the thus adjusted patch can have
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> (otherwise I'll wait with it to understand the reason first).
>
>first_dirty_pg is indeed unnecessary but I think local variable is 
>useful to avoid ANDing memflags inside the loop on each iteration 
>(unless you think compiler is smart enough to realize that memflags is 
>not changing).

I don't understand: At least on x86 I'd expect the compiler to use a
single TEST if you used memflags inside the loop, whereas the local
variable would likely be a single CMP inside the loop plus setup code
outside of it.
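
For illustration, the two shapes being compared look roughly like this
(sketch only, not the actual hunk):

    /* Variant A: test memflags directly inside the loop (single TEST). */
    for ( i = 0; i < (1 << order); i++ )
        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) &&
             !(memflags & MEMF_no_scrub) )
            scrub_one_page(&pg[i]);

    /* Variant B: hoist the flag test into a local (CMP plus setup code). */
    need_scrub = !(memflags & MEMF_no_scrub);
    for ( i = 0; i < (1 << order); i++ )
        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) && need_scrub )
            scrub_one_page(&pg[i]);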

Jan



* Re: [PATCH v5 4/8] mm: Scrub memory from idle loop
  2017-07-23  2:14     ` Boris Ostrovsky
@ 2017-07-31 15:20       ` Jan Beulich
  2017-07-31 16:15         ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2017-07-31 15:20 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:14 AM >>>
>>> @@ -1050,17 +1120,42 @@ static void scrub_free_pages(unsigned int node)
>>> -                if ( node_need_scrub[node] == 0 )
>>> -                    return;
>>> +                if ( preempt || (node_need_scrub[node] == 0) )
>>> +                    goto out;
>>>               }
>>>           } while ( order-- != 0 );
>>>       }
>>> +
>>> + out:
>>> +    spin_unlock(&heap_lock);
>>> +    node_clear(node, node_scrubbing);
>>> +    return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
>> 
>> While I can see why you use it here, the softirq_pending() looks sort of
>> misplaced: While invoking it twice in the caller will look a little odd too,
>> I still think that's where the check belongs.
>
>
>scrub_free_pages is called from idle loop as
>
>else if ( !softirq_pending(cpu) && !scrub_free_pages() )
>pm_idle();
>
>so softirq_pending() is unnecessary here.
>
>(Not sure why you are saying it would be invoked twice)

That was sort of implicit - the caller would want to become


    else if ( !softirq_pending(cpu) && !scrub_free_pages() && !softirq_pending(cpu) )
        pm_idle();

to account for the fact that a softirq may become pending while scrubbing.

Jan



* Re: [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-07-31 14:45       ` Jan Beulich
@ 2017-07-31 16:03         ` Boris Ostrovsky
  2017-08-02  9:24           ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-07-31 16:03 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 07/31/2017 10:45 AM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:01 AM >>>
>> On 06/27/2017 01:06 PM, Jan Beulich wrote:
>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>>>> +        {
>>>> +            if ( pg < first_dirty_pg )
>>>> +                first_dirty = (first_dirty_pg - pg) / sizeof(*pg);
>>> Pointer subtraction already includes the involved division. 
>>
>> Yes, this was a mistake.
>>
>>> Otoh I wonder
>>> if you couldn't get away without pointer comparison/subtraction here
>>> altogether.
>>
>> Without comparison I can only assume that first_dirty is zero (i.e. the 
>> whole buddy is potentially dirty). Is there something else I could do?
> I was thinking of tracking indexes instead of pointers. But maybe that
> would hamper readability of the overall result more than help it.

I'll try to see how it looks.

>  
>>>> +                    else
>>>> +                        i = 0;
>>>> +
>>>> +                    for ( ; i < (1 << cur_order); i++ )
>>>> +                        if ( test_bit(_PGC_need_scrub,
>>>> +                                      &cur_head[i].count_info) )
>>>> +                        {
>>>> +                            first_dirty = i;
>>>> +                            break;
>>>> +                        }
>>> Perhaps worth having ASSERT(first_dirty != INVALID_DIRTY_IDX) here? Or are
>>> there cases where ->u.free.first_dirty of a page may be wrong?
>>
>> When we merge in free_heap_pages we don't clear first_dirty of the 
>> successor buddy (at some point I did have this done but you questioned 
>> whether it was needed and I dropped it).
> Hmm, this indeed answers my question, but doesn't help (me) understand
> whether the suggested ASSERT() could be wrong.

Oh, I see what you were asking --- ASSERT() *after* the loop, to make
sure we indeed found the first dirty page. Yes, I will add it.

>
>>>> --- a/xen/include/asm-x86/mm.h
>>>> +++ b/xen/include/asm-x86/mm.h
>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>           /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>           struct {
>>>>               /* Do TLBs need flushing for safety before next page use? */
>>>> -            bool_t need_tlbflush;
>>>> +            unsigned long need_tlbflush:1;
>>>> +
>>>> +            /*
>>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>>> +             * One more than maximum possible order (MAX_ORDER+1) to
>>> Why +1 here and hence ...
>> Don't we have MAX_ORDER+1 orders?
> So here there might be a simple misunderstanding: I understand the
> parenthesized MAX_ORDER+1 to represent "maximum possible
> order", i.e. excluding the "one more than", not the least because of
> the ...
>
>>> +             * accommodate INVALID_DIRTY_IDX.
>>> +             */
>>> +#define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
>>> +            unsigned long first_dirty:MAX_ORDER + 2;
> +2 here.
>
>>> ... why +2 instead of +1? And isn't the expression INVALID_DIRTY_IDX wrongly
>>> parenthesized (apart from lacking blanks around the shift operator)? I'd
>>> expect you want a value with MAX_ORDER+1 set bits, i.e.
>>> (1UL << (MAX_ORDER + 1)) - 1. ANDing with -1UL seems quite pointless too.
>> Yes to parentheses and AND. Should be (1UL << (MAX_ORDER + 2)) - 1
> I.e. I would still expect it to be (1UL << (MAX_ORDER + 1)) - 1
> here.


Sorry, I still don't get it.

Say, MAX_ORDER is 1. Since this implies that indexes 0, 1, 2 and 3 are
all valid (because we can have up to 2^(MAX_ORDER+1) pages), don't we
need 3 bits to indicate an invalid index?

-boris


* Re: [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-07-31 15:16       ` Jan Beulich
@ 2017-07-31 16:07         ` Boris Ostrovsky
  0 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2017-07-31 16:07 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 07/31/2017 11:16 AM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:07 AM >>>
>> On 06/27/2017 02:00 PM, Jan Beulich wrote:
>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>>>> @@ -862,10 +879,19 @@ static struct page_info *alloc_heap_pages(
>>>>       if ( d != NULL )
>>>>           d->last_alloc_node = node;
>>>>   
>>>> +    need_scrub = !!first_dirty_pg && !(memflags & MEMF_no_scrub);
>>> No need for !! here. But I wonder whether that part of the check is really
>>> useful anyway, considering the sole use ...
>>>
>>>>       for ( i = 0; i < (1 << order); i++ )
>>>>       {
>>>>           /* Reference count must continuously be zero for free pages. */
>>>> -        BUG_ON(pg[i].count_info != PGC_state_free);
>>>> +        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
>>>> +
>>>> +        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
>>>> +        {
>>>> +            if ( need_scrub )
>>>> +                scrub_one_page(&pg[i]);
>>> ... here. If it isn't, I think the local variable isn't warranted either.
>>> If you agree, the thus adjusted patch can have
>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> (otherwise I'll wait with it to understand the reason first).
>> first_dirty_pg is indeed unnecessary but I think local variable is 
>> useful to avoid ANDing memflags inside the loop on each iteration 
>> (unless you think compiler is smart enough to realize that memflags is 
>> not changing).
> I don't understand: At least on x86 I'd expect the compiler to use a
> single TEST if you used memflags inside the loop, whereas the local
> variable would likely be a single CMP inside the loop plus setup code
> outside of it.

OK, I hadn't considered that you don't actually need to AND and then
CMP. Then yes, the local variable is unnecessary.

-boris


* Re: [PATCH v5 4/8] mm: Scrub memory from idle loop
  2017-07-31 15:20       ` Jan Beulich
@ 2017-07-31 16:15         ` Boris Ostrovsky
  2017-08-02  9:27           ` Jan Beulich
  0 siblings, 1 reply; 35+ messages in thread
From: Boris Ostrovsky @ 2017-07-31 16:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel

On 07/31/2017 11:20 AM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:14 AM >>>
>>>> @@ -1050,17 +1120,42 @@ static void scrub_free_pages(unsigned int node)
>>>> -                if ( node_need_scrub[node] == 0 )
>>>> -                    return;
>>>> +                if ( preempt || (node_need_scrub[node] == 0) )
>>>> +                    goto out;
>>>>               }
>>>>           } while ( order-- != 0 );
>>>>       }
>>>> +
>>>> + out:
>>>> +    spin_unlock(&heap_lock);
>>>> +    node_clear(node, node_scrubbing);
>>>> +    return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
>>> While I can see why you use it here, the softirq_pending() looks sort of
>>> misplaced: While invoking it twice in the caller will look a little odd too,
>>> I still think that's where the check belongs.
>>
>> scrub_free_pages is called from idle loop as
>>
>> else if ( !softirq_pending(cpu) && !scrub_free_pages() )
>> pm_idle();
>>
>> so softirq_pending() is unnecessary here.
>>
>> (Not sure why you are saying it would be invoked twice)
> That was sort of implicit - the caller would want to become
>
>
>     else if ( !softirq_pending(cpu) && !scrub_free_pages() && !softirq_pending(cpu) )
>     pm_idle();
>
> to account for the fact that a softirq may become pending while scrubbing.

That would look really odd IMO.

Would

else if ( !softirq_pending(cpu) )
    if ( !scrub_free_pages() && !softirq_pending(cpu) )
       pm_idle();

or 

else if ( !softirq_pending(cpu) && !scrub_free_pages() )
    if ( !softirq_pending(cpu) )
        pm_idle();

 
be better? (I'd prefer the first)

-boris




* Re: [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing
  2017-07-23  2:28     ` Boris Ostrovsky
@ 2017-08-02  8:34       ` Jan Beulich
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2017-08-02  8:34 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:28 AM >>>
>On 06/27/2017 03:28 PM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:56 PM >>>
>>> +static void check_and_stop_scrub(struct page_info *head)
>>> +{
>>> +    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
>>> +    {
>>> +        struct page_info pg;
>>> +
>>> +        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
>>> +        spin_lock_kick();
>>> +        for ( ; ; )
>>> +        {
>>> +            /* Can't ACCESS_ONCE() a bitfield. */
>>> +            pg.u.free.val = ACCESS_ONCE(head->u.free.val);
>> 
>> Something like ACCESS_ONCE(head->u.free).val ought to work (or read_atomic(),
>> due to the questionable scalar type check in ACCESS_ONCE()).
>
>Hmm... I couldn't get this to work with either suggestion.
>
>page_alloc.c:751:13: error: conversion to non-scalar type requested
>pg.u.free = read_atomic(&head->u.free);
>
>page_alloc.c:753:6: error: conversion to non-scalar type requested
>if ( ACCESS_ONCE(head->u.free).scrub_state != BUDDY_SCRUB_ABORT )

Oh, indeed. That's rather unfortunate.
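
For the archives, a minimal sketch of why the scalar union member is the
workable form (stand-in struct and an assumed Linux-style ACCESS_ONCE(),
illustration only):

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

struct ex_page_free {
    union {
        struct {
            unsigned long need_tlbflush:1;
            unsigned long scrub_state:2;
        };
        unsigned long val;              /* scalar view of the same word */
    } u;
};

unsigned long snapshot(struct ex_page_free *head)
{
    /*
     * ACCESS_ONCE(head->u.scrub_state) cannot work at all, since a
     * bitfield has no address, and ACCESS_ONCE(head->u) or
     * read_atomic(&head->u) trip over the scalar-type checks, as seen
     * above.  The union'd scalar member does what is wanted:
     */
    return ACCESS_ONCE(head->u.val);
}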

Jan



* Re: [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-07-31 16:03         ` Boris Ostrovsky
@ 2017-08-02  9:24           ` Jan Beulich
  2017-08-02 15:31             ` Boris Ostrovsky
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Beulich @ 2017-08-02  9:24 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/31/17 6:03 PM >>>
On 07/31/2017 10:45 AM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:01 AM >>>
>> On 06/27/2017 01:06 PM, Jan Beulich wrote:
>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>>>>> --- a/xen/include/asm-x86/mm.h
>>>>> +++ b/xen/include/asm-x86/mm.h
>>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>>           /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>>           struct {
>>>>>               /* Do TLBs need flushing for safety before next page use? */
>>>>> -            bool_t need_tlbflush;
>>>>> +            unsigned long need_tlbflush:1;
>>>>> +
>>>>> +            /*
>>>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>>>> +             * One more than maximum possible order (MAX_ORDER+1) to
>>>> Why +1 here and hence ...
>>> Don't we have MAX_ORDER+1 orders?
>> So here there might be a simple misunderstanding: I understand the
>> parenthesized MAX_ORDER+1 to represent "maximum possible
>> order", i.e. excluding the "one more than", not the least because of
>> the ...
>>
>>>> +             * accommodate INVALID_DIRTY_IDX.
>>>> +             */
>>>> +#define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
>>>> +            unsigned long first_dirty:MAX_ORDER + 2;
>> +2 here.
>>
>>>> ... why +2 instead of +1? And isn't the expression INVALID_DIRTY_IDX wrongly
>>>> parenthesized (apart from lacking blanks around the shift operator)? I'd
>>>> expect you want a value with MAX_ORDER+1 set bits, i.e.
>>>> (1UL << (MAX_ORDER + 1)) - 1. ANDing with -1UL seems quite pointless too.
>>> Yes to parentheses and AND. Should be (1UL << (MAX_ORDER + 2)) - 1
>> I.e. I would still expect it to be (1UL << (MAX_ORDER + 1)) - 1
>> here.
>
>
>Sorry, I still don't get it.
>
>Say, MAX_ORDER is 1. Since this implies that indexes 0, 1, 2 and 3 are
>all valid (because we can have up to 2^(MAX_ORDER+1) pages), don't we
>need 3 bits to indicate an invalid index?

Why 0, 1, 2, and 3? When MAX_ORDER is 1, we only have a single bit, i.e.
valid values 0 and 1 (plus one more for the invalid indicator), i.e. need 2 bits
for representation of all used values.
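
Spelled out as a worked example (EX_* names are placeholders; MAX_ORDER = 1
is chosen purely for illustration, assuming the field becomes MAX_ORDER + 1
bits wide as suggested):

/*
 * With MAX_ORDER = 1:
 *   largest buddy       = 1UL << MAX_ORDER             = 2 pages
 *   valid first_dirty   = 0 .. (1UL << MAX_ORDER) - 1  = 0, 1
 *   INVALID_DIRTY_IDX   = (1UL << (MAX_ORDER + 1)) - 1 = 3
 *   bits needed         = MAX_ORDER + 1                = 2
 */
#define EX_MAX_ORDER 1
#define EX_INVALID_DIRTY_IDX ((1UL << (EX_MAX_ORDER + 1)) - 1)

struct ex_free {
    unsigned long first_dirty:EX_MAX_ORDER + 1;   /* holds 0, 1 or 3 */
};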

Jan



* Re: [PATCH v5 4/8] mm: Scrub memory from idle loop
  2017-07-31 16:15         ` Boris Ostrovsky
@ 2017-08-02  9:27           ` Jan Beulich
  0 siblings, 0 replies; 35+ messages in thread
From: Jan Beulich @ 2017-08-02  9:27 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/31/17 6:16 PM >>>
>On 07/31/2017 11:20 AM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:14 AM >>>
>>>>> @@ -1050,17 +1120,42 @@ static void scrub_free_pages(unsigned int node)
>>>>> -                if ( node_need_scrub[node] == 0 )
>>>>> -                    return;
>>>>> +                if ( preempt || (node_need_scrub[node] == 0) )
>>>>> +                    goto out;
>>>>>               }
>>>>>           } while ( order-- != 0 );
>>>>>       }
>>>>> +
>>>>> + out:
>>>>> +    spin_unlock(&heap_lock);
>>>>> +    node_clear(node, node_scrubbing);
>>>>> +    return softirq_pending(cpu) || (node_to_scrub(false) != NUMA_NO_NODE);
>>>> While I can see why you use it here, the softirq_pending() looks sort of
>>>> misplaced: While invoking it twice in the caller will look a little odd too,
>>>> I still think that's where the check belongs.
>>>
>>> scrub_free_pages is called from idle loop as
>>>
>>> else if ( !softirq_pending(cpu) && !scrub_free_pages() )
>>> pm_idle();
>>>
>>> so softirq_pending() is unnecessary here.
>>>
>>> (Not sure why you are saying it would be invoked twice)
>> That was sort of implicit - the caller would want to become
>>
>>
>>     else if ( !softirq_pending(cpu) && !scrub_free_pages() && !softirq_pending(cpu) )
>>     pm_idle();
>>
>> to account for the fact that a softirq may become pending while scrubbing.
>
>That would look really odd IMO.
>
>Would
>
>else if ( !softirq_pending(cpu) )
>if ( !scrub_free_pages() && !softirq_pending(cpu) )
>pm_idle();
>
>or 
>
>else if ( !softirq_pending(cpu) && !scrub_free_pages() )
>if ( !softirq_pending(cpu) )
>pm_idle();
>
>
>be better? (I'd prefer the first)

I dislike both (as we always ask people to fold such chained if()s), and hence
would prefer my variant plus a suitable comment explaining the oddity.

Jan






* Re: [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-02  9:24           ` Jan Beulich
@ 2017-08-02 15:31             ` Boris Ostrovsky
  0 siblings, 0 replies; 35+ messages in thread
From: Boris Ostrovsky @ 2017-08-02 15:31 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 08/02/2017 05:24 AM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/31/17 6:03 PM >>>
> On 07/31/2017 10:45 AM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 07/23/17 4:01 AM >>>
>>> On 06/27/2017 01:06 PM, Jan Beulich wrote:
>>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 06/22/17 8:55 PM >>>
>>>>>> --- a/xen/include/asm-x86/mm.h
>>>>>> +++ b/xen/include/asm-x86/mm.h
>>>>>> @@ -88,7 +88,15 @@ struct page_info
>>>>>>           /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>>>>>>           struct {
>>>>>>               /* Do TLBs need flushing for safety before next page use? */
>>>>>> -            bool_t need_tlbflush;
>>>>>> +            unsigned long need_tlbflush:1;
>>>>>> +
>>>>>> +            /*
>>>>>> +             * Index of the first *possibly* unscrubbed page in the buddy.
>>>>>> +             * One more than maximum possible order (MAX_ORDER+1) to
>>>>> Why +1 here and hence ...
>>>> Don't we have MAX_ORDER+1 orders?
>>> So here there might be a simple misunderstanding: I understand the
>>> parenthesized MAX_ORDER+1 to represent "maximum possible
>>> order", i.e. excluding the "one more than", not the least because of
>>> the ...
>>>
>>>>> +             * accommodate INVALID_DIRTY_IDX.
>>>>> +             */
>>>>> +#define INVALID_DIRTY_IDX (-1UL & (((1UL<<MAX_ORDER) + 2) - 1))
>>>>> +            unsigned long first_dirty:MAX_ORDER + 2;
>>> +2 here.
>>>
>>>>> ... why +2 instead of +1? And isn't the expression INVALID_DIRTY_IDX wrongly
>>>>> parenthesized (apart from lacking blanks around the shift operator)? I'd
>>>>> expect you want a value with MAX_ORDER+1 set bits, i.e.
>>>>> (1UL << (MAX_ORDER + 1)) - 1. ANDing with -1UL seems quite pointless too.
>>>> Yes to parentheses and AND. Should be (1UL << (MAX_ORDER + 2)) - 1
>>> I.e. I would still expect it to be (1UL << (MAX_ORDER + 1)) - 1
>>> here.
>>
>> Sorry, I still don't get it.
>>
>> Say, MAX_ORDER is 1. Since this implies that indexes 0, 1, 2 and 3 are
>> all valid (because we can have up to 2^(MAX_ORDER+1) pages), don't we
>> need 3 bits to indicate an invalid index?
> Why 0, 1, 2, and 3? 

Of course, it's 0 and 1 only. MAX_ORDER+1 completely threw me off.

-boris

> When MAX_ORDER is 1, we only have a single bit, i.e.
> valid values 0 and 1 (plus one more for the invalid indicator), i.e. need 2 bits
> for representation of all used values.
>
> Jan
>



Thread overview: 35+ messages
2017-06-22 18:57 [PATCH v5 0/8] Memory scrubbing from idle loop Boris Ostrovsky
2017-06-22 18:57 ` [PATCH v5 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
2017-06-27 17:06   ` Jan Beulich
2017-07-23  2:00     ` Boris Ostrovsky
2017-07-31 14:45       ` Jan Beulich
2017-07-31 16:03         ` Boris Ostrovsky
2017-08-02  9:24           ` Jan Beulich
2017-08-02 15:31             ` Boris Ostrovsky
2017-06-22 18:57 ` [PATCH v5 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
2017-06-27 17:59   ` Jan Beulich
2017-06-22 18:57 ` [PATCH v5 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
2017-06-27 18:00   ` Jan Beulich
2017-07-23  2:07     ` Boris Ostrovsky
2017-07-31 15:16       ` Jan Beulich
2017-07-31 16:07         ` Boris Ostrovsky
2017-06-22 18:57 ` [PATCH v5 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
2017-06-23  8:36   ` Dario Faggioli
2017-06-27 18:01   ` Jan Beulich
2017-07-23  2:14     ` Boris Ostrovsky
2017-07-31 15:20       ` Jan Beulich
2017-07-31 16:15         ` Boris Ostrovsky
2017-08-02  9:27           ` Jan Beulich
2017-06-22 18:57 ` [PATCH v5 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
2017-06-22 18:57 ` [PATCH v5 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
2017-06-27 19:28   ` Jan Beulich
2017-06-27 19:31     ` Jan Beulich
2017-07-23  2:28     ` Boris Ostrovsky
2017-08-02  8:34       ` Jan Beulich
2017-06-22 18:57 ` [PATCH v5 7/8] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
2017-06-22 18:57 ` [PATCH v5 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
2017-06-27 19:29   ` Jan Beulich
2017-06-23  9:36 ` [PATCH v5 0/8] Memory scrubbing from idle loop Jan Beulich
2017-06-23 13:11   ` Boris Ostrovsky
2017-06-23 13:22     ` Jan Beulich
2017-06-23 13:29       ` Boris Ostrovsky
