* [PATCH v6 0/8] Memory scrubbing from idle loop
@ 2017-08-04 17:05 Boris Ostrovsky
  2017-08-04 17:05 ` [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
                   ` (7 more replies)
  0 siblings, 8 replies; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-04 17:05 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Changes in v6:
* Changed first_dirty tracking from pointer-based to index-based (patch 1)
* Added/modified a few ASSERT()s
* Moved/modified a couple of comments
* Adjusted width of INVALID_DIRTY_IDX

(see per-patch changes)

When a domain is destroyed the hypervisor must scrub the domain's pages before
giving them to another guest in order to prevent leaking the deceased
guest's data. Currently this is done during guest destruction, possibly
resulting in a very lengthy cleanup process.

This series adds support for scrubbing released pages from the idle loop,
making guest destruction significantly faster. For example, destroying a
1TB guest can now be completed in 40+ seconds, as opposed to about 9 minutes
with the existing scrubbing algorithm.

Briefly, the new algorithm places dirty pages at the end of the heap's page list
for each node/zone/order, to avoid having to scan the full list while searching
for dirty pages. One processor from each node checks whether the node has any
dirty pages and, if such pages are found, scrubs them. Scrubbing itself
happens without holding the heap lock, so other users may access the heap in the
meantime. If, while the idle loop is scrubbing a particular chunk of pages, that
chunk is requested by the heap allocator, scrubbing is immediately stopped.
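
In rough pseudo-C, the idle-loop side boils down to the sketch below (function
and field names are the ones introduced later in the series; node selection
details, preemption, accounting and the abort handshake are deliberately left
out, so this is an illustration rather than the actual implementation):

    /*
     * Simplified sketch of the idle-loop scrubbing flow implemented by this
     * series; the real code (patch 4 and later) adds preemption, proper node
     * bookkeeping and the lockless abort protocol.
     */
    bool scrub_free_pages(void)
    {
        unsigned int node = node_to_scrub(true);   /* node with dirty pages */
        unsigned int zone, i;
        struct page_info *pg;

        if ( node == NUMA_NO_NODE )
            return false;                          /* nothing to do */

        spin_lock(&heap_lock);
        for ( zone = 0; zone < NR_ZONES; zone++ )
        {
            unsigned int order = MAX_ORDER;

            do {
                while ( !page_list_empty(&heap(node, zone, order)) )
                {
                    /* Dirty buddies are kept at the tail of each free list. */
                    pg = page_list_last(&heap(node, zone, order));
                    if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
                        break;                     /* rest of the list is clean */

                    for ( i = pg->u.free.first_dirty; i < (1U << order); i++ )
                        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                        {
                            scrub_one_page(&pg[i]);
                            pg[i].count_info &= ~PGC_need_scrub;
                        }

                    /* Requeue the now-clean buddy at the head of the list. */
                    page_list_del(pg, &heap(node, zone, order));
                    page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
                }
            } while ( order-- != 0 );
        }
        spin_unlock(&heap_lock);

        return true; /* the real code reports whether more work remains */
    }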

On the allocation side, alloc_heap_pages() first tries to satisfy the allocation
request using only clean pages. If this is not possible, the search is
repeated and dirty pages are scrubbed by the allocator.
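
In code, the clean-first retry on the allocation path (a sketch of the logic
added in patch 3, with names as used in the series) is essentially:

    /* Sketch of the clean-first retry in alloc_heap_pages() (patch 3). */
    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
    /* Couldn't get a clean buddy -- retry, accepting dirty pages. */
    if ( !pg && !(memflags & MEMF_no_scrub) )
        pg = get_free_buddy(zone_lo, zone_hi, order,
                            memflags | MEMF_no_scrub, d);

    /* ... later, scrub whatever dirty pages ended up in the allocation ... */
    for ( i = 0; i < (1U << order); i++ )
        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
        {
            if ( !(memflags & MEMF_no_scrub) )
                scrub_one_page(&pg[i]);
            node_need_scrub[node]--;
        }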

This series is somewhat based on earlier work by Bob Liu.

V1:
* Only set PGC_need_scrub bit for the buddy head, thus making it unnecessary
  to scan the whole buddy
* Fix spin_lock_cb()
* Scrub CPU-less nodes
* ARM support. Note that I have not been able to test this, only built the
  binary
* Added scrub test patch (last one). Not sure whether it should be considered
  for committing but I have been running with it.

V2:
* merge_chunks() returns new buddy head
* scrub_free_pages() returns softirq pending status in addition to (factored out)
  status of unscrubbed memory
* spin_lock uses inlined spin_lock_cb()
* scrub debugging code checks whole page, not just the first word.

V3:
* Keep dirty bit per page
* Simplify merge_chunks() (now merge_and_free_buddy())
* When scrubbing memory-only nodes, try to find the closest node.

V4:
* Keep track of dirty pages in a buddy with page_info.u.free.first_dirty.
* Drop patch 1 (factoring out merge_and_free_buddy()) since there is only
  one caller now
* Drop patch 5 (from V3) since we are not breaking a partially-scrubbed
  buddy anymore
* Extract search loop in alloc_heap_pages() into get_free_buddy() (patch 2)
* Add MEMF_no_scrub flag

V5:
* Make page_info.u.free a union and use bitfields there.
* Bug fixes


Deferred:
* Per-node heap locks. In addition to (presumably) improving performance in
  general, once they are available we can parallelize scrubbing further by
  allowing more than one core per node to do idle loop scrubbing.
* AVX-based scrubbing
* Use idle loop scrubbing during boot.


Boris Ostrovsky (8):
  mm: Place unscrubbed pages at the end of pagelist
  mm: Extract allocation loop from alloc_heap_pages()
  mm: Scrub pages in alloc_heap_pages() if needed
  mm: Scrub memory from idle loop
  spinlock: Introduce spin_lock_cb()
  mm: Keep heap accessible to others while scrubbing
  mm: Print number of unscrubbed pages in 'H' debug handler
  mm: Make sure pages are scrubbed

 xen/Kconfig.debug          |   7 +
 xen/arch/arm/domain.c      |   8 +-
 xen/arch/x86/domain.c      |   8 +-
 xen/arch/x86/domain_page.c |   6 +-
 xen/common/page_alloc.c    | 616 ++++++++++++++++++++++++++++++++++++++-------
 xen/common/spinlock.c      |   9 +-
 xen/include/asm-arm/mm.h   |  30 ++-
 xen/include/asm-x86/mm.h   |  30 ++-
 xen/include/xen/mm.h       |   5 +-
 xen/include/xen/spinlock.h |   8 +
 10 files changed, 621 insertions(+), 106 deletions(-)

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-04 17:05 [PATCH v6 0/8] Memory scrubbing from idle loop Boris Ostrovsky
@ 2017-08-04 17:05 ` Boris Ostrovsky
  2017-08-06 17:41   ` Jan Beulich
  2017-08-07 10:45   ` Julien Grall
  2017-08-04 17:05 ` [PATCH v6 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-04 17:05 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

. so that it's easy to find pages that need to be scrubbed (those pages are
now marked with the _PGC_need_scrub bit).

We keep track of the first unscrubbed page in a page buddy using the first_dirty
field. For now it can have two values, 0 (whole buddy needs scrubbing) or
INVALID_DIRTY_IDX (the buddy does not need to be scrubbed). Subsequent patches
will allow scrubbing to be interrupted, resulting in first_dirty taking any
value.
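
For illustration, a consumer of this field treats a buddy of size 2^order as
follows (pages before first_dirty are known clean, the rest may still carry
PGC_need_scrub; this is a sketch, not a hunk from the patch):

    /* Illustration only: skip buddies that are known to be clean ... */
    if ( pg->u.free.first_dirty != INVALID_DIRTY_IDX )
        /* ... and only look at the possibly-dirty tail of the buddy. */
        for ( i = pg->u.free.first_dirty; i < (1U << order); i++ )
            if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                scrub_one_page(&pg[i]);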

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v6:
* Track indexes and not pointers in alloc_heap_pages() and reserve_offlined_page()
  when breaking a potentially dirty buddy and merging the fragments.
* Reduced (by one bit) width of INVALID_DIRTY_IDX
* Added a couple of ASSERT()s

 xen/common/page_alloc.c  | 194 ++++++++++++++++++++++++++++++++++++++---------
 xen/include/asm-arm/mm.h |  18 ++++-
 xen/include/asm-x86/mm.h |  17 ++++-
 3 files changed, 193 insertions(+), 36 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 8bcef6a..9e787e0 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -383,6 +383,8 @@ typedef struct page_list_head heap_by_zone_and_order_t[NR_ZONES][MAX_ORDER+1];
 static heap_by_zone_and_order_t *_heap[MAX_NUMNODES];
 #define heap(node, zone, order) ((*_heap[node])[zone][order])
 
+static unsigned long node_need_scrub[MAX_NUMNODES];
+
 static unsigned long *avail[MAX_NUMNODES];
 static long total_avail_pages;
 
@@ -678,13 +680,30 @@ static void check_low_mem_virq(void)
     }
 }
 
+/* Pages that need a scrub are added to tail, otherwise to head. */
+static void page_list_add_scrub(struct page_info *pg, unsigned int node,
+                                unsigned int zone, unsigned int order,
+                                unsigned int first_dirty)
+{
+    PFN_ORDER(pg) = order;
+    pg->u.free.first_dirty = first_dirty;
+
+    if ( first_dirty != INVALID_DIRTY_IDX )
+    {
+        ASSERT(first_dirty < (1U << order));
+        page_list_add_tail(pg, &heap(node, zone, order));
+    }
+    else
+        page_list_add(pg, &heap(node, zone, order));
+}
+
 /* Allocate 2^@order contiguous pages. */
 static struct page_info *alloc_heap_pages(
     unsigned int zone_lo, unsigned int zone_hi,
     unsigned int order, unsigned int memflags,
     struct domain *d)
 {
-    unsigned int i, j, zone = 0, nodemask_retry = 0;
+    unsigned int i, j, zone = 0, nodemask_retry = 0, first_dirty;
     nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
     unsigned long request = 1UL << order;
     struct page_info *pg;
@@ -798,12 +817,26 @@ static struct page_info *alloc_heap_pages(
     return NULL;
 
  found: 
+
+    first_dirty = pg->u.free.first_dirty;
+
     /* We may have to halve the chunk a number of times. */
     while ( j != order )
     {
-        PFN_ORDER(pg) = --j;
-        page_list_add_tail(pg, &heap(node, zone, j));
-        pg += 1 << j;
+        j--;
+        page_list_add_scrub(pg, node, zone, j,
+                            (1U << j) > first_dirty ?
+                            first_dirty : INVALID_DIRTY_IDX);
+        pg += 1U << j;
+
+        if ( first_dirty != INVALID_DIRTY_IDX )
+        {
+            /* Adjust first_dirty */
+            if ( first_dirty >= 1U << j )
+                first_dirty -= 1U << j;
+            else
+                first_dirty = 0; /* We've moved past original first_dirty */
+        }
     }
 
     ASSERT(avail[node][zone] >= request);
@@ -850,12 +883,20 @@ static int reserve_offlined_page(struct page_info *head)
     unsigned int node = phys_to_nid(page_to_maddr(head));
     int zone = page_to_zone(head), i, head_order = PFN_ORDER(head), count = 0;
     struct page_info *cur_head;
-    int cur_order;
+    unsigned int cur_order, first_dirty;
 
     ASSERT(spin_is_locked(&heap_lock));
 
     cur_head = head;
 
+    /*
+     * We may break the buddy so let's mark the head as clean. Then, when
+     * merging chunks back into the heap, we will see whether the chunk has
+     * unscrubbed pages and set its first_dirty properly.
+     */
+    first_dirty = head->u.free.first_dirty;
+    head->u.free.first_dirty = INVALID_DIRTY_IDX;
+
     page_list_del(head, &heap(node, zone, head_order));
 
     while ( cur_head < (head + (1 << head_order)) )
@@ -866,6 +907,8 @@ static int reserve_offlined_page(struct page_info *head)
         if ( page_state_is(cur_head, offlined) )
         {
             cur_head++;
+            if ( first_dirty != INVALID_DIRTY_IDX && first_dirty )
+                first_dirty--;
             continue;
         }
 
@@ -873,6 +916,8 @@ static int reserve_offlined_page(struct page_info *head)
 
         while ( cur_order < head_order )
         {
+            unsigned int idx = INVALID_DIRTY_IDX;
+
             next_order = cur_order + 1;
 
             if ( (cur_head + (1 << next_order)) >= (head + ( 1 << head_order)) )
@@ -892,8 +937,28 @@ static int reserve_offlined_page(struct page_info *head)
             {
             merge:
                 /* We don't consider merging outside the head_order. */
-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
-                PFN_ORDER(cur_head) = cur_order;
+
+                /* See if any of the pages indeed need scrubbing. */
+                if ( first_dirty != INVALID_DIRTY_IDX )
+                {
+                    if ( (1U << cur_order) > first_dirty )
+                    {
+                        for ( i = first_dirty ; i < (1U << cur_order); i++ )
+                            if ( test_bit(_PGC_need_scrub,
+                                          &cur_head[i].count_info) )
+                            {
+                                idx = i;
+                                break;
+                            }
+                        ASSERT(idx != INVALID_DIRTY_IDX);
+                        first_dirty = 0;
+                    }
+                    else
+                        first_dirty -= 1U << cur_order;
+                }
+
+                page_list_add_scrub(cur_head, node, zone,
+                                    cur_order, idx);
                 cur_head += (1 << cur_order);
                 break;
             }
@@ -919,9 +984,53 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
+static void scrub_free_pages(unsigned int node)
+{
+    struct page_info *pg;
+    unsigned int zone;
+
+    ASSERT(spin_is_locked(&heap_lock));
+
+    if ( !node_need_scrub[node] )
+        return;
+
+    for ( zone = 0; zone < NR_ZONES; zone++ )
+    {
+        unsigned int order = MAX_ORDER;
+
+        do {
+            while ( !page_list_empty(&heap(node, zone, order)) )
+            {
+                unsigned int i;
+
+                /* Unscrubbed pages are always at the end of the list. */
+                pg = page_list_last(&heap(node, zone, order));
+                if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                    break;
+
+                for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
+                {
+                    if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+                    {
+                        scrub_one_page(&pg[i]);
+                        pg[i].count_info &= ~PGC_need_scrub;
+                        node_need_scrub[node]--;
+                    }
+                }
+
+                page_list_del(pg, &heap(node, zone, order));
+                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+
+                if ( node_need_scrub[node] == 0 )
+                    return;
+            }
+        } while ( order-- != 0 );
+    }
+}
+
 /* Free 2^@order set of pages. */
 static void free_heap_pages(
-    struct page_info *pg, unsigned int order)
+    struct page_info *pg, unsigned int order, bool need_scrub)
 {
     unsigned long mask, mfn = page_to_mfn(pg);
     unsigned int i, node = phys_to_nid(page_to_maddr(pg)), tainted = 0;
@@ -961,10 +1070,20 @@ static void free_heap_pages(
         /* This page is not a guest frame any more. */
         page_set_owner(&pg[i], NULL); /* set_gpfn_from_mfn snoops pg owner */
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
+
+        if ( need_scrub )
+            pg[i].count_info |= PGC_need_scrub;
     }
 
     avail[node][zone] += 1 << order;
     total_avail_pages += 1 << order;
+    if ( need_scrub )
+    {
+        node_need_scrub[node] += 1 << order;
+        pg->u.free.first_dirty = 0;
+    }
+    else
+        pg->u.free.first_dirty = INVALID_DIRTY_IDX;
 
     if ( tmem_enabled() )
         midsize_alloc_zone_pages = max(
@@ -977,35 +1096,49 @@ static void free_heap_pages(
 
         if ( (page_to_mfn(pg) & mask) )
         {
+            struct page_info *predecessor = pg - mask;
+
             /* Merge with predecessor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg-mask))) ||
-                 !page_state_is(pg-mask, free) ||
-                 (PFN_ORDER(pg-mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg-mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(predecessor))) ||
+                 !page_state_is(predecessor, free) ||
+                 (PFN_ORDER(predecessor) != order) ||
+                 (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
-            pg -= mask;
-            page_list_del(pg, &heap(node, zone, order));
+
+            page_list_del(predecessor, &heap(node, zone, order));
+
+            /* Keep predecessor's first_dirty if it is already set. */
+            if ( predecessor->u.free.first_dirty == INVALID_DIRTY_IDX &&
+                 pg->u.free.first_dirty != INVALID_DIRTY_IDX )
+                predecessor->u.free.first_dirty = (1U << order) +
+                                                  pg->u.free.first_dirty;
+
+            pg = predecessor;
         }
         else
         {
+            struct page_info *successor = pg + mask;
+
             /* Merge with successor block? */
-            if ( !mfn_valid(_mfn(page_to_mfn(pg+mask))) ||
-                 !page_state_is(pg+mask, free) ||
-                 (PFN_ORDER(pg+mask) != order) ||
-                 (phys_to_nid(page_to_maddr(pg+mask)) != node) )
+            if ( !mfn_valid(_mfn(page_to_mfn(successor))) ||
+                 !page_state_is(successor, free) ||
+                 (PFN_ORDER(successor) != order) ||
+                 (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
-            page_list_del(pg + mask, &heap(node, zone, order));
+            page_list_del(successor, &heap(node, zone, order));
         }
 
         order++;
     }
 
-    PFN_ORDER(pg) = order;
-    page_list_add_tail(pg, &heap(node, zone, order));
+    page_list_add_scrub(pg, node, zone, order, pg->u.free.first_dirty);
 
     if ( tainted )
         reserve_offlined_page(pg);
 
+    if ( need_scrub )
+        scrub_free_pages(node);
+
     spin_unlock(&heap_lock);
 }
 
@@ -1226,7 +1359,7 @@ unsigned int online_page(unsigned long mfn, uint32_t *status)
     spin_unlock(&heap_lock);
 
     if ( (y & PGC_state) == PGC_state_offlined )
-        free_heap_pages(pg, 0);
+        free_heap_pages(pg, 0, false);
 
     return ret;
 }
@@ -1295,7 +1428,7 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-        free_heap_pages(pg+i, 0);
+        free_heap_pages(pg + i, 0, false);
     }
 }
 
@@ -1622,7 +1755,7 @@ void free_xenheap_pages(void *v, unsigned int order)
 
     memguard_guard_range(v, 1 << (order + PAGE_SHIFT));
 
-    free_heap_pages(virt_to_page(v), order);
+    free_heap_pages(virt_to_page(v), order, false);
 }
 
 #else
@@ -1676,12 +1809,9 @@ void free_xenheap_pages(void *v, unsigned int order)
     pg = virt_to_page(v);
 
     for ( i = 0; i < (1u << order); i++ )
-    {
-        scrub_one_page(&pg[i]);
         pg[i].count_info &= ~PGC_xen_heap;
-    }
 
-    free_heap_pages(pg, order);
+    free_heap_pages(pg, order, true);
 }
 
 #endif
@@ -1790,7 +1920,7 @@ struct page_info *alloc_domheap_pages(
     if ( d && !(memflags & MEMF_no_owner) &&
          assign_pages(d, pg, order, memflags) )
     {
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, false);
         return NULL;
     }
     
@@ -1858,11 +1988,7 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
             scrub = 1;
         }
 
-        if ( unlikely(scrub) )
-            for ( i = 0; i < (1 << order); i++ )
-                scrub_one_page(&pg[i]);
-
-        free_heap_pages(pg, order);
+        free_heap_pages(pg, order, scrub);
     }
 
     if ( drop_dom_ref )
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index ef84b72..d26b232 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -44,7 +44,16 @@ struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            unsigned long need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more than maximum possible order to accommodate
+             * INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
+            unsigned long first_dirty:MAX_ORDER + 1;
+
         } free;
 
     } u;
@@ -107,6 +116,13 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 extern mfn_t xenheap_mfn_start, xenheap_mfn_end;
 extern vaddr_t xenheap_virt_end;
 #ifdef CONFIG_ARM_64
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 2bf3f33..9b7fd05 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -88,7 +88,15 @@ struct page_info
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
         struct {
             /* Do TLBs need flushing for safety before next page use? */
-            bool_t need_tlbflush;
+            unsigned long need_tlbflush:1;
+
+            /*
+             * Index of the first *possibly* unscrubbed page in the buddy.
+             * One more than maximum possible order to accommodate
+             * INVALID_DIRTY_IDX.
+             */
+#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
+            unsigned long first_dirty:MAX_ORDER + 1;
         } free;
 
     } u;
@@ -233,6 +241,13 @@ struct page_info
 #define PGC_count_width   PG_shift(9)
 #define PGC_count_mask    ((1UL<<PGC_count_width)-1)
 
+/*
+ * Page needs to be scrubbed. Since this bit can only be set on a page that is
+ * free (i.e. in PGC_state_free) we can reuse PGC_allocated bit.
+ */
+#define _PGC_need_scrub   _PGC_allocated
+#define PGC_need_scrub    PGC_allocated
+
 #define is_xen_heap_page(page) ((page)->count_info & PGC_xen_heap)
 #define is_xen_heap_mfn(mfn) \
     (__mfn_valid(mfn) && is_xen_heap_page(__mfn_to_page(mfn)))
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v6 2/8] mm: Extract allocation loop from alloc_heap_pages()
  2017-08-04 17:05 [PATCH v6 0/8] Memory scrubbing from idle loop Boris Ostrovsky
  2017-08-04 17:05 ` [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-08-04 17:05 ` Boris Ostrovsky
  2017-08-06 17:42   ` Jan Beulich
  2017-08-04 17:05 ` [PATCH v6 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-04 17:05 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

This will make the code a bit more readable, especially with the changes that
will be introduced in subsequent patches.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v6:
* Rebased due to changes in the first patch (thus dropped Jan's ACK)

 xen/common/page_alloc.c | 139 +++++++++++++++++++++++++++---------------------
 1 file changed, 77 insertions(+), 62 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9e787e0..6d7422d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -697,22 +697,15 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
-/* Allocate 2^@order contiguous pages. */
-static struct page_info *alloc_heap_pages(
-    unsigned int zone_lo, unsigned int zone_hi,
-    unsigned int order, unsigned int memflags,
-    struct domain *d)
+static struct page_info *get_free_buddy(unsigned int zone_lo,
+                                        unsigned int zone_hi,
+                                        unsigned int order, unsigned int memflags,
+                                        const struct domain *d)
 {
-    unsigned int i, j, zone = 0, nodemask_retry = 0, first_dirty;
     nodeid_t first_node, node = MEMF_get_node(memflags), req_node = node;
-    unsigned long request = 1UL << order;
+    nodemask_t nodemask = d ? d->node_affinity : node_online_map;
+    unsigned int j, zone, nodemask_retry = 0;
     struct page_info *pg;
-    nodemask_t nodemask = (d != NULL ) ? d->node_affinity : node_online_map;
-    bool_t need_tlbflush = 0;
-    uint32_t tlbflush_timestamp = 0;
-
-    /* Make sure there are enough bits in memflags for nodeID. */
-    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
 
     if ( node == NUMA_NO_NODE )
     {
@@ -728,34 +721,6 @@ static struct page_info *alloc_heap_pages(
     first_node = node;
 
     ASSERT(node < MAX_NUMNODES);
-    ASSERT(zone_lo <= zone_hi);
-    ASSERT(zone_hi < NR_ZONES);
-
-    if ( unlikely(order > MAX_ORDER) )
-        return NULL;
-
-    spin_lock(&heap_lock);
-
-    /*
-     * Claimed memory is considered unavailable unless the request
-     * is made by a domain with sufficient unclaimed pages.
-     */
-    if ( (outstanding_claims + request >
-          total_avail_pages + tmem_freeable_pages()) &&
-          ((memflags & MEMF_no_refcount) ||
-           !d || d->outstanding_pages < request) )
-        goto not_found;
-
-    /*
-     * TMEM: When available memory is scarce due to tmem absorbing it, allow
-     * only mid-size allocations to avoid worst of fragmentation issues.
-     * Others try tmem pools then fail.  This is a workaround until all
-     * post-dom0-creation-multi-page allocations can be eliminated.
-     */
-    if ( ((order == 0) || (order >= 9)) &&
-         (total_avail_pages <= midsize_alloc_zone_pages) &&
-         tmem_freeable_pages() )
-        goto try_tmem;
 
     /*
      * Start with requested node, but exhaust all node memory in requested 
@@ -767,17 +732,17 @@ static struct page_info *alloc_heap_pages(
         zone = zone_hi;
         do {
             /* Check if target node can support the allocation. */
-            if ( !avail[node] || (avail[node][zone] < request) )
+            if ( !avail[node] || (avail[node][zone] < (1UL << order)) )
                 continue;
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    goto found;
+                    return pg;
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
-            goto not_found;
+            return NULL;
 
         /* Pick next node. */
         if ( !node_isset(node, nodemask) )
@@ -794,46 +759,96 @@ static struct page_info *alloc_heap_pages(
         {
             /* When we have tried all in nodemask, we fall back to others. */
             if ( (memflags & MEMF_exact_node) || nodemask_retry++ )
-                goto not_found;
+                return NULL;
             nodes_andnot(nodemask, node_online_map, nodemask);
             first_node = node = first_node(nodemask);
             if ( node >= MAX_NUMNODES )
-                goto not_found;
+                return NULL;
         }
     }
+}
+
+/* Allocate 2^@order contiguous pages. */
+static struct page_info *alloc_heap_pages(
+    unsigned int zone_lo, unsigned int zone_hi,
+    unsigned int order, unsigned int memflags,
+    struct domain *d)
+{
+    nodeid_t node;
+    unsigned int i, buddy_order, zone, first_dirty;
+    unsigned long request = 1UL << order;
+    struct page_info *pg;
+    bool need_tlbflush = false;
+    uint32_t tlbflush_timestamp = 0;
+
+    /* Make sure there are enough bits in memflags for nodeID. */
+    BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
+
+    ASSERT(zone_lo <= zone_hi);
+    ASSERT(zone_hi < NR_ZONES);
+
+    if ( unlikely(order > MAX_ORDER) )
+        return NULL;
+
+    spin_lock(&heap_lock);
+
+    /*
+     * Claimed memory is considered unavailable unless the request
+     * is made by a domain with sufficient unclaimed pages.
+     */
+    if ( (outstanding_claims + request >
+          total_avail_pages + tmem_freeable_pages()) &&
+          ((memflags & MEMF_no_refcount) ||
+           !d || d->outstanding_pages < request) )
+    {
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
 
- try_tmem:
-    /* Try to free memory from tmem */
-    if ( (pg = tmem_relinquish_pages(order, memflags)) != NULL )
+    /*
+     * TMEM: When available memory is scarce due to tmem absorbing it, allow
+     * only mid-size allocations to avoid worst of fragmentation issues.
+     * Others try tmem pools then fail.  This is a workaround until all
+     * post-dom0-creation-multi-page allocations can be eliminated.
+     */
+    if ( ((order == 0) || (order >= 9)) &&
+         (total_avail_pages <= midsize_alloc_zone_pages) &&
+         tmem_freeable_pages() )
     {
-        /* reassigning an already allocated anonymous heap page */
+        /* Try to free memory from tmem. */
+        pg = tmem_relinquish_pages(order, memflags);
         spin_unlock(&heap_lock);
         return pg;
     }
 
- not_found:
-    /* No suitable memory blocks. Fail the request. */
-    spin_unlock(&heap_lock);
-    return NULL;
+    pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+    if ( !pg )
+    {
+        /* No suitable memory blocks. Fail the request. */
+        spin_unlock(&heap_lock);
+        return NULL;
+    }
 
- found: 
+    node = phys_to_nid(page_to_maddr(pg));
+    zone = page_to_zone(pg);
+    buddy_order = PFN_ORDER(pg);
 
     first_dirty = pg->u.free.first_dirty;
 
     /* We may have to halve the chunk a number of times. */
-    while ( j != order )
+    while ( buddy_order != order )
     {
-        j--;
-        page_list_add_scrub(pg, node, zone, j,
-                            (1U << j) > first_dirty ?
+        buddy_order--;
+        page_list_add_scrub(pg, node, zone, buddy_order,
+                            (1U << buddy_order) > first_dirty ?
                             first_dirty : INVALID_DIRTY_IDX);
-        pg += 1U << j;
+        pg += 1U << buddy_order;
 
         if ( first_dirty != INVALID_DIRTY_IDX )
         {
             /* Adjust first_dirty */
-            if ( first_dirty >= 1U << j )
-                first_dirty -= 1U << j;
+            if ( first_dirty >= 1U << buddy_order )
+                first_dirty -= 1U << buddy_order;
             else
                 first_dirty = 0; /* We've moved past original first_dirty */
         }
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v6 3/8] mm: Scrub pages in alloc_heap_pages() if needed
  2017-08-04 17:05 [PATCH v6 0/8] Memory scrubbing from idle loop Boris Ostrovsky
  2017-08-04 17:05 ` [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
  2017-08-04 17:05 ` [PATCH v6 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
@ 2017-08-04 17:05 ` Boris Ostrovsky
  2017-08-04 17:05 ` [PATCH v6 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-04 17:05 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

When allocating pages in alloc_heap_pages() first look for clean pages. If none
are found then retry, taking pages marked as unscrubbed and scrubbing them.

Note that we shouldn't find unscrubbed pages in alloc_heap_pages() yet. However,
this will become possible when we stop scrubbing from free_heap_pages() and
instead do it from the idle loop.

Since not all allocations require clean pages (such as xenheap allocations),
introduce a MEMF_no_scrub flag that callers can set if they are willing to
consume unscrubbed pages.
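
As a purely hypothetical example of the caller side (not part of this patch),
an allocation whose contents are overwritten right away could opt in to
unscrubbed memory:

    /*
     * Hypothetical caller, for illustration only: the page is cleared
     * immediately, so consuming an unscrubbed page is safe and saves the
     * allocator a scrub.
     */
    pg = alloc_domheap_pages(NULL, 0, MEMF_no_scrub);
    if ( pg )
        clear_domain_page(_mfn(page_to_mfn(pg)));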

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes in v6:
* Dropped unnecessary need_scrub.

 xen/common/page_alloc.c | 33 +++++++++++++++++++++++++++++----
 xen/include/xen/mm.h    |  4 +++-
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 6d7422d..eedff2d 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -706,6 +706,7 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
     nodemask_t nodemask = d ? d->node_affinity : node_online_map;
     unsigned int j, zone, nodemask_retry = 0;
     struct page_info *pg;
+    bool use_unscrubbed = (memflags & MEMF_no_scrub);
 
     if ( node == NUMA_NO_NODE )
     {
@@ -737,8 +738,20 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
 
             /* Find smallest order which can satisfy the request. */
             for ( j = order; j <= MAX_ORDER; j++ )
+            {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
-                    return pg;
+                {
+                    /*
+                     * We grab single pages (order=0) even if they are
+                     * unscrubbed. Given that scrubbing one page is fairly quick
+                     * it is not worth breaking higher orders.
+                     */
+                    if ( (order == 0) || use_unscrubbed ||
+                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                        return pg;
+                    page_list_add_tail(pg, &heap(node, zone, j));
+                }
+            }
         } while ( zone-- > zone_lo ); /* careful: unsigned zone may wrap */
 
         if ( (memflags & MEMF_exact_node) && req_node != NUMA_NO_NODE )
@@ -822,6 +835,10 @@ static struct page_info *alloc_heap_pages(
     }
 
     pg = get_free_buddy(zone_lo, zone_hi, order, memflags, d);
+    /* Try getting a dirty buddy if we couldn't get a clean one. */
+    if ( !pg && !(memflags & MEMF_no_scrub) )
+        pg = get_free_buddy(zone_lo, zone_hi, order,
+                            memflags | MEMF_no_scrub, d);
     if ( !pg )
     {
         /* No suitable memory blocks. Fail the request. */
@@ -867,7 +884,15 @@ static struct page_info *alloc_heap_pages(
     for ( i = 0; i < (1 << order); i++ )
     {
         /* Reference count must continuously be zero for free pages. */
-        BUG_ON(pg[i].count_info != PGC_state_free);
+        BUG_ON((pg[i].count_info & ~PGC_need_scrub) != PGC_state_free);
+
+        if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
+        {
+            if ( !(memflags & MEMF_no_scrub) )
+                scrub_one_page(&pg[i]);
+            node_need_scrub[node]--;
+        }
+
         pg[i].count_info = PGC_state_inuse;
 
         if ( !(memflags & MEMF_no_tlbflush) )
@@ -1751,7 +1776,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     ASSERT(!in_irq());
 
     pg = alloc_heap_pages(MEMZONE_XEN, MEMZONE_XEN,
-                          order, memflags, NULL);
+                          order, memflags | MEMF_no_scrub, NULL);
     if ( unlikely(pg == NULL) )
         return NULL;
 
@@ -1801,7 +1826,7 @@ void *alloc_xenheap_pages(unsigned int order, unsigned int memflags)
     if ( !(memflags >> _MEMF_bits) )
         memflags |= MEMF_bits(xenheap_bits);
 
-    pg = alloc_domheap_pages(NULL, order, memflags);
+    pg = alloc_domheap_pages(NULL, order, memflags | MEMF_no_scrub);
     if ( unlikely(pg == NULL) )
         return NULL;
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index 503b92e..e1f9c42 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -248,7 +248,9 @@ struct npfec {
 #define  MEMF_no_tlbflush (1U<<_MEMF_no_tlbflush)
 #define _MEMF_no_icache_flush 7
 #define  MEMF_no_icache_flush (1U<<_MEMF_no_icache_flush)
-#define _MEMF_node        8
+#define _MEMF_no_scrub    8
+#define  MEMF_no_scrub    (1U<<_MEMF_no_scrub)
+#define _MEMF_node        16
 #define  MEMF_node_mask   ((1U << (8 * sizeof(nodeid_t))) - 1)
 #define  MEMF_node(n)     ((((n) + 1) & MEMF_node_mask) << _MEMF_node)
 #define  MEMF_get_node(f) ((((f) >> _MEMF_node) - 1) & MEMF_node_mask)
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v6 4/8] mm: Scrub memory from idle loop
  2017-08-04 17:05 [PATCH v6 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2017-08-04 17:05 ` [PATCH v6 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
@ 2017-08-04 17:05 ` Boris Ostrovsky
  2017-08-07  7:29   ` Jan Beulich
  2017-08-07 14:05   ` Dario Faggioli
  2017-08-04 17:05 ` [PATCH v6 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-04 17:05 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	Dario Faggioli, ian.jackson, tim, jbeulich, Boris Ostrovsky

Instead of scrubbing pages during guest destruction (from
free_heap_pages()) do this opportunistically, from the idle loop.

We might come to scrub_free_pages() from the idle loop while another CPU
uses the mapcache override, resulting in a fault while trying to do
__map_domain_page() in scrub_one_page(). To avoid this, make the mapcache
vcpu override a per-CPU variable.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
CC: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes in v6:
* Moved final softirq_pending() test from scrub_free_pages() to idle loop.


 xen/arch/arm/domain.c      |   8 ++-
 xen/arch/x86/domain.c      |   8 ++-
 xen/arch/x86/domain_page.c |   6 +--
 xen/common/page_alloc.c    | 119 ++++++++++++++++++++++++++++++++++++++++-----
 xen/include/xen/mm.h       |   1 +
 5 files changed, 124 insertions(+), 18 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2dc8b0a..d7961bb 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -51,7 +51,13 @@ void idle_loop(void)
         /* Are we here for running vcpu context tasklets, or for idling? */
         if ( unlikely(tasklet_work_to_do(cpu)) )
             do_tasklet();
-        else
+        /*
+         * Test softirqs twice --- first to see if should even try scrubbing
+         * and then, after it is done, whether softirqs became pending
+         * while we were scrubbing.
+         */
+        else if ( !softirq_pending(cpu) && !scrub_free_pages() &&
+                    !softirq_pending(cpu) )
         {
             local_irq_disable();
             if ( cpu_is_haltable(cpu) )
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index baaf815..9b4b959 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -122,7 +122,13 @@ static void idle_loop(void)
         /* Are we here for running vcpu context tasklets, or for idling? */
         if ( unlikely(tasklet_work_to_do(cpu)) )
             do_tasklet();
-        else
+        /*
+         * Test softirqs twice --- first to see if should even try scrubbing
+         * and then, after it is done, whether softirqs became pending
+         * while we were scrubbing.
+         */
+        else if ( !softirq_pending(cpu) && !scrub_free_pages()  &&
+                    !softirq_pending(cpu) )
             pm_idle();
         do_softirq();
         /*
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index 71baede..0783c1e 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -18,12 +18,12 @@
 #include <asm/hardirq.h>
 #include <asm/setup.h>
 
-static struct vcpu *__read_mostly override;
+static DEFINE_PER_CPU(struct vcpu *, override);
 
 static inline struct vcpu *mapcache_current_vcpu(void)
 {
     /* In the common case we use the mapcache of the running VCPU. */
-    struct vcpu *v = override ?: current;
+    struct vcpu *v = this_cpu(override) ?: current;
 
     /*
      * When current isn't properly set up yet, this is equivalent to
@@ -59,7 +59,7 @@ static inline struct vcpu *mapcache_current_vcpu(void)
 
 void __init mapcache_override_current(struct vcpu *v)
 {
-    override = v;
+    this_cpu(override) = v;
 }
 
 #define mapcache_l2_entry(e) ((e) >> PAGETABLE_ORDER)
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index eedff2d..3f04f16 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1024,15 +1024,86 @@ static int reserve_offlined_page(struct page_info *head)
     return count;
 }
 
-static void scrub_free_pages(unsigned int node)
+static nodemask_t node_scrubbing;
+
+/*
+ * If get_node is true this will return closest node that needs to be scrubbed,
+ * with appropriate bit in node_scrubbing set.
+ * If get_node is not set, this will return *a* node that needs to be scrubbed.
+ * node_scrubbing bitmask will no be updated.
+ * If no node needs scrubbing then NUMA_NO_NODE is returned.
+ */
+static unsigned int node_to_scrub(bool get_node)
 {
-    struct page_info *pg;
-    unsigned int zone;
+    nodeid_t node = cpu_to_node(smp_processor_id()), local_node;
+    nodeid_t closest = NUMA_NO_NODE;
+    u8 dist, shortest = 0xff;
 
-    ASSERT(spin_is_locked(&heap_lock));
+    if ( node == NUMA_NO_NODE )
+        node = 0;
 
-    if ( !node_need_scrub[node] )
-        return;
+    if ( node_need_scrub[node] &&
+         (!get_node || !node_test_and_set(node, node_scrubbing)) )
+        return node;
+
+    /*
+     * See if there are memory-only nodes that need scrubbing and choose
+     * the closest one.
+     */
+    local_node = node;
+    for ( ; ; )
+    {
+        do {
+            node = cycle_node(node, node_online_map);
+        } while ( !cpumask_empty(&node_to_cpumask(node)) &&
+                  (node != local_node) );
+
+        if ( node == local_node )
+            break;
+
+        if ( node_need_scrub[node] )
+        {
+            if ( !get_node )
+                return node;
+
+            dist = __node_distance(local_node, node);
+
+            /*
+             * Grab the node right away. If we find a closer node later we will
+             * release this one. While there is a chance that another CPU will
+             * not be able to scrub that node when it is searching for scrub work
+             * at the same time it will be able to do so next time it wakes up.
+             * The alternative would be to perform this search under a lock but
+             * then we'd need to take this lock every time we come in here.
+             */
+            if ( (dist < shortest || closest == NUMA_NO_NODE) &&
+                 !node_test_and_set(node, node_scrubbing) )
+            {
+                if ( closest != NUMA_NO_NODE )
+                    node_clear(closest, node_scrubbing);
+                shortest = dist;
+                closest = node;
+            }
+        }
+    }
+
+    return closest;
+}
+
+bool scrub_free_pages(void)
+{
+    struct page_info *pg;
+    unsigned int zone;
+    unsigned int cpu = smp_processor_id();
+    bool preempt = false;
+    nodeid_t node;
+    unsigned int cnt = 0;
+  
+    node = node_to_scrub(true);
+    if ( node == NUMA_NO_NODE )
+        return false;
+ 
+    spin_lock(&heap_lock);
 
     for ( zone = 0; zone < NR_ZONES; zone++ )
     {
@@ -1055,17 +1126,42 @@ static void scrub_free_pages(unsigned int node)
                         scrub_one_page(&pg[i]);
                         pg[i].count_info &= ~PGC_need_scrub;
                         node_need_scrub[node]--;
+                        cnt += 100; /* scrubbed pages add heavier weight. */
+                    }
+                    else
+                        cnt++;
+
+                    /*
+                     * Scrub a few (8) pages before becoming eligible for
+                     * preemption. But also count non-scrubbing loop iterations
+                     * so that we don't get stuck here with an almost clean
+                     * heap.
+                     */
+                    if ( cnt > 800 && softirq_pending(cpu) )
+                    {
+                        preempt = true;
+                        break;
                     }
                 }
 
-                page_list_del(pg, &heap(node, zone, order));
-                page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+                if ( i >= (1U << order) - 1 )
+                {
+                    page_list_del(pg, &heap(node, zone, order));
+                    page_list_add_scrub(pg, node, zone, order, INVALID_DIRTY_IDX);
+                }
+                else
+                    pg->u.free.first_dirty = i + 1;
 
-                if ( node_need_scrub[node] == 0 )
-                    return;
+                if ( preempt || (node_need_scrub[node] == 0) )
+                    goto out;
             }
         } while ( order-- != 0 );
     }
+
+ out:
+    spin_unlock(&heap_lock);
+    node_clear(node, node_scrubbing);
+    return node_to_scrub(false) != NUMA_NO_NODE;
 }
 
 /* Free 2^@order set of pages. */
@@ -1176,9 +1272,6 @@ static void free_heap_pages(
     if ( tainted )
         reserve_offlined_page(pg);
 
-    if ( need_scrub )
-        scrub_free_pages(node);
-
     spin_unlock(&heap_lock);
 }
 
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index e1f9c42..ddc3fb3 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -160,6 +160,7 @@ void init_xenheap_pages(paddr_t ps, paddr_t pe);
 void xenheap_max_mfn(unsigned long mfn);
 void *alloc_xenheap_pages(unsigned int order, unsigned int memflags);
 void free_xenheap_pages(void *v, unsigned int order);
+bool scrub_free_pages(void);
 #define alloc_xenheap_page() (alloc_xenheap_pages(0,0))
 #define free_xenheap_page(v) (free_xenheap_pages(v,0))
 /* Map machine page range in Xen virtual address space. */
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v6 5/8] spinlock: Introduce spin_lock_cb()
  2017-08-04 17:05 [PATCH v6 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2017-08-04 17:05 ` [PATCH v6 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
@ 2017-08-04 17:05 ` Boris Ostrovsky
  2017-08-07  7:32   ` Jan Beulich
  2017-08-04 17:05 ` [PATCH v6 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-04 17:05 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

While waiting for a lock we may want to periodically run some
code. This code may, for example, allow the caller to release
resources held by it that are no longer needed in the critical
section protected by the lock.

Specifically, this feature will be needed by the scrubbing code where
the scrubber, while waiting for the heap lock in order to merge back clean
pages, may be asked by the page allocator (which is currently
holding the lock) to abort merging and release the buddy page head
that the allocator wants.

We could use spin_trylock(), but since it doesn't take a lock ticket it
may take a long time until the lock is acquired. Instead we add
spin_lock_cb(), which allows us to grab the ticket and execute a
callback while waiting. This callback is executed on every iteration
of the spinlock waiting loop.

Since we may be sleeping in the lock until it is released we need a
mechanism that will make sure that the callback has a chance to run.
We add spin_lock_kick() that will wake up the waiter.
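
A minimal usage sketch, with a hypothetical lock, flag and callback (patch 6
later uses this for scrubbing via scrub_continue()/scrub_wait_state; the names
below are illustrative only):

    /* Hypothetical flag, set by the current lock holder, which then calls
     * spin_lock_kick() so the waiter re-evaluates its callback. */
    static bool release_requested;

    static void release_cb(void *data)
    {
        bool *released = data;

        /* Runs on every iteration of the ticket-wait loop. */
        if ( read_atomic(&release_requested) && !*released )
        {
            /* Drop whatever resource the lock holder is waiting for. */
            *released = true;
        }
    }

    ...
        bool released = false;

        spin_lock_cb(&the_lock, release_cb, &released);
        /* critical section */
        spin_unlock(&the_lock);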

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 xen/common/spinlock.c      | 9 ++++++++-
 xen/include/xen/spinlock.h | 8 ++++++++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/xen/common/spinlock.c b/xen/common/spinlock.c
index 2a06406..3c1caae 100644
--- a/xen/common/spinlock.c
+++ b/xen/common/spinlock.c
@@ -129,7 +129,7 @@ static always_inline u16 observe_head(spinlock_tickets_t *t)
     return read_atomic(&t->head);
 }
 
-void _spin_lock(spinlock_t *lock)
+void inline _spin_lock_cb(spinlock_t *lock, void (*cb)(void *), void *data)
 {
     spinlock_tickets_t tickets = SPINLOCK_TICKET_INC;
     LOCK_PROFILE_VAR;
@@ -140,6 +140,8 @@ void _spin_lock(spinlock_t *lock)
     while ( tickets.tail != observe_head(&lock->tickets) )
     {
         LOCK_PROFILE_BLOCK;
+        if ( unlikely(cb) )
+            cb(data);
         arch_lock_relax();
     }
     LOCK_PROFILE_GOT;
@@ -147,6 +149,11 @@ void _spin_lock(spinlock_t *lock)
     arch_lock_acquire_barrier();
 }
 
+void _spin_lock(spinlock_t *lock)
+{
+     _spin_lock_cb(lock, NULL, NULL);
+}
+
 void _spin_lock_irq(spinlock_t *lock)
 {
     ASSERT(local_irq_is_enabled());
diff --git a/xen/include/xen/spinlock.h b/xen/include/xen/spinlock.h
index c1883bd..91bfb95 100644
--- a/xen/include/xen/spinlock.h
+++ b/xen/include/xen/spinlock.h
@@ -153,6 +153,7 @@ typedef struct spinlock {
 #define spin_lock_init(l) (*(l) = (spinlock_t)SPIN_LOCK_UNLOCKED)
 
 void _spin_lock(spinlock_t *lock);
+void _spin_lock_cb(spinlock_t *lock, void (*cond)(void *), void *data);
 void _spin_lock_irq(spinlock_t *lock);
 unsigned long _spin_lock_irqsave(spinlock_t *lock);
 
@@ -169,6 +170,7 @@ void _spin_lock_recursive(spinlock_t *lock);
 void _spin_unlock_recursive(spinlock_t *lock);
 
 #define spin_lock(l)                  _spin_lock(l)
+#define spin_lock_cb(l, c, d)         _spin_lock_cb(l, c, d)
 #define spin_lock_irq(l)              _spin_lock_irq(l)
 #define spin_lock_irqsave(l, f)                                 \
     ({                                                          \
@@ -190,6 +192,12 @@ void _spin_unlock_recursive(spinlock_t *lock);
     1 : ({ local_irq_restore(flags); 0; });     \
 })
 
+#define spin_lock_kick(l)                       \
+({                                              \
+    smp_mb();                                   \
+    arch_lock_signal();                         \
+})
+
 /* Ensure a lock is quiescent between two critical operations. */
 #define spin_barrier(l)               _spin_barrier(l)
 
-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH v6 6/8] mm: Keep heap accessible to others while scrubbing
  2017-08-04 17:05 [PATCH v6 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (4 preceding siblings ...)
  2017-08-04 17:05 ` [PATCH v6 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
@ 2017-08-04 17:05 ` Boris Ostrovsky
  2017-08-07  7:50   ` Jan Beulich
  2017-08-04 17:05 ` [PATCH v6 7/8] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
  2017-08-04 17:05 ` [PATCH v6 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
  7 siblings, 1 reply; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-04 17:05 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Instead of scrubbing pages while holding the heap lock we can mark the
buddy's head as being scrubbed and drop the lock temporarily.
If someone (most likely alloc_heap_pages()) tries to access
this chunk it will signal the scrubber to abort the scrub by setting the
head's scrub_state to BUDDY_SCRUB_ABORT. The scrubber checks this state after
processing each page and stops its work as soon as it sees it.
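
In short, the two sides of the handshake look roughly as follows (simplified
from the hunks below; the real code snapshots the whole u.free.val word with
ACCESS_ONCE() because a bitfield itself can't be ACCESS_ONCE()'d):

    /* Allocator side (heap lock held): ask the scrubber to release the buddy. */
    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
    {
        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
        spin_lock_kick();                  /* wake a waiter in spin_lock_cb() */
        while ( head->u.free.scrub_state == BUDDY_SCRUB_ABORT )
            cpu_relax();                   /* scrubber acknowledges shortly */
    }

    /* Scrubber side (no heap lock held): checked after every page. */
    if ( pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
    {
        pg->u.free.first_dirty = i + 1;    /* record progress (INVALID_DIRTY_IDX
                                            * if the buddy is now fully clean) */
        smp_wmb();
        pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;  /* hand the buddy back */
        /* ... then bail out (goto out_nolock in the patch). */
    }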

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v6:
* Made ASSERT explicitly test for  BUDDY_NOT_SCRUBBING.
* Added a comment explaining why we set buddy's first_dirty in the lock's callback.

 xen/common/page_alloc.c  | 111 +++++++++++++++++++++++++++++++++++++++++++++--
 xen/include/asm-arm/mm.h |  26 +++++++----
 xen/include/asm-x86/mm.h |  27 ++++++++----
 3 files changed, 142 insertions(+), 22 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 3f04f16..e0bbc27 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -687,6 +687,7 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
 {
     PFN_ORDER(pg) = order;
     pg->u.free.first_dirty = first_dirty;
+    pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
 
     if ( first_dirty != INVALID_DIRTY_IDX )
     {
@@ -697,6 +698,25 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+static void check_and_stop_scrub(struct page_info *head)
+{
+    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
+    {
+        struct page_info pg;
+
+        head->u.free.scrub_state = BUDDY_SCRUB_ABORT;
+        spin_lock_kick();
+        for ( ; ; )
+        {
+            /* Can't ACCESS_ONCE() a bitfield. */
+            pg.u.free.val = ACCESS_ONCE(head->u.free.val);
+            if ( pg.u.free.scrub_state != BUDDY_SCRUB_ABORT )
+                break;
+            cpu_relax();
+        }
+    }
+}
+
 static struct page_info *get_free_buddy(unsigned int zone_lo,
                                         unsigned int zone_hi,
                                         unsigned int order, unsigned int memflags,
@@ -741,14 +761,19 @@ static struct page_info *get_free_buddy(unsigned int zone_lo,
             {
                 if ( (pg = page_list_remove_head(&heap(node, zone, j))) )
                 {
+                    if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
+                        return pg;
                     /*
                      * We grab single pages (order=0) even if they are
                      * unscrubbed. Given that scrubbing one page is fairly quick
                      * it is not worth breaking higher orders.
                      */
-                    if ( (order == 0) || use_unscrubbed ||
-                         pg->u.free.first_dirty == INVALID_DIRTY_IDX)
+                    if ( (order == 0) || use_unscrubbed )
+                    {
+                        check_and_stop_scrub(pg);
                         return pg;
+                    }
+
                     page_list_add_tail(pg, &heap(node, zone, j));
                 }
             }
@@ -929,6 +954,7 @@ static int reserve_offlined_page(struct page_info *head)
 
     cur_head = head;
 
+    check_and_stop_scrub(head);
     /*
      * We may break the buddy so let's mark the head as clean. Then, when
      * merging chunks back into the heap, we will see whether the chunk has
@@ -1090,6 +1116,29 @@ static unsigned int node_to_scrub(bool get_node)
     return closest;
 }
 
+struct scrub_wait_state {
+    struct page_info *pg;
+    unsigned int first_dirty;
+    bool drop;
+};
+
+static void scrub_continue(void *data)
+{
+    struct scrub_wait_state *st = data;
+
+    if ( st->drop )
+        return;
+
+    if ( st->pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
+    {
+        /* There is a waiter for this buddy. Release it. */
+        st->drop = true;
+        st->pg->u.free.first_dirty = st->first_dirty;
+        smp_wmb();
+        st->pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+    }
+}
+
 bool scrub_free_pages(void)
 {
     struct page_info *pg;
@@ -1112,25 +1161,53 @@ bool scrub_free_pages(void)
         do {
             while ( !page_list_empty(&heap(node, zone, order)) )
             {
-                unsigned int i;
+                unsigned int i, dirty_cnt;
+                struct scrub_wait_state st;
 
                 /* Unscrubbed pages are always at the end of the list. */
                 pg = page_list_last(&heap(node, zone, order));
                 if ( pg->u.free.first_dirty == INVALID_DIRTY_IDX )
                     break;
 
+                ASSERT(pg->u.free.scrub_state == BUDDY_NOT_SCRUBBING);
+                pg->u.free.scrub_state = BUDDY_SCRUBBING;
+
+                spin_unlock(&heap_lock);
+
+                dirty_cnt = 0;
+
                 for ( i = pg->u.free.first_dirty; i < (1U << order); i++)
                 {
                     if ( test_bit(_PGC_need_scrub, &pg[i].count_info) )
                     {
                         scrub_one_page(&pg[i]);
+                        /*
+                         * We can modify count_info without holding heap
+                         * lock since we effectively locked this buddy by
+                         * setting its scrub_state.
+                         */
                         pg[i].count_info &= ~PGC_need_scrub;
-                        node_need_scrub[node]--;
+                        dirty_cnt++;
                         cnt += 100; /* scrubbed pages add heavier weight. */
                     }
                     else
                         cnt++;
 
+                    if ( pg->u.free.scrub_state == BUDDY_SCRUB_ABORT )
+                    {
+                        /* Someone wants this chunk. Drop everything. */
+
+                        pg->u.free.first_dirty = (i == (1U << order) - 1) ?
+                            INVALID_DIRTY_IDX : i + 1; 
+                        smp_wmb();
+                        pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+
+                        spin_lock(&heap_lock);
+                        node_need_scrub[node] -= dirty_cnt;
+                        spin_unlock(&heap_lock);
+                        goto out_nolock;
+                    }
+
                     /*
                      * Scrub a few (8) pages before becoming eligible for
                      * preemption. But also count non-scrubbing loop iterations
@@ -1144,6 +1221,23 @@ bool scrub_free_pages(void)
                     }
                 }
 
+                st.pg = pg;
+                /*
+                 * get_free_buddy() grabs a buddy with first_dirty set to
+                 * INVALID_DIRTY_IDX so we can't set pg's first_dirty here.
+                 * It will be set either below or in the lock callback (in
+                 * scrub_continue()).
+                 */
+                st.first_dirty = (i >= (1UL << order) - 1) ?
+                    INVALID_DIRTY_IDX : i + 1;
+                st.drop = false;
+                spin_lock_cb(&heap_lock, scrub_continue, &st);
+
+                node_need_scrub[node] -= dirty_cnt;
+
+                if ( st.drop )
+                    goto out;
+
                 if ( i >= (1U << order) - 1 )
                 {
                     page_list_del(pg, &heap(node, zone, order));
@@ -1152,6 +1246,8 @@ bool scrub_free_pages(void)
                 else
                     pg->u.free.first_dirty = i + 1;
 
+                pg->u.free.scrub_state = BUDDY_NOT_SCRUBBING;
+
                 if ( preempt || (node_need_scrub[node] == 0) )
                     goto out;
             }
@@ -1160,6 +1256,8 @@ bool scrub_free_pages(void)
 
  out:
     spin_unlock(&heap_lock);
+
+ out_nolock:
     node_clear(node, node_scrubbing);
     return node_to_scrub(false) != NUMA_NO_NODE;
 }
@@ -1241,6 +1339,8 @@ static void free_heap_pages(
                  (phys_to_nid(page_to_maddr(predecessor)) != node) )
                 break;
 
+            check_and_stop_scrub(predecessor);
+
             page_list_del(predecessor, &heap(node, zone, order));
 
             /* Keep predecessor's first_dirty if it is already set. */
@@ -1261,6 +1361,9 @@ static void free_heap_pages(
                  (PFN_ORDER(successor) != order) ||
                  (phys_to_nid(page_to_maddr(successor)) != node) )
                 break;
+
+            check_and_stop_scrub(successor);
+
             page_list_del(successor, &heap(node, zone, order));
         }
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index d26b232..e2de0c1 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -42,18 +42,26 @@ struct page_info
             unsigned long type_info;
         } inuse;
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
-        struct {
-            /* Do TLBs need flushing for safety before next page use? */
-            unsigned long need_tlbflush:1;
-
-            /*
-             * Index of the first *possibly* unscrubbed page in the buddy.
-             * One more than maximum possible order to accommodate
-             * INVALID_DIRTY_IDX.
-             */
+        union {
+            struct {
+                /* Do TLBs need flushing for safety before next page use? */
+                unsigned long need_tlbflush:1;
+
+                /*
+                 * Index of the first *possibly* unscrubbed page in the buddy.
+                 * One more than maximum possible order to accommodate
+                 * INVALID_DIRTY_IDX.
+                 */
 #define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
             unsigned long first_dirty:MAX_ORDER + 1;
 
+#define BUDDY_NOT_SCRUBBING    0
+#define BUDDY_SCRUBBING        1
+#define BUDDY_SCRUB_ABORT      2
+		unsigned long scrub_state:2;
+            };
+
+            unsigned long val;
         } free;
 
     } u;
diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
index 9b7fd05..df5eba2 100644
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -86,17 +86,26 @@ struct page_info
         } sh;
 
         /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
-        struct {
-            /* Do TLBs need flushing for safety before next page use? */
-            unsigned long need_tlbflush:1;
-
-            /*
-             * Index of the first *possibly* unscrubbed page in the buddy.
-             * One more than maximum possible order to accommodate
-             * INVALID_DIRTY_IDX.
-             */
+	union {
+	    struct {
+		/* Do TLBs need flushing for safety before next page use? */
+		unsigned long need_tlbflush:1;
+
+		/*
+		 * Index of the first *possibly* unscrubbed page in the buddy.
+		 * One more than maximum possible order to accommodate
+		 * INVALID_DIRTY_IDX.
+		 */
 #define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
             unsigned long first_dirty:MAX_ORDER + 1;
+
+#define BUDDY_NOT_SCRUBBING    0
+#define BUDDY_SCRUBBING        1
+#define BUDDY_SCRUB_ABORT      2
+		unsigned long scrub_state:2;
+	    };
+
+            unsigned long val;
         } free;
 
     } u;
-- 
1.8.3.1
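
To illustrate the locking protocol in the hunks above: the idle-loop scrubber and the allocator hand a buddy back and forth through the scrub_state field. Below is a minimal, standalone C model of that handshake; the state names follow the patch, but the simplified struct, the C11 atomics and the busy-wait stand in for Xen's heap lock and spin_lock_kick(), so treat it as an illustrative assumption rather than the actual implementation.

#include <stdatomic.h>
#include <stdbool.h>

enum { BUDDY_NOT_SCRUBBING, BUDDY_SCRUBBING, BUDDY_SCRUB_ABORT };

struct buddy {
    _Atomic int scrub_state;
    unsigned int first_dirty;
};

/* Idle-loop side: returns false if an allocator asked us to back off. */
static bool scrub_buddy(struct buddy *b, unsigned int pages)
{
    atomic_store(&b->scrub_state, BUDDY_SCRUBBING);

    for ( unsigned int i = b->first_dirty; i < pages; i++ )
    {
        /* scrub_one_page(&pg[i]) would happen here, without the heap lock */

        if ( atomic_load(&b->scrub_state) == BUDDY_SCRUB_ABORT )
        {
            b->first_dirty = i + 1;      /* publish how far we got */
            atomic_store(&b->scrub_state, BUDDY_NOT_SCRUBBING);
            return false;                /* the allocator takes the buddy */
        }
    }

    atomic_store(&b->scrub_state, BUDDY_NOT_SCRUBBING);
    return true;
}

/* Allocator side: roughly what check_and_stop_scrub() does. */
static void stop_scrub(struct buddy *b)
{
    int expected = BUDDY_SCRUBBING;

    if ( atomic_compare_exchange_strong(&b->scrub_state, &expected,
                                        BUDDY_SCRUB_ABORT) )
    {
        /* Wait for the scrubber to notice; Xen kicks the lock instead. */
        while ( atomic_load(&b->scrub_state) != BUDDY_NOT_SCRUBBING )
            ;
    }
}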



* [PATCH v6 7/8] mm: Print number of unscrubbed pages in 'H' debug handler
  2017-08-04 17:05 [PATCH v6 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (5 preceding siblings ...)
  2017-08-04 17:05 ` [PATCH v6 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
@ 2017-08-04 17:05 ` Boris Ostrovsky
  2017-08-04 17:05 ` [PATCH v6 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
  7 siblings, 0 replies; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-04 17:05 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/page_alloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index e0bbc27..7cd736c 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2323,6 +2323,13 @@ static void dump_heap(unsigned char key)
             printk("heap[node=%d][zone=%d] -> %lu pages\n",
                    i, j, avail[i][j]);
     }
+
+    for ( i = 0; i < MAX_NUMNODES; i++ )
+    {
+        if ( !node_need_scrub[i] )
+            continue;
+        printk("Node %d has %lu unscrubbed pages\n", i, node_need_scrub[i]);
+    }
 }
 
 static __init int register_heap_trigger(void)
-- 
1.8.3.1
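
As a usage hint: the additional per-node lines can be triggered from dom0 with "xl debug-keys H" and read back with "xl dmesg", or by pressing 'H' directly on the Xen serial console.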



* [PATCH v6 8/8] mm: Make sure pages are scrubbed
  2017-08-04 17:05 [PATCH v6 0/8] Memory scrubbing from idle loop Boris Ostrovsky
                   ` (6 preceding siblings ...)
  2017-08-04 17:05 ` [PATCH v6 7/8] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
@ 2017-08-04 17:05 ` Boris Ostrovsky
  7 siblings, 0 replies; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-04 17:05 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Add a debug Kconfig option that will make page allocator verify
that pages that were supposed to be scrubbed are, in fact, clean.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
 xen/Kconfig.debug       |  7 ++++++
 xen/common/page_alloc.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/xen/Kconfig.debug b/xen/Kconfig.debug
index 689f297..195d504 100644
--- a/xen/Kconfig.debug
+++ b/xen/Kconfig.debug
@@ -114,6 +114,13 @@ config DEVICE_TREE_DEBUG
 	  logged in the Xen ring buffer.
 	  If unsure, say N here.
 
+config SCRUB_DEBUG
+	bool "Page scrubbing test"
+	default DEBUG
+	---help---
+	  Verify that pages that need to be scrubbed before being allocated to
+	  a guest are indeed scrubbed.
+
 endif # DEBUG || EXPERT
 
 endmenu
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 7cd736c..aac1ff2 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -170,6 +170,10 @@ boolean_param("bootscrub", opt_bootscrub);
 static unsigned long __initdata opt_bootscrub_chunk = MB(128);
 size_param("bootscrub_chunk", opt_bootscrub_chunk);
 
+#ifdef CONFIG_SCRUB_DEBUG
+static bool __read_mostly boot_scrub_done;
+#endif
+
 /*
  * Bit width of the DMA heap -- used to override NUMA-node-first.
  * allocation strategy, which can otherwise exhaust low memory.
@@ -698,6 +702,43 @@ static void page_list_add_scrub(struct page_info *pg, unsigned int node,
         page_list_add(pg, &heap(node, zone, order));
 }
 
+/* SCRUB_PATTERN needs to be a repeating series of bytes. */
+#ifndef NDEBUG
+#define SCRUB_PATTERN        0xc2c2c2c2c2c2c2c2ULL
+#else
+#define SCRUB_PATTERN        0ULL
+#endif
+#define SCRUB_BYTE_PATTERN   (SCRUB_PATTERN & 0xff)
+
+static void poison_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    uint64_t *ptr;
+
+    ptr = map_domain_page(mfn);
+    *ptr = ~SCRUB_PATTERN;
+    unmap_domain_page(ptr);
+#endif
+}
+
+static void check_one_page(struct page_info *pg)
+{
+#ifdef CONFIG_SCRUB_DEBUG
+    mfn_t mfn = _mfn(page_to_mfn(pg));
+    const uint64_t *ptr;
+    unsigned int i;
+
+    if ( !boot_scrub_done )
+        return;
+
+    ptr = map_domain_page(mfn);
+    for ( i = 0; i < PAGE_SIZE / sizeof (*ptr); i++ )
+        ASSERT(ptr[i] == SCRUB_PATTERN);
+    unmap_domain_page(ptr);
+#endif
+}
+
 static void check_and_stop_scrub(struct page_info *head)
 {
     if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
@@ -932,6 +973,9 @@ static struct page_info *alloc_heap_pages(
          * guest can control its own visibility of/through the cache.
          */
         flush_page_to_ram(page_to_mfn(&pg[i]), !(memflags & MEMF_no_icache_flush));
+
+        if ( !(memflags & MEMF_no_scrub) )
+            check_one_page(&pg[i]);
     }
 
     spin_unlock(&heap_lock);
@@ -1306,7 +1350,10 @@ static void free_heap_pages(
         set_gpfn_from_mfn(mfn + i, INVALID_M2P_ENTRY);
 
         if ( need_scrub )
+        {
             pg[i].count_info |= PGC_need_scrub;
+            poison_one_page(&pg[i]);
+        }
     }
 
     avail[node][zone] += 1 << order;
@@ -1664,7 +1711,12 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
+#ifndef CONFIG_SCRUB_DEBUG
         free_heap_pages(pg + i, 0, false);
+#else
+        free_heap_pages(pg + i, 0, boot_scrub_done);
+#endif
+	
     }
 }
 
@@ -1930,6 +1982,10 @@ void __init scrub_heap_pages(void)
 
     printk("done.\n");
 
+#ifdef CONFIG_SCRUB_DEBUG
+    boot_scrub_done = true;
+#endif
+
     /* Now that the heap is initialized, run checks and set bounds
      * for the low mem virq algorithm. */
     setup_low_mem_virq();
@@ -2203,12 +2259,16 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
 
             spin_unlock_recursive(&d->page_alloc_lock);
 
+#ifndef CONFIG_SCRUB_DEBUG
             /*
              * Normally we expect a domain to clear pages before freeing them,
              * if it cares about the secrecy of their contents. However, after
              * a domain has died we assume responsibility for erasure.
              */
             scrub = !!d->is_dying;
+#else
+            scrub = true;
+#endif
         }
         else
         {
@@ -2300,7 +2360,8 @@ void scrub_one_page(struct page_info *pg)
 
 #ifndef NDEBUG
     /* Avoid callers relying on allocations returning zeroed pages. */
-    unmap_domain_page(memset(__map_domain_page(pg), 0xc2, PAGE_SIZE));
+    unmap_domain_page(memset(__map_domain_page(pg),
+                             SCRUB_BYTE_PATTERN, PAGE_SIZE));
 #else
     /* For a production build, clear_page() is the fastest way to scrub. */
     clear_domain_page(_mfn(page_to_mfn(pg)));
-- 
1.8.3.1
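
The invariant this patch enforces can be summarised with a small standalone model (plain C, not Xen code; PAGE_WORDS and the flat array are assumptions used purely for illustration): freeing a dirty page poisons it, scrubbing overwrites it with SCRUB_PATTERN, and allocation asserts that the pattern survived.

#include <assert.h>
#include <stdint.h>

#define SCRUB_PATTERN  0xc2c2c2c2c2c2c2c2ULL
#define PAGE_WORDS     (4096 / sizeof(uint64_t))

static void poison_page(uint64_t *page)        /* free_heap_pages() path */
{
    page[0] = ~SCRUB_PATTERN;   /* one poisoned word is enough to trip the check */
}

static void scrub_page(uint64_t *page)         /* scrub_one_page(), debug build */
{
    for ( unsigned int i = 0; i < PAGE_WORDS; i++ )
        page[i] = SCRUB_PATTERN;
}

static void check_page(const uint64_t *page)   /* alloc_heap_pages() path */
{
    for ( unsigned int i = 0; i < PAGE_WORDS; i++ )
        assert(page[i] == SCRUB_PATTERN);
}

int main(void)
{
    static uint64_t page[PAGE_WORDS];

    poison_page(page);   /* page is freed dirty */
    scrub_page(page);    /* idle loop or allocator scrubs it */
    check_page(page);    /* allocation verifies it was really scrubbed */
    return 0;
}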



* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-04 17:05 ` [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
@ 2017-08-06 17:41   ` Jan Beulich
  2017-08-07 14:12     ` Boris Ostrovsky
  2017-08-07 10:45   ` Julien Grall
  1 sibling, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2017-08-06 17:41 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>@@ -873,6 +916,8 @@ static int reserve_offlined_page(struct page_info *head)
 >
>while ( cur_order < head_order )
>{
>+            unsigned int idx = INVALID_DIRTY_IDX;

Is it correct for the variable to live in this scope, rather than ...

>@@ -892,8 +937,28 @@ static int reserve_offlined_page(struct page_info *head)
>{
>merge:

... in this one? Of course it's less the variable scope itself, but the initial
value at the point here.

>/* We don't consider merging outside the head_order. */
>-                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
>-                PFN_ORDER(cur_head) = cur_order;
>+
>+                /* See if any of the pages indeed need scrubbing. */
>+                if ( first_dirty != INVALID_DIRTY_IDX )
>+                {
>+                    if ( (1U << cur_order) > first_dirty )
>+                    {
>+                        for ( i = first_dirty ; i < (1U << cur_order); i++ )
>+                            if ( test_bit(_PGC_need_scrub,
>+                                          &cur_head[i].count_info) )
>+                            {
>+                                idx = i;
>+                                break;
>+                            }

Why again do you need to look through all the pages here, rather than
simply marking the chunks as needing scrubbing simply based on first_dirty?
It seems to me that I've asked this before, which is a good indication that
such special behavior would better have a comment attached.

>@@ -977,35 +1096,49 @@ static void free_heap_pages(
 >
>if ( (page_to_mfn(pg) & mask) )
>{
>+            struct page_info *predecessor = pg - mask;

For this and ...

>}
>else
>{
>+            struct page_info *successor = pg + mask;

... this, it would certainly help readability of the patch here if the introduction
of the new local variables was broken out in a prereq patch. But yes, I should
have asked for this earlier on, so I'm not going to insist.

>--- a/xen/include/asm-x86/mm.h
>+++ b/xen/include/asm-x86/mm.h
>@@ -88,7 +88,15 @@ struct page_info
>/* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>struct {
>/* Do TLBs need flushing for safety before next page use? */
>-            bool_t need_tlbflush;
>+            unsigned long need_tlbflush:1;
>+
>+            /*
>+             * Index of the first *possibly* unscrubbed page in the buddy.
>+             * One more than maximum possible order to accommodate

"One more bit than ..." (I think I did point out the ambiguity of the wording
here before).

Jan
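
To make the question above concrete, here are the two options for deciding a sub-buddy's dirty index when a buddy is broken up, reduced to standalone helpers (the flat dirty[] array, the helper names and the sentinel value are assumptions, purely for illustration): the patch scans from the hint to find a page that really needs scrubbing, while the alternative simply trusts first_dirty.

#include <stdbool.h>

#define INVALID_DIRTY_IDX (~0U)   /* stand-in for the patch's sentinel value */

/* Option taken by the patch when breaking a buddy: scan from the hint and
 * only mark the chunk dirty if a page really has _PGC_need_scrub set. */
static unsigned int exact_first_dirty(const bool *dirty, unsigned int pages,
                                      unsigned int hint)
{
    for ( unsigned int i = hint; i < pages; i++ )
        if ( dirty[i] )
            return i;

    return INVALID_DIRTY_IDX;
}

/* Option Jan suggests: trust the hint.  Cheaper, at the cost of sometimes
 * treating an already clean tail as still needing a scrub pass. */
static unsigned int hinted_first_dirty(unsigned int pages, unsigned int hint)
{
    return hint < pages ? hint : INVALID_DIRTY_IDX;
}

Further down the thread Jan suggests going with the hint-based variant here as well, matching what the allocation path already does.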



* Re: [PATCH v6 2/8] mm: Extract allocation loop from alloc_heap_pages()
  2017-08-04 17:05 ` [PATCH v6 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
@ 2017-08-06 17:42   ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-08-06 17:42 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:04 PM >>>
>This will make code a bit more readable, especially with changes that
>will be introduced in subsequent patches.
>
>Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>---
>Changes in v6:
>* Rebased due to changes in the first patch (thus dropped Jan's ACK)

Feel free to re-instate.

Jan



* Re: [PATCH v6 4/8] mm: Scrub memory from idle loop
  2017-08-04 17:05 ` [PATCH v6 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
@ 2017-08-07  7:29   ` Jan Beulich
  2017-08-07 14:05   ` Dario Faggioli
  1 sibling, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-08-07  7:29 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	dario.faggioli, ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:04 PM >>>
>Instead of scrubbing pages during guest destruction (from
>free_heap_pages()) do this opportunistically, from the idle loop.
>
>We might come to scrub_free_pages() from idle loop while another CPU
>uses mapcache override, resulting in a fault while trying to do
>__map_domain_page() in scrub_one_page(). To avoid this, make mapcache
>vcpu override a per-cpu variable.
>
>Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
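
For context on the mapcache change mentioned in the quoted commit message, "making the override per-cpu" boils down to a shape like the following sketch. The variable and helper names below are assumptions (the real code lives in x86's domain_page handling and may differ); only DEFINE_PER_CPU/this_cpu and current are the standard Xen primitives.

/* Sketch: with a per-CPU override, one CPU overriding its mapcache vcpu
 * cannot be observed by another CPU that happens to be scrubbing from its
 * idle loop at the same time. */
static DEFINE_PER_CPU(struct vcpu *, mapcache_override_vcpu);

void set_mapcache_override(struct vcpu *v)
{
    this_cpu(mapcache_override_vcpu) = v;
}

static struct vcpu *mapcache_vcpu(void)
{
    struct vcpu *v = this_cpu(mapcache_override_vcpu);

    return v ? v : current;   /* fall back to the currently running vcpu */
}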



* Re: [PATCH v6 5/8] spinlock: Introduce spin_lock_cb()
  2017-08-04 17:05 ` [PATCH v6 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
@ 2017-08-07  7:32   ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-08-07  7:32 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>While waiting for a lock we may want to periodically run some
>code. This code may, for example, allow the caller to release
>resources held by it that are no longer needed in the critical
>section protected by the lock.
>
>Specifically, this feature will be needed by scrubbing code where
>the scrubber, while waiting for heap lock to merge back clean
>pages, may be requested by page allocator (which is currently
>holding the lock) to abort merging and release the buddy page head
>that the allocator wants.
>
>We could use spin_trylock() but since it doesn't take lock ticket
>it may take long time until the lock is taken. Instead we add
>spin_lock_cb() that allows us to grab the ticket and execute a
>callback while waiting. This callback is executed on every iteration
>of the spinlock waiting loop.
>
>Since we may be sleeping in the lock until it is released we need a
>mechanism that will make sure that the callback has a chance to run.
>We add spin_lock_kick() that will wake up the waiter.
>
>Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

As previously expressed I'm not overly happy with this, but I also can't
see any good alternative for the use of it in the following patch(es).
Short of any other REST maintainer having voiced any opinion (iirc),
this is

Acked-by: Jan Beulich <jbeulich@suse.com>

Jan
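
For readers not familiar with ticket locks, the facility being acked here can be pictured with the following standalone sketch: take a ticket right away, then invoke the caller's callback on every iteration of the wait loop. The struct layout and the C11 atomics are assumptions for illustration; this is not the actual Xen spinlock code.

#include <stdatomic.h>
#include <stddef.h>

struct tlock {
    _Atomic unsigned int next;    /* next ticket to hand out */
    _Atomic unsigned int owner;   /* ticket currently being served */
};

/* spin_lock_cb()-style acquire: the callback runs while we wait our turn. */
static void tlock_lock_cb(struct tlock *l, void (*cb)(void *), void *data)
{
    unsigned int ticket = atomic_fetch_add(&l->next, 1);

    while ( atomic_load(&l->owner) != ticket )
        if ( cb != NULL )
            cb(data);   /* e.g. scrub_continue() releasing an aborted buddy */
}

static void tlock_unlock(struct tlock *l)
{
    atomic_fetch_add(&l->owner, 1);   /* serve the next waiter */
}

The point of the commit message's comparison with spin_trylock() is visible here: the ticket is taken up front, so the waiter keeps its place in the queue while the callback runs.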



* Re: [PATCH v6 6/8] mm: Keep heap accessible to others while scrubbing
  2017-08-04 17:05 ` [PATCH v6 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
@ 2017-08-07  7:50   ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-08-07  7:50 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>+static void check_and_stop_scrub(struct page_info *head)
>+{
>+    if ( head->u.free.scrub_state == BUDDY_SCRUBBING )
>+    {
>+        struct page_info pg;

Do you really need a full struct page_info here? I.e. can't this be
typeof(*head->u.free)? (I'm sorry for thinking of this only now.)

>@@ -1144,6 +1221,23 @@ bool scrub_free_pages(void)
>}
>}
 >
>+                st.pg = pg;
>+                /*
>+                 * get_free_buddy() grabs a buddy with first_dirty set to
>+                 * INVALID_DIRTY_IDX so we can't set pg's first_dirty here.
>+                 * It will be set either below or in the lock callback (in
>+                 * scrub_continue()).
>+                 */
>+                st.first_dirty = (i >= (1UL << order) - 1) ?

Everywhere else it is 1U - why 1UL here?

>--- a/xen/include/asm-arm/mm.h
>+++ b/xen/include/asm-arm/mm.h
>@@ -42,18 +42,26 @@ struct page_info
>unsigned long type_info;
>} inuse;
>/* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>-        struct {
>-            /* Do TLBs need flushing for safety before next page use? */
>-            unsigned long need_tlbflush:1;
>-
>-            /*
>-             * Index of the first *possibly* unscrubbed page in the buddy.
>-             * One more than maximum possible order to accommodate
>-             * INVALID_DIRTY_IDX.
>-             */
>+        union {
>+            struct {

Indentation.

>+                /* Do TLBs need flushing for safety before next page use? */
>+                unsigned long need_tlbflush:1;
>+
>+                /*
>+                 * Index of the first *possibly* unscrubbed page in the buddy.
>+                 * One more than maximum possible order to accommodate
>+                 * INVALID_DIRTY_IDX.
>+                 */
>#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>unsigned long first_dirty:MAX_ORDER + 1;

Why is this not being re-indented as well?
 
>+#define BUDDY_NOT_SCRUBBING    0
>+#define BUDDY_SCRUBBING        1
>+#define BUDDY_SCRUB_ABORT      2
>+		unsigned long scrub_state:2;

Indentation. Looks to be even worse in the x86 header (including an
earlier patch adding too much indentation). (Of course all of this with
the caveat that my mail web frontend may have garbled things.)

Jan



* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-04 17:05 ` [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
  2017-08-06 17:41   ` Jan Beulich
@ 2017-08-07 10:45   ` Julien Grall
  2017-08-07 14:46     ` Boris Ostrovsky
  1 sibling, 1 reply; 25+ messages in thread
From: Julien Grall @ 2017-08-07 10:45 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

Hi Boris,

I would have appreciated being CCed as maintainer of the ARM bits...
Please use scripts/get_maintainers.pl in the future.

On 04/08/17 18:05, Boris Ostrovsky wrote:
> . so that it's easy to find pages that need to be scrubbed (those pages are

Pointless .

> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
> index ef84b72..d26b232 100644
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -44,7 +44,16 @@ struct page_info
>          /* Page is on a free list: ((count_info & PGC_count_mask) == 0). */
>          struct {
>              /* Do TLBs need flushing for safety before next page use? */
> -            bool_t need_tlbflush;
> +            unsigned long need_tlbflush:1;

You've turned need_tlbflush from bool to unsigned long : 1. But some of
the users use true/false or may rely on the boolean property. So it
sounds to me like you want to use bool bitfields here (and in the x86 part).

> +
> +            /*
> +             * Index of the first *possibly* unscrubbed page in the buddy.
> +             * One more than maximum possible order to accommodate
> +             * INVALID_DIRTY_IDX.
> +             */
> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
> +            unsigned long first_dirty:MAX_ORDER + 1;

We need to make sure that this union will not be bigger than unsigned
long. Otherwise this will, further down, limit the maximum amount of
memory we support.
So this likely means a BUILD_BUG_ON(....).

Cheers,

-- 
Julien Grall


* Re: [PATCH v6 4/8] mm: Scrub memory from idle loop
  2017-08-04 17:05 ` [PATCH v6 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
  2017-08-07  7:29   ` Jan Beulich
@ 2017-08-07 14:05   ` Dario Faggioli
  1 sibling, 0 replies; 25+ messages in thread
From: Dario Faggioli @ 2017-08-07 14:05 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich



On Fri, 2017-08-04 at 13:05 -0400, Boris Ostrovsky wrote:
> Instead of scrubbing pages during guest destruction (from
> free_heap_pages()) do this opportunistically, from the idle loop.
> 
> We might come to scrub_free_pages() from idle loop while another CPU
> uses mapcache override, resulting in a fault while trying to do
> __map_domain_page() in scrub_one_page(). To avoid this, make mapcache
> vcpu override a per-cpu variable.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-06 17:41   ` Jan Beulich
@ 2017-08-07 14:12     ` Boris Ostrovsky
  2017-08-07 14:37       ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-07 14:12 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 08/06/2017 01:41 PM, Jan Beulich wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>> @@ -873,6 +916,8 @@ static int reserve_offlined_page(struct page_info *head)
>  >
>> while ( cur_order < head_order )
>> {
>> +            unsigned int idx = INVALID_DIRTY_IDX;
> Is it correct for the variable to live in this scope, rather than ...
>
>> @@ -892,8 +937,28 @@ static int reserve_offlined_page(struct page_info *head)
>> {
>> merge:
> ... in this one? Of course it's less the variable scope itself, but the initial
> value at the point here.

I should move it to the inner scope --- I actually *want* to
reinitialize it on each iteration.

>
>> /* We don't consider merging outside the head_order. */
>> -                page_list_add_tail(cur_head, &heap(node, zone, cur_order));
>> -                PFN_ORDER(cur_head) = cur_order;
>> +
>> +                /* See if any of the pages indeed need scrubbing. */
>> +                if ( first_dirty != INVALID_DIRTY_IDX )
>> +                {
>> +                    if ( (1U << cur_order) > first_dirty )
>> +                    {
>> +                        for ( i = first_dirty ; i < (1U << cur_order); i++ )
>> +                            if ( test_bit(_PGC_need_scrub,
>> +                                          &cur_head[i].count_info) )
>> +                            {
>> +                                idx = i;
>> +                                break;
>> +                            }
> Why again do you need to look through all the pages here, rather than
> simply marking the chunks as needing scrubbing simply based on first_dirty?
> It seems to me that I've asked this before, which is a good indication that
> such special behavior would better have a comment attached.

We want to make sure that there are in fact dirty pages in the
(sub-)buddy: first_dirty is only a hint there *may be* some. That's why
I have the word "indeed" in the comment there but it's probably worth
expanding on that.

>
>> @@ -977,35 +1096,49 @@ static void free_heap_pages(
>  >
>> if ( (page_to_mfn(pg) & mask) )
>> {
>> +            struct page_info *predecessor = pg - mask;
> For this and ...
>
>> }
>> else
>> {
>> +            struct page_info *successor = pg + mask;
> ... this, it would certainly help readability of the patch here if the introduction
> of the new local variables was broken out in a prereq patch. But yes, I should
> have asked for this earlier on, so I'm not going to insist.

Since I will be re-spinning this patch I will split this out.

-boris



* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-07 14:12     ` Boris Ostrovsky
@ 2017-08-07 14:37       ` Jan Beulich
  2017-08-07 14:55         ` Boris Ostrovsky
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Beulich @ 2017-08-07 14:37 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/07/17 4:16 PM >>>
>On 08/06/2017 01:41 PM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>>> +                /* See if any of the pages indeed need scrubbing. */
>>> +                if ( first_dirty != INVALID_DIRTY_IDX )
>>> +                {
>>> +                    if ( (1U << cur_order) > first_dirty )
>>> +                    {
>>> +                        for ( i = first_dirty ; i < (1U << cur_order); i++ )
>>> +                            if ( test_bit(_PGC_need_scrub,
>>> +                                          &cur_head[i].count_info) )
>>> +                            {
>>> +                                idx = i;
>>> +                                break;
>>> +                            }
>> Why again do you need to look through all the pages here, rather than
>> simply marking the chunks as needing scrubbing simply based on first_dirty?
>> It seems to me that I've asked this before, which is a good indication that
>> such special behavior would better have a comment attached.
>
>We want to make sure that there are in fact dirty pages in the
>(sub-)buddy: first_dirty is only a hint there *may be* some.

But why is that needed? Iirc you don't go to such length when splitting a
buddy during allocation.

Jan



* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-07 10:45   ` Julien Grall
@ 2017-08-07 14:46     ` Boris Ostrovsky
  2017-08-07 15:23       ` Julien Grall
  0 siblings, 1 reply; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-07 14:46 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

On 08/07/2017 06:45 AM, Julien Grall wrote:
> Hi Boris,
>
> I would have appreciated to be CCed as maintainer of the ARM bits...
> Please use scripts/get_maintainers.pl in the future.

Ugh, sorry about that. (I did test builds for both ARM64 and ARM32, if
this makes my transgression any less serious ;-))

>
> On 04/08/17 18:05, Boris Ostrovsky wrote:
>> . so that it's easy to find pages that need to be scrubbed (those
>> pages are
>
> Pointless .
>
>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>> index ef84b72..d26b232 100644
>> --- a/xen/include/asm-arm/mm.h
>> +++ b/xen/include/asm-arm/mm.h
>> @@ -44,7 +44,16 @@ struct page_info
>>          /* Page is on a free list: ((count_info & PGC_count_mask) ==
>> 0). */
>>          struct {
>>              /* Do TLBs need flushing for safety before next page
>> use? */
>> -            bool_t need_tlbflush;
>> +            unsigned long need_tlbflush:1;
>
> You've turned need_tlbflush from bool to unsigned long : 1. But some
> of the users use true/false or may rely on the boolean property.  So
> it sounds like to me you want to use bool bitfields here (and in the
> x86 part).

This is what we have (with this series applied):

root@ovs104> git grep need_tlbflush .
common/memory.c:    bool need_tlbflush = false;
common/memory.c:                       
accumulate_tlbflush(&need_tlbflush, &page[j],
common/memory.c:    if ( need_tlbflush )
common/page_alloc.c:    bool need_tlbflush = false;
common/page_alloc.c:            accumulate_tlbflush(&need_tlbflush, &pg[i],
common/page_alloc.c:    if ( need_tlbflush )
* common/page_alloc.c:        pg[i].u.free.need_tlbflush =
(page_get_owner(&pg[i]) != NULL);
common/page_alloc.c:        if ( pg[i].u.free.need_tlbflush )
include/asm-arm/mm.h:                unsigned long need_tlbflush:1;
include/asm-x86/mm.h:           unsigned long need_tlbflush:1;
include/xen/mm.h:static inline void accumulate_tlbflush(bool *need_tlbflush,
include/xen/mm.h:    if ( page->u.free.need_tlbflush &&
include/xen/mm.h:         (!*need_tlbflush ||
include/xen/mm.h:        *need_tlbflush = true;
root@ovs104>

The only possibly boolean-style use is '*', and even that I think is
allowed.

Stand-alone need_tlbflush variables above have nothing to do with this
change.


>
>> +
>> +            /*
>> +             * Index of the first *possibly* unscrubbed page in the
>> buddy.
>> +             * One more than maximum possible order to accommodate
>> +             * INVALID_DIRTY_IDX.
>> +             */
>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>> +            unsigned long first_dirty:MAX_ORDER + 1;
>
> We need to make sure that this union will not be bigger than unsigned
> long. Otherwise this will limit lower down the maximum amount of
> memory we support.
> So this likely means a BUILD_BUG_ON(....).


Are you concerned that (MAX_ORDER+1) will be larger than sizeof(unsigned
long)? If yes, the compiler should complain anyway, shouldn't it?


-boris




* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-07 14:37       ` Jan Beulich
@ 2017-08-07 14:55         ` Boris Ostrovsky
  2017-08-07 15:28           ` Jan Beulich
  0 siblings, 1 reply; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-07 14:55 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

On 08/07/2017 10:37 AM, Jan Beulich wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/07/17 4:16 PM >>>
>> On 08/06/2017 01:41 PM, Jan Beulich wrote:
>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>>>> +                /* See if any of the pages indeed need scrubbing. */
>>>> +                if ( first_dirty != INVALID_DIRTY_IDX )
>>>> +                {
>>>> +                    if ( (1U << cur_order) > first_dirty )
>>>> +                    {
>>>> +                        for ( i = first_dirty ; i < (1U << cur_order); i++ )
>>>> +                            if ( test_bit(_PGC_need_scrub,
>>>> +                                          &cur_head[i].count_info) )
>>>> +                            {
>>>> +                                idx = i;
>>>> +                                break;
>>>> +                            }
>>> Why again do you need to look through all the pages here, rather than
>>> simply marking the chunks as needing scrubbing simply based on first_dirty?
>>> It seems to me that I've asked this before, which is a good indication that
>>> such special behavior would better have a comment attached.
>> We want to make sure that there are in fact dirty pages in the
>> (sub-)buddy: first_dirty is only a hint there *may be* some.
> But why is that needed? Iirc you don't go to such length when splitting a
> buddy during allocation.

I felt that allocation is more time-critical and so I decided against
trying to be "neat" as far as tracking dirty pages exactly.

-boris


* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-07 14:46     ` Boris Ostrovsky
@ 2017-08-07 15:23       ` Julien Grall
  2017-08-07 16:57         ` Boris Ostrovsky
  0 siblings, 1 reply; 25+ messages in thread
From: Julien Grall @ 2017-08-07 15:23 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich

Hi,

On 07/08/17 15:46, Boris Ostrovsky wrote:
> On 08/07/2017 06:45 AM, Julien Grall wrote:
>> Hi Boris,
>>
>> I would have appreciated to be CCed as maintainer of the ARM bits...
>> Please use scripts/get_maintainers.pl in the future.
>
> Ugh, sorry about that. (I did test builds for both ARM64 and ARM32, if
> this make my transgression any less serious ;-))
>
>>
>> On 04/08/17 18:05, Boris Ostrovsky wrote:
>>> . so that it's easy to find pages that need to be scrubbed (those
>>> pages are
>>
>> Pointless .
>>
>>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>>> index ef84b72..d26b232 100644
>>> --- a/xen/include/asm-arm/mm.h
>>> +++ b/xen/include/asm-arm/mm.h
>>> @@ -44,7 +44,16 @@ struct page_info
>>>          /* Page is on a free list: ((count_info & PGC_count_mask) ==
>>> 0). */
>>>          struct {
>>>              /* Do TLBs need flushing for safety before next page
>>> use? */
>>> -            bool_t need_tlbflush;
>>> +            unsigned long need_tlbflush:1;
>>
>> You've turned need_tlbflush from bool to unsigned long : 1. But some
>> of the users use true/false or may rely on the boolean property.  So
>> it sounds like to me you want to use bool bitfields here (and in the
>> x86 part).
>
> This is what we have (with this series applied):
>
> root@ovs104> git grep need_tlbflush .
> common/memory.c:    bool need_tlbflush = false;
> common/memory.c:
> accumulate_tlbflush(&need_tlbflush, &page[j],
> common/memory.c:    if ( need_tlbflush )
> common/page_alloc.c:    bool need_tlbflush = false;
> common/page_alloc.c:            accumulate_tlbflush(&need_tlbflush, &pg[i],
> common/page_alloc.c:    if ( need_tlbflush )
> * common/page_alloc.c:        pg[i].u.free.need_tlbflush =
> (page_get_owner(&pg[i]) != NULL);
> common/page_alloc.c:        if ( pg[i].u.free.need_tlbflush )
> include/asm-arm/mm.h:                unsigned long need_tlbflush:1;
> include/asm-x86/mm.h:           unsigned long need_tlbflush:1;
> include/xen/mm.h:static inline void accumulate_tlbflush(bool *need_tlbflush,
> include/xen/mm.h:    if ( page->u.free.need_tlbflush &&
> include/xen/mm.h:         (!*need_tlbflush ||
> include/xen/mm.h:        *need_tlbflush = true;
> root@ovs104>
>
> The only possibly boolean-style use is '*' and event that I think is
> allowed.
>
> Stand-alone need_tlbflush variables above have nothing to do with this
> change.

If you look at it, all of them use bool semantics. Now you mix bool and
int. We are trying to remove that in new code, so please don't
introduce a new one.

>
>
>>
>>> +
>>> +            /*
>>> +             * Index of the first *possibly* unscrubbed page in the
>>> buddy.
>>> +             * One more than maximum possible order to accommodate
>>> +             * INVALID_DIRTY_IDX.
>>> +             */
>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>
>> We need to make sure that this union will not be bigger than unsigned
>> long. Otherwise this will limit lower down the maximum amount of
>> memory we support.
>> So this likely means a BUILD_BUG_ON(....).
>
>
> Are you concerned that (MAX_ORDER+1) will be larger than sizeof(unsigned
> long)? If yes, the compiler should complain anyway, shouldn't it?

I am more concerned about the sizeof of the union u becoming larger than
unsigned long.

first_dirty should not be wider than 63 bits (or 31 bits on a 32-bit
architecture). Otherwise the compiler will likely add padding between
need_tlbflush and first_dirty. This would increase page_info by 4/8
bytes.

The BUILD_BUG_ON(...) would be here to catch such an error.

Cheers,

-- 
Julien Grall
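
A sketch of the compile-time check being asked for (the wrapper function and its placement are assumptions; in practice the statement would go into an existing init path, and BUILD_BUG_ON is the standard Xen macro):

static void __init heap_struct_checks(void)
{
    /* Fail the build if the free-list metadata stops fitting in a single
     * unsigned long, e.g. because first_dirty grew too wide. */
    BUILD_BUG_ON(sizeof(((struct page_info *)0)->u.free) >
                 sizeof(unsigned long));
}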


* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-07 14:55         ` Boris Ostrovsky
@ 2017-08-07 15:28           ` Jan Beulich
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Beulich @ 2017-08-07 15:28 UTC (permalink / raw)
  To: boris.ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel

>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/07/17 4:56 PM >>>
>On 08/07/2017 10:37 AM, Jan Beulich wrote:
>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/07/17 4:16 PM >>>
>>> On 08/06/2017 01:41 PM, Jan Beulich wrote:
>>>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> 08/04/17 7:03 PM >>>
>>>>> +                /* See if any of the pages indeed need scrubbing. */
>>>>> +                if ( first_dirty != INVALID_DIRTY_IDX )
>>>>> +                {
>>>>> +                    if ( (1U << cur_order) > first_dirty )
>>>>> +                    {
>>>>> +                        for ( i = first_dirty ; i < (1U << cur_order); i++ )
>>>>> +                            if ( test_bit(_PGC_need_scrub,
>>>>> +                                          &cur_head[i].count_info) )
>>>>> +                            {
>>>>> +                                idx = i;
>>>>> +                                break;
>>>>> +                            }
>>>> Why again do you need to look through all the pages here, rather than
>>>> simply marking the chunks as needing scrubbing simply based on first_dirty?
>>>> It seems to me that I've asked this before, which is a good indication that
>>>> such special behavior would better have a comment attached.
>>> We want to make sure that there are in fact dirty pages in the
>>> (sub-)buddy: first_dirty is only a hint there *may be* some.
>> But why is that needed? Iirc you don't go to such length when splitting a
>> buddy during allocation.
>
>I felt that allocation is more time-critical and so I decided against
>trying to be "neat" as far as tracking dirty pages exactly.

I'd suggest using the simpler approach here too, if the more involved one isn't
needed for correctness.

Jan



* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-07 15:23       ` Julien Grall
@ 2017-08-07 16:57         ` Boris Ostrovsky
  2017-08-07 17:01           ` Julien Grall
  0 siblings, 1 reply; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-07 16:57 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich


>>>
>>>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>>>> index ef84b72..d26b232 100644
>>>> --- a/xen/include/asm-arm/mm.h
>>>> +++ b/xen/include/asm-arm/mm.h
>>>> @@ -44,7 +44,16 @@ struct page_info
>>>>          /* Page is on a free list: ((count_info & PGC_count_mask) ==
>>>> 0). */
>>>>          struct {
>>>>              /* Do TLBs need flushing for safety before next page
>>>> use? */
>>>> -            bool_t need_tlbflush;
>>>> +            unsigned long need_tlbflush:1;
>>>
>>> You've turned need_tlbflush from bool to unsigned long : 1. But some
>>> of the users use true/false or may rely on the boolean property.  So
>>> it sounds like to me you want to use bool bitfields here (and in the
>>> x86 part).
>>
>> This is what we have (with this series applied):
>>
>> root@ovs104> git grep need_tlbflush .
>> common/memory.c:    bool need_tlbflush = false;
>> common/memory.c:
>> accumulate_tlbflush(&need_tlbflush, &page[j],
>> common/memory.c:    if ( need_tlbflush )
>> common/page_alloc.c:    bool need_tlbflush = false;
>> common/page_alloc.c:            accumulate_tlbflush(&need_tlbflush,
>> &pg[i],
>> common/page_alloc.c:    if ( need_tlbflush )
>> * common/page_alloc.c:        pg[i].u.free.need_tlbflush =
>> (page_get_owner(&pg[i]) != NULL);
>> common/page_alloc.c:        if ( pg[i].u.free.need_tlbflush )
>> include/asm-arm/mm.h:                unsigned long need_tlbflush:1;
>> include/asm-x86/mm.h:           unsigned long need_tlbflush:1;
>> include/xen/mm.h:static inline void accumulate_tlbflush(bool
>> *need_tlbflush,
>> include/xen/mm.h:    if ( page->u.free.need_tlbflush &&
>> include/xen/mm.h:         (!*need_tlbflush ||
>> include/xen/mm.h:        *need_tlbflush = true;
>> root@ovs104>
>>
>> The only possibly boolean-style use is '*' and event that I think is
>> allowed.
>>
>> Stand-alone need_tlbflush variables above have nothing to do with this
>> change.
>
> If you look at it, all of them use bool semantic. Now you mix bool and
> int. We are trying to remove that in the new code, so please don't
> introduce new one.


I am not sure I see how we are mixing the semantics here. <datatype>:1
is really a tightly-packed bool.

Switching to bitfields was, btw, suggested by Jan at some point, so if
the two of you agree on how to proceed I can go either way (but my
preference is to keep it as a single-bit bitfield).


>
>>
>>
>>>
>>>> +
>>>> +            /*
>>>> +             * Index of the first *possibly* unscrubbed page in the
>>>> buddy.
>>>> +             * One more than maximum possible order to accommodate
>>>> +             * INVALID_DIRTY_IDX.
>>>> +             */
>>>> +#define INVALID_DIRTY_IDX ((1UL << (MAX_ORDER + 1)) - 1)
>>>> +            unsigned long first_dirty:MAX_ORDER + 1;
>>>
>>> We need to make sure that this union will not be bigger than unsigned
>>> long. Otherwise this will limit lower down the maximum amount of
>>> memory we support.
>>> So this likely means a BUILD_BUG_ON(....).
>>
>>
>> Are you concerned that (MAX_ORDER+1) will be larger than sizeof(unsigned
>> long)? If yes, the compiler should complain anyway, shouldn't it?
>
> I am more concerned about the sizeof of the union u to be larger than
> unsigned long.
>
> first_dirty should not be greater than 63 bits (or 31 bits for 32-bits
> architecture). Otherwise likely the compiler will add a padding
> between need_tlbflush and first_dirty. This would increase the
> page_info by 4/8 byte.
>
> The BUILD_BUG_ON(...) would be here to catch such error.

Oh, I see. Sure, I'll add it.

-boris



* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-07 16:57         ` Boris Ostrovsky
@ 2017-08-07 17:01           ` Julien Grall
  2017-08-07 17:20             ` Boris Ostrovsky
  0 siblings, 1 reply; 25+ messages in thread
From: Julien Grall @ 2017-08-07 17:01 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich



On 07/08/17 17:57, Boris Ostrovsky wrote:
>
>>>>
>>>>> diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
>>>>> index ef84b72..d26b232 100644
>>>>> --- a/xen/include/asm-arm/mm.h
>>>>> +++ b/xen/include/asm-arm/mm.h
>>>>> @@ -44,7 +44,16 @@ struct page_info
>>>>>          /* Page is on a free list: ((count_info & PGC_count_mask) ==
>>>>> 0). */
>>>>>          struct {
>>>>>              /* Do TLBs need flushing for safety before next page
>>>>> use? */
>>>>> -            bool_t need_tlbflush;
>>>>> +            unsigned long need_tlbflush:1;
>>>>
>>>> You've turned need_tlbflush from bool to unsigned long : 1. But some
>>>> of the users use true/false or may rely on the boolean property.  So
>>>> it sounds like to me you want to use bool bitfields here (and in the
>>>> x86 part).
>>>
>>> This is what we have (with this series applied):
>>>
>>> root@ovs104> git grep need_tlbflush .
>>> common/memory.c:    bool need_tlbflush = false;
>>> common/memory.c:
>>> accumulate_tlbflush(&need_tlbflush, &page[j],
>>> common/memory.c:    if ( need_tlbflush )
>>> common/page_alloc.c:    bool need_tlbflush = false;
>>> common/page_alloc.c:            accumulate_tlbflush(&need_tlbflush,
>>> &pg[i],
>>> common/page_alloc.c:    if ( need_tlbflush )
>>> * common/page_alloc.c:        pg[i].u.free.need_tlbflush =
>>> (page_get_owner(&pg[i]) != NULL);
>>> common/page_alloc.c:        if ( pg[i].u.free.need_tlbflush )
>>> include/asm-arm/mm.h:                unsigned long need_tlbflush:1;
>>> include/asm-x86/mm.h:           unsigned long need_tlbflush:1;
>>> include/xen/mm.h:static inline void accumulate_tlbflush(bool
>>> *need_tlbflush,
>>> include/xen/mm.h:    if ( page->u.free.need_tlbflush &&
>>> include/xen/mm.h:         (!*need_tlbflush ||
>>> include/xen/mm.h:        *need_tlbflush = true;
>>> root@ovs104>
>>>
>>> The only possibly boolean-style use is '*' and event that I think is
>>> allowed.
>>>
>>> Stand-alone need_tlbflush variables above have nothing to do with this
>>> change.
>>
>> If you look at it, all of them use bool semantic. Now you mix bool and
>> int. We are trying to remove that in the new code, so please don't
>> introduce new one.
>
>
> I am not sure I see how we are mixing the semantics here. <datatype>:1
> is really a tightly-packed bool.

It is not a bool. It is an unsigned int. So if I am not mistaken, if you 
assign 0x2, the variable will be 0 unless you do !!(...).

Whilst with bool you would get 1.

>
> Switching to bitfields was, btw, suggested by Jan at some point so if
> the two of you agree on how to proceed I can go either way (but by
> preference is to keep it as a single-bit bitfield).

If you use a single-bit bitfield of bool (i.e. bool need_flush : 1) you
would address both Jan's comment and mine.

Cheers,

-- 
Julien Grall
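
The difference Julien describes is easy to demonstrate in isolation. The following small standalone program (plain C, not Xen code; the struct and names are made up for the demonstration) stores a value with only bit 1 set into both kinds of single-bit bitfield:

#include <stdbool.h>
#include <stdio.h>

struct demo {
    unsigned long as_uint:1;  /* plain 1-bit field: keeps only the low bit */
    bool as_bool:1;           /* bool field: any non-zero value becomes 1 */
};

int main(void)
{
    struct demo d;
    unsigned int v = 0x2;     /* low bit clear, so the two fields disagree */

    d.as_uint = v;
    d.as_bool = v;

    printf("unsigned:1 -> %u, bool:1 -> %u\n",
           (unsigned int)d.as_uint, (unsigned int)d.as_bool);  /* 0 vs 1 */
    return 0;
}

The plain unsigned field keeps only the low bit and reads back 0, while the bool field converts any non-zero value to 1, which is exactly the property the boolean-style users rely on.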


* Re: [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist
  2017-08-07 17:01           ` Julien Grall
@ 2017-08-07 17:20             ` Boris Ostrovsky
  0 siblings, 0 replies; 25+ messages in thread
From: Boris Ostrovsky @ 2017-08-07 17:20 UTC (permalink / raw)
  To: Julien Grall, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich


>>
>> Switching to bitfields was, btw, suggested by Jan at some point so if
>> the two of you agree on how to proceed I can go either way (but by
>> preference is to keep it as a single-bit bitfield).
>
> If you use a single-bit bitfield of bool (i.e bool need_flush : 1) you
> would address both Jan's comment and mine.

Haven't considered this. I thought you meant a fully-sized bool. I'll do
this, thanks.

-boris



end of thread, other threads:[~2017-08-07 17:20 UTC | newest]

Thread overview: 25+ messages
2017-08-04 17:05 [PATCH v6 0/8] Memory scrubbing from idle loop Boris Ostrovsky
2017-08-04 17:05 ` [PATCH v6 1/8] mm: Place unscrubbed pages at the end of pagelist Boris Ostrovsky
2017-08-06 17:41   ` Jan Beulich
2017-08-07 14:12     ` Boris Ostrovsky
2017-08-07 14:37       ` Jan Beulich
2017-08-07 14:55         ` Boris Ostrovsky
2017-08-07 15:28           ` Jan Beulich
2017-08-07 10:45   ` Julien Grall
2017-08-07 14:46     ` Boris Ostrovsky
2017-08-07 15:23       ` Julien Grall
2017-08-07 16:57         ` Boris Ostrovsky
2017-08-07 17:01           ` Julien Grall
2017-08-07 17:20             ` Boris Ostrovsky
2017-08-04 17:05 ` [PATCH v6 2/8] mm: Extract allocation loop from alloc_heap_pages() Boris Ostrovsky
2017-08-06 17:42   ` Jan Beulich
2017-08-04 17:05 ` [PATCH v6 3/8] mm: Scrub pages in alloc_heap_pages() if needed Boris Ostrovsky
2017-08-04 17:05 ` [PATCH v6 4/8] mm: Scrub memory from idle loop Boris Ostrovsky
2017-08-07  7:29   ` Jan Beulich
2017-08-07 14:05   ` Dario Faggioli
2017-08-04 17:05 ` [PATCH v6 5/8] spinlock: Introduce spin_lock_cb() Boris Ostrovsky
2017-08-07  7:32   ` Jan Beulich
2017-08-04 17:05 ` [PATCH v6 6/8] mm: Keep heap accessible to others while scrubbing Boris Ostrovsky
2017-08-07  7:50   ` Jan Beulich
2017-08-04 17:05 ` [PATCH v6 7/8] mm: Print number of unscrubbed pages in 'H' debug handler Boris Ostrovsky
2017-08-04 17:05 ` [PATCH v6 8/8] mm: Make sure pages are scrubbed Boris Ostrovsky
