* [PATCH 0/5] Scrubbing updates
@ 2017-08-29 17:09 Boris Ostrovsky
  2017-08-29 17:09 ` [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled Boris Ostrovsky
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-29 17:09 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

The first patch fixes a long-standing bug where the low memory monitor is
not initialized if boot-time scrubbing is turned off.

Patch 2 is new.

The other three patches are performance and readability optimizations.

Boris Ostrovsky (5):
  mm: Initialize lowmem virq when boot-time scrubbing is disabled
  mm: Change boot_scrub_done definition
  mm: Don't poison a page if scrub_debug is off
  mm: Don't request scrubbing until dom0 is running
  mm: Don't hold heap lock in alloc_heap_pages() longer than necessary

 xen/arch/arm/setup.c    |  3 +--
 xen/arch/x86/setup.c    |  3 +--
 xen/common/page_alloc.c | 42 +++++++++++++++++++++++-------------------
 xen/include/xen/mm.h    |  2 +-
 4 files changed, 26 insertions(+), 24 deletions(-)

-- 
1.8.3.1


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled
  2017-08-29 17:09 [PATCH 0/5] Scrubbing updates Boris Ostrovsky
@ 2017-08-29 17:09 ` Boris Ostrovsky
  2017-08-30  8:40   ` Wei Liu
  2017-08-30  8:50   ` Jan Beulich
  2017-08-29 17:09 ` [PATCH 2/5] mm: Change boot_scrub_done definition Boris Ostrovsky
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-29 17:09 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, Julien Grall, jbeulich, Boris Ostrovsky

scrub_heap_pages() returns early if boot-time scrubbing is
disabled, neglecting to initialize the lowmem VIRQ.

Because setup_low_mem_virq() doesn't logically belong in
scrub_heap_pages(), we move both calls into the newly added
heap_init_late().

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
CC: Julien Grall <julien.grall@arm.com>
---
Changes in v2:
* Added heap_init_late() which calls setup_low_mem_virq() and scrub_heap_pages()

 xen/arch/arm/setup.c    |  3 +--
 xen/arch/x86/setup.c    |  3 +--
 xen/common/page_alloc.c | 11 +++++++----
 xen/include/xen/mm.h    |  2 +-
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 3b34855..92f173b 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -861,8 +861,7 @@ void __init start_xen(unsigned long boot_phys_offset,
     if ( construct_dom0(dom0) != 0)
             panic("Could not set up DOM0 guest OS");
 
-    /* Scrub RAM that is still free and so may go to an unprivileged domain. */
-    scrub_heap_pages();
+    heap_init_late();
 
     init_constructors();
 
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index ec96287..bc466e8 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1662,8 +1662,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         cr4_pv32_mask |= X86_CR4_SMAP;
     }
 
-    /* Scrub RAM that is still free and so may go to an unprivileged domain. */
-    scrub_heap_pages();
+    heap_init_late();
 
     init_trace_bufs();
 
diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 9fa62d2..e1f7cd2 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1839,7 +1839,7 @@ static int __init find_non_smt(unsigned int node, cpumask_t *dest)
  * Scrub all unallocated pages in all heap zones. This function uses all
  * online cpu's to scrub the memory in parallel.
  */
-void __init scrub_heap_pages(void)
+static void __init scrub_heap_pages(void)
 {
     cpumask_t node_cpus, all_worker_cpus;
     unsigned int i, j;
@@ -1970,12 +1970,15 @@ void __init scrub_heap_pages(void)
 #ifdef CONFIG_SCRUB_DEBUG
     boot_scrub_done = true;
 #endif
+}
 
-    /* Now that the heap is initialized, run checks and set bounds
-     * for the low mem virq algorithm. */
+void __init heap_init_late(void)
+{
     setup_low_mem_virq();
-}
 
+    if ( opt_bootscrub )
+        scrub_heap_pages();
+}
 
 
 /*************************
diff --git a/xen/include/xen/mm.h b/xen/include/xen/mm.h
index ddc3fb3..c2f5a08 100644
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -199,7 +199,7 @@ int offline_page(unsigned long mfn, int broken, uint32_t *status);
 int query_page_offline(unsigned long mfn, uint32_t *status);
 unsigned long total_free_pages(void);
 
-void scrub_heap_pages(void);
+void heap_init_late(void);
 
 int assign_pages(
     struct domain *d,
-- 
1.8.3.1



* [PATCH 2/5] mm: Change boot_scrub_done definition
  2017-08-29 17:09 [PATCH 0/5] Scrubbing updates Boris Ostrovsky
  2017-08-29 17:09 ` [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled Boris Ostrovsky
@ 2017-08-29 17:09 ` Boris Ostrovsky
  2017-08-30  8:40   ` Wei Liu
  2017-08-29 17:09 ` [PATCH 3/5] mm: Don't poison a page if scrub_debug is off Boris Ostrovsky
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-29 17:09 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Rename it to the more appropriate scrub_debug and define it as a macro
when !CONFIG_SCRUB_DEBUG. This will allow us to get rid of some
ifdefs (here and in a subsequent patch).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Suggested-by: Jan Beulich <JBeulich@suse.com>
---
New in v2

 xen/common/page_alloc.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index e1f7cd2..a4a193b 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -171,7 +171,9 @@ static unsigned long __initdata opt_bootscrub_chunk = MB(128);
 size_param("bootscrub_chunk", opt_bootscrub_chunk);
 
 #ifdef CONFIG_SCRUB_DEBUG
-static bool __read_mostly boot_scrub_done;
+static bool __read_mostly scrub_debug;
+#else
+#define scrub_debug    false
 #endif
 
 /*
@@ -725,7 +727,7 @@ static void check_one_page(struct page_info *pg)
     const uint64_t *ptr;
     unsigned int i;
 
-    if ( !boot_scrub_done )
+    if ( !scrub_debug )
         return;
 
     ptr = map_domain_page(mfn);
@@ -1696,12 +1698,7 @@ static void init_heap_pages(
             nr_pages -= n;
         }
 
-#ifndef CONFIG_SCRUB_DEBUG
-        free_heap_pages(pg + i, 0, false);
-#else
-        free_heap_pages(pg + i, 0, boot_scrub_done);
-#endif
-	
+        free_heap_pages(pg + i, 0, scrub_debug);
     }
 }
 
@@ -1968,7 +1965,7 @@ static void __init scrub_heap_pages(void)
     printk("done.\n");
 
 #ifdef CONFIG_SCRUB_DEBUG
-    boot_scrub_done = true;
+    scrub_debug = true;
 #endif
 }
 
-- 
1.8.3.1



* [PATCH 3/5] mm: Don't poison a page if scrub_debug is off
  2017-08-29 17:09 [PATCH 0/5] Scrubbing updates Boris Ostrovsky
  2017-08-29 17:09 ` [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled Boris Ostrovsky
  2017-08-29 17:09 ` [PATCH 2/5] mm: Change boot_scrub_done definition Boris Ostrovsky
@ 2017-08-29 17:09 ` Boris Ostrovsky
  2017-08-29 17:09 ` [PATCH 4/5] mm: Don't request scrubbing until dom0 is running Boris Ostrovsky
  2017-08-29 17:09 ` [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary Boris Ostrovsky
  4 siblings, 0 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-29 17:09 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

If scrub_debug is off, we don't check pages in check_one_page(),
so there is no reason to ever poison them.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
Changes in v2:
* s/boot_scrub_done/scrub_debug

 xen/common/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index a4a193b..3db77c5 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -714,6 +714,9 @@ static void poison_one_page(struct page_info *pg)
     mfn_t mfn = _mfn(page_to_mfn(pg));
     uint64_t *ptr;
 
+    if ( !scrub_debug )
+        return;
+
     ptr = map_domain_page(mfn);
     *ptr = ~SCRUB_PATTERN;
     unmap_domain_page(ptr);
-- 
1.8.3.1



* [PATCH 4/5] mm: Don't request scrubbing until dom0 is running
  2017-08-29 17:09 [PATCH 0/5] Scrubbing updates Boris Ostrovsky
                   ` (2 preceding siblings ...)
  2017-08-29 17:09 ` [PATCH 3/5] mm: Don't poison a page if scrub_debug is off Boris Ostrovsky
@ 2017-08-29 17:09 ` Boris Ostrovsky
  2017-08-30  8:42   ` Wei Liu
  2017-08-29 17:09 ` [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary Boris Ostrovsky
  4 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-29 17:09 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

There is no need to scrub pages freed during dom0 construction
since the heap will be scrubbed once dom0 is ready (by scrub_heap_pages()).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
Changes in v2:
* Removed '#ifdef CONFIG_SCRUB_DEBUG'

 xen/common/page_alloc.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 3db77c5..6c08983 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -2247,16 +2247,12 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
 
             spin_unlock_recursive(&d->page_alloc_lock);
 
-#ifndef CONFIG_SCRUB_DEBUG
             /*
              * Normally we expect a domain to clear pages before freeing them,
              * if it cares about the secrecy of their contents. However, after
              * a domain has died we assume responsibility for erasure.
              */
-            scrub = !!d->is_dying;
-#else
-            scrub = true;
-#endif
+            scrub = !!d->is_dying | scrub_debug;
         }
         else
         {
-- 
1.8.3.1



* [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary
  2017-08-29 17:09 [PATCH 0/5] Scrubbing updates Boris Ostrovsky
                   ` (3 preceding siblings ...)
  2017-08-29 17:09 ` [PATCH 4/5] mm: Don't request scrubbing until dom0 is running Boris Ostrovsky
@ 2017-08-29 17:09 ` Boris Ostrovsky
  2017-08-30  8:43   ` Wei Liu
  2017-08-30  9:27   ` Andrew Cooper
  4 siblings, 2 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-29 17:09 UTC (permalink / raw)
  To: xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, tim, jbeulich, Boris Ostrovsky

Once pages are removed from the heap we don't need to hold the heap
lock. It is especially useful to drop it for an unscrubbed buddy since
we will be scrubbing it.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes in v2:
* Moved spin_unlock() after d->last_alloc_node assignment

 xen/common/page_alloc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 6c08983..8df92aa 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -860,6 +860,7 @@ static struct page_info *alloc_heap_pages(
     struct page_info *pg;
     bool need_tlbflush = false;
     uint32_t tlbflush_timestamp = 0;
+    unsigned int dirty_cnt = 0;
 
     /* Make sure there are enough bits in memflags for nodeID. */
     BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
@@ -948,6 +949,8 @@ static struct page_info *alloc_heap_pages(
     if ( d != NULL )
         d->last_alloc_node = node;
 
+    spin_unlock(&heap_lock);
+
     for ( i = 0; i < (1 << order); i++ )
     {
         /* Reference count must continuously be zero for free pages. */
@@ -957,7 +960,7 @@ static struct page_info *alloc_heap_pages(
         {
             if ( !(memflags & MEMF_no_scrub) )
                 scrub_one_page(&pg[i]);
-            node_need_scrub[node]--;
+            dirty_cnt++;
         }
 
         pg[i].count_info = PGC_state_inuse;
@@ -979,6 +982,8 @@ static struct page_info *alloc_heap_pages(
             check_one_page(&pg[i]);
     }
 
+    spin_lock(&heap_lock);
+    node_need_scrub[node] -= dirty_cnt;
     spin_unlock(&heap_lock);
 
     if ( need_tlbflush )
-- 
1.8.3.1



* Re: [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled
  2017-08-29 17:09 ` [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled Boris Ostrovsky
@ 2017-08-30  8:40   ` Wei Liu
  2017-08-30  8:50   ` Jan Beulich
  1 sibling, 0 replies; 22+ messages in thread
From: Wei Liu @ 2017-08-30  8:40 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, Julien Grall, jbeulich

On Tue, Aug 29, 2017 at 01:09:13PM -0400, Boris Ostrovsky wrote:
[...]
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 9fa62d2..e1f7cd2 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1839,7 +1839,7 @@ static int __init find_non_smt(unsigned int node, cpumask_t *dest)
>   * Scrub all unallocated pages in all heap zones. This function uses all
>   * online cpu's to scrub the memory in parallel.
>   */
> -void __init scrub_heap_pages(void)
> +static void __init scrub_heap_pages(void)
>  {

Since you now guard against opt_bootscrub in heap_init_late, you should
remove the check of opt_bootscrub in this function.

With this fixed:

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH 2/5] mm: Change boot_scrub_done definition
  2017-08-29 17:09 ` [PATCH 2/5] mm: Change boot_scrub_done definition Boris Ostrovsky
@ 2017-08-30  8:40   ` Wei Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Wei Liu @ 2017-08-30  8:40 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, jbeulich

On Tue, Aug 29, 2017 at 01:09:14PM -0400, Boris Ostrovsky wrote:
> Rename it to the more appropriate scrub_debug and define as a macro
> for !CONFIG_SCRUB_DEBUG. This will allow us to get rid of some
> ifdefs (here and in the subsequent patch).
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Suggested-by: Jan Beulich <JBeulich@suse.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH 4/5] mm: Don't request scrubbing until dom0 is running
  2017-08-29 17:09 ` [PATCH 4/5] mm: Don't request scrubbing until dom0 is running Boris Ostrovsky
@ 2017-08-30  8:42   ` Wei Liu
  2017-08-30 14:19     ` Boris Ostrovsky
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Liu @ 2017-08-30  8:42 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, jbeulich

On Tue, Aug 29, 2017 at 01:09:16PM -0400, Boris Ostrovsky wrote:
> There is no need to scrub pages freed during dom0 construction
> since heap will be scrubbed once dom0 is ready (by scrub_heap_pages()).
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> ---
> Changes in v2:
> * Removed '#ifdef CONFIG_SCRUB_DEBUG'
> 
>  xen/common/page_alloc.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 3db77c5..6c08983 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2247,16 +2247,12 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>  
>              spin_unlock_recursive(&d->page_alloc_lock);
>  
> -#ifndef CONFIG_SCRUB_DEBUG
>              /*
>               * Normally we expect a domain to clear pages before freeing them,
>               * if it cares about the secrecy of their contents. However, after
>               * a domain has died we assume responsibility for erasure.
>               */
> -            scrub = !!d->is_dying;
> -#else
> -            scrub = true;
> -#endif
> +            scrub = !!d->is_dying | scrub_debug;

Use logical or here please. And the !! for is_dying is not necessary.

Also please expand the comment to say why scrub_debug is also checked.


* Re: [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary
  2017-08-29 17:09 ` [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary Boris Ostrovsky
@ 2017-08-30  8:43   ` Wei Liu
  2017-08-30  9:27   ` Andrew Cooper
  1 sibling, 0 replies; 22+ messages in thread
From: Wei Liu @ 2017-08-30  8:43 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, jbeulich

On Tue, Aug 29, 2017 at 01:09:17PM -0400, Boris Ostrovsky wrote:
> Once pages are removed from the heap we don't need to hold the heap
> lock. It is especially useful to drop it for an unscrubbed buddy since
> we will be scrubbing it.
> 
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>


* Re: [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled
  2017-08-29 17:09 ` [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled Boris Ostrovsky
  2017-08-30  8:40   ` Wei Liu
@ 2017-08-30  8:50   ` Jan Beulich
  2017-08-30 13:02     ` Boris Ostrovsky
  1 sibling, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2017-08-30  8:50 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, Julien Grall

>>> On 29.08.17 at 19:09, <boris.ostrovsky@oracle.com> wrote:
> @@ -1970,12 +1970,15 @@ void __init scrub_heap_pages(void)
>  #ifdef CONFIG_SCRUB_DEBUG
>      boot_scrub_done = true;
>  #endif
> +}
>  
> -    /* Now that the heap is initialized, run checks and set bounds
> -     * for the low mem virq algorithm. */
> +void __init heap_init_late(void)
> +{
>      setup_low_mem_virq();
> -}
>  
> +    if ( opt_bootscrub )
> +        scrub_heap_pages();
> +}

Any reason you fully remove that comment? I think the "run checks"
part is stale (if it was ever valid in the first place), but the rest
could more or less stay.

Jan



* Re: [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary
  2017-08-29 17:09 ` [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary Boris Ostrovsky
  2017-08-30  8:43   ` Wei Liu
@ 2017-08-30  9:27   ` Andrew Cooper
  2017-08-30 10:22     ` Jan Beulich
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2017-08-30  9:27 UTC (permalink / raw)
  To: Boris Ostrovsky, xen-devel
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson, tim, jbeulich

On 29/08/17 18:09, Boris Ostrovsky wrote:
> Once pages are removed from the heap we don't need to hold the heap
> lock. It is especially useful to drop it for an unscrubbed buddy since
> we will be scrubbing it.
>
> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> Changes in v2:
> * Moved spin_unlock() after d->last_alloc_node assignment
>
>  xen/common/page_alloc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 6c08983..8df92aa 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -860,6 +860,7 @@ static struct page_info *alloc_heap_pages(
>      struct page_info *pg;
>      bool need_tlbflush = false;
>      uint32_t tlbflush_timestamp = 0;
> +    unsigned int dirty_cnt = 0;
>  
>      /* Make sure there are enough bits in memflags for nodeID. */
>      BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
> @@ -948,6 +949,8 @@ static struct page_info *alloc_heap_pages(
>      if ( d != NULL )
>          d->last_alloc_node = node;
>  
> +    spin_unlock(&heap_lock);
> +
>      for ( i = 0; i < (1 << order); i++ )
>      {
>          /* Reference count must continuously be zero for free pages. */
> @@ -957,7 +960,7 @@ static struct page_info *alloc_heap_pages(
>          {
>              if ( !(memflags & MEMF_no_scrub) )
>                  scrub_one_page(&pg[i]);
> -            node_need_scrub[node]--;
> +            dirty_cnt++;
>          }
>  
>          pg[i].count_info = PGC_state_inuse;
> @@ -979,6 +982,8 @@ static struct page_info *alloc_heap_pages(
>              check_one_page(&pg[i]);
>      }
>  
> +    spin_lock(&heap_lock);
> +    node_need_scrub[node] -= dirty_cnt;
>      spin_unlock(&heap_lock);
>  
>      if ( need_tlbflush )

This patch has been applied to staging, but it's got problems.  The
following crash is rather trivial to provoke:

~Andrew

(d19) Test result: SUCCESS
(XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Tainted:    H ]----
(XEN) CPU:    5
(XEN) RIP:    e008:[<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor (d0v1)
(XEN) rax: 0000000000001b50   rbx: ffff82e00dd164a0   rcx: ffff82ffffffffe0
(XEN) rdx: ffff82ffffffffe0   rsi: ffff82d08056f600   rdi: 00000000ffffffff
(XEN) rbp: ffff83083751fda8   rsp: ffff83083751fd48   r8:  00000000000001b5
(XEN) r9:  0000000000000017   r10: 0000000000000017   r11: 0000000000000246
(XEN) r12: 0000000000000000   r13: 000000000000019e   r14: 0000000000000000
(XEN) r15: 0000000000000000   cr0: 0000000080050033   cr4: 00000000001526e0
(XEN) cr3: 0000000772d3f000   cr2: ffff82ffffffffe4
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d0802252fc> (page_alloc.c#free_heap_pages+0x786/0x7a1):
(XEN)  24 89 01 e9 91 fd ff ff <89> 7a 04 8b 03 89 01 e9 4d ff ff ff 48 83 c4 38
(XEN) Xen stack trace from rsp=ffff83083751fd48:
(XEN)    0000000000000001 0000001700000001 0000000000000000 0000000000000017
(XEN)    ffff82e00dd16480 0000000000000000 ffff82e00dd115e0 0000000000000000
(XEN)    ffff82e00dd16480 0000000000000000 ffff8306ec55f000 ffffea0002049040
(XEN)    ffff83083751fdf8 ffff82d080226785 ffff82d08023afa5 0000000000000203
(XEN)    ffff8306ec55f000 ffff8306ec55f330 ffff8306ec55f4d8 ffff8306ec55faa8
(XEN)    ffff8306ec55f000 ffffea0002049040 ffff83083751fe18 ffff82d0802f1e04
(XEN)    ffff8306ec55f000 ffff8306ec55f000 ffff83083751fe48 ffff82d0802e0f95
(XEN)    ffff8306ec55f000 00000000ffffffff ffff8306ec55faa8 ffff8306ec55f000
(XEN)    ffff83083751fe68 ffff82d080271bca ffff8306ec55faa8 ffff8300abdd5000
(XEN)    ffff83083751fe98 ffff82d080207d74 ffff830837516040 0000000000000000
(XEN)    0000000000000000 ffff83083751ffff ffff83083751fec8 ffff82d080229a3c
(XEN)    ffff82d080572d80 ffff82d080573000 ffff82d080572d80 ffffffffffffffff
(XEN)    ffff83083751fef8 ffff82d08023a68a ffff8300abfa7000 ffff88010490ede8
(XEN)    00000007b7595025 ffff8800848ba9a0 ffff83083751ff08 ffff82d08023a6df
(XEN)    00007cf7c8ae00c7 ffff82d08035f3a1 ffffea0002049040 ffff8800848ba9a0
(XEN)    00000007b7595025 ffff88010490ede8 ffff88008491bd78 0000000000000de8
(XEN)    0000000000000246 deadbeefdeadf00d 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffff8100102a deadbeefdeadf00d deadbeefdeadf00d
(XEN)    deadbeefdeadf00d 0000010000000000 ffffffff8100102a 000000000000e033
(XEN)    0000000000000246 ffff88008491bd28 000000000000e02b c2c2c2c2c2c2beef
(XEN) Xen call trace:
(XEN)    [<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
(XEN)    [<ffff82d080226785>] free_domheap_pages+0x312/0x37c
(XEN)    [<ffff82d0802f1e04>] stdvga_deinit+0x30/0x46
(XEN)    [<ffff82d0802e0f95>] hvm_domain_destroy+0x60/0x116
(XEN)    [<ffff82d080271bca>] arch_domain_destroy+0x1a/0x8f
(XEN)    [<ffff82d080207d74>] domain.c#complete_domain_destroy+0x6f/0x182
(XEN)    [<ffff82d080229a3c>] rcupdate.c#rcu_process_callbacks+0x141/0x1a2
(XEN)    [<ffff82d08023a68a>] softirq.c#__do_softirq+0x7f/0x8a
(XEN)    [<ffff82d08023a6df>] do_softirq+0x13/0x15
(XEN)    [<ffff82d08035f3a1>] x86_64/entry.S#process_softirqs+0x21/0x30
(XEN)
(XEN) Pagetable walk from ffff82ffffffffe4:
(XEN)  L4[0x105] = 00000000abe5b063 ffffffffffffffff
(XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 5:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0002]
(XEN) Faulting linear address: ffff82ffffffffe4
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing kexec image on cpu5
(XEN) Shot down all CPUs

I'm in purgatory




* Re: [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary
  2017-08-30  9:27   ` Andrew Cooper
@ 2017-08-30 10:22     ` Jan Beulich
  2017-08-30 12:59       ` Boris Ostrovsky
  0 siblings, 1 reply; 22+ messages in thread
From: Jan Beulich @ 2017-08-30 10:22 UTC (permalink / raw)
  To: Andrew Cooper, Boris Ostrovsky
  Cc: sstabellini, wei.liu2, George.Dunlap, tim, ian.jackson, xen-devel

>>> On 30.08.17 at 11:27, <andrew.cooper3@citrix.com> wrote:
> On 29/08/17 18:09, Boris Ostrovsky wrote:
>> Once pages are removed from the heap we don't need to hold the heap
>> lock. It is especially useful to drop it for an unscrubbed buddy since
>> we will be scrubbing it.
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> Changes in v2:
>> * Moved spin_unlock() after d->last_alloc_node assignment
>>
>>  xen/common/page_alloc.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> index 6c08983..8df92aa 100644
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -860,6 +860,7 @@ static struct page_info *alloc_heap_pages(
>>      struct page_info *pg;
>>      bool need_tlbflush = false;
>>      uint32_t tlbflush_timestamp = 0;
>> +    unsigned int dirty_cnt = 0;
>>  
>>      /* Make sure there are enough bits in memflags for nodeID. */
>>      BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));
>> @@ -948,6 +949,8 @@ static struct page_info *alloc_heap_pages(
>>      if ( d != NULL )
>>          d->last_alloc_node = node;
>>  
>> +    spin_unlock(&heap_lock);
>> +
>>      for ( i = 0; i < (1 << order); i++ )
>>      {
>>          /* Reference count must continuously be zero for free pages. */
>> @@ -957,7 +960,7 @@ static struct page_info *alloc_heap_pages(
>>          {
>>              if ( !(memflags & MEMF_no_scrub) )
>>                  scrub_one_page(&pg[i]);
>> -            node_need_scrub[node]--;
>> +            dirty_cnt++;
>>          }
>>  
>>          pg[i].count_info = PGC_state_inuse;
>> @@ -979,6 +982,8 @@ static struct page_info *alloc_heap_pages(
>>              check_one_page(&pg[i]);
>>      }
>>  
>> +    spin_lock(&heap_lock);
>> +    node_need_scrub[node] -= dirty_cnt;
>>      spin_unlock(&heap_lock);
>>  
>>      if ( need_tlbflush )
> 
> This patch has been applied to staging, but it's got problems.  The
> following crash is rather trivial to provoke:
> 
> ~Andrew
> 
> (d19) Test result: SUCCESS
> (XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Tainted:    H ]----
> (XEN) CPU:    5
> (XEN) RIP:    e008:[<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
>...
> (XEN) Pagetable walk from ffff82ffffffffe4:
> (XEN)  L4[0x105] = 00000000abe5b063 ffffffffffffffff
> (XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff

Some negative offset into somewhere, it seems. Upon second
look I think the patch is simply wrong in its current shape:
free_heap_pages() looks for page_state_is(..., free) when
trying to merge chunks, while alloc_heap_pages() now sets
PGC_state_inuse outside of the locked area. I'll revert it right
away.

Jan



* Re: [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary
  2017-08-30 10:22     ` Jan Beulich
@ 2017-08-30 12:59       ` Boris Ostrovsky
  2017-08-30 13:01         ` Wei Liu
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-30 12:59 UTC (permalink / raw)
  To: Jan Beulich, Andrew Cooper
  Cc: sstabellini, wei.liu2, George.Dunlap, tim, ian.jackson, xen-devel


>> This patch has been applied to staging, but it's got problems.  The
>> following crash is rather trivial to provoke:
>>
>> ~Andrew
>>
>> (d19) Test result: SUCCESS
>> (XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Tainted:    H ]----
>> (XEN) CPU:    5
>> (XEN) RIP:    e008:[<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
>> ...
>> (XEN) Pagetable walk from ffff82ffffffffe4:
>> (XEN)  L4[0x105] = 00000000abe5b063 ffffffffffffffff
>> (XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff
> Some negative offset into somewhere, it seems. Upon second
> look I think the patch is simply wrong in its current shape:
> free_heap_pages() looks for page_state_is(..., free) when
> trying to merge chunks, while alloc_heap_pages() now sets
> PGC_state_inuse outside of the locked area. I'll revert it right
> away.

Yes, so we do need to update the page state under the heap lock. I'll then
move only the scrubbing (and checking) outside the lock.

I am curious, though: what was the test that triggered this? I ran about 100
parallel reboots under memory pressure and never hit it.


-boris


* Re: [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary
  2017-08-30 12:59       ` Boris Ostrovsky
@ 2017-08-30 13:01         ` Wei Liu
  2017-08-30 13:06         ` Andrew Cooper
  2017-08-30 13:30         ` Jan Beulich
  2 siblings, 0 replies; 22+ messages in thread
From: Wei Liu @ 2017-08-30 13:01 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, Andrew Cooper,
	ian.jackson, xen-devel, Jan Beulich

On Wed, Aug 30, 2017 at 08:59:34AM -0400, Boris Ostrovsky wrote:
> 
> >> This patch has been applied to staging, but its got problems.  The
> >> following crash is rather trivial to provoke:
> >>
> >> ~Andrew
> >>
> >> (d19) Test result: SUCCESS
> >> (XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Tainted:    H ]----
> >> (XEN) CPU:    5
> >> (XEN) RIP:    e008:[<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
> >> ...
> >> (XEN) Pagetable walk from ffff82ffffffffe4:
> >> (XEN)  L4[0x105] = 00000000abe5b063 ffffffffffffffff
> >> (XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff
> > Some negative offset into somewhere, it seems. Upon second
> > look I think the patch is simply wrong in its current shape:
> > free_heap_pages() looks for page_state_is(..., free) when
> > trying to merge chunks, while alloc_heap_pages() now sets
> > PGC_state_inuse outside of the locked area. I'll revert it right
> > away.
> 
> Yes, so we do need to update page state under heap lock. I'll then move
> scrubbing (and checking) only to outside the lock.
> 
> I am curious though, what was the test to trigger this? I ran about 100
> parallel reboots under memory pressure and never hit this.
> 

Appears to be one of the tests in xtf.git


* Re: [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled
  2017-08-30  8:50   ` Jan Beulich
@ 2017-08-30 13:02     ` Boris Ostrovsky
  2017-08-30 13:31       ` Jan Beulich
  0 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-30 13:02 UTC (permalink / raw)
  To: Jan Beulich
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, Julien Grall

On 08/30/2017 04:50 AM, Jan Beulich wrote:
>>>> On 29.08.17 at 19:09, <boris.ostrovsky@oracle.com> wrote:
>> @@ -1970,12 +1970,15 @@ void __init scrub_heap_pages(void)
>>  #ifdef CONFIG_SCRUB_DEBUG
>>      boot_scrub_done = true;
>>  #endif
>> +}
>>  
>> -    /* Now that the heap is initialized, run checks and set bounds
>> -     * for the low mem virq algorithm. */
>> +void __init heap_init_late(void)
>> +{
>>      setup_low_mem_virq();
>> -}
>>  
>> +    if ( opt_bootscrub )
>> +        scrub_heap_pages();
>> +}
> Any reason you fully remove that comment? I think the "run checks"
> part is stale (if it was ever valid in the first place), but the rest
> could more or less stay.

I thought it was pretty clear from the routine's name what it is about
to do, so I dropped it. I can put it back if you feel it is still needed.

-boris


* Re: [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary
  2017-08-30 12:59       ` Boris Ostrovsky
  2017-08-30 13:01         ` Wei Liu
@ 2017-08-30 13:06         ` Andrew Cooper
  2017-08-30 13:59           ` Boris Ostrovsky
  2017-08-30 13:30         ` Jan Beulich
  2 siblings, 1 reply; 22+ messages in thread
From: Andrew Cooper @ 2017-08-30 13:06 UTC (permalink / raw)
  To: Boris Ostrovsky, Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, tim, ian.jackson, xen-devel

On 30/08/17 13:59, Boris Ostrovsky wrote:
>>> This patch has been applied to staging, but its got problems.  The
>>> following crash is rather trivial to provoke:
>>>
>>> ~Andrew
>>>
>>> (d19) Test result: SUCCESS
>>> (XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Tainted:    H ]----
>>> (XEN) CPU:    5
>>> (XEN) RIP:    e008:[<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
>>> ...
>>> (XEN) Pagetable walk from ffff82ffffffffe4:
>>> (XEN)  L4[0x105] = 00000000abe5b063 ffffffffffffffff
>>> (XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff
>> Some negative offset into somewhere, it seems. Upon second
>> look I think the patch is simply wrong in its current shape:
>> free_heap_pages() looks for page_state_is(..., free) when
>> trying to merge chunks, while alloc_heap_pages() now sets
>> PGC_state_inuse outside of the locked area. I'll revert it right
>> away.
> Yes, so we do need to update page state under heap lock. I'll then move
> scrubbing (and checking) only to outside the lock.
>
> I am curious though, what was the test to trigger this? I ran about 100
> parallel reboots under memory pressure and never hit this.

# git clone git://xenbits.xen.org/xtf.git
# cd xtf
# make -j4 -s
# ./xtf-runner -qa

Purposefully, ./xtf-runner doesn't synchronously wait for VMs to be
fully destroyed before starting the next test.  (There is an ~800ms
added delay to synchronously destroy HVM guests, over PV, which I expect
is down to an interaction with qemu.  I got sufficiently annoyed that I
coded around the issue.)

As a result, destruction of one domain will be happening while
construction of the next one is happening.

~Andrew


* Re: [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary
  2017-08-30 12:59       ` Boris Ostrovsky
  2017-08-30 13:01         ` Wei Liu
  2017-08-30 13:06         ` Andrew Cooper
@ 2017-08-30 13:30         ` Jan Beulich
  2 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2017-08-30 13:30 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, Andrew Cooper,
	ian.jackson, xen-devel

>>> On 30.08.17 at 14:59, <boris.ostrovsky@oracle.com> wrote:

>>> This patch has been applied to staging, but its got problems.  The
>>> following crash is rather trivial to provoke:
>>>
>>> ~Andrew
>>>
>>> (d19) Test result: SUCCESS
>>> (XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Tainted:    H ]----
>>> (XEN) CPU:    5
>>> (XEN) RIP:    e008:[<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
>>> ...
>>> (XEN) Pagetable walk from ffff82ffffffffe4:
>>> (XEN)  L4[0x105] = 00000000abe5b063 ffffffffffffffff
>>> (XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff
>> Some negative offset into somewhere, it seems. Upon second
>> look I think the patch is simply wrong in its current shape:
>> free_heap_pages() looks for page_state_is(..., free) when
>> trying to merge chunks, while alloc_heap_pages() now sets
>> PGC_state_inuse outside of the locked area. I'll revert it right
>> away.
> 
> Yes, so we do need to update page state under heap lock. I'll then move
> scrubbing (and checking) only to outside the lock.

Actually I think you only need to set the first 4k page's state
with the lock still held.

Jan
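[Editorial note: Jan's suggestion amounts to the ordering sketched below. This is a hypothetical, self-contained model with stubbed locking and page types, not the actual allocator: only the head page's state — what free_heap_pages() is assumed to examine when deciding whether to merge — stays under the heap lock, while the tail-page state writes and the expensive scrub move outside it.]

```c
#include <assert.h>
#include <stdbool.h>

enum page_state { PGC_state_free, PGC_state_inuse };
struct page_info { enum page_state state; bool scrubbed; };

static bool heap_locked;
static void heap_lock(void)   { heap_locked = true;  }
static void heap_unlock(void) { heap_locked = false; }

/*
 * Tail of a hypothetical alloc_heap_pages() for a chunk of `count`
 * pages.  Per Jan's suggestion, only the first 4k page's state needs
 * to be set with the lock still held; everything else can run after
 * the unlock without free_heap_pages() mis-merging the chunk.
 */
static void alloc_tail_sketch(struct page_info pg[], unsigned int count)
{
    heap_lock();
    /* ... chunk removed from the free list (elided) ... */
    pg[0].state = PGC_state_inuse;            /* must happen under lock */
    assert(heap_locked);
    heap_unlock();

    for ( unsigned int i = 1; i < count; i++ )
        pg[i].state = PGC_state_inuse;        /* tail pages: lock-free  */

    for ( unsigned int i = 0; i < count; i++ )
    {
        assert(!heap_locked);                 /* scrub runs unlocked    */
        pg[i].scrubbed = true;
    }
}
```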



* Re: [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled
  2017-08-30 13:02     ` Boris Ostrovsky
@ 2017-08-30 13:31       ` Jan Beulich
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Beulich @ 2017-08-30 13:31 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, wei.liu2, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, Julien Grall

>>> On 30.08.17 at 15:02, <boris.ostrovsky@oracle.com> wrote:
> On 08/30/2017 04:50 AM, Jan Beulich wrote:
>>>>> On 29.08.17 at 19:09, <boris.ostrovsky@oracle.com> wrote:
>>> @@ -1970,12 +1970,15 @@ void __init scrub_heap_pages(void)
>>>  #ifdef CONFIG_SCRUB_DEBUG
>>>      boot_scrub_done = true;
>>>  #endif
>>> +}
>>>  
>>> -    /* Now that the heap is initialized, run checks and set bounds
>>> -     * for the low mem virq algorithm. */
>>> +void __init heap_init_late(void)
>>> +{
>>>      setup_low_mem_virq();
>>> -}
>>>  
>>> +    if ( opt_bootscrub )
>>> +        scrub_heap_pages();
>>> +}
>> Any reason you fully remove that comment? I think the "run checks"
>> part is stale (if it was ever valid in the first place), but the rest
>> could more or less stay.
> 
> I thought it was pretty clear from the routine's name what it is about
> to do, so I dropped it. I can put it back if you feel it is still needed.

I'd prefer if it was (roughly) kept as comment on just the
setup_low_mem_virq() invocation (i.e. not the new function
as a whole).

Jan
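[Editorial note: concretely, Jan's preference is roughly the shape below. This is a sketch only — the stubs stand in for the real page_alloc.c functions, and the comment wording is an assumption derived from the original comment quoted above.]

```c
#include <assert.h>
#include <stdbool.h>

/* Stubs recording call order; the real functions live in page_alloc.c. */
static int call_seq;
static bool opt_bootscrub = true;
static void setup_low_mem_virq(void) { call_seq = call_seq * 10 + 1; }
static void scrub_heap_pages(void)   { call_seq = call_seq * 10 + 2; }

static void heap_init_late(void)
{
    /*
     * Now that the heap is initialized, set bounds for the
     * low mem virq algorithm.
     */
    setup_low_mem_virq();

    if ( opt_bootscrub )
        scrub_heap_pages();
}
```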



* Re: [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary
  2017-08-30 13:06         ` Andrew Cooper
@ 2017-08-30 13:59           ` Boris Ostrovsky
  0 siblings, 0 replies; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-30 13:59 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: sstabellini, wei.liu2, George.Dunlap, ian.jackson, tim, xen-devel

On 08/30/2017 09:06 AM, Andrew Cooper wrote:
> On 30/08/17 13:59, Boris Ostrovsky wrote:
>>>> This patch has been applied to staging, but its got problems.  The
>>>> following crash is rather trivial to provoke:
>>>>
>>>> ~Andrew
>>>>
>>>> (d19) Test result: SUCCESS
>>>> (XEN) ----[ Xen-4.10-unstable  x86_64  debug=y   Tainted:    H ]----
>>>> (XEN) CPU:    5
>>>> (XEN) RIP:    e008:[<ffff82d0802252fc>] page_alloc.c#free_heap_pages+0x786/0x7a1
>>>> ...
>>>> (XEN) Pagetable walk from ffff82ffffffffe4:
>>>> (XEN)  L4[0x105] = 00000000abe5b063 ffffffffffffffff
>>>> (XEN)  L3[0x1ff] = 0000000000000000 ffffffffffffffff
>>> Some negative offset into somewhere, it seems. Upon second
>>> look I think the patch is simply wrong in its current shape:
>>> free_heap_pages() looks for page_state_is(..., free) when
>>> trying to merge chunks, while alloc_heap_pages() now sets
>>> PGC_state_inuse outside of the locked area. I'll revert it right
>>> away.
>> Yes, so we do need to update page state under heap lock. I'll then move
>> scrubbing (and checking) only to outside the lock.
>>
>> I am curious though, what was the test to trigger this? I ran about 100
>> parallel reboots under memory pressure and never hit this.
> # git clone git://xenbits.xen.org/xtf.git
> # cd xtf
> # make -j4 -s
> # ./xtf-runner -qa
>
> Purposefully, ./xtf-runner doesn't synchronously wait for VMs to be
> fully destroyed before starting the next test.  (There is an ~800ms
> added delay to synchronously destroy HVM guests, over PV, which I expect
> is down to an interaction with qemu.  I got sufficiently annoyed that I
> coded around the issue.)
>
> As a result, destruction of one domain will be happening while
> construction of the next one is happening.

I was also doing overlapped destruction/construction but at random (so
overlaps didn't happen all the time).

xtf-runner indeed tripped this panic fairly quickly.

-boris


* Re: [PATCH 4/5] mm: Don't request scrubbing until dom0 is running
  2017-08-30  8:42   ` Wei Liu
@ 2017-08-30 14:19     ` Boris Ostrovsky
  2017-08-30 14:21       ` Wei Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Boris Ostrovsky @ 2017-08-30 14:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: tim, sstabellini, George.Dunlap, andrew.cooper3, ian.jackson,
	xen-devel, jbeulich

On 08/30/2017 04:42 AM, Wei Liu wrote:
> On Tue, Aug 29, 2017 at 01:09:16PM -0400, Boris Ostrovsky wrote:
>> There is no need to scrub pages freed during dom0 construction
>> since heap will be scrubbed once dom0 is ready (by scrub_heap_pages()).
>>
>> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> ---
>> Changes in v2:
>> * Removed '#ifdef CONFIG_SCRUB_DEBUG'
>>
>>  xen/common/page_alloc.c | 6 +-----
>>  1 file changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> index 3db77c5..6c08983 100644
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -2247,16 +2247,12 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
>>  
>>              spin_unlock_recursive(&d->page_alloc_lock);
>>  
>> -#ifndef CONFIG_SCRUB_DEBUG
>>              /*
>>               * Normally we expect a domain to clear pages before freeing them,
>>               * if it cares about the secrecy of their contents. However, after
>>               * a domain has died we assume responsibility for erasure.
>>               */
>> -            scrub = !!d->is_dying;
>> -#else
>> -            scrub = true;
>> -#endif
>> +            scrub = !!d->is_dying | scrub_debug;
> Use logical or here please. And the !! for is_dying is not necessary.

Yes, I'll change this.

>
> Also please expand the comment to say why scrub_debug is also checked.

TBH, I think the variable name (scrub_debug) should be sufficient. I could
say something like "when scrub debugging is on we always scrub" but I am
not sure how useful that would be. (The previous name, boot_scrub_done,
would indeed have been helped by a comment.)

-boris
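
[Editorial note: for what it's worth, with both operands already normalized to 0/1 the bitwise form in the patch and the logical form Wei asks for compute the same value; the request is about idiom (and about `||`'s short-circuit evaluation). A tiny self-contained check, with the field names reused purely for illustration:]

```c
#include <assert.h>
#include <stdbool.h>

/* Exhaustively compare the two OR forms over all 0/1 operand values. */
static void check_or_forms(void)
{
    for ( int is_dying = 0; is_dying <= 1; is_dying++ )
        for ( int scrub_debug = 0; scrub_debug <= 1; scrub_debug++ )
        {
            bool bitwise = !!is_dying | scrub_debug;  /* form in the patch  */
            bool logical = is_dying || scrub_debug;   /* form Wei requested */
            assert(bitwise == logical);
        }
}
```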


* Re: [PATCH 4/5] mm: Don't request scrubbing until dom0 is running
  2017-08-30 14:19     ` Boris Ostrovsky
@ 2017-08-30 14:21       ` Wei Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Wei Liu @ 2017-08-30 14:21 UTC (permalink / raw)
  To: Boris Ostrovsky
  Cc: tim, sstabellini, Wei Liu, George.Dunlap, andrew.cooper3,
	ian.jackson, xen-devel, jbeulich

On Wed, Aug 30, 2017 at 10:19:49AM -0400, Boris Ostrovsky wrote:
> On 08/30/2017 04:42 AM, Wei Liu wrote:
> > On Tue, Aug 29, 2017 at 01:09:16PM -0400, Boris Ostrovsky wrote:
> >> There is no need to scrub pages freed during dom0 construction
> >> since heap will be scrubbed once dom0 is ready (by scrub_heap_pages()).
> >>
> >> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> >> ---
> >> Changes in v2:
> >> * Removed '#ifdef CONFIG_SCRUB_DEBUG'
> >>
> >>  xen/common/page_alloc.c | 6 +-----
> >>  1 file changed, 1 insertion(+), 5 deletions(-)
> >>
> >> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> >> index 3db77c5..6c08983 100644
> >> --- a/xen/common/page_alloc.c
> >> +++ b/xen/common/page_alloc.c
> >> @@ -2247,16 +2247,12 @@ void free_domheap_pages(struct page_info *pg, unsigned int order)
> >>  
> >>              spin_unlock_recursive(&d->page_alloc_lock);
> >>  
> >> -#ifndef CONFIG_SCRUB_DEBUG
> >>              /*
> >>               * Normally we expect a domain to clear pages before freeing them,
> >>               * if it cares about the secrecy of their contents. However, after
> >>               * a domain has died we assume responsibility for erasure.
> >>               */
> >> -            scrub = !!d->is_dying;
> >> -#else
> >> -            scrub = true;
> >> -#endif
> >> +            scrub = !!d->is_dying | scrub_debug;
> > Use logical or here please. And the !! for is_dying is not necessary.
> 
> Yes, I'll change this.
> 
> >
> > Also please expand the comment to say why scrub_debug is also checked.
> 
> TBH, I think the variable name (scrub_debug) should be sufficient. I could
> say something like "when scrub debugging is on we always scrub" but I am
> not sure how useful that would be. (The previous name, boot_scrub_done,
> would indeed have been helped by a comment.)
> 

Hmm... OK, I won't insist on this.


end of thread [~2017-08-30 14:21 UTC]

Thread overview: 22+ messages
2017-08-29 17:09 [PATCH 0/5] Scrubbing updates Boris Ostrovsky
2017-08-29 17:09 ` [PATCH 1/5] mm: Initialize lowmem virq when boot-time scrubbing is disabled Boris Ostrovsky
2017-08-30  8:40   ` Wei Liu
2017-08-30  8:50   ` Jan Beulich
2017-08-30 13:02     ` Boris Ostrovsky
2017-08-30 13:31       ` Jan Beulich
2017-08-29 17:09 ` [PATCH 2/5] mm: Change boot_scrub_done definition Boris Ostrovsky
2017-08-30  8:40   ` Wei Liu
2017-08-29 17:09 ` [PATCH 3/5] mm: Don't poison a page if scrub_debug is off Boris Ostrovsky
2017-08-29 17:09 ` [PATCH 4/5] mm: Don't request scrubbing until dom0 is running Boris Ostrovsky
2017-08-30  8:42   ` Wei Liu
2017-08-30 14:19     ` Boris Ostrovsky
2017-08-30 14:21       ` Wei Liu
2017-08-29 17:09 ` [PATCH 5/5] mm: Don't hold heap lock in alloc_heap_pages() longer than necessary Boris Ostrovsky
2017-08-30  8:43   ` Wei Liu
2017-08-30  9:27   ` Andrew Cooper
2017-08-30 10:22     ` Jan Beulich
2017-08-30 12:59       ` Boris Ostrovsky
2017-08-30 13:01         ` Wei Liu
2017-08-30 13:06         ` Andrew Cooper
2017-08-30 13:59           ` Boris Ostrovsky
2017-08-30 13:30         ` Jan Beulich
