* [PATCH 0/2] xen/mm: Optimize init_heap_pages()
@ 2022-06-09  8:30 Julien Grall
  2022-06-09  8:30 ` [PATCH 1/2] xen/heap: Split init_heap_pages() in two Julien Grall
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Julien Grall @ 2022-06-09  8:30 UTC (permalink / raw)
  To: xen-devel
  Cc: bertrand.marquis, Julien Grall, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu

From: Julien Grall <jgrall@amazon.com>

Hi all,

As part of the Live-Update work, we noticed that a big part of Xen boot
is spent adding pages to the heap. For instance, when running Xen in a
nested environment on a c5.metal, it takes ~1.5s.

This small series is reworking init_heap_pages() to give the pages
to free_heap_pages() by chunk rather than one by one.

With this approach, the time spent initialising the heap is down to
166 ms in the setup mentioned above.

There is potentially one more optimization that would further reduce
the time spent: the new approach accesses the page information multiple
times, in separate loops that can potentially be large.

Cheers,

Hongyan Xia (1):
  xen/heap: pass order to free_heap_pages() in heap init

Julien Grall (1):
  xen/heap: Split init_heap_pages() in two

 xen/common/page_alloc.c | 109 ++++++++++++++++++++++++++++++----------
 1 file changed, 82 insertions(+), 27 deletions(-)

-- 
2.32.0




* [PATCH 1/2] xen/heap: Split init_heap_pages() in two
  2022-06-09  8:30 [PATCH 0/2] xen/mm: Optimize init_heap_pages() Julien Grall
@ 2022-06-09  8:30 ` Julien Grall
  2022-06-09 12:09   ` Jan Beulich
  2022-06-09  8:30 ` [PATCH 2/2] xen/heap: pass order to free_heap_pages() in heap init Julien Grall
  2022-06-10  9:36 ` [PATCH 0/2] xen/mm: Optimize init_heap_pages() Julien Grall
  2 siblings, 1 reply; 12+ messages in thread
From: Julien Grall @ 2022-06-09  8:30 UTC (permalink / raw)
  To: xen-devel
  Cc: bertrand.marquis, Julien Grall, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu

From: Julien Grall <jgrall@amazon.com>

At the moment, init_heap_pages() will call free_heap_pages() page
by page. To reduce the time to initialize the heap, we will want
to provide multiple pages at the same time.

init_heap_pages() is now split into two parts:
    - init_heap_pages(): breaks down the range into multiple sets of
      contiguous pages. For now, the criterion is that the pages should
      belong to the same NUMA node.
    - init_contig_heap_pages(): initializes a set of contiguous pages.
      For now, the pages are still passed one by one to free_heap_pages().
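
For illustration only, the resulting structure looks roughly like the
sketch below (simplified, not the actual hunks; the need_scrub handling
is omitted):

    static void init_heap_pages(struct page_info *pg,
                                unsigned long nr_pages)
    {
        unsigned long i = 0;

        while ( i < nr_pages )
        {
            unsigned int nid = phys_to_nid(page_to_maddr(pg + i));
            unsigned long contig = 1;

            /* Grow the run while the next page is on the same NUMA node. */
            while ( i + contig < nr_pages &&
                    phys_to_nid(page_to_maddr(pg + i + contig)) == nid )
                contig++;

            /* Hand the whole same-node run to the helper in one go. */
            init_contig_heap_pages(pg + i, contig, false /* need_scrub */);
            i += contig;
        }
    }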

Note that the comment before init_heap_pages() is heavily outdated and
does not reflect the current code. So update it.

This patch is a merge/rework of patches from David Woodhouse and
Hongyan Xia.

Signed-off-by: Julien Grall <jgrall@amazon.com>

----

Interestingly, I was expecting this patch to perform worse. However,
testing shows a small performance improvement.

That said, I split the patch this way to keep the refactoring and the
optimization separate.
---
 xen/common/page_alloc.c | 82 +++++++++++++++++++++++++++--------------
 1 file changed, 55 insertions(+), 27 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index 3e6504283f1e..a1938df1406c 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
 }
 
 /*
- * Hand the specified arbitrary page range to the specified heap zone
- * checking the node_id of the previous page.  If they differ and the
- * latter is not on a MAX_ORDER boundary, then we reserve the page by
- * not freeing it to the buddy allocator.
+ * init_contig_heap_pages() is intended to only take pages from the same
+ * NUMA node.
  */
+static bool is_contig_page(struct page_info *pg, unsigned int nid)
+{
+    return (nid == (phys_to_nid(page_to_maddr(pg))));
+}
+
+/*
+ * This function should only be called with valid pages from the same NUMA
+ * node.
+ *
+ * Callers should use is_contig_page() first to check if all the pages
+ * in a range are contiguous.
+ */
+static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
+                                   bool need_scrub)
+{
+    unsigned long s, e;
+    unsigned int nid = phys_to_nid(page_to_maddr(pg));
+
+    s = mfn_x(page_to_mfn(pg));
+    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
+    if ( unlikely(!avail[nid]) )
+    {
+        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
+                        (find_first_set_bit(e) <= find_first_set_bit(s));
+        unsigned long n;
+
+        n = init_node_heap(nid, s, nr_pages, &use_tail);
+        BUG_ON(n > nr_pages);
+        if ( use_tail )
+            e -= n;
+        else
+            s += n;
+    }
+
+    while ( s < e )
+    {
+        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
+        s += 1UL;
+    }
+}
+
 static void init_heap_pages(
     struct page_info *pg, unsigned long nr_pages)
 {
     unsigned long i;
-    bool idle_scrub = false;
+    bool need_scrub = scrub_debug;
 
     /*
      * Keep MFN 0 away from the buddy allocator to avoid crossing zone
@@ -1812,35 +1851,24 @@ static void init_heap_pages(
     spin_unlock(&heap_lock);
 
     if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
-        idle_scrub = true;
+        need_scrub = true;
 
-    for ( i = 0; i < nr_pages; i++ )
+    for ( i = 0; i < nr_pages; )
     {
-        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
+        unsigned int nid = phys_to_nid(page_to_maddr(pg));
+        unsigned long left = nr_pages - i;
+        unsigned long contig_pages;
 
-        if ( unlikely(!avail[nid]) )
+        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
         {
-            unsigned long s = mfn_x(page_to_mfn(pg + i));
-            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
-            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
-                            !(s & ((1UL << MAX_ORDER) - 1)) &&
-                            (find_first_set_bit(e) <= find_first_set_bit(s));
-            unsigned long n;
-
-            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
-                               &use_tail);
-            BUG_ON(i + n > nr_pages);
-            if ( n && !use_tail )
-            {
-                i += n - 1;
-                continue;
-            }
-            if ( i + n == nr_pages )
+            if ( !is_contig_page(pg + contig_pages, nid) )
                 break;
-            nr_pages -= n;
         }
 
-        free_heap_pages(pg + i, 0, scrub_debug || idle_scrub);
+        init_contig_heap_pages(pg, contig_pages, need_scrub);
+
+        pg += contig_pages;
+        i += contig_pages;
     }
 }
 
-- 
2.32.0




* [PATCH 2/2] xen/heap: pass order to free_heap_pages() in heap init
  2022-06-09  8:30 [PATCH 0/2] xen/mm: Optimize init_heap_pages() Julien Grall
  2022-06-09  8:30 ` [PATCH 1/2] xen/heap: Split init_heap_pages() in two Julien Grall
@ 2022-06-09  8:30 ` Julien Grall
  2022-06-09 13:22   ` Jan Beulich
  2022-06-28 14:40   ` Bertrand Marquis
  2022-06-10  9:36 ` [PATCH 0/2] xen/mm: Optimize init_heap_pages() Julien Grall
  2 siblings, 2 replies; 12+ messages in thread
From: Julien Grall @ 2022-06-09  8:30 UTC (permalink / raw)
  To: xen-devel
  Cc: bertrand.marquis, Hongyan Xia, Andrew Cooper, George Dunlap,
	Jan Beulich, Julien Grall, Stefano Stabellini, Wei Liu,
	Julien Grall

From: Hongyan Xia <hongyxia@amazon.com>

The idea is to split the range into multiple aligned power-of-2 regions,
each of which only needs a single call to free_heap_pages(). We check the
least significant set bit of the start address and use its bit index as
the order of this increment. This makes sure that each increment is both
power-of-2 sized and properly aligned, so it can be safely passed to
free_heap_pages(). Of course, the order also needs to be sanity-checked
against the upper bound and MAX_ORDER.
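
As a standalone illustration of the chunking (not Xen code: the value of
MAX_ORDER is assumed here, and compiler builtins stand in for Xen's
ffsl()/flsl(), whose results are 1-based):

    #include <stdio.h>

    #define MAX_ORDER 18 /* assumed value, purely for the example */

    /* 0-based index of the most/least significant set bit (x != 0);
     * assumes a 64-bit unsigned long. */
    static int msb_idx(unsigned long x) { return 63 - __builtin_clzl(x); }
    static int lsb_idx(unsigned long x) { return __builtin_ctzl(x); }

    int main(void)
    {
        unsigned long s = 0x1234, e = 0x20000; /* example MFN range [s, e) */

        while ( s < e )
        {
            /* Largest power-of-2 chunk that fits in the remaining size... */
            int order = msb_idx(e - s);

            if ( order > MAX_ORDER )
                order = MAX_ORDER;
            /* ... and that keeps the chunk aligned to its own size. */
            if ( s && lsb_idx(s) < order )
                order = lsb_idx(s);

            printf("free_heap_pages(mfn %#lx, order %d)\n", s, order);
            s += 1UL << order;
        }

        return 0;
    }

Each printed call corresponds to one aligned power-of-2 chunk, so the
whole range is handed over in a few calls instead of one call per page.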

Testing on a nested environment on c5.metal with various amounts
of RAM. Time for end_boot_allocator() to complete:
            Before         After
    - 90GB: 1426 ms        166 ms
    -  8GB:  124 ms         12 ms
    -  4GB:   60 ms          6 ms

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
---
 xen/common/page_alloc.c | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index a1938df1406c..bf852cfc11ea 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1779,16 +1779,28 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
 
 /*
  * init_contig_heap_pages() is intended to only take pages from the same
- * NUMA node.
+ * NUMA node and zone.
+ *
+ * For the latter, it is always true for !CONFIG_SEPARATE_XENHEAP since
+ * free_heap_pages() can only take power-of-two ranges which never cross
+ * zone boundaries. But for separate xenheap which is manually defined,
+ * it is possible for a power-of-two range to cross zones, so we need to
+ * check that as well.
  */
-static bool is_contig_page(struct page_info *pg, unsigned int nid)
+static bool is_contig_page(struct page_info *pg, unsigned int nid,
+                           unsigned int zone)
 {
+#ifdef CONFIG_SEPARATE_XENHEAP
+    if ( zone != page_to_zone(pg) )
+        return false;
+#endif
+
     return (nid == (phys_to_nid(page_to_maddr(pg))));
 }
 
 /*
  * This function should only be called with valid pages from the same NUMA
- * node.
+ * node and the same zone.
  *
  * Callers should use is_contig_page() first to check if all the pages
  * in a range are contiguous.
@@ -1817,8 +1829,22 @@ static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
 
     while ( s < e )
     {
-        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
-        s += 1UL;
+        /*
+         * For s == 0, we simply use the largest increment by checking the
+         * index of the MSB set. For s != 0, we also need to ensure that the
+         * chunk is properly sized to end at power-of-two alignment. We do this
+         * by checking the LSB set and use its index as the increment. Both
+         * cases need to be guarded by MAX_ORDER.
+         *
+         * Note that the value of ffsl() and flsl() starts from 1 so we need
+         * to decrement it by 1.
+         */
+        int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
+
+        if ( s )
+            inc_order = min(inc_order, ffsl(s) - 1);
+        free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub);
+        s += (1UL << inc_order);
     }
 }
 
@@ -1856,12 +1882,13 @@ static void init_heap_pages(
     for ( i = 0; i < nr_pages; )
     {
         unsigned int nid = phys_to_nid(page_to_maddr(pg));
+        unsigned int zone = page_to_zone(pg);
         unsigned long left = nr_pages - i;
         unsigned long contig_pages;
 
         for ( contig_pages = 1; contig_pages < left; contig_pages++ )
         {
-            if ( !is_contig_page(pg + contig_pages, nid) )
+            if ( !is_contig_page(pg + contig_pages, nid, zone) )
                 break;
         }
 
-- 
2.32.0




* Re: [PATCH 1/2] xen/heap: Split init_heap_pages() in two
  2022-06-09  8:30 ` [PATCH 1/2] xen/heap: Split init_heap_pages() in two Julien Grall
@ 2022-06-09 12:09   ` Jan Beulich
  2022-06-09 12:33     ` Julien Grall
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2022-06-09 12:09 UTC (permalink / raw)
  To: Julien Grall
  Cc: bertrand.marquis, Julien Grall, Andrew Cooper, George Dunlap,
	Stefano Stabellini, Wei Liu, xen-devel

On 09.06.2022 10:30, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> At the moment, init_heap_pages() will call free_heap_pages() page
> by page. To reduce the time to initialize the heap, we will want
> to provide multiple pages at the same time.
> 
> init_heap_pages() is now split in two parts:
>     - init_heap_pages(): will break down the range in multiple set
>       of contiguous pages. For now, the criteria is the pages should
>       belong to the same NUMA node.
>     - init_contig_pages(): will initialize a set of contiguous pages.
>       For now the pages are still passed one by one to free_heap_pages().

Hmm, the common use of "contiguous" is to describe address correlation.
Therefore I'm afraid I'd like to see "contig" avoided when you mean
"same node". Perhaps init_node_pages()?

> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>  }
>  
>  /*
> - * Hand the specified arbitrary page range to the specified heap zone
> - * checking the node_id of the previous page.  If they differ and the
> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
> - * not freeing it to the buddy allocator.
> + * init_contig_heap_pages() is intended to only take pages from the same
> + * NUMA node.
>   */
> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
> +{
> +    return (nid == (phys_to_nid(page_to_maddr(pg))));
> +}

If such a helper is indeed needed, then I think it absolutely wants
pg to be pointer-to-const. And imo it would also help readability if
the extra pair of parentheses around the nested function calls was
omitted. Given the naming concern, though, I wonder whether this
wouldn't better be open-coded in the one place it is used. Actually
naming is quite odd here beyond what I'd said further up: "Is this
page contiguous?" Such a question requires two pages, i.e. "Are these
two pages contiguous?" What you want to know is "Is this page on the
given node?"

> +/*
> + * This function should only be called with valid pages from the same NUMA
> + * node.
> + *
> + * Callers should use is_contig_page() first to check if all the pages
> + * in a range are contiguous.
> + */
> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,

const again?

> +                                   bool need_scrub)
> +{
> +    unsigned long s, e;
> +    unsigned int nid = phys_to_nid(page_to_maddr(pg));
> +
> +    s = mfn_x(page_to_mfn(pg));
> +    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
> +    if ( unlikely(!avail[nid]) )
> +    {
> +        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&

IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?

> +                        (find_first_set_bit(e) <= find_first_set_bit(s));
> +        unsigned long n;
> +
> +        n = init_node_heap(nid, s, nr_pages, &use_tail);
> +        BUG_ON(n > nr_pages);
> +        if ( use_tail )
> +            e -= n;
> +        else
> +            s += n;
> +    }
> +
> +    while ( s < e )
> +    {
> +        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
> +        s += 1UL;

Nit (I realize the next patch will replace this anyway): Just ++s? Or
at least a plain 1 without UL suffix?

> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>      spin_unlock(&heap_lock);
>  
>      if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
> -        idle_scrub = true;
> +        need_scrub = true;
>  
> -    for ( i = 0; i < nr_pages; i++ )
> +    for ( i = 0; i < nr_pages; )
>      {
> -        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
> +        unsigned int nid = phys_to_nid(page_to_maddr(pg));
> +        unsigned long left = nr_pages - i;
> +        unsigned long contig_pages;
>  
> -        if ( unlikely(!avail[nid]) )
> +        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>          {
> -            unsigned long s = mfn_x(page_to_mfn(pg + i));
> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
> -            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
> -                            !(s & ((1UL << MAX_ORDER) - 1)) &&
> -                            (find_first_set_bit(e) <= find_first_set_bit(s));
> -            unsigned long n;
> -
> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
> -                               &use_tail);
> -            BUG_ON(i + n > nr_pages);
> -            if ( n && !use_tail )
> -            {
> -                i += n - 1;
> -                continue;
> -            }
> -            if ( i + n == nr_pages )
> +            if ( !is_contig_page(pg + contig_pages, nid) )
>                  break;
> -            nr_pages -= n;
>          }

Isn't doing this page by page in a loop quite inefficient? Can't you
simply obtain the end of the node's range covering the first page, and
then adjust "left" accordingly? I even wonder whether the admittedly
lax original check's assumption couldn't be leveraged alternatively,
by effectively bisecting to the end address on the node of interest
(where the assumption is that nodes aren't interleaved - see Wei's
NUMA series dealing with precisely that situation).

Jan



* Re: [PATCH 1/2] xen/heap: Split init_heap_pages() in two
  2022-06-09 12:09   ` Jan Beulich
@ 2022-06-09 12:33     ` Julien Grall
  2022-06-09 13:12       ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Julien Grall @ 2022-06-09 12:33 UTC (permalink / raw)
  To: Jan Beulich
  Cc: bertrand.marquis, Julien Grall, Andrew Cooper, George Dunlap,
	Stefano Stabellini, Wei Liu, xen-devel

Hi Jan,

On 09/06/2022 13:09, Jan Beulich wrote:
> On 09.06.2022 10:30, Julien Grall wrote:
>> From: Julien Grall <jgrall@amazon.com>
>>
>> At the moment, init_heap_pages() will call free_heap_pages() page
>> by page. To reduce the time to initialize the heap, we will want
>> to provide multiple pages at the same time.
>>
>> init_heap_pages() is now split in two parts:
>>      - init_heap_pages(): will break down the range in multiple set
>>        of contiguous pages. For now, the criteria is the pages should
>>        belong to the same NUMA node.
>>      - init_contig_pages(): will initialize a set of contiguous pages.
>>        For now the pages are still passed one by one to free_heap_pages().
> 
> Hmm, the common use of "contiguous" is to describe address correlation.
> Therefore I'm afraid I'd like to see "contig" avoided when you mean
> "same node". Perhaps init_node_pages()?

After the next patch, it will not only be the same node, it will also be 
the same zone at least. Also, in the future, I would like to re-submit 
David Woodhouse's patch to exclude broken pages (see [1]).

Therefore, I think the name init_node_pages() would not be suitable. 
Please suggest a different name.

> 
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>>   }
>>   
>>   /*
>> - * Hand the specified arbitrary page range to the specified heap zone
>> - * checking the node_id of the previous page.  If they differ and the
>> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
>> - * not freeing it to the buddy allocator.
>> + * init_contig_heap_pages() is intended to only take pages from the same
>> + * NUMA node.
>>    */
>> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
>> +{
>> +    return (nid == (phys_to_nid(page_to_maddr(pg))));
>> +}
> 
> If such a helper is indeed needed, then I think it absolutely wants
> pg to be pointer-to-const. And imo it would also help readability if
> the extra pair of parentheses around the nested function calls was
> omitted. Given the naming concern, though, I wonder whether this
> wouldn't better be open-coded in the one place it is used. Actually
> naming is quite odd here beyond what I'd said further up: "Is this
> page contiguous?" Such a question requires two pages, i.e. "Are these
> two pages contiguous?" What you want to know is "Is this page on the
> given node?"

There will be more checks in the future (see the next patch). I created a 
helper because it reduces the size of the loop in init_heap_pages(). I 
would be OK with folding it if you strongly prefer that.

> 
>> +/*
>> + * This function should only be called with valid pages from the same NUMA
>> + * node.
>> + *
>> + * Callers should use is_contig_page() first to check if all the pages
>> + * in a range are contiguous.
>> + */
>> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
> 
> const again?

I will have a look.

> 
>> +                                   bool need_scrub)
>> +{
>> +    unsigned long s, e;
>> +    unsigned int nid = phys_to_nid(page_to_maddr(pg));
>> +
>> +    s = mfn_x(page_to_mfn(pg));
>> +    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>> +    if ( unlikely(!avail[nid]) )
>> +    {
>> +        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
> 
> IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?

This is existing code and it is quite complex, so I would prefer to avoid 
simplifying and moving the code in the same patch. I would be happy 
to write a follow-up patch to switch to IS_ALIGNED().

> 
>> +                        (find_first_set_bit(e) <= find_first_set_bit(s));
>> +        unsigned long n;
>> +
>> +        n = init_node_heap(nid, s, nr_pages, &use_tail);
>> +        BUG_ON(n > nr_pages);
>> +        if ( use_tail )
>> +            e -= n;
>> +        else
>> +            s += n;
>> +    }
>> +
>> +    while ( s < e )
>> +    {
>> +        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
>> +        s += 1UL;
> 
> Nit (I realize the next patch will replace this anyway): Just ++s? Or
> at least a plain 1 without UL suffix?

I will switch to s++.

> 
>> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>>       spin_unlock(&heap_lock);
>>   
>>       if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
>> -        idle_scrub = true;
>> +        need_scrub = true;
>>   
>> -    for ( i = 0; i < nr_pages; i++ )
>> +    for ( i = 0; i < nr_pages; )
>>       {
>> -        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
>> +        unsigned int nid = phys_to_nid(page_to_maddr(pg));
>> +        unsigned long left = nr_pages - i;
>> +        unsigned long contig_pages;
>>   
>> -        if ( unlikely(!avail[nid]) )
>> +        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>>           {
>> -            unsigned long s = mfn_x(page_to_mfn(pg + i));
>> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>> -            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
>> -                            !(s & ((1UL << MAX_ORDER) - 1)) &&
>> -                            (find_first_set_bit(e) <= find_first_set_bit(s));
>> -            unsigned long n;
>> -
>> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
>> -                               &use_tail);
>> -            BUG_ON(i + n > nr_pages);
>> -            if ( n && !use_tail )
>> -            {
>> -                i += n - 1;
>> -                continue;
>> -            }
>> -            if ( i + n == nr_pages )
>> +            if ( !is_contig_page(pg + contig_pages, nid) )
>>                   break;
>> -            nr_pages -= n;
>>           }
> 
> Isn't doing this page by page in a loop quite inefficient? Can't you
> simply obtain the end of the node's range covering the first page, and
> then adjust "left" accordingly?

The page-by-page walk is necessary because we may need to exclude some pages 
(see [1]) or the range may cross multiple zones (see [2]).

> I even wonder whether the admittedly
> lax original check's assumption couldn't be leveraged alternatively,
> by effectively bisecting to the end address on the node of interest
> (where the assumption is that nodes aren't interleaved - see Wei's
> NUMA series dealing with precisely that situation).
See above. We went this way because there are some pages to be excluded. 
The immediate use case is broken pages, but in the future we may need to 
also exclude pages that contain guest content after Live-Update.

I also plan to get rid of the loop in free_heap_pages() to mark each 
page free. This would mean that pages would only be accessed once in 
init_heap_pages() (I am still cleaning up that patch and it is a much 
more controversial change).

Cheers,

[1] 
https://lore.kernel.org/xen-devel/20200201003303.2363081-2-dwmw2@infradead.org/

> 
> Jan

-- 
Julien Grall



* Re: [PATCH 1/2] xen/heap: Split init_heap_pages() in two
  2022-06-09 12:33     ` Julien Grall
@ 2022-06-09 13:12       ` Jan Beulich
  2022-06-09 13:18         ` Julien Grall
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2022-06-09 13:12 UTC (permalink / raw)
  To: Julien Grall
  Cc: bertrand.marquis, Julien Grall, Andrew Cooper, George Dunlap,
	Stefano Stabellini, Wei Liu, xen-devel

On 09.06.2022 14:33, Julien Grall wrote:
> On 09/06/2022 13:09, Jan Beulich wrote:
>> On 09.06.2022 10:30, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> At the moment, init_heap_pages() will call free_heap_pages() page
>>> by page. To reduce the time to initialize the heap, we will want
>>> to provide multiple pages at the same time.
>>>
>>> init_heap_pages() is now split in two parts:
>>>      - init_heap_pages(): will break down the range in multiple set
>>>        of contiguous pages. For now, the criteria is the pages should
>>>        belong to the same NUMA node.
>>>      - init_contig_pages(): will initialize a set of contiguous pages.
>>>        For now the pages are still passed one by one to free_heap_pages().
>>
>> Hmm, the common use of "contiguous" is to describe address correlation.
>> Therefore I'm afraid I'd like to see "contig" avoided when you mean
>> "same node". Perhaps init_node_pages()?
> 
> After the next patch, it will not only be the same node, It will also be 
> the same zone at least. Also, in the future, I would like to 
> re-submitting David Woodhouse patch to exclude broken pages (see [1]).
> 
> Therefore, I think the name init_node_pages() would not be suitable. 
> Please suggest a different name.

_init_heap_pages() then, as a helper of init_heap_pages()?

>>> --- a/xen/common/page_alloc.c
>>> +++ b/xen/common/page_alloc.c
>>> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>>>   }
>>>   
>>>   /*
>>> - * Hand the specified arbitrary page range to the specified heap zone
>>> - * checking the node_id of the previous page.  If they differ and the
>>> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
>>> - * not freeing it to the buddy allocator.
>>> + * init_contig_heap_pages() is intended to only take pages from the same
>>> + * NUMA node.
>>>    */
>>> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
>>> +{
>>> +    return (nid == (phys_to_nid(page_to_maddr(pg))));
>>> +}
>>
>> If such a helper is indeed needed, then I think it absolutely wants
>> pg to be pointer-to-const. And imo it would also help readability if
>> the extra pair of parentheses around the nested function calls was
>> omitted. Given the naming concern, though, I wonder whether this
>> wouldn't better be open-coded in the one place it is used. Actually
>> naming is quite odd here beyond what I'd said further up: "Is this
>> page contiguous?" Such a question requires two pages, i.e. "Are these
>> two pages contiguous?" What you want to know is "Is this page on the
>> given node?"
> 
> There will be more check in the future (see next patch). I created an 
> helper because it reduces the size of the loop init_heap_pages(). I 
> would be OK to fold it if you strongly prefer that.

I don't "strongly" prefer that; I'd also be okay with a suitably named
helper. Just that I can't seem to come up with any good name.

>>> +/*
>>> + * This function should only be called with valid pages from the same NUMA
>>> + * node.
>>> + *
>>> + * Callers should use is_contig_page() first to check if all the pages
>>> + * in a range are contiguous.
>>> + */
>>> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
>>> +                                   bool need_scrub)
>>> +{
>>> +    unsigned long s, e;
>>> +    unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> +
>>> +    s = mfn_x(page_to_mfn(pg));
>>> +    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> +    if ( unlikely(!avail[nid]) )
>>> +    {
>>> +        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
>>
>> IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?
> 
> This is existing code and it is quite complex. So I would prefer if we 
> avoid to simplify and move the code in the same patch. I would be happy 
> to write a follow-up patch to switch to IS_ALIGNED().

I do realize it's code you move, but I can accept your desire to merely
move the code without any cleanup. Personally I think that rather than a
follow-up patch (which doesn't help the reviewing of this one) such an
adjustment would better be a prereq one.

>>> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>>>       spin_unlock(&heap_lock);
>>>   
>>>       if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
>>> -        idle_scrub = true;
>>> +        need_scrub = true;
>>>   
>>> -    for ( i = 0; i < nr_pages; i++ )
>>> +    for ( i = 0; i < nr_pages; )
>>>       {
>>> -        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
>>> +        unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>> +        unsigned long left = nr_pages - i;
>>> +        unsigned long contig_pages;
>>>   
>>> -        if ( unlikely(!avail[nid]) )
>>> +        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>>>           {
>>> -            unsigned long s = mfn_x(page_to_mfn(pg + i));
>>> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>> -            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
>>> -                            !(s & ((1UL << MAX_ORDER) - 1)) &&
>>> -                            (find_first_set_bit(e) <= find_first_set_bit(s));
>>> -            unsigned long n;
>>> -
>>> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
>>> -                               &use_tail);
>>> -            BUG_ON(i + n > nr_pages);
>>> -            if ( n && !use_tail )
>>> -            {
>>> -                i += n - 1;
>>> -                continue;
>>> -            }
>>> -            if ( i + n == nr_pages )
>>> +            if ( !is_contig_page(pg + contig_pages, nid) )
>>>                   break;
>>> -            nr_pages -= n;
>>>           }
>>
>> Isn't doing this page by page in a loop quite inefficient? Can't you
>> simply obtain the end of the node's range covering the first page, and
>> then adjust "left" accordingly?
> 
> The page by page is necessary because we may need to exclude some pages 
> (see [1]) or the range may cross multiple-zone (see [2]).

If you want/need to do this for "future" reasons (aiui [1] was never
committed, and you forgot to supply [2], but yes, splitting at zone
boundaries is of course necessary), then I think this wants making quite
clear in the description.

Jan

>> I even wonder whether the admittedly
>> lax original check's assumption couldn't be leveraged alternatively,
>> by effectively bisecting to the end address on the node of interest
>> (where the assumption is that nodes aren't interleaved - see Wei's
>> NUMA series dealing with precisely that situation).
> See above. We went this way because there are some pages to be excluded. 
> The immediate use case is broken pages, but in the future we may need to 
> also exclude pages that contain guest content after Live-Update.
> 
> I also plan to get rid of the loop in free_heap_pages() to mark each 
> page free. This would mean that pages would only be accessed once in 
> init_heap_pages() (I am still cleaning up that patch and it is a much 
> more controversial change).
> 
> Cheers,
> 
> [1] 
> https://lore.kernel.org/xen-devel/20200201003303.2363081-2-dwmw2@infradead.org/
> 
>>
>> Jan
> 




* Re: [PATCH 1/2] xen/heap: Split init_heap_pages() in two
  2022-06-09 13:12       ` Jan Beulich
@ 2022-06-09 13:18         ` Julien Grall
  0 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2022-06-09 13:18 UTC (permalink / raw)
  To: Jan Beulich
  Cc: bertrand.marquis, Julien Grall, Andrew Cooper, George Dunlap,
	Stefano Stabellini, Wei Liu, xen-devel

Hi,

On 09/06/2022 14:12, Jan Beulich wrote:
> On 09.06.2022 14:33, Julien Grall wrote:
>> On 09/06/2022 13:09, Jan Beulich wrote:
>>> On 09.06.2022 10:30, Julien Grall wrote:
>>>> From: Julien Grall <jgrall@amazon.com>
>>>>
>>>> At the moment, init_heap_pages() will call free_heap_pages() page
>>>> by page. To reduce the time to initialize the heap, we will want
>>>> to provide multiple pages at the same time.
>>>>
>>>> init_heap_pages() is now split in two parts:
>>>>       - init_heap_pages(): will break down the range in multiple set
>>>>         of contiguous pages. For now, the criteria is the pages should
>>>>         belong to the same NUMA node.
>>>>       - init_contig_pages(): will initialize a set of contiguous pages.
>>>>         For now the pages are still passed one by one to free_heap_pages().
>>>
>>> Hmm, the common use of "contiguous" is to describe address correlation.
>>> Therefore I'm afraid I'd like to see "contig" avoided when you mean
>>> "same node". Perhaps init_node_pages()?
>>
>> After the next patch, it will not only be the same node, It will also be
>> the same zone at least. Also, in the future, I would like to
>> re-submitting David Woodhouse patch to exclude broken pages (see [1]).
>>
>> Therefore, I think the name init_node_pages() would not be suitable.
>> Please suggest a different name.
> 
> _init_heap_pages() then, as a helper of init_heap_pages()?

I am fine with your proposed name. That said...

> 
>>>> --- a/xen/common/page_alloc.c
>>>> +++ b/xen/common/page_alloc.c
>>>> @@ -1778,16 +1778,55 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>>>>    }
>>>>    
>>>>    /*
>>>> - * Hand the specified arbitrary page range to the specified heap zone
>>>> - * checking the node_id of the previous page.  If they differ and the
>>>> - * latter is not on a MAX_ORDER boundary, then we reserve the page by
>>>> - * not freeing it to the buddy allocator.
>>>> + * init_contig_heap_pages() is intended to only take pages from the same
>>>> + * NUMA node.
>>>>     */
>>>> +static bool is_contig_page(struct page_info *pg, unsigned int nid)
>>>> +{
>>>> +    return (nid == (phys_to_nid(page_to_maddr(pg))));
>>>> +}
>>>
>>> If such a helper is indeed needed, then I think it absolutely wants
>>> pg to be pointer-to-const. And imo it would also help readability if
>>> the extra pair of parentheses around the nested function calls was
>>> omitted. Given the naming concern, though, I wonder whether this
>>> wouldn't better be open-coded in the one place it is used. Actually
>>> naming is quite odd here beyond what I'd said further up: "Is this
>>> page contiguous?" Such a question requires two pages, i.e. "Are these
>>> two pages contiguous?" What you want to know is "Is this page on the
>>> given node?"
>>
>> There will be more check in the future (see next patch). I created an
>> helper because it reduces the size of the loop init_heap_pages(). I
>> would be OK to fold it if you strongly prefer that.
> 
> I don't "strongly" prefer that; I'd also be okay with a suitably named
> helper. Just that I can't seem to be able to come up with any good name.

... I am not sure what could be a suitable name for this helper. I will 
have a look at how bad the folded version looks.

> 
>>>> +/*
>>>> + * This function should only be called with valid pages from the same NUMA
>>>> + * node.
>>>> + *
>>>> + * Callers should use is_contig_page() first to check if all the pages
>>>> + * in a range are contiguous.
>>>> + */
>>>> +static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
>>>> +                                   bool need_scrub)
>>>> +{
>>>> +    unsigned long s, e;
>>>> +    unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>>> +
>>>> +    s = mfn_x(page_to_mfn(pg));
>>>> +    e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>>> +    if ( unlikely(!avail[nid]) )
>>>> +    {
>>>> +        bool use_tail = !(s & ((1UL << MAX_ORDER) - 1)) &&
>>>
>>> IS_ALIGNED(s, 1UL << MAX_ORDER) to "describe" what's meant?
>>
>> This is existing code and it is quite complex. So I would prefer if we
>> avoid to simplify and move the code in the same patch. I would be happy
>> to write a follow-up patch to switch to IS_ALIGNED().
> 
> I do realize it's code you move, but I can accept your desire to merely
> move the code without any cleanup. Personally I think that rather than a
> follow-up patch (which doesn't help the reviewing of this one) such an
> adjustment would better be a prereq one.

I will look for a prereq.

> 
>>>> @@ -1812,35 +1851,24 @@ static void init_heap_pages(
>>>>        spin_unlock(&heap_lock);
>>>>    
>>>>        if ( system_state < SYS_STATE_active && opt_bootscrub == BOOTSCRUB_IDLE )
>>>> -        idle_scrub = true;
>>>> +        need_scrub = true;
>>>>    
>>>> -    for ( i = 0; i < nr_pages; i++ )
>>>> +    for ( i = 0; i < nr_pages; )
>>>>        {
>>>> -        unsigned int nid = phys_to_nid(page_to_maddr(pg+i));
>>>> +        unsigned int nid = phys_to_nid(page_to_maddr(pg));
>>>> +        unsigned long left = nr_pages - i;
>>>> +        unsigned long contig_pages;
>>>>    
>>>> -        if ( unlikely(!avail[nid]) )
>>>> +        for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>>>>            {
>>>> -            unsigned long s = mfn_x(page_to_mfn(pg + i));
>>>> -            unsigned long e = mfn_x(mfn_add(page_to_mfn(pg + nr_pages - 1), 1));
>>>> -            bool use_tail = (nid == phys_to_nid(pfn_to_paddr(e - 1))) &&
>>>> -                            !(s & ((1UL << MAX_ORDER) - 1)) &&
>>>> -                            (find_first_set_bit(e) <= find_first_set_bit(s));
>>>> -            unsigned long n;
>>>> -
>>>> -            n = init_node_heap(nid, mfn_x(page_to_mfn(pg + i)), nr_pages - i,
>>>> -                               &use_tail);
>>>> -            BUG_ON(i + n > nr_pages);
>>>> -            if ( n && !use_tail )
>>>> -            {
>>>> -                i += n - 1;
>>>> -                continue;
>>>> -            }
>>>> -            if ( i + n == nr_pages )
>>>> +            if ( !is_contig_page(pg + contig_pages, nid) )
>>>>                    break;
>>>> -            nr_pages -= n;
>>>>            }
>>>
>>> Isn't doing this page by page in a loop quite inefficient? Can't you
>>> simply obtain the end of the node's range covering the first page, and
>>> then adjust "left" accordingly?
>>
>> The page by page is necessary because we may need to exclude some pages
>> (see [1]) or the range may cross multiple-zone (see [2]).
> 
> If you want/need to do this for "future" reasons (aiui [1] was never
> committed

You are correct. I would like to revive it at some point.

> , and you forgot to supply [2], but yes, splitting at zone
> boundaries is of course necessary)

Sorry. I meant to write patch #2:

20220609083039.76667-3-julien@xen.org


> , then I think this wants making quite
> clear in the description.

Good point. I will update the commit message.

Cheers,

-- 
Julien Grall



* Re: [PATCH 2/2] xen/heap: pass order to free_heap_pages() in heap init
  2022-06-09  8:30 ` [PATCH 2/2] xen/heap: pass order to free_heap_pages() in heap init Julien Grall
@ 2022-06-09 13:22   ` Jan Beulich
  2022-07-15 17:16     ` Julien Grall
  2022-06-28 14:40   ` Bertrand Marquis
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2022-06-09 13:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: bertrand.marquis, Hongyan Xia, Andrew Cooper, George Dunlap,
	Stefano Stabellini, Wei Liu, Julien Grall, xen-devel

On 09.06.2022 10:30, Julien Grall wrote:
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> The idea is to split the range into multiple aligned power-of-2 regions
> which only needs to call free_heap_pages() once each. We check the least
> significant set bit of the start address and use its bit index as the
> order of this increment. This makes sure that each increment is both
> power-of-2 and properly aligned, which can be safely passed to
> free_heap_pages(). Of course, the order also needs to be sanity checked
> against the upper bound and MAX_ORDER.
> 
> Testing on a nested environment on c5.metal with various amount
> of RAM. Time for end_boot_allocator() to complete:
>             Before         After
>     - 90GB: 1426 ms        166 ms
>     -  8GB:  124 ms         12 ms
>     -  4GB:   60 ms          6 ms
> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>
> ---
>  xen/common/page_alloc.c | 39 +++++++++++++++++++++++++++++++++------
>  1 file changed, 33 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index a1938df1406c..bf852cfc11ea 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -1779,16 +1779,28 @@ int query_page_offline(mfn_t mfn, uint32_t *status)
>  
>  /*
>   * init_contig_heap_pages() is intended to only take pages from the same
> - * NUMA node.
> + * NUMA node and zone.
> + *
> + * For the latter, it is always true for !CONFIG_SEPARATE_XENHEAP since
> + * free_heap_pages() can only take power-of-two ranges which never cross
> + * zone boundaries. But for separate xenheap which is manually defined,
> + * it is possible for a power-of-two range to cross zones, so we need to
> + * check that as well.
>   */
> -static bool is_contig_page(struct page_info *pg, unsigned int nid)
> +static bool is_contig_page(struct page_info *pg, unsigned int nid,
> +                           unsigned int zone)
>  {
> +#ifdef CONFIG_SEPARATE_XENHEAP
> +    if ( zone != page_to_zone(pg) )
> +        return false;
> +#endif
> +
>      return (nid == (phys_to_nid(page_to_maddr(pg))));
>  }
>  
>  /*
>   * This function should only be called with valid pages from the same NUMA
> - * node.
> + * node and the same zone.
>   *
>   * Callers should use is_contig_page() first to check if all the pages
>   * in a range are contiguous.
> @@ -1817,8 +1829,22 @@ static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
>  
>      while ( s < e )
>      {
> -        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
> -        s += 1UL;
> +        /*
> +         * For s == 0, we simply use the largest increment by checking the
> +         * index of the MSB set. For s != 0, we also need to ensure that the

"The MSB" reads as it it was not in line with the code; at least I would,
short of it being spelled out, assume it can only be the page's address
which is meant (as is the case for LSB below). But it's the MSB of the
range's size.

> +         * chunk is properly sized to end at power-of-two alignment. We do this
> +         * by checking the LSB set and use its index as the increment. Both
> +         * cases need to be guarded by MAX_ORDER.
> +         *
> +         * Note that the value of ffsl() and flsl() starts from 1 so we need
> +         * to decrement it by 1.
> +         */
> +        int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
> +
> +        if ( s )
> +            inc_order = min(inc_order, ffsl(s) - 1);
> +        free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub);
> +        s += (1UL << inc_order);
>      }
>  }
>  
> @@ -1856,12 +1882,13 @@ static void init_heap_pages(
>      for ( i = 0; i < nr_pages; )
>      {
>          unsigned int nid = phys_to_nid(page_to_maddr(pg));
> +        unsigned int zone = page_to_zone(pg);
>          unsigned long left = nr_pages - i;
>          unsigned long contig_pages;
>  
>          for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>          {
> -            if ( !is_contig_page(pg + contig_pages, nid) )
> +            if ( !is_contig_page(pg + contig_pages, nid, zone) )
>                  break;
>          }

Coming back to your reply to my comment on patch 1: I think this
addition of the zone check is actually an argument for inlining the
function's body here (and then dropping the function). That way the
separate-Xen-heap aspect is visible at the place where it matters,
rather than requiring an indirection via looking at the helper
function (and leaving a dead parameter in the opposite case). But as
said - I'm not going to insist as long as the helper function has a
suitable name (representing what it does and not misguiding anyone
with the common "contig"-means-addresses implication).

Jan



* Re: [PATCH 0/2] xen/mm: Optimize init_heap_pages()
  2022-06-09  8:30 [PATCH 0/2] xen/mm: Optimize init_heap_pages() Julien Grall
  2022-06-09  8:30 ` [PATCH 1/2] xen/heap: Split init_heap_pages() in two Julien Grall
  2022-06-09  8:30 ` [PATCH 2/2] xen/heap: pass order to free_heap_pages() in heap init Julien Grall
@ 2022-06-10  9:36 ` Julien Grall
  2 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2022-06-10  9:36 UTC (permalink / raw)
  To: xen-devel
  Cc: bertrand.marquis, Julien Grall, Andrew Cooper, George Dunlap,
	Jan Beulich, Stefano Stabellini, Wei Liu

On 09/06/2022 09:30, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> Hi all,
> 
> As part of the Live-Update work, we noticed that a big part Xen boot
> is spent to add pages in the heap. For instance, on when running Xen
> in a nested envionment on a c5.metal, it takes ~1.5s.

On IRC, Bertrand asked me how I measured the time taken here. I will 
share it on xen-devel so everyone can use it. Note the patch is 
x86-specific, but could easily be adapted for Arm.

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 53a73010e029..d99b9f3abf5e 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -615,10 +615,16 @@ static inline bool using_2M_mapping(void)
             !l1_table_offset((unsigned long)__2M_rwdata_end);
  }

+extern uint64_t myticks;
+
  static void noreturn init_done(void)
  {
      void *va;
      unsigned long start, end;
+    uint64_t elapsed = tsc_ticks2ns(myticks);
+
+    printk("elapsed %lu ms %lu ns\n", elapsed / MILLISECS(1),
+           elapsed % MILLISECS(1));

      system_state = SYS_STATE_active;

diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
index ea59cd1a4aba..3e6504283f1e 100644
--- a/xen/common/page_alloc.c
+++ b/xen/common/page_alloc.c
@@ -1865,9 +1865,12 @@ static unsigned long avail_heap_pages(
      return free_pages;
  }

+uint64_t myticks;
+
  void __init end_boot_allocator(void)
  {
      unsigned int i;
+    uint64_t stsc = rdtsc_ordered();

      /* Pages that are free now go to the domain sub-allocator. */
      for ( i = 0; i < nr_bootmem_regions; i++ )
@@ -1892,6 +1895,8 @@ void __init end_boot_allocator(void)
      if ( !dma_bitsize && (num_online_nodes() > 1) )
          dma_bitsize = arch_get_dma_bitsize();

+    myticks = rdtsc_ordered() - stsc;
+
      printk("Domain heap initialised");
      if ( dma_bitsize )
          printk(" DMA width %u bits", dma_bitsize);

Cheers,

-- 
Julien Grall



* Re: [PATCH 2/2] xen/heap: pass order to free_heap_pages() in heap init
  2022-06-09  8:30 ` [PATCH 2/2] xen/heap: pass order to free_heap_pages() in heap init Julien Grall
  2022-06-09 13:22   ` Jan Beulich
@ 2022-06-28 14:40   ` Bertrand Marquis
  2022-07-01 18:03     ` Julien Grall
  1 sibling, 1 reply; 12+ messages in thread
From: Bertrand Marquis @ 2022-06-28 14:40 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, Hongyan Xia, Andrew Cooper, George Dunlap,
	Jan Beulich, Stefano Stabellini, Wei Liu, Julien Grall

Hi Julien,

> On 9 Jun 2022, at 09:30, Julien Grall <julien@xen.org> wrote:
> 
> From: Hongyan Xia <hongyxia@amazon.com>
> 
> The idea is to split the range into multiple aligned power-of-2 regions
> which only needs to call free_heap_pages() once each. We check the least
> significant set bit of the start address and use its bit index as the
> order of this increment. This makes sure that each increment is both
> power-of-2 and properly aligned, which can be safely passed to
> free_heap_pages(). Of course, the order also needs to be sanity checked
> against the upper bound and MAX_ORDER.
> 
> Testing on a nested environment on c5.metal with various amount
> of RAM. Time for end_boot_allocator() to complete:
>            Before         After
>    - 90GB: 1426 ms        166 ms
>    -  8GB:  124 ms         12 ms
>    -  4GB:   60 ms          6 ms


On an arm64 Neoverse N1 system with 32GB of RAM I have:
- 1180 ms before
- 63 ms after

and my internal tests are passing on arm64.

Great optimisation :-)

(I will do a full review of the code in a second step).

> 
> Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Cheers
Bertrand




* Re: [PATCH 2/2] xen/heap: pass order to free_heap_pages() in heap init
  2022-06-28 14:40   ` Bertrand Marquis
@ 2022-07-01 18:03     ` Julien Grall
  0 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2022-07-01 18:03 UTC (permalink / raw)
  To: Bertrand Marquis
  Cc: xen-devel, Hongyan Xia, Andrew Cooper, George Dunlap,
	Jan Beulich, Stefano Stabellini, Wei Liu, Julien Grall



On 28/06/2022 15:40, Bertrand Marquis wrote:
> Hi Julien,

Hi Bertrand,

>> On 9 Jun 2022, at 09:30, Julien Grall <julien@xen.org> wrote:
>>
>> From: Hongyan Xia <hongyxia@amazon.com>
>>
>> The idea is to split the range into multiple aligned power-of-2 regions
>> which only needs to call free_heap_pages() once each. We check the least
>> significant set bit of the start address and use its bit index as the
>> order of this increment. This makes sure that each increment is both
>> power-of-2 and properly aligned, which can be safely passed to
>> free_heap_pages(). Of course, the order also needs to be sanity checked
>> against the upper bound and MAX_ORDER.
>>
>> Testing on a nested environment on c5.metal with various amount
>> of RAM. Time for end_boot_allocator() to complete:
>>             Before         After
>>     - 90GB: 1426 ms        166 ms
>>     -  8GB:  124 ms         12 ms
>>     -  4GB:   60 ms          6 ms
> 
> 
> On a arm64 Neoverse N1 system with 32GB of Ram I have:
> - 1180 ms before
> - 63 ms after
> 
> and my internal tests are passing on arm64.

Thanks for the testing! The numbers are a lot better than I was actually 
expecting on arm64.

> 
> Great optimisation :-)

You will have to thank Hongyan. He came up with the idea :).

> 
> (I will do a full review of code the in a second step).

I am planning to send a new version in the next few days. So you may 
want to wait before reviewing the series.

Cheers,

-- 
Julien Grall



* Re: [PATCH 2/2] xen/heap: pass order to free_heap_pages() in heap init
  2022-06-09 13:22   ` Jan Beulich
@ 2022-07-15 17:16     ` Julien Grall
  0 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2022-07-15 17:16 UTC (permalink / raw)
  To: Jan Beulich
  Cc: bertrand.marquis, Hongyan Xia, Andrew Cooper, George Dunlap,
	Stefano Stabellini, Wei Liu, Julien Grall, xen-devel

Hi Jan,

On 09/06/2022 14:22, Jan Beulich wrote:
>>   /*
>>    * This function should only be called with valid pages from the same NUMA
>> - * node.
>> + * node and the same zone.
>>    *
>>    * Callers should use is_contig_page() first to check if all the pages
>>    * in a range are contiguous.
>> @@ -1817,8 +1829,22 @@ static void init_contig_heap_pages(struct page_info *pg, unsigned long nr_pages,
>>   
>>       while ( s < e )
>>       {
>> -        free_heap_pages(mfn_to_page(_mfn(s)), 0, need_scrub);
>> -        s += 1UL;
>> +        /*
>> +         * For s == 0, we simply use the largest increment by checking the
>> +         * index of the MSB set. For s != 0, we also need to ensure that the
> 
> "The MSB" reads as it it was not in line with the code; at least I would,
> short of it being spelled out, assume it can only be the page's address
> which is meant (as is the case for LSB below). But it's the MSB of the
> range's size.

Indeed. I have reworded the comment to:

         /*
          * For s == 0, we simply use the largest increment by checking the
          * MSB of the region size. For s != 0, we also need to ensure that
          * the chunk is properly sized to end at power-of-two alignment.
          * We do this by checking the LSB of the start address and use its
          * index as the increment. Both cases need to be guarded by
          * MAX_ORDER.
          *
          * Note that the value of ffsl() and flsl() starts from 1 so we
          * need to decrement it by 1.
          */
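
As a purely illustrative example (numbers not taken from the patch): with
s = 0 and e = 0x1800, flsl(e - s) - 1 = 12, so the first call frees a
2^12-page chunk; the loop then continues with s = 0x1000, where
flsl(e - s) - 1 = 11 and ffsl(s) - 1 = 12, so the remaining 2^11 pages
are freed in one final call.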

> 
>> +         * chunk is properly sized to end at power-of-two alignment. We do this
>> +         * by checking the LSB set and use its index as the increment. Both
>> +         * cases need to be guarded by MAX_ORDER.
>> +         *
>> +         * Note that the value of ffsl() and flsl() starts from 1 so we need
>> +         * to decrement it by 1.
>> +         */
>> +        int inc_order = min(MAX_ORDER, flsl(e - s) - 1);
>> +
>> +        if ( s )
>> +            inc_order = min(inc_order, ffsl(s) - 1);
>> +        free_heap_pages(mfn_to_page(_mfn(s)), inc_order, need_scrub);
>> +        s += (1UL << inc_order);
>>       }
>>   }
>>   
>> @@ -1856,12 +1882,13 @@ static void init_heap_pages(
>>       for ( i = 0; i < nr_pages; )
>>       {
>>           unsigned int nid = phys_to_nid(page_to_maddr(pg));
>> +        unsigned int zone = page_to_zone(pg);
>>           unsigned long left = nr_pages - i;
>>           unsigned long contig_pages;
>>   
>>           for ( contig_pages = 1; contig_pages < left; contig_pages++ )
>>           {
>> -            if ( !is_contig_page(pg + contig_pages, nid) )
>> +            if ( !is_contig_page(pg + contig_pages, nid, zone) )
>>                   break;
>>           }
> 
> Coming back to your reply to my comment on patch 1: I think this
> addition of the node check is actually an argument for inlining the
> function's body here (and then dropping the function). That way the
> separate-Xen-heap aspect is visible at the place where it matters,
> rather than requiring an indirection via looking at the helper
> function (and leaving a dead parameter in the opposite case). But as
> said - I'm not going to insist as long as the helper function has a
> suitable name (representing what it does and not misguiding anyone
> with the common "contig"-means-addresses implication).

I have followed your suggestion in patch #1:
   * is_contig_page() is now inlined
   * init_contig_heap_pages() was renamed to _init_heap_pages()
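
A rough idea of what the folded check could look like in the loop (a
sketch only, not the actual v2 hunk):

    for ( contig_pages = 1; contig_pages < left; contig_pages++ )
    {
        /* Stop the run when the next page is on a different NUMA node. */
        if ( phys_to_nid(page_to_maddr(pg + contig_pages)) != nid )
            break;
    #ifdef CONFIG_SEPARATE_XENHEAP
        /* ... or in a different zone (only relevant with separate xenheap). */
        if ( page_to_zone(pg + contig_pages) != zone )
            break;
    #endif
    }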

Cheers,

-- 
Julien Grall


