linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter
  2021-06-25  1:47 ` [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter Gavin Shan
@ 2021-06-25  1:14   ` Alexander Duyck
  2021-06-25  3:58     ` Gavin Shan
  2021-06-25  5:53   ` Michael S. Tsirkin
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Duyck @ 2021-06-25  1:14 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-mm, LKML, David Hildenbrand, Michael S. Tsirkin,
	Andrew Morton, Anshuman Khandual, Catalin Marinas, Will Deacon,
	shan.gavin

On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote:
>
> The macro PAGE_REPORTING_MIN_ORDER is defined as the page reporting
> threshold. It can't be adjusted at runtime.
>
> This introduces a variable (@page_reporting_order) to replace the
> marcro (PAGE_REPORTING_MIN_ORDER). MAX_ORDER is assigned to it initially,
> meaning the page reporting is disabled. It will be specified by driver
> if valid one is provided. Otherwise, it will fall back to @pageblock_order.
> It's also exported so that the page reporting order can be adjusted at
> runtime.
>
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>

So this patch looks like it is technically broken. We need a line in
page_reporting_register that will overwrite the value with
pageblock_order if it is less than page_reporting_order.

> ---
>  Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
>  mm/page_reporting.c                             | 9 +++++++--
>  mm/page_reporting.h                             | 5 ++---
>  3 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index cb89dbdedc46..566c4b9af3cd 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3566,6 +3566,12 @@
>                         off: turn off poisoning (default)
>                         on: turn on poisoning
>
> +       page_reporting.page_reporting_order=
> +                       [KNL] Minimal page reporting order
> +                       Format: <integer>
> +                       Adjust the minimal page reporting order. The page
> +                       reporting is disabled when it exceeds (MAX_ORDER-1).
> +
>         panic=          [KNL] Kernel behaviour on panic: delay <timeout>
>                         timeout > 0: seconds before rebooting
>                         timeout = 0: wait forever
> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
> index df9c5054e1b4..34bf4d26c2c4 100644
> --- a/mm/page_reporting.c
> +++ b/mm/page_reporting.c
> @@ -4,12 +4,17 @@
>  #include <linux/page_reporting.h>
>  #include <linux/gfp.h>
>  #include <linux/export.h>
> +#include <linux/module.h>
>  #include <linux/delay.h>
>  #include <linux/scatterlist.h>
>
>  #include "page_reporting.h"
>  #include "internal.h"
>
> +unsigned int page_reporting_order = MAX_ORDER;
> +module_param(page_reporting_order, uint, 0644);
> +MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
> +
>  #define PAGE_REPORTING_DELAY   (2 * HZ)
>  static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
>
> @@ -229,7 +234,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
>
>         /* Generate minimum watermark to be able to guarantee progress */
>         watermark = low_wmark_pages(zone) +
> -                   (PAGE_REPORTING_CAPACITY << PAGE_REPORTING_MIN_ORDER);
> +                   (PAGE_REPORTING_CAPACITY << page_reporting_order);
>
>         /*
>          * Cancel request if insufficient free memory or if we failed
> @@ -239,7 +244,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
>                 return err;
>
>         /* Process each free list starting from lowest order/mt */
> -       for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
> +       for (order = page_reporting_order; order < MAX_ORDER; order++) {
>                 for (mt = 0; mt < MIGRATE_TYPES; mt++) {
>                         /* We do not pull pages from the isolate free list */
>                         if (is_migrate_isolate(mt))
> diff --git a/mm/page_reporting.h b/mm/page_reporting.h
> index 2c385dd4ddbd..c51dbc228b94 100644
> --- a/mm/page_reporting.h
> +++ b/mm/page_reporting.h
> @@ -10,10 +10,9 @@
>  #include <linux/pgtable.h>
>  #include <linux/scatterlist.h>
>
> -#define PAGE_REPORTING_MIN_ORDER       pageblock_order
> -
>  #ifdef CONFIG_PAGE_REPORTING
>  DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
> +extern unsigned int page_reporting_order;
>  void __page_reporting_notify(void);
>
>  static inline bool page_reported(struct page *page)
> @@ -38,7 +37,7 @@ static inline void page_reporting_notify_free(unsigned int order)
>                 return;
>
>         /* Determine if we have crossed reporting threshold */
> -       if (order < PAGE_REPORTING_MIN_ORDER)
> +       if (order < page_reporting_order)
>                 return;
>
>         /* This will add a few cycles, but should be called infrequently */
> --
> 2.23.0
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order
  2021-06-25  1:47 ` [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order Gavin Shan
@ 2021-06-25  1:19   ` Alexander Duyck
  2021-06-25  4:00     ` Gavin Shan
  2021-06-25  5:48   ` Michael S. Tsirkin
  1 sibling, 1 reply; 16+ messages in thread
From: Alexander Duyck @ 2021-06-25  1:19 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-mm, LKML, David Hildenbrand, Michael S. Tsirkin,
	Andrew Morton, Anshuman Khandual, Catalin Marinas, Will Deacon,
	shan.gavin

On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote:
>
> The page reporting order (threshold) is sticky to @pageblock_order
> by default. The page reporting can never be triggered because the
> freeing page can't come up with a free area like that huge. The
> situation becomes worse when the system memory becomes heavily
> fragmented.
>
> For example, the following configurations are used on ARM64 when 64KB
> base page size is enabled. In this specific case, the page reporting
> won't be triggered until the freeing page comes up with a 512MB free
> area. That's hard to be met, especially when the system memory becomes
> heavily fragmented.
>
>    PAGE_SIZE:          64KB
>    HPAGE_SIZE:         512MB
>    pageblock_order:    13       (512MB)
>    MAX_ORDER:          14
>
> This allows the drivers to specify the page reporting order when the
> page reporting device is registered. It falls back to @pageblock_order
> if it's not specified by the driver. The existing users (hv_balloon
> and virtio_balloon) don't specify it and @pageblock_order is still
> taken as their page reporting order. So this shouldn't introduce any
> functional changes.
>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
>  include/linux/page_reporting.h | 3 +++
>  mm/page_reporting.c            | 6 ++++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
> index 3b99e0ec24f2..fe648dfa3a7c 100644
> --- a/include/linux/page_reporting.h
> +++ b/include/linux/page_reporting.h
> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>
>         /* Current state of page reporting */
>         atomic_t state;
> +
> +       /* Minimal order of page reporting */
> +       unsigned int order;
>  };
>
>  /* Tear-down and bring-up for page reporting devices */
> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
> index 34bf4d26c2c4..382958eef8a9 100644
> --- a/mm/page_reporting.c
> +++ b/mm/page_reporting.c
> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>                 goto err_out;
>         }
>
> +       /*
> +        * Update the page reporting order if it's specified by driver.
> +        * Otherwise, it falls back to @pageblock_order.
> +        */
> +       page_reporting_order = prdev->order ? : pageblock_order;
> +

An alternative to this would be to look at setting up some
comparisons. I might add another variable and do something like:
order = prdev->order ? : pageblock_order;
if (order < page_reporting_order)
    page_reporting_order = order;

You could essentially do something similar in the previous patch but
just use pageblock_order directly rather than having to add a local
variable.

That way if you need to still pull down the page reporting order you
can do so without prdev->order or pageblock_order overwriting the
value and pushing it back up.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 0/4] mm/page_reporting: Make page reporting work on arm64 with 64KB page size
@ 2021-06-25  1:47 Gavin Shan
  2021-06-25  1:47 ` [PATCH v4 1/4] mm/page_reporting: Fix code style in __page_reporting_request() Gavin Shan
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  1:47 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, alexander.duyck, david, mst, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

The page reporting threshold is currently equal to @pageblock_order, which
is 13 and 512MB on arm64 with 64KB base page size selected. The page
reporting won't be triggered if the freeing page can't come up with a free
area like that huge. The condition is hard to be met, especially when the
system memory becomes fragmented.

This series intends to solve the issue by having page reporting threshold
as 5 (2MB) on arm64 with 64KB base page size. The patches are organized as:

   PATCH[1/4] Fix some coding style in __page_reporting_request().
   PATCH[2/4] Represents page reporting order with variable so that it can
              be exported as module parameter.
   PATCH[3/4] Allows the device driver (e.g. virtio_balloon) to specify
              the page reporting order when the device info is registered.
   PATCH[4/4] Specifies the page reporting order to 5, corresponding to
              2MB in size on ARM64 when 64KB base page size is used.

Changelog
=========
v4:
   * Set @page_reporting_order to MAX_ORDER. Its value is
     specified by the driver or falls back to @pageblock_order
     when page reporting device is registered.                    (Alex)
   * Include "module.h" in page_reporting.c                       (Andrew)
v3:
   * Avoid overhead introduced by function all                    (Alex)
   * Export page reporting order as module parameter              (Gavin)
v2:
   * Rewrite the patches as Alex suggested                        (Alex)

Gavin Shan (4):
  mm/page_reporting: Fix code style in __page_reporting_request()
  mm/page_reporting: Export reporting order as module parameter
  mm/page_reporting: Allow driver to specify reporting order
  virtio_balloon: Specify page reporting order if needed

 .../admin-guide/kernel-parameters.txt         |  6 ++++++
 drivers/virtio/virtio_balloon.c               | 17 +++++++++++++++++
 include/linux/page_reporting.h                |  3 +++
 mm/page_reporting.c                           | 19 +++++++++++++++----
 mm/page_reporting.h                           |  5 ++---
 5 files changed, 43 insertions(+), 7 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 1/4] mm/page_reporting: Fix code style in __page_reporting_request()
  2021-06-25  1:47 [PATCH v4 0/4] mm/page_reporting: Make page reporting work on arm64 with 64KB page size Gavin Shan
@ 2021-06-25  1:47 ` Gavin Shan
  2021-06-25  1:47 ` [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter Gavin Shan
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  1:47 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, alexander.duyck, david, mst, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

The lines of comments would be starting with one, instead two space.
This corrects the style.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
---
 mm/page_reporting.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index c50d93ffa252..df9c5054e1b4 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -31,8 +31,8 @@ __page_reporting_request(struct page_reporting_dev_info *prdev)
 		return;
 
 	/*
-	 *  If reporting is already active there is nothing we need to do.
-	 *  Test against 0 as that represents PAGE_REPORTING_IDLE.
+	 * If reporting is already active there is nothing we need to do.
+	 * Test against 0 as that represents PAGE_REPORTING_IDLE.
 	 */
 	state = atomic_xchg(&prdev->state, PAGE_REPORTING_REQUESTED);
 	if (state != PAGE_REPORTING_IDLE)
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter
  2021-06-25  1:47 [PATCH v4 0/4] mm/page_reporting: Make page reporting work on arm64 with 64KB page size Gavin Shan
  2021-06-25  1:47 ` [PATCH v4 1/4] mm/page_reporting: Fix code style in __page_reporting_request() Gavin Shan
@ 2021-06-25  1:47 ` Gavin Shan
  2021-06-25  1:14   ` Alexander Duyck
  2021-06-25  5:53   ` Michael S. Tsirkin
  2021-06-25  1:47 ` [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order Gavin Shan
  2021-06-25  1:47 ` [PATCH v4 4/4] virtio_balloon: Specify page reporting order if needed Gavin Shan
  3 siblings, 2 replies; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  1:47 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, alexander.duyck, david, mst, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

The macro PAGE_REPORTING_MIN_ORDER is defined as the page reporting
threshold. It can't be adjusted at runtime.

This introduces a variable (@page_reporting_order) to replace the
marcro (PAGE_REPORTING_MIN_ORDER). MAX_ORDER is assigned to it initially,
meaning the page reporting is disabled. It will be specified by driver
if valid one is provided. Otherwise, it will fall back to @pageblock_order.
It's also exported so that the page reporting order can be adjusted at
runtime.

Suggested-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
 mm/page_reporting.c                             | 9 +++++++--
 mm/page_reporting.h                             | 5 ++---
 3 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..566c4b9af3cd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3566,6 +3566,12 @@
 			off: turn off poisoning (default)
 			on: turn on poisoning
 
+	page_reporting.page_reporting_order=
+			[KNL] Minimal page reporting order
+			Format: <integer>
+			Adjust the minimal page reporting order. The page
+			reporting is disabled when it exceeds (MAX_ORDER-1).
+
 	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
 			timeout > 0: seconds before rebooting
 			timeout = 0: wait forever
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index df9c5054e1b4..34bf4d26c2c4 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -4,12 +4,17 @@
 #include <linux/page_reporting.h>
 #include <linux/gfp.h>
 #include <linux/export.h>
+#include <linux/module.h>
 #include <linux/delay.h>
 #include <linux/scatterlist.h>
 
 #include "page_reporting.h"
 #include "internal.h"
 
+unsigned int page_reporting_order = MAX_ORDER;
+module_param(page_reporting_order, uint, 0644);
+MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
+
 #define PAGE_REPORTING_DELAY	(2 * HZ)
 static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
 
@@ -229,7 +234,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 
 	/* Generate minimum watermark to be able to guarantee progress */
 	watermark = low_wmark_pages(zone) +
-		    (PAGE_REPORTING_CAPACITY << PAGE_REPORTING_MIN_ORDER);
+		    (PAGE_REPORTING_CAPACITY << page_reporting_order);
 
 	/*
 	 * Cancel request if insufficient free memory or if we failed
@@ -239,7 +244,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
 		return err;
 
 	/* Process each free list starting from lowest order/mt */
-	for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
+	for (order = page_reporting_order; order < MAX_ORDER; order++) {
 		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
 			/* We do not pull pages from the isolate free list */
 			if (is_migrate_isolate(mt))
diff --git a/mm/page_reporting.h b/mm/page_reporting.h
index 2c385dd4ddbd..c51dbc228b94 100644
--- a/mm/page_reporting.h
+++ b/mm/page_reporting.h
@@ -10,10 +10,9 @@
 #include <linux/pgtable.h>
 #include <linux/scatterlist.h>
 
-#define PAGE_REPORTING_MIN_ORDER	pageblock_order
-
 #ifdef CONFIG_PAGE_REPORTING
 DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
+extern unsigned int page_reporting_order;
 void __page_reporting_notify(void);
 
 static inline bool page_reported(struct page *page)
@@ -38,7 +37,7 @@ static inline void page_reporting_notify_free(unsigned int order)
 		return;
 
 	/* Determine if we have crossed reporting threshold */
-	if (order < PAGE_REPORTING_MIN_ORDER)
+	if (order < page_reporting_order)
 		return;
 
 	/* This will add a few cycles, but should be called infrequently */
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order
  2021-06-25  1:47 [PATCH v4 0/4] mm/page_reporting: Make page reporting work on arm64 with 64KB page size Gavin Shan
  2021-06-25  1:47 ` [PATCH v4 1/4] mm/page_reporting: Fix code style in __page_reporting_request() Gavin Shan
  2021-06-25  1:47 ` [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter Gavin Shan
@ 2021-06-25  1:47 ` Gavin Shan
  2021-06-25  1:19   ` Alexander Duyck
  2021-06-25  5:48   ` Michael S. Tsirkin
  2021-06-25  1:47 ` [PATCH v4 4/4] virtio_balloon: Specify page reporting order if needed Gavin Shan
  3 siblings, 2 replies; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  1:47 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, alexander.duyck, david, mst, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

The page reporting order (threshold) is sticky to @pageblock_order
by default. The page reporting can never be triggered because the
freeing page can't come up with a free area like that huge. The
situation becomes worse when the system memory becomes heavily
fragmented.

For example, the following configurations are used on ARM64 when 64KB
base page size is enabled. In this specific case, the page reporting
won't be triggered until the freeing page comes up with a 512MB free
area. That's hard to be met, especially when the system memory becomes
heavily fragmented.

   PAGE_SIZE:          64KB
   HPAGE_SIZE:         512MB
   pageblock_order:    13       (512MB)
   MAX_ORDER:          14

This allows the drivers to specify the page reporting order when the
page reporting device is registered. It falls back to @pageblock_order
if it's not specified by the driver. The existing users (hv_balloon
and virtio_balloon) don't specify it and @pageblock_order is still
taken as their page reporting order. So this shouldn't introduce any
functional changes.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
---
 include/linux/page_reporting.h | 3 +++
 mm/page_reporting.c            | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
index 3b99e0ec24f2..fe648dfa3a7c 100644
--- a/include/linux/page_reporting.h
+++ b/include/linux/page_reporting.h
@@ -18,6 +18,9 @@ struct page_reporting_dev_info {
 
 	/* Current state of page reporting */
 	atomic_t state;
+
+	/* Minimal order of page reporting */
+	unsigned int order;
 };
 
 /* Tear-down and bring-up for page reporting devices */
diff --git a/mm/page_reporting.c b/mm/page_reporting.c
index 34bf4d26c2c4..382958eef8a9 100644
--- a/mm/page_reporting.c
+++ b/mm/page_reporting.c
@@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
 		goto err_out;
 	}
 
+	/*
+	 * Update the page reporting order if it's specified by driver.
+	 * Otherwise, it falls back to @pageblock_order.
+	 */
+	page_reporting_order = prdev->order ? : pageblock_order;
+
 	/* initialize state and work structures */
 	atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
 	INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v4 4/4] virtio_balloon: Specify page reporting order if needed
  2021-06-25  1:47 [PATCH v4 0/4] mm/page_reporting: Make page reporting work on arm64 with 64KB page size Gavin Shan
                   ` (2 preceding siblings ...)
  2021-06-25  1:47 ` [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order Gavin Shan
@ 2021-06-25  1:47 ` Gavin Shan
  2021-06-25  5:57   ` Michael S. Tsirkin
  3 siblings, 1 reply; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  1:47 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, alexander.duyck, david, mst, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

The page reporting won't be triggered if the freeing page can't come
up with a free area, whose size is equal or bigger than the threshold
(page reporting order). The default page reporting order, equal to
@pageblock_order, is too huge on some architectures to trigger page
reporting. One example is ARM64 when 64KB base page size is used.

      PAGE_SIZE:          64KB
      pageblock_order:    13       (512MB)
      MAX_ORDER:          14

This specifies the page reporting order to 5 (2MB) for this specific
case so that page reporting can be triggered.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
---
 drivers/virtio/virtio_balloon.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 510e9318854d..47dce91f788c 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -993,6 +993,23 @@ static int virtballoon_probe(struct virtio_device *vdev)
 			goto out_unregister_oom;
 		}
 
+		/*
+		 * The default page reporting order is @pageblock_order, which
+		 * corresponds to 512MB in size on ARM64 when 64KB base page
+		 * size is used. The page reporting won't be triggered if the
+		 * freeing page can't come up with a free area like that huge.
+		 * So we specify the page reporting order to 5, corresponding
+		 * to 2MB. It helps to avoid THP splitting if 4KB base page
+		 * size is used by host.
+		 *
+		 * Ideally, the page reporting order is selected based on the
+		 * host's base page size. However, it needs more work to report
+		 * that value. The hard-coded order would be fine currently.
+		 */
+#if defined(CONFIG_ARM64) && defined(CONFIG_ARM64_64K_PAGES)
+		vb->pr_dev_info.order = 5;
+#endif
+
 		err = page_reporting_register(&vb->pr_dev_info);
 		if (err)
 			goto out_unregister_oom;
-- 
2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter
  2021-06-25  1:14   ` Alexander Duyck
@ 2021-06-25  3:58     ` Gavin Shan
  0 siblings, 0 replies; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  3:58 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: linux-mm, LKML, David Hildenbrand, Michael S. Tsirkin,
	Andrew Morton, Anshuman Khandual, Catalin Marinas, Will Deacon,
	shan.gavin

On 6/25/21 11:14 AM, Alexander Duyck wrote:
> On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote:
>>
>> The macro PAGE_REPORTING_MIN_ORDER is defined as the page reporting
>> threshold. It can't be adjusted at runtime.
>>
>> This introduces a variable (@page_reporting_order) to replace the
>> marcro (PAGE_REPORTING_MIN_ORDER). MAX_ORDER is assigned to it initially,
>> meaning the page reporting is disabled. It will be specified by driver
>> if valid one is provided. Otherwise, it will fall back to @pageblock_order.
>> It's also exported so that the page reporting order can be adjusted at
>> runtime.
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> 
> So this patch looks like it is technically broken. We need a line in
> page_reporting_register that will overwrite the value with
> pageblock_order if it is less than page_reporting_order.
> 

Yes, but it's not broken after next (PATCH[3/4]) is applied.
However, It's still worthy to be fixed, to be "git bisect"
friendly. I'll update in v5.

Thanks,
Gavin

>> ---
>>   Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
>>   mm/page_reporting.c                             | 9 +++++++--
>>   mm/page_reporting.h                             | 5 ++---
>>   3 files changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index cb89dbdedc46..566c4b9af3cd 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3566,6 +3566,12 @@
>>                          off: turn off poisoning (default)
>>                          on: turn on poisoning
>>
>> +       page_reporting.page_reporting_order=
>> +                       [KNL] Minimal page reporting order
>> +                       Format: <integer>
>> +                       Adjust the minimal page reporting order. The page
>> +                       reporting is disabled when it exceeds (MAX_ORDER-1).
>> +
>>          panic=          [KNL] Kernel behaviour on panic: delay <timeout>
>>                          timeout > 0: seconds before rebooting
>>                          timeout = 0: wait forever
>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>> index df9c5054e1b4..34bf4d26c2c4 100644
>> --- a/mm/page_reporting.c
>> +++ b/mm/page_reporting.c
>> @@ -4,12 +4,17 @@
>>   #include <linux/page_reporting.h>
>>   #include <linux/gfp.h>
>>   #include <linux/export.h>
>> +#include <linux/module.h>
>>   #include <linux/delay.h>
>>   #include <linux/scatterlist.h>
>>
>>   #include "page_reporting.h"
>>   #include "internal.h"
>>
>> +unsigned int page_reporting_order = MAX_ORDER;
>> +module_param(page_reporting_order, uint, 0644);
>> +MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
>> +
>>   #define PAGE_REPORTING_DELAY   (2 * HZ)
>>   static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
>>
>> @@ -229,7 +234,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
>>
>>          /* Generate minimum watermark to be able to guarantee progress */
>>          watermark = low_wmark_pages(zone) +
>> -                   (PAGE_REPORTING_CAPACITY << PAGE_REPORTING_MIN_ORDER);
>> +                   (PAGE_REPORTING_CAPACITY << page_reporting_order);
>>
>>          /*
>>           * Cancel request if insufficient free memory or if we failed
>> @@ -239,7 +244,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
>>                  return err;
>>
>>          /* Process each free list starting from lowest order/mt */
>> -       for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
>> +       for (order = page_reporting_order; order < MAX_ORDER; order++) {
>>                  for (mt = 0; mt < MIGRATE_TYPES; mt++) {
>>                          /* We do not pull pages from the isolate free list */
>>                          if (is_migrate_isolate(mt))
>> diff --git a/mm/page_reporting.h b/mm/page_reporting.h
>> index 2c385dd4ddbd..c51dbc228b94 100644
>> --- a/mm/page_reporting.h
>> +++ b/mm/page_reporting.h
>> @@ -10,10 +10,9 @@
>>   #include <linux/pgtable.h>
>>   #include <linux/scatterlist.h>
>>
>> -#define PAGE_REPORTING_MIN_ORDER       pageblock_order
>> -
>>   #ifdef CONFIG_PAGE_REPORTING
>>   DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
>> +extern unsigned int page_reporting_order;
>>   void __page_reporting_notify(void);
>>
>>   static inline bool page_reported(struct page *page)
>> @@ -38,7 +37,7 @@ static inline void page_reporting_notify_free(unsigned int order)
>>                  return;
>>
>>          /* Determine if we have crossed reporting threshold */
>> -       if (order < PAGE_REPORTING_MIN_ORDER)
>> +       if (order < page_reporting_order)
>>                  return;
>>
>>          /* This will add a few cycles, but should be called infrequently */
>> --
>> 2.23.0
>>
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order
  2021-06-25  1:19   ` Alexander Duyck
@ 2021-06-25  4:00     ` Gavin Shan
  2021-06-25  4:24       ` Gavin Shan
  0 siblings, 1 reply; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  4:00 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: linux-mm, LKML, David Hildenbrand, Michael S. Tsirkin,
	Andrew Morton, Anshuman Khandual, Catalin Marinas, Will Deacon,
	shan.gavin

On 6/25/21 11:19 AM, Alexander Duyck wrote:
> On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote:
>>
>> The page reporting order (threshold) is sticky to @pageblock_order
>> by default. The page reporting can never be triggered because the
>> freeing page can't come up with a free area like that huge. The
>> situation becomes worse when the system memory becomes heavily
>> fragmented.
>>
>> For example, the following configurations are used on ARM64 when 64KB
>> base page size is enabled. In this specific case, the page reporting
>> won't be triggered until the freeing page comes up with a 512MB free
>> area. That's hard to be met, especially when the system memory becomes
>> heavily fragmented.
>>
>>     PAGE_SIZE:          64KB
>>     HPAGE_SIZE:         512MB
>>     pageblock_order:    13       (512MB)
>>     MAX_ORDER:          14
>>
>> This allows the drivers to specify the page reporting order when the
>> page reporting device is registered. It falls back to @pageblock_order
>> if it's not specified by the driver. The existing users (hv_balloon
>> and virtio_balloon) don't specify it and @pageblock_order is still
>> taken as their page reporting order. So this shouldn't introduce any
>> functional changes.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>> ---
>>   include/linux/page_reporting.h | 3 +++
>>   mm/page_reporting.c            | 6 ++++++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
>> index 3b99e0ec24f2..fe648dfa3a7c 100644
>> --- a/include/linux/page_reporting.h
>> +++ b/include/linux/page_reporting.h
>> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>>
>>          /* Current state of page reporting */
>>          atomic_t state;
>> +
>> +       /* Minimal order of page reporting */
>> +       unsigned int order;
>>   };
>>
>>   /* Tear-down and bring-up for page reporting devices */
>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>> index 34bf4d26c2c4..382958eef8a9 100644
>> --- a/mm/page_reporting.c
>> +++ b/mm/page_reporting.c
>> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>>                  goto err_out;
>>          }
>>
>> +       /*
>> +        * Update the page reporting order if it's specified by driver.
>> +        * Otherwise, it falls back to @pageblock_order.
>> +        */
>> +       page_reporting_order = prdev->order ? : pageblock_order;
>> +
> 
> An alternative to this would be to look at setting up some
> comparisons. I might add another variable and do something like:
> order = prdev->order ? : pageblock_order;
> if (order < page_reporting_order)
>      page_reporting_order = order;
> 
> You could essentially do something similar in the previous patch but
> just use pageblock_order directly rather than having to add a local
> variable.
> 
> That way if you need to still pull down the page reporting order you
> can do so without prdev->order or pageblock_order overwriting the
> value and pushing it back up.
> 

Thanks, Alex. Lets do both in v5, which will be posted shortly.

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order
  2021-06-25  4:00     ` Gavin Shan
@ 2021-06-25  4:24       ` Gavin Shan
  0 siblings, 0 replies; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  4:24 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: linux-mm, LKML, David Hildenbrand, Michael S. Tsirkin,
	Andrew Morton, Anshuman Khandual, Catalin Marinas, Will Deacon,
	shan.gavin

On 6/25/21 2:00 PM, Gavin Shan wrote:
> On 6/25/21 11:19 AM, Alexander Duyck wrote:
>> On Thu, Jun 24, 2021 at 4:46 PM Gavin Shan <gshan@redhat.com> wrote:
>>>
>>> The page reporting order (threshold) is sticky to @pageblock_order
>>> by default. The page reporting can never be triggered because the
>>> freeing page can't come up with a free area like that huge. The
>>> situation becomes worse when the system memory becomes heavily
>>> fragmented.
>>>
>>> For example, the following configurations are used on ARM64 when 64KB
>>> base page size is enabled. In this specific case, the page reporting
>>> won't be triggered until the freeing page comes up with a 512MB free
>>> area. That's hard to be met, especially when the system memory becomes
>>> heavily fragmented.
>>>
>>>     PAGE_SIZE:          64KB
>>>     HPAGE_SIZE:         512MB
>>>     pageblock_order:    13       (512MB)
>>>     MAX_ORDER:          14
>>>
>>> This allows the drivers to specify the page reporting order when the
>>> page reporting device is registered. It falls back to @pageblock_order
>>> if it's not specified by the driver. The existing users (hv_balloon
>>> and virtio_balloon) don't specify it and @pageblock_order is still
>>> taken as their page reporting order. So this shouldn't introduce any
>>> functional changes.
>>>
>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>>> ---
>>>   include/linux/page_reporting.h | 3 +++
>>>   mm/page_reporting.c            | 6 ++++++
>>>   2 files changed, 9 insertions(+)
>>>
>>> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
>>> index 3b99e0ec24f2..fe648dfa3a7c 100644
>>> --- a/include/linux/page_reporting.h
>>> +++ b/include/linux/page_reporting.h
>>> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>>>
>>>          /* Current state of page reporting */
>>>          atomic_t state;
>>> +
>>> +       /* Minimal order of page reporting */
>>> +       unsigned int order;
>>>   };
>>>
>>>   /* Tear-down and bring-up for page reporting devices */
>>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>>> index 34bf4d26c2c4..382958eef8a9 100644
>>> --- a/mm/page_reporting.c
>>> +++ b/mm/page_reporting.c
>>> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>>>                  goto err_out;
>>>          }
>>>
>>> +       /*
>>> +        * Update the page reporting order if it's specified by driver.
>>> +        * Otherwise, it falls back to @pageblock_order.
>>> +        */
>>> +       page_reporting_order = prdev->order ? : pageblock_order;
>>> +
>>
>> An alternative to this would be to look at setting up some
>> comparisons. I might add another variable and do something like:
>> order = prdev->order ? : pageblock_order;
>> if (order < page_reporting_order)
>>      page_reporting_order = order;
>>
>> You could essentially do something similar in the previous patch but
>> just use pageblock_order directly rather than having to add a local
>> variable.
>>
>> That way if you need to still pull down the page reporting order you
>> can do so without prdev->order or pageblock_order overwriting the
>> value and pushing it back up.
>>
> 
> Thanks, Alex. Lets do both in v5, which will be posted shortly.
> 

Alex, I just posted v5 to have the checks you suggested. Could
you help to have a quick scan. It's pointless to let Andrew
drop the patches and apply the last one again :)

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order
  2021-06-25  1:47 ` [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order Gavin Shan
  2021-06-25  1:19   ` Alexander Duyck
@ 2021-06-25  5:48   ` Michael S. Tsirkin
  2021-06-25  6:04     ` Gavin Shan
  1 sibling, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2021-06-25  5:48 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-mm, linux-kernel, alexander.duyck, david, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

On Fri, Jun 25, 2021 at 09:47:09AM +0800, Gavin Shan wrote:
> The page reporting order (threshold) is sticky to @pageblock_order
> by default. The page reporting can never be triggered because the
> freeing page can't come up with a free area like that huge. The
> situation becomes worse when the system memory becomes heavily
> fragmented.
> 
> For example, the following configurations are used on ARM64 when 64KB
> base page size is enabled. In this specific case, the page reporting
> won't be triggered until the freeing page comes up with a 512MB free
> area. That's hard to be met, especially when the system memory becomes
> heavily fragmented.
> 
>    PAGE_SIZE:          64KB
>    HPAGE_SIZE:         512MB
>    pageblock_order:    13       (512MB)
>    MAX_ORDER:          14
> 
> This allows the drivers to specify the page reporting order when the
> page reporting device is registered. It falls back to @pageblock_order
> if it's not specified by the driver. The existing users (hv_balloon
> and virtio_balloon) don't specify it and @pageblock_order is still
> taken as their page reporting order. So this shouldn't introduce any
> functional changes.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
>  include/linux/page_reporting.h | 3 +++
>  mm/page_reporting.c            | 6 ++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
> index 3b99e0ec24f2..fe648dfa3a7c 100644
> --- a/include/linux/page_reporting.h
> +++ b/include/linux/page_reporting.h
> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>  
>  	/* Current state of page reporting */
>  	atomic_t state;
> +
> +	/* Minimal order of page reporting */
> +	unsigned int order;
>  };
>  
>  /* Tear-down and bring-up for page reporting devices */
> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
> index 34bf4d26c2c4..382958eef8a9 100644
> --- a/mm/page_reporting.c
> +++ b/mm/page_reporting.c
> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>  		goto err_out;
>  	}
>  
> +	/*
> +	 * Update the page reporting order if it's specified by driver.
> +	 * Otherwise, it falls back to @pageblock_order.
> +	 */
> +	page_reporting_order = prdev->order ? : pageblock_order;
> +

Hmm. So on ARM achitectures with 64K pages, the command line parameter
is silently ignored?

Isn't this a problem?

>  	/* initialize state and work structures */
>  	atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
>  	INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);
> -- 
> 2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter
  2021-06-25  1:47 ` [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter Gavin Shan
  2021-06-25  1:14   ` Alexander Duyck
@ 2021-06-25  5:53   ` Michael S. Tsirkin
  2021-06-25  6:08     ` Gavin Shan
  1 sibling, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2021-06-25  5:53 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-mm, linux-kernel, alexander.duyck, david, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

On Fri, Jun 25, 2021 at 09:47:08AM +0800, Gavin Shan wrote:
> The macro PAGE_REPORTING_MIN_ORDER is defined as the page reporting
> threshold. It can't be adjusted at runtime.
> 
> This introduces a variable (@page_reporting_order) to replace the
> marcro (PAGE_REPORTING_MIN_ORDER). MAX_ORDER is assigned to it initially,
> meaning the page reporting is disabled. It will be specified by driver
> if valid one is provided. Otherwise, it will fall back to @pageblock_order.
> It's also exported so that the page reporting order can be adjusted at
> runtime.
> 
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
>  mm/page_reporting.c                             | 9 +++++++--
>  mm/page_reporting.h                             | 5 ++---
>  3 files changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index cb89dbdedc46..566c4b9af3cd 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3566,6 +3566,12 @@
>  			off: turn off poisoning (default)
>  			on: turn on poisoning
>  
> +	page_reporting.page_reporting_order=
> +			[KNL] Minimal page reporting order
> +			Format: <integer>
> +			Adjust the minimal page reporting order. The page
> +			reporting is disabled when it exceeds (MAX_ORDER-1).

Which the admin knows how? Run grep in the kernel source?

> +
>  	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
>  			timeout > 0: seconds before rebooting
>  			timeout = 0: wait forever
> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
> index df9c5054e1b4..34bf4d26c2c4 100644
> --- a/mm/page_reporting.c
> +++ b/mm/page_reporting.c
> @@ -4,12 +4,17 @@
>  #include <linux/page_reporting.h>
>  #include <linux/gfp.h>
>  #include <linux/export.h>
> +#include <linux/module.h>
>  #include <linux/delay.h>
>  #include <linux/scatterlist.h>
>  
>  #include "page_reporting.h"
>  #include "internal.h"
>  
> +unsigned int page_reporting_order = MAX_ORDER;
> +module_param(page_reporting_order, uint, 0644);
> +MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
> +
>  #define PAGE_REPORTING_DELAY	(2 * HZ)
>  static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
>  
> @@ -229,7 +234,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
>  
>  	/* Generate minimum watermark to be able to guarantee progress */
>  	watermark = low_wmark_pages(zone) +
> -		    (PAGE_REPORTING_CAPACITY << PAGE_REPORTING_MIN_ORDER);
> +		    (PAGE_REPORTING_CAPACITY << page_reporting_order);


Looks like this makes it easy to trigger undefined behaviour. Just use
any value > 31.

>  
>  	/*
>  	 * Cancel request if insufficient free memory or if we failed
> @@ -239,7 +244,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
>  		return err;
>  
>  	/* Process each free list starting from lowest order/mt */
> -	for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
> +	for (order = page_reporting_order; order < MAX_ORDER; order++) {
>  		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
>  			/* We do not pull pages from the isolate free list */
>  			if (is_migrate_isolate(mt))
> diff --git a/mm/page_reporting.h b/mm/page_reporting.h
> index 2c385dd4ddbd..c51dbc228b94 100644
> --- a/mm/page_reporting.h
> +++ b/mm/page_reporting.h
> @@ -10,10 +10,9 @@
>  #include <linux/pgtable.h>
>  #include <linux/scatterlist.h>
>  
> -#define PAGE_REPORTING_MIN_ORDER	pageblock_order
> -
>  #ifdef CONFIG_PAGE_REPORTING
>  DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
> +extern unsigned int page_reporting_order;
>  void __page_reporting_notify(void);
>  
>  static inline bool page_reported(struct page *page)
> @@ -38,7 +37,7 @@ static inline void page_reporting_notify_free(unsigned int order)
>  		return;
>  
>  	/* Determine if we have crossed reporting threshold */
> -	if (order < PAGE_REPORTING_MIN_ORDER)
> +	if (order < page_reporting_order)
>  		return;
>  
>  	/* This will add a few cycles, but should be called infrequently */
> -- 
> 2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] virtio_balloon: Specify page reporting order if needed
  2021-06-25  1:47 ` [PATCH v4 4/4] virtio_balloon: Specify page reporting order if needed Gavin Shan
@ 2021-06-25  5:57   ` Michael S. Tsirkin
  2021-06-25  6:11     ` Gavin Shan
  0 siblings, 1 reply; 16+ messages in thread
From: Michael S. Tsirkin @ 2021-06-25  5:57 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-mm, linux-kernel, alexander.duyck, david, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

On Fri, Jun 25, 2021 at 09:47:10AM +0800, Gavin Shan wrote:
> The page reporting won't be triggered if the freeing page can't come
> up with a free area, whose size is equal or bigger than the threshold
> (page reporting order). The default page reporting order, equal to
> @pageblock_order, is too huge on some architectures to trigger page
> reporting. One example is ARM64 when 64KB base page size is used.
> 
>       PAGE_SIZE:          64KB
>       pageblock_order:    13       (512MB)
>       MAX_ORDER:          14
> 
> This specifies the page reporting order to 5 (2MB) for this specific
> case so that page reporting can be triggered.
> 
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: virtualization@lists.linux-foundation.org
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
> ---
>  drivers/virtio/virtio_balloon.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
> index 510e9318854d..47dce91f788c 100644
> --- a/drivers/virtio/virtio_balloon.c
> +++ b/drivers/virtio/virtio_balloon.c
> @@ -993,6 +993,23 @@ static int virtballoon_probe(struct virtio_device *vdev)
>  			goto out_unregister_oom;
>  		}
>  
> +		/*
> +		 * The default page reporting order is @pageblock_order, which
> +		 * corresponds to 512MB in size on ARM64 when 64KB base page
> +		 * size is used. The page reporting won't be triggered if the
> +		 * freeing page can't come up with a free area like that huge.
> +		 * So we specify the page reporting order to 5, corresponding
> +		 * to 2MB. It helps to avoid THP splitting if 4KB base page
> +		 * size is used by host.
> +		 *
> +		 * Ideally, the page reporting order is selected based on the
> +		 * host's base page size. However, it needs more work to report
> +		 * that value. The hard-coded order would be fine currently.
> +		 */
> +#if defined(CONFIG_ARM64) && defined(CONFIG_ARM64_64K_PAGES)
> +		vb->pr_dev_info.order = 5;
> +#endif
> +

I was hoping we can get rid of the hacks in virtio with the new
parameter and logic in mm core. Why not?

>  		err = page_reporting_register(&vb->pr_dev_info);
>  		if (err)
>  			goto out_unregister_oom;
> -- 
> 2.23.0


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order
  2021-06-25  5:48   ` Michael S. Tsirkin
@ 2021-06-25  6:04     ` Gavin Shan
  0 siblings, 0 replies; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  6:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-mm, linux-kernel, alexander.duyck, david, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

On 6/25/21 3:48 PM, Michael S. Tsirkin wrote:
> On Fri, Jun 25, 2021 at 09:47:09AM +0800, Gavin Shan wrote:
>> The page reporting order (threshold) is sticky to @pageblock_order
>> by default. The page reporting can never be triggered because the
>> freeing page can't come up with a free area like that huge. The
>> situation becomes worse when the system memory becomes heavily
>> fragmented.
>>
>> For example, the following configurations are used on ARM64 when 64KB
>> base page size is enabled. In this specific case, the page reporting
>> won't be triggered until the freeing page comes up with a 512MB free
>> area. That's hard to be met, especially when the system memory becomes
>> heavily fragmented.
>>
>>     PAGE_SIZE:          64KB
>>     HPAGE_SIZE:         512MB
>>     pageblock_order:    13       (512MB)
>>     MAX_ORDER:          14
>>
>> This allows the drivers to specify the page reporting order when the
>> page reporting device is registered. It falls back to @pageblock_order
>> if it's not specified by the driver. The existing users (hv_balloon
>> and virtio_balloon) don't specify it and @pageblock_order is still
>> taken as their page reporting order. So this shouldn't introduce any
>> functional changes.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>> ---
>>   include/linux/page_reporting.h | 3 +++
>>   mm/page_reporting.c            | 6 ++++++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/page_reporting.h b/include/linux/page_reporting.h
>> index 3b99e0ec24f2..fe648dfa3a7c 100644
>> --- a/include/linux/page_reporting.h
>> +++ b/include/linux/page_reporting.h
>> @@ -18,6 +18,9 @@ struct page_reporting_dev_info {
>>   
>>   	/* Current state of page reporting */
>>   	atomic_t state;
>> +
>> +	/* Minimal order of page reporting */
>> +	unsigned int order;
>>   };
>>   
>>   /* Tear-down and bring-up for page reporting devices */
>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>> index 34bf4d26c2c4..382958eef8a9 100644
>> --- a/mm/page_reporting.c
>> +++ b/mm/page_reporting.c
>> @@ -329,6 +329,12 @@ int page_reporting_register(struct page_reporting_dev_info *prdev)
>>   		goto err_out;
>>   	}
>>   
>> +	/*
>> +	 * Update the page reporting order if it's specified by driver.
>> +	 * Otherwise, it falls back to @pageblock_order.
>> +	 */
>> +	page_reporting_order = prdev->order ? : pageblock_order;
>> +
> 
> Hmm. So on ARM achitectures with 64K pages, the command line parameter
> is silently ignored?
> 
> Isn't this a problem?
> 

It's fine as the command line parameter is used as debugging purpose.
Besides, it can be changed by writing to the following file:

    /sys/module/page_reporting/parameters/page_reporting_order


>>   	/* initialize state and work structures */
>>   	atomic_set(&prdev->state, PAGE_REPORTING_IDLE);
>>   	INIT_DELAYED_WORK(&prdev->work, &page_reporting_process);
>> -- 
>> 2.23.0

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter
  2021-06-25  5:53   ` Michael S. Tsirkin
@ 2021-06-25  6:08     ` Gavin Shan
  0 siblings, 0 replies; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  6:08 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-mm, linux-kernel, alexander.duyck, david, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

On 6/25/21 3:53 PM, Michael S. Tsirkin wrote:
> On Fri, Jun 25, 2021 at 09:47:08AM +0800, Gavin Shan wrote:
>> The macro PAGE_REPORTING_MIN_ORDER is defined as the page reporting
>> threshold. It can't be adjusted at runtime.
>>
>> This introduces a variable (@page_reporting_order) to replace the
>> marcro (PAGE_REPORTING_MIN_ORDER). MAX_ORDER is assigned to it initially,
>> meaning the page reporting is disabled. It will be specified by driver
>> if valid one is provided. Otherwise, it will fall back to @pageblock_order.
>> It's also exported so that the page reporting order can be adjusted at
>> runtime.
>>
>> Suggested-by: David Hildenbrand <david@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>> ---
>>   Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
>>   mm/page_reporting.c                             | 9 +++++++--
>>   mm/page_reporting.h                             | 5 ++---
>>   3 files changed, 15 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index cb89dbdedc46..566c4b9af3cd 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3566,6 +3566,12 @@
>>   			off: turn off poisoning (default)
>>   			on: turn on poisoning
>>   
>> +	page_reporting.page_reporting_order=
>> +			[KNL] Minimal page reporting order
>> +			Format: <integer>
>> +			Adjust the minimal page reporting order. The page
>> +			reporting is disabled when it exceeds (MAX_ORDER-1).
> 
> Which the admin knows how? Run grep in the kernel source?
> 

Well, I guess it's fine as it's used for debugging purpose. I guess
it's mostly used by developers. Also, the value can be changed by
the module parameter in "/sys/module/page_reporting" either.

>> +
>>   	panic=		[KNL] Kernel behaviour on panic: delay <timeout>
>>   			timeout > 0: seconds before rebooting
>>   			timeout = 0: wait forever
>> diff --git a/mm/page_reporting.c b/mm/page_reporting.c
>> index df9c5054e1b4..34bf4d26c2c4 100644
>> --- a/mm/page_reporting.c
>> +++ b/mm/page_reporting.c
>> @@ -4,12 +4,17 @@
>>   #include <linux/page_reporting.h>
>>   #include <linux/gfp.h>
>>   #include <linux/export.h>
>> +#include <linux/module.h>
>>   #include <linux/delay.h>
>>   #include <linux/scatterlist.h>
>>   
>>   #include "page_reporting.h"
>>   #include "internal.h"
>>   
>> +unsigned int page_reporting_order = MAX_ORDER;
>> +module_param(page_reporting_order, uint, 0644);
>> +MODULE_PARM_DESC(page_reporting_order, "Set page reporting order");
>> +
>>   #define PAGE_REPORTING_DELAY	(2 * HZ)
>>   static struct page_reporting_dev_info __rcu *pr_dev_info __read_mostly;
>>   
>> @@ -229,7 +234,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
>>   
>>   	/* Generate minimum watermark to be able to guarantee progress */
>>   	watermark = low_wmark_pages(zone) +
>> -		    (PAGE_REPORTING_CAPACITY << PAGE_REPORTING_MIN_ORDER);
>> +		    (PAGE_REPORTING_CAPACITY << page_reporting_order);
> 
> 
> Looks like this makes it easy to trigger undefined behaviour. Just use
> any value > 31.
> 

This function won't be run if page_reporting_order is more than (MAX_ORDER-1).

>>   
>>   	/*
>>   	 * Cancel request if insufficient free memory or if we failed
>> @@ -239,7 +244,7 @@ page_reporting_process_zone(struct page_reporting_dev_info *prdev,
>>   		return err;
>>   
>>   	/* Process each free list starting from lowest order/mt */
>> -	for (order = PAGE_REPORTING_MIN_ORDER; order < MAX_ORDER; order++) {
>> +	for (order = page_reporting_order; order < MAX_ORDER; order++) {
>>   		for (mt = 0; mt < MIGRATE_TYPES; mt++) {
>>   			/* We do not pull pages from the isolate free list */
>>   			if (is_migrate_isolate(mt))
>> diff --git a/mm/page_reporting.h b/mm/page_reporting.h
>> index 2c385dd4ddbd..c51dbc228b94 100644
>> --- a/mm/page_reporting.h
>> +++ b/mm/page_reporting.h
>> @@ -10,10 +10,9 @@
>>   #include <linux/pgtable.h>
>>   #include <linux/scatterlist.h>
>>   
>> -#define PAGE_REPORTING_MIN_ORDER	pageblock_order
>> -
>>   #ifdef CONFIG_PAGE_REPORTING
>>   DECLARE_STATIC_KEY_FALSE(page_reporting_enabled);
>> +extern unsigned int page_reporting_order;
>>   void __page_reporting_notify(void);
>>   
>>   static inline bool page_reported(struct page *page)
>> @@ -38,7 +37,7 @@ static inline void page_reporting_notify_free(unsigned int order)
>>   		return;
>>   
>>   	/* Determine if we have crossed reporting threshold */
>> -	if (order < PAGE_REPORTING_MIN_ORDER)
>> +	if (order < page_reporting_order)
>>   		return;
>>   
>>   	/* This will add a few cycles, but should be called infrequently */
>> -- 
>> 2.23.0
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v4 4/4] virtio_balloon: Specify page reporting order if needed
  2021-06-25  5:57   ` Michael S. Tsirkin
@ 2021-06-25  6:11     ` Gavin Shan
  0 siblings, 0 replies; 16+ messages in thread
From: Gavin Shan @ 2021-06-25  6:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-mm, linux-kernel, alexander.duyck, david, akpm,
	anshuman.khandual, catalin.marinas, will, shan.gavin

On 6/25/21 3:57 PM, Michael S. Tsirkin wrote:
> On Fri, Jun 25, 2021 at 09:47:10AM +0800, Gavin Shan wrote:
>> The page reporting won't be triggered if the freeing page can't come
>> up with a free area, whose size is equal or bigger than the threshold
>> (page reporting order). The default page reporting order, equal to
>> @pageblock_order, is too huge on some architectures to trigger page
>> reporting. One example is ARM64 when 64KB base page size is used.
>>
>>        PAGE_SIZE:          64KB
>>        pageblock_order:    13       (512MB)
>>        MAX_ORDER:          14
>>
>> This specifies the page reporting order to 5 (2MB) for this specific
>> case so that page reporting can be triggered.
>>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: virtualization@lists.linux-foundation.org
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
>> ---
>>   drivers/virtio/virtio_balloon.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
>> index 510e9318854d..47dce91f788c 100644
>> --- a/drivers/virtio/virtio_balloon.c
>> +++ b/drivers/virtio/virtio_balloon.c
>> @@ -993,6 +993,23 @@ static int virtballoon_probe(struct virtio_device *vdev)
>>   			goto out_unregister_oom;
>>   		}
>>   
>> +		/*
>> +		 * The default page reporting order is @pageblock_order, which
>> +		 * corresponds to 512MB in size on ARM64 when 64KB base page
>> +		 * size is used. The page reporting won't be triggered if the
>> +		 * freeing page can't come up with a free area like that huge.
>> +		 * So we specify the page reporting order to 5, corresponding
>> +		 * to 2MB. It helps to avoid THP splitting if 4KB base page
>> +		 * size is used by host.
>> +		 *
>> +		 * Ideally, the page reporting order is selected based on the
>> +		 * host's base page size. However, it needs more work to report
>> +		 * that value. The hard-coded order would be fine currently.
>> +		 */
>> +#if defined(CONFIG_ARM64) && defined(CONFIG_ARM64_64K_PAGES)
>> +		vb->pr_dev_info.order = 5;
>> +#endif
>> +
> 
> I was hoping we can get rid of the hacks in virtio with the new
> parameter and logic in mm core. Why not?
> 

Yes, it's something for future as the comments says. Ideally, The
page reporting order can be provided by VMM (QEMU). However, guest's
memory could be backed up by combinations like 4KB pages, 16KB huge
pages, ..., 1GB huge pages. So I need to sort it out before the hack
can be removed from virtio-balloon.

>>   		err = page_reporting_register(&vb->pr_dev_info);
>>   		if (err)
>>   			goto out_unregister_oom;

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-06-25  6:12 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-25  1:47 [PATCH v4 0/4] mm/page_reporting: Make page reporting work on arm64 with 64KB page size Gavin Shan
2021-06-25  1:47 ` [PATCH v4 1/4] mm/page_reporting: Fix code style in __page_reporting_request() Gavin Shan
2021-06-25  1:47 ` [PATCH v4 2/4] mm/page_reporting: Export reporting order as module parameter Gavin Shan
2021-06-25  1:14   ` Alexander Duyck
2021-06-25  3:58     ` Gavin Shan
2021-06-25  5:53   ` Michael S. Tsirkin
2021-06-25  6:08     ` Gavin Shan
2021-06-25  1:47 ` [PATCH v4 3/4] mm/page_reporting: Allow driver to specify reporting order Gavin Shan
2021-06-25  1:19   ` Alexander Duyck
2021-06-25  4:00     ` Gavin Shan
2021-06-25  4:24       ` Gavin Shan
2021-06-25  5:48   ` Michael S. Tsirkin
2021-06-25  6:04     ` Gavin Shan
2021-06-25  1:47 ` [PATCH v4 4/4] virtio_balloon: Specify page reporting order if needed Gavin Shan
2021-06-25  5:57   ` Michael S. Tsirkin
2021-06-25  6:11     ` Gavin Shan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).