linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/4] Smart scanning mode for KSM
@ 2023-09-12 17:52 Stefan Roesch
  2023-09-12 17:52 ` [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode Stefan Roesch
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Stefan Roesch @ 2023-09-12 17:52 UTC (permalink / raw)
  To: kernel-team; +Cc: shr, akpm, david, hannes, riel, linux-kernel, linux-mm

This patch series adds "smart scanning" for KSM.

What is smart scanning?
=======================
KSM evaluates all the candidate pages for each scan. It does not use historic
information from previous scans. This has the effect that candidate pages that
couldn't be used for KSM de-duplication continue to be evaluated for each scan.

The idea of "smart scanning" is to keep historic information. With the historic
information we can temporarily skip the candidate page for one or several scans.

Details:
========
"Smart scanning" keeps two small counters per candidate page (in its
rmap_item). One counter stores how often we have already tried to use the page
for KSM, and the other stores when the page will next be used as a candidate
page.

How often we skip the candidate page depends on how often the page has failed
KSM de-duplication. The code skips a maximum of 8 times. During testing this
has shown to be a good compromise for different workloads.
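
As a rough illustration of the schedule described above, the following
userspace sketch mirrors the thresholds from patch 1/4. It is not the kernel
code: the names skip_window(), should_skip(), count_skips() and struct
item_state are made up for this example, and the PageKsm() check and the
removal from the unstable tree are left out.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t rmap_age_t;

/* Hypothetical stand-in for the two new fields in struct ksm_rmap_item. */
struct item_state {
	rmap_age_t age;
	rmap_age_t skip_age;
};

/* Length of the next skip window; same thresholds as in patch 1/4. */
unsigned int skip_window(rmap_age_t age)
{
	if (age <= 3)
		return 1;
	if (age <= 5)
		return 2;
	if (age <= 8)
		return 4;

	return 8;
}

/* Returns true if the item would be skipped on this scan pass. */
bool should_skip(struct item_state *it)
{
	rmap_age_t age = it->age++;

	/* The first three passes are always scanned. */
	if (age < 3)
		return false;

	/* The skip window has ended: scan again and clear the window. */
	if (it->skip_age == age) {
		it->skip_age = 0;
		return false;
	}

	/* No window pending: open a new one. */
	if (it->skip_age == 0)
		it->skip_age = age + skip_window(age);

	return true;
}

/* Number of skipped passes among the first 'passes' passes of a page
 * that never merges. */
unsigned int count_skips(unsigned int passes)
{
	struct item_state it = { 0, 0 };
	unsigned int skipped = 0;
	unsigned int i;

	for (i = 0; i < passes; i++)
		if (should_skip(&it))
			skipped++;

	return skipped;
}
```

Counting passes from 0, a page that never merges is scanned on passes 0-2,
skipped on pass 3, scanned on pass 4, skipped on passes 5-6, scanned on pass 7,
skipped on passes 8-11, and from then on skipped in runs of 8.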

New sysfs knob:
===============
Smart scanning is not enabled by default. It can be enabled through
/sys/kernel/mm/ksm/smart_scan.

Monitoring:
===========
To monitor how effective smart scanning is, a new sysfs knob has been
introduced. /sys/kernel/mm/ksm/pages_skipped reports how many pages have been
skipped by smart scanning.
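
One possible way to turn the counter into a ratio is sketched below. The
helpers read_counter() and skipped_fraction() are hypothetical names for this
example. The sketch assumes that a pages_scanned counter is available next to
pages_skipped, and that the two counters do not overlap; neither is guaranteed
by this series. Both counters are cumulative, so in practice the inputs would
be differences between two samples.

```c
#include <stdio.h>

/* Read one unsigned counter from a sysfs-style file.
 * Returns 0 on success and -1 if the file cannot be opened or parsed. */
int read_counter(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = fscanf(f, "%lu", value) == 1;
	fclose(f);

	return ok ? 0 : -1;
}

/* Fraction of candidate visits that were skipped, in the range [0, 1].
 * Assumes 'skipped' and 'scanned' count disjoint sets of visits. */
double skipped_fraction(unsigned long skipped, unsigned long scanned)
{
	unsigned long total = skipped + scanned;

	return total ? (double)skipped / (double)total : 0.0;
}
```

For instance, 5 skipped and 15 scanned visits give a skipped fraction of 0.25.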

Results:
========
- Various workloads have shown a 20% - 25% reduction in page scans.
  For the Instagram workload, for instance, the number of pages scanned has
  been reduced from over 20M pages per scan to less than 15M pages.
- Fewer page scans also resulted in an overall higher de-duplication rate, as
  some shorter-lived pages could additionally be de-duplicated.
- Scanning fewer pages allows the pages_to_scan parameter to be reduced,
  and this resulted in a 25% reduction in CPU usage.
- The improvements have been observed for workloads that enable KSM with
  madvise as well as prctl.



Stefan Roesch (4):
  mm/ksm: add "smart" page scanning mode
  mm/ksm: add pages_skipped metric
  mm/ksm: document smart scan mode
  mm/ksm: document pages_skipped sysfs knob

 Documentation/admin-guide/mm/ksm.rst | 11 ++++
 mm/ksm.c                             | 87 ++++++++++++++++++++++++++++
 2 files changed, 98 insertions(+)


base-commit: 15bcc9730fcd7526a3b92eff105d6701767a53bb
-- 
2.39.3



* [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode
  2023-09-12 17:52 [PATCH v1 0/4] Smart scanning mode for KSM Stefan Roesch
@ 2023-09-12 17:52 ` Stefan Roesch
  2023-09-13 21:07   ` Andrew Morton
  2023-09-18 11:10   ` David Hildenbrand
  2023-09-12 17:52 ` [PATCH v1 2/4] mm/ksm: add pages_skipped metric Stefan Roesch
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Stefan Roesch @ 2023-09-12 17:52 UTC (permalink / raw)
  To: kernel-team; +Cc: shr, akpm, david, hannes, riel, linux-kernel, linux-mm

This change adds a "smart" page scanning mode for KSM. So far all the
candidate pages are continuously scanned to find candidates for
de-duplication. There are a considerable number of pages that cannot be
de-duplicated. This is costly in terms of CPU. By using smart scanning
considerable CPU savings can be achieved.

This change takes the history of scanning pages into account and skips
the page scanning of certain pages for a while if de-duplication for
this page has not been successful in the past.

To do this it introduces two new fields in the ksm_rmap_item structure:
age and skip_age. age is the KSM age and skip_age is the age until which
page scanning of this page is skipped. The age field is incremented
each time the page is scanned and the page cannot be de-duplicated.

How often a page is skipped depends on how often de-duplication has
been tried so far, and the number of skips is currently limited to 8.
This value has shown to be effective with different workloads.

The feature is currently disabled by default and can be enabled with the
new smart_scan knob.

The feature has shown to be very effective: up to 25% of the page scans
can be eliminated; the pages_to_scan rate can be reduced by 40 - 50% and
a similar de-duplication rate can be maintained.

Signed-off-by: Stefan Roesch <shr@devkernel.io>
---
 mm/ksm.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)

diff --git a/mm/ksm.c b/mm/ksm.c
index 981af9c72e7a..bfd5087c7d5a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -56,6 +56,8 @@
 #define DO_NUMA(x)	do { } while (0)
 #endif
 
+typedef u8 rmap_age_t;
+
 /**
  * DOC: Overview
  *
@@ -193,6 +195,8 @@ struct ksm_stable_node {
  * @node: rb node of this rmap_item in the unstable tree
  * @head: pointer to stable_node heading this list in the stable tree
  * @hlist: link into hlist of rmap_items hanging off that stable_node
+ * @age: number of scan iterations since creation
+ * @skip_age: skip rmap item until age reaches skip_age
  */
 struct ksm_rmap_item {
 	struct ksm_rmap_item *rmap_list;
@@ -212,6 +216,8 @@ struct ksm_rmap_item {
 			struct hlist_node hlist;
 		};
 	};
+	rmap_age_t age;
+	rmap_age_t skip_age;
 };
 
 #define SEQNR_MASK	0x0ff	/* low bits of unstable tree seqnr */
@@ -281,6 +287,9 @@ static unsigned int zero_checksum __read_mostly;
 /* Whether to merge empty (zeroed) pages with actual zero pages */
 static bool ksm_use_zero_pages __read_mostly;
 
+/* Skip pages that couldn't be de-duplicated previously  */
+static bool ksm_smart_scan;
+
 /* The number of zero pages which is placed by KSM */
 unsigned long ksm_zero_pages;
 
@@ -2305,6 +2314,45 @@ static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
 	return rmap_item;
 }
 
+static unsigned int inc_skip_age(rmap_age_t age)
+{
+	if (age <= 3)
+		return 1;
+	if (age <= 5)
+		return 2;
+	if (age <= 8)
+		return 4;
+
+	return 8;
+}
+
+static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
+{
+	rmap_age_t age;
+
+	if (!ksm_smart_scan)
+		return false;
+
+	if (PageKsm(page))
+		return false;
+
+	age = rmap_item->age++;
+	if (age < 3)
+		return false;
+
+	if (rmap_item->skip_age == age) {
+		rmap_item->skip_age = 0;
+		return false;
+	}
+
+	if (rmap_item->skip_age == 0) {
+		rmap_item->skip_age = age + inc_skip_age(age);
+		remove_rmap_item_from_tree(rmap_item);
+	}
+
+	return true;
+}
+
 static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
 {
 	struct mm_struct *mm;
@@ -2409,6 +2457,10 @@ static struct ksm_rmap_item *scan_get_next_rmap_item(struct page **page)
 				if (rmap_item) {
 					ksm_scan.rmap_list =
 							&rmap_item->rmap_list;
+
+					if (skip_rmap_item(*page, rmap_item))
+						goto next_page;
+
 					ksm_scan.address += PAGE_SIZE;
 				} else
 					put_page(*page);
@@ -3449,6 +3501,28 @@ static ssize_t full_scans_show(struct kobject *kobj,
 }
 KSM_ATTR_RO(full_scans);
 
+static ssize_t smart_scan_show(struct kobject *kobj,
+			       struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%u\n", ksm_smart_scan);
+}
+
+static ssize_t smart_scan_store(struct kobject *kobj,
+				struct kobj_attribute *attr,
+				const char *buf, size_t count)
+{
+	int err;
+	bool value;
+
+	err = kstrtobool(buf, &value);
+	if (err)
+		return -EINVAL;
+
+	ksm_smart_scan = value;
+	return count;
+}
+KSM_ATTR(smart_scan);
+
 static struct attribute *ksm_attrs[] = {
 	&sleep_millisecs_attr.attr,
 	&pages_to_scan_attr.attr,
@@ -3469,6 +3543,7 @@ static struct attribute *ksm_attrs[] = {
 	&stable_node_chains_prune_millisecs_attr.attr,
 	&use_zero_pages_attr.attr,
 	&general_profit_attr.attr,
+	&smart_scan_attr.attr,
 	NULL,
 };
 
-- 
2.39.3



* [PATCH v1 2/4] mm/ksm: add pages_skipped metric
  2023-09-12 17:52 [PATCH v1 0/4] Smart scanning mode for KSM Stefan Roesch
  2023-09-12 17:52 ` [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode Stefan Roesch
@ 2023-09-12 17:52 ` Stefan Roesch
  2023-09-18 11:28   ` David Hildenbrand
  2023-09-12 17:52 ` [PATCH v1 3/4] mm/ksm: document smart scan mode Stefan Roesch
  2023-09-12 17:52 ` [PATCH v1 4/4] mm/ksm: document pages_skipped sysfs knob Stefan Roesch
  3 siblings, 1 reply; 14+ messages in thread
From: Stefan Roesch @ 2023-09-12 17:52 UTC (permalink / raw)
  To: kernel-team; +Cc: shr, akpm, david, hannes, riel, linux-kernel, linux-mm

This change adds the "pages skipped" metric. To be able to evaluate how
successful smart page scanning is, the pages skipped metric can be
compared to the pages scanned metric.

The pages skipped metric is a cumulative counter. The counter is stored
under /sys/kernel/mm/ksm/pages_skipped.

Signed-off-by: Stefan Roesch <shr@devkernel.io>
---
 mm/ksm.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/ksm.c b/mm/ksm.c
index bfd5087c7d5a..728574a3033e 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -293,6 +293,9 @@ static bool ksm_smart_scan;
 /* The number of zero pages which is placed by KSM */
 unsigned long ksm_zero_pages;
 
+/* The number of pages that have been skipped due to "smart scanning" */
+static unsigned long ksm_pages_skipped;
+
 #ifdef CONFIG_NUMA
 /* Zeroed when merging across nodes is not allowed */
 static unsigned int ksm_merge_across_nodes = 1;
@@ -2345,6 +2348,7 @@ static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
 		return false;
 	}
 
+	ksm_pages_skipped++;
 	if (rmap_item->skip_age == 0) {
 		rmap_item->skip_age = age + inc_skip_age(age);
 		remove_rmap_item_from_tree(rmap_item);
@@ -3435,6 +3439,13 @@ static ssize_t pages_volatile_show(struct kobject *kobj,
 }
 KSM_ATTR_RO(pages_volatile);
 
+static ssize_t pages_skipped_show(struct kobject *kobj,
+				  struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%lu\n", ksm_pages_skipped);
+}
+KSM_ATTR_RO(pages_skipped);
+
 static ssize_t ksm_zero_pages_show(struct kobject *kobj,
 				struct kobj_attribute *attr, char *buf)
 {
@@ -3532,6 +3543,7 @@ static struct attribute *ksm_attrs[] = {
 	&pages_sharing_attr.attr,
 	&pages_unshared_attr.attr,
 	&pages_volatile_attr.attr,
+	&pages_skipped_attr.attr,
 	&ksm_zero_pages_attr.attr,
 	&full_scans_attr.attr,
 #ifdef CONFIG_NUMA
-- 
2.39.3



* [PATCH v1 3/4] mm/ksm: document smart scan mode
  2023-09-12 17:52 [PATCH v1 0/4] Smart scanning mode for KSM Stefan Roesch
  2023-09-12 17:52 ` [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode Stefan Roesch
  2023-09-12 17:52 ` [PATCH v1 2/4] mm/ksm: add pages_skipped metric Stefan Roesch
@ 2023-09-12 17:52 ` Stefan Roesch
  2023-09-18 11:28   ` David Hildenbrand
  2023-09-12 17:52 ` [PATCH v1 4/4] mm/ksm: document pages_skipped sysfs knob Stefan Roesch
  3 siblings, 1 reply; 14+ messages in thread
From: Stefan Roesch @ 2023-09-12 17:52 UTC (permalink / raw)
  To: kernel-team; +Cc: shr, akpm, david, hannes, riel, linux-kernel, linux-mm

This adds documentation for the smart scan mode of KSM.

Signed-off-by: Stefan Roesch <shr@devkernel.io>
---
 Documentation/admin-guide/mm/ksm.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index 776f244bdae4..1762219baf51 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -155,6 +155,15 @@ stable_node_chains_prune_millisecs
         scan. It's a noop if not a single KSM page hit the
         ``max_page_sharing`` yet.
 
+smart_scan
+        By default KSM checks every candidate page for each scan. It does
+        not take into account historic information. When smart scan is
+        enabled, pages that have previously not been de-duplicated get
+        skipped. How often these pages are skipped depends on how often
+        de-duplication has already been tried and failed. By default this
+        optimization is disabled. The ``pages_skipped`` metric shows how
+        effective the setting is.
+
 The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
 
 general_profit
-- 
2.39.3



* [PATCH v1 4/4] mm/ksm: document pages_skipped sysfs knob
  2023-09-12 17:52 [PATCH v1 0/4] Smart scanning mode for KSM Stefan Roesch
                   ` (2 preceding siblings ...)
  2023-09-12 17:52 ` [PATCH v1 3/4] mm/ksm: document smart scan mode Stefan Roesch
@ 2023-09-12 17:52 ` Stefan Roesch
  2023-09-18 11:28   ` David Hildenbrand
  3 siblings, 1 reply; 14+ messages in thread
From: Stefan Roesch @ 2023-09-12 17:52 UTC (permalink / raw)
  To: kernel-team; +Cc: shr, akpm, david, hannes, riel, linux-kernel, linux-mm

This adds documentation for the new metric pages_skipped.

Signed-off-by: Stefan Roesch <shr@devkernel.io>
---
 Documentation/admin-guide/mm/ksm.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index 1762219baf51..27d949250b67 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -178,6 +178,8 @@ pages_unshared
         how many pages unique but repeatedly checked for merging
 pages_volatile
         how many pages changing too fast to be placed in a tree
+pages_skipped
+        how many pages did the "smart" page scanning algorithm skip
 full_scans
         how many times all mergeable areas have been scanned
 stable_node_chains
-- 
2.39.3



* Re: [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode
  2023-09-12 17:52 ` [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode Stefan Roesch
@ 2023-09-13 21:07   ` Andrew Morton
  2023-09-18 18:47     ` Stefan Roesch
  2023-09-18 11:10   ` David Hildenbrand
  1 sibling, 1 reply; 14+ messages in thread
From: Andrew Morton @ 2023-09-13 21:07 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, david, hannes, riel, linux-kernel, linux-mm

On Tue, 12 Sep 2023 10:52:25 -0700 Stefan Roesch <shr@devkernel.io> wrote:

> This change adds a "smart" page scanning mode for KSM. So far all the
> candidate pages are continuously scanned to find candidates for
> de-duplication. There are a considerable number of pages that cannot be
> de-duplicated. This is costly in terms of CPU. By using smart scanning
> considerable CPU savings can be achieved.
> 
> This change takes the history of scanning pages into account and skips
> the page scanning of certain pages for a while if de-duplication for
> this page has not been successful in the past.
> 
> To do this it introduces two new fields in the ksm_rmap_item structure:
> age and skip_age. age, is the KSM age and skip_page is the age for how

s/skip_page/skip_age/

> long page scanning of this page is skipped. The age field is incremented
> each time the page is scanned and the page cannot be de-duplicated.
> 
> How often a page is skipped depends on how often de-duplication has
> been tried so far, and the number of skips is currently limited to 8.
> This value has shown to be effective with different workloads.
> 
> The feature is currently disabled by default and can be enabled with the
> new smart_scan knob.
> 
> The feature has shown to be very effective: up to 25% of the page scans
> can be eliminated; the pages_to_scan rate can be reduced by 40 - 50% and
> a similar de-duplication rate can be maintained.
> 

All seems nice.  I'll sit out v1, see what people have to say.

Some nits:

> --- a/mm/ksm.c
> +++ b/mm/ksm.c
>
> ...
>
> @@ -2305,6 +2314,45 @@ static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
>  	return rmap_item;
>  }
>  
> +static unsigned int inc_skip_age(rmap_age_t age)
> +{
> +	if (age <= 3)
> +		return 1;
> +	if (age <= 5)
> +		return 2;
> +	if (age <= 8)
> +		return 4;
> +
> +	return 8;
> +}

"inc_skip_age" sounds like it increments something.  Can we give it a
better name?

And a nice comment explaining its role in life.

> +static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
> +{
> +	rmap_age_t age;
> +
> +	if (!ksm_smart_scan)
> +		return false;
> +
> +	if (PageKsm(page))
> +		return false;
> +
> +	age = rmap_item->age++;
> +	if (age < 3)
> +		return false;
> +
> +	if (rmap_item->skip_age == age) {
> +		rmap_item->skip_age = 0;
> +		return false;
> +	}
> +
> +	if (rmap_item->skip_age == 0) {
> +		rmap_item->skip_age = age + inc_skip_age(age);
> +		remove_rmap_item_from_tree(rmap_item);
> +	}
> +
> +	return true;
> +}

Would a better name be should_skip_rmap_item()?

But even that name implies that the function is idempotent (has no
side-effects).  Again, an explanatory comment would be good.  And
simple comments over each non-obvious `if' statement.

>
> ...
>


* Re: [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode
  2023-09-12 17:52 ` [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode Stefan Roesch
  2023-09-13 21:07   ` Andrew Morton
@ 2023-09-18 11:10   ` David Hildenbrand
  2023-09-18 16:18     ` Stefan Roesch
  1 sibling, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2023-09-18 11:10 UTC (permalink / raw)
  To: Stefan Roesch, kernel-team; +Cc: akpm, hannes, riel, linux-kernel, linux-mm

On 12.09.23 19:52, Stefan Roesch wrote:
> This change adds a "smart" page scanning mode for KSM. So far all the
> candidate pages are continuously scanned to find candidates for
> de-duplication. There are a considerable number of pages that cannot be
> de-duplicated. This is costly in terms of CPU. By using smart scanning
> considerable CPU savings can be achieved.
> 
> This change takes the history of scanning pages into account and skips
> the page scanning of certain pages for a while if de-duplication for
> this page has not been successful in the past.
> 
> To do this it introduces two new fields in the ksm_rmap_item structure:
> age and skip_age. age is the KSM age and skip_age is the age until which
> page scanning of this page is skipped. The age field is incremented
> each time the page is scanned and the page cannot be de-duplicated.
> 
> How often a page is skipped depends on how often de-duplication has
> been tried so far, and the number of skips is currently limited to 8.
> This value has shown to be effective with different workloads.
> 
> The feature is currently disabled by default and can be enabled with the
> new smart_scan knob.
> 
> The feature has shown to be very effective: up to 25% of the page scans
> can be eliminated; the pages_to_scan rate can be reduced by 40 - 50% and
> a similar de-duplication rate can be maintained.
> 
> Signed-off-by: Stefan Roesch <shr@devkernel.io>
> ---
>   mm/ksm.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 75 insertions(+)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 981af9c72e7a..bfd5087c7d5a 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -56,6 +56,8 @@
>   #define DO_NUMA(x)	do { } while (0)
>   #endif
>   
> +typedef u8 rmap_age_t;
> +
>   /**
>    * DOC: Overview
>    *
> @@ -193,6 +195,8 @@ struct ksm_stable_node {
>    * @node: rb node of this rmap_item in the unstable tree
>    * @head: pointer to stable_node heading this list in the stable tree
>    * @hlist: link into hlist of rmap_items hanging off that stable_node
> + * @age: number of scan iterations since creation
> + * @skip_age: skip rmap item until age reaches skip_age
>    */
>   struct ksm_rmap_item {
>   	struct ksm_rmap_item *rmap_list;
> @@ -212,6 +216,8 @@ struct ksm_rmap_item {
>   			struct hlist_node hlist;
>   		};
>   	};
> +	rmap_age_t age;
> +	rmap_age_t skip_age;
>   };
>   
>   #define SEQNR_MASK	0x0ff	/* low bits of unstable tree seqnr */
> @@ -281,6 +287,9 @@ static unsigned int zero_checksum __read_mostly;
>   /* Whether to merge empty (zeroed) pages with actual zero pages */
>   static bool ksm_use_zero_pages __read_mostly;
>   
> +/* Skip pages that couldn't be de-duplicated previously  */
> +static bool ksm_smart_scan;
> +
>   /* The number of zero pages which is placed by KSM */
>   unsigned long ksm_zero_pages;
>   
> @@ -2305,6 +2314,45 @@ static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
>   	return rmap_item;
>   }
>   
> +static unsigned int inc_skip_age(rmap_age_t age)
> +{
> +	if (age <= 3)
> +		return 1;
> +	if (age <= 5)
> +		return 2;
> +	if (age <= 8)
> +		return 4;
> +
> +	return 8;
> +}
> +
> +static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
> +{
> +	rmap_age_t age;
> +
> +	if (!ksm_smart_scan)
> +		return false;
> +
> +	if (PageKsm(page))
> +		return false;


I'm a bit confused about this check here. scan_get_next_rmap_item() 
would return a PageKsm() page and call cmp_and_merge_page().

cmp_and_merge_page() says: "first see if page can be merged into the 
stable tree"

... but shouldn't a PageKsm page *already* be in the stable tree?

Maybe that's what cmp_and_merge_page() does via:

	kpage = stable_tree_search(page);
	if (kpage == page && rmap_item->head == stable_node) {
		put_page(kpage);
		return;
	}


Hoping you can enlighten me :)

> +
> +	age = rmap_item->age++;

Can't we overflow here? Is that desired, or would you want to stop at 
the maximum you can store?

> +	if (age < 3)
> +		return false;
> +
> +	if (rmap_item->skip_age == age) {
> +		rmap_item->skip_age = 0;
> +		return false;
> +	}
> +
> +	if (rmap_item->skip_age == 0) {
> +		rmap_item->skip_age = age + inc_skip_age(age);

Can't you overflow here as well?

> +		remove_rmap_item_from_tree(rmap_item);


Can you enlighten me why that is required?

> +	}
> +
> +	return true;
> +}
> +


-- 
Cheers,

David / dhildenb



* Re: [PATCH v1 2/4] mm/ksm: add pages_skipped metric
  2023-09-12 17:52 ` [PATCH v1 2/4] mm/ksm: add pages_skipped metric Stefan Roesch
@ 2023-09-18 11:28   ` David Hildenbrand
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2023-09-18 11:28 UTC (permalink / raw)
  To: Stefan Roesch, kernel-team; +Cc: akpm, hannes, riel, linux-kernel, linux-mm

On 12.09.23 19:52, Stefan Roesch wrote:
> This change adds the "pages skipped" metric. To be able to evaluate how
> successful smart page scanning is, the pages skipped metric can be
> compared to the pages scanned metric.
> 
> The pages skipped metric is a cumulative counter. The counter is stored
> under /sys/kernel/mm/ksm/pages_skipped.
> 
> Signed-off-by: Stefan Roesch <shr@devkernel.io>
> ---
>   mm/ksm.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index bfd5087c7d5a..728574a3033e 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -293,6 +293,9 @@ static bool ksm_smart_scan;
>   /* The number of zero pages which is placed by KSM */
>   unsigned long ksm_zero_pages;
>   
> +/* The number of pages that have been skipped due to "smart scanning" */
> +static unsigned long ksm_pages_skipped;
> +
>   #ifdef CONFIG_NUMA
>   /* Zeroed when merging across nodes is not allowed */
>   static unsigned int ksm_merge_across_nodes = 1;
> @@ -2345,6 +2348,7 @@ static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
>   		return false;
>   	}
>   
> +	ksm_pages_skipped++;
>   	if (rmap_item->skip_age == 0) {
>   		rmap_item->skip_age = age + inc_skip_age(age);
>   		remove_rmap_item_from_tree(rmap_item);
> @@ -3435,6 +3439,13 @@ static ssize_t pages_volatile_show(struct kobject *kobj,
>   }
>   KSM_ATTR_RO(pages_volatile);
>   
> +static ssize_t pages_skipped_show(struct kobject *kobj,
> +				  struct kobj_attribute *attr, char *buf)
> +{
> +	return sysfs_emit(buf, "%lu\n", ksm_pages_skipped);
> +}
> +KSM_ATTR_RO(pages_skipped);
> +
>   static ssize_t ksm_zero_pages_show(struct kobject *kobj,
>   				struct kobj_attribute *attr, char *buf)
>   {
> @@ -3532,6 +3543,7 @@ static struct attribute *ksm_attrs[] = {
>   	&pages_sharing_attr.attr,
>   	&pages_unshared_attr.attr,
>   	&pages_volatile_attr.attr,
> +	&pages_skipped_attr.attr,
>   	&ksm_zero_pages_attr.attr,
>   	&full_scans_attr.attr,
>   #ifdef CONFIG_NUMA

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH v1 3/4] mm/ksm: document smart scan mode
  2023-09-12 17:52 ` [PATCH v1 3/4] mm/ksm: document smart scan mode Stefan Roesch
@ 2023-09-18 11:28   ` David Hildenbrand
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2023-09-18 11:28 UTC (permalink / raw)
  To: Stefan Roesch, kernel-team; +Cc: akpm, hannes, riel, linux-kernel, linux-mm

On 12.09.23 19:52, Stefan Roesch wrote:
> This adds documentation for the smart scan mode of KSM.
> 
> Signed-off-by: Stefan Roesch <shr@devkernel.io>
> ---
>   Documentation/admin-guide/mm/ksm.rst | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
> index 776f244bdae4..1762219baf51 100644
> --- a/Documentation/admin-guide/mm/ksm.rst
> +++ b/Documentation/admin-guide/mm/ksm.rst
> @@ -155,6 +155,15 @@ stable_node_chains_prune_millisecs
>           scan. It's a noop if not a single KSM page hit the
>           ``max_page_sharing`` yet.
>   
> +smart_scan
> +        By default KSM checks every candidate page for each scan. It does
> +        not take into account historic information. When smart scan is
> +        enabled, pages that have previously not been de-duplicated get
> +        skipped. How often these pages are skipped depends on how often
> +        de-duplication has already been tried and failed. By default this
> +        optimization is disabled. The ``pages_skipped`` metric shows how
> +        effective the setting is.
> +
>   The effectiveness of KSM and MADV_MERGEABLE is shown in ``/sys/kernel/mm/ksm/``:
>   
>   general_profit

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH v1 4/4] mm/ksm: document pages_skipped sysfs knob
  2023-09-12 17:52 ` [PATCH v1 4/4] mm/ksm: document pages_skipped sysfs knob Stefan Roesch
@ 2023-09-18 11:28   ` David Hildenbrand
  0 siblings, 0 replies; 14+ messages in thread
From: David Hildenbrand @ 2023-09-18 11:28 UTC (permalink / raw)
  To: Stefan Roesch, kernel-team; +Cc: akpm, hannes, riel, linux-kernel, linux-mm

On 12.09.23 19:52, Stefan Roesch wrote:
> This adds documentation for the new metric pages_skipped.
> 
> Signed-off-by: Stefan Roesch <shr@devkernel.io>
> ---
>   Documentation/admin-guide/mm/ksm.rst | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
> index 1762219baf51..27d949250b67 100644
> --- a/Documentation/admin-guide/mm/ksm.rst
> +++ b/Documentation/admin-guide/mm/ksm.rst
> @@ -178,6 +178,8 @@ pages_unshared
>           how many pages unique but repeatedly checked for merging
>   pages_volatile
>           how many pages changing too fast to be placed in a tree
> +pages_skipped
> +        how many pages did the "smart" page scanning algorithm skip
>   full_scans
>           how many times all mergeable areas have been scanned
>   stable_node_chains

Reviewed-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



* Re: [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode
  2023-09-18 11:10   ` David Hildenbrand
@ 2023-09-18 16:18     ` Stefan Roesch
  2023-09-18 16:54       ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Roesch @ 2023-09-18 16:18 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: kernel-team, akpm, hannes, riel, linux-kernel, linux-mm


David Hildenbrand <david@redhat.com> writes:

> On 12.09.23 19:52, Stefan Roesch wrote:
>> This change adds a "smart" page scanning mode for KSM. So far all the
>> candidate pages are continuously scanned to find candidates for
>> de-duplication. There are a considerable number of pages that cannot be
>> de-duplicated. This is costly in terms of CPU. By using smart scanning
>> considerable CPU savings can be achieved.
>> This change takes the history of scanning pages into account and skips
>> the page scanning of certain pages for a while if de-duplication for
>> this page has not been successful in the past.
>> To do this it introduces two new fields in the ksm_rmap_item structure:
>> age and skip_age. age is the KSM age and skip_age is the age until which
>> page scanning of this page is skipped. The age field is incremented
>> each time the page is scanned and the page cannot be de-duplicated.
>> How often a page is skipped depends on how often de-duplication has
>> been tried so far, and the number of skips is currently limited to 8.
>> This value has shown to be effective with different workloads.
>> The feature is currently disabled by default and can be enabled with the
>> new smart_scan knob.
>> The feature has shown to be very effective: up to 25% of the page scans
>> can be eliminated; the pages_to_scan rate can be reduced by 40 - 50% and
>> a similar de-duplication rate can be maintained.
>> Signed-off-by: Stefan Roesch <shr@devkernel.io>
>> ---
>>   mm/ksm.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 75 insertions(+)
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index 981af9c72e7a..bfd5087c7d5a 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -56,6 +56,8 @@
>>   #define DO_NUMA(x)	do { } while (0)
>>   #endif
>>   +typedef u8 rmap_age_t;
>> +
>>   /**
>>    * DOC: Overview
>>    *
>> @@ -193,6 +195,8 @@ struct ksm_stable_node {
>>    * @node: rb node of this rmap_item in the unstable tree
>>    * @head: pointer to stable_node heading this list in the stable tree
>>    * @hlist: link into hlist of rmap_items hanging off that stable_node
>> + * @age: number of scan iterations since creation
>> + * @skip_age: skip rmap item until age reaches skip_age
>>    */
>>   struct ksm_rmap_item {
>>   	struct ksm_rmap_item *rmap_list;
>> @@ -212,6 +216,8 @@ struct ksm_rmap_item {
>>   			struct hlist_node hlist;
>>   		};
>>   	};
>> +	rmap_age_t age;
>> +	rmap_age_t skip_age;
>>   };
>>     #define SEQNR_MASK	0x0ff	/* low bits of unstable tree seqnr */
>> @@ -281,6 +287,9 @@ static unsigned int zero_checksum __read_mostly;
>>   /* Whether to merge empty (zeroed) pages with actual zero pages */
>>   static bool ksm_use_zero_pages __read_mostly;
>>   +/* Skip pages that couldn't be de-duplicated previously  */
>> +static bool ksm_smart_scan;
>> +
>>   /* The number of zero pages which is placed by KSM */
>>   unsigned long ksm_zero_pages;
>>   @@ -2305,6 +2314,45 @@ static struct ksm_rmap_item
>> *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
>>   	return rmap_item;
>>   }
>>   +static unsigned int inc_skip_age(rmap_age_t age)
>> +{
>> +	if (age <= 3)
>> +		return 1;
>> +	if (age <= 5)
>> +		return 2;
>> +	if (age <= 8)
>> +		return 4;
>> +
>> +	return 8;
>> +}
>> +
>> +static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
>> +{
>> +	rmap_age_t age;
>> +
>> +	if (!ksm_smart_scan)
>> +		return false;
>> +
>> +	if (PageKsm(page))
>> +		return false;
>
>
> I'm a bit confused about this check here. scan_get_next_rmap_item() would return
> a PageKsm() page and call cmp_and_merge_page().
>
> cmp_and_merge_page() says: "first see if page can be merged into the stable
> tree"
>
> ... but shouldn't a PageKsm page *already* be in the stable tree?
>
> Maybe that's what cmp_and_merge_page() does via:
>
> 	kpage = stable_tree_search(page);
> 	if (kpage == page && rmap_item->head == stable_node) {
> 		put_page(kpage);
> 		return;
> 	}
>
>
> Hoping you can enlighten me :)
>

The above description sounds correct. During each scan we go through all
the candidate pages and this includes rmap_items that map to KSM pages.
The above check simply skips these pages.

>> +
>> +	age = rmap_item->age++;
>
> Can't we overflow here? Is that desired, or would you want to stop at the
> maximum you can store?
>

Yes, we can overflow here and it was a deliberate choice. If we overflow
after we tried unsuccessfully for 255 times, we re-start with shorter
skip values, but that should be fine. In return we avoid an if statement.
The age is defined as unsigned.
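
For illustration, the wraparound described here can be reproduced in
userspace. This is a minimal sketch, not kernel code: uint8_t stands in
for the kernel's u8, and next_age() is a hypothetical helper that only
mirrors the "age = rmap_item->age++" statement from the patch.

```c
#include <stdint.h>

typedef uint8_t rmap_age_t;	/* userspace stand-in for "typedef u8 rmap_age_t" */

/*
 * Mirrors "age = rmap_item->age++": returns the old value and advances
 * the counter. The stored result is reduced modulo 256, so a counter at
 * 255 silently becomes 0 on the next call.
 */
static rmap_age_t next_age(rmap_age_t *age)
{
	return (*age)++;
}
```

Starting from 254, two calls return 254 and then 255, and leave the
counter at 0, which is the point where the shorter skip values start
over.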

>> +	if (age < 3)
>> +		return false;
>> +
>> +	if (rmap_item->skip_age == age) {
>> +		rmap_item->skip_age = 0;
>> +		return false;
>> +	}
>> +
>> +	if (rmap_item->skip_age == 0) {
>> +		rmap_item->skip_age = age + inc_skip_age(age);
>
> Can't you overflow here as well?
>

Yes, you can. See the above discussion. This skip_age is also an
unsigned value.
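
The same truncation applies to the sum stored in skip_age. A minimal
userspace sketch, assuming uint8_t for u8; inc_skip_age() is copied from
the patch, while next_skip_age() is a hypothetical wrapper around the
assignment under discussion:

```c
#include <stdint.h>

typedef uint8_t rmap_age_t;	/* userspace stand-in for the kernel's u8 */

/* Copy of inc_skip_age() from the patch under review. */
static unsigned int inc_skip_age(rmap_age_t age)
{
	if (age <= 3)
		return 1;
	if (age <= 5)
		return 2;
	if (age <= 8)
		return 4;

	return 8;
}

/*
 * Mirrors "rmap_item->skip_age = age + inc_skip_age(age)". The sum is
 * computed as unsigned int and truncated to 8 bits when it is stored.
 */
static rmap_age_t next_skip_age(rmap_age_t age)
{
	return (rmap_age_t)(age + inc_skip_age(age));
}
```

For example, an age of 247 yields a skip_age of 255, while an age of 250
yields 258 truncated to 2.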

>> +		remove_rmap_item_from_tree(rmap_item);
>
>
> Can you enlighten me why that is required?
>

This is required for age calculation and BUG_ON check in
remove_rmap_item_from_tree. If we don't call remove_rmap_item_from_tree,
we will hit the BUG_ON for the skipped pages later on.

>> +	}
>> +
>> +	return true;
>> +}
>> +


* Re: [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode
  2023-09-18 16:18     ` Stefan Roesch
@ 2023-09-18 16:54       ` David Hildenbrand
  2023-09-18 17:22         ` Stefan Roesch
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2023-09-18 16:54 UTC (permalink / raw)
  To: Stefan Roesch; +Cc: kernel-team, akpm, hannes, riel, linux-kernel, linux-mm

On 18.09.23 18:18, Stefan Roesch wrote:
> 
> David Hildenbrand <david@redhat.com> writes:
> 
>> On 12.09.23 19:52, Stefan Roesch wrote:
>>> This change adds a "smart" page scanning mode for KSM. So far all the
>>> candidate pages are continuously scanned to find candidates for
>>> de-duplication. There are a considerably number of pages that cannot be
>>> de-duplicated. This is costly in terms of CPU. By using smart scanning
>>> considerable CPU savings can be achieved.
>>> This change takes the history of scanning pages into account and skips
>>> the page scanning of certain pages for a while if de-deduplication for
>>> this page has not been successful in the past.
>>> To do this it introduces two new fields in the ksm_rmap_item structure:
>>> age and skip_age. age, is the KSM age and skip_page is the age for how
>>> long page scanning of this page is skipped. The age field is incremented
>>> each time the page is scanned and the page cannot be de-duplicated.
>>> How often a page is skipped is dependent how often de-duplication has
>>> been tried so far and the number of skips is currently limited to 8.
>>> This value has shown to be effective with different workloads.
>>> The feature is currently disable by default and can be enabled with the
>>> new smart_scan knob.
>>> The feature has shown to be very effective: upt to 25% of the page scans
>>> can be eliminated; the pages_to_scan rate can be reduced by 40 - 50% and
>>> a similar de-duplication rate can be maintained.
>>> Signed-off-by: Stefan Roesch <shr@devkernel.io>
>>> ---
>>>    mm/ksm.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 75 insertions(+)
>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>> index 981af9c72e7a..bfd5087c7d5a 100644
>>> --- a/mm/ksm.c
>>> +++ b/mm/ksm.c
>>> @@ -56,6 +56,8 @@
>>>    #define DO_NUMA(x)	do { } while (0)
>>>    #endif
>>>    +typedef u8 rmap_age_t;
>>> +
>>>    /**
>>>     * DOC: Overview
>>>     *
>>> @@ -193,6 +195,8 @@ struct ksm_stable_node {
>>>     * @node: rb node of this rmap_item in the unstable tree
>>>     * @head: pointer to stable_node heading this list in the stable tree
>>>     * @hlist: link into hlist of rmap_items hanging off that stable_node
>>> + * @age: number of scan iterations since creation
>>> + * @skip_age: skip rmap item until age reaches skip_age
>>>     */
>>>    struct ksm_rmap_item {
>>>    	struct ksm_rmap_item *rmap_list;
>>> @@ -212,6 +216,8 @@ struct ksm_rmap_item {
>>>    			struct hlist_node hlist;
>>>    		};
>>>    	};
>>> +	rmap_age_t age;
>>> +	rmap_age_t skip_age;
>>>    };
>>>      #define SEQNR_MASK	0x0ff	/* low bits of unstable tree seqnr */
>>> @@ -281,6 +287,9 @@ static unsigned int zero_checksum __read_mostly;
>>>    /* Whether to merge empty (zeroed) pages with actual zero pages */
>>>    static bool ksm_use_zero_pages __read_mostly;
>>>    +/* Skip pages that couldn't be de-duplicated previously  */
>>> +static bool ksm_smart_scan;
>>> +
>>>    /* The number of zero pages which is placed by KSM */
>>>    unsigned long ksm_zero_pages;
>>>    @@ -2305,6 +2314,45 @@ static struct ksm_rmap_item
>>> *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
>>>    	return rmap_item;
>>>    }
>>>    +static unsigned int inc_skip_age(rmap_age_t age)
>>> +{
>>> +	if (age <= 3)
>>> +		return 1;
>>> +	if (age <= 5)
>>> +		return 2;
>>> +	if (age <= 8)
>>> +		return 4;
>>> +
>>> +	return 8;
>>> +}
>>> +
>>> +static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
>>> +{
>>> +	rmap_age_t age;
>>> +
>>> +	if (!ksm_smart_scan)
>>> +		return false;
>>> +
>>> +	if (PageKsm(page))
>>> +		return false;
>>
>>
>> I'm a bit confused about this check here. scan_get_next_rmap_item() would return
>> a PageKsm() page and call cmp_and_merge_page().
>>
>> cmp_and_merge_page() says: "first see if page can be merged into the stable
>> tree"
>>
>> ... but shouldn't a PageKsm page *already* be in the stable tree?
>>
>> Maybe that's what cmp_and_merge_page() does via:
>>
>> 	kpage = stable_tree_search(page);
>> 	if (kpage == page && rmap_item->head == stable_node) {
>> 		put_page(kpage);
>> 		return;
>> 	}
>>
>>
>> Hoping you can enlighten me :)
>>
> 
> The above description sounds correct. During each scan we go through all
> the candidate pages and this includes rmap_items that map to KSM pages.
> The above check simply skips these pages.

Can we add a comment why we don't skip them? Like

/*
  * Never skip pages that are already KSM; cmp_and_merge_page() will
  * essentially ignore them, but we still have to process them
  * properly.
  */

> 
>>> +
>>> +	age = rmap_item->age++;
>>
>> Can't we overflow here? Is that desired, or would you want to stop at the
>> maximum you can store?
>>
> 
> Yes, we can overflow here and it was a deliberate choice. If we overflow
> after we tried unsuccessfully for 255 times, we re-start with shorter
> skip values, but that should be fine. In return we avoid an if statement.
> The age is defined as unsigned.

Can we make that explicit instead? Dealing with implicit overflows 
really makes the code harder to grasp.
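
One way the increment could be made explicit is a saturating counter. The
sketch below is only an assumption about what "explicit" might look like,
not the code of a later version of the patch; RMAP_AGE_MAX and
age_fetch_inc_saturating() are hypothetical names, and uint8_t stands in
for u8.

```c
#include <stdint.h>

typedef uint8_t rmap_age_t;		/* userspace stand-in for u8 */

#define RMAP_AGE_MAX	UINT8_MAX	/* hypothetical name for the cap */

/*
 * Return the current age and advance it, but stop at RMAP_AGE_MAX
 * instead of wrapping back to 0.
 */
static rmap_age_t age_fetch_inc_saturating(rmap_age_t *age)
{
	rmap_age_t old = *age;

	if (old != RMAP_AGE_MAX)
		*age = old + 1;

	return old;
}
```

A saturating age does not by itself say how skip_age should behave once
age stops advancing; that part is left open here.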

> 
>>> +	if (age < 3)
>>> +		return false;
>>> +
>>> +	if (rmap_item->skip_age == age) {
>>> +		rmap_item->skip_age = 0;
>>> +		return false;
>>> +	}
>>> +
>>> +	if (rmap_item->skip_age == 0) {
>>> +		rmap_item->skip_age = age + inc_skip_age(age);
>>
>> Can't you overflow here as well?
>>
> 
> Yes, you can. See the above discussion. This skip_age is also an
> unsigned value.

Ditto.

> 
>>> +		remove_rmap_item_from_tree(rmap_item);
>>
>>
>> Can you enlighten me why that is required?
>>
> 
> This is required for age calculation and BUG_ON check in
> remove_rmap_item_from_tree. If we don't call remove_rmap_item_from_tree,
> we will hit the BUG_ON for the skipped pages later on.

I see, thanks!


-- 
Cheers,

David / dhildenb



* Re: [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode
  2023-09-18 16:54       ` David Hildenbrand
@ 2023-09-18 17:22         ` Stefan Roesch
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2023-09-18 17:22 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: kernel-team, akpm, hannes, riel, linux-kernel, linux-mm


David Hildenbrand <david@redhat.com> writes:

> On 18.09.23 18:18, Stefan Roesch wrote:
>> David Hildenbrand <david@redhat.com> writes:
>>
>>> On 12.09.23 19:52, Stefan Roesch wrote:
>>>> This change adds a "smart" page scanning mode for KSM. So far all the
>>>> candidate pages are continuously scanned to find candidates for
>>>> de-duplication. There are a considerably number of pages that cannot be
>>>> de-duplicated. This is costly in terms of CPU. By using smart scanning
>>>> considerable CPU savings can be achieved.
>>>> This change takes the history of scanning pages into account and skips
>>>> the page scanning of certain pages for a while if de-deduplication for
>>>> this page has not been successful in the past.
>>>> To do this it introduces two new fields in the ksm_rmap_item structure:
>>>> age and skip_age. age, is the KSM age and skip_page is the age for how
>>>> long page scanning of this page is skipped. The age field is incremented
>>>> each time the page is scanned and the page cannot be de-duplicated.
>>>> How often a page is skipped is dependent how often de-duplication has
>>>> been tried so far and the number of skips is currently limited to 8.
>>>> This value has shown to be effective with different workloads.
>>>> The feature is currently disable by default and can be enabled with the
>>>> new smart_scan knob.
>>>> The feature has shown to be very effective: upt to 25% of the page scans
>>>> can be eliminated; the pages_to_scan rate can be reduced by 40 - 50% and
>>>> a similar de-duplication rate can be maintained.
>>>> Signed-off-by: Stefan Roesch <shr@devkernel.io>
>>>> ---
>>>>    mm/ksm.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 75 insertions(+)
>>>> diff --git a/mm/ksm.c b/mm/ksm.c
>>>> index 981af9c72e7a..bfd5087c7d5a 100644
>>>> --- a/mm/ksm.c
>>>> +++ b/mm/ksm.c
>>>> @@ -56,6 +56,8 @@
>>>>    #define DO_NUMA(x)	do { } while (0)
>>>>    #endif
>>>>    +typedef u8 rmap_age_t;
>>>> +
>>>>    /**
>>>>     * DOC: Overview
>>>>     *
>>>> @@ -193,6 +195,8 @@ struct ksm_stable_node {
>>>>     * @node: rb node of this rmap_item in the unstable tree
>>>>     * @head: pointer to stable_node heading this list in the stable tree
>>>>     * @hlist: link into hlist of rmap_items hanging off that stable_node
>>>> + * @age: number of scan iterations since creation
>>>> + * @skip_age: skip rmap item until age reaches skip_age
>>>>     */
>>>>    struct ksm_rmap_item {
>>>>    	struct ksm_rmap_item *rmap_list;
>>>> @@ -212,6 +216,8 @@ struct ksm_rmap_item {
>>>>    			struct hlist_node hlist;
>>>>    		};
>>>>    	};
>>>> +	rmap_age_t age;
>>>> +	rmap_age_t skip_age;
>>>>    };
>>>>      #define SEQNR_MASK	0x0ff	/* low bits of unstable tree seqnr */
>>>> @@ -281,6 +287,9 @@ static unsigned int zero_checksum __read_mostly;
>>>>    /* Whether to merge empty (zeroed) pages with actual zero pages */
>>>>    static bool ksm_use_zero_pages __read_mostly;
>>>>    +/* Skip pages that couldn't be de-duplicated previously  */
>>>> +static bool ksm_smart_scan;
>>>> +
>>>>    /* The number of zero pages which is placed by KSM */
>>>>    unsigned long ksm_zero_pages;
>>>>    @@ -2305,6 +2314,45 @@ static struct ksm_rmap_item
>>>> *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
>>>>    	return rmap_item;
>>>>    }
>>>>    +static unsigned int inc_skip_age(rmap_age_t age)
>>>> +{
>>>> +	if (age <= 3)
>>>> +		return 1;
>>>> +	if (age <= 5)
>>>> +		return 2;
>>>> +	if (age <= 8)
>>>> +		return 4;
>>>> +
>>>> +	return 8;
>>>> +}
>>>> +
>>>> +static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
>>>> +{
>>>> +	rmap_age_t age;
>>>> +
>>>> +	if (!ksm_smart_scan)
>>>> +		return false;
>>>> +
>>>> +	if (PageKsm(page))
>>>> +		return false;
>>>
>>>
>>> I'm a bit confused about this check here. scan_get_next_rmap_item() would return
>>> a PageKsm() page and call cmp_and_merge_page().
>>>
>>> cmp_and_merge_page() says: "first see if page can be merged into the stable
>>> tree"
>>>
>>> ... but shouldn't a PageKsm page *already* be in the stable tree?
>>>
>>> Maybe that's what cmp_and_merge_page() does via:
>>>
>>> 	kpage = stable_tree_search(page);
>>> 	if (kpage == page && rmap_item->head == stable_node) {
>>> 		put_page(kpage);
>>> 		return;
>>> 	}
>>>
>>>
>>> Hoping you can enlighten me :)
>>>
>> The above description sounds correct. During each scan we go through all
>> the candidate pages and this includes rmap_items that map to KSM pages.
>> The above check simply skips these pages.
>
> Can we add a comment why we don't skip them? Like
>
> /*
>  * Never skip pages that are already KSM; cmp_and_merge_page() will
>  * essentially ignore them, but we still have to process them
>  * properly.
>  */
>

I'll add the comment in the next version.

>>
>>>> +
>>>> +	age = rmap_item->age++;
>>>
>>> Can't we overflow here? Is that desired, or would you want to stop at the
>>> maximum you can store?
>>>
>> Yes, we can overflow here and it was a deliberate choice. If we overflow
>> after we tried unsuccessfully for 255 times, we re-start with shorter
>> skip values, but that should be fine. In return we avoid an if statement.
>> The age is defined as unsigned.
>
> Can we make that explicit instead? Dealing with implicit overflows really makes
> the code harder to grasp.
>

I'll make it explicit.

>>
>>>> +	if (age < 3)
>>>> +		return false;
>>>> +
>>>> +	if (rmap_item->skip_age == age) {
>>>> +		rmap_item->skip_age = 0;
>>>> +		return false;
>>>> +	}
>>>> +
>>>> +	if (rmap_item->skip_age == 0) {
>>>> +		rmap_item->skip_age = age + inc_skip_age(age);
>>>
>>> Can't you overflow here as well?
>>>
>> Yes, you can. See the above discussion. This skip_age is also an
>> unsigned value.
>
> Ditto.
>

I'll make it explicit.

>>
>>>> +		remove_rmap_item_from_tree(rmap_item);
>>>
>>>
>>> Can you enlighten me why that is required?
>>>
>> This is required for age calculation and BUG_ON check in
>> remove_rmap_item_from_tree. If we don't call remove_rmap_item_from_tree,
>> we will hit the BUG_ON for the skipped pages later on.
>
> I see, thanks!


* Re: [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode
  2023-09-13 21:07   ` Andrew Morton
@ 2023-09-18 18:47     ` Stefan Roesch
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Roesch @ 2023-09-18 18:47 UTC (permalink / raw)
  To: Andrew Morton; +Cc: kernel-team, david, hannes, riel, linux-kernel, linux-mm


Andrew Morton <akpm@linux-foundation.org> writes:

> On Tue, 12 Sep 2023 10:52:25 -0700 Stefan Roesch <shr@devkernel.io> wrote:
>
>> This change adds a "smart" page scanning mode for KSM. So far all the
>> candidate pages are continuously scanned to find candidates for
>> de-duplication. There are a considerably number of pages that cannot be
>> de-duplicated. This is costly in terms of CPU. By using smart scanning
>> considerable CPU savings can be achieved.
>>
>> This change takes the history of scanning pages into account and skips
>> the page scanning of certain pages for a while if de-deduplication for
>> this page has not been successful in the past.
>>
>> To do this it introduces two new fields in the ksm_rmap_item structure:
>> age and skip_age. age, is the KSM age and skip_page is the age for how
>
> s/skip_page/skip_age/
>

Fixed in the next version.

>> long page scanning of this page is skipped. The age field is incremented
>> each time the page is scanned and the page cannot be de-duplicated.
>>
>> How often a page is skipped is dependent how often de-duplication has
>> been tried so far and the number of skips is currently limited to 8.
>> This value has shown to be effective with different workloads.
>>
>> The feature is currently disable by default and can be enabled with the
>> new smart_scan knob.
>>
>> The feature has shown to be very effective: upt to 25% of the page scans
>> can be eliminated; the pages_to_scan rate can be reduced by 40 - 50% and
>> a similar de-duplication rate can be maintained.
>>
>
> All seems nice.  I'll sit out v1, see what people have to say.
>
> Some nits:
>
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>>
>> ...
>>
>> @@ -2305,6 +2314,45 @@ static struct ksm_rmap_item *get_next_rmap_item(struct ksm_mm_slot *mm_slot,
>>  	return rmap_item;
>>  }
>>
>> +static unsigned int inc_skip_age(rmap_age_t age)
>> +{
>> +	if (age <= 3)
>> +		return 1;
>> +	if (age <= 5)
>> +		return 2;
>> +	if (age <= 8)
>> +		return 4;
>> +
>> +	return 8;
>> +}
>
> "inc_skip_age" sounds like it increments something.  Can we give it a
> better name?
>
> And a nice comment explaining its role in life.
>

Renamed it to skip_age in the next version and added a comment.

>> +static bool skip_rmap_item(struct page *page, struct ksm_rmap_item *rmap_item)
>> +{
>> +	rmap_age_t age;
>> +
>> +	if (!ksm_smart_scan)
>> +		return false;
>> +
>> +	if (PageKsm(page))
>> +		return false;
>> +
>> +	age = rmap_item->age++;
>> +	if (age < 3)
>> +		return false;
>> +
>> +	if (rmap_item->skip_age == age) {
>> +		rmap_item->skip_age = 0;
>> +		return false;
>> +	}
>> +
>> +	if (rmap_item->skip_age == 0) {
>> +		rmap_item->skip_age = age + inc_skip_age(age);
>> +		remove_rmap_item_from_tree(rmap_item);
>> +	}
>> +
>> +	return true;
>> +}
>
> Would a better name be should_skip_rmap_item()?
>

Renamed it to should_skip_rmap_item().

> But even that name implies that the function is idempotent (has no
> side-effects).  Again, an explanatory comment would be good.  And
> simple comments over each non-obvious `if' statement.
>

Added more comments to the function to explain the different cases.
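
To make the different cases and the side effects concrete, here is a
userspace model of the v1 function with a comment over each branch. It is
a sketch under stated assumptions: struct item_model, model_should_skip()
and trace_scans() are hypothetical names, the ksm_smart_scan and PageKsm()
checks are omitted, and the call to remove_rmap_item_from_tree() is only
noted in a comment.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t rmap_age_t;	/* userspace stand-in for u8 */

/* Holds only the two fields the patch adds to struct ksm_rmap_item. */
struct item_model {
	rmap_age_t age;
	rmap_age_t skip_age;
};

/* Copy of inc_skip_age() from the v1 patch. */
static unsigned int inc_skip_age(rmap_age_t age)
{
	if (age <= 3)
		return 1;
	if (age <= 5)
		return 2;
	if (age <= 8)
		return 4;

	return 8;
}

/* Model of the v1 skip_rmap_item() logic; true means "skip this scan". */
static bool model_should_skip(struct item_model *item)
{
	/* Side effect: every call ages the item, whatever the outcome. */
	rmap_age_t age = item->age++;

	/* Young items are always scanned. */
	if (age < 3)
		return false;

	/* The skip window has ended: clear the target and scan once. */
	if (item->skip_age == age) {
		item->skip_age = 0;
		return false;
	}

	/*
	 * No window is open: open one. The patch also calls
	 * remove_rmap_item_from_tree() at this point.
	 */
	if (item->skip_age == 0)
		item->skip_age = age + inc_skip_age(age);

	return true;
}

/* Record n consecutive calls in out: 'S' = skipped, '.' = scanned. */
static void trace_scans(struct item_model *item, char *out, int n)
{
	for (int i = 0; i < n; i++)
		out[i] = model_should_skip(item) ? 'S' : '.';
	out[n] = '\0';
}
```

Starting from a zeroed item, 22 calls of the model produce
"...S.SS.SSSS.SSSSSSSS.": skip windows of 1, 2, 4 and then 8 scans, each
followed by one real scan.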

>>
>> ...
>>


end of thread, other threads:[~2023-09-18 18:50 UTC | newest]

Thread overview: 14+ messages
2023-09-12 17:52 [PATCH v1 0/4] Smart scanning mode for KSM Stefan Roesch
2023-09-12 17:52 ` [PATCH v1 1/4] mm/ksm: add "smart" page scanning mode Stefan Roesch
2023-09-13 21:07   ` Andrew Morton
2023-09-18 18:47     ` Stefan Roesch
2023-09-18 11:10   ` David Hildenbrand
2023-09-18 16:18     ` Stefan Roesch
2023-09-18 16:54       ` David Hildenbrand
2023-09-18 17:22         ` Stefan Roesch
2023-09-12 17:52 ` [PATCH v1 2/4] mm/ksm: add pages_skipped metric Stefan Roesch
2023-09-18 11:28   ` David Hildenbrand
2023-09-12 17:52 ` [PATCH v1 3/4] mm/ksm: document smart scan mode Stefan Roesch
2023-09-18 11:28   ` David Hildenbrand
2023-09-12 17:52 ` [PATCH v1 4/4] mm/ksm: document pages_skipped sysfs knob Stefan Roesch
2023-09-18 11:28   ` David Hildenbrand
