[PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
@ 2017-01-12 16:17 ` Claudio Imbrenda
  0 siblings, 0 replies; 14+ messages in thread
From: Claudio Imbrenda @ 2017-01-12 16:17 UTC (permalink / raw)
  To: linux-mm
  Cc: borntraeger, hughd, izik.eidus, aarcange, chrisw, akpm, linux-kernel

Some architectures have a set of zero pages (coloured zero pages)
instead of only one zero page, in order to improve the cache
performance. In those cases, the kernel samepage merger (KSM) would
merge all the allocated pages that happen to be filled with zeroes to
the same deduplicated page, thus losing all the advantages of coloured
zero pages.

This patch fixes this behaviour. When coloured zero pages are present,
the checksum of a zero page is calculated during initialisation, and
compared with the checksum of the current canditate during merging. In
case of a match, the normal merging routine is used to merge the page
with the correct coloured zero page, which ensures the candidate page
is checked to be equal to the target zero page.

This behaviour is noticeable when a process accesses large arrays of
allocated pages containing zeroes. A test I conducted on s390 shows
that there is a speed penalty when KSM merges such pages, compared to
not merging them or using actual zero pages from the start without
breaking the COW.

With this patch, the performance with KSM is the same as with non
COW-broken actual zero pages, which is also the same as without KSM.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
---
 mm/ksm.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/mm/ksm.c b/mm/ksm.c
index 9ae6011..b0cfc30 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -223,6 +223,11 @@ struct rmap_item {
 /* Milliseconds ksmd should sleep between batches */
 static unsigned int ksm_thread_sleep_millisecs = 20;

+#ifdef __HAVE_COLOR_ZERO_PAGE
+/* Checksum of an empty (zeroed) page */
+static unsigned int zero_checksum;
+#endif
+
 #ifdef CONFIG_NUMA
 /* Zeroed when merging across nodes is not allowed */
 static unsigned int ksm_merge_across_nodes = 1;
@@ -1467,6 +1472,25 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
 		return;
 	}

+#ifdef __HAVE_COLOR_ZERO_PAGE
+	/*
+	 * Same checksum as an empty page. We attempt to merge it with the
+	 * appropriate zero page.
+	 */
+	if (checksum == zero_checksum) {
+		struct vm_area_struct *vma;
+
+		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
+		err = try_to_merge_one_page(vma, page,
+					    ZERO_PAGE(rmap_item->address));
+		/*
+		 * In case of failure, the page was not really empty, so we
+		 * need to continue. Otherwise we're done.
+		 */
+		if (!err)
+			return;
+	}
+#endif
 	tree_rmap_item =
 		unstable_tree_search_insert(rmap_item, page, &tree_page);
 	if (tree_rmap_item) {
@@ -2304,6 +2328,11 @@ static int __init ksm_init(void)
 	struct task_struct *ksm_thread;
 	int err;

+#ifdef __HAVE_COLOR_ZERO_PAGE
+	/* The correct value depends on page size and endianness */
+	zero_checksum = calc_checksum(ZERO_PAGE(0));
+#endif
+
 	err = ksm_slab_init();
 	if (err)
 		goto out;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
@ 2017-01-12 16:17 ` Claudio Imbrenda
  0 siblings, 0 replies; 14+ messages in thread
From: Claudio Imbrenda @ 2017-01-12 16:17 UTC (permalink / raw)
  To: linux-mm
  Cc: borntraeger, hughd, izik.eidus, aarcange, chrisw, akpm, linux-kernel

Some architectures have a set of zero pages (coloured zero pages)
instead of only one zero page, in order to improve the cache
performance. In those cases, the kernel samepage merger (KSM) would
merge all the allocated pages that happen to be filled with zeroes to
the same deduplicated page, thus losing all the advantages of coloured
zero pages.

This patch fixes this behaviour. When coloured zero pages are present,
the checksum of a zero page is calculated during initialisation, and
compared with the checksum of the current canditate during merging. In
case of a match, the normal merging routine is used to merge the page
with the correct coloured zero page, which ensures the candidate page
is checked to be equal to the target zero page.

This behaviour is noticeable when a process accesses large arrays of
allocated pages containing zeroes. A test I conducted on s390 shows
that there is a speed penalty when KSM merges such pages, compared to
not merging them or using actual zero pages from the start without
breaking the COW.

With this patch, the performance with KSM is the same as with non
COW-broken actual zero pages, which is also the same as without KSM.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
---
 mm/ksm.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/mm/ksm.c b/mm/ksm.c
index 9ae6011..b0cfc30 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -223,6 +223,11 @@ struct rmap_item {
 /* Milliseconds ksmd should sleep between batches */
 static unsigned int ksm_thread_sleep_millisecs = 20;

+#ifdef __HAVE_COLOR_ZERO_PAGE
+/* Checksum of an empty (zeroed) page */
+static unsigned int zero_checksum;
+#endif
+
 #ifdef CONFIG_NUMA
 /* Zeroed when merging across nodes is not allowed */
 static unsigned int ksm_merge_across_nodes = 1;
@@ -1467,6 +1472,25 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
 		return;
 	}

+#ifdef __HAVE_COLOR_ZERO_PAGE
+	/*
+	 * Same checksum as an empty page. We attempt to merge it with the
+	 * appropriate zero page.
+	 */
+	if (checksum == zero_checksum) {
+		struct vm_area_struct *vma;
+
+		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
+		err = try_to_merge_one_page(vma, page,
+					    ZERO_PAGE(rmap_item->address));
+		/*
+		 * In case of failure, the page was not really empty, so we
+		 * need to continue. Otherwise we're done.
+		 */
+		if (!err)
+			return;
+	}
+#endif
 	tree_rmap_item =
 		unstable_tree_search_insert(rmap_item, page, &tree_page);
 	if (tree_rmap_item) {
@@ -2304,6 +2328,11 @@ static int __init ksm_init(void)
 	struct task_struct *ksm_thread;
 	int err;

+#ifdef __HAVE_COLOR_ZERO_PAGE
+	/* The correct value depends on page size and endianness */
+	zero_checksum = calc_checksum(ZERO_PAGE(0));
+#endif
+
 	err = ksm_slab_init();
 	if (err)
 		goto out;
-- 
1.9.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
  2017-01-12 16:17 ` Claudio Imbrenda
@ 2017-01-12 16:21   ` Christian Borntraeger
  -1 siblings, 0 replies; 14+ messages in thread
From: Christian Borntraeger @ 2017-01-12 16:21 UTC (permalink / raw)
  To: Claudio Imbrenda, linux-mm
  Cc: hughd, izik.eidus, aarcange, chrisw, akpm, linux-kernel

On 01/12/2017 05:17 PM, Claudio Imbrenda wrote:
> Some architectures have a set of zero pages (coloured zero pages)
> instead of only one zero page, in order to improve the cache
> performance. In those cases, the kernel samepage merger (KSM) would
> merge all the allocated pages that happen to be filled with zeroes to
> the same deduplicated page, thus losing all the advantages of coloured
> zero pages.
> 
> This patch fixes this behaviour. When coloured zero pages are present,
> the checksum of a zero page is calculated during initialisation, and
> compared with the checksum of the current canditate during merging. In
> case of a match, the normal merging routine is used to merge the page
> with the correct coloured zero page, which ensures the candidate page
> is checked to be equal to the target zero page.
> 
> This behaviour is noticeable when a process accesses large arrays of
> allocated pages containing zeroes. A test I conducted on s390 shows
> that there is a speed penalty when KSM merges such pages, compared to
> not merging them or using actual zero pages from the start without
> breaking the COW.
> 
> With this patch, the performance with KSM is the same as with non
> COW-broken actual zero pages, which is also the same as without KSM.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>

FWIW, I cannot say if the memory management part is correct and sane. (the
patch below). But this issue (loosing the cache colouring for the zero 
page) is certainly a reason to not use KSM on s390 for specific workloads
(large sparsely matrixes backed by the guest empty zero page).

This patch will fix that.


> ---
>  mm/ksm.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 9ae6011..b0cfc30 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -223,6 +223,11 @@ struct rmap_item {
>  /* Milliseconds ksmd should sleep between batches */
>  static unsigned int ksm_thread_sleep_millisecs = 20;
> 
> +#ifdef __HAVE_COLOR_ZERO_PAGE
> +/* Checksum of an empty (zeroed) page */
> +static unsigned int zero_checksum;
> +#endif
> +
>  #ifdef CONFIG_NUMA
>  /* Zeroed when merging across nodes is not allowed */
>  static unsigned int ksm_merge_across_nodes = 1;
> @@ -1467,6 +1472,25 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
>  		return;
>  	}
> 
> +#ifdef __HAVE_COLOR_ZERO_PAGE
> +	/*
> +	 * Same checksum as an empty page. We attempt to merge it with the
> +	 * appropriate zero page.
> +	 */
> +	if (checksum == zero_checksum) {
> +		struct vm_area_struct *vma;
> +
> +		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
> +		err = try_to_merge_one_page(vma, page,
> +					    ZERO_PAGE(rmap_item->address));
> +		/*
> +		 * In case of failure, the page was not really empty, so we
> +		 * need to continue. Otherwise we're done.
> +		 */
> +		if (!err)
> +			return;
> +	}
> +#endif
>  	tree_rmap_item =
>  		unstable_tree_search_insert(rmap_item, page, &tree_page);
>  	if (tree_rmap_item) {
> @@ -2304,6 +2328,11 @@ static int __init ksm_init(void)
>  	struct task_struct *ksm_thread;
>  	int err;
> 
> +#ifdef __HAVE_COLOR_ZERO_PAGE
> +	/* The correct value depends on page size and endianness */
> +	zero_checksum = calc_checksum(ZERO_PAGE(0));
> +#endif
> +
>  	err = ksm_slab_init();
>  	if (err)
>  		goto out;
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
@ 2017-01-12 16:21   ` Christian Borntraeger
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Borntraeger @ 2017-01-12 16:21 UTC (permalink / raw)
  To: Claudio Imbrenda, linux-mm
  Cc: hughd, izik.eidus, aarcange, chrisw, akpm, linux-kernel

On 01/12/2017 05:17 PM, Claudio Imbrenda wrote:
> Some architectures have a set of zero pages (coloured zero pages)
> instead of only one zero page, in order to improve the cache
> performance. In those cases, the kernel samepage merger (KSM) would
> merge all the allocated pages that happen to be filled with zeroes to
> the same deduplicated page, thus losing all the advantages of coloured
> zero pages.
> 
> This patch fixes this behaviour. When coloured zero pages are present,
> the checksum of a zero page is calculated during initialisation, and
> compared with the checksum of the current canditate during merging. In
> case of a match, the normal merging routine is used to merge the page
> with the correct coloured zero page, which ensures the candidate page
> is checked to be equal to the target zero page.
> 
> This behaviour is noticeable when a process accesses large arrays of
> allocated pages containing zeroes. A test I conducted on s390 shows
> that there is a speed penalty when KSM merges such pages, compared to
> not merging them or using actual zero pages from the start without
> breaking the COW.
> 
> With this patch, the performance with KSM is the same as with non
> COW-broken actual zero pages, which is also the same as without KSM.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>

FWIW, I cannot say if the memory management part is correct and sane. (the
patch below). But this issue (loosing the cache colouring for the zero 
page) is certainly a reason to not use KSM on s390 for specific workloads
(large sparsely matrixes backed by the guest empty zero page).

This patch will fix that.


> ---
>  mm/ksm.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 9ae6011..b0cfc30 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -223,6 +223,11 @@ struct rmap_item {
>  /* Milliseconds ksmd should sleep between batches */
>  static unsigned int ksm_thread_sleep_millisecs = 20;
> 
> +#ifdef __HAVE_COLOR_ZERO_PAGE
> +/* Checksum of an empty (zeroed) page */
> +static unsigned int zero_checksum;
> +#endif
> +
>  #ifdef CONFIG_NUMA
>  /* Zeroed when merging across nodes is not allowed */
>  static unsigned int ksm_merge_across_nodes = 1;
> @@ -1467,6 +1472,25 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
>  		return;
>  	}
> 
> +#ifdef __HAVE_COLOR_ZERO_PAGE
> +	/*
> +	 * Same checksum as an empty page. We attempt to merge it with the
> +	 * appropriate zero page.
> +	 */
> +	if (checksum == zero_checksum) {
> +		struct vm_area_struct *vma;
> +
> +		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
> +		err = try_to_merge_one_page(vma, page,
> +					    ZERO_PAGE(rmap_item->address));
> +		/*
> +		 * In case of failure, the page was not really empty, so we
> +		 * need to continue. Otherwise we're done.
> +		 */
> +		if (!err)
> +			return;
> +	}
> +#endif
>  	tree_rmap_item =
>  		unstable_tree_search_insert(rmap_item, page, &tree_page);
>  	if (tree_rmap_item) {
> @@ -2304,6 +2328,11 @@ static int __init ksm_init(void)
>  	struct task_struct *ksm_thread;
>  	int err;
> 
> +#ifdef __HAVE_COLOR_ZERO_PAGE
> +	/* The correct value depends on page size and endianness */
> +	zero_checksum = calc_checksum(ZERO_PAGE(0));
> +#endif
> +
>  	err = ksm_slab_init();
>  	if (err)
>  		goto out;
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
  2017-01-12 16:17 ` Claudio Imbrenda
@ 2017-01-12 17:21   ` Andrea Arcangeli
  -1 siblings, 0 replies; 14+ messages in thread
From: Andrea Arcangeli @ 2017-01-12 17:21 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

Hello Claudio,

On Thu, Jan 12, 2017 at 05:17:14PM +0100, Claudio Imbrenda wrote:
> +#ifdef __HAVE_COLOR_ZERO_PAGE
> +	/*
> +	 * Same checksum as an empty page. We attempt to merge it with the
> +	 * appropriate zero page.
> +	 */
> +	if (checksum == zero_checksum) {
> +		struct vm_area_struct *vma;
> +
> +		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
> +		err = try_to_merge_one_page(vma, page,
> +					    ZERO_PAGE(rmap_item->address));

So the objective is not to add the zero pages to the stable tree but
just convert them to readonly zerpages?

Maybe this could be a standard option for all archs to disable
enable/disable with a new sysfs control similarly to the NUMA aware
deduplication. The question is if it should be enabled by default in
those archs where page coloring matters a lot. Probably yes.

There are guest OS creating lots of zero pages, not linux though, for
linux guests this is just overhead. Also those guests creating zero
pages wouldn't constantly read from them so again for KVM usage this
is unlikely to help. For certain guest OS it'll create less KSM
metadata with this approach, but it's debatable if it's worth one more
memcpy for every merge-candidate page to save some metadata, it's very
guest-workload dependent too. Of course your usage is not KVM but
number crunching with uninitialized tables, it's different and the
zero page read speed matters.

On the implementation side I think the above is going to call
page_add_anon_rmap(kpage, vma, addr, false) and get_page by mistake,
and it should use pte_mkspecial not mk_pte. I think you need to pass
up a zeropage bool into replace_page and change replace_page to create
a proper zeropage in place of the old page or it'll eventually
overflow the page count crashing etc...

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
@ 2017-01-12 17:21   ` Andrea Arcangeli
  0 siblings, 0 replies; 14+ messages in thread
From: Andrea Arcangeli @ 2017-01-12 17:21 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

Hello Claudio,

On Thu, Jan 12, 2017 at 05:17:14PM +0100, Claudio Imbrenda wrote:
> +#ifdef __HAVE_COLOR_ZERO_PAGE
> +	/*
> +	 * Same checksum as an empty page. We attempt to merge it with the
> +	 * appropriate zero page.
> +	 */
> +	if (checksum == zero_checksum) {
> +		struct vm_area_struct *vma;
> +
> +		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
> +		err = try_to_merge_one_page(vma, page,
> +					    ZERO_PAGE(rmap_item->address));

So the objective is not to add the zero pages to the stable tree but
just convert them to readonly zerpages?

Maybe this could be a standard option for all archs to disable
enable/disable with a new sysfs control similarly to the NUMA aware
deduplication. The question is if it should be enabled by default in
those archs where page coloring matters a lot. Probably yes.

There are guest OS creating lots of zero pages, not linux though, for
linux guests this is just overhead. Also those guests creating zero
pages wouldn't constantly read from them so again for KVM usage this
is unlikely to help. For certain guest OS it'll create less KSM
metadata with this approach, but it's debatable if it's worth one more
memcpy for every merge-candidate page to save some metadata, it's very
guest-workload dependent too. Of course your usage is not KVM but
number crunching with uninitialized tables, it's different and the
zero page read speed matters.

On the implementation side I think the above is going to call
page_add_anon_rmap(kpage, vma, addr, false) and get_page by mistake,
and it should use pte_mkspecial not mk_pte. I think you need to pass
up a zeropage bool into replace_page and change replace_page to create
a proper zeropage in place of the old page or it'll eventually
overflow the page count crashing etc...

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
  2017-01-12 17:21   ` Andrea Arcangeli
@ 2017-01-18 15:15     ` Claudio Imbrenda
  -1 siblings, 0 replies; 14+ messages in thread
From: Claudio Imbrenda @ 2017-01-18 15:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

Hi Andrea,

On 12/01/17 18:21, Andrea Arcangeli wrote:
> Hello Claudio,
> 
> On Thu, Jan 12, 2017 at 05:17:14PM +0100, Claudio Imbrenda wrote:
>> +#ifdef __HAVE_COLOR_ZERO_PAGE
>> +	/*
>> +	 * Same checksum as an empty page. We attempt to merge it with the
>> +	 * appropriate zero page.
>> +	 */
>> +	if (checksum == zero_checksum) {
>> +		struct vm_area_struct *vma;
>> +
>> +		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
>> +		err = try_to_merge_one_page(vma, page,
>> +					    ZERO_PAGE(rmap_item->address));
> 
> So the objective is not to add the zero pages to the stable tree but
> just convert them to readonly zerpages?

Yes. I thought that would be the easiest and cleanest way to do it.

> Maybe this could be a standard option for all archs to disable
> enable/disable with a new sysfs control similarly to the NUMA aware
> deduplication. The question is if it should be enabled by default in
> those archs where page coloring matters a lot. Probably yes.

I'm not sure it would make sense to have this for archs that don't have
page coloring. Merging empty pages together instead of with the
ZERO_PAGE() would save exactly one page and it would bring no speed
advantages (or rather: not using the ZERO_PAGE() would not bring any
speed penalty).
That's why I have #ifdef'd it to have it only when page coloring is
present. Also, for what I could see, only MIPS and s390 have page
coloring; I don't like the idea of imposing any overhead to all the
other archs.

I agree that this should be toggleable with a sysfs control, since it's
a change that can potentially negatively affect the performance in some
cases. I'm adding it in the next iteration.

> There are guest OS creating lots of zero pages, not linux though, for
> linux guests this is just overhead. Also those guests creating zero

Unless the userspace in the guests is creating lots of pages full of
zeroes :)

> pages wouldn't constantly read from them so again for KVM usage this
> is unlikely to help. For certain guest OS it'll create less KSM
> metadata with this approach, but it's debatable if it's worth one more

Honestly I don't think this patch will bring any benefits regarding
metadata -- one page more or less in the metadata won't change much. Our
issue is just the reading speed of the deduplicated empty pages.

> memcpy for every merge-candidate page to save some metadata, it's very

I'm confused, why memcpy? did you mean memcmp? We are not doing any
additional memops except in the case when a candidate non-empty page
happens to have the same checksum as an empty page, in which case we
have an extra memcmp compared to the normal operation.

> guest-workload dependent too. Of course your usage is not KVM but
> number crunching with uninitialized tables, it's different and the
> zero page read speed matters.
> 
> On the implementation side I think the above is going to call
> page_add_anon_rmap(kpage, vma, addr, false) and get_page by mistake,
> and it should use pte_mkspecial not mk_pte. I think you need to pass
> up a zeropage bool into replace_page and change replace_page to create
> a proper zeropage in place of the old page or it'll eventually
> overflow the page count crashing etc...

Maybe an even less intrusive change could be to check in replace_page if
is_zero_pfn(page_to_pfn(kpage)). And of course I would #ifdef that too,
to avoid the overhead for archs without page coloring.
So if the replacement page is a ZERO_PAGE() no get_page() and no
page_add_anon_rmap() would be performed, and the set_pte_at_notify()
would have pte_mkspecial(pfn_pte(page_to_pfn(kpage))) instead of mk_pte() .

thanks,
Claudio

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
@ 2017-01-18 15:15     ` Claudio Imbrenda
  0 siblings, 0 replies; 14+ messages in thread
From: Claudio Imbrenda @ 2017-01-18 15:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

Hi Andrea,

On 12/01/17 18:21, Andrea Arcangeli wrote:
> Hello Claudio,
> 
> On Thu, Jan 12, 2017 at 05:17:14PM +0100, Claudio Imbrenda wrote:
>> +#ifdef __HAVE_COLOR_ZERO_PAGE
>> +	/*
>> +	 * Same checksum as an empty page. We attempt to merge it with the
>> +	 * appropriate zero page.
>> +	 */
>> +	if (checksum == zero_checksum) {
>> +		struct vm_area_struct *vma;
>> +
>> +		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
>> +		err = try_to_merge_one_page(vma, page,
>> +					    ZERO_PAGE(rmap_item->address));
> 
> So the objective is not to add the zero pages to the stable tree but
> just convert them to readonly zerpages?

Yes. I thought that would be the easiest and cleanest way to do it.

> Maybe this could be a standard option for all archs to disable
> enable/disable with a new sysfs control similarly to the NUMA aware
> deduplication. The question is if it should be enabled by default in
> those archs where page coloring matters a lot. Probably yes.

I'm not sure it would make sense to have this for archs that don't have
page coloring. Merging empty pages together instead of with the
ZERO_PAGE() would save exactly one page and it would bring no speed
advantages (or rather: not using the ZERO_PAGE() would not bring any
speed penalty).
That's why I have #ifdef'd it to have it only when page coloring is
present. Also, for what I could see, only MIPS and s390 have page
coloring; I don't like the idea of imposing any overhead to all the
other archs.

I agree that this should be toggleable with a sysfs control, since it's
a change that can potentially negatively affect the performance in some
cases. I'm adding it in the next iteration.

> There are guest OS creating lots of zero pages, not linux though, for
> linux guests this is just overhead. Also those guests creating zero

Unless the userspace in the guests is creating lots of pages full of
zeroes :)

> pages wouldn't constantly read from them so again for KVM usage this
> is unlikely to help. For certain guest OS it'll create less KSM
> metadata with this approach, but it's debatable if it's worth one more

Honestly I don't think this patch will bring any benefits regarding
metadata -- one page more or less in the metadata won't change much. Our
issue is just the reading speed of the deduplicated empty pages.

> memcpy for every merge-candidate page to save some metadata, it's very

I'm confused, why memcpy? did you mean memcmp? We are not doing any
additional memops except in the case when a candidate non-empty page
happens to have the same checksum as an empty page, in which case we
have an extra memcmp compared to the normal operation.

> guest-workload dependent too. Of course your usage is not KVM but
> number crunching with uninitialized tables, it's different and the
> zero page read speed matters.
> 
> On the implementation side I think the above is going to call
> page_add_anon_rmap(kpage, vma, addr, false) and get_page by mistake,
> and it should use pte_mkspecial not mk_pte. I think you need to pass
> up a zeropage bool into replace_page and change replace_page to create
> a proper zeropage in place of the old page or it'll eventually
> overflow the page count crashing etc...

Maybe an even less intrusive change could be to check in replace_page if
is_zero_pfn(page_to_pfn(kpage)). And of course I would #ifdef that too,
to avoid the overhead for archs without page coloring.
So if the replacement page is a ZERO_PAGE() no get_page() and no
page_add_anon_rmap() would be performed, and the set_pte_at_notify()
would have pte_mkspecial(pfn_pte(page_to_pfn(kpage))) instead of mk_pte() .

thanks,
Claudio

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
  2017-01-18 15:15     ` Claudio Imbrenda
@ 2017-01-18 16:29       ` Andrea Arcangeli
  -1 siblings, 0 replies; 14+ messages in thread
From: Andrea Arcangeli @ 2017-01-18 16:29 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

On Wed, Jan 18, 2017 at 04:15:56PM +0100, Claudio Imbrenda wrote:
> I'm not sure it would make sense to have this for archs that don't have
> page coloring. Merging empty pages together instead of with the

It's still good to be able to exercise this code on all archs (if
nothing else for debugging purposes). There's nothing arch dependent
in it after all.

> ZERO_PAGE() would save exactly one page and it would bring no speed
> advantages (or rather: not using the ZERO_PAGE() would not bring any
> speed penalty).
> That's why I have #ifdef'd it to have it only when page coloring is
> present. Also, for what I could see, only MIPS and s390 have page
> coloring; I don't like the idea of imposing any overhead to all the
> other archs.

With a sysctl disabled by default, the only overhead is 8 bytes and a
branch?

> I agree that this should be toggleable with a sysfs control, since it's
> a change that can potentially negatively affect the performance in some
> cases. I'm adding it in the next iteration.

Yes the sysctl can be useful for archs doing page coloring too, but I
would add to all.

> Unless the userspace in the guests is creating lots of pages full of
> zeroes :)

One question comes to mind though, why is the app doing memset(0), if
the app limited itself to just reading the memory it'd use page
colored zero pages that wouldn't risk to become PageKsm. That is true
both for bare metal and guest with KSM in host. This looks a not
optimal app.

> Honestly I don't think this patch will bring any benefits regarding
> metadata -- one page more or less in the metadata won't change much. Our
> issue is just the reading speed of the deduplicated empty pages.

The metadata amount changes, for each shared zero page we'd need to
allocate a rmap_item.

The KSMscale introducing stable_node_chain/dup is precisely meant to
deal with not creating a too large rmap_item chain. Because there can
be plenty of those rmap_items, we've to create multiple zero pages and
multiple stable nodes for those zero pages to limit the maximum number
of rmap_items linked in a stable_node. This then limits the maximum
cost of a rmap_walk on a PageKsm during page migration for compaction
or swapping etc..

The KSMscale fix is needed regardless to avoid KSM to hang a very
large server, because there is no guarantee the most equal page will
be a zero page, it could be 0xff or anything.

However if one knows there's a disproportionate amount of memory as
zero (i.e. certain guest OS do that, that to me would be the main
motivation for the patch), he could prefer to use your sysctl instead
of creating the rmap_item metadata on the zero pages.

> > memcpy for every merge-candidate page to save some metadata, it's very
> 
> I'm confused, why memcpy? did you mean memcmp? We are not doing any
> additional memops except in the case when a candidate non-empty page
> happens to have the same checksum as an empty page, in which case we
> have an extra memcmp compared to the normal operation.

I meant memcmp sorry. So if this is further filtered by the
precomputed zero page cksum, the only concern would be then how likely
the cksum would provide a false positive, in which case there will be
one more memcmp for every merge.

> Maybe an even less intrusive change could be to check in replace_page if
> is_zero_pfn(page_to_pfn(kpage)). And of course I would #ifdef that too,
> to avoid the overhead for archs without page coloring.

It's just one branch, the costly things are memcmp. As long as it's
not memcmp I wouldn't worry about one branch in replace_page and in
the caller. replace_page isn't even a fast path, the fast path is
generally the code that scans the memory. The actual real merging is
not as frequent. So I wouldn't use #ifdefs and I'd use the sysctl
instead, potentially also enabled on all archs (or only on those with
page coloring but possible to enable manually on all archs).

> So if the replacement page is a ZERO_PAGE() no get_page() and no
> page_add_anon_rmap() would be performed, and the set_pte_at_notify()
> would have pte_mkspecial(pfn_pte(page_to_pfn(kpage))) instead of mk_pte() .

That should fix your patch I think yes.

Singling out zero pages was discussed before, just without KSMscale
it's an incomplete fix and just a band aid.

Actually even for your case it's incomplete and only covers a subset
of apps, what if the app initializes all ram to 0xff instead of zero
and keep reading from it? (it would make more sense to initialize the
memory if it wasn't zero in fact) You'd get the same slowdown as you
get now with the same zero page I think.

The real fix also for this, would have to have a stable tree for each
page color possible, but that is not the same as a having a stable
tree for each NUMA node (which is already implemented). There are
likely too many colors (even if you're not fully associative) so you'd
penalize the "compression" ratio if you were to implement a generic
fix that doesn't single out the zero page.

I've never been particularly excited about optimizing bad apps that
initialize a lot of RAM as zero when they could depend on the mmap
behavior instead and get zero pages in the first place (or bad guest
OS). Overall the main reason why I'm quite positive about adding this
as an optimization is because after reading it (even if not complete)
it seems non intrusive enough and some corner case may benefit, but if
we do it, we can as well leave it available to all archs so it's
easier to test and reproduce any problem too.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
@ 2017-01-18 16:29       ` Andrea Arcangeli
  0 siblings, 0 replies; 14+ messages in thread
From: Andrea Arcangeli @ 2017-01-18 16:29 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

On Wed, Jan 18, 2017 at 04:15:56PM +0100, Claudio Imbrenda wrote:
> I'm not sure it would make sense to have this for archs that don't have
> page coloring. Merging empty pages together instead of with the

It's still good to be able to exercise this code on all archs (if
nothing else for debugging purposes). There's nothing arch dependent
in it after all.

> ZERO_PAGE() would save exactly one page and it would bring no speed
> advantages (or rather: not using the ZERO_PAGE() would not bring any
> speed penalty).
> That's why I have #ifdef'd it to have it only when page coloring is
> present. Also, for what I could see, only MIPS and s390 have page
> coloring; I don't like the idea of imposing any overhead to all the
> other archs.

With a sysctl disabled by default, the only overhead is 8 bytes and a
branch?

> I agree that this should be toggleable with a sysfs control, since it's
> a change that can potentially negatively affect the performance in some
> cases. I'm adding it in the next iteration.

Yes the sysctl can be useful for archs doing page coloring too, but I
would add to all.

> Unless the userspace in the guests is creating lots of pages full of
> zeroes :)

One question comes to mind though, why is the app doing memset(0), if
the app limited itself to just reading the memory it'd use page
colored zero pages that wouldn't risk to become PageKsm. That is true
both for bare metal and guest with KSM in host. This looks a not
optimal app.

> Honestly I don't think this patch will bring any benefits regarding
> metadata -- one page more or less in the metadata won't change much. Our
> issue is just the reading speed of the deduplicated empty pages.

The metadata amount changes, for each shared zero page we'd need to
allocate a rmap_item.

The KSMscale introducing stable_node_chain/dup is precisely meant to
deal with not creating a too large rmap_item chain. Because there can
be plenty of those rmap_items, we've to create multiple zero pages and
multiple stable nodes for those zero pages to limit the maximum number
of rmap_items linked in a stable_node. This then limits the maximum
cost of a rmap_walk on a PageKsm during page migration for compaction
or swapping etc..

The KSMscale fix is needed regardless to avoid KSM to hang a very
large server, because there is no guarantee the most equal page will
be a zero page, it could be 0xff or anything.

However if one knows there's a disproportionate amount of memory as
zero (i.e. certain guest OS do that, that to me would be the main
motivation for the patch), he could prefer to use your sysctl instead
of creating the rmap_item metadata on the zero pages.

> > memcpy for every merge-candidate page to save some metadata, it's very
> 
> I'm confused, why memcpy? did you mean memcmp? We are not doing any
> additional memops except in the case when a candidate non-empty page
> happens to have the same checksum as an empty page, in which case we
> have an extra memcmp compared to the normal operation.

I meant memcmp sorry. So if this is further filtered by the
precomputed zero page cksum, the only concern would be then how likely
the cksum would provide a false positive, in which case there will be
one more memcmp for every merge.

> Maybe an even less intrusive change could be to check in replace_page if
> is_zero_pfn(page_to_pfn(kpage)). And of course I would #ifdef that too,
> to avoid the overhead for archs without page coloring.

It's just one branch, the costly things are memcmp. As long as it's
not memcmp I wouldn't worry about one branch in replace_page and in
the caller. replace_page isn't even a fast path, the fast path is
generally the code that scans the memory. The actual real merging is
not as frequent. So I wouldn't use #ifdefs and I'd use the sysctl
instead, potentially also enabled on all archs (or only on those with
page coloring but possible to enable manually on all archs).

> So if the replacement page is a ZERO_PAGE() no get_page() and no
> page_add_anon_rmap() would be performed, and the set_pte_at_notify()
> would have pte_mkspecial(pfn_pte(page_to_pfn(kpage))) instead of mk_pte() .

That should fix your patch I think yes.

Singling out zero pages was discussed before, just without KSMscale
it's an incomplete fix and just a band aid.

Actually even for your case it's incomplete and only covers a subset
of apps, what if the app initializes all ram to 0xff instead of zero
and keep reading from it? (it would make more sense to initialize the
memory if it wasn't zero in fact) You'd get the same slowdown as you
get now with the same zero page I think.

The real fix also for this, would have to have a stable tree for each
page color possible, but that is not the same as a having a stable
tree for each NUMA node (which is already implemented). There are
likely too many colors (even if you're not fully associative) so you'd
penalize the "compression" ratio if you were to implement a generic
fix that doesn't single out the zero page.

I've never been particularly excited about optimizing bad apps that
initialize a lot of RAM as zero when they could depend on the mmap
behavior instead and get zero pages in the first place (or bad guest
OS). Overall the main reason why I'm quite positive about adding this
as an optimization is because after reading it (even if not complete)
it seems non intrusive enough and some corner case may benefit, but if
we do it, we can as well leave it available to all archs so it's
easier to test and reproduce any problem too.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
  2017-01-18 16:29       ` Andrea Arcangeli
@ 2017-01-18 17:17         ` Claudio Imbrenda
  -1 siblings, 0 replies; 14+ messages in thread
From: Claudio Imbrenda @ 2017-01-18 17:17 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

On 18/01/17 17:29, Andrea Arcangeli wrote:
> It's still good to be able to exercise this code on all archs (if
> nothing else for debugging purposes). There's nothing arch dependent
> in it after all.

If it's fine for you, I'm definitely not going to complain :)

>> ZERO_PAGE() would save exactly one page and it would bring no speed
>> advantages (or rather: not using the ZERO_PAGE() would not bring any
>> speed penalty).
>> That's why I have #ifdef'd it to have it only when page coloring is
>> present. Also, for what I could see, only MIPS and s390 have page
>> coloring; I don't like the idea of imposing any overhead to all the
>> other archs.
> 
> With a sysctl disabled by default, the only overhead is 8 bytes and a
> branch?

Yes, I'm sometimes excessively paranoid.

>> I agree that this should be toggleable with a sysfs control, since it's
>> a change that can potentially negatively affect the performance in some
>> cases. I'm adding it in the next iteration.
> 
> Yes the sysctl can be useful for archs doing page coloring too, but I
> would add to all.
> 
>> Unless the userspace in the guests is creating lots of pages full of
>> zeroes :)
> 
> One question comes to mind though, why is the app doing memset(0), if
> the app limited itself to just reading the memory it'd use page
> colored zero pages that wouldn't risk to become PageKsm. That is true
> both for bare metal and guest with KSM in host. This looks a not
> optimal app.

I didn't really make a good example, although I can think of scenarios
where that could legitimately happen. Another case would be a KVM guest.
It will have a bunch of colored zero pages somewhere, but if KSM merges
those together, the advantages of colored zero pages disappear in the guest.

>> Honestly I don't think this patch will bring any benefits regarding
>> metadata -- one page more or less in the metadata won't change much. Our
>> issue is just the reading speed of the deduplicated empty pages.
> 
> The metadata amount changes, for each shared zero page we'd need to
> allocate a rmap_item.
> 
> The KSMscale introducing stable_node_chain/dup is precisely meant to
> deal with not creating a too large rmap_item chain. Because there can
> be plenty of those rmap_items, we've to create multiple zero pages and
> multiple stable nodes for those zero pages to limit the maximum number
> of rmap_items linked in a stable_node. This then limits the maximum
> cost of a rmap_walk on a PageKsm during page migration for compaction
> or swapping etc..
> 
> The KSMscale fix is needed regardless to avoid KSM to hang a very
> large server, because there is no guarantee the most equal page will
> be a zero page, it could be 0xff or anything.
> 
> However if one knows there's a disproportionate amount of memory as
> zero (i.e. certain guest OS do that, that to me would be the main
> motivation for the patch), he could prefer to use your sysctl instead
> of creating the rmap_item metadata on the zero pages.

Ok, sorry, I had completely misunderstood what you had meant the first
time. Now I got it.

>>> memcpy for every merge-candidate page to save some metadata, it's very
>>
>> I'm confused, why memcpy? did you mean memcmp? We are not doing any
>> additional memops except in the case when a candidate non-empty page
>> happens to have the same checksum as an empty page, in which case we
>> have an extra memcmp compared to the normal operation.
> 
> I meant memcmp sorry. So if this is further filtered by the
> precomputed zero page cksum, the only concern would be then how likely
> the cksum would provide a false positive, in which case there will be
> one more memcmp for every merge.
> 
>> Maybe an even less intrusive change could be to check in replace_page if
>> is_zero_pfn(page_to_pfn(kpage)). And of course I would #ifdef that too,
>> to avoid the overhead for archs without page coloring.
> 
> It's just one branch, the costly things are memcmp. As long as it's
> not memcmp I wouldn't worry about one branch in replace_page and in
> the caller. replace_page isn't even a fast path, the fast path is
> generally the code that scans the memory. The actual real merging is
> not as frequent. So I wouldn't use #ifdefs and I'd use the sysctl
> instead, potentially also enabled on all archs (or only on those with
> page coloring but possible to enable manually on all archs).

Ok!

>> So if the replacement page is a ZERO_PAGE() no get_page() and no
>> page_add_anon_rmap() would be performed, and the set_pte_at_notify()
>> would have pte_mkspecial(pfn_pte(page_to_pfn(kpage))) instead of mk_pte() .
> 
> That should fix your patch I think yes.
> 
> Singling out zero pages was discussed before, just without KSMscale
> it's an incomplete fix and just a band aid.
> 
> Actually even for your case it's incomplete and only covers a subset
> of apps, what if the app initializes all ram to 0xff instead of zero
> and keep reading from it? (it would make more sense to initialize the
> memory if it wasn't zero in fact) You'd get the same slowdown as you
> get now with the same zero page I think.

That's true. As I said above, my previous example was not very well
thought. The more realistic scenario is that of having the colored zero
pages of a guest merged.

> The real fix also for this, would have to have a stable tree for each
> page color possible, but that is not the same as a having a stable
> tree for each NUMA node (which is already implemented). There are
> likely too many colors (even if you're not fully associative) so you'd
> penalize the "compression" ratio if you were to implement a generic
> fix that doesn't single out the zero page.

Also in general it's not probable that the same non-zero data will be
read very often at different guest-physical addresses. A stable tree for
each color would pretty much defeat the purpose of KSM.

> I've never been particularly excited about optimizing bad apps that
> initialize a lot of RAM as zero when they could depend on the mmap
> behavior instead and get zero pages in the first place (or bad guest
> OS). Overall the main reason why I'm quite positive about adding this
> as an optimization is because after reading it (even if not complete)
> it seems non intrusive enough and some corner case may benefit, but if
> we do it, we can as well leave it available to all archs so it's
> easier to test and reproduce any problem too.

Ok, I'll fix and respin my patch then.

Thanks,

Claudio

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
@ 2017-01-18 17:17         ` Claudio Imbrenda
  0 siblings, 0 replies; 14+ messages in thread
From: Claudio Imbrenda @ 2017-01-18 17:17 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

On 18/01/17 17:29, Andrea Arcangeli wrote:
> It's still good to be able to exercise this code on all archs (if
> nothing else for debugging purposes). There's nothing arch dependent
> in it after all.

If it's fine for you, I'm definitely not going to complain :)

>> ZERO_PAGE() would save exactly one page and it would bring no speed
>> advantages (or rather: not using the ZERO_PAGE() would not bring any
>> speed penalty).
>> That's why I have #ifdef'd it to have it only when page coloring is
>> present. Also, for what I could see, only MIPS and s390 have page
>> coloring; I don't like the idea of imposing any overhead to all the
>> other archs.
> 
> With a sysctl disabled by default, the only overhead is 8 bytes and a
> branch?

Yes, I'm sometimes excessively paranoid.

>> I agree that this should be toggleable with a sysfs control, since it's
>> a change that can potentially negatively affect the performance in some
>> cases. I'm adding it in the next iteration.
> 
> Yes the sysctl can be useful for archs doing page coloring too, but I
> would add to all.
> 
>> Unless the userspace in the guests is creating lots of pages full of
>> zeroes :)
> 
> One question comes to mind though, why is the app doing memset(0), if
> the app limited itself to just reading the memory it'd use page
> colored zero pages that wouldn't risk to become PageKsm. That is true
> both for bare metal and guest with KSM in host. This looks a not
> optimal app.

I didn't really make a good example, although I can think of scenarios
where that could legitimately happen. Another case would be a KVM guest.
It will have a bunch of colored zero pages somewhere, but if KSM merges
those together, the advantages of colored zero pages disappear in the guest.

>> Honestly I don't think this patch will bring any benefits regarding
>> metadata -- one page more or less in the metadata won't change much. Our
>> issue is just the reading speed of the deduplicated empty pages.
> 
> The metadata amount changes, for each shared zero page we'd need to
> allocate a rmap_item.
> 
> The KSMscale introducing stable_node_chain/dup is precisely meant to
> deal with not creating a too large rmap_item chain. Because there can
> be plenty of those rmap_items, we've to create multiple zero pages and
> multiple stable nodes for those zero pages to limit the maximum number
> of rmap_items linked in a stable_node. This then limits the maximum
> cost of a rmap_walk on a PageKsm during page migration for compaction
> or swapping etc..
> 
> The KSMscale fix is needed regardless to avoid KSM to hang a very
> large server, because there is no guarantee the most equal page will
> be a zero page, it could be 0xff or anything.
> 
> However if one knows there's a disproportionate amount of memory as
> zero (i.e. certain guest OS do that, that to me would be the main
> motivation for the patch), he could prefer to use your sysctl instead
> of creating the rmap_item metadata on the zero pages.

Ok, sorry, I had completely misunderstood what you had meant the first
time. Now I got it.

>>> memcpy for every merge-candidate page to save some metadata, it's very
>>
>> I'm confused, why memcpy? did you mean memcmp? We are not doing any
>> additional memops except in the case when a candidate non-empty page
>> happens to have the same checksum as an empty page, in which case we
>> have an extra memcmp compared to the normal operation.
> 
> I meant memcmp sorry. So if this is further filtered by the
> precomputed zero page cksum, the only concern would be then how likely
> the cksum would provide a false positive, in which case there will be
> one more memcmp for every merge.
> 
>> Maybe an even less intrusive change could be to check in replace_page if
>> is_zero_pfn(page_to_pfn(kpage)). And of course I would #ifdef that too,
>> to avoid the overhead for archs without page coloring.
> 
> It's just one branch, the costly things are memcmp. As long as it's
> not memcmp I wouldn't worry about one branch in replace_page and in
> the caller. replace_page isn't even a fast path, the fast path is
> generally the code that scans the memory. The actual real merging is
> not as frequent. So I wouldn't use #ifdefs and I'd use the sysctl
> instead, potentially also enabled on all archs (or only on those with
> page coloring but possible to enable manually on all archs).

Ok!

>> So if the replacement page is a ZERO_PAGE() no get_page() and no
>> page_add_anon_rmap() would be performed, and the set_pte_at_notify()
>> would have pte_mkspecial(pfn_pte(page_to_pfn(kpage))) instead of mk_pte() .
> 
> That should fix your patch I think yes.
> 
> Singling out zero pages was discussed before, just without KSMscale
> it's an incomplete fix and just a band aid.
> 
> Actually even for your case it's incomplete and only covers a subset
> of apps, what if the app initializes all ram to 0xff instead of zero
> and keep reading from it? (it would make more sense to initialize the
> memory if it wasn't zero in fact) You'd get the same slowdown as you
> get now with the same zero page I think.

That's true. As I said above, my previous example was not very well
thought. The more realistic scenario is that of having the colored zero
pages of a guest merged.

> The real fix also for this, would have to have a stable tree for each
> page color possible, but that is not the same as a having a stable
> tree for each NUMA node (which is already implemented). There are
> likely too many colors (even if you're not fully associative) so you'd
> penalize the "compression" ratio if you were to implement a generic
> fix that doesn't single out the zero page.

Also in general it's not probable that the same non-zero data will be
read very often at different guest-physical addresses. A stable tree for
each color would pretty much defeat the purpose of KSM.

> I've never been particularly excited about optimizing bad apps that
> initialize a lot of RAM as zero when they could depend on the mmap
> behavior instead and get zero pages in the first place (or bad guest
> OS). Overall the main reason why I'm quite positive about adding this
> as an optimization is because after reading it (even if not complete)
> it seems non intrusive enough and some corner case may benefit, but if
> we do it, we can as well leave it available to all archs so it's
> easier to test and reproduce any problem too.

Ok, I'll fix and respin my patch then.

Thanks,

Claudio

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
  2017-01-18 17:17         ` Claudio Imbrenda
@ 2017-01-18 18:11           ` Andrea Arcangeli
  -1 siblings, 0 replies; 14+ messages in thread
From: Andrea Arcangeli @ 2017-01-18 18:11 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

On Wed, Jan 18, 2017 at 06:17:09PM +0100, Claudio Imbrenda wrote:
> That's true. As I said above, my previous example was not very well
> thought. The more realistic scenario is that of having the colored zero
> pages of a guest merged.

That's a good point for making a special case that retains the
coloring of those guest pages, agreed.

Retaining the coloring of guest zero pages is however a different
"feature" than what KSM was supposed to run for though, I mean the
guest may run faster with KSM than without because without KSM you
wouldn't know which host physical page is allocated for each guest
zero page. If you wanted top performance then you wouldn't know if to
enable KSM or not.

I wonder if the zero page coloring would be better solved with a
vhost-zeropage dedicated mechanism that would be always enabled
regardless if KSM is enabled or not. KSM is generally a CPU vs memory
tradeoff, and it's in general good idea to enable it.

It's also ok if KSM improves performance of course, definitely not
forbidden in fact it's ideal, but my point is, the rest of KSM might
decrease performance too, so if you need a top-performance setup for
benchmarks or for some special usage, it'd be hard to decide if to
enable KSM or not on those archs requiring page coloring.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring
@ 2017-01-18 18:11           ` Andrea Arcangeli
  0 siblings, 0 replies; 14+ messages in thread
From: Andrea Arcangeli @ 2017-01-18 18:11 UTC (permalink / raw)
  To: Claudio Imbrenda
  Cc: linux-mm, borntraeger, hughd, izik.eidus, chrisw, akpm, linux-kernel

On Wed, Jan 18, 2017 at 06:17:09PM +0100, Claudio Imbrenda wrote:
> That's true. As I said above, my previous example was not very well
> thought. The more realistic scenario is that of having the colored zero
> pages of a guest merged.

That's a good point for making a special case that retains the
coloring of those guest pages, agreed.

Retaining the coloring of guest zero pages is however a different
"feature" than what KSM was supposed to run for though, I mean the
guest may run faster with KSM than without because without KSM you
wouldn't know which host physical page is allocated for each guest
zero page. If you wanted top performance then you wouldn't know if to
enable KSM or not.

I wonder if the zero page coloring would be better solved with a
vhost-zeropage dedicated mechanism that would be always enabled
regardless if KSM is enabled or not. KSM is generally a CPU vs memory
tradeoff, and it's in general good idea to enable it.

It's also ok if KSM improves performance of course, definitely not
forbidden in fact it's ideal, but my point is, the rest of KSM might
decrease performance too, so if you need a top-performance setup for
benchmarks or for some special usage, it'd be hard to decide if to
enable KSM or not on those archs requiring page coloring.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-01-18 18:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-12 16:17 [PATCH v1 1/1] mm/ksm: improve deduplication of zero pages with colouring Claudio Imbrenda
2017-01-12 16:17 ` Claudio Imbrenda
2017-01-12 16:21 ` Christian Borntraeger
2017-01-12 16:21   ` Christian Borntraeger
2017-01-12 17:21 ` Andrea Arcangeli
2017-01-12 17:21   ` Andrea Arcangeli
2017-01-18 15:15   ` Claudio Imbrenda
2017-01-18 15:15     ` Claudio Imbrenda
2017-01-18 16:29     ` Andrea Arcangeli
2017-01-18 16:29       ` Andrea Arcangeli
2017-01-18 17:17       ` Claudio Imbrenda
2017-01-18 17:17         ` Claudio Imbrenda
2017-01-18 18:11         ` Andrea Arcangeli
2017-01-18 18:11           ` Andrea Arcangeli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.