* [PATCH] ksm: Fix unlocked iteration over vmas in cmp_and_merge_page()
From: Kirill Tkhai @ 2017-09-11 11:05 UTC
  To: akpm, aarcange, minchan, mhocko, zhongjiang, mingo, imbrenda,
	kirill.shutemov, linux-mm, linux-kernel, ktkhai

In this place the mm is unlocked, so the vmas or the vma list may change.
Down-read mmap_sem to protect them from modifications.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
(and compile-tested-by)
---
 mm/ksm.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index db20f8436bc3..86f0db3d6cdb 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1990,6 +1990,7 @@ static void stable_tree_append(struct rmap_item *rmap_item,
  */
 static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
 {
+	struct mm_struct *mm = rmap_item->mm;
 	struct rmap_item *tree_rmap_item;
 	struct page *tree_page = NULL;
 	struct stable_node *stable_node;
@@ -2062,9 +2063,11 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
 	if (ksm_use_zero_pages && (checksum == zero_checksum)) {
 		struct vm_area_struct *vma;
 
-		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
+		down_read(&mm->mmap_sem);
+		vma = find_mergeable_vma(mm, rmap_item->address);
 		err = try_to_merge_one_page(vma, page,
 					    ZERO_PAGE(rmap_item->address));
+		up_read(&mm->mmap_sem);
 		/*
 		 * In case of failure, the page was not really empty, so we
 		 * need to continue. Otherwise we're done.

* Re: [PATCH] ksm: Fix unlocked iteration over vmas in cmp_and_merge_page()
From: Michal Hocko @ 2017-09-13 11:25 UTC
  To: Kirill Tkhai
  Cc: akpm, aarcange, minchan, zhongjiang, mingo, imbrenda,
	kirill.shutemov, linux-mm, linux-kernel, Hugh Dickins

[CC Claudio and Hugh]

On Mon 11-09-17 14:05:05, Kirill Tkhai wrote:
> In this place the mm is unlocked, so the vmas or the vma list may change.
> Down-read mmap_sem to protect them from modifications.
> 
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> (and compile-tested-by)

Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")
AFAICS. Maybe even CC: stable, as an unstable vma can cause a large
variety of issues, including memory corruption.

The fix looks good to me.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/ksm.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index db20f8436bc3..86f0db3d6cdb 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1990,6 +1990,7 @@ static void stable_tree_append(struct rmap_item *rmap_item,
>   */
>  static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
>  {
> +	struct mm_struct *mm = rmap_item->mm;
>  	struct rmap_item *tree_rmap_item;
>  	struct page *tree_page = NULL;
>  	struct stable_node *stable_node;
> @@ -2062,9 +2063,11 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
>  	if (ksm_use_zero_pages && (checksum == zero_checksum)) {
>  		struct vm_area_struct *vma;
>  
> -		vma = find_mergeable_vma(rmap_item->mm, rmap_item->address);
> +		down_read(&mm->mmap_sem);
> +		vma = find_mergeable_vma(mm, rmap_item->address);
>  		err = try_to_merge_one_page(vma, page,
>  					    ZERO_PAGE(rmap_item->address));
> +		up_read(&mm->mmap_sem);
>  		/*
>  		 * In case of failure, the page was not really empty, so we
>  		 * need to continue. Otherwise we're done.
> 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] ksm: Fix unlocked iteration over vmas in cmp_and_merge_page()
From: Andrea Arcangeli @ 2017-09-13 13:46 UTC
  To: Michal Hocko
  Cc: Kirill Tkhai, akpm, minchan, zhongjiang, mingo, imbrenda,
	kirill.shutemov, linux-mm, linux-kernel, Hugh Dickins, Sioh Lee

On Wed, Sep 13, 2017 at 01:25:09PM +0200, Michal Hocko wrote:
> [CC Claudio and Hugh]

Cc'ed Sioh as well.

> 
> On Mon 11-09-17 14:05:05, Kirill Tkhai wrote:
> > In this place the mm is unlocked, so the vmas or the vma list may change.
> > Down-read mmap_sem to protect them from modifications.
> > 
> > Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
> > (and compile-tested-by)
> 
> Fixes: e86c59b1b12d ("mm/ksm: improve deduplication of zero pages with colouring")
> AFAICS. Maybe even CC: stable, as an unstable vma can cause a large
> variety of issues, including memory corruption.
> 
> The fix looks good to me.
> Acked-by: Michal Hocko <mhocko@suse.com>

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

ksm_use_zero_pages is off by default, which is probably why this went
unnoticed.

Wondering if we should consider enabling ksm_use_zero_pages by default
on those arches that have few physical cache colors.

If we change the jhash2 to crc32c-intel as Sioh suggested (to
speed up the background scan), the chance of a false positive and a
wasted try_to_merge_one_page here will increase to one every 100k or
so when comparing random data against the zeropage (instead of the
currently insignificant number of false positives provided by
jhash2's great random uniformity).
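
For context, the hash path in question looks roughly like this (a
simplified sketch of mm/ksm.c as of this thread, not the exact
source):

	/* jhash2 over the whole page; 17 is just a fixed seed */
	static u32 calc_checksum(struct page *page)
	{
		u32 checksum;
		void *addr = kmap_atomic(page);

		checksum = jhash2(addr, PAGE_SIZE / 4, 17);
		kunmap_atomic(addr);
		return checksum;
	}

	/* zero_checksum is computed once against the zeropage at init */
	zero_checksum = calc_checksum(ZERO_PAGE(0));

Only calc_checksum() would need to change for a crc32c experiment;
the comparison sites stay the same.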

So the ksm_use_zero_pages branch is currently missing a memcmp against
the ZERO_PAGE zeroes before calling write_protect_page. It's not a
functional bug, because one last memcmp is mandatory to run after
write protection, so this isn't destabilizing anything; but especially
if using crc32c (I suppose crc32cbe-vx is going to be much faster for
the background scan on s390 too) it would be a potential inefficiency
that wrprotects non-zero pages by mistake once every 100k pages or more.
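
A minimal sketch of such a guard on top of this patch (reusing KSM's
existing memcmp_pages() helper; the -EFAULT fallback here is
illustrative):

	down_read(&mm->mmap_sem);
	vma = find_mergeable_vma(mm, rmap_item->address);
	if (vma && !memcmp_pages(page, ZERO_PAGE(rmap_item->address))) {
		/* the page really is all zeroes: wrprotect and merge */
		err = try_to_merge_one_page(vma, page,
					    ZERO_PAGE(rmap_item->address));
	} else {
		/* checksum collision (or no mergeable vma): skip merge */
		err = -EFAULT;
	}
	up_read(&mm->mmap_sem);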

We never care about the checksum's actual value, only whether it
changed since the last pass; this is why I ultimately believe crc32c
would suffice for this purpose, as it's extremely unlikely not to
change when the data changes. But it's definitely not suitable for
finding equality across millions of pages on large systems, because
its random uniformity isn't good enough.
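
For reference, that change-detection use is already the pattern in
cmp_and_merge_page(), roughly:

	/*
	 * Compare the checksum only against its own previous value:
	 * if the page changed since the last scan pass, skip it for
	 * now and wait for it to stabilize.
	 */
	checksum = calc_checksum(page);
	if (rmap_item->oldchecksum != checksum) {
		rmap_item->oldchecksum = checksum;
		return;
	}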

In short, the "zero_checksum" variable should be dropped, and the
memcmp will then materialize naturally after removing it.

Thanks,
Andrea
