linux-kernel.vger.kernel.org archive mirror
* [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
@ 2018-02-07  7:00 Huang, Ying
  2018-02-07 16:27 ` Konrad Rzeszutek Wilk
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Huang, Ying @ 2018-02-07  7:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Huang, Ying,
	Konrad Rzeszutek Wilk, Dan Streetman, Seth Jennings, Minchan Kim,
	Tetsuo Handa, Shaohua Li, Michal Hocko, Johannes Weiner,
	Mel Gorman, Shakeel Butt, stable, Sergey Senozhatsky

From: Huang Ying <huang.ying.caritas@gmail.com>

Sergey Senozhatsky reported that if THP (Transparent Huge Page) and
frontswap (via zswap) are both enabled, then when memory runs low and
swap is triggered, segfaults and memory corruption occur in random
user space applications, as follows:

kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
 #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
 #1  0x00007fc08889c2f3 malloc (libc.so.6)
 #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
 #3  0x0000560e6005e75c n/a (urxvt)
 #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
 #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
 #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
 #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
 #8  0x0000560e6005cb55 ev_run (urxvt)
 #9  0x0000560e6003b9b9 main (urxvt)
 #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
 #11 0x0000560e6003f9da _start (urxvt)

After bisection, it was found the first bad commit is
bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
out").

The root cause is as follows.

When pages are written to the swap device during swap-out in
swap_writepage(), zswap (frontswap) is tried first, to compress the
pages instead and improve performance.  But zswap (frontswap) treats
a THP as a normal page, so only the head page is saved.  After
swap-in, the tail pages are not restored to their original contents,
which causes the memory corruption in the applications.

This is fixed by splitting the THP before writing the page to the
swap device if frontswap is enabled.  To deal with the situation where
frontswap is enabled at runtime, whether the page is a THP is also
checked before using frontswap during swap-out.
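
The failure mode described above can be modelled in user space.  What
follows is a minimal sketch, not kernel code: every sim_* name is
hypothetical, the 4096-byte page and 512-page THP sizes are assumptions,
and unrestored tail pages are modelled as zero-filled purely for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u /* assumed base page size */
#define SIM_THP_PAGES 512u  /* assumed base pages per THP (2 MiB) */

/* Hypothetical backend slot with room for exactly one base page. */
struct sim_slot {
	unsigned char data[SIM_PAGE_SIZE];
};

/* Swap-out as a THP-unaware backend does it: only the head page is copied. */
static void sim_store_head_only(struct sim_slot *slot, const unsigned char *thp)
{
	memcpy(slot->data, thp, SIM_PAGE_SIZE);
}

/* Swap-in: the head page comes back; tail pages are modelled as zero-filled. */
static void sim_load(const struct sim_slot *slot, unsigned char *thp)
{
	memset(thp, 0, (size_t)SIM_PAGE_SIZE * SIM_THP_PAGES);
	memcpy(thp, slot->data, SIM_PAGE_SIZE);
}

/* Run one swap-out/swap-in round trip and count the base pages that changed. */
static size_t sim_roundtrip_corrupt_pages(void)
{
	const size_t thp_bytes = (size_t)SIM_PAGE_SIZE * SIM_THP_PAGES;
	unsigned char *before = malloc(thp_bytes);
	unsigned char *after = malloc(thp_bytes);
	struct sim_slot slot;
	size_t i, corrupt = 0;

	assert(before && after);

	/* Give every base page a non-zero fill byte. */
	for (i = 0; i < SIM_THP_PAGES; i++)
		memset(before + i * SIM_PAGE_SIZE, (int)(i % 251) + 1, SIM_PAGE_SIZE);

	sim_store_head_only(&slot, before);
	sim_load(&slot, after);

	for (i = 0; i < SIM_THP_PAGES; i++)
		if (memcmp(before + i * SIM_PAGE_SIZE, after + i * SIM_PAGE_SIZE,
			   SIM_PAGE_SIZE) != 0)
			corrupt++;

	free(before);
	free(after);
	return corrupt;
}
```

In this model sim_roundtrip_corrupt_pages() returns SIM_THP_PAGES - 1:
the head page survives the round trip and every tail page is lost.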

Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Shaohua Li <shli@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: stable@vger.kernel.org # 4.14
Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")

Changelog:

v2:

- Move the frontswap check into swapfile.c to avoid making vmscan.c
  depend on frontswap.
---
 mm/page_io.c  | 2 +-
 mm/swapfile.c | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index b41cf9644585..6dca817ae7a0 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 		unlock_page(page);
 		goto out;
 	}
-	if (frontswap_store(page) == 0) {
+	if (!PageTransHuge(page) && frontswap_store(page) == 0) {
 		set_page_writeback(page);
 		unlock_page(page);
 		end_page_writeback(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 006047b16814..0b7c7883ce64 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
 
 	/* Only single cluster request supported */
 	WARN_ON_ONCE(n_goal > 1 && cluster);
+	/* Frontswap doesn't support THP */
+	if (frontswap_enabled() && cluster)
+		goto noswap;
 
 	avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
 	if (avail_pgs <= 0)
-- 
2.15.1


* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-07  7:00 [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled Huang, Ying
@ 2018-02-07 16:27 ` Konrad Rzeszutek Wilk
  2018-02-07 21:05 ` Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Konrad Rzeszutek Wilk @ 2018-02-07 16:27 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Huang Ying, Dan Streetman,
	Seth Jennings, Minchan Kim, Tetsuo Handa, Shaohua Li,
	Michal Hocko, Johannes Weiner, Mel Gorman, Shakeel Butt, stable,
	Sergey Senozhatsky

On Wed, Feb 07, 2018 at 03:00:35PM +0800, Huang, Ying wrote:
> From: Huang Ying <huang.ying.caritas@gmail.com>
> 
> Sergey Senozhatsky reported that if THP (Transparent Huge Page) and
> frontswap (via zswap) are both enabled, then when memory runs low and
> swap is triggered, segfaults and memory corruption occur in random
> user space applications, as follows:
> 
> kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
>  #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
>  #1  0x00007fc08889c2f3 malloc (libc.so.6)
>  #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
>  #3  0x0000560e6005e75c n/a (urxvt)
>  #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
>  #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
>  #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
>  #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
>  #8  0x0000560e6005cb55 ev_run (urxvt)
>  #9  0x0000560e6003b9b9 main (urxvt)
>  #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
>  #11 0x0000560e6003f9da _start (urxvt)
> 
> After bisection, it was found the first bad commit is
> bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
> out").
> 
> The root cause is as follows.
> 
> When pages are written to the swap device during swap-out in
> swap_writepage(), zswap (frontswap) is tried first, to compress the
> pages instead and improve performance.  But zswap (frontswap) treats
> a THP as a normal page, so only the head page is saved.  After
> swap-in, the tail pages are not restored to their original contents,
> which causes the memory corruption in the applications.
> 
> This is fixed by splitting the THP before writing the page to the
> swap device if frontswap is enabled.  To deal with the situation where
> frontswap is enabled at runtime, whether the page is a THP is also
> checked before using frontswap during swap-out.
> 
> Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> Cc: Dan Streetman <ddstreet@ieee.org>
> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: stable@vger.kernel.org # 4.14
> Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
> 
> Changelog:
> 
> v2:
> 
> - Move the frontswap check into swapfile.c to avoid making vmscan.c
>   depend on frontswap.
> ---
>  mm/page_io.c  | 2 +-
>  mm/swapfile.c | 3 +++
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_io.c b/mm/page_io.c
> index b41cf9644585..6dca817ae7a0 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>  		unlock_page(page);
>  		goto out;
>  	}
> -	if (frontswap_store(page) == 0) {
> +	if (!PageTransHuge(page) && frontswap_store(page) == 0) {
>  		set_page_writeback(page);
>  		unlock_page(page);
>  		end_page_writeback(page);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 006047b16814..0b7c7883ce64 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>  
>  	/* Only single cluster request supported */
>  	WARN_ON_ONCE(n_goal > 1 && cluster);
> +	/* Frontswap doesn't support THP */
> +	if (frontswap_enabled() && cluster)
> +		goto noswap;
>  
>  	avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
>  	if (avail_pgs <= 0)
> -- 
> 2.15.1
> 


* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-07  7:00 [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled Huang, Ying
  2018-02-07 16:27 ` Konrad Rzeszutek Wilk
@ 2018-02-07 21:05 ` Andrew Morton
  2018-02-08  1:28   ` Huang, Ying
  2018-02-08  1:36   ` Sergey Senozhatsky
  2018-02-08 10:17 ` Minchan Kim
  2018-02-08 15:27 ` huang ying
  3 siblings, 2 replies; 13+ messages in thread
From: Andrew Morton @ 2018-02-07 21:05 UTC (permalink / raw)
  To: Huang, Ying
  Cc: linux-mm, linux-kernel, Huang Ying, Konrad Rzeszutek Wilk,
	Dan Streetman, Seth Jennings, Minchan Kim, Tetsuo Handa,
	Shaohua Li, Michal Hocko, Johannes Weiner, Mel Gorman,
	Shakeel Butt, stable, Sergey Senozhatsky

On Wed,  7 Feb 2018 15:00:35 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:

> From: Huang Ying <huang.ying.caritas@gmail.com>
> 
> Sergey Senozhatsky reported that if THP (Transparent Huge Page) and
> frontswap (via zswap) are both enabled, then when memory runs low and
> swap is triggered, segfaults and memory corruption occur in random
> user space applications, as follows:
> 
> kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
>  #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
>  #1  0x00007fc08889c2f3 malloc (libc.so.6)
>  #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
>  #3  0x0000560e6005e75c n/a (urxvt)
>  #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
>  #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
>  #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
>  #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
>  #8  0x0000560e6005cb55 ev_run (urxvt)
>  #9  0x0000560e6003b9b9 main (urxvt)
>  #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
>  #11 0x0000560e6003f9da _start (urxvt)
> 
> After bisection, it was found the first bad commit is
> bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
> out").
> 
> The root cause is as follows.
> 
> When pages are written to the swap device during swap-out in
> swap_writepage(), zswap (frontswap) is tried first, to compress the
> pages instead and improve performance.  But zswap (frontswap) treats
> a THP as a normal page, so only the head page is saved.  After
> swap-in, the tail pages are not restored to their original contents,
> which causes the memory corruption in the applications.
> 
> This is fixed by splitting the THP before writing the page to the
> swap device if frontswap is enabled.  To deal with the situation where
> frontswap is enabled at runtime, whether the page is a THP is also
> checked before using frontswap during swap-out.
>
> ...
>
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>  		unlock_page(page);
>  		goto out;
>  	}
> -	if (frontswap_store(page) == 0) {
> +	if (!PageTransHuge(page) && frontswap_store(page) == 0) {
>  		set_page_writeback(page);
>  		unlock_page(page);
>  		end_page_writeback(page);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 006047b16814..0b7c7883ce64 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>  
>  	/* Only single cluster request supported */
>  	WARN_ON_ONCE(n_goal > 1 && cluster);
> +	/* Frontswap doesn't support THP */
> +	if (frontswap_enabled() && cluster)
> +		goto noswap;
>  

hm.  This is assuming that "cluster==true" means "this is thp swap". 
That's presently true, but is it appropriate that get_swap_pages() is
peeking at "cluster" to work out why it is being called?

Or would it be cleaner to do this in get_swap_page()?  Something like

--- a/mm/swap_slots.c~a
+++ a/mm/swap_slots.c
@@ -317,8 +317,11 @@ swp_entry_t get_swap_page(struct page *p
 	entry.val = 0;
 
 	if (PageTransHuge(page)) {
-		if (IS_ENABLED(CONFIG_THP_SWAP))
-			get_swap_pages(1, true, &entry);
+		/* Frontswap doesn't support THP */
+		if (!frontswap_enabled()) {
+			if (IS_ENABLED(CONFIG_THP_SWAP))
+				get_swap_pages(1, true, &entry);
+		}
 		return entry;
 	}
 
_


* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-07 21:05 ` Andrew Morton
@ 2018-02-08  1:28   ` Huang, Ying
  2018-02-08  1:36   ` Sergey Senozhatsky
  1 sibling, 0 replies; 13+ messages in thread
From: Huang, Ying @ 2018-02-08  1:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Konrad Rzeszutek Wilk,
	Dan Streetman, Seth Jennings, Minchan Kim, Tetsuo Handa,
	Shaohua Li, Michal Hocko, Johannes Weiner, Mel Gorman,
	Shakeel Butt, stable, Sergey Senozhatsky

Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed,  7 Feb 2018 15:00:35 +0800 "Huang, Ying" <ying.huang@intel.com> wrote:
>
>> From: Huang Ying <huang.ying.caritas@gmail.com>
>> 
>> Sergey Senozhatsky reported that if THP (Transparent Huge Page) and
>> frontswap (via zswap) are both enabled, then when memory runs low and
>> swap is triggered, segfaults and memory corruption occur in random
>> user space applications, as follows:
>> 
>> kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
>>  #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
>>  #1  0x00007fc08889c2f3 malloc (libc.so.6)
>>  #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
>>  #3  0x0000560e6005e75c n/a (urxvt)
>>  #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
>>  #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
>>  #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
>>  #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
>>  #8  0x0000560e6005cb55 ev_run (urxvt)
>>  #9  0x0000560e6003b9b9 main (urxvt)
>>  #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
>>  #11 0x0000560e6003f9da _start (urxvt)
>> 
>> After bisection, it was found the first bad commit is
>> bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
>> out").
>> 
>> The root cause is as follows.
>> 
>> When pages are written to the swap device during swap-out in
>> swap_writepage(), zswap (frontswap) is tried first, to compress the
>> pages instead and improve performance.  But zswap (frontswap) treats
>> a THP as a normal page, so only the head page is saved.  After
>> swap-in, the tail pages are not restored to their original contents,
>> which causes the memory corruption in the applications.
>> 
>> This is fixed by splitting the THP before writing the page to the
>> swap device if frontswap is enabled.  To deal with the situation where
>> frontswap is enabled at runtime, whether the page is a THP is also
>> checked before using frontswap during swap-out.
>>
>> ...
>>
>> --- a/mm/page_io.c
>> +++ b/mm/page_io.c
>> @@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>>  		unlock_page(page);
>>  		goto out;
>>  	}
>> -	if (frontswap_store(page) == 0) {
>> +	if (!PageTransHuge(page) && frontswap_store(page) == 0) {
>>  		set_page_writeback(page);
>>  		unlock_page(page);
>>  		end_page_writeback(page);
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 006047b16814..0b7c7883ce64 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>>  
>>  	/* Only single cluster request supported */
>>  	WARN_ON_ONCE(n_goal > 1 && cluster);
>> +	/* Frontswap doesn't support THP */
>> +	if (frontswap_enabled() && cluster)
>> +		goto noswap;
>>  
>
> hm.  This is assuming that "cluster==true" means "this is thp swap". 
> That's presently true, but is it appropriate that get_swap_pages() is
> peeking at "cluster" to work out why it is being called?
>
> Or would it be cleaner to do this in get_swap_page()?  Something like
>
> --- a/mm/swap_slots.c~a
> +++ a/mm/swap_slots.c
> @@ -317,8 +317,11 @@ swp_entry_t get_swap_page(struct page *p
>  	entry.val = 0;
>  
>  	if (PageTransHuge(page)) {
> -		if (IS_ENABLED(CONFIG_THP_SWAP))
> -			get_swap_pages(1, true, &entry);
> +		/* Frontswap doesn't support THP */
> +		if (!frontswap_enabled()) {
> +			if (IS_ENABLED(CONFIG_THP_SWAP))
> +				get_swap_pages(1, true, &entry);
> +		}
>  		return entry;
>  	}
>  

Sure.  I will do this.

Best Regards,
Huang, Ying


* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-07 21:05 ` Andrew Morton
  2018-02-08  1:28   ` Huang, Ying
@ 2018-02-08  1:36   ` Sergey Senozhatsky
  2018-02-08 10:25     ` Minchan Kim
  1 sibling, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2018-02-08  1:36 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Huang, Ying, linux-mm, linux-kernel, Huang Ying,
	Konrad Rzeszutek Wilk, Dan Streetman, Seth Jennings,
	Tetsuo Handa, Shaohua Li, Michal Hocko, Johannes Weiner,
	Mel Gorman, Shakeel Butt, stable, Sergey Senozhatsky

On (02/07/18 13:05), Andrew Morton wrote:
[..]
> hm.  This is assuming that "cluster==true" means "this is thp swap". 
> That's presently true, but is it appropriate that get_swap_pages() is
> peeking at "cluster" to work out why it is being called?
> 
> Or would it be cleaner to do this in get_swap_page()?  Something like
> 
> --- a/mm/swap_slots.c~a
> +++ a/mm/swap_slots.c
> @@ -317,8 +317,11 @@ swp_entry_t get_swap_page(struct page *p
>  	entry.val = 0;
>  
>  	if (PageTransHuge(page)) {
> -		if (IS_ENABLED(CONFIG_THP_SWAP))
> -			get_swap_pages(1, true, &entry);
> +		/* Frontswap doesn't support THP */
> +		if (!frontswap_enabled()) {
> +			if (IS_ENABLED(CONFIG_THP_SWAP))
> +				get_swap_pages(1, true, &entry);
> +		}
>  		return entry;
>  	}

I proposed exactly the same thing [1]; Minchan commented that it
would introduce a frontswap dependency into swap_slots.c [2]. That
is true, but I'd still probably prefer to handle it all in
get_swap_page(). Minchan, any objections?

[1] https://marc.info/?l=linux-mm&m=151791052007719&w=2
[2] https://marc.info/?l=linux-mm&m=151792646812617&w=2

	-ss


* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-07  7:00 [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled Huang, Ying
  2018-02-07 16:27 ` Konrad Rzeszutek Wilk
  2018-02-07 21:05 ` Andrew Morton
@ 2018-02-08 10:17 ` Minchan Kim
  2018-02-08 15:17   ` huang ying
  2018-02-08 15:27 ` huang ying
  3 siblings, 1 reply; 13+ messages in thread
From: Minchan Kim @ 2018-02-08 10:17 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Huang Ying,
	Konrad Rzeszutek Wilk, Dan Streetman, Seth Jennings,
	Tetsuo Handa, Shaohua Li, Michal Hocko, Johannes Weiner,
	Mel Gorman, Shakeel Butt, stable, Sergey Senozhatsky

On Wed, Feb 07, 2018 at 03:00:35PM +0800, Huang, Ying wrote:
> From: Huang Ying <huang.ying.caritas@gmail.com>
> 
> Sergey Senozhatsky reported that if THP (Transparent Huge Page) and
> frontswap (via zswap) are both enabled, then when memory runs low and
> swap is triggered, segfaults and memory corruption occur in random
> user space applications, as follows:
> 
> kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
>  #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
>  #1  0x00007fc08889c2f3 malloc (libc.so.6)
>  #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
>  #3  0x0000560e6005e75c n/a (urxvt)
>  #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
>  #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
>  #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
>  #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
>  #8  0x0000560e6005cb55 ev_run (urxvt)
>  #9  0x0000560e6003b9b9 main (urxvt)
>  #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
>  #11 0x0000560e6003f9da _start (urxvt)
> 
> After bisection, it was found the first bad commit is
> bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
> out").
> 
> The root cause is as follows.
> 
> When pages are written to the swap device during swap-out in
> swap_writepage(), zswap (frontswap) is tried first, to compress the
> pages instead and improve performance.  But zswap (frontswap) treats
> a THP as a normal page, so only the head page is saved.  After
> swap-in, the tail pages are not restored to their original contents,
> which causes the memory corruption in the applications.
> 
> This is fixed by splitting the THP before writing the page to the
> swap device if frontswap is enabled.  To deal with the situation where
> frontswap is enabled at runtime, whether the page is a THP is also
> checked before using frontswap during swap-out.
> 
> Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Dan Streetman <ddstreet@ieee.org>
> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: stable@vger.kernel.org # 4.14
> Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
> 
> Changelog:
> 
> v2:
> 
> - Move the frontswap check into swapfile.c to avoid making vmscan.c
>   depend on frontswap.
> ---
>  mm/page_io.c  | 2 +-
>  mm/swapfile.c | 3 +++
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_io.c b/mm/page_io.c
> index b41cf9644585..6dca817ae7a0 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>  		unlock_page(page);
>  		goto out;
>  	}
> -	if (frontswap_store(page) == 0) {
> +	if (!PageTransHuge(page) && frontswap_store(page) == 0) {

Why do we need this?

If frontswap is enabled but doesn't support THP, the logic below
doesn't allow cluster allocation, so no THP page should reach this path.
What am I missing?

>  		set_page_writeback(page);
>  		unlock_page(page);
>  		end_page_writeback(page);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 006047b16814..0b7c7883ce64 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>  
>  	/* Only single cluster request supported */
>  	WARN_ON_ONCE(n_goal > 1 && cluster);
> +	/* Frontswap doesn't support THP */
> +	if (frontswap_enabled() && cluster)
> +		goto noswap;
>  
>  	avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
>  	if (avail_pgs <= 0)
> -- 
> 2.15.1
> 


* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-08  1:36   ` Sergey Senozhatsky
@ 2018-02-08 10:25     ` Minchan Kim
  2018-02-08 11:22       ` Sergey Senozhatsky
  0 siblings, 1 reply; 13+ messages in thread
From: Minchan Kim @ 2018-02-08 10:25 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Huang, Ying, linux-mm, linux-kernel, Huang Ying,
	Konrad Rzeszutek Wilk, Dan Streetman, Seth Jennings,
	Tetsuo Handa, Shaohua Li, Michal Hocko, Johannes Weiner,
	Mel Gorman, Shakeel Butt, stable, Sergey Senozhatsky

On Thu, Feb 08, 2018 at 10:36:35AM +0900, Sergey Senozhatsky wrote:
> On (02/07/18 13:05), Andrew Morton wrote:
> [..]
> > hm.  This is assuming that "cluster==true" means "this is thp swap". 
> > That's presently true, but is it appropriate that get_swap_pages() is
> > peeking at "cluster" to work out why it is being called?
> > 
> > Or would it be cleaner to do this in get_swap_page()?  Something like
> > 
> > --- a/mm/swap_slots.c~a
> > +++ a/mm/swap_slots.c
> > @@ -317,8 +317,11 @@ swp_entry_t get_swap_page(struct page *p
> >  	entry.val = 0;
> >  
> >  	if (PageTransHuge(page)) {
> > -		if (IS_ENABLED(CONFIG_THP_SWAP))
> > -			get_swap_pages(1, true, &entry);
> > +		/* Frontswap doesn't support THP */
> > +		if (!frontswap_enabled()) {
> > +			if (IS_ENABLED(CONFIG_THP_SWAP))
> > +				get_swap_pages(1, true, &entry);
> > +		}
> >  		return entry;
> >  	}
> 
> I proposed exactly the same thing [1]; Minchan commented that it
> would introduce a frontswap dependency into swap_slots.c [2]. That
> is true, but I'd still probably prefer to handle it all in
> get_swap_page(). Minchan, any objections?

I didn't want to spread frontswap code around unless there is a good
reason, because most frontswap functions are located in mm/swapfile.c
at the moment. That gives me the good feeling that frontswap's
abstraction is wonderful.
However, if the frontswap maintainer has no problem with it, I am not
against it either.

Thanks.


* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-08 10:25     ` Minchan Kim
@ 2018-02-08 11:22       ` Sergey Senozhatsky
  0 siblings, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2018-02-08 11:22 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Huang, Ying, linux-mm,
	linux-kernel, Huang Ying, Konrad Rzeszutek Wilk, Dan Streetman,
	Seth Jennings, Tetsuo Handa, Shaohua Li, Michal Hocko,
	Johannes Weiner, Mel Gorman, Shakeel Butt, stable,
	Sergey Senozhatsky

On (02/08/18 02:25), Minchan Kim wrote:
[..]
> > >  	if (PageTransHuge(page)) {
> > > -		if (IS_ENABLED(CONFIG_THP_SWAP))
> > > -			get_swap_pages(1, true, &entry);
> > > +		/* Frontswap doesn't support THP */
> > > +		if (!frontswap_enabled()) {
> > > +			if (IS_ENABLED(CONFIG_THP_SWAP))
> > > +				get_swap_pages(1, true, &entry);
> > > +		}
> > >  		return entry;
> > >  	}
> > 
> > I proposed exactly the same thing [1]; Minchan commented that it
> > would introduce a frontswap dependency into swap_slots.c [2]. That
> > is true, but I'd still probably prefer to handle it all in
> > get_swap_page(). Minchan, any objections?
> 
> I didn't want to spread frontswap code around unless there is a good
> reason, because most frontswap functions are located in mm/swapfile.c
> at the moment.

Sure, your points are perfectly valid. At the same time it might be the
case that we already kind of expose that THP dependency thing to vmscan.
The whole

	if (!add_to_swap()) {
		if (!PageTransHuge(page))
			goto activate_locked;

		split_huge_page_to_list(page);
		add_to_swap(page);
	}

looks a bit suspicious - if add_to_swap() fails and the page is THP then
split it and add_to_swap() again.
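
To make the fallback above concrete, here is a small user-space sketch
of the "try as THP, split on failure, retry" flow.  It is only a model
under stated assumptions: the sim_* helpers and struct sim_page are
hypothetical, the refusal condition (THP plus frontswap enabled) mirrors
the patch under discussion, and a split is reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model: just the two properties the flow cares about. */
struct sim_page {
	bool is_thp;  /* still a huge page? */
	bool in_swap; /* has swap space been assigned? */
};

/* Models add_to_swap(): refuses a THP while frontswap is enabled. */
static bool sim_add_to_swap(struct sim_page *page, bool frontswap_on)
{
	if (page->is_thp && frontswap_on)
		return false;
	page->in_swap = true;
	return true;
}

/* Models the split: the huge page becomes ordinary base pages. */
static void sim_split_huge_page(struct sim_page *page)
{
	assert(page->is_thp);
	page->is_thp = false;
}

/* Returns true if the page ends up in swap, possibly after a split. */
static bool sim_reclaim_one(struct sim_page *page, bool frontswap_on)
{
	if (!sim_add_to_swap(page, frontswap_on)) {
		/* Corresponds to "goto activate_locked"; in this model a
		 * base page never fails, so this branch is not taken. */
		if (!page->is_thp)
			return false;

		sim_split_huge_page(page);
		if (!sim_add_to_swap(page, frontswap_on))
			return false;
	}
	return true;
}
```

With frontswap on, a THP passed to sim_reclaim_one() comes back split
and swapped; with frontswap off, it is swapped while still huge.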

	-ss


* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-08 10:17 ` Minchan Kim
@ 2018-02-08 15:17   ` huang ying
  0 siblings, 0 replies; 13+ messages in thread
From: huang ying @ 2018-02-08 15:17 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Huang, Ying, Andrew Morton, linux-mm, LKML,
	Konrad Rzeszutek Wilk, Dan Streetman, Seth Jennings,
	Tetsuo Handa, Shaohua Li, Michal Hocko, Johannes Weiner,
	Mel Gorman, Shakeel Butt, stable, Sergey Senozhatsky

On Thu, Feb 8, 2018 at 6:17 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Wed, Feb 07, 2018 at 03:00:35PM +0800, Huang, Ying wrote:
>> From: Huang Ying <huang.ying.caritas@gmail.com>
>>
>> Sergey Senozhatsky reported that if THP (Transparent Huge Page) and
>> frontswap (via zswap) are both enabled, then when memory runs low and
>> swap is triggered, segfaults and memory corruption occur in random
>> user space applications, as follows:
>>
>> kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
>>  #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
>>  #1  0x00007fc08889c2f3 malloc (libc.so.6)
>>  #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
>>  #3  0x0000560e6005e75c n/a (urxvt)
>>  #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
>>  #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
>>  #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
>>  #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
>>  #8  0x0000560e6005cb55 ev_run (urxvt)
>>  #9  0x0000560e6003b9b9 main (urxvt)
>>  #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
>>  #11 0x0000560e6003f9da _start (urxvt)
>>
>> After bisection, it was found the first bad commit is
>> bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
>> out").
>>
>> The root cause is as follows.
>>
>> When pages are written to the swap device during swap-out in
>> swap_writepage(), zswap (frontswap) is tried first, to compress the
>> pages instead and improve performance.  But zswap (frontswap) treats
>> a THP as a normal page, so only the head page is saved.  After
>> swap-in, the tail pages are not restored to their original contents,
>> which causes the memory corruption in the applications.
>>
>> This is fixed by splitting the THP before writing the page to the
>> swap device if frontswap is enabled.  To deal with the situation where
>> frontswap is enabled at runtime, whether the page is a THP is also
>> checked before using frontswap during swap-out.
>>
>> Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Cc: Dan Streetman <ddstreet@ieee.org>
>> Cc: Seth Jennings <sjenning@redhat.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> Cc: Shaohua Li <shli@kernel.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: Shakeel Butt <shakeelb@google.com>
>> Cc: stable@vger.kernel.org # 4.14
>> Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
>>
>> Changelog:
>>
>> v2:
>>
>> - Move the frontswap check into swapfile.c to avoid making vmscan.c
>>   depend on frontswap.
>> ---
>>  mm/page_io.c  | 2 +-
>>  mm/swapfile.c | 3 +++
>>  2 files changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_io.c b/mm/page_io.c
>> index b41cf9644585..6dca817ae7a0 100644
>> --- a/mm/page_io.c
>> +++ b/mm/page_io.c
>> @@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>>               unlock_page(page);
>>               goto out;
>>       }
>> -     if (frontswap_store(page) == 0) {
>> +     if (!PageTransHuge(page) && frontswap_store(page) == 0) {
>
> Why do we need this?
>
> If frontswap is enabled but doesn't support THP, the logic below
> doesn't allow cluster allocation, so no THP page should reach this path.
> What am I missing?

If frontswap_enabled() becomes true at runtime after swap cluster
allocation but before swap_writepage(), this check prevents memory
corruption.  I know that cannot happen now, because frontswap isn't very
dynamic.  But I still think this is a good thing to do.
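
To make the effect of the page_io.c hunk concrete, here is a rough
user-space model (not kernel code).  struct model_page, the model_*
functions and the one-byte-per-subpage layout are all made up for
illustration; the 512-subpage figure assumes 2 MiB THP with 4 KiB base
pages.  The backend in the model saves only the head page, as described
in the changelog, and the "guard" flag switches the added
!PageTransHuge() test on and off.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SUBPAGES 512	/* assumed: 2 MiB THP / 4 KiB base pages */

/* Hypothetical stand-in for struct page; not the kernel definition. */
struct model_page {
	bool trans_huge;		/* models PageTransHuge(page) */
	unsigned char data[SUBPAGES];	/* one byte stands for one subpage */
};

enum dest { DEST_FRONTSWAP, DEST_DEVICE };

static unsigned char frontswap_copy;	/* backend holds one base page */
static unsigned char device_copy[SUBPAGES];

/* Models a backend that is unaware of THP: it saves the head page only. */
static int model_frontswap_store(const struct model_page *page)
{
	frontswap_copy = page->data[0];
	return 0;
}

/*
 * Models the condition in swap_writepage().  With guard == false it is
 * the old test, with guard == true it is the patched one.
 */
static enum dest model_swap_writepage(const struct model_page *page,
				      bool guard)
{
	if ((!guard || !page->trans_huge) &&
	    model_frontswap_store(page) == 0)
		return DEST_FRONTSWAP;	/* tail subpages are lost for a THP */

	memcpy(device_copy, page->data, sizeof(device_copy));
	return DEST_DEVICE;		/* whole page written out */
}
```

With the guard off, a huge page is "stored" through the backend and
only its first subpage survives.  With the guard on, the same page
falls through to the device path and all subpages are written.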

Best Regards,
Huang, Ying

>>               set_page_writeback(page);
>>               unlock_page(page);
>>               end_page_writeback(page);
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 006047b16814..0b7c7883ce64 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>>
>>       /* Only single cluster request supported */
>>       WARN_ON_ONCE(n_goal > 1 && cluster);
>> +     /* Frontswap doesn't support THP */
>> +     if (frontswap_enabled() && cluster)
>> +             goto noswap;
>>
>>       avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
>>       if (avail_pgs <= 0)
>> --
>> 2.15.1
>>

* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-07  7:00 [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled Huang, Ying
                   ` (2 preceding siblings ...)
  2018-02-08 10:17 ` Minchan Kim
@ 2018-02-08 15:27 ` huang ying
  2018-02-08 17:37   ` Minchan Kim
  3 siblings, 1 reply; 13+ messages in thread
From: huang ying @ 2018-02-08 15:27 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk, Minchan Kim, Andrew Morton
  Cc: linux-mm, LKML, Dan Streetman, Seth Jennings, Tetsuo Handa,
	Shaohua Li, Michal Hocko, Johannes Weiner, Mel Gorman,
	Shakeel Butt, stable, Sergey Senozhatsky, Huang, Ying

On Wed, Feb 7, 2018 at 3:00 PM, Huang, Ying <ying.huang@intel.com> wrote:
> From: Huang Ying <huang.ying.caritas@gmail.com>
>
> It was reported by Sergey Senozhatsky that if THP (Transparent Huge
> Page) and frontswap (via zswap) are both enabled, when memory goes low
> so that swap is triggered, segfault and memory corruption will occur
> in random user space applications as follow,
>
> kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
>  #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
>  #1  0x00007fc08889c2f3 malloc (libc.so.6)
>  #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
>  #3  0x0000560e6005e75c n/a (urxvt)
>  #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
>  #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
>  #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
>  #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
>  #8  0x0000560e6005cb55 ev_run (urxvt)
>  #9  0x0000560e6003b9b9 main (urxvt)
>  #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
>  #11 0x0000560e6003f9da _start (urxvt)
>
> After bisection, it was found the first bad commit is
> bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
> out").
>
> The root cause is as follow.
>
> When the pages are written to swap device during swapping out in
> swap_writepage(), zswap (frontswap) is tried to compress the pages
> instead to improve the performance.  But zswap (frontswap) will treat
> THP as normal page, so only the head page is saved.  After swapping
> in, tail pages will not be restored to its original contents, so cause
> the memory corruption in the applications.
>
> This is fixed via splitting THP before writing the page to swap device
> if frontswap is enabled.  To deal with the situation where frontswap
> is enabled at runtime, whether the page is THP is checked before using
> frontswap during swapping out too.
>
> Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Dan Streetman <ddstreet@ieee.org>
> Cc: Seth Jennings <sjenning@redhat.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: stable@vger.kernel.org # 4.14
> Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
>
> Changelog:
>
> v2:
>
> - Move frontswap check into swapfile.c to avoid to make vmscan.c
>   depends on frontswap.
> ---
>  mm/page_io.c  | 2 +-
>  mm/swapfile.c | 3 +++
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_io.c b/mm/page_io.c
> index b41cf9644585..6dca817ae7a0 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>                 unlock_page(page);
>                 goto out;
>         }
> -       if (frontswap_store(page) == 0) {
> +       if (!PageTransHuge(page) && frontswap_store(page) == 0) {
>                 set_page_writeback(page);
>                 unlock_page(page);
>                 end_page_writeback(page);
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 006047b16814..0b7c7883ce64 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>
>         /* Only single cluster request supported */
>         WARN_ON_ONCE(n_goal > 1 && cluster);
> +       /* Frontswap doesn't support THP */
> +       if (frontswap_enabled() && cluster)
> +               goto noswap;

I found that this will turn the THP swap optimization off forever if
CONFIG_ZSWAP=y (it cannot be =m), because frontswap is enabled quite
statically rather than dynamically.  Once frontswap_ops is registered,
frontswap is enabled unconditionally and forever.  And zswap registers
frontswap_ops during initialization regardless of whether zswap is
enabled.

So I think it will be better to remove the swapfile.c changes from this
patch and keep just the page_io.c changes.  THP is more dynamic: by
default it is usually used only if madvised.  Even if it is always used
by default, this can be changed dynamically.  And even when THP is
used, zswap can still be used for non-THP pages.
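
A small user-space model of the problem, in case it helps.  This is
only a sketch: every model_* name is hypothetical and merely mirrors
the behaviour described above, not the real frontswap or zswap code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the names mirror this thread, not the kernel. */
static const void *model_frontswap_ops;	/* non-NULL once a backend registers */
static bool model_zswap_enabled;	/* models the zswap "enabled" knob */

static void model_frontswap_register_ops(const void *ops)
{
	model_frontswap_ops = ops;	/* never cleared afterwards */
}

static bool model_frontswap_enabled(void)
{
	return model_frontswap_ops != NULL;
}

/* zswap only accepts pages while its own knob is on. */
static bool model_zswap_accepts_pages(void)
{
	return model_frontswap_enabled() && model_zswap_enabled;
}

/* Models the check the v2 patch adds to get_swap_pages(). */
static bool model_cluster_alloc_allowed(bool cluster)
{
	if (model_frontswap_enabled() && cluster)
		return false;		/* "goto noswap" */
	return true;
}

/* Models zswap init: ops are registered even if zswap stays disabled. */
static void model_zswap_init(void)
{
	static const int ops = 0;	/* placeholder ops object */

	model_frontswap_register_ops(&ops);
}
```

After model_zswap_init(), cluster allocation is refused even though
model_zswap_accepts_pages() is still false, which is the "turned off
forever" case.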

Best Regards,
Huang, Ying

>
>         avail_pgs = atomic_long_read(&nr_swap_pages) / nr_pages;
>         if (avail_pgs <= 0)
> --
> 2.15.1
>

* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-08 15:27 ` huang ying
@ 2018-02-08 17:37   ` Minchan Kim
  2018-02-09  0:39     ` Huang, Ying
  2018-02-12 16:26     ` Dan Streetman
  0 siblings, 2 replies; 13+ messages in thread
From: Minchan Kim @ 2018-02-08 17:37 UTC (permalink / raw)
  To: huang ying
  Cc: Konrad Rzeszutek Wilk, Andrew Morton, linux-mm, LKML,
	Dan Streetman, Seth Jennings, Tetsuo Handa, Shaohua Li,
	Michal Hocko, Johannes Weiner, Mel Gorman, Shakeel Butt, stable,
	Sergey Senozhatsky, Huang, Ying

Hi Huang,

On Thu, Feb 08, 2018 at 11:27:50PM +0800, huang ying wrote:
> On Wed, Feb 7, 2018 at 3:00 PM, Huang, Ying <ying.huang@intel.com> wrote:
> > From: Huang Ying <huang.ying.caritas@gmail.com>
> >
> > It was reported by Sergey Senozhatsky that if THP (Transparent Huge
> > Page) and frontswap (via zswap) are both enabled, when memory goes low
> > so that swap is triggered, segfault and memory corruption will occur
> > in random user space applications as follow,
> >
> > kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
> >  #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
> >  #1  0x00007fc08889c2f3 malloc (libc.so.6)
> >  #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
> >  #3  0x0000560e6005e75c n/a (urxvt)
> >  #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
> >  #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
> >  #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
> >  #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
> >  #8  0x0000560e6005cb55 ev_run (urxvt)
> >  #9  0x0000560e6003b9b9 main (urxvt)
> >  #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
> >  #11 0x0000560e6003f9da _start (urxvt)
> >
> > After bisection, it was found the first bad commit is
> > bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
> > out").
> >
> > The root cause is as follow.
> >
> > When the pages are written to swap device during swapping out in
> > > swap_writepage(), zswap (frontswap) is tried to compress the pages
> > instead to improve the performance.  But zswap (frontswap) will treat
> > THP as normal page, so only the head page is saved.  After swapping
> > in, tail pages will not be restored to its original contents, so cause
> > the memory corruption in the applications.
> >
> > This is fixed via splitting THP before writing the page to swap device
> > if frontswap is enabled.  To deal with the situation where frontswap
> > is enabled at runtime, whether the page is THP is checked before using
> > frontswap during swapping out too.
> >
> > Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Cc: Dan Streetman <ddstreet@ieee.org>
> > Cc: Seth Jennings <sjenning@redhat.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Cc: Shaohua Li <shli@kernel.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Mel Gorman <mgorman@techsingularity.net>
> > Cc: Shakeel Butt <shakeelb@google.com>
> > Cc: stable@vger.kernel.org # 4.14
> > Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
> >
> > Changelog:
> >
> > v2:
> >
> > - Move frontswap check into swapfile.c to avoid to make vmscan.c
> >   depends on frontswap.
> > ---
> >  mm/page_io.c  | 2 +-
> >  mm/swapfile.c | 3 +++
> >  2 files changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index b41cf9644585..6dca817ae7a0 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
> >                 unlock_page(page);
> >                 goto out;
> >         }
> > -       if (frontswap_store(page) == 0) {
> > +       if (!PageTransHuge(page) && frontswap_store(page) == 0) {
> >                 set_page_writeback(page);
> >                 unlock_page(page);
> >                 end_page_writeback(page);
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 006047b16814..0b7c7883ce64 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
> >
> >         /* Only single cluster request supported */
> >         WARN_ON_ONCE(n_goal > 1 && cluster);
> > +       /* Frontswap doesn't support THP */
> > +       if (frontswap_enabled() && cluster)
> > +               goto noswap;
> 
> I found that this will turn the THP swap optimization off forever if
> CONFIG_ZSWAP=y (it cannot be =m), because frontswap is enabled quite
> statically rather than dynamically.  Once frontswap_ops is registered,
> frontswap is enabled unconditionally and forever.  And zswap registers
> frontswap_ops during initialization regardless of whether zswap is
> enabled.

Indeed.

> 
> So I think it will be better to remove swapfile.c changes in this
> patch, just keep page_io.c changes.  Because THP is more dynamic, it

Then I think it should be done in the frontswap backends rather than in the
generic swap layer, because there are two backends now and one of them can
add THP support first.

diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c
index bf13d1ec51f3..bdaf309aeea6 100644
--- a/drivers/xen/tmem.c
+++ b/drivers/xen/tmem.c
@@ -284,6 +284,9 @@ static int tmem_frontswap_store(unsigned type, pgoff_t offset,
        int pool = tmem_frontswap_poolid;
        int ret;
 
+       if (PageTransHuge(page))
+               return -EINVAL;
+
        if (pool < 0)
                return -1;
        if (ind64 != ind)
diff --git a/mm/zswap.c b/mm/zswap.c
index c004aa4fd3f4..e343534d2892 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1007,6 +1007,9 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
        u8 *src, *dst;
        struct zswap_header zhdr = { .swpentry = swp_entry(type, offset) };
 
+       if (PageTransHuge(page))
+               return -EINVAL;
+
        if (!zswap_enabled || !tree) {
                ret = -ENODEV;
                goto reject;
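
For illustration, the intended control flow can be modelled in user
space as below.  This is a sketch under assumptions, not the real code:
struct model_page and the model_* functions are hypothetical, and the
caller is simplified to "a non-zero store result means the page is
written to the swap device", which is how I read the
frontswap_store(page) == 0 test in swap_writepage().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel definition. */
struct model_page {
	bool trans_huge;	/* models PageTransHuge(page) */
};

/* Models the proposed early return in a backend's store hook. */
static int model_backend_store(const struct model_page *page)
{
	if (page->trans_huge)
		return -EINVAL;	/* this backend cannot hold a huge page yet */
	return 0;		/* base page accepted */
}

/* Models the caller: a failed store falls back to the swap device. */
static bool model_goes_to_device(const struct model_page *page)
{
	return model_backend_store(page) != 0;
}
```

A backend that later learns to handle huge pages would only need to
drop its own early return; the generic caller stays unchanged.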

* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-08 17:37   ` Minchan Kim
@ 2018-02-09  0:39     ` Huang, Ying
  2018-02-12 16:26     ` Dan Streetman
  1 sibling, 0 replies; 13+ messages in thread
From: Huang, Ying @ 2018-02-09  0:39 UTC (permalink / raw)
  To: Minchan Kim
  Cc: huang ying, Konrad Rzeszutek Wilk, Andrew Morton, linux-mm, LKML,
	Dan Streetman, Seth Jennings, Tetsuo Handa, Shaohua Li,
	Michal Hocko, Johannes Weiner, Mel Gorman, Shakeel Butt, stable,
	Sergey Senozhatsky

Minchan Kim <minchan@kernel.org> writes:

> Hi Huang,
>
> On Thu, Feb 08, 2018 at 11:27:50PM +0800, huang ying wrote:
>> On Wed, Feb 7, 2018 at 3:00 PM, Huang, Ying <ying.huang@intel.com> wrote:
>> > From: Huang Ying <huang.ying.caritas@gmail.com>
>> >
>> > It was reported by Sergey Senozhatsky that if THP (Transparent Huge
>> > Page) and frontswap (via zswap) are both enabled, when memory goes low
>> > so that swap is triggered, segfault and memory corruption will occur
>> > in random user space applications as follow,
>> >
>> > kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
>> >  #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
>> >  #1  0x00007fc08889c2f3 malloc (libc.so.6)
>> >  #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
>> >  #3  0x0000560e6005e75c n/a (urxvt)
>> >  #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
>> >  #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
>> >  #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
>> >  #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
>> >  #8  0x0000560e6005cb55 ev_run (urxvt)
>> >  #9  0x0000560e6003b9b9 main (urxvt)
>> >  #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
>> >  #11 0x0000560e6003f9da _start (urxvt)
>> >
>> > After bisection, it was found the first bad commit is
>> > bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
>> > out").
>> >
>> > The root cause is as follow.
>> >
>> > When the pages are written to swap device during swapping out in
>> > swap_writepage(), zswap (frontswap) is tried to compress the pages
>> > instead to improve the performance.  But zswap (frontswap) will treat
>> > THP as normal page, so only the head page is saved.  After swapping
>> > in, tail pages will not be restored to its original contents, so cause
>> > the memory corruption in the applications.
>> >
>> > This is fixed via splitting THP before writing the page to swap device
>> > if frontswap is enabled.  To deal with the situation where frontswap
>> > is enabled at runtime, whether the page is THP is checked before using
>> > frontswap during swapping out too.
>> >
>> > Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
>> > Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> > Cc: Dan Streetman <ddstreet@ieee.org>
>> > Cc: Seth Jennings <sjenning@redhat.com>
>> > Cc: Minchan Kim <minchan@kernel.org>
>> > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>> > Cc: Shaohua Li <shli@kernel.org>
>> > Cc: Michal Hocko <mhocko@suse.com>
>> > Cc: Johannes Weiner <hannes@cmpxchg.org>
>> > Cc: Mel Gorman <mgorman@techsingularity.net>
>> > Cc: Shakeel Butt <shakeelb@google.com>
>> > Cc: stable@vger.kernel.org # 4.14
>> > Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
>> >
>> > Changelog:
>> >
>> > v2:
>> >
>> > - Move frontswap check into swapfile.c to avoid to make vmscan.c
>> >   depends on frontswap.
>> > ---
>> >  mm/page_io.c  | 2 +-
>> >  mm/swapfile.c | 3 +++
>> >  2 files changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/page_io.c b/mm/page_io.c
>> > index b41cf9644585..6dca817ae7a0 100644
>> > --- a/mm/page_io.c
>> > +++ b/mm/page_io.c
>> > @@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>> >                 unlock_page(page);
>> >                 goto out;
>> >         }
>> > -       if (frontswap_store(page) == 0) {
>> > +       if (!PageTransHuge(page) && frontswap_store(page) == 0) {
>> >                 set_page_writeback(page);
>> >                 unlock_page(page);
>> >                 end_page_writeback(page);
>> > diff --git a/mm/swapfile.c b/mm/swapfile.c
>> > index 006047b16814..0b7c7883ce64 100644
>> > --- a/mm/swapfile.c
>> > +++ b/mm/swapfile.c
>> > @@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
>> >
>> >         /* Only single cluster request supported */
>> >         WARN_ON_ONCE(n_goal > 1 && cluster);
>> > +       /* Frontswap doesn't support THP */
>> > +       if (frontswap_enabled() && cluster)
>> > +               goto noswap;
>> 
>> I found that this will turn the THP swap optimization off forever if
>> CONFIG_ZSWAP=y (it cannot be =m), because frontswap is enabled quite
>> statically rather than dynamically.  Once frontswap_ops is registered,
>> frontswap is enabled unconditionally and forever.  And zswap registers
>> frontswap_ops during initialization regardless of whether zswap is
>> enabled.
>
> Indeed.
>
>> 
>> So I think it will be better to remove swapfile.c changes in this
>> patch, just keep page_io.c changes.  Because THP is more dynamic, it
>
> Then I think it should be done in the frontswap backends rather than in the
> generic swap layer, because there are two backends now and one of them can
> add THP support first.
>
> diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c
> index bf13d1ec51f3..bdaf309aeea6 100644
> --- a/drivers/xen/tmem.c
> +++ b/drivers/xen/tmem.c
> @@ -284,6 +284,9 @@ static int tmem_frontswap_store(unsigned type, pgoff_t offset,
>         int pool = tmem_frontswap_poolid;
>         int ret;
>  
> +       if (PageTransHuge(page))
> +               return -EINVAL;
> +
>         if (pool < 0)
>                 return -1;
>         if (ind64 != ind)
> diff --git a/mm/zswap.c b/mm/zswap.c
> index c004aa4fd3f4..e343534d2892 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1007,6 +1007,9 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>         u8 *src, *dst;
>         struct zswap_header zhdr = { .swpentry = swp_entry(type, offset) };
>  
> +       if (PageTransHuge(page))
> +               return -EINVAL;
> +
>         if (!zswap_enabled || !tree) {
>                 ret = -ENODEV;
>                 goto reject;

Good suggestion!  I will do this.

Best Regards,
Huang, Ying

* Re: [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled
  2018-02-08 17:37   ` Minchan Kim
  2018-02-09  0:39     ` Huang, Ying
@ 2018-02-12 16:26     ` Dan Streetman
  1 sibling, 0 replies; 13+ messages in thread
From: Dan Streetman @ 2018-02-12 16:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: huang ying, Konrad Rzeszutek Wilk, Andrew Morton, linux-mm, LKML,
	Seth Jennings, Tetsuo Handa, Shaohua Li, Michal Hocko,
	Johannes Weiner, Mel Gorman, Shakeel Butt, stable,
	Sergey Senozhatsky, Huang, Ying


On Thu, 8 Feb 2018, Minchan Kim wrote:

> Hi Huang,
> 
> On Thu, Feb 08, 2018 at 11:27:50PM +0800, huang ying wrote:
> > On Wed, Feb 7, 2018 at 3:00 PM, Huang, Ying <ying.huang@intel.com> wrote:
> > > From: Huang Ying <huang.ying.caritas@gmail.com>
> > >
> > > It was reported by Sergey Senozhatsky that if THP (Transparent Huge
> > > Page) and frontswap (via zswap) are both enabled, when memory goes low
> > > so that swap is triggered, segfault and memory corruption will occur
> > > in random user space applications as follow,
> > >
> > > kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
> > >  #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
> > >  #1  0x00007fc08889c2f3 malloc (libc.so.6)
> > >  #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
> > >  #3  0x0000560e6005e75c n/a (urxvt)
> > >  #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
> > >  #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
> > >  #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
> > >  #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
> > >  #8  0x0000560e6005cb55 ev_run (urxvt)
> > >  #9  0x0000560e6003b9b9 main (urxvt)
> > >  #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
> > >  #11 0x0000560e6003f9da _start (urxvt)
> > >
> > > After bisection, it was found the first bad commit is
> > > bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped
> > > out").
> > >
> > > The root cause is as follow.
> > >
> > > When the pages are written to swap device during swapping out in
> > > > swap_writepage(), zswap (frontswap) is tried to compress the pages
> > > instead to improve the performance.  But zswap (frontswap) will treat
> > > THP as normal page, so only the head page is saved.  After swapping
> > > in, tail pages will not be restored to its original contents, so cause
> > > the memory corruption in the applications.
> > >
> > > This is fixed via splitting THP before writing the page to swap device
> > > if frontswap is enabled.  To deal with the situation where frontswap
> > > is enabled at runtime, whether the page is THP is checked before using
> > > frontswap during swapping out too.
> > >
> > > Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> > > Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> > > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > Cc: Dan Streetman <ddstreet@ieee.org>
> > > Cc: Seth Jennings <sjenning@redhat.com>
> > > Cc: Minchan Kim <minchan@kernel.org>
> > > Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Cc: Shaohua Li <shli@kernel.org>
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Cc: Mel Gorman <mgorman@techsingularity.net>
> > > Cc: Shakeel Butt <shakeelb@google.com>
> > > Cc: stable@vger.kernel.org # 4.14
> > > Fixes: bd4c82c22c367e068 ("mm, THP, swap: delay splitting THP after swapped out")
> > >
> > > Changelog:
> > >
> > > v2:
> > >
> > > - Move frontswap check into swapfile.c to avoid to make vmscan.c
> > >   depends on frontswap.
> > > ---
> > >  mm/page_io.c  | 2 +-
> > >  mm/swapfile.c | 3 +++
> > >  2 files changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/mm/page_io.c b/mm/page_io.c
> > > index b41cf9644585..6dca817ae7a0 100644
> > > --- a/mm/page_io.c
> > > +++ b/mm/page_io.c
> > > @@ -250,7 +250,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
> > >                 unlock_page(page);
> > >                 goto out;
> > >         }
> > > -       if (frontswap_store(page) == 0) {
> > > +       if (!PageTransHuge(page) && frontswap_store(page) == 0) {
> > >                 set_page_writeback(page);
> > >                 unlock_page(page);
> > >                 end_page_writeback(page);
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index 006047b16814..0b7c7883ce64 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -934,6 +934,9 @@ int get_swap_pages(int n_goal, bool cluster, swp_entry_t swp_entries[])
> > >
> > >         /* Only single cluster request supported */
> > >         WARN_ON_ONCE(n_goal > 1 && cluster);
> > > +       /* Frontswap doesn't support THP */
> > > +       if (frontswap_enabled() && cluster)
> > > +               goto noswap;
> > 
> > I found that this will turn the THP swap optimization off forever if
> > CONFIG_ZSWAP=y (it cannot be =m), because frontswap is enabled quite
> > statically rather than dynamically.  Once frontswap_ops is registered,
> > frontswap is enabled unconditionally and forever.  And zswap registers
> > frontswap_ops during initialization regardless of whether zswap is
> > enabled.
> 
> Indeed.
> 
> > 
> > So I think it will be better to remove swapfile.c changes in this
> > patch, just keep page_io.c changes.  Because THP is more dynamic, it
> 
> Then I think it should be done in the frontswap backends rather than in the
> generic swap layer, because there are two backends now and one of them can
> add THP support first.

I like this approach.  It allows zswap or xen tmem to support THP in the 
future.

> 
> diff --git a/drivers/xen/tmem.c b/drivers/xen/tmem.c
> index bf13d1ec51f3..bdaf309aeea6 100644
> --- a/drivers/xen/tmem.c
> +++ b/drivers/xen/tmem.c
> @@ -284,6 +284,9 @@ static int tmem_frontswap_store(unsigned type, pgoff_t offset,
>         int pool = tmem_frontswap_poolid;
>         int ret;
>  
> +       if (PageTransHuge(page))
> +               return -EINVAL;
> +
>         if (pool < 0)
>                 return -1;
>         if (ind64 != ind)
> diff --git a/mm/zswap.c b/mm/zswap.c
> index c004aa4fd3f4..e343534d2892 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1007,6 +1007,9 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>         u8 *src, *dst;
>         struct zswap_header zhdr = { .swpentry = swp_entry(type, offset) };
>  
> +       if (PageTransHuge(page))
> +               return -EINVAL;
> +
>         if (!zswap_enabled || !tree) {
>                 ret = -ENODEV;
>                 goto reject;
> 
> 

Thread overview: 13+ messages
2018-02-07  7:00 [PATCH -mm -v2] mm, swap, frontswap: Fix THP swap if frontswap enabled Huang, Ying
2018-02-07 16:27 ` Konrad Rzeszutek Wilk
2018-02-07 21:05 ` Andrew Morton
2018-02-08  1:28   ` Huang, Ying
2018-02-08  1:36   ` Sergey Senozhatsky
2018-02-08 10:25     ` Minchan Kim
2018-02-08 11:22       ` Sergey Senozhatsky
2018-02-08 10:17 ` Minchan Kim
2018-02-08 15:17   ` huang ying
2018-02-08 15:27 ` huang ying
2018-02-08 17:37   ` Minchan Kim
2018-02-09  0:39     ` Huang, Ying
2018-02-12 16:26     ` Dan Streetman
