linux-mm.kvack.org archive mirror
* [PATCH V2 0/2] mm: fix race condition in MADV_FREE
@ 2017-09-22 18:46 Shaohua Li
  2017-09-22 18:46 ` [PATCH V2 1/2] mm: avoid marking swap cached page as lazyfree Shaohua Li
  2017-09-22 18:46 ` [PATCH V2 2/2] mm: fix data corruption caused by lazyfree page Shaohua Li
  0 siblings, 2 replies; 7+ messages in thread
From: Shaohua Li @ 2017-09-22 18:46 UTC (permalink / raw)
  To: linux-mm; +Cc: Artem Savkov, Kernel-team, Shaohua Li

From: Shaohua Li <shli@fb.com>

Artem Savkov reported a race condition[1] in MADV_FREE. MADV_FREE clears the
pte dirty bit and then marks the page lazyfree. There is no lock to prevent
page reclaim from adding the page to the swap cache between these two steps.
This causes two problems:
- A page in the swap cache is marked lazyfree (SwapBacked cleared). This
  confuses some code paths, such as page fault handling.
- The page is added to the swap cache and freed, but is never swapped out
  because the pte isn't dirty. This causes data corruption.

These patches fix the issues.

Thanks,
Shaohua

V1->V2:
- dirty the page in add_to_swap() instead of in shrink_page_list(), as suggested by Minchan

Shaohua Li (2):
  mm: avoid marking swap cached page as lazyfree
  mm: fix data corruption caused by lazyfree page

 mm/swap.c       | 4 ++--
 mm/swap_state.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

-- 
2.9.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* [PATCH V2 1/2] mm: avoid marking swap cached page as lazyfree
  2017-09-22 18:46 [PATCH V2 0/2] mm: fix race condition in MADV_FREE Shaohua Li
@ 2017-09-22 18:46 ` Shaohua Li
  2017-09-25  5:07   ` Minchan Kim
  2017-09-26 13:07   ` Michal Hocko
  2017-09-22 18:46 ` [PATCH V2 2/2] mm: fix data corruption caused by lazyfree page Shaohua Li
  1 sibling, 2 replies; 7+ messages in thread
From: Shaohua Li @ 2017-09-22 18:46 UTC (permalink / raw)
  To: linux-mm
  Cc: Artem Savkov, Kernel-team, Shaohua Li, stable, Johannes Weiner,
	Michal Hocko, Hillf Danton, Minchan Kim, Hugh Dickins,
	Mel Gorman, Andrew Morton

From: Shaohua Li <shli@fb.com>

MADV_FREE clears the pte dirty bit and then marks the page lazyfree (clears
SwapBacked). There is no lock to prevent page reclaim from adding the page
to the swap cache between these two steps. If the page is added to the swap
cache, marking it lazyfree will confuse the page fault path when the page is
reclaimed and then refaulted.

Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
---
 mm/swap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 9295ae9..a77d68f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -575,7 +575,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 			    void *arg)
 {
 	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
-	    !PageUnevictable(page)) {
+	    !PageSwapCache(page) && !PageUnevictable(page)) {
 		bool active = PageActive(page);
 
 		del_page_from_lru_list(page, lruvec,
@@ -665,7 +665,7 @@ void deactivate_file_page(struct page *page)
 void mark_page_lazyfree(struct page *page)
 {
 	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
-	    !PageUnevictable(page)) {
+	    !PageSwapCache(page) && !PageUnevictable(page)) {
 		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
 
 		get_page(page);
-- 
2.9.5

* [PATCH V2 2/2] mm: fix data corruption caused by lazyfree page
  2017-09-22 18:46 [PATCH V2 0/2] mm: fix race condition in MADV_FREE Shaohua Li
  2017-09-22 18:46 ` [PATCH V2 1/2] mm: avoid marking swap cached page as lazyfree Shaohua Li
@ 2017-09-22 18:46 ` Shaohua Li
  2017-09-26 13:13   ` Michal Hocko
  1 sibling, 1 reply; 7+ messages in thread
From: Shaohua Li @ 2017-09-22 18:46 UTC (permalink / raw)
  To: linux-mm
  Cc: Artem Savkov, Kernel-team, Shaohua Li, stable, Johannes Weiner,
	Michal Hocko, Hillf Danton, Minchan Kim, Hugh Dickins,
	Rik van Riel, Mel Gorman, Andrew Morton

From: Shaohua Li <shli@fb.com>

MADV_FREE clears the pte dirty bit and then marks the page lazyfree (clears
SwapBacked). There is no lock to prevent page reclaim from adding the page
to the swap cache between these two steps. If page reclaim finds such a
page, it simply adds the page to the swap cache without paging it out to
swap, because the page is marked clean. The next page fault then reads data
from a swap slot that never received the original data, so we get data
corruption. To fix the issue, we mark the page dirty and page it out.

However, we shouldn't dirty every page that is clean and in the swap cache:
a swapped-in page is in the swap cache and clean too. So we only dirty
pages that are added to the swap cache by page reclaim, which are never
swapped-in pages. As Minchan suggested, simply dirtying the page in
add_to_swap() does the job.

Reported-by: Artem Savkov <asavkov@redhat.com>
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Signed-off-by: Shaohua Li <shli@fb.com>
Cc: stable@vger.kernel.org
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/swap_state.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 71ce2d1..2d64ec4 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -231,7 +231,7 @@ int add_to_swap(struct page *page)
 	 * deadlock in the swap out path.
 	 */
 	/*
-	 * Add it to the swap cache.
+	 * Add it to the swap cache and mark it dirty
 	 */
 	err = add_to_swap_cache(page, entry,
 			__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
@@ -242,6 +242,7 @@ int add_to_swap(struct page *page)
 		 * clear SWAP_HAS_CACHE flag.
 		 */
 		goto fail;
+	set_page_dirty(page);
 
 	return 1;
 
-- 
2.9.5

* Re: [PATCH V2 1/2] mm: avoid marking swap cached page as lazyfree
  2017-09-22 18:46 ` [PATCH V2 1/2] mm: avoid marking swap cached page as lazyfree Shaohua Li
@ 2017-09-25  5:07   ` Minchan Kim
  2017-09-26 13:07   ` Michal Hocko
  1 sibling, 0 replies; 7+ messages in thread
From: Minchan Kim @ 2017-09-25  5:07 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, Artem Savkov, Kernel-team, Shaohua Li, stable,
	Johannes Weiner, Michal Hocko, Hillf Danton, Hugh Dickins,
	Mel Gorman, Andrew Morton

On Fri, Sep 22, 2017 at 11:46:30AM -0700, Shaohua Li wrote:
> From: Shaohua Li <shli@fb.com>
> 
> MADV_FREE clears the pte dirty bit and then marks the page lazyfree (clears
> SwapBacked). There is no lock to prevent page reclaim from adding the page
> to the swap cache between these two steps. If the page is added to the swap
> cache, marking it lazyfree will confuse the page fault path when the page is
> reclaimed and then refaulted.

If a page is added to the swap cache while it sits in lru_lazyfree_pvecs,
it ends up with !PG_swapbacked, PG_swapcache and !PG_dirty. The most
important flag is PG_dirty: without it, the VM will reclaim the page
without *writeback*, so we lose the data.

Even if we prevent the page from being added to the swap cache, we still
lose the data unless we apply [2/2], so this patch alone doesn't fix the
problem. That's why I said we don't need to separate the patches.

> 
> Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>
> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> Signed-off-by: Shaohua Li <shli@fb.com>
> Cc: stable@vger.kernel.org
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> ---
>  mm/swap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 9295ae9..a77d68f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -575,7 +575,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>  			    void *arg)
>  {
>  	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> -	    !PageUnevictable(page)) {
> +	    !PageSwapCache(page) && !PageUnevictable(page)) {
>  		bool active = PageActive(page);
>  
>  		del_page_from_lru_list(page, lruvec,
> @@ -665,7 +665,7 @@ void deactivate_file_page(struct page *page)
>  void mark_page_lazyfree(struct page *page)
>  {
>  	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> -	    !PageUnevictable(page)) {
> +	    !PageSwapCache(page) && !PageUnevictable(page)) {
>  		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
>  
>  		get_page(page);
> -- 
> 2.9.5
> 

* Re: [PATCH V2 1/2] mm: avoid marking swap cached page as lazyfree
  2017-09-22 18:46 ` [PATCH V2 1/2] mm: avoid marking swap cached page as lazyfree Shaohua Li
  2017-09-25  5:07   ` Minchan Kim
@ 2017-09-26 13:07   ` Michal Hocko
  2017-09-27  7:16     ` Michal Hocko
  1 sibling, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2017-09-26 13:07 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, Artem Savkov, Kernel-team, Shaohua Li, stable,
	Johannes Weiner, Hillf Danton, Minchan Kim, Hugh Dickins,
	Mel Gorman, Andrew Morton

On Fri 22-09-17 11:46:30, Shaohua Li wrote:
> From: Shaohua Li <shli@fb.com>
> 
> MADV_FREE clears the pte dirty bit and then marks the page lazyfree (clears
> SwapBacked). There is no lock to prevent page reclaim from adding the page
> to the swap cache between these two steps. If the page is added to the swap
> cache, marking it lazyfree will confuse the page fault path when the page is
> reclaimed and then refaulted.

Could you be more specific about exactly what kind of confusion results?
I suspect you are talking about the VM_BUG_ON_PAGE in __add_to_swap_cache,
right?

I am also not sure how that would actually happen, to be honest. If we
raced with reclaim, the page should have been isolated, so PageLRU is no
longer true. Or am I missing something?

> Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>
> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> Signed-off-by: Shaohua Li <shli@fb.com>
> Cc: stable@vger.kernel.org
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> ---
>  mm/swap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/swap.c b/mm/swap.c
> index 9295ae9..a77d68f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -575,7 +575,7 @@ static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>  			    void *arg)
>  {
>  	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> -	    !PageUnevictable(page)) {
> +	    !PageSwapCache(page) && !PageUnevictable(page)) {
>  		bool active = PageActive(page);
>  
>  		del_page_from_lru_list(page, lruvec,
> @@ -665,7 +665,7 @@ void deactivate_file_page(struct page *page)
>  void mark_page_lazyfree(struct page *page)
>  {
>  	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> -	    !PageUnevictable(page)) {
> +	    !PageSwapCache(page) && !PageUnevictable(page)) {
>  		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
>  
>  		get_page(page);
> -- 
> 2.9.5
> 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH V2 2/2] mm: fix data corruption caused by lazyfree page
  2017-09-22 18:46 ` [PATCH V2 2/2] mm: fix data corruption caused by lazyfree page Shaohua Li
@ 2017-09-26 13:13   ` Michal Hocko
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2017-09-26 13:13 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, Artem Savkov, Kernel-team, Shaohua Li, stable,
	Johannes Weiner, Hillf Danton, Minchan Kim, Hugh Dickins,
	Rik van Riel, Mel Gorman, Andrew Morton

On Fri 22-09-17 11:46:31, Shaohua Li wrote:
> From: Shaohua Li <shli@fb.com>
> 
> MADV_FREE clears the pte dirty bit and then marks the page lazyfree (clears
> SwapBacked). There is no lock to prevent page reclaim from adding the page
> to the swap cache between these two steps. If page reclaim finds such a
> page, it simply adds the page to the swap cache without paging it out to
> swap, because the page is marked clean. The next page fault then reads data
> from a swap slot that never received the original data, so we get data
> corruption. To fix the issue, we mark the page dirty and page it out.
> 
> However, we shouldn't dirty every page that is clean and in the swap cache:
> a swapped-in page is in the swap cache and clean too. So we only dirty
> pages that are added to the swap cache by page reclaim, which are never
> swapped-in pages. As Minchan suggested, simply dirtying the page in
> add_to_swap() does the job.
> 
> Reported-by: Artem Savkov <asavkov@redhat.com>
> Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
> Signed-off-by: Shaohua Li <shli@fb.com>
> Cc: stable@vger.kernel.org
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>

OK, the patch makes sense to me.
Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  mm/swap_state.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 71ce2d1..2d64ec4 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -231,7 +231,7 @@ int add_to_swap(struct page *page)
>  	 * deadlock in the swap out path.
>  	 */
>  	/*
> -	 * Add it to the swap cache.
> +	 * Add it to the swap cache and mark it dirty
>  	 */

This comment is simply useless. It doesn't explain why we need it, and it
is trivial to see what we do there. So I would either remove it altogether
or make it more useful by explaining the MADV_FREE race, which IMHO would
be much better.

>  	err = add_to_swap_cache(page, entry,
>  			__GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
> @@ -242,6 +242,7 @@ int add_to_swap(struct page *page)
>  		 * clear SWAP_HAS_CACHE flag.
>  		 */
>  		goto fail;
> +	set_page_dirty(page);
>  
>  	return 1;
>  
> -- 
> 2.9.5
> 

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH V2 1/2] mm: avoid marking swap cached page as lazyfree
  2017-09-26 13:07   ` Michal Hocko
@ 2017-09-27  7:16     ` Michal Hocko
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Hocko @ 2017-09-27  7:16 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, Artem Savkov, Kernel-team, Shaohua Li, stable,
	Johannes Weiner, Hillf Danton, Minchan Kim, Hugh Dickins,
	Mel Gorman, Andrew Morton

[oops, this got stuck in the outgoing queue]

On Tue 26-09-17 15:07:05, Michal Hocko wrote:
> On Fri 22-09-17 11:46:30, Shaohua Li wrote:
> > From: Shaohua Li <shli@fb.com>
> > 
> > MADV_FREE clears the pte dirty bit and then marks the page lazyfree
> > (clears SwapBacked). There is no lock to prevent page reclaim from adding
> > the page to the swap cache between these two steps. If the page is added
> > to the swap cache, marking it lazyfree will confuse the page fault path
> > when the page is reclaimed and then refaulted.
> 
> Could you be more specific how exactly what kind of the confusion is the
> result? I suspect you are talking about VM_BUG_ON_PAGE in
> __add_to_swap_cache right?

I completely mixed up reclaim and the #PF path here.

> I am also not sure how that would actually happen to be honest. If we
> raced with the reclaim then the page should have been isolated and so
> PageLRU is no longer true. Or am I missing something?

And here I completely missed that the swapcache page will go back to the
LRU. Stupid me. Your new changelog [1] explains it all. Thanks, and sorry
for the stupid questions.

[1] http://lkml.kernel.org/r/6537ef3814398c0073630b03f176263bc81f0902.1506446061.git.shli@fb.com
-- 
Michal Hocko
SUSE Labs

end of thread, other threads:[~2017-09-27  7:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-22 18:46 [PATCH V2 0/2] mm: fix race condition in MADV_FREE Shaohua Li
2017-09-22 18:46 ` [PATCH V2 1/2] mm: avoid marking swap cached page as lazyfree Shaohua Li
2017-09-25  5:07   ` Minchan Kim
2017-09-26 13:07   ` Michal Hocko
2017-09-27  7:16     ` Michal Hocko
2017-09-22 18:46 ` [PATCH V2 2/2] mm: fix data corruption caused by lazyfree page Shaohua Li
2017-09-26 13:13   ` Michal Hocko
