linux-kernel.vger.kernel.org archive mirror
From: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"n-horiguchi@ah.jp.nec.com" <n-horiguchi@ah.jp.nec.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
Date: Mon, 7 Dec 2020 02:34:30 +0000
Message-ID: <20201207023429.GA8986@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20201205153423.GA4108@localhost.localdomain>

On Sat, Dec 05, 2020 at 04:34:23PM +0100, Oscar Salvador wrote:
> On Fri, Dec 04, 2020 at 06:25:31PM +0100, Vlastimil Babka wrote:
> > OK, so that means we don't introduce this race for MADV_SOFT_OFFLINE, but it's
> > already (and still) there for MADV_HWPOISON since Dan's 23e7b5c2e271 ("mm,
> > madvise_inject_error: Let memory_failure() optionally take a page reference") no?
> 
> What about the following?
> CCing Dan as well.

Hi Oscar, Vlastimil,

Thanks for mentioning this. I agree with that direction.

> 
> From: Oscar Salvador <osalvador@suse.de>
> Date: Sat, 5 Dec 2020 16:14:40 +0100
> Subject: [PATCH] mm,memory_failure: Always pin the page in
>  madvise_inject_error
> 
> madvise_inject_error() uses get_user_pages_fast to get the page
> from the addr we specified.
> After [1], we drop such extra reference for memory_failure() path.
> That commit says that memory_failure wanted to keep the pin in order
> to take the page out of circulation.
> 
> The truth is that we need to keep the page pinned, otherwise the
> page might be re-used after the put_page(), and we can end up messing
> with someone else's memory.
> E.g:
> 
> CPU0					CPU1
> process X
>  madvise_inject_error
>   get_user_pages
>    put_page
> 					page gets reclaimed
> 					process Y allocates the page
>   memory_failure
>    // We mess with process Y memory
> 
> madvise() is meant to operate on a self address space, so messing with
> pages that do not belong to us seems the wrong thing to do.
> To avoid that, let us keep the page pinned for memory_failure as well.
> 
> Pages for DAX mappings will release this extra refcount in
> memory_failure_dev_pagemap.
> 
> [1] ("23e7b5c2e271: mm, madvise_inject_error:
>       Let memory_failure() optionally take a page reference")
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Fixes: 23e7b5c2e271 ("mm, madvise_inject_error: Let memory_failure() optionally take a page reference")
> ---
>  mm/madvise.c        | 9 +--------
>  mm/memory-failure.c | 6 ++++++
>  2 files changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c6b5524add58..19edddba196d 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -907,14 +907,7 @@ static int madvise_inject_error(int behavior,
>  		} else {
>  			pr_info("Injecting memory failure for pfn %#lx at process virtual address %#lx\n",
>  				 pfn, start);
> -			/*
> -			 * Drop the page reference taken by get_user_pages_fast(). In
> -			 * the absence of MF_COUNT_INCREASED the memory_failure()
> -			 * routine is responsible for pinning the page to prevent it
> -			 * from being released back to the page allocator.
> -			 */
> -			put_page(page);
> -			ret = memory_failure(pfn, 0);
> +			ret = memory_failure(pfn, MF_COUNT_INCREASED);
>  		}
>  
>  		if (ret)
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 869ece2a1de2..ba861169c9ae 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1269,6 +1269,12 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
>  	if (!cookie)
>  		goto out;
>  
> +	if (flags & MF_COUNT_INCREASED)
> +		/*
> +		 * Drop the extra refcount in case we come from madvise().
> +		 */
> +		put_page(page);
> +

Should this if-block come before the dax_lock_page() block?
As written, if dax_lock_page() returns NULL, memory_failure_dev_pagemap()
returns without releasing the refcount.
memory_failure() on dev_pagemap pages doesn't use the page refcount (unlike
other types of memory), so we can release it unconditionally.
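
To illustrate the ordering concern, here is a small user-space model, not
kernel code. struct page, put_page() and dax_lock_page_model() are simplified
stand-ins, the MF_COUNT_INCREASED value and the -1 error return are arbitrary,
and leaked_refs() is a hypothetical helper that only counts the references
left over after a failed lock.

```c
#include <assert.h>
#include <stdbool.h>

#define MF_COUNT_INCREASED 1	/* arbitrary value for this model */

/* Simplified stand-in for the kernel's struct page. */
struct page {
	int refcount;
};

/* Model of put_page(): drop one reference. */
void put_page(struct page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/* Stand-in for dax_lock_page(): a zero cookie means the lock failed. */
unsigned long dax_lock_page_model(bool succeed)
{
	return succeed ? 1UL : 0UL;
}

/* Ordering as in the quoted hunk: put_page() only after the lock succeeds. */
int dev_pagemap_put_after_lock(struct page *page, int flags, bool lock_ok)
{
	unsigned long cookie = dax_lock_page_model(lock_ok);

	if (!cookie)
		return -1;	/* early exit: the extra reference is still held */

	if (flags & MF_COUNT_INCREASED)
		put_page(page);

	return 0;
}

/* Suggested ordering: drop the extra reference before trying the lock. */
int dev_pagemap_put_before_lock(struct page *page, int flags, bool lock_ok)
{
	unsigned long cookie;

	if (flags & MF_COUNT_INCREASED)
		put_page(page);

	cookie = dax_lock_page_model(lock_ok);
	if (!cookie)
		return -1;	/* early exit, but nothing is left to release */

	return 0;
}

/*
 * Hypothetical helper: start with one base reference plus one taken by the
 * caller (as get_user_pages_fast() would), force the lock to fail, and
 * return how many extra references remain.
 */
int leaked_refs(int (*fn)(struct page *, int, bool))
{
	struct page page = { .refcount = 2 };

	fn(&page, MF_COUNT_INCREASED, false);
	return page.refcount - 1;
}
```

In this model, leaked_refs(dev_pagemap_put_after_lock) leaves one extra
reference behind when the lock fails, while
leaked_refs(dev_pagemap_put_before_lock) leaves none.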

Thanks,
Naoya Horiguchi


Thread overview: 26+ messages
2020-11-19 10:57 [PATCH 0/7] HWPoison: Refactor get page interface Oscar Salvador
2020-11-19 10:57 ` [PATCH 1/7] mm,hwpoison: Refactor get_any_page Oscar Salvador
2020-11-20  1:33   ` HORIGUCHI NAOYA(堀口 直也)
2020-11-25 16:54   ` Vlastimil Babka
2020-11-19 10:57 ` [PATCH 2/7] mm,hwpoison: Drop pfn parameter Oscar Salvador
2020-11-20  1:33   ` HORIGUCHI NAOYA(堀口 直也)
2020-11-25 16:55   ` Vlastimil Babka
2020-11-19 10:57 ` [PATCH 3/7] mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED Oscar Salvador
2020-11-25 18:20   ` Vlastimil Babka
2020-12-01 11:35     ` Oscar Salvador
2020-12-04 17:25       ` Vlastimil Babka
2020-12-05 15:34         ` Oscar Salvador
2020-12-07  2:34           ` HORIGUCHI NAOYA(堀口 直也) [this message]
2020-12-07  7:24             ` Oscar Salvador
2020-11-19 10:57 ` [PATCH 4/7] mm,hwpoison: remove MF_COUNT_INCREASED Oscar Salvador
2020-11-19 10:57 ` [PATCH 5/7] mm,hwpoison: remove flag argument from soft offline functions Oscar Salvador
2020-11-19 10:57 ` [PATCH 6/7] mm,hwpoison: Disable pcplists before grabbing a refcount Oscar Salvador
2020-11-20  1:33   ` HORIGUCHI NAOYA(堀口 直也)
2020-11-26 13:45   ` Vlastimil Babka
2020-11-28  0:51     ` Andrew Morton
2020-11-19 10:57 ` [PATCH 7/7] mm,hwpoison: Remove drain_all_pages from shake_page Oscar Salvador
2020-11-20  1:33   ` HORIGUCHI NAOYA(堀口 直也)
2020-11-26 13:52   ` Vlastimil Babka
2020-11-27  7:20     ` Oscar Salvador
2020-12-02 13:34 ` [PATCH 0/7] HWPoison: Refactor get page interface Qian Cai
2020-12-02 13:41   ` Oscar Salvador