linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Aili Yao <yaoaili@kingsoft.com>
To: "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>,
	"sunhao2@kingsoft.com" <sunhao2@kingsoft.com>,
	<yaoaili126@gmail.com>
Subject: Re: [PATCH v7] mm/gup: check page hwpoison status for memory recovery failures.
Date: Mon, 10 May 2021 11:13:28 +0800	[thread overview]
Message-ID: <20210510111328.1eef98f0@alex-virtual-machine> (raw)
In-Reply-To: <20210406104123.451ee3c3@alex-virtual-machine>

On Tue, 6 Apr 2021 10:41:23 +0800
Aili Yao <yaoaili@kingsoft.com> wrote:

> When we call get_user_pages() to pin user page in memory, there may be
> hwpoison page, currently, we just handle the normal case that memory
> recovery jod is correctly finished, and we will not return the hwpoison
> page to callers, but for other cases like memory recovery fails and the
> user process related pte is not correctly set invalid, we will still
> return the hwpoison page, and may touch it and lead to panic.
> 
> In gup.c, for normal page, after we call follow_page_mask(), we will
> return the related page pointer; or like another hwpoison case with pte
> invalid, it will return NULL. For NULL, we will handle it in if (!page)
> branch. In this patch, we will filter out the hwpoison page in
> follow_page_mask() and return error code for recovery failure cases.
> 
> We will check the page hwpoison status as soon as possible and avoid doing
> followed normal procedure and try not to grab related pages.
> 
> Changes since v6:
> - Fix wrong page pointer check in follow_trans_huge_pmd();
> 
> Signed-off-by: Aili Yao <yaoaili@kingsoft.com>

After considering more, I still insist that we should take more care to treat the case that kernel touched  user hwpoison pages.

Usually when process touched the hwpoison page, it will be killed, and no other bad consequence will happen. while in a case, kernel touches
the user process hwpoison page, It should be one more serious issue than the user process's access, the error return nor Null return is
not sufficient and reasonable for me.

There will be some doubt places, 
1) should the user process be killed first? 
2) if error returned, the process can correctly process it?  It seems there is no way the process can finish that. 
3) if NULL returned, job continue again? the data integrity will be guaranteed?  I doubted that.

There are so many arguements about the CopyIn patch from intel that the process should killed or not...
And this gup module seems more compicated and have the same question. 
When error returned or NULL return from gup, in some way and some scenario, this will make the hw issue even more complicated.

Yes, there is a flag FOLL_HWPOISON, but this flag seems be designed for different return values that the process can accept,
it is not designed to decide kernel to panic or not;

Currently I have no other scenario like coredump case. And I havn't one good way to fix this leak in a graceful way.

Maybe, For some scenario, the kernel recovery thing for user process is wrong and not deserved.

I have a lot of other works to do and have not much time to continue on this thread, And I insist my original post.

There should be a better option for this issue, when the better option is ready, this patch can and should be reverted.

Thanks!
Aili Yao




  parent reply	other threads:[~2021-05-10  3:13 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-17  8:37 [PATCH] mm/gup: check page posion status for coredump Aili Yao
2021-03-17  9:12 ` David Hildenbrand
2021-03-18  3:15   ` Aili Yao
2021-03-18  3:18   ` [PATCH v2] " Aili Yao
2021-03-18  4:46   ` [PATCH] " Matthew Wilcox
2021-03-18  5:34     ` Aili Yao
2021-03-19  2:44       ` [PATCH v3] " Aili Yao
2021-03-20  0:35         ` Matthew Wilcox
2021-03-22  3:40           ` Aili Yao
2021-03-22 11:33           ` [PATCH v5] mm/gup: check page hwposion " Aili Yao
2021-03-26 14:09             ` David Hildenbrand
2021-03-26 14:22               ` David Hildenbrand
2021-03-31  1:52                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-31  2:43                   ` Aili Yao
2021-03-31  4:32                     ` HORIGUCHI NAOYA(堀口 直也)
2021-03-31  6:44                       ` David Hildenbrand
2021-03-31  7:07                         ` Aili Yao
2021-04-01  2:31                         ` Aili Yao
2021-04-06  2:23                         ` [PATCH v6] mm/gup: check page hwpoison status for memory recovery failures Aili Yao
2021-04-06  2:41                           ` [PATCH v7] " Aili Yao
2021-04-07  1:54                             ` HORIGUCHI NAOYA(堀口 直也)
2021-04-07  7:48                               ` Aili Yao
2021-05-10  3:13                             ` Aili Yao [this message]
2021-03-31  6:07                   ` [PATCH v5] mm/gup: check page hwposion status for coredump Matthew Wilcox
2021-03-31  6:53                     ` HORIGUCHI NAOYA(堀口 直也)
2021-03-31  7:05                       ` David Hildenbrand
2021-03-18  8:14     ` [PATCH] mm/gup: check page posion " David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210510111328.1eef98f0@alex-virtual-machine \
    --to=yaoaili@kingsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sunhao2@kingsoft.com \
    --cc=yangfeng1@kingsoft.com \
    --cc=yaoaili126@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).