All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aili Yao <yaoaili@kingsoft.com>
To: "Luck, Tony" <tony.luck@intel.com>
Cc: "HORIGUCHI NAOYA(堀口 直也)" <naoya.horiguchi@nec.com>,
	"Oscar Salvador" <osalvador@suse.de>,
	"david@redhat.com" <david@redhat.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"yangfeng1@kingsoft.com" <yangfeng1@kingsoft.com>,
	yaoaili@kingsoft.com
Subject: Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned
Date: Wed, 3 Mar 2021 16:39:12 +0800	[thread overview]
Message-ID: <20210303163912.3d508e0f@alex-virtual-machine> (raw)
In-Reply-To: <20210303115710.2e9f8e23@alex-virtual-machine>

Hi tony:

> On Tue, 2 Mar 2021 19:39:53 -0800
> "Luck, Tony" <tony.luck@intel.com> wrote:
> 
> > On Fri, Feb 26, 2021 at 10:59:15AM +0800, Aili Yao wrote:  
> > > Hi naoya, tony:    
> > > > > 
> > > > > Idea for what we should do next ... Now that x86 is calling memory_failure()
> > > > > from user context ... maybe parallel calls for the same page should
> > > > > be blocked until the first caller completes so we can:
> > > > > a) know that pages are unmapped (if that happens)
> > > > > b) all get the same success/fail status      
> > > > 
> > > > One memory_failure() call changes the target page's status and
> > > > affects all mappings to all affected processes, so I think that
> > > > (ideally) we don't have to block other threads (letting them
> > > > early return seems fine).  Sometimes memory_failure() fails,
> > > > but even in such case, PG_hwpoison is set on the page and other
> > > > threads properly get SIGBUSs with this patch, so I think that
> > > > we can avoid the worst scenario (like system stall by MCE loop).
> > > >     
> > > I agree with naoya's point, if we block for this issue, Does this change the result
> > > that the process should be killed? Or is there something other still need to be considered?    
> > 
> > Ok ... no blocking ... 

I do think about blocking method and the error address issue with sigbus,here is my opinion, maybe helpful:

For blocking, if we block here, there are some undefine work i think should be done.
As we don't know the process B triggering this error again is early-kill or not, so the previous memory_failure() call may 
not add B on kill_list, even if B is on kill_list, the error level for B is not proper set, as B should get an AR SIGBUS;

So we can't just wait, We must have some logic adding the process B to kill list, and as this is an AR error
another change should be done to current code, we need more logic in kill_proc or some other place.

Even if all the work is done right. There is one more serious scenario though, we even don't know the current step the previous memory_failure() is on,
So previous modification may not be usefull at all; When this scenario happens, what we can do?  block or return ?
if finally we return, an error code should be taken back; so we have to go to error process logic and a signal without right address will be sent;

For error address with sigbus, i think this is not an issue resulted by the patch i post, before my patch, the issue is already there.
I don't find a realizable way to get the correct address for same reason --- we don't know whether the page mapping is there or not when
we got to kill_me_maybe(), in some case, we may get it, but there are a lot of parallel issue need to consider, and if failed we have to fallback
to the error brach again, remaining current code may be an easy option;

Any methods or patchs can solve the issue in a better way is OK to me, i want this issue fixed and in more complete way!

-- 
Thanks!
Aili Yao

  reply	other threads:[~2021-03-03 15:05 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-24  7:16 [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-02-24 10:10 ` David Hildenbrand
2021-02-24 10:31 ` Oscar Salvador
2021-02-25  3:43   ` Aili Yao
2021-02-25 11:28     ` HORIGUCHI NAOYA(堀口 直也)
2021-02-25 11:39       ` Oscar Salvador
2021-02-25 12:38         ` HORIGUCHI NAOYA(堀口 直也)
2021-02-25 18:15           ` Luck, Tony
2021-02-26  2:19             ` HORIGUCHI NAOYA(堀口 直也)
2021-02-26  2:59               ` Aili Yao
2021-03-03  3:39                 ` Luck, Tony
2021-03-03  3:57                   ` Aili Yao
2021-03-03  8:39                     ` Aili Yao [this message]
2021-03-03 15:41                       ` Luck, Tony
2021-03-04  2:16                         ` Aili Yao
2021-03-04  4:19                           ` Aili Yao
2021-03-04  6:45                             ` Aili Yao
2021-03-04 23:57                               ` Luck, Tony
2021-03-05  1:30                                 ` Aili Yao
2021-03-05  1:36                                   ` Aili Yao
2021-03-05 22:11                                     ` Luck, Tony
2021-03-08  6:45                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 18:54                                         ` Luck, Tony
2021-03-08 22:38                                           ` HORIGUCHI NAOYA(堀口 直也)
2021-03-08 22:55                                             ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Luck, Tony
2021-03-08 23:42                                               ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  2:04                                               ` Aili Yao
2021-03-09  6:04                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09  6:35                                                   ` [PATCH v2] mm,hwpoison: return -EBUSY when page already poisoned Aili Yao
2021-03-09  8:28                                                     ` HORIGUCHI NAOYA(堀口 直也)
2021-03-09 20:01                                                       ` Luck, Tony
2021-03-10  8:05                                                         ` HORIGUCHI NAOYA(堀口 直也)
2021-03-13  1:55                                                         ` Jue Wang
2021-03-13  1:55                                                           ` Jue Wang
2021-03-10  8:01                                                       ` Aili Yao
2021-03-31 11:25                                                     ` [PATCH v3] mm,hwpoison: return -EHWPOISON " Aili Yao
2021-04-01 15:33                                                       ` Luck, Tony
2021-04-02  1:18                                                         ` Aili Yao
2021-04-02 15:11                                                           ` Luck, Tony
2021-04-05 13:50                                                             ` HORIGUCHI NAOYA(堀口 直也)
2021-04-06  1:04                                                               ` Aili Yao
2021-03-09  6:38                                                   ` [PATCH] mm/memory-failure: Use a mutex to avoid memory_failure() races Aili Yao
2021-03-05 15:55                                   ` [PATCH] mm,hwpoison: return -EBUSY when page already poisoned Luck, Tony
2021-03-10  6:10                                     ` Aili Yao
2021-03-11  8:55                                       ` HORIGUCHI NAOYA(堀口 直也)
2021-03-11 11:23                                         ` Aili Yao
2021-03-11 17:05                                         ` Luck, Tony
2021-03-12  5:55                                           ` Aili Yao
2021-03-12 16:29                                             ` Luck, Tony
2021-03-12 23:48                                               ` Luck, Tony
2021-03-16  6:42                                                 ` HORIGUCHI NAOYA(堀口 直也)
2021-03-16  7:54                                                   ` Aili Yao
2021-03-17  0:29                                                 ` Luck, Tony
2021-03-17  9:07                                                   ` Aili Yao
2021-03-17  7:48                                         ` Aili Yao
2021-03-17  8:23                                           ` Aili Yao
2021-02-26  3:26               ` Tony Luck
2021-02-26  3:26                 ` Tony Luck
2021-02-26  2:52         ` Aili Yao
2021-02-26 17:58           ` Luck, Tony
2021-03-02  4:32             ` Aili Yao
2021-03-31 10:56         ` Aili Yao
2021-03-31 10:58           ` David Hildenbrand
2021-03-01 23:21 Jue Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210303163912.3d508e0f@alex-virtual-machine \
    --to=yaoaili@kingsoft.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=osalvador@suse.de \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    --cc=yangfeng1@kingsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.