* Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address
@ 2021-04-20  2:03 ` Jue Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jue Wang @ 2021-04-20  2:03 UTC (permalink / raw)
  To: nao.horiguchi, Luck, Tony
  Cc: akpm, bp, david, linux-kernel, linux-mm, luto, naoya.horiguchi,
	osalvador, yaoaili

On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:

> This patch suggests to do page table walk to find the error virtual
> address.  If we find multiple virtual addresses in walking, we now can't
> determine which one is correct, so we fall back to sending SIGBUS in
> kill_me_maybe() without error info as we do now.  This corner case needs
> to be solved in the future.

Instead of walking the page tables, how about the following idea:

When it fails to get the vaddr, memory_failure() just ensures that the
mapping is removed and a hwpoison swap pte is put in place, or that the
original page is flagged with PG_hwpoison and kept in the radix tree
(e.g., for SHMEM THP).

NOTE: no SIGBUS is sent to user space.

Then do_machine_check() just returns to user space to resume execution.
The re-executed access results in a #PF and should land in exactly the
page fault handling code that generates a SIGBUS with the precise vaddr
info:

https://github.com/torvalds/linux/blob/7af08140979a6e7e12b78c93b8625c8d25b084e2/mm/memory.c#L3290
https://github.com/torvalds/linux/blob/7af08140979a6e7e12b78c93b8625c8d25b084e2/mm/memory.c#L3647
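
To make the user-space side of this concrete, here is a minimal sketch of
what a process could read from the siginfo of such a SIGBUS. The helper
name describe_sigbus() is hypothetical and not from any patch in this
thread. It assumes the fault path reports the error with si_code
BUS_MCEERR_AR and fills in si_addr and si_addr_lsb, which is how I
understand the x86 hwpoison fault path to behave.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): summarise a SIGBUS siginfo.
 * Returns 1 when si_code marks a machine-check error, in which case
 * si_addr is the virtual address and si_addr_lsb the log2 of the
 * affected granule; returns 0 for any other SIGBUS.
 * It uses snprintf(), so it is not async-signal-safe as written.
 */
static int describe_sigbus(const siginfo_t *si, char *buf, size_t len)
{
	const char *kind;

	assert(si != NULL && buf != NULL && len > 0);

	if (si->si_code == BUS_MCEERR_AR)
		kind = "Action Required";
	else if (si->si_code == BUS_MCEERR_AO)
		kind = "Action Optional";
	else {
		snprintf(buf, len, "SIGBUS si_code %d, no MCE info", si->si_code);
		return 0;
	}

	snprintf(buf, len, "%s memory error at %p, lsb %d",
		 kind, si->si_addr, (int)si->si_addr_lsb);
	return 1;
}
```

A handler installed with sigaction() and SA_SIGINFO would receive the
siginfo_t pointer as its second argument and could pass it to a helper
like this one, or better, copy si_addr out and report it outside signal
context.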

Thanks,
-Jue

* Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address
@ 2021-04-20  1:49 ` Jue Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jue Wang @ 2021-04-20  1:49 UTC (permalink / raw)
  To: nao.horiguchi, Luck, Tony
  Cc: akpm, bp, david, linux-kernel, linux-mm, luto, naoya.horiguchi,
	osalvador, yaoaili

On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:
...
> + * This function is intended to handle "Action Required" MCEs on already
> + * hardware poisoned pages. They could happen, for example, when
> + * memory_failure() failed to unmap the error page at the first call, or
> + * when multiple Action Optional MCE events races on different CPUs with
> + * Local MCE enabled.

+Tony Luck

Hey Tony, I thought SRAO MCEs are broadcast to all cores in the system
because they come without an execution context. Is that correct?

If yes, Naoya, I think we might want to remove the part of the comment
about "multiple Action Optional MCE events racing".

Best,
-Jue

* [PATCH v1 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE
@ 2021-04-12 22:43 Naoya Horiguchi
  2021-04-12 22:43 ` [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address Naoya Horiguchi
  0 siblings, 1 reply; 11+ messages in thread
From: Naoya Horiguchi @ 2021-04-12 22:43 UTC (permalink / raw)
  To: linux-mm, Tony Luck, Aili Yao
  Cc: Andrew Morton, Oscar Salvador, David Hildenbrand,
	Borislav Petkov, Andy Lutomirski, Naoya Horiguchi, linux-kernel

Hi,

I wrote this patchset to implement what I think is the currently
acceptable solution from the previous discussion [1].
I simply borrowed Tony's mutex patch and Aili's return code patch,
then queued another one that finds the error virtual address on a
best-effort basis.  I know that this is not a perfect solution, but
it should work for some typical cases.
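
As a rough illustration of how the three pieces fit together, here is a
small user-space toy model. It is not kernel code: struct toy_page,
struct toy_mapping, toy_memory_failure() and toy_find_error_vaddr() are
made-up stand-ins. The sketch only assumes what this thread describes:
a mutex serialises the handler, a repeated call on an already poisoned
page returns -EHWPOISON, and the address lookup gives up when it finds
no mapping or more than one.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value, fallback for other libcs */
#endif

/* Toy stand-ins for illustration, NOT kernel data structures. */
struct toy_page { bool hwpoison; };
struct toy_mapping { unsigned long vaddr; unsigned long pfn; };

static pthread_mutex_t toy_mf_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Serialised handler: the second caller on the same pfn sees -EHWPOISON. */
static int toy_memory_failure(struct toy_page *pages, unsigned long pfn)
{
	int ret = 0;

	pthread_mutex_lock(&toy_mf_mutex);
	if (pages[pfn].hwpoison)
		ret = -EHWPOISON;
	else
		pages[pfn].hwpoison = true;
	pthread_mutex_unlock(&toy_mf_mutex);
	return ret;
}

/*
 * Best-effort lookup: return 0 and set *vaddr only when exactly one
 * mapping of pfn exists. Return -1 when there is none or the result
 * is ambiguous, so the caller can fall back to a signal without
 * address info.
 */
static int toy_find_error_vaddr(const struct toy_mapping *maps, size_t n,
				unsigned long pfn, unsigned long *vaddr)
{
	unsigned long found = 0;
	size_t hits = 0;

	assert(vaddr != NULL);
	for (size_t i = 0; i < n; i++) {
		if (maps[i].pfn == pfn) {
			found = maps[i].vaddr;
			hits++;
		}
	}
	if (hits != 1)
		return -1;
	*vaddr = found;
	return 0;
}
```

In this model a caller would attempt the lookup only after
toy_memory_failure() returns -EHWPOISON, which is the role I intend for
kill_accessing_process() in patch 3/3.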

My simple testing suggests that this patchset works as intended,
but if you have related test cases, could you please run them and
send me some feedback?

Thanks,
Naoya Horiguchi

[1]: https://lore.kernel.org/linux-mm/20210331192540.2141052f@alex-virtual-machine/
---
Summary:

Aili Yao (1):
      mm,hwpoison: return -EHWPOISON when page already

Naoya Horiguchi (1):
      mm,hwpoison: add kill_accessing_process() to find error virtual address

Tony Luck (1):
      mm/memory-failure: Use a mutex to avoid memory_failure() races

 arch/x86/kernel/cpu/mce/core.c |  13 +++-
 include/linux/swapops.h        |   5 ++
 mm/memory-failure.c            | 166 ++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 178 insertions(+), 6 deletions(-)


Thread overview: 11+ messages
2021-04-20  2:03 [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address Jue Wang
2021-04-20  2:03 ` Jue Wang
2021-04-20 15:47 ` Luck, Tony
2021-04-20 16:30   ` Jue Wang
2021-04-20 17:15     ` Luck, Tony
  -- strict thread matches above, loose matches on Subject: below --
2021-04-20  1:49 Jue Wang
2021-04-20  1:49 ` Jue Wang
2021-04-20  7:51 ` HORIGUCHI NAOYA(堀口 直也)
2021-04-20 15:42 ` Luck, Tony
2021-04-21  1:04   ` HORIGUCHI NAOYA(堀口 直也)
2021-04-12 22:43 [PATCH v1 0/3] mm,hwpoison: fix sending SIGBUS for Action Required MCE Naoya Horiguchi
2021-04-12 22:43 ` [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address Naoya Horiguchi
