linux-edac.vger.kernel.org archive mirror
* Re: [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery
@ 2021-07-22 13:54 Jue Wang
  2021-07-22 15:19 ` Luck, Tony
  0 siblings, 1 reply; 12+ messages in thread
From: Jue Wang @ 2021-07-22 13:54 UTC
  To: Luck, Tony
  Cc: Borislav Petkov, dinghui, huangcun, linux-edac, linux-kernel,
	HORIGUCHI NAOYA(堀口 直也),
	Oscar Salvador, x86, Song, Youquan

This patch assumes that the UC error consumed in the kernel is always the
same UC.

Yet it is possible for two UCs on different pages to be consumed in a row.
The patch below will panic on the 2nd MCE. How can we make the code work
with multiple UC errors?


> +	int count = ++current->mce_count;
> +
> +	/* First call, save all the details */
> +	if (count == 1) {
> +		current->mce_addr = m->addr;
> +		current->mce_kflags = m->kflags;
> +		current->mce_ripv = !!(m->mcgstatus & MCG_STATUS_RIPV);
> +		current->mce_whole_page = whole_page(m);
> +		current->mce_kill_me.func = func;
> +	}
> ......
> +	/* Second or later call, make sure page address matches the one from first call */
> +	if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
> +		mce_panic("Machine checks to different user pages", m, msg);
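
To make the concern concrete, here is a minimal user-space sketch of the page comparison in the quoted hunk. It is not kernel code: `struct task_mce_state` and `mce_would_panic()` are hypothetical names standing in for the per-task fields and the `mce_panic()` call, `PAGE_SHIFT` is assumed to be 12 (4 KiB pages), and the elided part of the hunk is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the per-task fields used by the quoted hunk. */
struct task_mce_state {
	int      mce_count;
	uint64_t mce_addr;
};

/*
 * Models only the check shown in the quoted hunk: returns true where the
 * kernel would call mce_panic(), false where the machine check is accepted.
 */
static bool mce_would_panic(struct task_mce_state *t, uint64_t addr)
{
	int count = ++t->mce_count;

	/* First call: remember the faulting address. */
	if (count == 1)
		t->mce_addr = addr;

	/* Later calls: the page must match the one saved on the first call. */
	return count > 1 &&
	       (t->mce_addr >> PAGE_SHIFT) != (addr >> PAGE_SHIFT);
}
```

With this model, a second machine check on the same page is accepted, while a second machine check on any other page takes the panic path, which is the case described above.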

* RE: [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery
@ 2021-07-31  6:30 Jue Wang
  2021-07-31 20:43 ` Luck, Tony
  0 siblings, 1 reply; 12+ messages in thread
From: Jue Wang @ 2021-07-31  6:30 UTC
  To: Luck, Tony
  Cc: Borislav Petkov, dinghui, huangcun, Jue Wang, linux-edac,
	linux-kernel, HORIGUCHI NAOYA(堀口 直也),
	Oscar Salvador, x86, Song, Youquan

Been busy with some other work.

After cherry picking patch 1 & 2, I saw the following with 2 UC errors injected
into the user space buffer passed into write(2), as expected:

[  287.994754] Kernel panic - not syncing: Machine checks to different user pages

The kernel I tested has its x86/mce and mm/memory-failure code aligned with
upstream as of around 2020/11.

Is there any other patch that I have missed, e.g. one touching the write
syscall path?

Thanks,
-Jue

* [PATCH 0/3] More machine check recovery fixes
@ 2021-07-06 19:06 Tony Luck
  2021-07-06 19:06 ` [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery Tony Luck
  0 siblings, 1 reply; 12+ messages in thread
From: Tony Luck @ 2021-07-06 19:06 UTC
  To: Borislav Petkov
  Cc: Tony Luck, Ding Hui, naoya.horiguchi, osalvador, Youquan Song,
	huangcun, x86, linux-edac, linux-kernel

Fix a couple of issues in machine check handling:

1) A repeated machine check inside the kernel, with no call to the task
   work function between the machine checks, goes into an infinite
   loop.
2) Machine checks in kernel functions copying data from user addresses
   send SIGBUS to the user as if the application had consumed the
   poison. But this is wrong. The user should see either an -EFAULT
   error return or a reduced byte count (in the case of write(2)).
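
As an illustration of what item 2 means for an application, here is a minimal user-space sketch of a caller that copes with both outcomes of write(2): an error return or a reduced byte count. `write_report()` is a hypothetical helper, not part of this series; it only uses the standard write(2) interface and reports how many bytes were accepted plus the errno of any failure.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to write len bytes from buf to fd.
 * Returns the number of bytes accepted before any failure and stores
 * the errno of that failure in *err (0 if everything was written).
 */
static size_t write_report(int fd, const char *buf, size_t len, int *err)
{
	size_t done = 0;

	*err = 0;
	while (done < len) {
		ssize_t n = write(fd, buf + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			*err = errno;		/* e.g. EFAULT for an unreadable source buffer */
			break;
		}
		if (n == 0)
			break;			/* no progress, give up */
		done += (size_t)n;		/* short write: continue from where it stopped */
	}
	return done;
}
```

A caller can compare the return value with `len` to detect a reduced byte count and inspect `*err` to tell an -EFAULT style failure apart from other errors.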

Tony Luck (3):
  x86/mce: Change to not send SIGBUS error during copy from user
  x86/mce: Avoid infinite loop for copy from user recovery
  x86/mce: Drop copyin special case for #MC

 arch/x86/kernel/cpu/mce/core.c | 62 ++++++++++++++++++++++++----------
 arch/x86/lib/copy_user_64.S    | 13 -------
 include/linux/sched.h          |  1 +
 3 files changed, 45 insertions(+), 31 deletions(-)

-- 
2.29.2



Thread overview: 12+ messages
2021-07-22 13:54 [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery Jue Wang
2021-07-22 15:19 ` Luck, Tony
2021-07-22 23:30   ` Jue Wang
2021-07-23  0:14     ` Luck, Tony
2021-07-23  3:47       ` Jue Wang
2021-07-23  4:01         ` Luck, Tony
2021-07-23  4:16           ` Jue Wang
2021-07-23 14:47             ` Luck, Tony
  -- strict thread matches above, loose matches on Subject: below --
2021-07-31  6:30 Jue Wang
2021-07-31 20:43 ` Luck, Tony
2021-08-02 15:29   ` Jue Wang
2021-07-06 19:06 [PATCH 0/3] More machine check recovery fixes Tony Luck
2021-07-06 19:06 ` [PATCH 2/3] x86/mce: Avoid infinite loop for copy from user recovery Tony Luck
