* [RFC 0/4] Fix machine check recovery for copy_from_user
@ 2021-03-26  0:02 Tony Luck
  2021-03-26  0:02 ` [PATCH 1/4] x86/mce: Fix copyin code to return -EFAULT on machine check Tony Luck
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Tony Luck @ 2021-03-26  0:02 UTC
  To: Borislav Petkov
  Cc: Tony Luck, x86, linux-kernel, linux-mm, Andy Lutomirski,
	Aili Yao,
	HORIGUCHI NAOYA( 堀口 直也)

Maybe this is the way forward?  I made some poor choices before in how
the kernel treats poison consumption when accessing user data
(get_user() or copy_from_user()), in particular assuming that the
right action was to send a SIGBUS to the task as if it had
synchronously accessed the poisoned location.

The first three patches may need to be combined (or broken up
differently) for bisectability, but they are presented separately here
since they touch separate parts of the problem.

The second part is definitely incomplete, but I'd like to check that
it is the right approach before expending more brain cells in the maze
of nested macros that is lib/iov_iter.c.

The last part has been posted before. It covers the case where the
kernel takes more than one swing at reading poison data before
user.
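To make the "more than one swing" case concrete: while copying from user memory, the kernel can hit the same poisoned location several times (for example, when a copy is retried) before the task gets back to user space. Each hit must neither schedule recovery again nor loop forever. Below is a user-space model of one possible policy, not the patch's code; the names, the 4 KiB page size and the MODEL_MAX_HITS limit are hypothetical. It remembers the first poisoned address, tolerates further hits on the same page, and gives up on a different page or on too many hits.

```c
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* assumption: 4 KiB pages */
#define MODEL_MAX_HITS   10  /* hypothetical cap on hits before returning to user */

enum mce_action {
    MCE_QUEUE_WORK,     /* first hit: remember the address, schedule recovery */
    MCE_ALREADY_QUEUED, /* another hit on the same page: recovery already pending */
    MCE_GIVE_UP,        /* different page or too many hits: cannot recover safely */
};

/* Hypothetical per-task state; cleared once the task returns to user space. */
struct task_mce_state {
    unsigned int count;
    uint64_t first_addr;
};

/* Called for each machine check taken while copying from user memory. */
static enum mce_action note_user_copy_mce(struct task_mce_state *st, uint64_t addr)
{
    unsigned int count = ++st->count;

    if (count == 1) {
        st->first_addr = addr;
        return MCE_QUEUE_WORK;
    }
    if (count > MODEL_MAX_HITS)
        return MCE_GIVE_UP;
    if ((st->first_addr >> MODEL_PAGE_SHIFT) != (addr >> MODEL_PAGE_SHIFT))
        return MCE_GIVE_UP;
    return MCE_ALREADY_QUEUED;
}

/* Models the pending recovery running on the way back to user space. */
static void on_return_to_user(struct task_mce_state *st)
{
    st->count = 0;
    st->first_addr = 0;
}
```

In this model, MCE_GIVE_UP stands for the kernel panicking rather than looping. The actual limit and failure action are up to the patch.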

Tony Luck (4):
  x86/mce: Fix copyin code to return -EFAULT on machine check.
  mce/iter: Check for copyin failure & return error up stack
  mce/copyin: fix to not SIGBUS when copying from user hits poison
  x86/mce: Avoid infinite loop for copy from user recovery

 arch/x86/kernel/cpu/mce/core.c     | 63 +++++++++++++++++++++---------
 arch/x86/kernel/cpu/mce/severity.c |  2 -
 arch/x86/lib/copy_user_64.S        | 18 +++++----
 fs/iomap/buffered-io.c             |  8 +++-
 include/linux/sched.h              |  2 +-
 include/linux/uio.h                |  2 +-
 lib/iov_iter.c                     | 15 ++++++-
 7 files changed, 77 insertions(+), 33 deletions(-)

-- 
2.29.2


* Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison
@ 2021-04-14  5:47 ` Jue Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jue Wang @ 2021-04-14  5:47 UTC
  To: bp; +Cc: linux-kernel, linux-mm, luto, naoya.horiguchi, tony.luck, x86, yaoaili

On Tue, 13 Apr 2021 12:07:22 +0200, Petkov, Borislav wrote:

>> KVM apparently passes a machine check into the guest.

> Ah, there it is:

> static void kvm_send_hwpoison_signal(unsigned long address, struct task_struct *tsk)
> {
>         send_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, PAGE_SHIFT, tsk);
> }

This path is taken when an EPT #PF finds an access to a hwpoisoned
page. KVM exits to user space and sends it a SIGBUS with the same
semantics as if a regular #PF had found the access to the hwpoisoned
page.

The KVM_X86_SET_MCE ioctl actually injects a machine check into the guest.

We are in the process of launching MCE recovery capability in a
KVM-based virtualization product and plan to expand its scope in the
near future.

> So what I'm missing with all this fun is, yeah, sure, we have this
> facility out there but who's using it? Is anyone even using it at all?

In-memory databases and analytics workloads are definitely using it.
A couple of examples:
SAP HANA - we have tested it and plan to launch it as a strategic
enterprise use case with MCE recovery capability in our product
SQL Server - https://support.microsoft.com/en-us/help/2967651/inf-sql-server-may-display-memory-corruption-and-recovery-errors


Cheers,
-Jue

* Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison
@ 2021-04-19 20:32 Jue Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jue Wang @ 2021-04-19 20:32 UTC
  To: Luck, Tony
  Cc: Borislav Petkov, linux-kernel, linux-mm, luto,
	HORIGUCHI NAOYA(堀口 直也),
	x86, yaoaili

On Thu, 8 Apr 2021 10:08:52 -0700, Tony Luck wrote:
> KVM apparently passes a machine check into the guest. Though it seems
> to be missing the MCG_STATUS information to tell the guest whether this
> is an "Action Required" machine check, or an "Action Optional" (i.e.
> whether the poison was found synchronously by execution of the current
> instruction, or asynchronously).

The KVM_X86_SET_MCE ioctl takes a struct kvm_x86_mce parameter, which
the hypervisor can fill in with the necessary semantics (including
mcg_status):

#ifdef KVM_CAP_MCE
/* x86 MCE */
struct kvm_x86_mce {
        __u64 status;
        __u64 addr;
        __u64 misc;
        __u64 mcg_status;
        __u8 bank;
        __u8 pad1[7];
        __u64 pad2[3];
};
#endif
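To illustrate this, here is a minimal sketch of how a VMM might fill in struct kvm_x86_mce for an "action required" memory error and pass it to KVM_X86_SET_MCE. The Action Required vs. Action Optional distinction is carried by the S and AR bits in status, while mcg_status carries RIPV/EIPV. The sketch assumes x86-64 Linux with <linux/kvm.h>, a vCPU fd already configured with KVM_X86_SETUP_MCE, and a guest physical address the VMM has already translated from the SIGBUS address. The SK_* macro names, the MCA error code 0x134, and the choice of MCG_STATUS bits are illustrative assumptions based on the architectural register layout, not taken from any particular VMM.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Architectural bit positions (Intel SDM Vol. 3B); the SK_ names are local to this sketch. */
#define SK_MCI_STATUS_VAL    (1ULL << 63)  /* register contents valid */
#define SK_MCI_STATUS_UC     (1ULL << 61)  /* uncorrected error */
#define SK_MCI_STATUS_EN     (1ULL << 60)  /* error reporting enabled */
#define SK_MCI_STATUS_MISCV  (1ULL << 59)  /* MCi_MISC valid */
#define SK_MCI_STATUS_ADDRV  (1ULL << 58)  /* MCi_ADDR valid */
#define SK_MCI_STATUS_S      (1ULL << 56)  /* signaled via machine-check exception */
#define SK_MCI_STATUS_AR     (1ULL << 55)  /* action required */
#define SK_MCG_STATUS_EIPV   (1ULL << 1)   /* error IP valid: tied to the current instruction */
#define SK_MCG_STATUS_MCIP   (1ULL << 2)   /* machine check in progress */

#define SK_MCACOD_SRAR_DATA_LOAD 0x134     /* assumed MCA error code for an SRAR data load */

/* Describe an "action required" memory error at guest physical address gpa. */
static void build_srar_mce(struct kvm_x86_mce *mce, uint8_t bank, uint64_t gpa)
{
    memset(mce, 0, sizeof(*mce));
    mce->bank = bank;
    mce->status = SK_MCI_STATUS_VAL | SK_MCI_STATUS_UC | SK_MCI_STATUS_EN |
                  SK_MCI_STATUS_MISCV | SK_MCI_STATUS_ADDRV |
                  SK_MCI_STATUS_S | SK_MCI_STATUS_AR |
                  SK_MCACOD_SRAR_DATA_LOAD;
    mce->addr = gpa;
    /* MCi_MISC: bits 8:6 = 2 (physical address mode), bits 5:0 = 12 (4 KiB granularity) */
    mce->misc = (2ULL << 6) | 12;
    /* RIPV left clear: assumption that the interrupted context cannot simply resume. */
    mce->mcg_status = SK_MCG_STATUS_MCIP | SK_MCG_STATUS_EIPV;
}

/* Inject on a vCPU already set up with KVM_X86_SETUP_MCE; returns 0 or -errno. */
static int inject_mce(int vcpu_fd, const struct kvm_x86_mce *mce)
{
    return ioctl(vcpu_fd, KVM_X86_SET_MCE, mce) < 0 ? -errno : 0;
}
```

For an "action optional" report, a VMM would instead clear AR, use a different error code, and set RIPV. Which exact values a given VMM uses is not specified here.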

> > Are we documenting somewhere: "if your process gets a SIGBUS and this
> > and that, which means your page got offlined, you should do this and
> > that to recover"?

> Essentially it boils down to:
> SIGBUS handler gets additional data giving virtual address that has gone away

> 1) Can the application replace the lost page?
> Use mmap(addr, MAP_FIXED, ...) to map a fresh page into the gap
> and fill with replacement data. This case can return from SIGBUS
> handler to re-execute failed instruction
> 2) Can the application continue in degraded mode w/o the lost page?
> Hunt down pointers to lost page and update structures to say
> "this data lost". Use siglongjmp() to go to preset recovery path
> 3) Can the application shut down gracefully?
> Record details of the lost page. Inform next-of-kin. Exit.
> 4) Default - just exit
Two possible additions to these great points:
5) If for some reason the page cannot be unmapped (e.g., because
doing so would lose too much memory, as with 1G hugetlbfs pages, or
because splitting a SHMEM THP fails), the kernel maintains consistent
semantics (i.e., an MCE SIGBUS with vaddr) for all future accesses
from user space by leaving the hwpoisoned page mapped or in the radix
tree.
6) If for some reason the vaddr is not available upon the first MCE
recovery and the page is unmapped, the kernel provides the correct
semantics (an MCE SIGBUS with vaddr) on subsequent page faults from
user space accessing the same vaddr.
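Strategies 1 and 2 map fairly directly onto an SA_SIGINFO handler. The following is one way to write it, assuming glibc on Linux (for BUS_MCEERR_AR, BUS_MCEERR_AO and si_addr_lsb). replace_lost_page(), recovery_point and the zero fill that stands in for "replacement data" are hypothetical. The handler ignores BUS_MCEERR_AO and waits for an actual access, relying on the consistent SIGBUS-with-vaddr behavior described in points 5 and 6. mmap() is not on the POSIX async-signal-safe list, but on Linux it is a plain system call, which is what strategy 1 depends on.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical recovery point for strategy 2 (degraded mode). */
static sigjmp_buf recovery_point;
static volatile sig_atomic_t have_recovery_point;

/*
 * Strategy 1: map a fresh anonymous page over the lost range and refill it.
 * The zero fill stands in for reloading the data from a replica or disk.
 */
static int replace_lost_page(void *base, size_t len)
{
    void *p = mmap(base, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0, len);
    return 0;
}

static void sigbus_handler(int sig, siginfo_t *si, void *uctx)
{
    (void)sig;
    (void)uctx;

    if (si->si_code == BUS_MCEERR_AO)
        return;  /* not consumed yet; assume a later access reports AR with an address */

    if (si->si_code == BUS_MCEERR_AR) {
        /* si_addr_lsb is log2 of the size of the lost region */
        size_t len = (size_t)1 << si->si_addr_lsb;
        void *base = (void *)((uintptr_t)si->si_addr & ~(uintptr_t)(len - 1));

        if (replace_lost_page(base, len) == 0)
            return;  /* returning re-executes the faulting instruction */
        if (have_recovery_point)
            siglongjmp(recovery_point, 1);  /* strategy 2 */
    }
    /* Strategies 3 and 4: record what is needed, then exit. */
    _exit(1);
}

static int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

To arm strategy 2, the application would call sigsetjmp(recovery_point, 1) at a safe point and set have_recovery_point = 1 when it returns 0. A nonzero return then means a page was lost and the code should continue in degraded mode.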


Thread overview: 24+ messages
2021-03-26  0:02 [RFC 0/4] Fix machine check recovery for copy_from_user Tony Luck
2021-03-26  0:02 ` [PATCH 1/4] x86/mce: Fix copyin code to return -EFAULT on machine check Tony Luck
2021-04-06 19:24   ` Borislav Petkov
2021-03-26  0:02 ` [PATCH 2/4] mce/iter: Check for copyin failure & return error up stack Tony Luck
2021-03-26  0:02 ` [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison Tony Luck
2021-04-07 21:18   ` Borislav Petkov
2021-04-07 21:43     ` Luck, Tony
2021-04-08  8:49       ` Borislav Petkov
2021-04-08 17:08         ` Luck, Tony
2021-04-13 10:07           ` Borislav Petkov
2021-04-13 16:13             ` Luck, Tony
2021-04-14 13:05               ` Borislav Petkov
2021-03-26  0:02 ` [PATCH 4/4] x86/mce: Avoid infinite loop for copy from user recovery Tony Luck
2021-04-08 13:36   ` Borislav Petkov
2021-04-08 16:06     ` Luck, Tony
2021-04-08  2:13 ` [RFC 0/4] Fix machine check recovery for copy_from_user Aili Yao
2021-04-08 14:39   ` Luck, Tony
2021-04-09  6:49     ` Aili Yao
2021-04-14  5:47 [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison Jue Wang
2021-04-14  5:47 ` Jue Wang
2021-04-14 13:10 ` Borislav Petkov
2021-04-14 14:46   ` Jue Wang
2021-04-14 15:35     ` Borislav Petkov
2021-04-19 20:32 Jue Wang
