[PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.
@ 2021-01-28 11:43 Aili Yao
  2021-01-28 17:43 ` Luck, Tony
  0 siblings, 1 reply; 5+ messages in thread
From: Aili Yao @ 2021-01-28 11:43 UTC (permalink / raw)
  To: x86, tony.luck, naoya.horiguchi; +Cc: linux-kernel, yangfeng1

when one page is already hwpoisoned by AO action, process may not be
killed, the process mapping this page may make a syscall include this
page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
mode it may be fixed by fixup_exception, current code will just return
error code to user process.

This is not suffient, we should send a SIGBUS to the process and log the
info to console, as we can't trust the process will handle the error
correctly.

Suggested-by: Feng Yang <yangfeng1@kingsoft.com>
Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
---
 arch/x86/mm/fault.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f1f1b5a0956a..36d1e385512b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -662,7 +662,16 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 		 * In this case we need to make sure we're not recursively
 		 * faulting through the emulate_vsyscall() logic.
 		 */
+#ifdef CONFIG_MEMORY_FAILURE
+		if (si_code == BUS_MCEERR_AR && signal == SIGBUS)
+			pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
+				current->comm, current->pid, address);
+
+		if ((current->thread.sig_on_uaccess_err && signal) ||
+			(si_code == BUS_MCEERR_AR && signal == SIGBUS)) {
+#else
 		if (current->thread.sig_on_uaccess_err && signal) {
+#endif
 			sanitize_error_code(address, &error_code);
 
 			set_signal_archinfo(address, error_code);
@@ -927,7 +936,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 {
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & X86_PF_USER)) {
+#ifdef CONFIG_MEMORY_FAILURE
+		if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
+			no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
+		else
+			no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+#else
 		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+#endif
 		return;
 	}
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.
  2021-01-28 11:43 [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access Aili Yao
@ 2021-01-28 17:43 ` Luck, Tony
  2021-01-29  3:33   ` Aili Yao
  0 siblings, 1 reply; 5+ messages in thread
From: Luck, Tony @ 2021-01-28 17:43 UTC (permalink / raw)
  To: Aili Yao; +Cc: x86, naoya.horiguchi, linux-kernel, yangfeng1

On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> when one page is already hwpoisoned by AO action, process may not be
> killed, the process mapping this page may make a syscall include this
> page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> mode it may be fixed by fixup_exception, current code will just return
> error code to user process.

Shouldn't the AO action that poisoned the page have also unmapped it?
> 
> This is not suffient, we should send a SIGBUS to the process and log the
> info to console, as we can't trust the process will handle the error
> correctly.

I agree with this part ... few apps check for -EFAULT and do the right
thing.  But I'm not sure how this happens. Can you provide a bit more
detail on the steps

-Tony

P.S. Typo: s/suffient/sufficient/

> 
> Suggested-by: Feng Yang <yangfeng1@kingsoft.com>
> Signed-off-by: Aili Yao <yaoaili@kingsoft.com>
> ---
>  arch/x86/mm/fault.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index f1f1b5a0956a..36d1e385512b 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -662,7 +662,16 @@ no_context(struct pt_regs *regs, unsigned long error_code,
>  		 * In this case we need to make sure we're not recursively
>  		 * faulting through the emulate_vsyscall() logic.
>  		 */
> +#ifdef CONFIG_MEMORY_FAILURE
> +		if (si_code == BUS_MCEERR_AR && signal == SIGBUS)
> +			pr_err("MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
> +				current->comm, current->pid, address);
> +
> +		if ((current->thread.sig_on_uaccess_err && signal) ||
> +			(si_code == BUS_MCEERR_AR && signal == SIGBUS)) {
> +#else
>  		if (current->thread.sig_on_uaccess_err && signal) {
> +#endif
>  			sanitize_error_code(address, &error_code);
>  
>  			set_signal_archinfo(address, error_code);
> @@ -927,7 +936,14 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
>  {
>  	/* Kernel mode? Handle exceptions or die: */
>  	if (!(error_code & X86_PF_USER)) {
> +#ifdef CONFIG_MEMORY_FAILURE
> +		if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE))
> +			no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
> +		else
> +			no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#else
>  		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> +#endif
>  		return;
>  	}
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.
  2021-01-28 17:43 ` Luck, Tony
@ 2021-01-29  3:33   ` Aili Yao
  2021-01-29 22:55     ` Luck, Tony
  0 siblings, 1 reply; 5+ messages in thread
From: Aili Yao @ 2021-01-29  3:33 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86, naoya.horiguchi, linux-kernel, yangfeng1

[-- Attachment #1: Type: text/plain, Size: 5218 bytes --]

On Thu, 28 Jan 2021 09:43:52 -0800
"Luck, Tony" <tony.luck@intel.com> wrote:

> On Thu, Jan 28, 2021 at 07:43:26PM +0800, Aili Yao wrote:
> > when one page is already hwpoisoned by AO action, process may not be
> > killed, the process mapping this page may make a syscall include this
> > page and result to trigger a VM_FAULT_HWPOISON fault, as it's in kernel
> > mode it may be fixed by fixup_exception, current code will just return
> > error code to user process.  
> 
> Shouldn't the AO action that poisoned the page have also unmapped it?

Yes, The page has been unmapped in the code mm/rmap.c:

1567                 if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
1568                         pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
1569                         if (PageHuge(page)) {
1570                                 hugetlb_count_sub(compound_nr(page), mm);
1571                                 set_huge_swap_pte_at(mm, address,
1572                                                      pvmw.pte, pteval,
1573                                                      vma_mmu_pagesize(vma));
1574                         } else {
1575                                 dec_mm_counter(mm, mm_counter(page));
1576                                 set_pte_at(mm, address, pvmw.pte, pteval);
1577                         }
1578 
1579                 }

The pte for this page of processes mapping it should be marked with SWP_HWPOISON.
But the process may not be informed and may continue with the address which has been
ummaped, Then accessing the content in the page will trigger a page fault.

Normally, it will hit the code in arch/x86/mm/fault.c:

 945 #ifdef CONFIG_MEMORY_FAILURE
 946         if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
 947                 struct task_struct *tsk = current;
 948                 unsigned lsb = 0;
 949 
 950                 pr_err(
 951         "MCE: Killing %s:%d due to hardware memory corruption fault at %lx\n",
 952                         tsk->comm, tsk->pid, address);
 953                 if (fault & VM_FAULT_HWPOISON_LARGE)
 954                         lsb = hstate_index_to_shift(VM_FAULT_GET_HINDEX(fault));
 955                 if (fault & VM_FAULT_HWPOISON)
 956                         lsb = PAGE_SHIFT;
 957                 force_sig_mceerr(BUS_MCEERR_AR, (void __user *)address, lsb);
 958                 return;
 959         }
 960 #endif
 961         force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address);
 962 }

But when the user processes do a syscall and make a copyin action in kernel space, 
the page fault triggered by this action will not got the the above code, it actually
go to the code in arch/x86/mm/fault.c:

 650         if (fixup_exception(regs, X86_TRAP_PF, error_code, address)) {
 674                 /*
 675                  * Barring that, we can do the fixup and be happy.
 676                  */
 677                 return;
 678         }

> > 
> > This is not suffient, we should send a SIGBUS to the process and log the
> > info to console, as we can't trust the process will handle the error
> > correctly.  
> 
> I agree with this part ... few apps check for -EFAULT and do the right
> thing.  But I'm not sure how this happens. Can you provide a bit more
> detail on the steps
> 

Attachment is a simple code to test this, you can try this test with:
./einj_mem_uc -f copyin2

In my enviroment, the stack will be:

[ 1583.063050] Memory failure: 0x1030254: recovery action for dirty LRU page: Recovered
[ 1583.071724] MCE: Killing einj_mem_uc:11139 due to hardware memory corruption fault at 7f4d59032000
[ 1583.081732] CPU: 38 PID: 11139 Comm: einj_mem_uc Kdump: loaded Not tainted 5.11.0-rc2+ #43
[ 1583.102607] Call Trace:
[ 1583.105338]  dump_stack+0x57/0x6a
[ 1583.109041]  no_context.cold+0xf6/0x284
[ 1583.113315]  mm_fault_error+0xc3/0x1b0
[ 1583.117503]  exc_page_fault+0x57/0x110
[ 1583.121690]  asm_exc_page_fault+0x1e/0x30
[ 1583.126159] RIP: 0010:__get_user_nocheck_8+0x10/0x13
[ 1583.131704] Code: 0f b7 10 31 c0 0f 01 ca c3 90 0f 01 cb 0f ae e8 8b 10 31 c0 0f 01 ca c3 66 90 0f 01 cb 0f ae e8 48 8b 10 31 c0 0f 01 ca c3 90 <0f> 01 ca 31 d2 48 c7 c0 f2 ff ff ff c3 cc cc cc 0f 1f 44 00 00 40
[ 1583.152659] RSP: 0018:ffffb9e462917d90 EFLAGS: 00050293
[ 1583.158490] RAX: 00007f4d59032000 RBX: 0000000000000000 RCX: 00007f4d59032000
[ 1583.166453] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 00007f4d590321ff
[ 1583.174418] RBP: 0000000000000200 R08: 0000000000000200 R09: ffffb9e462917e50
[ 1583.182382] R10: 0000000000000200 R11: 0000000000000000 R12: ffffb9e462917e60
[ 1583.190345] R13: ffff941470e93058 R14: 0000000000001000 R15: ffffffffc0626540
[ 1583.198310]  iov_iter_fault_in_readable+0x4f/0x120
[ 1583.203657]  generic_perform_write+0x83/0x1c0
[ 1583.208520]  ext4_buffered_write_iter+0x93/0x150 [ext4]
[ 1583.214378]  new_sync_write+0x11f/0x1b0
[ 1583.218661]  vfs_write+0x1c0/0x280
[ 1583.222455]  ksys_write+0x5f/0xe0
[ 1583.226153]  do_syscall_64+0x33/0x40
[ 1583.230142]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1583.235778] RIP: 0033:0x7f4d58b35cd0

> P.S. Typo: s/suffient/sufficient/

Thanks for correction!

-- 
Best Regards!

Aili Yao

[-- Attachment #2: einj_mem_uc.c --]
[-- Type: text/x-c++src, Size: 15234 bytes --]

/*
 * Copyright (C) 2015 Intel Corporation
 * Author: Tony Luck
 *
 * This software may be redistributed and/or modified under the terms of
 * the GNU General Public License ("GPL") version 2 only as published by the
 * Free Software Foundation.
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <setjmp.h>
#include <signal.h>
#define _GNU_SOURCE 1
#define __USE_GNU 1
#include <sched.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/prctl.h>
#include <setjmp.h>

extern long long vtop(long long);
extern void proc_cpuinfo(int *nsockets, int *ncpus, char *model, int **apicmap);
extern void proc_interrupts(long *nmce, long *ncmci);
extern void do_memcpy(void *dst, void *src, int cnt);
static void show_help(void);

static char *progname;
static int nsockets, ncpus, lcpus_persocket;
static int force_flag;
static int all_flag;
static long pagesize;
static int *apicmap;
#define	CACHE_LINE_SIZE	64

#define EINJ_ETYPE "/sys/kernel/debug/apei/einj/error_type"
#define EINJ_ADDR "/sys/kernel/debug/apei/einj/param1"
#define EINJ_MASK "/sys/kernel/debug/apei/einj/param2"
#define EINJ_APIC "/sys/kernel/debug/apei/einj/param3"
#define EINJ_FLAGS "/sys/kernel/debug/apei/einj/flags"
#define EINJ_NOTRIGGER "/sys/kernel/debug/apei/einj/notrigger"
#define EINJ_DOIT "/sys/kernel/debug/apei/einj/error_inject"

static void wfile(char *file, unsigned long long val)
{
	FILE *fp = fopen(file, "w");

	if (fp == NULL) {
		fprintf(stderr, "%s: cannot open '%s'\n", progname, file);
		exit(1);
	}
	fprintf(fp, "0x%llx\n", val);
	if (fclose(fp) == EOF) {
		fprintf(stderr, "%s: write error on '%s'\n", progname, file);
		exit(1);
	}
}

static void * copyin2_addr = NULL;
static void inject_madvise(unsigned long long page,int notrigger)
{
	if(copyin2_addr == NULL){
		printf("Invalid parameter \n");
		exit(0);
	}
	if (madvise(copyin2_addr, 100, 100) != 0) {
		if (errno == EINVAL) {
			printf("Kernel doesn't support poison injection\n");
			exit(0);
		}
		printf("madvise \n");
	}
}

static void inject_uc(unsigned long long addr, int notrigger)
{
	wfile(EINJ_ETYPE, 0x10);
	wfile(EINJ_ADDR, addr);
	wfile(EINJ_MASK, ~0x0ul);
	//wfile(EINJ_FLAGS, 2);
	wfile(EINJ_NOTRIGGER, notrigger);
	wfile(EINJ_DOIT, 1);
}

static void inject_llc(unsigned long long addr, int notrigger)
{
	unsigned cpu;

	cpu = sched_getcpu();
	wfile(EINJ_ETYPE, 0x2);
	wfile(EINJ_ADDR, addr);
	wfile(EINJ_MASK, ~0x0ul);
	wfile(EINJ_APIC, apicmap[cpu]);
	wfile(EINJ_FLAGS, 3);
	wfile(EINJ_NOTRIGGER, notrigger);
	wfile(EINJ_DOIT, 1);
}

static int is_advanced_ras(char *model)
{
	if (strstr(model, "E7-"))
		return 1;
	if (strstr(model, "Platinum"))
		return 1;
	if (strstr(model, "Gold"))
		return 1;
	return 0;
}

static void check_configuration(void)
{
	char	model[512];

	if (getuid() != 0) {
		fprintf(stderr, "%s: must be root to run error injection tests\n", progname);
		exit(1);
	}
	if (access("/sys/firmware/acpi/tables/EINJ", R_OK) == -1) {
		fprintf(stderr, "%s: Error injection not supported, check your BIOS settings\n", progname);
		exit(1);
	}
	if (access(EINJ_NOTRIGGER, R_OK|W_OK) == -1) {
		fprintf(stderr, "%s: Is the einj.ko module loaded?\n", progname);
		exit(1);
	}
	model[0] = '\0';
	proc_cpuinfo(&nsockets, &ncpus, model, &apicmap);
	if (nsockets == 0 || ncpus == 0) {
		fprintf(stderr, "%s: could not find number of sockets/cpus\n", progname);
		exit(1);
	}
	if (ncpus % nsockets) {
		fprintf(stderr, "%s: strange topology. Are all cpus online?\n", progname);
		exit(1);
	}
	lcpus_persocket = ncpus / nsockets;
	if (!force_flag && !is_advanced_ras(model)) {
		fprintf(stderr, "%s: warning: cpu may not support recovery\n", progname);
		exit(1);
	}
}

#define REP9(stmt) stmt;stmt;stmt;stmt;stmt;stmt;stmt;stmt;stmt

volatile int vol;

int dosums(void)
{
	vol = 0;
	REP9(REP9(REP9(vol++)));
	return vol;
}

#define MB(n)	((n) * 1024 * 1024)

static void *thp_data_alloc(void)
{
	char	*p = malloc(MB(128));
	int	i;

	if (p == NULL) {
		fprintf(stderr, "%s: cannot allocate memory\n", progname);
		exit(1);
	}
	srandom(getpid() * time(NULL));
	for (i = 0; i < MB(128); i++)
		p[i] = random();
	return p + MB(64);
}

static void *data_alloc(void)
{
	char	*p = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, -1, 0);
	int	i;

	if (p == NULL) {
		fprintf(stderr, "%s: cannot allocate memory\n", progname);
		exit(1);
	}
	srandom(getpid() * time(NULL));
	for (i = 0; i < pagesize; i++)
		p[i] = random();
	return p + pagesize / 4;
}
static void *data_alloc_copyin2(void)
{
        char    *p = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, -1, 0);
        int     i;

        if (p == NULL) {
                fprintf(stderr, "%s: cannot allocate memory\n", progname);
                exit(1);
        }
        srandom(getpid() * time(NULL));
        for (i = 0; i < pagesize; i++)
                p[i] = random();

		copyin2_addr = p;
        return p;
}

static FILE *pcfile;

static void *page_cache_alloc(void)
{
	char c, *p;
	int i;

	pcfile = tmpfile();
	for (i = 0; i < pagesize; i++) {
		c = random();
		fputc(c, pcfile);
	}
	fflush(pcfile);

	p = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED, fileno(pcfile), 0);
	if (p == NULL) {
		fprintf(stderr, "%s: cannot mmap tmpfile\n", progname);
		exit(1);
	}
	*p = random();

	return p + pagesize / 4;
}

static void *mlock_data_alloc(void)
{
	char	*p = mmap(NULL, pagesize, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, -1, 0);
	int	i;

	if (p == NULL) {
		fprintf(stderr, "%s: cannot allocate memory\n", progname);
		exit(1);
	}
	srandom(getpid() * time(NULL));
	for (i = 0; i < pagesize; i++)
		p[i] = random();
	if (mlock(p, pagesize) == -1) {
		fprintf(stderr, "%s: cannot mlock(2) memory\n", progname);
		exit(1);
	}
	return p + pagesize / 4;
}

static void *instr_alloc(void)
{
	char	*p = (char *)dosums;

	p += 2 * pagesize;

	return (void *)((long)p & ~(pagesize - 1));
}

int trigger_single(char *addr)
{
	return addr[0];
}

int trigger_double(char *addr)
{
	return addr[0] + addr[1];
}

int trigger_split(char *addr)
{
	long *a = (long *)(addr - 1);

	return a[0];
}

int trigger_write(char *addr)
{
	addr[0] = 'a';
	return 0;
}

/*
 * parameters to the memcpy and copyin tests.
 */
int memcpy_runup = 0;	/* how much to copy before hitting poison */
int memcpy_size = 512;	/* Total amount to copy */
int memcpy_align = 0;	/* Relative alignment of src/dst */

/* argument is "runup:size:align" */
void parse_memcpy(char *arg)
{
	char *endp;

	memcpy_runup = strtol(arg, &endp, 0);
	if (*endp != ':')
		show_help();
	memcpy_size = strtol(endp + 1, &endp, 0);
	if (*endp != ':')
		show_help();
	memcpy_align = strtol(endp + 1, &endp, 0);
	if (*endp != '\0')
		show_help();
	if (memcpy_runup < 0 || memcpy_runup > pagesize / 4) {
		fprintf(stderr, "%s: runup out of range\n", progname);
		exit(1);
	}
	if (memcpy_size < 0 || memcpy_size > pagesize / 4) {
		fprintf(stderr, "%s: size out of range\n", progname);
		exit(1);
	}
	if (memcpy_runup > memcpy_size) {
		fprintf(stderr, "%s: runup must be less than size\n", progname);
		exit(1);
	}
	if (memcpy_align < 0 || memcpy_align >= CACHE_LINE_SIZE) {
		fprintf(stderr, "%s: bad alignment\n", progname);
		exit(1);
	}
}

int trigger_memcpy(char *addr)
{
	char *src = addr - memcpy_runup;
	char *dst = addr + pagesize / 2;

	dst -= memcpy_align;
	do_memcpy(dst, src, memcpy_size);
	return 0;
}

int trigger_copyin(char *addr)
{
	int	fd, ret;
	char	filename[] = "/tmp/einj-XXXXXX";

	if ((fd = mkstemp(filename)) == -1) {
		fprintf(stderr, "%s: couldn't make temp file\n", progname);
		return -1;
	}
	(void)unlink(filename);
	if ((ret = write(fd, addr - memcpy_runup, memcpy_size) != memcpy_size)) {
		if (ret == -1)
			fprintf(stderr, "%s: couldn't write temp file (errno=%d)\n", progname, errno);
		else
			fprintf(stderr, "%s: short (%d bytes) write to temp file\n", ret, progname);
	}
	close(fd);
	return 0;
}

int trigger_copyout(char *addr)
{
	char *buf = malloc(pagesize);
	int ret;

	if (buf == NULL) {
		fprintf(stderr, "%s: couldn't allocate memory\n", progname);
		return -1;
	}
	rewind(pcfile);
	ret = fread(buf, 1, pagesize, pcfile);
	fprintf(stderr, "%s: read returned %d\n", progname);

	return 0;
}

int trigger_patrol(char *addr)
{
	sleep(1);
}

int trigger_llc(char *addr)
{
	asm volatile("clflush %0" : "+m" (*addr));
}

int trigger_instr(char *addr)
{
	int ret = dosums();

	if (ret != 729)
		printf("Corruption during instruction fault recovery (%d)\n", ret);

	return ret;
}

/* attributes of the test and which events will follow our trigger */
#define	F_MCE		1
#define	F_CMCI		2
#define F_SIGBUS	4
#define	F_FATAL		8

struct test {
	char	*testname;
	char	*testhelp;
	void	*(*alloc)(void);
	void	(*inject)(unsigned long long, int);
	int	notrigger;
	int	(*trigger)(char *);
	int	flags;
} tests[] = {
	{
		"single", "Single read in pipeline to target address, generates SRAR machine check",
		data_alloc, inject_uc, 1, trigger_single, F_MCE|F_CMCI|F_SIGBUS,
	},
	{
		"double", "Double read in pipeline to target address, generates SRAR machine check",
		data_alloc, inject_uc, 1, trigger_double, F_MCE|F_CMCI|F_SIGBUS,
	},
	{
		"split", "Unaligned read crosses cacheline from good to bad. Probably fatal",
		data_alloc, inject_uc, 1, trigger_split, F_MCE|F_CMCI|F_SIGBUS|F_FATAL,
	},
	{
		"THP", "Try to inject in transparent huge page, generates SRAR machine check",
		thp_data_alloc, inject_uc, 1, trigger_single, F_MCE|F_CMCI|F_SIGBUS,
	},
	{
		"store", "Write to target address. Should generate a UCNA/CMCI",
		data_alloc, inject_uc, 1, trigger_write, F_CMCI,
	},
	{
		"memcpy", "Streaming read from target address. Probably fatal",
		data_alloc, inject_uc, 1, trigger_memcpy, F_MCE|F_CMCI|F_SIGBUS|F_FATAL,
	},
	{
		"instr", "Instruction fetch. Generates SRAR that OS should transparently fix",
		instr_alloc, inject_uc, 1, trigger_instr, F_MCE|F_CMCI,
	},
	{
		"patrol", "Patrol scrubber, generates SRAO machine check",
		data_alloc, inject_uc, 0, trigger_patrol, F_MCE,
	},
	{
		"llc", "Cache write-back, generates SRAO machine check",
		data_alloc, inject_llc, 1, trigger_llc, F_MCE,
	},
	{
		"copyin", "Kernel copies data from user. Probably fatal",
		data_alloc, inject_uc, 1, trigger_copyin, F_MCE|F_CMCI|F_SIGBUS|F_FATAL,
	},
	{
		"copyin2", "Recover form Kernel copies data from user, user page already poisoned!",
		data_alloc_copyin2, inject_madvise, 1, trigger_copyin, F_MCE|F_SIGBUS,
	},
	{
		"copyout", "Kernel copies data to user. Probably fatal",
		page_cache_alloc, inject_uc, 1, trigger_copyout, F_MCE|F_SIGBUS|F_CMCI|F_FATAL,
	},
	{
		"mlock", "mlock target page then inject/read to generates SRAR machine check",
		mlock_data_alloc, inject_uc, 1, trigger_single, F_MCE|F_CMCI|F_SIGBUS,
	},
	{ NULL }
};

static void show_help(void)
{
	struct test *t;

	printf("Usage: %s [-a][-c count][-d delay][-f] [-m runup:size:align][testname]\n", progname);
	printf("  %-8s %-5s %s\n", "Testname", "Fatal", "Description");
	for (t = tests; t->testname; t++)
		printf("  %-8s %-5s %s\n", t->testname,
			(t->flags & F_FATAL) ? "YES" : "no",
			t->testhelp);
	exit(0);
}

static struct test *lookup_test(char *s)
{
	struct test *t;

	for (t = tests; t->testname; t++)
		if (strcmp(s, t->testname) == 0)
			return t;
	fprintf(stderr, "%s: unknown test '%s'\n", progname, s);
	exit(1);
}

static struct test *next_test(struct test *t)
{
	t++;
	if (t->testname == NULL)
		t = tests;
	return t;
}

static jmp_buf env;

static void recover(int sig, siginfo_t *si, void *v)
{
	printf("SIGBUS: addr = %p\n", si->si_addr);
	siglongjmp(env, 1);
}

struct sigaction recover_act = {
	.sa_sigaction = recover,
	.sa_flags = SA_SIGINFO,
};

int main(int argc, char **argv)
{
	int c, i;
	int	count = 1, cmci_wait_count = 0;
	double	delay = 1.0;
	struct test *t;
	void	*vaddr;
	long long paddr;
	long	b_mce, b_cmci, a_mce, a_cmci;
	struct timeval t1, t2;

	progname = argv[0];
	pagesize = getpagesize();

	while ((c = getopt(argc, argv, "ac:d:fhm:")) != -1) switch (c) {
	case 'a':
		all_flag = 1;
		break;
	case 'c':
		count = strtol(optarg, NULL, 0);
		break;
	case 'd':
		delay = strtod(optarg, NULL);
		break;
	case 'f':
		force_flag = 1;
		break;
	case 'm':
		parse_memcpy(optarg);
		break;
	case 'h': case '?':
		show_help();
		break;
	}

	check_configuration();

	if (optind < argc)
		t = lookup_test(argv[optind]);
	else
		t = tests;

	if ((t->flags & F_FATAL) && !force_flag) {
		fprintf(stderr, "%s: selected test may be fatal. Use '-f' flag if you really want to do this\n", progname);
		exit(1);
	}

	sigaction(SIGBUS, &recover_act, NULL);

	for (i = 0; i < count; i++) {
		cmci_wait_count = 0;
		vaddr = t->alloc();
		paddr = vtop((long long)vaddr);
		printf("%d: %-8s vaddr = %p paddr = %llx\n", i, t->testname, vaddr, paddr);

		proc_interrupts(&b_mce, &b_cmci);
		gettimeofday(&t1, NULL);
		if (sigsetjmp(env, 1)) {
			if ((t->flags & F_SIGBUS) == 0) {
				printf("Unexpected SIGBUS\n");
			}
		} else {
			t->inject(paddr, t->notrigger);
			t->trigger(vaddr);
			if (t->flags & F_SIGBUS) {
				printf("Expected SIGBUS, didn't get one\n");
			}
		}

		if (pcfile) {
			fclose(pcfile);
			pcfile = NULL;
		}

		/* if system didn't already take page offline, ask it to do so now */
		if (paddr == vtop((long long)vaddr)) {
			printf("Manually take page offline\n");
			wfile("/sys/devices/system/memory/hard_offline_page", paddr);
		}

		/* Give system a chance to process on possibly deep C-state idle cpus */
		usleep(100);

		proc_interrupts(&a_mce, &a_cmci);

		if (t->flags & F_FATAL) {
			printf("Big surprise ... still running. Thought that would be fatal\n");
		}

		if (t->flags & F_MCE) {
			if (a_mce == b_mce) {
				printf("Expected MCE, but none seen\n");
			} else if (a_mce == b_mce + 1) {
				printf("Saw local machine check\n");
			} else if (a_mce == b_mce + ncpus) {
				printf("Saw broadcast machine check\n");
			} else {
				printf("Unusual number of MCEs seen: %ld\n", a_mce - b_mce);
			}
		} else {
			if (a_mce != b_mce) {
				printf("Saw %ld unexpected MCEs (%ld systemwide)\n", b_mce - a_mce, (b_mce - a_mce) / ncpus);
			}
		}

		if (t->flags & F_CMCI) {
			while (a_cmci < b_cmci + lcpus_persocket) {
				if (cmci_wait_count > 1000) {
					break;
				}
				usleep(100);
				proc_interrupts(&a_mce, &a_cmci);
				cmci_wait_count++;
			}
			if (cmci_wait_count != 0) {
				gettimeofday(&t2, NULL);
				printf("CMCIs took ~%ld usecs to be reported.\n",
					1000000 * (t2.tv_sec - t1.tv_sec) +
						(t2.tv_usec - t1.tv_usec));
			}
			if (a_cmci == b_cmci) {
				printf("Expected CMCI, but none seen\n");
				printf("Test failed\n");
				return 1;
			} else if (a_cmci < b_cmci + lcpus_persocket) {
				printf("Unusual number of CMCIs seen: %ld\n", a_cmci - b_cmci);
				printf("Test failed\n");
				return 1;
			}
		} else {
			if (a_cmci != b_cmci) {
				printf("Saw %ld unexpected CMCIs (%ld per socket)\n", a_cmci - b_cmci, (a_cmci - b_cmci) / lcpus_persocket);
				printf("Test failed\n");
				return 1;
			}
		}

		usleep((useconds_t)(delay * 1.0e6));
		if (all_flag) {
			t = next_test(t);
			while (t->flags & F_FATAL)
				t = next_test(t);
		}
	}

	printf("Test passed\n");
	return 0;
}

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.
  2021-01-29  3:33   ` Aili Yao
@ 2021-01-29 22:55     ` Luck, Tony
  2021-02-01  7:21       ` Aili Yao
  0 siblings, 1 reply; 5+ messages in thread
From: Luck, Tony @ 2021-01-29 22:55 UTC (permalink / raw)
  To: Aili Yao; +Cc: x86, naoya.horiguchi, linux-kernel, yangfeng1

Thanks for the explanation and test code. I think I see better what
is going on here.

[I took your idea for using madvise(...MADV_HWPOISON) and added a new "-S"
option to my einj_mem_uc test program to use madvise instead of ACPI/EINJ
for injections. Update pushed here:
	git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git ]

There have been some small changes to arch/x86/mm/fault.c since you wrote
the patch.  Can you rebase to v5.11-rc5?

Also maybe this might be a case to use IS_ENABLED() instead of #ifdef to
make the code a little less ugly. At least for the 2nd hunk in your patch
this would work well:

	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
	    (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)))
		no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
	else
		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);

The first hunk might need a bit more thought.

-Tony

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access.
  2021-01-29 22:55     ` Luck, Tony
@ 2021-02-01  7:21       ` Aili Yao
  0 siblings, 0 replies; 5+ messages in thread
From: Aili Yao @ 2021-02-01  7:21 UTC (permalink / raw)
  To: Luck, Tony; +Cc: x86, naoya.horiguchi, linux-kernel, yangfeng1

On Fri, 29 Jan 2021 14:55:29 -0800
"Luck, Tony" <tony.luck@intel.com> wrote:

> Thanks for the explanation and test code. I think I see better what
> is going on here.
> 
> [I took your idea for using madvise(...MADV_HWPOISON) and added a new "-S"
> option to my einj_mem_uc test program to use madvise instead of ACPI/EINJ
> for injections. Update pushed here:
> 	git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git ]
> 
> There have been some small changes to arch/x86/mm/fault.c since you wrote
> the patch.  Can you rebase to v5.11-rc5?

Yes, I will rebase it to newest version.
 
> Also maybe this might be a case to use IS_ENABLED() instead of #ifdef to
> make the code a little less ugly. At least for the 2nd hunk in your patch
> this would work well:
> 
> 	if (IS_ENABLED(CONFIG_MEMORY_FAILURE) &&
> 	    (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)))
> 		no_context(regs, error_code, address, SIGBUS, BUS_MCEERR_AR);
> 	else
> 		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
> 
> The first hunk might need a bit more thought.
> 
Do you mean the force_sig_mceerr and force_sig_fault difference? I see a hwpoison related comment
there, but it's better to follow the usual way force_sig_mceerr, I will modify this in a v2 patch.

Or something other, you may post a better one.

Thanks

-- 
Best Regards!

Aili Yao

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-02-01  7:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-28 11:43 [PATCH] x86/fault: Send SIGBUS to user process always for hwpoison page access Aili Yao
2021-01-28 17:43 ` Luck, Tony
2021-01-29  3:33   ` Aili Yao
2021-01-29 22:55     ` Luck, Tony
2021-02-01  7:21       ` Aili Yao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.