Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
@ 2020-02-10 22:51 Jeff Moyer
  2020-02-11  4:17 ` Justin He
  2020-02-11 14:51 ` Kirill A. Shutemov
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff Moyer @ 2020-02-10 22:51 UTC (permalink / raw)
  To: Jia He; +Cc: Catalin Marinas, Kirill A. Shutemov, linux-mm

[-- Attachment #1: Type: text/plain, Size: 4149 bytes --]

Hi,

While running xfstests test generic/437, I noticed that the following
WARN_ON_ONCE inside cow_user_page() was triggered:

	/*
	 * This really shouldn't fail, because the page is there
	 * in the page tables. But it might just be unreadable,
	 * in which case we just give up and fill the result with
	 * zeroes.
	 */
	if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
		/*
		 * Give a warn in case there can be some obscure
		 * use-case
		 */
		WARN_ON_ONCE(1);
		clear_page(kaddr);
	}

Just clearing the page, in this case, will result in data corruption.  I
think the assumption that the copy fails because the memory is
inaccessible may be wrong.  In this instance, it may be the other thread
issuing the madvise call?

I reverted the commit in question, and the data corruption is gone.

Below is the (ppc64) stack trace (this will reproduce on x86 as well).
I've attached the reproducer, which is a modified version of the
xfs test.  You'll need to setup a file system on pmem, and mount it with
-o dax.  Then issue "./t_mmap_cow_race /mnt/pmem/foo".

Any help tracking this down is appreciated.

Thanks!
Jeff

[ 3690.894950] run fstests generic/437 at 2020-02-10 13:40:37
[ 3691.173277] ------------[ cut here ]------------
[ 3691.173302] WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317 wp_page_copy+0xc40/0xd50
[ 3691.173308] Modules linked in: ext4 mbcache jbd2 loop rfkill ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp i2c_dev rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core dax_pmem_compat device_dax nd_pmem nd_btt dax_pmem_core of_pmem libnvdimm ses enclosure ipmi_powernv ipmi_devintf sg ibmpowernv xts ipmi_msghandler opal_prd vmx_crypto leds_powernv powernv_op_panel uio_pdrv_genirq uio xfs libcrc32c sd_mod t10_pi mlx5_core ipr mpt3sas libata raid_class tg3 scsi_transport_sas mlxfw dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]
[ 3691.173363] CPU: 76 PID: 51024 Comm: t_mmap_cow_race Not tainted 5.5.0+ #4
[ 3691.173369] NIP:  c0000000003df6e0 LR: c0000000003df42c CTR: 0000000000007f10
[ 3691.173375] REGS: c000002ec66fb830 TRAP: 0700   Not tainted  (5.5.0+)
[ 3691.173379] MSR:  9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 24022244  XER: 20040000
[ 3691.173388] CFAR: c0000000003df458 IRQMASK: 0 
               GPR00: 0000000000000000 c000002ec66fbac0 c000000001746a00 0000000000007f10 
               GPR04: 00007fff976080f0 0000000000007f10 00000000000080f0 0000000000000000 
               GPR08: 0000000000000000 c000000000000000 c000002de046a000 0000000000000030 
               GPR12: 0000000000000040 c000002ffffa9800 0000000014a70278 00007fff981f0000 
               GPR16: 00007fff98f24410 00007fff98f24420 00007fff97600000 0000000014a70270 
               GPR20: 0000000010000fc0 c000002fa974ebf0 00007fff97600000 c0000000017fd9e8 
               GPR24: c0000000017fd960 c000000001775e98 c000002eabcc5a00 c000002fc3b40000 
               GPR28: 0000000000000000 c000002fa974ebf0 c00c00000bf0ed00 c000002ec66fbc10 
[ 3691.173430] NIP [c0000000003df6e0] wp_page_copy+0xc40/0xd50
[ 3691.173434] LR [c0000000003df42c] wp_page_copy+0x98c/0xd50
[ 3691.173438] Call Trace:
[ 3691.173442] [c000002ec66fbac0] [c0000000003df42c] wp_page_copy+0x98c/0xd50 (unreliable)
[ 3691.173448] [c000002ec66fbb80] [c0000000003e3448] do_wp_page+0xd8/0xad0
[ 3691.173454] [c000002ec66fbbd0] [c0000000003e6248] __handle_mm_fault+0x748/0x1b90
[ 3691.173460] [c000002ec66fbcd0] [c0000000003e77b0] handle_mm_fault+0x120/0x1f0
[ 3691.173466] [c000002ec66fbd10] [c000000000086b60] __do_page_fault+0x240/0xd70
[ 3691.173473] [c000002ec66fbde0] [c0000000000876c8] do_page_fault+0x38/0xd0
[ 3691.173480] [c000002ec66fbe20] [c00000000000a888] handle_page_fault+0x10/0x30
[ 3691.173484] Instruction dump:
[ 3691.173488] 38800003 4bfffaf8 60000000 60000000 7f83e378 4800d585 60000000 4bfffcb8 
[ 3691.173496] 7f83e378 4bfa72b5 60000000 4bfffc8c <0fe00000> 3d42ffeb 394acb48 812a0008 
[ 3691.173503] ---[ end trace a8dffbf7f73e8243 ]---


[-- Attachment #2: t_mmap_cow_race.c --]
[-- Type: text/plain, Size: 2494 bytes --]

// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2017 Intel Corporation. */
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MiB(a) ((a)*1024*1024)
#define NUM_THREADS 2

void err_exit(char *op)
{
	fprintf(stderr, "%s: %s\n", op, strerror(errno));
	exit(1);
}

void worker_fn(void *ptr)
{
	char *data = (char *)ptr;
	volatile int a;
	volatile int d;
	int i, err;

	for (i = 0; i < 10; i++) {
		a = data[0];
		data[0] = a;

		err = madvise(data, MiB(2), MADV_DONTNEED);
		if (err < 0)
			err_exit("madvise");
		for (i = 0; i < MiB(2); i++) {
			if ((d = data[i]) != 1) {
				fprintf(stderr, "Data corruption!\n");
				fprintf(stderr, "data[%d] = %d\n", i, d);
				exit(1);
			}
		}

		/* Mix up the thread timings to encourage the race. */
		err = usleep(rand() % 100);
		if (err < 0)
			err_exit("usleep");
	}
}

int main(int argc, char *argv[])
{
	pthread_t thread[NUM_THREADS];
	int i, j, fd, err;
	char *data;

	if (argc < 2) {
		printf("Usage: %s <file>\n", basename(argv[0]));
		exit(0);
	}

	fd = open(argv[1], O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
	if (fd < 0)
		err_exit("fd");

	/* This allows us to map a huge page. */
	ftruncate(fd, 0);
	ftruncate(fd, MiB(2));

	/*
	 * First we set up a shared mapping.  Our write will (hopefully) get
	 * the filesystem to give us a 2MiB huge page DAX mapping.  We will
	 * then use this 2MiB page for our private mapping race.
	 */
	data = mmap(NULL, MiB(2), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED)
		err_exit("shared mmap");

	memset(data, 1, MiB(2));

	err = munmap(data, MiB(2));
	if (err < 0)
		err_exit("shared munmap");

	for (i = 0; i < 500; i++) {
		data = mmap(NULL, MiB(2), PROT_READ|PROT_WRITE, MAP_PRIVATE,
				fd, 0);
		if (data == MAP_FAILED)
			err_exit("private mmap");

		for (j = 0; j < NUM_THREADS; j++) {
			err = pthread_create(&thread[j], NULL,
					(void*)&worker_fn, data);
			if (err)
				err_exit("pthread_create");
		}

		for (j = 0; j < NUM_THREADS; j++) {
			err = pthread_join(thread[j], NULL);
			if (err)
				err_exit("pthread_join");
		}

		err = munmap(data, MiB(2));
		if (err < 0)
			err_exit("private munmap");
	}

	err = close(fd);
	if (err < 0)
		err_exit("close");

	return 0;
}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-10 22:51 bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared") Jeff Moyer
@ 2020-02-11  4:17 ` Justin He
  2020-02-11  4:29   ` Justin He
  2020-02-11 14:51 ` Kirill A. Shutemov
  1 sibling, 1 reply; 15+ messages in thread
From: Justin He @ 2020-02-11  4:17 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Catalin Marinas, Kirill A.Shutemov, linux-mm

Hi Jeff

> -----Original Message-----
> From: Jeff Moyer <jmoyer@redhat.com>
> Sent: Tuesday, February 11, 2020 6:52 AM
> To: Justin He <Justin.He@arm.com>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Kirill A.Shutemov
> <kirill.shutemov@linux.intel.com>; linux-mm@kvack.org
> Subject: bug: data corruption introduced by commit 83d116c53058 ("mm: fix
> double page fault on arm64 if PTE_AF is cleared")
>
> Hi,
>
> While running xfstests test generic/437, I noticed that the following
> WARN_ON_ONCE inside cow_user_page() was triggered:
>
> /*
>  * This really shouldn't fail, because the page is there
>  * in the page tables. But it might just be unreadable,
>  * in which case we just give up and fill the result with
>  * zeroes.
>  */
> if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
> /*
>  * Give a warn in case there can be some obscure
>  * use-case
>  */
> WARN_ON_ONCE(1);
> clear_page(kaddr);
> }
>
> Just clearing the page, in this case, will result in data corruption.  I
> think the assumption that the copy fails because the memory is
> inaccessible may be wrong.  In this instance, it may be the other thread
> issuing the madvise call?
>
> I reverted the commit in question, and the data corruption is gone.
>

Thanks for the report. But this commit 83d116c53058 doesn't add the
new clear_page code path. Besides the pte_mkyoung part, It just refines
the codes(no functional change) and add a WARN_ON_ONCE to indicate
there is any obscure case before.

--
Cheers,
Justin (Jia He)


> Below is the (ppc64) stack trace (this will reproduce on x86 as well).
> I've attached the reproducer, which is a modified version of the
> xfs test.  You'll need to setup a file system on pmem, and mount it with
> -o dax.  Then issue "./t_mmap_cow_race /mnt/pmem/foo".
>
> Any help tracking this down is appreciated.
>
> Thanks!
> Jeff
>
> [ 3690.894950] run fstests generic/437 at 2020-02-10 13:40:37
> [ 3691.173277] ------------[ cut here ]------------
> [ 3691.173302] WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317
> wp_page_copy+0xc40/0xd50
> [ 3691.173308] Modules linked in: ext4 mbcache jbd2 loop rfkill ib_isert
> iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp i2c_dev
> rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm iw_cm libiscsi
> scsi_transport_iscsi ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core
> dax_pmem_compat device_dax nd_pmem nd_btt dax_pmem_core of_pmem
> libnvdimm ses enclosure ipmi_powernv ipmi_devintf sg ibmpowernv xts
> ipmi_msghandler opal_prd vmx_crypto leds_powernv powernv_op_panel
> uio_pdrv_genirq uio xfs libcrc32c sd_mod t10_pi mlx5_core ipr mpt3sas libata
> raid_class tg3 scsi_transport_sas mlxfw dm_mirror dm_region_hash dm_log
> dm_mod [last unloaded: scsi_debug]
> [ 3691.173363] CPU: 76 PID: 51024 Comm: t_mmap_cow_race Not tainted
> 5.5.0+ #4
> [ 3691.173369] NIP:  c0000000003df6e0 LR: c0000000003df42c CTR:
> 0000000000007f10
> [ 3691.173375] REGS: c000002ec66fb830 TRAP: 0700   Not tainted  (5.5.0+)
> [ 3691.173379] MSR:  9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR:
> 24022244  XER: 20040000
> [ 3691.173388] CFAR: c0000000003df458 IRQMASK: 0
>                GPR00: 0000000000000000 c000002ec66fbac0 c000000001746a00
> 0000000000007f10
>                GPR04: 00007fff976080f0 0000000000007f10 00000000000080f0
> 0000000000000000
>                GPR08: 0000000000000000 c000000000000000 c000002de046a000
> 0000000000000030
>                GPR12: 0000000000000040 c000002ffffa9800 0000000014a70278
> 00007fff981f0000
>                GPR16: 00007fff98f24410 00007fff98f24420 00007fff97600000
> 0000000014a70270
>                GPR20: 0000000010000fc0 c000002fa974ebf0 00007fff97600000
> c0000000017fd9e8
>                GPR24: c0000000017fd960 c000000001775e98 c000002eabcc5a00
> c000002fc3b40000
>                GPR28: 0000000000000000 c000002fa974ebf0 c00c00000bf0ed00
> c000002ec66fbc10
> [ 3691.173430] NIP [c0000000003df6e0] wp_page_copy+0xc40/0xd50
> [ 3691.173434] LR [c0000000003df42c] wp_page_copy+0x98c/0xd50
> [ 3691.173438] Call Trace:
> [ 3691.173442] [c000002ec66fbac0] [c0000000003df42c]
> wp_page_copy+0x98c/0xd50 (unreliable)
> [ 3691.173448] [c000002ec66fbb80] [c0000000003e3448]
> do_wp_page+0xd8/0xad0
> [ 3691.173454] [c000002ec66fbbd0] [c0000000003e6248]
> __handle_mm_fault+0x748/0x1b90
> [ 3691.173460] [c000002ec66fbcd0] [c0000000003e77b0]
> handle_mm_fault+0x120/0x1f0
> [ 3691.173466] [c000002ec66fbd10] [c000000000086b60]
> __do_page_fault+0x240/0xd70
> [ 3691.173473] [c000002ec66fbde0] [c0000000000876c8]
> do_page_fault+0x38/0xd0
> [ 3691.173480] [c000002ec66fbe20] [c00000000000a888]
> handle_page_fault+0x10/0x30
> [ 3691.173484] Instruction dump:
> [ 3691.173488] 38800003 4bfffaf8 60000000 60000000 7f83e378 4800d585
> 60000000 4bfffcb8
> [ 3691.173496] 7f83e378 4bfa72b5 60000000 4bfffc8c <0fe00000> 3d42ffeb
> 394acb48 812a0008
> [ 3691.173503] ---[ end trace a8dffbf7f73e8243 ]---

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11  4:17 ` Justin He
@ 2020-02-11  4:29   ` Justin He
  2020-02-11 16:44     ` Jeff Moyer
  0 siblings, 1 reply; 15+ messages in thread
From: Justin He @ 2020-02-11  4:29 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Catalin Marinas, Kirill A.Shutemov, linux-mm

Hi Jeff

> -----Original Message-----
> From: Justin He
> Sent: Tuesday, February 11, 2020 12:18 PM
> To: Jeff Moyer <jmoyer@redhat.com>
> Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Kirill A.Shutemov
> <kirill.shutemov@linux.intel.com>; linux-mm@kvack.org
> Subject: RE: bug: data corruption introduced by commit 83d116c53058 ("mm:
> fix double page fault on arm64 if PTE_AF is cleared")
>
> Hi Jeff
>
> > -----Original Message-----
> > From: Jeff Moyer <jmoyer@redhat.com>
> > Sent: Tuesday, February 11, 2020 6:52 AM
> > To: Justin He <Justin.He@arm.com>
> > Cc: Catalin Marinas <Catalin.Marinas@arm.com>; Kirill A.Shutemov
> > <kirill.shutemov@linux.intel.com>; linux-mm@kvack.org
> > Subject: bug: data corruption introduced by commit 83d116c53058 ("mm: fix
> > double page fault on arm64 if PTE_AF is cleared")
> >
> > Hi,
> >
> > While running xfstests test generic/437, I noticed that the following
> > WARN_ON_ONCE inside cow_user_page() was triggered:
> >
> > /*
> >  * This really shouldn't fail, because the page is there
> >  * in the page tables. But it might just be unreadable,
> >  * in which case we just give up and fill the result with
> >  * zeroes.
> >  */
> > if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
> > /*
> >  * Give a warn in case there can be some obscure
> >  * use-case
> >  */
> > WARN_ON_ONCE(1);
> > clear_page(kaddr);
> > }
> >
> > Just clearing the page, in this case, will result in data corruption.  I
> > think the assumption that the copy fails because the memory is
> > inaccessible may be wrong.  In this instance, it may be the other thread
> > issuing the madvise call?
> >
> > I reverted the commit in question, and the data corruption is gone.
> >
>
> Thanks for the report. But this commit 83d116c53058 doesn't add the
> new clear_page code path. Besides the pte_mkyoung part, It just refines
> the codes(no functional change) and add a WARN_ON_ONCE to indicate
> there is any obscure case before.

I can't reproduce it with your provided test file on my arm64 qemu with
a pmem device.
Could you do me a favor that just revert 83d116c53058 but keep that
WARN_ON_ONCE after clear_page()? Is there any difference?
Thanks for your help

--
Cheers,
Justin (Jia He)


>
> --
> Cheers,
> Justin (Jia He)
>
>
> > Below is the (ppc64) stack trace (this will reproduce on x86 as well).
> > I've attached the reproducer, which is a modified version of the
> > xfs test.  You'll need to setup a file system on pmem, and mount it with
> > -o dax.  Then issue "./t_mmap_cow_race /mnt/pmem/foo".
> >
> > Any help tracking this down is appreciated.
> >
> > Thanks!
> > Jeff
> >
> > [ 3690.894950] run fstests generic/437 at 2020-02-10 13:40:37
> > [ 3691.173277] ------------[ cut here ]------------
> > [ 3691.173302] WARNING: CPU: 76 PID: 51024 at mm/memory.c:2317
> > wp_page_copy+0xc40/0xd50
> > [ 3691.173308] Modules linked in: ext4 mbcache jbd2 loop rfkill ib_isert
> > iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp i2c_dev
> > rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm iw_cm libiscsi
> > scsi_transport_iscsi ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core
> > dax_pmem_compat device_dax nd_pmem nd_btt dax_pmem_core
> of_pmem
> > libnvdimm ses enclosure ipmi_powernv ipmi_devintf sg ibmpowernv xts
> > ipmi_msghandler opal_prd vmx_crypto leds_powernv powernv_op_panel
> > uio_pdrv_genirq uio xfs libcrc32c sd_mod t10_pi mlx5_core ipr mpt3sas
> libata
> > raid_class tg3 scsi_transport_sas mlxfw dm_mirror dm_region_hash dm_log
> > dm_mod [last unloaded: scsi_debug]
> > [ 3691.173363] CPU: 76 PID: 51024 Comm: t_mmap_cow_race Not tainted
> > 5.5.0+ #4
> > [ 3691.173369] NIP:  c0000000003df6e0 LR: c0000000003df42c CTR:
> > 0000000000007f10
> > [ 3691.173375] REGS: c000002ec66fb830 TRAP: 0700   Not tainted  (5.5.0+)
> > [ 3691.173379] MSR:  9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR:
> > 24022244  XER: 20040000
> > [ 3691.173388] CFAR: c0000000003df458 IRQMASK: 0
> >                GPR00: 0000000000000000 c000002ec66fbac0 c000000001746a00
> > 0000000000007f10
> >                GPR04: 00007fff976080f0 0000000000007f10 00000000000080f0
> > 0000000000000000
> >                GPR08: 0000000000000000 c000000000000000 c000002de046a000
> > 0000000000000030
> >                GPR12: 0000000000000040 c000002ffffa9800 0000000014a70278
> > 00007fff981f0000
> >                GPR16: 00007fff98f24410 00007fff98f24420 00007fff97600000
> > 0000000014a70270
> >                GPR20: 0000000010000fc0 c000002fa974ebf0 00007fff97600000
> > c0000000017fd9e8
> >                GPR24: c0000000017fd960 c000000001775e98 c000002eabcc5a00
> > c000002fc3b40000
> >                GPR28: 0000000000000000 c000002fa974ebf0 c00c00000bf0ed00
> > c000002ec66fbc10
> > [ 3691.173430] NIP [c0000000003df6e0] wp_page_copy+0xc40/0xd50
> > [ 3691.173434] LR [c0000000003df42c] wp_page_copy+0x98c/0xd50
> > [ 3691.173438] Call Trace:
> > [ 3691.173442] [c000002ec66fbac0] [c0000000003df42c]
> > wp_page_copy+0x98c/0xd50 (unreliable)
> > [ 3691.173448] [c000002ec66fbb80] [c0000000003e3448]
> > do_wp_page+0xd8/0xad0
> > [ 3691.173454] [c000002ec66fbbd0] [c0000000003e6248]
> > __handle_mm_fault+0x748/0x1b90
> > [ 3691.173460] [c000002ec66fbcd0] [c0000000003e77b0]
> > handle_mm_fault+0x120/0x1f0
> > [ 3691.173466] [c000002ec66fbd10] [c000000000086b60]
> > __do_page_fault+0x240/0xd70
> > [ 3691.173473] [c000002ec66fbde0] [c0000000000876c8]
> > do_page_fault+0x38/0xd0
> > [ 3691.173480] [c000002ec66fbe20] [c00000000000a888]
> > handle_page_fault+0x10/0x30
> > [ 3691.173484] Instruction dump:
> > [ 3691.173488] 38800003 4bfffaf8 60000000 60000000 7f83e378 4800d585
> > 60000000 4bfffcb8
> > [ 3691.173496] 7f83e378 4bfa72b5 60000000 4bfffc8c <0fe00000> 3d42ffeb
> > 394acb48 812a0008
> > [ 3691.173503] ---[ end trace a8dffbf7f73e8243 ]---

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-10 22:51 bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared") Jeff Moyer
  2020-02-11  4:17 ` Justin He
@ 2020-02-11 14:51 ` Kirill A. Shutemov
  2020-02-11 16:27   ` Jeff Moyer
  1 sibling, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2020-02-11 14:51 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Jia He, Catalin Marinas, Kirill A. Shutemov, linux-mm

On Mon, Feb 10, 2020 at 05:51:50PM -0500, Jeff Moyer wrote:
> Hi,
> 
> While running xfstests test generic/437, I noticed that the following
> WARN_ON_ONCE inside cow_user_page() was triggered:
> 
> 	/*
> 	 * This really shouldn't fail, because the page is there
> 	 * in the page tables. But it might just be unreadable,
> 	 * in which case we just give up and fill the result with
> 	 * zeroes.
> 	 */
> 	if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
> 		/*
> 		 * Give a warn in case there can be some obscure
> 		 * use-case
> 		 */
> 		WARN_ON_ONCE(1);
> 		clear_page(kaddr);
> 	}
> 
> Just clearing the page, in this case, will result in data corruption.  I
> think the assumption that the copy fails because the memory is
> inaccessible may be wrong.  In this instance, it may be the other thread
> issuing the madvise call?
> 
> I reverted the commit in question, and the data corruption is gone.
> 
> Below is the (ppc64) stack trace (this will reproduce on x86 as well).
> I've attached the reproducer, which is a modified version of the
> xfs test.  You'll need to setup a file system on pmem, and mount it with
> -o dax.  Then issue "./t_mmap_cow_race /mnt/pmem/foo".
> 
> Any help tracking this down is appreciated.

My guess is that MADV_DONTNEED get the page unmapped under you and
__copy_from_user_inatomic() sees empty PTE instead of the populated PTE it
expects.

Below is my completely untested attempt to fix it.

It is going to hurt perfomance in common case, but it should be good
enough to test my idea.

The real solution would be to retry __copy_from_user_inatomic() under ptl
if the first attempt fails. I expect it to be ugly.

diff --git a/mm/memory.c b/mm/memory.c
index 0bccc622e482..362a791f47fd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2257,7 +2257,6 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
 	bool ret;
 	void *kaddr;
 	void __user *uaddr;
-	bool force_mkyoung;
 	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long addr = vmf->address;
@@ -2278,27 +2277,18 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
 	kaddr = kmap_atomic(dst);
 	uaddr = (void __user *)(addr & PAGE_MASK);
 
+	vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+	if (!pte_same(*vmf->pte, vmf->orig_pte)) {
+		ret = false;
+		goto pte_unlock;
+	}
+
 	/*
 	 * On architectures with software "accessed" bits, we would
 	 * take a double page fault, so mark it accessed here.
 	 */
-	force_mkyoung = arch_faults_on_old_pte() && !pte_young(vmf->orig_pte);
-	if (force_mkyoung) {
-		pte_t entry;
-
-		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
-		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
-			/*
-			 * Other thread has already handled the fault
-			 * and we don't need to do anything. If it's
-			 * not the case, the fault will be triggered
-			 * again on the same address.
-			 */
-			ret = false;
-			goto pte_unlock;
-		}
-
-		entry = pte_mkyoung(vmf->orig_pte);
+	if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
+		pte_t entry = pte_mkyoung(vmf->orig_pte);
 		if (ptep_set_access_flags(vma, addr, vmf->pte, entry, 0))
 			update_mmu_cache(vma, addr, vmf->pte);
 	}
@@ -2321,8 +2311,7 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
 	ret = true;
 
 pte_unlock:
-	if (force_mkyoung)
-		pte_unmap_unlock(vmf->pte, vmf->ptl);
+	pte_unmap_unlock(vmf->pte, vmf->ptl);
 	kunmap_atomic(kaddr);
 	flush_dcache_page(dst);
 
-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11 14:51 ` Kirill A. Shutemov
@ 2020-02-11 16:27   ` Jeff Moyer
  2020-02-11 22:40     ` Kirill A. Shutemov
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Moyer @ 2020-02-11 16:27 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Jia He, Catalin Marinas, Kirill A. Shutemov, linux-mm

Hi, Kirill,

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> My guess is that MADV_DONTNEED get the page unmapped under you and
> __copy_from_user_inatomic() sees empty PTE instead of the populated PTE it
> expects.
>
> Below is my completely untested attempt to fix it.
>
> It is going to hurt perfomance in common case, but it should be good
> enough to test my idea.

Yes, that resolves the issue for me.

> The real solution would be to retry __copy_from_user_inatomic() under ptl
> if the first attempt fails. I expect it to be ugly.

So long as it's correct.  :)

Thanks!
Jeff



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11  4:29   ` Justin He
@ 2020-02-11 16:44     ` Jeff Moyer
  2020-02-11 17:33       ` Kirill A. Shutemov
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Moyer @ 2020-02-11 16:44 UTC (permalink / raw)
  To: Justin He; +Cc: Catalin Marinas, Kirill A.Shutemov, linux-mm\

Hi, Justin,

Justin He <Justin.He@arm.com> writes:
>> Thanks for the report. But this commit 83d116c53058 doesn't add the
>> new clear_page code path. Besides the pte_mkyoung part, It just refines
>> the codes(no functional change) and add a WARN_ON_ONCE to indicate
>> there is any obscure case before.
>
> I can't reproduce it with your provided test file on my arm64 qemu with
> a pmem device.
> Could you do me a favor that just revert 83d116c53058 but keep that
> WARN_ON_ONCE after clear_page()? Is there any difference?
> Thanks for your help

Below is the patch I used to put the WARN_ON_ONCE after the clear_page,
just to be sure that's what you intended.  So with 83d116c53058
reverted, and the below patch applied, the WARN_ON_ONCE does not
trigger.

-Jeff

diff --git a/mm/memory.c b/mm/memory.c
index 3bab0d3976ea..3fea34375c7f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2259,8 +2259,10 @@ static inline void cow_user_page(struct page *dst, struct page *src, unsigned lo
 		 * in which case we just give up and fill the result with
 		 * zeroes.
 		 */
-		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE))
+		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
+			WARN_ON_ONCE(1);
 			clear_page(kaddr);
+		}
 		kunmap_atomic(kaddr);
 		flush_dcache_page(dst);
 	} else



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11 16:44     ` Jeff Moyer
@ 2020-02-11 17:33       ` Kirill A. Shutemov
  2020-02-11 17:55         ` Jeff Moyer
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2020-02-11 17:33 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Justin He, Catalin Marinas, Kirill A.Shutemov, linux-mm

On Tue, Feb 11, 2020 at 11:44:06AM -0500, Jeff Moyer wrote:
> Hi, Justin,
> 
> Justin He <Justin.He@arm.com> writes:
> >> Thanks for the report. But this commit 83d116c53058 doesn't add the
> >> new clear_page code path. Besides the pte_mkyoung part, It just refines
> >> the codes(no functional change) and add a WARN_ON_ONCE to indicate
> >> there is any obscure case before.
> >
> > I can't reproduce it with your provided test file on my arm64 qemu with
> > a pmem device.
> > Could you do me a favor that just revert 83d116c53058 but keep that
> > WARN_ON_ONCE after clear_page()? Is there any difference?
> > Thanks for your help
> 
> Below is the patch I used to put the WARN_ON_ONCE after the clear_page,
> just to be sure that's what you intended.  So with 83d116c53058
> reverted, and the below patch applied, the WARN_ON_ONCE does not
> trigger.

I cannot explain this. There is no locking to prevent the same scenario
before. It might be an timing difference.

Could try to put a delay before the copy to make race window larger?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11 17:33       ` Kirill A. Shutemov
@ 2020-02-11 17:55         ` Jeff Moyer
  2020-02-11 21:44           ` Kirill A. Shutemov
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Moyer @ 2020-02-11 17:55 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Justin He, Catalin Marinas, Kirill A.Shutemov, linux-mm\

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Tue, Feb 11, 2020 at 11:44:06AM -0500, Jeff Moyer wrote:
>> Hi, Justin,
>> 
>> Justin He <Justin.He@arm.com> writes:
>> >> Thanks for the report. But this commit 83d116c53058 doesn't add the
>> >> new clear_page code path. Besides the pte_mkyoung part, It just refines
>> >> the codes(no functional change) and add a WARN_ON_ONCE to indicate
>> >> there is any obscure case before.
>> >
>> > I can't reproduce it with your provided test file on my arm64 qemu with
>> > a pmem device.
>> > Could you do me a favor that just revert 83d116c53058 but keep that
>> > WARN_ON_ONCE after clear_page()? Is there any difference?
>> > Thanks for your help
>> 
>> Below is the patch I used to put the WARN_ON_ONCE after the clear_page,
>> just to be sure that's what you intended.  So with 83d116c53058
>> reverted, and the below patch applied, the WARN_ON_ONCE does not
>> trigger.
>
> I cannot explain this. There is no locking to prevent the same scenario
> before. It might be an timing difference.
>
> Could try to put a delay before the copy to make race window larger?

I reverted my change to the reproducer, and now it triggers the warning.

-Jeff



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11 17:55         ` Jeff Moyer
@ 2020-02-11 21:44           ` Kirill A. Shutemov
  2020-02-11 22:01             ` Jeff Moyer
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2020-02-11 21:44 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Justin He, Catalin Marinas, Kirill A.Shutemov, linux-mm

On Tue, Feb 11, 2020 at 12:55:50PM -0500, Jeff Moyer wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
> 
> > On Tue, Feb 11, 2020 at 11:44:06AM -0500, Jeff Moyer wrote:
> >> Hi, Justin,
> >> 
> >> Justin He <Justin.He@arm.com> writes:
> >> >> Thanks for the report. But this commit 83d116c53058 doesn't add the
> >> >> new clear_page code path. Besides the pte_mkyoung part, It just refines
> >> >> the codes(no functional change) and add a WARN_ON_ONCE to indicate
> >> >> there is any obscure case before.
> >> >
> >> > I can't reproduce it with your provided test file on my arm64 qemu with
> >> > a pmem device.
> >> > Could you do me a favor that just revert 83d116c53058 but keep that
> >> > WARN_ON_ONCE after clear_page()? Is there any difference?
> >> > Thanks for your help
> >> 
> >> Below is the patch I used to put the WARN_ON_ONCE after the clear_page,
> >> just to be sure that's what you intended.  So with 83d116c53058
> >> reverted, and the below patch applied, the WARN_ON_ONCE does not
> >> trigger.
> >
> > I cannot explain this. There is no locking to prevent the same scenario
> > before. It might be an timing difference.
> >
> > Could try to put a delay before the copy to make race window larger?
> 
> I reverted my change to the reproducer, and now it triggers the warning.

I'm not sure I follow.

My understanding is that you failed to reproduce the issue with
83d116c53058 reverted and WARN_ON_ONCE() placed.

My ask was to try to put some mdelay() just before
__copy_from_user_inatomic(). The mdelay() may help with reproducing the
issue on the old code.

If the bug still fails to reproduce I may misunderstand the source of the
bug and need to look further.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11 21:44           ` Kirill A. Shutemov
@ 2020-02-11 22:01             ` Jeff Moyer
  2020-02-11 22:15               ` Kirill A. Shutemov
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Moyer @ 2020-02-11 22:01 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Justin He, Catalin Marinas, Kirill A.Shutemov, linux-mm\

[-- Attachment #1: Type: text/plain, Size: 2500 bytes --]

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Tue, Feb 11, 2020 at 12:55:50PM -0500, Jeff Moyer wrote:
>> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
>> 
>> > On Tue, Feb 11, 2020 at 11:44:06AM -0500, Jeff Moyer wrote:
>> >> Hi, Justin,
>> >> 
>> >> Justin He <Justin.He@arm.com> writes:
>> >> >> Thanks for the report. But this commit 83d116c53058 doesn't add the
>> >> >> new clear_page code path. Besides the pte_mkyoung part, It just refines
>> >> >> the codes(no functional change) and add a WARN_ON_ONCE to indicate
>> >> >> there is any obscure case before.
>> >> >
>> >> > I can't reproduce it with your provided test file on my arm64 qemu with
>> >> > a pmem device.
>> >> > Could you do me a favor that just revert 83d116c53058 but keep that
>> >> > WARN_ON_ONCE after clear_page()? Is there any difference?
>> >> > Thanks for your help
>> >> 
>> >> Below is the patch I used to put the WARN_ON_ONCE after the clear_page,
>> >> just to be sure that's what you intended.  So with 83d116c53058
>> >> reverted, and the below patch applied, the WARN_ON_ONCE does not
>> >> trigger.
>> >
>> > I cannot explain this. There is no locking to prevent the same scenario
>> > before. It might be an timing difference.
>> >
>> > Could try to put a delay before the copy to make race window larger?
>> 
>> I reverted my change to the reproducer, and now it triggers the warning.
>
> I'm not sure I follow.
>
> My understanding is that you failed to reproduce the issue with
> 83d116c53058 reverted and WARN_ON_ONCE() placed.

I failed to reproduce the issue with the test code I provided in this
email thread.  However, if I simply use the original t_mmap_cow_race
from xfstests, I can trigger the WARN_ON_ONCE.  There is no need to
insert a delay in the kernel.

Does that make sense?

> My ask was to try to put some mdelay() just before
> __copy_from_user_inatomic(). The mdelay() may help with reproducing the
> issue on the old code.
>
> If the bug still fails to reproduce I may misunderstand the source of the
> bug and need to look further.

I understand your request.  Inserting a udelay(10) with the code I
provided does not trigger the warning.  However, see above.

I'm including the unmodified t_mmap_cow_race.c code here for your
convenience.  This is the code that triggers the warning with
83d116c53058 reverted, and the WARN_ON_ONCE added.

Let me know if I'm stilling confusing you.  :)

Cheers,
Jeff


[-- Attachment #2: t_mmap_cow_race.c --]
[-- Type: text/plain, Size: 2284 bytes --]

// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2017 Intel Corporation. */
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MiB(a) ((a)*1024*1024)
#define NUM_THREADS 2

void err_exit(char *op)
{
	fprintf(stderr, "%s: %s\n", op, strerror(errno));
	exit(1);
}

void worker_fn(void *ptr)
{
	char *data = (char *)ptr;
	volatile int a;
	int i, err;

	for (i = 0; i < 10; i++) {
		a = data[0];
		data[0] = a;

		err = madvise(data, MiB(2), MADV_DONTNEED);
		if (err < 0)
			err_exit("madvise");

		/* Mix up the thread timings to encourage the race. */
		err = usleep(rand() % 100);
		if (err < 0)
			err_exit("usleep");
	}
}

int main(int argc, char *argv[])
{
	pthread_t thread[NUM_THREADS];
	int i, j, fd, err;
	char *data;

	if (argc < 2) {
		printf("Usage: %s <file>\n", basename(argv[0]));
		exit(0);
	}

	fd = open(argv[1], O_RDWR|O_CREAT, S_IRUSR|S_IWUSR);
	if (fd < 0)
		err_exit("fd");

	/* This allows us to map a huge page. */
	ftruncate(fd, 0);
	ftruncate(fd, MiB(2));

	/*
	 * First we set up a shared mapping.  Our write will (hopefully) get
	 * the filesystem to give us a 2MiB huge page DAX mapping.  We will
	 * then use this 2MiB page for our private mapping race.
	 */
	data = mmap(NULL, MiB(2), PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
	if (data == MAP_FAILED)
		err_exit("shared mmap");

	data[0] = 1;

	err = munmap(data, MiB(2));
	if (err < 0)
		err_exit("shared munmap");

	for (i = 0; i < 500; i++) {
		data = mmap(NULL, MiB(2), PROT_READ|PROT_WRITE, MAP_PRIVATE,
				fd, 0);
		if (data == MAP_FAILED)
			err_exit("private mmap");

		for (j = 0; j < NUM_THREADS; j++) {
			err = pthread_create(&thread[j], NULL,
					(void*)&worker_fn, data);
			if (err)
				err_exit("pthread_create");
		}

		for (j = 0; j < NUM_THREADS; j++) {
			err = pthread_join(thread[j], NULL);
			if (err)
				err_exit("pthread_join");
		}

		err = munmap(data, MiB(2));
		if (err < 0)
			err_exit("private munmap");
	}

	err = close(fd);
	if (err < 0)
		err_exit("close");

	return 0;
}

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11 22:01             ` Jeff Moyer
@ 2020-02-11 22:15               ` Kirill A. Shutemov
  0 siblings, 0 replies; 15+ messages in thread
From: Kirill A. Shutemov @ 2020-02-11 22:15 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Justin He, Catalin Marinas, Kirill A.Shutemov, linux-mm

On Tue, Feb 11, 2020 at 05:01:40PM -0500, Jeff Moyer wrote:
> Let me know if I'm stilling confusing you.  :)

Okay. The commit is not to blame for the bug, but only for its exposure.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11 16:27   ` Jeff Moyer
@ 2020-02-11 22:40     ` Kirill A. Shutemov
  2020-02-12 14:22       ` Jeff Moyer
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2020-02-11 22:40 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Jia He, Catalin Marinas, Kirill A. Shutemov, linux-mm

On Tue, Feb 11, 2020 at 11:27:36AM -0500, Jeff Moyer wrote:
> > The real solution would be to retry __copy_from_user_inatomic() under ptl
> > if the first attempt fails. I expect it to be ugly.
> 
> So long as it's correct.  :)

The first attempt on the real solution is below.

Yeah, this is ugly. Any suggestion on clearing up this mess is welcome.

Jeff, could you give it a try?

diff --git a/mm/memory.c b/mm/memory.c
index 0bccc622e482..e8bfdf0d9d1d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2257,7 +2257,7 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
 	bool ret;
 	void *kaddr;
 	void __user *uaddr;
-	bool force_mkyoung;
+	bool locked = false;
 	struct vm_area_struct *vma = vmf->vma;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long addr = vmf->address;
@@ -2282,11 +2282,11 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
 	 * On architectures with software "accessed" bits, we would
 	 * take a double page fault, so mark it accessed here.
 	 */
-	force_mkyoung = arch_faults_on_old_pte() && !pte_young(vmf->orig_pte);
-	if (force_mkyoung) {
+	if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
 		pte_t entry;
 
 		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+		locked = true;
 		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
 			/*
 			 * Other thread has already handled the fault
@@ -2310,18 +2310,37 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
 	 * zeroes.
 	 */
 	if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
+		if (locked)
+			goto warn;
+
+		/* Re-validate under PTL if the page is still mapped */
+		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
+		locked = true;
+		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
+			/* The PTE changed under us. Retry page fault. */
+			ret = false;
+			goto pte_unlock;
+		}
+
 		/*
-		 * Give a warn in case there can be some obscure
-		 * use-case
+		 * The same page can be mapped back since last copy attampt.
+		 * Try to copy again under PTL.
 		 */
-		WARN_ON_ONCE(1);
-		clear_page(kaddr);
+		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
+			/*
+			 * Give a warn in case there can be some obscure
+			 * use-case
+			 */
+warn:
+			WARN_ON_ONCE(1);
+			clear_page(kaddr);
+		}
 	}
 
 	ret = true;
 
 pte_unlock:
-	if (force_mkyoung)
+	if (locked)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 	kunmap_atomic(kaddr);
 	flush_dcache_page(dst);
-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-11 22:40     ` Kirill A. Shutemov
@ 2020-02-12 14:22       ` Jeff Moyer
  2020-02-13 12:14         ` Kirill A. Shutemov
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Moyer @ 2020-02-12 14:22 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Jia He, Catalin Marinas, Kirill A. Shutemov, linux-mm

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Tue, Feb 11, 2020 at 11:27:36AM -0500, Jeff Moyer wrote:
>> > The real solution would be to retry __copy_from_user_inatomic() under ptl
>> > if the first attempt fails. I expect it to be ugly.
>> 
>> So long as it's correct.  :)
>
> The first attempt on the real solution is below.
>
> Yeah, this is ugly. Any suggestion on clearing up this mess is welcome.
>
> Jeff, could you give it a try?

Yes, that patch appears to fix the problem.  I wonder if we could remove
the clear_page completely, though.  I'd rather see the program segfault
than operate on bad data.  What do you think?

-Jeff

>
> diff --git a/mm/memory.c b/mm/memory.c
> index 0bccc622e482..e8bfdf0d9d1d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2257,7 +2257,7 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
>  	bool ret;
>  	void *kaddr;
>  	void __user *uaddr;
> -	bool force_mkyoung;
> +	bool locked = false;
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct mm_struct *mm = vma->vm_mm;
>  	unsigned long addr = vmf->address;
> @@ -2282,11 +2282,11 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
>  	 * On architectures with software "accessed" bits, we would
>  	 * take a double page fault, so mark it accessed here.
>  	 */
> -	force_mkyoung = arch_faults_on_old_pte() && !pte_young(vmf->orig_pte);
> -	if (force_mkyoung) {
> +	if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) {
>  		pte_t entry;
>  
>  		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
> +		locked = true;
>  		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
>  			/*
>  			 * Other thread has already handled the fault
> @@ -2310,18 +2310,37 @@ static inline bool cow_user_page(struct page *dst, struct page *src,
>  	 * zeroes.
>  	 */
>  	if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
> +		if (locked)
> +			goto warn;
> +
> +		/* Re-validate under PTL if the page is still mapped */
> +		vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);
> +		locked = true;
> +		if (!likely(pte_same(*vmf->pte, vmf->orig_pte))) {
> +			/* The PTE changed under us. Retry page fault. */
> +			ret = false;
> +			goto pte_unlock;
> +		}
> +
>  		/*
> -		 * Give a warn in case there can be some obscure
> -		 * use-case
> +		 * The same page can be mapped back since last copy attampt.
> +		 * Try to copy again under PTL.
>  		 */
> -		WARN_ON_ONCE(1);
> -		clear_page(kaddr);
> +		if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) {
> +			/*
> +			 * Give a warn in case there can be some obscure
> +			 * use-case
> +			 */
> +warn:
> +			WARN_ON_ONCE(1);
> +			clear_page(kaddr);
> +		}
>  	}
>  
>  	ret = true;
>  
>  pte_unlock:
> -	if (force_mkyoung)
> +	if (locked)
>  		pte_unmap_unlock(vmf->pte, vmf->ptl);
>  	kunmap_atomic(kaddr);
>  	flush_dcache_page(dst);



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-12 14:22       ` Jeff Moyer
@ 2020-02-13 12:14         ` Kirill A. Shutemov
  2020-02-14 21:07           ` Jeff Moyer
  0 siblings, 1 reply; 15+ messages in thread
From: Kirill A. Shutemov @ 2020-02-13 12:14 UTC (permalink / raw)
  To: Jeff Moyer; +Cc: Jia He, Catalin Marinas, Kirill A. Shutemov, linux-mm

On Wed, Feb 12, 2020 at 09:22:03AM -0500, Jeff Moyer wrote:
> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
> 
> > On Tue, Feb 11, 2020 at 11:27:36AM -0500, Jeff Moyer wrote:
> >> > The real solution would be to retry __copy_from_user_inatomic() under ptl
> >> > if the first attempt fails. I expect it to be ugly.
> >> 
> >> So long as it's correct.  :)
> >
> > The first attempt on the real solution is below.
> >
> > Yeah, this is ugly. Any suggestion on clearing up this mess is welcome.
> >
> > Jeff, could you give it a try?
> 
> Yes, that patch appears to fix the problem.  I wonder if we could remove
> the clear_page completely, though.  I'd rather see the program segfault
> than operate on bad data.  What do you think?

It is long standing policy: see 6aab341e0a28 ("mm: re-architect the
VM_UNPAGED logic") from 2005. Some obscure case may break if change it.

I think it is fine to live with the WARN for a while and change it to
SIGBUS once we can be relatively sure that it is okay.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared")
  2020-02-13 12:14         ` Kirill A. Shutemov
@ 2020-02-14 21:07           ` Jeff Moyer
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Moyer @ 2020-02-14 21:07 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Jia He, Catalin Marinas, Kirill A. Shutemov, linux-mm

"Kirill A. Shutemov" <kirill@shutemov.name> writes:

> On Wed, Feb 12, 2020 at 09:22:03AM -0500, Jeff Moyer wrote:
>> "Kirill A. Shutemov" <kirill@shutemov.name> writes:
>> 
>> > On Tue, Feb 11, 2020 at 11:27:36AM -0500, Jeff Moyer wrote:
>> >> > The real solution would be to retry __copy_from_user_inatomic() under ptl
>> >> > if the first attempt fails. I expect it to be ugly.
>> >> 
>> >> So long as it's correct.  :)
>> >
>> > The first attempt on the real solution is below.
>> >
>> > Yeah, this is ugly. Any suggestion on clearing up this mess is welcome.
>> >
>> > Jeff, could you give it a try?
>> 
>> Yes, that patch appears to fix the problem.  I wonder if we could remove
>> the clear_page completely, though.  I'd rather see the program segfault
>> than operate on bad data.  What do you think?
>
> It is long standing policy: see 6aab341e0a28 ("mm: re-architect the
> VM_UNPAGED logic") from 2005. Some obscure case may break if change it.

I'll take your word for it.

> I think it is fine to live with the WARN for a while and change it to
> SIGBUS once we can be relatively sure that it is okay.

OK, fine by me.

Thanks for looking into this!

-Jeff



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, back to index

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-10 22:51 bug: data corruption introduced by commit 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared") Jeff Moyer
2020-02-11  4:17 ` Justin He
2020-02-11  4:29   ` Justin He
2020-02-11 16:44     ` Jeff Moyer
2020-02-11 17:33       ` Kirill A. Shutemov
2020-02-11 17:55         ` Jeff Moyer
2020-02-11 21:44           ` Kirill A. Shutemov
2020-02-11 22:01             ` Jeff Moyer
2020-02-11 22:15               ` Kirill A. Shutemov
2020-02-11 14:51 ` Kirill A. Shutemov
2020-02-11 16:27   ` Jeff Moyer
2020-02-11 22:40     ` Kirill A. Shutemov
2020-02-12 14:22       ` Jeff Moyer
2020-02-13 12:14         ` Kirill A. Shutemov
2020-02-14 21:07           ` Jeff Moyer

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git