linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
* a crash when running strace from persistent memory
@ 2020-09-03 19:24 Mikulas Patocka
  2020-09-03 19:55 ` Linus Torvalds
  2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka
  0 siblings, 2 replies; 18+ messages in thread
From: Mikulas Patocka @ 2020-09-03 19:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Jan Kara, Andrea Arcangeli, Matthew Wilcox, Andrew Morton,
	linux-mm, linux-kernel, linux-nvdimm

Hi

There's a bug when you run strace from dax-based filesystem.

-- create real or emulated persistent memory device (/dev/pmem0)
mkfs.ext2 /dev/pmem0
-- mount it
mount -t ext2 -o dax /dev/pmem0 /mnt/test
-- copy the system to it (well, you can copy just a few files that are 
   needed for running strace and ls)
cp -ax / /mnt/test
-- bind the system directories
mount --bind /dev /mnt/test/dev
mount --bind /proc /mnt/test/proc
mount --bind /sys /mnt/test/sys
-- run strace on the ls command
chroot /mnt/test/ strace /bin/ls

You get this warning and ls is killed with SIGSEGV.

I bisected the problem and it is caused by the commit 
17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around 
"COW can break either way" issue). When I revert the patch (on the kernel 
5.9-rc3), the bug goes away.

Mikulas


[   84.190961] ------------[ cut here ]------------
[   84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6
[   84.192398] Modules linked in: ext2 uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb fbdev ipv6 tun autofs4 binfmt_misc configfs af_packet mousedev virtio_balloon virtio_rng evdev rng_core pcspkr button raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx libcrc32c raid1 raid0 md_mod sd_mod t10_pi virtio_scsi virtio_net psmouse net_failover scsi_mod failover
[   84.196301] CPU: 6 PID: 1350 Comm: strace Not tainted 5.9.0-rc3 #6
[   84.197020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   84.197685] RIP: 0010:wp_page_copy.cold+0xdb/0xf6
[   84.198231] Code: ff ff ff 0f 00 eb 8e 48 8b 3c 24 48 8b 74 24 08 ba 00 10 00 00 e8 33 87 1f 00 85 c0 74 1f 48 c7 c7 a7 2b ba 81 e8 cc b6 f0 ff <0f> 0b 48 8b 3c 24 e8 08 82 1f 00 41 be 01 00 00 00 eb ae 41 be 01
[   84.200410] RSP: 0018:ffff88940c1dba58 EFLAGS: 00010282
[   84.201035] RAX: 0000000000000006 RBX: ffff88940c1dbb00 RCX: 0000000000000000
[   84.201842] RDX: 0000000000000003 RSI: ffffffff81b9c0fa RDI: 00000000ffffffff
[   84.202650] RBP: ffffea004f0e4d80 R08: 0000000000000000 R09: 0000000000000000
[   84.203460] R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000000
[   84.204265] R13: ffff88940aa86318 R14: 00000000f7fac000 R15: ffff8893c8db3c40
[   84.205083] FS:  00007fd8a8320740(0000) GS:ffff88940fb80000(0000) knlGS:0000000000000000
[   84.206000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   84.206664] CR2: 00000000f7fac000 CR3: 00000013c93a6000 CR4: 00000000000006a0
[   84.207481] Call Trace:
[   84.207883]  do_wp_page+0x172/0x6a0
[   84.208285]  handle_mm_fault+0xd0b/0x1540
[   84.208753]  __get_user_pages+0x21a/0x6c0
[   84.209213]  __get_user_pages_remote+0xc8/0x2a0
[   84.209735]  process_vm_rw_core.isra.0+0x1ac/0x440
[   84.210318]  ? __might_fault+0x26/0x40
[   84.210758]  ? _copy_from_user+0x6a/0xa0
[   84.211208]  ? __might_fault+0x26/0x40
[   84.211642]  ? _copy_from_user+0x6a/0xa0
[   84.212091]  process_vm_rw+0xd1/0x100
[   84.212511]  ? _copy_to_user+0x69/0x80
[   84.212946]  ? ptrace_get_syscall_info+0x9b/0x180
[   84.213484]  ? find_held_lock+0x2b/0x80
[   84.213926]  ? __x64_sys_ptrace+0x106/0x140
[   84.214405]  ? fpregs_assert_state_consistent+0x19/0x40
[   84.215002]  ? exit_to_user_mode_prepare+0x2d/0x120
[   84.215556]  __x64_sys_process_vm_readv+0x22/0x40
[   84.216103]  do_syscall_64+0x2d/0x80
[   84.216518]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   84.217098] RIP: 0033:0x7fd8a84896da
[   84.217512] Code: 48 8b 15 b9 f7 0b 00 f7 d8 64 89 02 b8 ff ff ff ff eb d2 e8 18 f0 00 00 0f 1f 84 00 00 00 00 00 49 89 ca b8 36 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 81 f7 0b 00 f7
[   84.219618] RSP: 002b:00007ffd08c3c678 EFLAGS: 00000246 ORIG_RAX: 0000000000000136
[   84.220563] RAX: ffffffffffffffda RBX: 00000000f7fac000 RCX: 00007fd8a84896da
[   84.221380] RDX: 0000000000000001 RSI: 00007ffd08c3c680 RDI: 0000000000000549
[   84.222194] RBP: 00007ffd08c3c760 R08: 0000000000000001 R09: 0000000000000000
[   84.222999] R10: 00007ffd08c3c690 R11: 0000000000000246 R12: 00000000f7faca80
[   84.223804] R13: 0000000000000580 R14: 0000000000000549 R15: 00005589a50eee80
[   84.223804] R13: 0000000000000580 R14: 0000000000000549 R15: 00005589a50eee80
[   84.224612] ---[ end trace d8dbf2da5dc1b7ca ]---
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: a crash when running strace from persistent memory
  2020-09-03 19:24 a crash when running strace from persistent memory Mikulas Patocka
@ 2020-09-03 19:55 ` Linus Torvalds
  2020-09-04  8:08   ` Mikulas Patocka
  2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2020-09-03 19:55 UTC (permalink / raw)
  To: Mikulas Patocka, Peter Xu
  Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Jan Kara, Andrea Arcangeli, Matthew Wilcox, Andrew Morton,
	Linux-MM, Linux Kernel Mailing List, linux-nvdimm

On Thu, Sep 3, 2020 at 12:24 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> There's a bug when you run strace from dax-based filesystem.
>
> -- create real or emulated persistent memory device (/dev/pmem0)
> mkfs.ext2 /dev/pmem0
> -- mount it
> mount -t ext2 -o dax /dev/pmem0 /mnt/test
> -- copy the system to it (well, you can copy just a few files that are
>    needed for running strace and ls)
> cp -ax / /mnt/test
> -- bind the system directories
> mount --bind /dev /mnt/test/dev
> mount --bind /proc /mnt/test/proc
> mount --bind /sys /mnt/test/sys
> -- run strace on the ls command
> chroot /mnt/test/ strace /bin/ls
>
> You get this warning and ls is killed with SIGSEGV.
>
> I bisected the problem and it is caused by the commit
> 17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around
> "COW can break either way" issue). When I revert the patch (on the kernel
> 5.9-rc3), the bug goes away.

Funky. I really don't see how it could cause that, but we have the
UDDF issue too, so I'm guessing I will have to fix it the radical way
with Peter Xu's series based on my "rip out COW special cases" patch.

Or maybe I'm just using that as an excuse for really wanting to apply
that series.. Because we can't just revert that GUP commit due to
security concerns.

> [   84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6

I'm assuming this is the WARN_ON_ONCE(1) on line 2482, and you have
some extra debug patch that causes that line to be off by 4? Because
at least for me, line 2486 is actually an empty line in v5.9-rc3.

That said, I really think this is a pre-existing race, and all the
"COW can break either way" patch does is change the timing (presumably
due to the actual pattern of actually doing the COW changing).

See commit c3e5ea6ee574 ("mm: avoid data corruption on CoW fault into
PFN-mapped VMA") for background.

Mikulas, can you check that everything works ok for that case if you
apply Peter's series? See

    https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/

or if you have 'b4' installed, use

    b4 am 20200821234958.7896-1-peterx@redhat.com

to get the series..

                     Linus
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: a crash when running strace from persistent memory
  2020-09-03 19:55 ` Linus Torvalds
@ 2020-09-04  8:08   ` Mikulas Patocka
  2020-09-04 17:11     ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Mikulas Patocka @ 2020-09-04  8:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Peter Xu, Jann Horn, Christoph Hellwig, Oleg Nesterov,
	Kirill Shutemov, Jan Kara, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm



On Thu, 3 Sep 2020, Linus Torvalds wrote:

> On Thu, Sep 3, 2020 at 12:24 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > There's a bug when you run strace from dax-based filesystem.
> >
> > -- create real or emulated persistent memory device (/dev/pmem0)
> > mkfs.ext2 /dev/pmem0
> > -- mount it
> > mount -t ext2 -o dax /dev/pmem0 /mnt/test
> > -- copy the system to it (well, you can copy just a few files that are
> >    needed for running strace and ls)
> > cp -ax / /mnt/test
> > -- bind the system directories
> > mount --bind /dev /mnt/test/dev
> > mount --bind /proc /mnt/test/proc
> > mount --bind /sys /mnt/test/sys
> > -- run strace on the ls command
> > chroot /mnt/test/ strace /bin/ls
> >
> > You get this warning and ls is killed with SIGSEGV.
> >
> > I bisected the problem and it is caused by the commit
> > 17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around
> > "COW can break either way" issue). When I revert the patch (on the kernel
> > 5.9-rc3), the bug goes away.
> 
> Funky. I really don't see how it could cause that, but we have the
> UDDF issue too, so I'm guessing I will have to fix it the radical way
> with Peter Xu's series based on my "rip out COW special cases" patch.
> 
> Or maybe I'm just using that as an excuse for really wanting to apply
> that series.. Because we can't just revert that GUP commit due to
> security concerns.
> 
> > [   84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6
> 
> I'm assuming this is the WARN_ON_ONCE(1) on line 2482, and you have
> some extra debug patch that causes that line to be off by 4? Because
> at least for me, line 2486 is actually an empty line in v5.9-rc3.

Yes, that's it. I added a few printk to look at the control flow.

> That said, I really think this is a pre-existing race, and all the
> "COW can break either way" patch does is change the timing (presumably
> due to the actual pattern of actually doing the COW changing).
> 
> See commit c3e5ea6ee574 ("mm: avoid data corruption on CoW fault into
> PFN-mapped VMA") for background.
> 
> Mikulas, can you check that everything works ok for that case if you
> apply Peter's series? See
> 
>     https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/

I applied these four patches and strace works well. There is no longer any 
warning or crash.

Mikulas

> or if you have 'b4' installed, use
> 
>     b4 am 20200821234958.7896-1-peterx@redhat.com
> 
> to get the series..
> 
>                      Linus
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory)
  2020-09-03 19:24 a crash when running strace from persistent memory Mikulas Patocka
  2020-09-03 19:55 ` Linus Torvalds
@ 2020-09-04 16:21 ` Mikulas Patocka
  2020-09-05 12:11   ` Mikulas Patocka
  1 sibling, 1 reply; 18+ messages in thread
From: Mikulas Patocka @ 2020-09-04 16:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Jan Kara, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4



On Thu, 3 Sep 2020, Mikulas Patocka wrote:

> Hi
> 
> There's a bug when you run strace from dax-based filesystem.

Hmm, so I've found another bug in dax mode.

If you extract the Linux kernel tree on dax-based ext2 filesystem (use the 
real ext2 driver, not ext4), and then you run make twice, the second 
invocation will rebuild everything. It seems like a problem with 
timestamps.

mount -t ext2 -o dax /dev/pmem0 /mnt/ext2/
cd /mnt/ext2/usr/src/git/linux-2.6
make clean
make -j12
make -j12	<--- this rebuilds the whole tree, althought it shouldn't

I wasn't able to bisect it because this bug seems to be present in every 
kernel I tried (back to 4.16.0). Ext4 doesn't seem to have this bug.

Mikulas
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: a crash when running strace from persistent memory
  2020-09-04  8:08   ` Mikulas Patocka
@ 2020-09-04 17:11     ` Linus Torvalds
  0 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2020-09-04 17:11 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Peter Xu, Jann Horn, Christoph Hellwig, Oleg Nesterov,
	Kirill Shutemov, Jan Kara, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm

On Fri, Sep 4, 2020 at 1:08 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> I applied these four patches and strace works well. There is no longer any
> warning or crash.

Ok. I obviously approve of that series whole-heartedly, but I still
didn't want to apply it this way (and with this kind of "mid-rc"
timing).

I was hoping to just leave it for the next merge window, but there are
now two independent problems that that forced COW patch of mine
caused, and a plain revert isn't acceptable either, so I've just
applied that series to my tree despite the garbage timing.

Maybe I'm just making excuses and rationalizing because I wanted that
series anyway, and patches that remove lines in core code make me
happy, but I don't see other great alternatives.

              Linus
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory)
  2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka
@ 2020-09-05 12:11   ` Mikulas Patocka
  2020-09-05 12:12     ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka
  2020-09-05 12:13     ` [PATCH 2/2] xfs: " Mikulas Patocka
  0 siblings, 2 replies; 18+ messages in thread
From: Mikulas Patocka @ 2020-09-05 12:11 UTC (permalink / raw)
  To: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner
  Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4,
	linux-xfs



On Fri, 4 Sep 2020, Mikulas Patocka wrote:

> Hmm, so I've found another bug in dax mode.
> 
> If you extract the Linux kernel tree on dax-based ext2 filesystem (use the 
> real ext2 driver, not ext4), and then you run make twice, the second 
> invocation will rebuild everything. It seems like a problem with 
> timestamps.
> 
> mount -t ext2 -o dax /dev/pmem0 /mnt/ext2/
> cd /mnt/ext2/usr/src/git/linux-2.6
> make clean
> make -j12
> make -j12	<--- this rebuilds the whole tree, althought it shouldn't
> 
> I wasn't able to bisect it because this bug seems to be present in every 
> kernel I tried (back to 4.16.0). Ext4 doesn't seem to have this bug.
> 
> Mikulas

I've found out the root cause for this bug (XFS has it too) and I'm 
sending patches.

Mikulas
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] ext2: don't update mtime on COW faults
  2020-09-05 12:11   ` Mikulas Patocka
@ 2020-09-05 12:12     ` Mikulas Patocka
  2020-09-07  9:00       ` Jan Kara
  2020-09-05 12:13     ` [PATCH 2/2] xfs: " Mikulas Patocka
  1 sibling, 1 reply; 18+ messages in thread
From: Mikulas Patocka @ 2020-09-05 12:12 UTC (permalink / raw)
  To: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner
  Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4,
	linux-xfs

When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.

This breaks building of the Linux kernel.
How to reproduce:
1. extract the Linux kernel tree on dax-mounted ext2 filesystem
2. run make clean
3. run make -j12
4. run make -j12
- at step 4, make would incorrectly rebuild the whole kernel (although it
  was already built in step 3).

The reason for the breakage is that almost all object files depend on
objtool. When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary. The updated timestamp causes make to rebuild the whole
tree.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 fs/ext2/file.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-2.6/fs/ext2/file.c
===================================================================
--- linux-2.6.orig/fs/ext2/file.c	2020-09-05 10:01:41.000000000 +0200
+++ linux-2.6/fs/ext2/file.c	2020-09-05 13:09:50.000000000 +0200
@@ -93,8 +93,10 @@ static vm_fault_t ext2_dax_fault(struct
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	struct ext2_inode_info *ei = EXT2_I(inode);
 	vm_fault_t ret;
+	bool write = (vmf->flags & FAULT_FLAG_WRITE) &&
+		(vmf->vma->vm_flags & VM_SHARED);
 
-	if (vmf->flags & FAULT_FLAG_WRITE) {
+	if (write) {
 		sb_start_pagefault(inode->i_sb);
 		file_update_time(vmf->vma->vm_file);
 	}
@@ -103,7 +105,7 @@ static vm_fault_t ext2_dax_fault(struct
 	ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops);
 
 	up_read(&ei->dax_sem);
-	if (vmf->flags & FAULT_FLAG_WRITE)
+	if (write)
 		sb_end_pagefault(inode->i_sb);
 	return ret;
 }
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/2] xfs: don't update mtime on COW faults
  2020-09-05 12:11   ` Mikulas Patocka
  2020-09-05 12:12     ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka
@ 2020-09-05 12:13     ` Mikulas Patocka
  2020-09-05 15:36       ` Darrick J. Wong
                         ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Mikulas Patocka @ 2020-09-05 12:13 UTC (permalink / raw)
  To: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner
  Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4,
	linux-xfs

When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.

This breaks building of the Linux kernel.
How to reproduce:
1. extract the Linux kernel tree on dax-mounted xfs filesystem
2. run make clean
3. run make -j12
4. run make -j12
- at step 4, make would incorrectly rebuild the whole kernel (although it
  was already built in step 3).

The reason for the breakage is that almost all object files depend on
objtool. When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary. The updated timestamp causes make to rebuild the whole
tree.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 fs/xfs/xfs_file.c |   11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Index: linux-2.6/fs/xfs/xfs_file.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_file.c	2020-09-05 10:01:42.000000000 +0200
+++ linux-2.6/fs/xfs/xfs_file.c	2020-09-05 13:59:12.000000000 +0200
@@ -1223,6 +1223,13 @@ __xfs_filemap_fault(
 	return ret;
 }
 
+static bool
+xfs_is_write_fault(
+	struct vm_fault		*vmf)
+{
+	return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED;
+}
+
 static vm_fault_t
 xfs_filemap_fault(
 	struct vm_fault		*vmf)
@@ -1230,7 +1237,7 @@ xfs_filemap_fault(
 	/* DAX can shortcut the normal fault path on write faults! */
 	return __xfs_filemap_fault(vmf, PE_SIZE_PTE,
 			IS_DAX(file_inode(vmf->vma->vm_file)) &&
-			(vmf->flags & FAULT_FLAG_WRITE));
+			xfs_is_write_fault(vmf));
 }
 
 static vm_fault_t
@@ -1243,7 +1250,7 @@ xfs_filemap_huge_fault(
 
 	/* DAX can shortcut the normal fault path on write faults! */
 	return __xfs_filemap_fault(vmf, pe_size,
-			(vmf->flags & FAULT_FLAG_WRITE));
+			xfs_is_write_fault(vmf));
 }
 
 static vm_fault_t
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: don't update mtime on COW faults
  2020-09-05 12:13     ` [PATCH 2/2] xfs: " Mikulas Patocka
@ 2020-09-05 15:36       ` Darrick J. Wong
  2020-09-05 17:02         ` Mikulas Patocka
  2020-09-05 16:47       ` Linus Torvalds
  2020-09-07  6:47       ` [PATCH 2/2] " Christoph Hellwig
  2 siblings, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2020-09-05 15:36 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn,
	Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4,
	linux-xfs

On Sat, Sep 05, 2020 at 08:13:02AM -0400, Mikulas Patocka wrote:
> When running in a dax mode, if the user maps a page with MAP_PRIVATE and
> PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
> when the user hits a COW fault.
> 
> This breaks building of the Linux kernel.
> How to reproduce:
> 1. extract the Linux kernel tree on dax-mounted xfs filesystem
> 2. run make clean
> 3. run make -j12
> 4. run make -j12
> - at step 4, make would incorrectly rebuild the whole kernel (although it
>   was already built in step 3).
> 
> The reason for the breakage is that almost all object files depend on
> objtool. When we run objtool, it takes COW page fault on its .data
> section, and these faults will incorrectly update the timestamp of the
> objtool binary. The updated timestamp causes make to rebuild the whole
> tree.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
> 
> ---
>  fs/xfs/xfs_file.c |   11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/fs/xfs/xfs_file.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_file.c	2020-09-05 10:01:42.000000000 +0200
> +++ linux-2.6/fs/xfs/xfs_file.c	2020-09-05 13:59:12.000000000 +0200
> @@ -1223,6 +1223,13 @@ __xfs_filemap_fault(
>  	return ret;
>  }
>  
> +static bool
> +xfs_is_write_fault(

Call this xfs_is_shared_dax_write_fault, and throw in the IS_DAX() test?

You might as well make it a static inline.

> +	struct vm_fault		*vmf)
> +{
> +	return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED;

Also, is "shortcutting the normal fault path" the reason for ext2 and
xfs both being broken?

/me puzzles over why write_fault is always true for page_mkwrite and
pfn_mkwrite, but not for fault and huge_fault...

Also: Can you please turn this (checking for timestamp update behavior
wrt shared and private mapping write faults) into an fstest so we don't
mess this up again?

--D

> +}
> +
>  static vm_fault_t
>  xfs_filemap_fault(
>  	struct vm_fault		*vmf)
> @@ -1230,7 +1237,7 @@ xfs_filemap_fault(
>  	/* DAX can shortcut the normal fault path on write faults! */
>  	return __xfs_filemap_fault(vmf, PE_SIZE_PTE,
>  			IS_DAX(file_inode(vmf->vma->vm_file)) &&
> -			(vmf->flags & FAULT_FLAG_WRITE));
> +			xfs_is_write_fault(vmf));
>  }
>  
>  static vm_fault_t
> @@ -1243,7 +1250,7 @@ xfs_filemap_huge_fault(
>  
>  	/* DAX can shortcut the normal fault path on write faults! */
>  	return __xfs_filemap_fault(vmf, pe_size,
> -			(vmf->flags & FAULT_FLAG_WRITE));
> +			xfs_is_write_fault(vmf));
>  }
>  
>  static vm_fault_t
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: don't update mtime on COW faults
  2020-09-05 12:13     ` [PATCH 2/2] xfs: " Mikulas Patocka
  2020-09-05 15:36       ` Darrick J. Wong
@ 2020-09-05 16:47       ` Linus Torvalds
  2020-09-05 17:03         ` Linus Torvalds
  2020-09-05 17:04         ` [PATCH 2/2 v2] " Mikulas Patocka
  2020-09-07  6:47       ` [PATCH 2/2] " Christoph Hellwig
  2 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2020-09-05 16:47 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn,
	Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm,
	Ext4 Developers List, linux-xfs

On Sat, Sep 5, 2020 at 5:13 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> When running in a dax mode, if the user maps a page with MAP_PRIVATE and
> PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
> when the user hits a COW fault.

So your patch is obviously correct,  but at the same time I look at
the (buggy) ext2/xfs code you fixed, and I go "well, that was a really
natural mistake to make".

So I get the feeling that "yes, this was an ext2 and xfs bug, but we
kind of set those filesystems up to fail".

Could this possibly have been avoided by having nicer interfaces?

Grepping around, and doing a bit of "git blame", I note that ext4 used
to have this exact same bug too, but it was fixed three years ago in
commit fd96b8da68d3 ("ext4: fix fault handling when mounted with -o
dax,ro") and nobody at the time clearly realized it might be a
pattern.

And honestly, it's possible that the pattern came from cut-and-paste
errors, but it's equally likely that the pattern was there simply
because it was such a natural pattern and such an easy and natural
mistake to make.

Maybe it's inevitable. Some people do want (and need) the information
whether it was a write just because they care about the page table
issues (ie marking the pte dirty etc). To that kind of situation,
whether it's shared or not might not matter all that much. But to a
filesystem, a private write vs a shared write are quite different
things.

So I don't really have any suggestions, and maybe it's just what it
is, but maybe somebody has an idea for how to make it slightly less
natural to make this mistake..

But maybe just a test-case is all it takes, like Darrick suggests.

                  Linus
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: don't update mtime on COW faults
  2020-09-05 15:36       ` Darrick J. Wong
@ 2020-09-05 17:02         ` Mikulas Patocka
  2020-09-10  6:06           ` Darrick J. Wong
  0 siblings, 1 reply; 18+ messages in thread
From: Mikulas Patocka @ 2020-09-05 17:02 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn,
	Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4,
	linux-xfs



On Sat, 5 Sep 2020, Darrick J. Wong wrote:

> On Sat, Sep 05, 2020 at 08:13:02AM -0400, Mikulas Patocka wrote:
> > When running in a dax mode, if the user maps a page with MAP_PRIVATE and
> > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
> > when the user hits a COW fault.
> > 
> > This breaks building of the Linux kernel.
> > How to reproduce:
> > 1. extract the Linux kernel tree on dax-mounted xfs filesystem
> > 2. run make clean
> > 3. run make -j12
> > 4. run make -j12
> > - at step 4, make would incorrectly rebuild the whole kernel (although it
> >   was already built in step 3).
> > 
> > The reason for the breakage is that almost all object files depend on
> > objtool. When we run objtool, it takes COW page fault on its .data
> > section, and these faults will incorrectly update the timestamp of the
> > objtool binary. The updated timestamp causes make to rebuild the whole
> > tree.
> > 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Cc: stable@vger.kernel.org
> > 
> > ---
> >  fs/xfs/xfs_file.c |   11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > Index: linux-2.6/fs/xfs/xfs_file.c
> > ===================================================================
> > --- linux-2.6.orig/fs/xfs/xfs_file.c	2020-09-05 10:01:42.000000000 +0200
> > +++ linux-2.6/fs/xfs/xfs_file.c	2020-09-05 13:59:12.000000000 +0200
> > @@ -1223,6 +1223,13 @@ __xfs_filemap_fault(
> >  	return ret;
> >  }
> >  
> > +static bool
> > +xfs_is_write_fault(
> 
> Call this xfs_is_shared_dax_write_fault, and throw in the IS_DAX() test?
> 
> You might as well make it a static inline.

Yes, it is possible. I'll send a second version.

> > +	struct vm_fault		*vmf)
> > +{
> > +	return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED;
> 
> Also, is "shortcutting the normal fault path" the reason for ext2 and
> xfs both being broken?
> 
> /me puzzles over why write_fault is always true for page_mkwrite and
> pfn_mkwrite, but not for fault and huge_fault...
> 
> Also: Can you please turn this (checking for timestamp update behavior
> wrt shared and private mapping write faults) into an fstest so we don't
> mess this up again?

I've written this program that tests it - you can integrate it into your 
testsuite.

Mikulas


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>

#define FILE_NAME	"test.txt"

static struct stat st1, st2;

int main(void)
{
	int h, r;
	char *map;
	unlink(FILE_NAME);
	h = creat(FILE_NAME, 0600);
	if (h == -1) perror("creat"), exit(1);
	r = write(h, "x", 1);
	if (r != 1) perror("write"), exit(1);
	if (close(h)) perror("close"), exit(1);
	h = open(FILE_NAME, O_RDWR);
	if (h == -1) perror("open"), exit(1);

	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, h, 0);
	if (map == MAP_FAILED) perror("mmap"), exit(1);
	if (fstat(h, &st1)) perror("fstat"), exit(1);
	sleep(2);
	*map = 'y';
	if (fstat(h, &st2)) perror("fstat"), exit(1);
	if (memcmp(&st1, &st2, sizeof(struct stat))) fprintf(stderr, "BUG: COW fault changed time!\n"), exit(1);
	if (munmap(map, 4096)) perror("munmap"), exit(1);

	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, h, 0);
	if (map == MAP_FAILED) perror("mmap"), exit(1);
	if (fstat(h, &st1)) perror("fstat"), exit(1);
	sleep(2);
	*map = 'z';
	if (fstat(h, &st2)) perror("fstat"), exit(1);
	if (st1.st_mtime == st2.st_mtime) fprintf(stderr, "BUG: Shared fault did not change mtime!\n"), exit(1);
	if (st1.st_ctime == st2.st_ctime) fprintf(stderr, "BUG: Shared fault did not change ctime!\n"), exit(1);
	if (munmap(map, 4096)) perror("munmap"), exit(1);

	if (close(h)) perror("close"), exit(1);
	if (unlink(FILE_NAME)) perror("unlink"), exit(1);
	return 0;
}
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: don't update mtime on COW faults
  2020-09-05 16:47       ` Linus Torvalds
@ 2020-09-05 17:03         ` Linus Torvalds
  2020-09-07  8:59           ` Jan Kara
  2020-09-05 17:04         ` [PATCH 2/2 v2] " Mikulas Patocka
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2020-09-05 17:03 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn,
	Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm,
	Ext4 Developers List, linux-xfs

On Sat, Sep 5, 2020 at 9:47 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> So your patch is obviously correct, [..]

Oh, and I had a xfs pull request in my inbox already, so rather than
expect Darrick to do another one just for this and have Jan do one for
ext2, I just applied these two directly as "ObviouslyCorrect(tm)".

I added the "inline" as suggested by Darrick, and I also added
parenthesis around the bit tests.

Yes, I know the C precedence rules, but I just personally find the
code easier to read if I don't even have to think about it and the
different subexpressions of a logical operation are just visually very
clear. And as I was editing the patch anyway...

So that xfs helper function now looks like this

+static inline bool
+xfs_is_write_fault(
+       struct vm_fault         *vmf)
+{
+       return (vmf->flags & FAULT_FLAG_WRITE) &&
+              (vmf->vma->vm_flags & VM_SHARED);
+}

instead.

            Linus
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/2 v2] xfs: don't update mtime on COW faults
  2020-09-05 16:47       ` Linus Torvalds
  2020-09-05 17:03         ` Linus Torvalds
@ 2020-09-05 17:04         ` Mikulas Patocka
  1 sibling, 0 replies; 18+ messages in thread
From: Mikulas Patocka @ 2020-09-05 17:04 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn,
	Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm,
	Ext4 Developers List, linux-xfs

When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.

This breaks building of the Linux kernel.
How to reproduce:
1. extract the Linux kernel tree on dax-mounted xfs filesystem
2. run make clean
3. run make -j12
4. run make -j12
- at step 4, make would incorrectly rebuild the whole kernel (although it
  was already built in step 3).

The reason for the breakage is that almost all object files depend on
objtool. When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary. The updated timestamp causes make to rebuild the whole
tree.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 fs/xfs/xfs_file.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6/fs/xfs/xfs_file.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_file.c	2020-09-05 18:48:54.000000000 +0200
+++ linux-2.6/fs/xfs/xfs_file.c	2020-09-05 18:56:48.000000000 +0200
@@ -1223,14 +1223,21 @@ __xfs_filemap_fault(
 	return ret;
 }
 
+static inline bool
+xfs_is_shared_dax_write_fault(
+	struct vm_fault		*vmf)
+{
+	return IS_DAX(file_inode(vmf->vma->vm_file)) &&
+	       vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED;
+}
+
 static vm_fault_t
 xfs_filemap_fault(
 	struct vm_fault		*vmf)
 {
 	/* DAX can shortcut the normal fault path on write faults! */
 	return __xfs_filemap_fault(vmf, PE_SIZE_PTE,
-			IS_DAX(file_inode(vmf->vma->vm_file)) &&
-			(vmf->flags & FAULT_FLAG_WRITE));
+			xfs_is_shared_dax_write_fault(vmf));
 }
 
 static vm_fault_t
@@ -1243,7 +1250,7 @@ xfs_filemap_huge_fault(
 
 	/* DAX can shortcut the normal fault path on write faults! */
 	return __xfs_filemap_fault(vmf, pe_size,
-			(vmf->flags & FAULT_FLAG_WRITE));
+			xfs_is_shared_dax_write_fault(vmf));
 }
 
 static vm_fault_t
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: don't update mtime on COW faults
  2020-09-05 12:13     ` [PATCH 2/2] xfs: " Mikulas Patocka
  2020-09-05 15:36       ` Darrick J. Wong
  2020-09-05 16:47       ` Linus Torvalds
@ 2020-09-07  6:47       ` Christoph Hellwig
  2 siblings, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2020-09-07  6:47 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner,
	Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4,
	linux-xfs

> +static bool
> +xfs_is_write_fault(
> +	struct vm_fault		*vmf)
> +{
> +	return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED;
> +}

This function does not look xfs specific at all.  Why isn't it it in
fs.h?  While we're at it the name sounds rather generic, and there are
no good comments.

Maybe we just need to split FAULT_FLAG_WRITE into two and check those
instead of such crazy workarounds?
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: don't update mtime on COW faults
  2020-09-05 17:03         ` Linus Torvalds
@ 2020-09-07  8:59           ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2020-09-07  8:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mikulas Patocka, Jan Kara, Darrick J. Wong, Dave Chinner,
	Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm,
	Ext4 Developers List, linux-xfs

On Sat 05-09-20 10:03:20, Linus Torvalds wrote:
> On Sat, Sep 5, 2020 at 9:47 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > So your patch is obviously correct, [..]
> 
> Oh, and I had a xfs pull request in my inbox already, so rather than
> expect Darrick to do another one just for this and have Jan do one for
> ext2, I just applied these two directly as "ObviouslyCorrect(tm)".

OK, thanks!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] ext2: don't update mtime on COW faults
  2020-09-05 12:12     ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka
@ 2020-09-07  9:00       ` Jan Kara
  0 siblings, 0 replies; 18+ messages in thread
From: Jan Kara @ 2020-09-07  9:00 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner,
	Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4,
	linux-xfs

On Sat 05-09-20 08:12:01, Mikulas Patocka wrote:
> When running in a dax mode, if the user maps a page with MAP_PRIVATE and
> PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime
> when the user hits a COW fault.
> 
> This breaks building of the Linux kernel.
> How to reproduce:
> 1. extract the Linux kernel tree on dax-mounted ext2 filesystem
> 2. run make clean
> 3. run make -j12
> 4. run make -j12
> - at step 4, make would incorrectly rebuild the whole kernel (although it
>   was already built in step 3).
> 
> The reason for the breakage is that almost all object files depend on
> objtool. When we run objtool, it takes COW page fault on its .data
> section, and these faults will incorrectly update the timestamp of the
> objtool binary. The updated timestamp causes make to rebuild the whole
> tree.
> 
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org

Thanks. Good spotting! Linus has already merged this so nothing more to do
here.

								Honza

> 
> ---
>  fs/ext2/file.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> Index: linux-2.6/fs/ext2/file.c
> ===================================================================
> --- linux-2.6.orig/fs/ext2/file.c	2020-09-05 10:01:41.000000000 +0200
> +++ linux-2.6/fs/ext2/file.c	2020-09-05 13:09:50.000000000 +0200
> @@ -93,8 +93,10 @@ static vm_fault_t ext2_dax_fault(struct
>  	struct inode *inode = file_inode(vmf->vma->vm_file);
>  	struct ext2_inode_info *ei = EXT2_I(inode);
>  	vm_fault_t ret;
> +	bool write = (vmf->flags & FAULT_FLAG_WRITE) &&
> +		(vmf->vma->vm_flags & VM_SHARED);
>  
> -	if (vmf->flags & FAULT_FLAG_WRITE) {
> +	if (write) {
>  		sb_start_pagefault(inode->i_sb);
>  		file_update_time(vmf->vma->vm_file);
>  	}
> @@ -103,7 +105,7 @@ static vm_fault_t ext2_dax_fault(struct
>  	ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops);
>  
>  	up_read(&ei->dax_sem);
> -	if (vmf->flags & FAULT_FLAG_WRITE)
> +	if (write)
>  		sb_end_pagefault(inode->i_sb);
>  	return ret;
>  }
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: don't update mtime on COW faults
  2020-09-05 17:02         ` Mikulas Patocka
@ 2020-09-10  6:06           ` Darrick J. Wong
  2020-09-11 16:41             ` Mikulas Patocka
  0 siblings, 1 reply; 18+ messages in thread
From: Darrick J. Wong @ 2020-09-10  6:06 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn,
	Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4,
	linux-xfs, Eric Sandeen

On Sat, Sep 05, 2020 at 01:02:33PM -0400, Mikulas Patocka wrote:
> 
> 
> On Sat, 5 Sep 2020, Darrick J. Wong wrote:
> 
> > On Sat, Sep 05, 2020 at 08:13:02AM -0400, Mikulas Patocka wrote:
> > > When running in a dax mode, if the user maps a page with MAP_PRIVATE and
> > > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
> > > when the user hits a COW fault.
> > > 
> > > This breaks building of the Linux kernel.
> > > How to reproduce:
> > > 1. extract the Linux kernel tree on dax-mounted xfs filesystem
> > > 2. run make clean
> > > 3. run make -j12
> > > 4. run make -j12
> > > - at step 4, make would incorrectly rebuild the whole kernel (although it
> > >   was already built in step 3).
> > > 
> > > The reason for the breakage is that almost all object files depend on
> > > objtool. When we run objtool, it takes COW page fault on its .data
> > > section, and these faults will incorrectly update the timestamp of the
> > > objtool binary. The updated timestamp causes make to rebuild the whole
> > > tree.
> > > 
> > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > > Cc: stable@vger.kernel.org
> > > 
> > > ---
> > >  fs/xfs/xfs_file.c |   11 +++++++++--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > 
> > > Index: linux-2.6/fs/xfs/xfs_file.c
> > > ===================================================================
> > > --- linux-2.6.orig/fs/xfs/xfs_file.c	2020-09-05 10:01:42.000000000 +0200
> > > +++ linux-2.6/fs/xfs/xfs_file.c	2020-09-05 13:59:12.000000000 +0200
> > > @@ -1223,6 +1223,13 @@ __xfs_filemap_fault(
> > >  	return ret;
> > >  }
> > >  
> > > +static bool
> > > +xfs_is_write_fault(
> > 
> > Call this xfs_is_shared_dax_write_fault, and throw in the IS_DAX() test?
> > 
> > You might as well make it a static inline.
> 
> Yes, it is possible. I'll send a second version.
> 
> > > +	struct vm_fault		*vmf)
> > > +{
> > > +	return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED;
> > 
> > Also, is "shortcutting the normal fault path" the reason for ext2 and
> > xfs both being broken?
> > 
> > /me puzzles over why write_fault is always true for page_mkwrite and
> > pfn_mkwrite, but not for fault and huge_fault...
> > 
> > Also: Can you please turn this (checking for timestamp update behavior
> > wrt shared and private mapping write faults) into an fstest so we don't
> > mess this up again?
> 
> I've written this program that tests it - you can integrate it into your 
> testsuite.

I don't get it.  You're a filesystem maintainer too, which means you're
a regular contributor.  Do you:

(a) not use fstests?  If you don't, I really hope you use something else
to QA hpfs.

(b) really think that it's my problem to integrate and submit your
regression tests for you?

> Mikulas
> 
> 
> #include <stdio.h>

and (c) what do you want me to do with a piece of code that has no
signoff tag, no copyright, and no license?  This is your patch, and
therefore your responsibility to develop enough of an appropriate
regression test in a proper form that the rest of us can easily
determine we have the rights to contribute to it.

I don't have a problem with helping to tweak a properly licensed and
tagged test program into fstests, but this is a non-starter.

--D

> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <string.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> 
> #define FILE_NAME	"test.txt"
> 
> static struct stat st1, st2;
> 
> int main(void)
> {
> 	int h, r;
> 	char *map;
> 	unlink(FILE_NAME);
> 	h = creat(FILE_NAME, 0600);
> 	if (h == -1) perror("creat"), exit(1);
> 	r = write(h, "x", 1);
> 	if (r != 1) perror("write"), exit(1);
> 	if (close(h)) perror("close"), exit(1);
> 	h = open(FILE_NAME, O_RDWR);
> 	if (h == -1) perror("open"), exit(1);
> 
> 	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, h, 0);
> 	if (map == MAP_FAILED) perror("mmap"), exit(1);
> 	if (fstat(h, &st1)) perror("fstat"), exit(1);
> 	sleep(2);
> 	*map = 'y';
> 	if (fstat(h, &st2)) perror("fstat"), exit(1);
> 	if (memcmp(&st1, &st2, sizeof(struct stat))) fprintf(stderr, "BUG: COW fault changed time!\n"), exit(1);
> 	if (munmap(map, 4096)) perror("munmap"), exit(1);
> 
> 	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, h, 0);
> 	if (map == MAP_FAILED) perror("mmap"), exit(1);
> 	if (fstat(h, &st1)) perror("fstat"), exit(1);
> 	sleep(2);
> 	*map = 'z';
> 	if (fstat(h, &st2)) perror("fstat"), exit(1);
> 	if (st1.st_mtime == st2.st_mtime) fprintf(stderr, "BUG: Shared fault did not change mtime!\n"), exit(1);
> 	if (st1.st_ctime == st2.st_ctime) fprintf(stderr, "BUG: Shared fault did not change ctime!\n"), exit(1);
> 	if (munmap(map, 4096)) perror("munmap"), exit(1);
> 
> 	if (close(h)) perror("close"), exit(1);
> 	if (unlink(FILE_NAME)) perror("unlink"), exit(1);
> 	return 0;
> }
> 
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] xfs: don't update mtime on COW faults
  2020-09-10  6:06           ` Darrick J. Wong
@ 2020-09-11 16:41             ` Mikulas Patocka
  0 siblings, 0 replies; 18+ messages in thread
From: Mikulas Patocka @ 2020-09-11 16:41 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn,
	Christoph Hellwig, Oleg Nesterov, Kirill Shutemov,
	Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox,
	Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4,
	linux-xfs, Eric Sandeen



On Wed, 9 Sep 2020, Darrick J. Wong wrote:

> On Sat, Sep 05, 2020 at 01:02:33PM -0400, Mikulas Patocka wrote:
> > > > 
> > 
> > I've written this program that tests it - you can integrate it into your 
> > testsuite.
> 
> I don't get it.  You're a filesystem maintainer too, which means you're
> a regular contributor.  Do you:
> 
> (a) not use fstests?  If you don't, I really hope you use something else
> to QA hpfs.

I don't use xfstests on HPFS. I was testing it just by using it. Now I use 
it just a little, but I don't modify it much.

> (b) really think that it's my problem to integrate and submit your
> regression tests for you?
> 
> and (c) what do you want me to do with a piece of code that has no 
> signoff tag, no copyright, and no license?  This is your patch, and 
> therefore your responsibility to develop enough of an appropriate 
> regression test in a proper form that the rest of us can easily 
> determine we have the rights to contribute to it.


If you want a full patch (I copied the script from test 313), I send it 
here.

Mikulas


From: Mikulas Patocka <mpatocka@redhat.com>
Subject: [PATCH] check ctime and mtime vs mmap

Check ctime and mtime are not updated on COW faults
and that they are updated on shared faults

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 src/Makefile          |    3 +-
 src/mmap-timestamp.c  |   53 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/generic/609     |   40 +++++++++++++++++++++++++++++++++++++
 tests/generic/609.out |    2 +
 tests/generic/group   |    1 
 5 files changed, 98 insertions(+), 1 deletion(-)

Index: xfstests-dev/src/Makefile
===================================================================
--- xfstests-dev.orig/src/Makefile	2020-09-06 12:38:40.000000000 +0200
+++ xfstests-dev/src/Makefile	2020-09-11 17:39:04.000000000 +0200
@@ -17,7 +17,8 @@ TARGETS = dirstress fill fill2 getpagesi
 	t_mmap_cow_race t_mmap_fallocate fsync-err t_mmap_write_ro \
 	t_ext4_dax_journal_corruption t_ext4_dax_inline_corruption \
 	t_ofd_locks t_mmap_collision mmap-write-concurrent \
-	t_get_file_time t_create_short_dirs t_create_long_dirs
+	t_get_file_time t_create_short_dirs t_create_long_dirs \
+	mmap-timestamp
 
 LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
 	preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest \
Index: xfstests-dev/src/mmap-timestamp.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ xfstests-dev/src/mmap-timestamp.c	2020-09-11 18:21:40.000000000 +0200
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (C) 2020 Red Hat, Inc. All Rights reserved.
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <fcntl.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+
+#define FILE_NAME	argv[1]
+
+static struct stat st1, st2;
+
+int main(int argc, char *argv[])
+{
+	int h, r;
+	char *map;
+	unlink(FILE_NAME);
+	h = creat(FILE_NAME, 0600);
+	if (h == -1) perror("creat"), exit(1);
+	r = write(h, "x", 1);
+	if (r != 1) perror("write"), exit(1);
+	if (close(h)) perror("close"), exit(1);
+	h = open(FILE_NAME, O_RDWR);
+	if (h == -1) perror("open"), exit(1);
+
+	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, h, 0);
+	if (map == MAP_FAILED) perror("mmap"), exit(1);
+	if (fstat(h, &st1)) perror("fstat"), exit(1);
+	sleep(2);
+	*map = 'y';
+	if (fstat(h, &st2)) perror("fstat"), exit(1);
+	if (st1.st_mtime != st2.st_mtime) fprintf(stderr, "BUG: COW fault changed mtime!\n"), exit(1);
+	if (st1.st_ctime != st2.st_ctime) fprintf(stderr, "BUG: COW fault changed ctime!\n"), exit(1);
+	if (munmap(map, 4096)) perror("munmap"), exit(1);
+
+	map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, h, 0);
+	if (map == MAP_FAILED) perror("mmap"), exit(1);
+	if (fstat(h, &st1)) perror("fstat"), exit(1);
+	sleep(2);
+	*map = 'z';
+	if (msync(map, 4096, MS_SYNC)) perror("msync"), exit(1);
+	if (fstat(h, &st2)) perror("fstat"), exit(1);
+	if (st1.st_mtime == st2.st_mtime) fprintf(stderr, "BUG: Shared fault did not change mtime!\n"), exit(1);
+	if (st1.st_ctime == st2.st_ctime) fprintf(stderr, "BUG: Shared fault did not change ctime!\n"), exit(1);
+	if (munmap(map, 4096)) perror("munmap"), exit(1);
+
+	if (close(h)) perror("close"), exit(1);
+	if (unlink(FILE_NAME)) perror("unlink"), exit(1);
+	return 0;
+}
Index: xfstests-dev/tests/generic/609
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ xfstests-dev/tests/generic/609	2020-09-11 18:30:30.000000000 +0200
@@ -0,0 +1,40 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2020 Red Hat, Inc. All Rights Reserved.
+#
+# FS QA Test No. 609
+#
+# Check ctime and mtime are not updated on COW faults
+# and that they are updated on shared faults
+#
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+    cd /
+    rm -f $testfile
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+
+# real QA test starts here
+_supported_fs generic
+_supported_os Linux
+_require_test
+
+testfile=$TEST_DIR/testfile.$seq
+
+echo "Silence is golden"
+
+$here/src/mmap-timestamp $testfile 2>&1
+
+status=0
+exit
Index: xfstests-dev/tests/generic/609.out
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ xfstests-dev/tests/generic/609.out	2020-09-11 18:24:24.000000000 +0200
@@ -0,0 +1,2 @@
+QA output created by 609
+Silence is golden
Index: xfstests-dev/tests/generic/group
===================================================================
--- xfstests-dev.orig/tests/generic/group	2020-09-06 12:38:40.000000000 +0200
+++ xfstests-dev/tests/generic/group	2020-09-11 18:25:09.000000000 +0200
@@ -611,3 +611,4 @@
 606 auto attr quick dax
 607 auto attr quick dax
 608 auto attr quick dax
+609 auto quick
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-09-11 16:41 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-03 19:24 a crash when running strace from persistent memory Mikulas Patocka
2020-09-03 19:55 ` Linus Torvalds
2020-09-04  8:08   ` Mikulas Patocka
2020-09-04 17:11     ` Linus Torvalds
2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka
2020-09-05 12:11   ` Mikulas Patocka
2020-09-05 12:12     ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka
2020-09-07  9:00       ` Jan Kara
2020-09-05 12:13     ` [PATCH 2/2] xfs: " Mikulas Patocka
2020-09-05 15:36       ` Darrick J. Wong
2020-09-05 17:02         ` Mikulas Patocka
2020-09-10  6:06           ` Darrick J. Wong
2020-09-11 16:41             ` Mikulas Patocka
2020-09-05 16:47       ` Linus Torvalds
2020-09-05 17:03         ` Linus Torvalds
2020-09-07  8:59           ` Jan Kara
2020-09-05 17:04         ` [PATCH 2/2 v2] " Mikulas Patocka
2020-09-07  6:47       ` [PATCH 2/2] " Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).