* a crash when running strace from persistent memory @ 2020-09-03 19:24 Mikulas Patocka 2020-09-03 19:55 ` Linus Torvalds 2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka 0 siblings, 2 replies; 18+ messages in thread From: Mikulas Patocka @ 2020-09-03 19:24 UTC (permalink / raw) To: Linus Torvalds Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Jan Kara, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm Hi There's a bug when you run strace from dax-based filesystem. -- create real or emulated persistent memory device (/dev/pmem0) mkfs.ext2 /dev/pmem0 -- mount it mount -t ext2 -o dax /dev/pmem0 /mnt/test -- copy the system to it (well, you can copy just a few files that are needed for running strace and ls) cp -ax / /mnt/test -- bind the system directories mount --bind /dev /mnt/test/dev mount --bind /proc /mnt/test/proc mount --bind /sys /mnt/test/sys -- run strace on the ls command chroot /mnt/test/ strace /bin/ls You get this warning and ls is killed with SIGSEGV. I bisected the problem and it is caused by the commit 17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around "COW can break either way" issue). When I revert the patch (on the kernel 5.9-rc3), the bug goes away. Mikulas [ 84.190961] ------------[ cut here ]------------ [ 84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6 [ 84.192398] Modules linked in: ext2 uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb fbdev ipv6 tun autofs4 binfmt_misc configfs af_packet mousedev virtio_balloon virtio_rng evdev rng_core pcspkr button raid10 raid456 async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx libcrc32c raid1 raid0 md_mod sd_mod t10_pi virtio_scsi virtio_net psmouse net_failover scsi_mod failover [ 84.196301] CPU: 6 PID: 1350 Comm: strace Not tainted 5.9.0-rc3 #6 [ 84.197020] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 84.197685] RIP: 0010:wp_page_copy.cold+0xdb/0xf6 [ 84.198231] Code: ff ff ff 0f 00 eb 8e 48 8b 3c 24 48 8b 74 24 08 ba 00 10 00 00 e8 33 87 1f 00 85 c0 74 1f 48 c7 c7 a7 2b ba 81 e8 cc b6 f0 ff <0f> 0b 48 8b 3c 24 e8 08 82 1f 00 41 be 01 00 00 00 eb ae 41 be 01 [ 84.200410] RSP: 0018:ffff88940c1dba58 EFLAGS: 00010282 [ 84.201035] RAX: 0000000000000006 RBX: ffff88940c1dbb00 RCX: 0000000000000000 [ 84.201842] RDX: 0000000000000003 RSI: ffffffff81b9c0fa RDI: 00000000ffffffff [ 84.202650] RBP: ffffea004f0e4d80 R08: 0000000000000000 R09: 0000000000000000 [ 84.203460] R10: 0000000000000046 R11: 0000000000000000 R12: 0000000000000000 [ 84.204265] R13: ffff88940aa86318 R14: 00000000f7fac000 R15: ffff8893c8db3c40 [ 84.205083] FS: 00007fd8a8320740(0000) GS:ffff88940fb80000(0000) knlGS:0000000000000000 [ 84.206000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 84.206664] CR2: 00000000f7fac000 CR3: 00000013c93a6000 CR4: 00000000000006a0 [ 84.207481] Call Trace: [ 84.207883] do_wp_page+0x172/0x6a0 [ 84.208285] handle_mm_fault+0xd0b/0x1540 [ 84.208753] __get_user_pages+0x21a/0x6c0 [ 84.209213] __get_user_pages_remote+0xc8/0x2a0 [ 84.209735] process_vm_rw_core.isra.0+0x1ac/0x440 [ 84.210318] ? __might_fault+0x26/0x40 [ 84.210758] ? _copy_from_user+0x6a/0xa0 [ 84.211208] ? __might_fault+0x26/0x40 [ 84.211642] ? _copy_from_user+0x6a/0xa0 [ 84.212091] process_vm_rw+0xd1/0x100 [ 84.212511] ? _copy_to_user+0x69/0x80 [ 84.212946] ? ptrace_get_syscall_info+0x9b/0x180 [ 84.213484] ? find_held_lock+0x2b/0x80 [ 84.213926] ? __x64_sys_ptrace+0x106/0x140 [ 84.214405] ? fpregs_assert_state_consistent+0x19/0x40 [ 84.215002] ? exit_to_user_mode_prepare+0x2d/0x120 [ 84.215556] __x64_sys_process_vm_readv+0x22/0x40 [ 84.216103] do_syscall_64+0x2d/0x80 [ 84.216518] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 84.217098] RIP: 0033:0x7fd8a84896da [ 84.217512] Code: 48 8b 15 b9 f7 0b 00 f7 d8 64 89 02 b8 ff ff ff ff eb d2 e8 18 f0 00 00 0f 1f 84 00 00 00 00 00 49 89 ca b8 36 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 81 f7 0b 00 f7 [ 84.219618] RSP: 002b:00007ffd08c3c678 EFLAGS: 00000246 ORIG_RAX: 0000000000000136 [ 84.220563] RAX: ffffffffffffffda RBX: 00000000f7fac000 RCX: 00007fd8a84896da [ 84.221380] RDX: 0000000000000001 RSI: 00007ffd08c3c680 RDI: 0000000000000549 [ 84.222194] RBP: 00007ffd08c3c760 R08: 0000000000000001 R09: 0000000000000000 [ 84.222999] R10: 00007ffd08c3c690 R11: 0000000000000246 R12: 00000000f7faca80 [ 84.223804] R13: 0000000000000580 R14: 0000000000000549 R15: 00005589a50eee80 [ 84.223804] R13: 0000000000000580 R14: 0000000000000549 R15: 00005589a50eee80 [ 84.224612] ---[ end trace d8dbf2da5dc1b7ca ]--- _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: a crash when running strace from persistent memory 2020-09-03 19:24 a crash when running strace from persistent memory Mikulas Patocka @ 2020-09-03 19:55 ` Linus Torvalds 2020-09-04 8:08 ` Mikulas Patocka 2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka 1 sibling, 1 reply; 18+ messages in thread From: Linus Torvalds @ 2020-09-03 19:55 UTC (permalink / raw) To: Mikulas Patocka, Peter Xu Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Jan Kara, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm On Thu, Sep 3, 2020 at 12:24 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > > There's a bug when you run strace from dax-based filesystem. > > -- create real or emulated persistent memory device (/dev/pmem0) > mkfs.ext2 /dev/pmem0 > -- mount it > mount -t ext2 -o dax /dev/pmem0 /mnt/test > -- copy the system to it (well, you can copy just a few files that are > needed for running strace and ls) > cp -ax / /mnt/test > -- bind the system directories > mount --bind /dev /mnt/test/dev > mount --bind /proc /mnt/test/proc > mount --bind /sys /mnt/test/sys > -- run strace on the ls command > chroot /mnt/test/ strace /bin/ls > > You get this warning and ls is killed with SIGSEGV. > > I bisected the problem and it is caused by the commit > 17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around > "COW can break either way" issue). When I revert the patch (on the kernel > 5.9-rc3), the bug goes away. Funky. I really don't see how it could cause that, but we have the UDDF issue too, so I'm guessing I will have to fix it the radical way with Peter Xu's series based on my "rip out COW special cases" patch. Or maybe I'm just using that as an excuse for really wanting to apply that series.. Because we can't just revert that GUP commit due to security concerns. > [ 84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6 I'm assuming this is the WARN_ON_ONCE(1) on line 2482, and you have some extra debug patch that causes that line to be off by 4? Because at least for me, line 2486 is actually an empty line in v5.9-rc3. That said, I really think this is a pre-existing race, and all the "COW can break either way" patch does is change the timing (presumably due to the actual pattern of actually doing the COW changing). See commit c3e5ea6ee574 ("mm: avoid data corruption on CoW fault into PFN-mapped VMA") for background. Mikulas, can you check that everything works ok for that case if you apply Peter's series? See https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/ or if you have 'b4' installed, use b4 am 20200821234958.7896-1-peterx@redhat.com to get the series.. Linus _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: a crash when running strace from persistent memory 2020-09-03 19:55 ` Linus Torvalds @ 2020-09-04 8:08 ` Mikulas Patocka 2020-09-04 17:11 ` Linus Torvalds 0 siblings, 1 reply; 18+ messages in thread From: Mikulas Patocka @ 2020-09-04 8:08 UTC (permalink / raw) To: Linus Torvalds Cc: Peter Xu, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Jan Kara, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm On Thu, 3 Sep 2020, Linus Torvalds wrote: > On Thu, Sep 3, 2020 at 12:24 PM Mikulas Patocka <mpatocka@redhat.com> wrote: > > > > There's a bug when you run strace from dax-based filesystem. > > > > -- create real or emulated persistent memory device (/dev/pmem0) > > mkfs.ext2 /dev/pmem0 > > -- mount it > > mount -t ext2 -o dax /dev/pmem0 /mnt/test > > -- copy the system to it (well, you can copy just a few files that are > > needed for running strace and ls) > > cp -ax / /mnt/test > > -- bind the system directories > > mount --bind /dev /mnt/test/dev > > mount --bind /proc /mnt/test/proc > > mount --bind /sys /mnt/test/sys > > -- run strace on the ls command > > chroot /mnt/test/ strace /bin/ls > > > > You get this warning and ls is killed with SIGSEGV. > > > > I bisected the problem and it is caused by the commit > > 17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around > > "COW can break either way" issue). When I revert the patch (on the kernel > > 5.9-rc3), the bug goes away. > > Funky. I really don't see how it could cause that, but we have the > UDDF issue too, so I'm guessing I will have to fix it the radical way > with Peter Xu's series based on my "rip out COW special cases" patch. > > Or maybe I'm just using that as an excuse for really wanting to apply > that series.. Because we can't just revert that GUP commit due to > security concerns. > > > [ 84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6 > > I'm assuming this is the WARN_ON_ONCE(1) on line 2482, and you have > some extra debug patch that causes that line to be off by 4? Because > at least for me, line 2486 is actually an empty line in v5.9-rc3. Yes, that's it. I added a few printk to look at the control flow. > That said, I really think this is a pre-existing race, and all the > "COW can break either way" patch does is change the timing (presumably > due to the actual pattern of actually doing the COW changing). > > See commit c3e5ea6ee574 ("mm: avoid data corruption on CoW fault into > PFN-mapped VMA") for background. > > Mikulas, can you check that everything works ok for that case if you > apply Peter's series? See > > https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@redhat.com/ I applied these four patches and strace works well. There is no longer any warning or crash. Mikulas > or if you have 'b4' installed, use > > b4 am 20200821234958.7896-1-peterx@redhat.com > > to get the series.. > > Linus > _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: a crash when running strace from persistent memory 2020-09-04 8:08 ` Mikulas Patocka @ 2020-09-04 17:11 ` Linus Torvalds 0 siblings, 0 replies; 18+ messages in thread From: Linus Torvalds @ 2020-09-04 17:11 UTC (permalink / raw) To: Mikulas Patocka Cc: Peter Xu, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Jan Kara, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm On Fri, Sep 4, 2020 at 1:08 AM Mikulas Patocka <mpatocka@redhat.com> wrote: > > I applied these four patches and strace works well. There is no longer any > warning or crash. Ok. I obviously approve of that series whole-heartedly, but I still didn't want to apply it this way (and with this kind of "mid-rc" timing). I was hoping to just leave it for the next merge window, but there are now two independent problems that that forced COW patch of mine caused, and a plain revert isn't acceptable either, so I've just applied that series to my tree despite the garbage timing. Maybe I'm just making excuses and rationalizing because I wanted that series anyway, and patches that remove lines in core code make me happy, but I don't see other great alternatives. Linus _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) 2020-09-03 19:24 a crash when running strace from persistent memory Mikulas Patocka 2020-09-03 19:55 ` Linus Torvalds @ 2020-09-04 16:21 ` Mikulas Patocka 2020-09-05 12:11 ` Mikulas Patocka 1 sibling, 1 reply; 18+ messages in thread From: Mikulas Patocka @ 2020-09-04 16:21 UTC (permalink / raw) To: Linus Torvalds Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Jan Kara, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4 On Thu, 3 Sep 2020, Mikulas Patocka wrote: > Hi > > There's a bug when you run strace from dax-based filesystem. Hmm, so I've found another bug in dax mode. If you extract the Linux kernel tree on dax-based ext2 filesystem (use the real ext2 driver, not ext4), and then you run make twice, the second invocation will rebuild everything. It seems like a problem with timestamps. mount -t ext2 -o dax /dev/pmem0 /mnt/ext2/ cd /mnt/ext2/usr/src/git/linux-2.6 make clean make -j12 make -j12 <--- this rebuilds the whole tree, althought it shouldn't I wasn't able to bisect it because this bug seems to be present in every kernel I tried (back to 4.16.0). Ext4 doesn't seem to have this bug. Mikulas _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) 2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka @ 2020-09-05 12:11 ` Mikulas Patocka 2020-09-05 12:12 ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka 0 siblings, 2 replies; 18+ messages in thread From: Mikulas Patocka @ 2020-09-05 12:11 UTC (permalink / raw) To: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs On Fri, 4 Sep 2020, Mikulas Patocka wrote: > Hmm, so I've found another bug in dax mode. > > If you extract the Linux kernel tree on dax-based ext2 filesystem (use the > real ext2 driver, not ext4), and then you run make twice, the second > invocation will rebuild everything. It seems like a problem with > timestamps. > > mount -t ext2 -o dax /dev/pmem0 /mnt/ext2/ > cd /mnt/ext2/usr/src/git/linux-2.6 > make clean > make -j12 > make -j12 <--- this rebuilds the whole tree, althought it shouldn't > > I wasn't able to bisect it because this bug seems to be present in every > kernel I tried (back to 4.16.0). Ext4 doesn't seem to have this bug. > > Mikulas I've found out the root cause for this bug (XFS has it too) and I'm sending patches. Mikulas _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/2] ext2: don't update mtime on COW faults 2020-09-05 12:11 ` Mikulas Patocka @ 2020-09-05 12:12 ` Mikulas Patocka 2020-09-07 9:00 ` Jan Kara 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka 1 sibling, 1 reply; 18+ messages in thread From: Mikulas Patocka @ 2020-09-05 12:12 UTC (permalink / raw) To: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs When running in a dax mode, if the user maps a page with MAP_PRIVATE and PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime when the user hits a COW fault. This breaks building of the Linux kernel. How to reproduce: 1. extract the Linux kernel tree on dax-mounted ext2 filesystem 2. run make clean 3. run make -j12 4. run make -j12 - at step 4, make would incorrectly rebuild the whole kernel (although it was already built in step 3). The reason for the breakage is that almost all object files depend on objtool. When we run objtool, it takes COW page fault on its .data section, and these faults will incorrectly update the timestamp of the objtool binary. The updated timestamp causes make to rebuild the whole tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org --- fs/ext2/file.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) Index: linux-2.6/fs/ext2/file.c =================================================================== --- linux-2.6.orig/fs/ext2/file.c 2020-09-05 10:01:41.000000000 +0200 +++ linux-2.6/fs/ext2/file.c 2020-09-05 13:09:50.000000000 +0200 @@ -93,8 +93,10 @@ static vm_fault_t ext2_dax_fault(struct struct inode *inode = file_inode(vmf->vma->vm_file); struct ext2_inode_info *ei = EXT2_I(inode); vm_fault_t ret; + bool write = (vmf->flags & FAULT_FLAG_WRITE) && + (vmf->vma->vm_flags & VM_SHARED); - if (vmf->flags & FAULT_FLAG_WRITE) { + if (write) { sb_start_pagefault(inode->i_sb); file_update_time(vmf->vma->vm_file); } @@ -103,7 +105,7 @@ static vm_fault_t ext2_dax_fault(struct ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops); up_read(&ei->dax_sem); - if (vmf->flags & FAULT_FLAG_WRITE) + if (write) sb_end_pagefault(inode->i_sb); return ret; } _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] ext2: don't update mtime on COW faults 2020-09-05 12:12 ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka @ 2020-09-07 9:00 ` Jan Kara 0 siblings, 0 replies; 18+ messages in thread From: Jan Kara @ 2020-09-07 9:00 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs On Sat 05-09-20 08:12:01, Mikulas Patocka wrote: > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime > when the user hits a COW fault. > > This breaks building of the Linux kernel. > How to reproduce: > 1. extract the Linux kernel tree on dax-mounted ext2 filesystem > 2. run make clean > 3. run make -j12 > 4. run make -j12 > - at step 4, make would incorrectly rebuild the whole kernel (although it > was already built in step 3). > > The reason for the breakage is that almost all object files depend on > objtool. When we run objtool, it takes COW page fault on its .data > section, and these faults will incorrectly update the timestamp of the > objtool binary. The updated timestamp causes make to rebuild the whole > tree. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Cc: stable@vger.kernel.org Thanks. Good spotting! Linus has already merged this so nothing more to do here. Honza > > --- > fs/ext2/file.c | 6 ++++-- > 1 file changed, 4 insertions(+), 2 deletions(-) > > Index: linux-2.6/fs/ext2/file.c > =================================================================== > --- linux-2.6.orig/fs/ext2/file.c 2020-09-05 10:01:41.000000000 +0200 > +++ linux-2.6/fs/ext2/file.c 2020-09-05 13:09:50.000000000 +0200 > @@ -93,8 +93,10 @@ static vm_fault_t ext2_dax_fault(struct > struct inode *inode = file_inode(vmf->vma->vm_file); > struct ext2_inode_info *ei = EXT2_I(inode); > vm_fault_t ret; > + bool write = (vmf->flags & FAULT_FLAG_WRITE) && > + (vmf->vma->vm_flags & VM_SHARED); > > - if (vmf->flags & FAULT_FLAG_WRITE) { > + if (write) { > sb_start_pagefault(inode->i_sb); > file_update_time(vmf->vma->vm_file); > } > @@ -103,7 +105,7 @@ static vm_fault_t ext2_dax_fault(struct > ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops); > > up_read(&ei->dax_sem); > - if (vmf->flags & FAULT_FLAG_WRITE) > + if (write) > sb_end_pagefault(inode->i_sb); > return ret; > } > -- Jan Kara <jack@suse.com> SUSE Labs, CR _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 12:11 ` Mikulas Patocka 2020-09-05 12:12 ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka @ 2020-09-05 12:13 ` Mikulas Patocka 2020-09-05 15:36 ` Darrick J. Wong ` (2 more replies) 1 sibling, 3 replies; 18+ messages in thread From: Mikulas Patocka @ 2020-09-05 12:13 UTC (permalink / raw) To: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner Cc: Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs When running in a dax mode, if the user maps a page with MAP_PRIVATE and PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime when the user hits a COW fault. This breaks building of the Linux kernel. How to reproduce: 1. extract the Linux kernel tree on dax-mounted xfs filesystem 2. run make clean 3. run make -j12 4. run make -j12 - at step 4, make would incorrectly rebuild the whole kernel (although it was already built in step 3). The reason for the breakage is that almost all object files depend on objtool. When we run objtool, it takes COW page fault on its .data section, and these faults will incorrectly update the timestamp of the objtool binary. The updated timestamp causes make to rebuild the whole tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org --- fs/xfs/xfs_file.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) Index: linux-2.6/fs/xfs/xfs_file.c =================================================================== --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 10:01:42.000000000 +0200 +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 13:59:12.000000000 +0200 @@ -1223,6 +1223,13 @@ __xfs_filemap_fault( return ret; } +static bool +xfs_is_write_fault( + struct vm_fault *vmf) +{ + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; +} + static vm_fault_t xfs_filemap_fault( struct vm_fault *vmf) @@ -1230,7 +1237,7 @@ xfs_filemap_fault( /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, PE_SIZE_PTE, IS_DAX(file_inode(vmf->vma->vm_file)) && - (vmf->flags & FAULT_FLAG_WRITE)); + xfs_is_write_fault(vmf)); } static vm_fault_t @@ -1243,7 +1250,7 @@ xfs_filemap_huge_fault( /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, pe_size, - (vmf->flags & FAULT_FLAG_WRITE)); + xfs_is_write_fault(vmf)); } static vm_fault_t _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka @ 2020-09-05 15:36 ` Darrick J. Wong 2020-09-05 17:02 ` Mikulas Patocka 2020-09-05 16:47 ` Linus Torvalds 2020-09-07 6:47 ` [PATCH 2/2] " Christoph Hellwig 2 siblings, 1 reply; 18+ messages in thread From: Darrick J. Wong @ 2020-09-05 15:36 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs On Sat, Sep 05, 2020 at 08:13:02AM -0400, Mikulas Patocka wrote: > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime > when the user hits a COW fault. > > This breaks building of the Linux kernel. > How to reproduce: > 1. extract the Linux kernel tree on dax-mounted xfs filesystem > 2. run make clean > 3. run make -j12 > 4. run make -j12 > - at step 4, make would incorrectly rebuild the whole kernel (although it > was already built in step 3). > > The reason for the breakage is that almost all object files depend on > objtool. When we run objtool, it takes COW page fault on its .data > section, and these faults will incorrectly update the timestamp of the > objtool binary. The updated timestamp causes make to rebuild the whole > tree. > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > Cc: stable@vger.kernel.org > > --- > fs/xfs/xfs_file.c | 11 +++++++++-- > 1 file changed, 9 insertions(+), 2 deletions(-) > > Index: linux-2.6/fs/xfs/xfs_file.c > =================================================================== > --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 10:01:42.000000000 +0200 > +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 13:59:12.000000000 +0200 > @@ -1223,6 +1223,13 @@ __xfs_filemap_fault( > return ret; > } > > +static bool > +xfs_is_write_fault( Call this xfs_is_shared_dax_write_fault, and throw in the IS_DAX() test? You might as well make it a static inline. > + struct vm_fault *vmf) > +{ > + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; Also, is "shortcutting the normal fault path" the reason for ext2 and xfs both being broken? /me puzzles over why write_fault is always true for page_mkwrite and pfn_mkwrite, but not for fault and huge_fault... Also: Can you please turn this (checking for timestamp update behavior wrt shared and private mapping write faults) into an fstest so we don't mess this up again? --D > +} > + > static vm_fault_t > xfs_filemap_fault( > struct vm_fault *vmf) > @@ -1230,7 +1237,7 @@ xfs_filemap_fault( > /* DAX can shortcut the normal fault path on write faults! */ > return __xfs_filemap_fault(vmf, PE_SIZE_PTE, > IS_DAX(file_inode(vmf->vma->vm_file)) && > - (vmf->flags & FAULT_FLAG_WRITE)); > + xfs_is_write_fault(vmf)); > } > > static vm_fault_t > @@ -1243,7 +1250,7 @@ xfs_filemap_huge_fault( > > /* DAX can shortcut the normal fault path on write faults! */ > return __xfs_filemap_fault(vmf, pe_size, > - (vmf->flags & FAULT_FLAG_WRITE)); > + xfs_is_write_fault(vmf)); > } > > static vm_fault_t > _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 15:36 ` Darrick J. Wong @ 2020-09-05 17:02 ` Mikulas Patocka 2020-09-10 6:06 ` Darrick J. Wong 0 siblings, 1 reply; 18+ messages in thread From: Mikulas Patocka @ 2020-09-05 17:02 UTC (permalink / raw) To: Darrick J. Wong Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs On Sat, 5 Sep 2020, Darrick J. Wong wrote: > On Sat, Sep 05, 2020 at 08:13:02AM -0400, Mikulas Patocka wrote: > > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime > > when the user hits a COW fault. > > > > This breaks building of the Linux kernel. > > How to reproduce: > > 1. extract the Linux kernel tree on dax-mounted xfs filesystem > > 2. run make clean > > 3. run make -j12 > > 4. run make -j12 > > - at step 4, make would incorrectly rebuild the whole kernel (although it > > was already built in step 3). > > > > The reason for the breakage is that almost all object files depend on > > objtool. When we run objtool, it takes COW page fault on its .data > > section, and these faults will incorrectly update the timestamp of the > > objtool binary. The updated timestamp causes make to rebuild the whole > > tree. > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > Cc: stable@vger.kernel.org > > > > --- > > fs/xfs/xfs_file.c | 11 +++++++++-- > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > Index: linux-2.6/fs/xfs/xfs_file.c > > =================================================================== > > --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 10:01:42.000000000 +0200 > > +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 13:59:12.000000000 +0200 > > @@ -1223,6 +1223,13 @@ __xfs_filemap_fault( > > return ret; > > } > > > > +static bool > > +xfs_is_write_fault( > > Call this xfs_is_shared_dax_write_fault, and throw in the IS_DAX() test? > > You might as well make it a static inline. Yes, it is possible. I'll send a second version. > > + struct vm_fault *vmf) > > +{ > > + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; > > Also, is "shortcutting the normal fault path" the reason for ext2 and > xfs both being broken? > > /me puzzles over why write_fault is always true for page_mkwrite and > pfn_mkwrite, but not for fault and huge_fault... > > Also: Can you please turn this (checking for timestamp update behavior > wrt shared and private mapping write faults) into an fstest so we don't > mess this up again? I've written this program that tests it - you can integrate it into your testsuite. Mikulas #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <fcntl.h> #include <string.h> #include <sys/mman.h> #include <sys/stat.h> #define FILE_NAME "test.txt" static struct stat st1, st2; int main(void) { int h, r; char *map; unlink(FILE_NAME); h = creat(FILE_NAME, 0600); if (h == -1) perror("creat"), exit(1); r = write(h, "x", 1); if (r != 1) perror("write"), exit(1); if (close(h)) perror("close"), exit(1); h = open(FILE_NAME, O_RDWR); if (h == -1) perror("open"), exit(1); map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, h, 0); if (map == MAP_FAILED) perror("mmap"), exit(1); if (fstat(h, &st1)) perror("fstat"), exit(1); sleep(2); *map = 'y'; if (fstat(h, &st2)) perror("fstat"), exit(1); if (memcmp(&st1, &st2, sizeof(struct stat))) fprintf(stderr, "BUG: COW fault changed time!\n"), exit(1); if (munmap(map, 4096)) perror("munmap"), exit(1); map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, h, 0); if (map == MAP_FAILED) perror("mmap"), exit(1); if (fstat(h, &st1)) perror("fstat"), exit(1); sleep(2); *map = 'z'; if (fstat(h, &st2)) perror("fstat"), exit(1); if (st1.st_mtime == st2.st_mtime) fprintf(stderr, "BUG: Shared fault did not change mtime!\n"), exit(1); if (st1.st_ctime == st2.st_ctime) fprintf(stderr, "BUG: Shared fault did not change ctime!\n"), exit(1); if (munmap(map, 4096)) perror("munmap"), exit(1); if (close(h)) perror("close"), exit(1); if (unlink(FILE_NAME)) perror("unlink"), exit(1); return 0; } _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 17:02 ` Mikulas Patocka @ 2020-09-10 6:06 ` Darrick J. Wong 2020-09-11 16:41 ` Mikulas Patocka 0 siblings, 1 reply; 18+ messages in thread From: Darrick J. Wong @ 2020-09-10 6:06 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs, Eric Sandeen On Sat, Sep 05, 2020 at 01:02:33PM -0400, Mikulas Patocka wrote: > > > On Sat, 5 Sep 2020, Darrick J. Wong wrote: > > > On Sat, Sep 05, 2020 at 08:13:02AM -0400, Mikulas Patocka wrote: > > > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > > > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime > > > when the user hits a COW fault. > > > > > > This breaks building of the Linux kernel. > > > How to reproduce: > > > 1. extract the Linux kernel tree on dax-mounted xfs filesystem > > > 2. run make clean > > > 3. run make -j12 > > > 4. run make -j12 > > > - at step 4, make would incorrectly rebuild the whole kernel (although it > > > was already built in step 3). > > > > > > The reason for the breakage is that almost all object files depend on > > > objtool. When we run objtool, it takes COW page fault on its .data > > > section, and these faults will incorrectly update the timestamp of the > > > objtool binary. The updated timestamp causes make to rebuild the whole > > > tree. > > > > > > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> > > > Cc: stable@vger.kernel.org > > > > > > --- > > > fs/xfs/xfs_file.c | 11 +++++++++-- > > > 1 file changed, 9 insertions(+), 2 deletions(-) > > > > > > Index: linux-2.6/fs/xfs/xfs_file.c > > > =================================================================== > > > --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 10:01:42.000000000 +0200 > > > +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 13:59:12.000000000 +0200 > > > @@ -1223,6 +1223,13 @@ __xfs_filemap_fault( > > > return ret; > > > } > > > > > > +static bool > > > +xfs_is_write_fault( > > > > Call this xfs_is_shared_dax_write_fault, and throw in the IS_DAX() test? > > > > You might as well make it a static inline. > > Yes, it is possible. I'll send a second version. > > > > + struct vm_fault *vmf) > > > +{ > > > + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; > > > > Also, is "shortcutting the normal fault path" the reason for ext2 and > > xfs both being broken? > > > > /me puzzles over why write_fault is always true for page_mkwrite and > > pfn_mkwrite, but not for fault and huge_fault... > > > > Also: Can you please turn this (checking for timestamp update behavior > > wrt shared and private mapping write faults) into an fstest so we don't > > mess this up again? > > I've written this program that tests it - you can integrate it into your > testsuite. I don't get it. You're a filesystem maintainer too, which means you're a regular contributor. Do you: (a) not use fstests? If you don't, I really hope you use something else to QA hpfs. (b) really think that it's my problem to integrate and submit your regression tests for you? > Mikulas > > > #include <stdio.h> and (c) what do you want me to do with a piece of code that has no signoff tag, no copyright, and no license? This is your patch, and therefore your responsibility to develop enough of an appropriate regression test in a proper form that the rest of us can easily determine we have the rights to contribute to it. I don't have a problem with helping to tweak a properly licensed and tagged test program into fstests, but this is a non-starter. --D > #include <stdlib.h> > #include <unistd.h> > #include <fcntl.h> > #include <string.h> > #include <sys/mman.h> > #include <sys/stat.h> > > #define FILE_NAME "test.txt" > > static struct stat st1, st2; > > int main(void) > { > int h, r; > char *map; > unlink(FILE_NAME); > h = creat(FILE_NAME, 0600); > if (h == -1) perror("creat"), exit(1); > r = write(h, "x", 1); > if (r != 1) perror("write"), exit(1); > if (close(h)) perror("close"), exit(1); > h = open(FILE_NAME, O_RDWR); > if (h == -1) perror("open"), exit(1); > > map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, h, 0); > if (map == MAP_FAILED) perror("mmap"), exit(1); > if (fstat(h, &st1)) perror("fstat"), exit(1); > sleep(2); > *map = 'y'; > if (fstat(h, &st2)) perror("fstat"), exit(1); > if (memcmp(&st1, &st2, sizeof(struct stat))) fprintf(stderr, "BUG: COW fault changed time!\n"), exit(1); > if (munmap(map, 4096)) perror("munmap"), exit(1); > > map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, h, 0); > if (map == MAP_FAILED) perror("mmap"), exit(1); > if (fstat(h, &st1)) perror("fstat"), exit(1); > sleep(2); > *map = 'z'; > if (fstat(h, &st2)) perror("fstat"), exit(1); > if (st1.st_mtime == st2.st_mtime) fprintf(stderr, "BUG: Shared fault did not change mtime!\n"), exit(1); > if (st1.st_ctime == st2.st_ctime) fprintf(stderr, "BUG: Shared fault did not change ctime!\n"), exit(1); > if (munmap(map, 4096)) perror("munmap"), exit(1); > > if (close(h)) perror("close"), exit(1); > if (unlink(FILE_NAME)) perror("unlink"), exit(1); > return 0; > } > _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-10 6:06 ` Darrick J. Wong @ 2020-09-11 16:41 ` Mikulas Patocka 0 siblings, 0 replies; 18+ messages in thread From: Mikulas Patocka @ 2020-09-11 16:41 UTC (permalink / raw) To: Darrick J. Wong Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs, Eric Sandeen On Wed, 9 Sep 2020, Darrick J. Wong wrote: > On Sat, Sep 05, 2020 at 01:02:33PM -0400, Mikulas Patocka wrote: > > > > > > > > I've written this program that tests it - you can integrate it into your > > testsuite. > > I don't get it. You're a filesystem maintainer too, which means you're > a regular contributor. Do you: > > (a) not use fstests? If you don't, I really hope you use something else > to QA hpfs. I don't use xfstests on HPFS. I was testing it just by using it. Now I use it just a little, but I don't modify it much. > (b) really think that it's my problem to integrate and submit your > regression tests for you? > > and (c) what do you want me to do with a piece of code that has no > signoff tag, no copyright, and no license? This is your patch, and > therefore your responsibility to develop enough of an appropriate > regression test in a proper form that the rest of us can easily > determine we have the rights to contribute to it. If you want a full patch (I copied the script from test 313), I send it here. Mikulas From: Mikulas Patocka <mpatocka@redhat.com> Subject: [PATCH] check ctime and mtime vs mmap Check ctime and mtime are not updated on COW faults and that they are updated on shared faults Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> --- src/Makefile | 3 +- src/mmap-timestamp.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++ tests/generic/609 | 40 +++++++++++++++++++++++++++++++++++++ tests/generic/609.out | 2 + tests/generic/group | 1 5 files changed, 98 insertions(+), 1 deletion(-) Index: xfstests-dev/src/Makefile =================================================================== --- xfstests-dev.orig/src/Makefile 2020-09-06 12:38:40.000000000 +0200 +++ xfstests-dev/src/Makefile 2020-09-11 17:39:04.000000000 +0200 @@ -17,7 +17,8 @@ TARGETS = dirstress fill fill2 getpagesi t_mmap_cow_race t_mmap_fallocate fsync-err t_mmap_write_ro \ t_ext4_dax_journal_corruption t_ext4_dax_inline_corruption \ t_ofd_locks t_mmap_collision mmap-write-concurrent \ - t_get_file_time t_create_short_dirs t_create_long_dirs + t_get_file_time t_create_short_dirs t_create_long_dirs \ + mmap-timestamp LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \ preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest \ Index: xfstests-dev/src/mmap-timestamp.c =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ xfstests-dev/src/mmap-timestamp.c 2020-09-11 18:21:40.000000000 +0200 @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2020 Red Hat, Inc. All Rights reserved. + +#include <stdio.h> +#include <stdlib.h> +#include <unistd.h> +#include <fcntl.h> +#include <string.h> +#include <sys/mman.h> +#include <sys/stat.h> + +#define FILE_NAME argv[1] + +static struct stat st1, st2; + +int main(int argc, char *argv[]) +{ + int h, r; + char *map; + unlink(FILE_NAME); + h = creat(FILE_NAME, 0600); + if (h == -1) perror("creat"), exit(1); + r = write(h, "x", 1); + if (r != 1) perror("write"), exit(1); + if (close(h)) perror("close"), exit(1); + h = open(FILE_NAME, O_RDWR); + if (h == -1) perror("open"), exit(1); + + map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, h, 0); + if (map == MAP_FAILED) perror("mmap"), exit(1); + if (fstat(h, &st1)) perror("fstat"), exit(1); + sleep(2); + *map = 'y'; + if (fstat(h, &st2)) perror("fstat"), exit(1); + if (st1.st_mtime != st2.st_mtime) fprintf(stderr, "BUG: COW fault changed mtime!\n"), exit(1); + if (st1.st_ctime != st2.st_ctime) fprintf(stderr, "BUG: COW fault changed ctime!\n"), exit(1); + if (munmap(map, 4096)) perror("munmap"), exit(1); + + map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, h, 0); + if (map == MAP_FAILED) perror("mmap"), exit(1); + if (fstat(h, &st1)) perror("fstat"), exit(1); + sleep(2); + *map = 'z'; + if (msync(map, 4096, MS_SYNC)) perror("msync"), exit(1); + if (fstat(h, &st2)) perror("fstat"), exit(1); + if (st1.st_mtime == st2.st_mtime) fprintf(stderr, "BUG: Shared fault did not change mtime!\n"), exit(1); + if (st1.st_ctime == st2.st_ctime) fprintf(stderr, "BUG: Shared fault did not change ctime!\n"), exit(1); + if (munmap(map, 4096)) perror("munmap"), exit(1); + + if (close(h)) perror("close"), exit(1); + if (unlink(FILE_NAME)) perror("unlink"), exit(1); + return 0; +} Index: xfstests-dev/tests/generic/609 =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ xfstests-dev/tests/generic/609 2020-09-11 18:30:30.000000000 +0200 @@ -0,0 +1,40 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2020 Red Hat, Inc. All Rights Reserved. +# +# FS QA Test No. 609 +# +# Check ctime and mtime are not updated on COW faults +# and that they are updated on shared faults +# +seq=`basename $0` +seqres=$RESULT_DIR/$seq +echo "QA output created by $seq" + +here=`pwd` +status=1 # failure is the default! +trap "_cleanup; exit \$status" 0 1 2 3 15 + +_cleanup() +{ + cd / + rm -f $testfile +} + +# get standard environment, filters and checks +. ./common/rc +. ./common/filter + +# real QA test starts here +_supported_fs generic +_supported_os Linux +_require_test + +testfile=$TEST_DIR/testfile.$seq + +echo "Silence is golden" + +$here/src/mmap-timestamp $testfile 2>&1 + +status=0 +exit Index: xfstests-dev/tests/generic/609.out =================================================================== --- /dev/null 1970-01-01 00:00:00.000000000 +0000 +++ xfstests-dev/tests/generic/609.out 2020-09-11 18:24:24.000000000 +0200 @@ -0,0 +1,2 @@ +QA output created by 609 +Silence is golden Index: xfstests-dev/tests/generic/group =================================================================== --- xfstests-dev.orig/tests/generic/group 2020-09-06 12:38:40.000000000 +0200 +++ xfstests-dev/tests/generic/group 2020-09-11 18:25:09.000000000 +0200 @@ -611,3 +611,4 @@ 606 auto attr quick dax 607 auto attr quick dax 608 auto attr quick dax +609 auto quick _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka 2020-09-05 15:36 ` Darrick J. Wong @ 2020-09-05 16:47 ` Linus Torvalds 2020-09-05 17:03 ` Linus Torvalds 2020-09-05 17:04 ` [PATCH 2/2 v2] " Mikulas Patocka 2020-09-07 6:47 ` [PATCH 2/2] " Christoph Hellwig 2 siblings, 2 replies; 18+ messages in thread From: Linus Torvalds @ 2020-09-05 16:47 UTC (permalink / raw) To: Mikulas Patocka Cc: Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm, Ext4 Developers List, linux-xfs On Sat, Sep 5, 2020 at 5:13 AM Mikulas Patocka <mpatocka@redhat.com> wrote: > > When running in a dax mode, if the user maps a page with MAP_PRIVATE and > PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime > when the user hits a COW fault. So your patch is obviously correct, but at the same time I look at the (buggy) ext2/xfs code you fixed, and I go "well, that was a really natural mistake to make". So I get the feeling that "yes, this was an ext2 and xfs bug, but we kind of set those filesystems up to fail". Could this possibly have been avoided by having nicer interfaces? Grepping around, and doing a bit of "git blame", I note that ext4 used to have this exact same bug too, but it was fixed three years ago in commit fd96b8da68d3 ("ext4: fix fault handling when mounted with -o dax,ro") and nobody at the time clearly realized it might be a pattern. And honestly, it's possible that the pattern came from cut-and-paste errors, but it's equally likely that the pattern was there simply because it was such a natural pattern and such an easy and natural mistake to make. Maybe it's inevitable. Some people do want (and need) the information whether it was a write just because they care about the page table issues (ie marking the pte dirty etc). To that kind of situation, whether it's shared or not might not matter all that much. But to a filesystem, a private write vs a shared write are quite different things. So I don't really have any suggestions, and maybe it's just what it is, but maybe somebody has an idea for how to make it slightly less natural to make this mistake.. But maybe just a test-case is all it takes, like Darrick suggests. Linus _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 16:47 ` Linus Torvalds @ 2020-09-05 17:03 ` Linus Torvalds 2020-09-07 8:59 ` Jan Kara 2020-09-05 17:04 ` [PATCH 2/2 v2] " Mikulas Patocka 1 sibling, 1 reply; 18+ messages in thread From: Linus Torvalds @ 2020-09-05 17:03 UTC (permalink / raw) To: Mikulas Patocka Cc: Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm, Ext4 Developers List, linux-xfs On Sat, Sep 5, 2020 at 9:47 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > So your patch is obviously correct, [..] Oh, and I had a xfs pull request in my inbox already, so rather than expect Darrick to do another one just for this and have Jan do one for ext2, I just applied these two directly as "ObviouslyCorrect(tm)". I added the "inline" as suggested by Darrick, and I also added parenthesis around the bit tests. Yes, I know the C precedence rules, but I just personally find the code easier to read if I don't even have to think about it and the different subexpressions of a logical operation are just visually very clear. And as I was editing the patch anyway... So that xfs helper function now looks like this +static inline bool +xfs_is_write_fault( + struct vm_fault *vmf) +{ + return (vmf->flags & FAULT_FLAG_WRITE) && + (vmf->vma->vm_flags & VM_SHARED); +} instead. Linus _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 17:03 ` Linus Torvalds @ 2020-09-07 8:59 ` Jan Kara 0 siblings, 0 replies; 18+ messages in thread From: Jan Kara @ 2020-09-07 8:59 UTC (permalink / raw) To: Linus Torvalds Cc: Mikulas Patocka, Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm, Ext4 Developers List, linux-xfs On Sat 05-09-20 10:03:20, Linus Torvalds wrote: > On Sat, Sep 5, 2020 at 9:47 AM Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > So your patch is obviously correct, [..] > > Oh, and I had a xfs pull request in my inbox already, so rather than > expect Darrick to do another one just for this and have Jan do one for > ext2, I just applied these two directly as "ObviouslyCorrect(tm)". OK, thanks! Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2 v2] xfs: don't update mtime on COW faults 2020-09-05 16:47 ` Linus Torvalds 2020-09-05 17:03 ` Linus Torvalds @ 2020-09-05 17:04 ` Mikulas Patocka 1 sibling, 0 replies; 18+ messages in thread From: Mikulas Patocka @ 2020-09-05 17:04 UTC (permalink / raw) To: Darrick J. Wong Cc: Linus Torvalds, Jan Kara, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, Linux-MM, Linux Kernel Mailing List, linux-nvdimm, Ext4 Developers List, linux-xfs When running in a dax mode, if the user maps a page with MAP_PRIVATE and PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime when the user hits a COW fault. This breaks building of the Linux kernel. How to reproduce: 1. extract the Linux kernel tree on dax-mounted xfs filesystem 2. run make clean 3. run make -j12 4. run make -j12 - at step 4, make would incorrectly rebuild the whole kernel (although it was already built in step 3). The reason for the breakage is that almost all object files depend on objtool. When we run objtool, it takes COW page fault on its .data section, and these faults will incorrectly update the timestamp of the objtool binary. The updated timestamp causes make to rebuild the whole tree. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org --- fs/xfs/xfs_file.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) Index: linux-2.6/fs/xfs/xfs_file.c =================================================================== --- linux-2.6.orig/fs/xfs/xfs_file.c 2020-09-05 18:48:54.000000000 +0200 +++ linux-2.6/fs/xfs/xfs_file.c 2020-09-05 18:56:48.000000000 +0200 @@ -1223,14 +1223,21 @@ __xfs_filemap_fault( return ret; } +static inline bool +xfs_is_shared_dax_write_fault( + struct vm_fault *vmf) +{ + return IS_DAX(file_inode(vmf->vma->vm_file)) && + vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; +} + static vm_fault_t xfs_filemap_fault( struct vm_fault *vmf) { /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, PE_SIZE_PTE, - IS_DAX(file_inode(vmf->vma->vm_file)) && - (vmf->flags & FAULT_FLAG_WRITE)); + xfs_is_shared_dax_write_fault(vmf)); } static vm_fault_t @@ -1243,7 +1250,7 @@ xfs_filemap_huge_fault( /* DAX can shortcut the normal fault path on write faults! */ return __xfs_filemap_fault(vmf, pe_size, - (vmf->flags & FAULT_FLAG_WRITE)); + xfs_is_shared_dax_write_fault(vmf)); } static vm_fault_t _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/2] xfs: don't update mtime on COW faults 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka 2020-09-05 15:36 ` Darrick J. Wong 2020-09-05 16:47 ` Linus Torvalds @ 2020-09-07 6:47 ` Christoph Hellwig 2 siblings, 0 replies; 18+ messages in thread From: Christoph Hellwig @ 2020-09-07 6:47 UTC (permalink / raw) To: Mikulas Patocka Cc: Linus Torvalds, Jan Kara, Darrick J. Wong, Dave Chinner, Jann Horn, Christoph Hellwig, Oleg Nesterov, Kirill Shutemov, Theodore Ts'o, Andrea Arcangeli, Matthew Wilcox, Andrew Morton, linux-mm, linux-kernel, linux-nvdimm, linux-ext4, linux-xfs > +static bool > +xfs_is_write_fault( > + struct vm_fault *vmf) > +{ > + return vmf->flags & FAULT_FLAG_WRITE && vmf->vma->vm_flags & VM_SHARED; > +} This function does not look xfs specific at all. Why isn't it it in fs.h? While we're at it the name sounds rather generic, and there are no good comments. Maybe we just need to split FAULT_FLAG_WRITE into two and check those instead of such crazy workarounds? _______________________________________________ Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org To unsubscribe send an email to linux-nvdimm-leave@lists.01.org ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2020-09-11 16:41 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-03 19:24 a crash when running strace from persistent memory Mikulas Patocka 2020-09-03 19:55 ` Linus Torvalds 2020-09-04 8:08 ` Mikulas Patocka 2020-09-04 17:11 ` Linus Torvalds 2020-09-04 16:21 ` make misbehavior on ext2 in dax mode (was: a crash when running strace from persistent memory) Mikulas Patocka 2020-09-05 12:11 ` Mikulas Patocka 2020-09-05 12:12 ` [PATCH 1/2] ext2: don't update mtime on COW faults Mikulas Patocka 2020-09-07 9:00 ` Jan Kara 2020-09-05 12:13 ` [PATCH 2/2] xfs: " Mikulas Patocka 2020-09-05 15:36 ` Darrick J. Wong 2020-09-05 17:02 ` Mikulas Patocka 2020-09-10 6:06 ` Darrick J. Wong 2020-09-11 16:41 ` Mikulas Patocka 2020-09-05 16:47 ` Linus Torvalds 2020-09-05 17:03 ` Linus Torvalds 2020-09-07 8:59 ` Jan Kara 2020-09-05 17:04 ` [PATCH 2/2 v2] " Mikulas Patocka 2020-09-07 6:47 ` [PATCH 2/2] " Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).