* [BUG, TOT] xfs w/ dax failure in __follow_pte_pmd()
@ 2019-01-02 21:13 Dave Chinner
  2019-01-02 21:25 ` Matthew Wilcox
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2019-01-02 21:13 UTC (permalink / raw)
  To: linux-xfs; +Cc: linux-mm, Matthew Wilcox, Dan Williams

Hi folks,

An overnight test run on a current TOT kernel failed generic/413
with the following dmesg output:

[ 9486.521975] run fstests generic/413 at 2019-01-02 16:50:14
[ 9486.664868] XFS (pmem0): Mounting V5 Filesystem
[ 9486.669103] XFS (pmem0): Ending clean mount
[ 9486.892718] XFS (pmem1): Unmounting Filesystem
[ 9486.932496] XFS (pmem1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 9486.935203] XFS (pmem1): Mounting V4 Filesystem
[ 9486.938639] XFS (pmem1): Ending clean mount
[ 9486.980640] XFS (pmem1): Unmounting Filesystem
[ 9487.060934] XFS (pmem0): Unmounting Filesystem
[ 9487.073078] XFS (pmem0): Mounting V5 Filesystem
[ 9487.077239] XFS (pmem0): Ending clean mount
[ 9487.093628] XFS (pmem1): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[ 9487.096252] XFS (pmem1): Mounting V4 Filesystem
[ 9487.099734] XFS (pmem1): Ending clean mount
[ 9487.262308] BUG: unable to handle kernel paging request at fffffffff3ff842c
[ 9487.264682] #PF error: [normal kernel read fault]
[ 9487.266540] PGD 2410067 P4D 2410067 PUD 2412067 PMD 0
[ 9487.268734] Oops: 0000 [#1] PREEMPT SMP
[ 9487.270551] CPU: 10 PID: 6118 Comm: t_mmap_dio Not tainted 4.20.0-dgc+ #920
[ 9487.273603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[ 9487.276402] RIP: 0010:__follow_pte_pmd+0x22d/0x340
[ 9487.278034] Code: 12 01 00 00 4d 85 e4 0f 84 b7 00 00 00 48 89 d8 48 25 00 f0 ff ff 49 89 44 24 08 48 05 00 10 00 00 49 89 44 24 10 49 8b 04 24 <48> 83 b8 30 04 00 00 00 74 0e 41 c69
[ 9487.284301] RSP: 0018:ffffc9000282fc60 EFLAGS: 00010206
[ 9487.286068] RAX: fffffffff3ff7ffc RBX: 00007f5e53501000 RCX: fff0000000000fff
[ 9487.288451] RDX: 000000033b5d1067 RSI: 0000000000000000 RDI: 0000000000000001
[ 9487.290845] RBP: ffff8882e4af94d0 R08: ffffc9000282fca8 R09: ffffc9000282fcb0
[ 9487.293232] R10: 00000002e4af9000 R11: ffff8880000004d0 R12: ffffc9000282fcb8
[ 9487.295614] R13: ffff88833f9b4500 R14: ffffc9000282fcb0 R15: ffffc9000282fca8
[ 9487.298011] FS:  00007f5e528e3740(0000) GS:ffff88833fd00000(0000) knlGS:0000000000000000
[ 9487.300722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9487.302666] CR2: fffffffff3ff842c CR3: 0000000338f95001 CR4: 0000000000060ee0
[ 9487.305065] Call Trace:
[ 9487.307310]  dax_entry_mkclean+0xbb/0x1f0
[ 9487.309096]  ? xas_store+0x29/0x530
[ 9487.310307]  dax_writeback_mapping_range+0x1c2/0x560
[ 9487.311986]  do_writepages+0x3e/0xe0
[ 9487.315335]  ? __sb_end_write+0x39/0x60
[ 9487.316648]  ? touch_atime+0xd1/0xe0
[ 9487.317886]  __filemap_fdatawrite_range+0x81/0xb0
[ 9487.323525]  file_write_and_wait_range+0x4c/0xa0
[ 9487.325466]  xfs_file_fsync+0x5d/0x260
[ 9487.329938]  __x64_sys_msync+0x181/0x200
[ 9487.331416]  do_syscall_64+0x54/0x170
[ 9487.332671]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 9487.334408] RIP: 0033:0x7f5e530c2ba1
[ 9487.335625] Code: 00 48 8b 15 21 a4 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 0f 1f 40 00 8b 05 6a e8 00 00 85 c0 75 16 b8 1a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f3
[ 9487.341909] RSP: 002b:00007ffeaaf9fcd8 EFLAGS: 00000246 ORIG_RAX: 000000000000001a
[ 9487.344435] RAX: ffffffffffffffda RBX: 0000000000000400 RCX: 00007f5e530c2ba1
[ 9487.346809] RDX: 0000000000000004 RSI: 0000000000000400 RDI: 00007f5e53501000
[ 9487.349204] RBP: 00007ffeaafa1ab3 R08: 0000000000000003 R09: 0000000000000000
[ 9487.351591] R10: 0000000000000103 R11: 0000000000000246 R12: 0000000000000004
[ 9487.353987] R13: 0000000000000003 R14: 00007ffeaafa1a9c R15: 00007f5e53501000
[ 9487.356376] CR2: fffffffff3ff842c
[ 9487.357519] ---[ end trace 40e0c04119f18109 ]---
[ 9487.359076] RIP: 0010:__follow_pte_pmd+0x22d/0x340
[ 9487.360693] Code: 12 01 00 00 4d 85 e4 0f 84 b7 00 00 00 48 89 d8 48 25 00 f0 ff ff 49 89 44 24 08 48 05 00 10 00 00 49 89 44 24 10 49 8b 04 24 <48> 83 b8 30 04 00 00 00 74 0e 41 c69
[ 9487.366924] RSP: 0018:ffffc9000282fc60 EFLAGS: 00010206
[ 9487.368676] RAX: fffffffff3ff7ffc RBX: 00007f5e53501000 RCX: fff0000000000fff
[ 9487.371055] RDX: 000000033b5d1067 RSI: 0000000000000000 RDI: 0000000000000001
[ 9487.373439] RBP: ffff8882e4af94d0 R08: ffffc9000282fca8 R09: ffffc9000282fcb0
[ 9487.375812] R10: 00000002e4af9000 R11: ffff8880000004d0 R12: ffffc9000282fcb8
[ 9487.378200] R13: ffff88833f9b4500 R14: ffffc9000282fcb0 R15: ffffc9000282fca8
[ 9487.380581] FS:  00007f5e528e3740(0000) GS:ffff88833fd00000(0000) knlGS:0000000000000000
[ 9487.383288] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9487.385227] CR2: fffffffff3ff842c CR3: 0000000338f95001 CR4: 0000000000060ee0
[ 9487.387616] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:34
[ 9487.390580] in_atomic(): 0, irqs_disabled(): 1, pid: 6118, name: t_mmap_dio
[ 9487.392916] CPU: 10 PID: 6118 Comm: t_mmap_dio Tainted: G      D           4.20.0-dgc+ #920
[ 9487.395716] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[ 9487.398512] Call Trace:
[ 9487.399391]  dump_stack+0x67/0x90
[ 9487.400539]  ___might_sleep.cold.83+0x80/0x8d
[ 9487.413519]  exit_signals+0x30/0x230
[ 9487.421507]  do_exit+0xb4/0xbe0
[ 9487.423081]  ? __x64_sys_msync+0x181/0x200
[ 9487.424613]  rewind_stack_do_exit+0x17/0x20

This is with MKFS_OPTIONS="-m crc=0". I've never seen this before, so
my initial thought is that it is a merge window regression. Looks
like a DAX or XArray issue, and it's reproducible (reboot and rerun
g/413 immediately reproduced it).

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [BUG, TOT] xfs w/ dax failure in __follow_pte_pmd()
  2019-01-02 21:13 [BUG, TOT] xfs w/ dax failure in __follow_pte_pmd() Dave Chinner
@ 2019-01-02 21:25 ` Matthew Wilcox
  2019-01-02 22:50   ` Matthew Wilcox
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2019-01-02 21:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-xfs, linux-mm, Dan Williams, Jérôme Glisse,
	Christian König, Jan Kara, akpm

On Thu, Jan 03, 2019 at 08:13:32AM +1100, Dave Chinner wrote:
> Hi folks,
> 
> An overnight test run on a current TOT kernel failed generic/413
> with the following dmesg output:
> 
> [ 9487.276402] RIP: 0010:__follow_pte_pmd+0x22d/0x340
> [ 9487.305065] Call Trace:
> [ 9487.307310]  dax_entry_mkclean+0xbb/0x1f0

We've only got one commit touching dax_entry_mkclean and it's Jerome's.
Looking through ac46d4f3c43241ffa23d5bf36153a0830c0e02cc, I'd say
it's missing a call to mmu_notifier_range_init().
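
As a cross-check of that diagnosis against the oops: the bytes at the
faulting instruction, 48 83 b8 30 04 00 00 00, decode to
cmpq $0x0,0x430(%rax), and RAX (fffffffff3ff7ffc) was loaded from the
address in R12, which lies in the same stack region as RSP. That is
consistent with a garbage pointer being read out of an on-stack
structure and then dereferenced at offset 0x430. A minimal sketch of
the address arithmetic follows; the helper and macro names are made up
for illustration, and only the constants come from the dmesg output
above.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the oops in the original report. */
#define OOPS_RAX	0xfffffffff3ff7ffcULL	/* bogus base pointer */
#define OOPS_CR2	0xfffffffff3ff842cULL	/* faulting address */

/*
 * Displacement taken from the faulting instruction bytes:
 *   48 83 b8 30 04 00 00 00  ->  cmpq $0x0,0x430(%rax)
 */
#define FAULT_INSN_DISP	0x430ULL

/* Hypothetical helper: effective address of a base + displacement operand. */
static uint64_t effective_address(uint64_t base, uint64_t disp)
{
	return base + disp;
}

/* Returns 1 if RAX + displacement reproduces the reported CR2 value. */
static int oops_fault_address_matches(void)
{
	return effective_address(OOPS_RAX, FAULT_INSN_DISP) == OOPS_CR2;
}
```

The check passes, so the fault is a read through the bogus pointer in
RAX rather than a bad page table entry for the user address in RBX.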


* Re: [BUG, TOT] xfs w/ dax failure in __follow_pte_pmd()
  2019-01-02 21:25 ` Matthew Wilcox
@ 2019-01-02 22:50   ` Matthew Wilcox
  2019-01-03  0:03     ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2019-01-02 22:50 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-xfs, linux-mm, Dan Williams, Jérôme Glisse,
	Christian König, Jan Kara, akpm

On Wed, Jan 02, 2019 at 01:25:31PM -0800, Matthew Wilcox wrote:
> On Thu, Jan 03, 2019 at 08:13:32AM +1100, Dave Chinner wrote:
> > Hi folks,
> > 
> > An overnight test run on a current TOT kernel failed generic/413
> > with the following dmesg output:
> > 
> > [ 9487.276402] RIP: 0010:__follow_pte_pmd+0x22d/0x340
> > [ 9487.305065] Call Trace:
> > [ 9487.307310]  dax_entry_mkclean+0xbb/0x1f0
> 
> We've only got one commit touching dax_entry_mkclean and it's Jerome's.
> Looking through ac46d4f3c43241ffa23d5bf36153a0830c0e02cc, I'd say
> it's missing a call to mmu_notifier_range_init().

Could I persuade you to give this a try?

diff --git a/mm/memory.c b/mm/memory.c
index 2dd2f9ab57f4..21a650368be0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4078,8 +4078,8 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
 		goto out;
 
 	if (range) {
-		range->start = address & PAGE_MASK;
-		range->end = range->start + PAGE_SIZE;
+		mmu_notifier_range_init(range, mm, address & PAGE_MASK,
+				     (address & PAGE_MASK) + PAGE_SIZE);
 		mmu_notifier_invalidate_range_start(range);
 	}
 	ptep = pte_offset_map_lock(mm, pmd, address, ptlp);
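
To illustrate what the change above is getting at, here is a userspace
sketch, not kernel code. The struct and function names (mm_sketch,
range_sketch, range_set_bounds_only, range_init_sketch) are
hypothetical stand-ins. The assumption, which the oops appears
consistent with, is that the notifier call dereferences the mm pointer
stored in the range, so assigning only start and end leaves that
pointer unset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the kernel definitions. */
struct mm_sketch {
	int notifier_count;
};

struct range_sketch {
	struct mm_sketch *mm;
	unsigned long start;
	unsigned long end;
};

/* Pre-fix pattern: only the bounds are assigned, r->mm is left untouched. */
static void range_set_bounds_only(struct range_sketch *r,
				  unsigned long start, unsigned long end)
{
	r->start = start;
	r->end = end;
}

/* Post-fix pattern: a single init helper fills in every field. */
static void range_init_sketch(struct range_sketch *r, struct mm_sketch *mm,
			      unsigned long start, unsigned long end)
{
	r->mm = mm;
	r->start = start;
	r->end = end;
}

/* Returns 1 if both patterns behave as described, asserting along the way. */
static int range_sketch_demo(void)
{
	struct mm_sketch mm = { .notifier_count = 0 };
	/* Start from NULL so the "unset" state is observable without UB. */
	struct range_sketch r = { .mm = NULL, .start = 0, .end = 0 };

	range_set_bounds_only(&r, 0x1000UL, 0x2000UL);
	assert(r.mm == NULL);		/* mm is still unset */

	range_init_sketch(&r, &mm, 0x1000UL, 0x2000UL);
	assert(r.mm == &mm);		/* mm now points at a valid object */
	assert(r.end - r.start == 0x1000UL);
	return 1;
}
```

In the sketch the unset pointer is NULL; in the reported oops the
structure appears to have lived on the stack, so the unset field would
have held leftover stack contents instead.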


* Re: [BUG, TOT] xfs w/ dax failure in __follow_pte_pmd()
  2019-01-02 22:50   ` Matthew Wilcox
@ 2019-01-03  0:03     ` Dave Chinner
  2019-01-03 19:11       ` Dan Williams
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2019-01-03  0:03 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-xfs, linux-mm, Dan Williams, Jérôme Glisse,
	Christian König, Jan Kara, akpm

On Wed, Jan 02, 2019 at 02:50:05PM -0800, Matthew Wilcox wrote:
> On Wed, Jan 02, 2019 at 01:25:31PM -0800, Matthew Wilcox wrote:
> > On Thu, Jan 03, 2019 at 08:13:32AM +1100, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > An overnight test run on a current TOT kernel failed generic/413
> > > with the following dmesg output:
> > > 
> > > [ 9487.276402] RIP: 0010:__follow_pte_pmd+0x22d/0x340
> > > [ 9487.305065] Call Trace:
> > > [ 9487.307310]  dax_entry_mkclean+0xbb/0x1f0
> > 
> > We've only got one commit touching dax_entry_mkclean and it's Jerome's.
> > Looking through ac46d4f3c43241ffa23d5bf36153a0830c0e02cc, I'd say
> > it's missing a call to mmu_notifier_range_init().
> 
> Could I persuade you to give this a try?

Yup, that fixes it.

And looking at the code, the dax mmu notifier code clearly wasn't
tested. i.e. dax_entry_mkclean() is the *only* code that exercises
the conditional range parameter code paths inside
__follow_pte_pmd().  This means it wasn't tested before it was
proposed for inclusion and since inclusion no-one using -akpm,
linux-next or the current mainline TOT has done any filesystem DAX
testing until I tripped over it.

IOws, this is the second "this was never tested before it was merged
into mainline" XFS regression that I've found in the last 3 weeks.
Both commits have been merged through the -akpm tree, and that
implies we currently have no significant filesystem QA coverage on
changes being merged through this route. This seems like an area
that needs significant improvement to me....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [BUG, TOT] xfs w/ dax failure in __follow_pte_pmd()
  2019-01-03  0:03     ` Dave Chinner
@ 2019-01-03 19:11       ` Dan Williams
  2019-01-03 19:19         ` Andrew Morton
  2019-01-03 19:30           ` Jerome Glisse
  0 siblings, 2 replies; 9+ messages in thread
From: Dan Williams @ 2019-01-03 19:11 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Matthew Wilcox, linux-xfs, Linux MM, Jérôme Glisse,
	Christian König, Jan Kara, Andrew Morton

On Wed, Jan 2, 2019 at 4:04 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Wed, Jan 02, 2019 at 02:50:05PM -0800, Matthew Wilcox wrote:
> > On Wed, Jan 02, 2019 at 01:25:31PM -0800, Matthew Wilcox wrote:
> > > On Thu, Jan 03, 2019 at 08:13:32AM +1100, Dave Chinner wrote:
> > > > Hi folks,
> > > >
> > > > An overnight test run on a current TOT kernel failed generic/413
> > > > with the following dmesg output:
> > > >
> > > > [ 9487.276402] RIP: 0010:__follow_pte_pmd+0x22d/0x340
> > > > [ 9487.305065] Call Trace:
> > > > [ 9487.307310]  dax_entry_mkclean+0xbb/0x1f0
> > >
> > > We've only got one commit touching dax_entry_mkclean and it's Jerome's.
> > > Looking through ac46d4f3c43241ffa23d5bf36153a0830c0e02cc, I'd say
> > > it's missing a call to mmu_notifier_range_init().
> >
> > Could I persuade you to give this a try?
>
> Yup, that fixes it.
>
> And looking at the code, the dax mmu notifier code clearly wasn't
> tested. i.e. dax_entry_mkclean() is the *only* code that exercises
> the conditional range parameter code paths inside
> __follow_pte_pmd().  This means it wasn't tested before it was
> proposed for inclusion and since inclusion no-one using -akpm,
> linux-next or the current mainline TOT has done any filesystem DAX
> testing until I tripped over it.
>
> IOws, this is the second "this was never tested before it was merged
> into mainline" XFS regression that I've found in the last 3 weeks.
> Both commits have been merged through the -akpm tree, and that
> implies we currently have no significant filesystem QA coverage on
> changes being merged through this route. This seems like an area
> that needs significant improvement to me....

Yes, this is also part of a series I explicitly NAK'd [1] because
there are no upstream users for it. I didn't bother to test it because
I thought the NAK was sufficient.

Andrew, any reason to not revert the set? They provide no upstream
value and actively break DAX.

[1]: https://www.spinics.net/lists/linux-fsdevel/msg137309.html


* Re: [BUG, TOT] xfs w/ dax failure in __follow_pte_pmd()
  2019-01-03 19:11       ` Dan Williams
@ 2019-01-03 19:19         ` Andrew Morton
  2019-01-03 20:25           ` Dan Williams
  2019-01-03 19:30           ` Jerome Glisse
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2019-01-03 19:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Dave Chinner, Matthew Wilcox, linux-xfs, Linux MM,
	Jérôme Glisse, Christian König, Jan Kara

On Thu, 3 Jan 2019 11:11:49 -0800 Dan Williams <dan.j.williams@intel.com> wrote:

> On Wed, Jan 2, 2019 at 4:04 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, Jan 02, 2019 at 02:50:05PM -0800, Matthew Wilcox wrote:
> > > On Wed, Jan 02, 2019 at 01:25:31PM -0800, Matthew Wilcox wrote:
> > > > On Thu, Jan 03, 2019 at 08:13:32AM +1100, Dave Chinner wrote:
> > > > > Hi folks,
> > > > >
> > > > > An overnight test run on a current TOT kernel failed generic/413
> > > > > with the following dmesg output:
> > > > >
> > > > > [ 9487.276402] RIP: 0010:__follow_pte_pmd+0x22d/0x340
> > > > > [ 9487.305065] Call Trace:
> > > > > [ 9487.307310]  dax_entry_mkclean+0xbb/0x1f0
> > > >
> > > > We've only got one commit touching dax_entry_mkclean and it's Jerome's.
> > > > Looking through ac46d4f3c43241ffa23d5bf36153a0830c0e02cc, I'd say
> > > > it's missing a call to mmu_notifier_range_init().
> > >
> > > Could I persuade you to give this a try?
> >
> > Yup, that fixes it.
> >
> > And looking at the code, the dax mmu notifier code clearly wasn't
> > tested. i.e. dax_entry_mkclean() is the *only* code that exercises
> > the conditional range parameter code paths inside
> > __follow_pte_pmd().  This means it wasn't tested before it was
> > proposed for inclusion and since inclusion no-one using -akpm,
> > linux-next or the current mainline TOT has done any filesystem DAX
> > testing until I tripped over it.
> >
> > IOws, this is the second "this was never tested before it was merged
> > into mainline" XFS regression that I've found in the last 3 weeks.
> > Both commits have been merged through the -akpm tree, and that
> > implies we currently have no significant filesystem QA coverage on
> > changes being merged through this route. This seems like an area
> > that needs significant improvement to me....
> 
> Yes, this is also part of a series I explicitly NAK'd [1] because
> there are no upstream users for it. I didn't bother to test it because
> I thought the NAK was sufficient.
> 
> Andrew, any reason to not revert the set? They provide no upstream
> value and actively break DAX.
> 
> [1]: https://www.spinics.net/lists/linux-fsdevel/msg137309.html

You objected to "mm/mmu_notifier: contextual information for event
triggering invalidation" and, agreeing, I have held that back pending
further examination.

The culprit here appears to be ac46d4f3c ("mm/mmu_notifier: use
structure for invalidate_range_start/end calls") which seems to have a
bug, which appears to now have a fix?


* Re: [BUG, TOT] xfs w/ dax failure in __follow_pte_pmd()
  2019-01-03 19:11       ` Dan Williams
@ 2019-01-03 19:30           ` Jerome Glisse
  2019-01-03 19:30           ` Jerome Glisse
  1 sibling, 0 replies; 9+ messages in thread
From: Jerome Glisse @ 2019-01-03 19:30 UTC (permalink / raw)
  To: Dan Williams
  Cc: Dave Chinner, Matthew Wilcox, linux-xfs, Linux MM,
	Christian König, Jan Kara, Andrew Morton

On Thu, Jan 03, 2019 at 11:11:49AM -0800, Dan Williams wrote:
> On Wed, Jan 2, 2019 at 4:04 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > On Wed, Jan 02, 2019 at 02:50:05PM -0800, Matthew Wilcox wrote:
> > > On Wed, Jan 02, 2019 at 01:25:31PM -0800, Matthew Wilcox wrote:
> > > > On Thu, Jan 03, 2019 at 08:13:32AM +1100, Dave Chinner wrote:
> > > > > Hi folks,
> > > > >
> > > > > An overnight test run on a current TOT kernel failed generic/413
> > > > > with the following dmesg output:
> > > > >
> > > > > [ 9487.276402] RIP: 0010:__follow_pte_pmd+0x22d/0x340
> > > > > [ 9487.305065] Call Trace:
> > > > > [ 9487.307310]  dax_entry_mkclean+0xbb/0x1f0
> > > >
> > > > We've only got one commit touching dax_entry_mkclean and it's Jerome's.
> > > > Looking through ac46d4f3c43241ffa23d5bf36153a0830c0e02cc, I'd say
> > > > it's missing a call to mmu_notifier_range_init().
> > >
> > > Could I persuade you to give this a try?
> >
> > Yup, that fixes it.
> >
> > And looking at the code, the dax mmu notifier code clearly wasn't
> > tested. i.e. dax_entry_mkclean() is the *only* code that exercises
> > the conditional range parameter code paths inside
> > __follow_pte_pmd().  This means it wasn't tested before it was
> > proposed for inclusion and since inclusion no-one using -akpm,
> > linux-next or the current mainline TOT has done any filesystem DAX
> > testing until I tripped over it.
> >
> > IOws, this is the second "this was never tested before it was merged
> > into mainline" XFS regression that I've found in the last 3 weeks.
> > Both commits have been merged through the -akpm tree, and that
> > implies we currently have no significant filesystem QA coverage on
> > changes being merged through this route. This seems like an area
> > that needs significant improvement to me....
> 
> Yes, this is also part of a series I explicitly NAK'd [1] because
> there are no upstream users for it. I didn't bother to test it because
> I thought the NAK was sufficient.
> 
> Andrew, any reason to not revert the set? They provide no upstream
> value and actively break DAX.
> 
> [1]: https://www.spinics.net/lists/linux-fsdevel/msg137309.html

I tested it, but together with the patch that was not included; that
extra patch did properly initialize the range struct. So the
patchset had a broken step.

Cheers,
Jérôme


* Re: [BUG, TOT] xfs w/ dax failure in __follow_pte_pmd()
  2019-01-03 19:19         ` Andrew Morton
@ 2019-01-03 20:25           ` Dan Williams
  0 siblings, 0 replies; 9+ messages in thread
From: Dan Williams @ 2019-01-03 20:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Dave Chinner, Matthew Wilcox, linux-xfs, Linux MM,
	Jérôme Glisse, Christian König, Jan Kara

On Thu, Jan 3, 2019 at 11:19 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 3 Jan 2019 11:11:49 -0800 Dan Williams <dan.j.williams@intel.com> wrote:
>
> > On Wed, Jan 2, 2019 at 4:04 PM Dave Chinner <david@fromorbit.com> wrote:
> > >
> > > On Wed, Jan 02, 2019 at 02:50:05PM -0800, Matthew Wilcox wrote:
> > > > On Wed, Jan 02, 2019 at 01:25:31PM -0800, Matthew Wilcox wrote:
> > > > > On Thu, Jan 03, 2019 at 08:13:32AM +1100, Dave Chinner wrote:
> > > > > > Hi folks,
> > > > > >
> > > > > > An overnight test run on a current TOT kernel failed generic/413
> > > > > > with the following dmesg output:
> > > > > >
> > > > > > [ 9487.276402] RIP: 0010:__follow_pte_pmd+0x22d/0x340
> > > > > > [ 9487.305065] Call Trace:
> > > > > > [ 9487.307310]  dax_entry_mkclean+0xbb/0x1f0
> > > > >
> > > > > We've only got one commit touching dax_entry_mkclean and it's Jerome's.
> > > > > Looking through ac46d4f3c43241ffa23d5bf36153a0830c0e02cc, I'd say
> > > > > it's missing a call to mmu_notifier_range_init().
> > > >
> > > > Could I persuade you to give this a try?
> > >
> > > Yup, that fixes it.
> > >
> > > And looking at the code, the dax mmu notifier code clearly wasn't
> > > tested. i.e. dax_entry_mkclean() is the *only* code that exercises
> > > the conditional range parameter code paths inside
> > > __follow_pte_pmd().  This means it wasn't tested before it was
> > > proposed for inclusion and since inclusion no-one using -akpm,
> > > linux-next or the current mainline TOT has done any filesystem DAX
> > > testing until I tripped over it.
> > >
> > > IOws, this is the second "this was never tested before it was merged
> > > into mainline" XFS regression that I've found in the last 3 weeks.
> > > Both commits have been merged through the -akpm tree, and that
> > > implies we currently have no significant filesystem QA coverage on
> > > changes being merged through this route. This seems like an area
> > > that needs significant improvement to me....
> >
> > Yes, this is also part of a series I explicitly NAK'd [1] because
> > there are no upstream users for it. I didn't bother to test it because
> > I thought the NAK was sufficient.
> >
> > Andrew, any reason to not revert the set? They provide no upstream
> > value and actively break DAX.
> >
> > [1]: https://www.spinics.net/lists/linux-fsdevel/msg137309.html
>
> You objected to "mm/mmu_notifier: contextual information for event
> triggering invalidation" and, agreeing, I have held that back pending
> further examination.

Ah, ok, I thought the whole set went in, my mistake.

> The culprit here appears to be ac46d4f3c ("mm/mmu_notifier: use
> structure for invalidate_range_start/end calls") which seems to have a
> bug, which appears to now have a fix?

It does, but I'm not sure we need the rest of the code movement
without the missing final step that builds on the refactoring.
