* [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
@ 2020-09-01 13:06 Johannes Thumshirn
  2020-09-01 13:11 ` Johannes Thumshirn
  ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Johannes Thumshirn @ 2020-09-01 13:06 UTC (permalink / raw)
To: David Sterba
Cc: linux-btrfs@vger.kernel.org, Filipe Manana, Josef Bacik,
	Johannes Thumshirn

Fstests generic/113 exposes a deadlock introduced by the switch to iomap
for direct I/O.

[ 18.291293]
[ 18.291532] ============================================
[ 18.292115] WARNING: possible recursive locking detected
[ 18.292723] 5.9.0-rc2+ #746 Not tainted
[ 18.293145] --------------------------------------------
[ 18.293718] aio-stress/922 is trying to acquire lock:
[ 18.294274] ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_sync_file+0xf7/0x560 [btrfs]
[ 18.295450]
[ 18.295450] but task is already holding lock:
[ 18.296086] ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs]
[ 18.297249]
[ 18.297249] other info that might help us debug this:
[ 18.297960] Possible unsafe locking scenario:
[ 18.297960]
[ 18.298605]        CPU0
[ 18.298880]        ----
[ 18.299151]   lock(&sb->s_type->i_mutex_key#11);
[ 18.299653]   lock(&sb->s_type->i_mutex_key#11);
[ 18.300156]
[ 18.300156] *** DEADLOCK ***
[ 18.300156]
[ 18.300802] May be due to missing lock nesting notation
[ 18.300802]
[ 18.301542] 2 locks held by aio-stress/922:
[ 18.302000] #0: ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs]
[ 18.303194] #1: ffff888217411ea0 (&ei->dio_sem){++++}-{3:3}, at: btrfs_direct_IO+0x113/0x160 [btrfs]
[ 18.304223]
[ 18.304223] stack backtrace:
[ 18.304695] CPU: 0 PID: 922 Comm: aio-stress Not tainted 5.9.0-rc2+ #746
[ 18.305383] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014
[ 18.306532] Call Trace:
[ 18.306796]  dump_stack+0x78/0xa0
[ 18.307145]  __lock_acquire.cold+0x121/0x29f
[ 18.307613]  ? btrfs_dio_iomap_end+0x65/0x130 [btrfs]
[ 18.308140]  lock_acquire+0x93/0x3b0
[ 18.308544]  ? btrfs_sync_file+0xf7/0x560 [btrfs]
[ 18.309036]  down_write+0x33/0x70
[ 18.309402]  ? btrfs_sync_file+0xf7/0x560 [btrfs]
[ 18.309912]  btrfs_sync_file+0xf7/0x560 [btrfs]
[ 18.310384]  iomap_dio_complete+0x10d/0x120
[ 18.310824]  iomap_dio_rw+0x3c8/0x520
[ 18.311225]  btrfs_direct_IO+0xd3/0x160 [btrfs]
[ 18.311727]  btrfs_file_write_iter+0x1fe/0x630 [btrfs]
[ 18.312264]  ? find_held_lock+0x2b/0x80
[ 18.312662]  aio_write+0xcd/0x180
[ 18.313011]  ? __might_fault+0x31/0x80
[ 18.313408]  ? find_held_lock+0x2b/0x80
[ 18.313817]  ? __might_fault+0x31/0x80
[ 18.314217]  io_submit_one+0x4e1/0xb30
[ 18.314606]  ? find_held_lock+0x2b/0x80
[ 18.315010]  __x64_sys_io_submit+0x71/0x220
[ 18.315449]  do_syscall_64+0x33/0x40
[ 18.315829]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 18.316363] RIP: 0033:0x7f5940881f79
[ 18.316740] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 4e 0c 00 f7 d8 64 89 01 48
[ 18.318651] RSP: 002b:00007f5934f51d88 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[ 18.319428] RAX: ffffffffffffffda RBX: 00007f5934f52680 RCX: 00007f5940881f79
[ 18.320168] RDX: 0000000000b56030 RSI: 0000000000000008 RDI: 00007f593171f000
[ 18.320895] RBP: 00007f593171f000 R08: 0000000000000000 R09: 0000000000b56030
[ 18.321630] R10: 00007fffd599e080 R11: 0000000000000246 R12: 0000000000000008
[ 18.322369] R13: 0000000000000000 R14: 0000000000b56030 R15: 0000000000b56070

This happens because iomap_dio_complete() calls into generic_write_sync()
if we have the data-sync flag set. But as we're still under the
inode_lock() from btrfs_file_write_iter() we will deadlock once
btrfs_sync_file() tries to acquire the inode_lock().
Calling into generic_write_sync() is not needed as __btrfs_direct_write()
already takes care of persisting the data on disk. We can temporarily drop
the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the
iomap code won't try to call into the sync routines as well.

References: https://github.com/btrfs/fstests/issues/12
Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO")
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index b62679382799..c75c0f2a5f72 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2023,6 +2023,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	atomic_inc(&BTRFS_I(inode)->sync_writers);
 
 	if (iocb->ki_flags & IOCB_DIRECT) {
+		iocb->ki_flags &= ~IOCB_DSYNC;
 		num_written = __btrfs_direct_write(iocb, from);
 	} else {
 		num_written = btrfs_buffered_write(iocb, from);
@@ -2046,8 +2047,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 	if (num_written > 0)
 		num_written = generic_write_sync(iocb, num_written);
 
-	if (sync)
+	if (sync) {
+		iocb->ki_flags |= IOCB_DSYNC;
 		atomic_dec(&BTRFS_I(inode)->sync_writers);
+	}
 out:
 	current->backing_dev_info = NULL;
 	return num_written ? num_written : err;
-- 
2.26.2
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
From: Johannes Thumshirn @ 2020-09-01 13:11 UTC (permalink / raw)
To: David Sterba
Cc: linux-btrfs@vger.kernel.org, Filipe Manana, Josef Bacik,
	Goldwyn Rodrigues

Oops forgot to Cc Goldwyn

On 01/09/2020 15:07, Johannes Thumshirn wrote:
> Fstests generic/113 exposes a deadlock introduced by the switch to iomap
> for direct I/O.
>
> [...]
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
From: Goldwyn Rodrigues @ 2020-09-01 14:17 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: David Sterba, linux-btrfs@vger.kernel.org, Filipe Manana, Josef Bacik

Hi Johannes,

Thanks for the fix.

On 22:06 01/09, Johannes Thumshirn wrote:
> Fstests generic/113 exposes a deadlock introduced by the switch to iomap
> for direct I/O.
>
> [...]
>
> @@ -2046,8 +2047,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
> 	if (num_written > 0)
> 		num_written = generic_write_sync(iocb, num_written);
> 
> -	if (sync)
> +	if (sync) {
> +		iocb->ki_flags |= IOCB_DSYNC;
> 		atomic_dec(&BTRFS_I(inode)->sync_writers);
> +	}
> out:
> 	current->backing_dev_info = NULL;
> 	return num_written ? num_written : err;

This must be applied before calling generic_write_sync() as opposed to
after so generic_write_sync() can be called with the correct parameters.
This is required for AIO cases.

-- 
Goldwyn
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
From: Johannes Thumshirn @ 2020-09-01 14:20 UTC (permalink / raw)
To: Goldwyn Rodrigues
Cc: David Sterba, linux-btrfs@vger.kernel.org, Filipe Manana, Josef Bacik

On 01/09/2020 16:17, Goldwyn Rodrigues wrote:
> This must be applied before calling generic_write_sync() as opposed to
> after so generic_write_sync() can be called with the correct parameters.
> This is required for AIO cases.

Thanks for spotting, will update and retest. The question I have though
is, is this a correct fix or am I papering over something (hence the RFC
tag). Maybe we can relax the time we're under the inode_lock() a bit so
we can avoid this deadlock.

David, Josef, Filipe any comments?
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
From: Filipe Manana @ 2020-09-01 14:37 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: David Sterba, linux-btrfs@vger.kernel.org, Josef Bacik,
	Goldwyn Rodrigues

On Tue, Sep 1, 2020 at 2:06 PM Johannes Thumshirn
<johannes.thumshirn@wdc.com> wrote:
>
> Fstests generic/113 exposes a deadlock introduced by the switch to iomap
> for direct I/O.
>
> [...]
>
> @@ -2046,8 +2047,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
> 	if (num_written > 0)
> 		num_written = generic_write_sync(iocb, num_written);
> 
> -	if (sync)
> +	if (sync) {
> +		iocb->ki_flags |= IOCB_DSYNC;

This should be set before calling generic_write_sync() above,
otherwise we don't do the persistence required by dsync.

I have to be honest, I don't like this approach that much, it's a bit
fragile - what if some other code, invoked in between clearing and
setting back the flag, needs that flag to operate correctly?

Thanks.

> 		atomic_dec(&BTRFS_I(inode)->sync_writers);
> +	}
> [...]

-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
From: Johannes Thumshirn @ 2020-09-01 14:44 UTC (permalink / raw)
To: fdmanana
Cc: David Sterba, linux-btrfs@vger.kernel.org, Josef Bacik,
	Goldwyn Rodrigues

On 01/09/2020 16:37, Filipe Manana wrote:
>> +	iocb->ki_flags |= IOCB_DSYNC;
>
> This should be set before calling generic_write_sync() above,
> otherwise we don't do the persistence required by dsync.
>
> I have to be honest, I don't like this approach that much, it's a bit
> fragile - what if some other code, invoked in between clearing and
> setting back the flag, needs that flag to operate correctly?

Yes I don't like it either.

I've compared btrfs to ext4 and xfs and both don't hold on the inode
lock for (nearly) the whole xxx_file_write_iter() time, so I think
the correct fix would be to relax this time. But I haven't found any
documentation yet on which functions need to be under the inode_lock
and which not.

Any input is appreciated here.
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
From: Goldwyn Rodrigues @ 2020-09-01 18:40 UTC (permalink / raw)
To: Johannes Thumshirn
Cc: fdmanana, David Sterba, linux-btrfs@vger.kernel.org, Josef Bacik

On 14:44 01/09, Johannes Thumshirn wrote:
> On 01/09/2020 16:37, Filipe Manana wrote:
> >> +	iocb->ki_flags |= IOCB_DSYNC;
> >
> > This should be set before calling generic_write_sync() above,
> > otherwise we don't do the persistence required by dsync.
> >
> > I have to be honest, I don't like this approach that much, it's a bit
> > fragile - what if some other code, invoked in between clearing and
> > setting back the flag, needs that flag to operate correctly?
>
> Yes I don't like it either.
>
> I've compared btrfs to ext4 and xfs and both don't hold on the inode
> lock for (nearly) the whole xxx_file_write_iter() time, so I think
> the correct fix would be to relax this time. But I haven't found any
> documentation yet on which functions need to be under the inode_lock
> and which not.

I think you have got this reversed. ext4 and xfs hold the inode lock
for the entire xxx_file_write_iter(). OTOH, they don't take the inode
lock during fsync operations. The comments in btrfs_sync_file() mention
that it needs to make sure the inode lock is acquired so that no other
process can dirty the pages while sync is running.

> Any input is appreciated here.

-- 
Goldwyn
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
From: Josef Bacik @ 2020-09-01 15:11 UTC (permalink / raw)
To: Johannes Thumshirn, David Sterba
Cc: linux-btrfs@vger.kernel.org, Filipe Manana, Christoph Hellwig,
	Linux FS Devel

On 9/1/20 9:06 AM, Johannes Thumshirn wrote:
> Fstests generic/113 exposes a deadlock introduced by the switch to iomap
> for direct I/O.
>
> [...]

Christoph, I feel like this is broken. Xfs and ext4 get away with this
for different reasons, ext4 doesn't take the inode_lock() at all in
fsync, and xfs takes the ILOCK instead of the IOLOCK, so it's fine.
However btrfs uses inode_lock() in ->fsync (not for the IO, just for
the logging part). A long time ago I specifically pushed the inode
locking down into ->fsync() handlers to give us this sort of control.

I'm not 100% on the iomap stuff, but the fix seems like we need to move
the generic_write_sync() out of iomap_dio_complete() completely, and the
callers do their own thing, much like the normal
generic_file_write_iter() does. And then I'd like to add a
WARN_ON(lockdep_is_held()) in vfs_fsync_range() so we can avoid this
sort of thing in the future. What do you think? Thanks,

Josef
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-01 15:11 ` Josef Bacik @ 2020-09-01 17:45 ` Darrick J. Wong 2020-09-01 17:55 ` Josef Bacik 2020-09-01 21:46 ` Dave Chinner 2020-09-03 16:32 ` Christoph Hellwig 2 siblings, 1 reply; 31+ messages in thread From: Darrick J. Wong @ 2020-09-01 17:45 UTC (permalink / raw) To: Josef Bacik Cc: Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On Tue, Sep 01, 2020 at 11:11:58AM -0400, Josef Bacik wrote: > On 9/1/20 9:06 AM, Johannes Thumshirn wrote: > > Fstests generic/113 exposes a deadlock introduced by the switch to iomap > > for direct I/O. > > > > [ 18.291293] > > [ 18.291532] ============================================ > > [ 18.292115] WARNING: possible recursive locking detected > > [ 18.292723] 5.9.0-rc2+ #746 Not tainted > > [ 18.293145] -------------------------------------------- > > [ 18.293718] aio-stress/922 is trying to acquire lock: > > [ 18.294274] ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_sync_file+0xf7/0x560 [btrfs] > > [ 18.295450] > > [ 18.295450] but task is already holding lock: > > [ 18.296086] ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs] > > [ 18.297249] > > [ 18.297249] other info that might help us debug this: > > [ 18.297960] Possible unsafe locking scenario: > > [ 18.297960] > > [ 18.298605] CPU0 > > [ 18.298880] ---- > > [ 18.299151] lock(&sb->s_type->i_mutex_key#11); > > [ 18.299653] lock(&sb->s_type->i_mutex_key#11); > > [ 18.300156] > > [ 18.300156] *** DEADLOCK *** > > [ 18.300156] > > [ 18.300802] May be due to missing lock nesting notation > > [ 18.300802] > > [ 18.301542] 2 locks held by aio-stress/922: > > [ 18.302000] #0: ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs] > > [ 18.303194] #1: ffff888217411ea0 (&ei->dio_sem){++++}-{3:3}, at: 
btrfs_direct_IO+0x113/0x160 [btrfs] > > [ 18.304223] > > [ 18.304223] stack backtrace: > > [ 18.304695] CPU: 0 PID: 922 Comm: aio-stress Not tainted 5.9.0-rc2+ #746 > > [ 18.305383] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 > > [ 18.306532] Call Trace: > > [ 18.306796] dump_stack+0x78/0xa0 > > [ 18.307145] __lock_acquire.cold+0x121/0x29f > > [ 18.307613] ? btrfs_dio_iomap_end+0x65/0x130 [btrfs] > > [ 18.308140] lock_acquire+0x93/0x3b0 > > [ 18.308544] ? btrfs_sync_file+0xf7/0x560 [btrfs] > > [ 18.309036] down_write+0x33/0x70 > > [ 18.309402] ? btrfs_sync_file+0xf7/0x560 [btrfs] > > [ 18.309912] btrfs_sync_file+0xf7/0x560 [btrfs] > > [ 18.310384] iomap_dio_complete+0x10d/0x120 > > [ 18.310824] iomap_dio_rw+0x3c8/0x520 > > [ 18.311225] btrfs_direct_IO+0xd3/0x160 [btrfs] > > [ 18.311727] btrfs_file_write_iter+0x1fe/0x630 [btrfs] > > [ 18.312264] ? find_held_lock+0x2b/0x80 > > [ 18.312662] aio_write+0xcd/0x180 > > [ 18.313011] ? __might_fault+0x31/0x80 > > [ 18.313408] ? find_held_lock+0x2b/0x80 > > [ 18.313817] ? __might_fault+0x31/0x80 > > [ 18.314217] io_submit_one+0x4e1/0xb30 > > [ 18.314606] ? 
find_held_lock+0x2b/0x80 > > [ 18.315010] __x64_sys_io_submit+0x71/0x220 > > [ 18.315449] do_syscall_64+0x33/0x40 > > [ 18.315829] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > [ 18.316363] RIP: 0033:0x7f5940881f79 > > [ 18.316740] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 4e 0c 00 f7 d8 64 89 01 48 > > [ 18.318651] RSP: 002b:00007f5934f51d88 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1 > > [ 18.319428] RAX: ffffffffffffffda RBX: 00007f5934f52680 RCX: 00007f5940881f79 > > [ 18.320168] RDX: 0000000000b56030 RSI: 0000000000000008 RDI: 00007f593171f000 > > [ 18.320895] RBP: 00007f593171f000 R08: 0000000000000000 R09: 0000000000b56030 > > [ 18.321630] R10: 00007fffd599e080 R11: 0000000000000246 R12: 0000000000000008 > > [ 18.322369] R13: 0000000000000000 R14: 0000000000b56030 R15: 0000000000b56070 > > > > This happens because iomap_dio_complete() calls into generic_write_sync() > > if we have the data-sync flag set. But as we're still under the > > inode_lock() from btrfs_file_write_iter() we will deadlock once > > btrfs_sync_file() tries to acquire the inode_lock(). > > > > Calling into generic_write_sync() is not needed as __btrfs_direct_write() > > already takes care of persisting the data on disk. We can temporarily drop > > the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the > > iomap code won't try to call into the sync routines as well. 
> > > > References: https://github.com/btrfs/fstests/issues/12 > > Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO") > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > --- > > fs/btrfs/file.c | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > > index b62679382799..c75c0f2a5f72 100644 > > --- a/fs/btrfs/file.c > > +++ b/fs/btrfs/file.c > > @@ -2023,6 +2023,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, > > atomic_inc(&BTRFS_I(inode)->sync_writers); > > if (iocb->ki_flags & IOCB_DIRECT) { > > + iocb->ki_flags &= ~IOCB_DSYNC; > > num_written = __btrfs_direct_write(iocb, from); > > } else { > > num_written = btrfs_buffered_write(iocb, from); > > @@ -2046,8 +2047,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, > > if (num_written > 0) > > num_written = generic_write_sync(iocb, num_written); > > - if (sync) > > + if (sync) { > > + iocb->ki_flags |= IOCB_DSYNC; > > atomic_dec(&BTRFS_I(inode)->sync_writers); > > + } > > out: > > current->backing_dev_info = NULL; > > return num_written ? num_written : err; > > > > Christoph, I feel like this is broken. Xfs and ext4 get away with this for > different reasons, ext4 doesn't take the inode_lock() at all in fsync, and > xfs takes the ILOCK instead of the IOLOCK, so it's fine. However btrfs uses > inode_lock() in ->fsync (not for the IO, just for the logging part). A long > time ago I specifically pushed the inode locking down into ->fsync() > handlers to give us this sort of control. > > I'm not 100% on the iomap stuff, but the fix seems like we need to move the > generic_write_sync() out of iomap_dio_complete() completely, and the callers > do their own thing, much like the normal generic_file_write_iter() does. > And then I'd like to add a WARN_ON(lockdep_is_held()) in vfs_fsync_range() > so we can avoid this sort of thing in the future. What do you think? 
Hmm, I was under the impression that the direct write completion in either path (iomap or classic) could call generic_write_sync? How did this work in btrfs before the iomap conversion? --D > Thanks, > > Josef > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-01 17:45 ` Darrick J. Wong @ 2020-09-01 17:55 ` Josef Bacik 0 siblings, 0 replies; 31+ messages in thread From: Josef Bacik @ 2020-09-01 17:55 UTC (permalink / raw) To: Darrick J. Wong Cc: Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On 9/1/20 1:45 PM, Darrick J. Wong wrote: > On Tue, Sep 01, 2020 at 11:11:58AM -0400, Josef Bacik wrote: >> On 9/1/20 9:06 AM, Johannes Thumshirn wrote: >>> Fstests generic/113 exposes a deadlock introduced by the switch to iomap >>> for direct I/O. >>> >>> [ 18.291293] >>> [ 18.291532] ============================================ >>> [ 18.292115] WARNING: possible recursive locking detected >>> [ 18.292723] 5.9.0-rc2+ #746 Not tainted >>> [ 18.293145] -------------------------------------------- >>> [ 18.293718] aio-stress/922 is trying to acquire lock: >>> [ 18.294274] ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_sync_file+0xf7/0x560 [btrfs] >>> [ 18.295450] >>> [ 18.295450] but task is already holding lock: >>> [ 18.296086] ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs] >>> [ 18.297249] >>> [ 18.297249] other info that might help us debug this: >>> [ 18.297960] Possible unsafe locking scenario: >>> [ 18.297960] >>> [ 18.298605] CPU0 >>> [ 18.298880] ---- >>> [ 18.299151] lock(&sb->s_type->i_mutex_key#11); >>> [ 18.299653] lock(&sb->s_type->i_mutex_key#11); >>> [ 18.300156] >>> [ 18.300156] *** DEADLOCK *** >>> [ 18.300156] >>> [ 18.300802] May be due to missing lock nesting notation >>> [ 18.300802] >>> [ 18.301542] 2 locks held by aio-stress/922: >>> [ 18.302000] #0: ffff888217412010 (&sb->s_type->i_mutex_key#11){++++}-{3:3}, at: btrfs_file_write_iter+0x6e/0x630 [btrfs] >>> [ 18.303194] #1: ffff888217411ea0 (&ei->dio_sem){++++}-{3:3}, at: btrfs_direct_IO+0x113/0x160 [btrfs] >>> [ 18.304223] >>> [ 
18.304223] stack backtrace: >>> [ 18.304695] CPU: 0 PID: 922 Comm: aio-stress Not tainted 5.9.0-rc2+ #746 >>> [ 18.305383] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4-rebuilt.opensuse.org 04/01/2014 >>> [ 18.306532] Call Trace: >>> [ 18.306796] dump_stack+0x78/0xa0 >>> [ 18.307145] __lock_acquire.cold+0x121/0x29f >>> [ 18.307613] ? btrfs_dio_iomap_end+0x65/0x130 [btrfs] >>> [ 18.308140] lock_acquire+0x93/0x3b0 >>> [ 18.308544] ? btrfs_sync_file+0xf7/0x560 [btrfs] >>> [ 18.309036] down_write+0x33/0x70 >>> [ 18.309402] ? btrfs_sync_file+0xf7/0x560 [btrfs] >>> [ 18.309912] btrfs_sync_file+0xf7/0x560 [btrfs] >>> [ 18.310384] iomap_dio_complete+0x10d/0x120 >>> [ 18.310824] iomap_dio_rw+0x3c8/0x520 >>> [ 18.311225] btrfs_direct_IO+0xd3/0x160 [btrfs] >>> [ 18.311727] btrfs_file_write_iter+0x1fe/0x630 [btrfs] >>> [ 18.312264] ? find_held_lock+0x2b/0x80 >>> [ 18.312662] aio_write+0xcd/0x180 >>> [ 18.313011] ? __might_fault+0x31/0x80 >>> [ 18.313408] ? find_held_lock+0x2b/0x80 >>> [ 18.313817] ? __might_fault+0x31/0x80 >>> [ 18.314217] io_submit_one+0x4e1/0xb30 >>> [ 18.314606] ? 
find_held_lock+0x2b/0x80 >>> [ 18.315010] __x64_sys_io_submit+0x71/0x220 >>> [ 18.315449] do_syscall_64+0x33/0x40 >>> [ 18.315829] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [ 18.316363] RIP: 0033:0x7f5940881f79 >>> [ 18.316740] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 4e 0c 00 f7 d8 64 89 01 48 >>> [ 18.318651] RSP: 002b:00007f5934f51d88 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1 >>> [ 18.319428] RAX: ffffffffffffffda RBX: 00007f5934f52680 RCX: 00007f5940881f79 >>> [ 18.320168] RDX: 0000000000b56030 RSI: 0000000000000008 RDI: 00007f593171f000 >>> [ 18.320895] RBP: 00007f593171f000 R08: 0000000000000000 R09: 0000000000b56030 >>> [ 18.321630] R10: 00007fffd599e080 R11: 0000000000000246 R12: 0000000000000008 >>> [ 18.322369] R13: 0000000000000000 R14: 0000000000b56030 R15: 0000000000b56070 >>> >>> This happens because iomap_dio_complete() calls into generic_write_sync() >>> if we have the data-sync flag set. But as we're still under the >>> inode_lock() from btrfs_file_write_iter() we will deadlock once >>> btrfs_sync_file() tries to acquire the inode_lock(). >>> >>> Calling into generic_write_sync() is not needed as __btrfs_direct_write() >>> already takes care of persisting the data on disk. We can temporarily drop >>> the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the >>> iomap code won't try to call into the sync routines as well. 
>>> >>> References: https://github.com/btrfs/fstests/issues/12 >>> Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO") >>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> >>> --- >>> fs/btrfs/file.c | 5 ++++- >>> 1 file changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >>> index b62679382799..c75c0f2a5f72 100644 >>> --- a/fs/btrfs/file.c >>> +++ b/fs/btrfs/file.c >>> @@ -2023,6 +2023,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, >>> atomic_inc(&BTRFS_I(inode)->sync_writers); >>> if (iocb->ki_flags & IOCB_DIRECT) { >>> + iocb->ki_flags &= ~IOCB_DSYNC; >>> num_written = __btrfs_direct_write(iocb, from); >>> } else { >>> num_written = btrfs_buffered_write(iocb, from); >>> @@ -2046,8 +2047,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, >>> if (num_written > 0) >>> num_written = generic_write_sync(iocb, num_written); >>> - if (sync) >>> + if (sync) { >>> + iocb->ki_flags |= IOCB_DSYNC; >>> atomic_dec(&BTRFS_I(inode)->sync_writers); >>> + } >>> out: >>> current->backing_dev_info = NULL; >>> return num_written ? num_written : err; >>> >> >> Christoph, I feel like this is broken. Xfs and ext4 get away with this for >> different reasons, ext4 doesn't take the inode_lock() at all in fsync, and >> xfs takes the ILOCK instead of the IOLOCK, so it's fine. However btrfs uses >> inode_lock() in ->fsync (not for the IO, just for the logging part). A long >> time ago I specifically pushed the inode locking down into ->fsync() >> handlers to give us this sort of control. >> >> I'm not 100% on the iomap stuff, but the fix seems like we need to move the >> generic_write_sync() out of iomap_dio_complete() completely, and the callers >> do their own thing, much like the normal generic_file_write_iter() does. >> And then I'd like to add a WARN_ON(lockdep_is_held()) in vfs_fsync_range() >> so we can avoid this sort of thing in the future. What do you think? 
>
> Hmm, I was under the impression that the direct write completion in
> either path (iomap or classic) could call generic_write_sync? How did
> this work in btrfs before the iomap conversion?

Looking at the code, we have this

	if (iocb->ki_flags & IOCB_DSYNC)
		retval = dio_set_defer_completion(dio);

it only happens on async, and then the generic_write_sync() happens outside of the context of the submitting task, and since we're async we wouldn't be waiting on the IO inside of the area where we're holding the inode_lock().

Thanks,

Josef

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-01 15:11 ` Josef Bacik 2020-09-01 17:45 ` Darrick J. Wong @ 2020-09-01 21:46 ` Dave Chinner 2020-09-01 22:19 ` Josef Bacik 2020-09-03 16:32 ` Christoph Hellwig 2 siblings, 1 reply; 31+ messages in thread From: Dave Chinner @ 2020-09-01 21:46 UTC (permalink / raw) To: Josef Bacik Cc: Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On Tue, Sep 01, 2020 at 11:11:58AM -0400, Josef Bacik wrote: > On 9/1/20 9:06 AM, Johannes Thumshirn wrote: > > This happens because iomap_dio_complete() calls into generic_write_sync() > > if we have the data-sync flag set. But as we're still under the > > inode_lock() from btrfs_file_write_iter() we will deadlock once > > btrfs_sync_file() tries to acquire the inode_lock(). > > > > Calling into generic_write_sync() is not needed as __btrfs_direct_write() > > already takes care of persisting the data on disk. We can temporarily drop > > the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the > > iomap code won't try to call into the sync routines as well. 
> > > > References: https://github.com/btrfs/fstests/issues/12 > > Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO") > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > --- > > fs/btrfs/file.c | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > > index b62679382799..c75c0f2a5f72 100644 > > --- a/fs/btrfs/file.c > > +++ b/fs/btrfs/file.c > > @@ -2023,6 +2023,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, > > atomic_inc(&BTRFS_I(inode)->sync_writers); > > if (iocb->ki_flags & IOCB_DIRECT) { > > + iocb->ki_flags &= ~IOCB_DSYNC; > > num_written = __btrfs_direct_write(iocb, from); > > } else { > > num_written = btrfs_buffered_write(iocb, from); > > @@ -2046,8 +2047,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, > > if (num_written > 0) > > num_written = generic_write_sync(iocb, num_written); > > - if (sync) > > + if (sync) { > > + iocb->ki_flags |= IOCB_DSYNC; > > atomic_dec(&BTRFS_I(inode)->sync_writers); > > + } > > out: > > current->backing_dev_info = NULL; > > return num_written ? num_written : err; > > > > Christoph, I feel like this is broken. No, it isn't broken, it's just a -different design- to the old direct IO path. It was done this way by design because the old way of requiring separate paths for calling generic_write_sync() for sync and AIO is .... nasty, and doesn't allow for optimisation of IO completion functionality that may be wholly dependent on submission time inode state. e.g. moving the O_DSYNC completion out of the context of the IOMAP_F_DIRTY submission context means we can't reliably do FUA writes to avoid calls to generic_write_sync() completely. Compromising that functionality is going to cause major performance regressions for high performance enterprise databases using O_DSYNC AIO+DIO... 
> Xfs and ext4 get away with this for > different reasons, No, they "don't get away with it", this is how it was designed to work. > ext4 doesn't take the inode_lock() at all in fsync, and > xfs takes the ILOCK instead of the IOLOCK, so it's fine. However btrfs uses > inode_lock() in ->fsync (not for the IO, just for the logging part). A long > time ago I specifically pushed the inode locking down into ->fsync() > handlers to give us this sort of control. > > I'm not 100% on the iomap stuff, but the fix seems like we need to move the > generic_write_sync() out of iomap_dio_complete() completely, and the callers > do their own thing, much like the normal generic_file_write_iter() does. That effectively breaks O_DSYNC AIO and requires us to reintroduce all the nasty code that the old direct IO path required both the infrastructure and the filesystems to handle it. That's really not an acceptable solution to an internal btrfs locking issue... > And then I'd like to add a WARN_ON(lockdep_is_held()) in vfs_fsync_range() > so we can avoid this sort of thing in the future. What do you think? That's not going to work, either. There are filesystems that call vfs_fsync_range() directly from under the inode_lock(). For example, the fallocate() path in gfs2. And it's called under the ext4 and XFS MMAPLOCK from the dax page fault path, which is the page fault equivalent of the inode_lock(). IOWs, if you know that you aren't going to take inode locks in your ->fsync() method, there's nothing that says you cannot call vfs_fsync_range() while holding those inode locks. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-01 21:46 ` Dave Chinner @ 2020-09-01 22:19 ` Josef Bacik 2020-09-01 23:58 ` Dave Chinner 0 siblings, 1 reply; 31+ messages in thread From: Josef Bacik @ 2020-09-01 22:19 UTC (permalink / raw) To: Dave Chinner Cc: Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On 9/1/20 5:46 PM, Dave Chinner wrote: > On Tue, Sep 01, 2020 at 11:11:58AM -0400, Josef Bacik wrote: >> On 9/1/20 9:06 AM, Johannes Thumshirn wrote: >>> This happens because iomap_dio_complete() calls into generic_write_sync() >>> if we have the data-sync flag set. But as we're still under the >>> inode_lock() from btrfs_file_write_iter() we will deadlock once >>> btrfs_sync_file() tries to acquire the inode_lock(). >>> >>> Calling into generic_write_sync() is not needed as __btrfs_direct_write() >>> already takes care of persisting the data on disk. We can temporarily drop >>> the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the >>> iomap code won't try to call into the sync routines as well. 
>>> >>> References: https://github.com/btrfs/fstests/issues/12 >>> Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO") >>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> >>> --- >>> fs/btrfs/file.c | 5 ++++- >>> 1 file changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >>> index b62679382799..c75c0f2a5f72 100644 >>> --- a/fs/btrfs/file.c >>> +++ b/fs/btrfs/file.c >>> @@ -2023,6 +2023,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, >>> atomic_inc(&BTRFS_I(inode)->sync_writers); >>> if (iocb->ki_flags & IOCB_DIRECT) { >>> + iocb->ki_flags &= ~IOCB_DSYNC; >>> num_written = __btrfs_direct_write(iocb, from); >>> } else { >>> num_written = btrfs_buffered_write(iocb, from); >>> @@ -2046,8 +2047,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, >>> if (num_written > 0) >>> num_written = generic_write_sync(iocb, num_written); >>> - if (sync) >>> + if (sync) { >>> + iocb->ki_flags |= IOCB_DSYNC; >>> atomic_dec(&BTRFS_I(inode)->sync_writers); >>> + } >>> out: >>> current->backing_dev_info = NULL; >>> return num_written ? num_written : err; >>> >> >> Christoph, I feel like this is broken. > > No, it isn't broken, it's just a -different design- to the old > direct IO path. It was done this way done by design because the old > way of requiring separate paths for calling generic_write_sync() for > sync and AIO is .... nasty, and doesn't allow for optimisation of > IO completion functionality that may be wholly dependent on > submission time inode state. > > e.g. moving the O_DSYNC completion out of the context of the > IOMAP_F_DIRTY submission context means we can't reliably do FUA > writes to avoid calls to generic_write_sync() completely. > Compromising that functionality is going to cause major performance > regressions for high performance enterprise databases using O_DSYNC > AIO+DIO... 
> >> Xfs and ext4 get away with this for >> different reasons, > > No, they "don't get away with it", this is how it was designed to > work. > Didn't mean this as a slight, just saying this is why it works fine for you guys and doesn't work for us. Because when we first were looking at this we couldn't understand how it didn't blow up for you and it did blow up for us. I'm providing context, not saying you guys are broken or doing it wrong. >> ext4 doesn't take the inode_lock() at all in fsync, and >> xfs takes the ILOCK instead of the IOLOCK, so it's fine. However btrfs uses >> inode_lock() in ->fsync (not for the IO, just for the logging part). A long >> time ago I specifically pushed the inode locking down into ->fsync() >> handlers to give us this sort of control. >> >> I'm not 100% on the iomap stuff, but the fix seems like we need to move the >> generic_write_sync() out of iomap_dio_complete() completely, and the callers >> do their own thing, much like the normal generic_file_write_iter() does. > > That effectively breaks O_DSYNC AIO and requires us to reintroduce > all the nasty code that the old direct IO path required both the > infrastructure and the filesystems to handle it. That's really not > acceptible solution to an internal btrfs locking issue... > >> And then I'd like to add a WARN_ON(lockdep_is_held()) in vfs_fsync_range() >> so we can avoid this sort of thing in the future. What do you think? > > That's not going to work, either. There are filesystems that call > vfs_fsync_range() directly from under the inode_lock(). For example, > the fallocate() path in gfs2. And it's called under the ext4 and XFS > MMAPLOCK from the dax page fault path, which is the page fault > equivalent of the inode_lock(). IOWs, if you know that you aren't > going to take inode locks in your ->fsync() method, there's nothing > that says you cannot call vfs_fsync_range() while holding those > inode locks. 
I converted ->fsync to not have the i_mutex taken before calling _years_ ago

02c24a82187d5a628c68edfe71ae60dc135cd178

and part of what I did was update the locking document around it. So in my head, the locking rules were "No VFS locks held on entry". Obviously that's not true today, but if we're going to change the assumptions around these things then we really ought to

1) Make sure they're true for _all_ file systems.
2) Document it when it's changed.

Ok so iomap was designed assuming it was safe to take the inode_lock() before calling ->fsync. That's fine, but this is kind of a bad way to find out. We really shouldn't have generic helpers that have different generic locking rules based on which file system uses them. Because then we end up with situations like this, where suddenly we're having to come up with some weird solution because the generic thing only works for a subset of file systems.

Thanks,

Josef

^ permalink raw reply	[flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-01 22:19 ` Josef Bacik @ 2020-09-01 23:58 ` Dave Chinner 2020-09-02 0:22 ` Josef Bacik 2020-09-02 11:44 ` Matthew Wilcox 0 siblings, 2 replies; 31+ messages in thread From: Dave Chinner @ 2020-09-01 23:58 UTC (permalink / raw) To: Josef Bacik Cc: Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On Tue, Sep 01, 2020 at 06:19:17PM -0400, Josef Bacik wrote: > On 9/1/20 5:46 PM, Dave Chinner wrote: > > On Tue, Sep 01, 2020 at 11:11:58AM -0400, Josef Bacik wrote: > > > On 9/1/20 9:06 AM, Johannes Thumshirn wrote: > > > > This happens because iomap_dio_complete() calls into generic_write_sync() > > > > if we have the data-sync flag set. But as we're still under the > > > > inode_lock() from btrfs_file_write_iter() we will deadlock once > > > > btrfs_sync_file() tries to acquire the inode_lock(). > > > > > > > > Calling into generic_write_sync() is not needed as __btrfs_direct_write() > > > > already takes care of persisting the data on disk. We can temporarily drop > > > > the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the > > > > iomap code won't try to call into the sync routines as well. 
> > > > > > > > References: https://github.com/btrfs/fstests/issues/12 > > > > Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO") > > > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > > > --- > > > > fs/btrfs/file.c | 5 ++++- > > > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c > > > > index b62679382799..c75c0f2a5f72 100644 > > > > --- a/fs/btrfs/file.c > > > > +++ b/fs/btrfs/file.c > > > > @@ -2023,6 +2023,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, > > > > atomic_inc(&BTRFS_I(inode)->sync_writers); > > > > if (iocb->ki_flags & IOCB_DIRECT) { > > > > + iocb->ki_flags &= ~IOCB_DSYNC; > > > > num_written = __btrfs_direct_write(iocb, from); > > > > } else { > > > > num_written = btrfs_buffered_write(iocb, from); > > > > @@ -2046,8 +2047,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, > > > > if (num_written > 0) > > > > num_written = generic_write_sync(iocb, num_written); > > > > - if (sync) > > > > + if (sync) { > > > > + iocb->ki_flags |= IOCB_DSYNC; > > > > atomic_dec(&BTRFS_I(inode)->sync_writers); > > > > + } > > > > out: > > > > current->backing_dev_info = NULL; > > > > return num_written ? num_written : err; > > > > > > > > > > Christoph, I feel like this is broken. > > > > No, it isn't broken, it's just a -different design- to the old > > direct IO path. It was done this way done by design because the old > > way of requiring separate paths for calling generic_write_sync() for > > sync and AIO is .... nasty, and doesn't allow for optimisation of > > IO completion functionality that may be wholly dependent on > > submission time inode state. > > > > e.g. moving the O_DSYNC completion out of the context of the > > IOMAP_F_DIRTY submission context means we can't reliably do FUA > > writes to avoid calls to generic_write_sync() completely. 
> > Compromising that functionality is going to cause major performance > > regressions for high performance enterprise databases using O_DSYNC > > AIO+DIO... > > > > > Xfs and ext4 get away with this for > > > different reasons, > > > > No, they "don't get away with it", this is how it was designed to > > work. > > > > Didn't mean this as a slight, just saying this is why it works fine for you > guys and doesn't work for us. Because when we first were looking at this we > couldn't understand how it didn't blow up for you and it did blow up for us. > I'm providing context, not saying you guys are broken or doing it wrong. > > > > ext4 doesn't take the inode_lock() at all in fsync, and > > > xfs takes the ILOCK instead of the IOLOCK, so it's fine. However btrfs uses > > > inode_lock() in ->fsync (not for the IO, just for the logging part). A long > > > time ago I specifically pushed the inode locking down into ->fsync() > > > handlers to give us this sort of control. > > > > > > I'm not 100% on the iomap stuff, but the fix seems like we need to move the > > > generic_write_sync() out of iomap_dio_complete() completely, and the callers > > > do their own thing, much like the normal generic_file_write_iter() does. > > > > That effectively breaks O_DSYNC AIO and requires us to reintroduce > > all the nasty code that the old direct IO path required both the > > infrastructure and the filesystems to handle it. That's really not > > acceptible solution to an internal btrfs locking issue... > > > > > And then I'd like to add a WARN_ON(lockdep_is_held()) in vfs_fsync_range() > > > so we can avoid this sort of thing in the future. What do you think? > > > > That's not going to work, either. There are filesystems that call > > vfs_fsync_range() directly from under the inode_lock(). For example, > > the fallocate() path in gfs2. And it's called under the ext4 and XFS > > MMAPLOCK from the dax page fault path, which is the page fault > > equivalent of the inode_lock(). 
IOWs, if you know that you aren't > > going to take inode locks in your ->fsync() method, there's nothing > > that says you cannot call vfs_fsync_range() while holding those > > inode locks. > > I converted ->fsync to not have the i_mutex taken before calling _years_ ago > > 02c24a82187d5a628c68edfe71ae60dc135cd178 > > and part of what I did was update the locking document around it. So in my > head, the locking rules were "No VFS locks held on entry". Obviously that's > not true today, but if we're going to change the assumptions around these > things then we really ought to > > 1) Make sure they're true for _all_ file systems. > 2) Document it when it's changed. > > Ok so iomap was designed assuming it was safe to take the inode_lock() > before calling ->fsync. No, I did not say that. Stop trying to put your words in my mouth, Josef. The iomap design simply assumes that ->fsync can run without needing to hold the same locks as IO submission requires. iomap requires a ->fsync implementation that doesn't take the inode_lock() if the inode_lock() is used to serialise IO submission. i.e. iomap separates IO exclusion from -internal inode state serialisation-. This design assumption means DIO submission can safely -flush dirty data-, invalidate the page cache, sync the inode/journal to stable storage, etc all while holding the IO exclusion lock to ensure submission won't race with other metadata modification operations like truncate, holepunch, etc. IOWs, iomap expects the inode_lock() to work as an -IO submission exclusion lock-, not as an inode _state protection_ lock. Yes, this means iomap has a different IO locking model to the original direct and bufferhead based buffered IO paths. However, these architectural changes are required to replace bufferhead based locking, implement the physical storage exclusion DAX requires, close mmap vs hole punch races, etc. 
In practice, this means all the inode state in XFS (and ext4) is protected by internal inode locks rather than the inode_lock(). As such, they do not require the inode_lock() to be able to write back dirty data or modify inode state or flush inode state to stable storage. Put simply: converting a filesystem to use iomap is not a "change the filesystem interfacing code and it will work" modification. We ask that filesystems are modified to conform to the iomap IO exclusion model; adding special cases for every potential locking and mapping quirk every different filesystem has is part of what turned the old direct IO code into an unmaintainable nightmare. > That's fine, but this is kind of a bad way to find > out. We really shouldn't have generic helpers that have different generic > locking rules based on which file system uses them. We certainly can change the rules for new infrastructure. Indeed, we had to change the rules to support DAX. The whole point of the iomap infrastructure was that it enabled us to use code that already worked for DAX (the XFS code) in multiple filesystems. And as people have realised that the DIO via iomap is much faster than the old DIO code and is a much more efficient way of doing large buffered IO, other filesystems have started to use it. However.... > Because then we end up > with situations like this, where suddenly we're having to come up with some > weird solution because the generic thing only works for a subset of file > systems. Thanks, .... we've always said "you need to change the filesystem code to use iomap". This is simply a reflection on the fact that iomap has different rules and constraints to the old code and so it's not a direct plug-in replacement. There are no short cuts here... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-01 23:58 ` Dave Chinner @ 2020-09-02 0:22 ` Josef Bacik 2020-09-02 7:12 ` Johannes Thumshirn 2020-09-02 11:44 ` Matthew Wilcox 1 sibling, 1 reply; 31+ messages in thread From: Josef Bacik @ 2020-09-02 0:22 UTC (permalink / raw) To: Dave Chinner Cc: Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On 9/1/20 7:58 PM, Dave Chinner wrote: > On Tue, Sep 01, 2020 at 06:19:17PM -0400, Josef Bacik wrote: >> On 9/1/20 5:46 PM, Dave Chinner wrote: >>> On Tue, Sep 01, 2020 at 11:11:58AM -0400, Josef Bacik wrote: >>>> On 9/1/20 9:06 AM, Johannes Thumshirn wrote: >>>>> This happens because iomap_dio_complete() calls into generic_write_sync() >>>>> if we have the data-sync flag set. But as we're still under the >>>>> inode_lock() from btrfs_file_write_iter() we will deadlock once >>>>> btrfs_sync_file() tries to acquire the inode_lock(). >>>>> >>>>> Calling into generic_write_sync() is not needed as __btrfs_direct_write() >>>>> already takes care of persisting the data on disk. We can temporarily drop >>>>> the IOCB_DSYNC flag before calling into __btrfs_direct_write() so the >>>>> iomap code won't try to call into the sync routines as well. 
>>>>> >>>>> References: https://github.com/btrfs/fstests/issues/12 >>>>> Fixes: da4d7c1b4c45 ("btrfs: switch to iomap for direct IO") >>>>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> >>>>> --- >>>>> fs/btrfs/file.c | 5 ++++- >>>>> 1 file changed, 4 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c >>>>> index b62679382799..c75c0f2a5f72 100644 >>>>> --- a/fs/btrfs/file.c >>>>> +++ b/fs/btrfs/file.c >>>>> @@ -2023,6 +2023,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, >>>>> atomic_inc(&BTRFS_I(inode)->sync_writers); >>>>> if (iocb->ki_flags & IOCB_DIRECT) { >>>>> + iocb->ki_flags &= ~IOCB_DSYNC; >>>>> num_written = __btrfs_direct_write(iocb, from); >>>>> } else { >>>>> num_written = btrfs_buffered_write(iocb, from); >>>>> @@ -2046,8 +2047,10 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, >>>>> if (num_written > 0) >>>>> num_written = generic_write_sync(iocb, num_written); >>>>> - if (sync) >>>>> + if (sync) { >>>>> + iocb->ki_flags |= IOCB_DSYNC; >>>>> atomic_dec(&BTRFS_I(inode)->sync_writers); >>>>> + } >>>>> out: >>>>> current->backing_dev_info = NULL; >>>>> return num_written ? num_written : err; >>>>> >>>> >>>> Christoph, I feel like this is broken. >>> >>> No, it isn't broken, it's just a -different design- to the old >>> direct IO path. It was done this way by design because the old >>> way of requiring separate paths for calling generic_write_sync() for >>> sync and AIO is .... nasty, and doesn't allow for optimisation of >>> IO completion functionality that may be wholly dependent on >>> submission time inode state. >>> >>> e.g. moving the O_DSYNC completion out of the context of the >>> IOMAP_F_DIRTY submission context means we can't reliably do FUA >>> writes to avoid calls to generic_write_sync() completely. >>> Compromising that functionality is going to cause major performance >>> regressions for high performance enterprise databases using O_DSYNC >>> AIO+DIO...
>>> >>>> Xfs and ext4 get away with this for >>>> different reasons, >>> >>> No, they "don't get away with it", this is how it was designed to >>> work. >>> >> >> Didn't mean this as a slight, just saying this is why it works fine for you >> guys and doesn't work for us. Because when we first were looking at this we >> couldn't understand how it didn't blow up for you and it did blow up for us. >> I'm providing context, not saying you guys are broken or doing it wrong. >> >>>> ext4 doesn't take the inode_lock() at all in fsync, and >>>> xfs takes the ILOCK instead of the IOLOCK, so it's fine. However btrfs uses >>>> inode_lock() in ->fsync (not for the IO, just for the logging part). A long >>>> time ago I specifically pushed the inode locking down into ->fsync() >>>> handlers to give us this sort of control. >>>> >>>> I'm not 100% on the iomap stuff, but the fix seems like we need to move the >>>> generic_write_sync() out of iomap_dio_complete() completely, and the callers >>>> do their own thing, much like the normal generic_file_write_iter() does. >>> >>> That effectively breaks O_DSYNC AIO and requires us to reintroduce >>> all the nasty code that the old direct IO path required both the >>> infrastructure and the filesystems to handle. That's really not >>> an acceptable solution to an internal btrfs locking issue... >>> >>>> And then I'd like to add a WARN_ON(lockdep_is_held()) in vfs_fsync_range() >>>> so we can avoid this sort of thing in the future. What do you think? >>> >>> That's not going to work, either. There are filesystems that call >>> vfs_fsync_range() directly from under the inode_lock(). For example, >>> the fallocate() path in gfs2. And it's called under the ext4 and XFS >>> MMAPLOCK from the dax page fault path, which is the page fault >>> equivalent of the inode_lock().
IOWs, if you know that you aren't >>> going to take inode locks in your ->fsync() method, there's nothing >>> that says you cannot call vfs_fsync_range() while holding those >>> inode locks. >> >> I converted ->fsync to not have the i_mutex taken before calling _years_ ago >> >> 02c24a82187d5a628c68edfe71ae60dc135cd178 >> >> and part of what I did was update the locking document around it. So in my >> head, the locking rules were "No VFS locks held on entry". Obviously that's >> not true today, but if we're going to change the assumptions around these >> things then we really ought to >> >> 1) Make sure they're true for _all_ file systems. >> 2) Document it when it's changed. >> >> Ok so iomap was designed assuming it was safe to take the inode_lock() >> before calling ->fsync. > > No, I did not say that. Stop trying to put your words in my mouth, > Josef. > Why are you being combative? I'm simply trying to work towards a reasonable solution and figure out how we keep this class of problems from happening again. I literally do not care at all about the design decisions for iomap, you are an intelligent person and I assume good intent, I assume you did it for good reasons. I'm simply trying to work out holes in my understanding, and work towards a solution we're all happy with. I'm not trying to argue with you or throw mud, I'm simply trying to understand. > The iomap design simply assumes that ->fsync can run without needing > to hold the same locks as IO submission requires. iomap > requires a ->fsync implementation that doesn't take the inode_lock() > if the inode_lock() is used to serialise IO submission. i.e. iomap > separates IO exclusion from -internal inode state serialisation-. > This is only slightly different from what I was saying, and in reality is the same thing because we all currently use the inode_lock() for the submission side. So while the assumption may be slightly different, the outcome is the same. 
> This design assumption means DIO submission can safely -flush dirty > data-, invalidate the page cache, sync the inode/journal to stable > storage, etc all while holding the IO exclusion lock to ensure > submission won't race with other metadata modification operations > like truncate, holepunch, etc. > > IOWs, iomap expects the inode_lock() to work as an -IO submission > exclusion lock-, not as an inode _state protection_ lock. Yes, this > means iomap has a different IO locking model to the original direct > and bufferhead based buffered IO paths. However, these architectural > changes are required to replace bufferhead based locking, implement > the physical storage exclusion DAX requires, close mmap vs hole > punch races, etc. > > In practice, this means all the inode state in XFS (and ext4) are > protected by internal inode locks rather than the inode_lock(). As > such, they they do not require the inode_lock() to be able to write > back dirty data or modify inode state or flush inode state to stable > storage. > > Put simply: converting a filesystem to use iomap is not a "change > the filesystem interfacing code and it will work" modification. We > ask that filesystems are modified to conform to the iomap IO > exclusion model; adding special cases for every potential > locking and mapping quirk every different filesystem has is part of > what turned the old direct IO code into an unmaintainable nightmare. > >> That's fine, but this is kind of a bad way to find >> out. We really shouldn't have generic helper's that have different generic >> locking rules based on which file system uses them. > > We certainly can change the rules for new infrastructure. Indeed, we > had to change the rules to support DAX. The whole point of the > iomap infrastructure was that it enabled us to use code that already > worked for DAX (the XFS code) in multiple filesystems. 
And as people > have realised that the DIO via iomap is much faster than the old DIO > code and is a much more efficient way of doing large buffered IO, > other filesystems have started to use it. > > However.... > >> Because then we end up >> with situations like this, where suddenly we're having to come up with some >> weird solution because the generic thing only works for a subset of file >> systems. Thanks, > > .... we've always said "you need to change the filesystem code to > use iomap". This is simply a reflection on the fact that iomap has > different rules and constraints to the old code and so it's not a > direct plug in replacement. There are no short cuts here... > That's fine, again I'm not saying you did anything wrong with the implementation. I'm saying these requirements were not clear, and with no warning or annotations or documentation we simply find out by random chance while Johannes is running with a non-standard xfstests setup. Perhaps that's what the system working looks like, but honestly I would have rather found out at the point that I applied, built, and reviewed the code so we could have addressed it then. Instead now we have to rip it out until we figure out what to do about it. Thanks, Josef ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-02 0:22 ` Josef Bacik @ 2020-09-02 7:12 ` Johannes Thumshirn 2020-09-02 11:10 ` Josef Bacik 0 siblings, 1 reply; 31+ messages in thread From: Johannes Thumshirn @ 2020-09-02 7:12 UTC (permalink / raw) To: Josef Bacik, Dave Chinner Cc: David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On 02/09/2020 02:22, Josef Bacik wrote: > Instead now we have to rip > it out until we figure out what to do about it. I don't think we need to rip out the iomap conversion. We can take my fix, albeit not pretty, until we have reworked the locking around ->fsync(). Probably with a big fat comment attached to it. Byte, Johannes
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-02 7:12 ` Johannes Thumshirn @ 2020-09-02 11:10 ` Josef Bacik 2020-09-02 16:29 ` Darrick J. Wong 0 siblings, 1 reply; 31+ messages in thread From: Josef Bacik @ 2020-09-02 11:10 UTC (permalink / raw) To: Johannes Thumshirn, Dave Chinner Cc: David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On 9/2/20 3:12 AM, Johannes Thumshirn wrote: > On 02/09/2020 02:22, Josef Bacik wrote: >> Instead now we have to rip >> it out until we figure out what to do about it. > > I don't think we need to rip out the iomap conversion. We can > take my fix albeit not pretty, until we have reworked the locking > around ->fsync(). Probably with a big fat comment attached to it. > We do, because your fix breaks DSYNC for AIO. You didn't hit this with direct io, you hit it with AIO, and the reason you hit it is because you are on zram, so your bios completed before we exited iomap_dio_rw. So that was the last put on the iomap_dio, and thus we ran iomap_dio_complete() and deadlocked. We can't just drop the DSYNC thing for AIO because in the normal case where this doesn't happen we need to know when the last thing has finished in order to run ->fsync(), we can't just run it after submission. Thanks, Josef
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-02 11:10 ` Josef Bacik @ 2020-09-02 16:29 ` Darrick J. Wong 2020-09-02 16:47 ` Josef Bacik 0 siblings, 1 reply; 31+ messages in thread From: Darrick J. Wong @ 2020-09-02 16:29 UTC (permalink / raw) To: Josef Bacik Cc: Johannes Thumshirn, Dave Chinner, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On Wed, Sep 02, 2020 at 07:10:08AM -0400, Josef Bacik wrote: > On 9/2/20 3:12 AM, Johannes Thumshirn wrote: > > On 02/09/2020 02:22, Josef Bacik wrote: > > > Instead now we have to rip > > > it out until we figure out what to do about it. > > > > I don't think we need to rip out the iomap conversion. We can > > take my fix albeit not pretty, until we have reworked the locking > > around ->fsync(). Probably with a big fat comment attached to it. > > > > We do, because your fix breaks DSYNC for AIO. You didn't hit this with > direct io, you hit it with AIO, and the reason you hit it is because you are > on zram, so your bio's completed before we exited iomap_dio_rw. So that was > the last put on the iomap_dio, and thus we ran > iomap_dio_complete() and deadlocked. We can't just drop the DSYNC thing for > AIO because in the normal case where this doesn't happen we need to know > when the last thing is finished in order to run ->fsync(), we can't just run > it after submission. Thanks, Bleh, Oracle mail (or vger or something) is being slow again... It occurred to me that we added iomap_dio_ops.submit_io for the benefit of btrfs. Could we solve all this for now by adding a ->write_sync function pointer to iomap_dio_ops that could lead back into a btrfs function that would flush the necessary bits without itself taking the inode lock? And if a ->write_sync is not supplied, then the caller gets generic_write_sync? It's kind of a bandaid, but maybe less bad of one than restructuring the btrfs locking model under time pressure... 
--D > Josef
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-02 16:29 ` Darrick J. Wong @ 2020-09-02 16:47 ` Josef Bacik 0 siblings, 0 replies; 31+ messages in thread From: Josef Bacik @ 2020-09-02 16:47 UTC (permalink / raw) To: Darrick J. Wong Cc: Johannes Thumshirn, Dave Chinner, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On 9/2/20 12:29 PM, Darrick J. Wong wrote: > On Wed, Sep 02, 2020 at 07:10:08AM -0400, Josef Bacik wrote: >> On 9/2/20 3:12 AM, Johannes Thumshirn wrote: >>> On 02/09/2020 02:22, Josef Bacik wrote: >>>> Instead now we have to rip >>>> it out until we figure out what to do about it. >>> >>> I don't think we need to rip out the iomap conversion. We can >>> take my fix albeit not pretty, until we have reworked the locking >>> around ->fsync(). Probably with a big fat comment attached to it. >>> >> >> We do, because your fix breaks DSYNC for AIO. You didn't hit this with >> direct io, you hit it with AIO, and the reason you hit it is because you are >> on zram, so your bio's completed before we exited iomap_dio_rw. So that was >> the last put on the iomap_dio, and thus we ran >> iomap_dio_complete() and deadlocked. We can't just drop the DSYNC thing for >> AIO because in the normal case where this doesn't happen we need to know >> when the last thing is finished in order to run ->fsync(), we can't just run >> it after submission. Thanks, > > Bleh, Oracle mail (or vger or something) is being slow again... > > It occurred to me that we added iomap_dio_ops.submit_io for the benefit > of btrfs. Could we solve all this for now by adding a ->write_sync > function pointer to iomap_dio_ops that could lead back into a btrfs > function that would flush the necessary bits without itself taking the > inode lock? And if a ->write_sync is not supplied, then the caller gets > generic_write_sync? 
> > It's kind of a bandaid, but maybe less bad of one than restructuring the > btrfs locking model under time pressure... > I'd rather not mess around with the generic iomap stuff for this, coordinating changes between generic and fs stuff is annoying enough as it is. We've got a strategy to work around this in btrfs so we don't have to rip out the iomap work right now. And then we'll rip out the workaround once we've reworked the locking, since the locking stuff will require a fair bit of testing and soak time to be sure it's safe. Thanks, Josef ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-01 23:58 ` Dave Chinner 2020-09-02 0:22 ` Josef Bacik @ 2020-09-02 11:44 ` Matthew Wilcox 2020-09-02 12:20 ` Dave Chinner 1 sibling, 1 reply; 31+ messages in thread From: Matthew Wilcox @ 2020-09-02 11:44 UTC (permalink / raw) To: Dave Chinner Cc: Josef Bacik, Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On Wed, Sep 02, 2020 at 09:58:30AM +1000, Dave Chinner wrote: > Put simply: converting a filesystem to use iomap is not a "change > the filesystem interfacing code and it will work" modification. We > ask that filesystems are modified to conform to the iomap IO > exclusion model; adding special cases for every potential > locking and mapping quirk every different filesystem has is part of > what turned the old direct IO code into an unmaintainable nightmare. > > > That's fine, but this is kind of a bad way to find > > out. We really shouldn't have generic helper's that have different generic > > locking rules based on which file system uses them. > > We certainly can change the rules for new infrastructure. Indeed, we > had to change the rules to support DAX. The whole point of the > iomap infrastructure was that it enabled us to use code that already > worked for DAX (the XFS code) in multiple filesystems. And as people > have realised that the DIO via iomap is much faster than the old DIO > code and is a much more efficient way of doing large buffered IO, > other filesystems have started to use it. > > However.... > > > Because then we end up > > with situations like this, where suddenly we're having to come up with some > > weird solution because the generic thing only works for a subset of file > > systems. Thanks, > > .... we've always said "you need to change the filesystem code to > use iomap". 
This is simply a reflection on the fact that iomap has > different rules and constraints to the old code and so it's not a > direct plug in replacement. There are no short cuts here... Can you point me (and I suspect Josef!) towards the documentation of the locking model? I was hoping to find Documentation/filesystems/iomap.rst but all the 'iomap' strings in Documentation/ refer to pci_iomap and similar, except for this in the DAX documentation: : - implementing ->read_iter and ->write_iter operations which use dax_iomap_rw() : when inode has S_DAX flag set : - implementing an mmap file operation for DAX files which sets the : VM_MIXEDMAP and VM_HUGEPAGE flags on the VMA, and setting the vm_ops to : include handlers for fault, pmd_fault, page_mkwrite, pfn_mkwrite. These : handlers should probably call dax_iomap_fault() passing the appropriate : fault size and iomap operations. : - calling iomap_zero_range() passing appropriate iomap operations instead of : block_truncate_page() for DAX files : - ensuring that there is sufficient locking between reads, writes, : truncates and page faults : : The iomap handlers for allocating blocks must make sure that allocated blocks : are zeroed out and converted to written extents before being returned to avoid : exposure of uninitialized data through mmap. which doesn't bear on this situation. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-02 11:44 ` Matthew Wilcox @ 2020-09-02 12:20 ` Dave Chinner 2020-09-02 12:42 ` Josef Bacik 0 siblings, 1 reply; 31+ messages in thread From: Dave Chinner @ 2020-09-02 12:20 UTC (permalink / raw) To: Matthew Wilcox Cc: Josef Bacik, Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On Wed, Sep 02, 2020 at 12:44:14PM +0100, Matthew Wilcox wrote: > On Wed, Sep 02, 2020 at 09:58:30AM +1000, Dave Chinner wrote: > > Put simply: converting a filesystem to use iomap is not a "change > > the filesystem interfacing code and it will work" modification. We > > ask that filesystems are modified to conform to the iomap IO > > exclusion model; adding special cases for every potential > > locking and mapping quirk every different filesystem has is part of > > what turned the old direct IO code into an unmaintainable nightmare. > > > > > That's fine, but this is kind of a bad way to find > > > out. We really shouldn't have generic helper's that have different generic > > > locking rules based on which file system uses them. > > > > We certainly can change the rules for new infrastructure. Indeed, we > > had to change the rules to support DAX. The whole point of the > > iomap infrastructure was that it enabled us to use code that already > > worked for DAX (the XFS code) in multiple filesystems. And as people > > have realised that the DIO via iomap is much faster than the old DIO > > code and is a much more efficient way of doing large buffered IO, > > other filesystems have started to use it. > > > > However.... > > > > > Because then we end up > > > with situations like this, where suddenly we're having to come up with some > > > weird solution because the generic thing only works for a subset of file > > > systems. Thanks, > > > > .... we've always said "you need to change the filesystem code to > > use iomap". 
This is simply a reflection on the fact that iomap has > > different rules and constraints to the old code and so it's not a > > direct plug in replacement. There are no short cuts here... > > Can you point me (and I suspect Josef!) towards the documentation of the > locking model? I was hoping to find Documentation/filesystems/iomap.rst > but all the 'iomap' strings in Documentation/ refer to pci_iomap and > similar, except for this in the DAX documentation: There's no locking model documentation because there is no locking in the iomap direct IO code. The filesystem defines and does all the locking, so there's pretty much nothing to document for iomap. IOWs, the only thing iomap_dio_rw requires is that the IO completion paths do not take same locks that the IO submission path requires. And that's because: /* * iomap_dio_rw() always completes O_[D]SYNC writes regardless of whether the IO * is being issued as AIO or not. [...] So you obviously can't sit waiting for dio completion in iomap_dio_rw() while holding the submission lock if completion requires the submission lock to make progress. FWIW, iomap_dio_rw() originally required the inode_lock() to be held and contained a lockdep assert to verify this, but.... commit 3ad99bec6e82e32fa9faf2f84e74b134586b46f7 Author: Goldwyn Rodrigues <rgoldwyn@suse.com> Date: Sat Nov 30 09:59:25 2019 -0600 iomap: remove lockdep_assert_held() Filesystems such as btrfs can perform direct I/O without holding the inode->i_rwsem in some of the cases like writing within i_size. So, remove the check for lockdep_assert_held() in iomap_dio_rw(). Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> ... btrfs has special corner cases for direct IO locking and hence we removed the lockdep assert.... IOWs, iomap_dio_rw() really does not care what strategy filesystems use to serialise DIO against other operations. 
Filesystems can use whatever IO serialisation mechanism they want (mutex, rwsem, range locks, etc) as long as they obey the one simple requirement: do not take the DIO submission lock in the DIO completion path. Cheers, Dave. -- Dave Chinner david@fromorbit.com
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-02 12:20 ` Dave Chinner @ 2020-09-02 12:42 ` Josef Bacik 2020-09-03 2:28 ` Dave Chinner 0 siblings, 1 reply; 31+ messages in thread From: Josef Bacik @ 2020-09-02 12:42 UTC (permalink / raw) To: Dave Chinner, Matthew Wilcox Cc: Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On 9/2/20 8:20 AM, Dave Chinner wrote: > On Wed, Sep 02, 2020 at 12:44:14PM +0100, Matthew Wilcox wrote: >> On Wed, Sep 02, 2020 at 09:58:30AM +1000, Dave Chinner wrote: >>> Put simply: converting a filesystem to use iomap is not a "change >>> the filesystem interfacing code and it will work" modification. We >>> ask that filesystems are modified to conform to the iomap IO >>> exclusion model; adding special cases for every potential >>> locking and mapping quirk every different filesystem has is part of >>> what turned the old direct IO code into an unmaintainable nightmare. >>> >>>> That's fine, but this is kind of a bad way to find >>>> out. We really shouldn't have generic helper's that have different generic >>>> locking rules based on which file system uses them. >>> >>> We certainly can change the rules for new infrastructure. Indeed, we >>> had to change the rules to support DAX. The whole point of the >>> iomap infrastructure was that it enabled us to use code that already >>> worked for DAX (the XFS code) in multiple filesystems. And as people >>> have realised that the DIO via iomap is much faster than the old DIO >>> code and is a much more efficient way of doing large buffered IO, >>> other filesystems have started to use it. >>> >>> However.... >>> >>>> Because then we end up >>>> with situations like this, where suddenly we're having to come up with some >>>> weird solution because the generic thing only works for a subset of file >>>> systems. Thanks, >>> >>> .... 
we've always said "you need to change the filesystem code to >>> use iomap". This is simply a reflection on the fact that iomap has >>> different rules and constraints to the old code and so it's not a >>> direct plug in replacement. There are no short cuts here... >> >> Can you point me (and I suspect Josef!) towards the documentation of the >> locking model? I was hoping to find Documentation/filesystems/iomap.rst >> but all the 'iomap' strings in Documentation/ refer to pci_iomap and >> similar, except for this in the DAX documentation: > > There's no locking model documentation because there is no locking > in the iomap direct IO code. The filesystem defines and does all the > locking, so there's pretty much nothing to document for iomap. > > IOWs, the only thing iomap_dio_rw requires is that the IO completion > paths do not take same locks that the IO submission path > requires. And that's because: > > /* > * iomap_dio_rw() always completes O_[D]SYNC writes regardless of whether the IO > * is being issued as AIO or not. [...] > > So you obviously can't sit waiting for dio completion in > iomap_dio_rw() while holding the submission lock if completion > requires the submission lock to make progress. > > FWIW, iomap_dio_rw() originally required the inode_lock() to be held > and contained a lockdep assert to verify this, but.... > > commit 3ad99bec6e82e32fa9faf2f84e74b134586b46f7 > Author: Goldwyn Rodrigues <rgoldwyn@suse.com> > Date: Sat Nov 30 09:59:25 2019 -0600 > > iomap: remove lockdep_assert_held() > > Filesystems such as btrfs can perform direct I/O without holding the > inode->i_rwsem in some of the cases like writing within i_size. So, > remove the check for lockdep_assert_held() in iomap_dio_rw(). > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> > Signed-off-by: David Sterba <dsterba@suse.com> > > ... 
btrfs has special corner cases for direct IO locking and hence > we removed the lockdep assert.... > > IOWs, iomap_dio_rw() really does not care what strategy filesystems > use to serialise DIO against other operations. Filesystems can use > whatever IO serialisation mechanism they want (mutex, rwsem, range > locks, etc) as long as they obey the one simple requirement: do not > take the DIO submission lock in the DIO completion path. > Goldwyn has been working on these patches for a long time, and is actually familiar with this code, and he missed that these two interfaces are being mixed. This is a problem that I want to solve. He didn't notice it in any of his testing, which IIRC was like 6 months to get this stuff actually into the btrfs tree. If we're going to mix interfaces then it should be blatantly obvious to developers that's what's happening so they find out during development, not after the patches have landed, and certainly not after they've made it out to users. Thanks, Josef
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-02 12:42 ` Josef Bacik @ 2020-09-03 2:28 ` Dave Chinner 2020-09-03 9:49 ` Filipe Manana 0 siblings, 1 reply; 31+ messages in thread From: Dave Chinner @ 2020-09-03 2:28 UTC (permalink / raw) To: Josef Bacik Cc: Matthew Wilcox, Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel On Wed, Sep 02, 2020 at 08:42:25AM -0400, Josef Bacik wrote: > On 9/2/20 8:20 AM, Dave Chinner wrote: > > On Wed, Sep 02, 2020 at 12:44:14PM +0100, Matthew Wilcox wrote: > > > On Wed, Sep 02, 2020 at 09:58:30AM +1000, Dave Chinner wrote: > > > > Put simply: converting a filesystem to use iomap is not a "change > > > > the filesystem interfacing code and it will work" modification. We > > > > ask that filesystems are modified to conform to the iomap IO > > > > exclusion model; adding special cases for every potential > > > > locking and mapping quirk every different filesystem has is part of > > > > what turned the old direct IO code into an unmaintainable nightmare. > > > > > > > > > That's fine, but this is kind of a bad way to find > > > > > out. We really shouldn't have generic helper's that have different generic > > > > > locking rules based on which file system uses them. > > > > > > > > We certainly can change the rules for new infrastructure. Indeed, we > > > > had to change the rules to support DAX. The whole point of the > > > > iomap infrastructure was that it enabled us to use code that already > > > > worked for DAX (the XFS code) in multiple filesystems. And as people > > > > have realised that the DIO via iomap is much faster than the old DIO > > > > code and is a much more efficient way of doing large buffered IO, > > > > other filesystems have started to use it. > > > > > > > > However.... 
> > > > > > > > > Because then we end up > > > > > with situations like this, where suddenly we're having to come up with some > > > > > weird solution because the generic thing only works for a subset of file > > > > > systems. Thanks, > > > > > > > > .... we've always said "you need to change the filesystem code to > > > > use iomap". This is simply a reflection on the fact that iomap has > > > > different rules and constraints to the old code and so it's not a > > > > direct plug in replacement. There are no short cuts here... > > > > > > Can you point me (and I suspect Josef!) towards the documentation of the > > > locking model? I was hoping to find Documentation/filesystems/iomap.rst > > > but all the 'iomap' strings in Documentation/ refer to pci_iomap and > > > similar, except for this in the DAX documentation: > > > > There's no locking model documentation because there is no locking > > in the iomap direct IO code. The filesystem defines and does all the > > locking, so there's pretty much nothing to document for iomap. > > > > IOWs, the only thing iomap_dio_rw requires is that the IO completion > > paths do not take same locks that the IO submission path > > requires. And that's because: > > > > /* > > * iomap_dio_rw() always completes O_[D]SYNC writes regardless of whether the IO > > * is being issued as AIO or not. [...] > > > > So you obviously can't sit waiting for dio completion in > > iomap_dio_rw() while holding the submission lock if completion > > requires the submission lock to make progress. > > > > FWIW, iomap_dio_rw() originally required the inode_lock() to be held > > and contained a lockdep assert to verify this, but.... 
> > > > commit 3ad99bec6e82e32fa9faf2f84e74b134586b46f7 > > Author: Goldwyn Rodrigues <rgoldwyn@suse.com> > > Date: Sat Nov 30 09:59:25 2019 -0600 > > > > iomap: remove lockdep_assert_held() > > Filesystems such as btrfs can perform direct I/O without holding the > > inode->i_rwsem in some of the cases like writing within i_size. So, > > remove the check for lockdep_assert_held() in iomap_dio_rw(). > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> > > Signed-off-by: David Sterba <dsterba@suse.com> > > > > ... btrfs has special corner cases for direct IO locking and hence > > we removed the lockdep assert.... > > > > IOWs, iomap_dio_rw() really does not care what strategy filesystems > > use to serialise DIO against other operations. Filesystems can use > > whatever IO serialisation mechanism they want (mutex, rwsem, range > > locks, etc) as long as they obey the one simple requirement: do not > > take the DIO submission lock in the DIO completion path. > > > > Goldwyn has been working on these patches for a long time, and is actually > familiar with this code, and he missed that these two interfaces are being > mixed. This is a problem that I want to solve. He didn't notice it in any > of his testing, which IIRC was like 6 months to get this stuff actually into > the btrfs tree. And that "been working on it for 6 months" is what really worries me the most. Hence I've done some post-mortem analysis of the "inode_lock" deadlock situation. I decided to do this after I realised this same patch set was merged last cycle and reverted late in the cycle because of problems it was found to have with page invalidation when mixed IO types were used. I did a bit of looking around yesterday afternoon, both in the btrfs code to understand the iomap_dio_rw changes and at our test coverage for DIO functionality. 
The more I dug, the more I'm finding that the most important question we need to answer is not "why wasn't this iomap locking requirement documented", the question we should be trying to answer is this: Why wasn't this *obvious* deadlock detected months ago? First of all, the inode_lock() complaints are largely a red herring as the typical btrfs DIO write path through the patchset does not take the inode lock and hence it is actually "safe" to take the inode lock in the completion path in the common case. i.e. for DIO writes wholly inside EOF, submission drops the inode_lock() before calling iomap_dio_rw() and so there is no inode_lock() deadlock present. From that viewpoint, it looks like there's only a specific corner case where this deadlock might occur and that would explain why it hasn't been discovered until now. However. The new btrfs dio write path always does down_read(BTRFS_I()->dio_sem), even when the inode lock has not been taken. That's important because btrfs_sync_file() always does a down_write(&BTRFS_I(inode)->dio_sem) call and will deadlock in IO completion if the IO submission code is holding the dio_sem. So regardless of the inode_lock(), non-AIO O_DSYNC DIO writes through iomap_dio_rw() should deadlock immediately on the first IO with this locking arrangement. It will deadlock on either the inode_lock or the dio_sem, depending on whether the O_DSYNC DIO write is wholly within EOF or not, but the deadlock is guaranteed to occur. Hence we can completely ignore the "inode_lock vs fsync" side show, because other btrfs internal locks will trigger the same issue. If this is correct, then how did this "should happen instantly" bug go undiscovered for months? Well.... It appears that fstests has no coverage of non-AIO O_DSYNC DIO writes. Tools like fsx and fsstress don't have O_SYNC, O_DSYNC or RWF_DSYNC modes, and O_SYNC and O_DSYNC are not used by any of the common DIO test programs. 
xfs_io can open files O_SYNC and there's a test that uses xfs_io's RWF_DSYNC capability, but they all use buffered IO and so none of the tests that open or write data synchronously use direct IO. The only conclusion I can make from this is that the case that should deadlock instantly isn't actually covered by fstests at all. This conclusion only stands up if this O_DSYNC code path was only "tested" for feature coverage with fstests. However, it does imply that no pre-implementation audit was done to determine if fstests actually covered all the functionality that needed to be tested here.... I tend to use xfs_io and fio for DIO feature correctness testing long before I run fstests on new code. That's how I developed and tested the FUA optimisations - xfs_io and fio w/ RWF_DSYNC on XFS on iscsi - so I've never noticed that fstests doesn't actually exercise this syscall path directly. Granted, the problem was eventually discovered by fstests, but this also raises questions. The failing test was an AIO+DIO O_DSYNC test, but the trigger has been described as an "unusual event on a rarely tested configuration". That "unusual event" was a DIO completion being run from submission context because the IO completed before the submission had finished. This is not actually unusual - it's how all AIO on synchronous IO devices completes. i.e. if you have a ram device or a (fake) pmem device, every single AIO will complete in this way as the "IO reference" held by submit_bio() is completed inside submit_bio() before it returns to the submission context. 
To provide a contrast, the iomap infrastructure is regularly tested on such devices - Darrick and I both have (fake) pmem test setups and exercise synchronous completion code paths like this on a daily basis. Hence it looks to me like the answer to the "why wasn't this found earlier" question is a combination of multiple factors: 1. fstests has no specific non-AIO O_DSYNC DIO write unit tests, nor do the stress tests allow use of O_DSYNC or RWF_DSYNC. 2. No test coverage audit was performed prior to making a critical change to the btrfs IO path so this specific lack of coverage was not noticed until now. 3. after the first revert of this functionality, post-mortem analysis either wasn't performed or didn't identify process and/or test coverage issues that allowed serious issues in the patch set to go undiscovered. 4. tools and benchmarks that could have easily discovered the problem either weren't run or they weren't configured to test and exercise all the IO path features the change affected. 5. btrfs is not regularly tested on a variety of storage that has distinctly different IO path behaviours. > If we're going to mix interfaces then it should be > blatantly obvious to developers that's what's happening so they find out > during development, not after the patches have landed, and certainly not > after they've made it out to users. Thanks, As the above indicates, this issue _should_ have been blatantly obvious months ago and documentation would not change this fact. IOWs, even if the iomap requirement was documented and followed, a locking bug in the btrfs implementation would still not have been discovered until now because that's how long it took to actually exercise the buggy code path and expose it. So, yeah, the lack of documentation contributed to the bug being present. 
But taking 6 months to actually exercise the new code containing the bug is most definitely not an interface documentation problem, nor a problem that can be fixed by correcting the interface documentation.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-03 2:28 ` Dave Chinner @ 2020-09-03 9:49 ` Filipe Manana 0 siblings, 0 replies; 31+ messages in thread From: Filipe Manana @ 2020-09-03 9:49 UTC (permalink / raw) To: Dave Chinner Cc: Josef Bacik, Matthew Wilcox, Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Christoph Hellwig, Linux FS Devel On Thu, Sep 3, 2020 at 3:28 AM Dave Chinner <david@fromorbit.com> wrote: > > On Wed, Sep 02, 2020 at 08:42:25AM -0400, Josef Bacik wrote: > > On 9/2/20 8:20 AM, Dave Chinner wrote: > > > On Wed, Sep 02, 2020 at 12:44:14PM +0100, Matthew Wilcox wrote: > > > > On Wed, Sep 02, 2020 at 09:58:30AM +1000, Dave Chinner wrote: > > > > > Put simply: converting a filesystem to use iomap is not a "change > > > > > the filesystem interfacing code and it will work" modification. We > > > > > ask that filesystems are modified to conform to the iomap IO > > > > > exclusion model; adding special cases for every potential > > > > > locking and mapping quirk every different filesystem has is part of > > > > > what turned the old direct IO code into an unmaintainable nightmare. > > > > > > > > > > > That's fine, but this is kind of a bad way to find > > > > > > out. We really shouldn't have generic helper's that have different generic > > > > > > locking rules based on which file system uses them. > > > > > > > > > > We certainly can change the rules for new infrastructure. Indeed, we > > > > > had to change the rules to support DAX. The whole point of the > > > > > iomap infrastructure was that it enabled us to use code that already > > > > > worked for DAX (the XFS code) in multiple filesystems. And as people > > > > > have realised that the DIO via iomap is much faster than the old DIO > > > > > code and is a much more efficient way of doing large buffered IO, > > > > > other filesystems have started to use it. > > > > > > > > > > However.... 
> > > > > > > > > > > Because then we end up > > > > > > with situations like this, where suddenly we're having to come up with some > > > > > > weird solution because the generic thing only works for a subset of file > > > > > > systems. Thanks, > > > > > > > > > > .... we've always said "you need to change the filesystem code to > > > > > use iomap". This is simply a reflection on the fact that iomap has > > > > > different rules and constraints to the old code and so it's not a > > > > > direct plug in replacement. There are no short cuts here... > > > > > > > > Can you point me (and I suspect Josef!) towards the documentation of the > > > > locking model? I was hoping to find Documentation/filesystems/iomap.rst > > > > but all the 'iomap' strings in Documentation/ refer to pci_iomap and > > > > similar, except for this in the DAX documentation: > > > > > > There's no locking model documentation because there is no locking > > > in the iomap direct IO code. The filesystem defines and does all the > > > locking, so there's pretty much nothing to document for iomap. > > > > > > IOWs, the only thing iomap_dio_rw requires is that the IO completion > > > paths do not take same locks that the IO submission path > > > requires. And that's because: > > > > > > /* > > > * iomap_dio_rw() always completes O_[D]SYNC writes regardless of whether the IO > > > * is being issued as AIO or not. [...] > > > > > > So you obviously can't sit waiting for dio completion in > > > iomap_dio_rw() while holding the submission lock if completion > > > requires the submission lock to make progress. > > > > > > FWIW, iomap_dio_rw() originally required the inode_lock() to be held > > > and contained a lockdep assert to verify this, but.... 
> > > > > > commit 3ad99bec6e82e32fa9faf2f84e74b134586b46f7 > > > Author: Goldwyn Rodrigues <rgoldwyn@suse.com> > > > Date: Sat Nov 30 09:59:25 2019 -0600 > > > > > > iomap: remove lockdep_assert_held() > > > Filesystems such as btrfs can perform direct I/O without holding the > > > inode->i_rwsem in some of the cases like writing within i_size. So, > > > remove the check for lockdep_assert_held() in iomap_dio_rw(). > > > Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> > > > Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> > > > Signed-off-by: David Sterba <dsterba@suse.com> > > > > > > ... btrfs has special corner cases for direct IO locking and hence > > > we removed the lockdep assert.... > > > > > > IOWs, iomap_dio_rw() really does not care what strategy filesystems > > > use to serialise DIO against other operations. Filesystems can use > > > whatever IO serialisation mechanism they want (mutex, rwsem, range > > > locks, etc) as long as they obey the one simple requirement: do not > > > take the DIO submission lock in the DIO completion path. > > > > > > > Goldwyn has been working on these patches for a long time, and is actually > > familiar with this code, and he missed that these two interfaces are being > > mixed. This is a problem that I want to solve. He didn't notice it in any > > of his testing, which IIRC was like 6 months to get this stuff actually into > > the btrfs tree. > > And that "been working on it for 6 months" is what really worries me > the most. Hence I've done some post-mortem analysis of the > "inode_lock" deadlock situation. I decided to do this after I > realised this same patch set was merged last cycle and reverted late in > the cycle because of problems it was found to have with page > invalidation when mixed IO types were used. > > I did a bit of looking around yesterday afternoon, both in the btrfs > code to understand the iomap_dio_rw changes and at our test > coverage for DIO functionality. 
> > The more I dug, the more I'm finding that the most important > question we need to answer is not "why wasn't this iomap locking > requirement documented", the question we should be trying to answer > is this: > > Why wasn't this *obvious* deadlock detected months ago? > > First of all, the inode_lock() complaints are largely a red herring > as the typical btrfs DIO write path through the patchset does not > take the inode lock and hence it is actually "safe" to take the > inode lock in the completion path in the common case. i.e. for DIO > writes wholly inside EOF, submission drops the inode_lock() before > calling iomap_dio_rw() and so there is no inode_lock() deadlock > present. From that viewpoint, it looks like there's only a specific > corner case where this deadlock might occur and that would explain > why it hasn't been discovered until now. > > However. > > The new btrfs dio write path always does > down_read(BTRFS_I()->dio_sem), even when the inode lock has not been > taken. That's important because btrfs_sync_file() always does a > down_write(&BTRFS_I(inode)->dio_sem) call and will deadlock in IO > completion if the IO submission code is holding the dio_sem. Taking a look at the latest integration branch (misc-next) after 2 weeks away, indeed the new iomap dio switch change still has that problem. I reported that in June: https://lore.kernel.org/linux-btrfs/CAL3q7H4F9iQJy3tgwZrWOKwenAnnn7oSthQZUMEJ_vWx3WE3WQ@mail.gmail.com/ For me, with generic/113 the deadlock always happens. > > So regardless of the inode_lock(), non-AIO O_DSYNC DIO writes > through iomap_dio_rw() should deadlock immediately on the first IO > with this locking arrangement. It will deadlock on either the > inode_lock or the dio_sem, depending on whether the O_DSYNC DIO write > is wholly within EOF or not, but the deadlock is guaranteed to > occur. Hence we can completely ignore the "inode_lock vs fsync" side > show, because other btrfs internal locks will trigger the same > issue. 
> > If this is correct, then how did this "should happen instantly" > bug go undiscovered for months? > > Well.... It appears that fstests has no coverage of non-AIO > O_DSYNC DIO writes. Tools like fsx and fsstress don't have O_SYNC, > O_DSYNC or RWF_DSYNC modes, and O_SYNC and O_DSYNC are not used by any > of the common DIO test programs. xfs_io can open files O_SYNC and > there's a test that uses xfs_io's RWF_DSYNC capability, but they all > use buffered IO and so none of the tests that open or write data > synchronously use direct IO. > > The only conclusion I can make from this is that the case that > should deadlock instantly isn't actually covered by fstests at all. > This conclusion only stands up if this O_DSYNC code path was only > "tested" for feature coverage with fstests. However, it does imply > that no pre-implementation audit was done to determine if fstests > actually covered all the functionality that needed to be tested > here.... > > I tend to use xfs_io and fio for DIO feature correctness testing > long before I run fstests on new code. That's how I developed and > tested the FUA optimisations - xfs_io and fio w/ RWF_DSYNC on XFS on > iscsi - so I've never noticed that fstests doesn't actually exercise > this syscall path directly. > > Granted, the problem was eventually discovered by fstests, but this > also raises questions. The failing test was an AIO+DIO O_DSYNC test, > but the trigger has been described as an "unusual event on a rarely > tested configuration". > > That "unusual event" was a DIO completion being run from submission > context because the IO completed before the submission had > finished. This is not actually unusual - it's how all AIO on > synchronous IO devices completes. i.e. if you have a ram device or a > (fake) pmem device, every single AIO will complete in this way as > the "IO reference" held by submit_bio() is completed inside > submit_bio() before it returns to the submission context. 
> Hence the > submission context always drops the last IO reference and completes > the IO. > > Therefore running fstests on a ramdisk or (fake) pmem would have > triggered this deadlock *instantly* on the first O_DSYNC AIO+DIO > write that fstests issued. This implies that btrfs is rarely tested > on fast synchronous storage devices despite ramdisks being available > on every test machine that can run fstests. To provide a contrast, > the iomap infrastructure is regularly tested on such devices - > Darrick and I both have (fake) pmem test setups and exercise > synchronous completion code paths like this on a daily basis. > > Hence it looks to me like the answer to the "why wasn't this found > earlier" question is a combination of multiple factors: > > 1. fstests has no specific non-AIO O_DSYNC DIO write unit tests, nor > do the stress tests allow use of O_DSYNC or RWF_DSYNC. > > 2. No test coverage audit was performed prior to making a critical > change to the btrfs IO path so this specific lack of coverage was > not noticed until now. > > 3. after the first revert of this functionality, post-mortem > analysis either wasn't performed or didn't identify process and/or > test coverage issues that allowed serious issues in the patch set to > go undiscovered. > > 4. tools and benchmarks that could have easily discovered the > problem either weren't run or they weren't configured to test > and exercise all the IO path features the change affected. > > 5. btrfs is not regularly tested on a variety of storage that has > distinctly different IO path behaviours. > > > If we're going to mix interfaces then it should be > > blatantly obvious to developers that's what's happening so they find out > > during development, not after the patches have landed, and certainly not > > after they've made it out to users. Thanks, > > As the above indicates, this issue _should_ have been blatantly > obvious months ago and documentation would not change this fact. 
> IOWs, even if the iomap requirement was documented and followed, a > locking bug in the btrfs implementation would still not have been > discovered until now because that's how long it took to actually > exercise the buggy code path and expose it. > > So, yeah, the lack of documentation contributed to the bug being > present. But taking 6 months to actually exercise the new code > containing the bug is most definitely not an interface documentation > problem, nor a problem that can be fixed by correcting the interface > documentation.... > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.” ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-01 15:11 ` Josef Bacik 2020-09-01 17:45 ` Darrick J. Wong 2020-09-01 21:46 ` Dave Chinner @ 2020-09-03 16:32 ` Christoph Hellwig 2020-09-03 16:46 ` Josef Bacik 2020-09-07 0:04 ` Dave Chinner 2 siblings, 2 replies; 31+ messages in thread From: Christoph Hellwig @ 2020-09-03 16:32 UTC (permalink / raw) To: Josef Bacik Cc: Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Christoph Hellwig, Linux FS Devel We could trivially do something like this to allow the file system to call iomap_dio_complete without i_rwsem:

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index c1aafb2ab99072..d970c6bbbe115d 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -76,7 +76,7 @@ static void iomap_dio_submit_bio(struct iomap_dio *dio, struct iomap *iomap,
 	dio->submit.cookie = submit_bio(bio);
 }
 
-static ssize_t iomap_dio_complete(struct iomap_dio *dio)
+ssize_t iomap_dio_complete(struct iomap_dio *dio)
 {
 	const struct iomap_dio_ops *dops = dio->dops;
 	struct kiocb *iocb = dio->iocb;
@@ -130,6 +130,7 @@ static ssize_t iomap_dio_complete(struct iomap_dio *dio)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(iomap_dio_complete);
 
 static void iomap_dio_complete_work(struct work_struct *work)
 {
@@ -406,8 +407,8 @@ iomap_dio_actor(struct inode *inode, loff_t pos, loff_t length,
  * Returns -ENOTBLK In case of a page invalidation invalidation failure for
  * writes.  The callers needs to fall back to buffered I/O in this case.
  */
-ssize_t
-iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
+struct iomap_dio *
+__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		bool wait_for_completion)
 {
@@ -421,14 +422,14 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	struct iomap_dio *dio;
 
 	if (!count)
-		return 0;
+		return NULL;
 
 	if (WARN_ON(is_sync_kiocb(iocb) && !wait_for_completion))
-		return -EIO;
+		return ERR_PTR(-EIO);
 
 	dio = kmalloc(sizeof(*dio), GFP_KERNEL);
 	if (!dio)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
 
 	dio->iocb = iocb;
 	atomic_set(&dio->ref, 1);
@@ -558,7 +559,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	dio->wait_for_completion = wait_for_completion;
 	if (!atomic_dec_and_test(&dio->ref)) {
 		if (!wait_for_completion)
-			return -EIOCBQUEUED;
+			return ERR_PTR(-EIOCBQUEUED);
 
 		for (;;) {
 			set_current_state(TASK_UNINTERRUPTIBLE);
@@ -574,10 +575,25 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		__set_current_state(TASK_RUNNING);
 	}
 
-	return iomap_dio_complete(dio);
+	return dio;
 
 out_free_dio:
 	kfree(dio);
-	return ret;
+	return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(__iomap_dio_rw);
+
+ssize_t
+iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
+		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
+		bool wait_for_completion)
+{
+	struct iomap_dio *dio;
+
+	dio = __iomap_dio_rw(iocb, iter, ops, dops, wait_for_completion);
+	if (IS_ERR_OR_NULL(dio))
+		return PTR_ERR_OR_ZERO(dio);
+	return iomap_dio_complete(dio);
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
 
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 4d1d3c3469e9a4..172b3397a1a371 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -13,6 +13,7 @@
 struct address_space;
 struct fiemap_extent_info;
 struct inode;
+struct iomap_dio;
 struct iomap_writepage_ctx;
 struct iov_iter;
 struct kiocb;
@@ -258,6 +259,10 @@ struct iomap_dio_ops {
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		bool wait_for_completion);
+struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
+		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
+		bool wait_for_completion);
+ssize_t iomap_dio_complete(struct iomap_dio *dio);
 int iomap_dio_iopoll(struct kiocb *kiocb, bool spin);
 
 #ifdef CONFIG_SWAP

^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-03 16:32 ` Christoph Hellwig @ 2020-09-03 16:46 ` Josef Bacik 2020-09-07 0:04 ` Dave Chinner 1 sibling, 0 replies; 31+ messages in thread From: Josef Bacik @ 2020-09-03 16:46 UTC (permalink / raw) To: Christoph Hellwig Cc: Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Linux FS Devel On 9/3/20 12:32 PM, Christoph Hellwig wrote: > We could trivially do something like this to allow the file system > to call iomap_dio_complete without i_rwsem: > This is sort of how I envisioned it worked, which is why I was surprised when it bit us. This would be the cleanest stop-gap right now, but as it stands I've already sent a patch to work around the problem in btrfs for now, and we plan to take up the locking rework next go around. Thanks, Josef ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-03 16:32 ` Christoph Hellwig 2020-09-03 16:46 ` Josef Bacik @ 2020-09-07 0:04 ` Dave Chinner 2020-09-15 21:48 ` Goldwyn Rodrigues 1 sibling, 1 reply; 31+ messages in thread From: Dave Chinner @ 2020-09-07 0:04 UTC (permalink / raw) To: Christoph Hellwig Cc: Josef Bacik, Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Linux FS Devel On Thu, Sep 03, 2020 at 06:32:36PM +0200, Christoph Hellwig wrote: > We could trivially do something like this to allow the file system > to call iomap_dio_complete without i_rwsem: That just exposes another deadlock vector:

P0                              P1
inode_lock()                    fallocate(FALLOC_FL_ZERO_RANGE)
__iomap_dio_rw()                  inode_lock()
                                  <block>
  <submits IO>
  <completes IO>
inode_unlock()
                                  <gets inode_lock()>
                                  inode_dio_wait()
iomap_dio_complete()
  generic_write_sync()
    btrfs_file_fsync()
      inode_lock()
      <deadlock>

Basically, the only safe thing to do is implement ->fsync without holding the DIO IO submission lock.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-07 0:04 ` Dave Chinner @ 2020-09-15 21:48 ` Goldwyn Rodrigues 2020-09-17 3:09 ` Dave Chinner 0 siblings, 1 reply; 31+ messages in thread From: Goldwyn Rodrigues @ 2020-09-15 21:48 UTC (permalink / raw) To: Dave Chinner Cc: Christoph Hellwig, Josef Bacik, Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Linux FS Devel On 10:04 07/09, Dave Chinner wrote: > On Thu, Sep 03, 2020 at 06:32:36PM +0200, Christoph Hellwig wrote: > > We could trivially do something like this to allow the file system > > to call iomap_dio_complete without i_rwsem: > > That just exposes another deadlock vector: > > P0 P1 > inode_lock() fallocate(FALLOC_FL_ZERO_RANGE) > __iomap_dio_rw() inode_lock() > <block> > <submits IO> > <completes IO> > inode_unlock() > <gets inode_lock()> > inode_dio_wait() > iomap_dio_complete() > generic_write_sync() > btrfs_file_fsync() > inode_lock() > <deadlock> Can inode_dio_end() be called before generic_write_sync(), as it is done in fs/direct-io.c:dio_complete()? Christoph's solution is a clean approach and would prefer to use it as the final solution. -- Goldwyn ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-15 21:48 ` Goldwyn Rodrigues @ 2020-09-17 3:09 ` Dave Chinner 2020-09-17 5:52 ` Christoph Hellwig 0 siblings, 1 reply; 31+ messages in thread From: Dave Chinner @ 2020-09-17 3:09 UTC (permalink / raw) To: Goldwyn Rodrigues Cc: Christoph Hellwig, Josef Bacik, Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Linux FS Devel On Tue, Sep 15, 2020 at 04:48:53PM -0500, Goldwyn Rodrigues wrote: > On 10:04 07/09, Dave Chinner wrote: > > On Thu, Sep 03, 2020 at 06:32:36PM +0200, Christoph Hellwig wrote: > > > We could trivially do something like this to allow the file system > > > to call iomap_dio_complete without i_rwsem: > > > > That just exposes another deadlock vector: > > > > P0 P1 > > inode_lock() fallocate(FALLOC_FL_ZERO_RANGE) > > __iomap_dio_rw() inode_lock() > > <block> > > <submits IO> > > <completes IO> > > inode_unlock() > > <gets inode_lock()> > > inode_dio_wait() > > iomap_dio_complete() > > generic_write_sync() > > btrfs_file_fsync() > > inode_lock() > > <deadlock> > > Can inode_dio_end() be called before generic_write_sync(), as it is done > in fs/direct-io.c:dio_complete()? Don't think so. inode_dio_wait() is supposed to indicate that all DIO is complete, and having the "make it stable" parts of an O_DSYNC DIO still running after inode_dio_wait() returns means that we still have DIO running.... For some filesystems, ensuring the DIO data is stable may involve flushing other data (perhaps we did EOF zeroing before the file extending DIO) and/or metadata to the log, so we need to guarantee these DIO related operations are complete and stable before we say the DIO is done. > Christoph's solution is a clean approach and would prefer to use it as > the final solution. /me shrugs Christoph's solution simply means you can't use inode_dio_wait() in the filesystem. btrfs would need its own DIO barrier.... Cheers, Dave. 
-- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-17 3:09 ` Dave Chinner @ 2020-09-17 5:52 ` Christoph Hellwig 2020-09-17 6:29 ` Dave Chinner 0 siblings, 1 reply; 31+ messages in thread From: Christoph Hellwig @ 2020-09-17 5:52 UTC (permalink / raw) To: Dave Chinner Cc: Goldwyn Rodrigues, Christoph Hellwig, Josef Bacik, Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Linux FS Devel On Thu, Sep 17, 2020 at 01:09:42PM +1000, Dave Chinner wrote: > > > iomap_dio_complete() > > > generic_write_sync() > > > btrfs_file_fsync() > > > inode_lock() > > > <deadlock> > > > > Can inode_dio_end() be called before generic_write_sync(), as it is done > > in fs/direct-io.c:dio_complete()? > > Don't think so. inode_dio_wait() is supposed to indicate that all > DIO is complete, and having the "make it stable" parts of an O_DSYNC > DIO still running after inode_dio_wait() returns means that we still > have DIO running.... > > For some filesystems, ensuring the DIO data is stable may involve > flushing other data (perhaps we did EOF zeroing before the file > extending DIO) and/or metadata to the log, so we need to guarantee > these DIO related operations are complete and stable before we say > the DIO is done. inode_dio_wait really just waits for active I/O that writes to or reads from the file. It does not imply that the I/O is stable, just like i_rwsem itself doesn't. Various file systems have historically called the syncing outside i_rwsem and inode_dio_wait (in fact that is what the fs/direct-io.c code does, so XFS did as well until a few years ago), and that isn't a problem at all - we just can't return to userspace (or call ki_complete for in-kernel users) before the data is stable on disk. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context 2020-09-17 5:52 ` Christoph Hellwig @ 2020-09-17 6:29 ` Dave Chinner 2020-09-17 6:42 ` Christoph Hellwig 0 siblings, 1 reply; 31+ messages in thread From: Dave Chinner @ 2020-09-17 6:29 UTC (permalink / raw) To: Christoph Hellwig Cc: Goldwyn Rodrigues, Josef Bacik, Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org, Filipe Manana, Linux FS Devel On Thu, Sep 17, 2020 at 07:52:32AM +0200, Christoph Hellwig wrote: > On Thu, Sep 17, 2020 at 01:09:42PM +1000, Dave Chinner wrote: > > > > iomap_dio_complete() > > > > generic_write_sync() > > > > btrfs_file_fsync() > > > > inode_lock() > > > > <deadlock> > > > > > > Can inode_dio_end() be called before generic_write_sync(), as it is done > > > in fs/direct-io.c:dio_complete()? > > > > Don't think so. inode_dio_wait() is supposed to indicate that all > > DIO is complete, and having the "make it stable" parts of an O_DSYNC > > DIO still running after inode_dio_wait() returns means that we still > > have DIO running.... > > > > For some filesystems, ensuring the DIO data is stable may involve > > flushing other data (perhaps we did EOF zeroing before the file > > extending DIO) and/or metadata to the log, so we need to guarantee > > these DIO related operations are complete and stable before we say > > the DIO is done. > > inode_dio_wait really just waits for active I/O that writes to or reads > from the file. It does not imply that the I/O is stable, just like > i_rwsem itself doesn't. No, but iomap_dio_rw() considers a O_DSYNC write to be incomplete until it is stable so that it presents consistent behaviour to anything calling inode_dio_wait(). 
> Various file systems have historically called > the syncing outside i_rwsem and inode_dio_wait (in fact that is what the > fs/direct-io.c code does, so XFS did as well until a few years ago), and > that isn't a problem at all - we just can't return to userspace (or call > ki_complete for in-kernel users) before the data is stable on disk. I'm really not caring about userspace here - we use inode_dio_wait() as an IO completion notification for the purposes of synchronising internal filesystem state before modifying user data via direct metadata manipulation. Hence I want sane, consistent, predictable IO completion notification behaviour regardless of the implementation path it goes through. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context
  2020-09-17  6:29       ` Dave Chinner
@ 2020-09-17  6:42         ` Christoph Hellwig
  0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2020-09-17  6:42 UTC (permalink / raw)
To: Dave Chinner
Cc: Christoph Hellwig, Goldwyn Rodrigues, Josef Bacik,
    Johannes Thumshirn, David Sterba, linux-btrfs @ vger . kernel . org,
    Filipe Manana, Linux FS Devel

On Thu, Sep 17, 2020 at 04:29:23PM +1000, Dave Chinner wrote:
> > inode_dio_wait really just waits for active I/O that writes to or reads
> > from the file. It does not imply that the I/O is stable, just like
> > i_rwsem itself doesn't.
>
> No, but iomap_dio_rw() considers an O_DSYNC write to be incomplete
> until it is stable, so that it presents consistent behaviour to
> anything calling inode_dio_wait().

But the point is that inode_dio_wait does not care about that
"consistency". It cares about when the I/O is done. I know because I
wrote it (and I regret that, as we should have stuck with the non-owner
release of the rwsem, which makes a whole lot more sense).

> > Various file systems have historically called
> > the syncing outside i_rwsem and inode_dio_wait (in fact that is what the
> > fs/direct-io.c code does, so XFS did as well until a few years ago), and
> > that isn't a problem at all - we just can't return to userspace (or call
> > ki_complete for in-kernel users) before the data is stable on disk.
>
> I'm really not caring about userspace here - we use inode_dio_wait()
> as an IO completion notification for the purposes of synchronising
> internal filesystem state before modifying user data via direct
> metadata manipulation. Hence I want sane, consistent, predictable IO
> completion notification behaviour regardless of the implementation
> path it goes through.

And none of that consistency matters. Think of it:

 - an O_(D)SYNC write is nothing but a write plus a ranged fsync, even
   if we do some optimizations to speed up the fsync by e.g. using the
   FUA flag
 - another fsync can come in at any time after we completed a write
   (with or without O_SYNC)
 - so any synchronization using inode_dio_wait (or i_rwsem for that
   matter) must not care if an fsync runs in parallel
 - take a look at where we call inode_dio_wait to verify this - the
   prime original use case was truncate, as we can't have I/O in
   progress while truncating. We later extended it to all the more
   complicated truncate-like operations such as hole punches, extent
   insert and collapse, etc. But in all those cases what matters is the
   actual I/O, not the sync. Because the I/O was direct, the page cache
   side of the sync doesn't matter to start with (and the callers all
   invalidate it anyway), so what matters is the metadata flush, aka
   the log force in the XFS case. And that absolutely does not need to
   happen before inode_dio_wait returns.

> Cheers,
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
---end quoted text---

^ permalink raw reply	[flat|nested] 31+ messages in thread
end of thread, other threads:[~2020-09-17  6:42 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-01 13:06 [RFC PATCH] btrfs: don't call btrfs_sync_file from iomap context Johannes Thumshirn
2020-09-01 13:11 ` Johannes Thumshirn
2020-09-01 14:17 ` Goldwyn Rodrigues
2020-09-01 14:20   ` Johannes Thumshirn
2020-09-01 14:37     ` Filipe Manana
2020-09-01 14:44       ` Johannes Thumshirn
2020-09-01 18:40         ` Goldwyn Rodrigues
2020-09-01 15:11 ` Josef Bacik
2020-09-01 17:45   ` Darrick J. Wong
2020-09-01 17:55     ` Josef Bacik
2020-09-01 21:46       ` Dave Chinner
2020-09-01 22:19         ` Josef Bacik
2020-09-01 23:58           ` Dave Chinner
2020-09-02  0:22             ` Josef Bacik
2020-09-02  7:12               ` Johannes Thumshirn
2020-09-02 11:10                 ` Josef Bacik
2020-09-02 16:29                   ` Darrick J. Wong
2020-09-02 16:47                     ` Josef Bacik
2020-09-02 11:44               ` Matthew Wilcox
2020-09-02 12:20                 ` Dave Chinner
2020-09-02 12:42               ` Josef Bacik
2020-09-03  2:28                 ` Dave Chinner
2020-09-03  9:49                   ` Filipe Manana
2020-09-03 16:32                     ` Christoph Hellwig
2020-09-03 16:46                       ` Josef Bacik
2020-09-07  0:04                         ` Dave Chinner
2020-09-15 21:48                           ` Goldwyn Rodrigues
2020-09-17  3:09                             ` Dave Chinner
2020-09-17  5:52                               ` Christoph Hellwig
2020-09-17  6:29                                 ` Dave Chinner
2020-09-17  6:42                                   ` Christoph Hellwig