* Problems doing DIO to netfs cache on XFS from Ceph
@ 2020-12-03 14:10 David Howells
2020-12-03 15:50 ` Matthew Wilcox
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: David Howells @ 2020-12-03 14:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: dhowells, jlayton, Matthew Wilcox (Oracle),
dchinner, linux-fsdevel, linux-cachefs
Hi Christoph,
We're having a problem making the fscache/cachefiles rewrite work with XFS, if
you could have a look? Jeff Layton just tripped the attached warning from
this:
/*
* Given that we do not allow direct reclaim to call us, we should
* never be called in a recursive filesystem reclaim context.
*/
if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
goto redirty;
The chain of events is the following:
(1) Ceph is asked to do an ordinary write by userspace. It calls the fscache
netfs_write_begin() helper to read the region it's going to modify so
that the cache can be preloaded.
(2) In this case, the cache already has it, so cachefiles_read() dispatches
an async DIO read to the backing filesystem (in this case XFS).
(3) iomap, on behalf of XFS, flushes the pagecache attached to the backing
inode, which appears to be populated, causing do_writepages() to run.
(4) The XFS write-out eventually wends its way to iomap_do_writepage(), which
complains about NOFS being set and cancels the write.
Now, I'm doing:
old_nofs = memalloc_nofs_save();
ret = call_read_iter(file, &ki->iocb, iter);
memalloc_nofs_restore(old_nofs);
in cachefiles_read() to prevent the cache causing writeout in the netfs to
occur. Possibly overriding NOFS here is overkill and is only really necessary
in cachefiles_write() - which can be called from netfs writeback.
cachefiles_read() should only be called from netfs ->readpage(), ->readahead()
and ->write_begin() and maybe a workqueue in the case that the cache returns a
short read.
Note that I'm only doing async DIO reads and writes, so I was a bit surprised
that XFS is doing a writeback at all - but I guess that IOCB_DIRECT is
actually just a hint and the filesystem can turn it into buffered I/O if it
wants.
Thanks,
David
---
WARNING: CPU: 6 PID: 7412 at fs/iomap/buffered-io.c:1465 iomap_do_writepage+0x76a/0x8b0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
RIP: 0010:iomap_do_writepage+0x76a/0x8b0
Code: 89 f5 41 89 c7 48 83 7d 48 00 0f 85 6e fb ff ff 48 8b 44 24 48 48 8d 5c 24 48 48 39 d8 0f 84 5b fb ff ff 0f 0b e9 54 fb ff ff <0f> 0b e9 76 ff ff ff 0f 0b e9 64 fb ff ff 0f 0b e9 9a fb ff ff 0f
RSP: 0018:ffffb19b4155f6e0 EFLAGS: 00010206
RAX: 0000000000440100 RBX: ffffb19b4155f7a8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffb19b4155f940 RDI: ffffd1e484446740
RBP: ffffb19b4155f868 R08: ffffffffffffffff R09: 0000000000030360
R10: 0000000000000002 R11: 0000000000000006 R12: ffff8a5108ad4d30
R13: 0000000000002a9a R14: ffffb19b4155f7b0 R15: ffffd1e484446740
FS: 00007f6ff479d740(0000) GS:ffff8a542fb80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000fc02a8 CR3: 000000013b218000 CR4: 00000000003506e0
Call Trace:
? page_referenced_one+0x150/0x150
? __mod_memcg_lruvec_state+0x21/0xe0
? clear_page_dirty_for_io+0xf1/0x240
write_cache_pages+0x186/0x3d0
? iomap_readahead+0x1b0/0x1b0
? blk_mq_submit_bio+0x2ee/0x4f0
? elv_rb_del+0x1f/0x30
? deadline_remove_request+0x55/0xb0
? dd_dispatch_request+0x151/0x210
iomap_writepages+0x1c/0x40
xfs_vm_writepages+0x56/0x70 [xfs]
do_writepages+0x28/0xa0
? xfs_iunlock+0xa3/0xe0 [xfs]
? wbc_attach_and_unlock_inode+0xb5/0x140
__filemap_fdatawrite_range+0xa7/0xe0
filemap_write_and_wait_range+0x3d/0x90
__iomap_dio_rw+0x149/0x490
iomap_dio_rw+0xe/0x30
xfs_file_dio_aio_read+0xb9/0x100 [xfs]
xfs_file_read_iter+0xba/0xd0 [xfs]
cachefiles_read+0x1ee/0x3f0 [cachefiles]
? netfs_subreq_terminated+0x240/0x240 [netfs]
netfs_read_from_cache+0x70/0x80 [netfs]
netfs_rreq_submit_slice+0x169/0x310 [netfs]
netfs_write_begin+0x4e4/0x6a0 [netfs]
? ceph_put_fmode+0x43/0xd0 [ceph]
ceph_write_begin+0x141/0x250 [ceph]
generic_perform_write+0xaf/0x190
ceph_write_iter+0xab6/0xc90 [ceph]
? _cond_resched+0x16/0x40
? __ceph_setattr+0x895/0x960 [ceph]
? new_sync_write+0x108/0x180
new_sync_write+0x108/0x180
vfs_write+0x1bc/0x270
ksys_write+0x4f/0xc0
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problems doing DIO to netfs cache on XFS from Ceph
2020-12-03 14:10 Problems doing DIO to netfs cache on XFS from Ceph David Howells
@ 2020-12-03 15:50 ` Matthew Wilcox
2020-12-03 16:04 ` David Howells
2020-12-03 22:12 ` Dave Chinner
2 siblings, 0 replies; 6+ messages in thread
From: Matthew Wilcox @ 2020-12-03 15:50 UTC (permalink / raw)
To: David Howells
Cc: Christoph Hellwig, jlayton, dchinner, linux-fsdevel, linux-cachefs
On Thu, Dec 03, 2020 at 02:10:56PM +0000, David Howells wrote:
> Note that I'm only doing async DIO reads and writes, so I was a bit surprised
> that XFS is doing a writeback at all - but I guess that IOCB_DIRECT is
> actually just a hint and the filesystem can turn it into buffered I/O if it
> wants.
That's almost the exact opposite of what is going on. XFS sees that
you're going to do an O_DIRECT read, so it writes back the dirty memory
that's currently in the page cache so that your read doesn't read stale
data from disk.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problems doing DIO to netfs cache on XFS from Ceph
2020-12-03 14:10 Problems doing DIO to netfs cache on XFS from Ceph David Howells
2020-12-03 15:50 ` Matthew Wilcox
@ 2020-12-03 16:04 ` David Howells
2020-12-03 22:12 ` Dave Chinner
2 siblings, 0 replies; 6+ messages in thread
From: David Howells @ 2020-12-03 16:04 UTC (permalink / raw)
To: Matthew Wilcox
Cc: dhowells, Christoph Hellwig, jlayton, dchinner, linux-fsdevel,
linux-cachefs
Matthew Wilcox <willy@infradead.org> wrote:
> > Note that I'm only doing async DIO reads and writes, so I was a bit surprised
> > that XFS is doing a writeback at all - but I guess that IOCB_DIRECT is
> > actually just a hint and the filesystem can turn it into buffered I/O if it
> > wants.
>
> That's almost the exact opposite of what is going on. XFS sees that
> you're going to do an O_DIRECT read, so it writes back the dirty memory
> that's currently in the page cache so that your read doesn't read stale
> data from disk.
In this trace, yes, that's true - but where did the dirty memory in the
pagecache come from? I'm only doing DIO reads and DIO writes - oh, and as it
turns out, fallocate(FALLOC_FL_ZERO_RANGE) - which, I think, may be the source
of the dirty data.
David
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problems doing DIO to netfs cache on XFS from Ceph
2020-12-03 14:10 Problems doing DIO to netfs cache on XFS from Ceph David Howells
2020-12-03 15:50 ` Matthew Wilcox
2020-12-03 16:04 ` David Howells
@ 2020-12-03 22:12 ` Dave Chinner
2020-12-03 23:05 ` Matthew Wilcox
2 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2020-12-03 22:12 UTC (permalink / raw)
To: David Howells
Cc: Christoph Hellwig, jlayton, Matthew Wilcox (Oracle),
dchinner, linux-fsdevel, linux-cachefs
On Thu, Dec 03, 2020 at 02:10:56PM +0000, David Howells wrote:
> Hi Christoph,
>
> We're having a problem making the fscache/cachefiles rewrite work with XFS, if
> you could have a look? Jeff Layton just tripped the attached warning from
> this:
>
> /*
> * Given that we do not allow direct reclaim to call us, we should
> * never be called in a recursive filesystem reclaim context.
> */
> if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> goto redirty;
I've pointed out in other threads where issues like this have been
raised that this check is not correct and was broken some time ago
by the PF_FSTRANS removal. The "NOFS" case here was originally using
PF_FSTRANS to protect against recursion from within transaction
contexts, not recursion through memory reclaim. Doing writeback
from memory reclaim is caught by the preceeding PF_MEMALLOC check,
not this one.
What it is supposed to be warning about is that writeback in XFS can
start new transactions and nesting transactions is a guaranteed way
to deadlock the journal. IOWs, doing writeback from an active
transaction context is a bug in XFS.
IOWs, we are waiting on a new version of this patchset to be posted:
https://lore.kernel.org/linux-xfs/20201103131754.94949-1-laoar.shao@gmail.com/
so that we can get rid of this from iomap and check the transaction
recursion case directly in the XFS code. Then your problem goes away
completely....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problems doing DIO to netfs cache on XFS from Ceph
2020-12-03 22:12 ` Dave Chinner
@ 2020-12-03 23:05 ` Matthew Wilcox
2020-12-04 1:50 ` Yafang Shao
0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2020-12-03 23:05 UTC (permalink / raw)
To: Dave Chinner
Cc: David Howells, Christoph Hellwig, jlayton, dchinner,
linux-fsdevel, linux-cachefs, Yafang Shao
Might be a good idea to cc Yafang on this ...
On Fri, Dec 04, 2020 at 09:12:02AM +1100, Dave Chinner wrote:
> On Thu, Dec 03, 2020 at 02:10:56PM +0000, David Howells wrote:
> > Hi Christoph,
> >
> > We're having a problem making the fscache/cachefiles rewrite work with XFS, if
> > you could have a look? Jeff Layton just tripped the attached warning from
> > this:
> >
> > /*
> > * Given that we do not allow direct reclaim to call us, we should
> > * never be called in a recursive filesystem reclaim context.
> > */
> > if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> > goto redirty;
>
> I've pointed out in other threads where issues like this have been
> raised that this check is not correct and was broken some time ago
> by the PF_FSTRANS removal. The "NOFS" case here was originally using
> PF_FSTRANS to protect against recursion from within transaction
> contexts, not recursion through memory reclaim. Doing writeback
> from memory reclaim is caught by the preceeding PF_MEMALLOC check,
> not this one.
>
> What it is supposed to be warning about is that writeback in XFS can
> start new transactions and nesting transactions is a guaranteed way
> to deadlock the journal. IOWs, doing writeback from an active
> transaction context is a bug in XFS.
>
> IOWs, we are waiting on a new version of this patchset to be posted:
>
> https://lore.kernel.org/linux-xfs/20201103131754.94949-1-laoar.shao@gmail.com/
>
> so that we can get rid of this from iomap and check the transaction
> recursion case directly in the XFS code. Then your problem goes away
> completely....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Problems doing DIO to netfs cache on XFS from Ceph
2020-12-03 23:05 ` Matthew Wilcox
@ 2020-12-04 1:50 ` Yafang Shao
0 siblings, 0 replies; 6+ messages in thread
From: Yafang Shao @ 2020-12-04 1:50 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Dave Chinner, David Howells, Christoph Hellwig, jlayton,
Dave Chinner, linux-fsdevel, linux-cachefs
On Fri, Dec 4, 2020 at 7:05 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> Might be a good idea to cc Yafang on this ...
>
Thanks
> On Fri, Dec 04, 2020 at 09:12:02AM +1100, Dave Chinner wrote:
> > On Thu, Dec 03, 2020 at 02:10:56PM +0000, David Howells wrote:
> > > Hi Christoph,
> > >
> > > We're having a problem making the fscache/cachefiles rewrite work with XFS, if
> > > you could have a look? Jeff Layton just tripped the attached warning from
> > > this:
> > >
> > > /*
> > > * Given that we do not allow direct reclaim to call us, we should
> > > * never be called in a recursive filesystem reclaim context.
> > > */
> > > if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> > > goto redirty;
> >
> > I've pointed out in other threads where issues like this have been
> > raised that this check is not correct and was broken some time ago
> > by the PF_FSTRANS removal. The "NOFS" case here was originally using
> > PF_FSTRANS to protect against recursion from within transaction
> > contexts, not recursion through memory reclaim. Doing writeback
> > from memory reclaim is caught by the preceeding PF_MEMALLOC check,
> > not this one.
> >
> > What it is supposed to be warning about is that writeback in XFS can
> > start new transactions and nesting transactions is a guaranteed way
> > to deadlock the journal. IOWs, doing writeback from an active
> > transaction context is a bug in XFS.
> >
> > IOWs, we are waiting on a new version of this patchset to be posted:
> >
> > https://lore.kernel.org/linux-xfs/20201103131754.94949-1-laoar.shao@gmail.com/
> >
I will post it soon.
> > so that we can get rid of this from iomap and check the transaction
> > recursion case directly in the XFS code. Then your problem goes away
> > completely....
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
--
Thanks
Yafang
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-12-04 1:51 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03 14:10 Problems doing DIO to netfs cache on XFS from Ceph David Howells
2020-12-03 15:50 ` Matthew Wilcox
2020-12-03 16:04 ` David Howells
2020-12-03 22:12 ` Dave Chinner
2020-12-03 23:05 ` Matthew Wilcox
2020-12-04 1:50 ` Yafang Shao
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.