All of lore.kernel.org
 help / color / mirror / Atom feed
* Problems doing DIO to netfs cache on XFS from Ceph
@ 2020-12-03 14:10 David Howells
  2020-12-03 15:50 ` Matthew Wilcox
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: David Howells @ 2020-12-03 14:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, jlayton, Matthew Wilcox (Oracle),
	dchinner, linux-fsdevel, linux-cachefs

Hi Christoph,

We're having a problem making the fscache/cachefiles rewrite work with XFS, if
you could have a look?  Jeff Layton just tripped the attached warning from
this:

	/*
	 * Given that we do not allow direct reclaim to call us, we should
	 * never be called in a recursive filesystem reclaim context.
	 */
	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
		goto redirty;

The chain of events is the following:

 (1) Ceph is asked to do an ordinary write by userspace.  It calls the fscache
     netfs_write_begin() helper to read the region it's going to modify so
     that the cache can be preloaded.

 (2) In this case, the cache already has it, so cachefiles_read() dispatches
     an async DIO read to the backing filesystem (in this case XFS).

 (3) iomap, on behalf of XFS, flushes the pagecache attached to the backing
     inode, which appears to be populated, causing do_writepages() to run.

 (4) The XFS write-out eventually wends its way to iomap_do_writepage(), which
     complains about NOFS being set and cancels the write.

Now, I'm doing:

	old_nofs = memalloc_nofs_save();
	ret = call_read_iter(file, &ki->iocb, iter);
	memalloc_nofs_restore(old_nofs);

in cachefiles_read() to prevent the cache causing writeout in the netfs to
occur.  Possibly overriding NOFS here is overkill and is only really necessary
in cachefiles_write() - which can be called from netfs writeback.
cachefiles_read() should only be called from netfs ->readpage(), ->readahead()
and ->write_begin() and maybe a workqueue in the case that the cache returns a
short read.

Note that I'm only doing async DIO reads and writes, so I was a bit surprised
that XFS is doing a writeback at all - but I guess that IOCB_DIRECT is
actually just a hint and the filesystem can turn it into buffered I/O if it
wants.

Thanks,
David
---
 WARNING: CPU: 6 PID: 7412 at fs/iomap/buffered-io.c:1465 iomap_do_writepage+0x76a/0x8b0
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
 RIP: 0010:iomap_do_writepage+0x76a/0x8b0
 Code: 89 f5 41 89 c7 48 83 7d 48 00 0f 85 6e fb ff ff 48 8b 44 24 48 48 8d 5c 24 48 48 39 d8 0f 84 5b fb ff ff 0f 0b e9 54 fb ff ff <0f> 0b e9 76 ff ff ff 0f 0b e9 64 fb ff ff 0f 0b e9 9a fb ff ff 0f
 RSP: 0018:ffffb19b4155f6e0 EFLAGS: 00010206
 RAX: 0000000000440100 RBX: ffffb19b4155f7a8 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: ffffb19b4155f940 RDI: ffffd1e484446740
 RBP: ffffb19b4155f868 R08: ffffffffffffffff R09: 0000000000030360
 R10: 0000000000000002 R11: 0000000000000006 R12: ffff8a5108ad4d30
 R13: 0000000000002a9a R14: ffffb19b4155f7b0 R15: ffffd1e484446740
 FS:  00007f6ff479d740(0000) GS:ffff8a542fb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000fc02a8 CR3: 000000013b218000 CR4: 00000000003506e0
 Call Trace:
  ? page_referenced_one+0x150/0x150
  ? __mod_memcg_lruvec_state+0x21/0xe0
  ? clear_page_dirty_for_io+0xf1/0x240
  write_cache_pages+0x186/0x3d0
  ? iomap_readahead+0x1b0/0x1b0
  ? blk_mq_submit_bio+0x2ee/0x4f0
  ? elv_rb_del+0x1f/0x30
  ? deadline_remove_request+0x55/0xb0
  ? dd_dispatch_request+0x151/0x210
  iomap_writepages+0x1c/0x40
  xfs_vm_writepages+0x56/0x70 [xfs]
  do_writepages+0x28/0xa0
  ? xfs_iunlock+0xa3/0xe0 [xfs]
  ? wbc_attach_and_unlock_inode+0xb5/0x140
  __filemap_fdatawrite_range+0xa7/0xe0
  filemap_write_and_wait_range+0x3d/0x90
  __iomap_dio_rw+0x149/0x490
  iomap_dio_rw+0xe/0x30
  xfs_file_dio_aio_read+0xb9/0x100 [xfs]
  xfs_file_read_iter+0xba/0xd0 [xfs]
  cachefiles_read+0x1ee/0x3f0 [cachefiles]
  ? netfs_subreq_terminated+0x240/0x240 [netfs]
  netfs_read_from_cache+0x70/0x80 [netfs]
  netfs_rreq_submit_slice+0x169/0x310 [netfs]
  netfs_write_begin+0x4e4/0x6a0 [netfs]
  ? ceph_put_fmode+0x43/0xd0 [ceph]
  ceph_write_begin+0x141/0x250 [ceph]
  generic_perform_write+0xaf/0x190
  ceph_write_iter+0xab6/0xc90 [ceph]
  ? _cond_resched+0x16/0x40
  ? __ceph_setattr+0x895/0x960 [ceph]
  ? new_sync_write+0x108/0x180
  new_sync_write+0x108/0x180
  vfs_write+0x1bc/0x270
  ksys_write+0x4f/0xc0
  do_syscall_64+0x33/0x40
  entry_SYSCALL_64_after_hwframe+0x44/0xa9


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Problems doing DIO to netfs cache on XFS from Ceph
  2020-12-03 14:10 Problems doing DIO to netfs cache on XFS from Ceph David Howells
@ 2020-12-03 15:50 ` Matthew Wilcox
  2020-12-03 16:04 ` David Howells
  2020-12-03 22:12 ` Dave Chinner
  2 siblings, 0 replies; 6+ messages in thread
From: Matthew Wilcox @ 2020-12-03 15:50 UTC (permalink / raw)
  To: David Howells
  Cc: Christoph Hellwig, jlayton, dchinner, linux-fsdevel, linux-cachefs

On Thu, Dec 03, 2020 at 02:10:56PM +0000, David Howells wrote:
> Note that I'm only doing async DIO reads and writes, so I was a bit surprised
> that XFS is doing a writeback at all - but I guess that IOCB_DIRECT is
> actually just a hint and the filesystem can turn it into buffered I/O if it
> wants.

That's almost the exact opposite of what is going on.  XFS sees that
you're going to do an O_DIRECT read, so it writes back the dirty memory
that's currently in the page cache so that your read doesn't read stale
data from disk.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Problems doing DIO to netfs cache on XFS from Ceph
  2020-12-03 14:10 Problems doing DIO to netfs cache on XFS from Ceph David Howells
  2020-12-03 15:50 ` Matthew Wilcox
@ 2020-12-03 16:04 ` David Howells
  2020-12-03 22:12 ` Dave Chinner
  2 siblings, 0 replies; 6+ messages in thread
From: David Howells @ 2020-12-03 16:04 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: dhowells, Christoph Hellwig, jlayton, dchinner, linux-fsdevel,
	linux-cachefs

Matthew Wilcox <willy@infradead.org> wrote:

> > Note that I'm only doing async DIO reads and writes, so I was a bit surprised
> > that XFS is doing a writeback at all - but I guess that IOCB_DIRECT is
> > actually just a hint and the filesystem can turn it into buffered I/O if it
> > wants.
> 
> That's almost the exact opposite of what is going on.  XFS sees that
> you're going to do an O_DIRECT read, so it writes back the dirty memory
> that's currently in the page cache so that your read doesn't read stale
> data from disk.

In this trace, yes, that's true - but where did the dirty memory in the
pagecache come from?  I'm only doing DIO reads and DIO writes - oh, and as it
turns out, fallocate(FALLOC_FL_ZERO_RANGE) - which, I think, may be the source
of the dirty data.

David


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Problems doing DIO to netfs cache on XFS from Ceph
  2020-12-03 14:10 Problems doing DIO to netfs cache on XFS from Ceph David Howells
  2020-12-03 15:50 ` Matthew Wilcox
  2020-12-03 16:04 ` David Howells
@ 2020-12-03 22:12 ` Dave Chinner
  2020-12-03 23:05   ` Matthew Wilcox
  2 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2020-12-03 22:12 UTC (permalink / raw)
  To: David Howells
  Cc: Christoph Hellwig, jlayton, Matthew Wilcox (Oracle),
	dchinner, linux-fsdevel, linux-cachefs

On Thu, Dec 03, 2020 at 02:10:56PM +0000, David Howells wrote:
> Hi Christoph,
> 
> We're having a problem making the fscache/cachefiles rewrite work with XFS, if
> you could have a look?  Jeff Layton just tripped the attached warning from
> this:
> 
> 	/*
> 	 * Given that we do not allow direct reclaim to call us, we should
> 	 * never be called in a recursive filesystem reclaim context.
> 	 */
> 	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> 		goto redirty;

I've pointed out in other threads where issues like this have been
raised that this check is not correct and was broken some time ago
by the PF_FSTRANS removal. The "NOFS" case here was originally using
PF_FSTRANS to protect against recursion from within transaction
contexts, not recursion through memory reclaim.  Doing writeback
from memory reclaim is caught by the preceeding PF_MEMALLOC check,
not this one.

What it is supposed to be warning about is that writeback in XFS can
start new transactions and nesting transactions is a guaranteed way
to deadlock the journal. IOWs, doing writeback from an active
transaction context is a bug in XFS.

IOWs, we are waiting on a new version of this patchset to be posted:

https://lore.kernel.org/linux-xfs/20201103131754.94949-1-laoar.shao@gmail.com/

so that we can get rid of this from iomap and check the transaction
recursion case directly in the XFS code. Then your problem goes away
completely....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Problems doing DIO to netfs cache on XFS from Ceph
  2020-12-03 22:12 ` Dave Chinner
@ 2020-12-03 23:05   ` Matthew Wilcox
  2020-12-04  1:50     ` Yafang Shao
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Wilcox @ 2020-12-03 23:05 UTC (permalink / raw)
  To: Dave Chinner
  Cc: David Howells, Christoph Hellwig, jlayton, dchinner,
	linux-fsdevel, linux-cachefs, Yafang Shao

Might be a good idea to cc Yafang on this ...

On Fri, Dec 04, 2020 at 09:12:02AM +1100, Dave Chinner wrote:
> On Thu, Dec 03, 2020 at 02:10:56PM +0000, David Howells wrote:
> > Hi Christoph,
> > 
> > We're having a problem making the fscache/cachefiles rewrite work with XFS, if
> > you could have a look?  Jeff Layton just tripped the attached warning from
> > this:
> > 
> > 	/*
> > 	 * Given that we do not allow direct reclaim to call us, we should
> > 	 * never be called in a recursive filesystem reclaim context.
> > 	 */
> > 	if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> > 		goto redirty;
> 
> I've pointed out in other threads where issues like this have been
> raised that this check is not correct and was broken some time ago
> by the PF_FSTRANS removal. The "NOFS" case here was originally using
> PF_FSTRANS to protect against recursion from within transaction
> contexts, not recursion through memory reclaim.  Doing writeback
> from memory reclaim is caught by the preceeding PF_MEMALLOC check,
> not this one.
> 
> What it is supposed to be warning about is that writeback in XFS can
> start new transactions and nesting transactions is a guaranteed way
> to deadlock the journal. IOWs, doing writeback from an active
> transaction context is a bug in XFS.
> 
> IOWs, we are waiting on a new version of this patchset to be posted:
> 
> https://lore.kernel.org/linux-xfs/20201103131754.94949-1-laoar.shao@gmail.com/
> 
> so that we can get rid of this from iomap and check the transaction
> recursion case directly in the XFS code. Then your problem goes away
> completely....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Problems doing DIO to netfs cache on XFS from Ceph
  2020-12-03 23:05   ` Matthew Wilcox
@ 2020-12-04  1:50     ` Yafang Shao
  0 siblings, 0 replies; 6+ messages in thread
From: Yafang Shao @ 2020-12-04  1:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Dave Chinner, David Howells, Christoph Hellwig, jlayton,
	Dave Chinner, linux-fsdevel, linux-cachefs

On Fri, Dec 4, 2020 at 7:05 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> Might be a good idea to cc Yafang on this ...
>

Thanks

> On Fri, Dec 04, 2020 at 09:12:02AM +1100, Dave Chinner wrote:
> > On Thu, Dec 03, 2020 at 02:10:56PM +0000, David Howells wrote:
> > > Hi Christoph,
> > >
> > > We're having a problem making the fscache/cachefiles rewrite work with XFS, if
> > > you could have a look?  Jeff Layton just tripped the attached warning from
> > > this:
> > >
> > >     /*
> > >      * Given that we do not allow direct reclaim to call us, we should
> > >      * never be called in a recursive filesystem reclaim context.
> > >      */
> > >     if (WARN_ON_ONCE(current->flags & PF_MEMALLOC_NOFS))
> > >             goto redirty;
> >
> > I've pointed out in other threads where issues like this have been
> > raised that this check is not correct and was broken some time ago
> > by the PF_FSTRANS removal. The "NOFS" case here was originally using
> > PF_FSTRANS to protect against recursion from within transaction
> > contexts, not recursion through memory reclaim.  Doing writeback
> > from memory reclaim is caught by the preceeding PF_MEMALLOC check,
> > not this one.
> >
> > What it is supposed to be warning about is that writeback in XFS can
> > start new transactions and nesting transactions is a guaranteed way
> > to deadlock the journal. IOWs, doing writeback from an active
> > transaction context is a bug in XFS.
> >
> > IOWs, we are waiting on a new version of this patchset to be posted:
> >
> > https://lore.kernel.org/linux-xfs/20201103131754.94949-1-laoar.shao@gmail.com/
> >

I will post it soon.

> > so that we can get rid of this from iomap and check the transaction
> > recursion case directly in the XFS code. Then your problem goes away
> > completely....
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com



-- 
Thanks
Yafang

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-12-04  1:51 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03 14:10 Problems doing DIO to netfs cache on XFS from Ceph David Howells
2020-12-03 15:50 ` Matthew Wilcox
2020-12-03 16:04 ` David Howells
2020-12-03 22:12 ` Dave Chinner
2020-12-03 23:05   ` Matthew Wilcox
2020-12-04  1:50     ` Yafang Shao

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.