From: Dave Chinner <david@fromorbit.com>
To: David Howells <dhowells@redhat.com>
Cc: willy@infradead.org, hch@lst.de, trond.myklebust@primarydata.com,
	Theodore Ts'o <tytso@mit.edu>,
	linux-block@vger.kernel.org, ceph-devel@vger.kernel.org,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Jeff Layton <jlayton@kernel.org>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Anna Schumaker <anna.schumaker@netapp.com>,
	linux-mm@kvack.org, Bob Liu <bob.liu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Josef Bacik <josef@toxicpanda.com>,
	Seth Jennings <sjenning@linux.vnet.ibm.com>,
	Jens Axboe <axboe@kernel.dk>,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-cifs@vger.kernel.org,
	Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.com>,
	Minchan Kim <minchan@kernel.org>,
	Steve French <sfrench@samba.org>, NeilBrown <neilb@suse.de>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	linux-nfs@vger.kernel.org, Ilya Dryomov <idryomov@gmail.com>,
	linux-btrfs@vger.kernel.org, viro@zeniv.linux.org.uk,
	torvalds@linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC][PATCH v3 0/9] mm: Use DIO for swap and fix NFS swapfiles
Date: Sun, 26 Sep 2021 09:42:43 +1000
Message-ID: <20210925234243.GA1756565@dread.disaster.area>
In-Reply-To: <163250387273.2330363.13240781819520072222.stgit@warthog.procyon.org.uk>

On Fri, Sep 24, 2021 at 06:17:52PM +0100, David Howells wrote:
> 
> Hi Willy, Trond, Christoph,
> 
> Here's v3 of a change to make reads and writes from the swapfile use async
> DIO, adding a new ->swap_rw() address_space method, rather than readpage()
> or direct_IO(), as requested by Willy.  This allows NFS to bypass the write
> checks that prevent swapfiles from working, plus a bunch of other checks
> that may or may not be necessary.
> 
> Whilst trying to make this work, I found that NFS's support for swapfiles
> seems to have been non-functional since Aug 2019 (I think), so the first
> patch fixes that.  Question is: do we actually *want* to keep this
> functionality, given that it seems that no one's tested it with an upstream
> kernel in the last couple of years?
> 
> There are additional patches to get rid of noop_direct_IO and replace it
> with a feature bitmask, to make btrfs, ext4, xfs and raw blockdevs use the
> new ->swap_rw method and thence remove the direct BIO submission paths from
> swap.
> 
> I kept the IOCB_SWAP flag, using it to enable REQ_SWAP.  I'm not sure if
> that's necessary, but it seems accounting related.
> 
> The synchronous DIO I/O code on NFS, raw blockdev, ext4 swapfile and xfs
> swapfile all seem to work fine.  Btrfs refuses to swapon because the file
> might be CoW'd.  I've tried doing "chattr +C", but that didn't help.

Ok, so if the filesystem is now doing block mapping in the IO path,
why does the swap file still need to map the file into a private
block mapping?  i.e. all the work that iomap_swapfile_activate()
does for filesystems like XFS and ext4 - isn't this completely
redundant now that we are doing block mapping during swap file IO
via iomap_dio_rw()?

Actually, that path does all the "can we use this file as a swap
file" checking. So the extent iteration can't go away, just the swap
file mapping part (iomap_swapfile_add_extent()). This is necessary
to ensure there aren't any holes in the file, and we still need that
because the DIO write path will allocate into holes, which leads
me to my main concern here.
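
As a rough illustration of the "no holes" check described above, here is
a userspace sketch in C. It is a simplified model, not the kernel's
iomap_swapfile_activate() code: struct ext, enum ext_state and
swapfile_extents_ok() are hypothetical names made up for this example,
and the only property modelled is that every byte of the file must be
backed by an allocated (written or unwritten) extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an extent's state; not a kernel type. */
enum ext_state { EXT_HOLE, EXT_UNWRITTEN, EXT_WRITTEN };

/* Hypothetical model of one file extent, sorted by logical offset. */
struct ext {
	uint64_t offset;	/* logical start in bytes */
	uint64_t length;	/* length in bytes */
	enum ext_state state;
};

/*
 * Return true if the extents cover [0, file_size) contiguously and
 * none of the covering extents is a hole.  Unwritten extents are
 * accepted, since they already have backing store allocated.
 */
bool swapfile_extents_ok(const struct ext *exts, size_t n, uint64_t file_size)
{
	uint64_t next = 0;	/* first byte not yet covered */

	assert(exts != NULL || n == 0);

	for (size_t i = 0; i < n && next < file_size; i++) {
		if (exts[i].offset != next || exts[i].length == 0)
			return false;	/* unmapped gap between extents */
		if (exts[i].state == EXT_HOLE)
			return false;	/* a DIO write here would allocate */
		next += exts[i].length;
	}
	return next >= file_size;
}
```

A file with one written extent followed by one unwritten extent passes
the check; the same file with a hole in the middle does not.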

Using the DIO path opens up the possibility that the filesystem
could want to run transactions as part of the DIO. Right now we
support unwritten extents for swap files (so they don't have to be
written to allocate the backing store before activation) and that
means we'll be doing DIO to unwritten extents. IO completion of a
DIO write to an unwritten extent will run a transaction to convert
that extent to written. A similar problem with sparse files exists,
because allocation of blocks can be done from the DIO path, and that
requires transactions. File extension is another potential
transaction path we open up by using DIO writes for swap.
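
The three transaction sources named above (unwritten extent conversion,
block allocation into holes, file extension) can be summarised in a
small userspace C sketch. This is a simplified model under stated
assumptions, not filesystem code: enum blk_state, the TX_* flags and
dio_write_tx_reasons() are hypothetical names invented for
illustration, and real filesystems may combine or split these cases
differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the state of the blocks being written. */
enum blk_state { BLK_HOLE, BLK_UNWRITTEN, BLK_WRITTEN };

/* Hypothetical flags: why a DIO write would need a transaction. */
#define TX_ALLOC	(1u << 0)	/* allocate blocks into a hole */
#define TX_CONVERT	(1u << 1)	/* unwritten -> written at IO completion */
#define TX_EXTEND	(1u << 2)	/* write goes past the current file size */

/*
 * Return the set of reasons a DIO write of len bytes at pos would need
 * a transaction, given the state of the target blocks and the current
 * file size.  A result of 0 means a pure overwrite of written blocks.
 */
unsigned int dio_write_tx_reasons(enum blk_state state, uint64_t pos,
				  uint64_t len, uint64_t i_size)
{
	unsigned int reasons = 0;

	assert(len > 0);

	if (state == BLK_HOLE)
		reasons |= TX_ALLOC;
	if (state == BLK_UNWRITTEN)
		reasons |= TX_CONVERT;
	if (pos + len > i_size)
		reasons |= TX_EXTEND;
	return reasons;
}
```

In this model, only the case that returns 0 (an overwrite of already
written blocks inside the file size) is free of transactions.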

The problem is that a transaction run in swap IO context will
deadlock the filesystem. Either through the unbound memory demand of
metadata modification, or from needing log space that can't be freed
up because the metadata IO that will free the log space is waiting
on memory allocation that is waiting on swap IO...
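
The log-space variant of this deadlock is a cycle in a wait-for chain,
which the following userspace C sketch makes explicit. It is only a toy
model: enum actor, NO_WAIT and waits_in_cycle() are hypothetical names
for this example, and each actor is assumed to wait on at most one
other actor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actors in the dependency chain described above. */
enum actor {
	SWAP_IO, TRANSACTION, LOG_SPACE, METADATA_IO, MEM_ALLOC, NR_ACTORS
};

/* Marker for "this actor does not wait on anything". */
#define NO_WAIT NR_ACTORS

/*
 * waits_on[a] names the single actor that a is blocked on, or NO_WAIT.
 * Return true if following the chain from start leads back to start,
 * i.e. start is itself part of a wait cycle and can never make progress.
 */
bool waits_in_cycle(const enum actor waits_on[NR_ACTORS], enum actor start)
{
	enum actor cur = start;

	for (size_t steps = 0; steps < NR_ACTORS; steps++) {
		assert(cur < NR_ACTORS);
		cur = waits_on[cur];
		if (cur == NO_WAIT)
			return false;	/* chain ends, no deadlock */
		if (cur == start)
			return true;	/* came back around: deadlock */
	}
	return false;	/* a cycle may exist, but start is not on it */
}
```

With swap IO waiting on a transaction, the chain swap IO -> transaction
-> log space -> metadata IO -> memory allocation -> swap IO closes on
itself. If swap IO never waits on a transaction, the chain terminates.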

I think some more thought needs to be put into controlling the
behaviour/semantics of the DIO path so that it can be safely used
by swap IO, because it's not a direct 1:1 behavioural mapping with
existing DIO and there are potential deadlock vectors we need to
avoid.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

