From: Trond Myklebust <trondmy@hammerspace.com>
To: "mgorman@suse.de" <mgorman@suse.de>,
	"dhowells@redhat.com" <dhowells@redhat.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"neilb@suse.de" <neilb@suse.de>,
	"anna.schumaker@netapp.com" <anna.schumaker@netapp.com>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 22/23] NFS: swap-out must always use STABLE writes.
Date: Wed, 26 Jan 2022 03:45:42 +0000
Message-ID: <e50bf6286a89d60ee0879e55a30b15d84e97d9a4.camel@hammerspace.com>
In-Reply-To: <164299611287.26253.13462969110743208198.stgit@noble.brown>

On Mon, 2022-01-24 at 14:48 +1100, NeilBrown wrote:
> The commit handling code is not safe against memory-pressure deadlocks
> when writing to swap.  In particular, nfs_commitdata_alloc() blocks
> indefinitely waiting for memory, and this can consume all available
> workqueue threads.
> 
> swap-out most likely uses STABLE writes anyway as COND_STABLE indicates
> that a stable write should be used if the write fits in a single
> request, and it normally does.  However if we ever swap with a small
> wsize, or gather unusually large numbers of pages for a single write,
> this might change.
> 
> For safety, make it explicit in the code that direct writes used for swap
> must always use FLUSH_COND_STABLE.

OK. Your explanation above has me extremely confused.

If you want to avoid commit, then you should be using FLUSH_STABLE,
since that forces the writes to be synchronous. FLUSH_COND_STABLE can
and will use unstable writes if it sees that there are more writes to
come.
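
To make the distinction concrete, here is a rough userspace model of the
two modes. It is a sketch, not the kernel code: every MODEL_* name, the
flag values, and the more_io/commit_pending inputs are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits; names echo the thread, values are assumptions. */
enum {
	MODEL_FLUSH_STABLE      = 1 << 0,
	MODEL_FLUSH_COND_STABLE = 1 << 1,
};

/* Stability level requested in the WRITE call. */
enum model_stable_how {
	MODEL_UNSTABLE,   /* server may cache the data; a COMMIT is needed later */
	MODEL_FILE_SYNC,  /* data is on stable storage when the WRITE returns */
};

/*
 * Decide how one WRITE is sent.
 *   ioflags        - MODEL_FLUSH_* bits chosen by the caller
 *   more_io        - further writes are queued behind this one
 *   commit_pending - earlier unstable writes still await a COMMIT
 */
static enum model_stable_how
model_pick_stable(int ioflags, bool more_io, bool commit_pending)
{
	/* The two modes are alternatives, not a combination. */
	assert((ioflags & MODEL_FLUSH_STABLE) == 0 ||
	       (ioflags & MODEL_FLUSH_COND_STABLE) == 0);

	if (ioflags & MODEL_FLUSH_STABLE)
		return MODEL_FILE_SYNC;		/* unconditional */

	if (ioflags & MODEL_FLUSH_COND_STABLE) {
		/* A COMMIT will be needed anyway, so stay unstable. */
		if (more_io || commit_pending)
			return MODEL_UNSTABLE;
		return MODEL_FILE_SYNC;		/* lone write: no COMMIT needed */
	}

	return MODEL_UNSTABLE;
}

/* True when the chosen mode obliges the client to send a COMMIT later. */
static bool model_needs_commit(enum model_stable_how how)
{
	return how == MODEL_UNSTABLE;
}
```

In this model only MODEL_FLUSH_STABLE guarantees that model_needs_commit()
is false for every combination of inputs, which is the property the commit
message asks for.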

> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfs/direct.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 43a956d7fd62..29c007b2a17a 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -791,7 +791,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = {
>   */
>  static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
>                                                struct iov_iter *iter,
> -                                              loff_t pos)
> +                                              loff_t pos, int ioflags)
>  {
>         struct nfs_pageio_descriptor desc;
>         struct inode *inode = dreq->inode;
> @@ -799,7 +799,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
>         size_t requested_bytes = 0;
>         size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE);
>  
> -       nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE, false,
> +       nfs_pageio_init_write(&desc, inode, ioflags, false,
>                               &nfs_direct_write_completion_ops);
>         desc.pg_dreq = dreq;
>         get_dreq(dreq);
> @@ -905,6 +905,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
>         struct nfs_direct_req *dreq;
>         struct nfs_lock_context *l_ctx;
>         loff_t pos, end;
> +       int ioflags = swap ? FLUSH_COND_STABLE : FLUSH_STABLE;

This is an unacceptable change in behaviour for the non-swap case, so
NACK.
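
With the line as quoted, ordinary O_DIRECT writes get FLUSH_STABLE where
they previously used FLUSH_COND_STABLE, and swap gets FLUSH_COND_STABLE.
A minimal sketch of a selection that leaves the non-swap case unchanged,
assuming the intent is that only swap takes the forced-stable mode. The
SKETCH_* names, their values, and the helper are hypothetical stand-ins,
not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel flags; values are assumptions. */
enum {
	SKETCH_FLUSH_STABLE      = 1 << 0,
	SKETCH_FLUSH_COND_STABLE = 1 << 1,
};

static_assert(SKETCH_FLUSH_STABLE != SKETCH_FLUSH_COND_STABLE,
	      "the two modes must be distinguishable");

/* Swap is forced stable; every other direct write keeps the old default. */
static int sketch_direct_write_ioflags(bool swap)
{
	return swap ? SKETCH_FLUSH_STABLE : SKETCH_FLUSH_COND_STABLE;
}
```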

>  
>         dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n",
>                 file, iov_iter_count(iter), (long long) iocb->ki_pos);
> @@ -947,7 +948,7 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
>         if (!swap)
>                 nfs_start_io_direct(inode);
>  
> -       requested = nfs_direct_write_schedule_iovec(dreq, iter, pos);
> +       requested = nfs_direct_write_schedule_iovec(dreq, iter, pos, ioflags);
>  
>         if (mapping->nrpages) {
>                 invalidate_inode_pages2_range(mapping,
> 
> 

-- 
Trond Myklebust
CTO, Hammerspace Inc
4984 El Camino Real, Suite 208
Los Altos, CA 94022

www.hammer.space


