All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Howells <dhowells@redhat.com>
To: willy@infradead.org
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Seth Jennings <sjenning@linux.vnet.ibm.com>,
	Bob Liu <bob.liu@oracle.com>,
	linux-nfs@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Dan Magenheimer <dan.magenheimer@oracle.com>,
	Trond Myklebust <trond.myklebust@hammerspace.com>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Minchan Kim <minchan@kernel.org>,
	dhowells@redhat.com, dhowells@redhat.com,
	trond.myklebust@primarydata.com, darrick.wong@oracle.com,
	hch@lst.de, viro@zeniv.linux.org.uk, jlayton@kernel.org,
	sfrench@samba.org, torvalds@linux-foundation.org,
	linux-nfs@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v2 0/5] mm: Fix NFS swapfiles and use DIO for swapfiles
Date: Thu, 12 Aug 2021 21:21:57 +0100	[thread overview]
Message-ID: <162879971699.3306668.8977537647318498651.stgit@warthog.procyon.org.uk> (raw)


Hi Willy, Trond,

Here's v2 of a change to make reads and writes from the swapfile use async
DIO, via the ->direct_IO() method, rather than readpage(), as requested by
Willy.

The consensus seems to be that this is probably the wrong approach and
->direct_IO() needs replacing with a swap-specific method - but that will
require a bunch of filesystems to be modified also.

Note that I'm refcounting the kiocb struct around the call to
->direct_IO().  This is required in cachefiles where I'm going in through
the ->read_iter/->write_iter methods as both core routines and filesystems
touch kiocb *after* calling the completion routine.  Should this practice
be disallowed?

I've also added an additional patch to try and remove the bio-submission
path entirely from swap_readpage/writepage code and only go down the
->direct_IO route, but this fails spectacularly (from I/O errors to ATA
failure messages on the test disk using a normal swapspace).  This was
suggested by Willy as the bio-submission code is a potential data corruptor
if it's asked to write out a compound page that crosses extent boundaries.

Whilst trying to make this work, I found that NFS's support for swapfiles
seems to have been non-functional since Aug 2019 (I think), so the first
patch fixes that.  Question is: do we actually *want* to keep this
functionality, given that it seems that no one's tested it with an upstream
kernel in the last couple of years?

My patches can be found here also:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=swap-dio

I tested this using the procedure and program outlined in the first patch.

I also encountered occasional instances of the following warning, so I'm
wondering if there's a scheduling problem somewhere:

BUG: workqueue lockup - pool cpus=0-3 flags=0x5 nice=0 stuck for 34s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    in-flight: 1565:fill_page_cache_func
workqueue events_highpri: flags=0x10
  pwq 3: cpus=1 node=0 flags=0x1 nice=-20 active=1/256 refcnt=2
    in-flight: 1547:fill_page_cache_func
  pwq 1: cpus=0 node=0 flags=0x0 nice=-20 active=1/256 refcnt=2
    in-flight: 1811:fill_page_cache_func
workqueue events_unbound: flags=0x2
  pwq 8: cpus=0-3 flags=0x5 nice=0 active=3/512 refcnt=5
    pending: fsnotify_connector_destroy_workfn, fsnotify_mark_destroy_workfn, cleanup_offline_cgwbs_workfn
workqueue events_power_efficient: flags=0x82
  pwq 8: cpus=0-3 flags=0x5 nice=0 active=4/256 refcnt=6
    pending: neigh_periodic_work, neigh_periodic_work, check_lifetime, do_cache_clean
workqueue writeback: flags=0x4a
  pwq 8: cpus=0-3 flags=0x5 nice=0 active=1/256 refcnt=4
    in-flight: 433(RESCUER):wb_workfn
workqueue rpciod: flags=0xa
  pwq 8: cpus=0-3 flags=0x5 nice=0 active=38/256 refcnt=40
    in-flight: 7:rpc_async_schedule, 1609:rpc_async_schedule, 1610:rpc_async_schedule, 912:rpc_async_schedule, 1613:rpc_async_schedule, 1631:rpc_async_schedule, 34:rpc_async_schedule, 44:rpc_async_schedule
    pending: rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule, rpc_async_schedule
workqueue ext4-rsv-conversion: flags=0x2000a
pool 1: cpus=0 node=0 flags=0x0 nice=-20 hung=59s workers=2 idle: 6
pool 3: cpus=1 node=0 flags=0x1 nice=-20 hung=43s workers=2 manager: 20
pool 6: cpus=3 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 498 29
pool 8: cpus=0-3 flags=0x5 nice=0 hung=34s workers=9 manager: 1623
pool 9: cpus=0-3 flags=0x5 nice=-20 hung=0s workers=2 manager: 5224 idle: 859

Note that this is due to DIO writes to NFS only, as far as I can tell, and
that no reads had happened yet.

Changes:
========
ver #2:
   - Remove the callback param to __swap_writepage() as it's invariant.
   - Allocate the kiocb on the stack in sync mode.
   - Do an async DIO write if WB_SYNC_ALL isn't set.
   - Try to remove the BIO submission paths.

David

Link: https://lore.kernel.org/r/162876946134.3068428.15475611190876694695.stgit@warthog.procyon.org.uk/ # v1
---
David Howells (5):
      nfs: Fix write to swapfile failure due to generic_write_checks()
      mm: Remove the callback func argument from __swap_writepage()
      mm: Make swap_readpage() for SWP_FS_OPS use ->direct_IO() not ->readpage()
      mm: Make __swap_writepage() do async DIO if asked for it
      mm: Remove swap BIO paths and only use DIO paths [BROKEN]


 fs/direct-io.c       |   2 +
 include/linux/bio.h  |   2 +
 include/linux/fs.h   |   1 +
 include/linux/swap.h |   4 +-
 mm/page_io.c         | 379 ++++++++++++++++++++++---------------------
 mm/zswap.c           |   2 +-
 6 files changed, 204 insertions(+), 186 deletions(-)



             reply	other threads:[~2021-08-12 20:22 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-12 20:21 David Howells [this message]
2021-08-12 20:22 ` [RFC PATCH v2 1/5] nfs: Fix write to swapfile failure due to generic_write_checks() David Howells
2021-08-13  3:09   ` Matthew Wilcox
2021-08-13  7:02     ` Christoph Hellwig
2021-08-19 22:38   ` NeilBrown
2021-08-12 20:22 ` [RFC PATCH v2 2/5] mm: Remove the callback func argument from __swap_writepage() David Howells
2021-08-13  7:03   ` Christoph Hellwig
2021-08-12 20:22 ` [RFC PATCH v2 3/5] mm: Make swap_readpage() for SWP_FS_OPS use ->direct_IO() not ->readpage() David Howells
2021-08-12 21:23   ` Matthew Wilcox
2021-08-13  7:12   ` Christoph Hellwig
2021-08-12 20:22 ` [RFC PATCH v2 4/5] mm: Make __swap_writepage() do async DIO if asked for it David Howells
2021-08-12 20:22 ` [RFC PATCH v2 5/5] mm: Remove swap BIO paths and only use DIO paths [BROKEN] David Howells
2021-08-13  7:14   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=162879971699.3306668.8977537647318498651.stgit@warthog.procyon.org.uk \
    --to=dhowells@redhat.com \
    --cc=bob.liu@oracle.com \
    --cc=dan.magenheimer@oracle.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=minchan@kernel.org \
    --cc=sfrench@samba.org \
    --cc=sjenning@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=trond.myklebust@hammerspace.com \
    --cc=trond.myklebust@primarydata.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.