linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-nvdimm@lists.01.org, Jan Kara <jack@suse.cz>,
	Dave Chinner <david@fromorbit.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Christoph Hellwig <hch@lst.de>,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, snitzer@redhat.com
Subject: Re: [PATCH v8 18/18] xfs, dax: introduce xfs_break_dax_layouts()
Date: Wed, 4 Apr 2018 11:55:18 +0200	[thread overview]
Message-ID: <20180404095518.65exgpxuca3tqhav@quack2.suse.cz> (raw)
In-Reply-To: <152246902607.36038.15813002361509305325.stgit@dwillia2-desk3.amr.corp.intel.com>

On Fri 30-03-18 21:03:46, Dan Williams wrote:
> xfs_break_dax_layouts(), similar to xfs_break_leased_layouts(), scans
> for busy / pinned dax pages and waits for those pages to go idle before
> any potential extent unmap operation.
> 
> dax_layout_busy_page() handles synchronizing against new page-busy
> events (get_user_pages). It invalidates all mappings to trigger the
> get_user_pages slow path which will eventually block on the xfs inode
> lock held in XFS_MMAPLOCK_EXCL mode. If dax_layout_busy_page() finds a
> busy page it returns it for xfs to wait for the page-idle event that
> will fire when the page reference count reaches 1 (recall ZONE_DEVICE
> pages are idle at count 1, see generic_dax_pagefree()).
> 
> While waiting, the XFS_MMAPLOCK_EXCL lock is dropped in order to not
> deadlock the process that might be trying to elevate the page count of
> more pages before arranging for any of them to go idle. I.e. the typical
> case of submitting I/O is that iov_iter_get_pages() elevates the
> reference count of all pages in the I/O before starting I/O on the first
> page. The process of elevating the reference count of all pages involved
> in an I/O may cause faults that need to take XFS_MMAPLOCK_EXCL.
> 
> Cc: Jan Kara <jack@suse.cz>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

...

> ---
>  fs/xfs/xfs_file.c |   60 +++++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 49 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 51e6506bdcb1..0342f6fb782f 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -752,6 +752,38 @@ xfs_file_write_iter(
>  	return ret;
>  }
>  
> +static void
> +xfs_wait_var_event(
> +	struct inode		*inode,
> +	uint			iolock,
> +	bool			*did_unlock)
> +{
> +	struct xfs_inode        *ip = XFS_I(inode);
> +
> +	*did_unlock = true;
> +	xfs_iunlock(ip, iolock);
> +	schedule();
> +	xfs_ilock(ip, iolock);
> +}

With this scheme, there's a problem that it can be easily livelocked. E.g.
when I created a program that maps a file on DAX fs and does AIO DIO from
it indefinitely (with 64 iocbs in flight), truncate of that file never gets
past xfs_break_layouts(). The reason is that once we drop all locks, new
iocbs can be submitted, they grab new page references and these prevent
truncation next time... So I think we need to somehow fix this retry scheme
so that we guarantee forward progress of the truncate. E.g. if we kept
IOLOCK locked, that would prevent new iocbs from being submitted...

								Honza


> +
> +static int
> +xfs_break_dax_layouts(
> +	struct inode		*inode,
> +	uint			iolock,
> +	bool			*did_unlock)
> +{
> +	struct page		*page;
> +
> +	*did_unlock = false;
> +	page = dax_layout_busy_page(inode->i_mapping);
> +	if (!page)
> +		return 0;
> +
> +	return ___wait_var_event(&page->_refcount,
> +			atomic_read(&page->_refcount) == 1, TASK_INTERRUPTIBLE,
> +			0, 0, xfs_wait_var_event(inode, iolock, did_unlock));
> +}
> +
>  int
>  xfs_break_layouts(
>  	struct inode		*inode,
> @@ -763,17 +795,23 @@ xfs_break_layouts(
>  
>  	ASSERT(xfs_isilocked(XFS_I(inode), XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL));
>  
> -	switch (reason) {
> -	case BREAK_UNMAP:
> -		ASSERT(xfs_isilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL));
> -		/* fall through */
> -	case BREAK_WRITE:
> -		error = xfs_break_leased_layouts(inode, iolock, &retry);
> -		break;
> -	default:
> -		WARN_ON_ONCE(1);
> -		return -EINVAL;
> -	}
> +	do {
> +		switch (reason) {
> +		case BREAK_UNMAP:
> +			ASSERT(xfs_isilocked(XFS_I(inode), XFS_MMAPLOCK_EXCL));
> +
> +			error = xfs_break_dax_layouts(inode, *iolock, &retry);
> +			/* fall through */
> +		case BREAK_WRITE:
> +			if (error || retry)
> +				break;
> +			error = xfs_break_leased_layouts(inode, iolock, &retry);
> +			break;
> +		default:
> +			WARN_ON_ONCE(1);
> +			return -EINVAL;
> +		}
> +	} while (error == 0 && retry);
>  
>  	return error;
>  }
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

      reply	other threads:[~2018-04-04  9:55 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-31  4:02 [PATCH v8 00/18] dax: fix dma vs truncate/hole-punch Dan Williams
2018-03-31  4:02 ` [PATCH v8 01/18] dax: store pfns in the radix Dan Williams
2018-03-31  4:02 ` [PATCH v8 02/18] fs, dax: prepare for dax-specific address_space_operations Dan Williams
2018-03-31  4:02 ` [PATCH v8 03/18] block, dax: remove dead code in blkdev_writepages() Dan Williams
2018-03-31  4:02 ` [PATCH v8 04/18] xfs, dax: introduce xfs_dax_aops Dan Williams
2018-03-31  4:02 ` [PATCH v8 05/18] ext4, dax: introduce ext4_dax_aops Dan Williams
2018-04-03 11:50   ` Jan Kara
2018-03-31  4:02 ` [PATCH v8 06/18] ext2, dax: introduce ext2_dax_aops Dan Williams
2018-04-03 11:51   ` Jan Kara
2018-03-31  4:02 ` [PATCH v8 07/18] fs, dax: use page->mapping to warn if truncate collides with a busy page Dan Williams
2018-03-31  4:02 ` [PATCH v8 08/18] dax: introduce CONFIG_DAX_DRIVER Dan Williams
2018-03-31  4:02 ` [PATCH v8 09/18] dax, dm: allow device-mapper to operate without dax support Dan Williams
2018-03-31  4:03 ` [PATCH v8 10/18] dax, dm: introduce ->fs_{claim, release}() dax_device infrastructure Dan Williams
2018-04-03 18:24   ` Dan Williams
2018-04-03 19:39     ` Mike Snitzer
2018-04-03 19:47       ` Dan Williams
2018-04-03 20:36         ` [PATCH v9] " Dan Williams
2018-04-03 21:13           ` Mike Snitzer
2018-03-31  4:03 ` [PATCH v8 11/18] mm, dax: enable filesystems to trigger dev_pagemap ->page_free callbacks Dan Williams
2018-04-04 21:23   ` [v8, " Andrei Vagin
2018-04-04 21:27     ` Dan Williams
2018-04-04 21:35       ` Dan Williams
2018-04-04 23:19         ` Stephen Rothwell
2018-04-04 21:40     ` Andrei Vagin
2018-03-31  4:03 ` [PATCH v8 12/18] memremap: split devm_memremap_pages() and memremap() infrastructure Dan Williams
2018-03-31  4:03 ` [PATCH v8 13/18] mm, dev_pagemap: introduce CONFIG_DEV_PAGEMAP_OPS Dan Williams
2018-03-31  4:03 ` [PATCH v8 14/18] memremap: mark devm_memremap_pages() EXPORT_SYMBOL_GPL Dan Williams
2018-03-31  4:03 ` [PATCH v8 15/18] mm, fs, dax: handle layout changes to pinned dax mappings Dan Williams
2018-04-04  9:46   ` Jan Kara
2018-04-04 10:06     ` Jan Kara
2018-04-04 14:12     ` Dan Williams
2018-04-07 19:38     ` Dan Williams
2018-04-08  3:11       ` Paul E. McKenney
2018-04-09 16:39         ` Jan Kara
2018-04-09 18:14           ` Paul E. McKenney
2018-04-09 16:49       ` Jan Kara
2018-04-09 16:51         ` Dan Williams
2018-04-13 22:03           ` Dan Williams
2018-04-13 22:48             ` Paul E. McKenney
2018-04-19 10:44             ` Jan Kara
2018-04-20  3:00               ` Dan Williams
2018-03-31  4:03 ` [PATCH v8 16/18] xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL Dan Williams
2018-03-31  4:03 ` [PATCH v8 17/18] xfs: prepare xfs_break_layouts() for another layout type Dan Williams
2018-03-31  4:03 ` [PATCH v8 18/18] xfs, dax: introduce xfs_break_dax_layouts() Dan Williams
2018-04-04  9:55   ` Jan Kara [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180404095518.65exgpxuca3tqhav@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ross.zwisler@linux.intel.com \
    --cc=snitzer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).