From: Filipe Manana <fdmanana@gmail.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>,
	xfs <linux-xfs@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v3] vfs: fix page locking deadlocks when deduping files
Date: Tue, 13 Aug 2019 16:53:09 +0100
Message-ID: <CAL3q7H6bL1DO-3mAk5yPncVF62=ehStz7kZMTYK_4nXQ1H3k-A@mail.gmail.com>
In-Reply-To: <20190813151434.GQ7138@magnolia>

On Tue, Aug 13, 2019 at 4:15 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> When dedupe wants to use the page cache to compare parts of two files
> for dedupe, we must be very careful to handle locking correctly. The
> current code doesn't do this. It must lock and unlock the page only
> once if the two pages are the same, since the overlapping range check
> doesn't catch this when blocksize < pagesize. If the pages are distinct
> but from the same file, we must observe page locking order and lock them
> in order of increasing offset to avoid clashing with writeback locking.
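
For anyone following along, the locking rule described above can be modelled in user space. This is only a rough sketch, assuming pthread mutexes in place of the page lock; struct fake_page, lock_two_pages() and unlock_two_pages() are made-up names for illustration, not kernel APIs:

```c
#include <pthread.h>

/* Hypothetical stand-in for struct page: an index plus its own lock. */
struct fake_page {
	unsigned long index;
	pthread_mutex_t lock;
};

/*
 * Take both locks in order of increasing index, and take the lock only
 * once when both arguments refer to the same page.
 */
static void lock_two_pages(struct fake_page *p1, struct fake_page *p2)
{
	if (p1->index > p2->index) {
		struct fake_page *tmp = p1;

		p1 = p2;
		p2 = tmp;
	}

	pthread_mutex_lock(&p1->lock);
	if (p1 != p2)
		pthread_mutex_lock(&p2->lock);
}

/* Release both locks, releasing only once if it is the same page. */
static void unlock_two_pages(struct fake_page *p1, struct fake_page *p2)
{
	pthread_mutex_unlock(&p1->lock);
	if (p1 != p2)
		pthread_mutex_unlock(&p2->lock);
}
```

Two threads that call lock_two_pages() with the same pair in opposite argument order still acquire the locks in the same sequence, so they cannot deadlock against each other, and passing the same page twice does not self-deadlock.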
>
> Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Bill O'Donnell <billodo@redhat.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

We actually had the same bug in btrfs, before we had cloning/dedupe in
vfs/xfs/etc, and fixed it back in 2017 [1].
I totally missed this behaviour in the vfs helpers when I updated
btrfs to use them some months ago.

Thanks.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b1517622f2524f531113b12c27b9a0ea69c38983

> ---
> v3: revalidate page after locking it
> v2: provide an unlock helper
> ---
> fs/read_write.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
> 1 file changed, 42 insertions(+), 8 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 1f5088dec566..da341eb3033c 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in,
>  	return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL;
>  }
>
> -/*
> - * Read a page's worth of file data into the page cache. Return the page
> - * locked.
> - */
> +/* Read a page's worth of file data into the page cache. */
>  static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>  {
>  	struct page *page;
> @@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
>  		put_page(page);
>  		return ERR_PTR(-EIO);
>  	}
> -	lock_page(page);
>  	return page;
>  }
>
> +/*
> + * Lock two pages, ensuring that we lock in offset order if the pages are from
> + * the same file.
> + */
> +static void vfs_lock_two_pages(struct page *page1, struct page *page2)
> +{
> +	/* Always lock in order of increasing index. */
> +	if (page1->index > page2->index)
> +		swap(page1, page2);
> +
> +	lock_page(page1);
> +	if (page1 != page2)
> +		lock_page(page2);
> +}
> +
> +/* Unlock two pages, being careful not to unlock the same page twice. */
> +static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
> +{
> +	unlock_page(page1);
> +	if (page1 != page2)
> +		unlock_page(page2);
> +}
> +
>  /*
>   * Compare extents of two files to see if they are the same.
>   * Caller must have locked both inodes to prevent write races.
> @@ -1867,10 +1886,25 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>  		dest_page = vfs_dedupe_get_page(dest, destoff);
>  		if (IS_ERR(dest_page)) {
>  			error = PTR_ERR(dest_page);
> -			unlock_page(src_page);
>  			put_page(src_page);
>  			goto out_error;
>  		}
> +
> +		vfs_lock_two_pages(src_page, dest_page);
> +
> +		/*
> +		 * Now that we've locked both pages, make sure they still
> +		 * represent the data we're interested in. If not, someone
> +		 * is invalidating pages on us and we lose.
> +		 */
> +		if (src_page->mapping != src->i_mapping ||
> +		    src_page->index != srcoff >> PAGE_SHIFT ||
> +		    dest_page->mapping != dest->i_mapping ||
> +		    dest_page->index != destoff >> PAGE_SHIFT) {
> +			same = false;
> +			goto unlock;
> +		}
> +
>  		src_addr = kmap_atomic(src_page);
>  		dest_addr = kmap_atomic(dest_page);
>
> @@ -1882,8 +1916,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
>
>  		kunmap_atomic(dest_addr);
>  		kunmap_atomic(src_addr);
> -		unlock_page(dest_page);
> -		unlock_page(src_page);
> +unlock:
> +		vfs_unlock_two_pages(src_page, dest_page);
>  		put_page(dest_page);
>  		put_page(src_page);
>
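
The "revalidate after locking" step added in v3 can also be illustrated outside the kernel. Below is a simplified single-page sketch, assuming a pthread mutex as the page lock and a 4 KiB page size; struct fake_mapping, struct fake_page, FAKE_PAGE_SHIFT and lock_and_revalidate() are hypothetical names used only for this illustration:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. */
#define FAKE_PAGE_SHIFT 12

/* Hypothetical stand-in for an address_space. */
struct fake_mapping {
	int id;
};

/* Hypothetical stand-in for struct page. */
struct fake_page {
	struct fake_mapping *mapping;	/* NULL once invalidated */
	unsigned long index;
	pthread_mutex_t lock;
};

/*
 * The page was looked up without holding its lock, so it may have been
 * invalidated or reused in the meantime. Lock it, then check that it
 * still belongs to the expected mapping at the expected index.
 * Returns true with the page locked, or false with the page unlocked.
 */
static bool lock_and_revalidate(struct fake_page *page,
				struct fake_mapping *expected,
				long long offset)
{
	unsigned long want = (unsigned long)(offset >> FAKE_PAGE_SHIFT);

	pthread_mutex_lock(&page->lock);
	if (page->mapping != expected || page->index != want) {
		pthread_mutex_unlock(&page->lock);
		return false;
	}
	return true;
}
```

The patch itself performs this check on both pages after vfs_lock_two_pages() and treats a mismatch as "not the same" data; the sketch only shows the check for one page.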
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”