linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v4] vfs: fix page locking deadlocks when deduping files
@ 2019-08-15 16:49 Darrick J. Wong
  2019-08-15 18:18 ` Matthew Wilcox
  2019-08-16 16:13 ` [PATCH v5] " Darrick J. Wong
  0 siblings, 2 replies; 5+ messages in thread
From: Darrick J. Wong @ 2019-08-15 16:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs, linux-fsdevel, fdmanana, gaoxiang25, willy

From: Darrick J. Wong <darrick.wong@oracle.com>

When dedupe wants to use the page cache to compare parts of two files
for dedupe, we must be very careful to handle locking correctly.  The
current code doesn't do this.  It must lock and unlock the page only
once if the two pages are the same, since the overlapping range check
doesn't catch this when blocksize < pagesize.  If the pages are distinct
but from the same file, we must observe page locking order and lock them
in order of increasing offset to avoid clashing with writeback locking.

Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v4: drop the unnecessary page offset checks
v3: revalidate page after locking it
v2: provide an unlock helper
---
 fs/read_write.c |   48 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 1f5088dec566..68bcdd65b9f6 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in,
 	return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL;
 }
 
-/*
- * Read a page's worth of file data into the page cache.  Return the page
- * locked.
- */
+/* Read a page's worth of file data into the page cache. */
 static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
 {
 	struct page *page;
@@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
 		put_page(page);
 		return ERR_PTR(-EIO);
 	}
-	lock_page(page);
 	return page;
 }
 
+/*
+ * Lock two pages, ensuring that we lock in offset order if the pages are from
+ * the same file.
+ */
+static void vfs_lock_two_pages(struct page *page1, struct page *page2)
+{
+	/* Always lock in order of increasing index. */
+	if (page1->index > page2->index)
+		swap(page1, page2);
+
+	lock_page(page1);
+	if (page1 != page2)
+		lock_page(page2);
+}
+
+/* Unlock two pages, being careful not to unlock the same page twice. */
+static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
+{
+	unlock_page(page1);
+	if (page1 != page2)
+		unlock_page(page2);
+}
+
 /*
  * Compare extents of two files to see if they are the same.
  * Caller must have locked both inodes to prevent write races.
@@ -1867,10 +1886,23 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 		dest_page = vfs_dedupe_get_page(dest, destoff);
 		if (IS_ERR(dest_page)) {
 			error = PTR_ERR(dest_page);
-			unlock_page(src_page);
 			put_page(src_page);
 			goto out_error;
 		}
+
+		vfs_lock_two_pages(src_page, dest_page);
+
+		/*
+		 * Now that we've locked both pages, make sure they're still
+		 * mapped to the file we're interested in.  If not, someone
+		 * is invalidating pages on us and we lose.
+		 */
+		if (src_page->mapping != src->i_mapping ||
+		    dest_page->mapping != dest->i_mapping) {
+			same = false;
+			goto unlock;
+		}
+
 		src_addr = kmap_atomic(src_page);
 		dest_addr = kmap_atomic(dest_page);
 
@@ -1882,8 +1914,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 
 		kunmap_atomic(dest_addr);
 		kunmap_atomic(src_addr);
-		unlock_page(dest_page);
-		unlock_page(src_page);
+unlock:
+		vfs_unlock_two_pages(src_page, dest_page);
 		put_page(dest_page);
 		put_page(src_page);
 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4] vfs: fix page locking deadlocks when deduping files
  2019-08-15 16:49 [PATCH v4] vfs: fix page locking deadlocks when deduping files Darrick J. Wong
@ 2019-08-15 18:18 ` Matthew Wilcox
  2019-08-16  6:47   ` Christoph Hellwig
  2019-08-16 16:13 ` [PATCH v5] " Darrick J. Wong
  1 sibling, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2019-08-15 18:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, xfs, linux-fsdevel, fdmanana, gaoxiang25

On Thu, Aug 15, 2019 at 09:49:40AM -0700, Darrick J. Wong wrote:
> Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>

(I approve of this patch going upstream)

However, I think there are further simplifications to be made here.
With this patch applied, vfs_dedupe_get_page() now looks like this:

static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
{
        struct page *page;

        page = read_mapping_page(inode->i_mapping, offset >> PAGE_SHIFT, NULL);
        if (IS_ERR(page))
                return page;
        if (!PageUptodate(page)) {
                put_page(page);
                return ERR_PTR(-EIO);
        }
        return page;
}

But I don't think read_mapping_page() can return a page which doesn't have
PageUptodate set.  Follow the path down through read_cache_page() into
do_read_cache_page().

Other than the locations which return an ERR_PTR, the only return point
is at the out: label.  Three of the gotos to 'out' are guarded by 'if
(PageUptodate)'.  The fourth is after calling wait_on_page_read().  Which
will return ERR_PTR(-EIO) if the page isn't Uptodate after being unlocked.

Subtracting that never-exercised check from the routine leaves us with:

static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
{
        struct page *page;

        page = read_mapping_page(inode->i_mapping, offset >> PAGE_SHIFT, NULL);
        if (IS_ERR(page))
                return page;
        return page;
}

which is fundamentally just:

static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
{
        return read_mapping_page(inode->i_mapping, offset >> PAGE_SHIFT, NULL);
}

which seems like it might have a better name and be located in pagemap.h?

I might also like to see

 out:
+	VM_BUG_ON(!PageUptodate(page));
 	mark_page_accessed(page);

at the end of do_read_cache_page(), just to be sure nobody ever screws
that up.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v4] vfs: fix page locking deadlocks when deduping files
  2019-08-15 18:18 ` Matthew Wilcox
@ 2019-08-16  6:47   ` Christoph Hellwig
  2019-08-16 16:57     ` Matthew Wilcox
  0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2019-08-16  6:47 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Darrick J. Wong, Dave Chinner, xfs, linux-fsdevel, fdmanana, gaoxiang25

On Thu, Aug 15, 2019 at 11:18:04AM -0700, Matthew Wilcox wrote:
> But I don't think read_mapping_page() can return a page which doesn't have
> PageUptodate set.  Follow the path down through read_cache_page() into
> do_read_cache_page().

read_mapping_page() can't, but I think we need the check after the 
lock_page still.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v5] vfs: fix page locking deadlocks when deduping files
  2019-08-15 16:49 [PATCH v4] vfs: fix page locking deadlocks when deduping files Darrick J. Wong
  2019-08-15 18:18 ` Matthew Wilcox
@ 2019-08-16 16:13 ` Darrick J. Wong
  1 sibling, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2019-08-16 16:13 UTC (permalink / raw)
  To: Dave Chinner
  Cc: xfs, linux-fsdevel, fdmanana, gaoxiang25, willy, Christoph Hellwig

From: Darrick J. Wong <darrick.wong@oracle.com>

When dedupe wants to use the page cache to compare parts of two files
for dedupe, we must be very careful to handle locking correctly.  The
current code doesn't do this.  It must lock and unlock the page only
once if the two pages are the same, since the overlapping range check
doesn't catch this when blocksize < pagesize.  If the pages are distinct
but from the same file, we must observe page locking order and lock them
in order of increasing offset to avoid clashing with writeback locking.

Fixes: 876bec6f9bbfcb3 ("vfs: refactor clone/dedupe_file_range common functions")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
v5: recheck pageuptodate
v4: drop the unnecessary page offset checks
v3: revalidate page after locking it
v2: provide an unlock helper
---
 fs/read_write.c |   49 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 41 insertions(+), 8 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 1f5088dec566..5bbf587f5bc1 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1811,10 +1811,7 @@ static int generic_remap_check_len(struct inode *inode_in,
 	return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL;
 }
 
-/*
- * Read a page's worth of file data into the page cache.  Return the page
- * locked.
- */
+/* Read a page's worth of file data into the page cache. */
 static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
 {
 	struct page *page;
@@ -1826,10 +1823,32 @@ static struct page *vfs_dedupe_get_page(struct inode *inode, loff_t offset)
 		put_page(page);
 		return ERR_PTR(-EIO);
 	}
-	lock_page(page);
 	return page;
 }
 
+/*
+ * Lock two pages, ensuring that we lock in offset order if the pages are from
+ * the same file.
+ */
+static void vfs_lock_two_pages(struct page *page1, struct page *page2)
+{
+	/* Always lock in order of increasing index. */
+	if (page1->index > page2->index)
+		swap(page1, page2);
+
+	lock_page(page1);
+	if (page1 != page2)
+		lock_page(page2);
+}
+
+/* Unlock two pages, being careful not to unlock the same page twice. */
+static void vfs_unlock_two_pages(struct page *page1, struct page *page2)
+{
+	unlock_page(page1);
+	if (page1 != page2)
+		unlock_page(page2);
+}
+
 /*
  * Compare extents of two files to see if they are the same.
  * Caller must have locked both inodes to prevent write races.
@@ -1867,10 +1886,24 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 		dest_page = vfs_dedupe_get_page(dest, destoff);
 		if (IS_ERR(dest_page)) {
 			error = PTR_ERR(dest_page);
-			unlock_page(src_page);
 			put_page(src_page);
 			goto out_error;
 		}
+
+		vfs_lock_two_pages(src_page, dest_page);
+
+		/*
+		 * Now that we've locked both pages, make sure they're still
+		 * mapped to the file data we're interested in.  If not,
+		 * someone is invalidating pages on us and we lose.
+		 */
+		if (!PageUptodate(src_page) || !PageUptodate(dest_page) ||
+		    src_page->mapping != src->i_mapping ||
+		    dest_page->mapping != dest->i_mapping) {
+			same = false;
+			goto unlock;
+		}
+
 		src_addr = kmap_atomic(src_page);
 		dest_addr = kmap_atomic(dest_page);
 
@@ -1882,8 +1915,8 @@ static int vfs_dedupe_file_range_compare(struct inode *src, loff_t srcoff,
 
 		kunmap_atomic(dest_addr);
 		kunmap_atomic(src_addr);
-		unlock_page(dest_page);
-		unlock_page(src_page);
+unlock:
+		vfs_unlock_two_pages(src_page, dest_page);
 		put_page(dest_page);
 		put_page(src_page);
 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v4] vfs: fix page locking deadlocks when deduping files
  2019-08-16  6:47   ` Christoph Hellwig
@ 2019-08-16 16:57     ` Matthew Wilcox
  0 siblings, 0 replies; 5+ messages in thread
From: Matthew Wilcox @ 2019-08-16 16:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Darrick J. Wong, Dave Chinner, xfs, linux-fsdevel, fdmanana, gaoxiang25

On Thu, Aug 15, 2019 at 11:47:53PM -0700, Christoph Hellwig wrote:
> On Thu, Aug 15, 2019 at 11:18:04AM -0700, Matthew Wilcox wrote:
> > But I don't think read_mapping_page() can return a page which doesn't have
> > PageUptodate set.  Follow the path down through read_cache_page() into
> > do_read_cache_page().
> 
> read_mapping_page() can't, but I think we need the check after the 
> lock_page still.

The current code checks before locking the page:

        if (!PageUptodate(page)) {
                put_page(page);
                return ERR_PTR(-EIO);
        }
        lock_page(page);

so the race with clearing Uptodate already existed.

XFS can ClearPageUptodate if it gets an error while doing writeback of
a dirty page.  But we shouldn't be able to get a dirty page.  Before we
get to this point, we call filemap_write_and_wait_range() while holding
the inode's modification lock.

So I think we can't see a !Uptodate page, but of course I'd be happy to
see a sequence of events that breaks this reasoning.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2019-08-16 16:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-15 16:49 [PATCH v4] vfs: fix page locking deadlocks when deduping files Darrick J. Wong
2019-08-15 18:18 ` Matthew Wilcox
2019-08-16  6:47   ` Christoph Hellwig
2019-08-16 16:57     ` Matthew Wilcox
2019-08-16 16:13 ` [PATCH v5] " Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).