All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: linux-mm@kvack.org
Cc: Matthew Wilcox <willy@infradead.org>, Jan Kara <jack@suse.cz>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Jason Gunthorpe <jgg@nvidia.com>, Christoph Hellwig <hch@lst.de>,
	John Hubbard <jhubbard@nvidia.com>,
	Dave Chinner <david@fromorbit.com>,
	nvdimm@lists.linux.dev, akpm@linux-foundation.org,
	linux-fsdevel@vger.kernel.org
Subject: [PATCH v3 05/25] fsdax: Wait for pinned pages during truncate_inode_pages_final()
Date: Fri, 14 Oct 2022 16:57:25 -0700	[thread overview]
Message-ID: <166579184544.2236710.791897642091142558.stgit@dwillia2-xfh.jf.intel.com> (raw)
In-Reply-To: <166579181584.2236710.17813547487183983273.stgit@dwillia2-xfh.jf.intel.com>

The fsdax truncate vs page pinning solution is incomplete. The initial
solution landed in v4.17 and covered typical truncate invoked through
truncate(2) and fallocate(2), i.e. the truncate_inode_pages() called on
open files. However, that enabling left truncate_inode_pages_final(),
called after iput_final() to free the inode, unprotected. Thankfully
that v4.17 enabling also left a warning in place to fire if any truncate
is attempted while a DAX page is still pinned:

commit d2c997c0f145 ("fs, dax: use page->mapping to warn if truncate collides with a busy page")

While a lore search indicates no reports of that firing, the hole is
there nonetheless. The concern is that if/when that warning fires it
indicates a use-after-free condition whereby the filesystem has lost the
ability to arbitrate access to its storage blocks. For example, in the
worst case, DMA may be ongoing while the filesystem thinks the block is
free to be reallocated to another inode.

This patch is based on an observation from Dave that during iput_final()
there is no need to hold filesystem locks like the explicit truncate
path. The wait can occur from within dax_delete_mapping_entry() called
by truncate_folio_batch_exceptionals().

This solution trades off fixing the use-after-free with a theoretical
deadlock scenario. If the agent holding the page pin triggers inode
reclaim and that reclaim waits for the pin to drop it will deadlock. Two
observations make this approach still worth pursuing:

1/ Any existing scenarios where that happens would have triggered the
   warning referenced above which has shipped upstream for ~5 years
   without a bug report on lore.

2/ Most I/O drivers only hold page pins in their fast paths and new
   __GFP_FS allocations are unlikely in a driver fast path. I.e. if the
   deadlock triggers the likely fix would be in the offending driver, not
   new band-aids in fsdax.

So, update the DAX core to notice that the inode->i_mapping is in the
exiting state and use that as a signal that the inode is unreferenced
await page-pins to drain.

Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Hubbard <jhubbard@nvidia.com>
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 fs/dax.c |   26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/fs/dax.c b/fs/dax.c
index a75d4bf541b4..e3deb60a792f 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -803,13 +803,37 @@ static int __dax_invalidate_entry(struct address_space *mapping,
 	return ret;
 }
 
+/*
+ * wait indefinitely for all pins to drop, the alternative to waiting is
+ * a potential use-after-free scenario
+ */
+static void dax_break_layout(struct address_space *mapping, pgoff_t index)
+{
+	/* To do this without locks, the inode needs to be unreferenced */
+	WARN_ON(atomic_read(&mapping->host->i_count));
+	do {
+		struct page *page;
+
+		page = dax_zap_mappings_range(mapping, index << PAGE_SHIFT,
+					      (index + 1) << PAGE_SHIFT);
+		if (!page)
+			return;
+		wait_var_event(page, dax_page_idle(page));
+	} while (true);
+}
+
 /*
  * Delete DAX entry at @index from @mapping.  Wait for it
  * to be unlocked before deleting it.
  */
 int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index)
 {
-	int ret = __dax_invalidate_entry(mapping, index, true);
+	int ret;
+
+	if (mapping_exiting(mapping))
+		dax_break_layout(mapping, index);
+
+	ret = __dax_invalidate_entry(mapping, index, true);
 
 	/*
 	 * This gets called from truncate / punch_hole path. As such, the caller


  parent reply	other threads:[~2022-10-14 23:57 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-14 23:56 [PATCH v3 00/25] Fix the DAX-gup mistake Dan Williams
2022-10-14 23:57 ` [PATCH v3 01/25] fsdax: Wait on @page not @page->_refcount Dan Williams
2022-10-14 23:57 ` [PATCH v3 02/25] fsdax: Use dax_page_idle() to document DAX busy page checking Dan Williams
2022-10-14 23:57 ` [PATCH v3 03/25] fsdax: Include unmapped inodes for page-idle detection Dan Williams
2022-10-14 23:57 ` [PATCH v3 04/25] fsdax: Introduce dax_zap_mappings() Dan Williams
2022-11-02 13:04   ` Aneesh Kumar K.V
2022-10-14 23:57 ` Dan Williams [this message]
2022-10-14 23:57 ` [PATCH v3 06/25] fsdax: Validate DAX layouts broken before truncate Dan Williams
2022-10-14 23:57 ` [PATCH v3 07/25] fsdax: Hold dax lock over mapping insertion Dan Williams
2022-10-17 19:31   ` Jason Gunthorpe
2022-10-17 20:17     ` Dan Williams
2022-10-18  5:26       ` Christoph Hellwig
2022-10-18 17:30         ` Dan Williams
2022-10-14 23:57 ` [PATCH v3 08/25] fsdax: Update dax_insert_entry() calling convention to return an error Dan Williams
2022-10-14 23:57 ` [PATCH v3 09/25] fsdax: Rework for_each_mapped_pfn() to dax_for_each_folio() Dan Williams
2022-10-14 23:57 ` [PATCH v3 10/25] fsdax: Introduce pgmap_request_folios() Dan Williams
2022-10-17  6:31   ` Alistair Popple
2022-10-17 20:06     ` Dan Williams
2022-10-17 20:11       ` Jason Gunthorpe
2022-10-17 20:51         ` Dan Williams
2022-10-17 23:57           ` Jason Gunthorpe
2022-10-18  0:19             ` Dan Williams
2022-10-17 19:41   ` Jason Gunthorpe
2022-10-14 23:58 ` [PATCH v3 11/25] fsdax: Rework dax_insert_entry() calling convention Dan Williams
2022-10-14 23:58 ` [PATCH v3 12/25] fsdax: Cleanup dax_associate_entry() Dan Williams
2022-10-14 23:58 ` [PATCH v3 13/25] devdax: Minor warning fixups Dan Williams
2022-10-14 23:58 ` [PATCH v3 14/25] devdax: Fix sparse lock imbalance warning Dan Williams
2022-10-14 23:58 ` [PATCH v3 15/25] libnvdimm/pmem: Support pmem block devices without dax Dan Williams
2022-10-14 23:58 ` [PATCH v3 16/25] devdax: Move address_space helpers to the DAX core Dan Williams
2022-10-14 23:58 ` [PATCH v3 17/25] devdax: Sparse fixes for xarray locking Dan Williams
2022-10-14 23:58 ` [PATCH v3 18/25] devdax: Sparse fixes for vmfault_t / dax-entry conversions Dan Williams
2022-10-14 23:58 ` [PATCH v3 19/25] devdax: Sparse fixes for vm_fault_t in tracepoints Dan Williams
2022-10-14 23:58 ` [PATCH v3 20/25] devdax: add PUD support to the DAX mapping infrastructure Dan Williams
2022-10-14 23:59 ` [PATCH v3 21/25] devdax: Use dax_insert_entry() + dax_delete_mapping_entry() Dan Williams
2022-10-14 23:59 ` [PATCH v3 22/25] mm/memremap_pages: Replace zone_device_page_init() with pgmap_request_folios() Dan Williams
2022-10-17 19:17   ` Lyude Paul
2022-10-14 23:59 ` [PATCH v3 23/25] mm/memremap_pages: Initialize all ZONE_DEVICE pages to start at refcount 0 Dan Williams
2022-10-17  7:04   ` Alistair Popple
2022-10-17 19:48   ` Jason Gunthorpe
2022-10-14 23:59 ` [PATCH v3 24/25] mm/meremap_pages: Delete put_devmap_managed_page_refs() Dan Williams
2022-10-17  7:08   ` Alistair Popple
2022-10-14 23:59 ` [PATCH v3 25/25] mm/gup: Drop DAX pgmap accounting Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=166579184544.2236710.791897642091142558.stgit@dwillia2-xfh.jf.intel.com \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.