All of lore.kernel.org
 help / color / mirror / Atom feed
From: alexander.levin@verizon.com
To: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Cc: Jan Kara <jack@suse.cz>, Dan Williams <dan.j.williams@intel.com>,
	alexander.levin@verizon.com
Subject: [PATCH AUTOSEL for 4.9 01/54] dax: Avoid page invalidation races and unnecessary radix tree traversals
Date: Wed, 22 Nov 2017 22:23:23 +0000	[thread overview]
Message-ID: <20171122222246.19618-2-alexander.levin@verizon.com> (raw)
In-Reply-To: <20171122222246.19618-1-alexander.levin@verizon.com>

From: Jan Kara <jack@suse.cz>

[ Upstream commit e3fce68cdbed297d927e993b3ea7b8b1cee545da ]

Currently dax_iomap_rw() takes care of invalidating page tables and
evicting hole pages from the radix tree when write(2) to the file
happens. This invalidation is only necessary when there is some block
allocation resulting from write(2). Furthermore in current place the
invalidation is racy wrt page fault instantiating a hole page just after
we have invalidated it.

So perform the page invalidation inside dax_iomap_actor() where we can
do it only when really necessary and after blocks have been allocated so
nobody will be instantiating new hole pages anymore.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
---
 fs/dax.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index bf6218da7928..800748f10b3d 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1265,6 +1265,17 @@ iomap_dax_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 	if (WARN_ON_ONCE(iomap->type != IOMAP_MAPPED))
 		return -EIO;
 
+	/*
+	 * Write can allocate block for an area which has a hole page mapped
+	 * into page tables. We have to tear down these mappings so that data
+	 * written by write(2) is visible in mmap.
+	 */
+	if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
+		invalidate_inode_pages2_range(inode->i_mapping,
+					      pos >> PAGE_SHIFT,
+					      (end - 1) >> PAGE_SHIFT);
+	}
+
 	while (pos < end) {
 		unsigned offset = pos & (PAGE_SIZE - 1);
 		struct blk_dax_ctl dax = { 0 };
@@ -1329,23 +1340,6 @@ iomap_dax_rw(struct kiocb *iocb, struct iov_iter *iter,
 	if (iov_iter_rw(iter) == WRITE)
 		flags |= IOMAP_WRITE;
 
-	/*
-	 * Yes, even DAX files can have page cache attached to them:  A zeroed
-	 * page is inserted into the pagecache when we have to serve a write
-	 * fault on a hole.  It should never be dirtied and can simply be
-	 * dropped from the pagecache once we get real data for the page.
-	 *
-	 * XXX: This is racy against mmap, and there's nothing we can do about
-	 * it. We'll eventually need to shift this down even further so that
-	 * we can check if we allocated blocks over a hole first.
-	 */
-	if (mapping->nrpages) {
-		ret = invalidate_inode_pages2_range(mapping,
-				pos >> PAGE_SHIFT,
-				(pos + iov_iter_count(iter) - 1) >> PAGE_SHIFT);
-		WARN_ON_ONCE(ret);
-	}
-
 	while (iov_iter_count(iter)) {
 		ret = iomap_apply(inode, pos, iov_iter_count(iter), flags, ops,
 				iter, iomap_dax_actor);
-- 
2.11.0

  reply	other threads:[~2017-11-22 22:24 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-22 22:23 [PATCH AUTOSEL for 4.9 01/56] ACPICA: Resources: Not a valid resource if buffer length too long alexander.levin
2017-11-22 22:23 ` alexander.levin [this message]
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 02/54] net/mlx4_en: Fix type mismatch for 32-bit systems alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 02/56] RDS: make message size limit compliant with spec alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 03/54] l2tp: take remote address into account in l2tp_ip and l2tp_ip6 socket lookups alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 04/56] RDS: RDMA: fix the ib_map_mr_sg_zbva() argument alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 05/54] dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 03/56] RDS: RDMA: return appropriate error on rdma map failures alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 04/54] dmaengine: stm32-dma: Set correct args number for DMA request from DT alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 06/56] drm/sun4i: Fix a return value in case of error alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 06/54] usb: gadget: f_fs: Fix ExtCompat descriptor validation alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 05/56] PCI: Apply _HPX settings only to relevant devices alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 07/56] clk: sunxi-ng: A31: Fix spdif clock register alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 08/56] clk: sunxi-ng: fix PLL_CPUX adjusting on A33 alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 09/56] dmaengine: zx: set DMA_CYCLIC cap_mask bit alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 08/54] net: systemport: Utilize skb_put_padto() alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 07/54] libcxgb: fix error check for ip6_route_output() alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 10/56] fscrypt: use ENOKEY when file cannot be created w/o key alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 09/54] net: systemport: Pad packet before inserting TSB alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 10/54] ARM: OMAP2+: Fix WL1283 Bluetooth Baud Rate alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 11/54] ARM: OMAP1: DMA: Correct the number of logical channels alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 12/56] net: Allow IP_MULTICAST_IF to set index to L3 slave alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 13/54] be2net: fix accesses to unicast list alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 11/56] fscrypt: use ENOTDIR when setting encryption policy on nondirectory alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 12/54] vti6: fix device register to report IFLA_INFO_KIND alexander.levin
2017-11-22 22:23 ` [PATCH AUTOSEL for 4.9 13/56] net: 3com: typhoon: typhoon_init_one: make return values more specific alexander.levin
2017-11-22 22:23 [PATCH AUTOSEL for 4.9 01/54] dax: Avoid page invalidation races and unnecessary radix tree traversals alexander.levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171122222246.19618-2-alexander.levin@verizon.com \
    --to=alexander.levin@verizon.com \
    --cc=dan.j.williams@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.