From: Ross Zwisler <ross.zwisler@linux.intel.com>
To: Andrew Morton <akpm@linux-foundation.org>, linux-kernel@vger.kernel.org
Cc: Latchesar Ionkov <lucho@ionkov.net>, Jan Kara <jack@suse.cz>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	linux-mm@kvack.org, Christoph Hellwig <hch@lst.de>,
	linux-cifs@vger.kernel.org,
	Matthew Wilcox <mawilcox@microsoft.com>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Eric Van Hensbergen <ericvh@gmail.com>,
	linux-nvdimm@lists.01.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	v9fs-developer@lists.sourceforge.net,
	Jens Axboe <axboe@kernel.dk>,
	linux-nfs@vger.kernel.org,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	samba-technical@lists.samba.org, Steve French <sfrench@samba.org>,
	Alexey Kuznetsov <kuznet@virtuozzo.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-fsdevel@vger.kernel.org, Ron Minnich <rminnich@sandia.gov>,
	Anna Schumaker <anna.schumaker@netapp.com>
Subject: [PATCH v2 2/2] dax: fix data corruption due to stale mmap reads
Date: Thu,  4 May 2017 13:59:10 -0600	[thread overview]
Message-ID: <20170504195910.11579-2-ross.zwisler@linux.intel.com> (raw)
In-Reply-To: <20170504195910.11579-1-ross.zwisler@linux.intel.com>

Users of DAX can suffer data corruption from stale mmap reads via the
following sequence:

- open an mmap over a 2MiB hole

- read from the 2MiB hole, faulting in a 2MiB zero page

- write to the hole with write(3p).  The write succeeds but we incorrectly
  leave the 2MiB zero page mapping intact.

- via the mmap, read the data that was just written.  Since the zero page
  mapping is still intact we read back zeroes instead of the new data.
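
To make this concrete, below is a minimal userspace sketch of the same
sequence.  It is only an illustration (the real reproducer is the fstest
linked below); the mount point and file name are hypothetical and assume a
DAX-mounted filesystem:

/* repro.c - sketch: stale mmap read after write(2) allocates into a hole */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HOLE_SIZE	(2UL << 20)	/* 2MiB */

int main(void)
{
	const char data[] = "new data";
	volatile char c;
	char *map;
	int fd;

	fd = open("/mnt/dax/testfile", O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return 1;

	/* Extend the file without writing data, creating a 2MiB hole. */
	if (ftruncate(fd, HOLE_SIZE) < 0)
		return 1;

	map = mmap(NULL, HOLE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;

	c = map[0];		/* fault in the zero page over the hole */
	(void)c;

	/* write(2) allocates blocks in the hole. */
	if (pwrite(fd, data, sizeof(data), 0) != (ssize_t)sizeof(data))
		return 1;

	/*
	 * On an affected kernel the stale zero page mapping is still
	 * intact, so this prints an empty string (zeroes) instead of
	 * "new data".
	 */
	printf("mmap sees: \"%s\"\n", map);

	munmap(map, HOLE_SIZE);
	close(fd);
	return 0;
}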

We fix this by unconditionally calling invalidate_inode_pages2_range() in
dax_iomap_actor() for new block allocations, and by enhancing
invalidate_inode_pages2_range() so that it properly unmaps the DAX entries
being removed from the radix tree.

This is based on an initial patch from Jan Kara.

I've written an fstest that triggers this error:
http://www.spinics.net/lists/linux-mm/msg126276.html

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Fixes: c6dcf52c23d2 ("mm: Invalidate DAX radix tree entries only if appropriate")
Reported-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>    [4.10+]
---

Changes since v1:
 - Instead of unmapping each DAX entry individually in
   __dax_invalidate_mapping_entry(), unmap the whole range at once inside
   invalidate_inode_pages2_range().  Each unmap requires an rmap walk, so
   this should be less expensive, and we no longer have to drop and
   re-acquire the mapping->tree_lock for each entry. (Jan)

These patches apply cleanly to v4.11 and have passed an xfstests run.
They also apply to v4.10.13 with a little help from git am's 3-way merge.

---
 fs/dax.c      |  8 ++++----
 mm/truncate.c | 10 ++++++++++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 166504c..1f2c880 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -999,11 +999,11 @@ dax_iomap_actor(struct inode *inode, loff_t pos, loff_t length, void *data,
 		return -EIO;
 
 	/*
-	 * Write can allocate block for an area which has a hole page mapped
-	 * into page tables. We have to tear down these mappings so that data
-	 * written by write(2) is visible in mmap.
+	 * Write can allocate block for an area which has a hole page or zero
+	 * PMD entry in the radix tree.  We have to tear down these mappings so
+	 * that data written by write(2) is visible in mmap.
 	 */
-	if ((iomap->flags & IOMAP_F_NEW) && inode->i_mapping->nrpages) {
+	if (iomap->flags & IOMAP_F_NEW) {
 		invalidate_inode_pages2_range(inode->i_mapping,
 					      pos >> PAGE_SHIFT,
 					      (end - 1) >> PAGE_SHIFT);
diff --git a/mm/truncate.c b/mm/truncate.c
index c537184..ad40316 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -683,6 +683,16 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 		cond_resched();
 		index++;
 	}
+
+	/*
+	 * Ensure that any DAX exceptional entries that have been invalidated
+	 * are also unmapped.
+	 */
+	if (dax_mapping(mapping)) {
+		unmap_mapping_range(mapping, (loff_t)start << PAGE_SHIFT,
+				(loff_t)(1 + end - start) << PAGE_SHIFT, 0);
+	}
+
 	cleancache_invalidate_inode(mapping);
 	return ret;
 }
-- 
2.9.3
