All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: Dave Kleikamp <shaggy@kernel.org>,
	jfs-discussion@lists.sourceforge.net, tytso@mit.edu,
	Jeff Mahoney <jeffm@suse.de>, Mark Fasheh <mfasheh@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	reiserfs-devel@vger.kernel.org, xfs@oss.sgi.com,
	cluster-devel@redhat.com, Joel Becker <jlbec@evilplan.org>,
	Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org,
	Steven Whitehouse <swhiteho@redhat.com>,
	ocfs2-devel@oss.oracle.com, viro@zeniv.linux.org.uk
Subject: [PATCH 1/2] vfs: Fix data corruption when blocksize < pagesize for mmaped data
Date: Fri, 10 Oct 2014 16:23:39 +0200	[thread overview]
Message-ID: <1412951028-4085-35-git-send-email-jack@suse.cz> (raw)
In-Reply-To: <1412951028-4085-1-git-send-email-jack@suse.cz>

->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.

However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';       ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  mremap(map, 1024, 10000, 0);
  map[4095] = 'a';    ----> no page_mkwrite() called

At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.

This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/buffer.c        |  3 +++
 include/linux/mm.h |  8 ++++++++
 mm/truncate.c      | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111bbb8b..3ba5a6a1bc5f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2080,6 +2080,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			struct page *page, void *fsdata)
 {
 	struct inode *inode = mapping->host;
+	loff_t old_size = inode->i_size;
 	int i_size_changed = 0;
 
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
@@ -2099,6 +2100,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 	unlock_page(page);
 	page_cache_release(page);
 
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
 	/*
 	 * Don't mark the inode dirty under page lock. First, it unnecessarily
 	 * makes the holding time of page lock longer. Second, it forces lock
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8981cc882ed2..f0e53e5a3b17 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1155,6 +1155,14 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,
 
 extern void truncate_pagecache(struct inode *inode, loff_t new);
 extern void truncate_setsize(struct inode *inode, loff_t newsize);
+#ifdef CONFIG_MMU
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
+#else
+static inline void pagecache_isize_extended(struct inode *inode, loff_t from,
+					    loff_t to)
+{
+}
+#endif
 void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
 int truncate_inode_page(struct address_space *mapping, struct page *page);
 int generic_error_remove_page(struct address_space *mapping, struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 96d167372d89..261eaf6e5a19 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -20,6 +20,7 @@
 #include <linux/buffer_head.h>	/* grr. try_to_release_page,
 				   do_invalidatepage */
 #include <linux/cleancache.h>
+#include <linux/rmap.h>
 #include "internal.h"
 
 static void clear_exceptional_entry(struct address_space *mapping,
@@ -719,12 +720,68 @@ EXPORT_SYMBOL(truncate_pagecache);
  */
 void truncate_setsize(struct inode *inode, loff_t newsize)
 {
+	loff_t oldsize = inode->i_size;
+
 	i_size_write(inode, newsize);
+	if (newsize > oldsize)
+		pagecache_isize_extended(inode, oldsize, newsize);
 	truncate_pagecache(inode, newsize);
 }
 EXPORT_SYMBOL(truncate_setsize);
 
 /**
+ * pagecache_isize_extended - update pagecache after extension of i_size
+ * @inode:	inode for which i_size was extended
+ * @from:	original inode size
+ * @to:		new inode size
+ *
+ * Handle extension of inode size either caused by extending truncate or by
+ * write starting after current i_size. We mark the page straddling current
+ * i_size RO so that page_mkwrite() is called on the nearest write access to
+ * the page.  This way filesystem can be sure that page_mkwrite() is called on
+ * the page before user writes to the page via mmap after the i_size has been
+ * changed.
+ *
+ * The function must be called after i_size is updated so that page fault
+ * coming after we unlock the page will already see the new i_size.
+ * The function must be called while we still hold i_mutex - this not only
+ * makes sure i_size is stable but also that userspace cannot observe new
+ * i_size value before we are prepared to store mmap writes at new inode size.
+ */
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
+{
+	int bsize = 1 << inode->i_blkbits;
+	loff_t rounded_from;
+	struct page *page;
+	pgoff_t index;
+
+	WARN_ON(!mutex_is_locked(&inode->i_mutex));
+	WARN_ON(to > inode->i_size);
+
+	if (from >= to || bsize == PAGE_CACHE_SIZE)
+		return;
+	/* Page straddling @from will not have any hole block created? */
+	rounded_from = round_up(from, bsize);
+	if (to <= rounded_from || !(rounded_from & (PAGE_CACHE_SIZE - 1)))
+		return;
+
+	index = from >> PAGE_CACHE_SHIFT;
+	page = find_lock_page(inode->i_mapping, index);
+	/* Page not cached? Nothing to do */
+	if (!page)
+		return;
+	/*
+	 * See clear_page_dirty_for_io() for details why set_page_dirty()
+	 * is needed.
+	 */
+	if (page_mkclean(page))
+		set_page_dirty(page);
+	unlock_page(page);
+	page_cache_release(page);
+}
+EXPORT_SYMBOL(pagecache_isize_extended);
+
+/**
  * truncate_pagecache_range - unmap and remove pagecache that is hole-punched
  * @inode: inode
  * @lstart: offset of beginning of hole
-- 
1.8.1.4


------------------------------------------------------------------------------
Meet PCI DSS 3.0 Compliance Requirements with EventLog Analyzer
Achieve PCI DSS 3.0 Compliant Status with Out-of-the-box PCI DSS Reports
Are you Audit-Ready for PCI DSS 3.0 Compliance? Download White paper
Comply to PCI DSS 3.0 Requirement 10 and 11.5 with EventLog Analyzer
http://pubads.g.doubleclick.net/gampad/clk?id=154622311&iu=/4140/ostg.clktrk

WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: Dave Kleikamp <shaggy@kernel.org>,
	jfs-discussion@lists.sourceforge.net, tytso@mit.edu,
	Jeff Mahoney <jeffm@suse.de>, Mark Fasheh <mfasheh@suse.com>,
	reiserfs-devel@vger.kernel.org, xfs@oss.sgi.com,
	cluster-devel@redhat.com, Joel Becker <jlbec@evilplan.org>,
	Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org,
	Steven Whitehouse <swhiteho@redhat.com>,
	ocfs2-devel@oss.oracle.com, viro@zeniv.linux.org.uk
Subject: [PATCH 1/2] vfs: Fix data corruption when blocksize < pagesize for mmaped data
Date: Fri, 10 Oct 2014 16:23:39 +0200	[thread overview]
Message-ID: <1412951028-4085-35-git-send-email-jack@suse.cz> (raw)
In-Reply-To: <1412951028-4085-1-git-send-email-jack@suse.cz>

->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.

However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';       ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  mremap(map, 1024, 10000, 0);
  map[4095] = 'a';    ----> no page_mkwrite() called

At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.

This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/buffer.c        |  3 +++
 include/linux/mm.h |  8 ++++++++
 mm/truncate.c      | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111bbb8b..3ba5a6a1bc5f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2080,6 +2080,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			struct page *page, void *fsdata)
 {
 	struct inode *inode = mapping->host;
+	loff_t old_size = inode->i_size;
 	int i_size_changed = 0;
 
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
@@ -2099,6 +2100,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 	unlock_page(page);
 	page_cache_release(page);
 
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
 	/*
 	 * Don't mark the inode dirty under page lock. First, it unnecessarily
 	 * makes the holding time of page lock longer. Second, it forces lock
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8981cc882ed2..f0e53e5a3b17 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1155,6 +1155,14 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,
 
 extern void truncate_pagecache(struct inode *inode, loff_t new);
 extern void truncate_setsize(struct inode *inode, loff_t newsize);
+#ifdef CONFIG_MMU
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
+#else
+static inline void pagecache_isize_extended(struct inode *inode, loff_t from,
+					    loff_t to)
+{
+}
+#endif
 void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
 int truncate_inode_page(struct address_space *mapping, struct page *page);
 int generic_error_remove_page(struct address_space *mapping, struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 96d167372d89..261eaf6e5a19 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -20,6 +20,7 @@
 #include <linux/buffer_head.h>	/* grr. try_to_release_page,
 				   do_invalidatepage */
 #include <linux/cleancache.h>
+#include <linux/rmap.h>
 #include "internal.h"
 
 static void clear_exceptional_entry(struct address_space *mapping,
@@ -719,12 +720,68 @@ EXPORT_SYMBOL(truncate_pagecache);
  */
 void truncate_setsize(struct inode *inode, loff_t newsize)
 {
+	loff_t oldsize = inode->i_size;
+
 	i_size_write(inode, newsize);
+	if (newsize > oldsize)
+		pagecache_isize_extended(inode, oldsize, newsize);
 	truncate_pagecache(inode, newsize);
 }
 EXPORT_SYMBOL(truncate_setsize);
 
 /**
+ * pagecache_isize_extended - update pagecache after extension of i_size
+ * @inode:	inode for which i_size was extended
+ * @from:	original inode size
+ * @to:		new inode size
+ *
+ * Handle extension of inode size either caused by extending truncate or by
+ * write starting after current i_size. We mark the page straddling current
+ * i_size RO so that page_mkwrite() is called on the nearest write access to
+ * the page.  This way filesystem can be sure that page_mkwrite() is called on
+ * the page before user writes to the page via mmap after the i_size has been
+ * changed.
+ *
+ * The function must be called after i_size is updated so that page fault
+ * coming after we unlock the page will already see the new i_size.
+ * The function must be called while we still hold i_mutex - this not only
+ * makes sure i_size is stable but also that userspace cannot observe new
+ * i_size value before we are prepared to store mmap writes at new inode size.
+ */
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
+{
+	int bsize = 1 << inode->i_blkbits;
+	loff_t rounded_from;
+	struct page *page;
+	pgoff_t index;
+
+	WARN_ON(!mutex_is_locked(&inode->i_mutex));
+	WARN_ON(to > inode->i_size);
+
+	if (from >= to || bsize == PAGE_CACHE_SIZE)
+		return;
+	/* Page straddling @from will not have any hole block created? */
+	rounded_from = round_up(from, bsize);
+	if (to <= rounded_from || !(rounded_from & (PAGE_CACHE_SIZE - 1)))
+		return;
+
+	index = from >> PAGE_CACHE_SHIFT;
+	page = find_lock_page(inode->i_mapping, index);
+	/* Page not cached? Nothing to do */
+	if (!page)
+		return;
+	/*
+	 * See clear_page_dirty_for_io() for details why set_page_dirty()
+	 * is needed.
+	 */
+	if (page_mkclean(page))
+		set_page_dirty(page);
+	unlock_page(page);
+	page_cache_release(page);
+}
+EXPORT_SYMBOL(pagecache_isize_extended);
+
+/**
  * truncate_pagecache_range - unmap and remove pagecache that is hole-punched
  * @inode: inode
  * @lstart: offset of beginning of hole
-- 
1.8.1.4

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: linux-fsdevel@vger.kernel.org
Cc: Dave Kleikamp <shaggy@kernel.org>,
	jfs-discussion@lists.sourceforge.net, tytso@mit.edu,
	Jeff Mahoney <jeffm@suse.de>, Mark Fasheh <mfasheh@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	reiserfs-devel@vger.kernel.org, xfs@oss.sgi.com,
	cluster-devel@redhat.com, Joel Becker <jlbec@evilplan.org>,
	Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org,
	Steven Whitehouse <swhiteho@redhat.com>,
	ocfs2-devel@oss.oracle.com, viro@zeniv.linux.org.uk
Subject: [Ocfs2-devel] [PATCH 1/2] vfs: Fix data corruption when blocksize < pagesize for mmaped data
Date: Fri, 10 Oct 2014 16:23:39 +0200	[thread overview]
Message-ID: <1412951028-4085-35-git-send-email-jack@suse.cz> (raw)
In-Reply-To: <1412951028-4085-1-git-send-email-jack@suse.cz>

->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.

However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';       ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  mremap(map, 1024, 10000, 0);
  map[4095] = 'a';    ----> no page_mkwrite() called

At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.

This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/buffer.c        |  3 +++
 include/linux/mm.h |  8 ++++++++
 mm/truncate.c      | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111bbb8b..3ba5a6a1bc5f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2080,6 +2080,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			struct page *page, void *fsdata)
 {
 	struct inode *inode = mapping->host;
+	loff_t old_size = inode->i_size;
 	int i_size_changed = 0;
 
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
@@ -2099,6 +2100,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 	unlock_page(page);
 	page_cache_release(page);
 
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
 	/*
 	 * Don't mark the inode dirty under page lock. First, it unnecessarily
 	 * makes the holding time of page lock longer. Second, it forces lock
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8981cc882ed2..f0e53e5a3b17 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1155,6 +1155,14 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,
 
 extern void truncate_pagecache(struct inode *inode, loff_t new);
 extern void truncate_setsize(struct inode *inode, loff_t newsize);
+#ifdef CONFIG_MMU
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
+#else
+static inline void pagecache_isize_extended(struct inode *inode, loff_t from,
+					    loff_t to)
+{
+}
+#endif
 void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
 int truncate_inode_page(struct address_space *mapping, struct page *page);
 int generic_error_remove_page(struct address_space *mapping, struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 96d167372d89..261eaf6e5a19 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -20,6 +20,7 @@
 #include <linux/buffer_head.h>	/* grr. try_to_release_page,
 				   do_invalidatepage */
 #include <linux/cleancache.h>
+#include <linux/rmap.h>
 #include "internal.h"
 
 static void clear_exceptional_entry(struct address_space *mapping,
@@ -719,12 +720,68 @@ EXPORT_SYMBOL(truncate_pagecache);
  */
 void truncate_setsize(struct inode *inode, loff_t newsize)
 {
+	loff_t oldsize = inode->i_size;
+
 	i_size_write(inode, newsize);
+	if (newsize > oldsize)
+		pagecache_isize_extended(inode, oldsize, newsize);
 	truncate_pagecache(inode, newsize);
 }
 EXPORT_SYMBOL(truncate_setsize);
 
 /**
+ * pagecache_isize_extended - update pagecache after extension of i_size
+ * @inode:	inode for which i_size was extended
+ * @from:	original inode size
+ * @to:		new inode size
+ *
+ * Handle extension of inode size either caused by extending truncate or by
+ * write starting after current i_size. We mark the page straddling current
+ * i_size RO so that page_mkwrite() is called on the nearest write access to
+ * the page.  This way filesystem can be sure that page_mkwrite() is called on
+ * the page before user writes to the page via mmap after the i_size has been
+ * changed.
+ *
+ * The function must be called after i_size is updated so that page fault
+ * coming after we unlock the page will already see the new i_size.
+ * The function must be called while we still hold i_mutex - this not only
+ * makes sure i_size is stable but also that userspace cannot observe new
+ * i_size value before we are prepared to store mmap writes at new inode size.
+ */
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
+{
+	int bsize = 1 << inode->i_blkbits;
+	loff_t rounded_from;
+	struct page *page;
+	pgoff_t index;
+
+	WARN_ON(!mutex_is_locked(&inode->i_mutex));
+	WARN_ON(to > inode->i_size);
+
+	if (from >= to || bsize == PAGE_CACHE_SIZE)
+		return;
+	/* Page straddling @from will not have any hole block created? */
+	rounded_from = round_up(from, bsize);
+	if (to <= rounded_from || !(rounded_from & (PAGE_CACHE_SIZE - 1)))
+		return;
+
+	index = from >> PAGE_CACHE_SHIFT;
+	page = find_lock_page(inode->i_mapping, index);
+	/* Page not cached? Nothing to do */
+	if (!page)
+		return;
+	/*
+	 * See clear_page_dirty_for_io() for details why set_page_dirty()
+	 * is needed.
+	 */
+	if (page_mkclean(page))
+		set_page_dirty(page);
+	unlock_page(page);
+	page_cache_release(page);
+}
+EXPORT_SYMBOL(pagecache_isize_extended);
+
+/**
  * truncate_pagecache_range - unmap and remove pagecache that is hole-punched
  * @inode: inode
  * @lstart: offset of beginning of hole
-- 
1.8.1.4

WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 1/2] vfs: Fix data corruption when blocksize < pagesize for mmaped data
Date: Fri, 10 Oct 2014 16:23:39 +0200	[thread overview]
Message-ID: <1412951028-4085-35-git-send-email-jack@suse.cz> (raw)
In-Reply-To: <1412951028-4085-1-git-send-email-jack@suse.cz>

->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.

However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
  ftruncate(fd, 0);
  pwrite(fd, buf, 1024, 0);
  map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
  map[0] = 'a';       ----> page_mkwrite() for index 0 is called
  ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
  mremap(map, 1024, 10000, 0);
  map[4095] = 'a';    ----> no page_mkwrite() called

At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.

This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/buffer.c        |  3 +++
 include/linux/mm.h |  8 ++++++++
 mm/truncate.c      | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 68 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8f05111bbb8b..3ba5a6a1bc5f 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2080,6 +2080,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 			struct page *page, void *fsdata)
 {
 	struct inode *inode = mapping->host;
+	loff_t old_size = inode->i_size;
 	int i_size_changed = 0;
 
 	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
@@ -2099,6 +2100,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
 	unlock_page(page);
 	page_cache_release(page);
 
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
 	/*
 	 * Don't mark the inode dirty under page lock. First, it unnecessarily
 	 * makes the holding time of page lock longer. Second, it forces lock
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8981cc882ed2..f0e53e5a3b17 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1155,6 +1155,14 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,
 
 extern void truncate_pagecache(struct inode *inode, loff_t new);
 extern void truncate_setsize(struct inode *inode, loff_t newsize);
+#ifdef CONFIG_MMU
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
+#else
+static inline void pagecache_isize_extended(struct inode *inode, loff_t from,
+					    loff_t to)
+{
+}
+#endif
 void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
 int truncate_inode_page(struct address_space *mapping, struct page *page);
 int generic_error_remove_page(struct address_space *mapping, struct page *page);
diff --git a/mm/truncate.c b/mm/truncate.c
index 96d167372d89..261eaf6e5a19 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -20,6 +20,7 @@
 #include <linux/buffer_head.h>	/* grr. try_to_release_page,
 				   do_invalidatepage */
 #include <linux/cleancache.h>
+#include <linux/rmap.h>
 #include "internal.h"
 
 static void clear_exceptional_entry(struct address_space *mapping,
@@ -719,12 +720,68 @@ EXPORT_SYMBOL(truncate_pagecache);
  */
 void truncate_setsize(struct inode *inode, loff_t newsize)
 {
+	loff_t oldsize = inode->i_size;
+
 	i_size_write(inode, newsize);
+	if (newsize > oldsize)
+		pagecache_isize_extended(inode, oldsize, newsize);
 	truncate_pagecache(inode, newsize);
 }
 EXPORT_SYMBOL(truncate_setsize);
 
 /**
+ * pagecache_isize_extended - update pagecache after extension of i_size
+ * @inode:	inode for which i_size was extended
+ * @from:	original inode size
+ * @to:		new inode size
+ *
+ * Handle extension of inode size either caused by extending truncate or by
+ * write starting after current i_size. We mark the page straddling current
+ * i_size RO so that page_mkwrite() is called on the nearest write access to
+ * the page.  This way filesystem can be sure that page_mkwrite() is called on
+ * the page before user writes to the page via mmap after the i_size has been
+ * changed.
+ *
+ * The function must be called after i_size is updated so that page fault
+ * coming after we unlock the page will already see the new i_size.
+ * The function must be called while we still hold i_mutex - this not only
+ * makes sure i_size is stable but also that userspace cannot observe new
+ * i_size value before we are prepared to store mmap writes at new inode size.
+ */
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
+{
+	int bsize = 1 << inode->i_blkbits;
+	loff_t rounded_from;
+	struct page *page;
+	pgoff_t index;
+
+	WARN_ON(!mutex_is_locked(&inode->i_mutex));
+	WARN_ON(to > inode->i_size);
+
+	if (from >= to || bsize == PAGE_CACHE_SIZE)
+		return;
+	/* Page straddling @from will not have any hole block created? */
+	rounded_from = round_up(from, bsize);
+	if (to <= rounded_from || !(rounded_from & (PAGE_CACHE_SIZE - 1)))
+		return;
+
+	index = from >> PAGE_CACHE_SHIFT;
+	page = find_lock_page(inode->i_mapping, index);
+	/* Page not cached? Nothing to do */
+	if (!page)
+		return;
+	/*
+	 * See clear_page_dirty_for_io() for details why set_page_dirty()
+	 * is needed.
+	 */
+	if (page_mkclean(page))
+		set_page_dirty(page);
+	unlock_page(page);
+	page_cache_release(page);
+}
+EXPORT_SYMBOL(pagecache_isize_extended);
+
+/**
  * truncate_pagecache_range - unmap and remove pagecache that is hole-punched
  * @inode: inode
  * @lstart: offset of beginning of hole
-- 
1.8.1.4



  parent reply	other threads:[~2014-10-10 14:23 UTC|newest]

Thread overview: 189+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-10 14:23 [PATCH 0/2 v2] Fix data corruption when blocksize < pagesize for mmapped data Jan Kara
2014-10-10 14:23 ` [Cluster-devel] " Jan Kara
2014-10-10 14:23 ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23 ` Jan Kara
2014-10-10 14:23 ` [PATCH 1/2 RESEND] bdi: Fix hung task on sync Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] block: free q->flush_rq in blk_init_allocated_queue error paths Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 15:19   ` Dave Jones
2014-10-10 15:19     ` [Cluster-devel] " Dave Jones
2014-10-10 15:32     ` Jan Kara
2014-10-10 15:32       ` [Cluster-devel] " Jan Kara
2014-10-10 15:32       ` [Ocfs2-devel] " Jan Kara
2014-10-10 15:32       ` Jan Kara
2014-10-10 14:23 ` [PATCH] block: improve rq_affinity placement Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] block: Make rq_affinity = 1 work as expected Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] block: strict rq_affinity Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] ext3: Fix deadlock in data=journal mode when fs is frozen Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] ext3: Speedup WB_SYNC_ALL pass Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Avoid lock inversion between i_mmap_mutex and transaction start Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 1/2] ext4: Don't check quota format when there are no quota files Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 1/2] ext4: Fix block zeroing when punching holes in indirect block files Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Fix buffer double free in ext4_alloc_branch() Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Fix jbd2 warning under heavy xattr load Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Fix zeroing of page during writeback Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Remove orphan list handling Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] ext4: Speedup WB_SYNC_ALL pass Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23 ` [PATCH for 3.14-stable] fanotify: fix double free of pending permission events Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] fs: Avoid userspace mounting anon_inodefs filesystem Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 1/2] jbd2: Avoid pointless scanning of checkpoint lists Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] jbd2: Optimize jbd2_log_do_checkpoint() a bit Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] lockdep: Dump info via tracing Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] mm: Fixup pagecache_isize_extended() definitions for !CONFIG_MMU Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] ncpfs: fix rmdir returns Device or resource busy Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] ocfs2: Fix quota file corruption Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 1/2] printk: Debug patch1 Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] printk: debug: Slow down printing to 9600 bauds Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] printk: enable interrupts before calling console_trylock_for_printk() Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] quota: Fix race between dqput() and dquot_scan_active() Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] scsi: Keep interrupts disabled while submitting requests Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] sync: don't block the flusher thread waiting on IO Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] udf: Avoid infinite loop when processing indirect ICBs Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] udf: Print error when inode is loaded Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] vfs: Allocate anon_inode_inode in anon_inode_init() Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` Jan Kara [this message]
2014-10-10 14:23   ` [Cluster-devel] [PATCH 1/2] vfs: Fix data corruption when blocksize < pagesize for mmaped data Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH RESEND] vfs: Return EINVAL for default SEEK_HOLE, SEEK_DATA implementation Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] writeback: plug writeback at a high level Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH] x86: Fixup lockdep complaint caused by io apic code Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 2/2 RESEND] bdi: Avoid oops on device removal Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] ext3: Don't check quota format when there are no quota files Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] ext4: Fix hole punching for files with indirect blocks Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] ext4: Fix mmap data corruption when blocksize < pagesize Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] jbd2: Simplify calling convention around __jbd2_journal_clean_checkpoint_list Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
2014-10-10 14:23 ` [PATCH 2/2] printk: Debug patch 2 Jan Kara
2014-10-10 14:23   ` [Cluster-devel] " Jan Kara
2014-10-10 14:23   ` [Ocfs2-devel] " Jan Kara
2014-10-10 14:23   ` Jan Kara
  -- strict thread matches above, loose matches on Subject: below --
2014-09-25 12:41 [PATCH 0/2 v2] Fix data corruption when blocksize < pagesize for mmapped data Jan Kara
2014-09-25 12:41 ` [PATCH 1/2] vfs: Fix data corruption when blocksize < pagesize for mmaped data Jan Kara
2014-09-25 12:41   ` Jan Kara
2014-10-02  2:06   ` Theodore Ts'o
2014-10-02  2:06     ` Theodore Ts'o
2014-09-23 15:03 [PATCH 0/2] Fix data corruption when blocksize < pagesize Jan Kara
2014-09-23 15:03 ` [PATCH 1/2] vfs: Fix data corruption when blocksize < pagesize for mmaped data Jan Kara
2014-09-23 15:03   ` Jan Kara
2014-09-25  1:32   ` Dave Chinner
2014-09-25  9:34     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1412951028-4085-35-git-send-email-jack@suse.cz \
    --to=jack@suse.cz \
    --cc=cluster-devel@redhat.com \
    --cc=david@fromorbit.com \
    --cc=jeffm@suse.de \
    --cc=jfs-discussion@lists.sourceforge.net \
    --cc=jlbec@evilplan.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=mfasheh@suse.com \
    --cc=ocfs2-devel@oss.oracle.com \
    --cc=reiserfs-devel@vger.kernel.org \
    --cc=shaggy@kernel.org \
    --cc=swhiteho@redhat.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.