linux-xfs.vger.kernel.org archive mirror
* [PATCH 0/3] iomap/xfs: fix stale data exposure when truncating realtime inodes
@ 2024-05-15  2:28 Zhang Yi
  2024-05-15  2:28 ` [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page() Zhang Yi
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Zhang Yi @ 2024-05-15  2:28 UTC
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, linux-ext4, djwong, hch, brauner, david,
	chandanbabu, jack, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

This series fixes a stale data exposure issue reported by Chandan when
running fstests generic/561 on XFS with a realtime device [1]. The root
cause is that xfs_setattr_size() doesn't zero out a large enough range
when truncating a realtime inode; please see the third patch or [1] for
details.

The first two patches modify iomap_truncate_page() and
dax_truncate_page() to take a block size supplied by the filesystem,
dropping the assumption that it is always i_blocksize(), as Dave
suggested. The third patch fixes the issue by making xfs_truncate_page()
pass the correct block size and by making sure the zeroed range has been
flushed to disk before i_size is updated.

[1] https://lore.kernel.org/linux-xfs/87ttj8ircu.fsf@debian-BULLSEYE-live-builder-AMD64/

Thanks,
Yi.

Zhang Yi (3):
  iomap: pass blocksize to iomap_truncate_page()
  fsdax: pass blocksize to dax_truncate_page()
  xfs: correct the zeroing truncate range

 fs/dax.c               |  7 +++----
 fs/ext2/inode.c        |  4 ++--
 fs/iomap/buffered-io.c |  7 +++----
 fs/xfs/xfs_iomap.c     | 36 ++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_iops.c      | 10 ----------
 include/linux/dax.h    |  4 ++--
 include/linux/iomap.h  |  4 ++--
 7 files changed, 44 insertions(+), 28 deletions(-)

-- 
2.39.2



* [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page()
  2024-05-15  2:28 [PATCH 0/3] iomap/xfs: fix stale data exposure when truncating realtime inodes Zhang Yi
@ 2024-05-15  2:28 ` Zhang Yi
  2024-05-15 13:16   ` kernel test robot
  2024-05-15 13:16   ` kernel test robot
  2024-05-15  2:28 ` [PATCH 2/3] fsdax: pass blocksize to dax_truncate_page() Zhang Yi
  2024-05-15  2:28 ` [PATCH 3/3] xfs: correct the zeroing truncate range Zhang Yi
  2 siblings, 2 replies; 6+ messages in thread
From: Zhang Yi @ 2024-05-15  2:28 UTC
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, linux-ext4, djwong, hch, brauner, david,
	chandanbabu, jack, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

iomap_truncate_page() always assumes that the block size of the inode
being truncated is i_blocksize(). This is not true for all filesystems,
e.g. XFS does extent size alignment for realtime inodes. Drop this
assumption and pass the block size for zeroing into
iomap_truncate_page(), allowing filesystems to indicate the correct
block size.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
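A reader's note on the switch from a mask to a modulo in this patch:
`pos & (blocksize - 1)` only yields the in-block offset when blocksize is
a power of two, which a filesystem-supplied size need not be. The sketch
below is a userspace illustration, not kernel code. The helper names
off_mask() and off_mod() are hypothetical, and the 12288-byte unit assumes
a realtime extent of three 4096-byte blocks.

```c
#include <assert.h>
#include <stdint.h>

/* In-block offset via the mask trick; correct only for power-of-two sizes. */
static uint64_t off_mask(uint64_t pos, uint64_t blocksize)
{
	return pos & (blocksize - 1);
}

/* In-block offset via a remainder; correct for any non-zero size. */
static uint64_t off_mod(uint64_t pos, uint64_t blocksize)
{
	assert(blocksize != 0);
	return pos % blocksize;
}
```

With a 4096-byte block both helpers agree (position 5000 gives 904). With
the assumed 12288-byte unit, position 12288 sits exactly on a boundary, so
off_mod() returns 0, while off_mask() returns 8192 and would wrongly
request zeroing.
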
 fs/iomap/buffered-io.c | 7 +++----
 fs/xfs/xfs_iomap.c     | 3 ++-
 include/linux/iomap.h  | 4 ++--
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 0926d216a5af..4cfe0a4b3325 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1445,11 +1445,10 @@ iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 EXPORT_SYMBOL_GPL(iomap_zero_range);
 
 int
-iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops)
+iomap_truncate_page(struct inode *inode, loff_t pos, loff_t blocksize,
+		bool *did_zero, const struct iomap_ops *ops)
 {
-	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
+	unsigned int off = pos % blocksize;
 
 	/* Block boundary? Nothing to do */
 	if (!off)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 2857ef1b0272..31ac07bb8425 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1467,10 +1467,11 @@ xfs_truncate_page(
 	bool			*did_zero)
 {
 	struct inode		*inode = VFS_I(ip);
+	unsigned int		blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
 		return dax_truncate_page(inode, pos, did_zero,
 					&xfs_dax_write_iomap_ops);
-	return iomap_truncate_page(inode, pos, did_zero,
+	return iomap_truncate_page(inode, pos, blocksize, did_zero,
 				   &xfs_buffered_write_iomap_ops);
 }
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 6fc1c858013d..27d59e464502 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -273,8 +273,8 @@ int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len,
 		bool *did_zero, const struct iomap_ops *ops);
-int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops);
+int iomap_truncate_page(struct inode *inode, loff_t pos, loff_t blocksize,
+		bool *did_zero, const struct iomap_ops *ops);
 vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
 			const struct iomap_ops *ops);
 int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
-- 
2.39.2



* [PATCH 2/3] fsdax: pass blocksize to dax_truncate_page()
  2024-05-15  2:28 [PATCH 0/3] iomap/xfs: fix stale data exposure when truncating realtime inodes Zhang Yi
  2024-05-15  2:28 ` [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page() Zhang Yi
@ 2024-05-15  2:28 ` Zhang Yi
  2024-05-15  2:28 ` [PATCH 3/3] xfs: correct the zeroing truncate range Zhang Yi
  2 siblings, 0 replies; 6+ messages in thread
From: Zhang Yi @ 2024-05-15  2:28 UTC
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, linux-ext4, djwong, hch, brauner, david,
	chandanbabu, jack, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

dax_truncate_page() always assumes that the block size of the inode
being truncated is i_blocksize(). This is not true for all filesystems,
e.g. XFS does extent size alignment for realtime inodes. Drop this
assumption and pass the block size for zeroing into dax_truncate_page(),
allowing filesystems to indicate the correct block size.

Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/dax.c            | 7 +++----
 fs/ext2/inode.c     | 4 ++--
 fs/xfs/xfs_iomap.c  | 2 +-
 include/linux/dax.h | 4 ++--
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index 423fc1607dfa..ae3133cc816c 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1403,11 +1403,10 @@ int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 }
 EXPORT_SYMBOL_GPL(dax_zero_range);
 
-int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops)
+int dax_truncate_page(struct inode *inode, loff_t pos, loff_t blocksize,
+		bool *did_zero, const struct iomap_ops *ops)
 {
-	unsigned int blocksize = i_blocksize(inode);
-	unsigned int off = pos & (blocksize - 1);
+	unsigned int off = pos % blocksize;
 
 	/* Block boundary? Nothing to do */
 	if (!off)
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index f3d570a9302b..fbbd479f3c4e 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1278,8 +1278,8 @@ static int ext2_setsize(struct inode *inode, loff_t newsize)
 	inode_dio_wait(inode);
 
 	if (IS_DAX(inode))
-		error = dax_truncate_page(inode, newsize, NULL,
-					  &ext2_iomap_ops);
+		error = dax_truncate_page(inode, newsize, i_blocksize(inode),
+					  NULL, &ext2_iomap_ops);
 	else
 		error = block_truncate_page(inode->i_mapping,
 				newsize, ext2_get_block);
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 31ac07bb8425..4958cc3337bc 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1470,7 +1470,7 @@ xfs_truncate_page(
 	unsigned int		blocksize = i_blocksize(inode);
 
 	if (IS_DAX(inode))
-		return dax_truncate_page(inode, pos, did_zero,
+		return dax_truncate_page(inode, pos, blocksize, did_zero,
 					&xfs_dax_write_iomap_ops);
 	return iomap_truncate_page(inode, pos, blocksize, did_zero,
 				   &xfs_buffered_write_iomap_ops);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9d3e3327af4c..861fac466c76 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -210,8 +210,8 @@ int dax_file_unshare(struct inode *inode, loff_t pos, loff_t len,
 		const struct iomap_ops *ops);
 int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero,
 		const struct iomap_ops *ops);
-int dax_truncate_page(struct inode *inode, loff_t pos, bool *did_zero,
-		const struct iomap_ops *ops);
+int dax_truncate_page(struct inode *inode, loff_t pos, loff_t blocksize,
+		bool *did_zero, const struct iomap_ops *ops);
 
 #if IS_ENABLED(CONFIG_DAX)
 int dax_read_lock(void);
-- 
2.39.2



* [PATCH 3/3] xfs: correct the zeroing truncate range
  2024-05-15  2:28 [PATCH 0/3] iomap/xfs: fix stale data exposure when truncating realtime inodes Zhang Yi
  2024-05-15  2:28 ` [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page() Zhang Yi
  2024-05-15  2:28 ` [PATCH 2/3] fsdax: pass blocksize to dax_truncate_page() Zhang Yi
@ 2024-05-15  2:28 ` Zhang Yi
  2 siblings, 0 replies; 6+ messages in thread
From: Zhang Yi @ 2024-05-15  2:28 UTC
  To: linux-xfs, linux-fsdevel
  Cc: linux-kernel, linux-ext4, djwong, hch, brauner, david,
	chandanbabu, jack, yi.zhang, yi.zhang, chengzhihao1, yukuai3

From: Zhang Yi <yi.zhang@huawei.com>

When truncating a realtime file to a shorter, unaligned size,
xfs_setattr_size() only flushes the EOF page before zeroing, and
xfs_truncate_page() only zeroes the EOF block. This could expose
stale data since 943bc0882ceb ("iomap: don't increase i_size if it's not
a write operation").

Suppose sb_rextsize is bigger than one block and we have a realtime
inode that contains a long enough written extent. If we do an unaligned
truncate into the middle of this extent, xfs_itruncate_extents() could
split the extent and align its tail to sb_rextsize, so there may be
more than one block between the end of the file and the end of the
extent. Since xfs_truncate_page() only zeroes the trailing portion of
the i_blocksize()-sized block, it may leave some blocks containing stale
data, which could be exposed if a later append write extends the file
far enough past them.

xfs_truncate_page() should flush and zero out the entire rtextsize range,
and make sure the entire zeroed range has been flushed to disk before
updating the inode size.

Fixes: 943bc0882ceb ("iomap: don't increase i_size if it's not a write operation")
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Link: https://lore.kernel.org/linux-xfs/0b92a215-9d9b-3788-4504-a520778953c2@huaweicloud.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
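To make the range concrete, the following userspace sketch computes how
many post-EOF bytes each choice of block size covers. It is only an
illustration under assumed values (4096-byte blocks, a realtime extent of
four blocks, truncation to byte 5000). round_up_to() is a hypothetical
stand-in for the kernel's roundup_64(), and zero_len() is likewise a
made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Round pos up to the next multiple of unit (stand-in for roundup_64()). */
static uint64_t round_up_to(uint64_t pos, uint64_t unit)
{
	assert(unit != 0);
	uint64_t rem = pos % unit;

	return rem ? pos + (unit - rem) : pos;
}

/* Bytes from pos to the end of the unit containing it; 0 if pos is aligned. */
static uint64_t zero_len(uint64_t pos, uint64_t unit)
{
	return round_up_to(pos, unit) - pos;
}
```

For pos = 5000, zeroing up to the 4096-byte block boundary covers 3192
bytes (ending at 8192), while the assumed realtime extent ends at 16384.
The difference, 8192 bytes or two whole blocks, is what could keep stale
contents unless the full 11384-byte tail is zeroed and flushed.
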
 fs/xfs/xfs_iomap.c | 35 +++++++++++++++++++++++++++++++----
 fs/xfs/xfs_iops.c  | 10 ----------
 2 files changed, 31 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 4958cc3337bc..fc379450fe74 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1466,12 +1466,39 @@ xfs_truncate_page(
 	loff_t			pos,
 	bool			*did_zero)
 {
+	struct xfs_mount	*mp = ip->i_mount;
 	struct inode		*inode = VFS_I(ip);
 	unsigned int		blocksize = i_blocksize(inode);
+	int			error;
+
+	if (XFS_IS_REALTIME_INODE(ip))
+		blocksize = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
+
+	/*
+	 * iomap won't detect a dirty page over an unwritten block (or a
+	 * cow block over a hole) and subsequently skips zeroing the
+	 * newly post-EOF portion of the page. Flush the new EOF to
+	 * convert the block before the pagecache truncate.
+	 */
+	error = filemap_write_and_wait_range(inode->i_mapping, pos,
+					     roundup_64(pos, blocksize));
+	if (error)
+		return error;
 
 	if (IS_DAX(inode))
-		return dax_truncate_page(inode, pos, blocksize, did_zero,
-					&xfs_dax_write_iomap_ops);
-	return iomap_truncate_page(inode, pos, blocksize, did_zero,
-				   &xfs_buffered_write_iomap_ops);
+		error = dax_truncate_page(inode, pos, blocksize, did_zero,
+					  &xfs_dax_write_iomap_ops);
+	else
+		error = iomap_truncate_page(inode, pos, blocksize, did_zero,
+					    &xfs_buffered_write_iomap_ops);
+	if (error)
+		return error;
+
+	/*
+	 * Writeback won't write dirty blocks beyond the EOF folio,
+	 * so flush the entire zeroed range before updating the inode
+	 * size.
+	 */
+	return filemap_write_and_wait_range(inode->i_mapping, pos,
+					    roundup_64(pos, blocksize));
 }
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 66f8c47642e8..baeeddf4a6bb 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -845,16 +845,6 @@ xfs_setattr_size(
 		error = xfs_zero_range(ip, oldsize, newsize - oldsize,
 				&did_zeroing);
 	} else {
-		/*
-		 * iomap won't detect a dirty page over an unwritten block (or a
-		 * cow block over a hole) and subsequently skips zeroing the
-		 * newly post-EOF portion of the page. Flush the new EOF to
-		 * convert the block before the pagecache truncate.
-		 */
-		error = filemap_write_and_wait_range(inode->i_mapping, newsize,
-						     newsize);
-		if (error)
-			return error;
 		error = xfs_truncate_page(ip, newsize, &did_zeroing);
 	}
 
-- 
2.39.2



* Re: [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page()
  2024-05-15  2:28 ` [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page() Zhang Yi
  2024-05-15 13:16   ` kernel test robot
@ 2024-05-15 13:16   ` kernel test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2024-05-15 13:16 UTC
  To: Zhang Yi, linux-xfs, linux-fsdevel
  Cc: oe-kbuild-all, linux-kernel, linux-ext4, djwong, hch, brauner,
	david, chandanbabu, jack, yi.zhang, yi.zhang, chengzhihao1,
	yukuai3

Hi Zhang,

kernel test robot noticed the following build errors:

[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on linus/master v6.9 next-20240515]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Zhang-Yi/iomap-pass-blocksize-to-iomap_truncate_page/20240515-104121
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link:    https://lore.kernel.org/r/20240515022829.2455554-2-yi.zhang%40huaweicloud.com
patch subject: [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page()
config: powerpc-allnoconfig (https://download.01.org/0day-ci/archive/20240515/202405152010.jZ3OhPim-lkp@intel.com/config)
compiler: powerpc-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240515/202405152010.jZ3OhPim-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405152010.jZ3OhPim-lkp@intel.com/

All errors (new ones prefixed by >>):

   powerpc-linux-ld: fs/iomap/buffered-io.o: in function `iomap_truncate_page':
>> buffered-io.c:(.text+0x4398): undefined reference to `__moddi3'
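
For context on the failure above: on 32-bit targets GCC lowers a signed
64-bit `%` to a call to the libgcc routine __moddi3, which is not
available in this kernel build, so the `pos % blocksize` with loff_t
operands introduced by the patch fails to link. Kernel code normally
routes such divisions through the helpers in <linux/math64.h>, for
example div_u64_rem(). The sketch below is a userspace stand-in with a
similar shape. The names div_rem_u64_u32() and block_offset() are
hypothetical, the code assumes the block size fits in 32 bits, and it is
one possible direction rather than what a later revision of the series
necessarily does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divide a 64-bit value by a 32-bit divisor, returning the quotient and
 * storing the remainder. In userspace the plain operators are fine; the
 * point is the call shape, which keeps the divisor 32 bits wide.
 */
static uint64_t div_rem_u64_u32(uint64_t dividend, uint32_t divisor,
				uint32_t *remainder)
{
	assert(divisor != 0);
	*remainder = (uint32_t)(dividend % divisor);
	return dividend / divisor;
}

/* In-block offset of pos, the quantity the truncate helpers need. */
static uint32_t block_offset(uint64_t pos, uint32_t blocksize)
{
	uint32_t off;

	(void)div_rem_u64_u32(pos, blocksize, &off);
	return off;
}
```

Because the remainder is always smaller than the 32-bit divisor, it fits
in an unsigned 32-bit value, matching the `unsigned int off` used in the
patch.
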

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page()
  2024-05-15  2:28 ` [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page() Zhang Yi
@ 2024-05-15 13:16   ` kernel test robot
  2024-05-15 13:16   ` kernel test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2024-05-15 13:16 UTC
  To: Zhang Yi, linux-xfs, linux-fsdevel
  Cc: oe-kbuild-all, linux-kernel, linux-ext4, djwong, hch, brauner,
	david, chandanbabu, jack, yi.zhang, yi.zhang, chengzhihao1,
	yukuai3

Hi Zhang,

kernel test robot noticed the following build errors:

[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on linus/master v6.9 next-20240515]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Zhang-Yi/iomap-pass-blocksize-to-iomap_truncate_page/20240515-104121
base:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link:    https://lore.kernel.org/r/20240515022829.2455554-2-yi.zhang%40huaweicloud.com
patch subject: [PATCH 1/3] iomap: pass blocksize to iomap_truncate_page()
config: xtensa-allnoconfig (https://download.01.org/0day-ci/archive/20240515/202405152037.DjvUiyJ1-lkp@intel.com/config)
compiler: xtensa-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240515/202405152037.DjvUiyJ1-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405152037.DjvUiyJ1-lkp@intel.com/

All errors (new ones prefixed by >>):

   xtensa-linux-ld: fs/iomap/buffered-io.o: in function `iomap_file_unshare':
   buffered-io.c:(.text+0x1f48): undefined reference to `__moddi3'
>> xtensa-linux-ld: buffered-io.c:(.text+0x1f57): undefined reference to `__moddi3'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

