linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv5 0/6] ext4: bmap & fiemap conversion to use iomap
@ 2020-02-28  9:26 Ritesh Harjani
  2020-02-28  9:26 ` [PATCHv5 1/6] ext4: Add IOMAP_F_MERGED for non-extent based mapping Ritesh Harjani
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Ritesh Harjani @ 2020-02-28  9:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: jack, tytso, adilger.kernel, linux-fsdevel, darrick.wong, hch,
	cmaiolino, david, Ritesh Harjani

Hello All, 

PATCHv4 -> PATCHv5
Fixed static symbol warning. Added Reported-by & Reviewed-by tags.

Background
==========
These are v4 patches to move ext4 bmap & fiemap calls to use iomap APIs.
Previous version can be found in links mentioned below.
After some discussions with the community in [RFCv2] all of the below
observational differences between the old and the new (iomap based
implementation) has been agreed upon. It looks like we should be good to move
ext4_fiemap & ext4_bmap too to iomap interface.

FYI - this patch reduces the users of ext4_get_block API and thus a step
towards getting rid of buffer_heads from ext4.
Also reduces a lot of code by making use of existing iomap_ops.

PATCHv3 -> PATCHv4
==================
1. Patch-1: Fixed indentation and checking of flags EXT4_MAP_UNWRITTEN.
2. Patch-4: Moved the checking of offset beyond what indirect mapped file inode
   can support, to early in ext4_iomap_begin_report().
3. Patch-5: Fixed no in-inode & no external block case in case of xattr.
   Returning -ENOENT in that case.
4. Patch-6: Added more info in documentation about when FIEMAP_EXTENT_LAST could
   be set.


RFCv2 -> PATCHv3
================
1. Fixed IOMAP_INLINE & IOMAP_MAPPED flag setting in *xattr_fiemap() based
   on, whether it is inline v/s external block.
2. Fixed fiemap for non-extent based mapping. [PATCHv3 4/6] fixes it.
3. Updated the documentation for description about FIEMAP_EXTENT_LAST flag.
   [PATCHv3 6/6].


Testing (done on ext4 master branch)
========
'xfstests -g quick' passes with default mkfs/mount configuration
(v/s which also pass with vanilla kernel without this patch). Except
generic/473 which also failes on XFS. This seems to be the test case issue
since it expects the data in slightly different way as compared to what iomap
returns.
Point 2.a below describes more about this.

Observations/Review required
============================
1. bmap related old v/s new method differences:-
	a. In case if addr > INT_MAX, it issues a warning and
	   returns 0 as the block no. While earlier it used to return the
	   truncated value with no warning.
	[Again this should be fine, it was just an observation worth mentioning]

	b. block no. is only returned in case of iomap->type is IOMAP_MAPPED,
	   but not when iomap->type is IOMAP_UNWRITTEN. While with previously
	   we used to get block no. for both of above cases.
	[Darrick:- not much reason to map unwritten blocks. So this may not
	 be relevant here [5]]

2. Fiemap related old v/s new method differences:-
	a. iomap_fiemap returns the disk extent information in exact
	   correspondence with start of user requested logical offset till the
	   length requested by user. While in previous implementation the
	   returned information used to give the complete extent information if
	   the range requested by user lies in between the extent mapping.
	[Both behaviors should be fine here as per documentation - [5]]

	b. iomap_fiemap adds the FIEMAP_EXTENT_LAST flag also at the last
	   fiemap_extent mapping range requested by the user via fm_length (
	   if that has a valid mapped extent on the disk). But if the user
	   requested for more fm_length which could not be mapped in the last
	   fiemap_extent, then the flag is not set.
	[This does not seems to be an issue after some community discussion.
	Since this flag is not consistent across different filesystems.
	In ext4 itself for extent v/s non-extent based mapping, this flag is
	set differently. So we rather decided to update the documentation rather
	than complicating it more, which anyway no one seems to cares about -
	[6,7]]
	   

Below is CTRL-C -> CTRL-V from from previous versions
=====================================================

e.g. output for above differences 2.a & 2.b
===========================================
create a file with below cmds. 
1. fallocate -o 0 -l 8K testfile.txt
2. xfs_io -c "pwrite 8K 8K" testfile.txt
3. check extent mapping:- xfs_io -c "fiemap -v" testfile.txt
4. run this binary on with and without these patches:- ./a.out (test_fiemap_diff.c) [4]

o/p of xfs_io -c "fiemap -v"
============================================
With this patch on patched kernel:-
testfile.txt:
 EXT: FILE-OFFSET      BLOCK-RANGE          TOTAL FLAGS
   0: [0..15]:         122802736..122802751    16 0x800
   1: [16..31]:        122687536..122687551    16   0x1

without patch on vanilla kernel (no difference):-
testfile.txt:
 EXT: FILE-OFFSET      BLOCK-RANGE          TOTAL FLAGS
   0: [0..15]:         332211376..332211391    16 0x800
   1: [16..31]:        332722392..332722407    16   0x1


o/p of a.out without patch:-
================
riteshh-> ./a.out 
logical: [       0..      15] phys: 332211376..332211391 flags: 0x800 tot: 16
(0) extent flag = 2048

o/p of a.out with patch (both point 2.a & 2.b could be seen)
=======================
riteshh-> ./a.out
logical: [       0..       7] phys: 122802736..122802743 flags: 0x801 tot: 8
(0) extent flag = 2049

FYI - In test_fiemap_diff.c test we had 
a. fm_extent_count = 1
b. fm_start = 0
c. fm_length = 4K
Whereas when we change fm_extent_count = 32, then we don't see any difference.

e.g. output for above difference listed in point 1.b
====================================================

o/p without patch (block no returned for unwritten block as well)
=========Testing IOCTL FIBMAP=========
File size = 16384, blkcnt = 4, blocksize = 4096
  0   41526422
  1   41526423
  2   41590299
  3   41590300

o/p with patch (0 returned for unwritten block)
=========Testing IOCTL FIBMAP=========
File size = 16384, blkcnt = 4, blocksize = 4096
  0          0          0
  1          0          0
  2   15335942      29953
  3   15335943      29953

Summary:-
========
Due to some of the observational differences to user, listed above,
requesting to please help with a careful review in moving this to iomap.
Digging into some older threads, it looks like these differences should be fine,
since the same tools have been working fine with XFS (which uses iomap based
implementation) [1]
Also as Ted suggested in [3]: Fiemap & bmap spec could be made based on the ext4
implementation. But since all the tools also work with xfs which uses iomap
based fiemap, so we should be good there.


References of some previous discussions:
=======================================
[RFCv1]: https://www.spinics.net/lists/linux-ext4/msg67077.html
[RFCv2]: https://marc.info/?l=linux-ext4&m=158020672413871&w=2 
[PATCHv3]: https://www.spinics.net/lists/linux-ext4/msg70403.html
[PATCHv4]: https://patchwork.ozlabs.org/project/linux-ext4/list/?series=161168
[1]: https://www.spinics.net/lists/linux-fsdevel/msg128370.html
[2]: https://www.spinics.net/lists/linux-fsdevel/msg127675.html
[3]: https://www.spinics.net/lists/linux-fsdevel/msg128368.html
[4]: https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/tools/test_fiemap_diff.c
[5]: https://marc.info/?l=linux-fsdevel&m=158040005907862&w=2
[6]: https://marc.info/?l=linux-fsdevel&m=158221859807604&w=2
[7]: https://marc.info/?l=linux-ext4&m=158228563431539&w=2

Ritesh Harjani (6):
  ext4: Add IOMAP_F_MERGED for non-extent based mapping
  ext4: Optimize ext4_ext_precache for 0 depth
  ext4: Move ext4 bmap to use iomap infrastructure.
  ext4: Make ext4_ind_map_blocks work with fiemap
  ext4: Move ext4_fiemap to use iomap framework.
  Documentation: Correct the description of FIEMAP_EXTENT_LAST

 Documentation/filesystems/fiemap.txt |  10 +-
 fs/ext4/extents.c                    | 299 +++++----------------------
 fs/ext4/inline.c                     |  41 ----
 fs/ext4/inode.c                      |  22 +-
 4 files changed, 80 insertions(+), 292 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCHv5 1/6] ext4: Add IOMAP_F_MERGED for non-extent based mapping
  2020-02-28  9:26 [PATCHv5 0/6] ext4: bmap & fiemap conversion to use iomap Ritesh Harjani
@ 2020-02-28  9:26 ` Ritesh Harjani
  2020-02-28 15:26   ` Darrick J. Wong
  2020-03-13 19:52   ` Theodore Y. Ts'o
  2020-02-28  9:26 ` [PATCHv5 2/6] ext4: Optimize ext4_ext_precache for 0 depth Ritesh Harjani
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Ritesh Harjani @ 2020-02-28  9:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: jack, tytso, adilger.kernel, linux-fsdevel, darrick.wong, hch,
	cmaiolino, david, Ritesh Harjani

IOMAP_F_MERGED needs to be set in case of non-extent based mapping.
This is needed in later patches for conversion of ext4_fiemap to use iomap.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d035acab5b2a..6cf3b969dc86 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3335,6 +3335,10 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
 	iomap->offset = (u64) map->m_lblk << blkbits;
 	iomap->length = (u64) map->m_len << blkbits;
 
+	if ((map->m_flags & EXT4_MAP_MAPPED) &&
+	    !ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+		iomap->flags |= IOMAP_F_MERGED;
+
 	/*
 	 * Flags passed to ext4_map_blocks() for direct I/O writes can result
 	 * in m_flags having both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN bits
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCHv5 2/6] ext4: Optimize ext4_ext_precache for 0 depth
  2020-02-28  9:26 [PATCHv5 0/6] ext4: bmap & fiemap conversion to use iomap Ritesh Harjani
  2020-02-28  9:26 ` [PATCHv5 1/6] ext4: Add IOMAP_F_MERGED for non-extent based mapping Ritesh Harjani
@ 2020-02-28  9:26 ` Ritesh Harjani
  2020-03-13 20:05   ` Theodore Y. Ts'o
  2020-02-28  9:26 ` [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure Ritesh Harjani
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Ritesh Harjani @ 2020-02-28  9:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: jack, tytso, adilger.kernel, linux-fsdevel, darrick.wong, hch,
	cmaiolino, david, Ritesh Harjani

This patch avoids the memory alloc & free path when depth is 0,
since anyway there is no extra caching done in that case.
So on checking depth 0, simply return early.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/extents.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index ee83fe7c98aa..0de548bb3c90 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -594,6 +594,12 @@ int ext4_ext_precache(struct inode *inode)
 	down_read(&ei->i_data_sem);
 	depth = ext_depth(inode);
 
+	/* Don't cache anything if there are no external extent blocks */
+	if (!depth) {
+		up_read(&ei->i_data_sem);
+		return ret;
+	}
+
 	path = kcalloc(depth + 1, sizeof(struct ext4_ext_path),
 		       GFP_NOFS);
 	if (path == NULL) {
@@ -601,9 +607,6 @@ int ext4_ext_precache(struct inode *inode)
 		return -ENOMEM;
 	}
 
-	/* Don't cache anything if there are no external extent blocks */
-	if (depth == 0)
-		goto out;
 	path[0].p_hdr = ext_inode_hdr(inode);
 	ret = ext4_ext_check(inode, path[0].p_hdr, depth, 0);
 	if (ret)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-02-28  9:26 [PATCHv5 0/6] ext4: bmap & fiemap conversion to use iomap Ritesh Harjani
  2020-02-28  9:26 ` [PATCHv5 1/6] ext4: Add IOMAP_F_MERGED for non-extent based mapping Ritesh Harjani
  2020-02-28  9:26 ` [PATCHv5 2/6] ext4: Optimize ext4_ext_precache for 0 depth Ritesh Harjani
@ 2020-02-28  9:26 ` Ritesh Harjani
  2020-02-28 15:25   ` Darrick J. Wong
  2020-03-13 20:16   ` Theodore Y. Ts'o
  2020-02-28  9:26 ` [PATCHv5 4/6] ext4: Make ext4_ind_map_blocks work with fiemap Ritesh Harjani
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Ritesh Harjani @ 2020-02-28  9:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: jack, tytso, adilger.kernel, linux-fsdevel, darrick.wong, hch,
	cmaiolino, david, Ritesh Harjani

ext4_iomap_begin is already implemented which provides ext4_map_blocks,
so just move the API from generic_block_bmap to iomap_bmap for iomap
conversion.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6cf3b969dc86..81fccbae0aea 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3214,7 +3214,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
 			return 0;
 	}
 
-	return generic_block_bmap(mapping, block, ext4_get_block);
+	return iomap_bmap(mapping, block, &ext4_iomap_ops);
 }
 
 static int ext4_readpage(struct file *file, struct page *page)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCHv5 4/6] ext4: Make ext4_ind_map_blocks work with fiemap
  2020-02-28  9:26 [PATCHv5 0/6] ext4: bmap & fiemap conversion to use iomap Ritesh Harjani
                   ` (2 preceding siblings ...)
  2020-02-28  9:26 ` [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure Ritesh Harjani
@ 2020-02-28  9:26 ` Ritesh Harjani
  2020-03-13 20:18   ` Theodore Y. Ts'o
  2020-02-28  9:26 ` [PATCHv5 5/6] ext4: Move ext4_fiemap to use iomap framework Ritesh Harjani
  2020-02-28  9:26 ` [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST Ritesh Harjani
  5 siblings, 1 reply; 27+ messages in thread
From: Ritesh Harjani @ 2020-02-28  9:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: jack, tytso, adilger.kernel, linux-fsdevel, darrick.wong, hch,
	cmaiolino, david, Ritesh Harjani

For indirect block mapping if the i_block > max supported block in inode
then ext4_ind_map_blocks() returns a -EIO error. But in case of fiemap
this could be a valid query to ->iomap_begin call.
So check if the offset >= s_bitmap_maxbytes in ext4_iomap_begin_report(),
then simply skip calling ext4_map_blocks().

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 81fccbae0aea..4364864fc709 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3548,12 +3548,28 @@ static int ext4_iomap_begin_report(struct inode *inode, loff_t offset,
 	map.m_len = min_t(loff_t, (offset + length - 1) >> blkbits,
 			  EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1;
 
+	/*
+	 * Fiemap callers may call for offset beyond s_bitmap_maxbytes.
+	 * So handle it here itself instead of querying ext4_map_blocks().
+	 * Since ext4_map_blocks() will warn about it and will return
+	 * -EIO error.
+	 */
+	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) {
+		struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+
+		if (offset >= sbi->s_bitmap_maxbytes) {
+			map.m_flags = 0;
+			goto set_iomap;
+		}
+	}
+
 	ret = ext4_map_blocks(NULL, inode, &map, 0);
 	if (ret < 0)
 		return ret;
 	if (ret == 0)
 		delalloc = ext4_iomap_is_delalloc(inode, &map);
 
+set_iomap:
 	ext4_set_iomap(inode, iomap, &map, offset, length);
 	if (delalloc && iomap->type == IOMAP_HOLE)
 		iomap->type = IOMAP_DELALLOC;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCHv5 5/6] ext4: Move ext4_fiemap to use iomap framework.
  2020-02-28  9:26 [PATCHv5 0/6] ext4: bmap & fiemap conversion to use iomap Ritesh Harjani
                   ` (3 preceding siblings ...)
  2020-02-28  9:26 ` [PATCHv5 4/6] ext4: Make ext4_ind_map_blocks work with fiemap Ritesh Harjani
@ 2020-02-28  9:26 ` Ritesh Harjani
  2020-02-28 15:21   ` Darrick J. Wong
  2020-03-14  3:03   ` Theodore Y. Ts'o
  2020-02-28  9:26 ` [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST Ritesh Harjani
  5 siblings, 2 replies; 27+ messages in thread
From: Ritesh Harjani @ 2020-02-28  9:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: jack, tytso, adilger.kernel, linux-fsdevel, darrick.wong, hch,
	cmaiolino, david, Ritesh Harjani, kbuild test robot

This patch moves ext4_fiemap to use iomap framework.
For xattr a new 'ext4_iomap_xattr_ops' is added.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
Reported-by: kbuild test robot <lkp@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/extents.c | 290 ++++++++--------------------------------------
 fs/ext4/inline.c  |  41 -------
 2 files changed, 48 insertions(+), 283 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 0de548bb3c90..0e3dfea0c065 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -28,6 +28,7 @@
 #include <linux/uaccess.h>
 #include <linux/fiemap.h>
 #include <linux/backing-dev.h>
+#include <linux/iomap.h>
 #include "ext4_jbd2.h"
 #include "ext4_extents.h"
 #include "xattr.h"
@@ -97,9 +98,6 @@ static int ext4_split_extent_at(handle_t *handle,
 			     int split_flag,
 			     int flags);
 
-static int ext4_find_delayed_extent(struct inode *inode,
-				    struct extent_status *newes);
-
 static int ext4_ext_trunc_restart_fn(struct inode *inode, int *dropped)
 {
 	/*
@@ -2176,155 +2174,6 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
 	return err;
 }
 
-static int ext4_fill_fiemap_extents(struct inode *inode,
-				    ext4_lblk_t block, ext4_lblk_t num,
-				    struct fiemap_extent_info *fieinfo)
-{
-	struct ext4_ext_path *path = NULL;
-	struct ext4_extent *ex;
-	struct extent_status es;
-	ext4_lblk_t next, next_del, start = 0, end = 0;
-	ext4_lblk_t last = block + num;
-	int exists, depth = 0, err = 0;
-	unsigned int flags = 0;
-	unsigned char blksize_bits = inode->i_sb->s_blocksize_bits;
-
-	while (block < last && block != EXT_MAX_BLOCKS) {
-		num = last - block;
-		/* find extent for this block */
-		down_read(&EXT4_I(inode)->i_data_sem);
-
-		path = ext4_find_extent(inode, block, &path, 0);
-		if (IS_ERR(path)) {
-			up_read(&EXT4_I(inode)->i_data_sem);
-			err = PTR_ERR(path);
-			path = NULL;
-			break;
-		}
-
-		depth = ext_depth(inode);
-		if (unlikely(path[depth].p_hdr == NULL)) {
-			up_read(&EXT4_I(inode)->i_data_sem);
-			EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
-			err = -EFSCORRUPTED;
-			break;
-		}
-		ex = path[depth].p_ext;
-		next = ext4_ext_next_allocated_block(path);
-
-		flags = 0;
-		exists = 0;
-		if (!ex) {
-			/* there is no extent yet, so try to allocate
-			 * all requested space */
-			start = block;
-			end = block + num;
-		} else if (le32_to_cpu(ex->ee_block) > block) {
-			/* need to allocate space before found extent */
-			start = block;
-			end = le32_to_cpu(ex->ee_block);
-			if (block + num < end)
-				end = block + num;
-		} else if (block >= le32_to_cpu(ex->ee_block)
-					+ ext4_ext_get_actual_len(ex)) {
-			/* need to allocate space after found extent */
-			start = block;
-			end = block + num;
-			if (end >= next)
-				end = next;
-		} else if (block >= le32_to_cpu(ex->ee_block)) {
-			/*
-			 * some part of requested space is covered
-			 * by found extent
-			 */
-			start = block;
-			end = le32_to_cpu(ex->ee_block)
-				+ ext4_ext_get_actual_len(ex);
-			if (block + num < end)
-				end = block + num;
-			exists = 1;
-		} else {
-			BUG();
-		}
-		BUG_ON(end <= start);
-
-		if (!exists) {
-			es.es_lblk = start;
-			es.es_len = end - start;
-			es.es_pblk = 0;
-		} else {
-			es.es_lblk = le32_to_cpu(ex->ee_block);
-			es.es_len = ext4_ext_get_actual_len(ex);
-			es.es_pblk = ext4_ext_pblock(ex);
-			if (ext4_ext_is_unwritten(ex))
-				flags |= FIEMAP_EXTENT_UNWRITTEN;
-		}
-
-		/*
-		 * Find delayed extent and update es accordingly. We call
-		 * it even in !exists case to find out whether es is the
-		 * last existing extent or not.
-		 */
-		next_del = ext4_find_delayed_extent(inode, &es);
-		if (!exists && next_del) {
-			exists = 1;
-			flags |= (FIEMAP_EXTENT_DELALLOC |
-				  FIEMAP_EXTENT_UNKNOWN);
-		}
-		up_read(&EXT4_I(inode)->i_data_sem);
-
-		if (unlikely(es.es_len == 0)) {
-			EXT4_ERROR_INODE(inode, "es.es_len == 0");
-			err = -EFSCORRUPTED;
-			break;
-		}
-
-		/*
-		 * This is possible iff next == next_del == EXT_MAX_BLOCKS.
-		 * we need to check next == EXT_MAX_BLOCKS because it is
-		 * possible that an extent is with unwritten and delayed
-		 * status due to when an extent is delayed allocated and
-		 * is allocated by fallocate status tree will track both of
-		 * them in a extent.
-		 *
-		 * So we could return a unwritten and delayed extent, and
-		 * its block is equal to 'next'.
-		 */
-		if (next == next_del && next == EXT_MAX_BLOCKS) {
-			flags |= FIEMAP_EXTENT_LAST;
-			if (unlikely(next_del != EXT_MAX_BLOCKS ||
-				     next != EXT_MAX_BLOCKS)) {
-				EXT4_ERROR_INODE(inode,
-						 "next extent == %u, next "
-						 "delalloc extent = %u",
-						 next, next_del);
-				err = -EFSCORRUPTED;
-				break;
-			}
-		}
-
-		if (exists) {
-			err = fiemap_fill_next_extent(fieinfo,
-				(__u64)es.es_lblk << blksize_bits,
-				(__u64)es.es_pblk << blksize_bits,
-				(__u64)es.es_len << blksize_bits,
-				flags);
-			if (err < 0)
-				break;
-			if (err == 1) {
-				err = 0;
-				break;
-			}
-		}
-
-		block = es.es_lblk + es.es_len;
-	}
-
-	ext4_ext_drop_refs(path);
-	kfree(path);
-	return err;
-}
-
 static int ext4_fill_es_cache_info(struct inode *inode,
 				   ext4_lblk_t block, ext4_lblk_t num,
 				   struct fiemap_extent_info *fieinfo)
@@ -5058,64 +4907,13 @@ int ext4_convert_unwritten_io_end_vec(handle_t *handle, ext4_io_end_t *io_end)
 	return ret < 0 ? ret : err;
 }
 
-/*
- * If newes is not existing extent (newes->ec_pblk equals zero) find
- * delayed extent at start of newes and update newes accordingly and
- * return start of the next delayed extent.
- *
- * If newes is existing extent (newes->ec_pblk is not equal zero)
- * return start of next delayed extent or EXT_MAX_BLOCKS if no delayed
- * extent found. Leave newes unmodified.
- */
-static int ext4_find_delayed_extent(struct inode *inode,
-				    struct extent_status *newes)
-{
-	struct extent_status es;
-	ext4_lblk_t block, next_del;
-
-	if (newes->es_pblk == 0) {
-		ext4_es_find_extent_range(inode, &ext4_es_is_delayed,
-					  newes->es_lblk,
-					  newes->es_lblk + newes->es_len - 1,
-					  &es);
-
-		/*
-		 * No extent in extent-tree contains block @newes->es_pblk,
-		 * then the block may stay in 1)a hole or 2)delayed-extent.
-		 */
-		if (es.es_len == 0)
-			/* A hole found. */
-			return 0;
-
-		if (es.es_lblk > newes->es_lblk) {
-			/* A hole found. */
-			newes->es_len = min(es.es_lblk - newes->es_lblk,
-					    newes->es_len);
-			return 0;
-		}
-
-		newes->es_len = es.es_lblk + es.es_len - newes->es_lblk;
-	}
-
-	block = newes->es_lblk + newes->es_len;
-	ext4_es_find_extent_range(inode, &ext4_es_is_delayed, block,
-				  EXT_MAX_BLOCKS, &es);
-	if (es.es_len == 0)
-		next_del = EXT_MAX_BLOCKS;
-	else
-		next_del = es.es_lblk;
-
-	return next_del;
-}
-
-static int ext4_xattr_fiemap(struct inode *inode,
-				struct fiemap_extent_info *fieinfo)
+static int ext4_iomap_xattr_fiemap(struct inode *inode, struct iomap *iomap)
 {
 	__u64 physical = 0;
-	__u64 length;
-	__u32 flags = FIEMAP_EXTENT_LAST;
+	__u64 length = 0;
 	int blockbits = inode->i_sb->s_blocksize_bits;
 	int error = 0;
+	u16 iomap_type;
 
 	/* in-inode? */
 	if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
@@ -5130,40 +4928,49 @@ static int ext4_xattr_fiemap(struct inode *inode,
 				EXT4_I(inode)->i_extra_isize;
 		physical += offset;
 		length = EXT4_SB(inode->i_sb)->s_inode_size - offset;
-		flags |= FIEMAP_EXTENT_DATA_INLINE;
 		brelse(iloc.bh);
-	} else { /* external block */
+		iomap_type = IOMAP_INLINE;
+	} else if (EXT4_I(inode)->i_file_acl) { /* external block */
 		physical = (__u64)EXT4_I(inode)->i_file_acl << blockbits;
 		length = inode->i_sb->s_blocksize;
+		iomap_type = IOMAP_MAPPED;
+	} else {
+		/* no in-inode or external block for xattr, so return -ENOENT */
+		error = -ENOENT;
+		goto out;
 	}
 
-	if (physical)
-		error = fiemap_fill_next_extent(fieinfo, 0, physical,
-						length, flags);
-	return (error < 0 ? error : 0);
+	iomap->addr = physical;
+	iomap->offset = 0;
+	iomap->length = length;
+	iomap->type = iomap_type;
+	iomap->flags = 0;
+out:
+	return error;
 }
 
-static int _ext4_fiemap(struct inode *inode,
-			struct fiemap_extent_info *fieinfo,
-			__u64 start, __u64 len,
-			int (*fill)(struct inode *, ext4_lblk_t,
-				    ext4_lblk_t,
-				    struct fiemap_extent_info *))
+static int ext4_iomap_xattr_begin(struct inode *inode, loff_t offset,
+				  loff_t length, unsigned flags,
+				  struct iomap *iomap, struct iomap *srcmap)
 {
-	ext4_lblk_t start_blk;
-	u32 ext4_fiemap_flags = FIEMAP_FLAG_SYNC|FIEMAP_FLAG_XATTR;
+	int error;
 
-	int error = 0;
-
-	if (ext4_has_inline_data(inode)) {
-		int has_inline = 1;
+	error = ext4_iomap_xattr_fiemap(inode, iomap);
+	if (error == 0 && (offset >= iomap->length))
+		error = -ENOENT;
+	return error;
+}
 
-		error = ext4_inline_data_fiemap(inode, fieinfo, &has_inline,
-						start, len);
+static const struct iomap_ops ext4_iomap_xattr_ops = {
+	.iomap_begin		= ext4_iomap_xattr_begin,
+};
 
-		if (has_inline)
-			return error;
-	}
+static int _ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
+			__u64 start, __u64 len, bool from_es_cache)
+{
+	ext4_lblk_t start_blk;
+	u32 ext4_fiemap_flags = FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR;
+	int error = 0;
 
 	if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) {
 		error = ext4_ext_precache(inode);
@@ -5172,19 +4979,19 @@ static int _ext4_fiemap(struct inode *inode,
 		fieinfo->fi_flags &= ~FIEMAP_FLAG_CACHE;
 	}
 
-	/* fallback to generic here if not in extents fmt */
-	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) &&
-	    fill == ext4_fill_fiemap_extents)
-		return generic_block_fiemap(inode, fieinfo, start, len,
-			ext4_get_block);
-
-	if (fill == ext4_fill_es_cache_info)
+	if (from_es_cache)
 		ext4_fiemap_flags &= FIEMAP_FLAG_XATTR;
+
 	if (fiemap_check_flags(fieinfo, ext4_fiemap_flags))
 		return -EBADR;
 
 	if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
-		error = ext4_xattr_fiemap(inode, fieinfo);
+		fieinfo->fi_flags &= ~FIEMAP_FLAG_XATTR;
+		error = iomap_fiemap(inode, fieinfo, start, len,
+				     &ext4_iomap_xattr_ops);
+	} else if (!from_es_cache) {
+		error = iomap_fiemap(inode, fieinfo, start, len,
+				     &ext4_iomap_report_ops);
 	} else {
 		ext4_lblk_t len_blks;
 		__u64 last_blk;
@@ -5199,7 +5006,8 @@ static int _ext4_fiemap(struct inode *inode,
 		 * Walk the extent tree gathering extent information
 		 * and pushing extents back to the user.
 		 */
-		error = fill(inode, start_blk, len_blks, fieinfo);
+		error = ext4_fill_es_cache_info(inode, start_blk, len_blks,
+						fieinfo);
 	}
 	return error;
 }
@@ -5207,8 +5015,7 @@ static int _ext4_fiemap(struct inode *inode,
 int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
 		__u64 start, __u64 len)
 {
-	return _ext4_fiemap(inode, fieinfo, start, len,
-			    ext4_fill_fiemap_extents);
+	return _ext4_fiemap(inode, fieinfo, start, len, false);
 }
 
 int ext4_get_es_cache(struct inode *inode, struct fiemap_extent_info *fieinfo,
@@ -5224,8 +5031,7 @@ int ext4_get_es_cache(struct inode *inode, struct fiemap_extent_info *fieinfo,
 			return 0;
 	}
 
-	return _ext4_fiemap(inode, fieinfo, start, len,
-			    ext4_fill_es_cache_info);
+	return _ext4_fiemap(inode, fieinfo, start, len, true);
 }
 
 
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index e61603f47035..b8b99634c832 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -1857,47 +1857,6 @@ int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap)
 	return error;
 }
 
-int ext4_inline_data_fiemap(struct inode *inode,
-			    struct fiemap_extent_info *fieinfo,
-			    int *has_inline, __u64 start, __u64 len)
-{
-	__u64 physical = 0;
-	__u64 inline_len;
-	__u32 flags = FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_NOT_ALIGNED |
-		FIEMAP_EXTENT_LAST;
-	int error = 0;
-	struct ext4_iloc iloc;
-
-	down_read(&EXT4_I(inode)->xattr_sem);
-	if (!ext4_has_inline_data(inode)) {
-		*has_inline = 0;
-		goto out;
-	}
-	inline_len = min_t(size_t, ext4_get_inline_size(inode),
-			   i_size_read(inode));
-	if (start >= inline_len)
-		goto out;
-	if (start + len < inline_len)
-		inline_len = start + len;
-	inline_len -= start;
-
-	error = ext4_get_inode_loc(inode, &iloc);
-	if (error)
-		goto out;
-
-	physical = (__u64)iloc.bh->b_blocknr << inode->i_sb->s_blocksize_bits;
-	physical += (char *)ext4_raw_inode(&iloc) - iloc.bh->b_data;
-	physical += offsetof(struct ext4_inode, i_block);
-
-	brelse(iloc.bh);
-out:
-	up_read(&EXT4_I(inode)->xattr_sem);
-	if (physical)
-		error = fiemap_fill_next_extent(fieinfo, start, physical,
-						inline_len, flags);
-	return (error < 0 ? error : 0);
-}
-
 int ext4_inline_data_truncate(struct inode *inode, int *has_inline)
 {
 	handle_t *handle;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST
  2020-02-28  9:26 [PATCHv5 0/6] ext4: bmap & fiemap conversion to use iomap Ritesh Harjani
                   ` (4 preceding siblings ...)
  2020-02-28  9:26 ` [PATCHv5 5/6] ext4: Move ext4_fiemap to use iomap framework Ritesh Harjani
@ 2020-02-28  9:26 ` Ritesh Harjani
  2020-02-28 15:20   ` Darrick J. Wong
  2020-02-28 15:36   ` Matthew Wilcox
  5 siblings, 2 replies; 27+ messages in thread
From: Ritesh Harjani @ 2020-02-28  9:26 UTC (permalink / raw)
  To: linux-ext4
  Cc: jack, tytso, adilger.kernel, linux-fsdevel, darrick.wong, hch,
	cmaiolino, david, Ritesh Harjani

Currently FIEMAP_EXTENT_LAST is not working consistently across
different filesystem's fiemap implementations. So add more information
about how else this flag could set in other implementation.

Also in general, user should not completely rely on this flag as
such since it could return false value for e.g.
when there is a delalloc extent which might get converted during
writeback, immediately after the fiemap calls return.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 Documentation/filesystems/fiemap.txt | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt
index f6d9c99103a4..fedfa9b9dde5 100644
--- a/Documentation/filesystems/fiemap.txt
+++ b/Documentation/filesystems/fiemap.txt
@@ -71,8 +71,7 @@ allocated is less than would be required to map the requested range,
 the maximum number of extents that can be mapped in the fm_extent[]
 array will be returned and fm_mapped_extents will be equal to
 fm_extent_count. In that case, the last extent in the array will not
-complete the requested range and will not have the FIEMAP_EXTENT_LAST
-flag set (see the next section on extent flags).
+complete the requested range.
 
 Each extent is described by a single fiemap_extent structure as
 returned in fm_extents.
@@ -96,7 +95,7 @@ block size of the file system.  With the exception of extents flagged as
 FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.
 
 The fe_flags field contains flags which describe the extent returned.
-A special flag, FIEMAP_EXTENT_LAST is always set on the last extent in
+A special flag, FIEMAP_EXTENT_LAST *may be* set on the last extent in
 the file so that the process making fiemap calls can determine when no
 more extents are available, without having to call the ioctl again.
 
@@ -115,8 +114,9 @@ data. Note that the opposite is not true - it would be valid for
 FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
 
 * FIEMAP_EXTENT_LAST
-This is the last extent in the file. A mapping attempt past this
-extent will return nothing.
+This is generally the last extent in the file. A mapping attempt past this
+extent may return nothing. In some implementations this flag is also set on
+the last dataset queried by the user (via fiemap->fm_length).
 
 * FIEMAP_EXTENT_UNKNOWN
 The location of this extent is currently unknown. This may indicate
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST
  2020-02-28  9:26 ` [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST Ritesh Harjani
@ 2020-02-28 15:20   ` Darrick J. Wong
  2020-02-28 15:36   ` Matthew Wilcox
  1 sibling, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2020-02-28 15:20 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, tytso, adilger.kernel, linux-fsdevel, hch,
	cmaiolino, david

On Fri, Feb 28, 2020 at 02:56:59PM +0530, Ritesh Harjani wrote:
> Currently FIEMAP_EXTENT_LAST is not working consistently across
> different filesystem's fiemap implementations. So add more information
> about how else this flag could set in other implementation.
> 
> Also in general, user should not completely rely on this flag as
> such since it could return false value for e.g.
> when there is a delalloc extent which might get converted during
> writeback, immediately after the fiemap calls return.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Looks reasonable,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  Documentation/filesystems/fiemap.txt | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt
> index f6d9c99103a4..fedfa9b9dde5 100644
> --- a/Documentation/filesystems/fiemap.txt
> +++ b/Documentation/filesystems/fiemap.txt
> @@ -71,8 +71,7 @@ allocated is less than would be required to map the requested range,
>  the maximum number of extents that can be mapped in the fm_extent[]
>  array will be returned and fm_mapped_extents will be equal to
>  fm_extent_count. In that case, the last extent in the array will not
> -complete the requested range and will not have the FIEMAP_EXTENT_LAST
> -flag set (see the next section on extent flags).
> +complete the requested range.
>  
>  Each extent is described by a single fiemap_extent structure as
>  returned in fm_extents.
> @@ -96,7 +95,7 @@ block size of the file system.  With the exception of extents flagged as
>  FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.
>  
>  The fe_flags field contains flags which describe the extent returned.
> -A special flag, FIEMAP_EXTENT_LAST is always set on the last extent in
> +A special flag, FIEMAP_EXTENT_LAST *may be* set on the last extent in
>  the file so that the process making fiemap calls can determine when no
>  more extents are available, without having to call the ioctl again.
>  
> @@ -115,8 +114,9 @@ data. Note that the opposite is not true - it would be valid for
>  FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
>  
>  * FIEMAP_EXTENT_LAST
> -This is the last extent in the file. A mapping attempt past this
> -extent will return nothing.
> +This is generally the last extent in the file. A mapping attempt past this
> +extent may return nothing. In some implementations this flag is also set on
> +the last dataset queried by the user (via fiemap->fm_length).
>  
>  * FIEMAP_EXTENT_UNKNOWN
>  The location of this extent is currently unknown. This may indicate
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 5/6] ext4: Move ext4_fiemap to use iomap framework.
  2020-02-28  9:26 ` [PATCHv5 5/6] ext4: Move ext4_fiemap to use iomap framework Ritesh Harjani
@ 2020-02-28 15:21   ` Darrick J. Wong
  2020-03-14  3:03   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2020-02-28 15:21 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, tytso, adilger.kernel, linux-fsdevel, hch,
	cmaiolino, david, kbuild test robot

On Fri, Feb 28, 2020 at 02:56:58PM +0530, Ritesh Harjani wrote:
> This patch moves ext4_fiemap to use iomap framework.
> For xattr a new 'ext4_iomap_xattr_ops' is added.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> Reported-by: kbuild test robot <lkp@intel.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

The interactions with iomap look ok,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/ext4/extents.c | 290 ++++++++--------------------------------------
>  fs/ext4/inline.c  |  41 -------
>  2 files changed, 48 insertions(+), 283 deletions(-)
> 
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 0de548bb3c90..0e3dfea0c065 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -28,6 +28,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/fiemap.h>
>  #include <linux/backing-dev.h>
> +#include <linux/iomap.h>
>  #include "ext4_jbd2.h"
>  #include "ext4_extents.h"
>  #include "xattr.h"
> @@ -97,9 +98,6 @@ static int ext4_split_extent_at(handle_t *handle,
>  			     int split_flag,
>  			     int flags);
>  
> -static int ext4_find_delayed_extent(struct inode *inode,
> -				    struct extent_status *newes);
> -
>  static int ext4_ext_trunc_restart_fn(struct inode *inode, int *dropped)
>  {
>  	/*
> @@ -2176,155 +2174,6 @@ int ext4_ext_insert_extent(handle_t *handle, struct inode *inode,
>  	return err;
>  }
>  
> -static int ext4_fill_fiemap_extents(struct inode *inode,
> -				    ext4_lblk_t block, ext4_lblk_t num,
> -				    struct fiemap_extent_info *fieinfo)
> -{
> -	struct ext4_ext_path *path = NULL;
> -	struct ext4_extent *ex;
> -	struct extent_status es;
> -	ext4_lblk_t next, next_del, start = 0, end = 0;
> -	ext4_lblk_t last = block + num;
> -	int exists, depth = 0, err = 0;
> -	unsigned int flags = 0;
> -	unsigned char blksize_bits = inode->i_sb->s_blocksize_bits;
> -
> -	while (block < last && block != EXT_MAX_BLOCKS) {
> -		num = last - block;
> -		/* find extent for this block */
> -		down_read(&EXT4_I(inode)->i_data_sem);
> -
> -		path = ext4_find_extent(inode, block, &path, 0);
> -		if (IS_ERR(path)) {
> -			up_read(&EXT4_I(inode)->i_data_sem);
> -			err = PTR_ERR(path);
> -			path = NULL;
> -			break;
> -		}
> -
> -		depth = ext_depth(inode);
> -		if (unlikely(path[depth].p_hdr == NULL)) {
> -			up_read(&EXT4_I(inode)->i_data_sem);
> -			EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
> -			err = -EFSCORRUPTED;
> -			break;
> -		}
> -		ex = path[depth].p_ext;
> -		next = ext4_ext_next_allocated_block(path);
> -
> -		flags = 0;
> -		exists = 0;
> -		if (!ex) {
> -			/* there is no extent yet, so try to allocate
> -			 * all requested space */
> -			start = block;
> -			end = block + num;
> -		} else if (le32_to_cpu(ex->ee_block) > block) {
> -			/* need to allocate space before found extent */
> -			start = block;
> -			end = le32_to_cpu(ex->ee_block);
> -			if (block + num < end)
> -				end = block + num;
> -		} else if (block >= le32_to_cpu(ex->ee_block)
> -					+ ext4_ext_get_actual_len(ex)) {
> -			/* need to allocate space after found extent */
> -			start = block;
> -			end = block + num;
> -			if (end >= next)
> -				end = next;
> -		} else if (block >= le32_to_cpu(ex->ee_block)) {
> -			/*
> -			 * some part of requested space is covered
> -			 * by found extent
> -			 */
> -			start = block;
> -			end = le32_to_cpu(ex->ee_block)
> -				+ ext4_ext_get_actual_len(ex);
> -			if (block + num < end)
> -				end = block + num;
> -			exists = 1;
> -		} else {
> -			BUG();
> -		}
> -		BUG_ON(end <= start);
> -
> -		if (!exists) {
> -			es.es_lblk = start;
> -			es.es_len = end - start;
> -			es.es_pblk = 0;
> -		} else {
> -			es.es_lblk = le32_to_cpu(ex->ee_block);
> -			es.es_len = ext4_ext_get_actual_len(ex);
> -			es.es_pblk = ext4_ext_pblock(ex);
> -			if (ext4_ext_is_unwritten(ex))
> -				flags |= FIEMAP_EXTENT_UNWRITTEN;
> -		}
> -
> -		/*
> -		 * Find delayed extent and update es accordingly. We call
> -		 * it even in !exists case to find out whether es is the
> -		 * last existing extent or not.
> -		 */
> -		next_del = ext4_find_delayed_extent(inode, &es);
> -		if (!exists && next_del) {
> -			exists = 1;
> -			flags |= (FIEMAP_EXTENT_DELALLOC |
> -				  FIEMAP_EXTENT_UNKNOWN);
> -		}
> -		up_read(&EXT4_I(inode)->i_data_sem);
> -
> -		if (unlikely(es.es_len == 0)) {
> -			EXT4_ERROR_INODE(inode, "es.es_len == 0");
> -			err = -EFSCORRUPTED;
> -			break;
> -		}
> -
> -		/*
> -		 * This is possible iff next == next_del == EXT_MAX_BLOCKS.
> -		 * we need to check next == EXT_MAX_BLOCKS because it is
> -		 * possible that an extent is with unwritten and delayed
> -		 * status due to when an extent is delayed allocated and
> -		 * is allocated by fallocate status tree will track both of
> -		 * them in a extent.
> -		 *
> -		 * So we could return a unwritten and delayed extent, and
> -		 * its block is equal to 'next'.
> -		 */
> -		if (next == next_del && next == EXT_MAX_BLOCKS) {
> -			flags |= FIEMAP_EXTENT_LAST;
> -			if (unlikely(next_del != EXT_MAX_BLOCKS ||
> -				     next != EXT_MAX_BLOCKS)) {
> -				EXT4_ERROR_INODE(inode,
> -						 "next extent == %u, next "
> -						 "delalloc extent = %u",
> -						 next, next_del);
> -				err = -EFSCORRUPTED;
> -				break;
> -			}
> -		}
> -
> -		if (exists) {
> -			err = fiemap_fill_next_extent(fieinfo,
> -				(__u64)es.es_lblk << blksize_bits,
> -				(__u64)es.es_pblk << blksize_bits,
> -				(__u64)es.es_len << blksize_bits,
> -				flags);
> -			if (err < 0)
> -				break;
> -			if (err == 1) {
> -				err = 0;
> -				break;
> -			}
> -		}
> -
> -		block = es.es_lblk + es.es_len;
> -	}
> -
> -	ext4_ext_drop_refs(path);
> -	kfree(path);
> -	return err;
> -}
> -
>  static int ext4_fill_es_cache_info(struct inode *inode,
>  				   ext4_lblk_t block, ext4_lblk_t num,
>  				   struct fiemap_extent_info *fieinfo)
> @@ -5058,64 +4907,13 @@ int ext4_convert_unwritten_io_end_vec(handle_t *handle, ext4_io_end_t *io_end)
>  	return ret < 0 ? ret : err;
>  }
>  
> -/*
> - * If newes is not existing extent (newes->ec_pblk equals zero) find
> - * delayed extent at start of newes and update newes accordingly and
> - * return start of the next delayed extent.
> - *
> - * If newes is existing extent (newes->ec_pblk is not equal zero)
> - * return start of next delayed extent or EXT_MAX_BLOCKS if no delayed
> - * extent found. Leave newes unmodified.
> - */
> -static int ext4_find_delayed_extent(struct inode *inode,
> -				    struct extent_status *newes)
> -{
> -	struct extent_status es;
> -	ext4_lblk_t block, next_del;
> -
> -	if (newes->es_pblk == 0) {
> -		ext4_es_find_extent_range(inode, &ext4_es_is_delayed,
> -					  newes->es_lblk,
> -					  newes->es_lblk + newes->es_len - 1,
> -					  &es);
> -
> -		/*
> -		 * No extent in extent-tree contains block @newes->es_pblk,
> -		 * then the block may stay in 1)a hole or 2)delayed-extent.
> -		 */
> -		if (es.es_len == 0)
> -			/* A hole found. */
> -			return 0;
> -
> -		if (es.es_lblk > newes->es_lblk) {
> -			/* A hole found. */
> -			newes->es_len = min(es.es_lblk - newes->es_lblk,
> -					    newes->es_len);
> -			return 0;
> -		}
> -
> -		newes->es_len = es.es_lblk + es.es_len - newes->es_lblk;
> -	}
> -
> -	block = newes->es_lblk + newes->es_len;
> -	ext4_es_find_extent_range(inode, &ext4_es_is_delayed, block,
> -				  EXT_MAX_BLOCKS, &es);
> -	if (es.es_len == 0)
> -		next_del = EXT_MAX_BLOCKS;
> -	else
> -		next_del = es.es_lblk;
> -
> -	return next_del;
> -}
> -
> -static int ext4_xattr_fiemap(struct inode *inode,
> -				struct fiemap_extent_info *fieinfo)
> +static int ext4_iomap_xattr_fiemap(struct inode *inode, struct iomap *iomap)
>  {
>  	__u64 physical = 0;
> -	__u64 length;
> -	__u32 flags = FIEMAP_EXTENT_LAST;
> +	__u64 length = 0;
>  	int blockbits = inode->i_sb->s_blocksize_bits;
>  	int error = 0;
> +	u16 iomap_type;
>  
>  	/* in-inode? */
>  	if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
> @@ -5130,40 +4928,49 @@ static int ext4_xattr_fiemap(struct inode *inode,
>  				EXT4_I(inode)->i_extra_isize;
>  		physical += offset;
>  		length = EXT4_SB(inode->i_sb)->s_inode_size - offset;
> -		flags |= FIEMAP_EXTENT_DATA_INLINE;
>  		brelse(iloc.bh);
> -	} else { /* external block */
> +		iomap_type = IOMAP_INLINE;
> +	} else if (EXT4_I(inode)->i_file_acl) { /* external block */
>  		physical = (__u64)EXT4_I(inode)->i_file_acl << blockbits;
>  		length = inode->i_sb->s_blocksize;
> +		iomap_type = IOMAP_MAPPED;
> +	} else {
> +		/* no in-inode or external block for xattr, so return -ENOENT */
> +		error = -ENOENT;
> +		goto out;
>  	}
>  
> -	if (physical)
> -		error = fiemap_fill_next_extent(fieinfo, 0, physical,
> -						length, flags);
> -	return (error < 0 ? error : 0);
> +	iomap->addr = physical;
> +	iomap->offset = 0;
> +	iomap->length = length;
> +	iomap->type = iomap_type;
> +	iomap->flags = 0;
> +out:
> +	return error;
>  }
>  
> -static int _ext4_fiemap(struct inode *inode,
> -			struct fiemap_extent_info *fieinfo,
> -			__u64 start, __u64 len,
> -			int (*fill)(struct inode *, ext4_lblk_t,
> -				    ext4_lblk_t,
> -				    struct fiemap_extent_info *))
> +static int ext4_iomap_xattr_begin(struct inode *inode, loff_t offset,
> +				  loff_t length, unsigned flags,
> +				  struct iomap *iomap, struct iomap *srcmap)
>  {
> -	ext4_lblk_t start_blk;
> -	u32 ext4_fiemap_flags = FIEMAP_FLAG_SYNC|FIEMAP_FLAG_XATTR;
> +	int error;
>  
> -	int error = 0;
> -
> -	if (ext4_has_inline_data(inode)) {
> -		int has_inline = 1;
> +	error = ext4_iomap_xattr_fiemap(inode, iomap);
> +	if (error == 0 && (offset >= iomap->length))
> +		error = -ENOENT;
> +	return error;
> +}
>  
> -		error = ext4_inline_data_fiemap(inode, fieinfo, &has_inline,
> -						start, len);
> +static const struct iomap_ops ext4_iomap_xattr_ops = {
> +	.iomap_begin		= ext4_iomap_xattr_begin,
> +};
>  
> -		if (has_inline)
> -			return error;
> -	}
> +static int _ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
> +			__u64 start, __u64 len, bool from_es_cache)
> +{
> +	ext4_lblk_t start_blk;
> +	u32 ext4_fiemap_flags = FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR;
> +	int error = 0;
>  
>  	if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) {
>  		error = ext4_ext_precache(inode);
> @@ -5172,19 +4979,19 @@ static int _ext4_fiemap(struct inode *inode,
>  		fieinfo->fi_flags &= ~FIEMAP_FLAG_CACHE;
>  	}
>  
> -	/* fallback to generic here if not in extents fmt */
> -	if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) &&
> -	    fill == ext4_fill_fiemap_extents)
> -		return generic_block_fiemap(inode, fieinfo, start, len,
> -			ext4_get_block);
> -
> -	if (fill == ext4_fill_es_cache_info)
> +	if (from_es_cache)
>  		ext4_fiemap_flags &= FIEMAP_FLAG_XATTR;
> +
>  	if (fiemap_check_flags(fieinfo, ext4_fiemap_flags))
>  		return -EBADR;
>  
>  	if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
> -		error = ext4_xattr_fiemap(inode, fieinfo);
> +		fieinfo->fi_flags &= ~FIEMAP_FLAG_XATTR;
> +		error = iomap_fiemap(inode, fieinfo, start, len,
> +				     &ext4_iomap_xattr_ops);
> +	} else if (!from_es_cache) {
> +		error = iomap_fiemap(inode, fieinfo, start, len,
> +				     &ext4_iomap_report_ops);
>  	} else {
>  		ext4_lblk_t len_blks;
>  		__u64 last_blk;
> @@ -5199,7 +5006,8 @@ static int _ext4_fiemap(struct inode *inode,
>  		 * Walk the extent tree gathering extent information
>  		 * and pushing extents back to the user.
>  		 */
> -		error = fill(inode, start_blk, len_blks, fieinfo);
> +		error = ext4_fill_es_cache_info(inode, start_blk, len_blks,
> +						fieinfo);
>  	}
>  	return error;
>  }
> @@ -5207,8 +5015,7 @@ static int _ext4_fiemap(struct inode *inode,
>  int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  		__u64 start, __u64 len)
>  {
> -	return _ext4_fiemap(inode, fieinfo, start, len,
> -			    ext4_fill_fiemap_extents);
> +	return _ext4_fiemap(inode, fieinfo, start, len, false);
>  }
>  
>  int ext4_get_es_cache(struct inode *inode, struct fiemap_extent_info *fieinfo,
> @@ -5224,8 +5031,7 @@ int ext4_get_es_cache(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  			return 0;
>  	}
>  
> -	return _ext4_fiemap(inode, fieinfo, start, len,
> -			    ext4_fill_es_cache_info);
> +	return _ext4_fiemap(inode, fieinfo, start, len, true);
>  }
>  
>  
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index e61603f47035..b8b99634c832 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -1857,47 +1857,6 @@ int ext4_inline_data_iomap(struct inode *inode, struct iomap *iomap)
>  	return error;
>  }
>  
> -int ext4_inline_data_fiemap(struct inode *inode,
> -			    struct fiemap_extent_info *fieinfo,
> -			    int *has_inline, __u64 start, __u64 len)
> -{
> -	__u64 physical = 0;
> -	__u64 inline_len;
> -	__u32 flags = FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_NOT_ALIGNED |
> -		FIEMAP_EXTENT_LAST;
> -	int error = 0;
> -	struct ext4_iloc iloc;
> -
> -	down_read(&EXT4_I(inode)->xattr_sem);
> -	if (!ext4_has_inline_data(inode)) {
> -		*has_inline = 0;
> -		goto out;
> -	}
> -	inline_len = min_t(size_t, ext4_get_inline_size(inode),
> -			   i_size_read(inode));
> -	if (start >= inline_len)
> -		goto out;
> -	if (start + len < inline_len)
> -		inline_len = start + len;
> -	inline_len -= start;
> -
> -	error = ext4_get_inode_loc(inode, &iloc);
> -	if (error)
> -		goto out;
> -
> -	physical = (__u64)iloc.bh->b_blocknr << inode->i_sb->s_blocksize_bits;
> -	physical += (char *)ext4_raw_inode(&iloc) - iloc.bh->b_data;
> -	physical += offsetof(struct ext4_inode, i_block);
> -
> -	brelse(iloc.bh);
> -out:
> -	up_read(&EXT4_I(inode)->xattr_sem);
> -	if (physical)
> -		error = fiemap_fill_next_extent(fieinfo, start, physical,
> -						inline_len, flags);
> -	return (error < 0 ? error : 0);
> -}
> -
>  int ext4_inline_data_truncate(struct inode *inode, int *has_inline)
>  {
>  	handle_t *handle;
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-02-28  9:26 ` [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure Ritesh Harjani
@ 2020-02-28 15:25   ` Darrick J. Wong
  2020-03-02  8:58     ` Ritesh Harjani
  2020-03-13 20:16   ` Theodore Y. Ts'o
  1 sibling, 1 reply; 27+ messages in thread
From: Darrick J. Wong @ 2020-02-28 15:25 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, tytso, adilger.kernel, linux-fsdevel, hch,
	cmaiolino, david

On Fri, Feb 28, 2020 at 02:56:56PM +0530, Ritesh Harjani wrote:
> ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> so just move the API from generic_block_bmap to iomap_bmap for iomap
> conversion.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 6cf3b969dc86..81fccbae0aea 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3214,7 +3214,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
>  			return 0;
>  	}
>  
> -	return generic_block_bmap(mapping, block, ext4_get_block);
> +	return iomap_bmap(mapping, block, &ext4_iomap_ops);

/me notes that iomap_bmap will filemap_write_and_wait for you, so one
could optimize ext4_bmap to avoid the double-flush by moving the
filemap_write_and_wait at the top of the function into the JDATA state
clearing block.

OTOH it's bmap, who cares... :)

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

>  }
>  
>  static int ext4_readpage(struct file *file, struct page *page)
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 1/6] ext4: Add IOMAP_F_MERGED for non-extent based mapping
  2020-02-28  9:26 ` [PATCHv5 1/6] ext4: Add IOMAP_F_MERGED for non-extent based mapping Ritesh Harjani
@ 2020-02-28 15:26   ` Darrick J. Wong
  2020-03-13 19:52   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 27+ messages in thread
From: Darrick J. Wong @ 2020-02-28 15:26 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, tytso, adilger.kernel, linux-fsdevel, hch,
	cmaiolino, david

On Fri, Feb 28, 2020 at 02:56:54PM +0530, Ritesh Harjani wrote:
> IOMAP_F_MERGED needs to be set in case of non-extent based mapping.
> This is needed in later patches for conversion of ext4_fiemap to use iomap.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Seems reasonable,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
>  fs/ext4/inode.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d035acab5b2a..6cf3b969dc86 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3335,6 +3335,10 @@ static void ext4_set_iomap(struct inode *inode, struct iomap *iomap,
>  	iomap->offset = (u64) map->m_lblk << blkbits;
>  	iomap->length = (u64) map->m_len << blkbits;
>  
> +	if ((map->m_flags & EXT4_MAP_MAPPED) &&
> +	    !ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
> +		iomap->flags |= IOMAP_F_MERGED;
> +
>  	/*
>  	 * Flags passed to ext4_map_blocks() for direct I/O writes can result
>  	 * in m_flags having both EXT4_MAP_MAPPED and EXT4_MAP_UNWRITTEN bits
> -- 
> 2.21.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST
  2020-02-28  9:26 ` [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST Ritesh Harjani
  2020-02-28 15:20   ` Darrick J. Wong
@ 2020-02-28 15:36   ` Matthew Wilcox
  2020-03-02  8:10     ` Ritesh Harjani
  1 sibling, 1 reply; 27+ messages in thread
From: Matthew Wilcox @ 2020-02-28 15:36 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, tytso, adilger.kernel, linux-fsdevel,
	darrick.wong, hch, cmaiolino, david

On Fri, Feb 28, 2020 at 02:56:59PM +0530, Ritesh Harjani wrote:
> Currently FIEMAP_EXTENT_LAST is not working consistently across
> different filesystem's fiemap implementations. So add more information
> about how else this flag could set in other implementation.
> 
> Also in general, user should not completely rely on this flag as
> such since it could return false value for e.g.
> when there is a delalloc extent which might get converted during
> writeback, immediately after the fiemap calls return.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> ---
>  Documentation/filesystems/fiemap.txt | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt
> index f6d9c99103a4..fedfa9b9dde5 100644
> --- a/Documentation/filesystems/fiemap.txt
> +++ b/Documentation/filesystems/fiemap.txt
> @@ -71,8 +71,7 @@ allocated is less than would be required to map the requested range,
>  the maximum number of extents that can be mapped in the fm_extent[]
>  array will be returned and fm_mapped_extents will be equal to
>  fm_extent_count. In that case, the last extent in the array will not
> -complete the requested range and will not have the FIEMAP_EXTENT_LAST
> -flag set (see the next section on extent flags).
> +complete the requested range.

This sentence still seems like it should be true.  If the filesystem knows
there are more extents to come, it will definitely not set the LAST flag.

> @@ -96,7 +95,7 @@ block size of the file system.  With the exception of extents flagged as
>  FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.
>  
>  The fe_flags field contains flags which describe the extent returned.
> -A special flag, FIEMAP_EXTENT_LAST is always set on the last extent in
> +A special flag, FIEMAP_EXTENT_LAST *may be* set on the last extent in
>  the file so that the process making fiemap calls can determine when no
>  more extents are available, without having to call the ioctl again.

I'm not sure I'd highlight 'may be' here.

> @@ -115,8 +114,9 @@ data. Note that the opposite is not true - it would be valid for
>  FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
>  
>  * FIEMAP_EXTENT_LAST
> -This is the last extent in the file. A mapping attempt past this
> -extent will return nothing.
> +This is generally the last extent in the file. A mapping attempt past this
> +extent may return nothing. In some implementations this flag is also set on
> +the last dataset queried by the user (via fiemap->fm_length).

The word 'dataset' is used nowhere else in this document.  How about

"Some filesystems set this flag to indicate this extent is the last one in
the range queried by the user"

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST
  2020-02-28 15:36   ` Matthew Wilcox
@ 2020-03-02  8:10     ` Ritesh Harjani
  2020-03-14  3:48       ` Theodore Y. Ts'o
  0 siblings, 1 reply; 27+ messages in thread
From: Ritesh Harjani @ 2020-03-02  8:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-ext4, jack, tytso, adilger.kernel, linux-fsdevel,
	darrick.wong, hch, cmaiolino, david



On 2/28/20 9:06 PM, Matthew Wilcox wrote:
> On Fri, Feb 28, 2020 at 02:56:59PM +0530, Ritesh Harjani wrote:
>> Currently FIEMAP_EXTENT_LAST is not working consistently across
>> different filesystem's fiemap implementations. So add more information
>> about how else this flag could set in other implementation.
>>
>> Also in general, user should not completely rely on this flag as
>> such since it could return false value for e.g.
>> when there is a delalloc extent which might get converted during
>> writeback, immediately after the fiemap calls return.
>>
>> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>> ---
>>   Documentation/filesystems/fiemap.txt | 10 +++++-----
>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt
>> index f6d9c99103a4..fedfa9b9dde5 100644
>> --- a/Documentation/filesystems/fiemap.txt
>> +++ b/Documentation/filesystems/fiemap.txt
>> @@ -71,8 +71,7 @@ allocated is less than would be required to map the requested range,
>>   the maximum number of extents that can be mapped in the fm_extent[]
>>   array will be returned and fm_mapped_extents will be equal to
>>   fm_extent_count. In that case, the last extent in the array will not
>> -complete the requested range and will not have the FIEMAP_EXTENT_LAST
>> -flag set (see the next section on extent flags).
>> +complete the requested range.
> 
> This sentence still seems like it should be true.  If the filesystem knows
> there are more extents to come, it will definitely not set the LAST flag.
> 

sure.

>> @@ -96,7 +95,7 @@ block size of the file system.  With the exception of extents flagged as
>>   FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.
>>   
>>   The fe_flags field contains flags which describe the extent returned.
>> -A special flag, FIEMAP_EXTENT_LAST is always set on the last extent in
>> +A special flag, FIEMAP_EXTENT_LAST *may be* set on the last extent in
>>   the file so that the process making fiemap calls can determine when no
>>   more extents are available, without having to call the ioctl again.
> 
> I'm not sure I'd highlight 'may be' here.

Sure.

> 
>> @@ -115,8 +114,9 @@ data. Note that the opposite is not true - it would be valid for
>>   FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
>>   
>>   * FIEMAP_EXTENT_LAST
>> -This is the last extent in the file. A mapping attempt past this
>> -extent will return nothing.
>> +This is generally the last extent in the file. A mapping attempt past this
>> +extent may return nothing. In some implementations this flag is also set on
>> +the last dataset queried by the user (via fiemap->fm_length).
> 
> The word 'dataset' is used nowhere else in this document.  How about
> 
> "Some filesystems set this flag to indicate this extent is the last one in
> the range queried by the user"

Sure.

Thanks for the review.
Will make the suggested changes and send a v6.

-ritesh


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-02-28 15:25   ` Darrick J. Wong
@ 2020-03-02  8:58     ` Ritesh Harjani
  2020-03-03 15:47       ` Darrick J. Wong
  0 siblings, 1 reply; 27+ messages in thread
From: Ritesh Harjani @ 2020-03-02  8:58 UTC (permalink / raw)
  To: Darrick J. Wong, jack
  Cc: linux-ext4, tytso, adilger.kernel, linux-fsdevel, hch, cmaiolino, david

[-- Attachment #1: Type: text/plain, Size: 1515 bytes --]



On 2/28/20 8:55 PM, Darrick J. Wong wrote:
> On Fri, Feb 28, 2020 at 02:56:56PM +0530, Ritesh Harjani wrote:
>> ext4_iomap_begin is already implemented which provides ext4_map_blocks,
>> so just move the API from generic_block_bmap to iomap_bmap for iomap
>> conversion.
>>
>> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>> Reviewed-by: Jan Kara <jack@suse.cz>
>> ---
>>   fs/ext4/inode.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 6cf3b969dc86..81fccbae0aea 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -3214,7 +3214,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
>>   			return 0;
>>   	}
>>   
>> -	return generic_block_bmap(mapping, block, ext4_get_block);
>> +	return iomap_bmap(mapping, block, &ext4_iomap_ops);
> 
> /me notes that iomap_bmap will filemap_write_and_wait for you, so one
> could optimize ext4_bmap to avoid the double-flush by moving the
> filemap_write_and_wait at the top of the function into the JDATA state
> clearing block.

IIUC, delalloc and data=journal mode are both mutually exclusive.
So we could get rid of calling filemap_write_and_wait() all together
from ext4_bmap().
And as you pointed filemap_write_and_wait() is called by default in
iomap_bmap which should cover for delalloc case.


@Jan/Darrick,
Could you check if the attached patch looks good. If yes then
will add your Reviewed-by and send a v6.

Thanks for the review!!

-ritesh



[-- Attachment #2: 0001-ext4-Move-ext4-bmap-to-use-iomap-infrastructure.patch --]
[-- Type: text/x-patch, Size: 1689 bytes --]

From 93f560d9a483b4f389056e543012d0941734a8f4 Mon Sep 17 00:00:00 2001
From: Ritesh Harjani <riteshh@linux.ibm.com>
Date: Tue, 20 Aug 2019 18:36:33 +0530
Subject: [PATCH 3/6] ext4: Move ext4 bmap to use iomap infrastructure.

ext4_iomap_begin is already implemented which provides ext4_map_blocks,
so just move the API from generic_block_bmap to iomap_bmap for iomap
conversion.

Also no need to call for filemap_write_and_wait() any more in ext4_bmap
since data=journal mode anyway doesn't support delalloc and for all other
cases iomap_bmap() anyway calls the same function, so no need for doing
it twice.

Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
---
 fs/ext4/inode.c | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6cf3b969dc86..fac8adbbb3f6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3174,16 +3174,6 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
 	if (ext4_has_inline_data(inode))
 		return 0;
 
-	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
-			test_opt(inode->i_sb, DELALLOC)) {
-		/*
-		 * With delalloc we want to sync the file
-		 * so that we can make sure we allocate
-		 * blocks for file
-		 */
-		filemap_write_and_wait(mapping);
-	}
-
 	if (EXT4_JOURNAL(inode) &&
 	    ext4_test_inode_state(inode, EXT4_STATE_JDATA)) {
 		/*
@@ -3214,7 +3204,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
 			return 0;
 	}
 
-	return generic_block_bmap(mapping, block, ext4_get_block);
+	return iomap_bmap(mapping, block, &ext4_iomap_ops);
 }
 
 static int ext4_readpage(struct file *file, struct page *page)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-03-02  8:58     ` Ritesh Harjani
@ 2020-03-03 15:47       ` Darrick J. Wong
  2020-03-04 12:42         ` Jan Kara
  0 siblings, 1 reply; 27+ messages in thread
From: Darrick J. Wong @ 2020-03-03 15:47 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: jack, linux-ext4, tytso, adilger.kernel, linux-fsdevel, hch,
	cmaiolino, david

On Mon, Mar 02, 2020 at 02:28:39PM +0530, Ritesh Harjani wrote:
> 
> 
> On 2/28/20 8:55 PM, Darrick J. Wong wrote:
> > On Fri, Feb 28, 2020 at 02:56:56PM +0530, Ritesh Harjani wrote:
> > > ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> > > so just move the API from generic_block_bmap to iomap_bmap for iomap
> > > conversion.
> > > 
> > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > ---
> > >   fs/ext4/inode.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index 6cf3b969dc86..81fccbae0aea 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -3214,7 +3214,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
> > >   			return 0;
> > >   	}
> > > -	return generic_block_bmap(mapping, block, ext4_get_block);
> > > +	return iomap_bmap(mapping, block, &ext4_iomap_ops);
> > 
> > /me notes that iomap_bmap will filemap_write_and_wait for you, so one
> > could optimize ext4_bmap to avoid the double-flush by moving the
> > filemap_write_and_wait at the top of the function into the JDATA state
> > clearing block.
> 
> IIUC, delalloc and data=journal mode are both mutually exclusive.
> So we could get rid of calling filemap_write_and_wait() all together
> from ext4_bmap().
> And as you pointed filemap_write_and_wait() is called by default in
> iomap_bmap which should cover for delalloc case.
> 
> 
> @Jan/Darrick,
> Could you check if the attached patch looks good. If yes then
> will add your Reviewed-by and send a v6.
> 
> Thanks for the review!!
> 
> -ritesh
> 
> 

> From 93f560d9a483b4f389056e543012d0941734a8f4 Mon Sep 17 00:00:00 2001
> From: Ritesh Harjani <riteshh@linux.ibm.com>
> Date: Tue, 20 Aug 2019 18:36:33 +0530
> Subject: [PATCH 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
> 
> ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> so just move the API from generic_block_bmap to iomap_bmap for iomap
> conversion.
> 
> Also no need to call for filemap_write_and_wait() any more in ext4_bmap
> since data=journal mode anyway doesn't support delalloc and for all other
> cases iomap_bmap() anyway calls the same function, so no need for doing
> it twice.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>

Hmmm.  I don't recall how jdata actually works, but I get the impression
here that we're trying to flush dirty data out to the journal and then
out to disk, and then drop the JDATA state from the inode.  This
mechanism exists (I guess?) so that dirty file pages get checkpointed
out of jbd2 back into the filesystem so that bmap() returns meaningful
results to lilo.

This makes me wonder if you still need the filemap_write_and_wait in the
JDATA case because otherwise the journal flush won't have the effect of
writing all the dirty pagecache back to the filesystem?  OTOH I suppose
the implicit write-and-wait call after we clear JDATA will not be
writing to the journal.

Even more weirdly, the FIEMAP code doesn't drop JDATA at all...?

--D

> ---
>  fs/ext4/inode.c | 12 +-----------
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 6cf3b969dc86..fac8adbbb3f6 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3174,16 +3174,6 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
>  	if (ext4_has_inline_data(inode))
>  		return 0;
>  
> -	if (mapping_tagged(mapping, PAGECACHE_TAG_DIRTY) &&
> -			test_opt(inode->i_sb, DELALLOC)) {
> -		/*
> -		 * With delalloc we want to sync the file
> -		 * so that we can make sure we allocate
> -		 * blocks for file
> -		 */
> -		filemap_write_and_wait(mapping);
> -	}
> -
>  	if (EXT4_JOURNAL(inode) &&
>  	    ext4_test_inode_state(inode, EXT4_STATE_JDATA)) {
>  		/*
> @@ -3214,7 +3204,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
>  			return 0;
>  	}
>  
> -	return generic_block_bmap(mapping, block, ext4_get_block);
> +	return iomap_bmap(mapping, block, &ext4_iomap_ops);
>  }
>  
>  static int ext4_readpage(struct file *file, struct page *page)
> -- 
> 2.21.0
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-03-03 15:47       ` Darrick J. Wong
@ 2020-03-04 12:42         ` Jan Kara
  2020-03-04 15:37           ` Darrick J. Wong
  2020-03-06 17:49           ` Ritesh Harjani
  0 siblings, 2 replies; 27+ messages in thread
From: Jan Kara @ 2020-03-04 12:42 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Ritesh Harjani, jack, linux-ext4, tytso, adilger.kernel,
	linux-fsdevel, hch, cmaiolino, david

On Tue 03-03-20 07:47:09, Darrick J. Wong wrote:
> On Mon, Mar 02, 2020 at 02:28:39PM +0530, Ritesh Harjani wrote:
> > 
> > 
> > On 2/28/20 8:55 PM, Darrick J. Wong wrote:
> > > On Fri, Feb 28, 2020 at 02:56:56PM +0530, Ritesh Harjani wrote:
> > > > ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> > > > so just move the API from generic_block_bmap to iomap_bmap for iomap
> > > > conversion.
> > > > 
> > > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > > ---
> > > >   fs/ext4/inode.c | 2 +-
> > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > index 6cf3b969dc86..81fccbae0aea 100644
> > > > --- a/fs/ext4/inode.c
> > > > +++ b/fs/ext4/inode.c
> > > > @@ -3214,7 +3214,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
> > > >   			return 0;
> > > >   	}
> > > > -	return generic_block_bmap(mapping, block, ext4_get_block);
> > > > +	return iomap_bmap(mapping, block, &ext4_iomap_ops);
> > > 
> > > /me notes that iomap_bmap will filemap_write_and_wait for you, so one
> > > could optimize ext4_bmap to avoid the double-flush by moving the
> > > filemap_write_and_wait at the top of the function into the JDATA state
> > > clearing block.
> > 
> > IIUC, delalloc and data=journal mode are both mutually exclusive.
> > So we could get rid of calling filemap_write_and_wait() all together
> > from ext4_bmap().
> > And as you pointed filemap_write_and_wait() is called by default in
> > iomap_bmap which should cover for delalloc case.
> > 
> > 
> > @Jan/Darrick,
> > Could you check if the attached patch looks good. If yes then
> > will add your Reviewed-by and send a v6.
> > 
> > Thanks for the review!!
> > 
> > -ritesh
> > 
> > 
> 
> > From 93f560d9a483b4f389056e543012d0941734a8f4 Mon Sep 17 00:00:00 2001
> > From: Ritesh Harjani <riteshh@linux.ibm.com>
> > Date: Tue, 20 Aug 2019 18:36:33 +0530
> > Subject: [PATCH 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
> > 
> > ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> > so just move the API from generic_block_bmap to iomap_bmap for iomap
> > conversion.
> > 
> > Also no need to call for filemap_write_and_wait() any more in ext4_bmap
> > since data=journal mode anyway doesn't support delalloc and for all other
> > cases iomap_bmap() anyway calls the same function, so no need for doing
> > it twice.
> > 
> > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> 
> Hmmm.  I don't recall how jdata actually works, but I get the impression
> here that we're trying to flush dirty data out to the journal and then
> out to disk, and then drop the JDATA state from the inode.  This
> mechanism exists (I guess?) so that dirty file pages get checkpointed
> out of jbd2 back into the filesystem so that bmap() returns meaningful
> results to lilo.

Exactly. E.g. when we are journalling data, we fill hole through mmap, we will
have block allocated as unwritten and we need to write it out so that the
data gets to the journal and then do journal flush to get the data to disk
so that lilo can read it from the devices. So removing
filemap_write_and_wait() when journalling data is wrong.

> This makes me wonder if you still need the filemap_write_and_wait in the
> JDATA case because otherwise the journal flush won't have the effect of
> writing all the dirty pagecache back to the filesystem?  OTOH I suppose
> the implicit write-and-wait call after we clear JDATA will not be
> writing to the journal.
> 
> Even more weirdly, the FIEMAP code doesn't drop JDATA at all...?

Yeah, it should do that but that's only performance optimization so that we
bother with journal flushing only when someone uses block mapping call on
a file with journalled dirty data. So you can hardly notice the bug by
testing...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-03-04 12:42         ` Jan Kara
@ 2020-03-04 15:37           ` Darrick J. Wong
  2020-03-07  2:32             ` Theodore Y. Ts'o
  2020-03-06 17:49           ` Ritesh Harjani
  1 sibling, 1 reply; 27+ messages in thread
From: Darrick J. Wong @ 2020-03-04 15:37 UTC (permalink / raw)
  To: Jan Kara
  Cc: Ritesh Harjani, linux-ext4, tytso, adilger.kernel, linux-fsdevel,
	hch, cmaiolino, david

On Wed, Mar 04, 2020 at 01:42:11PM +0100, Jan Kara wrote:
> On Tue 03-03-20 07:47:09, Darrick J. Wong wrote:
> > On Mon, Mar 02, 2020 at 02:28:39PM +0530, Ritesh Harjani wrote:
> > > 
> > > 
> > > On 2/28/20 8:55 PM, Darrick J. Wong wrote:
> > > > On Fri, Feb 28, 2020 at 02:56:56PM +0530, Ritesh Harjani wrote:
> > > > > ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> > > > > so just move the API from generic_block_bmap to iomap_bmap for iomap
> > > > > conversion.
> > > > > 
> > > > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > > > ---
> > > > >   fs/ext4/inode.c | 2 +-
> > > > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > > index 6cf3b969dc86..81fccbae0aea 100644
> > > > > --- a/fs/ext4/inode.c
> > > > > +++ b/fs/ext4/inode.c
> > > > > @@ -3214,7 +3214,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
> > > > >   			return 0;
> > > > >   	}
> > > > > -	return generic_block_bmap(mapping, block, ext4_get_block);
> > > > > +	return iomap_bmap(mapping, block, &ext4_iomap_ops);
> > > > 
> > > > /me notes that iomap_bmap will filemap_write_and_wait for you, so one
> > > > could optimize ext4_bmap to avoid the double-flush by moving the
> > > > filemap_write_and_wait at the top of the function into the JDATA state
> > > > clearing block.
> > > 
> > > IIUC, delalloc and data=journal mode are both mutually exclusive.
> > > So we could get rid of calling filemap_write_and_wait() all together
> > > from ext4_bmap().
> > > And as you pointed filemap_write_and_wait() is called by default in
> > > iomap_bmap which should cover for delalloc case.
> > > 
> > > 
> > > @Jan/Darrick,
> > > Could you check if the attached patch looks good. If yes then
> > > will add your Reviewed-by and send a v6.
> > > 
> > > Thanks for the review!!
> > > 
> > > -ritesh
> > > 
> > > 
> > 
> > > From 93f560d9a483b4f389056e543012d0941734a8f4 Mon Sep 17 00:00:00 2001
> > > From: Ritesh Harjani <riteshh@linux.ibm.com>
> > > Date: Tue, 20 Aug 2019 18:36:33 +0530
> > > Subject: [PATCH 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
> > > 
> > > ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> > > so just move the API from generic_block_bmap to iomap_bmap for iomap
> > > conversion.
> > > 
> > > Also no need to call for filemap_write_and_wait() any more in ext4_bmap
> > > since data=journal mode anyway doesn't support delalloc and for all other
> > > cases iomap_bmap() anyway calls the same function, so no need for doing
> > > it twice.
> > > 
> > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > 
> > Hmmm.  I don't recall how jdata actually works, but I get the impression
> > here that we're trying to flush dirty data out to the journal and then
> > out to disk, and then drop the JDATA state from the inode.  This
> > mechanism exists (I guess?) so that dirty file pages get checkpointed
> > out of jbd2 back into the filesystem so that bmap() returns meaningful
> > results to lilo.
> 
> Exactly. E.g. when we are journalling data, we fill hole through mmap, we will
> have block allocated as unwritten and we need to write it out so that the
> data gets to the journal and then do journal flush to get the data to disk
> so that lilo can read it from the devices. So removing
> filemap_write_and_wait() when journalling data is wrong.

<nod>

> > This makes me wonder if you still need the filemap_write_and_wait in the
> > JDATA case because otherwise the journal flush won't have the effect of
> > writing all the dirty pagecache back to the filesystem?  OTOH I suppose
> > the implicit write-and-wait call after we clear JDATA will not be
> > writing to the journal.
> > 
> > Even more weirdly, the FIEMAP code doesn't drop JDATA at all...?
> 
> Yeah, it should do that but that's only performance optimization so that we
> bother with journal flushing only when someone uses block mapping call on
> a file with journalled dirty data. So you can hardly notice the bug by
> testing...

If we ever decide to deprecate FIBMAP officially and push bootloaders to
use FIEMAP, then we'll have to emulate all the flushing behaviors.  But
that's something for a separate patch.

--D

> 								Honza
> 
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-03-04 12:42         ` Jan Kara
  2020-03-04 15:37           ` Darrick J. Wong
@ 2020-03-06 17:49           ` Ritesh Harjani
  2020-03-07  0:51             ` Darrick J. Wong
  1 sibling, 1 reply; 27+ messages in thread
From: Ritesh Harjani @ 2020-03-06 17:49 UTC (permalink / raw)
  To: Jan Kara, Darrick J. Wong
  Cc: linux-ext4, tytso, adilger.kernel, linux-fsdevel, hch, cmaiolino, david



On 3/4/20 6:12 PM, Jan Kara wrote:
> On Tue 03-03-20 07:47:09, Darrick J. Wong wrote:
>> On Mon, Mar 02, 2020 at 02:28:39PM +0530, Ritesh Harjani wrote:
>>>
>>>
>>> On 2/28/20 8:55 PM, Darrick J. Wong wrote:
>>>> On Fri, Feb 28, 2020 at 02:56:56PM +0530, Ritesh Harjani wrote:
>>>>> ext4_iomap_begin is already implemented which provides ext4_map_blocks,
>>>>> so just move the API from generic_block_bmap to iomap_bmap for iomap
>>>>> conversion.
>>>>>
>>>>> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>>>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>>>> ---
>>>>>    fs/ext4/inode.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>>>> index 6cf3b969dc86..81fccbae0aea 100644
>>>>> --- a/fs/ext4/inode.c
>>>>> +++ b/fs/ext4/inode.c
>>>>> @@ -3214,7 +3214,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
>>>>>    			return 0;
>>>>>    	}
>>>>> -	return generic_block_bmap(mapping, block, ext4_get_block);
>>>>> +	return iomap_bmap(mapping, block, &ext4_iomap_ops);
>>>>
>>>> /me notes that iomap_bmap will filemap_write_and_wait for you, so one
>>>> could optimize ext4_bmap to avoid the double-flush by moving the
>>>> filemap_write_and_wait at the top of the function into the JDATA state
>>>> clearing block.
>>>
>>> IIUC, delalloc and data=journal mode are both mutually exclusive.
>>> So we could get rid of calling filemap_write_and_wait() all together
>>> from ext4_bmap().
>>> And as you pointed filemap_write_and_wait() is called by default in
>>> iomap_bmap which should cover for delalloc case.
>>>
>>>
>>> @Jan/Darrick,
>>> Could you check if the attached patch looks good. If yes then
>>> will add your Reviewed-by and send a v6.
>>>
>>> Thanks for the review!!
>>>
>>> -ritesh
>>>
>>>
>>
>>>  From 93f560d9a483b4f389056e543012d0941734a8f4 Mon Sep 17 00:00:00 2001
>>> From: Ritesh Harjani <riteshh@linux.ibm.com>
>>> Date: Tue, 20 Aug 2019 18:36:33 +0530
>>> Subject: [PATCH 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
>>>
>>> ext4_iomap_begin is already implemented which provides ext4_map_blocks,
>>> so just move the API from generic_block_bmap to iomap_bmap for iomap
>>> conversion.
>>>
>>> Also no need to call for filemap_write_and_wait() any more in ext4_bmap
>>> since data=journal mode anyway doesn't support delalloc and for all other
>>> cases iomap_bmap() anyway calls the same function, so no need for doing
>>> it twice.
>>>
>>> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>>
>> Hmmm.  I don't recall how jdata actually works, but I get the impression
>> here that we're trying to flush dirty data out to the journal and then
>> out to disk, and then drop the JDATA state from the inode.  This
>> mechanism exists (I guess?) so that dirty file pages get checkpointed
>> out of jbd2 back into the filesystem so that bmap() returns meaningful
>> results to lilo.
> 
> Exactly. E.g. when we are journalling data, we fill hole through mmap, we will
> have block allocated as unwritten and we need to write it out so that the
> data gets to the journal and then do journal flush to get the data to disk

So in data=journal case in ext4_page_mkwrite the data buffer will also
be marked as, to be journalled. So does jbd2_journal_flush() itself
don't take care of writing back any dirty page cache before it commit
that transaction? and after then checkpoint it?

Sorry my knowledge about jbd2 is very naive.

> so that lilo can read it from the devices. So removing
> filemap_write_and_wait() when journalling data is wrong.

Sure I understand this part. But was just curious on above query.
Otherwise, IIUC, we will have to add
filemap_write_and_wait() for JDATA case as well before calling
for jbd2_journal_flush(). Will add this as a separate patch.


-ritesh

> 
>> This makes me wonder if you still need the filemap_write_and_wait in the
>> JDATA case because otherwise the journal flush won't have the effect of
>> writing all the dirty pagecache back to the filesystem?  OTOH I suppose
>> the implicit write-and-wait call after we clear JDATA will not be
>> writing to the journal.
>>
>> Even more weirdly, the FIEMAP code doesn't drop JDATA at all...?
> 
> Yeah, it should do that but that's only performance optimization so that we
> bother with journal flushing only when someone uses block mapping call on
> a file with journalled dirty data. So you can hardly notice the bug by
> testing...
> 
> 								Honza
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-03-06 17:49           ` Ritesh Harjani
@ 2020-03-07  0:51             ` Darrick J. Wong
  2020-03-07  5:50               ` Ritesh Harjani
  0 siblings, 1 reply; 27+ messages in thread
From: Darrick J. Wong @ 2020-03-07  0:51 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, linux-fsdevel, hch,
	cmaiolino, david

On Fri, Mar 06, 2020 at 11:19:31PM +0530, Ritesh Harjani wrote:
> 
> 
> On 3/4/20 6:12 PM, Jan Kara wrote:
> > On Tue 03-03-20 07:47:09, Darrick J. Wong wrote:
> > > On Mon, Mar 02, 2020 at 02:28:39PM +0530, Ritesh Harjani wrote:
> > > > 
> > > > 
> > > > On 2/28/20 8:55 PM, Darrick J. Wong wrote:
> > > > > On Fri, Feb 28, 2020 at 02:56:56PM +0530, Ritesh Harjani wrote:
> > > > > > ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> > > > > > so just move the API from generic_block_bmap to iomap_bmap for iomap
> > > > > > conversion.
> > > > > > 
> > > > > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > > > > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > > > > ---
> > > > > >    fs/ext4/inode.c | 2 +-
> > > > > >    1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > 
> > > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > > > index 6cf3b969dc86..81fccbae0aea 100644
> > > > > > --- a/fs/ext4/inode.c
> > > > > > +++ b/fs/ext4/inode.c
> > > > > > @@ -3214,7 +3214,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
> > > > > >    			return 0;
> > > > > >    	}
> > > > > > -	return generic_block_bmap(mapping, block, ext4_get_block);
> > > > > > +	return iomap_bmap(mapping, block, &ext4_iomap_ops);
> > > > > 
> > > > > /me notes that iomap_bmap will filemap_write_and_wait for you, so one
> > > > > could optimize ext4_bmap to avoid the double-flush by moving the
> > > > > filemap_write_and_wait at the top of the function into the JDATA state
> > > > > clearing block.
> > > > 
> > > > IIUC, delalloc and data=journal mode are both mutually exclusive.
> > > > So we could get rid of calling filemap_write_and_wait() all together
> > > > from ext4_bmap().
> > > > And as you pointed filemap_write_and_wait() is called by default in
> > > > iomap_bmap which should cover for delalloc case.
> > > > 
> > > > 
> > > > @Jan/Darrick,
> > > > Could you check if the attached patch looks good. If yes then
> > > > will add your Reviewed-by and send a v6.
> > > > 
> > > > Thanks for the review!!
> > > > 
> > > > -ritesh
> > > > 
> > > > 
> > > 
> > > >  From 93f560d9a483b4f389056e543012d0941734a8f4 Mon Sep 17 00:00:00 2001
> > > > From: Ritesh Harjani <riteshh@linux.ibm.com>
> > > > Date: Tue, 20 Aug 2019 18:36:33 +0530
> > > > Subject: [PATCH 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
> > > > 
> > > > ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> > > > so just move the API from generic_block_bmap to iomap_bmap for iomap
> > > > conversion.
> > > > 
> > > > Also no need to call for filemap_write_and_wait() any more in ext4_bmap
> > > > since data=journal mode anyway doesn't support delalloc and for all other
> > > > cases iomap_bmap() anyway calls the same function, so no need for doing
> > > > it twice.
> > > > 
> > > > Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> > > 
> > > Hmmm.  I don't recall how jdata actually works, but I get the impression
> > > here that we're trying to flush dirty data out to the journal and then
> > > out to disk, and then drop the JDATA state from the inode.  This
> > > mechanism exists (I guess?) so that dirty file pages get checkpointed
> > > out of jbd2 back into the filesystem so that bmap() returns meaningful
> > > results to lilo.
> > 
> > Exactly. E.g. when we are journalling data, we fill hole through mmap, we will
> > have block allocated as unwritten and we need to write it out so that the
> > data gets to the journal and then do journal flush to get the data to disk
> 
> So in data=journal case in ext4_page_mkwrite the data buffer will also
> be marked as, to be journalled. So does jbd2_journal_flush() itself
> don't take care of writing back any dirty page cache before it commit
> that transaction? and after then checkpoint it?

Er... this sentence is a little garbled, but I think the answer you're
looking for is:

"Yes, writeback (i.e. filemap_write_and_wait) attaches the dirty blocks
to a journal transaction; then jbd2_journal_flush forces the transaction
data out to the on-disk journal; and it also checkpoints the journal so
that the dirty blocks are then written back into the filesystem."

> Sorry my knowledge about jbd2 is very naive.
> 
> > so that lilo can read it from the devices. So removing
> > filemap_write_and_wait() when journalling data is wrong.
> 
> Sure I understand this part. But was just curious on above query.
> Otherwise, IIUC, we will have to add
> filemap_write_and_wait() for JDATA case as well before calling
> for jbd2_journal_flush(). Will add this as a separate patch.

Well you could just move it...

bmap()
{
	/*
	 * In data=journal mode, we must checkpoint the journal to
	 * ensure that any dirty blocks in the journalare checkpointed
	 * to the location that we return to userspace.  Clear JDATA so
	 * that future writes will not be written through the journal.
	 */
	if (JDATA) {
		filemap_write_and_wait(...);
		clear JDATA
		jbd2_journal_flush(...);
	}

	return iomap_bmap(...);
}

(or did "Will add this as a separate patch" refer to fixing FIEMAP?)

--D

> 
> -ritesh
> 
> > 
> > > This makes me wonder if you still need the filemap_write_and_wait in the
> > > JDATA case because otherwise the journal flush won't have the effect of
> > > writing all the dirty pagecache back to the filesystem?  OTOH I suppose
> > > the implicit write-and-wait call after we clear JDATA will not be
> > > writing to the journal.
> > > 
> > > Even more weirdly, the FIEMAP code doesn't drop JDATA at all...?
> > 
> > Yeah, it should do that but that's only performance optimization so that we
> > bother with journal flushing only when someone uses block mapping call on
> > a file with journalled dirty data. So you can hardly notice the bug by
> > testing...
> > 
> > 								Honza
> > 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-03-04 15:37           ` Darrick J. Wong
@ 2020-03-07  2:32             ` Theodore Y. Ts'o
  0 siblings, 0 replies; 27+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-07  2:32 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, Ritesh Harjani, linux-ext4, adilger.kernel,
	linux-fsdevel, hch, cmaiolino, david

On Wed, Mar 04, 2020 at 07:37:45AM -0800, Darrick J. Wong wrote:
> > > This makes me wonder if you still need the filemap_write_and_wait in the
> > > JDATA case because otherwise the journal flush won't have the effect of
> > > writing all the dirty pagecache back to the filesystem?  OTOH I suppose
> > > the implicit write-and-wait call after we clear JDATA will not be
> > > writing to the journal.
> > > 
> > > Even more weirdly, the FIEMAP code doesn't drop JDATA at all...?
> > 
> > Yeah, it should do that but that's only performance optimization so that we
> > bother with journal flushing only when someone uses block mapping call on
> > a file with journalled dirty data. So you can hardly notice the bug by
> > testing...
> 
> If we ever decide to deprecate FIBMAP officially and push bootloaders to
> use FIEMAP, then we'll have to emulate all the flushing behaviors.  But
> that's something for a separate patch.

This is really only needed for LILO, since I believe this is the only
bootloader which uses the output of FIBMAP to determine the block
number where it will attempt to ***write*** into a data block of a
mounted file system.

I seem to recall either Dave or Christoph ranting at one point that
any program which attempted to write into a mounted file system using
the output of FIEMAP was insane, and we should not be encouraging that
kind of wacko behavior.  :-)

What most bootloaders want is simply the accurate list of block
locations so they can write that into the stage 1 bootloader so it can
read the stage 2 bootloader from the disk.  The reason why we have the
JDATA hack in the bmap code is because LILO will get the block
location, and then try to write config information into that block.
So we are trying to prevent LILO's write of the boot command line from
possibly getting rewritten after a journal replay.  (Of course, no
distribution installer would do something as rude as to just forcibly
rebooting the system without a clean unmount, so this would *never* be
a problem, RIGHT?  :-)

In any case, I'd much rather try to get LILO fixed to do something
sane, rather that move that heavy-ugly JDATA code into FIEMAP, where
it might get triggered unnecessarily by 99.9% of the users who are
doing something not-insane.

							- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-03-07  0:51             ` Darrick J. Wong
@ 2020-03-07  5:50               ` Ritesh Harjani
  0 siblings, 0 replies; 27+ messages in thread
From: Ritesh Harjani @ 2020-03-07  5:50 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, linux-fsdevel, hch,
	cmaiolino, david



On 3/7/20 6:21 AM, Darrick J. Wong wrote:
> On Fri, Mar 06, 2020 at 11:19:31PM +0530, Ritesh Harjani wrote:
>>
>>
>> On 3/4/20 6:12 PM, Jan Kara wrote:
>>> On Tue 03-03-20 07:47:09, Darrick J. Wong wrote:
>>>> On Mon, Mar 02, 2020 at 02:28:39PM +0530, Ritesh Harjani wrote:
>>>>>
>>>>>
>>>>> On 2/28/20 8:55 PM, Darrick J. Wong wrote:
>>>>>> On Fri, Feb 28, 2020 at 02:56:56PM +0530, Ritesh Harjani wrote:
>>>>>>> ext4_iomap_begin is already implemented which provides ext4_map_blocks,
>>>>>>> so just move the API from generic_block_bmap to iomap_bmap for iomap
>>>>>>> conversion.
>>>>>>>
>>>>>>> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>>>>>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>>>>>> ---
>>>>>>>     fs/ext4/inode.c | 2 +-
>>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>>>>>>> index 6cf3b969dc86..81fccbae0aea 100644
>>>>>>> --- a/fs/ext4/inode.c
>>>>>>> +++ b/fs/ext4/inode.c
>>>>>>> @@ -3214,7 +3214,7 @@ static sector_t ext4_bmap(struct address_space *mapping, sector_t block)
>>>>>>>     			return 0;
>>>>>>>     	}
>>>>>>> -	return generic_block_bmap(mapping, block, ext4_get_block);
>>>>>>> +	return iomap_bmap(mapping, block, &ext4_iomap_ops);
>>>>>>
>>>>>> /me notes that iomap_bmap will filemap_write_and_wait for you, so one
>>>>>> could optimize ext4_bmap to avoid the double-flush by moving the
>>>>>> filemap_write_and_wait at the top of the function into the JDATA state
>>>>>> clearing block.
>>>>>
>>>>> IIUC, delalloc and data=journal mode are both mutually exclusive.
>>>>> So we could get rid of calling filemap_write_and_wait() all together
>>>>> from ext4_bmap().
>>>>> And as you pointed filemap_write_and_wait() is called by default in
>>>>> iomap_bmap which should cover for delalloc case.
>>>>>
>>>>>
>>>>> @Jan/Darrick,
>>>>> Could you check if the attached patch looks good. If yes then
>>>>> will add your Reviewed-by and send a v6.
>>>>>
>>>>> Thanks for the review!!
>>>>>
>>>>> -ritesh
>>>>>
>>>>>
>>>>
>>>>>   From 93f560d9a483b4f389056e543012d0941734a8f4 Mon Sep 17 00:00:00 2001
>>>>> From: Ritesh Harjani <riteshh@linux.ibm.com>
>>>>> Date: Tue, 20 Aug 2019 18:36:33 +0530
>>>>> Subject: [PATCH 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
>>>>>
>>>>> ext4_iomap_begin is already implemented which provides ext4_map_blocks,
>>>>> so just move the API from generic_block_bmap to iomap_bmap for iomap
>>>>> conversion.
>>>>>
>>>>> Also no need to call for filemap_write_and_wait() any more in ext4_bmap
>>>>> since data=journal mode anyway doesn't support delalloc and for all other
>>>>> cases iomap_bmap() anyway calls the same function, so no need for doing
>>>>> it twice.
>>>>>
>>>>> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
>>>>
>>>> Hmmm.  I don't recall how jdata actually works, but I get the impression
>>>> here that we're trying to flush dirty data out to the journal and then
>>>> out to disk, and then drop the JDATA state from the inode.  This
>>>> mechanism exists (I guess?) so that dirty file pages get checkpointed
>>>> out of jbd2 back into the filesystem so that bmap() returns meaningful
>>>> results to lilo.
>>>
>>> Exactly. E.g. when we are journalling data, we fill hole through mmap, we will
>>> have block allocated as unwritten and we need to write it out so that the
>>> data gets to the journal and then do journal flush to get the data to disk
>>
>> So in data=journal case in ext4_page_mkwrite the data buffer will also
>> be marked as, to be journalled. So does jbd2_journal_flush() itself
>> don't take care of writing back any dirty page cache before it commit
>> that transaction? and after then checkpoint it?
> 
> Er... this sentence is a little garbled, but I think the answer you're
> looking for is:
> 
> "Yes, writeback (i.e. filemap_write_and_wait) attaches the dirty blocks
> to a journal transaction; then jbd2_journal_flush forces the transaction
> data out to the on-disk journal; and it also checkpoints the journal so
> that the dirty blocks are then written back into the filesystem."

Yes. Thanks.


> 
>> Sorry my knowledge about jbd2 is very naive.
>>
>>> so that lilo can read it from the devices. So removing
>>> filemap_write_and_wait() when journalling data is wrong.
>>
>> Sure I understand this part. But was just curious on above query.
>> Otherwise, IIUC, we will have to add
>> filemap_write_and_wait() for JDATA case as well before calling
>> for jbd2_journal_flush(). Will add this as a separate patch.
> 
> Well you could just move it...
> 
> bmap()
> {
> 	/*
> 	 * In data=journal mode, we must checkpoint the journal to
> 	 * ensure that any dirty blocks in the journalare checkpointed
> 	 * to the location that we return to userspace.  Clear JDATA so
> 	 * that future writes will not be written through the journal.
> 	 */
> 	if (JDATA) {
> 		filemap_write_and_wait(...);
> 		clear JDATA
> 		jbd2_journal_flush(...);
> 	}
> 
> 	return iomap_bmap(...);
> }
> 

> (or did "Will add this as a separate patch" refer to fixing FIEMAP?)
No.

What I meant was if filemap_write_and_wait() is required for JDATA case
then the above diff which you just showed, I will add as a separate
patch before moving ext4_bmap() to use iomap_bmap(). i.e. rather then
clubbing it with Patch-3, it will be a separate patch before patch-3.

Sorry about the confusion.

-ritesh

> 
> --D
> 
>>
>> -ritesh
>>
>>>
>>>> This makes me wonder if you still need the filemap_write_and_wait in the
>>>> JDATA case because otherwise the journal flush won't have the effect of
>>>> writing all the dirty pagecache back to the filesystem?  OTOH I suppose
>>>> the implicit write-and-wait call after we clear JDATA will not be
>>>> writing to the journal.
>>>>
>>>> Even more weirdly, the FIEMAP code doesn't drop JDATA at all...?
>>>
>>> Yeah, it should do that but that's only performance optimization so that we
>>> bother with journal flushing only when someone uses block mapping call on
>>> a file with journalled dirty data. So you can hardly notice the bug by
>>> testing...
>>>
>>> 								Honza
>>>
>>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 1/6] ext4: Add IOMAP_F_MERGED for non-extent based mapping
  2020-02-28  9:26 ` [PATCHv5 1/6] ext4: Add IOMAP_F_MERGED for non-extent based mapping Ritesh Harjani
  2020-02-28 15:26   ` Darrick J. Wong
@ 2020-03-13 19:52   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 27+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-13 19:52 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, adilger.kernel, linux-fsdevel, darrick.wong,
	hch, cmaiolino, david

On Fri, Feb 28, 2020 at 02:56:54PM +0530, Ritesh Harjani wrote:
> IOMAP_F_MERGED needs to be set in case of non-extent based mapping.
> This is needed in later patches for conversion of ext4_fiemap to use iomap.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Thanks, applied.

					- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 2/6] ext4: Optimize ext4_ext_precache for 0 depth
  2020-02-28  9:26 ` [PATCHv5 2/6] ext4: Optimize ext4_ext_precache for 0 depth Ritesh Harjani
@ 2020-03-13 20:05   ` Theodore Y. Ts'o
  0 siblings, 0 replies; 27+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-13 20:05 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, adilger.kernel, linux-fsdevel, darrick.wong,
	hch, cmaiolino, david

On Fri, Feb 28, 2020 at 02:56:55PM +0530, Ritesh Harjani wrote:
> This patch avoids the memory alloc & free path when depth is 0,
> since anyway there is no extra caching done in that case.
> So on checking depth 0, simply return early.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Applied, thanks.

					- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure.
  2020-02-28  9:26 ` [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure Ritesh Harjani
  2020-02-28 15:25   ` Darrick J. Wong
@ 2020-03-13 20:16   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 27+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-13 20:16 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, adilger.kernel, linux-fsdevel, darrick.wong,
	hch, cmaiolino, david

On Fri, Feb 28, 2020 at 02:56:56PM +0530, Ritesh Harjani wrote:
> ext4_iomap_begin is already implemented which provides ext4_map_blocks,
> so just move the API from generic_block_bmap to iomap_bmap for iomap
> conversion.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Applied, thanks.

					- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 4/6] ext4: Make ext4_ind_map_blocks work with fiemap
  2020-02-28  9:26 ` [PATCHv5 4/6] ext4: Make ext4_ind_map_blocks work with fiemap Ritesh Harjani
@ 2020-03-13 20:18   ` Theodore Y. Ts'o
  0 siblings, 0 replies; 27+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-13 20:18 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, adilger.kernel, linux-fsdevel, darrick.wong,
	hch, cmaiolino, david

On Fri, Feb 28, 2020 at 02:56:57PM +0530, Ritesh Harjani wrote:
> For indirect block mapping if the i_block > max supported block in inode
> then ext4_ind_map_blocks() returns a -EIO error. But in case of fiemap
> this could be a valid query to ->iomap_begin call.
> So check if the offset >= s_bitmap_maxbytes in ext4_iomap_begin_report(),
> then simply skip calling ext4_map_blocks().
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Thanks, applied.

				- Ted

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 5/6] ext4: Move ext4_fiemap to use iomap framework.
  2020-02-28  9:26 ` [PATCHv5 5/6] ext4: Move ext4_fiemap to use iomap framework Ritesh Harjani
  2020-02-28 15:21   ` Darrick J. Wong
@ 2020-03-14  3:03   ` Theodore Y. Ts'o
  1 sibling, 0 replies; 27+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-14  3:03 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-ext4, jack, adilger.kernel, linux-fsdevel, darrick.wong,
	hch, cmaiolino, david, kbuild test robot

On Fri, Feb 28, 2020 at 02:56:58PM +0530, Ritesh Harjani wrote:
> This patch moves ext4_fiemap to use iomap framework.
> For xattr a new 'ext4_iomap_xattr_ops' is added.
> 
> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
> Reported-by: kbuild test robot <lkp@intel.com>
> Reviewed-by: Jan Kara <jack@suse.cz>

Thanks, applied.

					- Ted
					

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST
  2020-03-02  8:10     ` Ritesh Harjani
@ 2020-03-14  3:48       ` Theodore Y. Ts'o
  0 siblings, 0 replies; 27+ messages in thread
From: Theodore Y. Ts'o @ 2020-03-14  3:48 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Matthew Wilcox, linux-ext4, jack, adilger.kernel, linux-fsdevel,
	darrick.wong, hch, cmaiolino, david

On Mon, Mar 02, 2020 at 01:40:06PM +0530, Ritesh Harjani wrote:
> 
> Thanks for the review.
> Will make the suggested changes and send a v6.

I didn't see a v6, so I revised this patch to read:

commit 499800830ae5d44ae29f69c98ab9893f0425cb51
Author: Ritesh Harjani <riteshh@linux.ibm.com>
Date:   Fri Feb 28 14:56:59 2020 +0530

    Documentation: correct the description of FIEMAP_EXTENT_LAST
    
    Currently FIEMAP_EXTENT_LAST is not working consistently across
    different filesystem's fiemap implementations. So add more information
    about how else this flag could set in other implementation.
    
    Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
    Link: https://lore.kernel.org/r/5a00e8d4283d6849e0b8f408c8365b31fbc1d153.1582880246.git.riteshh@linux.ibm.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>

diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt
index f6d9c99103a4..ac87e6fda842 100644
--- a/Documentation/filesystems/fiemap.txt
+++ b/Documentation/filesystems/fiemap.txt
@@ -115,8 +115,10 @@ data. Note that the opposite is not true - it would be valid for
 FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
 
 * FIEMAP_EXTENT_LAST
-This is the last extent in the file. A mapping attempt past this
-extent will return nothing.
+This is generally the last extent in the file. A mapping attempt past
+this extent may return nothing. Some implementations set this flag to
+indicate this extent is the last one in the range queried by the user
+(via fiemap->fm_length).
 
 * FIEMAP_EXTENT_UNKNOWN
 The location of this extent is currently unknown. This may indicate

     	      	      	     		  	   - Ted

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-03-14  3:49 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28  9:26 [PATCHv5 0/6] ext4: bmap & fiemap conversion to use iomap Ritesh Harjani
2020-02-28  9:26 ` [PATCHv5 1/6] ext4: Add IOMAP_F_MERGED for non-extent based mapping Ritesh Harjani
2020-02-28 15:26   ` Darrick J. Wong
2020-03-13 19:52   ` Theodore Y. Ts'o
2020-02-28  9:26 ` [PATCHv5 2/6] ext4: Optimize ext4_ext_precache for 0 depth Ritesh Harjani
2020-03-13 20:05   ` Theodore Y. Ts'o
2020-02-28  9:26 ` [PATCHv5 3/6] ext4: Move ext4 bmap to use iomap infrastructure Ritesh Harjani
2020-02-28 15:25   ` Darrick J. Wong
2020-03-02  8:58     ` Ritesh Harjani
2020-03-03 15:47       ` Darrick J. Wong
2020-03-04 12:42         ` Jan Kara
2020-03-04 15:37           ` Darrick J. Wong
2020-03-07  2:32             ` Theodore Y. Ts'o
2020-03-06 17:49           ` Ritesh Harjani
2020-03-07  0:51             ` Darrick J. Wong
2020-03-07  5:50               ` Ritesh Harjani
2020-03-13 20:16   ` Theodore Y. Ts'o
2020-02-28  9:26 ` [PATCHv5 4/6] ext4: Make ext4_ind_map_blocks work with fiemap Ritesh Harjani
2020-03-13 20:18   ` Theodore Y. Ts'o
2020-02-28  9:26 ` [PATCHv5 5/6] ext4: Move ext4_fiemap to use iomap framework Ritesh Harjani
2020-02-28 15:21   ` Darrick J. Wong
2020-03-14  3:03   ` Theodore Y. Ts'o
2020-02-28  9:26 ` [PATCHv5 6/6] Documentation: Correct the description of FIEMAP_EXTENT_LAST Ritesh Harjani
2020-02-28 15:20   ` Darrick J. Wong
2020-02-28 15:36   ` Matthew Wilcox
2020-03-02  8:10     ` Ritesh Harjani
2020-03-14  3:48       ` Theodore Y. Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).