linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFCv3 00/10] ext2: DIO to use iomap
@ 2023-04-13  8:40 Ritesh Harjani (IBM)
  2023-04-13  8:40 ` [RFCv3 01/10] ext2/dax: Fix ext2_setsize when len is page aligned Ritesh Harjani (IBM)
                   ` (9 more replies)
  0 siblings, 10 replies; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

Please find the series which moves ext2 direct-io to use modern iomap interface.

RFCv2 -> RFCv3:
1. Addressed minor review comments related to extern, parameter naming in
   function declaration, removing not required braces and shorting overly long
   lines.
2. Added Reviewed-by from various reviewers.
3. Fixed a warning & couple of compilation errors in Patch-7 (ext2 trace points)
   related to CFLAGS_trace & second related to unable to find function
   definition for iov_iter_count(). (requires uio.h file)
   CFLAGS_trace is required in Makefile so that it can find trace.h file from
   tracepoint infrastructure.
4. Changed naming of IOCB_STRINGS TO TRACE_IOCB_STRINGS.
5. Shortened naming of tracepoint events for ext2 dio.
6. Added iomap DIO tracepoint events.
7. Disha tested this series internally against Power with "auto" group for 4k
   and 64k blocksize configuration. Added her "Tested-by" tag in all DIO
   related patches. No new failures were reported.

Thanks everyone for the review and test. The series is looking good to me now.
It has now been tested on x86 and Power with different configurations.
Please let me know if anything else is required on this.

v2: https://lore.kernel.org/all/ZDTybcM4kjYLSrGI@infradead.org/

Ritesh Harjani (IBM) (10):
  ext2/dax: Fix ext2_setsize when len is page aligned
  libfs: Add __generic_file_fsync_nolock implementation
  ext4: Use __generic_file_fsync_nolock implementation
  ext2: Use __generic_file_fsync_nolock implementation
  ext2: Move direct-io to use iomap
  fs.h: Add TRACE_IOCB_STRINGS for use in trace points
  ext2: Add direct-io trace points
  iomap: Remove IOMAP_DIO_NOSYNC unused dio flag
  iomap: Minor refactor of iomap_dio_rw
  iomap: Add trace points for DIO path

 fs/ext2/Makefile      |   5 +-
 fs/ext2/ext2.h        |   1 +
 fs/ext2/file.c        | 127 +++++++++++++++++++++++++++++++++++++++++-
 fs/ext2/inode.c       |  58 +++++++++++--------
 fs/ext2/trace.c       |   6 ++
 fs/ext2/trace.h       |  94 +++++++++++++++++++++++++++++++
 fs/ext4/fsync.c       |  31 +++++------
 fs/iomap/direct-io.c  |  16 ++++--
 fs/iomap/trace.c      |   1 +
 fs/iomap/trace.h      |  90 ++++++++++++++++++++++++++++++
 fs/libfs.c            |  43 ++++++++++++++
 include/linux/fs.h    |  16 ++++++
 include/linux/iomap.h |   6 --
 13 files changed, 444 insertions(+), 50 deletions(-)
 create mode 100644 fs/ext2/trace.c
 create mode 100644 fs/ext2/trace.h

--
2.39.2


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFCv3 01/10] ext2/dax: Fix ext2_setsize when len is page aligned
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-13  8:40 ` [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation Ritesh Harjani (IBM)
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

PAGE_ALIGN(x) macro gives the next highest value which is multiple of
pagesize. But if x is already page aligned then it simply returns x.
So, if x passed is 0 in dax_zero_range() function, that means the
length gets passed as 0 to ->iomap_begin().

In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on
here in ext2_get_blocks().
	BUG_ON(maxblocks == 0);

Instead we should be calling dax_truncate_page() here which takes
care of it. i.e. it only calls dax_zero_range if the offset is not
page/block aligned.

This can be easily triggered with following on fsdax mounted pmem
device.

dd if=/dev/zero of=file count=1 bs=512
truncate -s 0 file

[79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff)
[93.793207] ------------[ cut here ]------------
[93.795102] kernel BUG at fs/ext2/inode.c:637!
[93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 #139
[93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610
<...>
[93.835298] Call Trace:
[93.836253]  <TASK>
[93.837103]  ? lock_acquire+0xf8/0x110
[93.838479]  ? d_lookup+0x69/0xd0
[93.839779]  ext2_iomap_begin+0xa7/0x1c0
[93.841154]  iomap_iter+0xc7/0x150
[93.842425]  dax_zero_range+0x6e/0xa0
[93.843813]  ext2_setsize+0x176/0x1b0
[93.845164]  ext2_setattr+0x151/0x200
[93.846467]  notify_change+0x341/0x4e0
[93.847805]  ? lock_acquire+0xf8/0x110
[93.849143]  ? do_truncate+0x74/0xe0
[93.850452]  ? do_truncate+0x84/0xe0
[93.851739]  do_truncate+0x84/0xe0
[93.852974]  do_sys_ftruncate+0x2b4/0x2f0
[93.854404]  do_syscall_64+0x3f/0x90
[93.855789]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext2/inode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 26f135e7ffce..dc76147e7b07 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1259,9 +1259,8 @@ static int ext2_setsize(struct inode *inode, loff_t newsize)
 	inode_dio_wait(inode);
 
 	if (IS_DAX(inode))
-		error = dax_zero_range(inode, newsize,
-				       PAGE_ALIGN(newsize) - newsize, NULL,
-				       &ext2_iomap_ops);
+		error = dax_truncate_page(inode, newsize, NULL,
+					  &ext2_iomap_ops);
 	else
 		error = block_truncate_page(inode->i_mapping,
 				newsize, ext2_get_block);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
  2023-04-13  8:40 ` [RFCv3 01/10] ext2/dax: Fix ext2_setsize when len is page aligned Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-14  5:59   ` Christoph Hellwig
  2023-04-13  8:40 ` [RFCv3 03/10] ext4: Use " Ritesh Harjani (IBM)
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

Some of the higher layers like iomap takes inode_lock() when calling
generic_write_sync().
Also writeback already happens from other paths without inode lock,
so it's difficult to say that we really need sync_mapping_buffers() to
take any inode locking here. Having said that, let's add a _nolock
variant of this function in libfs for now so that filesystems like
ext2 and ext4's nojournal mode can use it.

Ext4 when got converted to iomap for direct-io already copied it's own
variant of __generic_file_fsync() without lock. Hence let's add a helper
API and use it both in ext2 and ext4.

Later we can review other filesystems as well to see if we can make
_nolock as the default path if inode_lock() is not necessary here.

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/libfs.c         | 43 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  2 ++
 2 files changed, 45 insertions(+)

diff --git a/fs/libfs.c b/fs/libfs.c
index 4eda519c3002..054f2e0ab3cb 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1110,6 +1110,49 @@ struct dentry *generic_fh_to_parent(struct super_block *sb, struct fid *fid,
 }
 EXPORT_SYMBOL_GPL(generic_fh_to_parent);
 
+/**
+ * __generic_file_fsync_nolock - generic fsync implementation for simple
+ * filesystems with no inode lock
+ *
+ * @file:	file to synchronize
+ * @start:	start offset in bytes
+ * @end:	end offset in bytes (inclusive)
+ * @datasync:	only synchronize essential metadata if true
+ *
+ * This is a generic implementation of the fsync method for simple
+ * filesystems which track all non-inode metadata in the buffers list
+ * hanging off the address_space structure.
+ */
+int __generic_file_fsync_nolock(struct file *file, loff_t start, loff_t end,
+				bool datasync)
+{
+	struct inode *inode = file->f_mapping->host;
+	int err;
+	int ret;
+
+	err = file_write_and_wait_range(file, start, end);
+	if (err)
+		return err;
+
+	ret = sync_mapping_buffers(inode->i_mapping);
+	if (!(inode->i_state & I_DIRTY_ALL))
+		goto out;
+	if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
+		goto out;
+
+	err = sync_inode_metadata(inode, 1);
+	if (ret == 0)
+		ret = err;
+
+out:
+	/* check and advance again to catch errors after syncing out buffers */
+	err = file_check_and_advance_wb_err(file);
+	if (ret == 0)
+		ret = err;
+	return ret;
+}
+EXPORT_SYMBOL(__generic_file_fsync_nolock);
+
 /**
  * __generic_file_fsync - generic fsync implementation for simple filesystems
  *
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c85916e9f7db..9ca3813f43e2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2935,6 +2935,8 @@ extern ssize_t simple_read_from_buffer(void __user *to, size_t count,
 extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
 		const void __user *from, size_t count);
 
+int __generic_file_fsync_nolock(struct file *file, loff_t start, loff_t end,
+				bool datasync);
 extern int __generic_file_fsync(struct file *, loff_t, loff_t, int);
 extern int generic_file_fsync(struct file *, loff_t, loff_t, int);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFCv3 03/10] ext4: Use __generic_file_fsync_nolock implementation
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
  2023-04-13  8:40 ` [RFCv3 01/10] ext2/dax: Fix ext2_setsize when len is page aligned Ritesh Harjani (IBM)
  2023-04-13  8:40 ` [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-13  8:40 ` [RFCv3 04/10] ext2: " Ritesh Harjani (IBM)
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

ext4 when got converted to iomap for dio, it copied __generic_file_fsync
implementation to avoid taking inode_lock in order to avoid any deadlock
(since iomap takes an inode_lock while calling generic_write_sync()).

The previous patch already added it's _nolock variant. Hence kill the
redundant code and use __generic_file_fsync_nolock() function instead.

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/fsync.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 027a7d7037a0..bf1d1b7f4ec7 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -78,21 +78,13 @@ static int ext4_sync_parent(struct inode *inode)
 	return ret;
 }
 
-static int ext4_fsync_nojournal(struct inode *inode, bool datasync,
-				bool *needs_barrier)
+static int ext4_fsync_nojournal(struct file *file, loff_t start, loff_t end,
+				int datasync, bool *needs_barrier)
 {
-	int ret, err;
-
-	ret = sync_mapping_buffers(inode->i_mapping);
-	if (!(inode->i_state & I_DIRTY_ALL))
-		return ret;
-	if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
-		return ret;
-
-	err = sync_inode_metadata(inode, 1);
-	if (!ret)
-		ret = err;
+	struct inode *inode = file->f_inode;
+	int ret;
 
+	ret = __generic_file_fsync_nolock(file, start, end, datasync);
 	if (!ret)
 		ret = ext4_sync_parent(inode);
 	if (test_opt(inode->i_sb, BARRIER))
@@ -148,6 +140,14 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 	}
 
+	if (!sbi->s_journal) {
+		ret = ext4_fsync_nojournal(file, start, end, datasync,
+					   &needs_barrier);
+		if (needs_barrier)
+			goto issue_flush;
+		goto out;
+	}
+
 	ret = file_write_and_wait_range(file, start, end);
 	if (ret)
 		goto out;
@@ -166,13 +166,12 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 *  (they were dirtied by commit).  But that's OK - the blocks are
 	 *  safe in-journal, which is all fsync() needs to ensure.
 	 */
-	if (!sbi->s_journal)
-		ret = ext4_fsync_nojournal(inode, datasync, &needs_barrier);
-	else if (ext4_should_journal_data(inode))
+	if (ext4_should_journal_data(inode))
 		ret = ext4_force_commit(inode->i_sb);
 	else
 		ret = ext4_fsync_journal(inode, datasync, &needs_barrier);
 
+issue_flush:
 	if (needs_barrier) {
 		err = blkdev_issue_flush(inode->i_sb->s_bdev);
 		if (!ret)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFCv3 04/10] ext2: Use __generic_file_fsync_nolock implementation
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
                   ` (2 preceding siblings ...)
  2023-04-13  8:40 ` [RFCv3 03/10] ext4: Use " Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-13  8:40 ` [RFCv3 05/10] ext2: Move direct-io to use iomap Ritesh Harjani (IBM)
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

Next patch converts ext2 to use iomap interface for DIO.
iomap layer can call generic_write_sync() -> ext2_fsync() from
iomap_dio_complete while still holding the inode_lock().

Now writeback from other paths doesn't need inode_lock().
It seems there is also no need of an inode_lock for
sync_mapping_buffers(). It uses it's own mapping->private_lock
for it's buffer list handling.
Hence this patch is in preparation to move ext2 to iomap.
This uses __generic_file_fsync_nolock() variant in ext2_fsync().

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext2/file.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 6b4bebe982ca..1d0bc3fc88bb 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -153,7 +153,9 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	int ret;
 	struct super_block *sb = file->f_mapping->host->i_sb;
 
-	ret = generic_file_fsync(file, start, end, datasync);
+	ret = __generic_file_fsync_nolock(file, start, end, datasync);
+	if (!ret)
+		ret = blkdev_issue_flush(sb->s_bdev);
 	if (ret == -EIO)
 		/* We don't really know where the IO error happened... */
 		ext2_error(sb, __func__,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFCv3 05/10] ext2: Move direct-io to use iomap
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
                   ` (3 preceding siblings ...)
  2023-04-13  8:40 ` [RFCv3 04/10] ext2: " Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-13  8:40 ` [RFCv3 06/10] fs.h: Add TRACE_IOCB_STRINGS for use in trace points Ritesh Harjani (IBM)
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

This patch converts ext2 direct-io path to iomap interface.
- This also takes care of DIO_SKIP_HOLES part in which we return -ENOTBLK
  from ext2_iomap_begin(), in case if the write is done on a hole.
- This fallbacks to buffered-io in case of DIO_SKIP_HOLES or in case of
  a partial write or if any error is detected in ext2_iomap_end().
  We try to return -ENOTBLK in such cases.
- For any unaligned or extending DIO writes, we pass
  IOMAP_DIO_FORCE_WAIT flag to ensure synchronous writes.
- For extending writes we set IOMAP_F_DIRTY in ext2_iomap_begin because
  otherwise with dsync writes on devices that support FUA, generic_write_sync
  won't be called and we might miss inode metadata updates.
- Since ext2 already now uses _nolock vartiant of sync write. Hence
  there is no inode lock problem with iomap in this patch.
- ext2_iomap_ops are now being shared by DIO, DAX & fiemap path

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext2/ext2.h  |   1 +
 fs/ext2/file.c  | 115 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/ext2/inode.c |  53 ++++++++++++++--------
 3 files changed, 150 insertions(+), 19 deletions(-)

diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h
index cb78d7dcfb95..00168447d48c 100644
--- a/fs/ext2/ext2.h
+++ b/fs/ext2/ext2.h
@@ -753,6 +753,7 @@ extern unsigned long ext2_count_free (struct buffer_head *, unsigned);
 extern struct inode *ext2_iget (struct super_block *, unsigned long);
 extern int ext2_write_inode (struct inode *, struct writeback_control *);
 extern void ext2_evict_inode(struct inode *);
+void ext2_write_failed(struct address_space *mapping, loff_t to);
 extern int ext2_get_block(struct inode *, sector_t, struct buffer_head *, int);
 extern int ext2_setattr (struct mnt_idmap *, struct dentry *, struct iattr *);
 extern int ext2_getattr (struct mnt_idmap *, const struct path *,
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 1d0bc3fc88bb..243395616599 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -163,12 +163,124 @@ int ext2_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	return ret;
 }
 
+static ssize_t ext2_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	ssize_t ret;
+
+	inode_lock_shared(inode);
+	ret = iomap_dio_rw(iocb, to, &ext2_iomap_ops, NULL, 0, NULL, 0);
+	inode_unlock_shared(inode);
+
+	return ret;
+}
+
+static int ext2_dio_write_end_io(struct kiocb *iocb, ssize_t size,
+				 int error, unsigned int flags)
+{
+	loff_t pos = iocb->ki_pos;
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	if (error)
+		goto out;
+
+	/*
+	 * If we are extending the file, we have to update i_size here before
+	 * page cache gets invalidated in iomap_dio_rw(). This prevents racing
+	 * buffered reads from zeroing out too much from page cache pages.
+	 * Note that all extending writes always happens synchronously with
+	 * inode lock held by ext2_dio_write_iter(). So it is safe to update
+	 * inode size here for extending file writes.
+	 */
+	pos += size;
+	if (pos > i_size_read(inode)) {
+		i_size_write(inode, pos);
+		mark_inode_dirty(inode);
+	}
+out:
+	return error;
+}
+
+static const struct iomap_dio_ops ext2_dio_write_ops = {
+	.end_io = ext2_dio_write_end_io,
+};
+
+static ssize_t ext2_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file->f_mapping->host;
+	ssize_t ret;
+	unsigned int flags = 0;
+	unsigned long blocksize = inode->i_sb->s_blocksize;
+	loff_t offset = iocb->ki_pos;
+	loff_t count = iov_iter_count(from);
+
+	inode_lock(inode);
+	ret = generic_write_checks(iocb, from);
+	if (ret <= 0)
+		goto out_unlock;
+
+	ret = kiocb_modified(iocb);
+	if (ret)
+		goto out_unlock;
+
+	/* use IOMAP_DIO_FORCE_WAIT for unaligned or extending writes */
+	if (iocb->ki_pos + iov_iter_count(from) > i_size_read(inode) ||
+	   (!IS_ALIGNED(iocb->ki_pos | iov_iter_alignment(from), blocksize)))
+		flags |= IOMAP_DIO_FORCE_WAIT;
+
+	ret = iomap_dio_rw(iocb, from, &ext2_iomap_ops, &ext2_dio_write_ops,
+			   flags, NULL, 0);
+
+	/* ENOTBLK is magic return value for fallback to buffered-io */
+	if (ret == -ENOTBLK)
+		ret = 0;
+
+	if (ret < 0 && ret != -EIOCBQUEUED)
+		ext2_write_failed(inode->i_mapping, offset + count);
+
+	/* handle case for partial write and for fallback to buffered write */
+	if (ret >= 0 && iov_iter_count(from)) {
+		loff_t pos, endbyte;
+		ssize_t status;
+		int ret2;
+
+		iocb->ki_flags &= ~IOCB_DIRECT;
+		pos = iocb->ki_pos;
+		status = generic_perform_write(iocb, from);
+		if (unlikely(status < 0)) {
+			ret = status;
+			goto out_unlock;
+		}
+
+		iocb->ki_pos += status;
+		ret += status;
+		endbyte = pos + status - 1;
+		ret2 = filemap_write_and_wait_range(inode->i_mapping, pos,
+						    endbyte);
+		if (!ret2)
+			invalidate_mapping_pages(inode->i_mapping,
+						 pos >> PAGE_SHIFT,
+						 endbyte >> PAGE_SHIFT);
+		if (ret > 0)
+			generic_write_sync(iocb, ret);
+	}
+
+out_unlock:
+	inode_unlock(inode);
+	return ret;
+}
+
 static ssize_t ext2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 {
 #ifdef CONFIG_FS_DAX
 	if (IS_DAX(iocb->ki_filp->f_mapping->host))
 		return ext2_dax_read_iter(iocb, to);
 #endif
+	if (iocb->ki_flags & IOCB_DIRECT)
+		return ext2_dio_read_iter(iocb, to);
+
 	return generic_file_read_iter(iocb, to);
 }
 
@@ -178,6 +290,9 @@ static ssize_t ext2_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	if (IS_DAX(iocb->ki_filp->f_mapping->host))
 		return ext2_dax_write_iter(iocb, from);
 #endif
+	if (iocb->ki_flags & IOCB_DIRECT)
+		return ext2_dio_write_iter(iocb, from);
+
 	return generic_file_write_iter(iocb, from);
 }
 
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index dc76147e7b07..75983215c7a1 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -56,7 +56,7 @@ static inline int ext2_inode_is_fast_symlink(struct inode *inode)
 
 static void ext2_truncate_blocks(struct inode *inode, loff_t offset);
 
-static void ext2_write_failed(struct address_space *mapping, loff_t to)
+void ext2_write_failed(struct address_space *mapping, loff_t to)
 {
 	struct inode *inode = mapping->host;
 
@@ -809,9 +809,27 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	bool new = false, boundary = false;
 	u32 bno;
 	int ret;
+	bool create = flags & IOMAP_WRITE;
+
+	/*
+	 * For writes that could fill holes inside i_size on a
+	 * DIO_SKIP_HOLES filesystem we forbid block creations: only
+	 * overwrites are permitted.
+	 */
+	if ((flags & IOMAP_DIRECT) &&
+	    (first_block << blkbits) < i_size_read(inode))
+		create = 0;
+
+	/*
+	 * Writes that span EOF might trigger an IO size update on completion,
+	 * so consider them to be dirty for the purposes of O_DSYNC even if
+	 * there is no other metadata changes pending or have been made here.
+	 */
+	if ((flags & IOMAP_WRITE) && offset + length > i_size_read(inode))
+		iomap->flags |= IOMAP_F_DIRTY;
 
 	ret = ext2_get_blocks(inode, first_block, max_blocks,
-			&bno, &new, &boundary, flags & IOMAP_WRITE);
+			&bno, &new, &boundary, create);
 	if (ret < 0)
 		return ret;
 
@@ -823,6 +841,12 @@ static int ext2_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		iomap->bdev = inode->i_sb->s_bdev;
 
 	if (ret == 0) {
+		/*
+		 * Switch to buffered-io for writing to holes in a non-extent
+		 * based filesystem to avoid stale data exposure problem.
+		 */
+		if (!create && (flags & IOMAP_WRITE) && (flags & IOMAP_DIRECT))
+			return -ENOTBLK;
 		iomap->type = IOMAP_HOLE;
 		iomap->addr = IOMAP_NULL_ADDR;
 		iomap->length = 1 << blkbits;
@@ -844,6 +868,13 @@ static int
 ext2_iomap_end(struct inode *inode, loff_t offset, loff_t length,
 		ssize_t written, unsigned flags, struct iomap *iomap)
 {
+	/*
+	 * Switch to buffered-io in case of any error.
+	 * Blocks allocated can be used by the buffered-io path.
+	 */
+	if ((flags & IOMAP_DIRECT) && (flags & IOMAP_WRITE) && written == 0)
+		return -ENOTBLK;
+
 	if (iomap->type == IOMAP_MAPPED &&
 	    written < length &&
 	    (flags & IOMAP_WRITE))
@@ -908,22 +939,6 @@ static sector_t ext2_bmap(struct address_space *mapping, sector_t block)
 	return generic_block_bmap(mapping,block,ext2_get_block);
 }
 
-static ssize_t
-ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
-{
-	struct file *file = iocb->ki_filp;
-	struct address_space *mapping = file->f_mapping;
-	struct inode *inode = mapping->host;
-	size_t count = iov_iter_count(iter);
-	loff_t offset = iocb->ki_pos;
-	ssize_t ret;
-
-	ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block);
-	if (ret < 0 && iov_iter_rw(iter) == WRITE)
-		ext2_write_failed(mapping, offset + count);
-	return ret;
-}
-
 static int
 ext2_writepages(struct address_space *mapping, struct writeback_control *wbc)
 {
@@ -946,7 +961,7 @@ const struct address_space_operations ext2_aops = {
 	.write_begin		= ext2_write_begin,
 	.write_end		= ext2_write_end,
 	.bmap			= ext2_bmap,
-	.direct_IO		= ext2_direct_IO,
+	.direct_IO		= noop_direct_IO,
 	.writepages		= ext2_writepages,
 	.migrate_folio		= buffer_migrate_folio,
 	.is_partially_uptodate	= block_is_partially_uptodate,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFCv3 06/10] fs.h: Add TRACE_IOCB_STRINGS for use in trace points
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
                   ` (4 preceding siblings ...)
  2023-04-13  8:40 ` [RFCv3 05/10] ext2: Move direct-io to use iomap Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-13  9:54   ` Christian Brauner
  2023-04-13  8:40 ` [RFCv3 07/10] ext2: Add direct-io " Ritesh Harjani (IBM)
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM),
	Christoph Hellwig

Add TRACE_IOCB_STRINGS macro which can be used in the trace point patch to
print different flag values with meaningful string output.

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 include/linux/fs.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ca3813f43e2..6903fc15987a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -340,6 +340,20 @@ enum rw_hint {
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
 
+/* for use in trace events */
+#define TRACE_IOCB_STRINGS \
+	{ IOCB_HIPRI, "HIPRI"	}, \
+	{ IOCB_DSYNC, "DSYNC"	}, \
+	{ IOCB_SYNC, "SYNC"	}, \
+	{ IOCB_NOWAIT, "NOWAIT" }, \
+	{ IOCB_APPEND, "APPEND" }, \
+	{ IOCB_EVENTFD, "EVENTD"}, \
+	{ IOCB_DIRECT, "DIRECT" }, \
+	{ IOCB_WRITE, "WRITE"	}, \
+	{ IOCB_WAITQ, "WAITQ"	}, \
+	{ IOCB_NOIO, "NOIO"	}, \
+	{ IOCB_ALLOC_CACHE, "ALLOC_CACHE" }
+
 struct kiocb {
 	struct file		*ki_filp;
 	loff_t			ki_pos;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFCv3 07/10] ext2: Add direct-io trace points
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
                   ` (5 preceding siblings ...)
  2023-04-13  8:40 ` [RFCv3 06/10] fs.h: Add TRACE_IOCB_STRINGS for use in trace points Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-14  6:00   ` Christoph Hellwig
  2023-04-13  8:40 ` [RFCv3 08/10] iomap: Remove IOMAP_DIO_NOSYNC unused dio flag Ritesh Harjani (IBM)
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

This patch adds the trace point to ext2 direct-io apis
in fs/ext2/file.c

Here is how the output looks like
         xfs_io-4099  [004]    81.431800: ext2_dio_write_iter_start: dev 7:7 ino 0xc isize 0x4000 pos 0x4000 count 4096 flags DIRECT aio 0 ret=0
         xfs_io-4099  [004]    81.432002: ext2_dio_write_end_io: dev 7:7 ino 0xc isize 0x5000 pos 0x4000 count 4096 flags DIRECT aio 0 ret=0
         xfs_io-4099  [004]    81.432014: ext2_dio_write_iter_dio_end: dev 7:7 ino 0xc isize 0x5000 pos 0x5000 count 0 flags DIRECT aio 0 ret=4096
aio-dio-fcntl-r-4103  [005]    81.461123: ext2_dio_write_iter_start: dev 7:7 ino 0xc isize 0x400 pos 0x200 count 512 flags DIRECT|WRITE aio 1 ret=0
aio-dio-fcntl-r-4103  [005]    81.462099: ext2_dio_write_end_io: dev 7:7 ino 0xc isize 0x400 pos 0x200 count 512 flags DIRECT|WRITE aio 1 ret=0
aio-dio-fcntl-r-4103  [005]    81.462133: ext2_dio_write_iter_dio_end: dev 7:7 ino 0xc isize 0x400 pos 0x400 count 0 flags DIRECT|WRITE aio 1 ret=512

Reported-and-tested-by: Disha Goel <disgoel@linux.ibm.com>
[Need to add CFLAGS_trace for fixing unable to find trace file problem]
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext2/Makefile |  5 ++-
 fs/ext2/file.c   | 10 +++++-
 fs/ext2/trace.c  |  6 ++++
 fs/ext2/trace.h  | 94 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 113 insertions(+), 2 deletions(-)
 create mode 100644 fs/ext2/trace.c
 create mode 100644 fs/ext2/trace.h

diff --git a/fs/ext2/Makefile b/fs/ext2/Makefile
index 311479d864a7..8860948ef9ca 100644
--- a/fs/ext2/Makefile
+++ b/fs/ext2/Makefile
@@ -6,7 +6,10 @@
 obj-$(CONFIG_EXT2_FS) += ext2.o
 
 ext2-y := balloc.o dir.o file.o ialloc.o inode.o \
-	  ioctl.o namei.o super.o symlink.o
+	  ioctl.o namei.o super.o symlink.o trace.o
+
+# For tracepoints to include our trace.h from tracepoint infrastructure
+CFLAGS_trace.o := -I$(src)
 
 ext2-$(CONFIG_EXT2_FS_XATTR)	 += xattr.o xattr_user.o xattr_trusted.o
 ext2-$(CONFIG_EXT2_FS_POSIX_ACL) += acl.o
diff --git a/fs/ext2/file.c b/fs/ext2/file.c
index 243395616599..902eaabc5c62 100644
--- a/fs/ext2/file.c
+++ b/fs/ext2/file.c
@@ -28,6 +28,7 @@
 #include "ext2.h"
 #include "xattr.h"
 #include "acl.h"
+#include "trace.h"
 
 #ifdef CONFIG_FS_DAX
 static ssize_t ext2_dax_read_iter(struct kiocb *iocb, struct iov_iter *to)
@@ -169,9 +170,11 @@ static ssize_t ext2_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	struct inode *inode = file->f_mapping->host;
 	ssize_t ret;
 
+	trace_ext2_dio_read_begin(iocb, to, 0);
 	inode_lock_shared(inode);
 	ret = iomap_dio_rw(iocb, to, &ext2_iomap_ops, NULL, 0, NULL, 0);
 	inode_unlock_shared(inode);
+	trace_ext2_dio_read_end(iocb, to, ret);
 
 	return ret;
 }
@@ -199,6 +202,7 @@ static int ext2_dio_write_end_io(struct kiocb *iocb, ssize_t size,
 		mark_inode_dirty(inode);
 	}
 out:
+	trace_ext2_dio_write_endio(iocb, size, error);
 	return error;
 }
 
@@ -215,7 +219,9 @@ static ssize_t ext2_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	unsigned long blocksize = inode->i_sb->s_blocksize;
 	loff_t offset = iocb->ki_pos;
 	loff_t count = iov_iter_count(from);
+	ssize_t status = 0;
 
+	trace_ext2_dio_write_begin(iocb, from, 0);
 	inode_lock(inode);
 	ret = generic_write_checks(iocb, from);
 	if (ret <= 0)
@@ -243,7 +249,6 @@ static ssize_t ext2_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	/* handle case for partial write and for fallback to buffered write */
 	if (ret >= 0 && iov_iter_count(from)) {
 		loff_t pos, endbyte;
-		ssize_t status;
 		int ret2;
 
 		iocb->ki_flags &= ~IOCB_DIRECT;
@@ -269,6 +274,9 @@ static ssize_t ext2_dio_write_iter(struct kiocb *iocb, struct iov_iter *from)
 
 out_unlock:
 	inode_unlock(inode);
+	if (status)
+		trace_ext2_dio_write_buff_end(iocb, from, status);
+	trace_ext2_dio_write_end(iocb, from, ret);
 	return ret;
 }
 
diff --git a/fs/ext2/trace.c b/fs/ext2/trace.c
new file mode 100644
index 000000000000..b01cdf6526fd
--- /dev/null
+++ b/fs/ext2/trace.c
@@ -0,0 +1,6 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "ext2.h"
+#include <linux/uio.h>
+
+#define CREATE_TRACE_POINTS
+#include "trace.h"
diff --git a/fs/ext2/trace.h b/fs/ext2/trace.h
new file mode 100644
index 000000000000..36deeaa8cfe2
--- /dev/null
+++ b/fs/ext2/trace.h
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM ext2
+
+#if !defined(_EXT2_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _EXT2_TRACE_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(ext2_dio_class,
+	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter, int ret),
+	TP_ARGS(iocb, iter, ret),
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(loff_t, isize)
+		__field(loff_t, pos)
+		__field(u64,	count)
+		__field(int,	ki_flags)
+		__field(bool,	aio)
+		__field(int,	ret)
+	),
+	TP_fast_assign(
+		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
+		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
+		__entry->isize = file_inode(iocb->ki_filp)->i_size;
+		__entry->pos = iocb->ki_pos;
+		__entry->count = iov_iter_count(iter);
+		__entry->ki_flags = iocb->ki_flags;
+		__entry->aio = !is_sync_kiocb(iocb);
+		__entry->ret = ret;
+	),
+	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx count %llu flags %s aio %d ret %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->isize,
+		  __entry->pos,
+		  __entry->count,
+		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
+		  __entry->aio,
+		  __entry->ret)
+);
+
+#define DEFINE_DIO_RW_EVENT(name)						\
+DEFINE_EVENT(ext2_dio_class, name,					\
+	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter, int ret),	\
+	TP_ARGS(iocb, iter, ret))
+DEFINE_DIO_RW_EVENT(ext2_dio_write_begin);
+DEFINE_DIO_RW_EVENT(ext2_dio_write_end);
+DEFINE_DIO_RW_EVENT(ext2_dio_write_buff_end);
+DEFINE_DIO_RW_EVENT(ext2_dio_read_begin);
+DEFINE_DIO_RW_EVENT(ext2_dio_read_end);
+
+TRACE_EVENT(ext2_dio_write_endio,
+	TP_PROTO(struct kiocb *iocb, loff_t size, int ret),
+	TP_ARGS(iocb, size, ret),
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(loff_t, isize)
+		__field(loff_t, pos)
+		__field(loff_t, size)
+		__field(int,	ki_flags)
+		__field(bool,	aio)
+		__field(int,	ret)
+	),
+	TP_fast_assign(
+		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
+		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
+		__entry->isize = file_inode(iocb->ki_filp)->i_size;
+		__entry->pos = iocb->ki_pos;
+		__entry->size = size;
+		__entry->ki_flags = iocb->ki_flags;
+		__entry->aio = !is_sync_kiocb(iocb);
+		__entry->ret = ret;
+	),
+	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx size %lld flags %s aio %d ret %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->isize,
+		  __entry->pos,
+		  __entry->size,
+		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
+		  __entry->aio,
+		  __entry->ret)
+);
+
+#endif /* _EXT2_TRACE_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE trace
+#include <trace/define_trace.h>
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFCv3 08/10] iomap: Remove IOMAP_DIO_NOSYNC unused dio flag
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
                   ` (6 preceding siblings ...)
  2023-04-13  8:40 ` [RFCv3 07/10] ext2: Add direct-io " Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-13 14:34   ` Darrick J. Wong
  2023-04-13  8:40 ` [RFCv3 09/10] iomap: Minor refactor of iomap_dio_rw Ritesh Harjani (IBM)
  2023-04-13  8:40 ` [RFCv3 10/10] iomap: Add trace points for DIO path Ritesh Harjani (IBM)
  9 siblings, 1 reply; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM),
	Christoph Hellwig

IOMAP_DIO_NOSYNC earlier was added for use in btrfs. But it seems for
aio dsync writes this is not useful anyway. For aio dsync case, we
we queue the request and return -EIOCBQUEUED. Now, since IOMAP_DIO_NOSYNC
doesn't let iomap_dio_complete() to call generic_write_sync(),
hence we may lose the sync write.

Hence kill this flag as it is not in use by any FS now.

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/direct-io.c  | 2 +-
 include/linux/iomap.h | 6 ------
 2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f771001574d0..36ab1152dbea 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -541,7 +541,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		}
 
 		/* for data sync or sync, we need sync completion processing */
-		if (iocb_is_dsync(iocb) && !(dio_flags & IOMAP_DIO_NOSYNC)) {
+		if (iocb_is_dsync(iocb)) {
 			dio->flags |= IOMAP_DIO_NEED_SYNC;
 
 		       /*
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 0f8123504e5e..e2b836c2e119 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -377,12 +377,6 @@ struct iomap_dio_ops {
  */
 #define IOMAP_DIO_PARTIAL		(1 << 2)
 
-/*
- * The caller will sync the write if needed; do not sync it within
- * iomap_dio_rw.  Overrides IOMAP_DIO_FORCE_WAIT.
- */
-#define IOMAP_DIO_NOSYNC		(1 << 3)
-
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags, void *private, size_t done_before);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFCv3 09/10] iomap: Minor refactor of iomap_dio_rw
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
                   ` (7 preceding siblings ...)
  2023-04-13  8:40 ` [RFCv3 08/10] iomap: Remove IOMAP_DIO_NOSYNC unused dio flag Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-13 14:35   ` Darrick J. Wong
  2023-04-13  8:40 ` [RFCv3 10/10] iomap: Add trace points for DIO path Ritesh Harjani (IBM)
  9 siblings, 1 reply; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

The next patch brings in the tracepoint patch for iomap DIO functions.
This is a small refactor change for having a single out path.

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/direct-io.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 36ab1152dbea..5871956ee880 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -679,11 +679,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		unsigned int dio_flags, void *private, size_t done_before)
 {
 	struct iomap_dio *dio;
+	ssize_t ret = 0;
 
 	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
 			     done_before);
-	if (IS_ERR_OR_NULL(dio))
-		return PTR_ERR_OR_ZERO(dio);
-	return iomap_dio_complete(dio);
+	if (IS_ERR_OR_NULL(dio)) {
+		ret = PTR_ERR_OR_ZERO(dio);
+		goto out;
+	}
+	ret = iomap_dio_complete(dio);
+out:
+	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFCv3 10/10] iomap: Add trace points for DIO path
  2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
                   ` (8 preceding siblings ...)
  2023-04-13  8:40 ` [RFCv3 09/10] iomap: Minor refactor of iomap_dio_rw Ritesh Harjani (IBM)
@ 2023-04-13  8:40 ` Ritesh Harjani (IBM)
  2023-04-13 14:42   ` Darrick J. Wong
  2023-04-14  6:04   ` Christoph Hellwig
  9 siblings, 2 replies; 33+ messages in thread
From: Ritesh Harjani (IBM) @ 2023-04-13  8:40 UTC (permalink / raw)
  To: linux-fsdevel, linux-ext4
  Cc: Jan Kara, Christoph Hellwig, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel, Ritesh Harjani (IBM)

This patch adds trace point events for iomap DIO path.

<e.g. iomap dio trace>
     xfs_io-8815  [000]   526.790418: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x0 pos 0x0 count 4096 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 0
     xfs_io-8815  [000]   526.790978: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x1000 pos 0x1000 flags DIRECT aio 0 error 0 ret 4096
     xfs_io-8815  [000]   526.790988: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x1000 pos 0x1000 count 0 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 4096
        fsx-8827  [005]   526.939345: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 61440 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret 0
        fsx-8827  [005]   526.939459: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 0 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret -529
ksoftirqd/5-41    [005]   526.939564: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x922f8 pos 0x5e000 flags NOWAIT|DIRECT|ALLOC_CACHE aio 1 error 0 ret 61440

Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/iomap/direct-io.c |  3 ++
 fs/iomap/trace.c     |  1 +
 fs/iomap/trace.h     | 90 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 94 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5871956ee880..bb7a6dfbc8b3 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -130,6 +130,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 	if (ret > 0)
 		ret += dio->done_before;
 
+	trace_iomap_dio_complete(iocb, dio->error, ret);
 	kfree(dio);
 
 	return ret;
@@ -681,6 +682,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	struct iomap_dio *dio;
 	ssize_t ret = 0;
 
+	trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before, ret);
 	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
 			     done_before);
 	if (IS_ERR_OR_NULL(dio)) {
@@ -689,6 +691,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	}
 	ret = iomap_dio_complete(dio);
 out:
+	trace_iomap_dio_rw_end(iocb, iter, dio_flags, done_before, ret);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(iomap_dio_rw);
diff --git a/fs/iomap/trace.c b/fs/iomap/trace.c
index da217246b1a9..728d5443daf5 100644
--- a/fs/iomap/trace.c
+++ b/fs/iomap/trace.c
@@ -3,6 +3,7 @@
  * Copyright (c) 2019 Christoph Hellwig
  */
 #include <linux/iomap.h>
+#include <linux/uio.h>
 
 /*
  * We include this last to have the helpers above available for the trace
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index f6ea9540d082..dcb4dd4db5fb 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -183,6 +183,96 @@ TRACE_EVENT(iomap_iter,
 		   (void *)__entry->caller)
 );
 
+#define TRACE_IOMAP_DIO_STRINGS \
+	{IOMAP_DIO_FORCE_WAIT, "DIO_FORCE_WAIT" }, \
+	{IOMAP_DIO_OVERWRITE_ONLY, "DIO_OVERWRITE_ONLY" }, \
+	{IOMAP_DIO_PARTIAL, "DIO_PARTIAL" }
+
+DECLARE_EVENT_CLASS(iomap_dio_class,
+	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter,
+		 unsigned int dio_flags, u64 done_before, int ret),
+	TP_ARGS(iocb, iter, dio_flags, done_before, ret),
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(loff_t, isize)
+		__field(loff_t, pos)
+		__field(u64,	count)
+		__field(u64,	done_before)
+		__field(int,	ki_flags)
+		__field(unsigned int,	dio_flags)
+		__field(bool,	aio)
+		__field(int, ret)
+	),
+	TP_fast_assign(
+		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
+		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
+		__entry->isize = file_inode(iocb->ki_filp)->i_size;
+		__entry->pos = iocb->ki_pos;
+		__entry->count = iov_iter_count(iter);
+		__entry->done_before = done_before;
+		__entry->dio_flags = dio_flags;
+		__entry->ki_flags = iocb->ki_flags;
+		__entry->aio = !is_sync_kiocb(iocb);
+		__entry->ret = ret;
+	),
+	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx count %llu "
+		  "flags %s dio_flags %s done_before %llu aio %d ret %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->isize,
+		  __entry->pos,
+		  __entry->count,
+		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
+		  __print_flags(__entry->dio_flags, "|", TRACE_IOMAP_DIO_STRINGS),
+		  __entry->done_before,
+		  __entry->aio,
+		  __entry->ret)
+)
+
+#define DEFINE_DIO_RW_EVENT(name)					\
+DEFINE_EVENT(iomap_dio_class, name,					\
+	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter,		\
+		 unsigned int dio_flags, u64 done_before,		\
+		 int ret),						\
+	TP_ARGS(iocb, iter, dio_flags, done_before, ret))
+DEFINE_DIO_RW_EVENT(iomap_dio_rw_begin);
+DEFINE_DIO_RW_EVENT(iomap_dio_rw_end);
+
+TRACE_EVENT(iomap_dio_complete,
+	TP_PROTO(struct kiocb *iocb, int error, int ret),
+	TP_ARGS(iocb, error, ret),
+	TP_STRUCT__entry(
+		__field(dev_t,	dev)
+		__field(ino_t,	ino)
+		__field(loff_t, isize)
+		__field(loff_t, pos)
+		__field(int,	ki_flags)
+		__field(bool,	aio)
+		__field(int,	error)
+		__field(int,	ret)
+	),
+	TP_fast_assign(
+		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
+		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
+		__entry->isize = file_inode(iocb->ki_filp)->i_size;
+		__entry->pos = iocb->ki_pos;
+		__entry->ki_flags = iocb->ki_flags;
+		__entry->aio = !is_sync_kiocb(iocb);
+		__entry->error = error;
+		__entry->ret = ret;
+	),
+	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx flags %s aio %d error %d ret %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->isize,
+		  __entry->pos,
+		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
+		  __entry->aio,
+		  __entry->error,
+		  __entry->ret)
+);
+
 #endif /* _IOMAP_TRACE_H */
 
 #undef TRACE_INCLUDE_PATH
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFCv3 06/10] fs.h: Add TRACE_IOCB_STRINGS for use in trace points
  2023-04-13  8:40 ` [RFCv3 06/10] fs.h: Add TRACE_IOCB_STRINGS for use in trace points Ritesh Harjani (IBM)
@ 2023-04-13  9:54   ` Christian Brauner
  2023-04-13 10:15     ` Ritesh Harjani
  0 siblings, 1 reply; 33+ messages in thread
From: Christian Brauner @ 2023-04-13  9:54 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel, Christoph Hellwig

On Thu, Apr 13, 2023 at 02:10:28PM +0530, Ritesh Harjani (IBM) wrote:
> Add TRACE_IOCB_STRINGS macro which can be used in the trace point patch to
> print different flag values with meaningful string output.
> 
> Tested-by: Disha Goel <disgoel@linux.ibm.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---

Fine, but fs.h is such a dumping ground already I hope we can split more
stuff out of it going forward...

>  include/linux/fs.h | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9ca3813f43e2..6903fc15987a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -340,6 +340,20 @@ enum rw_hint {
>  /* can use bio alloc cache */
>  #define IOCB_ALLOC_CACHE	(1 << 21)
>  
> +/* for use in trace events */
> +#define TRACE_IOCB_STRINGS \
> +	{ IOCB_HIPRI, "HIPRI"	}, \
> +	{ IOCB_DSYNC, "DSYNC"	}, \
> +	{ IOCB_SYNC, "SYNC"	}, \
> +	{ IOCB_NOWAIT, "NOWAIT" }, \
> +	{ IOCB_APPEND, "APPEND" }, \
> +	{ IOCB_EVENTFD, "EVENTD"}, \

s/EVENTD/EVENTFD/

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 06/10] fs.h: Add TRACE_IOCB_STRINGS for use in trace points
  2023-04-13  9:54   ` Christian Brauner
@ 2023-04-13 10:15     ` Ritesh Harjani
  0 siblings, 0 replies; 33+ messages in thread
From: Ritesh Harjani @ 2023-04-13 10:15 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel, Christoph Hellwig

Christian Brauner <brauner@kernel.org> writes:

Hi Christian

Thanks for your review!

> On Thu, Apr 13, 2023 at 02:10:28PM +0530, Ritesh Harjani (IBM) wrote:
>> Add TRACE_IOCB_STRINGS macro which can be used in the trace point patch to
>> print different flag values with meaningful string output.
>>
>> Tested-by: Disha Goel <disgoel@linux.ibm.com>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>
> Fine, but fs.h is such a dumping ground already

Ok, 3205 lines in fs.h.

> I hope we can split more stuff out of it going forward...

Any first thoughts/suggestions like what?

>>  include/linux/fs.h | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 9ca3813f43e2..6903fc15987a 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -340,6 +340,20 @@ enum rw_hint {
>>  /* can use bio alloc cache */
>>  #define IOCB_ALLOC_CACHE	(1 << 21)
>>
>> +/* for use in trace events */
>> +#define TRACE_IOCB_STRINGS \
>> +	{ IOCB_HIPRI, "HIPRI"	}, \
>> +	{ IOCB_DSYNC, "DSYNC"	}, \
>> +	{ IOCB_SYNC, "SYNC"	}, \
>> +	{ IOCB_NOWAIT, "NOWAIT" }, \
>> +	{ IOCB_APPEND, "APPEND" }, \
>> +	{ IOCB_EVENTFD, "EVENTD"}, \
>
> s/EVENTD/EVENTFD/

Oops an oversight. Thanks for catching it.

-ritesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 08/10] iomap: Remove IOMAP_DIO_NOSYNC unused dio flag
  2023-04-13  8:40 ` [RFCv3 08/10] iomap: Remove IOMAP_DIO_NOSYNC unused dio flag Ritesh Harjani (IBM)
@ 2023-04-13 14:34   ` Darrick J. Wong
  0 siblings, 0 replies; 33+ messages in thread
From: Darrick J. Wong @ 2023-04-13 14:34 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Ojaswin Mujoo, Disha Goel, Christoph Hellwig

On Thu, Apr 13, 2023 at 02:10:30PM +0530, Ritesh Harjani (IBM) wrote:
> IOMAP_DIO_NOSYNC earlier was added for use in btrfs. But it seems for
> aio dsync writes this is not useful anyway. For aio dsync case, we
> we queue the request and return -EIOCBQUEUED. Now, since IOMAP_DIO_NOSYNC
> doesn't let iomap_dio_complete() to call generic_write_sync(),
> hence we may lose the sync write.
> 
> Hence kill this flag as it is not in use by any FS now.
> 
> Tested-by: Disha Goel <disgoel@linux.ibm.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/direct-io.c  | 2 +-
>  include/linux/iomap.h | 6 ------
>  2 files changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index f771001574d0..36ab1152dbea 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -541,7 +541,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		}
>  
>  		/* for data sync or sync, we need sync completion processing */
> -		if (iocb_is_dsync(iocb) && !(dio_flags & IOMAP_DIO_NOSYNC)) {
> +		if (iocb_is_dsync(iocb)) {
>  			dio->flags |= IOMAP_DIO_NEED_SYNC;
>  
>  		       /*
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 0f8123504e5e..e2b836c2e119 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -377,12 +377,6 @@ struct iomap_dio_ops {
>   */
>  #define IOMAP_DIO_PARTIAL		(1 << 2)
>  
> -/*
> - * The caller will sync the write if needed; do not sync it within
> - * iomap_dio_rw.  Overrides IOMAP_DIO_FORCE_WAIT.
> - */
> -#define IOMAP_DIO_NOSYNC		(1 << 3)
> -
>  ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
>  		unsigned int dio_flags, void *private, size_t done_before);
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 09/10] iomap: Minor refactor of iomap_dio_rw
  2023-04-13  8:40 ` [RFCv3 09/10] iomap: Minor refactor of iomap_dio_rw Ritesh Harjani (IBM)
@ 2023-04-13 14:35   ` Darrick J. Wong
  2023-04-14  6:00     ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2023-04-13 14:35 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Ojaswin Mujoo, Disha Goel

On Thu, Apr 13, 2023 at 02:10:31PM +0530, Ritesh Harjani (IBM) wrote:
> The next patch brings in the tracepoint patch for iomap DIO functions.
> This is a small refactor change for having a single out path.
> 
> Tested-by: Disha Goel <disgoel@linux.ibm.com>
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>

IMHO this could've been part of the next patch instead of separate, but
eh, whatever, looks good to me.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/iomap/direct-io.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 36ab1152dbea..5871956ee880 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -679,11 +679,16 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  		unsigned int dio_flags, void *private, size_t done_before)
>  {
>  	struct iomap_dio *dio;
> +	ssize_t ret = 0;
>  
>  	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
>  			     done_before);
> -	if (IS_ERR_OR_NULL(dio))
> -		return PTR_ERR_OR_ZERO(dio);
> -	return iomap_dio_complete(dio);
> +	if (IS_ERR_OR_NULL(dio)) {
> +		ret = PTR_ERR_OR_ZERO(dio);
> +		goto out;
> +	}
> +	ret = iomap_dio_complete(dio);
> +out:
> +	return ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_dio_rw);
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 10/10] iomap: Add trace points for DIO path
  2023-04-13  8:40 ` [RFCv3 10/10] iomap: Add trace points for DIO path Ritesh Harjani (IBM)
@ 2023-04-13 14:42   ` Darrick J. Wong
  2023-04-13 20:18     ` Ritesh Harjani
  2023-04-14  6:04   ` Christoph Hellwig
  1 sibling, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2023-04-13 14:42 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Ojaswin Mujoo, Disha Goel

On Thu, Apr 13, 2023 at 02:10:32PM +0530, Ritesh Harjani (IBM) wrote:
> This patch adds trace point events for iomap DIO path.
> 
> <e.g. iomap dio trace>
>      xfs_io-8815  [000]   526.790418: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x0 pos 0x0 count 4096 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 0
>      xfs_io-8815  [000]   526.790978: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x1000 pos 0x1000 flags DIRECT aio 0 error 0 ret 4096
>      xfs_io-8815  [000]   526.790988: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x1000 pos 0x1000 count 0 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 4096
>         fsx-8827  [005]   526.939345: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 61440 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret 0
>         fsx-8827  [005]   526.939459: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 0 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret -529
> ksoftirqd/5-41    [005]   526.939564: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x922f8 pos 0x5e000 flags NOWAIT|DIRECT|ALLOC_CACHE aio 1 error 0 ret 61440
> 
> Tested-by: Disha Goel <disgoel@linux.ibm.com>
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
>  fs/iomap/direct-io.c |  3 ++
>  fs/iomap/trace.c     |  1 +
>  fs/iomap/trace.h     | 90 ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 94 insertions(+)
> 
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index 5871956ee880..bb7a6dfbc8b3 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -130,6 +130,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
>  	if (ret > 0)
>  		ret += dio->done_before;
>  
> +	trace_iomap_dio_complete(iocb, dio->error, ret);
>  	kfree(dio);
>  
>  	return ret;
> @@ -681,6 +682,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	struct iomap_dio *dio;
>  	ssize_t ret = 0;
>  
> +	trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before, ret);
>  	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
>  			     done_before);
>  	if (IS_ERR_OR_NULL(dio)) {
> @@ -689,6 +691,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	}
>  	ret = iomap_dio_complete(dio);
>  out:
> +	trace_iomap_dio_rw_end(iocb, iter, dio_flags, done_before, ret);
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(iomap_dio_rw);
> diff --git a/fs/iomap/trace.c b/fs/iomap/trace.c
> index da217246b1a9..728d5443daf5 100644
> --- a/fs/iomap/trace.c
> +++ b/fs/iomap/trace.c
> @@ -3,6 +3,7 @@
>   * Copyright (c) 2019 Christoph Hellwig
>   */
>  #include <linux/iomap.h>
> +#include <linux/uio.h>
>  
>  /*
>   * We include this last to have the helpers above available for the trace
> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> index f6ea9540d082..dcb4dd4db5fb 100644
> --- a/fs/iomap/trace.h
> +++ b/fs/iomap/trace.h
> @@ -183,6 +183,96 @@ TRACE_EVENT(iomap_iter,
>  		   (void *)__entry->caller)
>  );
>  
> +#define TRACE_IOMAP_DIO_STRINGS \
> +	{IOMAP_DIO_FORCE_WAIT, "DIO_FORCE_WAIT" }, \
> +	{IOMAP_DIO_OVERWRITE_ONLY, "DIO_OVERWRITE_ONLY" }, \
> +	{IOMAP_DIO_PARTIAL, "DIO_PARTIAL" }

Can you make the strings line up too, please?

> +
> +DECLARE_EVENT_CLASS(iomap_dio_class,
> +	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter,
> +		 unsigned int dio_flags, u64 done_before, int ret),

We're passing in ssize_t values for @ret, shouldn't the types match?

> +	TP_ARGS(iocb, iter, dio_flags, done_before, ret),
> +	TP_STRUCT__entry(
> +		__field(dev_t,	dev)
> +		__field(ino_t,	ino)
> +		__field(loff_t, isize)
> +		__field(loff_t, pos)
> +		__field(u64,	count)

What's the difference between "length" as used in the other tracepoints
and "count" here?

> +		__field(u64,	done_before)
> +		__field(int,	ki_flags)
> +		__field(unsigned int,	dio_flags)
> +		__field(bool,	aio)
> +		__field(int, ret)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
> +		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
> +		__entry->isize = file_inode(iocb->ki_filp)->i_size;
> +		__entry->pos = iocb->ki_pos;
> +		__entry->count = iov_iter_count(iter);
> +		__entry->done_before = done_before;
> +		__entry->dio_flags = dio_flags;
> +		__entry->ki_flags = iocb->ki_flags;
> +		__entry->aio = !is_sync_kiocb(iocb);
> +		__entry->ret = ret;
> +	),
> +	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx count %llu "

count and done_before are lengths of file operations, in bytes, right?

Everywhere else we use 0x%llx for that.

> +		  "flags %s dio_flags %s done_before %llu aio %d ret %d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __entry->isize,
> +		  __entry->pos,
> +		  __entry->count,
> +		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
> +		  __print_flags(__entry->dio_flags, "|", TRACE_IOMAP_DIO_STRINGS),
> +		  __entry->done_before,
> +		  __entry->aio,
> +		  __entry->ret)
> +)
> +
> +#define DEFINE_DIO_RW_EVENT(name)					\
> +DEFINE_EVENT(iomap_dio_class, name,					\
> +	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter,		\
> +		 unsigned int dio_flags, u64 done_before,		\
> +		 int ret),						\
> +	TP_ARGS(iocb, iter, dio_flags, done_before, ret))
> +DEFINE_DIO_RW_EVENT(iomap_dio_rw_begin);
> +DEFINE_DIO_RW_EVENT(iomap_dio_rw_end);
> +
> +TRACE_EVENT(iomap_dio_complete,
> +	TP_PROTO(struct kiocb *iocb, int error, int ret),
> +	TP_ARGS(iocb, error, ret),
> +	TP_STRUCT__entry(
> +		__field(dev_t,	dev)
> +		__field(ino_t,	ino)
> +		__field(loff_t, isize)
> +		__field(loff_t, pos)
> +		__field(int,	ki_flags)
> +		__field(bool,	aio)
> +		__field(int,	error)
> +		__field(int,	ret)

Same comment about @ret and ssize_t here.

--D

> +	),
> +	TP_fast_assign(
> +		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
> +		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
> +		__entry->isize = file_inode(iocb->ki_filp)->i_size;
> +		__entry->pos = iocb->ki_pos;
> +		__entry->ki_flags = iocb->ki_flags;
> +		__entry->aio = !is_sync_kiocb(iocb);
> +		__entry->error = error;
> +		__entry->ret = ret;
> +	),
> +	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx flags %s aio %d error %d ret %d",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __entry->isize,
> +		  __entry->pos,
> +		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
> +		  __entry->aio,
> +		  __entry->error,
> +		  __entry->ret)
> +);
> +
>  #endif /* _IOMAP_TRACE_H */
>  
>  #undef TRACE_INCLUDE_PATH
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 10/10] iomap: Add trace points for DIO path
  2023-04-13 14:42   ` Darrick J. Wong
@ 2023-04-13 20:18     ` Ritesh Harjani
  2023-04-14  2:16       ` Darrick J. Wong
  0 siblings, 1 reply; 33+ messages in thread
From: Ritesh Harjani @ 2023-04-13 20:18 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Ojaswin Mujoo, Disha Goel

"Darrick J. Wong" <djwong@kernel.org> writes:

> On Thu, Apr 13, 2023 at 02:10:32PM +0530, Ritesh Harjani (IBM) wrote:
>> This patch adds trace point events for iomap DIO path.
>>
>> <e.g. iomap dio trace>
>>      xfs_io-8815  [000]   526.790418: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x0 pos 0x0 count 4096 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 0
>>      xfs_io-8815  [000]   526.790978: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x1000 pos 0x1000 flags DIRECT aio 0 error 0 ret 4096
>>      xfs_io-8815  [000]   526.790988: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x1000 pos 0x1000 count 0 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 4096
>>         fsx-8827  [005]   526.939345: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 61440 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret 0
>>         fsx-8827  [005]   526.939459: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 0 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret -529
>> ksoftirqd/5-41    [005]   526.939564: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x922f8 pos 0x5e000 flags NOWAIT|DIRECT|ALLOC_CACHE aio 1 error 0 ret 61440
>>
>> Tested-by: Disha Goel <disgoel@linux.ibm.com>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>>  fs/iomap/direct-io.c |  3 ++
>>  fs/iomap/trace.c     |  1 +
>>  fs/iomap/trace.h     | 90 ++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 94 insertions(+)
>>
>> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
>> index 5871956ee880..bb7a6dfbc8b3 100644
>> --- a/fs/iomap/direct-io.c
>> +++ b/fs/iomap/direct-io.c
>> @@ -130,6 +130,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
>>  	if (ret > 0)
>>  		ret += dio->done_before;
>>
>> +	trace_iomap_dio_complete(iocb, dio->error, ret);
>>  	kfree(dio);
>>
>>  	return ret;
>> @@ -681,6 +682,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>>  	struct iomap_dio *dio;
>>  	ssize_t ret = 0;
>>
>> +	trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before, ret);
>>  	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
>>  			     done_before);
>>  	if (IS_ERR_OR_NULL(dio)) {
>> @@ -689,6 +691,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>>  	}
>>  	ret = iomap_dio_complete(dio);
>>  out:
>> +	trace_iomap_dio_rw_end(iocb, iter, dio_flags, done_before, ret);
>>  	return ret;
>>  }
>>  EXPORT_SYMBOL_GPL(iomap_dio_rw);
>> diff --git a/fs/iomap/trace.c b/fs/iomap/trace.c
>> index da217246b1a9..728d5443daf5 100644
>> --- a/fs/iomap/trace.c
>> +++ b/fs/iomap/trace.c
>> @@ -3,6 +3,7 @@
>>   * Copyright (c) 2019 Christoph Hellwig
>>   */
>>  #include <linux/iomap.h>
>> +#include <linux/uio.h>
>>
>>  /*
>>   * We include this last to have the helpers above available for the trace
>> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
>> index f6ea9540d082..dcb4dd4db5fb 100644
>> --- a/fs/iomap/trace.h
>> +++ b/fs/iomap/trace.h
>> @@ -183,6 +183,96 @@ TRACE_EVENT(iomap_iter,
>>  		   (void *)__entry->caller)
>>  );
>>
>> +#define TRACE_IOMAP_DIO_STRINGS \
>> +	{IOMAP_DIO_FORCE_WAIT, "DIO_FORCE_WAIT" }, \
>> +	{IOMAP_DIO_OVERWRITE_ONLY, "DIO_OVERWRITE_ONLY" }, \
>> +	{IOMAP_DIO_PARTIAL, "DIO_PARTIAL" }
>
> Can you make the strings line up too, please?
>

Ok near other _STRINGS macro. Sure, will do that.


>> +
>> +DECLARE_EVENT_CLASS(iomap_dio_class,
>> +	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter,
>> +		 unsigned int dio_flags, u64 done_before, int ret),
>
> We're passing in ssize_t values for @ret, shouldn't the types match?
>

Yes, I missed to correct that. Will make it loff_t.
This should be fixed in ext2 trace point macro too.

(ssize_t can vary based on 32 bit v/s 64 bit, so while printing it as
%llx it gives warning on 32bit. Hence will use loff_t for ret)


>> +	TP_ARGS(iocb, iter, dio_flags, done_before, ret),
>> +	TP_STRUCT__entry(
>> +		__field(dev_t,	dev)
>> +		__field(ino_t,	ino)
>> +		__field(loff_t, isize)
>> +		__field(loff_t, pos)
>> +		__field(u64,	count)
>
> What's the difference between "length" as used in the other tracepoints
> and "count" here?
>

Yup let me make it length which will be a more consistent naming.
I chose count just because of (iov_iter_count(iter)).

>> +		__field(u64,	done_before)
>> +		__field(int,	ki_flags)
>> +		__field(unsigned int,	dio_flags)
>> +		__field(bool,	aio)
>> +		__field(int, ret)
>> +	),
>> +	TP_fast_assign(
>> +		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
>> +		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
>> +		__entry->isize = file_inode(iocb->ki_filp)->i_size;
>> +		__entry->pos = iocb->ki_pos;
>> +		__entry->count = iov_iter_count(iter);
>> +		__entry->done_before = done_before;
>> +		__entry->dio_flags = dio_flags;
>> +		__entry->ki_flags = iocb->ki_flags;
>> +		__entry->aio = !is_sync_kiocb(iocb);
>> +		__entry->ret = ret;
>> +	),
>> +	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx count %llu "
>
> count and done_before are lengths of file operations, in bytes, right?

Yes, that's right.

>
> Everywhere else we use 0x%llx for that.
>

Yup I had noticed that, but I guess I missed it.
Thanks for catching it. I will fix it.

>> +		  "flags %s dio_flags %s done_before %llu aio %d ret %d",
>> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
>> +		  __entry->ino,
>> +		  __entry->isize,
>> +		  __entry->pos,
>> +		  __entry->count,
>> +		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
>> +		  __print_flags(__entry->dio_flags, "|", TRACE_IOMAP_DIO_STRINGS),
>> +		  __entry->done_before,
>> +		  __entry->aio,
>> +		  __entry->ret)
>> +)
>> +
>> +#define DEFINE_DIO_RW_EVENT(name)					\
>> +DEFINE_EVENT(iomap_dio_class, name,					\
>> +	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter,		\
>> +		 unsigned int dio_flags, u64 done_before,		\
>> +		 int ret),						\
>> +	TP_ARGS(iocb, iter, dio_flags, done_before, ret))
>> +DEFINE_DIO_RW_EVENT(iomap_dio_rw_begin);
>> +DEFINE_DIO_RW_EVENT(iomap_dio_rw_end);
>> +
>> +TRACE_EVENT(iomap_dio_complete,
>> +	TP_PROTO(struct kiocb *iocb, int error, int ret),
>> +	TP_ARGS(iocb, error, ret),
>> +	TP_STRUCT__entry(
>> +		__field(dev_t,	dev)
>> +		__field(ino_t,	ino)
>> +		__field(loff_t, isize)
>> +		__field(loff_t, pos)
>> +		__field(int,	ki_flags)
>> +		__field(bool,	aio)
>> +		__field(int,	error)
>> +		__field(int,	ret)
>
> Same comment about @ret and ssize_t here.

Got it.

Thanks for the review!
-ritesh


>
> --D
>
>> +	),
>> +	TP_fast_assign(
>> +		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
>> +		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
>> +		__entry->isize = file_inode(iocb->ki_filp)->i_size;
>> +		__entry->pos = iocb->ki_pos;
>> +		__entry->ki_flags = iocb->ki_flags;
>> +		__entry->aio = !is_sync_kiocb(iocb);
>> +		__entry->error = error;
>> +		__entry->ret = ret;
>> +	),
>> +	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx flags %s aio %d error %d ret %d",
>> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
>> +		  __entry->ino,
>> +		  __entry->isize,
>> +		  __entry->pos,
>> +		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
>> +		  __entry->aio,
>> +		  __entry->error,
>> +		  __entry->ret)
>> +);
>> +
>>  #endif /* _IOMAP_TRACE_H */
>>
>>  #undef TRACE_INCLUDE_PATH
>> --
>> 2.39.2
>>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 10/10] iomap: Add trace points for DIO path
  2023-04-13 20:18     ` Ritesh Harjani
@ 2023-04-14  2:16       ` Darrick J. Wong
  2023-04-14  5:21         ` Ritesh Harjani
  0 siblings, 1 reply; 33+ messages in thread
From: Darrick J. Wong @ 2023-04-14  2:16 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Ojaswin Mujoo, Disha Goel

On Fri, Apr 14, 2023 at 01:48:49AM +0530, Ritesh Harjani wrote:
> "Darrick J. Wong" <djwong@kernel.org> writes:
> 
> > On Thu, Apr 13, 2023 at 02:10:32PM +0530, Ritesh Harjani (IBM) wrote:
> >> This patch adds trace point events for iomap DIO path.
> >>
> >> <e.g. iomap dio trace>
> >>      xfs_io-8815  [000]   526.790418: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x0 pos 0x0 count 4096 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 0
> >>      xfs_io-8815  [000]   526.790978: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x1000 pos 0x1000 flags DIRECT aio 0 error 0 ret 4096
> >>      xfs_io-8815  [000]   526.790988: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x1000 pos 0x1000 count 0 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 4096
> >>         fsx-8827  [005]   526.939345: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 61440 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret 0
> >>         fsx-8827  [005]   526.939459: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 0 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret -529
> >> ksoftirqd/5-41    [005]   526.939564: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x922f8 pos 0x5e000 flags NOWAIT|DIRECT|ALLOC_CACHE aio 1 error 0 ret 61440
> >>
> >> Tested-by: Disha Goel <disgoel@linux.ibm.com>
> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> >> ---
> >>  fs/iomap/direct-io.c |  3 ++
> >>  fs/iomap/trace.c     |  1 +
> >>  fs/iomap/trace.h     | 90 ++++++++++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 94 insertions(+)
> >>
> >> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> >> index 5871956ee880..bb7a6dfbc8b3 100644
> >> --- a/fs/iomap/direct-io.c
> >> +++ b/fs/iomap/direct-io.c
> >> @@ -130,6 +130,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
> >>  	if (ret > 0)
> >>  		ret += dio->done_before;
> >>
> >> +	trace_iomap_dio_complete(iocb, dio->error, ret);
> >>  	kfree(dio);
> >>
> >>  	return ret;
> >> @@ -681,6 +682,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >>  	struct iomap_dio *dio;
> >>  	ssize_t ret = 0;
> >>
> >> +	trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before, ret);
> >>  	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
> >>  			     done_before);
> >>  	if (IS_ERR_OR_NULL(dio)) {
> >> @@ -689,6 +691,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> >>  	}
> >>  	ret = iomap_dio_complete(dio);
> >>  out:
> >> +	trace_iomap_dio_rw_end(iocb, iter, dio_flags, done_before, ret);
> >>  	return ret;
> >>  }
> >>  EXPORT_SYMBOL_GPL(iomap_dio_rw);
> >> diff --git a/fs/iomap/trace.c b/fs/iomap/trace.c
> >> index da217246b1a9..728d5443daf5 100644
> >> --- a/fs/iomap/trace.c
> >> +++ b/fs/iomap/trace.c
> >> @@ -3,6 +3,7 @@
> >>   * Copyright (c) 2019 Christoph Hellwig
> >>   */
> >>  #include <linux/iomap.h>
> >> +#include <linux/uio.h>
> >>
> >>  /*
> >>   * We include this last to have the helpers above available for the trace
> >> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
> >> index f6ea9540d082..dcb4dd4db5fb 100644
> >> --- a/fs/iomap/trace.h
> >> +++ b/fs/iomap/trace.h
> >> @@ -183,6 +183,96 @@ TRACE_EVENT(iomap_iter,
> >>  		   (void *)__entry->caller)
> >>  );
> >>
> >> +#define TRACE_IOMAP_DIO_STRINGS \
> >> +	{IOMAP_DIO_FORCE_WAIT, "DIO_FORCE_WAIT" }, \
> >> +	{IOMAP_DIO_OVERWRITE_ONLY, "DIO_OVERWRITE_ONLY" }, \
> >> +	{IOMAP_DIO_PARTIAL, "DIO_PARTIAL" }
> >
> > Can you make the strings line up too, please?
> >
> 
> Ok near other _STRINGS macro. Sure, will do that.
> 
> 
> >> +
> >> +DECLARE_EVENT_CLASS(iomap_dio_class,
> >> +	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter,
> >> +		 unsigned int dio_flags, u64 done_before, int ret),
> >
> > We're passing in ssize_t values for @ret, shouldn't the types match?
> >
> 
> Yes, I missed to correct that. Will make it loff_t.
> This should be fixed in ext2 trace point macro too.
> 
> (ssize_t can vary based on 32 bit v/s 64 bit, so while printing it as
> %llx it gives warning on 32bit. Hence will use loff_t for ret)

How about %zd?

--D

> 
> >> +	TP_ARGS(iocb, iter, dio_flags, done_before, ret),
> >> +	TP_STRUCT__entry(
> >> +		__field(dev_t,	dev)
> >> +		__field(ino_t,	ino)
> >> +		__field(loff_t, isize)
> >> +		__field(loff_t, pos)
> >> +		__field(u64,	count)
> >
> > What's the difference between "length" as used in the other tracepoints
> > and "count" here?
> >
> 
> Yup let me make it length which will be a more consistent naming.
> I chose count just because of (iov_iter_count(iter)).
> 
> >> +		__field(u64,	done_before)
> >> +		__field(int,	ki_flags)
> >> +		__field(unsigned int,	dio_flags)
> >> +		__field(bool,	aio)
> >> +		__field(int, ret)
> >> +	),
> >> +	TP_fast_assign(
> >> +		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
> >> +		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
> >> +		__entry->isize = file_inode(iocb->ki_filp)->i_size;
> >> +		__entry->pos = iocb->ki_pos;
> >> +		__entry->count = iov_iter_count(iter);
> >> +		__entry->done_before = done_before;
> >> +		__entry->dio_flags = dio_flags;
> >> +		__entry->ki_flags = iocb->ki_flags;
> >> +		__entry->aio = !is_sync_kiocb(iocb);
> >> +		__entry->ret = ret;
> >> +	),
> >> +	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx count %llu "
> >
> > count and done_before are lengths of file operations, in bytes, right?
> 
> Yes, that's right.
> 
> >
> > Everywhere else we use 0x%llx for that.
> >
> 
> Yup I had noticed that, but I guess I missed it.
> Thanks for catching it. I will fix it.
> 
> >> +		  "flags %s dio_flags %s done_before %llu aio %d ret %d",
> >> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> >> +		  __entry->ino,
> >> +		  __entry->isize,
> >> +		  __entry->pos,
> >> +		  __entry->count,
> >> +		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
> >> +		  __print_flags(__entry->dio_flags, "|", TRACE_IOMAP_DIO_STRINGS),
> >> +		  __entry->done_before,
> >> +		  __entry->aio,
> >> +		  __entry->ret)
> >> +)
> >> +
> >> +#define DEFINE_DIO_RW_EVENT(name)					\
> >> +DEFINE_EVENT(iomap_dio_class, name,					\
> >> +	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter,		\
> >> +		 unsigned int dio_flags, u64 done_before,		\
> >> +		 int ret),						\
> >> +	TP_ARGS(iocb, iter, dio_flags, done_before, ret))
> >> +DEFINE_DIO_RW_EVENT(iomap_dio_rw_begin);
> >> +DEFINE_DIO_RW_EVENT(iomap_dio_rw_end);
> >> +
> >> +TRACE_EVENT(iomap_dio_complete,
> >> +	TP_PROTO(struct kiocb *iocb, int error, int ret),
> >> +	TP_ARGS(iocb, error, ret),
> >> +	TP_STRUCT__entry(
> >> +		__field(dev_t,	dev)
> >> +		__field(ino_t,	ino)
> >> +		__field(loff_t, isize)
> >> +		__field(loff_t, pos)
> >> +		__field(int,	ki_flags)
> >> +		__field(bool,	aio)
> >> +		__field(int,	error)
> >> +		__field(int,	ret)
> >
> > Same comment about @ret and ssize_t here.
> 
> Got it.
> 
> Thanks for the review!
> -ritesh
> 
> 
> >
> > --D
> >
> >> +	),
> >> +	TP_fast_assign(
> >> +		__entry->dev = file_inode(iocb->ki_filp)->i_sb->s_dev;
> >> +		__entry->ino = file_inode(iocb->ki_filp)->i_ino;
> >> +		__entry->isize = file_inode(iocb->ki_filp)->i_size;
> >> +		__entry->pos = iocb->ki_pos;
> >> +		__entry->ki_flags = iocb->ki_flags;
> >> +		__entry->aio = !is_sync_kiocb(iocb);
> >> +		__entry->error = error;
> >> +		__entry->ret = ret;
> >> +	),
> >> +	TP_printk("dev %d:%d ino 0x%lx isize 0x%llx pos 0x%llx flags %s aio %d error %d ret %d",
> >> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> >> +		  __entry->ino,
> >> +		  __entry->isize,
> >> +		  __entry->pos,
> >> +		  __print_flags(__entry->ki_flags, "|", TRACE_IOCB_STRINGS),
> >> +		  __entry->aio,
> >> +		  __entry->error,
> >> +		  __entry->ret)
> >> +);
> >> +
> >>  #endif /* _IOMAP_TRACE_H */
> >>
> >>  #undef TRACE_INCLUDE_PATH
> >> --
> >> 2.39.2
> >>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 10/10] iomap: Add trace points for DIO path
  2023-04-14  2:16       ` Darrick J. Wong
@ 2023-04-14  5:21         ` Ritesh Harjani
  0 siblings, 0 replies; 33+ messages in thread
From: Ritesh Harjani @ 2023-04-14  5:21 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Ojaswin Mujoo, Disha Goel

"Darrick J. Wong" <djwong@kernel.org> writes:

> On Fri, Apr 14, 2023 at 01:48:49AM +0530, Ritesh Harjani wrote:
>> "Darrick J. Wong" <djwong@kernel.org> writes:
>>
>> > On Thu, Apr 13, 2023 at 02:10:32PM +0530, Ritesh Harjani (IBM) wrote:
>> >> This patch adds trace point events for iomap DIO path.
>> >>
>> >> <e.g. iomap dio trace>
>> >>      xfs_io-8815  [000]   526.790418: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x0 pos 0x0 count 4096 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 0
>> >>      xfs_io-8815  [000]   526.790978: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x1000 pos 0x1000 flags DIRECT aio 0 error 0 ret 4096
>> >>      xfs_io-8815  [000]   526.790988: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x1000 pos 0x1000 count 0 flags DIRECT dio_flags DIO_FORCE_WAIT done_before 0 aio 0 ret 4096
>> >>         fsx-8827  [005]   526.939345: iomap_dio_rw_begin:   dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 61440 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret 0
>> >>         fsx-8827  [005]   526.939459: iomap_dio_rw_end:     dev 7:7 ino 0xc isize 0x922f8 pos 0x4f000 count 0 flags NOWAIT|DIRECT|ALLOC_CACHE dio_flags  done_before 0 aio 1 ret -529
>> >> ksoftirqd/5-41    [005]   526.939564: iomap_dio_complete:   dev 7:7 ino 0xc isize 0x922f8 pos 0x5e000 flags NOWAIT|DIRECT|ALLOC_CACHE aio 1 error 0 ret 61440
>> >>
>> >> Tested-by: Disha Goel <disgoel@linux.ibm.com>
>> >> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> >> ---
>> >>  fs/iomap/direct-io.c |  3 ++
>> >>  fs/iomap/trace.c     |  1 +
>> >>  fs/iomap/trace.h     | 90 ++++++++++++++++++++++++++++++++++++++++++++
>> >>  3 files changed, 94 insertions(+)
>> >>
>> >> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
>> >> index 5871956ee880..bb7a6dfbc8b3 100644
>> >> --- a/fs/iomap/direct-io.c
>> >> +++ b/fs/iomap/direct-io.c
>> >> @@ -130,6 +130,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
>> >>  	if (ret > 0)
>> >>  		ret += dio->done_before;
>> >>
>> >> +	trace_iomap_dio_complete(iocb, dio->error, ret);
>> >>  	kfree(dio);
>> >>
>> >>  	return ret;
>> >> @@ -681,6 +682,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>> >>  	struct iomap_dio *dio;
>> >>  	ssize_t ret = 0;
>> >>
>> >> +	trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before, ret);
>> >>  	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
>> >>  			     done_before);
>> >>  	if (IS_ERR_OR_NULL(dio)) {
>> >> @@ -689,6 +691,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>> >>  	}
>> >>  	ret = iomap_dio_complete(dio);
>> >>  out:
>> >> +	trace_iomap_dio_rw_end(iocb, iter, dio_flags, done_before, ret);
>> >>  	return ret;
>> >>  }
>> >>  EXPORT_SYMBOL_GPL(iomap_dio_rw);
>> >> diff --git a/fs/iomap/trace.c b/fs/iomap/trace.c
>> >> index da217246b1a9..728d5443daf5 100644
>> >> --- a/fs/iomap/trace.c
>> >> +++ b/fs/iomap/trace.c
>> >> @@ -3,6 +3,7 @@
>> >>   * Copyright (c) 2019 Christoph Hellwig
>> >>   */
>> >>  #include <linux/iomap.h>
>> >> +#include <linux/uio.h>
>> >>
>> >>  /*
>> >>   * We include this last to have the helpers above available for the trace
>> >> diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
>> >> index f6ea9540d082..dcb4dd4db5fb 100644
>> >> --- a/fs/iomap/trace.h
>> >> +++ b/fs/iomap/trace.h
>> >> @@ -183,6 +183,96 @@ TRACE_EVENT(iomap_iter,
>> >>  		   (void *)__entry->caller)
>> >>  );
>> >>
>> >> +#define TRACE_IOMAP_DIO_STRINGS \
>> >> +	{IOMAP_DIO_FORCE_WAIT, "DIO_FORCE_WAIT" }, \
>> >> +	{IOMAP_DIO_OVERWRITE_ONLY, "DIO_OVERWRITE_ONLY" }, \
>> >> +	{IOMAP_DIO_PARTIAL, "DIO_PARTIAL" }
>> >
>> > Can you make the strings line up too, please?
>> >
>>
>> Ok near other _STRINGS macro. Sure, will do that.
>>
>>
>> >> +
>> >> +DECLARE_EVENT_CLASS(iomap_dio_class,
>> >> +	TP_PROTO(struct kiocb *iocb, struct iov_iter *iter,
>> >> +		 unsigned int dio_flags, u64 done_before, int ret),
>> >
>> > We're passing in ssize_t values for @ret, shouldn't the types match?
>> >
>>
>> Yes, I missed to correct that. Will make it loff_t.
>> This should be fixed in ext2 trace point macro too.
>>
>> (ssize_t can vary based on 32 bit v/s 64 bit, so while printing it as
>> %llx it gives warning on 32bit. Hence will use loff_t for ret)
>
> How about %zd?

Aah yes. My bad, I wanted to look into print-format specifiers, but
missed it.

Documentation/core-api/printk-formats.rst
		size_t			%zu or %zx
		ssize_t			%zd or %zx

Will send the next revision soon with the comments addressed then.

Thanks!
-ritesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation
  2023-04-13  8:40 ` [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation Ritesh Harjani (IBM)
@ 2023-04-14  5:59   ` Christoph Hellwig
  2023-04-14 12:51     ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2023-04-14  5:59 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel

Still no fan of the naming and placement here.  This is specific
to the fs/buffer.c infrastructure.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 07/10] ext2: Add direct-io trace points
  2023-04-13  8:40 ` [RFCv3 07/10] ext2: Add direct-io " Ritesh Harjani (IBM)
@ 2023-04-14  6:00   ` Christoph Hellwig
  2023-04-14  8:06     ` Ritesh Harjani
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2023-04-14  6:00 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel

So why do you add tracepoints in ext2 in addition to iomap?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 09/10] iomap: Minor refactor of iomap_dio_rw
  2023-04-13 14:35   ` Darrick J. Wong
@ 2023-04-14  6:00     ` Christoph Hellwig
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2023-04-14  6:00 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Ritesh Harjani (IBM),
	linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Ojaswin Mujoo, Disha Goel

On Thu, Apr 13, 2023 at 07:35:30AM -0700, Darrick J. Wong wrote:
> On Thu, Apr 13, 2023 at 02:10:31PM +0530, Ritesh Harjani (IBM) wrote:
> > The next patch brings in the tracepoint patch for iomap DIO functions.
> > This is a small refactor change for having a single out path.
> > 
> > Tested-by: Disha Goel <disgoel@linux.ibm.com>
> > Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> 
> IMHO this could've been part of the next patch instead of separate, but
> eh, whatever, looks good to me.

Yeah, without the next patch this is a bit pointless.  And also a harder
to review without seeing why it is done.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 10/10] iomap: Add trace points for DIO path
  2023-04-13  8:40 ` [RFCv3 10/10] iomap: Add trace points for DIO path Ritesh Harjani (IBM)
  2023-04-13 14:42   ` Darrick J. Wong
@ 2023-04-14  6:04   ` Christoph Hellwig
  2023-04-14  7:56     ` Ritesh Harjani
  1 sibling, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2023-04-14  6:04 UTC (permalink / raw)
  To: Ritesh Harjani (IBM)
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel

> +	trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before, ret);
>  	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
>  			     done_before);
>  	if (IS_ERR_OR_NULL(dio)) {
> @@ -689,6 +691,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>  	}
>  	ret = iomap_dio_complete(dio);
>  out:
> +	trace_iomap_dio_rw_end(iocb, iter, dio_flags, done_before, ret);

The trace_iomap_dio_rw_end tracepoint heere seems a bit weird,
and we'll miss it for file systems using  __iomap_dio_rw directly.

I'd instead add a trace_iomap_dio_rw_queued for the case where
__iomap_dio_rw returns ERR_PTR(-EIOCBQUEUED), as otherwise we're
nicely covered by the complete trace points.

> +		  __print_flags(__entry->dio_flags, "|", TRACE_IOMAP_DIO_STRINGS),

Please avoid the overly lone line here.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 10/10] iomap: Add trace points for DIO path
  2023-04-14  6:04   ` Christoph Hellwig
@ 2023-04-14  7:56     ` Ritesh Harjani
  2023-04-14 13:06       ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Ritesh Harjani @ 2023-04-14  7:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel

Christoph Hellwig <hch@infradead.org> writes:

>> +	trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before, ret);
>>  	dio = __iomap_dio_rw(iocb, iter, ops, dops, dio_flags, private,
>>  			     done_before);
>>  	if (IS_ERR_OR_NULL(dio)) {
>> @@ -689,6 +691,7 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>>  	}
>>  	ret = iomap_dio_complete(dio);
>>  out:
>> +	trace_iomap_dio_rw_end(iocb, iter, dio_flags, done_before, ret);
>
> The trace_iomap_dio_rw_end tracepoint heere seems a bit weird,
> and we'll miss it for file systems using  __iomap_dio_rw directly.

Sorry, yes you are right.

>
> I'd instead add a trace_iomap_dio_rw_queued for the case where
> __iomap_dio_rw returns ERR_PTR(-EIOCBQUEUED), as otherwise we're
> nicely covered by the complete trace points.
>

How about this below change? Does this look good to you?
It should cover all error types and both entry and exit.

And should I fold this entire change in 1 patch or should I split the
refactoring of common out routine into a seperate one?


diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5871956ee880..e412fdc4ee86 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -130,6 +130,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
        if (ret > 0)
                ret += dio->done_before;

+       trace_iomap_dio_complete(iocb, dio->error, ret);
        kfree(dio);

        return ret;
@@ -493,12 +494,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
        struct blk_plug plug;
        struct iomap_dio *dio;

+       trace_iomap_dio_rw_begin(iocb, iter, dio_flags, done_before, ret);
        if (!iomi.len)
-               return NULL;
+               goto out;

        dio = kmalloc(sizeof(*dio), GFP_KERNEL);
-       if (!dio)
-               return ERR_PTR(-ENOMEM);
+       if (!dio) {
+               ret = -ENOMEM;
+               goto out;
+       }

        dio->iocb = iocb;
        atomic_set(&dio->ref, 1);
@@ -650,8 +654,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
         */
        dio->wait_for_completion = wait_for_completion;
        if (!atomic_dec_and_test(&dio->ref)) {
-               if (!wait_for_completion)
-                       return ERR_PTR(-EIOCBQUEUED);
+               if (!wait_for_completion) {
+                       ret = -EIOCBQUEUED;
+                       goto out;
+               }
+

                for (;;) {
                        set_current_state(TASK_UNINTERRUPTIBLE);
@@ -663,10 +670,13 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
                __set_current_state(TASK_RUNNING);
        }

+       trace_iomap_dio_rw_end(iocb, iter, dio_flags, done_before, ret);
        return dio;

 out_free_dio:
        kfree(dio);
+out:
+       trace_iomap_dio_rw_end(iocb, iter, dio_flags, done_before, ret);
        if (ret)
                return ERR_PTR(ret);
        return NULL;


>> +		  __print_flags(__entry->dio_flags, "|", TRACE_IOMAP_DIO_STRINGS),
>
> Please avoid the overly lone line here.

Somehow my checkpatch never gave a warning about it.
I will check why was that. But yes, I have anyways made the name to
IOMAP_DIO_STRINGS similar to other namings used in fs/iomap/trace.h

-ritesh

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFCv3 07/10] ext2: Add direct-io trace points
  2023-04-14  6:00   ` Christoph Hellwig
@ 2023-04-14  8:06     ` Ritesh Harjani
  0 siblings, 0 replies; 33+ messages in thread
From: Ritesh Harjani @ 2023-04-14  8:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-fsdevel, linux-ext4, Jan Kara, Christoph Hellwig,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel

Christoph Hellwig <hch@infradead.org> writes:

> So why do you add tracepoints in ext2 in addition to iomap?

1. It's very convenient to look into all ext2 filesystem activity by just
"trace-cmd -e ext2"

2. For cases in ext2 which fallbacks to buffered-io routine.
Adding a trace point specifically for ext2 can cover this path as well
as to how much got written via buffered-io path or in case there was an
error while doing that.

3. As of now there was no tracepoint added to ext2. Going forward it
will be quick and convenient to add any trace event for
debugging/observing other call paths too while development.

-ritesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation
  2023-04-14  5:59   ` Christoph Hellwig
@ 2023-04-14 12:51     ` Jan Kara
  2023-04-14 13:12       ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2023-04-14 12:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ritesh Harjani (IBM),
	linux-fsdevel, linux-ext4, Jan Kara, Darrick J . Wong,
	Ojaswin Mujoo, Disha Goel

On Thu 13-04-23 22:59:24, Christoph Hellwig wrote:
> Still no fan of the naming and placement here.  This is specific
> to the fs/buffer.c infrastructure.

I'm fine with moving generic_file_fsync() & friends to fs/buffer.c and
creating the new function there if it makes you happier. But I think
function names should be consistent (hence the new function would be named
__generic_file_fsync_nolock()). I agree the name is not ideal and would use
cleanup (along with transitioning everybody to not take i_rwsem) but I
don't want to complicate this series by touching 13+ callsites of
generic_file_fsync() and __generic_file_fsync(). That's for a separate
series.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 10/10] iomap: Add trace points for DIO path
  2023-04-14  7:56     ` Ritesh Harjani
@ 2023-04-14 13:06       ` Christoph Hellwig
  2023-04-14 14:38         ` Ritesh Harjani
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2023-04-14 13:06 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Christoph Hellwig, linux-fsdevel, linux-ext4, Jan Kara,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel

On Fri, Apr 14, 2023 at 01:26:38PM +0530, Ritesh Harjani wrote:
> How about this below change? Does this look good to you?
> It should cover all error types and both entry and exit.

I don't think it is very useful.  The complete tracepoint is the
end of the I/O.  Having a separate end one doesn't make sense.
That's why I suggested a queued one for the asynchronous case.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation
  2023-04-14 12:51     ` Jan Kara
@ 2023-04-14 13:12       ` Christoph Hellwig
  2023-04-14 14:20         ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2023-04-14 13:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: Christoph Hellwig, Ritesh Harjani (IBM),
	linux-fsdevel, linux-ext4, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel

On Fri, Apr 14, 2023 at 02:51:48PM +0200, Jan Kara wrote:
> On Thu 13-04-23 22:59:24, Christoph Hellwig wrote:
> > Still no fan of the naming and placement here.  This is specific
> > to the fs/buffer.c infrastructure.
> 
> I'm fine with moving generic_file_fsync() & friends to fs/buffer.c and
> creating the new function there if it makes you happier. But I think
> function names should be consistent (hence the new function would be named
> __generic_file_fsync_nolock()). I agree the name is not ideal and would use
> cleanup (along with transitioning everybody to not take i_rwsem) but I
> don't want to complicate this series by touching 13+ callsites of
> generic_file_fsync() and __generic_file_fsync(). That's for a separate
> series.

I would not change the existing function.  Just do the right thing for
the new helper and slowly migrate over without complicating this series.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation
  2023-04-14 13:12       ` Christoph Hellwig
@ 2023-04-14 14:20         ` Jan Kara
  2023-04-14 14:29           ` Ritesh Harjani
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2023-04-14 14:20 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jan Kara, Ritesh Harjani (IBM),
	linux-fsdevel, linux-ext4, Darrick J . Wong, Ojaswin Mujoo,
	Disha Goel

On Fri 14-04-23 06:12:00, Christoph Hellwig wrote:
> On Fri, Apr 14, 2023 at 02:51:48PM +0200, Jan Kara wrote:
> > On Thu 13-04-23 22:59:24, Christoph Hellwig wrote:
> > > Still no fan of the naming and placement here.  This is specific
> > > to the fs/buffer.c infrastructure.
> > 
> > I'm fine with moving generic_file_fsync() & friends to fs/buffer.c and
> > creating the new function there if it makes you happier. But I think
> > function names should be consistent (hence the new function would be named
> > __generic_file_fsync_nolock()). I agree the name is not ideal and would use
> > cleanup (along with transitioning everybody to not take i_rwsem) but I
> > don't want to complicate this series by touching 13+ callsites of
> > generic_file_fsync() and __generic_file_fsync(). That's for a separate
> > series.
> 
> I would not change the existing function.  Just do the right thing for
> the new helper and slowly migrate over without complicating this series.

OK, I can live with that temporary naming inconsistency I guess. So
the function will be __buffer_file_fsync()?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation
  2023-04-14 14:20         ` Jan Kara
@ 2023-04-14 14:29           ` Ritesh Harjani
  2023-04-17  7:32             ` Jan Kara
  0 siblings, 1 reply; 33+ messages in thread
From: Ritesh Harjani @ 2023-04-14 14:29 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig
  Cc: Jan Kara, linux-fsdevel, linux-ext4, Darrick J . Wong,
	Ojaswin Mujoo, Disha Goel

Jan Kara <jack@suse.cz> writes:

> On Fri 14-04-23 06:12:00, Christoph Hellwig wrote:
>> On Fri, Apr 14, 2023 at 02:51:48PM +0200, Jan Kara wrote:
>> > On Thu 13-04-23 22:59:24, Christoph Hellwig wrote:
>> > > Still no fan of the naming and placement here.  This is specific
>> > > to the fs/buffer.c infrastructure.
>> >
>> > I'm fine with moving generic_file_fsync() & friends to fs/buffer.c and
>> > creating the new function there if it makes you happier. But I think
>> > function names should be consistent (hence the new function would be named
>> > __generic_file_fsync_nolock()). I agree the name is not ideal and would use
>> > cleanup (along with transitioning everybody to not take i_rwsem) but I
>> > don't want to complicate this series by touching 13+ callsites of
>> > generic_file_fsync() and __generic_file_fsync(). That's for a separate
>> > series.
>>
>> I would not change the existing function.  Just do the right thing for
>> the new helper and slowly migrate over without complicating this series.
>
> OK, I can live with that temporary naming inconsistency I guess. So
> the function will be __buffer_file_fsync()?

This name was suggested before, so if that's ok I will go with this -
"generic_buffer_fsync()". It's definition will lie in fs/buffer.c and
it's declaration in include/linux/buffer_head.h

Is that ok?

-ritesh

>
> 								Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 10/10] iomap: Add trace points for DIO path
  2023-04-14 13:06       ` Christoph Hellwig
@ 2023-04-14 14:38         ` Ritesh Harjani
  0 siblings, 0 replies; 33+ messages in thread
From: Ritesh Harjani @ 2023-04-14 14:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, linux-fsdevel, linux-ext4, Jan Kara,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel

Christoph Hellwig <hch@infradead.org> writes:

> On Fri, Apr 14, 2023 at 01:26:38PM +0530, Ritesh Harjani wrote:
>> How about this below change? Does this look good to you?
>> It should cover all error types and both entry and exit.
>
> I don't think it is very useful.  The complete tracepoint is the
> end of the I/O.  Having a separate end one doesn't make sense.
> That's why I suggested a queued one for the asynchronous case.

Ok, does this look good then?

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 36ab1152dbea..859efb5de1bf 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -130,6 +130,7 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
        if (ret > 0)
                ret += dio->done_before;

+       trace_iomap_dio_complete(iocb, dio->error, ret);
        kfree(dio);

        return ret;
@@ -650,8 +651,12 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
         */
        dio->wait_for_completion = wait_for_completion;
        if (!atomic_dec_and_test(&dio->ref)) {
-               if (!wait_for_completion)
-                       return ERR_PTR(-EIOCBQUEUED);
+               if (!wait_for_completion) {
+                       ret = -EIOCBQUEUED;
+                       trace_iomap_dio_rw_queued(iocb, iter, dio_flags,
+                                                 done_before, ret);
+                       return ERR_PTR(ret);
+               }

                for (;;) {
                        set_current_state(TASK_UNINTERRUPTIBLE);

-ritesh

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation
  2023-04-14 14:29           ` Ritesh Harjani
@ 2023-04-17  7:32             ` Jan Kara
  2023-04-17 10:01               ` Ritesh Harjani
  0 siblings, 1 reply; 33+ messages in thread
From: Jan Kara @ 2023-04-17  7:32 UTC (permalink / raw)
  To: Ritesh Harjani
  Cc: Jan Kara, Christoph Hellwig, linux-fsdevel, linux-ext4,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel

On Fri 14-04-23 19:59:42, Ritesh Harjani wrote:
> Jan Kara <jack@suse.cz> writes:
> 
> > On Fri 14-04-23 06:12:00, Christoph Hellwig wrote:
> >> On Fri, Apr 14, 2023 at 02:51:48PM +0200, Jan Kara wrote:
> >> > On Thu 13-04-23 22:59:24, Christoph Hellwig wrote:
> >> > > Still no fan of the naming and placement here.  This is specific
> >> > > to the fs/buffer.c infrastructure.
> >> >
> >> > I'm fine with moving generic_file_fsync() & friends to fs/buffer.c and
> >> > creating the new function there if it makes you happier. But I think
> >> > function names should be consistent (hence the new function would be named
> >> > __generic_file_fsync_nolock()). I agree the name is not ideal and would use
> >> > cleanup (along with transitioning everybody to not take i_rwsem) but I
> >> > don't want to complicate this series by touching 13+ callsites of
> >> > generic_file_fsync() and __generic_file_fsync(). That's for a separate
> >> > series.
> >>
> >> I would not change the existing function.  Just do the right thing for
> >> the new helper and slowly migrate over without complicating this series.
> >
> > OK, I can live with that temporary naming inconsistency I guess. So
> > the function will be __buffer_file_fsync()?
> 
> This name was suggested before, so if that's ok I will go with this -
> "generic_buffer_fsync()". It's definition will lie in fs/buffer.c and
> it's declaration in include/linux/buffer_head.h
> 
> Is that ok?

Yes, that is fine by me. And I suppose this variant will also issue the
cache flush, won't it? But then we also need __generic_buffer_fsync()
without issuing the cache flush for ext4 (we need to sync parent before
issuing a cache flush) and FAT.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation
  2023-04-17  7:32             ` Jan Kara
@ 2023-04-17 10:01               ` Ritesh Harjani
  0 siblings, 0 replies; 33+ messages in thread
From: Ritesh Harjani @ 2023-04-17 10:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: Jan Kara, Christoph Hellwig, linux-fsdevel, linux-ext4,
	Darrick J . Wong, Ojaswin Mujoo, Disha Goel

Jan Kara <jack@suse.cz> writes:

> On Fri 14-04-23 19:59:42, Ritesh Harjani wrote:
>> Jan Kara <jack@suse.cz> writes:
>>
>> > On Fri 14-04-23 06:12:00, Christoph Hellwig wrote:
>> >> On Fri, Apr 14, 2023 at 02:51:48PM +0200, Jan Kara wrote:
>> >> > On Thu 13-04-23 22:59:24, Christoph Hellwig wrote:
>> >> > > Still no fan of the naming and placement here.  This is specific
>> >> > > to the fs/buffer.c infrastructure.
>> >> >
>> >> > I'm fine with moving generic_file_fsync() & friends to fs/buffer.c and
>> >> > creating the new function there if it makes you happier. But I think
>> >> > function names should be consistent (hence the new function would be named
>> >> > __generic_file_fsync_nolock()). I agree the name is not ideal and would use
>> >> > cleanup (along with transitioning everybody to not take i_rwsem) but I
>> >> > don't want to complicate this series by touching 13+ callsites of
>> >> > generic_file_fsync() and __generic_file_fsync(). That's for a separate
>> >> > series.
>> >>
>> >> I would not change the existing function.  Just do the right thing for
>> >> the new helper and slowly migrate over without complicating this series.
>> >
>> > OK, I can live with that temporary naming inconsistency I guess. So
>> > the function will be __buffer_file_fsync()?
>>
>> This name was suggested before, so if that's ok I will go with this -
>> "generic_buffer_fsync()". It's definition will lie in fs/buffer.c and
>> it's declaration in include/linux/buffer_head.h
>>
>> Is that ok?
>
> Yes, that is fine by me. And I suppose this variant will also issue the
> cache flush, won't it?

No. We don't issue cache flush (REQ_PREFLUSH) in generic_buffer_fsync(),
neither __generic_file_fsync() does that.

> But then we also need __generic_buffer_fsync()
> without issuing the cache flush for ext4 (we need to sync parent before
> issuing a cache flush) and FAT.

Yes, we do take care of that by -

<simplified logic>
ret = generic_buffer_fsync()
if (!ret)
   ret = ext4_sync_parent(inode)
if (test_opt(inode->i_sb, BARRIER))
   blkdev_issue_flush()

Am I missing anything. I have sent a [v5] with all of the comments
addressed. Could you please take a look and let me know if it looks
good or is there anything required?

[v5]: https://lore.kernel.org/linux-ext4/cover.1681639164.git.ritesh.list@gmail.com/T/#t

-ritesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2023-04-17 10:03 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-13  8:40 [RFCv3 00/10] ext2: DIO to use iomap Ritesh Harjani (IBM)
2023-04-13  8:40 ` [RFCv3 01/10] ext2/dax: Fix ext2_setsize when len is page aligned Ritesh Harjani (IBM)
2023-04-13  8:40 ` [RFCv3 02/10] libfs: Add __generic_file_fsync_nolock implementation Ritesh Harjani (IBM)
2023-04-14  5:59   ` Christoph Hellwig
2023-04-14 12:51     ` Jan Kara
2023-04-14 13:12       ` Christoph Hellwig
2023-04-14 14:20         ` Jan Kara
2023-04-14 14:29           ` Ritesh Harjani
2023-04-17  7:32             ` Jan Kara
2023-04-17 10:01               ` Ritesh Harjani
2023-04-13  8:40 ` [RFCv3 03/10] ext4: Use " Ritesh Harjani (IBM)
2023-04-13  8:40 ` [RFCv3 04/10] ext2: " Ritesh Harjani (IBM)
2023-04-13  8:40 ` [RFCv3 05/10] ext2: Move direct-io to use iomap Ritesh Harjani (IBM)
2023-04-13  8:40 ` [RFCv3 06/10] fs.h: Add TRACE_IOCB_STRINGS for use in trace points Ritesh Harjani (IBM)
2023-04-13  9:54   ` Christian Brauner
2023-04-13 10:15     ` Ritesh Harjani
2023-04-13  8:40 ` [RFCv3 07/10] ext2: Add direct-io " Ritesh Harjani (IBM)
2023-04-14  6:00   ` Christoph Hellwig
2023-04-14  8:06     ` Ritesh Harjani
2023-04-13  8:40 ` [RFCv3 08/10] iomap: Remove IOMAP_DIO_NOSYNC unused dio flag Ritesh Harjani (IBM)
2023-04-13 14:34   ` Darrick J. Wong
2023-04-13  8:40 ` [RFCv3 09/10] iomap: Minor refactor of iomap_dio_rw Ritesh Harjani (IBM)
2023-04-13 14:35   ` Darrick J. Wong
2023-04-14  6:00     ` Christoph Hellwig
2023-04-13  8:40 ` [RFCv3 10/10] iomap: Add trace points for DIO path Ritesh Harjani (IBM)
2023-04-13 14:42   ` Darrick J. Wong
2023-04-13 20:18     ` Ritesh Harjani
2023-04-14  2:16       ` Darrick J. Wong
2023-04-14  5:21         ` Ritesh Harjani
2023-04-14  6:04   ` Christoph Hellwig
2023-04-14  7:56     ` Ritesh Harjani
2023-04-14 13:06       ` Christoph Hellwig
2023-04-14 14:38         ` Ritesh Harjani

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).