All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] ext4: improve delalloc buffer write performance
@ 2021-07-06  2:42 Zhang Yi
  2021-07-06  2:42 ` [RFC PATCH 1/4] ext4: check and update i_disksize properly Zhang Yi
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Zhang Yi @ 2021-07-06  2:42 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Hi,

This patchset address to improve buffer write performance with delalloc.
The first patch reduce the unnecessary update i_disksize, the second two
patch refactor the inline data write procedure and also do some small
fix, the last patch do improve by remove all unnecessary journal handle
in the delalloc write procedure.

After this patch set, we could get a lot of performance improvement.
Below is the Unixbench comparison data test on my machine with 'Intel
Xeon Gold 5120' CPU and nvme SSD backend.

Test cmd:

  ./Run -c 56 -i 3 fstime fsbuffer fsdisk

Before this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     422965.0   1068.1
  File Copy 256 bufsize 500 maxblocks         1655.0     105077.0   634.9
  File Copy 4096 bufsize 8000 maxblocks       5800.0    1429092.0   2464.0
                                                                    ========
  System Benchmarks Index Score (Partial Only)                      1186.6

After this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     732716.0   1850.3
  File Copy 256 bufsize 500 maxblocks         1655.0     184940.0   1117.5
  File Copy 4096 bufsize 8000 maxblocks       5800.0    2427152.0   4184.7
                                                                    ========
  System Benchmarks Index Score (Partial Only)                      2053.0



Already test by fstests under below configuration, no add failure.

     delalloc  blocksize   journal  inline
1.    Y         4K          N        N
2.    Y         4K          N        Y
3.    N         4K          N        N
4.    N         4K          N        Y
5.    Y         4K          Y        N
6.    Y         4K          Y        Y
7.    Y         1K          N        N
8.    N         1K          N        N
9.    Y         1K          Y        N

Thanks,
Yi.



Zhang Yi (4):
  ext4: check and update i_disksize properly
  ext4: correct the error path of ext4_write_inline_data_end()
  ext4: factor out write end code of inline file
  ext4: drop unnecessary journal handle in delalloc write

 fs/ext4/ext4.h   |   3 --
 fs/ext4/inline.c | 124 +++++++++++++++++++++---------------------
 fs/ext4/inode.c  | 137 ++++++++++-------------------------------------
 3 files changed, 93 insertions(+), 171 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC PATCH 1/4] ext4: check and update i_disksize properly
  2021-07-06  2:42 [RFC PATCH 0/4] ext4: improve delalloc buffer write performance Zhang Yi
@ 2021-07-06  2:42 ` Zhang Yi
  2021-07-06 12:11   ` Jan Kara
  2021-07-06  2:42 ` [RFC PATCH 2/4] ext4: correct the error path of ext4_write_inline_data_end() Zhang Yi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Zhang Yi @ 2021-07-06  2:42 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

After commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <=
isize"), i_disksize could always be updated to i_size in ext4_setattr(),
and it seems that there is no other way that could appear
i_disksize < i_size besides the delalloc write. In the case of delay
alloc write, ext4_writepages() could update i_disksize for the new delay
allocated blocks properly. So we could switch to check i_size instead
of i_disksize in ext4_da_write_end() when write to the end of the file.
we also could remove ext4_mark_inode_dirty() together because
generic_write_end() will dirty the inode.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d8de607849df..6f6a61f3ae5f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3087,32 +3087,27 @@ static int ext4_da_write_end(struct file *file,
 	 * generic_write_end() will run mark_inode_dirty() if i_size
 	 * changes.  So let's piggyback the i_disksize mark_inode_dirty
 	 * into that.
+	 *
+	 * Check i_size not i_disksize here because ext4_writepages() could
+	 * update i_disksize from i_size for delay allocated blocks properly.
 	 */
 	new_i_size = pos + copied;
-	if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
+	if (copied && new_i_size > inode->i_size) {
 		if (ext4_has_inline_data(inode) ||
-		    ext4_da_should_update_i_disksize(page, end)) {
+		    ext4_da_should_update_i_disksize(page, end))
 			ext4_update_i_disksize(inode, new_i_size);
-			/* We need to mark inode dirty even if
-			 * new_i_size is less that inode->i_size
-			 * bu greater than i_disksize.(hint delalloc)
-			 */
-			ret = ext4_mark_inode_dirty(handle, inode);
-		}
 	}
 
 	if (write_mode != CONVERT_INLINE_DATA &&
 	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
 	    ext4_has_inline_data(inode))
-		ret2 = ext4_da_write_inline_data_end(inode, pos, len, copied,
+		ret = ext4_da_write_inline_data_end(inode, pos, len, copied,
 						     page);
 	else
-		ret2 = generic_write_end(file, mapping, pos, len, copied,
+		ret = generic_write_end(file, mapping, pos, len, copied,
 							page, fsdata);
 
-	copied = ret2;
-	if (ret2 < 0)
-		ret = ret2;
+	copied = ret;
 	ret2 = ext4_journal_stop(handle);
 	if (unlikely(ret2 && !ret))
 		ret = ret2;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 2/4] ext4: correct the error path of ext4_write_inline_data_end()
  2021-07-06  2:42 [RFC PATCH 0/4] ext4: improve delalloc buffer write performance Zhang Yi
  2021-07-06  2:42 ` [RFC PATCH 1/4] ext4: check and update i_disksize properly Zhang Yi
@ 2021-07-06  2:42 ` Zhang Yi
  2021-07-06 12:28   ` Jan Kara
  2021-07-06  2:42 ` [RFC PATCH 3/4] ext4: factor out write end code of inline file Zhang Yi
  2021-07-06  2:42 ` [RFC PATCH 4/4] ext4: drop unnecessary journal handle in delalloc write Zhang Yi
  3 siblings, 1 reply; 17+ messages in thread
From: Zhang Yi @ 2021-07-06  2:42 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Current error path of ext4_write_inline_data_end() is not correct.

Firstly, it should pass out the error value if ext4_get_inode_loc()
return fail, or else it could trigger infinite loop if we inject error
here. And then it's better to add inode to orphan list if it return fail
in ext4_journal_stop(), otherwise we could not restore inline xattr
entry after power failure. Finally, we need to reset the 'ret' value if
ext4_write_inline_data_end() return success in ext4_write_end() and
ext4_journalled_write_end(), otherwise we could not get the error return
value of ext4_journal_stop().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inline.c | 15 +++++----------
 fs/ext4/inode.c  |  7 +++++--
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 70cb64db33f7..28b666f25ac2 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -733,25 +733,20 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
 	void *kaddr;
 	struct ext4_iloc iloc;
 
-	if (unlikely(copied < len)) {
-		if (!PageUptodate(page)) {
-			copied = 0;
-			goto out;
-		}
-	}
+	if (unlikely(copied < len) && !PageUptodate(page))
+		return 0;
 
 	ret = ext4_get_inode_loc(inode, &iloc);
 	if (ret) {
 		ext4_std_error(inode->i_sb, ret);
-		copied = 0;
-		goto out;
+		return ret;
 	}
 
 	ext4_write_lock_xattr(inode, &no_expand);
 	BUG_ON(!ext4_has_inline_data(inode));
 
 	kaddr = kmap_atomic(page);
-	ext4_write_inline_data(inode, &iloc, kaddr, pos, len);
+	ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
 	kunmap_atomic(kaddr);
 	SetPageUptodate(page);
 	/* clear page dirty so that writepages wouldn't work for us. */
@@ -760,7 +755,7 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
 	ext4_write_unlock_xattr(inode, &no_expand);
 	brelse(iloc.bh);
 	mark_inode_dirty(inode);
-out:
+
 	return copied;
 }
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6f6a61f3ae5f..4efd50a844b9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1295,6 +1295,7 @@ static int ext4_write_end(struct file *file,
 			goto errout;
 		}
 		copied = ret;
+		ret = 0;
 	} else
 		copied = block_write_end(file, mapping, pos,
 					 len, copied, page, fsdata);
@@ -1321,13 +1322,14 @@ static int ext4_write_end(struct file *file,
 	if (i_size_changed || inline_data)
 		ret = ext4_mark_inode_dirty(handle, inode);
 
+errout:
 	if (pos + len > inode->i_size && !verity && ext4_can_truncate(inode))
 		/* if we have allocated more blocks and copied
 		 * less. We will have blocks allocated outside
 		 * inode->i_size. So truncate them
 		 */
 		ext4_orphan_add(handle, inode);
-errout:
+
 	ret2 = ext4_journal_stop(handle);
 	if (!ret)
 		ret = ret2;
@@ -1410,6 +1412,7 @@ static int ext4_journalled_write_end(struct file *file,
 			goto errout;
 		}
 		copied = ret;
+		ret = 0;
 	} else if (unlikely(copied < len) && !PageUptodate(page)) {
 		copied = 0;
 		ext4_journalled_zero_new_buffers(handle, page, from, to);
@@ -1439,6 +1442,7 @@ static int ext4_journalled_write_end(struct file *file,
 			ret = ret2;
 	}
 
+errout:
 	if (pos + len > inode->i_size && !verity && ext4_can_truncate(inode))
 		/* if we have allocated more blocks and copied
 		 * less. We will have blocks allocated outside
@@ -1446,7 +1450,6 @@ static int ext4_journalled_write_end(struct file *file,
 		 */
 		ext4_orphan_add(handle, inode);
 
-errout:
 	ret2 = ext4_journal_stop(handle);
 	if (!ret)
 		ret = ret2;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 3/4] ext4: factor out write end code of inline file
  2021-07-06  2:42 [RFC PATCH 0/4] ext4: improve delalloc buffer write performance Zhang Yi
  2021-07-06  2:42 ` [RFC PATCH 1/4] ext4: check and update i_disksize properly Zhang Yi
  2021-07-06  2:42 ` [RFC PATCH 2/4] ext4: correct the error path of ext4_write_inline_data_end() Zhang Yi
@ 2021-07-06  2:42 ` Zhang Yi
  2021-07-06  5:45   ` kernel test robot
                     ` (2 more replies)
  2021-07-06  2:42 ` [RFC PATCH 4/4] ext4: drop unnecessary journal handle in delalloc write Zhang Yi
  3 siblings, 3 replies; 17+ messages in thread
From: Zhang Yi @ 2021-07-06  2:42 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

Now that the inline_data file write end procedure are falled into the
common write end functions, it is not clear. Factor them out and do
some cleanup. This patch also drop ext4_da_write_inline_data_end()
and switch to use ext4_write_inline_data_end() instead because we also
need to do the same error processing if we failed to write data into
inline entry.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/ext4.h   |   3 --
 fs/ext4/inline.c | 121 +++++++++++++++++++++++++----------------------
 fs/ext4/inode.c  |  73 +++++++++-------------------
 3 files changed, 88 insertions(+), 109 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3c51e243450d..d4799e619048 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3530,9 +3530,6 @@ extern int ext4_da_write_inline_data_begin(struct address_space *mapping,
 					   unsigned flags,
 					   struct page **pagep,
 					   void **fsdata);
-extern int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
-					 unsigned len, unsigned copied,
-					 struct page *page);
 extern int ext4_try_add_inline_entry(handle_t *handle,
 				     struct ext4_filename *fname,
 				     struct inode *dir, struct inode *inode);
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index 28b666f25ac2..8fbf8ec05bd5 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -729,34 +729,80 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
 int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
 			       unsigned copied, struct page *page)
 {
-	int ret, no_expand;
+	handle_t *handle = ext4_journal_current_handle();
+	int i_size_changed = 0;
+	int no_expand;
 	void *kaddr;
 	struct ext4_iloc iloc;
+	int ret, ret2;
 
 	if (unlikely(copied < len) && !PageUptodate(page))
-		return 0;
+		copied = 0;
 
-	ret = ext4_get_inode_loc(inode, &iloc);
-	if (ret) {
-		ext4_std_error(inode->i_sb, ret);
-		return ret;
-	}
+	if (likely(copied)) {
+		ret = ext4_get_inode_loc(inode, &iloc);
+		if (ret) {
+			unlock_page(page);
+			put_page(page);
+			ext4_std_error(inode->i_sb, ret);
+			goto out;
+		}
+		ext4_write_lock_xattr(inode, &no_expand);
+		BUG_ON(!ext4_has_inline_data(inode));
 
-	ext4_write_lock_xattr(inode, &no_expand);
-	BUG_ON(!ext4_has_inline_data(inode));
+		kaddr = kmap_atomic(page);
+		ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
+		kunmap_atomic(kaddr);
+		SetPageUptodate(page);
+		/* clear page dirty so that writepages wouldn't work for us. */
+		ClearPageDirty(page);
 
-	kaddr = kmap_atomic(page);
-	ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
-	kunmap_atomic(kaddr);
-	SetPageUptodate(page);
-	/* clear page dirty so that writepages wouldn't work for us. */
-	ClearPageDirty(page);
+		ext4_write_unlock_xattr(inode, &no_expand);
+		brelse(iloc.bh);
+	}
 
-	ext4_write_unlock_xattr(inode, &no_expand);
-	brelse(iloc.bh);
-	mark_inode_dirty(inode);
+	/*
+	 * It's important to update i_size while still holding page lock:
+	 * page writeout could otherwise come in and zero beyond i_size.
+	 */
+	i_size_changed = ext4_update_inode_size(inode, pos + copied);
+	if (ext4_should_journal_data(inode)) {
+		ext4_set_inode_state(inode, EXT4_STATE_JDATA);
+		EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
+	}
+	unlock_page(page);
+	put_page(page);
 
-	return copied;
+	/*
+	 * Don't mark the inode dirty under page lock. First, it unnecessarily
+	 * makes the holding time of page lock longer. Second, it forces lock
+	 * ordering of page lock and transaction start for journaling
+	 * filesystems.
+	 */
+	if (likely(copied) || i_size_changed)
+		mark_inode_dirty(inode);
+out:
+	/*
+	 * If we have allocated more blocks and copied less. We will have
+	 * blocks allocated outside inode->i_size, so truncate them.
+	 */
+	if (pos + len > inode->i_size && ext4_can_truncate(inode))
+		ext4_orphan_add(handle, inode);
+
+	ret2 = ext4_journal_stop(handle);
+	if (!ret)
+		ret = ret2;
+	if (pos + len > inode->i_size) {
+		ext4_truncate_failed_write(inode);
+		/*
+		 * If truncate failed early the inode might still be
+		 * on the orphan list; we need to make sure the inode
+		 * is removed from the orphan list in that case.
+		 */
+		if (inode->i_nlink)
+			ext4_orphan_del(NULL, inode);
+	}
+	return ret ? ret : copied;
 }
 
 struct buffer_head *
@@ -937,43 +983,6 @@ int ext4_da_write_inline_data_begin(struct address_space *mapping,
 	return ret;
 }
 
-int ext4_da_write_inline_data_end(struct inode *inode, loff_t pos,
-				  unsigned len, unsigned copied,
-				  struct page *page)
-{
-	int ret;
-
-	ret = ext4_write_inline_data_end(inode, pos, len, copied, page);
-	if (ret < 0) {
-		unlock_page(page);
-		put_page(page);
-		return ret;
-	}
-	copied = ret;
-
-	/*
-	 * No need to use i_size_read() here, the i_size
-	 * cannot change under us because we hold i_mutex.
-	 *
-	 * But it's important to update i_size while still holding page lock:
-	 * page writeout could otherwise come in and zero beyond i_size.
-	 */
-	if (pos+copied > inode->i_size)
-		i_size_write(inode, pos+copied);
-	unlock_page(page);
-	put_page(page);
-
-	/*
-	 * Don't mark the inode dirty under page lock. First, it unnecessarily
-	 * makes the holding time of page lock longer. Second, it forces lock
-	 * ordering of page lock and transaction start for journaling
-	 * filesystems.
-	 */
-	mark_inode_dirty(inode);
-
-	return copied;
-}
-
 #ifdef INLINE_DIR_DEBUG
 void ext4_show_inline_dir(struct inode *dir, struct buffer_head *bh,
 			  void *inline_start, int inline_size)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 4efd50a844b9..650da0648eba 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1282,23 +1282,14 @@ static int ext4_write_end(struct file *file,
 	loff_t old_size = inode->i_size;
 	int ret = 0, ret2;
 	int i_size_changed = 0;
-	int inline_data = ext4_has_inline_data(inode);
 	bool verity = ext4_verity_in_progress(inode);
 
 	trace_ext4_write_end(inode, pos, len, copied);
-	if (inline_data) {
-		ret = ext4_write_inline_data_end(inode, pos, len,
-						 copied, page);
-		if (ret < 0) {
-			unlock_page(page);
-			put_page(page);
-			goto errout;
-		}
-		copied = ret;
-		ret = 0;
-	} else
-		copied = block_write_end(file, mapping, pos,
-					 len, copied, page, fsdata);
+
+	if (ext4_has_inline_data(inode))
+		return ext4_write_inline_data_end(inode, pos, len, copied, page);
+
+	copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
 	/*
 	 * it's important to update i_size while still holding page lock:
 	 * page writeout could otherwise come in and zero beyond i_size.
@@ -1319,10 +1310,9 @@ static int ext4_write_end(struct file *file,
 	 * ordering of page lock and transaction start for journaling
 	 * filesystems.
 	 */
-	if (i_size_changed || inline_data)
+	if (i_size_changed)
 		ret = ext4_mark_inode_dirty(handle, inode);
 
-errout:
 	if (pos + len > inode->i_size && !verity && ext4_can_truncate(inode))
 		/* if we have allocated more blocks and copied
 		 * less. We will have blocks allocated outside
@@ -1394,7 +1384,6 @@ static int ext4_journalled_write_end(struct file *file,
 	int partial = 0;
 	unsigned from, to;
 	int size_changed = 0;
-	int inline_data = ext4_has_inline_data(inode);
 	bool verity = ext4_verity_in_progress(inode);
 
 	trace_ext4_journalled_write_end(inode, pos, len, copied);
@@ -1403,17 +1392,10 @@ static int ext4_journalled_write_end(struct file *file,
 
 	BUG_ON(!ext4_handle_valid(handle));
 
-	if (inline_data) {
-		ret = ext4_write_inline_data_end(inode, pos, len,
-						 copied, page);
-		if (ret < 0) {
-			unlock_page(page);
-			put_page(page);
-			goto errout;
-		}
-		copied = ret;
-		ret = 0;
-	} else if (unlikely(copied < len) && !PageUptodate(page)) {
+	if (ext4_has_inline_data(inode))
+		return ext4_write_inline_data_end(inode, pos, len, copied, page);
+
+	if (unlikely(copied < len) && !PageUptodate(page)) {
 		copied = 0;
 		ext4_journalled_zero_new_buffers(handle, page, from, to);
 	} else {
@@ -1436,13 +1418,12 @@ static int ext4_journalled_write_end(struct file *file,
 	if (old_size < pos && !verity)
 		pagecache_isize_extended(inode, old_size, pos);
 
-	if (size_changed || inline_data) {
+	if (size_changed) {
 		ret2 = ext4_mark_inode_dirty(handle, inode);
 		if (!ret)
 			ret = ret2;
 	}
 
-errout:
 	if (pos + len > inode->i_size && !verity && ext4_can_truncate(inode))
 		/* if we have allocated more blocks and copied
 		 * less. We will have blocks allocated outside
@@ -3072,7 +3053,7 @@ static int ext4_da_write_end(struct file *file,
 			     struct page *page, void *fsdata)
 {
 	struct inode *inode = mapping->host;
-	int ret = 0, ret2;
+	int ret;
 	handle_t *handle = ext4_journal_current_handle();
 	loff_t new_i_size;
 	unsigned long start, end;
@@ -3083,6 +3064,12 @@ static int ext4_da_write_end(struct file *file,
 				      len, copied, page, fsdata);
 
 	trace_ext4_da_write_end(inode, pos, len, copied);
+
+	if (write_mode != CONVERT_INLINE_DATA &&
+	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
+	    ext4_has_inline_data(inode))
+		return ext4_write_inline_data_end(inode, pos, len, copied, page);
+
 	start = pos & (PAGE_SIZE - 1);
 	end = start + copied - 1;
 
@@ -3095,26 +3082,12 @@ static int ext4_da_write_end(struct file *file,
 	 * update i_disksize from i_size for delay allocated blocks properly.
 	 */
 	new_i_size = pos + copied;
-	if (copied && new_i_size > inode->i_size) {
-		if (ext4_has_inline_data(inode) ||
-		    ext4_da_should_update_i_disksize(page, end))
-			ext4_update_i_disksize(inode, new_i_size);
-	}
-
-	if (write_mode != CONVERT_INLINE_DATA &&
-	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
-	    ext4_has_inline_data(inode))
-		ret = ext4_da_write_inline_data_end(inode, pos, len, copied,
-						     page);
-	else
-		ret = generic_write_end(file, mapping, pos, len, copied,
-							page, fsdata);
-
-	copied = ret;
-	ret2 = ext4_journal_stop(handle);
-	if (unlikely(ret2 && !ret))
-		ret = ret2;
+	if (copied && new_i_size > inode->i_size &&
+	    ext4_da_should_update_i_disksize(page, end))
+		ext4_update_i_disksize(inode, new_i_size);
 
+	copied = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
+	ret = ext4_journal_stop(handle);
 	return ret ? ret : copied;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [RFC PATCH 4/4] ext4: drop unnecessary journal handle in delalloc write
  2021-07-06  2:42 [RFC PATCH 0/4] ext4: improve delalloc buffer write performance Zhang Yi
                   ` (2 preceding siblings ...)
  2021-07-06  2:42 ` [RFC PATCH 3/4] ext4: factor out write end code of inline file Zhang Yi
@ 2021-07-06  2:42 ` Zhang Yi
  2021-07-07 16:59   ` Jan Kara
  3 siblings, 1 reply; 17+ messages in thread
From: Zhang Yi @ 2021-07-06  2:42 UTC (permalink / raw)
  To: linux-ext4; +Cc: tytso, adilger.kernel, jack, yi.zhang, yukuai3

After we factor out the inline data write procedure from
ext4_da_write_end(), we don't need to start journal handle for the cases
of both buffer overwrite and append-write. If we need to update
i_disksize, mark_inode_dirty() do start handle and update inode buffer.
So we could just remove all the journal handle codes in the delalloc
write procedure.

After this patch, we could get a lot of performance improvement. Below
is the Unixbench comparison data test on my machine with 'Intel Xeon
Gold 5120' CPU and nvme SSD backend.

Test cmd:

  ./Run -c 56 -i 3 fstime fsbuffer fsdisk

Before this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     422965.0   1068.1
  File Copy 256 bufsize 500 maxblocks         1655.0     105077.0   634.9
  File Copy 4096 bufsize 8000 maxblocks       5800.0    1429092.0   2464.0
                                                                    ======
  System Benchmarks Index Score (Partial Only)                      1186.6

After this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     732716.0   1850.3
  File Copy 256 bufsize 500 maxblocks         1655.0     184940.0   1117.5
  File Copy 4096 bufsize 8000 maxblocks       5800.0    2427152.0   4184.7
                                                                    ======
  System Benchmarks Index Score (Partial Only)                      2053.0

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/ext4/inode.c | 60 +++++--------------------------------------------
 1 file changed, 5 insertions(+), 55 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 650da0648eba..9c86cada9a54 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2910,19 +2910,6 @@ static int ext4_nonda_switch(struct super_block *sb)
 	return 0;
 }
 
-/* We always reserve for an inode update; the superblock could be there too */
-static int ext4_da_write_credits(struct inode *inode, loff_t pos, unsigned len)
-{
-	if (likely(ext4_has_feature_large_file(inode->i_sb)))
-		return 1;
-
-	if (pos + len <= 0x7fffffffULL)
-		return 1;
-
-	/* We might need to update the superblock to set LARGE_FILE */
-	return 2;
-}
-
 static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 			       loff_t pos, unsigned len, unsigned flags,
 			       struct page **pagep, void **fsdata)
@@ -2931,7 +2918,6 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 	struct page *page;
 	pgoff_t index;
 	struct inode *inode = mapping->host;
-	handle_t *handle;
 
 	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
 		return -EIO;
@@ -2957,41 +2943,11 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 			return 0;
 	}
 
-	/*
-	 * grab_cache_page_write_begin() can take a long time if the
-	 * system is thrashing due to memory pressure, or if the page
-	 * is being written back.  So grab it first before we start
-	 * the transaction handle.  This also allows us to allocate
-	 * the page (if needed) without using GFP_NOFS.
-	 */
-retry_grab:
+retry:
 	page = grab_cache_page_write_begin(mapping, index, flags);
 	if (!page)
 		return -ENOMEM;
-	unlock_page(page);
-
-	/*
-	 * With delayed allocation, we don't log the i_disksize update
-	 * if there is delayed block allocation. But we still need
-	 * to journalling the i_disksize update if writes to the end
-	 * of file which has an already mapped buffer.
-	 */
-retry_journal:
-	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
-				ext4_da_write_credits(inode, pos, len));
-	if (IS_ERR(handle)) {
-		put_page(page);
-		return PTR_ERR(handle);
-	}
 
-	lock_page(page);
-	if (page->mapping != mapping) {
-		/* The page got truncated from under us */
-		unlock_page(page);
-		put_page(page);
-		ext4_journal_stop(handle);
-		goto retry_grab;
-	}
 	/* In case writeback began while the page was unlocked */
 	wait_for_stable_page(page);
 
@@ -3003,20 +2959,18 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 #endif
 	if (ret < 0) {
 		unlock_page(page);
-		ext4_journal_stop(handle);
+		put_page(page);
 		/*
 		 * block_write_begin may have instantiated a few blocks
 		 * outside i_size.  Trim these off again. Don't need
-		 * i_size_read because we hold i_mutex.
+		 * i_size_read because we hold inode lock.
 		 */
 		if (pos + len > inode->i_size)
 			ext4_truncate_failed_write(inode);
 
 		if (ret == -ENOSPC &&
 		    ext4_should_retry_alloc(inode->i_sb, &retries))
-			goto retry_journal;
-
-		put_page(page);
+			goto retry;
 		return ret;
 	}
 
@@ -3053,8 +3007,6 @@ static int ext4_da_write_end(struct file *file,
 			     struct page *page, void *fsdata)
 {
 	struct inode *inode = mapping->host;
-	int ret;
-	handle_t *handle = ext4_journal_current_handle();
 	loff_t new_i_size;
 	unsigned long start, end;
 	int write_mode = (int)(unsigned long)fsdata;
@@ -3086,9 +3038,7 @@ static int ext4_da_write_end(struct file *file,
 	    ext4_da_should_update_i_disksize(page, end))
 		ext4_update_i_disksize(inode, new_i_size);
 
-	copied = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
-	ret = ext4_journal_stop(handle);
-	return ret ? ret : copied;
+	return generic_write_end(file, mapping, pos, len, copied, page, fsdata);
 }
 
 /*
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/4] ext4: factor out write end code of inline file
  2021-07-06  2:42 ` [RFC PATCH 3/4] ext4: factor out write end code of inline file Zhang Yi
@ 2021-07-06  5:45   ` kernel test robot
  2021-07-06  5:48   ` kernel test robot
  2021-07-07 16:49   ` Jan Kara
  2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-07-06  5:45 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 10186 bytes --]

Hi Zhang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on ext4/dev]
[also build test WARNING on v5.13 next-20210701]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zhang-Yi/ext4-improve-delalloc-buffer-write-performance/20210706-103431
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
config: s390-buildonly-randconfig-r006-20210705 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 873e8b96b1226d64e4f95083147d8592ba7bd5d8)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/0day-ci/linux/commit/b6bfc15d82616d158c3b969e1796d4848b70e354
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Zhang-Yi/ext4-improve-delalloc-buffer-write-performance/20210706-103431
        git checkout b6bfc15d82616d158c3b969e1796d4848b70e354
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from fs/ext4/inline.c:7:
   In file included from include/linux/iomap.h:11:
   In file included from include/linux/blkdev.h:25:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from fs/ext4/inline.c:7:
   In file included from include/linux/iomap.h:11:
   In file included from include/linux/blkdev.h:25:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from fs/ext4/inline.c:7:
   In file included from include/linux/iomap.h:11:
   In file included from include/linux/blkdev.h:25:
   In file included from include/linux/scatterlist.h:9:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> fs/ext4/inline.c:742:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
           if (likely(copied)) {
               ^~~~~~~~~~~~~~
   include/linux/compiler.h:77:20: note: expanded from macro 'likely'
   # define likely(x)      __builtin_expect(!!(x), 1)
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/ext4/inline.c:793:7: note: uninitialized use occurs here
           if (!ret)
                ^~~
   fs/ext4/inline.c:742:2: note: remove the 'if' if its condition is always true
           if (likely(copied)) {
           ^~~~~~~~~~~~~~~~~~~~
   fs/ext4/inline.c:737:9: note: initialize the variable 'ret' to silence this warning
           int ret, ret2;
                  ^
                   = 0
   13 warnings generated.


vim +742 fs/ext4/inline.c

   728	
   729	int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
   730				       unsigned copied, struct page *page)
   731	{
   732		handle_t *handle = ext4_journal_current_handle();
   733		int i_size_changed = 0;
   734		int no_expand;
   735		void *kaddr;
   736		struct ext4_iloc iloc;
   737		int ret, ret2;
   738	
   739		if (unlikely(copied < len) && !PageUptodate(page))
   740			copied = 0;
   741	
 > 742		if (likely(copied)) {
   743			ret = ext4_get_inode_loc(inode, &iloc);
   744			if (ret) {
   745				unlock_page(page);
   746				put_page(page);
   747				ext4_std_error(inode->i_sb, ret);
   748				goto out;
   749			}
   750			ext4_write_lock_xattr(inode, &no_expand);
   751			BUG_ON(!ext4_has_inline_data(inode));
   752	
   753			kaddr = kmap_atomic(page);
   754			ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
   755			kunmap_atomic(kaddr);
   756			SetPageUptodate(page);
   757			/* clear page dirty so that writepages wouldn't work for us. */
   758			ClearPageDirty(page);
   759	
   760			ext4_write_unlock_xattr(inode, &no_expand);
   761			brelse(iloc.bh);
   762		}
   763	
   764		/*
   765		 * It's important to update i_size while still holding page lock:
   766		 * page writeout could otherwise come in and zero beyond i_size.
   767		 */
   768		i_size_changed = ext4_update_inode_size(inode, pos + copied);
   769		if (ext4_should_journal_data(inode)) {
   770			ext4_set_inode_state(inode, EXT4_STATE_JDATA);
   771			EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
   772		}
   773		unlock_page(page);
   774		put_page(page);
   775	
   776		/*
   777		 * Don't mark the inode dirty under page lock. First, it unnecessarily
   778		 * makes the holding time of page lock longer. Second, it forces lock
   779		 * ordering of page lock and transaction start for journaling
   780		 * filesystems.
   781		 */
   782		if (likely(copied) || i_size_changed)
   783			mark_inode_dirty(inode);
   784	out:
   785		/*
   786		 * If we have allocated more blocks and copied less. We will have
   787		 * blocks allocated outside inode->i_size, so truncate them.
   788		 */
   789		if (pos + len > inode->i_size && ext4_can_truncate(inode))
   790			ext4_orphan_add(handle, inode);
   791	
   792		ret2 = ext4_journal_stop(handle);
   793		if (!ret)
   794			ret = ret2;
   795		if (pos + len > inode->i_size) {
   796			ext4_truncate_failed_write(inode);
   797			/*
   798			 * If truncate failed early the inode might still be
   799			 * on the orphan list; we need to make sure the inode
   800			 * is removed from the orphan list in that case.
   801			 */
   802			if (inode->i_nlink)
   803				ext4_orphan_del(NULL, inode);
   804		}
   805		return ret ? ret : copied;
   806	}
   807	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 50258 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/4] ext4: factor out write end code of inline file
  2021-07-06  2:42 ` [RFC PATCH 3/4] ext4: factor out write end code of inline file Zhang Yi
  2021-07-06  5:45   ` kernel test robot
@ 2021-07-06  5:48   ` kernel test robot
  2021-07-07 16:49   ` Jan Kara
  2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-07-06  5:48 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 5480 bytes --]

Hi Zhang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on ext4/dev]
[also build test WARNING on v5.13 next-20210701]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zhang-Yi/ext4-improve-delalloc-buffer-write-performance/20210706-103431
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
config: arm-randconfig-r016-20210706 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 873e8b96b1226d64e4f95083147d8592ba7bd5d8)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/b6bfc15d82616d158c3b969e1796d4848b70e354
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Zhang-Yi/ext4-improve-delalloc-buffer-write-performance/20210706-103431
        git checkout b6bfc15d82616d158c3b969e1796d4848b70e354
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> fs/ext4/inline.c:742:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
           if (likely(copied)) {
               ^~~~~~~~~~~~~~
   include/linux/compiler.h:77:20: note: expanded from macro 'likely'
   # define likely(x)      __builtin_expect(!!(x), 1)
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/ext4/inline.c:793:7: note: uninitialized use occurs here
           if (!ret)
                ^~~
   fs/ext4/inline.c:742:2: note: remove the 'if' if its condition is always true
           if (likely(copied)) {
           ^~~~~~~~~~~~~~~~~~~~
   fs/ext4/inline.c:737:9: note: initialize the variable 'ret' to silence this warning
           int ret, ret2;
                  ^
                   = 0
   1 warning generated.


vim +742 fs/ext4/inline.c

   728	
   729	int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
   730				       unsigned copied, struct page *page)
   731	{
   732		handle_t *handle = ext4_journal_current_handle();
   733		int i_size_changed = 0;
   734		int no_expand;
   735		void *kaddr;
   736		struct ext4_iloc iloc;
   737		int ret, ret2;
   738	
   739		if (unlikely(copied < len) && !PageUptodate(page))
   740			copied = 0;
   741	
 > 742		if (likely(copied)) {
   743			ret = ext4_get_inode_loc(inode, &iloc);
   744			if (ret) {
   745				unlock_page(page);
   746				put_page(page);
   747				ext4_std_error(inode->i_sb, ret);
   748				goto out;
   749			}
   750			ext4_write_lock_xattr(inode, &no_expand);
   751			BUG_ON(!ext4_has_inline_data(inode));
   752	
   753			kaddr = kmap_atomic(page);
   754			ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
   755			kunmap_atomic(kaddr);
   756			SetPageUptodate(page);
   757			/* clear page dirty so that writepages wouldn't work for us. */
   758			ClearPageDirty(page);
   759	
   760			ext4_write_unlock_xattr(inode, &no_expand);
   761			brelse(iloc.bh);
   762		}
   763	
   764		/*
   765		 * It's important to update i_size while still holding page lock:
   766		 * page writeout could otherwise come in and zero beyond i_size.
   767		 */
   768		i_size_changed = ext4_update_inode_size(inode, pos + copied);
   769		if (ext4_should_journal_data(inode)) {
   770			ext4_set_inode_state(inode, EXT4_STATE_JDATA);
   771			EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
   772		}
   773		unlock_page(page);
   774		put_page(page);
   775	
   776		/*
   777		 * Don't mark the inode dirty under page lock. First, it unnecessarily
   778		 * makes the holding time of page lock longer. Second, it forces lock
   779		 * ordering of page lock and transaction start for journaling
   780		 * filesystems.
   781		 */
   782		if (likely(copied) || i_size_changed)
   783			mark_inode_dirty(inode);
   784	out:
   785		/*
   786		 * If we have allocated more blocks and copied less. We will have
   787		 * blocks allocated outside inode->i_size, so truncate them.
   788		 */
   789		if (pos + len > inode->i_size && ext4_can_truncate(inode))
   790			ext4_orphan_add(handle, inode);
   791	
   792		ret2 = ext4_journal_stop(handle);
   793		if (!ret)
   794			ret = ret2;
   795		if (pos + len > inode->i_size) {
   796			ext4_truncate_failed_write(inode);
   797			/*
   798			 * If truncate failed early the inode might still be
   799			 * on the orphan list; we need to make sure the inode
   800			 * is removed from the orphan list in that case.
   801			 */
   802			if (inode->i_nlink)
   803				ext4_orphan_del(NULL, inode);
   804		}
   805		return ret ? ret : copied;
   806	}
   807	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 36949 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/4] ext4: factor out write end code of inline file
  2021-07-06  2:42 ` [RFC PATCH 3/4] ext4: factor out write end code of inline file Zhang Yi
@ 2021-07-06  9:00 ` Dan Carpenter
  2021-07-06  5:48   ` kernel test robot
  2021-07-07 16:49   ` Jan Kara
  2 siblings, 0 replies; 17+ messages in thread
From: kernel test robot @ 2021-07-06  7:05 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 7318 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20210706024210.746788-4-yi.zhang@huawei.com>
References: <20210706024210.746788-4-yi.zhang@huawei.com>
TO: Zhang Yi <yi.zhang@huawei.com>

Hi Zhang,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on ext4/dev]
[also build test WARNING on v5.13 next-20210701]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Zhang-Yi/ext4-improve-delalloc-buffer-write-performance/20210706-103431
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
:::::: branch date: 5 hours ago
:::::: commit date: 5 hours ago
config: i386-randconfig-m021-20210705 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
fs/ext4/inline.c:793 ext4_write_inline_data_end() error: uninitialized symbol 'ret'.

vim +/ret +793 fs/ext4/inline.c

f19d5870cbf72d Tao Ma        2012-12-10  728  
f19d5870cbf72d Tao Ma        2012-12-10  729  int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
f19d5870cbf72d Tao Ma        2012-12-10  730  			       unsigned copied, struct page *page)
f19d5870cbf72d Tao Ma        2012-12-10  731  {
b6bfc15d82616d Zhang Yi      2021-07-06  732  	handle_t *handle = ext4_journal_current_handle();
b6bfc15d82616d Zhang Yi      2021-07-06  733  	int i_size_changed = 0;
b6bfc15d82616d Zhang Yi      2021-07-06  734  	int no_expand;
f19d5870cbf72d Tao Ma        2012-12-10  735  	void *kaddr;
f19d5870cbf72d Tao Ma        2012-12-10  736  	struct ext4_iloc iloc;
b6bfc15d82616d Zhang Yi      2021-07-06  737  	int ret, ret2;
f19d5870cbf72d Tao Ma        2012-12-10  738  
4dfba1ee616130 Zhang Yi      2021-07-06  739  	if (unlikely(copied < len) && !PageUptodate(page))
b6bfc15d82616d Zhang Yi      2021-07-06  740  		copied = 0;
f19d5870cbf72d Tao Ma        2012-12-10  741  
b6bfc15d82616d Zhang Yi      2021-07-06  742  	if (likely(copied)) {
f19d5870cbf72d Tao Ma        2012-12-10  743  		ret = ext4_get_inode_loc(inode, &iloc);
f19d5870cbf72d Tao Ma        2012-12-10  744  		if (ret) {
b6bfc15d82616d Zhang Yi      2021-07-06  745  			unlock_page(page);
b6bfc15d82616d Zhang Yi      2021-07-06  746  			put_page(page);
f19d5870cbf72d Tao Ma        2012-12-10  747  			ext4_std_error(inode->i_sb, ret);
b6bfc15d82616d Zhang Yi      2021-07-06  748  			goto out;
f19d5870cbf72d Tao Ma        2012-12-10  749  		}
c755e251357a0c Theodore Ts'o 2017-01-11  750  		ext4_write_lock_xattr(inode, &no_expand);
f19d5870cbf72d Tao Ma        2012-12-10  751  		BUG_ON(!ext4_has_inline_data(inode));
f19d5870cbf72d Tao Ma        2012-12-10  752  
f19d5870cbf72d Tao Ma        2012-12-10  753  		kaddr = kmap_atomic(page);
4dfba1ee616130 Zhang Yi      2021-07-06  754  		ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
f19d5870cbf72d Tao Ma        2012-12-10  755  		kunmap_atomic(kaddr);
f19d5870cbf72d Tao Ma        2012-12-10  756  		SetPageUptodate(page);
f19d5870cbf72d Tao Ma        2012-12-10  757  		/* clear page dirty so that writepages wouldn't work for us. */
f19d5870cbf72d Tao Ma        2012-12-10  758  		ClearPageDirty(page);
f19d5870cbf72d Tao Ma        2012-12-10  759  
c755e251357a0c Theodore Ts'o 2017-01-11  760  		ext4_write_unlock_xattr(inode, &no_expand);
f19d5870cbf72d Tao Ma        2012-12-10  761  		brelse(iloc.bh);
b6bfc15d82616d Zhang Yi      2021-07-06  762  	}
b6bfc15d82616d Zhang Yi      2021-07-06  763  
b6bfc15d82616d Zhang Yi      2021-07-06  764  	/*
b6bfc15d82616d Zhang Yi      2021-07-06  765  	 * It's important to update i_size while still holding page lock:
b6bfc15d82616d Zhang Yi      2021-07-06  766  	 * page writeout could otherwise come in and zero beyond i_size.
b6bfc15d82616d Zhang Yi      2021-07-06  767  	 */
b6bfc15d82616d Zhang Yi      2021-07-06  768  	i_size_changed = ext4_update_inode_size(inode, pos + copied);
b6bfc15d82616d Zhang Yi      2021-07-06  769  	if (ext4_should_journal_data(inode)) {
b6bfc15d82616d Zhang Yi      2021-07-06  770  		ext4_set_inode_state(inode, EXT4_STATE_JDATA);
b6bfc15d82616d Zhang Yi      2021-07-06  771  		EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
b6bfc15d82616d Zhang Yi      2021-07-06  772  	}
b6bfc15d82616d Zhang Yi      2021-07-06  773  	unlock_page(page);
b6bfc15d82616d Zhang Yi      2021-07-06  774  	put_page(page);
b6bfc15d82616d Zhang Yi      2021-07-06  775  
b6bfc15d82616d Zhang Yi      2021-07-06  776  	/*
b6bfc15d82616d Zhang Yi      2021-07-06  777  	 * Don't mark the inode dirty under page lock. First, it unnecessarily
b6bfc15d82616d Zhang Yi      2021-07-06  778  	 * makes the holding time of page lock longer. Second, it forces lock
b6bfc15d82616d Zhang Yi      2021-07-06  779  	 * ordering of page lock and transaction start for journaling
b6bfc15d82616d Zhang Yi      2021-07-06  780  	 * filesystems.
b6bfc15d82616d Zhang Yi      2021-07-06  781  	 */
b6bfc15d82616d Zhang Yi      2021-07-06  782  	if (likely(copied) || i_size_changed)
362eca70b53389 Theodore Ts'o 2018-07-10  783  		mark_inode_dirty(inode);
b6bfc15d82616d Zhang Yi      2021-07-06  784  out:
b6bfc15d82616d Zhang Yi      2021-07-06  785  	/*
b6bfc15d82616d Zhang Yi      2021-07-06  786  	 * If we have allocated more blocks and copied less. We will have
b6bfc15d82616d Zhang Yi      2021-07-06  787  	 * blocks allocated outside inode->i_size, so truncate them.
b6bfc15d82616d Zhang Yi      2021-07-06  788  	 */
b6bfc15d82616d Zhang Yi      2021-07-06  789  	if (pos + len > inode->i_size && ext4_can_truncate(inode))
b6bfc15d82616d Zhang Yi      2021-07-06  790  		ext4_orphan_add(handle, inode);
4dfba1ee616130 Zhang Yi      2021-07-06  791  
b6bfc15d82616d Zhang Yi      2021-07-06  792  	ret2 = ext4_journal_stop(handle);
b6bfc15d82616d Zhang Yi      2021-07-06 @793  	if (!ret)
b6bfc15d82616d Zhang Yi      2021-07-06  794  		ret = ret2;
b6bfc15d82616d Zhang Yi      2021-07-06  795  	if (pos + len > inode->i_size) {
b6bfc15d82616d Zhang Yi      2021-07-06  796  		ext4_truncate_failed_write(inode);
b6bfc15d82616d Zhang Yi      2021-07-06  797  		/*
b6bfc15d82616d Zhang Yi      2021-07-06  798  		 * If truncate failed early the inode might still be
b6bfc15d82616d Zhang Yi      2021-07-06  799  		 * on the orphan list; we need to make sure the inode
b6bfc15d82616d Zhang Yi      2021-07-06  800  		 * is removed from the orphan list in that case.
b6bfc15d82616d Zhang Yi      2021-07-06  801  		 */
b6bfc15d82616d Zhang Yi      2021-07-06  802  		if (inode->i_nlink)
b6bfc15d82616d Zhang Yi      2021-07-06  803  			ext4_orphan_del(NULL, inode);
b6bfc15d82616d Zhang Yi      2021-07-06  804  	}
b6bfc15d82616d Zhang Yi      2021-07-06  805  	return ret ? ret : copied;
f19d5870cbf72d Tao Ma        2012-12-10  806  }
f19d5870cbf72d Tao Ma        2012-12-10  807  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 45658 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [kbuild] Re: [RFC PATCH 3/4] ext4: factor out write end code of inline file
@ 2021-07-06  9:00 ` Dan Carpenter
  0 siblings, 0 replies; 17+ messages in thread
From: Dan Carpenter @ 2021-07-06  9:00 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6835 bytes --]

Hi Zhang,

url:    https://github.com/0day-ci/linux/commits/Zhang-Yi/ext4-improve-delalloc-buffer-write-performance/20210706-103431 
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git  dev
config: i386-randconfig-m021-20210705 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
fs/ext4/inline.c:793 ext4_write_inline_data_end() error: uninitialized symbol 'ret'.

vim +/ret +793 fs/ext4/inline.c

f19d5870cbf72d Tao Ma        2012-12-10  729  int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
f19d5870cbf72d Tao Ma        2012-12-10  730  			       unsigned copied, struct page *page)
f19d5870cbf72d Tao Ma        2012-12-10  731  {
b6bfc15d82616d Zhang Yi      2021-07-06  732  	handle_t *handle = ext4_journal_current_handle();
b6bfc15d82616d Zhang Yi      2021-07-06  733  	int i_size_changed = 0;
b6bfc15d82616d Zhang Yi      2021-07-06  734  	int no_expand;
f19d5870cbf72d Tao Ma        2012-12-10  735  	void *kaddr;
f19d5870cbf72d Tao Ma        2012-12-10  736  	struct ext4_iloc iloc;
b6bfc15d82616d Zhang Yi      2021-07-06  737  	int ret, ret2;
f19d5870cbf72d Tao Ma        2012-12-10  738  
4dfba1ee616130 Zhang Yi      2021-07-06  739  	if (unlikely(copied < len) && !PageUptodate(page))
b6bfc15d82616d Zhang Yi      2021-07-06  740  		copied = 0;
f19d5870cbf72d Tao Ma        2012-12-10  741  
b6bfc15d82616d Zhang Yi      2021-07-06  742  	if (likely(copied)) {
f19d5870cbf72d Tao Ma        2012-12-10  743  		ret = ext4_get_inode_loc(inode, &iloc);
f19d5870cbf72d Tao Ma        2012-12-10  744  		if (ret) {
b6bfc15d82616d Zhang Yi      2021-07-06  745  			unlock_page(page);
b6bfc15d82616d Zhang Yi      2021-07-06  746  			put_page(page);
f19d5870cbf72d Tao Ma        2012-12-10  747  			ext4_std_error(inode->i_sb, ret);
b6bfc15d82616d Zhang Yi      2021-07-06  748  			goto out;
f19d5870cbf72d Tao Ma        2012-12-10  749  		}
c755e251357a0c Theodore Ts'o 2017-01-11  750  		ext4_write_lock_xattr(inode, &no_expand);
f19d5870cbf72d Tao Ma        2012-12-10  751  		BUG_ON(!ext4_has_inline_data(inode));
f19d5870cbf72d Tao Ma        2012-12-10  752  
f19d5870cbf72d Tao Ma        2012-12-10  753  		kaddr = kmap_atomic(page);
4dfba1ee616130 Zhang Yi      2021-07-06  754  		ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
f19d5870cbf72d Tao Ma        2012-12-10  755  		kunmap_atomic(kaddr);
f19d5870cbf72d Tao Ma        2012-12-10  756  		SetPageUptodate(page);
f19d5870cbf72d Tao Ma        2012-12-10  757  		/* clear page dirty so that writepages wouldn't work for us. */
f19d5870cbf72d Tao Ma        2012-12-10  758  		ClearPageDirty(page);
f19d5870cbf72d Tao Ma        2012-12-10  759  
c755e251357a0c Theodore Ts'o 2017-01-11  760  		ext4_write_unlock_xattr(inode, &no_expand);
f19d5870cbf72d Tao Ma        2012-12-10  761  		brelse(iloc.bh);
b6bfc15d82616d Zhang Yi      2021-07-06  762  	}

"ret" not initialized on else path.

b6bfc15d82616d Zhang Yi      2021-07-06  763  
b6bfc15d82616d Zhang Yi      2021-07-06  764  	/*
b6bfc15d82616d Zhang Yi      2021-07-06  765  	 * It's important to update i_size while still holding page lock:
b6bfc15d82616d Zhang Yi      2021-07-06  766  	 * page writeout could otherwise come in and zero beyond i_size.
b6bfc15d82616d Zhang Yi      2021-07-06  767  	 */
b6bfc15d82616d Zhang Yi      2021-07-06  768  	i_size_changed = ext4_update_inode_size(inode, pos + copied);
b6bfc15d82616d Zhang Yi      2021-07-06  769  	if (ext4_should_journal_data(inode)) {
b6bfc15d82616d Zhang Yi      2021-07-06  770  		ext4_set_inode_state(inode, EXT4_STATE_JDATA);
b6bfc15d82616d Zhang Yi      2021-07-06  771  		EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
b6bfc15d82616d Zhang Yi      2021-07-06  772  	}
b6bfc15d82616d Zhang Yi      2021-07-06  773  	unlock_page(page);
b6bfc15d82616d Zhang Yi      2021-07-06  774  	put_page(page);
b6bfc15d82616d Zhang Yi      2021-07-06  775  
b6bfc15d82616d Zhang Yi      2021-07-06  776  	/*
b6bfc15d82616d Zhang Yi      2021-07-06  777  	 * Don't mark the inode dirty under page lock. First, it unnecessarily
b6bfc15d82616d Zhang Yi      2021-07-06  778  	 * makes the holding time of page lock longer. Second, it forces lock
b6bfc15d82616d Zhang Yi      2021-07-06  779  	 * ordering of page lock and transaction start for journaling
b6bfc15d82616d Zhang Yi      2021-07-06  780  	 * filesystems.
b6bfc15d82616d Zhang Yi      2021-07-06  781  	 */
b6bfc15d82616d Zhang Yi      2021-07-06  782  	if (likely(copied) || i_size_changed)
362eca70b53389 Theodore Ts'o 2018-07-10  783  		mark_inode_dirty(inode);
b6bfc15d82616d Zhang Yi      2021-07-06  784  out:
b6bfc15d82616d Zhang Yi      2021-07-06  785  	/*
b6bfc15d82616d Zhang Yi      2021-07-06  786  	 * If we have allocated more blocks and copied less. We will have
b6bfc15d82616d Zhang Yi      2021-07-06  787  	 * blocks allocated outside inode->i_size, so truncate them.
b6bfc15d82616d Zhang Yi      2021-07-06  788  	 */
b6bfc15d82616d Zhang Yi      2021-07-06  789  	if (pos + len > inode->i_size && ext4_can_truncate(inode))
b6bfc15d82616d Zhang Yi      2021-07-06  790  		ext4_orphan_add(handle, inode);
4dfba1ee616130 Zhang Yi      2021-07-06  791  
b6bfc15d82616d Zhang Yi      2021-07-06  792  	ret2 = ext4_journal_stop(handle);
b6bfc15d82616d Zhang Yi      2021-07-06 @793  	if (!ret)
b6bfc15d82616d Zhang Yi      2021-07-06  794  		ret = ret2;
b6bfc15d82616d Zhang Yi      2021-07-06  795  	if (pos + len > inode->i_size) {
b6bfc15d82616d Zhang Yi      2021-07-06  796  		ext4_truncate_failed_write(inode);
b6bfc15d82616d Zhang Yi      2021-07-06  797  		/*
b6bfc15d82616d Zhang Yi      2021-07-06  798  		 * If truncate failed early the inode might still be
b6bfc15d82616d Zhang Yi      2021-07-06  799  		 * on the orphan list; we need to make sure the inode
b6bfc15d82616d Zhang Yi      2021-07-06  800  		 * is removed from the orphan list in that case.
b6bfc15d82616d Zhang Yi      2021-07-06  801  		 */
b6bfc15d82616d Zhang Yi      2021-07-06  802  		if (inode->i_nlink)
b6bfc15d82616d Zhang Yi      2021-07-06  803  			ext4_orphan_del(NULL, inode);
b6bfc15d82616d Zhang Yi      2021-07-06  804  	}
b6bfc15d82616d Zhang Yi      2021-07-06  805  	return ret ? ret : copied;
f19d5870cbf72d Tao Ma        2012-12-10  806  }

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org 

_______________________________________________
kbuild mailing list -- kbuild(a)lists.01.org
To unsubscribe send an email to kbuild-leave(a)lists.01.org

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/4] ext4: check and update i_disksize properly
  2021-07-06  2:42 ` [RFC PATCH 1/4] ext4: check and update i_disksize properly Zhang Yi
@ 2021-07-06 12:11   ` Jan Kara
  2021-07-06 14:40     ` Zhang Yi
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2021-07-06 12:11 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Tue 06-07-21 10:42:07, Zhang Yi wrote:
> After commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <=
> isize"), i_disksize could always be updated to i_size in ext4_setattr(),
> and it seems that there is no other way that could appear
> i_disksize < i_size besides the delalloc write. In the case of delay

Well, there are also direct IO writes which have temporarily i_disksize <
i_size but when you hold i_rwsem, you're right that delalloc is the only
reason why you can see i_disksize < i_size AFAIK.

> alloc write, ext4_writepages() could update i_disksize for the new delay
> allocated blocks properly. So we could switch to check i_size instead
> of i_disksize in ext4_da_write_end() when write to the end of the file.

I agree that since ext4_da_should_update_i_disksize() needs to return true
for us to touch i_disksize, writeback has to have already allocated block
underlying the end of write (new_i_size position) and thus we are
guaranteed that writeback will also soon update i_disksize after the
new_i_size position. So I agree that your switch to testing i_size instead
of i_disksize should not have any bad effect... Thinking about this some
more why do we need i_disksize update in ext4_da_write_end() at all? The
page will be dirtied and when writeback will happen we will update
i_disksize to i_size. Updating i_disksize earlier brings no benefit - the user
will see zeros instead of valid data if we crash before the writeback
happened. Am I missing something guys?

								Honza

> we also could remove ext4_mark_inode_dirty() together because
> generic_write_end() will dirty the inode.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> ---
>  fs/ext4/inode.c | 21 ++++++++-------------
>  1 file changed, 8 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d8de607849df..6f6a61f3ae5f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3087,32 +3087,27 @@ static int ext4_da_write_end(struct file *file,
>  	 * generic_write_end() will run mark_inode_dirty() if i_size
>  	 * changes.  So let's piggyback the i_disksize mark_inode_dirty
>  	 * into that.
> +	 *
> +	 * Check i_size not i_disksize here because ext4_writepages() could
> +	 * update i_disksize from i_size for delay allocated blocks properly.
>  	 */
>  	new_i_size = pos + copied;
> -	if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
> +	if (copied && new_i_size > inode->i_size) {
>  		if (ext4_has_inline_data(inode) ||
> -		    ext4_da_should_update_i_disksize(page, end)) {
> +		    ext4_da_should_update_i_disksize(page, end))
>  			ext4_update_i_disksize(inode, new_i_size);
> -			/* We need to mark inode dirty even if
> -			 * new_i_size is less that inode->i_size
> -			 * bu greater than i_disksize.(hint delalloc)
> -			 */
> -			ret = ext4_mark_inode_dirty(handle, inode);
> -		}
>  	}
>  
>  	if (write_mode != CONVERT_INLINE_DATA &&
>  	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
>  	    ext4_has_inline_data(inode))
> -		ret2 = ext4_da_write_inline_data_end(inode, pos, len, copied,
> +		ret = ext4_da_write_inline_data_end(inode, pos, len, copied,
>  						     page);
>  	else
> -		ret2 = generic_write_end(file, mapping, pos, len, copied,
> +		ret = generic_write_end(file, mapping, pos, len, copied,
>  							page, fsdata);
>  
> -	copied = ret2;
> -	if (ret2 < 0)
> -		ret = ret2;
> +	copied = ret;
>  	ret2 = ext4_journal_stop(handle);
>  	if (unlikely(ret2 && !ret))
>  		ret = ret2;
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 2/4] ext4: correct the error path of ext4_write_inline_data_end()
  2021-07-06  2:42 ` [RFC PATCH 2/4] ext4: correct the error path of ext4_write_inline_data_end() Zhang Yi
@ 2021-07-06 12:28   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2021-07-06 12:28 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Tue 06-07-21 10:42:08, Zhang Yi wrote:
> Current error path of ext4_write_inline_data_end() is not correct.
> 
> Firstly, it should pass out the error value if ext4_get_inode_loc()
> return fail, or else it could trigger infinite loop if we inject error
> here.
> And then it's better to add inode to orphan list if it return fail
> in ext4_journal_stop(), otherwise we could not restore inline xattr
> entry after power failure.
> Finally, we need to reset the 'ret' value if
> ext4_write_inline_data_end() return success in ext4_write_end() and
> ext4_journalled_write_end(), otherwise we could not get the error return
> value of ext4_journal_stop().
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inline.c | 15 +++++----------
>  fs/ext4/inode.c  |  7 +++++--
>  2 files changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index 70cb64db33f7..28b666f25ac2 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -733,25 +733,20 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
>  	void *kaddr;
>  	struct ext4_iloc iloc;
>  
> -	if (unlikely(copied < len)) {
> -		if (!PageUptodate(page)) {
> -			copied = 0;
> -			goto out;
> -		}
> -	}
> +	if (unlikely(copied < len) && !PageUptodate(page))
> +		return 0;
>  
>  	ret = ext4_get_inode_loc(inode, &iloc);
>  	if (ret) {
>  		ext4_std_error(inode->i_sb, ret);
> -		copied = 0;
> -		goto out;
> +		return ret;
>  	}
>  
>  	ext4_write_lock_xattr(inode, &no_expand);
>  	BUG_ON(!ext4_has_inline_data(inode));
>  
>  	kaddr = kmap_atomic(page);
> -	ext4_write_inline_data(inode, &iloc, kaddr, pos, len);
> +	ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
>  	kunmap_atomic(kaddr);
>  	SetPageUptodate(page);
>  	/* clear page dirty so that writepages wouldn't work for us. */
> @@ -760,7 +755,7 @@ int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
>  	ext4_write_unlock_xattr(inode, &no_expand);
>  	brelse(iloc.bh);
>  	mark_inode_dirty(inode);
> -out:
> +
>  	return copied;
>  }
>  
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 6f6a61f3ae5f..4efd50a844b9 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1295,6 +1295,7 @@ static int ext4_write_end(struct file *file,
>  			goto errout;
>  		}
>  		copied = ret;
> +		ret = 0;
>  	} else
>  		copied = block_write_end(file, mapping, pos,
>  					 len, copied, page, fsdata);
> @@ -1321,13 +1322,14 @@ static int ext4_write_end(struct file *file,
>  	if (i_size_changed || inline_data)
>  		ret = ext4_mark_inode_dirty(handle, inode);
>  
> +errout:
>  	if (pos + len > inode->i_size && !verity && ext4_can_truncate(inode))
>  		/* if we have allocated more blocks and copied
>  		 * less. We will have blocks allocated outside
>  		 * inode->i_size. So truncate them
>  		 */
>  		ext4_orphan_add(handle, inode);
> -errout:
> +
>  	ret2 = ext4_journal_stop(handle);
>  	if (!ret)
>  		ret = ret2;
> @@ -1410,6 +1412,7 @@ static int ext4_journalled_write_end(struct file *file,
>  			goto errout;
>  		}
>  		copied = ret;
> +		ret = 0;
>  	} else if (unlikely(copied < len) && !PageUptodate(page)) {
>  		copied = 0;
>  		ext4_journalled_zero_new_buffers(handle, page, from, to);
> @@ -1439,6 +1442,7 @@ static int ext4_journalled_write_end(struct file *file,
>  			ret = ret2;
>  	}
>  
> +errout:
>  	if (pos + len > inode->i_size && !verity && ext4_can_truncate(inode))
>  		/* if we have allocated more blocks and copied
>  		 * less. We will have blocks allocated outside
> @@ -1446,7 +1450,6 @@ static int ext4_journalled_write_end(struct file *file,
>  		 */
>  		ext4_orphan_add(handle, inode);
>  
> -errout:
>  	ret2 = ext4_journal_stop(handle);
>  	if (!ret)
>  		ret = ret2;
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/4] ext4: check and update i_disksize properly
  2021-07-06 12:11   ` Jan Kara
@ 2021-07-06 14:40     ` Zhang Yi
  2021-07-06 15:26       ` Jan Kara
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang Yi @ 2021-07-06 14:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, tytso, adilger.kernel, yukuai3

On 2021/7/6 20:11, Jan Kara wrote:
> On Tue 06-07-21 10:42:07, Zhang Yi wrote:
>> After commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <=
>> isize"), i_disksize could always be updated to i_size in ext4_setattr(),
>> and it seems that there is no other way that could appear
>> i_disksize < i_size besides the delalloc write. In the case of delay
> 
> Well, there are also direct IO writes which have temporarily i_disksize <
> i_size but when you hold i_rwsem, you're right that delalloc is the only
> reason why you can see i_disksize < i_size AFAIK.
> 
>> alloc write, ext4_writepages() could update i_disksize for the new delay
>> allocated blocks properly. So we could switch to check i_size instead
>> of i_disksize in ext4_da_write_end() when write to the end of the file.
> 
> I agree that since ext4_da_should_update_i_disksize() needs to return true
> for us to touch i_disksize, writeback has to have already allocated block
> underlying the end of write (new_i_size position) and thus we are
> guaranteed that writeback will also soon update i_disksize after the
> new_i_size position. So I agree that your switch to testing i_size instead
> of i_disksize should not have any bad effect... Thinking about this some
> more why do we need i_disksize update in ext4_da_write_end() at all? The
> page will be dirtied and when writeback will happen we will update
> i_disksize to i_size. Updating i_disksize earlier brings no benefit - the user
> will see zeros instead of valid data if we crash before the writeback
> happened. Am I missing something guys?
> 

Hi, Jan.

Do you remember the patch and question I asked 2 years ago[1][2]? The case of
new_i_size > i_size && ext4_da_should_update_i_disksize() here means partial
block append write, ext4_writepages() does not update i_disksize for this case
now. And the journal data=ordered mode also cannot guarantee write data before
metadata. So we cannot guarantee we cannot see zeros where data was written after
crash.

Thanks,
Yi.

[1]https://lore.kernel.org/linux-ext4/20190404101823.GA22313@quack2.suse.cz/
[2]https://lore.kernel.org/linux-ext4/20190405091258.GA1600@quack2.suse.cz/

> 
>> we also could remove ext4_mark_inode_dirty() together because
>> generic_write_end() will dirty the inode.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
>> ---
>>  fs/ext4/inode.c | 21 ++++++++-------------
>>  1 file changed, 8 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index d8de607849df..6f6a61f3ae5f 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -3087,32 +3087,27 @@ static int ext4_da_write_end(struct file *file,
>>  	 * generic_write_end() will run mark_inode_dirty() if i_size
>>  	 * changes.  So let's piggyback the i_disksize mark_inode_dirty
>>  	 * into that.
>> +	 *
>> +	 * Check i_size not i_disksize here because ext4_writepages() could
>> +	 * update i_disksize from i_size for delay allocated blocks properly.
>>  	 */
>>  	new_i_size = pos + copied;
>> -	if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
>> +	if (copied && new_i_size > inode->i_size) {
>>  		if (ext4_has_inline_data(inode) ||
>> -		    ext4_da_should_update_i_disksize(page, end)) {
>> +		    ext4_da_should_update_i_disksize(page, end))
>>  			ext4_update_i_disksize(inode, new_i_size);
>> -			/* We need to mark inode dirty even if
>> -			 * new_i_size is less that inode->i_size
>> -			 * bu greater than i_disksize.(hint delalloc)
>> -			 */
>> -			ret = ext4_mark_inode_dirty(handle, inode);
>> -		}
>>  	}
>>  
>>  	if (write_mode != CONVERT_INLINE_DATA &&
>>  	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
>>  	    ext4_has_inline_data(inode))
>> -		ret2 = ext4_da_write_inline_data_end(inode, pos, len, copied,
>> +		ret = ext4_da_write_inline_data_end(inode, pos, len, copied,
>>  						     page);
>>  	else
>> -		ret2 = generic_write_end(file, mapping, pos, len, copied,
>> +		ret = generic_write_end(file, mapping, pos, len, copied,
>>  							page, fsdata);
>>  
>> -	copied = ret2;
>> -	if (ret2 < 0)
>> -		ret = ret2;
>> +	copied = ret;
>>  	ret2 = ext4_journal_stop(handle);
>>  	if (unlikely(ret2 && !ret))
>>  		ret = ret2;
>> -- 
>> 2.31.1
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/4] ext4: check and update i_disksize properly
  2021-07-06 14:40     ` Zhang Yi
@ 2021-07-06 15:26       ` Jan Kara
  2021-07-07  6:18         ` Zhang Yi
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2021-07-06 15:26 UTC (permalink / raw)
  To: Zhang Yi; +Cc: Jan Kara, linux-ext4, tytso, adilger.kernel, yukuai3

On Tue 06-07-21 22:40:46, Zhang Yi wrote:
> On 2021/7/6 20:11, Jan Kara wrote:
> > On Tue 06-07-21 10:42:07, Zhang Yi wrote:
> >> After commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <=
> >> isize"), i_disksize could always be updated to i_size in ext4_setattr(),
> >> and it seems that there is no other way that could appear
> >> i_disksize < i_size besides the delalloc write. In the case of delay
> > 
> > Well, there are also direct IO writes which have temporarily i_disksize <
> > i_size but when you hold i_rwsem, you're right that delalloc is the only
> > reason why you can see i_disksize < i_size AFAIK.
> > 
> >> alloc write, ext4_writepages() could update i_disksize for the new delay
> >> allocated blocks properly. So we could switch to check i_size instead
> >> of i_disksize in ext4_da_write_end() when write to the end of the file.
> > 
> > I agree that since ext4_da_should_update_i_disksize() needs to return true
> > for us to touch i_disksize, writeback has to have already allocated block
> > underlying the end of write (new_i_size position) and thus we are
> > guaranteed that writeback will also soon update i_disksize after the
> > new_i_size position. So I agree that your switch to testing i_size instead
> > of i_disksize should not have any bad effect... Thinking about this some
> > more why do we need i_disksize update in ext4_da_write_end() at all? The
> > page will be dirtied and when writeback will happen we will update
> > i_disksize to i_size. Updating i_disksize earlier brings no benefit - the user
> > will see zeros instead of valid data if we crash before the writeback
> > happened. Am I missing something guys?
> > 
> 
> Hi, Jan.
> 
> Do you remember the patch and question I asked 2 years ago[1][2]? The
> case of new_i_size > i_size && ext4_da_should_update_i_disksize() here
> means partial block append write,

Agreed.

> ext4_writepages() does not update i_disksize for this case now.

Doesn't it? Hmm, so mpage_map_and_submit_extent() certainly does make sure
we update i_size properly. But you are actually correct that
ext4_writepage() does not update i_disksize and neither does
mpage_prepare_extent_to_map() which can also writeback fully mapped pages.
Changing mpage_prepare_extent_to_map() to handle i_disksize update would be
trivial but dealing with ext4_writepage() would be difficult. So yes, let's
keep the i_disksize update in ext4_da_write_end() for now. But please add a
comment there explaining the situation. Like:

	/*
	 * Since we are holding inode lock, we are sure i_disksize <=
	 * i_size. We also know that if i_disksize < i_size, there are
	 * delalloc writes pending in the range upto i_size. If the end of
	 * the current write is <= i_size, there's no need to touch
	 * i_disksize since writeback will push i_disksize upto i_size
	 * eventually. If the end of the current write is > i_size and
	 * inside an allocated block (ext4_da_should_update_i_disksize()
	 * check), we need to update i_disksize here as neither
	 * ext4_writepage() nor certain ext4_writepages() paths not
	 * allocating blocks update i_disksize.
	 *
	 * Note that we defer inode dirtying to generic_write_end() /
	 * ext4_da_write_inline_data_end().
	 */

> And the journal data=ordered mode also
> cannot guarantee write data before metadata. So we cannot guarantee we
> cannot see zeros where data was written after crash.

Yes, but that is IMO somewhat different question.

								Honza

> 
> Thanks,
> Yi.
> 
> [1]https://lore.kernel.org/linux-ext4/20190404101823.GA22313@quack2.suse.cz/
> [2]https://lore.kernel.org/linux-ext4/20190405091258.GA1600@quack2.suse.cz/
> 
> > 
> >> we also could remove ext4_mark_inode_dirty() together because
> >> generic_write_end() will dirty the inode.
> >>
> >> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> >> ---
> >>  fs/ext4/inode.c | 21 ++++++++-------------
> >>  1 file changed, 8 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> >> index d8de607849df..6f6a61f3ae5f 100644
> >> --- a/fs/ext4/inode.c
> >> +++ b/fs/ext4/inode.c
> >> @@ -3087,32 +3087,27 @@ static int ext4_da_write_end(struct file *file,
> >>  	 * generic_write_end() will run mark_inode_dirty() if i_size
> >>  	 * changes.  So let's piggyback the i_disksize mark_inode_dirty
> >>  	 * into that.
> >> +	 *
> >> +	 * Check i_size not i_disksize here because ext4_writepages() could
> >> +	 * update i_disksize from i_size for delay allocated blocks properly.
> >>  	 */
> >>  	new_i_size = pos + copied;
> >> -	if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
> >> +	if (copied && new_i_size > inode->i_size) {
> >>  		if (ext4_has_inline_data(inode) ||
> >> -		    ext4_da_should_update_i_disksize(page, end)) {
> >> +		    ext4_da_should_update_i_disksize(page, end))
> >>  			ext4_update_i_disksize(inode, new_i_size);
> >> -			/* We need to mark inode dirty even if
> >> -			 * new_i_size is less that inode->i_size
> >> -			 * bu greater than i_disksize.(hint delalloc)
> >> -			 */
> >> -			ret = ext4_mark_inode_dirty(handle, inode);
> >> -		}
> >>  	}
> >>  
> >>  	if (write_mode != CONVERT_INLINE_DATA &&
> >>  	    ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
> >>  	    ext4_has_inline_data(inode))
> >> -		ret2 = ext4_da_write_inline_data_end(inode, pos, len, copied,
> >> +		ret = ext4_da_write_inline_data_end(inode, pos, len, copied,
> >>  						     page);
> >>  	else
> >> -		ret2 = generic_write_end(file, mapping, pos, len, copied,
> >> +		ret = generic_write_end(file, mapping, pos, len, copied,
> >>  							page, fsdata);
> >>  
> >> -	copied = ret2;
> >> -	if (ret2 < 0)
> >> -		ret = ret2;
> >> +	copied = ret;
> >>  	ret2 = ext4_journal_stop(handle);
> >>  	if (unlikely(ret2 && !ret))
> >>  		ret = ret2;
> >> -- 
> >> 2.31.1
> >>
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 1/4] ext4: check and update i_disksize properly
  2021-07-06 15:26       ` Jan Kara
@ 2021-07-07  6:18         ` Zhang Yi
  0 siblings, 0 replies; 17+ messages in thread
From: Zhang Yi @ 2021-07-07  6:18 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, tytso, adilger.kernel, yukuai3

On 2021/7/6 23:26, Jan Kara wrote:
> On Tue 06-07-21 22:40:46, Zhang Yi wrote:
>> On 2021/7/6 20:11, Jan Kara wrote:
>>> On Tue 06-07-21 10:42:07, Zhang Yi wrote:
>>>> After commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <=
>>>> isize"), i_disksize could always be updated to i_size in ext4_setattr(),
>>>> and it seems that there is no other way that could appear
>>>> i_disksize < i_size besides the delalloc write. In the case of delay
>>>
>>> Well, there are also direct IO writes which have temporarily i_disksize <
>>> i_size but when you hold i_rwsem, you're right that delalloc is the only
>>> reason why you can see i_disksize < i_size AFAIK.
>>>
>>>> alloc write, ext4_writepages() could update i_disksize for the new delay
>>>> allocated blocks properly. So we could switch to check i_size instead
>>>> of i_disksize in ext4_da_write_end() when write to the end of the file.
>>>
>>> I agree that since ext4_da_should_update_i_disksize() needs to return true
>>> for us to touch i_disksize, writeback has to have already allocated block
>>> underlying the end of write (new_i_size position) and thus we are
>>> guaranteed that writeback will also soon update i_disksize after the
>>> new_i_size position. So I agree that your switch to testing i_size instead
>>> of i_disksize should not have any bad effect... Thinking about this some
>>> more why do we need i_disksize update in ext4_da_write_end() at all? The
>>> page will be dirtied and when writeback will happen we will update
>>> i_disksize to i_size. Updating i_disksize earlier brings no benefit - the user
>>> will see zeros instead of valid data if we crash before the writeback
>>> happened. Am I missing something guys?
>>>
>>
>> Hi, Jan.
>>
>> Do you remember the patch and question I asked 2 years ago[1][2]? The
>> case of new_i_size > i_size && ext4_da_should_update_i_disksize() here
>> means partial block append write,
> 
> Agreed.
> 
>> ext4_writepages() does not update i_disksize for this case now.
> 
> Doesn't it? Hmm, so mpage_map_and_submit_extent() certainly does make sure
> we update i_size properly. But you are actually correct that
> ext4_writepage() does not update i_disksize and neither does
> mpage_prepare_extent_to_map() which can also writeback fully mapped pages.
> Changing mpage_prepare_extent_to_map() to handle i_disksize update would be
> trivial but dealing with ext4_writepage() would be difficult. So yes, let's
> keep the i_disksize update in ext4_da_write_end() for now. But please add a
> comment there explaining the situation. Like:
> 
> 	/*
> 	 * Since we are holding inode lock, we are sure i_disksize <=
> 	 * i_size. We also know that if i_disksize < i_size, there are
> 	 * delalloc writes pending in the range upto i_size. If the end of
> 	 * the current write is <= i_size, there's no need to touch
> 	 * i_disksize since writeback will push i_disksize upto i_size
> 	 * eventually. If the end of the current write is > i_size and
> 	 * inside an allocated block (ext4_da_should_update_i_disksize()
> 	 * check), we need to update i_disksize here as neither
> 	 * ext4_writepage() nor certain ext4_writepages() paths not
> 	 * allocating blocks update i_disksize.
> 	 *
> 	 * Note that we defer inode dirtying to generic_write_end() /
> 	 * ext4_da_write_inline_data_end().
> 	 */
> 

Yeah, it makes things clear, I will add this comments in the next iteration.

Thanks,
Yi.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/4] ext4: factor out write end code of inline file
  2021-07-06  2:42 ` [RFC PATCH 3/4] ext4: factor out write end code of inline file Zhang Yi
  2021-07-06  5:45   ` kernel test robot
  2021-07-06  5:48   ` kernel test robot
@ 2021-07-07 16:49   ` Jan Kara
  2021-07-10  8:13     ` Zhang Yi
  2 siblings, 1 reply; 17+ messages in thread
From: Jan Kara @ 2021-07-07 16:49 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Tue 06-07-21 10:42:09, Zhang Yi wrote:
> Now that the inline_data file write end procedure are falled into the
> common write end functions, it is not clear. Factor them out and do
> some cleanup. This patch also drop ext4_da_write_inline_data_end()
> and switch to use ext4_write_inline_data_end() instead because we also
> need to do the same error processing if we failed to write data into
> inline entry.
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good. Just two nits below.
 
> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
> index 28b666f25ac2..8fbf8ec05bd5 100644
> --- a/fs/ext4/inline.c
> +++ b/fs/ext4/inline.c
> @@ -729,34 +729,80 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
>  int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
>  			       unsigned copied, struct page *page)
>  {
> -	int ret, no_expand;
> +	handle_t *handle = ext4_journal_current_handle();
> +	int i_size_changed = 0;
> +	int no_expand;
>  	void *kaddr;
>  	struct ext4_iloc iloc;
> +	int ret, ret2;
>  
>  	if (unlikely(copied < len) && !PageUptodate(page))
> -		return 0;
> +		copied = 0;
>  
> -	ret = ext4_get_inode_loc(inode, &iloc);
> -	if (ret) {
> -		ext4_std_error(inode->i_sb, ret);
> -		return ret;
> -	}
> +	if (likely(copied)) {
> +		ret = ext4_get_inode_loc(inode, &iloc);
> +		if (ret) {
> +			unlock_page(page);
> +			put_page(page);
> +			ext4_std_error(inode->i_sb, ret);
> +			goto out;
> +		}
> +		ext4_write_lock_xattr(inode, &no_expand);
> +		BUG_ON(!ext4_has_inline_data(inode));
>  
> -	ext4_write_lock_xattr(inode, &no_expand);
> -	BUG_ON(!ext4_has_inline_data(inode));
> +		kaddr = kmap_atomic(page);
> +		ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
> +		kunmap_atomic(kaddr);
> +		SetPageUptodate(page);
> +		/* clear page dirty so that writepages wouldn't work for us. */
> +		ClearPageDirty(page);
>  
> -	kaddr = kmap_atomic(page);
> -	ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
> -	kunmap_atomic(kaddr);
> -	SetPageUptodate(page);
> -	/* clear page dirty so that writepages wouldn't work for us. */
> -	ClearPageDirty(page);
> +		ext4_write_unlock_xattr(inode, &no_expand);
> +		brelse(iloc.bh);
> +	}
>  
> -	ext4_write_unlock_xattr(inode, &no_expand);
> -	brelse(iloc.bh);
> -	mark_inode_dirty(inode);
> +	/*
> +	 * It's important to update i_size while still holding page lock:
> +	 * page writeout could otherwise come in and zero beyond i_size.
> +	 */
> +	i_size_changed = ext4_update_inode_size(inode, pos + copied);
> +	if (ext4_should_journal_data(inode)) {
> +		ext4_set_inode_state(inode, EXT4_STATE_JDATA);
> +		EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
> +	}

I think this hunk should also go into the "if (copied)" block. There's no
point in changing i_size or i_disksize when nothing was written.

> +	unlock_page(page);
> +	put_page(page);
>  
> -	return copied;
> +	/*
> +	 * Don't mark the inode dirty under page lock. First, it unnecessarily
> +	 * makes the holding time of page lock longer. Second, it forces lock
> +	 * ordering of page lock and transaction start for journaling
> +	 * filesystems.
> +	 */
> +	if (likely(copied) || i_size_changed)
> +		mark_inode_dirty(inode);

And then it is obvious here that (copied == 0) => !i_size_changed so we can
just remove the i_size_changed term from the condition.

> +out:
> +	/*
> +	 * If we have allocated more blocks and copied less. We will have
> +	 * blocks allocated outside inode->i_size, so truncate them.
> +	 */
> +	if (pos + len > inode->i_size && ext4_can_truncate(inode))
> +		ext4_orphan_add(handle, inode);
> +
> +	ret2 = ext4_journal_stop(handle);
> +	if (!ret)
> +		ret = ret2;
> +	if (pos + len > inode->i_size) {
> +		ext4_truncate_failed_write(inode);
> +		/*
> +		 * If truncate failed early the inode might still be
> +		 * on the orphan list; we need to make sure the inode
> +		 * is removed from the orphan list in that case.
> +		 */
> +		if (inode->i_nlink)
> +			ext4_orphan_del(NULL, inode);
> +	}
> +	return ret ? ret : copied;
>  }

								Honza
---
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 4/4] ext4: drop unnecessary journal handle in delalloc write
  2021-07-06  2:42 ` [RFC PATCH 4/4] ext4: drop unnecessary journal handle in delalloc write Zhang Yi
@ 2021-07-07 16:59   ` Jan Kara
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Kara @ 2021-07-07 16:59 UTC (permalink / raw)
  To: Zhang Yi; +Cc: linux-ext4, tytso, adilger.kernel, jack, yukuai3

On Tue 06-07-21 10:42:10, Zhang Yi wrote:
> After we factor out the inline data write procedure from
> ext4_da_write_end(), we don't need to start journal handle for the cases
> of both buffer overwrite and append-write. If we need to update
> i_disksize, mark_inode_dirty() do start handle and update inode buffer.
> So we could just remove all the journal handle codes in the delalloc
> write procedure.
> 
> After this patch, we could get a lot of performance improvement. Below
> is the Unixbench comparison data test on my machine with 'Intel Xeon
> Gold 5120' CPU and nvme SSD backend.
> 
> Test cmd:
> 
>   ./Run -c 56 -i 3 fstime fsbuffer fsdisk
> 
> Before this patch:
> 
>   System Benchmarks Partial Index           BASELINE       RESULT   INDEX
>   File Copy 1024 bufsize 2000 maxblocks       3960.0     422965.0   1068.1
>   File Copy 256 bufsize 500 maxblocks         1655.0     105077.0   634.9
>   File Copy 4096 bufsize 8000 maxblocks       5800.0    1429092.0   2464.0
>                                                                     ======
>   System Benchmarks Index Score (Partial Only)                      1186.6
> 
> After this patch:
> 
>   System Benchmarks Partial Index           BASELINE       RESULT   INDEX
>   File Copy 1024 bufsize 2000 maxblocks       3960.0     732716.0   1850.3
>   File Copy 256 bufsize 500 maxblocks         1655.0     184940.0   1117.5
>   File Copy 4096 bufsize 8000 maxblocks       5800.0    2427152.0   4184.7
>                                                                     ======
>   System Benchmarks Index Score (Partial Only)                      2053.0
> 
> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>

Looks good and nice speedup! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/ext4/inode.c | 60 +++++--------------------------------------------
>  1 file changed, 5 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 650da0648eba..9c86cada9a54 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2910,19 +2910,6 @@ static int ext4_nonda_switch(struct super_block *sb)
>  	return 0;
>  }
>  
> -/* We always reserve for an inode update; the superblock could be there too */
> -static int ext4_da_write_credits(struct inode *inode, loff_t pos, unsigned len)
> -{
> -	if (likely(ext4_has_feature_large_file(inode->i_sb)))
> -		return 1;
> -
> -	if (pos + len <= 0x7fffffffULL)
> -		return 1;
> -
> -	/* We might need to update the superblock to set LARGE_FILE */
> -	return 2;
> -}
> -
>  static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
>  			       loff_t pos, unsigned len, unsigned flags,
>  			       struct page **pagep, void **fsdata)
> @@ -2931,7 +2918,6 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
>  	struct page *page;
>  	pgoff_t index;
>  	struct inode *inode = mapping->host;
> -	handle_t *handle;
>  
>  	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
>  		return -EIO;
> @@ -2957,41 +2943,11 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
>  			return 0;
>  	}
>  
> -	/*
> -	 * grab_cache_page_write_begin() can take a long time if the
> -	 * system is thrashing due to memory pressure, or if the page
> -	 * is being written back.  So grab it first before we start
> -	 * the transaction handle.  This also allows us to allocate
> -	 * the page (if needed) without using GFP_NOFS.
> -	 */
> -retry_grab:
> +retry:
>  	page = grab_cache_page_write_begin(mapping, index, flags);
>  	if (!page)
>  		return -ENOMEM;
> -	unlock_page(page);
> -
> -	/*
> -	 * With delayed allocation, we don't log the i_disksize update
> -	 * if there is delayed block allocation. But we still need
> -	 * to journalling the i_disksize update if writes to the end
> -	 * of file which has an already mapped buffer.
> -	 */
> -retry_journal:
> -	handle = ext4_journal_start(inode, EXT4_HT_WRITE_PAGE,
> -				ext4_da_write_credits(inode, pos, len));
> -	if (IS_ERR(handle)) {
> -		put_page(page);
> -		return PTR_ERR(handle);
> -	}
>  
> -	lock_page(page);
> -	if (page->mapping != mapping) {
> -		/* The page got truncated from under us */
> -		unlock_page(page);
> -		put_page(page);
> -		ext4_journal_stop(handle);
> -		goto retry_grab;
> -	}
>  	/* In case writeback began while the page was unlocked */
>  	wait_for_stable_page(page);
>  
> @@ -3003,20 +2959,18 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
>  #endif
>  	if (ret < 0) {
>  		unlock_page(page);
> -		ext4_journal_stop(handle);
> +		put_page(page);
>  		/*
>  		 * block_write_begin may have instantiated a few blocks
>  		 * outside i_size.  Trim these off again. Don't need
> -		 * i_size_read because we hold i_mutex.
> +		 * i_size_read because we hold inode lock.
>  		 */
>  		if (pos + len > inode->i_size)
>  			ext4_truncate_failed_write(inode);
>  
>  		if (ret == -ENOSPC &&
>  		    ext4_should_retry_alloc(inode->i_sb, &retries))
> -			goto retry_journal;
> -
> -		put_page(page);
> +			goto retry;
>  		return ret;
>  	}
>  
> @@ -3053,8 +3007,6 @@ static int ext4_da_write_end(struct file *file,
>  			     struct page *page, void *fsdata)
>  {
>  	struct inode *inode = mapping->host;
> -	int ret;
> -	handle_t *handle = ext4_journal_current_handle();
>  	loff_t new_i_size;
>  	unsigned long start, end;
>  	int write_mode = (int)(unsigned long)fsdata;
> @@ -3086,9 +3038,7 @@ static int ext4_da_write_end(struct file *file,
>  	    ext4_da_should_update_i_disksize(page, end))
>  		ext4_update_i_disksize(inode, new_i_size);
>  
> -	copied = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
> -	ret = ext4_journal_stop(handle);
> -	return ret ? ret : copied;
> +	return generic_write_end(file, mapping, pos, len, copied, page, fsdata);
>  }
>  
>  /*
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC PATCH 3/4] ext4: factor out write end code of inline file
  2021-07-07 16:49   ` Jan Kara
@ 2021-07-10  8:13     ` Zhang Yi
  0 siblings, 0 replies; 17+ messages in thread
From: Zhang Yi @ 2021-07-10  8:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, tytso, adilger.kernel, yukuai3

On 2021/7/8 0:49, Jan Kara wrote:
> On Tue 06-07-21 10:42:09, Zhang Yi wrote:
>> Now that the inline_data file write end procedure are falled into the
>> common write end functions, it is not clear. Factor them out and do
>> some cleanup. This patch also drop ext4_da_write_inline_data_end()
>> and switch to use ext4_write_inline_data_end() instead because we also
>> need to do the same error processing if we failed to write data into
>> inline entry.
>>
>> Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
> 
> Looks good. Just two nits below.
>  
>> diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
>> index 28b666f25ac2..8fbf8ec05bd5 100644
>> --- a/fs/ext4/inline.c
>> +++ b/fs/ext4/inline.c
>> @@ -729,34 +729,80 @@ int ext4_try_to_write_inline_data(struct address_space *mapping,
>>  int ext4_write_inline_data_end(struct inode *inode, loff_t pos, unsigned len,
>>  			       unsigned copied, struct page *page)
>>  {
>> -	int ret, no_expand;
>> +	handle_t *handle = ext4_journal_current_handle();
>> +	int i_size_changed = 0;
>> +	int no_expand;
>>  	void *kaddr;
>>  	struct ext4_iloc iloc;
>> +	int ret, ret2;
>>  
>>  	if (unlikely(copied < len) && !PageUptodate(page))
>> -		return 0;
>> +		copied = 0;
>>  
>> -	ret = ext4_get_inode_loc(inode, &iloc);
>> -	if (ret) {
>> -		ext4_std_error(inode->i_sb, ret);
>> -		return ret;
>> -	}
>> +	if (likely(copied)) {
>> +		ret = ext4_get_inode_loc(inode, &iloc);
>> +		if (ret) {
>> +			unlock_page(page);
>> +			put_page(page);
>> +			ext4_std_error(inode->i_sb, ret);
>> +			goto out;
>> +		}
>> +		ext4_write_lock_xattr(inode, &no_expand);
>> +		BUG_ON(!ext4_has_inline_data(inode));
>>  
>> -	ext4_write_lock_xattr(inode, &no_expand);
>> -	BUG_ON(!ext4_has_inline_data(inode));
>> +		kaddr = kmap_atomic(page);
>> +		ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
>> +		kunmap_atomic(kaddr);
>> +		SetPageUptodate(page);
>> +		/* clear page dirty so that writepages wouldn't work for us. */
>> +		ClearPageDirty(page);
>>  
>> -	kaddr = kmap_atomic(page);
>> -	ext4_write_inline_data(inode, &iloc, kaddr, pos, copied);
>> -	kunmap_atomic(kaddr);
>> -	SetPageUptodate(page);
>> -	/* clear page dirty so that writepages wouldn't work for us. */
>> -	ClearPageDirty(page);
>> +		ext4_write_unlock_xattr(inode, &no_expand);
>> +		brelse(iloc.bh);
>> +	}
>>  
>> -	ext4_write_unlock_xattr(inode, &no_expand);
>> -	brelse(iloc.bh);
>> -	mark_inode_dirty(inode);
>> +	/*
>> +	 * It's important to update i_size while still holding page lock:
>> +	 * page writeout could otherwise come in and zero beyond i_size.
>> +	 */
>> +	i_size_changed = ext4_update_inode_size(inode, pos + copied);
>> +	if (ext4_should_journal_data(inode)) {
>> +		ext4_set_inode_state(inode, EXT4_STATE_JDATA);
>> +		EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
>> +	}
> 
> I think this hunk should also go into the "if (copied)" block. There's no
> point in changing i_size or i_disksize when nothing was written.
> 

Yeah, I will put ext4_update_inode_size() into the "if (copied)" block.
Thinking about it again, IIUC, the hunk in "if (ext4_should_journal_data(inode))"
also seems useless for inline inode, and could be dropped.

Thanks,
Yi.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-07-10  8:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-06  2:42 [RFC PATCH 0/4] ext4: improve delalloc buffer write performance Zhang Yi
2021-07-06  2:42 ` [RFC PATCH 1/4] ext4: check and update i_disksize properly Zhang Yi
2021-07-06 12:11   ` Jan Kara
2021-07-06 14:40     ` Zhang Yi
2021-07-06 15:26       ` Jan Kara
2021-07-07  6:18         ` Zhang Yi
2021-07-06  2:42 ` [RFC PATCH 2/4] ext4: correct the error path of ext4_write_inline_data_end() Zhang Yi
2021-07-06 12:28   ` Jan Kara
2021-07-06  2:42 ` [RFC PATCH 3/4] ext4: factor out write end code of inline file Zhang Yi
2021-07-06  5:45   ` kernel test robot
2021-07-06  5:48   ` kernel test robot
2021-07-07 16:49   ` Jan Kara
2021-07-10  8:13     ` Zhang Yi
2021-07-06  2:42 ` [RFC PATCH 4/4] ext4: drop unnecessary journal handle in delalloc write Zhang Yi
2021-07-07 16:59   ` Jan Kara
2021-07-06  7:05 [RFC PATCH 3/4] ext4: factor out write end code of inline file kernel test robot
2021-07-06  9:00 ` [kbuild] " Dan Carpenter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.