* [PATCH AUTOSEL 5.4 041/134] ext4: update direct I/O read lock pattern for IOCB_NOWAIT
[not found] <20191211151150.19073-1-sashal@kernel.org>
@ 2019-12-11 15:10 ` Sasha Levin
2019-12-11 15:10 ` [PATCH AUTOSEL 5.4 042/134] ext4: iomap that extends beyond EOF should be marked dirty Sasha Levin
` (3 subsequent siblings)
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2019-12-11 15:10 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Matthew Bobrowski, Jan Kara, Ritesh Harjani, Theodore Ts'o,
Sasha Levin, linux-ext4
From: Matthew Bobrowski <mbobrowski@mbobrowski.org>
[ Upstream commit 548feebec7e93e58b647dba70b3303dcb569c914 ]
This patch updates the lock pattern in ext4_direct_IO_read() to not
block on inode lock in cases of IOCB_NOWAIT direct I/O reads. The
locking condition implemented here is similar to that of 942491c9e6d6
("xfs: fix AIM7 regression").
Fixes: 16c54688592c ("ext4: Allow parallel DIO reads")
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/c5d5e759f91747359fbd2c6f9a36240cf75ad79f.1572949325.git.mbobrowski@mbobrowski.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/inode.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d691d1783ed62..207b8c23e53d9 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3827,7 +3827,13 @@ static ssize_t ext4_direct_IO_read(struct kiocb *iocb, struct iov_iter *iter)
* writes & truncates and since we take care of writing back page cache,
* we are protected against page writeback as well.
*/
- inode_lock_shared(inode);
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ if (!inode_trylock_shared(inode))
+ return -EAGAIN;
+ } else {
+ inode_lock_shared(inode);
+ }
+
ret = filemap_write_and_wait_range(mapping, iocb->ki_pos,
iocb->ki_pos + count - 1);
if (ret)
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH AUTOSEL 5.4 042/134] ext4: iomap that extends beyond EOF should be marked dirty
[not found] <20191211151150.19073-1-sashal@kernel.org>
2019-12-11 15:10 ` [PATCH AUTOSEL 5.4 041/134] ext4: update direct I/O read lock pattern for IOCB_NOWAIT Sasha Levin
@ 2019-12-11 15:10 ` Sasha Levin
2019-12-11 15:10 ` [PATCH AUTOSEL 5.4 043/134] jbd2: Fix statistics for the number of logged blocks Sasha Levin
` (2 subsequent siblings)
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2019-12-11 15:10 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Matthew Bobrowski, Jan Kara, Ritesh Harjani, Theodore Ts'o,
Sasha Levin, linux-ext4
From: Matthew Bobrowski <mbobrowski@mbobrowski.org>
[ Upstream commit 2e9b51d78229d5145725a481bb5464ebc0a3f9b2 ]
This patch addresses what Dave Chinner had discovered and fixed within
commit: 7684e2c4384d. This changes does not have any user visible
impact for ext4 as none of the current users of ext4_iomap_begin()
that extend files depend on IOMAP_F_DIRTY.
When doing a direct IO that spans the current EOF, and there are
written blocks beyond EOF that extend beyond the current write, the
only metadata update that needs to be done is a file size extension.
However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that
there is IO completion metadata updates required, and hence we may
fail to correctly sync file size extensions made in IO completion when
O_DSYNC writes are being used and the hardware supports FUA.
Hence when setting IOMAP_F_DIRTY, we need to also take into account
whether the iomap spans the current EOF. If it does, then we need to
mark it dirty so that IO completion will call generic_write_sync() to
flush the inode size update to stable storage correctly.
Signed-off-by: Matthew Bobrowski <mbobrowski@mbobrowski.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
Link: https://lore.kernel.org/r/8b43ee9ee94bee5328da56ba0909b7d2229ef150.1572949325.git.mbobrowski@mbobrowski.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/inode.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 207b8c23e53d9..02313f10e8897 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3523,8 +3523,14 @@ retry:
return ret;
}
+ /*
+ * Writes that span EOF might trigger an I/O size update on completion,
+ * so consider them to be dirty for the purposes of O_DSYNC, even if
+ * there is no other metadata changes being made or are pending here.
+ */
iomap->flags = 0;
- if (ext4_inode_datasync_dirty(inode))
+ if (ext4_inode_datasync_dirty(inode) ||
+ offset + length > i_size_read(inode))
iomap->flags |= IOMAP_F_DIRTY;
iomap->bdev = inode->i_sb->s_bdev;
iomap->dax_dev = sbi->s_daxdev;
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH AUTOSEL 5.4 043/134] jbd2: Fix statistics for the number of logged blocks
[not found] <20191211151150.19073-1-sashal@kernel.org>
2019-12-11 15:10 ` [PATCH AUTOSEL 5.4 041/134] ext4: update direct I/O read lock pattern for IOCB_NOWAIT Sasha Levin
2019-12-11 15:10 ` [PATCH AUTOSEL 5.4 042/134] ext4: iomap that extends beyond EOF should be marked dirty Sasha Levin
@ 2019-12-11 15:10 ` Sasha Levin
2019-12-11 15:10 ` [PATCH AUTOSEL 5.4 077/134] ext4: fix a bug in ext4_wait_for_tail_page_commit Sasha Levin
2019-12-11 15:11 ` [PATCH AUTOSEL 5.4 089/134] ext4: work around deleting a file with i_nlink == 0 safely Sasha Levin
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2019-12-11 15:10 UTC (permalink / raw)
To: linux-kernel, stable; +Cc: Jan Kara, Theodore Ts'o, Sasha Levin, linux-ext4
From: Jan Kara <jack@suse.cz>
[ Upstream commit 015c6033068208d6227612c878877919f3fcf6b6 ]
jbd2 statistics counting number of blocks logged in a transaction was
wrong. It didn't count the commit block and more importantly it didn't
count revoke descriptor blocks. Make sure these get properly counted.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-13-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/jbd2/commit.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 132fb92098c71..c43591cd70f1c 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -727,7 +727,6 @@ start_journal_io:
submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
}
cond_resched();
- stats.run.rs_blocks_logged += bufs;
/* Force a new descriptor to be generated next
time round the loop. */
@@ -814,6 +813,7 @@ start_journal_io:
if (unlikely(!buffer_uptodate(bh)))
err = -EIO;
jbd2_unfile_log_bh(bh);
+ stats.run.rs_blocks_logged++;
/*
* The list contains temporary buffer heads created by
@@ -859,6 +859,7 @@ start_journal_io:
BUFFER_TRACE(bh, "ph5: control buffer writeout done: unfile");
clear_buffer_jwrite(bh);
jbd2_unfile_log_bh(bh);
+ stats.run.rs_blocks_logged++;
__brelse(bh); /* One for getblk */
/* AKPM: bforget here */
}
@@ -880,6 +881,7 @@ start_journal_io:
}
if (cbh)
err = journal_wait_on_commit_record(journal, cbh);
+ stats.run.rs_blocks_logged++;
if (jbd2_has_feature_async_commit(journal) &&
journal->j_flags & JBD2_BARRIER) {
blkdev_issue_flush(journal->j_dev, GFP_NOFS, NULL);
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH AUTOSEL 5.4 077/134] ext4: fix a bug in ext4_wait_for_tail_page_commit
[not found] <20191211151150.19073-1-sashal@kernel.org>
` (2 preceding siblings ...)
2019-12-11 15:10 ` [PATCH AUTOSEL 5.4 043/134] jbd2: Fix statistics for the number of logged blocks Sasha Levin
@ 2019-12-11 15:10 ` Sasha Levin
2019-12-11 15:11 ` [PATCH AUTOSEL 5.4 089/134] ext4: work around deleting a file with i_nlink == 0 safely Sasha Levin
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2019-12-11 15:10 UTC (permalink / raw)
To: linux-kernel, stable
Cc: yangerkun, Hulk Robot, Jan Kara, Theodore Ts'o, Sasha Levin,
linux-ext4
From: yangerkun <yangerkun@huawei.com>
[ Upstream commit 565333a1554d704789e74205989305c811fd9c7a ]
No need to wait for any commit once the page is fully truncated.
Besides, it may confuse e.g. concurrent ext4_writepage() with the page
still be dirty (will be cleared by truncate_pagecache() in
ext4_setattr()) but buffers has been freed; and then trigger a bug
show as below:
[ 26.057508] ------------[ cut here ]------------
[ 26.058531] kernel BUG at fs/ext4/inode.c:2134!
...
[ 26.088130] Call trace:
[ 26.088695] ext4_writepage+0x914/0xb28
[ 26.089541] writeout.isra.4+0x1b4/0x2b8
[ 26.090409] move_to_new_page+0x3b0/0x568
[ 26.091338] __unmap_and_move+0x648/0x988
[ 26.092241] unmap_and_move+0x48c/0xbb8
[ 26.093096] migrate_pages+0x220/0xb28
[ 26.093945] kernel_mbind+0x828/0xa18
[ 26.094791] __arm64_sys_mbind+0xc8/0x138
[ 26.095716] el0_svc_common+0x190/0x490
[ 26.096571] el0_svc_handler+0x60/0xd0
[ 26.097423] el0_svc+0x8/0xc
Run the procedure (generate by syzkaller) parallel with ext3.
void main()
{
int fd, fd1, ret;
void *addr;
size_t length = 4096;
int flags;
off_t offset = 0;
char *str = "12345";
fd = open("a", O_RDWR | O_CREAT);
assert(fd >= 0);
/* Truncate to 4k */
ret = ftruncate(fd, length);
assert(ret == 0);
/* Journal data mode */
flags = 0xc00f;
ret = ioctl(fd, _IOW('f', 2, long), &flags);
assert(ret == 0);
/* Truncate to 0 */
fd1 = open("a", O_TRUNC | O_NOATIME);
assert(fd1 >= 0);
addr = mmap(NULL, length, PROT_WRITE | PROT_READ,
MAP_SHARED, fd, offset);
assert(addr != (void *)-1);
memcpy(addr, str, 5);
mbind(addr, length, 0, 0, 0, MPOL_MF_MOVE);
}
And the bug will be triggered once we seen the below order.
reproduce1 reproduce2
... | ...
truncate to 4k |
change to journal data mode |
| memcpy(set page dirty)
truncate to 0: |
ext4_setattr: |
... |
ext4_wait_for_tail_page_commit |
| mbind(trigger bug)
truncate_pagecache(clean dirty)| ...
... |
mbind will call ext4_writepage() since the page still be dirty, and then
report the bug since the buffers has been free. Fix it by return
directly once offset equals to 0 which means the page has been fully
truncated.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20190919063508.1045-1-yangerkun@huawei.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/inode.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 02313f10e8897..8b7c704b0de5e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5462,11 +5462,15 @@ static void ext4_wait_for_tail_page_commit(struct inode *inode)
offset = inode->i_size & (PAGE_SIZE - 1);
/*
- * All buffers in the last page remain valid? Then there's nothing to
- * do. We do the check mainly to optimize the common PAGE_SIZE ==
- * blocksize case
+ * If the page is fully truncated, we don't need to wait for any commit
+ * (and we even should not as __ext4_journalled_invalidatepage() may
+ * strip all buffers from the page but keep the page dirty which can then
+ * confuse e.g. concurrent ext4_writepage() seeing dirty page without
+ * buffers). Also we don't need to wait for any commit if all buffers in
+ * the page remain valid. This is most beneficial for the common case of
+ * blocksize == PAGESIZE.
*/
- if (offset > PAGE_SIZE - i_blocksize(inode))
+ if (!offset || offset > (PAGE_SIZE - i_blocksize(inode)))
return;
while (1) {
page = find_lock_page(inode->i_mapping,
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH AUTOSEL 5.4 089/134] ext4: work around deleting a file with i_nlink == 0 safely
[not found] <20191211151150.19073-1-sashal@kernel.org>
` (3 preceding siblings ...)
2019-12-11 15:10 ` [PATCH AUTOSEL 5.4 077/134] ext4: fix a bug in ext4_wait_for_tail_page_commit Sasha Levin
@ 2019-12-11 15:11 ` Sasha Levin
4 siblings, 0 replies; 5+ messages in thread
From: Sasha Levin @ 2019-12-11 15:11 UTC (permalink / raw)
To: linux-kernel, stable
Cc: Theodore Ts'o, stable, Andreas Dilger, Sasha Levin, linux-ext4
From: Theodore Ts'o <tytso@mit.edu>
[ Upstream commit c7df4a1ecb8579838ec8c56b2bb6a6716e974f37 ]
If the file system is corrupted such that a file's i_links_count is
too small, then it's possible that when unlinking that file, i_nlink
will already be zero. Previously we were working around this kind of
corruption by forcing i_nlink to one; but we were doing this before
trying to delete the directory entry --- and if the file system is
corrupted enough that ext4_delete_entry() fails, then we exit with
i_nlink elevated, and this causes the orphan inode list handling to be
FUBAR'ed, such that when we unmount the file system, the orphan inode
list can get corrupted.
A better way to fix this is to simply skip trying to call drop_nlink()
if i_nlink is already zero, thus moving the check to the place where
it makes the most sense.
https://bugzilla.kernel.org/show_bug.cgi?id=205433
Link: https://lore.kernel.org/r/20191112032903.8828-1-tytso@mit.edu
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/ext4/namei.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a427d2031a8da..923476e3aefbc 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3182,18 +3182,17 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)
if (IS_DIRSYNC(dir))
ext4_handle_sync(handle);
- if (inode->i_nlink == 0) {
- ext4_warning_inode(inode, "Deleting file '%.*s' with no links",
- dentry->d_name.len, dentry->d_name.name);
- set_nlink(inode, 1);
- }
retval = ext4_delete_entry(handle, dir, de, bh);
if (retval)
goto end_unlink;
dir->i_ctime = dir->i_mtime = current_time(dir);
ext4_update_dx_flag(dir);
ext4_mark_inode_dirty(handle, dir);
- drop_nlink(inode);
+ if (inode->i_nlink == 0)
+ ext4_warning_inode(inode, "Deleting file '%.*s' with no links",
+ dentry->d_name.len, dentry->d_name.name);
+ else
+ drop_nlink(inode);
if (!inode->i_nlink)
ext4_orphan_add(handle, inode);
inode->i_ctime = current_time(inode);
--
2.20.1
^ permalink raw reply related [flat|nested] 5+ messages in thread