* [PATCH 0/3] ext4: Fix issues in ext4 truncate handling @ 2019-05-21 7:43 Jan Kara 2019-05-21 7:43 ` [PATCH 1/3] ext4: Wait for outstanding dio during truncate in nojournal mode Jan Kara ` (2 more replies) 0 siblings, 3 replies; 14+ messages in thread From: Jan Kara @ 2019-05-21 7:43 UTC (permalink / raw) To: Ted Tso; +Cc: linux-ext4, Weiny, Ira, Jan Kara Hello, Ira Weiny has reported that ext4_setattr() doesn't handle properly failure of ext4_break_layouts(). When revieweing truncate handling code in ext4_setattr() I've found some more issues. This series fixes them. Honza ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/3] ext4: Wait for outstanding dio during truncate in nojournal mode 2019-05-21 7:43 [PATCH 0/3] ext4: Fix issues in ext4 truncate handling Jan Kara @ 2019-05-21 7:43 ` Jan Kara 2019-05-21 18:07 ` Ira Weiny 2019-05-21 7:43 ` [PATCH 2/3] ext4: Do not delete unlinked inode from orphan list on failed truncate Jan Kara 2019-05-21 7:43 ` [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate Jan Kara 2 siblings, 1 reply; 14+ messages in thread From: Jan Kara @ 2019-05-21 7:43 UTC (permalink / raw) To: Ted Tso; +Cc: linux-ext4, Weiny, Ira, Jan Kara, stable We didn't wait for outstanding direct IO during truncate in nojournal mode (as we skip orphan handling in that case). This can lead to fs corruption or stale data exposure if truncate ends up freeing blocks and these get reallocated before direct IO finishes. Fix the condition determining whether the wait is necessary. CC: stable@vger.kernel.org Fixes: 1c9114f9c0f1 ("ext4: serialize unlocked dio reads with truncate") Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/inode.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 82298c63ea6d..9bcb7f2b86dd 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5630,20 +5630,17 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) goto err_out; } } - if (!shrink) + if (!shrink) { pagecache_isize_extended(inode, oldsize, inode->i_size); - - /* - * Blocks are going to be removed from the inode. Wait - * for dio in flight. Temporarily disable - * dioread_nolock to prevent livelock. - */ - if (orphan) { - if (!ext4_should_journal_data(inode)) { - inode_dio_wait(inode); - } else - ext4_wait_for_tail_page_commit(inode); + } else { + /* + * Blocks are going to be removed from the inode. Wait + * for dio in flight. + */ + inode_dio_wait(inode); } + if (orphan && ext4_should_journal_data(inode)) + ext4_wait_for_tail_page_commit(inode); down_write(&EXT4_I(inode)->i_mmap_sem); rc = ext4_break_layouts(inode); -- 2.16.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 1/3] ext4: Wait for outstanding dio during truncate in nojournal mode 2019-05-21 7:43 ` [PATCH 1/3] ext4: Wait for outstanding dio during truncate in nojournal mode Jan Kara @ 2019-05-21 18:07 ` Ira Weiny 0 siblings, 0 replies; 14+ messages in thread From: Ira Weiny @ 2019-05-21 18:07 UTC (permalink / raw) To: Jan Kara; +Cc: Ted Tso, linux-ext4, stable On Tue, May 21, 2019 at 09:43:56AM +0200, Jan Kara wrote: > We didn't wait for outstanding direct IO during truncate in nojournal > mode (as we skip orphan handling in that case). This can lead to fs > corruption or stale data exposure if truncate ends up freeing blocks > and these get reallocated before direct IO finishes. Fix the condition > determining whether the wait is necessary. > > CC: stable@vger.kernel.org > Fixes: 1c9114f9c0f1 ("ext4: serialize unlocked dio reads with truncate") > Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Ira Weiny <ira.weiny@intel.com> > --- > fs/ext4/inode.c | 21 +++++++++------------ > 1 file changed, 9 insertions(+), 12 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 82298c63ea6d..9bcb7f2b86dd 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5630,20 +5630,17 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > goto err_out; > } > } > - if (!shrink) > + if (!shrink) { > pagecache_isize_extended(inode, oldsize, inode->i_size); > - > - /* > - * Blocks are going to be removed from the inode. Wait > - * for dio in flight. Temporarily disable > - * dioread_nolock to prevent livelock. > - */ > - if (orphan) { > - if (!ext4_should_journal_data(inode)) { > - inode_dio_wait(inode); > - } else > - ext4_wait_for_tail_page_commit(inode); > + } else { > + /* > + * Blocks are going to be removed from the inode. Wait > + * for dio in flight. > + */ > + inode_dio_wait(inode); > } > + if (orphan && ext4_should_journal_data(inode)) > + ext4_wait_for_tail_page_commit(inode); > down_write(&EXT4_I(inode)->i_mmap_sem); > > rc = ext4_break_layouts(inode); > -- > 2.16.4 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 2/3] ext4: Do not delete unlinked inode from orphan list on failed truncate 2019-05-21 7:43 [PATCH 0/3] ext4: Fix issues in ext4 truncate handling Jan Kara 2019-05-21 7:43 ` [PATCH 1/3] ext4: Wait for outstanding dio during truncate in nojournal mode Jan Kara @ 2019-05-21 7:43 ` Jan Kara 2019-05-21 18:13 ` Ira Weiny 2019-05-21 7:43 ` [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate Jan Kara 2 siblings, 1 reply; 14+ messages in thread From: Jan Kara @ 2019-05-21 7:43 UTC (permalink / raw) To: Ted Tso; +Cc: linux-ext4, Weiny, Ira, Jan Kara It is possible that unlinked inode enters ext4_setattr() (e.g. if somebody calls ftruncate(2) on unlinked but still open file). In such case we should not delete the inode from the orphan list if truncate fails. Note that this is mostly a theoretical concern as filesystem is corrupted if we reach this path anyway but let's be consistent in our orphan handling. Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 9bcb7f2b86dd..c7f77c643008 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5625,7 +5625,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) up_write(&EXT4_I(inode)->i_data_sem); ext4_journal_stop(handle); if (error) { - if (orphan) + if (orphan && inode->i_nlink) ext4_orphan_del(NULL, inode); goto err_out; } -- 2.16.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] ext4: Do not delete unlinked inode from orphan list on failed truncate 2019-05-21 7:43 ` [PATCH 2/3] ext4: Do not delete unlinked inode from orphan list on failed truncate Jan Kara @ 2019-05-21 18:13 ` Ira Weiny 2019-05-22 9:00 ` Jan Kara 0 siblings, 1 reply; 14+ messages in thread From: Ira Weiny @ 2019-05-21 18:13 UTC (permalink / raw) To: Jan Kara; +Cc: Ted Tso, linux-ext4 On Tue, May 21, 2019 at 09:43:57AM +0200, Jan Kara wrote: > It is possible that unlinked inode enters ext4_setattr() (e.g. if > somebody calls ftruncate(2) on unlinked but still open file). In such > case we should not delete the inode from the orphan list if truncate > fails. Note that this is mostly a theoretical concern as filesystem is > corrupted if we reach this path anyway but let's be consistent in our > orphan handling. > > Signed-off-by: Jan Kara <jack@suse.cz> > --- > fs/ext4/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 9bcb7f2b86dd..c7f77c643008 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5625,7 +5625,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > up_write(&EXT4_I(inode)->i_data_sem); > ext4_journal_stop(handle); > if (error) { > - if (orphan) > + if (orphan && inode->i_nlink) > ext4_orphan_del(NULL, inode); NIT: While ext4_orphan_del() can be called even if the inode was not on the orphan list it kind of tripped me up to see this called even if ext4_orphan_add() fails... But considering how ext4_orphan_del() works: Reviewed-by: Ira Weiny <ira.weiny@intel.com> > goto err_out; > } > -- > 2.16.4 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/3] ext4: Do not delete unlinked inode from orphan list on failed truncate 2019-05-21 18:13 ` Ira Weiny @ 2019-05-22 9:00 ` Jan Kara 0 siblings, 0 replies; 14+ messages in thread From: Jan Kara @ 2019-05-22 9:00 UTC (permalink / raw) To: Ira Weiny; +Cc: Jan Kara, Ted Tso, linux-ext4 On Tue 21-05-19 11:13:49, Ira Weiny wrote: > On Tue, May 21, 2019 at 09:43:57AM +0200, Jan Kara wrote: > > It is possible that unlinked inode enters ext4_setattr() (e.g. if > > somebody calls ftruncate(2) on unlinked but still open file). In such > > case we should not delete the inode from the orphan list if truncate > > fails. Note that this is mostly a theoretical concern as filesystem is > > corrupted if we reach this path anyway but let's be consistent in our > > orphan handling. > > > > Signed-off-by: Jan Kara <jack@suse.cz> > > --- > > fs/ext4/inode.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index 9bcb7f2b86dd..c7f77c643008 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -5625,7 +5625,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > > up_write(&EXT4_I(inode)->i_data_sem); > > ext4_journal_stop(handle); > > if (error) { > > - if (orphan) > > + if (orphan && inode->i_nlink) > > ext4_orphan_del(NULL, inode); > > > NIT: While ext4_orphan_del() can be called even if the inode was not on the > orphan list it kind of tripped me up to see this called even if > ext4_orphan_add() fails... > > But considering how ext4_orphan_del() works: > > Reviewed-by: Ira Weiny <ira.weiny@intel.com> Yes, calling ext4_orphan_del() twice is harmless. You're right we just shouldn't set 'orphan = 1' if ext4_orphan_add() fails but that's independent cleanup we could do. Thanks for your review! Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate 2019-05-21 7:43 [PATCH 0/3] ext4: Fix issues in ext4 truncate handling Jan Kara 2019-05-21 7:43 ` [PATCH 1/3] ext4: Wait for outstanding dio during truncate in nojournal mode Jan Kara 2019-05-21 7:43 ` [PATCH 2/3] ext4: Do not delete unlinked inode from orphan list on failed truncate Jan Kara @ 2019-05-21 7:43 ` Jan Kara 2019-05-21 18:27 ` Ira Weiny 2 siblings, 1 reply; 14+ messages in thread From: Jan Kara @ 2019-05-21 7:43 UTC (permalink / raw) To: Ted Tso; +Cc: linux-ext4, Weiny, Ira, Jan Kara ext4_break_layouts() may fail e.g. due to a signal being delivered. Thus we need to handle its failure gracefully and not by taking the filesystem down. Currently ext4_break_layouts() failure is rare but it may become more common once RDMA uses layout leases for handling long-term page pins for DAX mappings. To handle the failure we need to move ext4_break_layouts() earlier during setattr handling before we do hard to undo changes such as modifying inode size. To be able to do that we also have to move some other checks which are better done without holding i_mmap_sem earlier. Reported-by: "Weiny, Ira" <ira.weiny@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/inode.c | 55 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 28 insertions(+), 27 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c7f77c643008..979570b42e18 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5571,7 +5571,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (attr->ia_valid & ATTR_SIZE) { handle_t *handle; loff_t oldsize = inode->i_size; - int shrink = (attr->ia_size <= inode->i_size); + int shrink = (attr->ia_size < inode->i_size); if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); @@ -5585,18 +5585,35 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size) inode_inc_iversion(inode); - if (ext4_should_order_data(inode) && - (attr->ia_size < inode->i_size)) { - error = ext4_begin_ordered_truncate(inode, + if (shrink) { + if (ext4_should_order_data(inode)) { + error = ext4_begin_ordered_truncate(inode, attr->ia_size); - if (error) - goto err_out; + if (error) + goto err_out; + } + /* + * Blocks are going to be removed from the inode. Wait + * for dio in flight. + */ + inode_dio_wait(inode); + } else { + pagecache_isize_extended(inode, oldsize, inode->i_size); } + + down_write(&EXT4_I(inode)->i_mmap_sem); + + rc = ext4_break_layouts(inode); + if (rc) { + up_write(&EXT4_I(inode)->i_mmap_sem); + return rc; + } + if (attr->ia_size != inode->i_size) { handle = ext4_journal_start(inode, EXT4_HT_INODE, 3); if (IS_ERR(handle)) { error = PTR_ERR(handle); - goto err_out; + goto out_mmap_sem; } if (ext4_handle_valid(handle) && shrink) { error = ext4_orphan_add(handle, inode); @@ -5627,29 +5644,12 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (error) { if (orphan && inode->i_nlink) ext4_orphan_del(NULL, inode); - goto err_out; + goto out_mmap_sem; } } - if (!shrink) { - pagecache_isize_extended(inode, oldsize, inode->i_size); - } else { - /* - * Blocks are going to be removed from the inode. Wait - * for dio in flight. - */ - inode_dio_wait(inode); - } - if (orphan && ext4_should_journal_data(inode)) - ext4_wait_for_tail_page_commit(inode); - down_write(&EXT4_I(inode)->i_mmap_sem); - - rc = ext4_break_layouts(inode); - if (rc) { - up_write(&EXT4_I(inode)->i_mmap_sem); - error = rc; - goto err_out; - } + if (shrink && ext4_should_journal_data(inode)) + ext4_wait_for_tail_page_commit(inode); /* * Truncate pagecache after we've waited for commit * in data=journal mode to make pages freeable. @@ -5660,6 +5660,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (rc) error = rc; } +out_mmap_sem: up_write(&EXT4_I(inode)->i_mmap_sem); } -- 2.16.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate 2019-05-21 7:43 ` [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate Jan Kara @ 2019-05-21 18:27 ` Ira Weiny 2019-05-22 8:57 ` Jan Kara 0 siblings, 1 reply; 14+ messages in thread From: Ira Weiny @ 2019-05-21 18:27 UTC (permalink / raw) To: Jan Kara; +Cc: Ted Tso, linux-ext4 On Tue, May 21, 2019 at 09:43:58AM +0200, Jan Kara wrote: > ext4_break_layouts() may fail e.g. due to a signal being delivered. > Thus we need to handle its failure gracefully and not by taking the > filesystem down. Currently ext4_break_layouts() failure is rare but it > may become more common once RDMA uses layout leases for handling > long-term page pins for DAX mappings. > > To handle the failure we need to move ext4_break_layouts() earlier > during setattr handling before we do hard to undo changes such as > modifying inode size. To be able to do that we also have to move some > other checks which are better done without holding i_mmap_sem earlier. > > Reported-by: "Weiny, Ira" <ira.weiny@intel.com> > Signed-off-by: Jan Kara <jack@suse.cz> This fixes the bug I was seeing WRT ext4_break_layouts(). Thanks for the help! One more NIT comment below. > --- > fs/ext4/inode.c | 55 ++++++++++++++++++++++++++++--------------------------- > 1 file changed, 28 insertions(+), 27 deletions(-) > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index c7f77c643008..979570b42e18 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5571,7 +5571,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > if (attr->ia_valid & ATTR_SIZE) { > handle_t *handle; > loff_t oldsize = inode->i_size; > - int shrink = (attr->ia_size <= inode->i_size); > + int shrink = (attr->ia_size < inode->i_size); > > if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { > struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); > @@ -5585,18 +5585,35 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size) > inode_inc_iversion(inode); > > - if (ext4_should_order_data(inode) && > - (attr->ia_size < inode->i_size)) { > - error = ext4_begin_ordered_truncate(inode, > + if (shrink) { > + if (ext4_should_order_data(inode)) { > + error = ext4_begin_ordered_truncate(inode, > attr->ia_size); > - if (error) > - goto err_out; > + if (error) > + goto err_out; > + } > + /* > + * Blocks are going to be removed from the inode. Wait > + * for dio in flight. > + */ > + inode_dio_wait(inode); > + } else { > + pagecache_isize_extended(inode, oldsize, inode->i_size); > } > + > + down_write(&EXT4_I(inode)->i_mmap_sem); > + > + rc = ext4_break_layouts(inode); > + if (rc) { > + up_write(&EXT4_I(inode)->i_mmap_sem); > + return rc; > + } > + > if (attr->ia_size != inode->i_size) { > handle = ext4_journal_start(inode, EXT4_HT_INODE, 3); > if (IS_ERR(handle)) { > error = PTR_ERR(handle); > - goto err_out; > + goto out_mmap_sem; > } > if (ext4_handle_valid(handle) && shrink) { > error = ext4_orphan_add(handle, inode); > @@ -5627,29 +5644,12 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > if (error) { > if (orphan && inode->i_nlink) > ext4_orphan_del(NULL, inode); > - goto err_out; > + goto out_mmap_sem; This goto flows through a second ext4_orphan_del() call which threw me at first. But I think this is ok. Reviewed-by: Ira Weiny <ira.weiny@intel.com> And with the series applied. Tested-by: Ira Weiny <ira.weiny@intel.com> > } > } > - if (!shrink) { > - pagecache_isize_extended(inode, oldsize, inode->i_size); > - } else { > - /* > - * Blocks are going to be removed from the inode. Wait > - * for dio in flight. > - */ > - inode_dio_wait(inode); > - } > - if (orphan && ext4_should_journal_data(inode)) > - ext4_wait_for_tail_page_commit(inode); > - down_write(&EXT4_I(inode)->i_mmap_sem); > - > - rc = ext4_break_layouts(inode); > - if (rc) { > - up_write(&EXT4_I(inode)->i_mmap_sem); > - error = rc; > - goto err_out; > - } > > + if (shrink && ext4_should_journal_data(inode)) > + ext4_wait_for_tail_page_commit(inode); > /* > * Truncate pagecache after we've waited for commit > * in data=journal mode to make pages freeable. > @@ -5660,6 +5660,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > if (rc) > error = rc; > } > +out_mmap_sem: > up_write(&EXT4_I(inode)->i_mmap_sem); > } > > -- > 2.16.4 > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate 2019-05-21 18:27 ` Ira Weiny @ 2019-05-22 8:57 ` Jan Kara 0 siblings, 0 replies; 14+ messages in thread From: Jan Kara @ 2019-05-22 8:57 UTC (permalink / raw) To: Ira Weiny; +Cc: Jan Kara, Ted Tso, linux-ext4 On Tue 21-05-19 11:27:32, Ira Weiny wrote: > On Tue, May 21, 2019 at 09:43:58AM +0200, Jan Kara wrote: > > ext4_break_layouts() may fail e.g. due to a signal being delivered. > > Thus we need to handle its failure gracefully and not by taking the > > filesystem down. Currently ext4_break_layouts() failure is rare but it > > may become more common once RDMA uses layout leases for handling > > long-term page pins for DAX mappings. > > > > To handle the failure we need to move ext4_break_layouts() earlier > > during setattr handling before we do hard to undo changes such as > > modifying inode size. To be able to do that we also have to move some > > other checks which are better done without holding i_mmap_sem earlier. > > > > Reported-by: "Weiny, Ira" <ira.weiny@intel.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > > This fixes the bug I was seeing WRT ext4_break_layouts(). Thanks for the help! > One more NIT comment below. > > > @@ -5627,29 +5644,12 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > > if (error) { > > if (orphan && inode->i_nlink) > > ext4_orphan_del(NULL, inode); > > - goto err_out; > > + goto out_mmap_sem; > > This goto flows through a second ext4_orphan_del() call which threw me at > first. But I think this is ok. It is OK but unnecessary. I've deleted this ext4_orphan_del() call. Thanks for testing and review! Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 0/3 v2] ext4: Fix issues in ext4 truncate handling @ 2019-05-22 9:03 Jan Kara 2019-05-22 9:03 ` [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate Jan Kara 0 siblings, 1 reply; 14+ messages in thread From: Jan Kara @ 2019-05-22 9:03 UTC (permalink / raw) To: Ted Tso; +Cc: linux-ext4, Ira Weiny, Jan Kara Hello, Ira Weiny has reported that ext4_setattr() doesn't handle properly failure of ext4_break_layouts(). When revieweing truncate handling code in ext4_setattr() I've found some more issues. This series fixes them. Changes since v1: * added Reviewed-by and Tested-by tags * removed unnecessary ext4_orpan_del() call Honza ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate 2019-05-22 9:03 [PATCH 0/3 v2] ext4: Fix issues in ext4 truncate handling Jan Kara @ 2019-05-22 9:03 ` Jan Kara 2019-05-24 4:18 ` Theodore Ts'o 2019-05-25 3:32 ` Theodore Ts'o 0 siblings, 2 replies; 14+ messages in thread From: Jan Kara @ 2019-05-22 9:03 UTC (permalink / raw) To: Ted Tso; +Cc: linux-ext4, Ira Weiny, Jan Kara ext4_break_layouts() may fail e.g. due to a signal being delivered. Thus we need to handle its failure gracefully and not by taking the filesystem down. Currently ext4_break_layouts() failure is rare but it may become more common once RDMA uses layout leases for handling long-term page pins for DAX mappings. To handle the failure we need to move ext4_break_layouts() earlier during setattr handling before we do hard to undo changes such as modifying inode size. To be able to do that we also have to move some other checks which are better done without holding i_mmap_sem earlier. Reported-and-tested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> --- fs/ext4/inode.c | 60 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 29 insertions(+), 31 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c7f77c643008..33411ba4546a 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5571,7 +5571,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (attr->ia_valid & ATTR_SIZE) { handle_t *handle; loff_t oldsize = inode->i_size; - int shrink = (attr->ia_size <= inode->i_size); + int shrink = (attr->ia_size < inode->i_size); if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); @@ -5585,18 +5585,35 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (IS_I_VERSION(inode) && attr->ia_size != inode->i_size) inode_inc_iversion(inode); - if (ext4_should_order_data(inode) && - (attr->ia_size < inode->i_size)) { - error = ext4_begin_ordered_truncate(inode, + if (shrink) { + if (ext4_should_order_data(inode)) { + error = ext4_begin_ordered_truncate(inode, attr->ia_size); - if (error) - goto err_out; + if (error) + goto err_out; + } + /* + * Blocks are going to be removed from the inode. Wait + * for dio in flight. + */ + inode_dio_wait(inode); + } else { + pagecache_isize_extended(inode, oldsize, inode->i_size); } + + down_write(&EXT4_I(inode)->i_mmap_sem); + + rc = ext4_break_layouts(inode); + if (rc) { + up_write(&EXT4_I(inode)->i_mmap_sem); + return rc; + } + if (attr->ia_size != inode->i_size) { handle = ext4_journal_start(inode, EXT4_HT_INODE, 3); if (IS_ERR(handle)) { error = PTR_ERR(handle); - goto err_out; + goto out_mmap_sem; } if (ext4_handle_valid(handle) && shrink) { error = ext4_orphan_add(handle, inode); @@ -5624,32 +5641,12 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) i_size_write(inode, attr->ia_size); up_write(&EXT4_I(inode)->i_data_sem); ext4_journal_stop(handle); - if (error) { - if (orphan && inode->i_nlink) - ext4_orphan_del(NULL, inode); - goto err_out; - } - } - if (!shrink) { - pagecache_isize_extended(inode, oldsize, inode->i_size); - } else { - /* - * Blocks are going to be removed from the inode. Wait - * for dio in flight. - */ - inode_dio_wait(inode); - } - if (orphan && ext4_should_journal_data(inode)) - ext4_wait_for_tail_page_commit(inode); - down_write(&EXT4_I(inode)->i_mmap_sem); - - rc = ext4_break_layouts(inode); - if (rc) { - up_write(&EXT4_I(inode)->i_mmap_sem); - error = rc; - goto err_out; + if (error) + goto out_mmap_sem; } + if (shrink && ext4_should_journal_data(inode)) + ext4_wait_for_tail_page_commit(inode); /* * Truncate pagecache after we've waited for commit * in data=journal mode to make pages freeable. @@ -5660,6 +5657,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) if (rc) error = rc; } +out_mmap_sem: up_write(&EXT4_I(inode)->i_mmap_sem); } -- 2.16.4 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate 2019-05-22 9:03 ` [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate Jan Kara @ 2019-05-24 4:18 ` Theodore Ts'o 2019-05-24 8:13 ` Jan Kara 2019-05-25 3:32 ` Theodore Ts'o 1 sibling, 1 reply; 14+ messages in thread From: Theodore Ts'o @ 2019-05-24 4:18 UTC (permalink / raw) To: Jan Kara; +Cc: linux-ext4, Ira Weiny On Wed, May 22, 2019 at 11:03:17AM +0200, Jan Kara wrote: > ext4_break_layouts() may fail e.g. due to a signal being delivered. > Thus we need to handle its failure gracefully and not by taking the > filesystem down. Currently ext4_break_layouts() failure is rare but it > may become more common once RDMA uses layout leases for handling > long-term page pins for DAX mappings. > > To handle the failure we need to move ext4_break_layouts() earlier > during setattr handling before we do hard to undo changes such as > modifying inode sizhe. To be able to do that we also have to move some > other checks which are better done uwithout holding i_mmap_sem earlier. > > Reported-and-tested-by: Ira Weiny <ira.weiny@intel.com> > Reviewed-by: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Jan Kara <jack@suse.cz>hh Thanks, applied. What do people think about adding marking this for stable? My take is that DAX is still not that common for most stable kernel users, and the patch moves enough stuff around that it's borderline for stable. I'm going to leave off marking for stable unless someone wants to make a case that we should so mark it. - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate 2019-05-24 4:18 ` Theodore Ts'o @ 2019-05-24 8:13 ` Jan Kara 0 siblings, 0 replies; 14+ messages in thread From: Jan Kara @ 2019-05-24 8:13 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Jan Kara, linux-ext4, Ira Weiny On Fri 24-05-19 00:18:29, Theodore Ts'o wrote: > On Wed, May 22, 2019 at 11:03:17AM +0200, Jan Kara wrote: > > ext4_break_layouts() may fail e.g. due to a signal being delivered. > > Thus we need to handle its failure gracefully and not by taking the > > filesystem down. Currently ext4_break_layouts() failure is rare but it > > may become more common once RDMA uses layout leases for handling > > long-term page pins for DAX mappings. > > > > To handle the failure we need to move ext4_break_layouts() earlier > > during setattr handling before we do hard to undo changes such as > > modifying inode sizhe. To be able to do that we also have to move some > > other checks which are better done uwithout holding i_mmap_sem earlier. > > > > Reported-and-tested-by: Ira Weiny <ira.weiny@intel.com> > > Reviewed-by: Ira Weiny <ira.weiny@intel.com> > > Signed-off-by: Jan Kara <jack@suse.cz>hh > > Thanks, applied. > > What do people think about adding marking this for stable? My take is > that DAX is still not that common for most stable kernel users, and > the patch moves enough stuff around that it's borderline for stable. > I'm going to leave off marking for stable unless someone wants to make > a case that we should so mark it. Yeah, my take was that I'd care about backporting this patch for stable once somebody complains that he has actually hit the problem... Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate 2019-05-22 9:03 ` [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate Jan Kara 2019-05-24 4:18 ` Theodore Ts'o @ 2019-05-25 3:32 ` Theodore Ts'o 2019-05-27 14:53 ` Jan Kara 1 sibling, 1 reply; 14+ messages in thread From: Theodore Ts'o @ 2019-05-25 3:32 UTC (permalink / raw) To: Jan Kara; +Cc: linux-ext4, Ira Weiny On Wed, May 22, 2019 at 11:03:17AM +0200, Jan Kara wrote: > ext4_break_layouts() may fail e.g. due to a signal being delivered. > Thus we need to handle its failure gracefully and not by taking the > filesystem down. Currently ext4_break_layouts() failure is rare but it > may become more common once RDMA uses layout leases for handling > long-term page pins for DAX mappings. > > To handle the failure we need to move ext4_break_layouts() earlier > during setattr handling before we do hard to undo changes such as > modifying inode size. To be able to do that we also have to move some > other checks which are better done without holding i_mmap_sem earlier. > > Reported-and-tested-by: Ira Weiny <ira.weiny@intel.com> > Reviewed-by: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Jan Kara <jack@suse.cz> When doing some final testing before sending a pull request to Linus, I found a regression. After bisecting, this patch fails reliably under gce-xfstests: TESTRUNID: tytso-20190524230226 KERNEL: kernel 5.1.0-rc3-xfstests-00039-g079f9927c7bf #1016 SMP Fri May 24 23:00:47 EDT 2019 x86_64 CMDLINE: -c 4k generic/092 CPUS: 2 MEM: 7680 ext4/4k: 1 tests, 1 failures, 2 seconds generic/092 Failed 1s Totals: 1 tests, 0 skipped, 1 failures, 0 errors, 1s FSTESTPRJ: gce-xfstests FSTESTVER: fio fio-3.2 (Fri, 3 Nov 2017 15:23:49 -0600) FSTESTVER: quota 62661bd (Tue, 2 Apr 2019 17:04:37 +0200) FSTESTVER: xfsprogs v5.0.0 (Fri, 3 May 2019 12:14:36 -0500) FSTESTVER: xfstests-bld 9582562 (Sun, 12 May 2019 00:38:51 -0400) FSTESTVER: xfstests linux-v3.8-2390-g64233614 (Thu, 16 May 2019 00:12:52 -0400) FSTESTCFG: 4k FSTESTSET: generic/092 FSTESTOPT: aex GCE ID: 343197219467628221 generic/092 0s ... [23:05:07] [23:05:08]- output mismatch (see /results/ext4/results-4k/generic/092 .out.bad) % diff -u /tmp/results-tytso-20190524230226/ext4/results-4k/generic/092.out.bad /usr/projects/xfstests-bld/build-64/xfstests-dev/tests/generic/092.out --- /tmp/results-tytso-20190524230226/ext4/results-4k/generic/092.out.bad 2019-05-24 23:05:08.000000000 -0400 +++ /usr/projects/xfstests-bld/build-64/xfstests-dev/tests/generic/092.out 2018-02-13 23:37:20.330097382 -0500 @@ -2,6 +2,5 @@ wrote 5242880/5242880 bytes at offset 0 XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) 0: [0..10239]: data -1: [10240..20479]: unwritten 0: [0..10239]: data 1: [10240..20479]: unwritten Dropping this patch makes the test failure go away. So I'm going to drop it for now. Jan, can you take a look? Thanks!! - Ted ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate 2019-05-25 3:32 ` Theodore Ts'o @ 2019-05-27 14:53 ` Jan Kara 0 siblings, 0 replies; 14+ messages in thread From: Jan Kara @ 2019-05-27 14:53 UTC (permalink / raw) To: Theodore Ts'o; +Cc: Jan Kara, linux-ext4, Ira Weiny On Fri 24-05-19 23:32:35, Theodore Ts'o wrote: > On Wed, May 22, 2019 at 11:03:17AM +0200, Jan Kara wrote: > > ext4_break_layouts() may fail e.g. due to a signal being delivered. > > Thus we need to handle its failure gracefully and not by taking the > > filesystem down. Currently ext4_break_layouts() failure is rare but it > > may become more common once RDMA uses layout leases for handling > > long-term page pins for DAX mappings. > > > > To handle the failure we need to move ext4_break_layouts() earlier > > during setattr handling before we do hard to undo changes such as > > modifying inode size. To be able to do that we also have to move some > > other checks which are better done without holding i_mmap_sem earlier. > > > > Reported-and-tested-by: Ira Weiny <ira.weiny@intel.com> > > Reviewed-by: Ira Weiny <ira.weiny@intel.com> > > Signed-off-by: Jan Kara <jack@suse.cz> > > When doing some final testing before sending a pull request to Linus, > I found a regression. After bisecting, this patch fails reliably > under gce-xfstests: > > TESTRUNID: tytso-20190524230226 > KERNEL: kernel 5.1.0-rc3-xfstests-00039-g079f9927c7bf #1016 SMP Fri May 24 23:00:47 EDT 2019 x86_64 > CMDLINE: -c 4k generic/092 > CPUS: 2 > MEM: 7680 > > ext4/4k: 1 tests, 1 failures, 2 seconds > generic/092 Failed 1s > Totals: 1 tests, 0 skipped, 1 failures, 0 errors, 1s > > FSTESTPRJ: gce-xfstests > FSTESTVER: fio fio-3.2 (Fri, 3 Nov 2017 15:23:49 -0600) > FSTESTVER: quota 62661bd (Tue, 2 Apr 2019 17:04:37 +0200) > FSTESTVER: xfsprogs v5.0.0 (Fri, 3 May 2019 12:14:36 -0500) > FSTESTVER: xfstests-bld 9582562 (Sun, 12 May 2019 00:38:51 -0400) > FSTESTVER: xfstests linux-v3.8-2390-g64233614 (Thu, 16 May 2019 00:12:52 -0400) > FSTESTCFG: 4k > FSTESTSET: generic/092 > FSTESTOPT: aex > GCE ID: 343197219467628221 > > generic/092 0s ... [23:05:07] [23:05:08]- output mismatch (see /results/ext4/results-4k/generic/092 > .out.bad) > % diff -u /tmp/results-tytso-20190524230226/ext4/results-4k/generic/092.out.bad /usr/projects/xfstests-bld/build-64/xfstests-dev/tests/generic/092.out > --- /tmp/results-tytso-20190524230226/ext4/results-4k/generic/092.out.bad 2019-05-24 23:05:08.000000000 -0400 > +++ /usr/projects/xfstests-bld/build-64/xfstests-dev/tests/generic/092.out 2018-02-13 23:37:20.330097382 -0500 > @@ -2,6 +2,5 @@ > wrote 5242880/5242880 bytes at offset 0 > XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) > 0: [0..10239]: data > -1: [10240..20479]: unwritten > 0: [0..10239]: data > 1: [10240..20479]: unwritten > > > Dropping this patch makes the test failure go away. So I'm going to > drop it for now. Jan, can you take a look? Thanks!! Ah, right. I wonder how I missed that failure in my test run. Anyway I see what is the problem. I'll send updated patch. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-05-27 14:53 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-05-21 7:43 [PATCH 0/3] ext4: Fix issues in ext4 truncate handling Jan Kara 2019-05-21 7:43 ` [PATCH 1/3] ext4: Wait for outstanding dio during truncate in nojournal mode Jan Kara 2019-05-21 18:07 ` Ira Weiny 2019-05-21 7:43 ` [PATCH 2/3] ext4: Do not delete unlinked inode from orphan list on failed truncate Jan Kara 2019-05-21 18:13 ` Ira Weiny 2019-05-22 9:00 ` Jan Kara 2019-05-21 7:43 ` [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate Jan Kara 2019-05-21 18:27 ` Ira Weiny 2019-05-22 8:57 ` Jan Kara 2019-05-22 9:03 [PATCH 0/3 v2] ext4: Fix issues in ext4 truncate handling Jan Kara 2019-05-22 9:03 ` [PATCH 3/3] ext4: Gracefully handle ext4_break_layouts() failure during truncate Jan Kara 2019-05-24 4:18 ` Theodore Ts'o 2019-05-24 8:13 ` Jan Kara 2019-05-25 3:32 ` Theodore Ts'o 2019-05-27 14:53 ` Jan Kara
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.