stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 01/22] jbd2: Fix possible overflow in jbd2_log_space_left()
       [not found] <20191003215523.7313-1-jack@suse.cz>
@ 2019-10-03 22:05 ` Jan Kara
  2019-10-21  1:08   ` Theodore Y. Ts'o
  2019-10-03 22:05 ` [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir() Jan Kara
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-10-03 22:05 UTC (permalink / raw)
  To: linux-ext4; +Cc: Ted Tso, Jan Kara, stable

When number of free space in the journal is very low, the arithmetic in
jbd2_log_space_left() could underflow resulting in very high number of
free blocks and thus triggering assertion failure in transaction commit
code complaining there's not enough space in the journal:

J_ASSERT(journal->j_free > 1);

Properly check for the low number of free blocks.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/jbd2.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 603fbc4e2f70..10e6049c0ba9 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1582,7 +1582,7 @@ static inline int jbd2_space_needed(journal_t *journal)
 static inline unsigned long jbd2_log_space_left(journal_t *journal)
 {
 	/* Allow for rounding errors */
-	unsigned long free = journal->j_free - 32;
+	long free = journal->j_free - 32;
 
 	if (journal->j_committing_transaction) {
 		unsigned long committing = atomic_read(&journal->
@@ -1591,7 +1591,7 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
 		/* Transaction + control blocks */
 		free -= committing + (committing >> JBD2_CONTROL_BLOCKS_SHIFT);
 	}
-	return free;
+	return max_t(long, free, 0);
 }
 
 /*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir()
       [not found] <20191003215523.7313-1-jack@suse.cz>
  2019-10-03 22:05 ` [PATCH 01/22] jbd2: Fix possible overflow in jbd2_log_space_left() Jan Kara
@ 2019-10-03 22:05 ` Jan Kara
  2019-10-21  1:21   ` Theodore Y. Ts'o
  2019-10-03 22:05 ` [PATCH 04/22] ext4: Fix credit estimate for final inode freeing Jan Kara
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-10-03 22:05 UTC (permalink / raw)
  To: linux-ext4; +Cc: Ted Tso, Jan Kara, stable

When ext4_mkdir() fails to add entry into directory, it ends up dropping
freshly created inode under the running transaction and thus inode
truncation happens under that transaction. That breaks assumptions that
ext4_evict_inode() does not get called from a transaction context
(although I'm not aware of any real issue) and is completely
unnecessary. Just stop the transaction before dropping inode reference.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/namei.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index a427d2031a8d..9c872a33aea7 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2781,8 +2781,9 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 		clear_nlink(inode);
 		unlock_new_inode(inode);
 		ext4_mark_inode_dirty(handle, inode);
+		ext4_journal_stop(handle);
 		iput(inode);
-		goto out_stop;
+		goto out_retry;
 	}
 	ext4_inc_count(handle, dir);
 	ext4_update_dx_flag(dir);
@@ -2796,6 +2797,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 out_stop:
 	if (handle)
 		ext4_journal_stop(handle);
+out_retry:
 	if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
 		goto retry;
 	return err;
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 04/22] ext4: Fix credit estimate for final inode freeing
       [not found] <20191003215523.7313-1-jack@suse.cz>
  2019-10-03 22:05 ` [PATCH 01/22] jbd2: Fix possible overflow in jbd2_log_space_left() Jan Kara
  2019-10-03 22:05 ` [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir() Jan Kara
@ 2019-10-03 22:05 ` Jan Kara
  2019-10-21  1:07   ` Theodore Y. Ts'o
  2019-11-05 16:44 ` [PATCH 01/25] jbd2: Fix possible overflow in jbd2_log_space_left() Jan Kara
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-10-03 22:05 UTC (permalink / raw)
  To: linux-ext4; +Cc: Ted Tso, Jan Kara, stable

Estimate for the number of credits needed for final freeing of inode in
ext4_evict_inode() was to small. We may modify 4 blocks (inode & sb for
orphan deletion, bitmap & group descriptor for inode freeing) and not
just 3.

Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 516faa280ced..e6b631d50c26 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -196,7 +196,12 @@ void ext4_evict_inode(struct inode *inode)
 {
 	handle_t *handle;
 	int err;
-	int extra_credits = 3;
+	/*
+	 * Credits for final inode cleanup and freeing:
+	 * sb + inode (ext4_orphan_del()), block bitmap, group descriptor
+	 * (xattr block freeind), bitmap, group descriptor (inode freeing)
+	 */
+	int extra_credits = 6;
 	struct ext4_xattr_inode_array *ea_inode_array = NULL;
 
 	trace_ext4_evict_inode(inode);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 04/22] ext4: Fix credit estimate for final inode freeing
  2019-10-03 22:05 ` [PATCH 04/22] ext4: Fix credit estimate for final inode freeing Jan Kara
@ 2019-10-21  1:07   ` Theodore Y. Ts'o
  2019-10-24 10:30     ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Theodore Y. Ts'o @ 2019-10-21  1:07 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, stable

On Fri, Oct 04, 2019 at 12:05:50AM +0200, Jan Kara wrote:
> Estimate for the number of credits needed for final freeing of inode in
> ext4_evict_inode() was to small. We may modify 4 blocks (inode & sb for
> orphan deletion, bitmap & group descriptor for inode freeing) and not
> just 3.

The modification for the inode should already be included in the
calculation for ext4_blocks_for_truncate(), no?  So we only need 3
extra blocks (sb, inode bitmap, and bg descriptor for the inode).

      	     	  		    - Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 01/22] jbd2: Fix possible overflow in jbd2_log_space_left()
  2019-10-03 22:05 ` [PATCH 01/22] jbd2: Fix possible overflow in jbd2_log_space_left() Jan Kara
@ 2019-10-21  1:08   ` Theodore Y. Ts'o
  0 siblings, 0 replies; 15+ messages in thread
From: Theodore Y. Ts'o @ 2019-10-21  1:08 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, stable

On Fri, Oct 04, 2019 at 12:05:47AM +0200, Jan Kara wrote:
> When number of free space in the journal is very low, the arithmetic in
> jbd2_log_space_left() could underflow resulting in very high number of
> free blocks and thus triggering assertion failure in transaction commit
> code complaining there's not enough space in the journal:
> 
> J_ASSERT(journal->j_free > 1);
> 
> Properly check for the low number of free blocks.
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>

Looks good, you can add:

Reviewed-by: Theodore Ts'o <tytso@mit.edu>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir()
  2019-10-03 22:05 ` [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir() Jan Kara
@ 2019-10-21  1:21   ` Theodore Y. Ts'o
  2019-10-24 10:19     ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Theodore Y. Ts'o @ 2019-10-21  1:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, stable

On Fri, Oct 04, 2019 at 12:05:49AM +0200, Jan Kara wrote:
> When ext4_mkdir() fails to add entry into directory, it ends up dropping
> freshly created inode under the running transaction and thus inode
> truncation happens under that transaction. That breaks assumptions that
> ext4_evict_inode() does not get called from a transaction context
> (although I'm not aware of any real issue) and is completely
> unnecessary. Just stop the transaction before dropping inode reference.
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Jan Kara <jack@suse.cz>

If we call ext4_journal_stop(handle) before calling iput(inode),
there's a chance that we could crash with the inode with i_link_counts
== 0, but we won't have yet call ext4_evict_inode() to mark the inode
as free in the inode bitmap.  This would result in a inode leak.

Also, this isn't the only place where we can enter ext4_evict_inode()
with an active handle; the same situation arise in ext4_add_nondir(),
and for the same reason.

So I think the code is right as is.  Do you agree?

	     	       	       	     	- Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir()
  2019-10-21  1:21   ` Theodore Y. Ts'o
@ 2019-10-24 10:19     ` Jan Kara
  2019-10-24 12:09       ` Theodore Y. Ts'o
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-10-24 10:19 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: Jan Kara, linux-ext4, stable

On Sun 20-10-19 21:21:05, Theodore Y. Ts'o wrote:
> On Fri, Oct 04, 2019 at 12:05:49AM +0200, Jan Kara wrote:
> > When ext4_mkdir() fails to add entry into directory, it ends up dropping
> > freshly created inode under the running transaction and thus inode
> > truncation happens under that transaction. That breaks assumptions that
> > ext4_evict_inode() does not get called from a transaction context
> > (although I'm not aware of any real issue) and is completely
> > unnecessary. Just stop the transaction before dropping inode reference.
> > 
> > CC: stable@vger.kernel.org
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> If we call ext4_journal_stop(handle) before calling iput(inode),
> there's a chance that we could crash with the inode with i_link_counts
> == 0, but we won't have yet call ext4_evict_inode() to mark the inode
> as free in the inode bitmap.  This would result in a inode leak.
> 
> Also, this isn't the only place where we can enter ext4_evict_inode()
> with an active handle; the same situation arise in ext4_add_nondir(),
> and for the same reason.
> 
> So I think the code is right as is.  Do you agree?

Correct on both points. Thanks for spotting this! Now I still don't think
that calling iput() with running transaction is good. It complicates
matters with revoke record reservation but it is also fragile for other
reasons - e.g. flush worker could find the allocated inode just before we
will call iput() on it, try to write it out, block on starting transaction
and we get a deadlock with inode_wait_for_writeback() inside evict(). Now
inode *probably* won't be dirty yet by the time we get to ext4_add_nondir()
or similar, that's why I say above it's just fragile, not an outright bug.

So I'd still prefer to do the iput() outside of transaction and we can
protect from leaking the inode in case of crash by adding it to orphan
list. I'll update the patch. Thanks for review!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 04/22] ext4: Fix credit estimate for final inode freeing
  2019-10-21  1:07   ` Theodore Y. Ts'o
@ 2019-10-24 10:30     ` Jan Kara
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2019-10-24 10:30 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: Jan Kara, linux-ext4, stable

On Sun 20-10-19 21:07:23, Theodore Y. Ts'o wrote:
> On Fri, Oct 04, 2019 at 12:05:50AM +0200, Jan Kara wrote:
> > Estimate for the number of credits needed for final freeing of inode in
> > ext4_evict_inode() was to small. We may modify 4 blocks (inode & sb for
> > orphan deletion, bitmap & group descriptor for inode freeing) and not
> > just 3.
> 
> The modification for the inode should already be included in the
> calculation for ext4_blocks_for_truncate(), no?  So we only need 3
> extra blocks (sb, inode bitmap, and bg descriptor for the inode).

Yes, but 'extra_credits' is also passed to ext4_xattr_delete_inode() and if
that needs to restart a transaction, it needs to reserve enough for inode
modification in that new transaction. This patch is actually a result of
assertion checks I was getting with more accurate transaction restart
handling implemented later in this series...

I agree we can actually subtract 3 from
ext4_blocks_for_truncate(inode)+extra_credits when starting the initial
transaction as inode changes get double-accounted there. I can do that
and I'll also update the changelog to explain this better.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir()
  2019-10-24 10:19     ` Jan Kara
@ 2019-10-24 12:09       ` Theodore Y. Ts'o
  2019-10-24 13:37         ` Jan Kara
  0 siblings, 1 reply; 15+ messages in thread
From: Theodore Y. Ts'o @ 2019-10-24 12:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, stable

On Thu, Oct 24, 2019 at 12:19:06PM +0200, Jan Kara wrote:
> Correct on both points. Thanks for spotting this! Now I still don't think
> that calling iput() with running transaction is good. It complicates
> matters with revoke record reservation but it is also fragile for other
> reasons - e.g. flush worker could find the allocated inode just before we
> will call iput() on it, try to write it out, block on starting transaction
> and we get a deadlock with inode_wait_for_writeback() inside evict(). Now
> inode *probably* won't be dirty yet by the time we get to ext4_add_nondir()
> or similar, that's why I say above it's just fragile, not an outright bug.

But we don't ever write the inode itself via
inode_wait_for_writeback(), because how ext4 journalling works.  (See
the comments before ext4_mark_inode_dirty()).  And for the special
inodes (directories, device nodes, etc.) there's no data dirtyness to
worry about.  For regular files, we hit this code path when have just
created the inode, but were not able to add a link to the parent
directory; the fd wasn't been released to userspace yet, so it can't
be data dirty either.

So unless I'm missing something, I don't think the deadlock described
above is possible?

We can certainly add it to the orphan list if it's necessary, but it's
extra overhead and adds a global contention point.  So if it's not
necessary, I'd rather avoid it if possible, and I think it's safe to
do so in this case.

						- Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir()
  2019-10-24 12:09       ` Theodore Y. Ts'o
@ 2019-10-24 13:37         ` Jan Kara
  2019-11-04 12:35           ` Theodore Y. Ts'o
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-10-24 13:37 UTC (permalink / raw)
  To: Theodore Y. Ts'o; +Cc: Jan Kara, linux-ext4, stable

On Thu 24-10-19 08:09:58, Theodore Y. Ts'o wrote:
> On Thu, Oct 24, 2019 at 12:19:06PM +0200, Jan Kara wrote:
> > Correct on both points. Thanks for spotting this! Now I still don't think
> > that calling iput() with running transaction is good. It complicates
> > matters with revoke record reservation but it is also fragile for other
> > reasons - e.g. flush worker could find the allocated inode just before we
> > will call iput() on it, try to write it out, block on starting transaction
> > and we get a deadlock with inode_wait_for_writeback() inside evict(). Now
> > inode *probably* won't be dirty yet by the time we get to ext4_add_nondir()
> > or similar, that's why I say above it's just fragile, not an outright bug.
> 
> But we don't ever write the inode itself via
> inode_wait_for_writeback(), because how ext4 journalling works.  (See
> the comments before ext4_mark_inode_dirty()).  And for the special
> inodes (directories, device nodes, etc.) there's no data dirtyness to
> worry about.  For regular files, we hit this code path when have just
> created the inode, but were not able to add a link to the parent
> directory; the fd wasn't been released to userspace yet, so it can't
> be data dirty either.
> 
> So unless I'm missing something, I don't think the deadlock described
> above is possible?

Actually, now that I look at it, large symlinks may be prone to this
deadlock. There we create unlinked inode, add it to orphan list, stop
transaction, call __page_symlink() which will dirty the inode through
mark_inode_dirty(), then we start transaction and call ext4_add_nondir()
which may call iput() while the transaction is started.

Granted we can fix just ext4_symlink() but it kind of demonstrates my point
that calling iput() under transaction is fragile - some of the stuff done
on last iput generaly ranks above transaction start, just in cases we clean
up failed create none of them happens to block currently (except for the
symlink case mentioned above). And also lockdep does not track dependencies
like inode_wait_for_writeback() as otherwise it would complain as well.

> We can certainly add it to the orphan list if it's necessary, but it's
> extra overhead and adds a global contention point.  So if it's not
> necessary, I'd rather avoid it if possible, and I think it's safe to
> do so in this case.

As this is error cleanup path (only EIO and ENOSPC are realistic failure
cases AFAICT) I don't think performance really matters here. I certainly
don't want to add inode to orphan list in the fast path. I agree that would
be non-starter. I'll try to write a patch and we'll see how bad it will be.
If you still hate it, I can have a look into how bad it would be to fix
ext4_symlink() and somehow deal with revoke reservation issues.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir()
  2019-10-24 13:37         ` Jan Kara
@ 2019-11-04 12:35           ` Theodore Y. Ts'o
  0 siblings, 0 replies; 15+ messages in thread
From: Theodore Y. Ts'o @ 2019-11-04 12:35 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, stable

On Thu, Oct 24, 2019 at 03:37:01PM +0200, Jan Kara wrote:
> > We can certainly add it to the orphan list if it's necessary, but it's
> > extra overhead and adds a global contention point.  So if it's not
> > necessary, I'd rather avoid it if possible, and I think it's safe to
> > do so in this case.
> 
> As this is error cleanup path (only EIO and ENOSPC are realistic failure
> cases AFAICT) I don't think performance really matters here. I certainly
> don't want to add inode to orphan list in the fast path. I agree that would
> be non-starter. I'll try to write a patch and we'll see how bad it will be.
> If you still hate it, I can have a look into how bad it would be to fix
> ext4_symlink() and somehow deal with revoke reservation issues.

That seems fair; I agree, adding inodes to the list in the error path
shouldn't be an issue.

					- Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 01/25] jbd2: Fix possible overflow in jbd2_log_space_left()
       [not found] <20191003215523.7313-1-jack@suse.cz>
                   ` (2 preceding siblings ...)
  2019-10-03 22:05 ` [PATCH 04/22] ext4: Fix credit estimate for final inode freeing Jan Kara
@ 2019-11-05 16:44 ` Jan Kara
  2019-11-05 16:44 ` [PATCH 05/25] ext4: Do not iput inode under running transaction Jan Kara
  2019-11-05 16:44 ` [PATCH 06/25] ext4: Fix credit estimate for final inode freeing Jan Kara
  5 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2019-11-05 16:44 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara, stable

When number of free space in the journal is very low, the arithmetic in
jbd2_log_space_left() could underflow resulting in very high number of
free blocks and thus triggering assertion failure in transaction commit
code complaining there's not enough space in the journal:

J_ASSERT(journal->j_free > 1);

Properly check for the low number of free blocks.

CC: stable@vger.kernel.org
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 include/linux/jbd2.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 603fbc4e2f70..10e6049c0ba9 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1582,7 +1582,7 @@ static inline int jbd2_space_needed(journal_t *journal)
 static inline unsigned long jbd2_log_space_left(journal_t *journal)
 {
 	/* Allow for rounding errors */
-	unsigned long free = journal->j_free - 32;
+	long free = journal->j_free - 32;
 
 	if (journal->j_committing_transaction) {
 		unsigned long committing = atomic_read(&journal->
@@ -1591,7 +1591,7 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal)
 		/* Transaction + control blocks */
 		free -= committing + (committing >> JBD2_CONTROL_BLOCKS_SHIFT);
 	}
-	return free;
+	return max_t(long, free, 0);
 }
 
 /*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 05/25] ext4: Do not iput inode under running transaction
       [not found] <20191003215523.7313-1-jack@suse.cz>
                   ` (3 preceding siblings ...)
  2019-11-05 16:44 ` [PATCH 01/25] jbd2: Fix possible overflow in jbd2_log_space_left() Jan Kara
@ 2019-11-05 16:44 ` Jan Kara
  2019-11-05 16:44 ` [PATCH 06/25] ext4: Fix credit estimate for final inode freeing Jan Kara
  5 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2019-11-05 16:44 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara, stable

When ext4_mkdir(), ext4_symlink(), ext4_create(), or ext4_mknod() fail
to add entry into directory, it ends up dropping freshly created inode
under the running transaction and thus inode truncation happens under
that transaction. That breaks assumptions that evict() does not get
called from a transaction context and at least in ext4_symlink() case it
can result in inode eviction deadlocking in inode_wait_for_writeback()
when flush worker finds symlink inode, starts to write it back and
blocks on starting a transaction. So change the code in ext4_mkdir() and
ext4_add_nondir() to drop inode reference only after the transaction is
stopped. We also have to add inode to the orphan list in that case as
otherwise the inode would get leaked in case we crash before inode
deletion is committed.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/namei.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 97cf1c8b56b2..a67cae3c8ff5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2547,21 +2547,29 @@ static void ext4_dec_count(handle_t *handle, struct inode *inode)
 }
 
 
+/*
+ * Add non-directory inode to a directory. On success, the inode reference is
+ * consumed by dentry is instantiation. This is also indicated by clearing of
+ * *inodep pointer. On failure, the caller is responsible for dropping the
+ * inode reference in the safe context.
+ */
 static int ext4_add_nondir(handle_t *handle,
-		struct dentry *dentry, struct inode *inode)
+		struct dentry *dentry, struct inode **inodep)
 {
 	struct inode *dir = d_inode(dentry->d_parent);
+	struct inode *inode = *inodep;
 	int err = ext4_add_entry(handle, dentry, inode);
 	if (!err) {
 		ext4_mark_inode_dirty(handle, inode);
 		if (IS_DIRSYNC(dir))
 			ext4_handle_sync(handle);
 		d_instantiate_new(dentry, inode);
+		*inodep = NULL;
 		return 0;
 	}
 	drop_nlink(inode);
+	ext4_orphan_add(handle, inode);
 	unlock_new_inode(inode);
-	iput(inode);
 	return err;
 }
 
@@ -2595,10 +2603,12 @@ static int ext4_create(struct inode *dir, struct dentry *dentry, umode_t mode,
 		inode->i_op = &ext4_file_inode_operations;
 		inode->i_fop = &ext4_file_operations;
 		ext4_set_aops(inode);
-		err = ext4_add_nondir(handle, dentry, inode);
+		err = ext4_add_nondir(handle, dentry, &inode);
 	}
 	if (handle)
 		ext4_journal_stop(handle);
+	if (!IS_ERR_OR_NULL(inode))
+		iput(inode);
 	if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
 		goto retry;
 	return err;
@@ -2625,10 +2635,12 @@ static int ext4_mknod(struct inode *dir, struct dentry *dentry,
 	if (!IS_ERR(inode)) {
 		init_special_inode(inode, inode->i_mode, rdev);
 		inode->i_op = &ext4_special_inode_operations;
-		err = ext4_add_nondir(handle, dentry, inode);
+		err = ext4_add_nondir(handle, dentry, &inode);
 	}
 	if (handle)
 		ext4_journal_stop(handle);
+	if (!IS_ERR_OR_NULL(inode))
+		iput(inode);
 	if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
 		goto retry;
 	return err;
@@ -2778,10 +2790,12 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 	if (err) {
 out_clear_inode:
 		clear_nlink(inode);
+		ext4_orphan_add(handle, inode);
 		unlock_new_inode(inode);
 		ext4_mark_inode_dirty(handle, inode);
+		ext4_journal_stop(handle);
 		iput(inode);
-		goto out_stop;
+		goto out_retry;
 	}
 	ext4_inc_count(handle, dir);
 	ext4_update_dx_flag(dir);
@@ -2795,6 +2809,7 @@ static int ext4_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 out_stop:
 	if (handle)
 		ext4_journal_stop(handle);
+out_retry:
 	if (err == -ENOSPC && ext4_should_retry_alloc(dir->i_sb, &retries))
 		goto retry;
 	return err;
@@ -3327,9 +3342,11 @@ static int ext4_symlink(struct inode *dir,
 		inode->i_size = disk_link.len - 1;
 	}
 	EXT4_I(inode)->i_disksize = inode->i_size;
-	err = ext4_add_nondir(handle, dentry, inode);
+	err = ext4_add_nondir(handle, dentry, &inode);
 	if (handle)
 		ext4_journal_stop(handle);
+	if (inode)
+		iput(inode);
 	goto out_free_encrypted_link;
 
 err_drop_inode:
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 06/25] ext4: Fix credit estimate for final inode freeing
       [not found] <20191003215523.7313-1-jack@suse.cz>
                   ` (4 preceding siblings ...)
  2019-11-05 16:44 ` [PATCH 05/25] ext4: Do not iput inode under running transaction Jan Kara
@ 2019-11-05 16:44 ` Jan Kara
  2019-11-05 21:00   ` Theodore Y. Ts'o
  5 siblings, 1 reply; 15+ messages in thread
From: Jan Kara @ 2019-11-05 16:44 UTC (permalink / raw)
  To: Ted Tso; +Cc: linux-ext4, Jan Kara, stable

Estimate for the number of credits needed for final freeing of inode in
ext4_evict_inode() was to small. We may modify 4 blocks (inode & sb for
orphan deletion, bitmap & group descriptor for inode freeing) and not
just 3.

Fixes: e50e5129f384 ("ext4: xattr-in-inode support")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/inode.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 516faa280ced..81bc2fb23c40 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -196,7 +196,12 @@ void ext4_evict_inode(struct inode *inode)
 {
 	handle_t *handle;
 	int err;
-	int extra_credits = 3;
+	/*
+	 * Credits for final inode cleanup and freeing:
+	 * sb + inode (ext4_orphan_del()), block bitmap, group descriptor
+	 * (xattr block freeing), bitmap, group descriptor (inode freeing)
+	 */
+	int extra_credits = 6;
 	struct ext4_xattr_inode_array *ea_inode_array = NULL;
 
 	trace_ext4_evict_inode(inode);
@@ -252,8 +257,12 @@ void ext4_evict_inode(struct inode *inode)
 	if (!IS_NOQUOTA(inode))
 		extra_credits += EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb);
 
+	/*
+	 * Block bitmap, group descriptor, and inode are accounted in both
+ 	 * ext4_blocks_for_truncate() and extra_credits. So subtract 3.
+	 */
 	handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE,
-				 ext4_blocks_for_truncate(inode)+extra_credits);
+			 ext4_blocks_for_truncate(inode) + extra_credits - 3);
 	if (IS_ERR(handle)) {
 		ext4_std_error(inode->i_sb, PTR_ERR(handle));
 		/*
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 06/25] ext4: Fix credit estimate for final inode freeing
  2019-11-05 16:44 ` [PATCH 06/25] ext4: Fix credit estimate for final inode freeing Jan Kara
@ 2019-11-05 21:00   ` Theodore Y. Ts'o
  0 siblings, 0 replies; 15+ messages in thread
From: Theodore Y. Ts'o @ 2019-11-05 21:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-ext4, stable

On Tue, Nov 05, 2019 at 05:44:12PM +0100, Jan Kara wrote:
> @@ -252,8 +257,12 @@ void ext4_evict_inode(struct inode *inode)
>  	if (!IS_NOQUOTA(inode))
>  		extra_credits += EXT4_MAXQUOTAS_DEL_BLOCKS(inode->i_sb);
>  
> +	/*
> +	 * Block bitmap, group descriptor, and inode are accounted in both
> + 	 * ext4_blocks_for_truncate() and extra_credits. So subtract 3.
  ^^^

There was a minor whitespace nit which I fixed up in my tree here.

      	    	  	     	       - Ted

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-11-05 21:00 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20191003215523.7313-1-jack@suse.cz>
2019-10-03 22:05 ` [PATCH 01/22] jbd2: Fix possible overflow in jbd2_log_space_left() Jan Kara
2019-10-21  1:08   ` Theodore Y. Ts'o
2019-10-03 22:05 ` [PATCH 03/22] ext4: Do not iput inode under running transaction in ext4_mkdir() Jan Kara
2019-10-21  1:21   ` Theodore Y. Ts'o
2019-10-24 10:19     ` Jan Kara
2019-10-24 12:09       ` Theodore Y. Ts'o
2019-10-24 13:37         ` Jan Kara
2019-11-04 12:35           ` Theodore Y. Ts'o
2019-10-03 22:05 ` [PATCH 04/22] ext4: Fix credit estimate for final inode freeing Jan Kara
2019-10-21  1:07   ` Theodore Y. Ts'o
2019-10-24 10:30     ` Jan Kara
2019-11-05 16:44 ` [PATCH 01/25] jbd2: Fix possible overflow in jbd2_log_space_left() Jan Kara
2019-11-05 16:44 ` [PATCH 05/25] ext4: Do not iput inode under running transaction Jan Kara
2019-11-05 16:44 ` [PATCH 06/25] ext4: Fix credit estimate for final inode freeing Jan Kara
2019-11-05 21:00   ` Theodore Y. Ts'o

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).