Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode()
@ 2019-10-29  7:17 Konstantin Khlebnikov
  2019-10-29  7:20 ` Konstantin Khlebnikov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Konstantin Khlebnikov @ 2019-10-29  7:17 UTC (permalink / raw)
  To: Andreas Dilger, linux-ext4, Theodore Ts'o, linux-kernel
  Cc: Dmitry Monakhov, Eric Whitney

If inode->i_blocks is zero then ext4_evict_inode() skips ext4_truncate().
Delayed allocation extents are freed later in ext4_clear_inode() but this
happens when quota reference is already dropped. This leads to leak of
reserved space in quota block, which disappears after umount-mount.

This seems broken for a long time but worked somehow until recent changes
in delayed allocation.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 fs/ext4/inode.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 516faa280ced..580898145e8f 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -293,6 +293,15 @@ void ext4_evict_inode(struct inode *inode)
 				   inode->i_ino, err);
 			goto stop_handle;
 		}
+	} else if (EXT4_I(inode)->i_reserved_data_blocks) {
+		/* Deaccount reserve if inode has only delayed allocations. */
+		err = ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
+		if (err) {
+			ext4_warning(inode->i_sb,
+				     "couldn't remove extents %lu (err %d)",
+				     inode->i_ino, err);
+			goto stop_handle;
+		}
 	}
 
 	/* Remove xattr references. */


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode()
  2019-10-29  7:17 [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode() Konstantin Khlebnikov
@ 2019-10-29  7:20 ` Konstantin Khlebnikov
  2019-11-07 17:58 ` Konstantin Khlebnikov
  2019-11-08  2:08 ` Ritesh Harjani
  2 siblings, 0 replies; 7+ messages in thread
From: Konstantin Khlebnikov @ 2019-10-29  7:20 UTC (permalink / raw)
  To: Andreas Dilger, linux-ext4, Theodore Ts'o, linux-kernel
  Cc: Dmitry Monakhov, Eric Whitney

On 29/10/2019 10.17, Konstantin Khlebnikov wrote:
> If inode->i_blocks is zero then ext4_evict_inode() skips ext4_truncate().
> Delayed allocation extents are freed later in ext4_clear_inode() but this
> happens when quota reference is already dropped. This leads to leak of
> reserved space in quota block, which disappears after umount-mount.
> 
> This seems broken for a long time but worked somehow until recent changes
> in delayed allocation.

FYI, perf cannot correctly parse related perf events without this:

https://lore.kernel.org/lkml/157228145325.7530.4974461761228678289.stgit@buzz/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode()
  2019-10-29  7:17 [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode() Konstantin Khlebnikov
  2019-10-29  7:20 ` Konstantin Khlebnikov
@ 2019-11-07 17:58 ` Konstantin Khlebnikov
  2019-11-08  2:08 ` Ritesh Harjani
  2 siblings, 0 replies; 7+ messages in thread
From: Konstantin Khlebnikov @ 2019-11-07 17:58 UTC (permalink / raw)
  To: Andreas Dilger, linux-ext4, Theodore Ts'o, linux-kernel
  Cc: Dmitry Monakhov, Eric Whitney, Jan Kara

+jack@suse.cz into Cc.

On 29/10/2019 10.17, Konstantin Khlebnikov wrote:
> If inode->i_blocks is zero then ext4_evict_inode() skips ext4_truncate().
> Delayed allocation extents are freed later in ext4_clear_inode() but this
> happens when quota reference is already dropped. This leads to leak of
> reserved space in quota block, which disappears after umount-mount.
> 
> This seems broken for a long time but worked somehow until recent changes
> in delayed allocation.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>   fs/ext4/inode.c |    9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 516faa280ced..580898145e8f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -293,6 +293,15 @@ void ext4_evict_inode(struct inode *inode)
>   				   inode->i_ino, err);
>   			goto stop_handle;
>   		}
> +	} else if (EXT4_I(inode)->i_reserved_data_blocks) {
> +		/* Deaccount reserve if inode has only delayed allocations. */
> +		err = ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
> +		if (err) {
> +			ext4_warning(inode->i_sb,
> +				     "couldn't remove extents %lu (err %d)",
> +				     inode->i_ino, err);
> +			goto stop_handle;
> +		}
>   	}
>   
>   	/* Remove xattr references. */
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode()
  2019-10-29  7:17 [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode() Konstantin Khlebnikov
  2019-10-29  7:20 ` Konstantin Khlebnikov
  2019-11-07 17:58 ` Konstantin Khlebnikov
@ 2019-11-08  2:08 ` Ritesh Harjani
  2019-11-08  8:30   ` Konstantin Khlebnikov
  2 siblings, 1 reply; 7+ messages in thread
From: Ritesh Harjani @ 2019-11-08  2:08 UTC (permalink / raw)
  To: Konstantin Khlebnikov, Andreas Dilger, linux-ext4,
	Theodore Ts'o, linux-kernel
  Cc: Dmitry Monakhov, Eric Whitney



On 10/29/19 12:47 PM, Konstantin Khlebnikov wrote:
> If inode->i_blocks is zero then ext4_evict_inode() skips ext4_truncate().
> Delayed allocation extents are freed later in ext4_clear_inode() but this
> happens when quota reference is already dropped. This leads to leak of
> reserved space in quota block, which disappears after umount-mount.
> 
> This seems broken for a long time but worked somehow until recent changes
> in delayed allocation.

Sorry, I may have missed it, but could you please help understand
what recent changes in delayed allocation make this break or worse?


A silly query, since I couldn't figure it out. Maybe the code has been
there ever since like this:-
So why can't we just move drop_dquot later after the 
ext4_es_remove_extent() (in function ext4_clear_inode)? Any known
problems around that?

-ritesh


> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>   fs/ext4/inode.c |    9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 516faa280ced..580898145e8f 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -293,6 +293,15 @@ void ext4_evict_inode(struct inode *inode)
>   				   inode->i_ino, err);
>   			goto stop_handle;
>   		}
> +	} else if (EXT4_I(inode)->i_reserved_data_blocks) {
> +		/* Deaccount reserve if inode has only delayed allocations. */
> +		err = ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
> +		if (err) {
> +			ext4_warning(inode->i_sb,
> +				     "couldn't remove extents %lu (err %d)",
> +				     inode->i_ino, err);
> +			goto stop_handle;
> +		}
>   	}
> 
>   	/* Remove xattr references. */
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode()
  2019-11-08  2:08 ` Ritesh Harjani
@ 2019-11-08  8:30   ` Konstantin Khlebnikov
  2019-11-08 11:54     ` Jan Kara
  0 siblings, 1 reply; 7+ messages in thread
From: Konstantin Khlebnikov @ 2019-11-08  8:30 UTC (permalink / raw)
  To: Ritesh Harjani, Andreas Dilger, linux-ext4, Theodore Ts'o,
	linux-kernel
  Cc: Dmitry Monakhov, Eric Whitney

On 08/11/2019 05.08, Ritesh Harjani wrote:
> 
> 
> On 10/29/19 12:47 PM, Konstantin Khlebnikov wrote:
>> If inode->i_blocks is zero then ext4_evict_inode() skips ext4_truncate().
>> Delayed allocation extents are freed later in ext4_clear_inode() but this
>> happens when quota reference is already dropped. This leads to leak of
>> reserved space in quota block, which disappears after umount-mount.
>>
>> This seems broken for a long time but worked somehow until recent changes
>> in delayed allocation.
> 
> Sorry, I may have missed it, but could you please help understand
> what recent changes in delayed allocation make this break or worse?

I don't see problem for 4.19. Haven't bisected yet.
Most likely this is around 'reserved cluster accounting'.

I suspect before these changes something always triggered da before unlink and
space usage committed and then truncated at eviction.

> 
> 
> A silly query, since I couldn't figure it out. Maybe the code has been
> there ever since like this:-

> So why can't we just move drop_dquot later after the ext4_es_remove_extent() (in function ext4_clear_inode)? Any known
> problems around that?

Clear_inode is called also when inode evicts from cache while it has nlinks
and stays at disk. I'm not sure how this must interact with reserves.

> 
> -ritesh
> 
> 
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> ---
>>   fs/ext4/inode.c |    9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 516faa280ced..580898145e8f 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -293,6 +293,15 @@ void ext4_evict_inode(struct inode *inode)
>>                      inode->i_ino, err);
>>               goto stop_handle;
>>           }
>> +    } else if (EXT4_I(inode)->i_reserved_data_blocks) {
>> +        /* Deaccount reserve if inode has only delayed allocations. */
>> +        err = ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
>> +        if (err) {
>> +            ext4_warning(inode->i_sb,
>> +                     "couldn't remove extents %lu (err %d)",
>> +                     inode->i_ino, err);
>> +            goto stop_handle;
>> +        }
>>       }
>>
>>       /* Remove xattr references. */
>>
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode()
  2019-11-08  8:30   ` Konstantin Khlebnikov
@ 2019-11-08 11:54     ` Jan Kara
  2019-11-15  0:27       ` Theodore Y. Ts'o
  0 siblings, 1 reply; 7+ messages in thread
From: Jan Kara @ 2019-11-08 11:54 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Ritesh Harjani, Andreas Dilger, linux-ext4, Theodore Ts'o,
	linux-kernel, Dmitry Monakhov, Eric Whitney

[-- Attachment #1: Type: text/plain, Size: 1839 bytes --]

On Fri 08-11-19 11:30:56, Konstantin Khlebnikov wrote:
> On 08/11/2019 05.08, Ritesh Harjani wrote:
> > 
> > 
> > On 10/29/19 12:47 PM, Konstantin Khlebnikov wrote:
> > > If inode->i_blocks is zero then ext4_evict_inode() skips ext4_truncate().
> > > Delayed allocation extents are freed later in ext4_clear_inode() but this
> > > happens when quota reference is already dropped. This leads to leak of
> > > reserved space in quota block, which disappears after umount-mount.
> > > 
> > > This seems broken for a long time but worked somehow until recent changes
> > > in delayed allocation.
> > 
> > Sorry, I may have missed it, but could you please help understand
> > what recent changes in delayed allocation make this break or worse?
> 
> I don't see problem for 4.19. Haven't bisected yet.
> Most likely this is around 'reserved cluster accounting'.
> 
> I suspect before these changes something always triggered da before
> unlink and space usage committed and then truncated at eviction.

Yes, I think it's commit 8fcc3a580651 "ext4: rework reserved cluster
accounting when invalidating pages". Because that commit moved releasing of
reserved space from page invalidation time to extent status tree eviction
time. Does attached patch fix the problem for you?

> > A silly query, since I couldn't figure it out. Maybe the code has been
> > there ever since like this:-
> 
> > So why can't we just move drop_dquot later after the ext4_es_remove_extent() (in function ext4_clear_inode)? Any known
> > problems around that?
> 
> Clear_inode is called also when inode evicts from cache while it has nlinks
> and stays at disk. I'm not sure how this must interact with reserves.

In that case all data should be written out for such inode and thus there
should be no reserves...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-ext4-Fix-leak-of-quota-reservations.patch --]
[-- Type: text/x-patch, Size: 2145 bytes --]

From ee27836b579d3bf750d45cd7081d3433ea6fedd5 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Fri, 8 Nov 2019 12:45:11 +0100
Subject: [PATCH] ext4: Fix leak of quota reservations

Commit 8fcc3a580651 ("ext4: rework reserved cluster accounting when
invalidating pages") moved freeing of delayed allocation reservations
from dirty page invalidation time to time when we evict corresponding
status extent from extent status tree. For inodes which don't have any
blocks allocated this may actually happen only in ext4_clear_blocks()
which is after we've dropped references to quota structures from the
inode. Thus reservation of quota leaked. Fix the problem by clearing
quota information from the inode only after evicting extent status tree
in ext4_clear_inode().

Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Fixes: 8fcc3a580651 ("ext4: rework reserved cluster accounting when invalidating pages")
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ialloc.c | 5 -----
 fs/ext4/super.c  | 2 +-
 2 files changed, 1 insertion(+), 6 deletions(-)

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index 764ff4c56233..564e2ceb8417 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -265,13 +265,8 @@ void ext4_free_inode(handle_t *handle, struct inode *inode)
 	ext4_debug("freeing inode %lu\n", ino);
 	trace_ext4_free_inode(inode);
 
-	/*
-	 * Note: we must free any quota before locking the superblock,
-	 * as writing the quota to disk may need the lock as well.
-	 */
 	dquot_initialize(inode);
 	dquot_free_inode(inode);
-	dquot_drop(inode);
 
 	is_directory = S_ISDIR(inode->i_mode);
 
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index dd654e53ba3d..9589f09a40f6 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1172,9 +1172,9 @@ void ext4_clear_inode(struct inode *inode)
 {
 	invalidate_inode_buffers(inode);
 	clear_inode(inode);
-	dquot_drop(inode);
 	ext4_discard_preallocations(inode);
 	ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS);
+	dquot_drop(inode);
 	if (EXT4_I(inode)->jinode) {
 		jbd2_journal_release_jbd_inode(EXT4_JOURNAL(inode),
 					       EXT4_I(inode)->jinode);
-- 
2.16.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode()
  2019-11-08 11:54     ` Jan Kara
@ 2019-11-15  0:27       ` Theodore Y. Ts'o
  0 siblings, 0 replies; 7+ messages in thread
From: Theodore Y. Ts'o @ 2019-11-15  0:27 UTC (permalink / raw)
  To: Jan Kara
  Cc: Konstantin Khlebnikov, Ritesh Harjani, Andreas Dilger,
	linux-ext4, linux-kernel, Dmitry Monakhov, Eric Whitney


> From ee27836b579d3bf750d45cd7081d3433ea6fedd5 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Fri, 8 Nov 2019 12:45:11 +0100
> Subject: [PATCH] ext4: Fix leak of quota reservations
> 
> Commit 8fcc3a580651 ("ext4: rework reserved cluster accounting when
> invalidating pages") moved freeing of delayed allocation reservations
> from dirty page invalidation time to time when we evict corresponding
> status extent from extent status tree. For inodes which don't have any
> blocks allocated this may actually happen only in ext4_clear_blocks()
> which is after we've dropped references to quota structures from the
> inode. Thus reservation of quota leaked. Fix the problem by clearing
> quota information from the inode only after evicting extent status tree
> in ext4_clear_inode().
> 
> Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Fixes: 8fcc3a580651 ("ext4: rework reserved cluster accounting when invalidating pages")
> Signed-off-by: Jan Kara <jack@suse.cz>

OK, I've applied this patch.

    	     				- Ted

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-29  7:17 [PATCH] ext4: deaccount delayed allocations at freeing inode in ext4_evict_inode() Konstantin Khlebnikov
2019-10-29  7:20 ` Konstantin Khlebnikov
2019-11-07 17:58 ` Konstantin Khlebnikov
2019-11-08  2:08 ` Ritesh Harjani
2019-11-08  8:30   ` Konstantin Khlebnikov
2019-11-08 11:54     ` Jan Kara
2019-11-15  0:27       ` Theodore Y. Ts'o

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git