All of lore.kernel.org
 help / color / mirror / Atom feed
* Question on fscrypt_d_revalidate() and fstest generic/429
@ 2017-05-15 14:39 Richard Weinberger
  2017-05-15 19:45 ` Eric Biggers
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Weinberger @ 2017-05-15 14:39 UTC (permalink / raw)
  To: linux-fscrypt
  Cc: linux-fsdevel, Eric Biggers, Theodore Ts'o, Al Viro,
	David Gstir, David Oberhollenzer, linux-kernel, linux-mtd,
	Artem Bityutskiy, Adrian Hunter

Hi!

on UBIFS, fstest generic/429 fails due to -ENFILE because the internal orphan
list reaches the maximum size.
When you unlink a file, the inode goes into the orphan list, in UBIFS' evict() function
it is being removed later. So, only unlinked but used inodes should stay in that list.

If a directory is encrypted, evict() is not being called although the inode has no
users anymore.
It turned out evict() is not being called because fscrypt's fscrypt_d_revalidate()
function.
When I omit fscrypt_set_d_op() in UBIFS code, the test works just fine.

Can it be that fscrypt_d_revalidate() misses the case of i_nlink being 0?
It seem to treat unlinked inodes as already gone and they stay.

The following change makes the problem go away here:

diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index 6d6eca394d4d..d0c19838e513 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -327,6 +327,7 @@ EXPORT_SYMBOL(fscrypt_decrypt_page);
 static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
 {
        struct dentry *dir;
+       struct inode *inode = d_inode(dentry);
        int dir_has_key, cached_with_key;

        if (flags & LOOKUP_RCU)
@@ -359,6 +360,10 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
                        (!cached_with_key && dir_has_key) ||
                        (cached_with_key && !dir_has_key))
                return 0;
+
+       if (!inode || inode->i_nlink == 0)
+               return 0;
+
        return 1;
 }

Does this change make sense? TBH, I'm not really an expert in this area and it is also
not clear to me why you don't see these issue on ext4 or f2fs.
Maybe UBIFS' limitations kick in much earlier. ;-)

Thanks,
//richard

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: Question on fscrypt_d_revalidate() and fstest generic/429
  2017-05-15 14:39 Question on fscrypt_d_revalidate() and fstest generic/429 Richard Weinberger
@ 2017-05-15 19:45 ` Eric Biggers
  2017-05-15 19:51   ` Richard Weinberger
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Biggers @ 2017-05-15 19:45 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-fscrypt, linux-fsdevel, Eric Biggers, Theodore Ts'o,
	Al Viro, David Gstir, David Oberhollenzer, linux-kernel,
	linux-mtd, Artem Bityutskiy, Adrian Hunter

Hi Richard,

On Mon, May 15, 2017 at 04:39:23PM +0200, Richard Weinberger wrote:
> Hi!
> 
> on UBIFS, fstest generic/429 fails due to -ENFILE because the internal orphan
> list reaches the maximum size.
> When you unlink a file, the inode goes into the orphan list, in UBIFS' evict() function
> it is being removed later. So, only unlinked but used inodes should stay in that list.
> 
> If a directory is encrypted, evict() is not being called although the inode has no
> users anymore.
> It turned out evict() is not being called because fscrypt's fscrypt_d_revalidate()
> function.
> When I omit fscrypt_set_d_op() in UBIFS code, the test works just fine.

Well, I assume you mean the t_encrypted_d_revalidate portion of the test.
generic/429 will still fail overall if you remove fscrypt_set_d_op() --- which
is expected, since it's testing dentry revalidation after all.

> 
> Can it be that fscrypt_d_revalidate() misses the case of i_nlink being 0?
> It seem to treat unlinked inodes as already gone and they stay.
> 
> The following change makes the problem go away here:
> 
> diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
> index 6d6eca394d4d..d0c19838e513 100644
> --- a/fs/crypto/crypto.c
> +++ b/fs/crypto/crypto.c
> @@ -327,6 +327,7 @@ EXPORT_SYMBOL(fscrypt_decrypt_page);
>  static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
>  {
>         struct dentry *dir;
> +       struct inode *inode = d_inode(dentry);
>         int dir_has_key, cached_with_key;
> 
>         if (flags & LOOKUP_RCU)
> @@ -359,6 +360,10 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
>                         (!cached_with_key && dir_has_key) ||
>                         (cached_with_key && !dir_has_key))
>                 return 0;
> +
> +       if (!inode || inode->i_nlink == 0)
> +               return 0;
> +
>         return 1;
>  }
> 
> Does this change make sense? TBH, I'm not really an expert in this area and it is also
> not clear to me why you don't see these issue on ext4 or f2fs.
> Maybe UBIFS' limitations kick in much earlier. ;-)

The test is repeatedly creating and removing a directory "dir" while lookups are
being done in it.  It seems the problem is that many dentries are being created
for "dir", and they pin many different inodes, all at the same time.  This
actually happens for ext4 too; it just doesn't cause an observable error.

I doubt it's the right solution to make fscrypt_d_revalidate() look at
->i_nlink, since ->d_revalidate() is meant to validate the filename, not the
inode.  I think there is probably a VFS bug that is causing the dentries to not
be freed.

Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question on fscrypt_d_revalidate() and fstest generic/429
  2017-05-15 19:45 ` Eric Biggers
@ 2017-05-15 19:51   ` Richard Weinberger
  2017-05-15 23:25     ` Eric Biggers
  2017-05-15 23:50     ` Al Viro
  0 siblings, 2 replies; 9+ messages in thread
From: Richard Weinberger @ 2017-05-15 19:51 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fscrypt, linux-fsdevel, Eric Biggers, Theodore Ts'o,
	Al Viro, David Gstir, David Oberhollenzer, linux-kernel,
	linux-mtd, Artem Bityutskiy, Adrian Hunter

Eric,

Am 15.05.2017 um 21:45 schrieb Eric Biggers:
>> If a directory is encrypted, evict() is not being called although the inode has no
>> users anymore.
>> It turned out evict() is not being called because fscrypt's fscrypt_d_revalidate()
>> function.
>> When I omit fscrypt_set_d_op() in UBIFS code, the test works just fine.
> 
> Well, I assume you mean the t_encrypted_d_revalidate portion of the test.
> generic/429 will still fail overall if you remove fscrypt_set_d_op() --- which
> is expected, since it's testing dentry revalidation after all.

Ah, yes.

>>
>> Can it be that fscrypt_d_revalidate() misses the case of i_nlink being 0?
>> It seem to treat unlinked inodes as already gone and they stay.
>>
>> The following change makes the problem go away here:
>>
>> diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
>> index 6d6eca394d4d..d0c19838e513 100644
>> --- a/fs/crypto/crypto.c
>> +++ b/fs/crypto/crypto.c
>> @@ -327,6 +327,7 @@ EXPORT_SYMBOL(fscrypt_decrypt_page);
>>  static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
>>  {
>>         struct dentry *dir;
>> +       struct inode *inode = d_inode(dentry);
>>         int dir_has_key, cached_with_key;
>>
>>         if (flags & LOOKUP_RCU)
>> @@ -359,6 +360,10 @@ static int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags)
>>                         (!cached_with_key && dir_has_key) ||
>>                         (cached_with_key && !dir_has_key))
>>                 return 0;
>> +
>> +       if (!inode || inode->i_nlink == 0)
>> +               return 0;
>> +
>>         return 1;
>>  }
>>
>> Does this change make sense? TBH, I'm not really an expert in this area and it is also
>> not clear to me why you don't see these issue on ext4 or f2fs.
>> Maybe UBIFS' limitations kick in much earlier. ;-)
> 
> The test is repeatedly creating and removing a directory "dir" while lookups are
> being done in it.  It seems the problem is that many dentries are being created
> for "dir", and they pin many different inodes, all at the same time.  This
> actually happens for ext4 too; it just doesn't cause an observable error.
> 
> I doubt it's the right solution to make fscrypt_d_revalidate() look at
> ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the
> inode.  I think there is probably a VFS bug that is causing the dentries to not
> be freed.

Not sure. Al? :-)

Thanks,
//richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question on fscrypt_d_revalidate() and fstest generic/429
  2017-05-15 19:51   ` Richard Weinberger
@ 2017-05-15 23:25     ` Eric Biggers
  2017-05-16  6:47       ` Richard Weinberger
  2017-05-15 23:50     ` Al Viro
  1 sibling, 1 reply; 9+ messages in thread
From: Eric Biggers @ 2017-05-15 23:25 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-fscrypt, linux-fsdevel, Eric Biggers, Theodore Ts'o,
	Al Viro, David Gstir, David Oberhollenzer, linux-kernel,
	linux-mtd, Artem Bityutskiy, Adrian Hunter

On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote:
> > 
> > The test is repeatedly creating and removing a directory "dir" while lookups are
> > being done in it.  It seems the problem is that many dentries are being created
> > for "dir", and they pin many different inodes, all at the same time.  This
> > actually happens for ext4 too; it just doesn't cause an observable error.
> > 
> > I doubt it's the right solution to make fscrypt_d_revalidate() look at
> > ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the
> > inode.  I think there is probably a VFS bug that is causing the dentries to not
> > be freed.
> 
> Not sure. Al? :-)
> 

I can reproduce this on an unencrypted directory after updating path_init() in
fs/namei.c to always clear LOOKUP_RCU, so that all path lookups are done in
ref-walk mode.  So I think fscrypt_d_revalidate() was only relevant because it
causes all path lookups to drop out of rcu-walk mode.

It seems that what's happening is the "dir" dentries are not being freed because
each one has a child dentry "file" that is a negative dentry.  The "file" dentry
would normally be freed by shrink_dcache_parent() called from vfs_rmdir(), but
due to a race with stat("dir/file") the "file" dentry sometimes has nonzero
reference count at that time, causing it to remain in the subdirs list.  So we
end up with a negative dentry "file" with 0 refcount and on the dentry LRU list,
and its parent the positive dentry "dir" with 1 refcount.  And the test program
generates thousands of copies of that, with each "dir" referring to a different
inode, and they get freed only when the shrinker runs.

I'm not sure how to fix it...

Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question on fscrypt_d_revalidate() and fstest generic/429
  2017-05-15 19:51   ` Richard Weinberger
  2017-05-15 23:25     ` Eric Biggers
@ 2017-05-15 23:50     ` Al Viro
  2017-05-16 11:22       ` Richard Weinberger
  1 sibling, 1 reply; 9+ messages in thread
From: Al Viro @ 2017-05-15 23:50 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Eric Biggers, linux-fscrypt, linux-fsdevel, Eric Biggers,
	Theodore Ts'o, David Gstir, David Oberhollenzer,
	linux-kernel, linux-mtd, Artem Bityutskiy, Adrian Hunter

On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote:

> > I doubt it's the right solution to make fscrypt_d_revalidate() look at
> > ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the
> > inode.  I think there is probably a VFS bug that is causing the dentries to not
> > be freed.
> 
> Not sure. Al? :-)

What's to tell VFS that they are garbage?  If your code doesn't do it,
VFS has no way to know.  And I certainly agree that using ->d_revalidate()
to trigger their unhashing is a bloody odd design; take it up with whoever
had come up with it...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question on fscrypt_d_revalidate() and fstest generic/429
  2017-05-15 23:25     ` Eric Biggers
@ 2017-05-16  6:47       ` Richard Weinberger
  2017-05-16 22:07         ` Eric Biggers
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Weinberger @ 2017-05-16  6:47 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fscrypt, linux-fsdevel, Eric Biggers, Theodore Ts'o,
	Al Viro, David Gstir, David Oberhollenzer, linux-kernel,
	linux-mtd, Artem Bityutskiy, Adrian Hunter

Eric,

Am 16.05.2017 um 01:25 schrieb Eric Biggers:
> On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote:
>>>
>>> The test is repeatedly creating and removing a directory "dir" while lookups are
>>> being done in it.  It seems the problem is that many dentries are being created
>>> for "dir", and they pin many different inodes, all at the same time.  This
>>> actually happens for ext4 too; it just doesn't cause an observable error.
>>>
>>> I doubt it's the right solution to make fscrypt_d_revalidate() look at
>>> ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the
>>> inode.  I think there is probably a VFS bug that is causing the dentries to not
>>> be freed.
>>
>> Not sure. Al? :-)
>>
> 
> I can reproduce this on an unencrypted directory after updating path_init() in
> fs/namei.c to always clear LOOKUP_RCU, so that all path lookups are done in
> ref-walk mode.  So I think fscrypt_d_revalidate() was only relevant because it
> causes all path lookups to drop out of rcu-walk mode.

On ext4 or UBIFS?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question on fscrypt_d_revalidate() and fstest generic/429
  2017-05-15 23:50     ` Al Viro
@ 2017-05-16 11:22       ` Richard Weinberger
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Weinberger @ 2017-05-16 11:22 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric Biggers, linux-fscrypt, linux-fsdevel, Eric Biggers,
	Theodore Ts'o, David Gstir, David Oberhollenzer,
	linux-kernel, linux-mtd, Artem Bityutskiy, Adrian Hunter

Al,

Am 16.05.2017 um 01:50 schrieb Al Viro:
> On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote:
> 
>>> I doubt it's the right solution to make fscrypt_d_revalidate() look at
>>> ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the
>>> inode.  I think there is probably a VFS bug that is causing the dentries to not
>>> be freed.
>>
>> Not sure. Al? :-)
> 
> What's to tell VFS that they are garbage?  If your code doesn't do it,
> VFS has no way to know.  And I certainly agree that using ->d_revalidate()
> to trigger their unhashing is a bloody odd design; take it up with whoever
> had come up with it...

Not sure whether I understood your answer.
What is the preferred way to inform VFS that the dentry is no longer valid?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question on fscrypt_d_revalidate() and fstest generic/429
  2017-05-16  6:47       ` Richard Weinberger
@ 2017-05-16 22:07         ` Eric Biggers
  2017-05-16 22:27           ` Richard Weinberger
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Biggers @ 2017-05-16 22:07 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: linux-fscrypt, linux-fsdevel, Eric Biggers, Theodore Ts'o,
	Al Viro, David Gstir, David Oberhollenzer, linux-kernel,
	linux-mtd, Artem Bityutskiy, Adrian Hunter

On Tue, May 16, 2017 at 08:47:43AM +0200, Richard Weinberger wrote:
> Eric,
> 
> Am 16.05.2017 um 01:25 schrieb Eric Biggers:
> > On Mon, May 15, 2017 at 09:51:03PM +0200, Richard Weinberger wrote:
> >>>
> >>> The test is repeatedly creating and removing a directory "dir" while lookups are
> >>> being done in it.  It seems the problem is that many dentries are being created
> >>> for "dir", and they pin many different inodes, all at the same time.  This
> >>> actually happens for ext4 too; it just doesn't cause an observable error.
> >>>
> >>> I doubt it's the right solution to make fscrypt_d_revalidate() look at
> >>> ->i_nlink, since ->d_revalidate() is meant to validate the filename, not the
> >>> inode.  I think there is probably a VFS bug that is causing the dentries to not
> >>> be freed.
> >>
> >> Not sure. Al? :-)
> >>
> > 
> > I can reproduce this on an unencrypted directory after updating path_init() in
> > fs/namei.c to always clear LOOKUP_RCU, so that all path lookups are done in
> > ref-walk mode.  So I think fscrypt_d_revalidate() was only relevant because it
> > causes all path lookups to drop out of rcu-walk mode.
> 
> On ext4 or UBIFS?
> 
> Thanks,
> //richard

Both; the inode "leak" isn't filesystem-specific, beyond the fact that UBIFS
apparently limits the number of inodes on its orphan list while ext4 does not.
(How do I know it happens on ext4 then?  /proc/slabinfo shows that lots of ext4
inodes have been allocated, and after an unclean shutdown and remounting, it's
reported that 50,000+ orphan inodes were deleted.)

There are lots of cases in which path lookups switch from rcu-walk mode into
ref-walk mode, so the fact that it was being caused by ->d_revalidate() in this
specific situation isn't really important, IMO.

I think the actual fix would be something along the lines of making vfs_rmdir()
unhash any negative child dentries, so that they get "killed" by dput() later.

Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question on fscrypt_d_revalidate() and fstest generic/429
  2017-05-16 22:07         ` Eric Biggers
@ 2017-05-16 22:27           ` Richard Weinberger
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Weinberger @ 2017-05-16 22:27 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-fscrypt, linux-fsdevel, Eric Biggers, Theodore Ts'o,
	Al Viro, David Gstir, David Oberhollenzer, linux-kernel,
	linux-mtd, Artem Bityutskiy, Adrian Hunter

Eric,

Am 17.05.2017 um 00:07 schrieb Eric Biggers:
>>> I can reproduce this on an unencrypted directory after updating path_init() in
>>> fs/namei.c to always clear LOOKUP_RCU, so that all path lookups are done in
>>> ref-walk mode.  So I think fscrypt_d_revalidate() was only relevant because it
>>> causes all path lookups to drop out of rcu-walk mode.
>>
>> On ext4 or UBIFS?
>
> Both; the inode "leak" isn't filesystem-specific, beyond the fact that UBIFS
> apparently limits the number of inodes on its orphan list while ext4 does not.
> (How do I know it happens on ext4 then?  /proc/slabinfo shows that lots of ext4
> inodes have been allocated, and after an unclean shutdown and remounting, it's
> reported that 50,000+ orphan inodes were deleted.)

UBIFS's orphan list is limited by the size of a LEB (logical erase block).
With nandsim's default settings such a LEB is rather small and can store only
2k orphans.

> There are lots of cases in which path lookups switch from rcu-walk mode into
> ref-walk mode, so the fact that it was being caused by ->d_revalidate() in this
> specific situation isn't really important, IMO.
> 
> I think the actual fix would be something along the lines of making vfs_rmdir()
> unhash any negative child dentries, so that they get "killed" by dput() later.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-05-16 22:27 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-15 14:39 Question on fscrypt_d_revalidate() and fstest generic/429 Richard Weinberger
2017-05-15 19:45 ` Eric Biggers
2017-05-15 19:51   ` Richard Weinberger
2017-05-15 23:25     ` Eric Biggers
2017-05-16  6:47       ` Richard Weinberger
2017-05-16 22:07         ` Eric Biggers
2017-05-16 22:27           ` Richard Weinberger
2017-05-15 23:50     ` Al Viro
2017-05-16 11:22       ` Richard Weinberger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.