linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: linux-fsdevel@vger.kernel.org
Cc: Ritesh Harjani <riteshh@linux.ibm.com>,
	linux-kernel@vger.kernel.org, wugyuan@cn.ibm.com,
	jlayton@kernel.org, hsiangkao@aol.com, Jan Kara <jack@suse.cz>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	ecryptfs@vger.kernel.org
Subject: Re: [RFC] lookup_one_len_unlocked() lousy calling conventions
Date: Sun, 3 Nov 2019 18:20:58 +0000	[thread overview]
Message-ID: <20191103182058.GQ26530@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20191103163524.GO26530@ZenIV.linux.org.uk>

On Sun, Nov 03, 2019 at 04:35:24PM +0000, Al Viro wrote:

> lookup_one_len_unlocked() calling conventions are wrong for its callers.
> Namely, 11 out of 12 callers really want ERR_PTR(-ENOENT) on negatives.
> Most of them take care to check, some rely upon that being impossible in
> their case.  Interactions with dentry turning positive right after
> lookup_one_len_unlocked() has returned it are of varying bugginess...
> 
> The only exception is ecryptfs, where we do lookup in the underlying fs
> on ecryptfs_lookup() and want to retain a negative dentry if we get one.

Looking at that code... the thing that deals with the result of lookup in
underlying fs is ecryptfs_lookup_interpose(), and there we have this:
        struct inode *inode, *lower_inode = d_inode(lower_dentry);
...
        dentry_info = kmem_cache_alloc(ecryptfs_dentry_info_cache, GFP_KERNEL);
...
        if (d_really_is_negative(lower_dentry)) {
                /* We want to add because we couldn't find in lower */
                d_add(dentry, NULL);
                return NULL;
        }
        inode = __ecryptfs_get_inode(lower_inode, dentry->d_sb);

If lower dentry used to be negative, but went positive while we'd
been doing allocation, we'll get d_really_is_negative() (i.e.
!lower_dentry->d_inode) false, but lower_inode (fetched earlier)
still NULL.  __ecryptfs_get_inode() starts with
        if (lower_inode->i_sb != ecryptfs_superblock_to_lower(sb))
                return ERR_PTR(-EXDEV);
which won't be happy in that situation...  That has nothing to do
with barriers, ->d_flags, etc. - the window is rather wide here.
GFP_KERNEL allocation can block just fine.

IOW, the only caller of lookup_one_len_unlocked() that does not
reject negative dentries doesn't manage to handle them correctly ;-/

And then in the same ecryptfs_lookup_interpose() we have e.g.
        fsstack_copy_attr_atime(d_inode(dentry->d_parent),
                                d_inode(lower_dentry->d_parent));
Now, dentry->d_parent is stable; dentry is guaranteed to be new
and not yet visible to anybody else, besides it's negative and
the parent is held shared, so it couldn't have been moved around
even if it had been seen by somebody else.

However, lower_dentry->d_parent is a different story.  We are not holding
the lock on its parent anymore; it could've been moved around by somebody
mounting the underlying layer elsewhere and accessing it directly.
Moreover, there's nothing to guarantee that the pointer we fetch from
lower_dentry->d_parent won't be pointing to freed memory by the
time we get around to looking at its inode - lose the timeslice to
preemption just after fetching ->d_parent, have another process move
the damn thing around and there's nothing to keep the ex-parent
around by the time you regain CPU.

The problem goes all way back to addd65ad8d19 "eCryptfs: Filename Encryption:
filldir, lookup, and readlink" from 2009.  That turned
	lower_dir_dentry = ecryptfs_dentry_to_lower(dentry->d_parent);
into
	lower_dir_dentry = lower_dentry->d_parent;
and it had hit the fan...

Sure, "somebody mounted the underlying fs elsewhere and is actively
trying to screw us over" is not how ecryptfs is supposed to be used
and it can demonstrate unexpected behavior - odd errors, etc.
But that behaviour should not include oopsen and access to freed
kernel memory...

  reply	other threads:[~2019-11-03 18:21 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-27  4:42 [PATCH RESEND 1/1] vfs: Really check for inode ptr in lookup_fast Ritesh Harjani
2019-10-15  4:07 ` Ritesh Harjani
2019-10-22 13:38   ` Ritesh Harjani
2019-10-22 14:37     ` Al Viro
2019-10-22 14:50       ` Al Viro
2019-10-22 20:11       ` Al Viro
2019-10-23 11:05         ` Ritesh Harjani
2019-11-01 23:46           ` Al Viro
2019-11-02  6:17             ` Al Viro
2019-11-02 17:24               ` Paul E. McKenney
2019-11-02 17:22             ` Paul E. McKenney
2019-11-02 18:08               ` Al Viro
2019-11-03 14:41                 ` Paul E. McKenney
2019-11-03 16:35                 ` [RFC] lookup_one_len_unlocked() lousy calling conventions Al Viro
2019-11-03 18:20                   ` Al Viro [this message]
2019-11-03 18:51                     ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Al Viro
2019-11-03 19:03                       ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either Al Viro
2019-11-13  7:01                       ` [PATCH][RFC] ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable Amir Goldstein
2019-11-13 12:52                         ` Al Viro
2019-11-13 16:22                           ` Amir Goldstein
2019-11-13 20:18                           ` Jean-Louis Biasini
2019-11-03 17:05                 ` [PATCH][RFC] ecryptfs unlink/rmdir breakage (similar to caught in ecryptfs rename last year) Al Viro
2019-11-09  3:13                 ` [PATCH][RFC] race in exportfs_decode_fh() Al Viro
2019-11-09 16:55                   ` Linus Torvalds
2019-11-09 18:26                     ` Al Viro
2019-11-11  9:16                   ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191103182058.GQ26530@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=ecryptfs@vger.kernel.org \
    --cc=hsiangkao@aol.com \
    --cc=jack@suse.cz \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=riteshh@linux.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=wugyuan@cn.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).