linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Gao Xiang <hsiangkao@aol.com>
To: Ritesh Harjani <riteshh@linux.ibm.com>
Cc: linux-fsdevel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	aneesh.kumar@linux.ibm.com, Jeff Layton <jlayton@kernel.org>,
	wugyuan@cn.ibm.com
Subject: Re: [RFC] - vfs: Null pointer dereference issue with symlink create and read of symlink
Date: Tue, 3 Sep 2019 20:59:47 +0800	[thread overview]
Message-ID: <20190903125946.GA11069@hsiangkao-HP-ZHAN-66-Pro-G1> (raw)
In-Reply-To: <20190903115827.0A8A0A405B@b06wcsmtp001.portsmouth.uk.ibm.com>

On Tue, Sep 03, 2019 at 05:28:26PM +0530, Ritesh Harjani wrote:
> Hi Viro/All,
> 
> Could you please review below issue and it's proposed solutions.
> If you could let me know which of the two you think will be a better
> approach to solve this or in case if you have any other better approach, I
> can prepare and submit a official patch with that.
> 
> 
> 
> Issue signature:-
>  [NIP  : trailing_symlink+80]
>  [LR   : trailing_symlink+1092]
>  #4 [c00000198069bb70] trailing_symlink at c0000000004bae60  (unreliable)
>  #5 [c00000198069bc00] path_openat at c0000000004bdd14
>  #6 [c00000198069bc90] do_filp_open at c0000000004c0274
>  #7 [c00000198069bdb0] do_sys_open at c00000000049b248
>  #8 [c00000198069be30] system_call at c00000000000b388
> 
> 
> 
> Test case:-
> shell-1 - "while [ 1 ]; do cat /gpfs/g1/testdir/file3; sleep 1; done"
> shell-2 - "while [ 1 ]; do ln -s /gpfs/g1/testdir/file1
> /gpfs/g1/testdir/file3; sleep 1; rm /gpfs/g1/testdir/file3 sleep 1; done
> 
> 
> 
> Problem description:-
> In some filesystems like GPFS below described scenario may happen on some
> platforms (Reported-By:- wugyuan)
> 
> Here, two threads are being run in 2 different shells. Thread-1(cat) does
> cat of the symlink and Thread-2(ln) is creating the symlink.
> 
> Now on any platform with GPFS like filesystem, if CPU does out-of-order
> execution (or any kind of re-ordering due compiler optimization?) in
> function __d_set_and_inode_type(), then we see a NULL pointer dereference
> due to inode->i_uid.
> 
> This happens because in lookup_fast in nonRCU path or say REF-walk (i.e. in
> else condition), we check d_is_negative() without any lock protection.
> And since in __d_set_and_inode_type() re-ordering may happen in setting of
> dentry->type & dentry->inode => this means that there is this tiny window
> where things are going wrong.
> 
> 
> (GPFS like):- Any FS with -inode_operations ->permission callback returning
> -ECHILD in case of (mask & MAY_NOT_BLOCK) may cause this problem to happen.
> (few e.g. found were - ocfs2, ceph, coda, afs)
> 
> int xxx_permission(struct inode *inode, int mask)
> {
>          if (mask & MAY_NOT_BLOCK)
>                  return -ECHILD;
> 	<...>
> }
> 
> Wugyuan(cc), could reproduce this problem with GPFS filesystem.
> Since, I didn't have the GPFS setup, so I tried replicating on a native FS
> by forcing out-of-order execution in function __d_set_inode_and_type() and
> making sure we return -ECHILD in MAY_NOT_BLOCK case in ->permission
> operation for all inodes.
> 
> With above changes in kernel, I could as well hit this issue on a native FS
> too.
> 
> (basically what we observed is link_path_walk will do nonRCU(REF-walk)
> lookup due to may_lookup -> inode_permission return -ECHILD and then
> unlazy_walk drops the LOOKUP_RCU flag (nd->flag). After that below race is
> possible).
> 
> 
> 
> Sequence of events:-
> 
> Thread-2(Comm: ln)		Thread-1(Comm: cat)
> 
> 				dentry = __d_lookup() //nonRCU
> 
> __d_set_and_inode_type() (Out-of-order execution)
> 	flags = READ_ONCE(dentry->d_flags);
> 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
> 	flags |= type_flags;
> 	WRITE_ONCE(dentry->d_flags, flags);
> 
> 					
> 				if (unlikely(d_is_negative()) // fails
>   					{}
> 				// since type is already updated in
> 				// Thread-2 in parallel but inode
> 				// not yet set.
> 				// d_is_negative returns false
> 
> 				*inode = d_backing_inode(path->dentry);
> 				// means inode is still NULL
> 
> 	dentry->d_inode = inode;
> 	
> 				trailing_symlink()
> 					may_follow_link()
> 						inode = nd->link_inode;
> 						// nd->link_inode = NULL
> 						//Then it crashes while
> 						//doing inode->i_uid
> 					
> 	

It seems much similar to
https://lore.kernel.org/r/20190419084810.63732-1-houtao1@huawei.com/

Thanks,
Gao Xiang

> 
> 
> 
> Approach-1:- using wmb()
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index e88cf0554e65..966172df77ee 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -316,6 +316,7 @@ static inline void __d_set_inode_and_type(struct dentry
> *dentry,
>         unsigned flags;
> 
>         dentry->d_inode = inode;
> +       smp_wmb();
>         flags = READ_ONCE(dentry->d_flags);
>         flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>         flags |= type_flags;
> 
> 
> 
> Approach-2:- using spin_lock(&dentry->d_lock);
> 
> Do you think lock should be a better approach, given that we are already
> in REF-walk mode. As per the Documentation, we should be able to take
> spin_lock(&dentry->d_lock) in Ref-walk mode whenever required?
> 
> 
> With smp_wmb(), if added, should add a small latency in both
> RCU/REF-walk. But should be able to cover all the cases in general related
> to dentry->type & dentry->inode ordering. Though, we don't have any other
> race conditions reported or tested, other than the one, mentioned in this
> email.
> 
> Confused :(
> 
> 
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 209c51a5226c..a3145594da1c 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1557,6 +1557,7 @@ static int lookup_fast(struct nameidata *nd,
>         struct dentry *dentry, *parent = nd->path.dentry;
>         int status = 1;
>         int err;
> +       bool negative;
> 
>         /*
>          * Rename seqlock is not required here because in the off chance
> @@ -1565,7 +1566,6 @@ static int lookup_fast(struct nameidata *nd,
>          */
>         if (nd->flags & LOOKUP_RCU) {
>                 unsigned seq;
> -               bool negative;
>                 dentry = __d_lookup_rcu(parent, &nd->last, &seq);
>                 if (unlikely(!dentry)) {
>                         if (unlazy_walk(nd))
> @@ -1623,7 +1623,11 @@ static int lookup_fast(struct nameidata *nd,
>                 dput(dentry);
>                 return status;
>         }
> -       if (unlikely(d_is_negative(dentry))) {
> +
> +       spin_lock(&dentry->d_lock);
> +       negative = d_is_negative(dentry);
> +       spin_unlock(&dentry->d_lock);
> +       if (unlikely(negative)) {
>                 dput(dentry);
>                 return -ENOENT;
>         }
> 
> 
> Regards
> Ritesh
> 

  reply	other threads:[~2019-09-03 13:00 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03 11:58 [RFC] - vfs: Null pointer dereference issue with symlink create and read of symlink Ritesh Harjani
2019-09-03 12:59 ` Gao Xiang [this message]
2019-09-03 13:41   ` Ritesh Harjani
2019-09-03 13:58     ` Gao Xiang
2019-09-04 14:39     ` Jeff Layton
2019-09-06  5:17       ` Ritesh Harjani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190903125946.GA11069@hsiangkao-HP-ZHAN-66-Pro-G1 \
    --to=hsiangkao@aol.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=riteshh@linux.ibm.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wugyuan@cn.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).