All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ritesh Harjani <riteshh@linux.ibm.com>
To: Gao Xiang <hsiangkao@aol.com>, Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org, aneesh.kumar@linux.ibm.com,
	Jeff Layton <jlayton@kernel.org>,
	wugyuan@cn.ibm.com
Subject: Re: [RFC] - vfs: Null pointer dereference issue with symlink create and read of symlink
Date: Tue, 3 Sep 2019 19:11:28 +0530	[thread overview]
Message-ID: <20190903134129.EC5E6A405B@b06wcsmtp001.portsmouth.uk.ibm.com> (raw)
In-Reply-To: <20190903125946.GA11069@hsiangkao-HP-ZHAN-66-Pro-G1>



On 9/3/19 6:29 PM, Gao Xiang wrote:
> On Tue, Sep 03, 2019 at 05:28:26PM +0530, Ritesh Harjani wrote:
>> Hi Viro/All,
>>
>> Could you please review below issue and it's proposed solutions.
>> If you could let me know which of the two you think will be a better
>> approach to solve this or in case if you have any other better approach, I
>> can prepare and submit a official patch with that.
>>
>>
>>
>> Issue signature:-
>>   [NIP  : trailing_symlink+80]
>>   [LR   : trailing_symlink+1092]
>>   #4 [c00000198069bb70] trailing_symlink at c0000000004bae60  (unreliable)
>>   #5 [c00000198069bc00] path_openat at c0000000004bdd14
>>   #6 [c00000198069bc90] do_filp_open at c0000000004c0274
>>   #7 [c00000198069bdb0] do_sys_open at c00000000049b248
>>   #8 [c00000198069be30] system_call at c00000000000b388
>>
>>
>>
>> Test case:-
>> shell-1 - "while [ 1 ]; do cat /gpfs/g1/testdir/file3; sleep 1; done"
>> shell-2 - "while [ 1 ]; do ln -s /gpfs/g1/testdir/file1
>> /gpfs/g1/testdir/file3; sleep 1; rm /gpfs/g1/testdir/file3 sleep 1; done
>>
>>
>>
>> Problem description:-
>> In some filesystems like GPFS below described scenario may happen on some
>> platforms (Reported-By:- wugyuan)
>>
>> Here, two threads are being run in 2 different shells. Thread-1(cat) does
>> cat of the symlink and Thread-2(ln) is creating the symlink.
>>
>> Now on any platform with GPFS like filesystem, if CPU does out-of-order
>> execution (or any kind of re-ordering due compiler optimization?) in
>> function __d_set_and_inode_type(), then we see a NULL pointer dereference
>> due to inode->i_uid.
>>
>> This happens because in lookup_fast in nonRCU path or say REF-walk (i.e. in
>> else condition), we check d_is_negative() without any lock protection.
>> And since in __d_set_and_inode_type() re-ordering may happen in setting of
>> dentry->type & dentry->inode => this means that there is this tiny window
>> where things are going wrong.
>>
>>
>> (GPFS like):- Any FS with -inode_operations ->permission callback returning
>> -ECHILD in case of (mask & MAY_NOT_BLOCK) may cause this problem to happen.
>> (few e.g. found were - ocfs2, ceph, coda, afs)
>>
>> int xxx_permission(struct inode *inode, int mask)
>> {
>>           if (mask & MAY_NOT_BLOCK)
>>                   return -ECHILD;
>> 	<...>
>> }
>>
>> Wugyuan(cc), could reproduce this problem with GPFS filesystem.
>> Since, I didn't have the GPFS setup, so I tried replicating on a native FS
>> by forcing out-of-order execution in function __d_set_inode_and_type() and
>> making sure we return -ECHILD in MAY_NOT_BLOCK case in ->permission
>> operation for all inodes.
>>
>> With above changes in kernel, I could as well hit this issue on a native FS
>> too.
>>
>> (basically what we observed is link_path_walk will do nonRCU(REF-walk)
>> lookup due to may_lookup -> inode_permission return -ECHILD and then
>> unlazy_walk drops the LOOKUP_RCU flag (nd->flag). After that below race is
>> possible).
>>
>>
>>
>> Sequence of events:-
>>
>> Thread-2(Comm: ln)		Thread-1(Comm: cat)
>>
>> 				dentry = __d_lookup() //nonRCU
>>
>> __d_set_and_inode_type() (Out-of-order execution)
>> 	flags = READ_ONCE(dentry->d_flags);
>> 	flags &= ~(DCACHE_ENTRY_TYPE | DCACHE_FALLTHRU);
>> 	flags |= type_flags;
>> 	WRITE_ONCE(dentry->d_flags, flags);
>>
>> 					
>> 				if (unlikely(d_is_negative()) // fails
>>    					{}
>> 				// since type is already updated in
>> 				// Thread-2 in parallel but inode
>> 				// not yet set.
>> 				// d_is_negative returns false
>>
>> 				*inode = d_backing_inode(path->dentry);
>> 				// means inode is still NULL
>>
>> 	dentry->d_inode = inode;
>> 	
>> 				trailing_symlink()
>> 					may_follow_link()
>> 						inode = nd->link_inode;
>> 						// nd->link_inode = NULL
>> 						//Then it crashes while
>> 						//doing inode->i_uid
>> 					
>> 	
> 
> It seems much similar to
> https://lore.kernel.org/r/20190419084810.63732-1-houtao1@huawei.com/

Thanks, yes two same symptoms with different use cases.
But except the fact that here, we see the issue with GPFS quite 
frequently. So let's hope that we could have some solution to this 
problem in upstream.

 From the thread:-
 >> We could simply use d_really_is_negative() there, avoiding all that
 >> mess.  If and when we get around to whiteouts-in-dcache (i.e. if
 >> unionfs series gets resurrected), we can revisit that

I didn't get this part. Does it mean, d_really_is_negative can only be 
used, once whiteouts-in-dcache series is resurrected?
If yes, meanwhile could we have any other solution in place?

-ritesh


  reply	other threads:[~2019-09-03 13:41 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-03 11:58 [RFC] - vfs: Null pointer dereference issue with symlink create and read of symlink Ritesh Harjani
2019-09-03 12:59 ` Gao Xiang
2019-09-03 13:41   ` Ritesh Harjani [this message]
2019-09-03 13:58     ` Gao Xiang
2019-09-04 14:39     ` Jeff Layton
2019-09-06  5:17       ` Ritesh Harjani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190903134129.EC5E6A405B@b06wcsmtp001.portsmouth.uk.ibm.com \
    --to=riteshh@linux.ibm.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=hsiangkao@aol.com \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=wugyuan@cn.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.