* NFS/d_splice_alias breakage
@ 2016-06-02 22:46 Oleg Drokin
  2016-06-02 23:59 ` [PATCH] Allow d_splice_alias to accept hashed dentries green
                   ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Oleg Drokin @ 2016-06-02 22:46 UTC (permalink / raw)
  To: Al Viro, J. Bruce Fields
  Cc: linux-nfs, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

Hello!

   I just came across a bug (trying to run some Lustre test scripts
   against NFS while hunting for another nfsd bug) that seems to have
   been present since at least 2014 and lets users crash the NFS client
   locally.

   First, here's an interesting quote from the comment above d_splice_alias:

>  * Cluster filesystems may call this function with a negative, hashed dentry.
>  * In that case, we know that the inode will be a regular file, and also this
>  * will only occur during atomic_open. So we need to check for the dentry
>  * being already hashed only in the final case.
>  */
> struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
> {
>         if (IS_ERR(inode))
>                 return ERR_CAST(inode);
> 
>         BUG_ON(!d_unhashed(dentry));
        ^^^^^^^^^^^^^^ - This does not align well with the quote above.

It got imported here by commit b5ae6b15bd73e35b129408755a0804287a87e041

Removing the BUG_ON head-on is not going to work, since the d_rehash of old
is now folded into __d_add, and we might not want to move that condition there.

doing an
if (d_unhashed(dentry))
   __d_add(dentry, inode);
else
   d_instantiate(dentry, inode);

also does not look particularly pretty, and who knows how all of the
related code, like __d_find_any_alias, would react.

Al, I think you might want to chime in here on how to better handle this?

It appears the problem has been present since at least 3.10, where the
fs/nfs/dir.c code was calling d_materialise_unique(), which did require
the dentry to be unhashed.

Not sure how this was not hit earlier. The crash looks like this (I added
a printk to confirm that this is indeed what is going on and not some
other weird race):

[   64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode (null) inode ffff88010f500c70
[   64.489549] ------------[ cut here ]------------
[   64.489642] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
[   64.489750] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[   64.489853] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr acpi_cpufreq i2c_piix4 tpm_tis tpm nfsd drm_kms_helper ttm drm serio_raw virtio_blk
[   64.491111] CPU: 6 PID: 7125 Comm: file_concat.sh Not tainted 4.7.0-rc1-vm-nfs+ #99
[   64.492069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   64.492489] task: ffff880110a801c0 ti: ffff8800c283c000 task.ti: ffff8800c283c000
[   64.493159] RIP: 0010:[<ffffffff8124292f>]  [<ffffffff8124292f>] d_splice_alias+0x31f/0x480
[   64.493866] RSP: 0018:ffff8800c283fb20  EFLAGS: 00010282
[   64.494238] RAX: 0000000000000067 RBX: ffff88010f500c70 RCX: 0000000000000000
[   64.494625] RDX: 0000000000000067 RSI: ffff8800d08d2ed0 RDI: ffff88010f500c70
[   64.495021] RBP: ffff8800c283fb78 R08: 0000000000000001 R09: 0000000000000000
[   64.495407] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8800d08d2ed0
[   64.495804] R13: 0000000000000000 R14: ffff8800d4a56f00 R15: ffff8800d0bb8c70
[   64.496199] FS:  00007f94ae25c700(0000) GS:ffff88011f580000(0000) knlGS:0000000000000000
[   64.496859] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   64.497235] CR2: 000055e5d3fb46a4 CR3: 00000000ce364000 CR4: 00000000000006e0
[   64.497626] Stack:
[   64.497961]  ffffffff8132154e ffff8800c283fb68 ffffffff81325916 0000000000000000
[   64.498765]  0000000000000000 ffff8801109459c0 0000000000000000 ffff8800d0bb8c70
[   64.499578]  ffff8800c283fba0 ffff8800d08d2ed0 ffff8800cb927080 ffff8800c283fc18
[   64.500385] Call Trace:
[   64.500727]  [<ffffffff8132154e>] ? nfs_lookup+0x17e/0x320
[   64.501103]  [<ffffffff81325916>] ? __put_nfs_open_context+0xc6/0xf0
[   64.501477]  [<ffffffff8132252b>] nfs_atomic_open+0x8b/0x430
[   64.501850]  [<ffffffff81236def>] lookup_open+0x29f/0x7a0
[   64.502222]  [<ffffffff812377ce>] path_openat+0x4de/0xfc0
[   64.502591]  [<ffffffff8123935e>] do_filp_open+0x7e/0xe0
[   64.502964]  [<ffffffff8124877c>] ? __alloc_fd+0xbc/0x170
[   64.503347]  [<ffffffff817f4977>] ? _raw_spin_unlock+0x27/0x40
[   64.503719]  [<ffffffff8124877c>] ? __alloc_fd+0xbc/0x170
[   64.504097]  [<ffffffff81226896>] do_sys_open+0x116/0x1f0
[   64.504465]  [<ffffffff8122698e>] SyS_open+0x1e/0x20
[   64.504831]  [<ffffffff817f5176>] entry_SYSCALL_64_fastpath+0x1e/0xad
[   64.505218] Code: 01 e8 46 20 5b 00 85 db 74 0b 4c 89 ff 4c 63 fb e8 87 d8 ff ff 4c 89 e7 e8 2f 3c 00 00 4c 89 f8 e9 5e fe ff ff 0f 0b 48 89 f8 c3 <0f> 0b 48 8b 43 40 4c 8b 78 58 49 8d 8f 58 03 00 00 eb 02 f3 90 
[   64.508754] RIP  [<ffffffff8124292f>] d_splice_alias+0x31f/0x480
[   64.509176]  RSP <ffff8800c283fb20>

Bye,
    Oleg


* [PATCH] Allow d_splice_alias to accept hashed dentries
  2016-06-02 22:46 NFS/d_splice_alias breakage Oleg Drokin
@ 2016-06-02 23:59 ` green
  2016-06-03  0:25   ` Oleg Drokin
  2016-06-03  0:44 ` NFS/d_splice_alias breakage Trond Myklebust
  2016-06-03  3:37 ` Al Viro
  2 siblings, 1 reply; 48+ messages in thread
From: green @ 2016-06-02 23:59 UTC (permalink / raw)
  To: Al Viro
  Cc: J. Bruce Fields,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

The comment at the top of the function already permits this, so take
out the assertion and update __d_add to be aware of it. Move the BUG_ON
to d_add, the other user of __d_add, where we presumably do not want to
allow hashed dentries as input.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
So seeing how there's only one other __d_add user, I just moved the BUG_ON
to d_add and allowed __d_add to not call _d_rehash if the dentry is
already hashed.
This seems to be working as expected and I don't see any problems with it.
Of course I am probably missing some other important consideration ;)

 fs/dcache.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ad4a542..0e35049 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2565,7 +2565,8 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
 		raw_write_seqcount_end(&dentry->d_seq);
 		__fsnotify_d_instantiate(dentry);
 	}
-	_d_rehash(dentry);
+	if (d_unhashed(dentry))
+		_d_rehash(dentry);
 	if (dir)
 		end_dir_add(dir, n);
 	spin_unlock(&dentry->d_lock);
@@ -2584,6 +2585,8 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
 
 void d_add(struct dentry *entry, struct inode *inode)
 {
+	BUG_ON(!d_unhashed(entry));
+
 	if (inode) {
 		security_d_instantiate(entry, inode);
 		spin_lock(&inode->i_lock);
@@ -2986,8 +2989,6 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
 	if (IS_ERR(inode))
 		return ERR_CAST(inode);
 
-	BUG_ON(!d_unhashed(dentry));
-
 	if (!inode)
 		goto out;
 
-- 
2.1.0


* Re: [PATCH] Allow d_splice_alias to accept hashed dentries
  2016-06-02 23:59 ` [PATCH] Allow d_splice_alias to accept hashed dentries green
@ 2016-06-03  0:25   ` Oleg Drokin
  0 siblings, 0 replies; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03  0:25 UTC (permalink / raw)
  To: Al Viro
  Cc: J. Bruce Fields,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 2, 2016, at 7:59 PM, green@linuxhacker.ru wrote:

> From: Oleg Drokin <green@linuxhacker.ru>
> 
> The comment at the top of the function already permits this, so take
> out the assertion and update __d_add to be aware of it. Move the BUG_ON
> to d_add, the other user of __d_add, where we presumably do not want to
> allow hashed dentries as input.
> 
> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> ---
> So seeing how there's only one other __d_add user, I just moved the BUG_ON
> to d_add and allowed __d_add to not call _d_rehash if the dentry is
> already hashed.
> This seems to be working as expected and I don't see any problems with it.
> Of course I am probably missing some other important consideration ;)

I guess I was too quick to assume it was fine (or I just hit another bug),
but with the patch below I am now getting:

[ 1120.040403] BUG: unable to handle kernel paging request at ffff880119fc3000
[ 1120.040883] IP: [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
[ 1120.041302] PGD 2dca067 PUD 2dcd067 PMD 11ff31067 PTE 8000000119fc3060
[ 1120.041699] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1120.042019] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr acpi_cpufreq i2c_piix4 tpm_tis tpm nfsd drm_kms_helper ttm drm serio_raw virtio_blk
[ 1120.042803] CPU: 5 PID: 16997 Comm: file_concat.sh Not tainted 4.7.0-rc1-vm-nfs+ #101
[ 1120.043467] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1120.043826] task: ffff8800c34f40c0 ti: ffff8800c15b8000 task.ti: ffff8800c15b8000
[ 1120.044446] RIP: 0010:[<ffffffff817f87d4>]  [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
[ 1120.045099] RSP: 0018:ffff8800c15bbba0  EFLAGS: 00010286
[ 1120.045439] RAX: ffff8800d20f0f30 RBX: ffff8800d20f0f00 RCX: ffff880119fc3000
[ 1120.045815] RDX: ffff880119fc3000 RSI: 0000000000000000 RDI: 0000000000000001
[ 1120.046168] RBP: ffff8800c15bbbe8 R08: 0000000000000000 R09: ffff8800d20f0ed0
[ 1120.046526] R10: 0000000000000059 R11: ffff8800c34f4c38 R12: ffff8800d1de9ed0
[ 1120.046892] R13: ffff8800c15bbdf0 R14: ffff8800d20f0f50 R15: 00000000f7a5b27a
[ 1120.047248] FS:  00007f57e93d8700(0000) GS:ffff88011f540000(0000) knlGS:0000000000000000
[ 1120.047890] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1120.048223] CR2: ffff880119fc3000 CR3: 00000000d4166000 CR4: 00000000000006e0
[ 1120.048596] Stack:
[ 1120.048892]  ffffffff81243a35 0000000000000096 ffff880119fc3000 ffffffff00000001
[ 1120.049537]  000000000000c2aa ffff8800c15bbdf0 ffff8800d1de9ed0 ffff8800c15bbde0
[ 1120.050189]  ffff8800c15bbdf0 ffff8800c15bbc18 ffffffff81243c64 ffffffff81236c4e
[ 1120.050846] Call Trace:
[ 1120.051146]  [<ffffffff81243a35>] ? __d_lookup+0x5/0x1b0
[ 1120.051485]  [<ffffffff81243c64>] d_lookup+0x84/0xb0
[ 1120.051828]  [<ffffffff81236c4e>] ? lookup_open+0xfe/0x7a0
[ 1120.052160]  [<ffffffff81236c4e>] lookup_open+0xfe/0x7a0
[ 1120.052500]  [<ffffffff812377ce>] path_openat+0x4de/0xfc0
[ 1120.052846]  [<ffffffff8123935e>] do_filp_open+0x7e/0xe0
[ 1120.053177]  [<ffffffff81233110>] ? lock_rename+0x100/0x100
[ 1120.053524]  [<ffffffff817f4957>] ? _raw_spin_unlock+0x27/0x40
[ 1120.053874]  [<ffffffff8124875c>] ? __alloc_fd+0xbc/0x170
[ 1120.054202]  [<ffffffff81226896>] do_sys_open+0x116/0x1f0
[ 1120.054540]  [<ffffffff8122698e>] SyS_open+0x1e/0x20
[ 1120.054878]  [<ffffffff817f5136>] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 1120.055221] Code: e1 03 49 d3 e8 e9 41 ae a4 ff 48 8d 0a 48 83 e1 f8 4c 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 00 af a4 ff 48 8d 0a 48 83 e1 f8 <4c> 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 7b b3 a4 ff b9 f2 
[ 1120.056560] RIP  [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
[ 1120.056909]  RSP <ffff8800c15bbba0>
[ 1120.057220] CR2: ffff880119fc3000

> 
> fs/dcache.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index ad4a542..0e35049 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -2565,7 +2565,8 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
> 		raw_write_seqcount_end(&dentry->d_seq);
> 		__fsnotify_d_instantiate(dentry);
> 	}
> -	_d_rehash(dentry);
> +	if (d_unhashed(dentry))
> +		_d_rehash(dentry);
> 	if (dir)
> 		end_dir_add(dir, n);
> 	spin_unlock(&dentry->d_lock);
> @@ -2584,6 +2585,8 @@ static inline void __d_add(struct dentry *dentry, struct inode *inode)
> 
> void d_add(struct dentry *entry, struct inode *inode)
> {
> +	BUG_ON(!d_unhashed(entry));
> +
> 	if (inode) {
> 		security_d_instantiate(entry, inode);
> 		spin_lock(&inode->i_lock);
> @@ -2986,8 +2989,6 @@ struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
> 	if (IS_ERR(inode))
> 		return ERR_CAST(inode);
> 
> -	BUG_ON(!d_unhashed(dentry));
> -
> 	if (!inode)
> 		goto out;
> 
> -- 
> 2.1.0


* Re: NFS/d_splice_alias breakage
  2016-06-02 22:46 NFS/d_splice_alias breakage Oleg Drokin
  2016-06-02 23:59 ` [PATCH] Allow d_splice_alias to accept hashed dentries green
@ 2016-06-03  0:44 ` Trond Myklebust
  2016-06-03  0:54   ` Oleg Drokin
  2016-06-03  3:28   ` Al Viro
  2016-06-03  3:37 ` Al Viro
  2 siblings, 2 replies; 48+ messages in thread
From: Trond Myklebust @ 2016-06-03  0:44 UTC (permalink / raw)
  To: Oleg Drokin, Al Viro, J. Bruce Fields
  Cc: linux-nfs, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>



On 6/2/16, 18:46, "linux-nfs-owner@vger.kernel.org on behalf of Oleg Drokin" <linux-nfs-owner@vger.kernel.org on behalf of green@linuxhacker.ru> wrote:

>Hello!
>
>   I just came across a bug (trying to run some Lustre test scripts
>   against NFS while hunting for another nfsd bug) that seems to have
>   been present since at least 2014 and lets users crash the NFS client
>   locally.
>
>[... rest of the original report and oops trace snipped ...]

That would have to be a really tight race, since the code in _nfs4_open_and_get_state() currently reads:

                d_drop(dentry);
                alias = d_exact_alias(dentry, state->inode);
                if (!alias)
                        alias = d_splice_alias(igrab(state->inode), dentry);

IOW: something would have to be acting between the d_drop() and d_splice_alias() above...

Al, I’ve been distracted by personal matters in the past 6 months. What is there to guarantee exclusion of the readdirplus dentry instantiation and the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

Cheers
  Trond


* Re: NFS/d_splice_alias breakage
  2016-06-03  0:44 ` NFS/d_splice_alias breakage Trond Myklebust
@ 2016-06-03  0:54   ` Oleg Drokin
  2016-06-03  3:26     ` Al Viro
  2016-06-03  3:28   ` Al Viro
  1 sibling, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03  0:54 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Al Viro, J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 2, 2016, at 8:44 PM, Trond Myklebust wrote:

> That would have to be a really tight race, since the code in _nfs4_open_and_get_state() currently reads:
> 
> d_drop(dentry);
> alias = d_exact_alias(dentry, state->inode);
> if (!alias)
> alias = d_splice_alias(igrab(state->inode), dentry);

We live in a reality where there are no tight races left any more.
Even a one-instruction race window is now huge, thanks to commonplace
CPU-overcommitted VMs (and hyperthreading): nobody knows when your VM's
CPU is going to be preempted by the host (while the other CPUs of the
same VM are running ahead full steam).
Races that used to take ages to retrigger for closer study are instant now.

> IOW: something would have to be acting between the d_drop() and d_splice_alias() above…

The other CPU ;)
I checked the readdirplus theory and that's not it; there's some other kind
of lookup going on, and the dentry we crashed on never came through
nfs_prime_dcache.

> Al, I’ve been distracted by personal matters in the past 6 months. What is there to guarantee exclusion of the readdirplus dentry instantiation and the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

Note that this is not a new 2016-vintage bug. I hit this in 3.10 as well
(a baseline kernel from one well-known enterprise vendor, only there we hit
it in d_materialise_unique).


* Re: NFS/d_splice_alias breakage
  2016-06-03  0:54   ` Oleg Drokin
@ 2016-06-03  3:26     ` Al Viro
  2016-06-03  3:38       ` Al Viro
  0 siblings, 1 reply; 48+ messages in thread
From: Al Viro @ 2016-06-03  3:26 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Trond Myklebust, J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Thu, Jun 02, 2016 at 08:54:06PM -0400, Oleg Drokin wrote:

> > Al, I’ve been distracted by personal matters in the past 6 months. What is there to guarantee exclusion of the readdirplus dentry instantiation and the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

d_alloc_parallel() use by all parties involved.
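
Roughly, the pattern they all follow (simplified from fs/namei.c:lookup_open()
in 4.7-rc1; error handling and the retry loop elided, so treat this as a
sketch rather than the exact code):

	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
	struct dentry *dentry;

	dentry = d_alloc_parallel(dir, &nd->last, &wq);
	if (IS_ERR(dentry))
		return PTR_ERR(dentry);
	if (d_in_lookup(dentry)) {
		/* we won: the in-lookup dentry is ours to fill via
		 * ->lookup() or ->atomic_open(); anyone racing with us
		 * on the same name sleeps on wq until we call
		 * d_lookup_done(dentry) */
	} else {
		/* somebody else's lookup got there first; we slept on
		 * their wq and got back the dentry they instantiated */
	}

Either way there is exactly one in-flight lookup per name, which is what
gives readdirplus instantiation and atomic open their exclusion.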


* Re: NFS/d_splice_alias breakage
  2016-06-03  0:44 ` NFS/d_splice_alias breakage Trond Myklebust
  2016-06-03  0:54   ` Oleg Drokin
@ 2016-06-03  3:28   ` Al Viro
  1 sibling, 0 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03  3:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Oleg Drokin, J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 12:44:51AM +0000, Trond Myklebust wrote:

> That would have to be a really tight race, since the code in _nfs4_open_and_get_state() currently reads:
> 
>                 d_drop(dentry);
>                 alias = d_exact_alias(dentry, state->inode);
>                 if (!alias)
>                         alias = d_splice_alias(igrab(state->inode), dentry);
> 
> IOW: something would have to be acting between the d_drop() and d_splice_alias() above...

How?  dentry is
	* negative (it had better be, or we are _really_ fucked)
	* unhashed

How does whoever's rehashing it stumble across that thing?


* Re: NFS/d_splice_alias breakage
  2016-06-02 22:46 NFS/d_splice_alias breakage Oleg Drokin
  2016-06-02 23:59 ` [PATCH] Allow d_splice_alias to accept hashed dentries green
  2016-06-03  0:44 ` NFS/d_splice_alias breakage Trond Myklebust
@ 2016-06-03  3:37 ` Al Viro
  2016-06-03  3:43   ` Oleg Drokin
  2 siblings, 1 reply; 48+ messages in thread
From: Al Viro @ 2016-06-03  3:37 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Thu, Jun 02, 2016 at 06:46:08PM -0400, Oleg Drokin wrote:
> Hello!
> 
>    I just came across a bug (trying to run some Lustre test scripts
>    against NFS while hunting for another nfsd bug) that seems to have
>    been present since at least 2014 and lets users crash the NFS client
>    locally.

> >  * Cluster filesystems may call this function with a negative, hashed dentry.
> >  * In that case, we know that the inode will be a regular file, and also this
> >  * will only occur during atomic_open. So we need to check for the dentry
> >  * being already hashed only in the final case.

Comment is long obsolete and should've been removed.  "Cluster filesystem"
in question was GFS2 and it had been dealt with there.  Mea culpa - should've
removed the comment as soon as that was done.

> Removing the BUG_ON head-on is not going to work, since the d_rehash of old
> is now folded into __d_add, and we might not want to move that condition there.

No, it is not.  It really should not be called that way.

> It appears the problem has been present since at least 3.10, where the
> fs/nfs/dir.c code was calling d_materialise_unique(), which did require
> the dentry to be unhashed.
> 
> Not sure how this was not hit earlier. The crash looks like this (I added
> a printk to confirm that this is indeed what is going on and not some
> other weird race):

> [   64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode (null) inode ffff88010f500c70

Which of the call sites had that been and how does one reproduce that fun?
If you feel that posting a reproducer in the open is a bad idea, just send
it off-list...


* Re: NFS/d_splice_alias breakage
  2016-06-03  3:26     ` Al Viro
@ 2016-06-03  3:38       ` Al Viro
  0 siblings, 0 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03  3:38 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Trond Myklebust, J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 04:26:25AM +0100, Al Viro wrote:
> On Thu, Jun 02, 2016 at 08:54:06PM -0400, Oleg Drokin wrote:
> 
> > > Al, I’ve been distracted by personal matters in the past 6 months. What is there to guarantee exclusion of the readdirplus dentry instantiation and the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?
> 
> d_alloc_parallel() use by all parties involved.

Grr...  That should've been a reply to Trond, obviously.


* Re: NFS/d_splice_alias breakage
  2016-06-03  3:37 ` Al Viro
@ 2016-06-03  3:43   ` Oleg Drokin
  2016-06-03  4:26     ` Al Viro
  0 siblings, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03  3:43 UTC (permalink / raw)
  To: Al Viro
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 2, 2016, at 11:37 PM, Al Viro wrote:

> On Thu, Jun 02, 2016 at 06:46:08PM -0400, Oleg Drokin wrote:
>> Hello!
>> 
>>   I just came across a bug (trying to run some Lustre test scripts
>>   against NFS while hunting for another nfsd bug) that seems to have
>>   been present since at least 2014 and lets users crash the NFS client
>>   locally.
> 
>>> * Cluster filesystems may call this function with a negative, hashed dentry.
>>> * In that case, we know that the inode will be a regular file, and also this
>>> * will only occur during atomic_open. So we need to check for the dentry
>>> * being already hashed only in the final case.
> 
> Comment is long obsolete and should've been removed.  "Cluster filesystem"
> in question was GFS2 and it had been dealt with there.  Mea culpa - should've
> removed the comment as soon as that was done.

Oh, OK. I assumed it was still valid, especially considering the issue at
hand, where what it describes actually happens, and NFS is also a cluster
filesystem of sorts ;)

>> It appears the problem has been present since at least 3.10, where the
>> fs/nfs/dir.c code was calling d_materialise_unique(), which did require
>> the dentry to be unhashed.
>> 
>> Not sure how this was not hit earlier. The crash looks like this (I added
>> a printk to confirm that this is indeed what is going on and not some
>> other weird race):
> 
>> [   64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode (null) inode ffff88010f500c70
> 
> Which of the call sites had that been and how does one reproduce that fun?
> If you feel that posting a reproducer in the open is a bad idea, just send
> it off-list...

This is fs/nfs/dir.c::nfs_lookup(), right after the no_entry label.

I'll send you the scripts with instructions separately for now.


* Re: NFS/d_splice_alias breakage
  2016-06-03  3:43   ` Oleg Drokin
@ 2016-06-03  4:26     ` Al Viro
  2016-06-03  4:42       ` Al Viro
                         ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03  4:26 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Thu, Jun 02, 2016 at 11:43:59PM -0400, Oleg Drokin wrote:

> > Which of the call sites had that been and how does one reproduce that fun?
> > If you feel that posting a reproducer in the open is a bad idea, just send
> > it off-list...
> 
> This is fs/nfs/dir.c::nfs_lookup(), right after the no_entry label.

Bloody hell...  Was that sucker hashed on the entry into nfs_lookup()?
If we get that with call as a method, we are really fucked.

<greps>

Ho-hum...  One of the goto no_open; in nfs_atomic_open()?  That would
mean a stale negative dentry in dcache that has escaped revalidation...
Wait a minute, didn't nfs ->d_revalidate() skip everything in some
cases, leaving it to ->atomic_open()?
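
For reference, the shortcut in question, condensed from
fs/nfs/dir.c:nfs4_lookup_revalidate() (quoting from memory, so verify
against the tree):

	if (!(flags & LOOKUP_OPEN) || (flags & LOOKUP_DIRECTORY))
		goto no_open;		/* full nfs_lookup_revalidate() */
	...
	/* We can't create new files in nfs_open_revalidate(), so we
	 * optimize away revalidation of negative dentries */
	if (d_really_is_negative(dentry)) {
		...
		if (!nfs_neg_need_reval(dir, dentry, flags))
			ret = 1;	/* "valid" without asking the server */
		goto out;
	}

i.e. a negative dentry with an open intent can sail through ->d_revalidate()
without ever talking to the server.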

Looks like the right thing to do would be to do d_drop() at no_open:,
just before we call nfs_lookup().  If not earlier, actually...  How
about the following?

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index aaf7bd0..6e3a6f4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		err = PTR_ERR(inode);
 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 		put_nfs_open_context(ctx);
+		d_drop(dentry);
 		switch (err) {
 		case -ENOENT:
-			d_drop(dentry);
 			d_add(dentry, NULL);
 			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 			break;


* Re: NFS/d_splice_alias breakage
  2016-06-03  4:26     ` Al Viro
@ 2016-06-03  4:42       ` Al Viro
  2016-06-03  4:53         ` Al Viro
  2016-06-03  4:58       ` Oleg Drokin
  2016-06-03 16:38       ` Dcache oops Oleg Drokin
  2 siblings, 1 reply; 48+ messages in thread
From: Al Viro @ 2016-06-03  4:42 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 05:26:48AM +0100, Al Viro wrote:

> Looks like the right thing to do would be to do d_drop() at no_open:,
> just before we call nfs_lookup().  If not earlier, actually...  How
> about the following?

A bit of rationale: the dentry in question is negative, and the attempt
to open it has just failed; in the case where it's really negative, we
did that d_drop() anyway (and then unconditionally rehashed it).  In the
case where we proceed to nfs_lookup() and it does not fail, we'll have
it rehashed there (with the right inode).  What do we lose from doing
d_drop() on *any* error here?  It's negative, with dubious validity.  In
the worst case, we had both open and lookup fail, but that's just us -
the sucker really is negative, and somebody else would be able to
revalidate it.  If we drop it here (and do not rehash it later), that
somebody else will have to allocate an in-core dentry before doing a
lookup or atomic_open.  Which is a negligible extra burden.

> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
>  		err = PTR_ERR(inode);
>  		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
>  		put_nfs_open_context(ctx);
> +		d_drop(dentry);
>  		switch (err) {
>  		case -ENOENT:
> -			d_drop(dentry);
>  			d_add(dentry, NULL);
>  			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
>  			break;


* Re: NFS/d_splice_alias breakage
  2016-06-03  4:42       ` Al Viro
@ 2016-06-03  4:53         ` Al Viro
  0 siblings, 0 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03  4:53 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 05:42:53AM +0100, Al Viro wrote:
> On Fri, Jun 03, 2016 at 05:26:48AM +0100, Al Viro wrote:
> 
> > Looks like the right thing to do would be to do d_drop() at no_open:,
> > just before we call nfs_lookup().  If not earlier, actually...  How
> > about the following?
> 
> A bit of rationale: the dentry in question is negative, and the attempt
> to open it has just failed; in the case where it's really negative, we
> did that d_drop() anyway (and then unconditionally rehashed it).  In the
> case where we proceed to nfs_lookup() and it does not fail, we'll have
> it rehashed there (with the right inode).  What do we lose from doing
> d_drop() on *any* error here?  It's negative, with dubious validity.  In
> the worst case, we had both open and lookup fail, but that's just us -
> the sucker really is negative, and somebody else would be able to
> revalidate it.  If we drop it here (and do not rehash it later), that
> somebody else will have to allocate an in-core dentry before doing a
> lookup or atomic_open.  Which is a negligible extra burden.

I suspect that it got broken in commit 275bb3078 (NFSv4: Move dentry
instantiation into the NFSv4-specific atomic open code).  Prior to that
commit, d_drop() was there (error or no error).  Looks like 3.10+, indeed.
I agree that we shouldn't drop it on successful open, but it needs to be
done on errors.  All of them, not just ENOENT, as in that commit.
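
To make the "all of them" explicit, the error switch after an
->open_context() failure in nfs_atomic_open() is, from memory (so
approximate), this:

	switch (err) {
	case -ENOENT:
		d_drop(dentry);		/* the only error that unhashes */
		d_add(dentry, NULL);
		nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
		break;
	case -EISDIR:
	case -ENOTDIR:
		goto no_open;
	case -ELOOP:
		if (!(open_flags & O_NOFOLLOW))
			goto no_open;
		break;
	default:
		break;
	}
	goto out;

Each of the goto no_open cases proceeds to nfs_lookup() with the dentry
still hashed, which is exactly what trips the BUG_ON() in question.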


* Re: NFS/d_splice_alias breakage
  2016-06-03  4:26     ` Al Viro
  2016-06-03  4:42       ` Al Viro
@ 2016-06-03  4:58       ` Oleg Drokin
  2016-06-03  5:56         ` Al Viro
  2016-06-03 16:38       ` Dcache oops Oleg Drokin
  2 siblings, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03  4:58 UTC (permalink / raw)
  To: Al Viro
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 3, 2016, at 12:26 AM, Al Viro wrote:

> On Thu, Jun 02, 2016 at 11:43:59PM -0400, Oleg Drokin wrote:
> 
>>> Which of the call sites had that been and how does one reproduce that fun?
>>> If you feel that posting a reproducer in the open is a bad idea, just send
>>> it off-list...
>> 
>> This is fs/nfs/dir.c::nfs_lookup(), right after the no_entry label.
> 
> Bloody hell...  Was that sucker hashed on the entry into nfs_lookup()?
> If we get that with call as a method, we are really fucked.
> 
> <greps>
> 
> Ho-hum...  One of the goto no_open; in nfs_atomic_open()?  That would
> mean a stale negative dentry in dcache that has escaped revalidation...
> Wait a minute, didn't nfs ->d_revalidate() skip everything in some
> cases, leaving it to ->atomic_open()?
> 
> Looks like the right thing to do would be to do d_drop() at no_open:,
> just before we call nfs_lookup().  If not earlier, actually...  How
> about the following?

This one cures the insta-crash I was having, and I see no other ill-effects so far.

> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 		err = PTR_ERR(inode);
> 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> 		put_nfs_open_context(ctx);
> +		d_drop(dentry);
> 		switch (err) {
> 		case -ENOENT:
> -			d_drop(dentry);
> 			d_add(dentry, NULL);
> 			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
> 			break;


* Re: NFS/d_splice_alias breakage
  2016-06-03  4:58       ` Oleg Drokin
@ 2016-06-03  5:56         ` Al Viro
  2016-06-06 23:36           ` Oleg Drokin
  2016-06-20 13:25           ` Oleg Drokin
  0 siblings, 2 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03  5:56 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote:

> This one cures the insta-crash I was having, and I see no other ill-effects so far.

OK...  I can take it through vfs.git, but I think it'd be better off in
NFS tree.  Is everyone OK with something like the following?

make nfs_atomic_open() call d_drop() on all ->open_context() errors.

In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
unconditional d_drop() after the ->open_context() had been removed.  It had
been correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones.  Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors.  As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup().  On a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).

Cc: stable@vger.kernel.org # v3.10+
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index aaf7bd0..6e3a6f4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
 		err = PTR_ERR(inode);
 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
 		put_nfs_open_context(ctx);
+		d_drop(dentry);
 		switch (err) {
 		case -ENOENT:
-			d_drop(dentry);
 			d_add(dentry, NULL);
 			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 			break;


* Dcache oops
  2016-06-03  4:26     ` Al Viro
  2016-06-03  4:42       ` Al Viro
  2016-06-03  4:58       ` Oleg Drokin
@ 2016-06-03 16:38       ` Oleg Drokin
  2016-06-03 18:22         ` Al Viro
  2 siblings, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03 16:38 UTC (permalink / raw)
  To: Al Viro
  Cc: <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

I am dropping the NFS people, since this seems to be turning into a generic
VFS/dcache bug, even though you need NFS or the like to trigger it - the
lookup_open path.

On Jun 3, 2016, at 12:26 AM, Al Viro wrote:

> Looks like the right thing to do would be to do d_drop() at no_open:,
> just before we call nfs_lookup().  If not earlier, actually...  How
> about the following?

So with the patch below applied I get the same crash as with my earlier
patch that allowed hashed dentries to come through.
Same workload, but this time it takes much longer to trigger, so it must
be some race.

[ 2642.364383] BUG: unable to handle kernel paging request at ffff880113f82000
[ 2642.365014] IP: [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
[ 2642.365590] PGD 2dca067 PUD 2dcd067 PMD 11ff61067 PTE 8000000113f82060
[ 2642.366200] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 2642.366719] Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm nfsd drm_kms_helper ttm drm virtio_blk serio_raw
[ 2642.367852] CPU: 2 PID: 18446 Comm: cat Not tainted 4.7.0-rc1-vm-nfs+ #103
[ 2642.368372] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2642.368873] task: ffff8800d6956280 ti: ffff8800d7274000 task.ti: ffff8800d7274000
[ 2642.369810] RIP: 0010:[<ffffffff817f87d4>]  [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
[ 2642.370750] RSP: 0018:ffff8800d7277ba0  EFLAGS: 00010286
[ 2642.371239] RAX: ffff8800c3a6ff30 RBX: ffff8800c3a6ff00 RCX: ffff880113f82000
[ 2642.371765] RDX: ffff880113f82000 RSI: 0000000000000000 RDI: 0000000000000002
[ 2642.372286] RBP: ffff8800d7277be8 R08: 0000000000000000 R09: ffff8800c3a6fed0
[ 2642.372818] R10: 0000000000000059 R11: ffff8800d6956dd0 R12: ffff880111567ed0
[ 2642.373415] R13: ffff8800d7277df0 R14: ffff8800c3a6ff50 R15: 0000000084832a57
[ 2642.373940] FS:  00007fa1814a4700(0000) GS:ffff88011f480000(0000) knlGS:0000000000000000
[ 2642.374877] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2642.375378] CR2: ffff880113f82000 CR3: 00000000d605c000 CR4: 00000000000006e0
[ 2642.375881] Stack:
[ 2642.376295]  ffffffff81243a55 0000000000000096 ffff880113f82000 ffffffff00000002
[ 2642.377204]  000000000000bc04 ffff8800d7277df0 ffff880111567ed0 ffff8800d7277de0
[ 2642.378113]  ffff8800d7277df0 ffff8800d7277c18 ffffffff81243c84 ffffffff81236c4e
[ 2642.379022] Call Trace:
[ 2642.379451]  [<ffffffff81243a55>] ? __d_lookup+0x5/0x1b0
[ 2642.379920]  [<ffffffff81243c84>] d_lookup+0x84/0xb0
[ 2642.380388]  [<ffffffff81236c4e>] ? lookup_open+0xfe/0x7a0
[ 2642.380862]  [<ffffffff81236c4e>] lookup_open+0xfe/0x7a0
[ 2642.381374]  [<ffffffff81237c3f>] path_openat+0x94f/0xfc0
[ 2642.381852]  [<ffffffff8123935e>] do_filp_open+0x7e/0xe0
[ 2642.382182]  [<ffffffff81233110>] ? lock_rename+0x100/0x100
[ 2642.382747]  [<ffffffff817f4947>] ? _raw_spin_unlock+0x27/0x40
[ 2642.383324]  [<ffffffff8124877c>] ? __alloc_fd+0xbc/0x170
[ 2642.383864]  [<ffffffff81226896>] do_sys_open+0x116/0x1f0
[ 2642.384230]  [<ffffffff8122698e>] SyS_open+0x1e/0x20
[ 2642.384569]  [<ffffffff817f5136>] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 2642.398718] Code: e1 03 49 d3 e8 e9 61 ae a4 ff 48 8d 0a 48 83 e1 f8 4c 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 20 af a4 ff 48 8d 0a 48 83 e1 f8 <4c> 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 9b b3 a4 ff b9 f2 
[ 2642.400892] RIP  [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
[ 2642.401417]  RSP <ffff8800d7277ba0>
[ 2642.401856] CR2: ffff880113f82000

Hm, somehow crashdump support is broken for newish kernels on my test
box. I'll try to fix it and then re-reproduce to better understand
what's going on here; this trace is all I have for now, in case anybody
has any immediate ideas.


> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 		err = PTR_ERR(inode);
> 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> 		put_nfs_open_context(ctx);
> +		d_drop(dentry);
> 		switch (err) {
> 		case -ENOENT:
> -			d_drop(dentry);
> 			d_add(dentry, NULL);
> 			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
> 			break;


* Re: Dcache oops
  2016-06-03 16:38       ` Dcache oops Oleg Drokin
@ 2016-06-03 18:22         ` Al Viro
  2016-06-03 18:35           ` Oleg Drokin
  0 siblings, 1 reply; 48+ messages in thread
From: Al Viro @ 2016-06-03 18:22 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	Linus Torvalds

On Fri, Jun 03, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
> I am dropping the NFS people, since this seems to be turning into a generic
> VFS/dcache bug, even though you need NFS or the like to trigger it - the
> lookup_open path.

NFS bug is real; there might very well be something else, but that d_drop()
in nfs_atomic_open() needs to be restored.

> [ 2642.364383] BUG: unable to handle kernel paging request at ffff880113f82000
> [ 2642.365014] IP: [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9

*ow*
Could you dump your vmlinux (and System.map) somewhere on anonftp?
This 'bad_gs' is there simply because it's one of the few labels in
.fixup - to say anything useful we'll need to find out where we'd
really come from.

> [ 2642.369810] RIP: 0010:[<ffffffff817f87d4>]  [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
> [ 2642.370750] RSP: 0018:ffff8800d7277ba0  EFLAGS: 00010286
> [ 2642.371239] RAX: ffff8800c3a6ff30 RBX: ffff8800c3a6ff00 RCX: ffff880113f82000
> [ 2642.371765] RDX: ffff880113f82000 RSI: 0000000000000000 RDI: 0000000000000002
> [ 2642.372286] RBP: ffff8800d7277be8 R08: 0000000000000000 R09: ffff8800c3a6fed0
> [ 2642.372818] R10: 0000000000000059 R11: ffff8800d6956dd0 R12: ffff880111567ed0
> [ 2642.373415] R13: ffff8800d7277df0 R14: ffff8800c3a6ff50 R15: 0000000084832a57
> [ 2642.373940] FS:  00007fa1814a4700(0000) GS:ffff88011f480000(0000) knlGS:0000000000000000
> [ 2642.374877] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 2642.375378] CR2: ffff880113f82000 CR3: 00000000d605c000 CR4: 00000000000006e0
> [ 2642.375881] Stack:
> [ 2642.376295]  ffffffff81243a55 0000000000000096 ffff880113f82000 ffffffff00000002
> [ 2642.377204]  000000000000bc04 ffff8800d7277df0 ffff880111567ed0 ffff8800d7277de0
> [ 2642.378113]  ffff8800d7277df0 ffff8800d7277c18 ffffffff81243c84 ffffffff81236c4e
> [ 2642.379022] Call Trace:
> [ 2642.379451]  [<ffffffff81243a55>] ? __d_lookup+0x5/0x1b0
> [ 2642.379920]  [<ffffffff81243c84>] d_lookup+0x84/0xb0
> [ 2642.380388]  [<ffffffff81236c4e>] ? lookup_open+0xfe/0x7a0
> [ 2642.380862]  [<ffffffff81236c4e>] lookup_open+0xfe/0x7a0
> [ 2642.381374]  [<ffffffff81237c3f>] path_openat+0x94f/0xfc0
> [ 2642.381852]  [<ffffffff8123935e>] do_filp_open+0x7e/0xe0
> [ 2642.382182]  [<ffffffff81233110>] ? lock_rename+0x100/0x100
> [ 2642.382747]  [<ffffffff817f4947>] ? _raw_spin_unlock+0x27/0x40
> [ 2642.383324]  [<ffffffff8124877c>] ? __alloc_fd+0xbc/0x170
> [ 2642.383864]  [<ffffffff81226896>] do_sys_open+0x116/0x1f0
> [ 2642.384230]  [<ffffffff8122698e>] SyS_open+0x1e/0x20
> [ 2642.384569]  [<ffffffff817f5136>] entry_SYSCALL_64_fastpath+0x1e/0xad
> [ 2642.398718] Code: e1 03 49 d3 e8 e9 61 ae a4 ff 48 8d 0a 48 83 e1 f8 4c 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 20 af a4 ff 48 8d 0a 48 83 e1 f8 <4c> 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 9b b3 a4 ff b9 f2 
> [ 2642.400892] RIP  [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
> [ 2642.401417]  RSP <ffff8800d7277ba0>
> [ 2642.401856] CR2: ffff880113f82000
> 
> Hm, somehow crashdump support is broken for newish kernels on my test
> box. I'll try to fix it and then re-reproduce to better understand
> what's going on here; this trace is all I have for now, in case anybody
> has any immediate ideas.

PS: Oleg, fix your MUA, please - long lines in mail are bloody annoying.


* Re: Dcache oops
  2016-06-03 18:22         ` Al Viro
@ 2016-06-03 18:35           ` Oleg Drokin
  2016-06-03 20:07             ` Al Viro
  0 siblings, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03 18:35 UTC (permalink / raw)
  To: Al Viro
  Cc: <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	Linus Torvalds


On Jun 3, 2016, at 2:22 PM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 12:38:40PM -0400, Oleg Drokin wrote:
>> I am dropping NFS people since it seems to be converting into a generic VFS/dcache bug even though you need NFS or the like to trigger it - the lookup_open path.
> 
> NFS bug is real; there might very well be something else, but that d_drop()
> in nfs_atomic_open() needs to be restored.

Yes, that's what I mean - after the NFS fix, there seems to be another bug in
the dcache that I can now trigger once the NFS bug is no longer blocking the
way.

>> [ 2642.364383] BUG: unable to handle kernel paging request at ffff880113f82000
>> [ 2642.365014] IP: [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
> 
> *ow*
> Could you dump your vmlinux (and System.map) somewhere on anonftp?
> This 'bad_gs' is there simply because it's one of the few labels in
> .fixup - to say anything useful we'll need to find out where we'd
> really come from.

I see.
vmlinux with debug symbols: http://knox.linuxhacker.ru/tmp/dcache/vmlinux.gz
System.map: http://knox.linuxhacker.ru/tmp/dcache/System.map.gz

>> Hm, somehow crashdump support is broken for newish kernels on my test
>> box. I'll try to fix it and then re-reproduce to better understand
>> what's going on here; this trace is all I have for now, in case anybody
>> has any immediate ideas.
> 
> PS: Oleg, fix your MUA, please - long lines in mail are bloody annoying.

Huh, sorry about that. I kind of hoped the era of 80-column text terminals
was mostly behind us. Time to dust off mutt, I guess.


* Re: Dcache oops
  2016-06-03 18:35           ` Oleg Drokin
@ 2016-06-03 20:07             ` Al Viro
  2016-06-03 21:17               ` Oleg Drokin
  2016-06-03 21:18               ` Linus Torvalds
  0 siblings, 2 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03 20:07 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	Linus Torvalds

On Fri, Jun 03, 2016 at 02:35:41PM -0400, Oleg Drokin wrote:

> >> [ 2642.364383] BUG: unable to handle kernel paging request at ffff880113f82000
> >> [ 2642.365014] IP: [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
> > 
> > *ow*
> > Could you dump your vmlinux (and System.map) somewhere on anonftp?
> > This 'bad_gs' is there simply because it's one of the few labels in
> > .fixup - to say anything useful we'll need to find out where we'd
> > really come from.
> 
> I see.
> vmlinux with debug symbols: http://knox.linuxhacker.ru/tmp/dcache/vmlinux.gz
> System.map: http://knox.linuxhacker.ru/tmp/dcache/System.map.gz

OK...
ffffffff817f87cd:       48 8d 0a                lea    (%rdx),%rcx
ffffffff817f87d0:       48 83 e1 f8             and    $0xfffffffffffffff8,%rcx
ffffffff817f87d4:       4c 8b 01                mov    (%rcx),%r8
ffffffff817f87d7:       8d 0a                   lea    (%rdx),%ecx
ffffffff817f87d9:       83 e1 07                and    $0x7,%ecx
ffffffff817f87dc:       c1 e1 03                shl    $0x3,%ecx
ffffffff817f87df:       49 d3 e8                shr    %cl,%r8
ffffffff817f87e2:       e9 9b b3 a4 ff          jmpq   ffffffff81243b82 <__d_lookup+0x132>

Aha...  It's load_unaligned_zeropad() from dentry_string_cmp(), hitting
a genuinely unmapped address.  That sends it into fixup, where it tries to
load an aligned word containing the address in question, in hope that
fault was on attempt to cross into the next page.  No such luck, address
was aligned in the first place (it's in %rdx - 0xffff880113f82000), so
we still oops.
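
In pseudo-C, the semantics are roughly this (the real thing is inline asm
with an exception-table fixup; page_fault_on_load() is a made-up stand-in
for the trap, not a real helper):

	unsigned long load_unaligned_zeropad(const void *addr)
	{
		if (!page_fault_on_load(addr))	/* the common case */
			return *(unsigned long *)addr;
		/* fixup: load the aligned word containing addr and shift
		 * away the bytes below addr; that only helps if the fault
		 * came from crossing into the next page */
		return *(unsigned long *)((unsigned long)addr & ~7UL)
				>> (((unsigned long)addr & 7) * 8);
	}

With addr already 8-byte aligned, the fixup load hits the very same
unmapped word, and we oops for real.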

The unexpected part is that unmapped address did *NOT* come from a dentry;
it's .name of qstr we were looking for.  And your call chain was
__d_lookup() <- d_lookup() <- lookup_open(), so in lookup_open() it was
nd->last.name...

Can the same thing be reproduced (with NFS fix) on v4.6, ede4090, 7f427d3,
4e8440b?


* Re: Dcache oops
  2016-06-03 20:07             ` Al Viro
@ 2016-06-03 21:17               ` Oleg Drokin
  2016-06-03 21:46                 ` Al Viro
  2016-06-03 21:18               ` Linus Torvalds
  1 sibling, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03 21:17 UTC (permalink / raw)
  To: Al Viro
  Cc: <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	Linus Torvalds


On Jun 3, 2016, at 4:07 PM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 02:35:41PM -0400, Oleg Drokin wrote:
> 
>>>> [ 2642.364383] BUG: unable to handle kernel paging request at ffff880113f82000
>>>> [ 2642.365014] IP: [<ffffffff817f87d4>] bad_gs+0xd1d/0x1ba9
>>> 
>>> *ow*
>>> Could you dump your vmlinux (and System.map) somewhere on anonftp?
>>> This 'bad_gs' is there simply because it's one of the few labels in
>>> .fixup - to say anything useful we'll need to find out where we'd
>>> really come from.
>> 
>> I see.
>> vmlinux with debug symbols: http://knox.linuxhacker.ru/tmp/dcache/vmlinux.gz
>> System.map: http://knox.linuxhacker.ru/tmp/dcache/System.map.gz
> 
> OK...
> ffffffff817f87cd:       48 8d 0a                lea    (%rdx),%rcx
> ffffffff817f87d0:       48 83 e1 f8             and    $0xfffffffffffffff8,%rcx
> ffffffff817f87d4:       4c 8b 01                mov    (%rcx),%r8
> ffffffff817f87d7:       8d 0a                   lea    (%rdx),%ecx
> ffffffff817f87d9:       83 e1 07                and    $0x7,%ecx
> ffffffff817f87dc:       c1 e1 03                shl    $0x3,%ecx
> ffffffff817f87df:       49 d3 e8                shr    %cl,%r8
> ffffffff817f87e2:       e9 9b b3 a4 ff          jmpq   ffffffff81243b82 <__d_lookup+0x132>
> 
> Aha...  It's load_unaligned_zeropad() from dentry_string_cmp(), hitting
> a genuinely unmapped address.  That sends it into fixup, where it tries to
> load an aligned word containing the address in question, in hope that
> fault was on attempt to cross into the next page.  No such luck, address
> was aligned in the first place (it's in %rdx - 0xffff880113f82000), so
> we still oops.
> 
> The unexpected part is that unmapped address did *NOT* come from a dentry;
> it's .name of qstr we were looking for.  And your call chain was
> __d_lookup() <- d_lookup() <- lookup_open(), so in lookup_open() it was
> nd->last.name...
> 
> Can the same thing be reproduced (with NFS fix) on v4.6, ede4090, 7f427d3,
> 4e8440b?

Well, that was faster than I expected. 4e8440b triggers right away, so I guess
there's no point in trying the later ones?
BTW, just to make sure you notice: this is a DEBUG_PAGEALLOC build, so all
freed memory is unmapped, which is likely what turns this use-after-free
into a visible oops.

[   54.990119] BUG: unable to handle kernel paging request at ffff8800d2b7f000
[   54.990423] IP: [<ffffffff817f91b6>] bad_gs+0xcff/0x1b89
[   54.990598] PGD 2dca067 PUD 11f900067 PMD 11f86a067 PTE 80000000d2b7f060
[   54.990942] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[   54.991044] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr i2c_piix4 acpi_cpufreq tpm_tis tpm nfsd drm_kms_helper ttm drm serio_raw virtio_blk
[   54.992301] CPU: 7 PID: 5550 Comm: file_concat.sh Not tainted 4.6.0-4e8440b+ #1
[   54.993019] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   54.993407] task: ffff8800d6ec6200 ti: ffff880110f44000 task.ti: ffff880110f44000
[   54.994112] RIP: 0010:[<ffffffff817f91b6>]  [<ffffffff817f91b6>] bad_gs+0xcff/0x1b89
[   54.994870] RSP: 0018:ffff880110f47ba0  EFLAGS: 00010286
[   54.995288] RAX: ffff8800d2760f30 RBX: ffff8800d2760f00 RCX: ffff8800d2b7f000
[   54.995736] RDX: ffff8800d2b7f000 RSI: 0000000000000000 RDI: 0000000000000001
[   54.996154] RBP: ffff880110f47be8 R08: 0000000000000000 R09: ffff8800d2760ed0
[   54.996628] R10: 0000000000000059 R11: ffff8800d6ec6d78 R12: ffff8800d2694ed0
[   54.997044] R13: ffff880110f47df0 R14: ffff8800d2760f50 R15: 00000000b761b37e
[   54.997450] FS:  00007fe8804d2700(0000) GS:ffff88011f5c0000(0000) knlGS:0000000000000000
[   54.998164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   54.998624] CR2: ffff8800d2b7f000 CR3: 000000011931f000 CR4: 00000000000006e0
[   54.999225] Stack:
[   54.999596]  ffffffff81243965 0000000000000096 ffff8800d2b7f000 ffffffff00000001
[   55.000632]  000000000000042c ffff880110f47df0 ffff8800d2694ed0 ffff880110f47de0
[   55.001497]  ffff880110f47df0 ffff880110f47c18 ffffffff81243b94 ffffffff81234b3e
[   55.002353] Call Trace:
[   55.002709]  [<ffffffff81243965>] ? __d_lookup+0x5/0x1b0
[   55.003104]  [<ffffffff81243b94>] d_lookup+0x84/0xb0
[   55.003487]  [<ffffffff81234b3e>] ? lookup_open+0xfe/0x7a0
[   55.003878]  [<ffffffff81234b3e>] lookup_open+0xfe/0x7a0
[   55.004272]  [<ffffffff8123717e>] path_openat+0x4de/0xfc0
[   55.004665]  [<ffffffff8123925e>] do_filp_open+0x7e/0xe0
[   55.005061]  [<ffffffff812330a0>] ? lock_rename+0x100/0x100
[   55.005452]  [<ffffffff817f5367>] ? _raw_spin_unlock+0x27/0x40
[   55.005846]  [<ffffffff8124868c>] ? __alloc_fd+0xbc/0x170
[   55.006246]  [<ffffffff81226896>] do_sys_open+0x116/0x1f0
[   55.006635]  [<ffffffff8122698e>] SyS_open+0x1e/0x20
[   55.007036]  [<ffffffff817f5b36>] entry_SYSCALL_64_fastpath+0x1e/0xad
[   55.007434] Code: e1 03 49 d3 e8 e9 8f a3 a4 ff 48 8d 0a 48 83 e1 f8 4c 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 4e a4 a4 ff 48 8d 0a 48 83 e1 f8 <4c> 8b 01 8d 0a 83 e1 07 c1 e1 03 49 d3 e8 e9 c9 a8 a4 ff b9 f2 
[   55.011146] RIP  [<ffffffff817f91b6>] bad_gs+0xcff/0x1b89
[   55.011584]  RSP <ffff880110f47ba0>
[   55.011943] CR2: ffff8800d2b7f000


* Re: Dcache oops
  2016-06-03 20:07             ` Al Viro
  2016-06-03 21:17               ` Oleg Drokin
@ 2016-06-03 21:18               ` Linus Torvalds
  2016-06-03 21:26                 ` Al Viro
  1 sibling, 1 reply; 48+ messages in thread
From: Linus Torvalds @ 2016-06-03 21:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Drokin, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 3, 2016 at 1:07 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Aha...  It's load_unaligned_zeropad() from dentry_string_cmp(), hitting
> a genuinely unmapped address.  That sends it into fixup, where it tries to
> load an aligned word containing the address in question, in hope that
> fault was on attempt to cross into the next page.  No such luck, address
> was aligned in the first place (it's in %rdx - 0xffff880113f82000), so
> we still oops.

Hmm. We do end up comparing the string length racily with the name
address, so I suspect there could be a real bug in dentry_cmp: the
fact that the name pointer and length aren't necessarily atomic wrt
each other means that we could overrun the dentry name into the next
page, even though the pointer is aligned.

So maybe we need to do a careful load of the aligned dentry string
value too, not just the possibly unaligned qstr name. It's a *very*
unlikely race to hit, though. You'd have to shrink the name in the
dentry, get the old (longer name length) to match the one you look up,
and then have the memory unmapped at the end of the new (short)
length.
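
To make the torn read concrete, here is a minimal user-space sketch of the
pattern (made-up names, not dentry_cmp() itself, and deliberately buggy to
show the overrun):

#include <stdio.h>
#include <string.h>

/* A name/length pair that is not updated atomically, standing in
 * for the dentry's ->d_name. */
struct name {
	const char *str;
	unsigned int len;
};

int main(void)
{
	struct name n = { .str = "longname", .len = 8 };
	unsigned int len = n.len;	/* reader samples the length... */

	n.str = "x";	/* ...writer swaps in a shorter name, len unchanged */

	/* The reader now scans len (8) bytes from the new, shorter string,
	 * reading past its end - harmless-looking here, but an oops in the
	 * kernel if the overrun crosses into an unmapped page. */
	printf("%d\n", memcmp(n.str, "longname", len));
	return 0;
}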

However, this is not that theoretical race case, for two reasons:

 (1) this happens in __d_lookup(), which has taken the dentry lock. So
it's not the racy unlocked RCU case to begin with.

 (2) as you point out, since it is the load_unaligned_zeropad() case,
it isn't the possibly racy dentry name at all, but the qstr we're
comparing against (which may have the unaligned case, but not the
confusion about length).

> The unexpected part is that unmapped address did *NOT* come from a dentry;
> it's .name of qstr we were looking for.  And your call chain was
> __d_lookup() <- d_lookup() <- lookup_open(), so in lookup_open() it was
> nd->last.name...

That should have no such issues. It should be a normal qstr, and the
length should be reliable.

So something must have corrupted the qstr.

The remaining length *should* be in %edi, judging by the

   0xffffffff81243b82 <+306>: cmp    $0x7,%edi

in the __d_lookup() disassembly. And %rdi contains 2, so there were
supposed to be two more characters at 'ct' (which is %rdx).

Why would nd->last.name be bogus? I don't see anything.

           Linus

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 21:18               ` Linus Torvalds
@ 2016-06-03 21:26                 ` Al Viro
  2016-06-03 22:00                   ` Linus Torvalds
  0 siblings, 1 reply; 48+ messages in thread
From: Al Viro @ 2016-06-03 21:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Drokin, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 02:18:15PM -0700, Linus Torvalds wrote:

> So something must have corrupted the qstr.
> 
> The remaining length *should* be in %edi, judging by the
> 
>    0xffffffff81243b82 <+306>: cmp    $0x7,%edi
> 
> in the __d_lookup() disassembly. And %rdi contains 2, so there were
> supposed to be two more characters at 'ct' (which is %rdx).

... and since r8 and rsi are 0, we couldn't have consumed anything.
> 
> Why would nd->last.name be bogus? I don't see anything.

An interesting part is that it's page-aligned.  Which is impossible for
a short name obtained by getname(), but is quite likely for a symlink body.
So at a guess, we have a page containing a symlink body freed under us.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 21:17               ` Oleg Drokin
@ 2016-06-03 21:46                 ` Al Viro
  2016-06-03 22:17                   ` Al Viro
  0 siblings, 1 reply; 48+ messages in thread
From: Al Viro @ 2016-06-03 21:46 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	Linus Torvalds

On Fri, Jun 03, 2016 at 05:17:06PM -0400, Oleg Drokin wrote:

> > Can the same thing be reproduced (with NFS fix) on v4.6, ede4090, 7f427d3,
> > 4e8440b?
> 
> Well, that was faster than I expected. 4e8440b triggers right away, so I guess
> there's no point in trying the later ones?
> BTW, just to confirm you noticed - this is a DEBUG_PAGEALLOC build,
> so all freed memory is unmapped, which is likely what triggers this oops - a sign
> of use after free.
 
> [   54.990119] BUG: unable to handle kernel paging request at ffff8800d2b7f000

Again a page-aligned nd->last.name and even smaller nd->last.len.  It smells
like a page that used to contain a symlink body, but got freed under us.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 21:26                 ` Al Viro
@ 2016-06-03 22:00                   ` Linus Torvalds
  2016-06-03 22:23                     ` Al Viro
  0 siblings, 1 reply; 48+ messages in thread
From: Linus Torvalds @ 2016-06-03 22:00 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Drokin, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 3, 2016 at 2:26 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> in the __d_lookup() disassembly. And %rdi contains 2, so there were
>> supposed to be two more characters at 'ct' (which is %rdx).
>
> ... and since r8 and rsi are 0, we couldn't have consumed anything.

Right you are. So it really started out page-aligned.

>> Why would nd->last.name be bogus? I don't see anything.
>
> An interesting part is that it's page-aligned.  Which is impossible for
> a short name obtained by getname(), but is quite likely for a symlink body.
> So at a guess, we have a page containing a symlink body freed under us.

Hmm. Good point.

Is perhaps the "delayed_call" logic broken, and the symlink is free'd too early?

That whole set_delayed_call/do_delayed_call thing came in 4.5. Maybe
something broke that logic, and we've executed the delayed freeing
before we should have.

Normally it's done at terminate_walk() time. But I note that in
walk_component(), we do put_link(nd) which does a do_delayed_call(),
but does *not* do a clear_delayed_call(), so now I think a subsequent
terminate_walk() might drop it *again*.
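
For reference, the helpers in question look roughly like this (a sketch from
memory of include/linux/delayed_call.h - check the tree for the authoritative
version):

struct delayed_call {
	void (*fn)(void *);
	void *arg;
};

static inline void set_delayed_call(struct delayed_call *call,
				    void (*fn)(void *), void *arg)
{
	call->fn = fn;
	call->arg = arg;
}

/* Runs the deferred callback, e.g. freeing a symlink body... */
static inline void do_delayed_call(struct delayed_call *call)
{
	if (call->fn)
		call->fn(call->arg);
}

/* ...and only clear_delayed_call() makes a later invocation a no-op. */
static inline void clear_delayed_call(struct delayed_call *call)
{
	call->fn = NULL;
}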

I'm probably missing something, but I have to say that the old
explicit "put_link()" callback logic was more obvious than the new
delayed calls are.

          Linus

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 21:46                 ` Al Viro
@ 2016-06-03 22:17                   ` Al Viro
  0 siblings, 0 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03 22:17 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	Linus Torvalds

On Fri, Jun 03, 2016 at 10:46:31PM +0100, Al Viro wrote:
> On Fri, Jun 03, 2016 at 05:17:06PM -0400, Oleg Drokin wrote:
> 
> > > Can the same thing be reproduced (with NFS fix) on v4.6, ede4090, 7f427d3,
> > > 4e8440b?
> > 
> > Well, that was faster than I expected. 4e8440b triggers right away, so I guess
> > there's no point in trying the later ones?
> > BTW, just to confirm you noticed - this is a DEBUG_PAGEALLOC build,
> > so all freed memory is unmapped, which is likely what triggers this oops - a sign
> > of use after free.
>  
> > [   54.990119] BUG: unable to handle kernel paging request at ffff8800d2b7f000
> 
> Again a page-aligned nd->last.name and even smaller nd->last.len.  It smells
> like a page that used to contain a symlink body, but got freed under us.

OK, I think I understand what's going on there.  We have a pathname that ends
with a trailing symlink.  We traverse that symlink up to the last component,
and get EOPENSTALE on the attempt to open that.  At that point we proceed to
retry_lookup: and call lookup_open().  But we'd *already* done put_link()
on the first pass, so now nd->last.name points into a freed page.
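
The shape of the bug, as a minimal user-space sketch (walk_state, link_body
and last are illustrative stand-ins for nameidata, the symlink body page and
nd->last.name - not the kernel's identifiers):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct walk_state {
	char *link_body;	/* plays the symlink body page */
	const char *last;	/* plays nd->last.name, points into it */
};

/* Counterpart of put_link(): releases the symlink body. */
static void put_link(struct walk_state *st)
{
	free(st->link_body);	/* st->last now dangles */
	st->link_body = NULL;
}

int main(void)
{
	struct walk_state st;

	st.link_body = strdup("dir/target");
	st.last = st.link_body + 4;	/* the last component, "target" */
	put_link(&st);			/* already done on the first pass */
	/* the retry_lookup path then effectively does: */
	printf("%s\n", st.last);	/* use after free */
	return 0;
}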

Damn...  I'm very tempted to rip the retry_lookup logics out of there and
just let the damn thing repeat the whole pathname resolution ;-/  do_last()
will become so much saner after that...

Let's at least verify that this is what's going on - remove
                if (error == -EOPENSTALE)
                        goto stale_open;
from do_last() and see if that fixes the damn thing.  Alternative solution
would be to turn that
        if (nd->depth)
                put_link(nd);
        error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW,
                                   inode, seq);
        if (unlikely(error))
                return error;
in do_last() into
        error = should_follow_link(nd, &path, nd->flags & LOOKUP_FOLLOW,
                                   inode, seq);
	if (unlikely(error)) {
		if (nd->depth == 2) {
			struct saved *last = &nd->stack[0];
			do_delayed_call(&last->done);
			if (!(nd->flags & LOOKUP_RCU))
				path_put(&last->link);
			nd->stack[0] = nd->stack[1];
			nd->depth--;
		}
		return error;
	}	
but I would really prefer the first approach - it allows us to remove arseloads
of convoluted crap from do_last().

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 22:00                   ` Linus Torvalds
@ 2016-06-03 22:23                     ` Al Viro
  2016-06-03 22:29                       ` Al Viro
                                         ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03 22:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Drokin, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 03:00:02PM -0700, Linus Torvalds wrote:

> Is perhaps the "delayed_call" logic broken, and the symlink is free'd too early?
> 
> That whole set_delayed_call/do_delayed_call thing came in 4.5. Maybe
> something broke that logic, and we've executed the delayed freeing
> before we should have.
> 
> Normally it's done at terminate_walk() time. But I note that in
> walk_component(), we do put_link(nd) which does a do_delayed_call(),
> but does *not* do a clear_delayed_call(), so now I think a subsequent
> terminate_walk() might drop it *again*.

Nope - put_link() also decrements nd->depth.  No double calls there...

> I'm probably missing something, but I have to say that the old
> explicit "put_link()" callback logic was more obvious than the new
> delayed calls are.

It's not that.  It's explicit put_link() in do_last(), followed by
ESTALEOPEN and subsequent misbegotten "retry the last step on ESTALEOPEN"
looking at now-freed nd->last.name.  IOW, the bug predates delayed_call
stuff.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 22:23                     ` Al Viro
@ 2016-06-03 22:29                       ` Al Viro
  2016-06-03 22:36                       ` Linus Torvalds
  2016-06-03 22:37                       ` Al Viro
  2 siblings, 0 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03 22:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Drokin, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 11:23:55PM +0100, Al Viro wrote:

> It's not that.  It's explicit put_link() in do_last(), followed by
> ESTALEOPEN and subsequent misbegotten "retry the last step on ESTALEOPEN"
> looking at now-freed nd->last.name.  IOW, the bug predates delayed_call
> stuff.

FWIW, I'd stepped into that in d63ff28f "namei: make should_follow_link() store
the link in nd->link", so it's 4.1+ mess.  delayed_call stuff is 4.4+...

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 22:23                     ` Al Viro
  2016-06-03 22:29                       ` Al Viro
@ 2016-06-03 22:36                       ` Linus Torvalds
  2016-06-03 22:42                         ` Oleg Drokin
  2016-06-03 22:43                         ` Al Viro
  2016-06-03 22:37                       ` Al Viro
  2 siblings, 2 replies; 48+ messages in thread
From: Linus Torvalds @ 2016-06-03 22:36 UTC (permalink / raw)
  To: Al Viro
  Cc: Oleg Drokin, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 3, 2016 at 3:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> On Fri, Jun 03, 2016 at 03:00:02PM -0700, Linus Torvalds wrote:
>>>
>> Normally it's done at terminate_walk() time. But I note that in
>> walk_component(), we do put_link(nd) which does a do_delayed_call(),
>> but does *not* do a clear_delayed_call(), so now I think a subsequent
>> terminate_walk() might drop it *again*.
>
> Nope - put_link() also decrements nd->depth.  No double calls there...

Yeah, I figured that out, and then continued to try to look at other cases...

Happy to hear that you seem to have figured it out.

But why did it apparently only start happening now?

             Linus

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 22:23                     ` Al Viro
  2016-06-03 22:29                       ` Al Viro
  2016-06-03 22:36                       ` Linus Torvalds
@ 2016-06-03 22:37                       ` Al Viro
  2016-06-03 22:49                         ` Oleg Drokin
  2016-06-03 23:58                         ` Oleg Drokin
  2 siblings, 2 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03 22:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Drokin, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 11:23:55PM +0100, Al Viro wrote:

> It's not that.  It's explicit put_link() in do_last(), followed by
> ESTALEOPEN and subsequent misbegotten "retry the last step on ESTALEOPEN"
> looking at now-freed nd->last.name.  IOW, the bug predates delayed_call
> stuff.

EOPENSTALE, that is...  Oleg, could you check if the following works?

diff --git a/fs/namei.c b/fs/namei.c
index 4c4f95a..3d9511e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3166,9 +3166,7 @@ static int do_last(struct nameidata *nd,
 	int acc_mode = op->acc_mode;
 	unsigned seq;
 	struct inode *inode;
-	struct path save_parent = { .dentry = NULL, .mnt = NULL };
 	struct path path;
-	bool retried = false;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
@@ -3211,7 +3209,6 @@ static int do_last(struct nameidata *nd,
 			return -EISDIR;
 	}
 
-retry_lookup:
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
 		error = mnt_want_write(nd->path.mnt);
 		if (!error)
@@ -3292,23 +3289,14 @@ finish_lookup:
 	if (unlikely(error))
 		return error;
 
-	if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) {
-		path_to_nameidata(&path, nd);
-	} else {
-		save_parent.dentry = nd->path.dentry;
-		save_parent.mnt = mntget(path.mnt);
-		nd->path.dentry = path.dentry;
-
-	}
+	path_to_nameidata(&path, nd);
 	nd->inode = inode;
 	nd->seq = seq;
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 finish_open:
 	error = complete_walk(nd);
-	if (error) {
-		path_put(&save_parent);
+	if (error)
 		return error;
-	}
 	audit_inode(nd->name, nd->path.dentry, 0);
 	error = -EISDIR;
 	if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
@@ -3331,13 +3319,9 @@ finish_open_created:
 		goto out;
 	BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */
 	error = vfs_open(&nd->path, file, current_cred());
-	if (!error) {
-		*opened |= FILE_OPENED;
-	} else {
-		if (error == -EOPENSTALE)
-			goto stale_open;
+	if (error)
 		goto out;
-	}
+	*opened |= FILE_OPENED;
 opened:
 	error = open_check_o_direct(file);
 	if (!error)
@@ -3353,26 +3337,7 @@ out:
 	}
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
-	path_put(&save_parent);
 	return error;
-
-stale_open:
-	/* If no saved parent or already retried then can't retry */
-	if (!save_parent.dentry || retried)
-		goto out;
-
-	BUG_ON(save_parent.dentry != dir);
-	path_put(&nd->path);
-	nd->path = save_parent;
-	nd->inode = dir->d_inode;
-	save_parent.mnt = NULL;
-	save_parent.dentry = NULL;
-	if (got_write) {
-		mnt_drop_write(nd->path.mnt);
-		got_write = false;
-	}
-	retried = true;
-	goto retry_lookup;
 }
 
 static int do_tmpfile(struct nameidata *nd, unsigned flags,

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 22:36                       ` Linus Torvalds
@ 2016-06-03 22:42                         ` Oleg Drokin
  2016-06-03 22:43                         ` Al Viro
  1 sibling, 0 replies; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03 22:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 3, 2016, at 6:36 PM, Linus Torvalds wrote:

> On Fri, Jun 3, 2016 at 3:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> On Fri, Jun 03, 2016 at 03:00:02PM -0700, Linus Torvalds wrote:
>>>> 
>>> Normally it's done at terminate_walk() time. But I note that in
>>> walk_component(), we do put_link(nd) which does a do_delayed_call(),
>>> but does *not* do a clear_delayed_call(), so now I think a subsequent
>>> terminate_walk() might drop it *again*.
>> 
>> Nope - put_link() also decrements nd->depth.  No double calls there...
> 
> Yeah, I figured that out, and then continued to try to look at other cases..
> 
> Happy to hear that you seem to have figured it out.
> 
> But why did it apparently only start happening now?

Apparently nobody runs NFS in a serious manner these days.
EOPENSTALE is only used by NFS.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 22:36                       ` Linus Torvalds
  2016-06-03 22:42                         ` Oleg Drokin
@ 2016-06-03 22:43                         ` Al Viro
  1 sibling, 0 replies; 48+ messages in thread
From: Al Viro @ 2016-06-03 22:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Oleg Drokin, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Fri, Jun 03, 2016 at 03:36:22PM -0700, Linus Torvalds wrote:

> Happy to hear that you seem to have figured it out.
> 
> But why did it apparently only start happening now?

Oleg has started to use Lustre torture tests on NFS, that's all.  Note, BTW,
that first they'd triggered an oopsable bug (fairly easy to reproduce) in
nfs_atomic_open() that had been there for 3 years ;-/

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 22:37                       ` Al Viro
@ 2016-06-03 22:49                         ` Oleg Drokin
  2016-06-03 23:58                         ` Oleg Drokin
  1 sibling, 0 replies; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03 22:49 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 3, 2016, at 6:37 PM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 11:23:55PM +0100, Al Viro wrote:
> 
>> It's not that.  It's explicit put_link() in do_last(), followed by
>> ESTALEOPEN and subsequent misbegotten "retry the last step on ESTALEOPEN"
>> looking at now-freed nd->last.name.  IOW, the bug predates delayed_call
>> stuff.
> 
> EOPENSTALE, that is...  Oleg, could you check if the following works?

Ok, I'll try this one too.
The other one with goto stale_open also appeared to work.

We'll see what else is going to show up next...

> diff --git a/fs/namei.c b/fs/namei.c
> index 4c4f95a..3d9511e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3166,9 +3166,7 @@ static int do_last(struct nameidata *nd,
> 	int acc_mode = op->acc_mode;
> 	unsigned seq;
> 	struct inode *inode;
> -	struct path save_parent = { .dentry = NULL, .mnt = NULL };
> 	struct path path;
> -	bool retried = false;
> 	int error;
> 
> 	nd->flags &= ~LOOKUP_PARENT;
> @@ -3211,7 +3209,6 @@ static int do_last(struct nameidata *nd,
> 			return -EISDIR;
> 	}
> 
> -retry_lookup:
> 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
> 		error = mnt_want_write(nd->path.mnt);
> 		if (!error)
> @@ -3292,23 +3289,14 @@ finish_lookup:
> 	if (unlikely(error))
> 		return error;
> 
> -	if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) {
> -		path_to_nameidata(&path, nd);
> -	} else {
> -		save_parent.dentry = nd->path.dentry;
> -		save_parent.mnt = mntget(path.mnt);
> -		nd->path.dentry = path.dentry;
> -
> -	}
> +	path_to_nameidata(&path, nd);
> 	nd->inode = inode;
> 	nd->seq = seq;
> 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
> finish_open:
> 	error = complete_walk(nd);
> -	if (error) {
> -		path_put(&save_parent);
> +	if (error)
> 		return error;
> -	}
> 	audit_inode(nd->name, nd->path.dentry, 0);
> 	error = -EISDIR;
> 	if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
> @@ -3331,13 +3319,9 @@ finish_open_created:
> 		goto out;
> 	BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */
> 	error = vfs_open(&nd->path, file, current_cred());
> -	if (!error) {
> -		*opened |= FILE_OPENED;
> -	} else {
> -		if (error == -EOPENSTALE)
> -			goto stale_open;
> +	if (error)
> 		goto out;
> -	}
> +	*opened |= FILE_OPENED;
> opened:
> 	error = open_check_o_direct(file);
> 	if (!error)
> @@ -3353,26 +3337,7 @@ out:
> 	}
> 	if (got_write)
> 		mnt_drop_write(nd->path.mnt);
> -	path_put(&save_parent);
> 	return error;
> -
> -stale_open:
> -	/* If no saved parent or already retried then can't retry */
> -	if (!save_parent.dentry || retried)
> -		goto out;
> -
> -	BUG_ON(save_parent.dentry != dir);
> -	path_put(&nd->path);
> -	nd->path = save_parent;
> -	nd->inode = dir->d_inode;
> -	save_parent.mnt = NULL;
> -	save_parent.dentry = NULL;
> -	if (got_write) {
> -		mnt_drop_write(nd->path.mnt);
> -		got_write = false;
> -	}
> -	retried = true;
> -	goto retry_lookup;
> }
> 
> static int do_tmpfile(struct nameidata *nd, unsigned flags,

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 22:37                       ` Al Viro
  2016-06-03 22:49                         ` Oleg Drokin
@ 2016-06-03 23:58                         ` Oleg Drokin
  2016-06-04  0:56                           ` Al Viro
  1 sibling, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-03 23:58 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 3, 2016, at 6:37 PM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 11:23:55PM +0100, Al Viro wrote:
> 
>> It's not that.  It's explicit put_link() in do_last(), followed by
>> ESTALEOPEN and subsequent misbegotten "retry the last step on ESTALEOPEN"
>> looking at now-freed nd->last.name.  IOW, the bug predates delayed_call
>> stuff.
> 
> EOPENSTALE, that is...  Oleg, could you check if the following works?

Yes, this one lasted for an hour with no crashing, so it must be good.
Thanks.
(note, I am not equipped to verify correctness of NFS operations, though).

> diff --git a/fs/namei.c b/fs/namei.c
> index 4c4f95a..3d9511e 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -3166,9 +3166,7 @@ static int do_last(struct nameidata *nd,
> 	int acc_mode = op->acc_mode;
> 	unsigned seq;
> 	struct inode *inode;
> -	struct path save_parent = { .dentry = NULL, .mnt = NULL };
> 	struct path path;
> -	bool retried = false;
> 	int error;
> 
> 	nd->flags &= ~LOOKUP_PARENT;
> @@ -3211,7 +3209,6 @@ static int do_last(struct nameidata *nd,
> 			return -EISDIR;
> 	}
> 
> -retry_lookup:
> 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
> 		error = mnt_want_write(nd->path.mnt);
> 		if (!error)
> @@ -3292,23 +3289,14 @@ finish_lookup:
> 	if (unlikely(error))
> 		return error;
> 
> -	if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) {
> -		path_to_nameidata(&path, nd);
> -	} else {
> -		save_parent.dentry = nd->path.dentry;
> -		save_parent.mnt = mntget(path.mnt);
> -		nd->path.dentry = path.dentry;
> -
> -	}
> +	path_to_nameidata(&path, nd);
> 	nd->inode = inode;
> 	nd->seq = seq;
> 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
> finish_open:
> 	error = complete_walk(nd);
> -	if (error) {
> -		path_put(&save_parent);
> +	if (error)
> 		return error;
> -	}
> 	audit_inode(nd->name, nd->path.dentry, 0);
> 	error = -EISDIR;
> 	if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
> @@ -3331,13 +3319,9 @@ finish_open_created:
> 		goto out;
> 	BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */
> 	error = vfs_open(&nd->path, file, current_cred());
> -	if (!error) {
> -		*opened |= FILE_OPENED;
> -	} else {
> -		if (error == -EOPENSTALE)
> -			goto stale_open;
> +	if (error)
> 		goto out;
> -	}
> +	*opened |= FILE_OPENED;
> opened:
> 	error = open_check_o_direct(file);
> 	if (!error)
> @@ -3353,26 +3337,7 @@ out:
> 	}
> 	if (got_write)
> 		mnt_drop_write(nd->path.mnt);
> -	path_put(&save_parent);
> 	return error;
> -
> -stale_open:
> -	/* If no saved parent or already retried then can't retry */
> -	if (!save_parent.dentry || retried)
> -		goto out;
> -
> -	BUG_ON(save_parent.dentry != dir);
> -	path_put(&nd->path);
> -	nd->path = save_parent;
> -	nd->inode = dir->d_inode;
> -	save_parent.mnt = NULL;
> -	save_parent.dentry = NULL;
> -	if (got_write) {
> -		mnt_drop_write(nd->path.mnt);
> -		got_write = false;
> -	}
> -	retried = true;
> -	goto retry_lookup;
> }
> 
> static int do_tmpfile(struct nameidata *nd, unsigned flags,

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-03 23:58                         ` Oleg Drokin
@ 2016-06-04  0:56                           ` Al Viro
  2016-06-04 12:25                             ` Jeff Layton
  2016-06-04 16:12                             ` Oleg Drokin
  0 siblings, 2 replies; 48+ messages in thread
From: Al Viro @ 2016-06-04  0:56 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Linus Torvalds,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	Oleg Drokin, linux-nfs

On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote:

> > EOPENSTALE, that is...  Oleg, could you check if the following works?
> 
> Yes, this one lasted for an hour with no crashing, so it must be good.
> Thanks.
> (note, I am not equipped to verify correctness of NFS operations, though).

I suspect that Jeff Layton might have relevant regression tests.  Incidentally,
we really need a consolidated regression testsuite, including the tests you'd
been running.  Right now there's some stuff in xfstests, LTP and cthon; if
anything, this mess shows just why we need all of that and then some in
a single place.  Lustre stuff has caught a 3 years old NFS bug (missing
d_drop() in nfs_atomic_open()) and a year-old bug in handling of EOPENSTALE
retries on the last component of a trailing non-embedded symlink.  Neither
is hard to trigger; it's just that relevant tests hadn't been run on NFS,
period.

Jeff, could you verify that the following does not cause regressions in
stale fhandles treatment?  I want to rip the damn retry logics out of
do_last() and if the staleness had only been discovered inside of
nfs4_file_open() just have the upper-level logics handle it by doing
a normal LOOKUP_REVAL pass from scratch.  To hell with trying to be clever;
a few roundtrips it saves us in some cases is not worth the complexity and
potential for bugs.  I'm fairly sure that the time spent debugging this
particular turd exceeds the total amount of time it has ever saved,
and do_last() is in dire need of simplification.  All talk about "enough eyes"
isn't worth much when the readers of code in question feel like ripping their
eyes out...

diff --git a/fs/namei.c b/fs/namei.c
index 4c4f95a..3d9511e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3166,9 +3166,7 @@ static int do_last(struct nameidata *nd,
 	int acc_mode = op->acc_mode;
 	unsigned seq;
 	struct inode *inode;
-	struct path save_parent = { .dentry = NULL, .mnt = NULL };
 	struct path path;
-	bool retried = false;
 	int error;
 
 	nd->flags &= ~LOOKUP_PARENT;
@@ -3211,7 +3209,6 @@ static int do_last(struct nameidata *nd,
 			return -EISDIR;
 	}
 
-retry_lookup:
 	if (open_flag & (O_CREAT | O_TRUNC | O_WRONLY | O_RDWR)) {
 		error = mnt_want_write(nd->path.mnt);
 		if (!error)
@@ -3292,23 +3289,14 @@ finish_lookup:
 	if (unlikely(error))
 		return error;
 
-	if ((nd->flags & LOOKUP_RCU) || nd->path.mnt != path.mnt) {
-		path_to_nameidata(&path, nd);
-	} else {
-		save_parent.dentry = nd->path.dentry;
-		save_parent.mnt = mntget(path.mnt);
-		nd->path.dentry = path.dentry;
-
-	}
+	path_to_nameidata(&path, nd);
 	nd->inode = inode;
 	nd->seq = seq;
 	/* Why this, you ask?  _Now_ we might have grown LOOKUP_JUMPED... */
 finish_open:
 	error = complete_walk(nd);
-	if (error) {
-		path_put(&save_parent);
+	if (error)
 		return error;
-	}
 	audit_inode(nd->name, nd->path.dentry, 0);
 	error = -EISDIR;
 	if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
@@ -3331,13 +3319,9 @@ finish_open_created:
 		goto out;
 	BUG_ON(*opened & FILE_OPENED); /* once it's opened, it's opened */
 	error = vfs_open(&nd->path, file, current_cred());
-	if (!error) {
-		*opened |= FILE_OPENED;
-	} else {
-		if (error == -EOPENSTALE)
-			goto stale_open;
+	if (error)
 		goto out;
-	}
+	*opened |= FILE_OPENED;
 opened:
 	error = open_check_o_direct(file);
 	if (!error)
@@ -3353,26 +3337,7 @@ out:
 	}
 	if (got_write)
 		mnt_drop_write(nd->path.mnt);
-	path_put(&save_parent);
 	return error;
-
-stale_open:
-	/* If no saved parent or already retried then can't retry */
-	if (!save_parent.dentry || retried)
-		goto out;
-
-	BUG_ON(save_parent.dentry != dir);
-	path_put(&nd->path);
-	nd->path = save_parent;
-	nd->inode = dir->d_inode;
-	save_parent.mnt = NULL;
-	save_parent.dentry = NULL;
-	if (got_write) {
-		mnt_drop_write(nd->path.mnt);
-		got_write = false;
-	}
-	retried = true;
-	goto retry_lookup;
 }
 
 static int do_tmpfile(struct nameidata *nd, unsigned flags,

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-04  0:56                           ` Al Viro
@ 2016-06-04 12:25                             ` Jeff Layton
  2016-06-04 16:12                             ` Oleg Drokin
  1 sibling, 0 replies; 48+ messages in thread
From: Jeff Layton @ 2016-06-04 12:25 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	Oleg Drokin, linux-nfs

On Sat, 2016-06-04 at 01:56 +0100, Al Viro wrote:
> On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote:
> 
> > 
> > > 
> > > EOPENSTALE, that is...  Oleg, could you check if the following works?
> > Yes, this one lasted for an hour with no crashing, so it must be good.
> > Thanks.
> > (note, I am not equipped to verify correctness of NFS operations, though).
> I suspect that Jeff Layton might have relevant regression tests.  Incidentally,
> we really need a consolidated regression testsuite, including the tests you'd
> been running.  Right now there's some stuff in xfstests, LTP and cthon; if
> anything, this mess shows just why we need all of that and then some in
> a single place.  Lustre stuff has caught a 3 years old NFS bug (missing
> d_drop() in nfs_atomic_open()) and a year-old bug in handling of EOPENSTALE
> retries on the last component of a trailing non-embedded symlink.  Neither
> is hard to trigger; it's just that relevant tests hadn't been run on NFS,
> period.
> 
> Jeff, could you verify that the following does not cause regressions in
> stale fhandles treatment?  I want to rip the damn retry logics out of
> do_last() and if the staleness had only been discovered inside of
> nfs4_file_open() just have the upper-level logics handle it by doing
> a normal LOOKUP_REVAL pass from scratch.  To hell with trying to be clever;
> a few roundtrips it saves us in some cases is not worth the complexity and
> potential for bugs.  I'm fairly sure that the time spent debugging this
> particular turd exceeds the total amount of time it has ever saved,
> and do_last() is in dire need of simplification.  All talk about "enough eyes"
> isn't worth much when the readers of code in question feel like ripping their
> eyes out...
> 

Agreed. I see no need to optimize an error case here. Any performance
hit that we'd get here is almost certainly acceptable in this
situation. The main thing is that we prevent the ESTALE from bubbling
up into userland if we can avoid it by retrying.

No, I don't have the test for this anymore, unfortunately. RHQA might
have one, though.

Either way, I cooked one up that does this on the server:

#!/bin/bash

while true; do
	rm -rf foo
	mkdir foo
	echo foo > foo/bar
	usleep 100000
done

...and then this on the client after mounting the fs with
lookupcache=none and noac.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	int fd;

	while(1) {
		fd = open(argv[1], O_RDONLY);
		if (fd < 0) {
			if (errno == ESTALE) {
				printf("ESTALE");
				return 1;
			}
			continue;
		}
		close(fd);
	}
	return 0;
}
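
(To rerun this: compile with, say, "gcc -O2 -o stale_open stale_open.c" and
point it at the file the server script keeps recreating, e.g.
"./stale_open /mnt/nfs/foo/bar" - the binary name and mount point here are
made up for illustration.)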

I did see some of the OPEN compounds come back with NFS4ERR_STALE on
the PUTFH op but no corresponding ESTALE error in userland. So, this
patch does seem to do the right thing.

Reviewed-and-Tested-by: Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Dcache oops
  2016-06-04  0:56                           ` Al Viro
  2016-06-04 12:25                             ` Jeff Layton
@ 2016-06-04 16:12                             ` Oleg Drokin
  2016-06-04 16:21                               ` [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim green
  1 sibling, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-04 16:12 UTC (permalink / raw)
  To: Al Viro
  Cc: Jeff Layton, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>,
	linux-nfs


On Jun 3, 2016, at 8:56 PM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 07:58:37PM -0400, Oleg Drokin wrote:
> 
>>> EOPENSTALE, that is...  Oleg, could you check if the following works?
>> 
>> Yes, this one lasted for an hour with no crashing, so it must be good.
>> Thanks.
>> (note, I am not equipped to verify correctness of NFS operations, though).
> 
> I suspect that Jeff Layton might have relevant regression tests.  Incidentally,
> we really need a consolidated regression testsuite, including the tests you'd
> been running.  Right now there's some stuff in xfstests, LTP and cthon; if
> anything, this mess shows just why we need all of that and then some in
> a single place.  Lustre stuff has caught a 3 years old NFS bug (missing
> d_drop() in nfs_atomic_open()) and a year-old bug in handling of EOPENSTALE
> retries on the last component of a trailing non-embedded symlink.  Neither
> is hard to trigger; it's just that relevant tests hadn't been run on NFS,
> period.

BTW, the nets have also brought in another use after free in the nfs4 state
tracking code (this is the one I was trying to hunt down from the start).
I'll submit a patch shortly.
And there's also a mysterious ext4 data corruption that I do not fully
understand yet and have only hit once so far.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim.
  2016-06-04 16:12                             ` Oleg Drokin
@ 2016-06-04 16:21                               ` green
  2016-06-04 19:57                                 ` Jeff Layton
  0 siblings, 1 reply; 48+ messages in thread
From: green @ 2016-06-04 16:21 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker
  Cc: Olga Kornievskaia, linux-nfs, linux-kernel, Oleg Drokin

From: Oleg Drokin <green@linuxhacker.ru>

Commit e8d975e73e5f ("fixing infinite OPEN loop in 4.0 stateid recovery")
introduced access to state after it was just potentially freed by
nfs4_put_open_state leading to a random data corruption somewhere.

BUG: unable to handle kernel paging request at ffff88004941ee40
IP: [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm
CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000
RIP: 0010:[<ffffffff813baf01>]  [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
RSP: 0018:ffff88003ff4bd68  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8
RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88
RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98
R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00
FS:  0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0
Stack:
 ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68
 ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48
 ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40
Call Trace:
 [<ffffffff813baad5>] ? nfs4_do_reclaim+0x35/0x740
 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0
 [<ffffffff813bb7cd>] nfs4_run_state_manager+0x5ed/0xa40
 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740
 [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740
 [<ffffffff810af0d1>] kthread+0x101/0x120
 [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0
 [<ffffffff818843af>] ret_from_fork+0x1f/0x40
 [<ffffffff810aefd0>] ? kthread_create_on_node+0x250/0x250
Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff <f0> 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6
RIP  [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
 RSP <ffff88003ff4bd68>
CR2: ffff88004941ee40

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
---
 fs/nfs/nfs4state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 9679f47..834b875 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1488,9 +1488,9 @@ restart:
 					}
 					spin_unlock(&state->state_lock);
 				}
-				nfs4_put_open_state(state);
 				clear_bit(NFS_STATE_RECLAIM_NOGRACE,
 					&state->flags);
+				nfs4_put_open_state(state);
 				spin_lock(&sp->so_lock);
 				goto restart;
 			}
-- 
2.5.5
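
The shape of the bug the diff fixes, as a minimal user-space sketch
(illustrative names, not the NFS code): drop what may be the last reference,
then touch the object.

#include <stdlib.h>

struct state {
	int refcount;
	unsigned long flags;
};

static void put_state(struct state *s)
{
	if (--s->refcount == 0)
		free(s);	/* may be the final reference */
}

int main(void)
{
	struct state *s = calloc(1, sizeof(*s));

	if (!s)
		return 1;
	s->refcount = 1;
	s->flags = 1;
	put_state(s);		/* s may be freed here... */
	s->flags &= ~1UL;	/* ...so this clear_bit-style access is a
				   use after free; clearing the flag before
				   the final put, as the patch does, fixes it */
	return 0;
}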

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim.
  2016-06-04 16:21                               ` [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim green
@ 2016-06-04 19:57                                 ` Jeff Layton
  0 siblings, 0 replies; 48+ messages in thread
From: Jeff Layton @ 2016-06-04 19:57 UTC (permalink / raw)
  To: green, Trond Myklebust, Anna Schumaker
  Cc: Olga Kornievskaia, linux-nfs, linux-kernel

On Sat, 2016-06-04 at 12:21 -0400, green@linuxhacker.ru wrote:
> From: Oleg Drokin <green@linuxhacker.ru>
> 
> Commit e8d975e73e5f ("fixing infinite OPEN loop in 4.0 stateid recovery")
> introduced access to state after it was just potentially freed by
> nfs4_put_open_state leading to a random data corruption somewhere.
> 
> BUG: unable to handle kernel paging request at ffff88004941ee40
> IP: [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
> PGD 3501067 PUD 3504067 PMD 6ff37067 PTE 800000004941e060
> Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
> Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq tpm_tis joydev i2c_piix4 pcspkr tpm virtio_console nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops floppy serio_raw virtio_blk drm
> CPU: 6 PID: 2161 Comm: 192.168.10.253- Not tainted 4.7.0-rc1-vm-nfs+ #112
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> task: ffff8800463dcd00 ti: ffff88003ff48000 task.ti: ffff88003ff48000
> RIP: 0010:[<ffffffff813baf01>]  [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
> RSP: 0018:ffff88003ff4bd68  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffffffff81a49900 RCX: 00000000000000e8
> RDX: 00000000000000e8 RSI: ffff8800418b9930 RDI: ffff880040c96c88
> RBP: ffff88003ff4bdf8 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffff880040c96c98
> R13: ffff88004941ee20 R14: ffff88004941ee40 R15: ffff88004941ee00
> FS:  0000000000000000(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffff88004941ee40 CR3: 0000000060b0b000 CR4: 00000000000006e0
> Stack:
>  ffffffff813baad5 ffff8800463dcd00 ffff880000000001 ffffffff810e6b68
>  ffff880043ddbc88 ffff8800418b9800 ffff8800418b98c8 ffff88004941ee48
>  ffff880040c96c90 ffff880040c96c00 ffff880040c96c20 ffff880040c96c40
> Call Trace:
>  [<ffffffff813baad5>] ? nfs4_do_reclaim+0x35/0x740
>  [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0
>  [<ffffffff813bb7cd>] nfs4_run_state_manager+0x5ed/0xa40
>  [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740
>  [<ffffffff813bb1e0>] ? nfs4_do_reclaim+0x740/0x740
>  [<ffffffff810af0d1>] kthread+0x101/0x120
>  [<ffffffff810e6b68>] ? trace_hardirqs_on_caller+0x128/0x1b0
>  [<ffffffff818843af>] ret_from_fork+0x1f/0x40
>  [<ffffffff810aefd0>] ? kthread_create_on_node+0x250/0x250
> Code: 65 80 4c 8b b5 78 ff ff ff e8 fc 88 4c 00 48 8b 7d 88 e8 13 67 d2 ff 49 8b 47 40 a8 02 0f 84 d3 01 00 00 4c 89 ff e8 7f f9 ff ff <f0> 41 80 26 7f 48 8b 7d c8 e8 b1 84 4c 00 e9 39 fd ff ff 3d e6
> RIP  [<ffffffff813baf01>] nfs4_do_reclaim+0x461/0x740
>  RSP <ffff88003ff4bd68>
> CR2: ffff88004941ee40
> 
> Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
> ---
>  fs/nfs/nfs4state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 9679f47..834b875 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -1488,9 +1488,9 @@ restart:
>  					}
>  					spin_unlock(&state->state_lock);
>  				}
> -				nfs4_put_open_state(state);
>  				clear_bit(NFS_STATE_RECLAIM_NOGRACE,
>  					&state->flags);
> +				nfs4_put_open_state(state);
>  				spin_lock(&sp->so_lock);
>  				goto restart;
>  			}

Nice catch.

Reviewed-by: Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: NFS/d_splice_alias breakage
  2016-06-03  5:56         ` Al Viro
@ 2016-06-06 23:36           ` Oleg Drokin
  2016-06-10  1:33             ` Oleg Drokin
  2016-06-20 13:25           ` Oleg Drokin
  1 sibling, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-06 23:36 UTC (permalink / raw)
  To: Al Viro
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

Well, I have some bad news.

Apparently this patch does not fix the issue 100% of the time; I just hit it again.

Only this time it's much harder to trigger, but the stack is the same
(it looks a bit different due to a compiler change). It must be some much narrower race now.

I still don't have a crashdump, though (apparently the makedumpfile used by
kexec-tools is behind the times and does not work with 4.7.0-rc1 kernels), so I
cannot tell you more about the dentry yet.

[12470.365211] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
[12470.366336] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[12470.366927] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr acpi_cpufreq i2c_piix4 tpm_tis virtio_console tpm nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm serio_raw virtio_blk floppy
[12470.368917] CPU: 7 PID: 15952 Comm: cat Not tainted 4.7.0-rc1-vm-nfs+ #115
[12470.369554] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[12470.370137] task: ffff8800447447c0 ti: ffff880049a48000 task.ti: ffff880049a48000
[12470.371214] RIP: 0010:[<ffffffff81288061>]  [<ffffffff81288061>] d_splice_alias+0x1e1/0x470
[12470.372340] RSP: 0018:ffff880049a4bab8  EFLAGS: 00010286
[12470.372906] RAX: ffff8800393372a8 RBX: ffff88003c781000 RCX: 0000000000000001
[12470.373525] RDX: 0000000000001895 RSI: ffff88003c781000 RDI: ffff8800393372a8
[12470.374145] RBP: ffff880049a4baf0 R08: 00001353641935c2 R09: 0000000000000000
[12470.374777] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003a7b9300
[12470.375407] R13: 0000000000000000 R14: ffff88003bf0d2a8 R15: 0000000000000000
[12470.376016] FS:  00007fbb07feb700(0000) GS:ffff88006b800000(0000) knlGS:0000000000000000
[12470.377106] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12470.377693] CR2: 000055a498c7bdc8 CR3: 00000000479b3000 CR4: 00000000000006e0
[12470.378311] Stack:
[12470.378823]  ffff880040008f00 0000000029e67876 ffff88003c781000 ffff88003a7b9300
[12470.379946]  0000000000000000 ffff88003bf0d2a8 0000000000000000 ffff880049a4bb48
[12470.381075]  ffffffff8137d63c ffffffffffffffeb ffff880000000000 0000000000000000
[12470.382228] Call Trace:
[12470.382766]  [<ffffffff8137d63c>] nfs_lookup+0x15c/0x420
[12470.383363]  [<ffffffff8137f681>] nfs_atomic_open+0xb1/0x700
[12470.383961]  [<ffffffff812792ea>] lookup_open+0x2ea/0x770
[12470.384570]  [<ffffffff8127c76f>] path_openat+0x7ff/0x1030
[12470.385169]  [<ffffffff8127d15f>] ? getname_flags+0x4f/0x1f0
[12470.385770]  [<ffffffff8104a485>] ? kvm_sched_clock_read+0x25/0x40
[12470.386361]  [<ffffffff8127e1d1>] do_filp_open+0x91/0x100
[12470.386945]  [<ffffffff8188aa97>] ? _raw_spin_unlock+0x27/0x40
[12470.387559]  [<ffffffff8128f810>] ? __alloc_fd+0x100/0x200
[12470.388158]  [<ffffffff8126a230>] do_sys_open+0x130/0x220
[12470.388758]  [<ffffffff8126a33e>] SyS_open+0x1e/0x20
[12470.389327]  [<ffffffff8188b3fc>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[12470.389929] Code: 83 c4 10 4c 89 f8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 df e8 f1 d6 ff ff 4c 89 f7 e8 19 2a 60 00 45 31 ff eb d9 49 89 ff eb d4 <0f> 0b 48 8b 43 50 4c 8b 78 68 49 8d 97 c8 03 00 00 eb 02 f3 90 
[12470.392299] RIP  [<ffffffff81288061>] d_splice_alias+0x1e1/0x470


On Jun 3, 2016, at 1:56 AM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote:
> 
>> This one cures the insta-crash I was having, and I see no other ill-effects so far.
> 
> OK...  I can take it through vfs.git, but I think it'd be better off in
> NFS tree.  Is everyone OK with something like the following?
> 
> make nfs_atomic_open() call d_drop() on all ->open_context() errors.
> 
> In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
> unconditional d_drop() after the ->open_context() had been removed.  It had
> been correct for success cases (there ->open_context() itself had been doing
> dcache manipulations), but not for error ones.  Only one of those (ENOENT)
> got a compensatory d_drop() added in that commit, but in fact it should've
> been done for all errors.  As it is, the case of O_CREAT non-exclusive open
> on a hashed negative dentry racing with e.g. symlink creation from another
> client ended up with ->open_context() getting an error and proceeding to
> call nfs_lookup().  On a hashed dentry, which would've instantly triggered
> BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
> d_splice_alias()).
> 
> Cc: stable@vger.kernel.org # v3.10+
> Tested-by: Oleg Drokin <green@linuxhacker.ru>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 		err = PTR_ERR(inode);
> 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> 		put_nfs_open_context(ctx);
> +		d_drop(dentry);
> 		switch (err) {
> 		case -ENOENT:
> -			d_drop(dentry);
> 			d_add(dentry, NULL);
> 			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
> 			break;

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: NFS/d_splice_alias breakage
  2016-06-06 23:36           ` Oleg Drokin
@ 2016-06-10  1:33             ` Oleg Drokin
  2016-06-10 16:49               ` Oleg Drokin
  0 siblings, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-10  1:33 UTC (permalink / raw)
  To: Al Viro
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 6, 2016, at 7:36 PM, Oleg Drokin wrote:

> Well, I have some bad news.
> 
> Apparently this patch does not fix the issue 100% of the time; I just hit it again.

Ok, so now finally a piece of good news.
Whatever was causing this other, much harder to hit crash is gone in -rc2, and so
are some other disturbing things I saw.

Hopefully this patch will get propagated everywhere soonish.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: NFS/d_splice_alias breakage
  2016-06-10  1:33             ` Oleg Drokin
@ 2016-06-10 16:49               ` Oleg Drokin
  0 siblings, 0 replies; 48+ messages in thread
From: Oleg Drokin @ 2016-06-10 16:49 UTC (permalink / raw)
  To: Al Viro, Trond Myklebust
  Cc: linux-nfs, <linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 9, 2016, at 9:33 PM, Oleg Drokin wrote:

> 
> On Jun 6, 2016, at 7:36 PM, Oleg Drokin wrote:
> 
>> Well, I have some bad news.
>> 
>> Apparently this patch does not fix the issue 100% of the time; I just hit it again.
> 
> Ok, so now finally a piece of good news.
> Whatever was causing this other, much harder to hit crash is gone in -rc2, and so
> are some other disturbing things I saw.

Famous last words, I guess.
It all returned overnight.

But this time it's different from the past couple.

0xffffffff813aa8bb is in nfs4_do_open (/home/green/bk/linux/fs/nfs/nfs4proc.c:2482).
2477                    d_drop(dentry);
2478                    alias = d_exact_alias(dentry, state->inode);
2479                    if (!alias)
2480                            alias = d_splice_alias(igrab(state->inode), dentry);
2481                    /* d_splice_alias() can't fail here - it's a non-directory */

So it appears the dentry that was negative turned positive in the middle of
the d_exact_alias call… and, despite our d_drop(dentry), it is already
hashed again?
If this can happen in the middle of our operations here, the same thing
presumably can happen in the other place we patched up in nfs_atomic_open -
we do d_drop there, but by the time we get into d_splice_alias it is
all hashed again by some racing thread. That would explain
some of the rarer crashes I hit earlier in -rc1 after the initial fix.

I wonder if it could be a remote-fs specific open vs open race?

Say we have an atomic open for parent1/file1; this locks parent1 and proceeds to
look up file1.
Now, before the lookup commences, some other thread moves file1 to parent2,
and yet another thread goes to open parent2/file1.
Eventually it all converges with two threads trying to instantiate file1;
if we get it "just wrong", then a clash like what we see can happen, no?
Hm, I guess then both opens would have their own dentry and it's only the inode
that's shared, so that cannot be it.
How can anything find a dentry we presumably just allocated and hash it while we
are not done with it, I wonder?
Also, I wonder if all of this is somehow related to this other problem I am having,
where drop_nlink reports going into negative territory on a rename() call. I hit this
twice already, and I guess I just need to convert that to a BUG_ON instead for a
closer examination.


The dentry is (I finally have a crashdump):

crash> p *(struct dentry *)0xffff880055dbd2e8
$4 = {
  d_flags = 4718796, 
  d_seq = {
    sequence = 4, 
    dep_map = {
      key = 0xffffffff8337b1c0 <__key.41115>, 
      class_cache = {0x0, 0x0}, 
      name = 0xffffffff81c79c66 "&dentry->d_seq", 
      cpu = 6, 
      ip = 6510615555426900570
    }
  }, 
  d_hash = {
    next = 0x0, 
    pprev = 0xffffc900000fd9c0
  }, 
  d_parent = 0xffff8800079d1008, 
  d_name = {
    {
      {
        hash = 2223188567, 
        len = 2
      }, 
      hash_len = 10813123159
    }, 
    name = 0xffff880055dbd358 "10"
  }, 
  d_inode = 0xffff880066d8ab38, 
  d_iname = "10\000ZZZZZZZZZZZZZZZZZZZZZZZZZZZZ", 
  d_lockref = {
    {
      {
        lock = {
          {
            rlock = {
              raw_lock = {
                val = {
                  counter = 0
                }
              }, 
              magic = 3735899821, 
              owner_cpu = 4294967295, 
              owner = 0xffffffffffffffff, 
              dep_map = {
                key = 0xffffffff8337b1c8 <__key.41114>, 
                class_cache = {0xffffffff82c1c3a0 <lock_classes+47616>, 0x0}, 
                name = 0xffffffff81c65bc8 "&(&dentry->d_lockref.lock)->rlock", 
                cpu = 3, 
                ip = 18446744071583760701
              }
            }, 
            {
              __padding = "\000\000\000\000\255N\255\336\377\377\377\377ZZZZ\377\377\377\377\377\377\377\377", 
              dep_map = {
                key = 0xffffffff8337b1c8 <__key.41114>, 
                class_cache = {0xffffffff82c1c3a0 <lock_classes+47616>, 0x0}, 
                name = 0xffffffff81c65bc8 "&(&dentry->d_lockref.lock)->rlock", 
                cpu = 3, 
                ip = 18446744071583760701
              }
            }
          }
        }, 
        count = 3
      }
    }
  }, 
  d_op = 0xffffffff81a46780 <nfs4_dentry_operations>, 
  d_sb = 0xffff88004eaf3000, 
  d_time = 6510615555426956154, 
  d_fsdata = 0x0, 
  {
    d_lru = {
      next = 0xffff880066d7e3e8, 
      prev = 0xffff8800036736c8
    }, 
    d_wait = 0xffff880066d7e3e8
  }, 
  d_child = {
    next = 0xffff8800268629b8, 
    prev = 0xffff8800079d1128
  }, 
  d_subdirs = {
    next = 0xffff880055dbd408, 
    prev = 0xffff880055dbd408
  }, 
  d_u = {
    d_alias = {
      next = 0x0, 
      pprev = 0xffff880066d8ad20
    }, 
    d_in_lookup_hash = {
      next = 0x0, 
      pprev = 0xffff880066d8ad20
    }, 
    d_rcu = {
      next = 0x0, 
      func = 0xffff880066d8ad20
    }
  }
}

[22845.516232] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
[22845.516864] invalid opcode: 0000 [#1] SMP
[22845.517350] Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq joydev tpm_tis tpm virtio_console i2c_piix4 pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_blk serio_raw floppy
[22845.519236] CPU: 0 PID: 2894259 Comm: cat Not tainted 4.7.0-rc2-vm-nfs+ #122
[22845.519843] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[22845.520259] task: ffff8800640d8e40 ti: ffff88004665c000 task.ti: ffff88004665c000
[22845.520975] RIP: 0010:[<ffffffff81287811>]  [<ffffffff81287811>] d_splice_alias+0x1e1/0x470
[22845.521746] RSP: 0018:ffff88004665fa08  EFLAGS: 00010286
[22845.522122] RAX: ffff880066d8ab38 RBX: 0000000000000000 RCX: 0000000000000001
[22845.522524] RDX: 000000000000191a RSI: ffff880055dbd2e8 RDI: ffff880066d8ab38
[22845.522926] RBP: ffff88004665fa40 R08: 0000235352190a66 R09: ffff880055dbd2e8
[22845.523328] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880072d43c00
[22845.523731] R13: ffff880072d43c00 R14: ffff88004883a580 R15: ffff880052615800
[22845.524131] FS:  00007f3b2c27d700(0000) GS:ffff88006b400000(0000) knlGS:0000000000000000
[22845.524846] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22845.525225] CR2: 000055649285c0d0 CR3: 000000004d961000 CR4: 00000000000006f0
[22845.525628] Stack:
[22845.525964]  ffff88004665fa20 ffffffff8188a1f7 0000000000000000 ffff880072d43c00
[22845.526700]  ffff880072d43c00 ffff88004883a580 ffff880052615800 ffff88004665fb20
[22845.527433]  ffffffff813aa8bb ffffffff00000000 ffff8800024000c0 ffff88004665fd80
[22845.528161] Call Trace:
[22845.528504]  [<ffffffff8188a1f7>] ? _raw_spin_unlock+0x27/0x40
[22845.528943]  [<ffffffff813aa8bb>] nfs4_do_open+0xaeb/0xb10
[22845.529331]  [<ffffffff813aa9d0>] nfs4_atomic_open+0xf0/0x110
[22845.529718]  [<ffffffff8137eefc>] nfs_atomic_open+0x1ac/0x700
[22845.530112]  [<ffffffff8127900a>] lookup_open+0x2ea/0x770
[22845.530488]  [<ffffffff8127bee5>] path_openat+0x6e5/0xc20
[22845.530881]  [<ffffffff8104a425>] ? kvm_sched_clock_read+0x25/0x40
[22845.531268]  [<ffffffff8127d981>] do_filp_open+0x91/0x100
[22845.531645]  [<ffffffff8188a1f7>] ? _raw_spin_unlock+0x27/0x40
[22845.532027]  [<ffffffff8128efc0>] ? __alloc_fd+0x100/0x200
[22845.532405]  [<ffffffff81269f80>] do_sys_open+0x130/0x220
[22845.533157]  [<ffffffff8126a08e>] SyS_open+0x1e/0x20

vmcore is at http://knox.linuxhacker.ru/tmp/dentry2/vmcore.gz
vmlinux is at http://knox.linuxhacker.ru/tmp/dentry2/vmlinux.gz

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: NFS/d_splice_alias breakage
  2016-06-03  5:56         ` Al Viro
  2016-06-06 23:36           ` Oleg Drokin
@ 2016-06-20 13:25           ` Oleg Drokin
  2016-06-20 14:08             ` Al Viro
  1 sibling, 1 reply; 48+ messages in thread
From: Oleg Drokin @ 2016-06-20 13:25 UTC (permalink / raw)
  To: Al Viro, Trond Myklebust
  Cc: J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

It looks like this patch was totally forgotten?
I don't see it in either the vfs or the nfs tree, and yet it fixes a very easy-to-trigger
crash in nfs code. And I think it's unrelated to the other parallel case too.

On Jun 3, 2016, at 1:56 AM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote:
> 
>> This one cures the insta-crash I was having, and I see no other ill-effects so far.
> 
> OK...  I can take it through vfs.git, but I think it'd be better off in
> NFS tree.  Is everyone OK with something like the following?
> 
> make nfs_atomic_open() call d_drop() on all ->open_context() errors.
> 
> In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
> the unconditional d_drop() after ->open_context() was removed.  That was
> correct for the success cases (there ->open_context() itself does the dcache
> manipulations), but not for the error ones.  Only one of those (ENOENT) got
> a compensatory d_drop() added in that commit, but in fact it should've been
> done for all errors.  As it is, an O_CREAT non-exclusive open on a hashed
> negative dentry racing with e.g. symlink creation from another client ended
> up with ->open_context() returning an error and nfs_lookup() being called
> on a hashed dentry, which would've instantly triggered the BUG_ON() in
> d_materialise_unique() (or, these days, its equivalent in d_splice_alias()).
> 
> Cc: stable@vger.kernel.org # v3.10+
> Tested-by: Oleg Drokin <green@linuxhacker.ru>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> 		err = PTR_ERR(inode);
> 		trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> 		put_nfs_open_context(ctx);
> +		d_drop(dentry);
> 		switch (err) {
> 		case -ENOENT:
> -			d_drop(dentry);
> 			d_add(dentry, NULL);
> 			nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
> 			break;
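
For reference, with that hunk applied the ->open_context() error path in
nfs_atomic_open() reads roughly as follows - a sketch reconstructed from the
diff above rather than the verbatim tree contents, with the remaining error
cases elided:

err = PTR_ERR(inode);
trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
put_nfs_open_context(ctx);
d_drop(dentry);		/* now unhash on every error, not just ENOENT */
switch (err) {
case -ENOENT:
	d_add(dentry, NULL);	/* leave a fresh negative dentry behind */
	nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
	break;
/* remaining error cases follow, unchanged by this patch */
}

The point is that the d_drop() now precedes the switch, so the dentry is
guaranteed to be unhashed on every error path, not just for -ENOENT.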


* Re: NFS/d_splice_alias breakage
  2016-06-20 13:25           ` Oleg Drokin
@ 2016-06-20 14:08             ` Al Viro
  2016-06-20 14:54               ` Trond Myklebust
  2016-06-20 15:43               ` Anna Schumaker
  0 siblings, 2 replies; 48+ messages in thread
From: Al Viro @ 2016-06-20 14:08 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: Trond Myklebust, J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
> It looks like this patch was totally forgotten?
> I don't see it in either the vfs or nfs trees, and yet it fixes a very
> easy-to-trigger crash in the nfs code. And I think it's unrelated to the
> other parallel case too.

I assumed it would go through the NFS tree, seeing that it's NFS-specific and
has nothing to do with any of the recent VFS changes (the oops has been
triggerable since 3.11); I can certainly put it through vfs.git, and there
will be changes nearby, but this one should go into -stable as a separate
patch.


* Re: NFS/d_splice_alias breakage
  2016-06-20 14:08             ` Al Viro
@ 2016-06-20 14:54               ` Trond Myklebust
  2016-06-20 15:28                 ` Al Viro
  2016-06-20 15:43               ` Anna Schumaker
  1 sibling, 1 reply; 48+ messages in thread
From: Trond Myklebust @ 2016-06-20 14:54 UTC (permalink / raw)
  To: Viro Alexander
  Cc: Oleg Drokin, Fields Bruce, linux-nfs list,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


> On Jun 20, 2016, at 10:08, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>> It looks like this patch was totally forgotten?
>> I don't see it in either the vfs or nfs trees, and yet it fixes a very
>> easy-to-trigger crash in the nfs code. And I think it's unrelated to the
>> other parallel case too.
> 
> I assumed it would go through the NFS tree, seeing that it's NFS-specific and
> has nothing to do with any of the recent VFS changes (the oops has been
> triggerable since 3.11); I can certainly put it through vfs.git, and there
> will be changes nearby, but this one should go into -stable as a separate
> patch.
> 

I’ll take it through the NFS tree.

Cheers
  Trond


* Re: NFS/d_splice_alias breakage
  2016-06-20 14:54               ` Trond Myklebust
@ 2016-06-20 15:28                 ` Al Viro
  0 siblings, 0 replies; 48+ messages in thread
From: Al Viro @ 2016-06-20 15:28 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Oleg Drokin, Fields Bruce, linux-nfs list,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

On Mon, Jun 20, 2016 at 02:54:36PM +0000, Trond Myklebust wrote:
> 
> > On Jun 20, 2016, at 10:08, Al Viro <viro@zeniv.linux.org.uk> wrote:
> > 
> > On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
> >> It looks like this patch was totally forgotten?
> >> I don't see it in either the vfs or nfs trees, and yet it fixes a very
> >> easy-to-trigger crash in the nfs code. And I think it's unrelated to the
> >> other parallel case too.
> > 
> > I assumed it would go through the NFS tree, seeing that it's NFS-specific and
> > has nothing to do with any of the recent VFS changes (the oops has been
> > triggerable since 3.11); I can certainly put it through vfs.git, and there
> > will be changes nearby, but this one should go into -stable as a separate
> > patch.
> > 
> 
> I’ll take it through the NFS tree.

OK.  It's really -stable fodder, BTW - all you need to trigger that oops is
a hashed negative dentry from an earlier lookup + a symlink created from
another client + an attempt to open from ours.  That gets you into
d_splice_alias() (or d_materialise_unique() prior to 3.19) with a hashed
dentry, which triggers the BUG_ON and leaves us with the parent directory
locked.
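
To make that sequence concrete, here is a minimal userspace sketch of the
race; the mount point /mnt/nfs, the file names, and the repro commands are
illustrative assumptions, not details taken from the reports above:

/* repro.c - hypothetical two-client reproduction.
 * Assumes the same export is mounted at /mnt/nfs on clients A and B:
 *   client A: ./repro lookup
 *   client B: ln -s bar /mnt/nfs/foo
 *   client A: ./repro open        (oopsed client A before the fix)
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	if (argc > 1 && !strcmp(argv[1], "lookup")) {
		/* Step 1: the failed lookup leaves a hashed negative
		 * dentry for "foo" in client A's dcache. */
		if (open("/mnt/nfs/foo", O_RDONLY) < 0)
			perror("open (failure expected here)");
	} else {
		/* Step 3: a non-exclusive O_CREAT open races with the
		 * symlink created from client B in step 2; the atomic
		 * open fails, and before the fix nfs_lookup() then fed
		 * the still-hashed dentry to d_splice_alias(), hitting
		 * the BUG_ON with the parent directory left locked. */
		if (open("/mnt/nfs/foo", O_CREAT | O_WRONLY, 0644) < 0)
			perror("open");
	}
	return 0;
}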


* Re: NFS/d_splice_alias breakage
  2016-06-20 14:08             ` Al Viro
  2016-06-20 14:54               ` Trond Myklebust
@ 2016-06-20 15:43               ` Anna Schumaker
  2016-06-20 15:45                 ` Oleg Drokin
  2016-06-20 15:47                 ` Trond Myklebust
  1 sibling, 2 replies; 48+ messages in thread
From: Anna Schumaker @ 2016-06-20 15:43 UTC (permalink / raw)
  To: Al Viro, Oleg Drokin
  Cc: Trond Myklebust, J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>

Hi,

On 06/20/2016 10:08 AM, Al Viro wrote:
> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>> It looks like this patch was totally forgotten?
>> I don't see it in either the vfs or nfs trees, and yet it fixes a very
>> easy-to-trigger crash in the nfs code. And I think it's unrelated to the
>> other parallel case too.
> 
> I assumed it would go through the NFS tree, seeing that it's NFS-specific and
> has nothing to do with any of the recent VFS changes (the oops has been
> triggerable since 3.11); I can certainly put it through vfs.git, and there
> will be changes nearby, but this one should go into -stable as a separate
> patch.

I was going to put together an nfs bugfixes pull request for 4.7 this week, so I can include the patch there if this is easy to hit.

Anna



* Re: NFS/d_splice_alias breakage
  2016-06-20 15:43               ` Anna Schumaker
@ 2016-06-20 15:45                 ` Oleg Drokin
  2016-06-20 15:47                 ` Trond Myklebust
  1 sibling, 0 replies; 48+ messages in thread
From: Oleg Drokin @ 2016-06-20 15:45 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Al Viro, Trond Myklebust, J. Bruce Fields, linux-nfs,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


On Jun 20, 2016, at 11:43 AM, Anna Schumaker wrote:

> Hi,
> 
> On 06/20/2016 10:08 AM, Al Viro wrote:
>> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>>> It looks like this patch was totally forgotten?
>>> I don't see it in either the vfs or nfs trees, and yet it fixes a very
>>> easy-to-trigger crash in the nfs code. And I think it's unrelated to the
>>> other parallel case too.
>> 
>> I assumed it would go through the NFS tree, seeing that it's NFS-specific and
>> has nothing to do with any of the recent VFS changes (the oops has been
>> triggerable since 3.11); I can certainly put it through vfs.git, and there
>> will be changes nearby, but this one should go into -stable as a separate
>> patch.
> 
> I was going to put together an nfs bugfixes pull request for 4.7 this week, so I can include the patch there if this is easy to hit.

Yes, it is very easy to hit.


* Re: NFS/d_splice_alias breakage
  2016-06-20 15:43               ` Anna Schumaker
  2016-06-20 15:45                 ` Oleg Drokin
@ 2016-06-20 15:47                 ` Trond Myklebust
  1 sibling, 0 replies; 48+ messages in thread
From: Trond Myklebust @ 2016-06-20 15:47 UTC (permalink / raw)
  To: Schumaker Anna
  Cc: Viro Alexander, Oleg Drokin, Fields Bruce, linux-nfs list,
	<linux-kernel@vger.kernel.org> Mailing List,
	<linux-fsdevel@vger.kernel.org>


> On Jun 20, 2016, at 11:43, Anna Schumaker <Anna.Schumaker@netapp.com> wrote:
> 
> Hi,
> 
> On 06/20/2016 10:08 AM, Al Viro wrote:
>> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>>> It looks like this patch was totally forgotten?
>>> I don't see it in either the vfs or nfs trees, and yet it fixes a very
>>> easy-to-trigger crash in the nfs code. And I think it's unrelated to the
>>> other parallel case too.
>> 
>> I assumed it would go through the NFS tree, seeing that it's NFS-specific and
>> has nothing to do with any of the recent VFS changes (the oops has been
>> triggerable since 3.11); I can certainly put it through vfs.git, and there
>> will be changes nearby, but this one should go into -stable as a separate
>> patch.
> 
> I was going to put together an nfs bugfixes pull request for 4.7 this week, so I can include the patch there if this is easy to hit.
> 

Hi Anna,

Please do, and please keep the Cc: stable…

Thanks
 Trond


Thread overview: 48+ messages

2016-06-02 22:46 NFS/d_splice_alias breakage Oleg Drokin
2016-06-02 23:59 ` [PATCH] Allow d_splice_alias to accept hashed dentries green
2016-06-03  0:25   ` Oleg Drokin
2016-06-03  0:44 ` NFS/d_splice_alias breakage Trond Myklebust
2016-06-03  0:54   ` Oleg Drokin
2016-06-03  3:26     ` Al Viro
2016-06-03  3:38       ` Al Viro
2016-06-03  3:28   ` Al Viro
2016-06-03  3:37 ` Al Viro
2016-06-03  3:43   ` Oleg Drokin
2016-06-03  4:26     ` Al Viro
2016-06-03  4:42       ` Al Viro
2016-06-03  4:53         ` Al Viro
2016-06-03  4:58       ` Oleg Drokin
2016-06-03  5:56         ` Al Viro
2016-06-06 23:36           ` Oleg Drokin
2016-06-10  1:33             ` Oleg Drokin
2016-06-10 16:49               ` Oleg Drokin
2016-06-20 13:25           ` Oleg Drokin
2016-06-20 14:08             ` Al Viro
2016-06-20 14:54               ` Trond Myklebust
2016-06-20 15:28                 ` Al Viro
2016-06-20 15:43               ` Anna Schumaker
2016-06-20 15:45                 ` Oleg Drokin
2016-06-20 15:47                 ` Trond Myklebust
2016-06-03 16:38       ` Dcache oops Oleg Drokin
2016-06-03 18:22         ` Al Viro
2016-06-03 18:35           ` Oleg Drokin
2016-06-03 20:07             ` Al Viro
2016-06-03 21:17               ` Oleg Drokin
2016-06-03 21:46                 ` Al Viro
2016-06-03 22:17                   ` Al Viro
2016-06-03 21:18               ` Linus Torvalds
2016-06-03 21:26                 ` Al Viro
2016-06-03 22:00                   ` Linus Torvalds
2016-06-03 22:23                     ` Al Viro
2016-06-03 22:29                       ` Al Viro
2016-06-03 22:36                       ` Linus Torvalds
2016-06-03 22:42                         ` Oleg Drokin
2016-06-03 22:43                         ` Al Viro
2016-06-03 22:37                       ` Al Viro
2016-06-03 22:49                         ` Oleg Drokin
2016-06-03 23:58                         ` Oleg Drokin
2016-06-04  0:56                           ` Al Viro
2016-06-04 12:25                             ` Jeff Layton
2016-06-04 16:12                             ` Oleg Drokin
2016-06-04 16:21                               ` [PATCH] nfs4: Fix potential use after free of state in nfs4_do_reclaim green
2016-06-04 19:57                                 ` Jeff Layton
