linux-kernel.vger.kernel.org archive mirror
* PROBLEM: NULL pointer dereference; nfsd4_remove_cld_pipe
@ 2019-11-12 10:13 Jamie Heilman
  2019-11-12 16:06 ` Scott Mayhew
  2019-11-12 16:20 ` Scott Mayhew
  0 siblings, 2 replies; 5+ messages in thread
From: Jamie Heilman @ 2019-11-12 10:13 UTC (permalink / raw)
  To: J. Bruce Fields, Scott Mayhew; +Cc: linux-nfs, linux-kernel

Giving 5.4.0-rc7 a spin I hit a NULL pointer dereference and bisected
it to:

commit 6ee95d1c899186c0798cafd25998d436bcdb9618
Author: Scott Mayhew <smayhew@redhat.com>
Date:   Mon Sep 9 16:10:31 2019 -0400

    nfsd: add support for upcall version 2


The splat against 5.3.0-rc2-00034-g6ee95d1c8991:

BUG: kernel NULL pointer dereference, address: 0000000000000036
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 0 PID: 2936 Comm: rpc.nfsd Not tainted 5.3.0-rc2-00034-g6ee95d1c8991 #1
Hardware name: Dell Inc. Precision WorkStation T3400  /0TP412, BIOS A14 04/30/2012
RIP: 0010:crypto_destroy_tfm+0x5/0x4d
Code: 78 01 00 00 48 85 c0 74 05 e9 05 05 66 00 c3 55 48 8b af 80 01 00 00 e8 d5 ff ff ff 48 89 ef 5d e9 12 f9 ef ff 48 85 ff 74 47 <48> 83 7e 30 00 41 55 4c 8b 6e 38 41 54 49 89 fc 55 48 89 f5 75 14
RSP: 0018:ffffc90000b7bd68 EFLAGS: 00010282
RAX: ffffffffa0402841 RBX: ffff888230484400 RCX: 0000000000002cd0
RDX: 0000000000002cce RSI: 0000000000000006 RDI: fffffffffffffffe
RBP: ffffffff81e68440 R08: ffff888232801800 R09: ffffffffa0402841
R10: 0000000000000200 R11: ffff88823048ae40 R12: ffff888231585100
R13: ffff88823048ae40 R14: 000000000000000b R15: ffff888230484400
FS:  00007f02102c3740(0000) GS:ffff888233a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000036 CR3: 0000000230f94000 CR4: 00000000000406f0
Call Trace:
 nfsd4_remove_cld_pipe+0x6d/0x83 [nfsd]
 nfsd4_cld_tracking_init+0x1cf/0x295 [nfsd]
 nfsd4_client_tracking_init+0x72/0x13e [nfsd]
 nfs4_state_start_net+0x22a/0x2cf [nfsd]
 nfsd_svc+0x1c6/0x292 [nfsd]
 write_threads+0x68/0xb0 [nfsd]
 ? write_versions+0x333/0x333 [nfsd]
 nfsctl_transaction_write+0x4a/0x62 [nfsd]
 vfs_write+0xa0/0xdd
 ksys_write+0x71/0xba
 do_syscall_64+0x48/0x55
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f021056c904
Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 48 8d 05 d9 3a 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
RSP: 002b:00007ffdc76ec618 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055b534955560 RCX: 00007f021056c904
RDX: 0000000000000002 RSI: 000055b534955560 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffdc76ec4b0
R10: 00007ffdc76ec367 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000008 R14: 0000000000000000 R15: 000055b534b8a2a0
Modules linked in: cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative autofs4 fan nfsd auth_rpcgss nfs lockd grace fscache sunrpc bridge stp llc nhpoly1305_sse2 nhpoly1305 aes_generic chacha_x86_64 chacha_generic adiantum poly1305_generic vhost_net tun vhost tap dm_crypt snd_hda_codec_analog snd_hda_codec_generic usb_storage snd_hda_intel kvm_intel snd_hda_codec kvm snd_hwdep snd_hda_core snd_pcm dcdbas snd_timer irqbypass snd soundcore sr_mod cdrom tg3 sg floppy evdev xfs dm_mod raid1 md_mod psmouse
CR2: 0000000000000036
---[ end trace bc12bbe4cdd6319f ]---
...
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered


My kernel config is at
http://audible.transient.net/~jamie/k/upcallv2.config-5.3.0-rc2-00034-g6ee95d1c8991

I don't think there's anything terribly interesting about my NFS
server setup. This happens reliably at boot, on an idle network with
no active clients. Let me know what else you need; happy to debug.

-- 
Jamie Heilman                     http://audible.transient.net/~jamie/
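
The register dump above suggests an error pointer rather than a true
NULL: RDI holds 0xfffffffffffffffe, which is -2 (-ENOENT on Linux), and
the faulting instruction in the Code: line, "48 83 7e 30 00", decodes to
cmpq $0x0,0x30(%rsi) with RSI = 6, giving the reported address 0x36.
The following is a minimal userspace sketch of that arithmetic. The
helper names are hypothetical, and the offset of 8 between RDI and RSI
is assumed to be the offset of the embedded base tfm inside the shash
object; it is inferred from the registers, not taken from the source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same limit the kernel uses: the top 4095 addresses encode errno values. */
#define MAX_ERRNO 4095

/* ASSUMPTION: distance between the freed object (RDI) and its base tfm (RSI). */
#define ASSUMED_BASE_OFFSET 8

/* Userspace stand-in for the kernel's IS_ERR_VALUE() range check. */
static bool is_err_value(uint64_t v)
{
	return v >= (uint64_t)-MAX_ERRNO;
}

/* Recover the positive errno number encoded in an error-pointer value. */
static int64_t errno_from_value(uint64_t v)
{
	return -(int64_t)v;
}

/* Address read by "cmpq $0x0,0x30(%rsi)", the instruction marked <48> above. */
static uint64_t fault_address(uint64_t rsi)
{
	return rsi + 0x30;
}

/* Value RSI would hold if it were derived from RDI by the assumed offset. */
static uint64_t rsi_from_rdi(uint64_t rdi)
{
	return rdi + ASSUMED_BASE_OFFSET; /* wraps around for error pointers */
}
```

Feeding RDI from the dump through rsi_from_rdi() and fault_address()
reproduces CR2, which is why the oops is reported as a NULL dereference
even though the pointer being freed was an encoded errno.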


* Re: PROBLEM: NULL pointer dereference; nfsd4_remove_cld_pipe
  2019-11-12 10:13 PROBLEM: NULL pointer dereference; nfsd4_remove_cld_pipe Jamie Heilman
@ 2019-11-12 16:06 ` Scott Mayhew
  2019-11-12 16:20 ` Scott Mayhew
  1 sibling, 0 replies; 5+ messages in thread
From: Scott Mayhew @ 2019-11-12 16:06 UTC (permalink / raw)
  To: J. Bruce Fields, linux-nfs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3900 bytes --]

Hi Jamie,

On Tue, 12 Nov 2019, Jamie Heilman wrote:

> Giving 5.4.0-rc7 a spin I hit a NULL pointer dereference and bisected
> it to:
> 
> commit 6ee95d1c899186c0798cafd25998d436bcdb9618
> Author: Scott Mayhew <smayhew@redhat.com>
> Date:   Mon Sep 9 16:10:31 2019 -0400
> 
>     nfsd: add support for upcall version 2
> 
> 
Does this patch help?

-Scott

[-- Attachment #2: 0001-nfsd-Fix-cld_net-cn_tfm-initialization.patch --]
[-- Type: text/plain, Size: 1518 bytes --]

From e46430ef6ee045ec447a0796b419b8cdeee4f25f Mon Sep 17 00:00:00 2001
From: Scott Mayhew <smayhew@redhat.com>
Date: Tue, 12 Nov 2019 10:10:00 -0500
Subject: [PATCH] nfsd: Fix cld_net->cn_tfm initialization

Don't assign an error pointer to cn->cn_tfm, otherwise
an oops will occur in nfsd4_remove_cld_pipe().

Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
 fs/nfsd/nfs4recover.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index cdc75ad4438b..af07d0f55fe3 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -1578,6 +1578,7 @@ nfsd4_cld_tracking_init(struct net *net)
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	bool running;
 	int retries = 10;
+	struct crypto_shash *tfm;
 
 	status = nfs4_cld_state_init(net);
 	if (status)
@@ -1586,11 +1587,13 @@ nfsd4_cld_tracking_init(struct net *net)
 	status = __nfsd4_init_cld_pipe(net);
 	if (status)
 		goto err_shutdown;
+	tfm = crypto_alloc_shash("sha256", 0, 0);
 	nn->cld_net->cn_tfm = crypto_alloc_shash("sha256", 0, 0);
-	if (IS_ERR(nn->cld_net->cn_tfm)) {
-		status = PTR_ERR(nn->cld_net->cn_tfm);
+	if (IS_ERR(tfm)) {
+		status = PTR_ERR(tfm);
 		goto err_remove;
 	}
+	nn->cld_net->cn_tfm = tfm;
 
 	/*
 	 * rpc pipe upcalls take 30 seconds to time out, so we don't want to
-- 
2.17.2



* Re: PROBLEM: NULL pointer dereference; nfsd4_remove_cld_pipe
  2019-11-12 10:13 PROBLEM: NULL pointer dereference; nfsd4_remove_cld_pipe Jamie Heilman
  2019-11-12 16:06 ` Scott Mayhew
@ 2019-11-12 16:20 ` Scott Mayhew
  2019-11-12 18:13   ` Jamie Heilman
  1 sibling, 1 reply; 5+ messages in thread
From: Scott Mayhew @ 2019-11-12 16:20 UTC (permalink / raw)
  To: Jamie Heilman; +Cc: J. Bruce Fields, linux-nfs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3940 bytes --]

Hi Jamie,

On Tue, 12 Nov 2019, Jamie Heilman wrote:

> Giving 5.4.0-rc7 a spin I hit a NULL pointer dereference and bisected
> it to:
> 
> commit 6ee95d1c899186c0798cafd25998d436bcdb9618
> Author: Scott Mayhew <smayhew@redhat.com>
> Date:   Mon Sep 9 16:10:31 2019 -0400
> 
>     nfsd: add support for upcall version 2
> 
> 
Please try this patch (v2 because I messed up the first one).

-Scott

[-- Attachment #2: 0001-nfsd-Fix-cld_net-cn_tfm-initialization.patch --]
[-- Type: text/plain, Size: 1522 bytes --]

From 34ae6455abfd81b47ab34b66ca88a29ff33c7d98 Mon Sep 17 00:00:00 2001
From: Scott Mayhew <smayhew@redhat.com>
Date: Tue, 12 Nov 2019 10:10:00 -0500
Subject: [PATCH v2] nfsd: Fix cld_net->cn_tfm initialization

Don't assign an error pointer to cn->cn_tfm, otherwise
an oops will occur in nfsd4_remove_cld_pipe().

Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
---
 fs/nfsd/nfs4recover.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index cdc75ad4438b..d1bc56b2e861 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -1578,6 +1578,7 @@ nfsd4_cld_tracking_init(struct net *net)
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	bool running;
 	int retries = 10;
+	struct crypto_shash *tfm;
 
 	status = nfs4_cld_state_init(net);
 	if (status)
@@ -1586,11 +1587,12 @@ nfsd4_cld_tracking_init(struct net *net)
 	status = __nfsd4_init_cld_pipe(net);
 	if (status)
 		goto err_shutdown;
-	nn->cld_net->cn_tfm = crypto_alloc_shash("sha256", 0, 0);
-	if (IS_ERR(nn->cld_net->cn_tfm)) {
-		status = PTR_ERR(nn->cld_net->cn_tfm);
+	tfm = crypto_alloc_shash("sha256", 0, 0);
+	if (IS_ERR(tfm)) {
+		status = PTR_ERR(tfm);
 		goto err_remove;
 	}
+	nn->cld_net->cn_tfm = tfm;
 
 	/*
 	 * rpc pipe upcalls take 30 seconds to time out, so we don't want to
-- 
2.17.2
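
The pattern the v2 patch applies can be sketched in userspace C: hold
the allocator's result in a local, test it, and only publish it to the
long-lived structure on success, so that teardown never sees an encoded
errno. This is an illustration under stated assumptions, not the nfsd
code. ERR_PTR/PTR_ERR/IS_ERR are reimplemented here for userspace, and
struct tfm, alloc_tfm(), tracking_init() and tracking_teardown() are
hypothetical stand-ins for the kernel objects named in the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct tfm { int unused; };              /* hypothetical stand-in for the hash tfm */
struct cld_net { struct tfm *cn_tfm; };  /* only the field the patch touches */

/* Hypothetical allocator: returns an error pointer, never NULL, on failure. */
static struct tfm *alloc_tfm(bool available)
{
	struct tfm *t;

	if (!available)
		return ERR_PTR(-ENOENT);  /* e.g. algorithm not built in */
	t = calloc(1, sizeof(*t));
	return t ? t : ERR_PTR(-ENOMEM);
}

/* Teardown tolerates NULL (free(NULL) is a no-op) but not an error pointer. */
static void tracking_teardown(struct cld_net *cn)
{
	free(cn->cn_tfm);
	cn->cn_tfm = NULL;
}

/* Fixed shape: check the local first, assign to cn->cn_tfm only on success. */
static int tracking_init(struct cld_net *cn, bool available)
{
	struct tfm *tfm = alloc_tfm(available);

	if (IS_ERR(tfm))
		return (int)PTR_ERR(tfm);  /* cn->cn_tfm is left untouched */
	cn->cn_tfm = tfm;
	return 0;
}
```

With this shape, a failed tracking_init() leaves cn_tfm as NULL, so a
following tracking_teardown() is harmless. Storing the allocator's
return value directly in cn_tfm before the check would instead hand the
error pointer to the free path, which is the failure mode in the report.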



* Re: PROBLEM: NULL pointer dereference; nfsd4_remove_cld_pipe
  2019-11-12 16:20 ` Scott Mayhew
@ 2019-11-12 18:13   ` Jamie Heilman
  2019-11-12 18:57     ` Scott Mayhew
  0 siblings, 1 reply; 5+ messages in thread
From: Jamie Heilman @ 2019-11-12 18:13 UTC (permalink / raw)
  To: Scott Mayhew; +Cc: J. Bruce Fields, linux-nfs, linux-kernel

Scott Mayhew wrote:
> Hi Jamie,
> 
> On Tue, 12 Nov 2019, Jamie Heilman wrote:
> 
> > Giving 5.4.0-rc7 a spin I hit a NULL pointer dereference and bisected
> > it to:
> > 
> > commit 6ee95d1c899186c0798cafd25998d436bcdb9618
> > Author: Scott Mayhew <smayhew@redhat.com>
> > Date:   Mon Sep 9 16:10:31 2019 -0400
> > 
> >     nfsd: add support for upcall version 2
> > 
> > 
> Please try this patch (v2 because I messed up the first one).


Yep, that seems to solve it.  Is the implication that
CONFIG_CRYPTO_SHA256 should be selected by nfsd?  (I tested with it
unset, as per my config before.)


-- 
Jamie Heilman                     http://audible.transient.net/~jamie/


* Re: PROBLEM: NULL pointer dereference; nfsd4_remove_cld_pipe
  2019-11-12 18:13   ` Jamie Heilman
@ 2019-11-12 18:57     ` Scott Mayhew
  0 siblings, 0 replies; 5+ messages in thread
From: Scott Mayhew @ 2019-11-12 18:57 UTC (permalink / raw)
  To: J. Bruce Fields, linux-nfs, linux-kernel

On Tue, 12 Nov 2019, Jamie Heilman wrote:

> Scott Mayhew wrote:
> > Hi Jamie,
> > 
> > On Tue, 12 Nov 2019, Jamie Heilman wrote:
> > 
> > > Giving 5.4.0-rc7 a spin I hit a NULL pointer dereference and bisected
> > > it to:
> > > 
> > > commit 6ee95d1c899186c0798cafd25998d436bcdb9618
> > > Author: Scott Mayhew <smayhew@redhat.com>
> > > Date:   Mon Sep 9 16:10:31 2019 -0400
> > > 
> > >     nfsd: add support for upcall version 2
> > > 
> > > 
> > Please try this patch (v2 because I messed up the first one).
> 
> 
> Yep, that seems to solve it. 

Thanks!  There's another small problem, in that nfsd could incorrectly
fall back to using the old nfsdcld tracking ops even if nfsdcld isn't
running... so a v3 patch is forthcoming.

> Is the implication that
> CONFIG_CRYPTO_SHA256 should be selected by nfsd?  (I tested with it
> unset, as per my config before.)

Yes - I'll send a separate patch for that.

-Scott


