* NULL dereference in rpcauth_lookup_credcache
From: J. Bruce Fields @ 2018-11-08 21:44 UTC
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Since -rc1 my regression tests crash my client.  Is this a known
problem?  I'll investigate some more, I haven't even looked at the code
yet or checked which test exactly is hitting this.

--b.

[ 164.109570] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 164.111207] PGD 0 P4D 0
[ 164.111528] Oops: 0000 [#1] PREEMPT SMP PTI
[ 164.112303] CPU: 2 PID: 2947 Comm: kworker/u8:5 Not tainted 4.20.0-rc1-13223-gafb6d1c474ef #1898
[ 164.113487] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
[ 164.115301] Workqueue: rpciod rpc_async_schedule [sunrpc]
[ 164.115920] RIP: 0010:rpcauth_lookup_credcache+0x3d/0x450 [sunrpc]
[ 164.116700] Code: 89 f5 41 54 41 89 d4 53 48 83 ec 38 89 4d b0 4c 8b 7f 20 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 48 8d 45 c0 48 89 45 c8 <41> 8b 77 08 48 89 45 c0 48 8b 47 10 4c 89 ef 48 8b 40 28 e8 cb d2
[ 164.119299] RSP: 0018:ffffc90001ee3cf0 EFLAGS: 00010246
[ 164.119872] RAX: ffffc90001ee3d10 RBX: ffff88007cc18180 RCX: 0000000000600040
[ 164.120800] RDX: 0000000000000001 RSI: ffffc90001ee3d60 RDI: ffff88007cafb198
[ 164.121643] RBP: ffffc90001ee3d50 R08: 0000000000000000 R09: 0000000000000000
[ 164.122464] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 164.123373] R13: ffffc90001ee3d60 R14: ffff88007cafb198 R15: 0000000000000000
[ 164.124296] FS: 0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[ 164.125322] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 164.126006] CR2: 0000000000000008 CR3: 000000007829c003 CR4: 00000000001606e0
[ 164.126860] Call Trace:
[ 164.127045]  ? call_retry_reserve+0x30/0x30 [sunrpc]
[ 164.127622]  rpcauth_lookupcred+0xa0/0xc0 [sunrpc]
[ 164.128200]  rpcauth_refreshcred+0x15f/0x170 [sunrpc]
[ 164.128807]  __rpc_execute+0xa9/0x460 [sunrpc]
[ 164.129281]  process_one_work+0x227/0x630
[ 164.129684]  worker_thread+0x3c/0x390
[ 164.130062]  ? process_one_work+0x630/0x630
[ 164.130609]  kthread+0x11d/0x140
[ 164.130936]  ? kthread_park+0x80/0x80
[ 164.131339]  ret_from_fork+0x3a/0x50
[ 164.131676] Modules linked in: rpcsec_gss_krb5 nfsv4 nfs lockd grace auth_rpcgss sunrpc
[ 164.132719] CR2: 0000000000000008
[ 164.133050] ---[ end trace b4028a6781a696ad ]---
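For readers who want to pick the oops above apart, here is a rough Python sketch (not something anyone in the thread posted) that pulls out the fault address, the register values and the bytes of the faulting instruction, which the kernel marks with <..> in the Code: line. The function names and the trimmed sample text are made up for illustration. As far as I can tell, the marked bytes 41 8b 77 08 decode to mov 0x8(%r15),%esi, and the earlier 4c 8b 7f 20 to mov 0x20(%rdi),%r15, so this looks like a read at offset 8 from a NULL pointer that was itself loaded from the function's first argument. Which structure field that corresponds to is not established here.

```python
import re

# Trimmed sample: only the lines of the oops that this sketch looks at.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
Code: 48 89 45 c8 <41> 8b 77 08 48 89 45 c0
R13: ffffc90001ee3d60 R14: ffff88007cafb198 R15: 0000000000000000
CR2: 0000000000000008 CR3: 000000007829c003 CR4: 00000000001606e0
"""


def parse_oops(text):
    """Return (registers, faulting_bytes) from an x86-64 oops (hypothetical helper)."""
    # Registers are printed as "NAME: <16 hex digits>".
    regs = {
        name: int(value, 16)
        for name, value in re.findall(
            r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text
        )
    }
    code = re.search(r"Code: (.*)", text).group(1).split()
    # The byte wrapped in <..> is the first byte of the faulting instruction.
    start = next(i for i, byte in enumerate(code) if byte.startswith("<"))
    faulting = [byte.strip("<>") for byte in code[start:]]
    return regs, faulting


def null_base_candidates(regs):
    """General-purpose registers holding 0; with a small CR2 these are the
    likely NULL base pointers (CR2 is then just the field offset)."""
    return sorted(r for r, v in regs.items() if v == 0 and not r.startswith("CR"))


if __name__ == "__main__":
    regs, faulting = parse_oops(OOPS)
    print("fault address:", hex(regs["CR2"]))
    print("faulting instruction starts with:", " ".join(faulting[:4]))
    print("NULL base candidates:", null_base_candidates(regs))
```

The kernel tree's scripts/decodecode does the disassembly properly; the sketch only shows how CR2, the zeroed register and the marked byte line up.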
* Re: NULL dereference in rpcauth_lookup_credcache
From: Chuck Lever @ 2018-11-09 18:01 UTC
  To: Bruce Fields; +Cc: Trond Myklebust, Anna Schumaker, Linux NFS Mailing List

> On Nov 8, 2018, at 4:44 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> Since -rc1 my regression tests crash my client.  Is this a known
> problem?  I'll investigate some more, I haven't even looked at the code
> yet or checked which test exactly is hitting this.
>
> --b.

I just encountered this repeatedly with cthon04 general tests.

MNTOPTIONS="rw,proto=tcp,vers=4.1,sec=sys"

--
Chuck Lever
chucklever@gmail.com
* Re: NULL dereference in rpcauth_lookup_credcache
From: Bruce Fields @ 2018-11-10 21:49 UTC
  To: Chuck Lever; +Cc: Trond Myklebust, Anna Schumaker, Linux NFS Mailing List

Looks like it's the fault of

07d02a67b7faae "SUNRPC: Simplify lookup code"

--b.

On Fri, Nov 09, 2018 at 01:01:30PM -0500, Chuck Lever wrote:
> I just encountered this repeatedly with cthon04 general tests.
>
> MNTOPTIONS="rw,proto=tcp,vers=4.1,sec=sys"
* Re: NULL dereference in rpcauth_lookup_credcache
From: Trond Myklebust @ 2018-11-12 17:59 UTC
  To: bfields, chucklever; +Cc: schumakeranna, linux-nfs

On Sat, 2018-11-10 at 16:49 -0500, Bruce Fields wrote:
> Looks like it's the fault of
>
> 07d02a67b7faae "SUNRPC: Simplify lookup code"

I'm having trouble reproducing this bug. I've tried both cthon and
xfstests in a loop, so far without success (both NFSv3 and v4.1, but
only sec=sys). Is there anything else you're doing that I might try?

e.g. Are you running multiple workloads in parallel? Different users?..

--
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space
* Re: NULL dereference in rpcauth_lookup_credcache
From: Chuck Lever @ 2018-11-12 18:16 UTC
  To: Trond Myklebust; +Cc: Bruce Fields, schumakeranna, Linux NFS Mailing List

> On Nov 12, 2018, at 9:59 AM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Sat, 2018-11-10 at 16:49 -0500, Bruce Fields wrote:
>> Looks like it's the fault of
>>
>> 07d02a67b7faae "SUNRPC: Simplify lookup code"
>
> I'm having trouble reproducing this bug. I've tried both cthon and
> xfstests in a loop, so far without success (both NFSv3 and v4.1, but
> only sec=sys). Is there anything else you're doing that I might try?
>
> e.g. Are you running multiple workloads in parallel? Different users?..

Some observations, for what they are worth:

Single user test running with no other NFS workload.

I see the BUG fire at umount time, not during the test.

My client is a two-node NUMA system with 12 cores, which
could be more likely to trigger races.

Export is tmpfs.

--
Chuck Lever
chucklever@gmail.com
* Re: NULL dereference in rpcauth_lookup_credcache
From: Trond Myklebust @ 2018-11-12 18:18 UTC
  To: chucklever; +Cc: bfields, schumakeranna, linux-nfs

On Mon, 2018-11-12 at 10:16 -0800, Chuck Lever wrote:
> Some observations, for what they are worth:
>
> Single user test running with no other NFS workload.
>
> I see the BUG fire at umount time, not during the test.
>
> My client is a two-node NUMA system with 12 cores, which
> could be more likely to trigger races.
>
> Export is tmpfs.

Thanks! That's useful info. Particularly the observation that you're
seeing it at umount time...

--
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space
* Re: NULL dereference in rpcauth_lookup_credcache
From: bfields @ 2018-11-12 18:24 UTC
  To: Trond Myklebust; +Cc: chucklever, schumakeranna, linux-nfs

On Mon, Nov 12, 2018 at 05:59:33PM +0000, Trond Myklebust wrote:
> I'm having trouble reproducing this bug. I've tried both cthon and
> xfstests in a loop, so far without success (both NFSv3 and v4.1, but
> only sec=sys). Is there anything else you're doing that I might try?
>
> e.g. Are you running multiple workloads in parallel? Different users?..

Nothing that interesting.  Currently it's connectathon over v4, v3,
v4/krb5, v3/krb5, v4/krb5i, v4/krb5p, v4.1, v4.1/krb5, but just serially
one after the other.  Then some pynfs tests (which bypass the client),
then xfstests over v4.2/sys.  And also a few one-off locking tests of my
own that probably aren't a factor here.

(Hah, I just realized I was mounting with vers=4 and assuming that meant
4.0, but actually it's changed over time depending on the defaults, so
currently those "v4" runs are actually all 4.2.  Gah.)

--b.
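The vers=4 surprise mentioned above is easy to avoid by spelling the minor version out in every run. The following is only an illustrative Python sketch, not the actual test harness: the helper and list names are invented, the option string copies the shape of the MNTOPTIONS line quoted earlier in the thread, and it assumes that the unqualified "v4" and "v3" runs use sec=sys.

```python
# Hypothetical helper; nothing here comes from the real regression scripts.
def mount_options(vers, sec, proto="tcp"):
    """Build an option string shaped like the MNTOPTIONS line in the thread."""
    return f"rw,proto={proto},vers={vers},sec={sec}"


# (version, security flavor) for the serial connectathon runs listed in the
# message, with the ambiguous "v4" written as 4.2, which is what the message
# says it currently resolves to. sec=sys for the plain runs is an assumption.
CTHON_MATRIX = [
    ("4.2", "sys"),
    ("3", "sys"),
    ("4.2", "krb5"),
    ("3", "krb5"),
    ("4.2", "krb5i"),
    ("4.2", "krb5p"),
    ("4.1", "sys"),
    ("4.1", "krb5"),
]


def cthon_mount_options():
    """One option string per run, in the order the runs are executed."""
    return [mount_options(vers, sec) for vers, sec in CTHON_MATRIX]


if __name__ == "__main__":
    for opts in cthon_mount_options():
        print(opts)
```

Writing the minor version explicitly means a change in the client's default no longer silently changes what a "v4" run is testing.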
* Re: NULL dereference in rpcauth_lookup_credcache
From: Trond Myklebust @ 2018-11-12 21:17 UTC
  To: bfields; +Cc: schumakeranna, chucklever, linux-nfs

On Mon, 2018-11-12 at 13:24 -0500, bfields@fieldses.org wrote:
> Nothing that interesting.  Currently it's connectathon over v4, v3,
> v4/krb5, v3/krb5, v4/krb5i, v4/krb5p, v4.1, v4.1/krb5, but just serially
> one after the other.  Then some pynfs tests (which bypass the client),
> then xfstests over v4.2/sys.  And also a few one-off locking tests of my
> own that probably aren't a factor here.
>
> (Hah, I just realized I was mounting with vers=4 and assuming that meant
> 4.0, but actually it's changed over time depending on the defaults, so
> currently those "v4" runs are actually all 4.2.  Gah.)

Are you perhaps both using RPCSEC_GSS w/ integrity checking for your
EXCHANGE_ID authentication? The client will attempt to use that by
default if rpc.gssd is running.

I ask because I think the issue might be with RPCSEC_GSS, specifically
with the RPCSEC_GSS context destroy code, hence the 2 patches that I
just sent out.

Cheers
  Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: NULL dereference in rpcauth_lookup_credcache
From: bfields @ 2018-11-12 23:01 UTC
  To: Trond Myklebust; +Cc: schumakeranna, chucklever, linux-nfs

On Mon, Nov 12, 2018 at 09:17:16PM +0000, Trond Myklebust wrote:
> Are you perhaps both using RPCSEC_GSS w/ integrity checking for your
> EXCHANGE_ID authentication? The client will attempt to use that by
> default if rpc.gssd is running.

Yes, in addition to the krb5i mount I'd expect the sys/krb5/krb5p mounts
are using krb5i for EXCHANGE_ID.

> I ask because I think the issue might be with RPCSEC_GSS, specifically
> with the RPCSEC_GSS context destroy code, hence the 2 patches that I
> just sent out.

Looks like my tests pass after applying those two patches.

--b.
* Re: NULL dereference in rpcauth_lookup_credcache
From: Trond Myklebust @ 2018-11-12 23:57 UTC
  To: bfields; +Cc: schumakeranna, chucklever, linux-nfs

On Mon, 2018-11-12 at 18:01 -0500, bfields@fieldses.org wrote:
> On Mon, Nov 12, 2018 at 09:17:16PM +0000, Trond Myklebust wrote:
> > Are you perhaps both using RPCSEC_GSS w/ integrity checking for your
> > EXCHANGE_ID authentication? The client will attempt to use that by
> > default if rpc.gssd is running.
>
> Yes, in addition to the krb5i mount I'd expect the sys/krb5/krb5p mounts
> are using krb5i for EXCHANGE_ID.
>
> > I ask because I think the issue might be with RPCSEC_GSS, specifically
> > with the RPCSEC_GSS context destroy code, hence the 2 patches that I
> > just sent out.
>
> Looks like my tests pass after applying those two patches.

Cool! Thanks for testing.

Chuck, do you think the above might also explain your sighting of the
same Oops?

Cheers
  Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: NULL dereference in rpcauth_lookup_credcache
From: Chuck Lever @ 2018-11-13 0:00 UTC
To: Trond Myklebust; +Cc: bfields, schumakeranna, chucklever, linux-nfs

> On Nov 12, 2018, at 3:57 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
>> On Mon, 2018-11-12 at 18:01 -0500, bfields@fieldses.org wrote:
>>> On Mon, Nov 12, 2018 at 09:17:16PM +0000, Trond Myklebust wrote:
>>>> On Mon, 2018-11-12 at 13:24 -0500, bfields@fieldses.org wrote:
>>>>> On Mon, Nov 12, 2018 at 05:59:33PM +0000, Trond Myklebust wrote:
>>>>>> On Sat, 2018-11-10 at 16:49 -0500, Bruce Fields wrote:
>>>>>> Looks like it's the fault of
>>>>>>
>>>>>> 07d02a67b7faae "SUNRPC: Simplify lookup code"
>>>>>
>>>>> I'm having trouble reproducing this bug. I've tried both cthon
>>>>> and xfstests in a loop, so far without success (both NFSv3 and
>>>>> v4.1, but only sec=sys). Is there anything else you're doing
>>>>> that I might try?
>>>>>
>>>>> e.g. Are you running multiple workloads in parallel? Different
>>>>> users?..
>>>>
>>>> Nothing that interesting. Currently it's connectathon over v4, v3,
>>>> v4/krb5, v3/krb5, v4/krb5i, v4/krb5p, v4.1, v4.1/krb5, but just
>>>> serially one after the other. Then some pynfs tests (which bypass
>>>> the client), then xfstests over v4.2/sys. And also a few one-off
>>>> locking tests of my own that probably aren't a factor here.
>>>>
>>>> (Hah, I just realized I was mounting with vers=4 and assuming that
>>>> meant 4.0, but actually it's changed over time depending on the
>>>> defaults, so currently those "v4" runs are actually all 4.2. Gah.)
>>>
>>> Are you perhaps both using RPCSEC_GSS w/ integrity checking for your
>>> EXCHANGE_ID authentication? The client will attempt to use that by
>>> default if rpc.gssd is running.
>>
>> Yes, in addition to the krb5i mount I'd expect the sys/krb5/krb5p
>> mounts are using krb5i for EXCHANGE_ID.
>>
>>> I ask because I think the issue might be with RPCSEC_GSS,
>>> specifically with the RPCSEC_GSS context destroy code, hence the
>>> 2 patches that I just sent out.
>>
>> Looks like my tests pass after applying those two patches.
>>
>
> Cool! Thanks for testing.
>
> Chuck, do you think the above might also explain your sighting of the
> same Oops?

Could be, I don’t think I saw it until I started testing NFSv4.
I won’t be able to confirm that until next week.

> Cheers
> Trond
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>
* Re: NULL dereference in rpcauth_lookup_credcache
From: Trond Myklebust @ 2018-11-13 0:08 UTC
To: chuck.lever; +Cc: bfields, schumakeranna, chucklever, linux-nfs

On Mon, 2018-11-12 at 16:00 -0800, Chuck Lever wrote:
> > On Nov 12, 2018, at 3:57 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> >
> > > On Mon, 2018-11-12 at 18:01 -0500, bfields@fieldses.org wrote:
> > > > On Mon, Nov 12, 2018 at 09:17:16PM +0000, Trond Myklebust wrote:
> > > > > On Mon, 2018-11-12 at 13:24 -0500, bfields@fieldses.org wrote:
> > > > > > On Mon, Nov 12, 2018 at 05:59:33PM +0000, Trond Myklebust wrote:
> > > > > > > On Sat, 2018-11-10 at 16:49 -0500, Bruce Fields wrote:
> > > > > > > Looks like it's the fault of
> > > > > > >
> > > > > > > 07d02a67b7faae "SUNRPC: Simplify lookup code"
> > > > > >
> > > > > > I'm having trouble reproducing this bug. I've tried both
> > > > > > cthon and xfstests in a loop, so far without success (both
> > > > > > NFSv3 and v4.1, but only sec=sys). Is there anything else
> > > > > > you're doing that I might try?
> > > > > >
> > > > > > e.g. Are you running multiple workloads in parallel?
> > > > > > Different users?..
> > > > >
> > > > > Nothing that interesting. Currently it's connectathon over v4,
> > > > > v3, v4/krb5, v3/krb5, v4/krb5i, v4/krb5p, v4.1, v4.1/krb5, but
> > > > > just serially one after the other. Then some pynfs tests
> > > > > (which bypass the client), then xfstests over v4.2/sys. And
> > > > > also a few one-off locking tests of my own that probably
> > > > > aren't a factor here.
> > > > >
> > > > > (Hah, I just realized I was mounting with vers=4 and assuming
> > > > > that meant 4.0, but actually it's changed over time depending
> > > > > on the defaults, so currently those "v4" runs are actually all
> > > > > 4.2. Gah.)
> > > >
> > > > Are you perhaps both using RPCSEC_GSS w/ integrity checking for
> > > > your EXCHANGE_ID authentication? The client will attempt to use
> > > > that by default if rpc.gssd is running.
> > >
> > > Yes, in addition to the krb5i mount I'd expect the sys/krb5/krb5p
> > > mounts are using krb5i for EXCHANGE_ID.
> > >
> > > > I ask because I think the issue might be with RPCSEC_GSS,
> > > > specifically with the RPCSEC_GSS context destroy code, hence the
> > > > 2 patches that I just sent out.
> > >
> > > Looks like my tests pass after applying those two patches.
> > >
> >
> > Cool! Thanks for testing.
> >
> > Chuck, do you think the above might also explain your sighting of
> > the same Oops?
>
> Could be, I don’t think I saw it until I started testing NFSv4.
> I won’t be able to confirm that until next week.
>

OK. Either way, I know that part of the GSS code needs to be fixed in
order to deal with the reference count being 0, so I think it is worth
merging this patch now, and then we can see if there is more to the
regression when you can get back to your test rig.

Thanks
Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: NULL dereference in rpcauth_lookup_credcache
From: Chuck Lever @ 2018-11-13 0:17 UTC
To: Trond Myklebust; +Cc: bfields, schumakeranna, chucklever, linux-nfs

> On Nov 12, 2018, at 4:08 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Mon, 2018-11-12 at 16:00 -0800, Chuck Lever wrote:
>>> On Nov 12, 2018, at 3:57 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>>>
>>>>> On Mon, 2018-11-12 at 18:01 -0500, bfields@fieldses.org wrote:
>>>>> On Mon, Nov 12, 2018 at 09:17:16PM +0000, Trond Myklebust wrote:
>>>>>> On Mon, 2018-11-12 at 13:24 -0500, bfields@fieldses.org wrote:
>>>>>>> On Mon, Nov 12, 2018 at 05:59:33PM +0000, Trond Myklebust wrote:
>>>>>>>> On Sat, 2018-11-10 at 16:49 -0500, Bruce Fields wrote:
>>>>>>>> Looks like it's the fault of
>>>>>>>>
>>>>>>>> 07d02a67b7faae "SUNRPC: Simplify lookup code"
>>>>>>>
>>>>>>> I'm having trouble reproducing this bug. I've tried both cthon
>>>>>>> and xfstests in a loop, so far without success (both NFSv3 and
>>>>>>> v4.1, but only sec=sys). Is there anything else you're doing
>>>>>>> that I might try?
>>>>>>>
>>>>>>> e.g. Are you running multiple workloads in parallel? Different
>>>>>>> users?..
>>>>>>
>>>>>> Nothing that interesting. Currently it's connectathon over v4,
>>>>>> v3, v4/krb5, v3/krb5, v4/krb5i, v4/krb5p, v4.1, v4.1/krb5, but
>>>>>> just serially one after the other. Then some pynfs tests (which
>>>>>> bypass the client), then xfstests over v4.2/sys. And also a few
>>>>>> one-off locking tests of my own that probably aren't a factor
>>>>>> here.
>>>>>>
>>>>>> (Hah, I just realized I was mounting with vers=4 and assuming
>>>>>> that meant 4.0, but actually it's changed over time depending on
>>>>>> the defaults, so currently those "v4" runs are actually all 4.2.
>>>>>> Gah.)
>>>>>
>>>>> Are you perhaps both using RPCSEC_GSS w/ integrity checking for
>>>>> your EXCHANGE_ID authentication? The client will attempt to use
>>>>> that by default if rpc.gssd is running.
>>>>
>>>> Yes, in addition to the krb5i mount I'd expect the sys/krb5/krb5p
>>>> mounts are using krb5i for EXCHANGE_ID.
>>>>
>>>>> I ask because I think the issue might be with RPCSEC_GSS,
>>>>> specifically with the RPCSEC_GSS context destroy code, hence the
>>>>> 2 patches that I just sent out.
>>>>
>>>> Looks like my tests pass after applying those two patches.
>>>>
>>>
>>> Cool! Thanks for testing.
>>>
>>> Chuck, do you think the above might also explain your sighting of
>>> the same Oops?
>>
>> Could be, I don’t think I saw it until I started testing NFSv4.
>> I won’t be able to confirm that until next week.
>>
>
> OK. Either way, I know that part of the GSS code needs to be fixed in
> order to deal with the reference count being 0, so I think it is worth
> merging this patch now, and then we can see if there is more to the
> regression when you can get back to your test rig.

Sounds fine to me.

> Thanks
> Trond
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>
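[Editorial illustration, not part of the original thread.] Trond's remark quoted above, that part of the GSS code has to cope with the reference count being 0, points at a general hazard: a lookup can find an object whose last reference has already been dropped and which is on its way to being destroyed. The sketch below is not the patch discussed in this thread and is not SUNRPC code. It is a minimal userspace C11 illustration of the "take a reference only if the count is still non-zero" idiom, which the kernel provides as refcount_inc_not_zero(). The names struct demo_cred, demo_get_unless_zero() and demo_put() are hypothetical and exist only for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cached object; NOT the kernel's struct rpc_cred. */
struct demo_cred {
	atomic_int refcount;
};

/*
 * Take a reference only if the object is still live (count > 0).
 * Returns false when the count has already reached 0, so the caller
 * must treat the object as gone and not touch it further.
 */
static bool demo_get_unless_zero(struct demo_cred *c)
{
	int old = atomic_load(&c->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value and we retry. */
		if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller released the last one. */
static bool demo_put(struct demo_cred *c)
{
	return atomic_fetch_sub(&c->refcount, 1) == 1;
}
```

A lookup path written this way skips an entry for which demo_get_unless_zero() returns false, rather than reviving an object that another thread is already tearing down. Whether the two patches mentioned in the thread take this shape is not stated here.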
Thread overview: 13+ messages

2018-11-08 21:44 NULL dereference in rpcauth_lookup_credcache J. Bruce Fields
2018-11-09 18:01 ` Chuck Lever
2018-11-10 21:49 ` Bruce Fields
2018-11-12 17:59 ` Trond Myklebust
2018-11-12 18:16 ` Chuck Lever
2018-11-12 18:18 ` Trond Myklebust
2018-11-12 18:24 ` bfields
2018-11-12 21:17 ` Trond Myklebust
2018-11-12 23:01 ` bfields
2018-11-12 23:57 ` Trond Myklebust
2018-11-13 0:00 ` Chuck Lever
2018-11-13 0:08 ` Trond Myklebust
2018-11-13 0:17 ` Chuck Lever