* kernel oops
@ 2023-03-31 14:23 Olga Kornievskaia
2023-03-31 14:25 ` Chuck Lever III
0 siblings, 1 reply; 12+ messages in thread
From: Olga Kornievskaia @ 2023-03-31 14:23 UTC (permalink / raw)
To: Chuck Lever; +Cc: kernel-tls-handshake
Hi Chuck,
Have you seen this oops? This is not my testing so all I know was that
it was an attempt to mount with xprtsec=tls (against ONTAP).
[ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 316.939848] #PF: supervisor read access in kernel mode
[ 316.939914] #PF: error_code(0x0000) - not-present page
[ 316.939984] PGD 0 P4D 0
[ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
[ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
tainted 6.2.1+ #2
[ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
[ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
[ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
[ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
24 e8
[ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
[ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
[ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
[ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
[ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
[ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
[ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
knlGS:0000000000000000
[ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
[ 316.941391] PKRU: 55555554
[ 316.941443] Call Trace:
[ 316.941513] <TASK>
[ 316.941578] process_one_work+0x1b0/0x3c0
[ 316.941689] worker_thread+0x30/0x360
[ 316.941755] ? __pfx_worker_thread+0x10/0x10
[ 316.941822] kthread+0xd7/0x100
[ 316.941900] ? __pfx_kthread+0x10/0x10
[ 316.941968] ret_from_fork+0x29/0x50
[ 316.942049] </TASK>
[ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
rfkill mbcache jbd2 loop vsock_loopback
vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
ipmi_msghandler fuse
[ 316.942550] CR2: 0000000000000018
Please note this was still on "commit
caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
origin/upcall-v7)". If you think somethings done to address it I'll
have them update to the latest.
Thank you.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-03-31 14:23 kernel oops Olga Kornievskaia
@ 2023-03-31 14:25 ` Chuck Lever III
2023-04-04 14:24 ` Olga Kornievskaia
0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2023-03-31 14:25 UTC (permalink / raw)
To: Olga Kornievskaia; +Cc: kernel-tls-handshake
> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> Hi Chuck,
>
> Have you seen this oops? This is not my testing so all I know was that
> it was an attempt to mount with xprtsec=tls (against ONTAP).
>
> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [ 316.939848] #PF: supervisor read access in kernel mode
> [ 316.939914] #PF: error_code(0x0000) - not-present page
> [ 316.939984] PGD 0 P4D 0
> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
> tainted 6.2.1+ #2
> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
> 24 e8
> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
> knlGS:0000000000000000
> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
> [ 316.941391] PKRU: 55555554
> [ 316.941443] Call Trace:
> [ 316.941513] <TASK>
> [ 316.941578] process_one_work+0x1b0/0x3c0
> [ 316.941689] worker_thread+0x30/0x360
> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
> [ 316.941822] kthread+0xd7/0x100
> [ 316.941900] ? __pfx_kthread+0x10/0x10
> [ 316.941968] ret_from_fork+0x29/0x50
> [ 316.942049] </TASK>
> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
> rfkill mbcache jbd2 loop vsock_loopback
> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
> ipmi_msghandler fuse
> [ 316.942550] CR2: 0000000000000018
>
> Please note this was still on "commit
> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
> origin/upcall-v7)". If you think somethings done to address it I'll
> have them update to the latest.
Update and retry. It has a familiar ring, but I can't say I've
seen this recently.
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-03-31 14:25 ` Chuck Lever III
@ 2023-04-04 14:24 ` Olga Kornievskaia
2023-04-04 14:31 ` Chuck Lever III
0 siblings, 1 reply; 12+ messages in thread
From: Olga Kornievskaia @ 2023-04-04 14:24 UTC (permalink / raw)
To: Chuck Lever III; +Cc: kernel-tls-handshake
On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > Hi Chuck,
> >
> > Have you seen this oops? This is not my testing so all I know was that
> > it was an attempt to mount with xprtsec=tls (against ONTAP).
> >
> > [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
> > [ 316.939848] #PF: supervisor read access in kernel mode
> > [ 316.939914] #PF: error_code(0x0000) - not-present page
> > [ 316.939984] PGD 0 P4D 0
> > [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
> > [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
> > tainted 6.2.1+ #2
> > [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> > Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> > [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
> > [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
> > [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
> > ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
> > ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
> > 24 e8
> > [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
> > [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
> > [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
> > [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
> > [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
> > [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
> > [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
> > knlGS:0000000000000000
> > [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
> > [ 316.941391] PKRU: 55555554
> > [ 316.941443] Call Trace:
> > [ 316.941513] <TASK>
> > [ 316.941578] process_one_work+0x1b0/0x3c0
> > [ 316.941689] worker_thread+0x30/0x360
> > [ 316.941755] ? __pfx_worker_thread+0x10/0x10
> > [ 316.941822] kthread+0xd7/0x100
> > [ 316.941900] ? __pfx_kthread+0x10/0x10
> > [ 316.941968] ret_from_fork+0x29/0x50
> > [ 316.942049] </TASK>
> > [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> > dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
> > rfkill mbcache jbd2 loop vsock_loopback
> > vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> > sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
> > isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
> > crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
> > i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
> > cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
> > sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
> > vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
> > nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
> > ipmi_msghandler fuse
> > [ 316.942550] CR2: 0000000000000018
> >
> > Please note this was still on "commit
> > caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
> > origin/upcall-v7)". If you think somethings done to address it I'll
> > have them update to the latest.
>
> Update and retry. It has a familiar ring, but I can't say I've
> seen this recently.
Hi Chuck,
I've updated the kernel code to the latest commit
46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
client is still crashing but looks like in a different place. This
happens when the server terminates the connection (immediately) as the
tls handshake happens.
Here's the stack trace:
[ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
[ 677.274424] #PF: supervisor read access in kernel mode
[ 677.274517] #PF: error_code(0x0000) - not-present page
[ 677.274609] PGD 0 P4D 0
[ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
[ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
tainted 6.3.0-rc4+ #13
[ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
[ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
[ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
[ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
01 41
[ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
[ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
[ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
[ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
[ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
[ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
[ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
knlGS:0000000000000000
[ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
[ 677.276355] PKRU: 55555554
[ 677.276427] Call Trace:
[ 677.276521] <TASK>
[ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
[ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
[ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
[ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
[ 677.277086] process_one_work+0x1b0/0x3c0
[ 677.277209] worker_thread+0x34/0x360
[ 677.277303] ? __pfx_worker_thread+0x10/0x10
[ 677.277396] kthread+0xdb/0x110
[ 677.277527] ? __pfx_kthread+0x10/0x10
[ 677.277619] ret_from_fork+0x29/0x50
[ 677.277731] </TASK>
[ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
crc64 ipmi_devintf ipmi_msghandler fuse
[ 677.278383] CR2: 0000000000000030
Network trace from the server:
[kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=4059503415 TSecr=0 WS=128
2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=2256845073 TSecr=4059503415
3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
TSecr=2256845073
4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
V4 NULL Call
5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
V4 NULL Reply (Call In 4)
6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
TSecr=2256845073
7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
TSecr=4059503416
>
>
> --
> Chuck Lever
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-04-04 14:24 ` Olga Kornievskaia
@ 2023-04-04 14:31 ` Chuck Lever III
2023-04-04 14:44 ` Olga Kornievskaia
0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2023-04-04 14:31 UTC (permalink / raw)
To: Olga Kornievskaia; +Cc: kernel-tls-handshake
> On Apr 4, 2023, at 10:24 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>
>>
>>
>>> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>
>>> Hi Chuck,
>>>
>>> Have you seen this oops? This is not my testing so all I know was that
>>> it was an attempt to mount with xprtsec=tls (against ONTAP).
>>>
>>> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
>>> [ 316.939848] #PF: supervisor read access in kernel mode
>>> [ 316.939914] #PF: error_code(0x0000) - not-present page
>>> [ 316.939984] PGD 0 P4D 0
>>> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
>>> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
>>> tainted 6.2.1+ #2
>>> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
>>> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
>>> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
>>> 24 e8
>>> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
>>> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
>>> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
>>> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
>>> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
>>> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
>>> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
>>> knlGS:0000000000000000
>>> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
>>> [ 316.941391] PKRU: 55555554
>>> [ 316.941443] Call Trace:
>>> [ 316.941513] <TASK>
>>> [ 316.941578] process_one_work+0x1b0/0x3c0
>>> [ 316.941689] worker_thread+0x30/0x360
>>> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
>>> [ 316.941822] kthread+0xd7/0x100
>>> [ 316.941900] ? __pfx_kthread+0x10/0x10
>>> [ 316.941968] ret_from_fork+0x29/0x50
>>> [ 316.942049] </TASK>
>>> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
>>> rfkill mbcache jbd2 loop vsock_loopback
>>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
>>> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
>>> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
>>> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
>>> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
>>> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
>>> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
>>> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
>>> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
>>> ipmi_msghandler fuse
>>> [ 316.942550] CR2: 0000000000000018
>>>
>>> Please note this was still on "commit
>>> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
>>> origin/upcall-v7)". If you think somethings done to address it I'll
>>> have them update to the latest.
>>
>> Update and retry. It has a familiar ring, but I can't say I've
>> seen this recently.
>
> Hi Chuck,
>
> I've updated the kernel code to the latest commit
> 46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
> chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
> client is still crashing but looks like in a different place. This
> happens when the server terminates the connection (immediately) as the
> tls handshake happens.
Why is the server doing that?
> Here's the stack trace:
> [ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
> [ 677.274424] #PF: supervisor read access in kernel mode
> [ 677.274517] #PF: error_code(0x0000) - not-present page
> [ 677.274609] PGD 0 P4D 0
> [ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
> tainted 6.3.0-rc4+ #13
> [ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> [ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
> [ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
scripts/faddr2line on this RIP
Which pointer is NULL?
> [ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
> 55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
> 20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
> 01 41
> [ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
> [ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
> [ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
> [ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
> [ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
> [ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
> [ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
> knlGS:0000000000000000
> [ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
> [ 677.276355] PKRU: 55555554
> [ 677.276427] Call Trace:
> [ 677.276521] <TASK>
> [ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
> [ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
> [ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
> [ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
> [ 677.277086] process_one_work+0x1b0/0x3c0
> [ 677.277209] worker_thread+0x34/0x360
> [ 677.277303] ? __pfx_worker_thread+0x10/0x10
> [ 677.277396] kthread+0xdb/0x110
> [ 677.277527] ? __pfx_kthread+0x10/0x10
> [ 677.277619] ret_from_fork+0x29/0x50
> [ 677.277731] </TASK>
> [ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
> sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
> serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
> dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
> crc64 ipmi_devintf ipmi_msghandler fuse
> [ 677.278383] CR2: 0000000000000030
>
> Network trace from the server:
>
> [kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
> 1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
> 727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=4059503415 TSecr=0 WS=128
> 2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
> 2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=2256845073 TSecr=4059503415
> 3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
> 727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
> TSecr=2256845073
> 4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
> V4 NULL Call
> 5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
> V4 NULL Reply (Call In 4)
> 6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
> 727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
> TSecr=2256845073
> 7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
> 2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
> TSecr=4059503416
>
>
>>
>>
>> --
>> Chuck Lever
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-04-04 14:31 ` Chuck Lever III
@ 2023-04-04 14:44 ` Olga Kornievskaia
2023-04-04 14:54 ` Chuck Lever III
0 siblings, 1 reply; 12+ messages in thread
From: Olga Kornievskaia @ 2023-04-04 14:44 UTC (permalink / raw)
To: Chuck Lever III; +Cc: kernel-tls-handshake
On Tue, Apr 4, 2023 at 10:32 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Apr 4, 2023, at 10:24 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>
> >>> Hi Chuck,
> >>>
> >>> Have you seen this oops? This is not my testing so all I know was that
> >>> it was an attempt to mount with xprtsec=tls (against ONTAP).
> >>>
> >>> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
> >>> [ 316.939848] #PF: supervisor read access in kernel mode
> >>> [ 316.939914] #PF: error_code(0x0000) - not-present page
> >>> [ 316.939984] PGD 0 P4D 0
> >>> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
> >>> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
> >>> tainted 6.2.1+ #2
> >>> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> >>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> >>> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
> >>> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
> >>> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
> >>> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
> >>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
> >>> 24 e8
> >>> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
> >>> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
> >>> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
> >>> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
> >>> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
> >>> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
> >>> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
> >>> knlGS:0000000000000000
> >>> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
> >>> [ 316.941391] PKRU: 55555554
> >>> [ 316.941443] Call Trace:
> >>> [ 316.941513] <TASK>
> >>> [ 316.941578] process_one_work+0x1b0/0x3c0
> >>> [ 316.941689] worker_thread+0x30/0x360
> >>> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
> >>> [ 316.941822] kthread+0xd7/0x100
> >>> [ 316.941900] ? __pfx_kthread+0x10/0x10
> >>> [ 316.941968] ret_from_fork+0x29/0x50
> >>> [ 316.942049] </TASK>
> >>> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> >>> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
> >>> rfkill mbcache jbd2 loop vsock_loopback
> >>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> >>> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
> >>> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
> >>> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
> >>> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
> >>> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
> >>> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
> >>> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
> >>> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
> >>> ipmi_msghandler fuse
> >>> [ 316.942550] CR2: 0000000000000018
> >>>
> >>> Please note this was still on "commit
> >>> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
> >>> origin/upcall-v7)". If you think somethings done to address it I'll
> >>> have them update to the latest.
> >>
> >> Update and retry. It has a familiar ring, but I can't say I've
> >> seen this recently.
> >
> > Hi Chuck,
> >
> > I've updated the kernel code to the latest commit
> > 46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
> > chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
> > client is still crashing but looks like in a different place. This
> > happens when the server terminates the connection (immediately) as the
> > tls handshake happens.
>
> Why is the server doing that?
I'm not sure what kind of a reason you are looking for. I think the
server can send an RST at any point in a TCP connection and the client
shouldn't crash. But the real reason is probably because the server's
code is getting developed and they ran into some error and resetting
the connection.
> > Here's the stack trace:
> > [ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
> > [ 677.274424] #PF: supervisor read access in kernel mode
> > [ 677.274517] #PF: error_code(0x0000) - not-present page
> > [ 677.274609] PGD 0 P4D 0
> > [ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
> > [ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
> > tainted 6.3.0-rc4+ #13
> > [ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> > Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> > [ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
> > [ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
>
> scripts/faddr2line on this RIP
>
> Which pointer is NULL?
I apologize I don't have that info and I'm working on trying to figure
out how to get this information to you. Lab kernels are built with
CONFIG_DEBUG_INFO disabled and thus I can't do faddr2line. I will
attempt to build on with debugging enabled (has been problematic
before for the lab machines so not sure if that'll be successful I
might have to hack the linux kernel to simulate the reset).
> > [ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
> > 55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
> > 20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
> > 01 41
> > [ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
> > [ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
> > [ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
> > [ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
> > [ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
> > [ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
> > [ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
> > knlGS:0000000000000000
> > [ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
> > [ 677.276355] PKRU: 55555554
> > [ 677.276427] Call Trace:
> > [ 677.276521] <TASK>
> > [ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
> > [ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
> > [ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
> > [ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
> > [ 677.277086] process_one_work+0x1b0/0x3c0
> > [ 677.277209] worker_thread+0x34/0x360
> > [ 677.277303] ? __pfx_worker_thread+0x10/0x10
> > [ 677.277396] kthread+0xdb/0x110
> > [ 677.277527] ? __pfx_kthread+0x10/0x10
> > [ 677.277619] ret_from_fork+0x29/0x50
> > [ 677.277731] </TASK>
> > [ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> > dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
> > jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
> > vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
> > intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
> > libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
> > vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
> > drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
> > sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
> > serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
> > dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
> > crc64 ipmi_devintf ipmi_msghandler fuse
> > [ 677.278383] CR2: 0000000000000030
> >
> > Network trace from the server:
> >
> > [kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
> > 1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
> > 727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> > TSval=4059503415 TSecr=0 WS=128
> > 2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
> > 2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> > SACK_PERM=1 TSval=2256845073 TSecr=4059503415
> > 3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
> > 727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
> > TSecr=2256845073
> > 4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
> > V4 NULL Call
> > 5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
> > V4 NULL Reply (Call In 4)
> > 6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
> > 727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
> > TSecr=2256845073
> > 7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
> > 2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
> > TSecr=4059503416
> >
> >
> >>
> >>
> >> --
> >> Chuck Lever
>
>
> --
> Chuck Lever
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-04-04 14:44 ` Olga Kornievskaia
@ 2023-04-04 14:54 ` Chuck Lever III
2023-04-04 16:15 ` Olga Kornievskaia
0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2023-04-04 14:54 UTC (permalink / raw)
To: Olga Kornievskaia; +Cc: kernel-tls-handshake
> On Apr 4, 2023, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Tue, Apr 4, 2023 at 10:32 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>
>>> On Apr 4, 2023, at 10:24 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>
>>> On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>
>>>>
>>>>
>>>>> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>
>>>>> Hi Chuck,
>>>>>
>>>>> Have you seen this oops? This is not my testing so all I know was that
>>>>> it was an attempt to mount with xprtsec=tls (against ONTAP).
>>>>>
>>>>> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
>>>>> [ 316.939848] #PF: supervisor read access in kernel mode
>>>>> [ 316.939914] #PF: error_code(0x0000) - not-present page
>>>>> [ 316.939984] PGD 0 P4D 0
>>>>> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
>>>>> tainted 6.2.1+ #2
>>>>> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>>>> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>>>> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
>>>>> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
>>>>> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
>>>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
>>>>> 24 e8
>>>>> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
>>>>> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
>>>>> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
>>>>> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
>>>>> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
>>>>> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
>>>>> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
>>>>> [ 316.941391] PKRU: 55555554
>>>>> [ 316.941443] Call Trace:
>>>>> [ 316.941513] <TASK>
>>>>> [ 316.941578] process_one_work+0x1b0/0x3c0
>>>>> [ 316.941689] worker_thread+0x30/0x360
>>>>> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
>>>>> [ 316.941822] kthread+0xd7/0x100
>>>>> [ 316.941900] ? __pfx_kthread+0x10/0x10
>>>>> [ 316.941968] ret_from_fork+0x29/0x50
>>>>> [ 316.942049] </TASK>
>>>>> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>>>> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
>>>>> rfkill mbcache jbd2 loop vsock_loopback
>>>>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
>>>>> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
>>>>> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
>>>>> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
>>>>> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
>>>>> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
>>>>> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
>>>>> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
>>>>> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
>>>>> ipmi_msghandler fuse
>>>>> [ 316.942550] CR2: 0000000000000018
>>>>>
>>>>> Please note this was still on "commit
>>>>> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
>>>>> origin/upcall-v7)". If you think somethings done to address it I'll
>>>>> have them update to the latest.
>>>>
>>>> Update and retry. It has a familiar ring, but I can't say I've
>>>> seen this recently.
>>>
>>> Hi Chuck,
>>>
>>> I've updated the kernel code to the latest commit
>>> 46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
>>> chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
>>> client is still crashing but looks like in a different place. This
>>> happens when the server terminates the connection (immediately) as the
>>> tls handshake happens.
>>
>> Why is the server doing that?
>
> I'm not sure what kind of a reason you are looking for. I think the
> server can send an RST at any point in a TCP connection and the client
> shouldn't crash.
No doubt.
> But the real reason is probably because the server's
> code is getting developed and they ran into some error and resetting
> the connection.
That's what I was looking for. I wanted to know if it was
Linux NFSD doing this.
>>> Here's the stack trace:
>>> [ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
>>> [ 677.274424] #PF: supervisor read access in kernel mode
>>> [ 677.274517] #PF: error_code(0x0000) - not-present page
>>> [ 677.274609] PGD 0 P4D 0
>>> [ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
>>> [ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
>>> tainted 6.3.0-rc4+ #13
>>> [ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>> [ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>> [ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
>>
>> scripts/faddr2line on this RIP
>>
>> Which pointer is NULL?
>
> I apologize I don't have that info and I'm working on trying to figure
> out how to get this information to you. Lab kernels are built with
> CONFIG_DEBUG_INFO disabled and thus I can't do faddr2line. I will
> attempt to build on with debugging enabled (has been problematic
> before for the lab machines so not sure if that'll be successful I
> might have to hack the linux kernel to simulate the reset).
Jakub asked that NULL checking be removed from handshake_req_cancel,
or at least that's what I read into his comment. That choice is
definitely flushing bugs.
So this crasher is familiar. req_cancel is getting called in some
scenario where the socket or request is partially filled in. We
have to figure out how that's happening.
If debug info is a problem, then just add a sarcastic quantity of
NULL checks in handshake_req_cancel, rebuild, and it should tell
you exactly which pointer is bad.
BPF might also be a solution, but I'm not sure if it relies on
debug info. Might be only BTF is needed.
>>> [ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
>>> 55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
>>> 20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
>>> 01 41
>>> [ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
>>> [ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
>>> [ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
>>> [ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
>>> [ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
>>> [ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
>>> [ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
>>> knlGS:0000000000000000
>>> [ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
>>> [ 677.276355] PKRU: 55555554
>>> [ 677.276427] Call Trace:
>>> [ 677.276521] <TASK>
>>> [ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
>>> [ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
>>> [ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
>>> [ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
>>> [ 677.277086] process_one_work+0x1b0/0x3c0
>>> [ 677.277209] worker_thread+0x34/0x360
>>> [ 677.277303] ? __pfx_worker_thread+0x10/0x10
>>> [ 677.277396] kthread+0xdb/0x110
>>> [ 677.277527] ? __pfx_kthread+0x10/0x10
>>> [ 677.277619] ret_from_fork+0x29/0x50
>>> [ 677.277731] </TASK>
>>> [ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
>>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
>>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
>>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
>>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
>>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
>>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
>>> sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
>>> serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
>>> dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
>>> crc64 ipmi_devintf ipmi_msghandler fuse
>>> [ 677.278383] CR2: 0000000000000030
>>>
>>> Network trace from the server:
>>>
>>> [kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
>>> 1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
>>> 727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
>>> TSval=4059503415 TSecr=0 WS=128
>>> 2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
>>> 2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
>>> SACK_PERM=1 TSval=2256845073 TSecr=4059503415
>>> 3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
>>> 727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
>>> TSecr=2256845073
>>> 4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
>>> V4 NULL Call
>>> 5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
>>> V4 NULL Reply (Call In 4)
>>> 6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
>>> 727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
>>> TSecr=2256845073
>>> 7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
>>> 2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
>>> TSecr=4059503416
>>>
>>>
>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>
>>
>> --
>> Chuck Lever
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-04-04 14:54 ` Chuck Lever III
@ 2023-04-04 16:15 ` Olga Kornievskaia
2023-04-04 17:28 ` Chuck Lever III
0 siblings, 1 reply; 12+ messages in thread
From: Olga Kornievskaia @ 2023-04-04 16:15 UTC (permalink / raw)
To: Chuck Lever III; +Cc: kernel-tls-handshake
On Tue, Apr 4, 2023 at 10:54 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Apr 4, 2023, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > On Tue, Apr 4, 2023 at 10:32 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >>> On Apr 4, 2023, at 10:24 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>
> >>> On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>>>
> >>>>> Hi Chuck,
> >>>>>
> >>>>> Have you seen this oops? This is not my testing so all I know was that
> >>>>> it was an attempt to mount with xprtsec=tls (against ONTAP).
> >>>>>
> >>>>> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
> >>>>> [ 316.939848] #PF: supervisor read access in kernel mode
> >>>>> [ 316.939914] #PF: error_code(0x0000) - not-present page
> >>>>> [ 316.939984] PGD 0 P4D 0
> >>>>> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
> >>>>> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
> >>>>> tainted 6.2.1+ #2
> >>>>> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> >>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> >>>>> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
> >>>>> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
> >>>>> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
> >>>>> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
> >>>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
> >>>>> 24 e8
> >>>>> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
> >>>>> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
> >>>>> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
> >>>>> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
> >>>>> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
> >>>>> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
> >>>>> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
> >>>>> knlGS:0000000000000000
> >>>>> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
> >>>>> [ 316.941391] PKRU: 55555554
> >>>>> [ 316.941443] Call Trace:
> >>>>> [ 316.941513] <TASK>
> >>>>> [ 316.941578] process_one_work+0x1b0/0x3c0
> >>>>> [ 316.941689] worker_thread+0x30/0x360
> >>>>> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
> >>>>> [ 316.941822] kthread+0xd7/0x100
> >>>>> [ 316.941900] ? __pfx_kthread+0x10/0x10
> >>>>> [ 316.941968] ret_from_fork+0x29/0x50
> >>>>> [ 316.942049] </TASK>
> >>>>> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> >>>>> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
> >>>>> rfkill mbcache jbd2 loop vsock_loopback
> >>>>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> >>>>> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
> >>>>> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
> >>>>> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
> >>>>> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
> >>>>> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
> >>>>> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
> >>>>> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
> >>>>> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
> >>>>> ipmi_msghandler fuse
> >>>>> [ 316.942550] CR2: 0000000000000018
> >>>>>
> >>>>> Please note this was still on "commit
> >>>>> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
> >>>>> origin/upcall-v7)". If you think somethings done to address it I'll
> >>>>> have them update to the latest.
> >>>>
> >>>> Update and retry. It has a familiar ring, but I can't say I've
> >>>> seen this recently.
> >>>
> >>> Hi Chuck,
> >>>
> >>> I've updated the kernel code to the latest commit
> >>> 46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
> >>> chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
> >>> client is still crashing but looks like in a different place. This
> >>> happens when the server terminates the connection (immediately) as the
> >>> tls handshake happens.
> >>
> >> Why is the server doing that?
> >
> > I'm not sure what kind of a reason you are looking for. I think the
> > server can send an RST at any point in a TCP connection and the client
> > shouldn't crash.
>
> No doubt.
>
>
> > But the real reason is probably because the server's
> > code is getting developed and they ran into some error and resetting
> > the connection.
>
> That's what I was looking for. I wanted to know if it was
> Linux NFSD doing this.
>
>
> >>> Here's the stack trace:
> >>> [ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
> >>> [ 677.274424] #PF: supervisor read access in kernel mode
> >>> [ 677.274517] #PF: error_code(0x0000) - not-present page
> >>> [ 677.274609] PGD 0 P4D 0
> >>> [ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
> >>> [ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
> >>> tainted 6.3.0-rc4+ #13
> >>> [ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> >>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> >>> [ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
> >>> [ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
> >>
> >> scripts/faddr2line on this RIP
> >>
> >> Which pointer is NULL?
> >
> > I apologize I don't have that info and I'm working on trying to figure
> > out how to get this information to you. Lab kernels are built with
> > CONFIG_DEBUG_INFO disabled and thus I can't do faddr2line. I will
> > attempt to build on with debugging enabled (has been problematic
> > before for the lab machines so not sure if that'll be successful I
> > might have to hack the linux kernel to simulate the reset).
>
> Jakub asked that NULL checking be removed from handshake_req_cancel,
> or at least that's what I read into his comment. That choice is
> definitely flushing bugs.
>
> So this crasher is familiar. req_cancel is getting called in some
> scenario where the socket or request is partially filled in. We
> have to figure out how that's happening.
>
> If debug info is a problem, then just add a sarcastic quantity of
> NULL checks in handshake_req_cancel, rebuild, and it should tell
> you exactly which pointer is bad.
I added a check for sock-sk null and if so return true. I thought that
would "prevent" the crash but it then proceeded to crash elsewhere.
Sorry I'm still on non-debug-symbol kernel (as I thought I'd add the
checks that that'd be enough :)).
[ 284.368211] AGLO: handshake_req_cancel sock->sk is NULL
[ 284.368249] BUG: kernel NULL pointer dereference, address: 0000000000000018
[ 284.368320] #PF: supervisor read access in kernel mode
[ 284.368363] #PF: error_code(0x0000) - not-present page
[ 284.368411] PGD 0 P4D 0
[ 284.368438] Oops: 0000 [#1] PREEMPT SMP PTI
[ 284.368473] CPU: 0 PID: 521 Comm: kworker/u2:30 Kdump: loaded Not
tainted 6.3.0-rc4+ #14
[ 284.368523] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
[ 284.368580] Workqueue: xprtiod xs_tls_connect [sunrpc]
[ 284.368720] RIP: 0010:xs_tls_connect+0x3e5/0x590 [sunrpc]
[ 284.368844] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
ff e8 bf d2 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
24 e8
[ 284.368973] RSP: 0018:ffffba17c0f67dc8 EFLAGS: 00010293
[ 284.369010] RAX: 0000000000000000 RBX: ffff963143c1fe40 RCX: 0000000000000000
[ 284.369051] RDX: 0000000000000000 RSI: ffff9631f7e20840 RDI: ffff9631f7e20840
[ 284.369091] RBP: ffff9631463d8a00 R08: 0000000000000000 R09: c0000000ffff7fff
[ 284.369130] R10: 0000000000000001 R11: ffffba17c0f67bb8 R12: ffff963143c1f800
[ 284.369175] R13: 0000000004208160 R14: ffff963143f6f600 R15: ffff963143c1e000
[ 284.369259] FS: 0000000000000000(0000) GS:ffff9631f7e00000(0000)
knlGS:0000000000000000
[ 284.369319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 284.369367] CR2: 0000000000000018 CR3: 0000000102d12004 CR4: 00000000007706f0
[ 284.369458] PKRU: 55555554
[ 284.369483] Call Trace:
[ 284.369523] <TASK>
[ 284.369583] process_one_work+0x1b0/0x3c0
[ 284.369633] worker_thread+0x34/0x360
[ 284.369668] ? __pfx_worker_thread+0x10/0x10
[ 284.369704] kthread+0xdb/0x110
[ 284.369736] ? __pfx_kthread+0x10/0x10
[ 284.369767] ret_from_fork+0x29/0x50
[ 284.369808] </TASK>
[ 284.369843] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
drm_ttm_helper ttm sr_mod cdrom drm_kms_helper sd_mod syscopyarea
sysfillrect ata_generic sysimgblt sg ata_piix ahci libahci
crc32c_intel serio_raw vmxnet3 libata drm vmw_pvscsi dm_mirror
dm_region_hash dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi
crc64_rocksoft crc64 ipmi_devintf ipmi_msghandler fuse
[ 284.370300] CR2: 0000000000000018
>
> BPF might also be a solution, but I'm not sure if it relies on
> debug info. Might be only BTF is needed.
>
>
> >>> [ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
> >>> 55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
> >>> 20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
> >>> 01 41
> >>> [ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
> >>> [ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
> >>> [ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
> >>> [ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
> >>> [ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
> >>> [ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
> >>> [ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
> >>> knlGS:0000000000000000
> >>> [ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
> >>> [ 677.276355] PKRU: 55555554
> >>> [ 677.276427] Call Trace:
> >>> [ 677.276521] <TASK>
> >>> [ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
> >>> [ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
> >>> [ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
> >>> [ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
> >>> [ 677.277086] process_one_work+0x1b0/0x3c0
> >>> [ 677.277209] worker_thread+0x34/0x360
> >>> [ 677.277303] ? __pfx_worker_thread+0x10/0x10
> >>> [ 677.277396] kthread+0xdb/0x110
> >>> [ 677.277527] ? __pfx_kthread+0x10/0x10
> >>> [ 677.277619] ret_from_fork+0x29/0x50
> >>> [ 677.277731] </TASK>
> >>> [ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> >>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
> >>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
> >>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
> >>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
> >>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
> >>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
> >>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
> >>> sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
> >>> serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
> >>> dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
> >>> crc64 ipmi_devintf ipmi_msghandler fuse
> >>> [ 677.278383] CR2: 0000000000000030
> >>>
> >>> Network trace from the server:
> >>>
> >>> [kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
> >>> 1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
> >>> 727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> >>> TSval=4059503415 TSecr=0 WS=128
> >>> 2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
> >>> 2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> >>> SACK_PERM=1 TSval=2256845073 TSecr=4059503415
> >>> 3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
> >>> 727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
> >>> TSecr=2256845073
> >>> 4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
> >>> V4 NULL Call
> >>> 5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
> >>> V4 NULL Reply (Call In 4)
> >>> 6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
> >>> 727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
> >>> TSecr=2256845073
> >>> 7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
> >>> 2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
> >>> TSecr=4059503416
> >>>
> >>>
> >>>>
> >>>>
> >>>> --
> >>>> Chuck Lever
> >>
> >>
> >> --
> >> Chuck Lever
>
>
> --
> Chuck Lever
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-04-04 16:15 ` Olga Kornievskaia
@ 2023-04-04 17:28 ` Chuck Lever III
2023-04-04 19:11 ` Olga Kornievskaia
0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2023-04-04 17:28 UTC (permalink / raw)
To: Olga Kornievskaia; +Cc: kernel-tls-handshake
> On Apr 4, 2023, at 12:15 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Tue, Apr 4, 2023 at 10:54 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>
>>
>>
>>> On Apr 4, 2023, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>
>>> On Tue, Apr 4, 2023 at 10:32 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>
>>>>> On Apr 4, 2023, at 10:24 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>
>>>>> On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>>>
>>>>>>> Hi Chuck,
>>>>>>>
>>>>>>> Have you seen this oops? This is not my testing so all I know was that
>>>>>>> it was an attempt to mount with xprtsec=tls (against ONTAP).
>>>>>>>
>>>>>>> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
>>>>>>> [ 316.939848] #PF: supervisor read access in kernel mode
>>>>>>> [ 316.939914] #PF: error_code(0x0000) - not-present page
>>>>>>> [ 316.939984] PGD 0 P4D 0
>>>>>>> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>>> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
>>>>>>> tainted 6.2.1+ #2
>>>>>>> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>>>>>> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>>>>>> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
>>>>>>> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
>>>>>>> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
>>>>>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
>>>>>>> 24 e8
>>>>>>> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
>>>>>>> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
>>>>>>> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
>>>>>>> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
>>>>>>> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
>>>>>>> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
>>>>>>> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
>>>>>>> knlGS:0000000000000000
>>>>>>> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
>>>>>>> [ 316.941391] PKRU: 55555554
>>>>>>> [ 316.941443] Call Trace:
>>>>>>> [ 316.941513] <TASK>
>>>>>>> [ 316.941578] process_one_work+0x1b0/0x3c0
>>>>>>> [ 316.941689] worker_thread+0x30/0x360
>>>>>>> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
>>>>>>> [ 316.941822] kthread+0xd7/0x100
>>>>>>> [ 316.941900] ? __pfx_kthread+0x10/0x10
>>>>>>> [ 316.941968] ret_from_fork+0x29/0x50
>>>>>>> [ 316.942049] </TASK>
>>>>>>> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>>>>>> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
>>>>>>> rfkill mbcache jbd2 loop vsock_loopback
>>>>>>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
>>>>>>> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
>>>>>>> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
>>>>>>> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
>>>>>>> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
>>>>>>> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
>>>>>>> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
>>>>>>> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
>>>>>>> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
>>>>>>> ipmi_msghandler fuse
>>>>>>> [ 316.942550] CR2: 0000000000000018
>>>>>>>
>>>>>>> Please note this was still on "commit
>>>>>>> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
>>>>>>> origin/upcall-v7)". If you think somethings done to address it I'll
>>>>>>> have them update to the latest.
>>>>>>
>>>>>> Update and retry. It has a familiar ring, but I can't say I've
>>>>>> seen this recently.
>>>>>
>>>>> Hi Chuck,
>>>>>
>>>>> I've updated the kernel code to the latest commit
>>>>> 46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
>>>>> chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
>>>>> client is still crashing but looks like in a different place. This
>>>>> happens when the server terminates the connection (immediately) as the
>>>>> tls handshake happens.
>>>>
>>>> Why is the server doing that?
>>>
>>> I'm not sure what kind of a reason you are looking for. I think the
>>> server can send an RST at any point in a TCP connection and the client
>>> shouldn't crash.
>>
>> No doubt.
>>
>>
>>> But the real reason is probably because the server's
>>> code is getting developed and they ran into some error and resetting
>>> the connection.
>>
>> That's what I was looking for. I wanted to know if it was
>> Linux NFSD doing this.
>>
>>
>>>>> Here's the stack trace:
>>>>> [ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
>>>>> [ 677.274424] #PF: supervisor read access in kernel mode
>>>>> [ 677.274517] #PF: error_code(0x0000) - not-present page
>>>>> [ 677.274609] PGD 0 P4D 0
>>>>> [ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>> [ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
>>>>> tainted 6.3.0-rc4+ #13
>>>>> [ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>>>> [ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>>>> [ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
>>>>
>>>> scripts/faddr2line on this RIP
>>>>
>>>> Which pointer is NULL?
>>>
>>> I apologize I don't have that info and I'm working on trying to figure
>>> out how to get this information to you. Lab kernels are built with
>>> CONFIG_DEBUG_INFO disabled and thus I can't do faddr2line. I will
>>> attempt to build on with debugging enabled (has been problematic
>>> before for the lab machines so not sure if that'll be successful I
>>> might have to hack the linux kernel to simulate the reset).
>>
>> Jakub asked that NULL checking be removed from handshake_req_cancel,
>> or at least that's what I read into his comment. That choice is
>> definitely flushing bugs.
>>
>> So this crasher is familiar. req_cancel is getting called in some
>> scenario where the socket or request is partially filled in. We
>> have to figure out how that's happening.
>>
>> If debug info is a problem, then just add a sarcastic quantity of
>> NULL checks in handshake_req_cancel, rebuild, and it should tell
>> you exactly which pointer is bad.
>
> I added a check for sock-sk null and if so return true.
If it returns true, xs_tls_handshake_sync() probably completes
successfully.
Perhaps handshake_req_cancel needs to return false in cases
where it can't find a pending request.
> I thought that
> would "prevent" the crash but it then proceeded to crash elsewhere.
> Sorry I'm still on non-debug-symbol kernel (as I thought I'd add the
> checks that that'd be enough :)).
>
> [ 284.368211] AGLO: handshake_req_cancel sock->sk is NULL
> [ 284.368249] BUG: kernel NULL pointer dereference, address: 0000000000000018
> [ 284.368320] #PF: supervisor read access in kernel mode
> [ 284.368363] #PF: error_code(0x0000) - not-present page
> [ 284.368411] PGD 0 P4D 0
> [ 284.368438] Oops: 0000 [#1] PREEMPT SMP PTI
> [ 284.368473] CPU: 0 PID: 521 Comm: kworker/u2:30 Kdump: loaded Not
> tainted 6.3.0-rc4+ #14
> [ 284.368523] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> [ 284.368580] Workqueue: xprtiod xs_tls_connect [sunrpc]
> [ 284.368720] RIP: 0010:xs_tls_connect+0x3e5/0x590 [sunrpc]
> [ 284.368844] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
> ff e8 bf d2 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
> 24 e8
> [ 284.368973] RSP: 0018:ffffba17c0f67dc8 EFLAGS: 00010293
> [ 284.369010] RAX: 0000000000000000 RBX: ffff963143c1fe40 RCX: 0000000000000000
> [ 284.369051] RDX: 0000000000000000 RSI: ffff9631f7e20840 RDI: ffff9631f7e20840
> [ 284.369091] RBP: ffff9631463d8a00 R08: 0000000000000000 R09: c0000000ffff7fff
> [ 284.369130] R10: 0000000000000001 R11: ffffba17c0f67bb8 R12: ffff963143c1f800
> [ 284.369175] R13: 0000000004208160 R14: ffff963143f6f600 R15: ffff963143c1e000
> [ 284.369259] FS: 0000000000000000(0000) GS:ffff9631f7e00000(0000)
> knlGS:0000000000000000
> [ 284.369319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 284.369367] CR2: 0000000000000018 CR3: 0000000102d12004 CR4: 00000000007706f0
> [ 284.369458] PKRU: 55555554
> [ 284.369483] Call Trace:
> [ 284.369523] <TASK>
> [ 284.369583] process_one_work+0x1b0/0x3c0
> [ 284.369633] worker_thread+0x34/0x360
> [ 284.369668] ? __pfx_worker_thread+0x10/0x10
> [ 284.369704] kthread+0xdb/0x110
> [ 284.369736] ? __pfx_kthread+0x10/0x10
> [ 284.369767] ret_from_fork+0x29/0x50
> [ 284.369808] </TASK>
> [ 284.369843] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper sd_mod syscopyarea
> sysfillrect ata_generic sysimgblt sg ata_piix ahci libahci
> crc32c_intel serio_raw vmxnet3 libata drm vmw_pvscsi dm_mirror
> dm_region_hash dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi
> crc64_rocksoft crc64 ipmi_devintf ipmi_msghandler fuse
> [ 284.370300] CR2: 0000000000000018
>
>
>>
>> BPF might also be a solution, but I'm not sure if it relies on
>> debug info. Might be only BTF is needed.
>>
>>
>>>>> [ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
>>>>> 55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
>>>>> 20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
>>>>> 01 41
>>>>> [ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
>>>>> [ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
>>>>> [ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
>>>>> [ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
>>>>> [ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
>>>>> [ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
>>>>> [ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
>>>>> [ 677.276355] PKRU: 55555554
>>>>> [ 677.276427] Call Trace:
>>>>> [ 677.276521] <TASK>
>>>>> [ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
>>>>> [ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
>>>>> [ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
>>>>> [ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
>>>>> [ 677.277086] process_one_work+0x1b0/0x3c0
>>>>> [ 677.277209] worker_thread+0x34/0x360
>>>>> [ 677.277303] ? __pfx_worker_thread+0x10/0x10
>>>>> [ 677.277396] kthread+0xdb/0x110
>>>>> [ 677.277527] ? __pfx_kthread+0x10/0x10
>>>>> [ 677.277619] ret_from_fork+0x29/0x50
>>>>> [ 677.277731] </TASK>
>>>>> [ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>>>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
>>>>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
>>>>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
>>>>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
>>>>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
>>>>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
>>>>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
>>>>> sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
>>>>> serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
>>>>> dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
>>>>> crc64 ipmi_devintf ipmi_msghandler fuse
>>>>> [ 677.278383] CR2: 0000000000000030
>>>>>
>>>>> Network trace from the server:
>>>>>
>>>>> [kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
>>>>> 1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
>>>>> 727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
>>>>> TSval=4059503415 TSecr=0 WS=128
>>>>> 2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
>>>>> 2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
>>>>> SACK_PERM=1 TSval=2256845073 TSecr=4059503415
>>>>> 3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
>>>>> 727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
>>>>> TSecr=2256845073
>>>>> 4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
>>>>> V4 NULL Call
>>>>> 5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
>>>>> V4 NULL Reply (Call In 4)
>>>>> 6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
>>>>> 727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
>>>>> TSecr=2256845073
>>>>> 7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
>>>>> 2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
>>>>> TSecr=4059503416
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Chuck Lever
>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>
>>
>> --
>> Chuck Lever
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-04-04 17:28 ` Chuck Lever III
@ 2023-04-04 19:11 ` Olga Kornievskaia
2023-04-04 19:14 ` Chuck Lever III
0 siblings, 1 reply; 12+ messages in thread
From: Olga Kornievskaia @ 2023-04-04 19:11 UTC (permalink / raw)
To: Chuck Lever III; +Cc: kernel-tls-handshake
On Tue, Apr 4, 2023 at 1:28 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Apr 4, 2023, at 12:15 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > On Tue, Apr 4, 2023 at 10:54 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Apr 4, 2023, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>
> >>> On Tue, Apr 4, 2023 at 10:32 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>
> >>>>> On Apr 4, 2023, at 10:24 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>>>
> >>>>> On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>>>>>
> >>>>>>> Hi Chuck,
> >>>>>>>
> >>>>>>> Have you seen this oops? This is not my testing so all I know was that
> >>>>>>> it was an attempt to mount with xprtsec=tls (against ONTAP).
> >>>>>>>
> >>>>>>> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
> >>>>>>> [ 316.939848] #PF: supervisor read access in kernel mode
> >>>>>>> [ 316.939914] #PF: error_code(0x0000) - not-present page
> >>>>>>> [ 316.939984] PGD 0 P4D 0
> >>>>>>> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
> >>>>>>> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
> >>>>>>> tainted 6.2.1+ #2
> >>>>>>> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> >>>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> >>>>>>> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
> >>>>>>> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
> >>>>>>> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
> >>>>>>> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
> >>>>>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
> >>>>>>> 24 e8
> >>>>>>> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
> >>>>>>> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
> >>>>>>> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
> >>>>>>> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
> >>>>>>> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
> >>>>>>> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
> >>>>>>> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
> >>>>>>> knlGS:0000000000000000
> >>>>>>> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
> >>>>>>> [ 316.941391] PKRU: 55555554
> >>>>>>> [ 316.941443] Call Trace:
> >>>>>>> [ 316.941513] <TASK>
> >>>>>>> [ 316.941578] process_one_work+0x1b0/0x3c0
> >>>>>>> [ 316.941689] worker_thread+0x30/0x360
> >>>>>>> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
> >>>>>>> [ 316.941822] kthread+0xd7/0x100
> >>>>>>> [ 316.941900] ? __pfx_kthread+0x10/0x10
> >>>>>>> [ 316.941968] ret_from_fork+0x29/0x50
> >>>>>>> [ 316.942049] </TASK>
> >>>>>>> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> >>>>>>> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
> >>>>>>> rfkill mbcache jbd2 loop vsock_loopback
> >>>>>>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> >>>>>>> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
> >>>>>>> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
> >>>>>>> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
> >>>>>>> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
> >>>>>>> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
> >>>>>>> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
> >>>>>>> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
> >>>>>>> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
> >>>>>>> ipmi_msghandler fuse
> >>>>>>> [ 316.942550] CR2: 0000000000000018
> >>>>>>>
> >>>>>>> Please note this was still on "commit
> >>>>>>> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
> >>>>>>> origin/upcall-v7)". If you think somethings done to address it I'll
> >>>>>>> have them update to the latest.
> >>>>>>
> >>>>>> Update and retry. It has a familiar ring, but I can't say I've
> >>>>>> seen this recently.
> >>>>>
> >>>>> Hi Chuck,
> >>>>>
> >>>>> I've updated the kernel code to the latest commit
> >>>>> 46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
> >>>>> chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
> >>>>> client is still crashing but looks like in a different place. This
> >>>>> happens when the server terminates the connection (immediately) as the
> >>>>> tls handshake happens.
> >>>>
> >>>> Why is the server doing that?
> >>>
> >>> I'm not sure what kind of a reason you are looking for. I think the
> >>> server can send an RST at any point in a TCP connection and the client
> >>> shouldn't crash.
> >>
> >> No doubt.
> >>
> >>
> >>> But the real reason is probably because the server's
> >>> code is getting developed and they ran into some error and resetting
> >>> the connection.
> >>
> >> That's what I was looking for. I wanted to know if it was
> >> Linux NFSD doing this.
> >>
> >>
> >>>>> Here's the stack trace:
> >>>>> [ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
> >>>>> [ 677.274424] #PF: supervisor read access in kernel mode
> >>>>> [ 677.274517] #PF: error_code(0x0000) - not-present page
> >>>>> [ 677.274609] PGD 0 P4D 0
> >>>>> [ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
> >>>>> [ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
> >>>>> tainted 6.3.0-rc4+ #13
> >>>>> [ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> >>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> >>>>> [ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
> >>>>> [ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
> >>>>
> >>>> scripts/faddr2line on this RIP
> >>>>
> >>>> Which pointer is NULL?
> >>>
> >>> I apologize I don't have that info and I'm working on trying to figure
> >>> out how to get this information to you. Lab kernels are built with
> >>> CONFIG_DEBUG_INFO disabled and thus I can't do faddr2line. I will
> >>> attempt to build on with debugging enabled (has been problematic
> >>> before for the lab machines so not sure if that'll be successful I
> >>> might have to hack the linux kernel to simulate the reset).
> >>
> >> Jakub asked that NULL checking be removed from handshake_req_cancel,
> >> or at least that's what I read into his comment. That choice is
> >> definitely flushing bugs.
> >>
> >> So this crasher is familiar. req_cancel is getting called in some
> >> scenario where the socket or request is partially filled in. We
> >> have to figure out how that's happening.
> >>
> >> If debug info is a problem, then just add a sarcastic quantity of
> >> NULL checks in handshake_req_cancel, rebuild, and it should tell
> >> you exactly which pointer is bad.
> >
> > I added a check for sock-sk null and if so return true.
>
> If it returns true, xs_tls_handshake_sync() probably completes
> successfully.
>
> Perhaps handshake_req_cancel needs to return false in cases
> where it can't find a pending request.
I'm not sure if you were suggesting me to do this but I tried:
if (!sock->sk) return false;
and that led to a mount not crashing but timing out.
>
>
> > I thought that
> > would "prevent" the crash but it then proceeded to crash elsewhere.
> > Sorry I'm still on non-debug-symbol kernel (as I thought I'd add the
> > checks that that'd be enough :)).
> >
> > [ 284.368211] AGLO: handshake_req_cancel sock->sk is NULL
> > [ 284.368249] BUG: kernel NULL pointer dereference, address: 0000000000000018
> > [ 284.368320] #PF: supervisor read access in kernel mode
> > [ 284.368363] #PF: error_code(0x0000) - not-present page
> > [ 284.368411] PGD 0 P4D 0
> > [ 284.368438] Oops: 0000 [#1] PREEMPT SMP PTI
> > [ 284.368473] CPU: 0 PID: 521 Comm: kworker/u2:30 Kdump: loaded Not
> > tainted 6.3.0-rc4+ #14
> > [ 284.368523] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> > Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> > [ 284.368580] Workqueue: xprtiod xs_tls_connect [sunrpc]
> > [ 284.368720] RIP: 0010:xs_tls_connect+0x3e5/0x590 [sunrpc]
> > [ 284.368844] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
> > ff e8 bf d2 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
> > ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
> > 24 e8
> > [ 284.368973] RSP: 0018:ffffba17c0f67dc8 EFLAGS: 00010293
> > [ 284.369010] RAX: 0000000000000000 RBX: ffff963143c1fe40 RCX: 0000000000000000
> > [ 284.369051] RDX: 0000000000000000 RSI: ffff9631f7e20840 RDI: ffff9631f7e20840
> > [ 284.369091] RBP: ffff9631463d8a00 R08: 0000000000000000 R09: c0000000ffff7fff
> > [ 284.369130] R10: 0000000000000001 R11: ffffba17c0f67bb8 R12: ffff963143c1f800
> > [ 284.369175] R13: 0000000004208160 R14: ffff963143f6f600 R15: ffff963143c1e000
> > [ 284.369259] FS: 0000000000000000(0000) GS:ffff9631f7e00000(0000)
> > knlGS:0000000000000000
> > [ 284.369319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 284.369367] CR2: 0000000000000018 CR3: 0000000102d12004 CR4: 00000000007706f0
> > [ 284.369458] PKRU: 55555554
> > [ 284.369483] Call Trace:
> > [ 284.369523] <TASK>
> > [ 284.369583] process_one_work+0x1b0/0x3c0
> > [ 284.369633] worker_thread+0x34/0x360
> > [ 284.369668] ? __pfx_worker_thread+0x10/0x10
> > [ 284.369704] kthread+0xdb/0x110
> > [ 284.369736] ? __pfx_kthread+0x10/0x10
> > [ 284.369767] ret_from_fork+0x29/0x50
> > [ 284.369808] </TASK>
> > [ 284.369843] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> > dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
> > jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
> > vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
> > intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
> > libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
> > vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
> > drm_ttm_helper ttm sr_mod cdrom drm_kms_helper sd_mod syscopyarea
> > sysfillrect ata_generic sysimgblt sg ata_piix ahci libahci
> > crc32c_intel serio_raw vmxnet3 libata drm vmw_pvscsi dm_mirror
> > dm_region_hash dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi
> > crc64_rocksoft crc64 ipmi_devintf ipmi_msghandler fuse
> > [ 284.370300] CR2: 0000000000000018
> >
> >
> >>
> >> BPF might also be a solution, but I'm not sure if it relies on
> >> debug info. Might be only BTF is needed.
> >>
> >>
> >>>>> [ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
> >>>>> 55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
> >>>>> 20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
> >>>>> 01 41
> >>>>> [ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
> >>>>> [ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
> >>>>> [ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
> >>>>> [ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
> >>>>> [ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
> >>>>> [ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
> >>>>> [ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
> >>>>> knlGS:0000000000000000
> >>>>> [ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> [ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
> >>>>> [ 677.276355] PKRU: 55555554
> >>>>> [ 677.276427] Call Trace:
> >>>>> [ 677.276521] <TASK>
> >>>>> [ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
> >>>>> [ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
> >>>>> [ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
> >>>>> [ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
> >>>>> [ 677.277086] process_one_work+0x1b0/0x3c0
> >>>>> [ 677.277209] worker_thread+0x34/0x360
> >>>>> [ 677.277303] ? __pfx_worker_thread+0x10/0x10
> >>>>> [ 677.277396] kthread+0xdb/0x110
> >>>>> [ 677.277527] ? __pfx_kthread+0x10/0x10
> >>>>> [ 677.277619] ret_from_fork+0x29/0x50
> >>>>> [ 677.277731] </TASK>
> >>>>> [ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> >>>>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
> >>>>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
> >>>>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
> >>>>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
> >>>>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
> >>>>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
> >>>>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
> >>>>> sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
> >>>>> serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
> >>>>> dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
> >>>>> crc64 ipmi_devintf ipmi_msghandler fuse
> >>>>> [ 677.278383] CR2: 0000000000000030
> >>>>>
> >>>>> Network trace from the server:
> >>>>>
> >>>>> [kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
> >>>>> 1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
> >>>>> 727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> >>>>> TSval=4059503415 TSecr=0 WS=128
> >>>>> 2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
> >>>>> 2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> >>>>> SACK_PERM=1 TSval=2256845073 TSecr=4059503415
> >>>>> 3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
> >>>>> 727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
> >>>>> TSecr=2256845073
> >>>>> 4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
> >>>>> V4 NULL Call
> >>>>> 5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
> >>>>> V4 NULL Reply (Call In 4)
> >>>>> 6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
> >>>>> 727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
> >>>>> TSecr=2256845073
> >>>>> 7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
> >>>>> 2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
> >>>>> TSecr=4059503416
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Chuck Lever
> >>>>
> >>>>
> >>>> --
> >>>> Chuck Lever
> >>
> >>
> >> --
> >> Chuck Lever
>
>
> --
> Chuck Lever
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-04-04 19:11 ` Olga Kornievskaia
@ 2023-04-04 19:14 ` Chuck Lever III
2023-04-04 19:26 ` Olga Kornievskaia
0 siblings, 1 reply; 12+ messages in thread
From: Chuck Lever III @ 2023-04-04 19:14 UTC (permalink / raw)
To: Olga Kornievskaia; +Cc: kernel-tls-handshake
> On Apr 4, 2023, at 3:11 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Tue, Apr 4, 2023 at 1:28 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>
>>
>>
>>> On Apr 4, 2023, at 12:15 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>
>>> On Tue, Apr 4, 2023 at 10:54 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>
>>>>
>>>>
>>>>> On Apr 4, 2023, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>
>>>>> On Tue, Apr 4, 2023 at 10:32 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>>
>>>>>>> On Apr 4, 2023, at 10:24 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>>>
>>>>>>> On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>>>>>
>>>>>>>>> Hi Chuck,
>>>>>>>>>
>>>>>>>>> Have you seen this oops? This is not my testing so all I know was that
>>>>>>>>> it was an attempt to mount with xprtsec=tls (against ONTAP).
>>>>>>>>>
>>>>>>>>> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
>>>>>>>>> [ 316.939848] #PF: supervisor read access in kernel mode
>>>>>>>>> [ 316.939914] #PF: error_code(0x0000) - not-present page
>>>>>>>>> [ 316.939984] PGD 0 P4D 0
>>>>>>>>> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>>>>> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
>>>>>>>>> tainted 6.2.1+ #2
>>>>>>>>> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>>>>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>>>>>>>> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>>>>>>>> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
>>>>>>>>> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
>>>>>>>>> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
>>>>>>>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
>>>>>>>>> 24 e8
>>>>>>>>> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
>>>>>>>>> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
>>>>>>>>> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
>>>>>>>>> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
>>>>>>>>> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
>>>>>>>>> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
>>>>>>>>> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
>>>>>>>>> knlGS:0000000000000000
>>>>>>>>> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
>>>>>>>>> [ 316.941391] PKRU: 55555554
>>>>>>>>> [ 316.941443] Call Trace:
>>>>>>>>> [ 316.941513] <TASK>
>>>>>>>>> [ 316.941578] process_one_work+0x1b0/0x3c0
>>>>>>>>> [ 316.941689] worker_thread+0x30/0x360
>>>>>>>>> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
>>>>>>>>> [ 316.941822] kthread+0xd7/0x100
>>>>>>>>> [ 316.941900] ? __pfx_kthread+0x10/0x10
>>>>>>>>> [ 316.941968] ret_from_fork+0x29/0x50
>>>>>>>>> [ 316.942049] </TASK>
>>>>>>>>> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>>>>>>>> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
>>>>>>>>> rfkill mbcache jbd2 loop vsock_loopback
>>>>>>>>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
>>>>>>>>> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
>>>>>>>>> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
>>>>>>>>> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
>>>>>>>>> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
>>>>>>>>> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
>>>>>>>>> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
>>>>>>>>> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
>>>>>>>>> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
>>>>>>>>> ipmi_msghandler fuse
>>>>>>>>> [ 316.942550] CR2: 0000000000000018
>>>>>>>>>
>>>>>>>>> Please note this was still on "commit
>>>>>>>>> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
>>>>>>>>> origin/upcall-v7)". If you think somethings done to address it I'll
>>>>>>>>> have them update to the latest.
>>>>>>>>
>>>>>>>> Update and retry. It has a familiar ring, but I can't say I've
>>>>>>>> seen this recently.
>>>>>>>
>>>>>>> Hi Chuck,
>>>>>>>
>>>>>>> I've updated the kernel code to the latest commit
>>>>>>> 46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
>>>>>>> chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
>>>>>>> client is still crashing but looks like in a different place. This
>>>>>>> happens when the server terminates the connection (immediately) as the
>>>>>>> tls handshake happens.
>>>>>>
>>>>>> Why is the server doing that?
>>>>>
>>>>> I'm not sure what kind of a reason you are looking for. I think the
>>>>> server can send an RST at any point in a TCP connection and the client
>>>>> shouldn't crash.
>>>>
>>>> No doubt.
>>>>
>>>>
>>>>> But the real reason is probably because the server's
>>>>> code is getting developed and they ran into some error and resetting
>>>>> the connection.
>>>>
>>>> That's what I was looking for. I wanted to know if it was
>>>> Linux NFSD doing this.
>>>>
>>>>
>>>>>>> Here's the stack trace:
>>>>>>> [ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
>>>>>>> [ 677.274424] #PF: supervisor read access in kernel mode
>>>>>>> [ 677.274517] #PF: error_code(0x0000) - not-present page
>>>>>>> [ 677.274609] PGD 0 P4D 0
>>>>>>> [ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>>> [ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
>>>>>>> tainted 6.3.0-rc4+ #13
>>>>>>> [ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>>>>>> [ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>>>>>> [ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
>>>>>>
>>>>>> scripts/faddr2line on this RIP
>>>>>>
>>>>>> Which pointer is NULL?
>>>>>
>>>>> I apologize I don't have that info and I'm working on trying to figure
>>>>> out how to get this information to you. Lab kernels are built with
>>>>> CONFIG_DEBUG_INFO disabled and thus I can't do faddr2line. I will
>>>>> attempt to build on with debugging enabled (has been problematic
>>>>> before for the lab machines so not sure if that'll be successful I
>>>>> might have to hack the linux kernel to simulate the reset).
>>>>
>>>> Jakub asked that NULL checking be removed from handshake_req_cancel,
>>>> or at least that's what I read into his comment. That choice is
>>>> definitely flushing bugs.
>>>>
>>>> So this crasher is familiar. req_cancel is getting called in some
>>>> scenario where the socket or request is partially filled in. We
>>>> have to figure out how that's happening.
>>>>
>>>> If debug info is a problem, then just add a sarcastic quantity of
>>>> NULL checks in handshake_req_cancel, rebuild, and it should tell
>>>> you exactly which pointer is bad.
>>>
>>> I added a check for sock-sk null and if so return true.
>>
>> If it returns true, xs_tls_handshake_sync() probably completes
>> successfully.
>>
>> Perhaps handshake_req_cancel needs to return false in cases
>> where it can't find a pending request.
>
> I'm not sure if you were suggesting me to do this but I tried:
> if (!sock->sk) return false;
> and that led to a mount not crashing but timing out.
That's the behavior I would expect. Was the failure immediate,
or did it take a few seconds for the mount command to exit?
>>> I thought that
>>> would "prevent" the crash but it then proceeded to crash elsewhere.
>>> Sorry I'm still on non-debug-symbol kernel (as I thought I'd add the
>>> checks that that'd be enough :)).
>>>
>>> [ 284.368211] AGLO: handshake_req_cancel sock->sk is NULL
>>> [ 284.368249] BUG: kernel NULL pointer dereference, address: 0000000000000018
>>> [ 284.368320] #PF: supervisor read access in kernel mode
>>> [ 284.368363] #PF: error_code(0x0000) - not-present page
>>> [ 284.368411] PGD 0 P4D 0
>>> [ 284.368438] Oops: 0000 [#1] PREEMPT SMP PTI
>>> [ 284.368473] CPU: 0 PID: 521 Comm: kworker/u2:30 Kdump: loaded Not
>>> tainted 6.3.0-rc4+ #14
>>> [ 284.368523] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>> [ 284.368580] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>> [ 284.368720] RIP: 0010:xs_tls_connect+0x3e5/0x590 [sunrpc]
>>> [ 284.368844] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
>>> ff e8 bf d2 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
>>> 24 e8
>>> [ 284.368973] RSP: 0018:ffffba17c0f67dc8 EFLAGS: 00010293
>>> [ 284.369010] RAX: 0000000000000000 RBX: ffff963143c1fe40 RCX: 0000000000000000
>>> [ 284.369051] RDX: 0000000000000000 RSI: ffff9631f7e20840 RDI: ffff9631f7e20840
>>> [ 284.369091] RBP: ffff9631463d8a00 R08: 0000000000000000 R09: c0000000ffff7fff
>>> [ 284.369130] R10: 0000000000000001 R11: ffffba17c0f67bb8 R12: ffff963143c1f800
>>> [ 284.369175] R13: 0000000004208160 R14: ffff963143f6f600 R15: ffff963143c1e000
>>> [ 284.369259] FS: 0000000000000000(0000) GS:ffff9631f7e00000(0000)
>>> knlGS:0000000000000000
>>> [ 284.369319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 284.369367] CR2: 0000000000000018 CR3: 0000000102d12004 CR4: 00000000007706f0
>>> [ 284.369458] PKRU: 55555554
>>> [ 284.369483] Call Trace:
>>> [ 284.369523] <TASK>
>>> [ 284.369583] process_one_work+0x1b0/0x3c0
>>> [ 284.369633] worker_thread+0x34/0x360
>>> [ 284.369668] ? __pfx_worker_thread+0x10/0x10
>>> [ 284.369704] kthread+0xdb/0x110
>>> [ 284.369736] ? __pfx_kthread+0x10/0x10
>>> [ 284.369767] ret_from_fork+0x29/0x50
>>> [ 284.369808] </TASK>
>>> [ 284.369843] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
>>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
>>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
>>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
>>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
>>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
>>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper sd_mod syscopyarea
>>> sysfillrect ata_generic sysimgblt sg ata_piix ahci libahci
>>> crc32c_intel serio_raw vmxnet3 libata drm vmw_pvscsi dm_mirror
>>> dm_region_hash dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi
>>> crc64_rocksoft crc64 ipmi_devintf ipmi_msghandler fuse
>>> [ 284.370300] CR2: 0000000000000018
>>>
>>>
>>>>
>>>> BPF might also be a solution, but I'm not sure if it relies on
>>>> debug info. Might be only BTF is needed.
>>>>
>>>>
>>>>>>> [ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
>>>>>>> 55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
>>>>>>> 20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
>>>>>>> 01 41
>>>>>>> [ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
>>>>>>> [ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
>>>>>>> [ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
>>>>>>> [ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
>>>>>>> [ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
>>>>>>> [ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
>>>>>>> [ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
>>>>>>> knlGS:0000000000000000
>>>>>>> [ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
>>>>>>> [ 677.276355] PKRU: 55555554
>>>>>>> [ 677.276427] Call Trace:
>>>>>>> [ 677.276521] <TASK>
>>>>>>> [ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
>>>>>>> [ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
>>>>>>> [ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
>>>>>>> [ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
>>>>>>> [ 677.277086] process_one_work+0x1b0/0x3c0
>>>>>>> [ 677.277209] worker_thread+0x34/0x360
>>>>>>> [ 677.277303] ? __pfx_worker_thread+0x10/0x10
>>>>>>> [ 677.277396] kthread+0xdb/0x110
>>>>>>> [ 677.277527] ? __pfx_kthread+0x10/0x10
>>>>>>> [ 677.277619] ret_from_fork+0x29/0x50
>>>>>>> [ 677.277731] </TASK>
>>>>>>> [ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>>>>>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
>>>>>>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
>>>>>>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
>>>>>>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
>>>>>>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
>>>>>>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
>>>>>>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
>>>>>>> sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
>>>>>>> serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
>>>>>>> dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
>>>>>>> crc64 ipmi_devintf ipmi_msghandler fuse
>>>>>>> [ 677.278383] CR2: 0000000000000030
>>>>>>>
>>>>>>> Network trace from the server:
>>>>>>>
>>>>>>> [kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
>>>>>>> 1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
>>>>>>> 727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
>>>>>>> TSval=4059503415 TSecr=0 WS=128
>>>>>>> 2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
>>>>>>> 2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
>>>>>>> SACK_PERM=1 TSval=2256845073 TSecr=4059503415
>>>>>>> 3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
>>>>>>> 727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
>>>>>>> TSecr=2256845073
>>>>>>> 4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
>>>>>>> V4 NULL Call
>>>>>>> 5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
>>>>>>> V4 NULL Reply (Call In 4)
>>>>>>> 6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
>>>>>>> 727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
>>>>>>> TSecr=2256845073
>>>>>>> 7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
>>>>>>> 2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
>>>>>>> TSecr=4059503416
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Chuck Lever
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Chuck Lever
>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>
>>
>> --
>> Chuck Lever
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-04-04 19:14 ` Chuck Lever III
@ 2023-04-04 19:26 ` Olga Kornievskaia
2023-04-04 19:30 ` Chuck Lever III
0 siblings, 1 reply; 12+ messages in thread
From: Olga Kornievskaia @ 2023-04-04 19:26 UTC (permalink / raw)
To: Chuck Lever III; +Cc: kernel-tls-handshake
On Tue, Apr 4, 2023 at 3:14 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>
>
>
> > On Apr 4, 2023, at 3:11 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >
> > On Tue, Apr 4, 2023 at 1:28 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>
> >>
> >>
> >>> On Apr 4, 2023, at 12:15 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>
> >>> On Tue, Apr 4, 2023 at 10:54 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Apr 4, 2023, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>>>
> >>>>> On Tue, Apr 4, 2023 at 10:32 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>>>
> >>>>>>> On Apr 4, 2023, at 10:24 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>>>>>
> >>>>>>> On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Chuck,
> >>>>>>>>>
> >>>>>>>>> Have you seen this oops? This is not my testing so all I know was that
> >>>>>>>>> it was an attempt to mount with xprtsec=tls (against ONTAP).
> >>>>>>>>>
> >>>>>>>>> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
> >>>>>>>>> [ 316.939848] #PF: supervisor read access in kernel mode
> >>>>>>>>> [ 316.939914] #PF: error_code(0x0000) - not-present page
> >>>>>>>>> [ 316.939984] PGD 0 P4D 0
> >>>>>>>>> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
> >>>>>>>>> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
> >>>>>>>>> tainted 6.2.1+ #2
> >>>>>>>>> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> >>>>>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> >>>>>>>>> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
> >>>>>>>>> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
> >>>>>>>>> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
> >>>>>>>>> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
> >>>>>>>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
> >>>>>>>>> 24 e8
> >>>>>>>>> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
> >>>>>>>>> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
> >>>>>>>>> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
> >>>>>>>>> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
> >>>>>>>>> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
> >>>>>>>>> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
> >>>>>>>>> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
> >>>>>>>>> knlGS:0000000000000000
> >>>>>>>>> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>>>> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
> >>>>>>>>> [ 316.941391] PKRU: 55555554
> >>>>>>>>> [ 316.941443] Call Trace:
> >>>>>>>>> [ 316.941513] <TASK>
> >>>>>>>>> [ 316.941578] process_one_work+0x1b0/0x3c0
> >>>>>>>>> [ 316.941689] worker_thread+0x30/0x360
> >>>>>>>>> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
> >>>>>>>>> [ 316.941822] kthread+0xd7/0x100
> >>>>>>>>> [ 316.941900] ? __pfx_kthread+0x10/0x10
> >>>>>>>>> [ 316.941968] ret_from_fork+0x29/0x50
> >>>>>>>>> [ 316.942049] </TASK>
> >>>>>>>>> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> >>>>>>>>> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
> >>>>>>>>> rfkill mbcache jbd2 loop vsock_loopback
> >>>>>>>>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
> >>>>>>>>> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
> >>>>>>>>> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
> >>>>>>>>> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
> >>>>>>>>> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
> >>>>>>>>> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
> >>>>>>>>> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
> >>>>>>>>> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
> >>>>>>>>> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
> >>>>>>>>> ipmi_msghandler fuse
> >>>>>>>>> [ 316.942550] CR2: 0000000000000018
> >>>>>>>>>
> >>>>>>>>> Please note this was still on "commit
> >>>>>>>>> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
> >>>>>>>>> origin/upcall-v7)". If you think somethings done to address it I'll
> >>>>>>>>> have them update to the latest.
> >>>>>>>>
> >>>>>>>> Update and retry. It has a familiar ring, but I can't say I've
> >>>>>>>> seen this recently.
> >>>>>>>
> >>>>>>> Hi Chuck,
> >>>>>>>
> >>>>>>> I've updated the kernel code to the latest commit
> >>>>>>> 46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
> >>>>>>> chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
> >>>>>>> client is still crashing but looks like in a different place. This
> >>>>>>> happens when the server terminates the connection (immediately) as the
> >>>>>>> tls handshake happens.
> >>>>>>
> >>>>>> Why is the server doing that?
> >>>>>
> >>>>> I'm not sure what kind of a reason you are looking for. I think the
> >>>>> server can send an RST at any point in a TCP connection and the client
> >>>>> shouldn't crash.
> >>>>
> >>>> No doubt.
> >>>>
> >>>>
> >>>>> But the real reason is probably because the server's
> >>>>> code is getting developed and they ran into some error and resetting
> >>>>> the connection.
> >>>>
> >>>> That's what I was looking for. I wanted to know if it was
> >>>> Linux NFSD doing this.
> >>>>
> >>>>
> >>>>>>> Here's the stack trace:
> >>>>>>> [ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
> >>>>>>> [ 677.274424] #PF: supervisor read access in kernel mode
> >>>>>>> [ 677.274517] #PF: error_code(0x0000) - not-present page
> >>>>>>> [ 677.274609] PGD 0 P4D 0
> >>>>>>> [ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
> >>>>>>> [ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
> >>>>>>> tainted 6.3.0-rc4+ #13
> >>>>>>> [ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> >>>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> >>>>>>> [ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
> >>>>>>> [ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
> >>>>>>
> >>>>>> scripts/faddr2line on this RIP
> >>>>>>
> >>>>>> Which pointer is NULL?
> >>>>>
> >>>>> I apologize I don't have that info and I'm working on trying to figure
> >>>>> out how to get this information to you. Lab kernels are built with
> >>>>> CONFIG_DEBUG_INFO disabled and thus I can't do faddr2line. I will
> >>>>> attempt to build on with debugging enabled (has been problematic
> >>>>> before for the lab machines so not sure if that'll be successful I
> >>>>> might have to hack the linux kernel to simulate the reset).
> >>>>
> >>>> Jakub asked that NULL checking be removed from handshake_req_cancel,
> >>>> or at least that's what I read into his comment. That choice is
> >>>> definitely flushing bugs.
> >>>>
> >>>> So this crasher is familiar. req_cancel is getting called in some
> >>>> scenario where the socket or request is partially filled in. We
> >>>> have to figure out how that's happening.
> >>>>
> >>>> If debug info is a problem, then just add a sarcastic quantity of
> >>>> NULL checks in handshake_req_cancel, rebuild, and it should tell
> >>>> you exactly which pointer is bad.
> >>>
> >>> I added a check for sock-sk null and if so return true.
> >>
> >> If it returns true, xs_tls_handshake_sync() probably completes
> >> successfully.
> >>
> >> Perhaps handshake_req_cancel needs to return false in cases
> >> where it can't find a pending request.
> >
> > I'm not sure if you were suggesting me to do this but I tried:
> > if (!sock->sk) return false;
> > and that led to a mount not crashing but timing out.
>
> That's the behavior I would expect. Was the failure immediate,
> or did it take a few seconds for the mount command to exit?
It did take quite a long time to exit:
[root@scs000128074 kolga]# time mount -o vers=4.2,sec=sys,xprtsec=tls
10.63.20.131:/ /mnt
mount.nfs: Connection timed out
real 3m0.544s
user 0m0.004s
sys 0m0.013s
Network trace shows numerous (16) attempts to do NULL call
66 2023-04-04 15:18:17.786579 10.235.232.7 → 10.63.20.131 TCP 74
953 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995198270 TSecr=0 WS=128
67 2023-04-04 15:18:17.786818 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 953 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=2815692929 TSecr=3995198270
68 2023-04-04 15:18:17.786871 10.235.232.7 → 10.63.20.131 TCP 66
953 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995198270
TSecr=2815692929
69 2023-04-04 15:18:17.786951 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
70 2023-04-04 15:18:17.787488 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 69)
71 2023-04-04 15:18:17.787520 10.235.232.7 → 10.63.20.131 TCP 66
953 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995198271
TSecr=2815692929
72 2023-04-04 15:18:17.818060 10.235.232.7 → 10.56.8.183 TCP 66
826 → 2049 [ACK] Seq=293 Ack=237 Win=501 Len=0 TSval=1599771772
TSecr=3825493593
73 2023-04-04 15:18:17.993341 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 953 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2815693129
TSecr=3995198271
74 2023-04-04 15:18:19.490669 fe80::21c:73ff:fe00:99 → ff02::1
ICMPv6 118 Router Advertisement from 00:1c:73:00:00:99
75 2023-04-04 15:18:19.494076 fe80::250:56ff:febf:2f55 → ff02::16
ICMPv6 90 Multicast Listener Report Message v2
76 2023-04-04 15:18:19.841191 fe80::250:56ff:febf:2f55 → ff02::16
ICMPv6 90 Multicast Listener Report Message v2
77 2023-04-04 15:18:28.201642 10.235.232.7 → 10.63.20.131 TCP 74
692 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995208685 TSecr=0 WS=128
78 2023-04-04 15:18:28.202171 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 692 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=3904561029 TSecr=3995208685
79 2023-04-04 15:18:28.202236 10.235.232.7 → 10.63.20.131 TCP 66
692 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995208686
TSecr=3904561029
80 2023-04-04 15:18:28.202506 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
81 2023-04-04 15:18:28.202882 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 80)
82 2023-04-04 15:18:28.202911 10.235.232.7 → 10.63.20.131 TCP 66
692 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995208686
TSecr=3904561029
83 2023-04-04 15:18:28.329632 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 692 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=3904561159
TSecr=3995208686
84 2023-04-04 15:18:34.857077 10.235.232.7 → 10.60.248.40 TCP 66
845 → 2049 [ACK] Seq=1 Ack=1 Win=501 Len=0 TSval=3160247691
TSecr=2389454709
85 2023-04-04 15:18:34.857146 10.235.232.7 → 10.60.249.150 TCP 66
970 → 2049 [ACK] Seq=1 Ack=1 Win=502 Len=0 TSval=4116465771
TSecr=1272760298
86 2023-04-04 15:18:34.857421 10.60.249.150 → 10.235.232.7 TCP 66
[TCP ACKed unseen segment] 2049 → 970 [ACK] Seq=1 Ack=2 Win=257 Len=0
TSval=1272820701 TSecr=4116405409
87 2023-04-04 15:18:34.857422 10.60.248.40 → 10.235.232.7 TCP 66
[TCP ACKed unseen segment] 2049 → 845 [ACK] Seq=1 Ack=2 Win=257 Len=0
TSval=2389515123 TSecr=3160187318
88 2023-04-04 15:18:38.441851 10.235.232.7 → 10.63.20.131 TCP 74
971 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995218925 TSecr=0 WS=128
89 2023-04-04 15:18:38.442269 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 971 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=1698639803 TSecr=3995218925
90 2023-04-04 15:18:38.442320 10.235.232.7 → 10.63.20.131 TCP 66
971 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995218926
TSecr=1698639803
91 2023-04-04 15:18:38.442577 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
92 2023-04-04 15:18:38.442894 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 91)
93 2023-04-04 15:18:38.442921 10.235.232.7 → 10.63.20.131 TCP 66
971 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995218926
TSecr=1698639803
94 2023-04-04 15:18:38.478175 10.63.20.131 → 10.235.232.7 TCP 66
[TCP Dup ACK 92#1] 2049 → 971 [ACK] Seq=37 Ack=45 Win=65792 Len=0
TSval=1698639843 TSecr=3995218926
95 2023-04-04 15:18:38.614265 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 971 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1698639973
TSecr=3995218926
96 2023-04-04 15:18:48.681763 10.235.232.7 → 10.63.20.131 TCP 74
799 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995229165 TSecr=0 WS=128
97 2023-04-04 15:18:48.682176 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 799 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=1773399631 TSecr=3995229165
98 2023-04-04 15:18:48.682203 10.235.232.7 → 10.63.20.131 TCP 66
799 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995229166
TSecr=1773399631
99 2023-04-04 15:18:48.682260 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
100 2023-04-04 15:18:48.682751 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 99)
101 2023-04-04 15:18:48.682761 10.235.232.7 → 10.63.20.131 TCP 66
799 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995229166
TSecr=1773399631
102 2023-04-04 15:18:48.826843 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 799 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1773399771
TSecr=3995229166
103 2023-04-04 15:18:49.514227 10.235.232.170 → 10.235.235.255
BROWSER 243 Host Announcement WINDOWS-G95UIQM, Workstation, Server, NT
Workstation, NT Server
104 2023-04-04 15:18:58.921556 10.235.232.7 → 10.63.20.131 TCP 74
729 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995239405 TSecr=0 WS=128
105 2023-04-04 15:18:58.921926 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 729 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=2487545914 TSecr=3995239405
106 2023-04-04 15:18:58.921953 10.235.232.7 → 10.63.20.131 TCP 66
729 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995239405
TSecr=2487545914
107 2023-04-04 15:18:58.922027 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
108 2023-04-04 15:18:58.922802 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 107)
109 2023-04-04 15:18:58.922822 10.235.232.7 → 10.63.20.131 TCP 66
729 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995239406
TSecr=2487545914
110 2023-04-04 15:18:59.050600 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 729 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2487546044
TSecr=3995239406
111 2023-04-04 15:19:06.533098 10.235.232.128 → 10.235.235.255
BROWSER 243 Host Announcement SCS000126759-1, Workstation, Server, NT
Workstation, NT Server
112 2023-04-04 15:19:09.161611 10.235.232.7 → 10.63.20.131 TCP 74
954 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995249645 TSecr=0 WS=128
113 2023-04-04 15:19:09.161949 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 954 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=3804453828 TSecr=3995249645
114 2023-04-04 15:19:09.162009 10.235.232.7 → 10.63.20.131 TCP 66
954 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995249645
TSecr=3804453828
115 2023-04-04 15:19:09.162055 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
116 2023-04-04 15:19:09.162528 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 115)
117 2023-04-04 15:19:09.162554 10.235.232.7 → 10.63.20.131 TCP 66
954 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995249646
TSecr=3804453828
118 2023-04-04 15:19:09.278841 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 954 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=3804453948
TSecr=3995249646
119 2023-04-04 15:19:12.003989 Vmware_bf:32:23 → Broadcast ARP 60
Who has 10.235.232.1? Tell 10.235.232.42
120 2023-04-04 15:19:19.401546 10.235.232.7 → 10.63.20.131 TCP 74
728 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995259885 TSecr=0 WS=128
121 2023-04-04 15:19:19.401902 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 728 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=2468567773 TSecr=3995259885
122 2023-04-04 15:19:19.401965 10.235.232.7 → 10.63.20.131 TCP 66
728 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995259885
TSecr=2468567773
123 2023-04-04 15:19:19.402050 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
124 2023-04-04 15:19:19.402542 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 123)
125 2023-04-04 15:19:19.402554 10.235.232.7 → 10.63.20.131 TCP 66
728 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995259886
TSecr=2468567773
126 2023-04-04 15:19:19.595961 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 728 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2468567963
TSecr=3995259886
127 2023-04-04 15:19:21.449077 10.235.232.7 → 10.60.249.125 TCP 66
[TCP Keep-Alive] 732 → 2049 [ACK] Seq=2176 Ack=1857 Win=502 Len=0
TSval=1628408100 TSecr=708634268
128 2023-04-04 15:19:21.449206 10.235.232.7 → 10.56.8.183 TCP 66
[TCP Keep-Alive] 826 → 2049 [ACK] Seq=292 Ack=237 Win=501 Len=0
TSval=1599835403 TSecr=3825493593
129 2023-04-04 15:19:21.449284 10.60.249.125 → 10.235.232.7 TCP 66
[TCP Keep-Alive ACK] 2049 → 732 [ACK] Seq=1857 Ack=2177 Win=257 Len=0
TSval=708698066 TSecr=1628344342
130 2023-04-04 15:19:21.511308 10.56.8.183 → 10.235.232.7 TCP 66
[TCP Keep-Alive ACK] 2049 → 826 [ACK] Seq=237 Ack=293 Win=257 Len=0
TSval=3825557328 TSecr=1599771772
131 2023-04-04 15:19:29.641659 10.235.232.7 → 10.63.20.131 TCP 74
924 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995270125 TSecr=0 WS=128
132 2023-04-04 15:19:29.641987 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 924 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=4239335290 TSecr=3995270125
133 2023-04-04 15:19:29.642062 10.235.232.7 → 10.63.20.131 TCP 66
924 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995270125
TSecr=4239335290
134 2023-04-04 15:19:29.642188 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
135 2023-04-04 15:19:29.642585 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 134)
136 2023-04-04 15:19:29.642626 10.235.232.7 → 10.63.20.131 TCP 66
924 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995270126
TSecr=4239335290
137 2023-04-04 15:19:29.752894 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 924 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=4239335400
TSecr=3995270126
138 2023-04-04 15:19:37.833079 10.235.232.7 → 10.60.248.40 TCP 66
[TCP Dup ACK 84#1] 845 → 2049 [ACK] Seq=1 Ack=1 Win=501 Len=0
TSval=3160310667 TSecr=2389515123
139 2023-04-04 15:19:37.833122 10.235.232.7 → 10.60.249.150 TCP 66
[TCP Dup ACK 85#1] 970 → 2049 [ACK] Seq=1 Ack=1 Win=502 Len=0
TSval=4116528747 TSecr=1272820701
140 2023-04-04 15:19:37.833308 10.60.248.40 → 10.235.232.7 TCP 66
[TCP Dup ACK 87#1] [TCP ACKed unseen segment] 2049 → 845 [ACK] Seq=1
Ack=2 Win=257 Len=0 TSval=2389578099 TSecr=3160187318
141 2023-04-04 15:19:37.833346 10.60.249.150 → 10.235.232.7 TCP 66
[TCP Dup ACK 86#1] [TCP ACKed unseen segment] 2049 → 970 [ACK] Seq=1
Ack=2 Win=257 Len=0 TSval=1272883677 TSecr=4116405409
142 2023-04-04 15:19:39.881507 10.235.232.7 → 10.63.20.131 TCP 74
788 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995280365 TSecr=0 WS=128
143 2023-04-04 15:19:39.881784 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 788 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=2800122551 TSecr=3995280365
144 2023-04-04 15:19:39.881847 10.235.232.7 → 10.63.20.131 TCP 66
788 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995280365
TSecr=2800122551
145 2023-04-04 15:19:39.881898 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
146 2023-04-04 15:19:39.882359 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 145)
147 2023-04-04 15:19:39.882395 10.235.232.7 → 10.63.20.131 TCP 66
788 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995280366
TSecr=2800122551
148 2023-04-04 15:19:40.017113 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 788 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2800122681
TSecr=3995280366
149 2023-04-04 15:19:50.122254 10.235.232.7 → 10.63.20.131 TCP 74
868 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995290606 TSecr=0 WS=128
150 2023-04-04 15:19:50.122643 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 868 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=1996033015 TSecr=3995290606
151 2023-04-04 15:19:50.122674 10.235.232.7 → 10.63.20.131 TCP 66
868 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995290606
TSecr=1996033015
152 2023-04-04 15:19:50.122802 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
153 2023-04-04 15:19:50.123263 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 152)
154 2023-04-04 15:19:50.123279 10.235.232.7 → 10.63.20.131 TCP 66
868 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995290607
TSecr=1996033015
155 2023-04-04 15:19:50.268177 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 868 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1996033165
TSecr=3995290607
156 2023-04-04 15:20:00.361440 10.235.232.7 → 10.63.20.131 TCP 74
988 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995300845 TSecr=0 WS=128
157 2023-04-04 15:20:00.361776 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 988 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=4120839794 TSecr=3995300845
158 2023-04-04 15:20:00.361791 10.235.232.7 → 10.63.20.131 TCP 66
988 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995300845
TSecr=4120839794
159 2023-04-04 15:20:00.361841 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
160 2023-04-04 15:20:00.362321 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 159)
161 2023-04-04 15:20:00.362327 10.235.232.7 → 10.63.20.131 TCP 66
988 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995300846
TSecr=4120839794
162 2023-04-04 15:20:00.479087 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 988 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=4120839904
TSecr=3995300846
163 2023-04-04 15:20:01.559017 10.235.232.173 → 10.235.235.255
BROWSER 243 Host Announcement WINDOWS-PTEFTLL, Workstation, Server, NT
Workstation, NT Server
164 2023-04-04 15:20:10.426553 10.235.232.7 → 10.60.248.183 NTP 90
NTP Version 4, client
165 2023-04-04 15:20:10.426824 10.60.248.183 → 10.235.232.7 NTP 90
NTP Version 4, server
166 2023-04-04 15:20:10.601625 10.235.232.7 → 10.63.20.131 TCP 74
888 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995311085 TSecr=0 WS=128
167 2023-04-04 15:20:10.601919 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 888 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=531138781 TSecr=3995311085
168 2023-04-04 15:20:10.601978 10.235.232.7 → 10.63.20.131 TCP 66
888 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995311085
TSecr=531138781
169 2023-04-04 15:20:10.602058 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
170 2023-04-04 15:20:10.602516 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 169)
171 2023-04-04 15:20:10.602543 10.235.232.7 → 10.63.20.131 TCP 66
888 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995311086
TSecr=531138781
172 2023-04-04 15:20:10.731005 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 888 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=531138911
TSecr=3995311086
173 2023-04-04 15:20:11.998469 10.235.232.7 → 10.235.48.111 NTP 90
NTP Version 4, client
174 2023-04-04 15:20:11.998803 10.235.48.111 → 10.235.232.7 NTP 90
NTP Version 4, server
175 2023-04-04 15:20:12.648789 10.235.232.7 → 10.227.204.142 NTP 90
NTP Version 4, client
176 2023-04-04 15:20:12.649163 10.227.204.142 → 10.235.232.7 NTP 90
NTP Version 4, server
177 2023-04-04 15:20:20.841429 10.235.232.7 → 10.63.20.131 TCP 74
893 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995321325 TSecr=0 WS=128
178 2023-04-04 15:20:20.841673 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 893 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=1953607493 TSecr=3995321325
179 2023-04-04 15:20:20.841688 10.235.232.7 → 10.63.20.131 TCP 66
893 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995321325
TSecr=1953607493
180 2023-04-04 15:20:20.841767 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
181 2023-04-04 15:20:20.842198 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 180)
182 2023-04-04 15:20:20.842208 10.235.232.7 → 10.63.20.131 TCP 66
893 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995321326
TSecr=1953607493
183 2023-04-04 15:20:21.051706 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 893 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1953607703
TSecr=3995321326
184 2023-04-04 15:20:22.889123 10.235.232.7 → 10.56.8.183 TCP 66
[TCP Keep-Alive] 826 → 2049 [ACK] Seq=292 Ack=237 Win=501 Len=0
TSval=1599896843 TSecr=3825557328
185 2023-04-04 15:20:22.889130 10.235.232.7 → 10.60.249.125 TCP 66
[TCP Keep-Alive] 732 → 2049 [ACK] Seq=2176 Ack=1857 Win=502 Len=0
TSval=1628469540 TSecr=708698066
186 2023-04-04 15:20:22.889335 10.60.249.125 → 10.235.232.7 TCP 66
[TCP Keep-Alive ACK] 2049 → 732 [ACK] Seq=1857 Ack=2177 Win=257 Len=0
TSval=708759507 TSecr=1628344342
187 2023-04-04 15:20:22.951234 10.56.8.183 → 10.235.232.7 TCP 66
[TCP Keep-Alive ACK] 2049 → 826 [ACK] Seq=237 Ack=293 Win=257 Len=0
TSval=3825618767 TSecr=1599771772
188 2023-04-04 15:20:27.226390 10.235.232.131 → 10.235.235.255
BROWSER 243 Host Announcement SCS000126759-3, Workstation, Server, NT
Workstation, NT Server
189 2023-04-04 15:20:31.081384 10.235.232.7 → 10.63.20.131 TCP 74
1013 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995331565 TSecr=0 WS=128
190 2023-04-04 15:20:31.081679 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 1013 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=1446866364 TSecr=3995331565
191 2023-04-04 15:20:31.081698 10.235.232.7 → 10.63.20.131 TCP 66
1013 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995331565
TSecr=1446866364
192 2023-04-04 15:20:31.081745 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
193 2023-04-04 15:20:31.082347 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 192)
194 2023-04-04 15:20:31.082353 10.235.232.7 → 10.63.20.131 TCP 66
1013 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995331566
TSecr=1446866364
195 2023-04-04 15:20:31.216474 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 1013 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1446866494
TSecr=3995331566
196 2023-04-04 15:20:39.273073 10.235.232.7 → 10.60.249.150 TCP 66
[TCP Dup ACK 85#2] 970 → 2049 [ACK] Seq=1 Ack=1 Win=502 Len=0
TSval=4116590187 TSecr=1272883677
197 2023-04-04 15:20:39.273146 10.235.232.7 → 10.60.248.40 TCP 66
[TCP Dup ACK 84#2] 845 → 2049 [ACK] Seq=1 Ack=1 Win=501 Len=0
TSval=3160372107 TSecr=2389578099
198 2023-04-04 15:20:39.273342 10.60.249.150 → 10.235.232.7 TCP 66
[TCP Dup ACK 86#2] [TCP ACKed unseen segment] 2049 → 970 [ACK] Seq=1
Ack=2 Win=257 Len=0 TSval=1272945117 TSecr=4116405409
199 2023-04-04 15:20:39.274592 10.60.248.40 → 10.235.232.7 TCP 66
[TCP Dup ACK 87#2] [TCP ACKed unseen segment] 2049 → 845 [ACK] Seq=1
Ack=2 Win=257 Len=0 TSval=2389639540 TSecr=3160187318
200 2023-04-04 15:20:41.321912 10.235.232.7 → 10.63.20.131 TCP 74
676 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995341805 TSecr=0 WS=128
201 2023-04-04 15:20:41.322226 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 676 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=710203011 TSecr=3995341805
202 2023-04-04 15:20:41.322258 10.235.232.7 → 10.63.20.131 TCP 66
676 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995341806
TSecr=710203011
203 2023-04-04 15:20:41.322382 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
204 2023-04-04 15:20:41.322902 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 203)
205 2023-04-04 15:20:41.322923 10.235.232.7 → 10.63.20.131 TCP 66
676 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995341806
TSecr=710203011
206 2023-04-04 15:20:41.502713 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 676 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=710203191
TSecr=3995341806
207 2023-04-04 15:20:41.671253 fe80::21c:73ff:fe00:99 → ff02::1
ICMPv6 118 Router Advertisement from 00:1c:73:00:00:99
208 2023-04-04 15:20:41.675089 fe80::250:56ff:febf:2f55 → ff02::16
ICMPv6 90 Multicast Listener Report Message v2
209 2023-04-04 15:20:41.684071 fe80::250:56ff:febf:2f55 → ff02::16
ICMPv6 90 Multicast Listener Report Message v2
210 2023-04-04 15:20:43.709117 10.235.232.130 → 10.235.235.255
BROWSER 243 Host Announcement SCS000126759-2, Workstation, Server, NT
Workstation, NT Server
211 2023-04-04 15:20:51.561341 10.235.232.7 → 10.63.20.131 TCP 74
864 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995352045 TSecr=0 WS=128
212 2023-04-04 15:20:51.561677 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 864 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=1105202286 TSecr=3995352045
213 2023-04-04 15:20:51.561736 10.235.232.7 → 10.63.20.131 TCP 66
864 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995352045
TSecr=1105202286
214 2023-04-04 15:20:51.561764 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
215 2023-04-04 15:20:51.562296 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 214)
216 2023-04-04 15:20:51.562324 10.235.232.7 → 10.63.20.131 TCP 66
864 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995352046
TSecr=1105202286
217 2023-04-04 15:20:51.764952 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 864 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1105202486
TSecr=3995352046
218 2023-04-04 15:21:01.801569 10.235.232.7 → 10.63.20.131 TCP 74
779 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995362285 TSecr=0 WS=128
219 2023-04-04 15:21:01.801903 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 779 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=3669197173 TSecr=3995362285
220 2023-04-04 15:21:01.801938 10.235.232.7 → 10.63.20.131 TCP 66
779 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995362285
TSecr=3669197173
221 2023-04-04 15:21:01.801991 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
222 2023-04-04 15:21:01.802750 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 221)
223 2023-04-04 15:21:01.802761 10.235.232.7 → 10.63.20.131 TCP 66
779 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995362286
TSecr=3669197173
224 2023-04-04 15:21:01.954179 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 779 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=3669197323
TSecr=3995362286
225 2023-04-04 15:21:10.142734 fe80::21c:73ff:fe00:99 → ff02::1
ICMPv6 118 Router Advertisement from 00:1c:73:00:00:99
226 2023-04-04 15:21:10.146082 fe80::250:56ff:febf:2f55 → ff02::16
ICMPv6 90 Multicast Listener Report Message v2
227 2023-04-04 15:21:10.889124 fe80::250:56ff:febf:2f55 → ff02::16
ICMPv6 90 Multicast Listener Report Message v2
228 2023-04-04 15:21:12.041741 10.235.232.7 → 10.63.20.131 TCP 74
946 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
TSval=3995372525 TSecr=0 WS=128
229 2023-04-04 15:21:12.042058 10.63.20.131 → 10.235.232.7 TCP 74
2049 → 946 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
SACK_PERM=1 TSval=1985286575 TSecr=3995372525
230 2023-04-04 15:21:12.042121 10.235.232.7 → 10.63.20.131 TCP 66
946 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995372526
TSecr=1985286575
231 2023-04-04 15:21:12.042404 10.235.232.7 → 10.63.20.131 NFS 110
V4 NULL Call
232 2023-04-04 15:21:12.042673 10.63.20.131 → 10.235.232.7 NFS 102
V4 NULL Reply (Call In 231)
233 2023-04-04 15:21:12.042710 10.235.232.7 → 10.63.20.131 TCP 66
946 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995372526
TSecr=1985286575
234 2023-04-04 15:21:12.079429 10.63.20.131 → 10.235.232.7 TCP 66
[TCP Dup ACK 232#1] 2049 → 946 [ACK] Seq=37 Ack=45 Win=65792 Len=0
TSval=1985286615 TSecr=3995372526
235 2023-04-04 15:21:12.184895 10.63.20.131 → 10.235.232.7 TCP 66
2049 → 946 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1985286705
TSecr=3995372526
236 2023-04-04 15:21:18.186337 10.235.232.7 → 10.60.250.78 SSH 150
Server: Encrypted packet (len=84)
237 2023-04-04 15:21:18.186573 10.60.250.78 → 10.235.232.7 TCP 66
45496 → 22 [ACK] Seq=417 Ack=629 Win=501 Len=0 TSval=2780826399
TSecr=3765675462
238 2023-04-04 15:21:18.187900 10.235.232.7 → 10.60.250.78 SSH 150
Server: Encrypted packet (len=84)
239 2023-04-04 15:21:18.188065 10.60.250.78 → 10.235.232.7 TCP 66
45496 → 22 [ACK] Seq=417 Ack=713 Win=501 Len=0 TSval=2780826401
TSecr=3765675463
240 2023-04-04 15:21:18.188154 10.235.232.7 → 10.60.250.78 SSH 150
Server: Encrypted packet (len=84)
241 2023-04-04 15:21:18.188283 10.60.250.78 → 10.235.232.7 TCP 66
45496 → 22 [ACK] Seq=417 Ack=797 Win=501 Len=0 TSval=2780826401
TSecr=3765675464
242 2023-04-04 15:21:18.188455 10.235.232.7 → 10.60.250.78 SSH 134
Server: Encrypted packet (len=68)
243 2023-04-04 15:21:18.188599 10.60.250.78 → 10.235.232.7 TCP 66
45496 → 22 [ACK] Seq=417 Ack=865 Win=501 Len=0 TSval=2780826401
TSecr=3765675464
244 2023-04-04 15:21:22.539976 fe80::21c:73ff:fe00:99 →
fe80::250:56ff:febf:5abd ICMPv6 118 Router Advertisement from
00:1c:73:00:00:99
245 2023-04-04 15:21:22.764914 fe80::21c:73ff:fe00:99 →
fe80::250:56ff:febf:5abd ICMPv6 118 Router Advertisement from
00:1c:73:00:00:99
246 2023-04-04 15:21:24.329146 10.235.232.7 → 10.56.8.183 TCP 66
[TCP Keep-Alive] 826 → 2049 [ACK] Seq=292 Ack=237 Win=501 Len=0
TSval=1599958283 TSecr=3825618767
247 2023-04-04 15:21:24.329156 10.235.232.7 → 10.60.249.125 TCP 66
[TCP Keep-Alive] 732 → 2049 [ACK] Seq=2176 Ack=1857 Win=502 Len=0
TSval=1628530980 TSecr=708759507
248 2023-04-04 15:21:24.329506 10.60.249.125 → 10.235.232.7 TCP 66
[TCP Keep-Alive ACK] 2049 → 732 [ACK] Seq=1857 Ack=2177 Win=257 Len=0
TSval=708820947 TSecr=1628344342
249 2023-04-04 15:21:24.391262 10.56.8.183 → 10.235.232.7 TCP 66
[TCP Keep-Alive ACK] 2049 → 826 [ACK] Seq=237 Ack=293 Win=257 Len=0
TSval=3825680208 TSecr=1599771772
>
>
> >>> I thought that
> >>> would "prevent" the crash but it then proceeded to crash elsewhere.
> >>> Sorry I'm still on non-debug-symbol kernel (as I thought I'd add the
> >>> checks that that'd be enough :)).
> >>>
> >>> [ 284.368211] AGLO: handshake_req_cancel sock->sk is NULL
> >>> [ 284.368249] BUG: kernel NULL pointer dereference, address: 0000000000000018
> >>> [ 284.368320] #PF: supervisor read access in kernel mode
> >>> [ 284.368363] #PF: error_code(0x0000) - not-present page
> >>> [ 284.368411] PGD 0 P4D 0
> >>> [ 284.368438] Oops: 0000 [#1] PREEMPT SMP PTI
> >>> [ 284.368473] CPU: 0 PID: 521 Comm: kworker/u2:30 Kdump: loaded Not
> >>> tainted 6.3.0-rc4+ #14
> >>> [ 284.368523] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
> >>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
> >>> [ 284.368580] Workqueue: xprtiod xs_tls_connect [sunrpc]
> >>> [ 284.368720] RIP: 0010:xs_tls_connect+0x3e5/0x590 [sunrpc]
> >>> [ 284.368844] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
> >>> ff e8 bf d2 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
> >>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
> >>> 24 e8
> >>> [ 284.368973] RSP: 0018:ffffba17c0f67dc8 EFLAGS: 00010293
> >>> [ 284.369010] RAX: 0000000000000000 RBX: ffff963143c1fe40 RCX: 0000000000000000
> >>> [ 284.369051] RDX: 0000000000000000 RSI: ffff9631f7e20840 RDI: ffff9631f7e20840
> >>> [ 284.369091] RBP: ffff9631463d8a00 R08: 0000000000000000 R09: c0000000ffff7fff
> >>> [ 284.369130] R10: 0000000000000001 R11: ffffba17c0f67bb8 R12: ffff963143c1f800
> >>> [ 284.369175] R13: 0000000004208160 R14: ffff963143f6f600 R15: ffff963143c1e000
> >>> [ 284.369259] FS: 0000000000000000(0000) GS:ffff9631f7e00000(0000)
> >>> knlGS:0000000000000000
> >>> [ 284.369319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> [ 284.369367] CR2: 0000000000000018 CR3: 0000000102d12004 CR4: 00000000007706f0
> >>> [ 284.369458] PKRU: 55555554
> >>> [ 284.369483] Call Trace:
> >>> [ 284.369523] <TASK>
> >>> [ 284.369583] process_one_work+0x1b0/0x3c0
> >>> [ 284.369633] worker_thread+0x34/0x360
> >>> [ 284.369668] ? __pfx_worker_thread+0x10/0x10
> >>> [ 284.369704] kthread+0xdb/0x110
> >>> [ 284.369736] ? __pfx_kthread+0x10/0x10
> >>> [ 284.369767] ret_from_fork+0x29/0x50
> >>> [ 284.369808] </TASK>
> >>> [ 284.369843] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> >>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
> >>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
> >>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
> >>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
> >>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
> >>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
> >>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper sd_mod syscopyarea
> >>> sysfillrect ata_generic sysimgblt sg ata_piix ahci libahci
> >>> crc32c_intel serio_raw vmxnet3 libata drm vmw_pvscsi dm_mirror
> >>> dm_region_hash dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi
> >>> crc64_rocksoft crc64 ipmi_devintf ipmi_msghandler fuse
> >>> [ 284.370300] CR2: 0000000000000018
> >>>
> >>>
> >>>>
> >>>> BPF might also be a solution, but I'm not sure if it relies on
> >>>> debug info. Might be only BTF is needed.
> >>>>
> >>>>
> >>>>>>> [ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
> >>>>>>> 55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
> >>>>>>> 20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
> >>>>>>> 01 41
> >>>>>>> [ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
> >>>>>>> [ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
> >>>>>>> [ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
> >>>>>>> [ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
> >>>>>>> [ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
> >>>>>>> [ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
> >>>>>>> [ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
> >>>>>>> knlGS:0000000000000000
> >>>>>>> [ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>> [ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
> >>>>>>> [ 677.276355] PKRU: 55555554
> >>>>>>> [ 677.276427] Call Trace:
> >>>>>>> [ 677.276521] <TASK>
> >>>>>>> [ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
> >>>>>>> [ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
> >>>>>>> [ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
> >>>>>>> [ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
> >>>>>>> [ 677.277086] process_one_work+0x1b0/0x3c0
> >>>>>>> [ 677.277209] worker_thread+0x34/0x360
> >>>>>>> [ 677.277303] ? __pfx_worker_thread+0x10/0x10
> >>>>>>> [ 677.277396] kthread+0xdb/0x110
> >>>>>>> [ 677.277527] ? __pfx_kthread+0x10/0x10
> >>>>>>> [ 677.277619] ret_from_fork+0x29/0x50
> >>>>>>> [ 677.277731] </TASK>
> >>>>>>> [ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> >>>>>>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
> >>>>>>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
> >>>>>>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
> >>>>>>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
> >>>>>>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
> >>>>>>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
> >>>>>>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
> >>>>>>> sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
> >>>>>>> serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
> >>>>>>> dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
> >>>>>>> crc64 ipmi_devintf ipmi_msghandler fuse
> >>>>>>> [ 677.278383] CR2: 0000000000000030
> >>>>>>>
> >>>>>>> Network trace from the server:
> >>>>>>>
> >>>>>>> [kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
> >>>>>>> 1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
> >>>>>>> 727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> >>>>>>> TSval=4059503415 TSecr=0 WS=128
> >>>>>>> 2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
> >>>>>>> 2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> >>>>>>> SACK_PERM=1 TSval=2256845073 TSecr=4059503415
> >>>>>>> 3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
> >>>>>>> 727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
> >>>>>>> TSecr=2256845073
> >>>>>>> 4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
> >>>>>>> V4 NULL Call
> >>>>>>> 5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
> >>>>>>> V4 NULL Reply (Call In 4)
> >>>>>>> 6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
> >>>>>>> 727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
> >>>>>>> TSecr=2256845073
> >>>>>>> 7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
> >>>>>>> 2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
> >>>>>>> TSecr=4059503416
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Chuck Lever
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Chuck Lever
> >>>>
> >>>>
> >>>> --
> >>>> Chuck Lever
> >>
> >>
> >> --
> >> Chuck Lever
>
>
> --
> Chuck Lever
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: kernel oops
2023-04-04 19:26 ` Olga Kornievskaia
@ 2023-04-04 19:30 ` Chuck Lever III
0 siblings, 0 replies; 12+ messages in thread
From: Chuck Lever III @ 2023-04-04 19:30 UTC (permalink / raw)
To: Olga Kornievskaia; +Cc: kernel-tls-handshake
> On Apr 4, 2023, at 3:26 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>
> On Tue, Apr 4, 2023 at 3:14 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>
>>
>>
>>> On Apr 4, 2023, at 3:11 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>
>>> On Tue, Apr 4, 2023 at 1:28 PM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>
>>>>
>>>>
>>>>> On Apr 4, 2023, at 12:15 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>
>>>>> On Tue, Apr 4, 2023 at 10:54 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Apr 4, 2023, at 10:44 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>>>
>>>>>>> On Tue, Apr 4, 2023 at 10:32 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>>>>
>>>>>>>>> On Apr 4, 2023, at 10:24 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>>>>>
>>>>>>>>> On Fri, Mar 31, 2023 at 10:25 AM Chuck Lever III <chuck.lever@oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On Mar 31, 2023, at 10:23 AM, Olga Kornievskaia <aglo@umich.edu> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Chuck,
>>>>>>>>>>>
>>>>>>>>>>> Have you seen this oops? This is not my testing so all I know was that
>>>>>>>>>>> it was an attempt to mount with xprtsec=tls (against ONTAP).
>>>>>>>>>>>
>>>>>>>>>>> [ 316.939747] BUG: kernel NULL pointer dereference, address: 0000000000000018
>>>>>>>>>>> [ 316.939848] #PF: supervisor read access in kernel mode
>>>>>>>>>>> [ 316.939914] #PF: error_code(0x0000) - not-present page
>>>>>>>>>>> [ 316.939984] PGD 0 P4D 0
>>>>>>>>>>> [ 316.940041] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>>>>>>> [ 316.940109] CPU: 0 PID: 511 Comm: kworker/u2:30 Kdump: loaded Not
>>>>>>>>>>> tainted 6.2.1+ #2
>>>>>>>>>>> [ 316.940181] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>>>>>>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>>>>>>>>>> [ 316.940259] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>>>>>>>>>> [ 316.940463] RIP: 0010:xs_tls_connect+0x3e1/0x590 [sunrpc]
>>>>>>>>>>> [ 316.940556] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
>>>>>>>>>>> ff e8 03 d3 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
>>>>>>>>>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
>>>>>>>>>>> 24 e8
>>>>>>>>>>> [ 316.940719] RSP: 0018:ffffa69d40f5fdc8 EFLAGS: 00010293
>>>>>>>>>>> [ 316.940786] RAX: 0000000000000000 RBX: ffff9b0c46333e40 RCX: 0000000000000000
>>>>>>>>>>> [ 316.940872] RDX: 0000000000000000 RSI: ffffa69d40f5fd20 RDI: 0000000000000000
>>>>>>>>>>> [ 316.940940] RBP: ffff9b0c46350200 R08: ffff9b0cf7e31968 R09: 0000000000000159
>>>>>>>>>>> [ 316.941008] R10: 0000000000000001 R11: 00000049c8887a40 R12: ffff9b0c46333800
>>>>>>>>>>> [ 316.941075] R13: 0000000004208160 R14: ffff9b0c44e18a00 R15: ffff9b0c42dab800
>>>>>>>>>>> [ 316.941159] FS: 0000000000000000(0000) GS:ffff9b0cf7e00000(0000)
>>>>>>>>>>> knlGS:0000000000000000
>>>>>>>>>>> [ 316.941243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>>>> [ 316.941309] CR2: 0000000000000018 CR3: 0000000102e48004 CR4: 00000000007706f0
>>>>>>>>>>> [ 316.941391] PKRU: 55555554
>>>>>>>>>>> [ 316.941443] Call Trace:
>>>>>>>>>>> [ 316.941513] <TASK>
>>>>>>>>>>> [ 316.941578] process_one_work+0x1b0/0x3c0
>>>>>>>>>>> [ 316.941689] worker_thread+0x30/0x360
>>>>>>>>>>> [ 316.941755] ? __pfx_worker_thread+0x10/0x10
>>>>>>>>>>> [ 316.941822] kthread+0xd7/0x100
>>>>>>>>>>> [ 316.941900] ? __pfx_kthread+0x10/0x10
>>>>>>>>>>> [ 316.941968] ret_from_fork+0x29/0x50
>>>>>>>>>>> [ 316.942049] </TASK>
>>>>>>>>>>> [ 316.942099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>>>>>>>>>> dns_resolver ib_core nfsv3 nfs_acl nfs lockd grace fscache netfs ext4
>>>>>>>>>>> rfkill mbcache jbd2 loop vsock_loopback
>>>>>>>>>>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock
>>>>>>>>>>> sunrpc tls vfat fat dm_multipath intel_rapl_msr intel_rapl_common
>>>>>>>>>>> isst_if_mbox_msr isst_if_common nfit libnvdimm crct10dif_pclmul
>>>>>>>>>>> crc32_pclmul ghash_clmulni_intel rapl vmw_balloon joydev pcspkr
>>>>>>>>>>> i2c_piix4 vmw_vmci xfs libcrc32c vmwgfx drm_ttm_helper ttm sr_mod
>>>>>>>>>>> cdrom drm_kms_helper syscopyarea sd_mod sysfillrect ata_generic
>>>>>>>>>>> sysimgblt sg ahci libahci ata_piix crc32c_intel serio_raw drm vmxnet3
>>>>>>>>>>> vmw_pvscsi libata dm_mirror dm_region_hash dm_log dm_mod nvme_tcp
>>>>>>>>>>> nvme_fabrics nvme_core t10_pi crc64_rocksoft crc64 ipmi_devintf
>>>>>>>>>>> ipmi_msghandler fuse
>>>>>>>>>>> [ 316.942550] CR2: 0000000000000018
>>>>>>>>>>>
>>>>>>>>>>> Please note this was still on "commit
>>>>>>>>>>> caa52379f0046e974fd16585dd7ede624699d8e2 (HEAD -> 03152023-v7,
>>>>>>>>>>> origin/upcall-v7)". If you think somethings done to address it I'll
>>>>>>>>>>> have them update to the latest.
>>>>>>>>>>
>>>>>>>>>> Update and retry. It has a familiar ring, but I can't say I've
>>>>>>>>>> seen this recently.
>>>>>>>>>
>>>>>>>>> Hi Chuck,
>>>>>>>>>
>>>>>>>>> I've updated the kernel code to the latest commit
>>>>>>>>> 46588cade103b446f74a7ee160a83cd2a80399e7 (HEAD ->
>>>>>>>>> chuck-rpc-with-tls-04032023, chuck/topic-rpc-with-tls-upcall) but the
>>>>>>>>> client is still crashing but looks like in a different place. This
>>>>>>>>> happens when the server terminates the connection (immediately) as the
>>>>>>>>> tls handshake happens.
>>>>>>>>
>>>>>>>> Why is the server doing that?
>>>>>>>
>>>>>>> I'm not sure what kind of a reason you are looking for. I think the
>>>>>>> server can send an RST at any point in a TCP connection and the client
>>>>>>> shouldn't crash.
>>>>>>
>>>>>> No doubt.
>>>>>>
>>>>>>
>>>>>>> But the real reason is probably because the server's
>>>>>>> code is getting developed and they ran into some error and resetting
>>>>>>> the connection.
>>>>>>
>>>>>> That's what I was looking for. I wanted to know if it was
>>>>>> Linux NFSD doing this.
>>>>>>
>>>>>>
>>>>>>>>> Here's the stack trace:
>>>>>>>>> [ 677.274299] BUG: kernel NULL pointer dereference, address: 0000000000000030
>>>>>>>>> [ 677.274424] #PF: supervisor read access in kernel mode
>>>>>>>>> [ 677.274517] #PF: error_code(0x0000) - not-present page
>>>>>>>>> [ 677.274609] PGD 0 P4D 0
>>>>>>>>> [ 677.274689] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>>>>> [ 677.274785] CPU: 0 PID: 518 Comm: kworker/u2:29 Kdump: loaded Not
>>>>>>>>> tainted 6.3.0-rc4+ #13
>>>>>>>>> [ 677.274881] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>>>>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>>>>>>>> [ 677.274981] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>>>>>>>> [ 677.275137] RIP: 0010:handshake_req_cancel+0x2b/0x3b0
>>>>>>>>
>>>>>>>> scripts/faddr2line on this RIP
>>>>>>>>
>>>>>>>> Which pointer is NULL?
>>>>>>>
>>>>>>> I apologize I don't have that info and I'm working on trying to figure
>>>>>>> out how to get this information to you. Lab kernels are built with
>>>>>>> CONFIG_DEBUG_INFO disabled and thus I can't do faddr2line. I will
>>>>>>> attempt to build on with debugging enabled (has been problematic
>>>>>>> before for the lab machines so not sure if that'll be successful I
>>>>>>> might have to hack the linux kernel to simulate the reset).
>>>>>>
>>>>>> Jakub asked that NULL checking be removed from handshake_req_cancel,
>>>>>> or at least that's what I read into his comment. That choice is
>>>>>> definitely flushing bugs.
>>>>>>
>>>>>> So this crasher is familiar. req_cancel is getting called in some
>>>>>> scenario where the socket or request is partially filled in. We
>>>>>> have to figure out how that's happening.
>>>>>>
>>>>>> If debug info is a problem, then just add a sarcastic quantity of
>>>>>> NULL checks in handshake_req_cancel, rebuild, and it should tell
>>>>>> you exactly which pointer is bad.
>>>>>
>>>>> I added a check for sock-sk null and if so return true.
>>>>
>>>> If it returns true, xs_tls_handshake_sync() probably completes
>>>> successfully.
>>>>
>>>> Perhaps handshake_req_cancel needs to return false in cases
>>>> where it can't find a pending request.
>>>
>>> I'm not sure if you were suggesting me to do this but I tried:
>>> if (!sock->sk) return false;
>>> and that led to a mount not crashing but timing out.
>>
>> That's the behavior I would expect. Was the failure immediate,
>> or did it take a few seconds for the mount command to exit?
>
> It did take quite a long time to exit:
>
> [root@scs000128074 kolga]# time mount -o vers=4.2,sec=sys,xprtsec=tls
> 10.63.20.131:/ /mnt
> mount.nfs: Connection timed out
>
> real 3m0.544s
> user 0m0.004s
> sys 0m0.013s
>
> Network trace shows numerous (16) attempts to do NULL call
The long delay is due to retries, probably driven by mount.nfs.
But looks like there's a 10 second delay between each attempt.
Maybe that's OK.
> 66 2023-04-04 15:18:17.786579 10.235.232.7 → 10.63.20.131 TCP 74
> 953 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995198270 TSecr=0 WS=128
> 67 2023-04-04 15:18:17.786818 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 953 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=2815692929 TSecr=3995198270
> 68 2023-04-04 15:18:17.786871 10.235.232.7 → 10.63.20.131 TCP 66
> 953 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995198270
> TSecr=2815692929
> 69 2023-04-04 15:18:17.786951 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 70 2023-04-04 15:18:17.787488 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 69)
> 71 2023-04-04 15:18:17.787520 10.235.232.7 → 10.63.20.131 TCP 66
> 953 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995198271
> TSecr=2815692929
> 72 2023-04-04 15:18:17.818060 10.235.232.7 → 10.56.8.183 TCP 66
> 826 → 2049 [ACK] Seq=293 Ack=237 Win=501 Len=0 TSval=1599771772
> TSecr=3825493593
> 73 2023-04-04 15:18:17.993341 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 953 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2815693129
> TSecr=3995198271
> 74 2023-04-04 15:18:19.490669 fe80::21c:73ff:fe00:99 → ff02::1
> ICMPv6 118 Router Advertisement from 00:1c:73:00:00:99
> 75 2023-04-04 15:18:19.494076 fe80::250:56ff:febf:2f55 → ff02::16
> ICMPv6 90 Multicast Listener Report Message v2
> 76 2023-04-04 15:18:19.841191 fe80::250:56ff:febf:2f55 → ff02::16
> ICMPv6 90 Multicast Listener Report Message v2
> 77 2023-04-04 15:18:28.201642 10.235.232.7 → 10.63.20.131 TCP 74
> 692 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995208685 TSecr=0 WS=128
> 78 2023-04-04 15:18:28.202171 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 692 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=3904561029 TSecr=3995208685
> 79 2023-04-04 15:18:28.202236 10.235.232.7 → 10.63.20.131 TCP 66
> 692 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995208686
> TSecr=3904561029
> 80 2023-04-04 15:18:28.202506 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 81 2023-04-04 15:18:28.202882 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 80)
> 82 2023-04-04 15:18:28.202911 10.235.232.7 → 10.63.20.131 TCP 66
> 692 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995208686
> TSecr=3904561029
> 83 2023-04-04 15:18:28.329632 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 692 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=3904561159
> TSecr=3995208686
> 84 2023-04-04 15:18:34.857077 10.235.232.7 → 10.60.248.40 TCP 66
> 845 → 2049 [ACK] Seq=1 Ack=1 Win=501 Len=0 TSval=3160247691
> TSecr=2389454709
> 85 2023-04-04 15:18:34.857146 10.235.232.7 → 10.60.249.150 TCP 66
> 970 → 2049 [ACK] Seq=1 Ack=1 Win=502 Len=0 TSval=4116465771
> TSecr=1272760298
> 86 2023-04-04 15:18:34.857421 10.60.249.150 → 10.235.232.7 TCP 66
> [TCP ACKed unseen segment] 2049 → 970 [ACK] Seq=1 Ack=2 Win=257 Len=0
> TSval=1272820701 TSecr=4116405409
> 87 2023-04-04 15:18:34.857422 10.60.248.40 → 10.235.232.7 TCP 66
> [TCP ACKed unseen segment] 2049 → 845 [ACK] Seq=1 Ack=2 Win=257 Len=0
> TSval=2389515123 TSecr=3160187318
> 88 2023-04-04 15:18:38.441851 10.235.232.7 → 10.63.20.131 TCP 74
> 971 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995218925 TSecr=0 WS=128
> 89 2023-04-04 15:18:38.442269 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 971 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=1698639803 TSecr=3995218925
> 90 2023-04-04 15:18:38.442320 10.235.232.7 → 10.63.20.131 TCP 66
> 971 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995218926
> TSecr=1698639803
> 91 2023-04-04 15:18:38.442577 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 92 2023-04-04 15:18:38.442894 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 91)
> 93 2023-04-04 15:18:38.442921 10.235.232.7 → 10.63.20.131 TCP 66
> 971 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995218926
> TSecr=1698639803
> 94 2023-04-04 15:18:38.478175 10.63.20.131 → 10.235.232.7 TCP 66
> [TCP Dup ACK 92#1] 2049 → 971 [ACK] Seq=37 Ack=45 Win=65792 Len=0
> TSval=1698639843 TSecr=3995218926
> 95 2023-04-04 15:18:38.614265 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 971 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1698639973
> TSecr=3995218926
> 96 2023-04-04 15:18:48.681763 10.235.232.7 → 10.63.20.131 TCP 74
> 799 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995229165 TSecr=0 WS=128
> 97 2023-04-04 15:18:48.682176 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 799 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=1773399631 TSecr=3995229165
> 98 2023-04-04 15:18:48.682203 10.235.232.7 → 10.63.20.131 TCP 66
> 799 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995229166
> TSecr=1773399631
> 99 2023-04-04 15:18:48.682260 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 100 2023-04-04 15:18:48.682751 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 99)
> 101 2023-04-04 15:18:48.682761 10.235.232.7 → 10.63.20.131 TCP 66
> 799 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995229166
> TSecr=1773399631
> 102 2023-04-04 15:18:48.826843 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 799 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1773399771
> TSecr=3995229166
> 103 2023-04-04 15:18:49.514227 10.235.232.170 → 10.235.235.255
> BROWSER 243 Host Announcement WINDOWS-G95UIQM, Workstation, Server, NT
> Workstation, NT Server
> 104 2023-04-04 15:18:58.921556 10.235.232.7 → 10.63.20.131 TCP 74
> 729 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995239405 TSecr=0 WS=128
> 105 2023-04-04 15:18:58.921926 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 729 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=2487545914 TSecr=3995239405
> 106 2023-04-04 15:18:58.921953 10.235.232.7 → 10.63.20.131 TCP 66
> 729 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995239405
> TSecr=2487545914
> 107 2023-04-04 15:18:58.922027 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 108 2023-04-04 15:18:58.922802 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 107)
> 109 2023-04-04 15:18:58.922822 10.235.232.7 → 10.63.20.131 TCP 66
> 729 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995239406
> TSecr=2487545914
> 110 2023-04-04 15:18:59.050600 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 729 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2487546044
> TSecr=3995239406
> 111 2023-04-04 15:19:06.533098 10.235.232.128 → 10.235.235.255
> BROWSER 243 Host Announcement SCS000126759-1, Workstation, Server, NT
> Workstation, NT Server
> 112 2023-04-04 15:19:09.161611 10.235.232.7 → 10.63.20.131 TCP 74
> 954 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995249645 TSecr=0 WS=128
> 113 2023-04-04 15:19:09.161949 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 954 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=3804453828 TSecr=3995249645
> 114 2023-04-04 15:19:09.162009 10.235.232.7 → 10.63.20.131 TCP 66
> 954 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995249645
> TSecr=3804453828
> 115 2023-04-04 15:19:09.162055 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 116 2023-04-04 15:19:09.162528 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 115)
> 117 2023-04-04 15:19:09.162554 10.235.232.7 → 10.63.20.131 TCP 66
> 954 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995249646
> TSecr=3804453828
> 118 2023-04-04 15:19:09.278841 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 954 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=3804453948
> TSecr=3995249646
> 119 2023-04-04 15:19:12.003989 Vmware_bf:32:23 → Broadcast ARP 60
> Who has 10.235.232.1? Tell 10.235.232.42
> 120 2023-04-04 15:19:19.401546 10.235.232.7 → 10.63.20.131 TCP 74
> 728 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995259885 TSecr=0 WS=128
> 121 2023-04-04 15:19:19.401902 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 728 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=2468567773 TSecr=3995259885
> 122 2023-04-04 15:19:19.401965 10.235.232.7 → 10.63.20.131 TCP 66
> 728 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995259885
> TSecr=2468567773
> 123 2023-04-04 15:19:19.402050 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 124 2023-04-04 15:19:19.402542 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 123)
> 125 2023-04-04 15:19:19.402554 10.235.232.7 → 10.63.20.131 TCP 66
> 728 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995259886
> TSecr=2468567773
> 126 2023-04-04 15:19:19.595961 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 728 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2468567963
> TSecr=3995259886
> 127 2023-04-04 15:19:21.449077 10.235.232.7 → 10.60.249.125 TCP 66
> [TCP Keep-Alive] 732 → 2049 [ACK] Seq=2176 Ack=1857 Win=502 Len=0
> TSval=1628408100 TSecr=708634268
> 128 2023-04-04 15:19:21.449206 10.235.232.7 → 10.56.8.183 TCP 66
> [TCP Keep-Alive] 826 → 2049 [ACK] Seq=292 Ack=237 Win=501 Len=0
> TSval=1599835403 TSecr=3825493593
> 129 2023-04-04 15:19:21.449284 10.60.249.125 → 10.235.232.7 TCP 66
> [TCP Keep-Alive ACK] 2049 → 732 [ACK] Seq=1857 Ack=2177 Win=257 Len=0
> TSval=708698066 TSecr=1628344342
> 130 2023-04-04 15:19:21.511308 10.56.8.183 → 10.235.232.7 TCP 66
> [TCP Keep-Alive ACK] 2049 → 826 [ACK] Seq=237 Ack=293 Win=257 Len=0
> TSval=3825557328 TSecr=1599771772
> 131 2023-04-04 15:19:29.641659 10.235.232.7 → 10.63.20.131 TCP 74
> 924 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995270125 TSecr=0 WS=128
> 132 2023-04-04 15:19:29.641987 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 924 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=4239335290 TSecr=3995270125
> 133 2023-04-04 15:19:29.642062 10.235.232.7 → 10.63.20.131 TCP 66
> 924 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995270125
> TSecr=4239335290
> 134 2023-04-04 15:19:29.642188 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 135 2023-04-04 15:19:29.642585 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 134)
> 136 2023-04-04 15:19:29.642626 10.235.232.7 → 10.63.20.131 TCP 66
> 924 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995270126
> TSecr=4239335290
> 137 2023-04-04 15:19:29.752894 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 924 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=4239335400
> TSecr=3995270126
> 138 2023-04-04 15:19:37.833079 10.235.232.7 → 10.60.248.40 TCP 66
> [TCP Dup ACK 84#1] 845 → 2049 [ACK] Seq=1 Ack=1 Win=501 Len=0
> TSval=3160310667 TSecr=2389515123
> 139 2023-04-04 15:19:37.833122 10.235.232.7 → 10.60.249.150 TCP 66
> [TCP Dup ACK 85#1] 970 → 2049 [ACK] Seq=1 Ack=1 Win=502 Len=0
> TSval=4116528747 TSecr=1272820701
> 140 2023-04-04 15:19:37.833308 10.60.248.40 → 10.235.232.7 TCP 66
> [TCP Dup ACK 87#1] [TCP ACKed unseen segment] 2049 → 845 [ACK] Seq=1
> Ack=2 Win=257 Len=0 TSval=2389578099 TSecr=3160187318
> 141 2023-04-04 15:19:37.833346 10.60.249.150 → 10.235.232.7 TCP 66
> [TCP Dup ACK 86#1] [TCP ACKed unseen segment] 2049 → 970 [ACK] Seq=1
> Ack=2 Win=257 Len=0 TSval=1272883677 TSecr=4116405409
> 142 2023-04-04 15:19:39.881507 10.235.232.7 → 10.63.20.131 TCP 74
> 788 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995280365 TSecr=0 WS=128
> 143 2023-04-04 15:19:39.881784 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 788 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=2800122551 TSecr=3995280365
> 144 2023-04-04 15:19:39.881847 10.235.232.7 → 10.63.20.131 TCP 66
> 788 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995280365
> TSecr=2800122551
> 145 2023-04-04 15:19:39.881898 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 146 2023-04-04 15:19:39.882359 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 145)
> 147 2023-04-04 15:19:39.882395 10.235.232.7 → 10.63.20.131 TCP 66
> 788 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995280366
> TSecr=2800122551
> 148 2023-04-04 15:19:40.017113 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 788 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2800122681
> TSecr=3995280366
> 149 2023-04-04 15:19:50.122254 10.235.232.7 → 10.63.20.131 TCP 74
> 868 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995290606 TSecr=0 WS=128
> 150 2023-04-04 15:19:50.122643 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 868 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=1996033015 TSecr=3995290606
> 151 2023-04-04 15:19:50.122674 10.235.232.7 → 10.63.20.131 TCP 66
> 868 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995290606
> TSecr=1996033015
> 152 2023-04-04 15:19:50.122802 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 153 2023-04-04 15:19:50.123263 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 152)
> 154 2023-04-04 15:19:50.123279 10.235.232.7 → 10.63.20.131 TCP 66
> 868 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995290607
> TSecr=1996033015
> 155 2023-04-04 15:19:50.268177 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 868 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1996033165
> TSecr=3995290607
> 156 2023-04-04 15:20:00.361440 10.235.232.7 → 10.63.20.131 TCP 74
> 988 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995300845 TSecr=0 WS=128
> 157 2023-04-04 15:20:00.361776 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 988 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=4120839794 TSecr=3995300845
> 158 2023-04-04 15:20:00.361791 10.235.232.7 → 10.63.20.131 TCP 66
> 988 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995300845
> TSecr=4120839794
> 159 2023-04-04 15:20:00.361841 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 160 2023-04-04 15:20:00.362321 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 159)
> 161 2023-04-04 15:20:00.362327 10.235.232.7 → 10.63.20.131 TCP 66
> 988 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995300846
> TSecr=4120839794
> 162 2023-04-04 15:20:00.479087 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 988 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=4120839904
> TSecr=3995300846
> 163 2023-04-04 15:20:01.559017 10.235.232.173 → 10.235.235.255
> BROWSER 243 Host Announcement WINDOWS-PTEFTLL, Workstation, Server, NT
> Workstation, NT Server
> 164 2023-04-04 15:20:10.426553 10.235.232.7 → 10.60.248.183 NTP 90
> NTP Version 4, client
> 165 2023-04-04 15:20:10.426824 10.60.248.183 → 10.235.232.7 NTP 90
> NTP Version 4, server
> 166 2023-04-04 15:20:10.601625 10.235.232.7 → 10.63.20.131 TCP 74
> 888 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995311085 TSecr=0 WS=128
> 167 2023-04-04 15:20:10.601919 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 888 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=531138781 TSecr=3995311085
> 168 2023-04-04 15:20:10.601978 10.235.232.7 → 10.63.20.131 TCP 66
> 888 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995311085
> TSecr=531138781
> 169 2023-04-04 15:20:10.602058 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 170 2023-04-04 15:20:10.602516 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 169)
> 171 2023-04-04 15:20:10.602543 10.235.232.7 → 10.63.20.131 TCP 66
> 888 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995311086
> TSecr=531138781
> 172 2023-04-04 15:20:10.731005 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 888 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=531138911
> TSecr=3995311086
> 173 2023-04-04 15:20:11.998469 10.235.232.7 → 10.235.48.111 NTP 90
> NTP Version 4, client
> 174 2023-04-04 15:20:11.998803 10.235.48.111 → 10.235.232.7 NTP 90
> NTP Version 4, server
> 175 2023-04-04 15:20:12.648789 10.235.232.7 → 10.227.204.142 NTP 90
> NTP Version 4, client
> 176 2023-04-04 15:20:12.649163 10.227.204.142 → 10.235.232.7 NTP 90
> NTP Version 4, server
> 177 2023-04-04 15:20:20.841429 10.235.232.7 → 10.63.20.131 TCP 74
> 893 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995321325 TSecr=0 WS=128
> 178 2023-04-04 15:20:20.841673 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 893 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=1953607493 TSecr=3995321325
> 179 2023-04-04 15:20:20.841688 10.235.232.7 → 10.63.20.131 TCP 66
> 893 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995321325
> TSecr=1953607493
> 180 2023-04-04 15:20:20.841767 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 181 2023-04-04 15:20:20.842198 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 180)
> 182 2023-04-04 15:20:20.842208 10.235.232.7 → 10.63.20.131 TCP 66
> 893 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995321326
> TSecr=1953607493
> 183 2023-04-04 15:20:21.051706 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 893 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1953607703
> TSecr=3995321326
> 184 2023-04-04 15:20:22.889123 10.235.232.7 → 10.56.8.183 TCP 66
> [TCP Keep-Alive] 826 → 2049 [ACK] Seq=292 Ack=237 Win=501 Len=0
> TSval=1599896843 TSecr=3825557328
> 185 2023-04-04 15:20:22.889130 10.235.232.7 → 10.60.249.125 TCP 66
> [TCP Keep-Alive] 732 → 2049 [ACK] Seq=2176 Ack=1857 Win=502 Len=0
> TSval=1628469540 TSecr=708698066
> 186 2023-04-04 15:20:22.889335 10.60.249.125 → 10.235.232.7 TCP 66
> [TCP Keep-Alive ACK] 2049 → 732 [ACK] Seq=1857 Ack=2177 Win=257 Len=0
> TSval=708759507 TSecr=1628344342
> 187 2023-04-04 15:20:22.951234 10.56.8.183 → 10.235.232.7 TCP 66
> [TCP Keep-Alive ACK] 2049 → 826 [ACK] Seq=237 Ack=293 Win=257 Len=0
> TSval=3825618767 TSecr=1599771772
> 188 2023-04-04 15:20:27.226390 10.235.232.131 → 10.235.235.255
> BROWSER 243 Host Announcement SCS000126759-3, Workstation, Server, NT
> Workstation, NT Server
> 189 2023-04-04 15:20:31.081384 10.235.232.7 → 10.63.20.131 TCP 74
> 1013 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995331565 TSecr=0 WS=128
> 190 2023-04-04 15:20:31.081679 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 1013 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=1446866364 TSecr=3995331565
> 191 2023-04-04 15:20:31.081698 10.235.232.7 → 10.63.20.131 TCP 66
> 1013 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995331565
> TSecr=1446866364
> 192 2023-04-04 15:20:31.081745 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 193 2023-04-04 15:20:31.082347 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 192)
> 194 2023-04-04 15:20:31.082353 10.235.232.7 → 10.63.20.131 TCP 66
> 1013 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995331566
> TSecr=1446866364
> 195 2023-04-04 15:20:31.216474 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 1013 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1446866494
> TSecr=3995331566
> 196 2023-04-04 15:20:39.273073 10.235.232.7 → 10.60.249.150 TCP 66
> [TCP Dup ACK 85#2] 970 → 2049 [ACK] Seq=1 Ack=1 Win=502 Len=0
> TSval=4116590187 TSecr=1272883677
> 197 2023-04-04 15:20:39.273146 10.235.232.7 → 10.60.248.40 TCP 66
> [TCP Dup ACK 84#2] 845 → 2049 [ACK] Seq=1 Ack=1 Win=501 Len=0
> TSval=3160372107 TSecr=2389578099
> 198 2023-04-04 15:20:39.273342 10.60.249.150 → 10.235.232.7 TCP 66
> [TCP Dup ACK 86#2] [TCP ACKed unseen segment] 2049 → 970 [ACK] Seq=1
> Ack=2 Win=257 Len=0 TSval=1272945117 TSecr=4116405409
> 199 2023-04-04 15:20:39.274592 10.60.248.40 → 10.235.232.7 TCP 66
> [TCP Dup ACK 87#2] [TCP ACKed unseen segment] 2049 → 845 [ACK] Seq=1
> Ack=2 Win=257 Len=0 TSval=2389639540 TSecr=3160187318
> 200 2023-04-04 15:20:41.321912 10.235.232.7 → 10.63.20.131 TCP 74
> 676 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995341805 TSecr=0 WS=128
> 201 2023-04-04 15:20:41.322226 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 676 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=710203011 TSecr=3995341805
> 202 2023-04-04 15:20:41.322258 10.235.232.7 → 10.63.20.131 TCP 66
> 676 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995341806
> TSecr=710203011
> 203 2023-04-04 15:20:41.322382 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 204 2023-04-04 15:20:41.322902 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 203)
> 205 2023-04-04 15:20:41.322923 10.235.232.7 → 10.63.20.131 TCP 66
> 676 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995341806
> TSecr=710203011
> 206 2023-04-04 15:20:41.502713 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 676 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=710203191
> TSecr=3995341806
> 207 2023-04-04 15:20:41.671253 fe80::21c:73ff:fe00:99 → ff02::1
> ICMPv6 118 Router Advertisement from 00:1c:73:00:00:99
> 208 2023-04-04 15:20:41.675089 fe80::250:56ff:febf:2f55 → ff02::16
> ICMPv6 90 Multicast Listener Report Message v2
> 209 2023-04-04 15:20:41.684071 fe80::250:56ff:febf:2f55 → ff02::16
> ICMPv6 90 Multicast Listener Report Message v2
> 210 2023-04-04 15:20:43.709117 10.235.232.130 → 10.235.235.255
> BROWSER 243 Host Announcement SCS000126759-2, Workstation, Server, NT
> Workstation, NT Server
> 211 2023-04-04 15:20:51.561341 10.235.232.7 → 10.63.20.131 TCP 74
> 864 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995352045 TSecr=0 WS=128
> 212 2023-04-04 15:20:51.561677 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 864 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=1105202286 TSecr=3995352045
> 213 2023-04-04 15:20:51.561736 10.235.232.7 → 10.63.20.131 TCP 66
> 864 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995352045
> TSecr=1105202286
> 214 2023-04-04 15:20:51.561764 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 215 2023-04-04 15:20:51.562296 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 214)
> 216 2023-04-04 15:20:51.562324 10.235.232.7 → 10.63.20.131 TCP 66
> 864 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995352046
> TSecr=1105202286
> 217 2023-04-04 15:20:51.764952 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 864 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1105202486
> TSecr=3995352046
> 218 2023-04-04 15:21:01.801569 10.235.232.7 → 10.63.20.131 TCP 74
> 779 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995362285 TSecr=0 WS=128
> 219 2023-04-04 15:21:01.801903 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 779 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=3669197173 TSecr=3995362285
> 220 2023-04-04 15:21:01.801938 10.235.232.7 → 10.63.20.131 TCP 66
> 779 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995362285
> TSecr=3669197173
> 221 2023-04-04 15:21:01.801991 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 222 2023-04-04 15:21:01.802750 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 221)
> 223 2023-04-04 15:21:01.802761 10.235.232.7 → 10.63.20.131 TCP 66
> 779 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995362286
> TSecr=3669197173
> 224 2023-04-04 15:21:01.954179 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 779 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=3669197323
> TSecr=3995362286
> 225 2023-04-04 15:21:10.142734 fe80::21c:73ff:fe00:99 → ff02::1
> ICMPv6 118 Router Advertisement from 00:1c:73:00:00:99
> 226 2023-04-04 15:21:10.146082 fe80::250:56ff:febf:2f55 → ff02::16
> ICMPv6 90 Multicast Listener Report Message v2
> 227 2023-04-04 15:21:10.889124 fe80::250:56ff:febf:2f55 → ff02::16
> ICMPv6 90 Multicast Listener Report Message v2
> 228 2023-04-04 15:21:12.041741 10.235.232.7 → 10.63.20.131 TCP 74
> 946 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
> TSval=3995372525 TSecr=0 WS=128
> 229 2023-04-04 15:21:12.042058 10.63.20.131 → 10.235.232.7 TCP 74
> 2049 → 946 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
> SACK_PERM=1 TSval=1985286575 TSecr=3995372525
> 230 2023-04-04 15:21:12.042121 10.235.232.7 → 10.63.20.131 TCP 66
> 946 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=3995372526
> TSecr=1985286575
> 231 2023-04-04 15:21:12.042404 10.235.232.7 → 10.63.20.131 NFS 110
> V4 NULL Call
> 232 2023-04-04 15:21:12.042673 10.63.20.131 → 10.235.232.7 NFS 102
> V4 NULL Reply (Call In 231)
> 233 2023-04-04 15:21:12.042710 10.235.232.7 → 10.63.20.131 TCP 66
> 946 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=3995372526
> TSecr=1985286575
> 234 2023-04-04 15:21:12.079429 10.63.20.131 → 10.235.232.7 TCP 66
> [TCP Dup ACK 232#1] 2049 → 946 [ACK] Seq=37 Ack=45 Win=65792 Len=0
> TSval=1985286615 TSecr=3995372526
> 235 2023-04-04 15:21:12.184895 10.63.20.131 → 10.235.232.7 TCP 66
> 2049 → 946 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=1985286705
> TSecr=3995372526
> 236 2023-04-04 15:21:18.186337 10.235.232.7 → 10.60.250.78 SSH 150
> Server: Encrypted packet (len=84)
> 237 2023-04-04 15:21:18.186573 10.60.250.78 → 10.235.232.7 TCP 66
> 45496 → 22 [ACK] Seq=417 Ack=629 Win=501 Len=0 TSval=2780826399
> TSecr=3765675462
> 238 2023-04-04 15:21:18.187900 10.235.232.7 → 10.60.250.78 SSH 150
> Server: Encrypted packet (len=84)
> 239 2023-04-04 15:21:18.188065 10.60.250.78 → 10.235.232.7 TCP 66
> 45496 → 22 [ACK] Seq=417 Ack=713 Win=501 Len=0 TSval=2780826401
> TSecr=3765675463
> 240 2023-04-04 15:21:18.188154 10.235.232.7 → 10.60.250.78 SSH 150
> Server: Encrypted packet (len=84)
> 241 2023-04-04 15:21:18.188283 10.60.250.78 → 10.235.232.7 TCP 66
> 45496 → 22 [ACK] Seq=417 Ack=797 Win=501 Len=0 TSval=2780826401
> TSecr=3765675464
> 242 2023-04-04 15:21:18.188455 10.235.232.7 → 10.60.250.78 SSH 134
> Server: Encrypted packet (len=68)
> 243 2023-04-04 15:21:18.188599 10.60.250.78 → 10.235.232.7 TCP 66
> 45496 → 22 [ACK] Seq=417 Ack=865 Win=501 Len=0 TSval=2780826401
> TSecr=3765675464
> 244 2023-04-04 15:21:22.539976 fe80::21c:73ff:fe00:99 →
> fe80::250:56ff:febf:5abd ICMPv6 118 Router Advertisement from
> 00:1c:73:00:00:99
> 245 2023-04-04 15:21:22.764914 fe80::21c:73ff:fe00:99 →
> fe80::250:56ff:febf:5abd ICMPv6 118 Router Advertisement from
> 00:1c:73:00:00:99
> 246 2023-04-04 15:21:24.329146 10.235.232.7 → 10.56.8.183 TCP 66
> [TCP Keep-Alive] 826 → 2049 [ACK] Seq=292 Ack=237 Win=501 Len=0
> TSval=1599958283 TSecr=3825618767
> 247 2023-04-04 15:21:24.329156 10.235.232.7 → 10.60.249.125 TCP 66
> [TCP Keep-Alive] 732 → 2049 [ACK] Seq=2176 Ack=1857 Win=502 Len=0
> TSval=1628530980 TSecr=708759507
> 248 2023-04-04 15:21:24.329506 10.60.249.125 → 10.235.232.7 TCP 66
> [TCP Keep-Alive ACK] 2049 → 732 [ACK] Seq=1857 Ack=2177 Win=257 Len=0
> TSval=708820947 TSecr=1628344342
> 249 2023-04-04 15:21:24.391262 10.56.8.183 → 10.235.232.7 TCP 66
> [TCP Keep-Alive ACK] 2049 → 826 [ACK] Seq=237 Ack=293 Win=257 Len=0
> TSval=3825680208 TSecr=1599771772
>
>>
>>
>>>>> I thought that
>>>>> would "prevent" the crash but it then proceeded to crash elsewhere.
>>>>> Sorry I'm still on non-debug-symbol kernel (as I thought I'd add the
>>>>> checks that that'd be enough :)).
>>>>>
>>>>> [ 284.368211] AGLO: handshake_req_cancel sock->sk is NULL
>>>>> [ 284.368249] BUG: kernel NULL pointer dereference, address: 0000000000000018
>>>>> [ 284.368320] #PF: supervisor read access in kernel mode
>>>>> [ 284.368363] #PF: error_code(0x0000) - not-present page
>>>>> [ 284.368411] PGD 0 P4D 0
>>>>> [ 284.368438] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>> [ 284.368473] CPU: 0 PID: 521 Comm: kworker/u2:30 Kdump: loaded Not
>>>>> tainted 6.3.0-rc4+ #14
>>>>> [ 284.368523] Hardware name: VMware, Inc. VMware7,1/440BX Desktop
>>>>> Reference Platform, BIOS VMW71.00V.16707776.B64.2008070230 08/07/2020
>>>>> [ 284.368580] Workqueue: xprtiod xs_tls_connect [sunrpc]
>>>>> [ 284.368720] RIP: 0010:xs_tls_connect+0x3e5/0x590 [sunrpc]
>>>>> [ 284.368844] Code: ff ff ff 48 2b 83 f8 fe ff ff 48 89 83 00 ff ff
>>>>> ff e8 bf d2 ff ff e9 a8 fd ff ff 49 8b 97 f8 05 00 00 66 83 bb e0 f9
>>>>> ff ff 0a <48> 8b 6a 18 0f 84 fd 00 00 00 48 8d 72 18 4c 89 e7 48 89 14
>>>>> 24 e8
>>>>> [ 284.368973] RSP: 0018:ffffba17c0f67dc8 EFLAGS: 00010293
>>>>> [ 284.369010] RAX: 0000000000000000 RBX: ffff963143c1fe40 RCX: 0000000000000000
>>>>> [ 284.369051] RDX: 0000000000000000 RSI: ffff9631f7e20840 RDI: ffff9631f7e20840
>>>>> [ 284.369091] RBP: ffff9631463d8a00 R08: 0000000000000000 R09: c0000000ffff7fff
>>>>> [ 284.369130] R10: 0000000000000001 R11: ffffba17c0f67bb8 R12: ffff963143c1f800
>>>>> [ 284.369175] R13: 0000000004208160 R14: ffff963143f6f600 R15: ffff963143c1e000
>>>>> [ 284.369259] FS: 0000000000000000(0000) GS:ffff9631f7e00000(0000)
>>>>> knlGS:0000000000000000
>>>>> [ 284.369319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> [ 284.369367] CR2: 0000000000000018 CR3: 0000000102d12004 CR4: 00000000007706f0
>>>>> [ 284.369458] PKRU: 55555554
>>>>> [ 284.369483] Call Trace:
>>>>> [ 284.369523] <TASK>
>>>>> [ 284.369583] process_one_work+0x1b0/0x3c0
>>>>> [ 284.369633] worker_thread+0x34/0x360
>>>>> [ 284.369668] ? __pfx_worker_thread+0x10/0x10
>>>>> [ 284.369704] kthread+0xdb/0x110
>>>>> [ 284.369736] ? __pfx_kthread+0x10/0x10
>>>>> [ 284.369767] ret_from_fork+0x29/0x50
>>>>> [ 284.369808] </TASK>
>>>>> [ 284.369843] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>>>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
>>>>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
>>>>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
>>>>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
>>>>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
>>>>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
>>>>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper sd_mod syscopyarea
>>>>> sysfillrect ata_generic sysimgblt sg ata_piix ahci libahci
>>>>> crc32c_intel serio_raw vmxnet3 libata drm vmw_pvscsi dm_mirror
>>>>> dm_region_hash dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi
>>>>> crc64_rocksoft crc64 ipmi_devintf ipmi_msghandler fuse
>>>>> [ 284.370300] CR2: 0000000000000018
>>>>>
>>>>>
>>>>>>
>>>>>> BPF might also be a solution, but I'm not sure if it relies on
>>>>>> debug info. Might be only BTF is needed.
>>>>>>
>>>>>>
>>>>>>>>> [ 677.275269] Code: 0f 1e fa 0f 1f 44 00 00 41 57 41 56 41 55 41 54
>>>>>>>>> 55 53 48 83 ec 28 48 8b 6f 18 65 48 8b 04 25 28 00 00 00 48 89 44 24
>>>>>>>>> 20 31 c0 <48> 8b 5d 30 48 89 6c 24 18 e8 97 15 5f ff 4c 8b 25 50 2e 70
>>>>>>>>> 01 41
>>>>>>>>> [ 677.275474] RSP: 0018:ffffaed400f6fcf8 EFLAGS: 00010246
>>>>>>>>> [ 677.275570] RAX: 0000000000000000 RBX: ffff93f085b33800 RCX: 0000000000000000
>>>>>>>>> [ 677.275663] RDX: ffff93f085b33f08 RSI: ffffaed400f6fd18 RDI: ffff93f083aa6700
>>>>>>>>> [ 677.275756] RBP: 0000000000000000 R08: ffff93f137e32968 R09: 0000009d9663fa40
>>>>>>>>> [ 677.275850] R10: 0000000000000000 R11: 0000009d9663fa40 R12: ffff93f083aa6700
>>>>>>>>> [ 677.275943] R13: 0000000000000000 R14: ffff93f082999e00 R15: ffff93f085b33800
>>>>>>>>> [ 677.276049] FS: 0000000000000000(0000) GS:ffff93f137e00000(0000)
>>>>>>>>> knlGS:0000000000000000
>>>>>>>>> [ 677.276154] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> [ 677.276247] CR2: 0000000000000030 CR3: 000000010797c002 CR4: 00000000007706f0
>>>>>>>>> [ 677.276355] PKRU: 55555554
>>>>>>>>> [ 677.276427] Call Trace:
>>>>>>>>> [ 677.276521] <TASK>
>>>>>>>>> [ 677.276605] ? wait_for_completion_interruptible_timeout+0xfa/0x170
>>>>>>>>> [ 677.276725] xs_tls_handshake_sync+0x14f/0x170 [sunrpc]
>>>>>>>>> [ 677.276850] ? __pfx_xs_tls_handshake_done+0x10/0x10 [sunrpc]
>>>>>>>>> [ 677.276967] xs_tls_connect+0x14c/0x590 [sunrpc]
>>>>>>>>> [ 677.277086] process_one_work+0x1b0/0x3c0
>>>>>>>>> [ 677.277209] worker_thread+0x34/0x360
>>>>>>>>> [ 677.277303] ? __pfx_worker_thread+0x10/0x10
>>>>>>>>> [ 677.277396] kthread+0xdb/0x110
>>>>>>>>> [ 677.277527] ? __pfx_kthread+0x10/0x10
>>>>>>>>> [ 677.277619] ret_from_fork+0x29/0x50
>>>>>>>>> [ 677.277731] </TASK>
>>>>>>>>> [ 677.277803] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
>>>>>>>>> dns_resolver nfsv3 nfs_acl nfs lockd grace fscache netfs ext4 mbcache
>>>>>>>>> jbd2 loop rfkill vsock_loopback vmw_vsock_virtio_transport_common
>>>>>>>>> vmw_vsock_vmci_transport vsock sunrpc vfat fat dm_multipath
>>>>>>>>> intel_rapl_msr intel_rapl_common isst_if_mbox_msr isst_if_common nfit
>>>>>>>>> libnvdimm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl
>>>>>>>>> vmw_balloon joydev pcspkr vmw_vmci i2c_piix4 xfs libcrc32c vmwgfx
>>>>>>>>> drm_ttm_helper ttm sr_mod cdrom drm_kms_helper syscopyarea sysfillrect
>>>>>>>>> sd_mod ata_generic sysimgblt sg ata_piix ahci libahci crc32c_intel
>>>>>>>>> serio_raw drm vmxnet3 vmw_pvscsi libata dm_mirror dm_region_hash
>>>>>>>>> dm_log dm_mod nvme_tcp nvme_fabrics nvme_core t10_pi crc64_rocksoft
>>>>>>>>> crc64 ipmi_devintf ipmi_msghandler fuse
>>>>>>>>> [ 677.278383] CR2: 0000000000000030
>>>>>>>>>
>>>>>>>>> Network trace from the server:
>>>>>>>>>
>>>>>>>>> [kolga@scs000085536 ~/pcaps]$ tshark -t ad -r e0d_20230404_084311.trc0
>>>>>>>>> 1 2023-04-04 08:43:28.027546 10.235.232.77 → 10.63.20.131 TCP 74
>>>>>>>>> 727 → 2049 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1
>>>>>>>>> TSval=4059503415 TSecr=0 WS=128
>>>>>>>>> 2 2023-04-04 08:43:28.027645 10.63.20.131 → 10.235.232.77 TCP 74
>>>>>>>>> 2049 → 727 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=256
>>>>>>>>> SACK_PERM=1 TSval=2256845073 TSecr=4059503415
>>>>>>>>> 3 2023-04-04 08:43:28.027813 10.235.232.77 → 10.63.20.131 TCP 66
>>>>>>>>> 727 → 2049 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4059503415
>>>>>>>>> TSecr=2256845073
>>>>>>>>> 4 2023-04-04 08:43:28.027952 10.235.232.77 → 10.63.20.131 NFS 110
>>>>>>>>> V4 NULL Call
>>>>>>>>> 5 2023-04-04 08:43:28.028301 10.63.20.131 → 10.235.232.77 NFS 102
>>>>>>>>> V4 NULL Reply (Call In 4)
>>>>>>>>> 6 2023-04-04 08:43:28.028563 10.235.232.77 → 10.63.20.131 TCP 66
>>>>>>>>> 727 → 2049 [ACK] Seq=45 Ack=37 Win=64256 Len=0 TSval=4059503416
>>>>>>>>> TSecr=2256845073
>>>>>>>>> 7 2023-04-04 08:43:28.331181 10.63.20.131 → 10.235.232.77 TCP 66
>>>>>>>>> 2049 → 727 [RST, ACK] Seq=37 Ack=45 Win=0 Len=0 TSval=2256845383
>>>>>>>>> TSecr=4059503416
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Chuck Lever
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Chuck Lever
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Chuck Lever
>>>>
>>>>
>>>> --
>>>> Chuck Lever
>>
>>
>> --
>> Chuck Lever
--
Chuck Lever
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-04-04 19:30 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-31 14:23 kernel oops Olga Kornievskaia
2023-03-31 14:25 ` Chuck Lever III
2023-04-04 14:24 ` Olga Kornievskaia
2023-04-04 14:31 ` Chuck Lever III
2023-04-04 14:44 ` Olga Kornievskaia
2023-04-04 14:54 ` Chuck Lever III
2023-04-04 16:15 ` Olga Kornievskaia
2023-04-04 17:28 ` Chuck Lever III
2023-04-04 19:11 ` Olga Kornievskaia
2023-04-04 19:14 ` Chuck Lever III
2023-04-04 19:26 ` Olga Kornievskaia
2023-04-04 19:30 ` Chuck Lever III
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).